Ch8 - Confidence Intervals, Effect Size, and Statistical Power Flashcards

1
Q

What are the new statistics?

A
  1. Effect sizes
  2. Confidence intervals
  3. Meta-analysis
2
Q

Point estimate

Confidence Intervals

A
  • A summary statistic from a sample that is just one number used as an estimate of the population parameter - “best guess”
  • The true population mean is unknown - and we take a sample from the population to estimate the population mean
  • EX: In studies on gender differences in math performance, the mean for boys, the mean for girls, and the difference between them are point estimates
3
Q

Interval estimate

Confidence Intervals

A
  • Based on a sample statistic and provides a range of plausible values for the population parameter
  • Frequently used by the media, often when reporting political polls, and are usually constructed by adding and subtracting a margin of error from a point estimate
4
Q

What is the interval estimate composed of (EQUATION)?

Confidence Intervals

A

interval estimate = point estimate ± margin of error (in a poll, the reported percentage plus and minus the margin of error)

5
Q

Confidence intervals: we’re not saying that we’re confident that the population mean falls in the interval, but rather…

Confidence Intervals

A

we are merely saying that, if we repeatedly conducted this same study with the same sample size, we would expect the population mean to fall within the computed interval a certain percentage of the time - usually 95%

6
Q

Confidence level vs. interval:

Confidence Intervals

A
  • Level - the % (e.g., 95%)
  • Interval - the range between the two values that surround the sample mean
7
Q

Calculating confidence intervals with distributions

Confidence Intervals

A
  1. Draw a normal curve that has the sample mean at its center (NOTE: different from curve drawn for z test, where we had population mean at the center)
  2. Indicate the bounds of the confidence interval on the drawing
  3. Determine the z statistics that fall at each line marking the middle of 95%
  4. Turn the z statistics back into raw means
  5. Check that the confidence interval makes sense
8
Q

Step 1 to calculating CI

Confidence Intervals

A

Draw a normal curve that has the sample mean at its center (NOTE: different from curve drawn for z test, where we had population mean at the center)

9
Q

Step 2 to calculating CI

Confidence Intervals

A
  • **2: Indicate the bounds of the confidence interval on the drawing**
  • Draw a vertical line from the mean to the top of the curve
  • For a 95% confidence interval, we also draw two small vertical lines to indicate the middle 95% of the normal curve (2.5% in each tail, for a total of 5%)
  • The curve is symmetric, so half of the 95% falls above and half falls below the mean
  • Half of 95% = 47.5%, represented by the segments on either side of the mean
10
Q

Step 3 to calculating CI

Confidence Intervals

A

3. Determine the z statistics that fall at each line marking the middle of 95%

  • To do so: turn back to the z table
  • The % between the mean and each of the z scores is 47.5% - when we look up this % in the z table, we find a z statistic of 1.96
  • Can now add the z statistics of -1.96 and 1.96 to the curve
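If no z table is handy, the same cutoff can be recovered computationally. A one-line sketch in Python (assuming scipy is available):

```python
from scipy.stats import norm

# z that cuts off the top 2.5%, leaving the middle 95%: ~1.96
print(norm.ppf(0.975))
```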
11
Q

Step 4 to calculating CI

Confidence Intervals

A

4. Turn the z statistics back into raw means

  • Need to identify the appropriate mean and SD to use in the formula
  • Two important points to remember:
  • Center the interval around the sample mean (not the population mean), so use the sample mean in the calculation
  • Because we have a sample mean (rather than an individual score), we use a distribution of means - so we calculate standard error as the measure of spread: σM = σ/√N
  • The bounds are then Mlower = -z(σM) + Msample and Mupper = z(σM) + Msample
12
Q

Step 5 to calculating CI

Confidence Intervals

A

5. Check that the confidence interval makes sense
* The sample mean should fall exactly in the middle of the two ends of the interval
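Putting the five steps together - a minimal Python sketch of a z-based 95% CI; the sample mean, population SD, and N below are hypothetical numbers, not from the cards:

```python
import math

M = 105.0      # sample mean (hypothetical)
sigma = 15.0   # population SD (hypothetical)
N = 30         # sample size (hypothetical)

z_crit = 1.96                      # Step 3: z cutting off the middle 95%
se = sigma / math.sqrt(N)          # Step 4: standard error of the mean
lower = M - z_crit * se            # Step 4: turn z back into raw means
upper = M + z_crit * se

# Step 5: sanity check - the sample mean sits exactly midway between the bounds
assert abs((lower + upper) / 2 - M) < 1e-9
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```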

13
Q

Statistically significant doesn’t/does mean…

A
  • Does NOT mean that the findings from a study represent a meaningful difference
  • ONLY means that those findings are unlikely to occur if the null hypothesis is, in fact, true
14
Q

How does an increase in sample size affect the standard error and the test statistic? What does this cause?

The effect of sample size on statistical significance

A
  • Each time we increase the sample size, the standard error (the SD of the distribution of means) decreases and the test statistic increases
  • Because of this, a small difference might not be statistically significant with a small sample but might be statistically significant with a large sample
15
Q

Why would a large sample allow us to reject the null hypothesis when a small sample would not? (EXAMPLE)

A

If we randomly selected 5 women and they had a mean score well above the OkCupid average, we might say “it could be chance”; but if we randomly selected 1000 women with a mean rating well above the OkCupid average, it’s very unlikely that we just happened to choose 1000 people with high scores

16
Q

Effect size

A
  • Indicates the size of a difference and is unaffected by sample size
  • Can tell us whether a statistically significant difference might also be an important difference
  • Tells us how much two populations DO NOT overlap - the less overlap, the bigger the effect size
  • DECREASING OVERLAP IS IDEAL!
17
Q

How can the amount of overlap between two distributions be decreased? TWO WAYS:

A

1: Overlap decreases and effect size increases when the means are farther apart (greater distance between peaks)
2: Overlap decreases and effect size increases when the variability within each distribution of scores is smaller (narrower curves with taller peaks)

18
Q

How does effect size differ from statistical hypothesis testing?

A

Unlike statistical hypothesis testing, effect size is a standardized measure based on distributions of scores rather than distributions of means
* Rather than using σM = σ/√N, effect sizes are based only on the variability in the distribution of scores and do not depend on sample size

19
Q

Since effect sizes are not dependent on sample size, what does this allow us to do?

A

This means we can compare the effect sizes of different studies with each other, even when the studies have different sample sizes

20
Q

When we conduct a z-test, the effect size is typically

A

Cohen’s d: a measure of effect size that expresses the difference between two means in terms of SD
* AKA, Cohen’s d is the standardized difference between two means

21
Q

Formula for Cohen’s d for a z statistic:

A

d = (M - μ)/σ
- Similar to the z statistic, except based on the distribution of scores (σM → σ, μM → μ)
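A small sketch of this formula in Python; the means and SD are hypothetical values chosen only for illustration:

```python
def cohens_d(sample_mean: float, pop_mean: float, pop_sd: float) -> float:
    """d = (M - mu) / sigma: the difference between means in SD units."""
    return (sample_mean - pop_mean) / pop_sd

# Hypothetical values: sample mean 105 vs. population mean 100, SD 15
d = cohens_d(105.0, 100.0, 15.0)
print(f"d = {d:.2f}")  # 0.33 - between Cohen's small (0.2) and medium (0.5) benchmarks
```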

22
Q

With the results, we can determine (from Cohen’s 3 guidelines)…

Small, Medium, Large Effects

A
  • Small effects: 0.2 | 85% overlap
  • Medium effects: 0.5 | 67% overlap
  • Large effects: 0.8 | 53% overlap
23
Q

Does an effect need to be large to be meaningful?

A

Just because a statistically significant difference is small does not mean it lacks meaning; interpreting the meaningfulness of an effect size depends on the context

24
Q

Meta-analysis:

Meta-analysis

A
  • a study that involves the calculation of a mean effect size from the individual effect sizes of more than one study
25
Q

How do meta-analyses improve statistical power?

A

By considering multiple studies simultaneously; this also helps resolve debates fueled by contradictory research findings
26
Q

4 steps to calculating a meta-analysis:

A
  1. Select the topic of interest and decide exactly how to proceed before beginning to track down studies
  2. Locate every study that has been conducted and meets the criteria
  3. Calculate an effect size, often Cohen's d, for every study
  4. Calculate statistics - ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes (a sketch of steps 3-4 follows below)
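A minimal Python sketch of steps 3-4. The per-study d values and sample sizes are hypothetical, and the simple N-weighting stands in for the inverse-variance weighting a real meta-analysis would typically use:

```python
# Effect size (Cohen's d) and sample size from each located study (hypothetical)
ds = [0.42, 0.31, 0.58, 0.12]
ns = [40, 120, 25, 200]

# Weight each study's effect size by its sample size, then average
mean_d = sum(d * n for d, n in zip(ds, ns)) / sum(ns)
print(f"Mean effect size: {mean_d:.2f}")  # ~0.24 for these made-up studies
```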
27
Q

Considerations to keep in mind:

Step 1: select the topic of interest and decide exactly how to proceed before beginning to track down studies

A
  • **Make sure the necessary statistical information is available**, either effect sizes or the summary stats necessary to calculate effect sizes
  • **Consider selecting only studies in which participants meet certain criteria**, such as age, gender, or geographic location
  • **Consider eliminating studies based on the research design** (EX: because they were not experimental in nature)
28
Q

Key part involves finding...

Step 2: locate every study that has been conducted and meets the criteria

A

...any studies that have been conducted but not published
  • Much of this "fugitive literature" or "gray literature" is unpublished simply **because the studies did not find a significant difference; the overall effect size seems larger without accounting for these studies** - AKA the **"file drawer problem"**
  • Such studies can be found through other sources - like contacting researchers to track down unpublished work
29
"File drawer problem" - 2 solutions ## Footnote 2: locate every study that has been conducted and meets criteria
1: **File drawer analysis**: a statistical calculation, following a meta-analysis, of the number of studies with null results that would have to exist so that a mean effect size would no longer be statistically significant * If just a few studies could render a mean effect size nonsignificant (no longer statistically significantly different from zero) then the mean effect size should be viewed as likely to be an inflated estimate * If it would take several hundred studies in researchers' "file drawers" to render the effect non-significant, then it's safe to conclude that there really is a significant effect 2: Can work with replication to help draw more reliable conclusions
30
Q

What visual display can researchers include?

Step 4: calculate statistics - ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes

A

**Forest plot**: a type of graph that shows the confidence interval for the effect size of every study
31
Q

Statistical power is...

A

...the likelihood of rejecting the null hypothesis WHEN WE SHOULD reject the null hypothesis
32
Q

What is the probability that researchers consider the MINIMUM for conducting a study?

Statistical power

A

0.80 - an 80% chance of rejecting the null if we should reject it
  • Thus, researchers perform a power analysis prior to conducting a study: if they have at least an 80% chance of correctly rejecting the null, then it's appropriate to conduct the study
33
Q

When we conduct a statistical null hypothesis test, we make a decision either to reject or to fail to reject the null hypothesis. One issue is that we don't have direct access to the truth about what we're studying - instead...

A
  • We make inferences based on the data we collected, which could lead to a right or a wrong decision
  • Overall, a researcher's goal is to be correct as often as possible - there are 2 ways to be right and 2 ways to be wrong
34
Q

What are the 2 ways to be **WRONG** in rejecting/failing to reject the null hypothesis?

A

2 ways to be wrong - recap: **Type I and Type II errors**
35
Q

What are the 2 ways to be **RIGHT** in rejecting/failing to reject the null hypothesis?

A
  1. **Correct decision**: if the null is true and we fail to reject the null, we have made the correct decision (**essentially leaving the null alone**) - in this case, we're saying that there's no effect when in fact there is none
  2. **Correct decision (Power)**: if the null hypothesis is false and we reject the null hypothesis, that's also a correct decision - a goal of research is to maximize statistical power
36
Q

Power is used by statisticians in a specific way - HOW?

A
  • **Statistical power**: a measure of the likelihood that we will reject the null hypothesis, given that the null hypothesis is false
  • In other words, statistical power is the probability that we will reject the null hypothesis when we should reject the null hypothesis - **THE PROBABILITY THAT WE WILL NOT MAKE A TYPE II ERROR**
37
Q

The calculation of statistical power *ranges* from:

A

A probability of 0.00 to 1.00 (AKA 0% to 100%)
38
Q

Conceptual calculation for power

A
  • Power = effect size × sample size
  • This means that we could achieve high power because the size of the effect is large - or we could achieve high power because the effect is small but the sample is large (see the sketch below)
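A sketch making this concrete for a two-tailed one-sample z test (a standard power formula, not something from the cards; assumes scipy). Power rises with either a larger effect size or a larger N:

```python
from scipy.stats import norm

def power_z(d: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-tailed one-sample z test with effect size d and sample size n."""
    z_crit = norm.ppf(1 - alpha / 2)   # critical value under H0
    shift = d * n ** 0.5               # how far the H1 distribution sits from H0
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

print(power_z(d=0.2, n=50))    # small effect, modest N  -> ~0.29
print(power_z(d=0.2, n=200))   # same effect, larger N   -> ~0.81
print(power_z(d=0.8, n=50))    # large effect, modest N  -> ~1.00
```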
39
Q

The most practical way to increase statistical power for many behavioural studies is...

A

...to add more participants
40
Q

How can researchers **quantify** the statistical power of their studies? 2 WAYS

A
  1. By referring to a published table
  2. By using computing tools like G*Power
41
Q

G*Power

A

Used in 2 ways:
  1. **Can calculate power *AFTER* conducting a study from several pieces of information** - because we are calculating power after conducting the study, G*Power refers to these calculations as **post hoc, meaning "after the fact"**
  2. **Can be used in reverse, *BEFORE* conducting a study, to *identify the sample size necessary* to achieve a given level of power** - in this case, G*Power refers to these calculations as **a priori, meaning "prior to"**
42
Q

Of the two, which is more meaningful: post hoc or a priori power?

A

Post hoc power is NOT as meaningful as an a priori power calculation for sample-size planning
43
Q

On a practical level, statistical power calculations tell researchers...

A

...how many participants are needed to conduct a study whose findings we can trust
44
Q

Five factors that affect statistical power:

A
  1. Increase alpha
  2. Turn a two-tailed hypothesis into a one-tailed hypothesis
  3. Increase N/sample size
  4. Exaggerate the mean difference between levels of the IV
  5. Decrease SD
45
Q

1: Increase alpha

Five factors that affect statistical power

A
  • Like changing the rules by widening the goalposts in football, **statistical power increases when we raise the alpha level above 0.05 (e.g., to 0.10)**
  • This has the side effect of increasing the probability of a Type I error from 5% to 10%
46
Q

2: Turn a two-tailed hypothesis into a one-tailed hypothesis

Five factors that affect statistical power

A
  • **One-tailed tests provide more statistical power**, while two-tailed tests are more conservative
  • However, it is usually best to use a two-tailed test
47
Q

3: Increase N/sample size

Five factors that affect statistical power

A
  • Increasing sample size leads to an increase in the test statistic, making it easier to reject the null hypothesis
  • As N increases, the distributions of means become narrower and there is less overlap (a larger sample size means a smaller standard error)
48
Q

4: Exaggerate the mean difference between levels of the IV

Five factors that affect statistical power

A

The mean of population 2 is farther from the mean of population 1 in part (b) than in part (a); the difference between means is not easily changed, but it can be done
49
Q

5: Decrease SD

Five factors that affect statistical power

A

When SD is smaller, standard error is smaller and the curves are narrower. We can reduce SD in two ways:
  1. **By using *reliable measures* from the beginning of the study**
  2. **By *sampling from a more homogeneous group* in which participants' responses are more likely to be similar to begin with**
50
LECTURES
51
Q

CHP8 concepts push beyond the limits of NHST

A
  • Effect size
  • Confidence intervals
  • Power
52
Q

Effect size:

CHP8 concepts push beyond the limits of NHST

A
  • If the null is really false, **how big is that effect?**
  • A standardized numerical estimate of the population effect size using our sample data
53
Q

Confidence intervals:

CHP8 concepts push beyond the limits of NHST

A
  • Starting with the sample mean, **compute a range of plausible values for the true population mean**
  • Helps us prepare for replications
54
Q

Power

CHP8 concepts push beyond the limits of NHST

A
  • If the null is really false, **how likely is it that we're going to find a "significant effect" in our sample?**
  • If the null is really false, how likely is it that we're going to avoid a Type II error?
55
Q

Effect size: after rejecting the null, we can conclude that...

A

...we think we drew this sample from a different population with a different sampling distribution
56
Q

Effect size - how can we guess the mean of the population we drew from?

A

By locating the tallest point of the distribution - the most common score
57
Q

What does effect size look like, visually?

A

The distance from the highest peak of one distribution to the highest peak of the other (the distance between group means)
58
Q

How/what does effect size help us estimate?

A

If we DID draw from a different population, how different is that new population's mean from the null mean?
59
Q

What are some different ways to estimate an effect size for different kinds of data?

A
  • How far away is the true mean from the null-hypothesis mean?
  • How far apart are the experimental and control conditions?
  • Strength of a correlation
  • How far from equal (50% each) is the distribution of proportions?
60
Q

Which size of effect is easiest to detect?

A

Smaller effects are HARDER to detect from a single sample; larger effects (thus, larger Cohen's d) are EASIER to detect
61
Q

What is one of the many "standardized" indicators of effect size?

A
  • COHEN'S d
  • Estimates the population parameter **δ** (delta)
62
Q

How many SD away from the comparison value is our sample group mean?

A
  • d = |(M - μ)/σ|
  • NOTE: the equation sits between absolute-value bars
63
Q

What can we calculate to answer: "***How likely is it*** that our class is a random sample from this general population? Or do we likely come from a different population?"

A

We can use a z test or we can use a confidence interval
64
Q

All CIs follow the same pattern...

A

Subtract the margin of error (critical value × standard error) from the point estimate to find the lower bound; add the same margin of error to find the upper bound
65
Q

Type I error

A
  • H0 is rejected
  • H0 is actually true
66
Q

Correct Decision - POWER (1 - β)

A
  • H0 is rejected
  • H0 is actually false
67
Q

Type II error

A
  • Fail to reject H0
  • H0 is actually false
68
Q

Correct Decision (1 - α)

A
  • Retain H0
  • H0 is true
69
Q

What does β mean?

A

If the null is really false (an effect exists), then β% of the time we will make a mistake and say there's no effect
70
Q

To identify power:

A

Find the % of the H1 distribution's curve that would lead us to correctly reject the null
71
Q

If effect size increases, what happens to the Type I error rate?

A

NO CHANGE
72
Q

If effect size increases, what happens to the Type II error rate?

A

DECREASES
73
Q

If effect size increases, what happens to power?

A

INCREASES
74
Q

In an a priori power analysis, we can ask two questions:

A
  • "If I'm making the assumption that there is an effect to be found, HOW MANY PEOPLE DO I NEED IN MY STUDY?"
  • "If I'm limited to N participants, will I have enough power to reject the null hypothesis if I should do so?" (a sketch of the first question follows below)