Lecture notes 3 Flashcards


1
Q

What is the independent samples t-test?

A

The independent samples t-test evaluates the difference between the means of two groups. That is, we are examining the hypothesis of whether or not the means for two independent groups, such as a treatment and a control group, are different from each other. Also referred to as a between-groups design, the minimum tidy dataset contains only two variables—the grouping or independent variable and the dependent variable. The grouping variable can be a factor (e.g., "treatment" or "control"; "Group A" or "Group B") or numeric (e.g., 0 or 1). Regardless of how the grouping variable is coded, there can only be two levels of that variable, excluding any missing data.
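
As a concrete sketch (the values and names d, group, and y are made up for illustration), a minimal tidy dataset for this design in R could look like this:

    # Minimal tidy dataset for a between-groups design: one grouping
    # variable (a two-level factor) and one dependent variable.
    d <- data.frame(
      group = factor(rep(c("treatment", "control"), each = 3)),
      y     = c(5.1, 4.8, 5.6, 3.9, 4.2, 4.0)
    )
    str(d)  # two variables: 'group' (factor, 2 levels) and 'y' (numeric)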

2
Q

What is the correct set of hypotheses for the independent groups t-test?

A

For a two-tailed test between a treatment and control group, the null and alternative hypotheses are as follows.
H0: μt − μc = 0
H1: μt − μc ≠ 0

The following hypotheses are equivalent expressions.
H0: μt = μc
H1: μt ≠ μc

3
Q

What are the assumptions for the independent groups t-test?

A

There are three primary assumptions that are made for the standard independent groups t-test. These assumptions are commonly made for all linear models such as ANOVA and multiple regression when the generic t-test is used to make inferences. All of these assumptions are made for the residuals—deviations from the group's population mean or predicted value on the dependent variable.

  1. Independence. The residuals are independent of each other. There is no association or correlation between the residuals.
  2. Normality. The residuals are normally distributed.
  3. Homogeneity of Variance. The residuals are all drawn from the same distribution that has a constant variance.
4
Q

What is the directional null and alternative hypothesis for an independent groups t-test?

A

The directional null and alternative hypotheses, assuming the prediction is that the treatment group will have a higher mean than the control group, are

H0: μt − μc ≤ 0
H1: μt − μc > 0
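
In R's t.test(), a directional test is requested with the alternative argument. A sketch, reusing the hypothetical d, y, and group names from above; note that with the formula interface the direction refers to the first factor level minus the second:

    # One-tailed test of H1: mu_t - mu_c > 0. Make sure "treatment" is
    # the first factor level so "greater" points in the intended direction.
    d$group <- relevel(d$group, ref = "treatment")
    t.test(y ~ group, data = d, alternative = "greater")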

5
Q

What do we do since we do not have the population parameters for an independent groups t-test?

A

Since we do not have the population parameters, we never observe the actual residuals but instead have estimates of the residuals that we can examine to see to what extent these assumptions appear plausible. The residuals (ε) that we make the assumptions for come from the following model for each group.

y_i = μ + ε_i

We can only observe residuals from the sample mean.

y_i = ȳ + e_i
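
A small sketch of computing these observed residuals in R (d, y, and group are the hypothetical names used earlier):

    # Observed residuals: deviations of each score from its group's
    # sample mean. ave() returns the group mean for every row.
    d$e <- d$y - ave(d$y, d$group)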

6
Q

What is the resulting assumption from combining the assumptions of the independent groups t-test?

A

Combining these three assumptions results in the assumption that each of the residuals is an independent random observation from the same normal distribution that has variance σ². In other words, the residuals, commonly denoted as ε, are all independent and identically distributed as ε ∼ N(0, σ²).
When these three assumptions are met, we can express the test statistic for the independent groups t-test as

t = (ȳt − ȳc − (μt − μc | H0)) / (sp √(1/nt + 1/nc))

Here ȳt and ȳc are the means of the treatment and control groups, respectively, and (μt − μc) | H0 is the null hypothesis value. This is normally 0, which corresponds to the null hypothesis of no difference between the groups. This simplifies the t-test to

t = (ȳt − ȳc) / (sp √(1/nt + 1/nc))

Here sp is the pooled standard deviation across the two groups with sample sizes nt and nc, and df = nt + nc − 2. If all three assumptions are met, then we know that the sampling distribution of the test statistic will correspond exactly with the t-distribution with df = nt + nc − 2. As a consequence, the p-values that we obtain from conducting this analysis will be the correct probabilities of observing the data (or data more extreme), given that H0 is true.
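
To make the formula concrete, a sketch that computes the pooled t-statistic by hand and checks it against t.test() with var.equal = TRUE; y_t and y_c here are simulated placeholder vectors, not data from the lecture:

    # Pooled (Student) t-statistic computed from the formulas above.
    set.seed(123)
    y_t <- rnorm(50, mean = 0.5)   # placeholder treatment scores
    y_c <- rnorm(50, mean = 0.0)   # placeholder control scores
    nt <- length(y_t); nc <- length(y_c)
    sp <- sqrt(((nt - 1) * var(y_t) + (nc - 1) * var(y_c)) / (nt + nc - 2))
    t_stat <- (mean(y_t) - mean(y_c)) / (sp * sqrt(1/nt + 1/nc))
    t_stat
    t.test(y_t, y_c, var.equal = TRUE)$statistic  # same value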

7
Q

What is sp, the pooled standard deviation?

A

sp is the pooled standard deviation. This is calculated as the square root of the weighted average variance estimate across the two groups:

sp = √((SSt + SSc) / (nt + nc − 2))

where SSt = Σᵢ (yti − ȳt)² and SSc = Σᵢ (yci − ȳc)².

8
Q

What happens when we violate the assumption of independence in the independent groups t-test?

A

The assumption of independence is absolutely critical as Type I error rates increase rapidly with even small amounts of dependence. This can be seen in Figure 1, which simulated the impact of dependence, indexed by the intraclass correlation (ICC), on the Type I error rate when the null hypothesis was true and the assumptions of normality and homogeneity of variance were met. How might dependence emerge? Imagine if you are running an experiment on the effect of exercise on mood. The dependent variable after treatment (30 minutes of exercise compared to the control of reading travel magazines) is an assessment of positive affect (Rate the extent to which you feel right now…interested, excited, strong, enthusiastic, attentive, etc.). There are 50 participants in each condition and you have 5 different research assistants who actually conduct the experiment—5 of the 7 dwarfs (Grumpy, Happy, Sleepy, Sneezy, and Dopey) who each run 10 participants in each of the two conditions. Happy's mood is contagious
and Happy's participants, regardless of condition, report high positive affect. In contrast, Grumpy, Sleepy, and Dopey annoy and bore participants and they report, regardless of condition, lower positive affect. In other words, participants that share a research assistant are more similar than participants with different research assistants. There is a dependence in the experiment that is associated with the RA.
What can we do about this? The solution is easy in the present case—we would include RA as an additional factor in the study analysis and control for these differences. However, this requires that we know which RA ran each participant and examine and include that information as an independent variable in the analysis. There may be sources of dependence in the dataset that we are not aware of or had not considered. Without knowledge of potential sources of dependence, there are no clues, hints, or ways of examining whether or not there exists dependence that we need to consider and model in our analysis. Thinking about potential sources of dependence is something that needs to happen before you run a study, and any potential source of dependence then needs to be assessed and modeled. Alternatively, you can ensure independence by having just one RA run all of the participants and train that RA carefully to not change their mood or delivery of instructions over time.
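
A minimal sketch of this fix, assuming a hypothetical data frame d with columns affect, condition, and ra (recording which RA ran each participant):

    # Control for research assistant by including RA as a factor in the
    # model alongside the experimental condition.
    fit <- lm(affect ~ condition + ra, data = d)
    anova(fit)  # the condition effect now controls for RA differences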

9
Q

Why does the Type I error rate increase as the dependence among the observations increases when you violate the independence assumption of the independent groups t-test?

A

In the present example, if the observations are fully and completely independent (ICC = 0), then df = 50 + 50 − 2 = 98. However, when the observations are fully dependent (ICC = 1), then all of the observations for a particular dwarf's participants in each condition are all the same (e.g., Happy's participants in the control condition all report 5.6 for positive affect, Grumpy's participants in the treatment condition all report 1.5, etc.). Then we only have 10 different observations (one for each dwarf in each condition) and df = 10 − 2 = 8 in reality. However, if we don't adjust for this, we will think that df = 98 and use that in our estimate of the standard error, and we will have an estimate of the standard error that is much, much too small. This results in a larger test statistic and an increased Type I error rate. When appropriately accounting for the level of dependence in the data using multilevel modeling, functionally you are adjusting the df in your analysis to be 8 ≤ df ≤ 98. The more dependence, the closer to 8 df are actually present in the data.
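
A rough simulation sketch of this inflation (all parameters assumed, not from the lecture): generate RA clusters with a given ICC under a true H0 and count how often the naive pooled t-test rejects.

    # Simulate cluster (RA) dependence and count false rejections of a
    # true H0. 5 RAs x 10 participants per condition, ICC assumed = .2.
    set.seed(42)
    icc <- 0.2; n_ra <- 5; n_per <- 10
    one_sim <- function() {
      # each score = shared RA effect + individual noise (total var = 1)
      y_t <- rep(rnorm(n_ra, 0, sqrt(icc)), each = n_per) +
             rnorm(n_ra * n_per, 0, sqrt(1 - icc))
      y_c <- rep(rnorm(n_ra, 0, sqrt(icc)), each = n_per) +
             rnorm(n_ra * n_per, 0, sqrt(1 - icc))
      t.test(y_t, y_c, var.equal = TRUE)$p.value < .05
    }
    mean(replicate(5000, one_sim()))  # well above the nominal .05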

10
Q

What happens when we violate the assumption of normality in the independent groups t-test?

A

In general, linear models that meet the assumptions of independence and homogeneity of variance are relatively robust to violations of the assumption of normality. Robustness in this context means that the p-values obtained are fairly close to the actual probability of obtaining the observed results (or more extreme) when H0 is true.
How do we know if a distribution is normally distributed? There are tests of normality such as the Shapiro-Wilk test. However, what is not important is whether or not we conclude a distribution is normal. Large samples will often lead us to conclude the distribution is not normal, as statistical power is high even when the departure from normality is quite small. In small samples the departure from normality can be severe and the test of normality will not reject the null hypothesis that the data are normally distributed. In other words, what matters is not a formal test, but the estimate of the departure from normality. This is best done through visual examination of the residuals in your dataset. The way to do this is to examine the Q-Q plot. This method of plotting compares the quantiles of your data against the quantiles of a reference distribution such as the normal distribution. The syntax for this plot is qqnorm(x), where x is the variable that you wish to plot against the normal distribution. If the observed data correspond well to the normal distribution, the Q-Q plot will correspond well to a straight line as in Figure 2. If the observed data are not normally distributed then we will see departures from normality as in Figure 3. We are able to visually discern departures from linearity quite easily, so this quick eye-ball examination of the quantiles is extremely accurate. We examine whether or not the data correspond well to a straight line—some curvature at the tails is often to be expected as the tails of a distribution are not estimated as precisely as the quantiles in the middle of the distribution (i.e., there is more uncertainty associated with the quantiles at the end of a distribution).
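
A sketch of this check in base R (d, y, and group are the hypothetical names used earlier): compute residuals from the group means and inspect the Q-Q plot.

    # Q-Q plot of the residuals against the normal distribution.
    resid <- d$y - ave(d$y, d$group)  # deviations from group means
    qqnorm(resid)                     # sample vs. normal quantiles
    qqline(resid)                     # reference line for a normal sample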

11
Q

What happens when we violate the assumption of homogeneity of variance in the independent groups t-test?

A

What do we do if we suspect that the two groups may have different variances? Just as with the assumption of normality, there are formal statistical tests we can apply to test the assumption of homogeneity (e.g., Levene's test). However, with small sample sizes you may not have the statistical power to detect a real difference. There is no reason to use these formal tests and you should ignore them. If you even suspect that there may be a difference between the groups in terms of their variances, you should use a t-test that allows for differences in the variances between the groups. This is done using Welch's t-test, defined as follows.

t = (ȳt − ȳc) / √(s²t/nt + s²c/nc)

Welch's test uses approximate df = ν, where ν is calculated using the Welch-Satterthwaite equation:

ν = (s²t/nt + s²c/nc)² / ((s²t/nt)²/(nt − 1) + (s²c/nc)²/(nc − 1))

Fractional df, such as that provided by the Welch-Satterthwaite equation, used to require interpolation between entries in the tables of critical values in textbooks. However, modern programs compute probabilities and quantiles for fractional df, and such limitations and annoyances are irrelevant.
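
A sketch verifying the Welch-Satterthwaite df by hand against t.test(), using placeholder vectors (e.g., the y_t and y_c simulated earlier):

    # Welch-Satterthwaite approximate df computed from the formula above.
    vt <- var(y_t) / length(y_t)   # squared SE contribution, treatment
    vc <- var(y_c) / length(y_c)   # squared SE contribution, control
    nu <- (vt + vc)^2 /
          (vt^2 / (length(y_t) - 1) + vc^2 / (length(y_c) - 1))
    nu
    t.test(y_t, y_c)$parameter  # Welch df reported by R; matches nu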

12
Q

What is the impact of analyzing non-normal data in the independent groups t-test?

A

What is the impact of analyzing non-normal data? With the assumptions of independence and homogeneity of variance met, Type I error rates are very close to the nominal level (α). For instance, a quick simulation with n = 25 shows that, when H0 is true and the residuals come from the same distribution plotted in Figure 3, the proportion of results that are significant with α = .05 is 0.0504566. This estimate is based on 10,000,000 simulations. Relatively minor deviations from normality have no real impact on inferences and are not something to worry about. However, as we will see later, when combined with heteroscedasticity and uneven sample sizes, then we need to worry and consider models that account for non-normally distributed data and different variances in the groups.

13
Q

What must you do in R when working with the independent groups t-test?

A

Welch's t-test is the default option in R for the t.test() function. To assume equal variances you have to explicitly make that assumption and specify var.equal = TRUE. This corresponds to the current perspective of many statisticians that robust statistics should be the default approach when examining hypotheses.
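
In practice (with the hypothetical d, y, and group names from earlier):

    # Welch's t-test is the default; equal variances must be requested.
    t.test(y ~ group, data = d)                    # Welch (default)
    t.test(y ~ group, data = d, var.equal = TRUE)  # classical pooled t-test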

14
Q

What happens when you violate the assumptions of normality and homogeneity of variance?

A

This is the combination of violations that is concerning and can lead to dramatic deviations from the nominal Type I error rate and make your obtained p-value meaningless and uninterpretable—particularly if your sample sizes are unequal. The solution to this problem is to combine the Welch t-test with trimmed means (Yuen, 1974; for more details see Wilcox, 2012, as well as Keselman et al., 2004). There are a number of steps and quantities that need to be calculated to implement this test statistic. First, define γ as the proportion of observations that will be trimmed from each tail of the distribution within each group. Critically, each sample is ordered from lowest to highest before trimming.

  • h = n(1 − 2γ) is the number of observations in each group after trimming.
  • Each mean is trimmed. Define ȳtt as the trimmed treatment mean and ȳtc as the trimmed control mean. A trimmed mean is simply the mean of the sample observations after removing the nγ upper and nγ lower observations—the mean of the remaining observations after doing the trimming.
  • s²w is the Winsorized variance estimate. Winsorizing is the process of replacing the lowest trimmed values with the lowest remaining value in the sample and the highest trimmed values with the highest remaining value in the sample.
  • Suppose we observe a sample of 10 observations (y) and we trim γ = .20. That means we will have a trimmed sample (yt) of h = 6 observations after trimming. The Winsorized sample (W) replaces those lower 2 trimmed observations with 3 (lowest value of the trimmed sample) and the upper two trimmed observations with 6 (largest value of the trimmed sample). The Winsorized variance estimate s²w is simply the variance calculated on the Winsorized sample (W). (A code sketch of these steps follows this list.)
  • Yuen (1974) calculates the squared standard error for the trimmed mean as s² = (n − 1)s²w / (h(h − 1)). This is the estimate of the variance of the sampling distribution of trimmed means.
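
A sketch of the trimming and Winsorizing steps for one group, using hypothetical values chosen to be consistent with the 10-observation example above (lowest kept value 3, highest kept value 6):

    # Trim gamma = .20 from each tail of a sorted sample of n = 10.
    y <- c(1, 2, 3, 4, 4, 5, 5, 6, 7, 9)  # hypothetical ordered sample
    gamma <- 0.20
    g <- floor(gamma * length(y))          # 2 observations per tail
    y_trim <- sort(y)[(g + 1):(length(y) - g)]
    mean(y_trim)                           # trimmed mean; = mean(y, trim = .20)
    # Winsorize: replace trimmed values with the nearest kept values.
    w <- sort(y)
    w[1:g] <- w[g + 1]                                    # low tail -> 3
    w[(length(w) - g + 1):length(w)] <- w[length(w) - g]  # high tail -> 6
    var(w)                                 # Winsorized variance s2w

For a full implementation of the Yuen-Welch test itself, the WRS2 package provides a yuen() function, though the details of that package are beyond this sketch.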

Given the above definitions, the test statistic for the difference in the trimmed means, assuming unequal variances in each group, is

t = (ȳtt − ȳtc) / √(s²t + s²c)

where s²t and s²c are the squared standard errors of the trimmed means for the treatment and control groups, each computed as above.

This approach looks complicated (and is a bit tedious if you were to do it by hand), but really is just the Welch t-test applied to trimmed data. We calculate the trimmed means and then the standard errors for each group's trimmed mean. How much trimming should one engage in? The default value is generally γ = .20, so that you keep the middle 60% of the distribution; γ = .10 to .20 seem like reasonable values. Why does this work? Often the issue with non-normal data is that we have outliers and heavy tails. Trimming removes those tails and provides estimates of the central tendency based on the core central part of the distribution. Outliers and extreme values have no impact on the analysis, which is why this approach is robust to violations of normality. Coupled with independent estimates of the variance in each group—and not pooling the variance estimates together—the Yuen-Welch t-test provides an excellent test of the differences between groups. Current thinking from those that study the performance of different statistical tests (e.g., Wilcox, Keselman) is that these approaches should be the default. For additional reading, see Erceg-Hurn and Mirosevich (2008). The following section makes the argument, based on simulation results, for why robust tests (e.g., Yuen's t-test) should be the default option.

15
Q

Does the choice of test really matter for an independent groups t-test?

A

Which test should I use? Does this choice matter? The answer to the latter question is indeed, yes. Your answers can be quite different across the different robust and non-robust methods and analyses you will have at your disposal. How do you choose? What should be the default method? To answer these questions, I conducted a small simulation study—a Monte Carlo simulation study—where different parameters are changed and the impact of those changes is calculated through repeated simulations. In this small simulation we have an independent group design with n = 50 per group and slightly heterogeneous standard deviations in each group (σ1 = 1.0 and σ2 = 1.2). Then the impact of different t-tests was examined for different true differences between the groups.

What was the impact of the choice of test when the assumptions of independence and normality were met but there were slight variance differences between the two groups? Figure 4 shows that the Type I error rate was basically the same for all 3 tests (when δ = 0). For false H0s (δ > 0), the Student and Welch t-tests were almost identical in their statistical power and just slightly higher than Yuen's t-test. Each point in this figure represents an estimate of statistical power based on 20,000 simulations for that particular condition. The dark green line for theoretical power represents the calculation based on theory from sources such as power.t.test() in R or the G*Power application. This presumes independence, normality, and homogeneity of variance. That the actual simulated power here is lower for the Student and Welch tests is due to the heterogeneity in the variances.
Non-normal Distributions. Consider the following plot, which illustrates the standard normal distribution and the contaminated normal distribution. The contaminated normal distribution has observations that have a 90% chance of being drawn from the standard normal distribution N(0,1) and a 10% chance of being from the N(0,10) distribution. This gives the contaminated normal distribution heavy tails, but it does not look that bad here when examining just the regular density plot. The Q-Q plot in Figure 6 shows how the contamination yields heavy tails (the curvature in the Q-Q plot). Yet most of the distribution appears linear in this plot.
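
A sketch of drawing from this contaminated normal, reading N(0,10) as a normal with mean 0 and standard deviation 10 (consistent with N(0,1) above):

    # Contaminated normal: 90% N(0,1), 10% N(0, sd = 10); the wide
    # component supplies the heavy tails.
    set.seed(7)
    n <- 1e5
    x <- ifelse(runif(n) < 0.90, rnorm(n, 0, 1), rnorm(n, 0, 10))
    qqnorm(x); qqline(x)  # tails bend away from the reference line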

When we consider the simulation results when the data are not normally distributed but have heavy tails, we see that the Type I error rates are the same, to the extent we can observe in Figure 7. However, the Student and Welch tests have very low statistical power as the effect size increases, and Yuen's test has dramatically higher statistical power. Wilcox argues that we should never use standard exclusion rules (e.g., drop observations whose residuals are > 2 or 2.5 SDs from the mean) or transform the data to make them more normal. Instead, simply trim the tails and use the appropriate statistical test, such as Yuen's t-test, to examine our hypotheses and make inferences.
