250B Final Flashcards

1
Q

What is the general formula for a total variance and how can it be converted to total sum of squares?

A

Variance is squared deviations from grand mean over degrees of freedom (N - 1). To get SStotal, just multiply the total variance by the degrees of freedom.
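A minimal numeric sketch of the identity, using made-up scores (all values here are hypothetical):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 4.0, 6.0])  # hypothetical scores
n = len(y)

# total variance: squared deviations from the grand mean over df = N - 1
total_variance = np.var(y, ddof=1)

# multiply back by the degrees of freedom to recover SStotal
ss_total = total_variance * (n - 1)

# direct computation of SStotal for comparison
ss_direct = np.sum((y - y.mean()) ** 2)
```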

2
Q

How can we conceptualize the variance of Y in model comparison? How is this related to the variance sum law?

A

Y = Model + Error
Y = Fit + Residual
Y = Systematic variance + Unsystematic variance
Variance of Y arises through the variance sum law: Sy^2 = Smodel^2 + Serror^2 + 2*Cov(model, error)
The covariance term goes to 0 because the two sources of variance are uncorrelated, leaving Sy^2 = Smodel^2 + Serror^2

3
Q

Define the terms Sy^2 = Smodel^2 + Serror^2 for two-group independent t-test.

A

Comparing group mean vs grand mean. In SS terms, SStotal = SSmodel + SSerror:
For the null model (everyone predicted at the grand mean, so SSmodel = 0):
SStotal = sum(Ybar - Ybar)^2 + sum(Y - Ybar)^2
For the full model (Yhat = each person's group mean):
SStotal = sum(Yhat - Ybar)^2 + sum(Y - Yhat)^2
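A numeric sketch of the full-model decomposition for two hypothetical groups (group means serving as Yhat):

```python
import numpy as np

# hypothetical two-group data
g1 = np.array([4.0, 6.0, 5.0])
g2 = np.array([8.0, 10.0, 9.0])
y = np.concatenate([g1, g2])

# full model predicts each person's group mean
yhat = np.concatenate([np.full_like(g1, g1.mean()),
                       np.full_like(g2, g2.mean())])

ss_total = np.sum((y - y.mean()) ** 2)     # deviations from grand mean
ss_model = np.sum((yhat - y.mean()) ** 2)  # fit: group mean vs grand mean
ss_error = np.sum((y - yhat) ** 2)         # residual: score vs group mean
```

For the null model, yhat would be the grand mean for everyone, so ss_model would be 0 and all variance would be error.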

4
Q

Define the terms Sy^2 = Smodel^2 + Serror^2 for one-way ANOVA.

A

Comparing group means vs grand mean. In SS terms, SStotal = SSmodel + SSerror:
For the null model (everyone predicted at the grand mean, so SSmodel = 0):
SStotal = sum(Ybar - Ybar)^2 + sum(Y - Ybar)^2
For the full model (Yhat = the group mean):
SStotal = sum(Yhat - Ybar)^2 + sum(Y - Yhat)^2

5
Q

Define the terms Sy^2 = Smodel^2 + Serror^2 for two-way ANOVA.

A

Same decomposition, but the model variance is partitioned into the two main effects and the interaction:
Sy^2 = Smodel^2 + Serror^2, where (in a balanced design) Smodel^2 splits into Sa^2 + Sb^2 + Sab^2.
For the full model, with Yhat = the cell mean:
Sy^2 = sum(Yhat - Ybar)^2 + sum(Y - Yhat)^2

6
Q

Define the terms Sy^2 = Smodel^2 + Serror^2 for simple regression.

A

Compare reduced model (fewer predictors) with full model (more predictors).
For the null model (intercept only, so Yhat = Ybar and SSmodel = 0):
SStotal = 0 + sum(Y - Ybar)^2
For the full model (Yhat from the regression line):
SStotal = sum(Yhat - Ybar)^2 + sum(Y - Yhat)^2

7
Q

Model comparison typically involves the comparison of error terms. Which term has more degrees of freedom, the full or reduced model sum-of-squares?

A

Reduced model has more df because fewer parameters are estimated.

8
Q

Explain the terms in F test for model comparison formula and when to use it.

A

It’s the change in error variance over the full-model error variance:
F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)
The terms are the error SS of the reduced model, the error SS of the full model, and the error df for each model.
You use this formula any time you want to compare the error-reduction properties of two nested models.
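A sketch of the computation with made-up error SS and df (hypothetical values; df_reduced > df_full because the reduced model estimates fewer parameters):

```python
# hypothetical error SS and error df for two nested models
sse_reduced, df_reduced = 120.0, 28
sse_full, df_full = 80.0, 26

# change in error variance over full-model error variance
f_stat = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
```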

9
Q

A researcher has a two-way ANOVA problem but the effects (a,b, and ab) are correlated. How would model comparison proceed if the researcher wanted to use a hierarchical approach entering b, a, and the interaction in that order? What is being tested at each step?

A

Hierarchical: Type I SS.
The first model would include just B, and tests the amount of variance accounted for by B, SS(B). The second model adds in A, and tests the drop in residual variance from adding A after B, SS(A | B). The third model adds in the interaction, and tests the drop in residual variance that’s left over after accounting for A and B, SS(AxB | A, B).

10
Q

Research and statistical analysis is a form of modeling. What does that mean?

A

This means that our goal as statisticians is to create a model of the world (a population) and test it to see if it accurately represents the world. We compare different statistical models of the world to one another and evaluate which ones are the best fit for the data.

11
Q

How are partial eta sq computed from a table of SPSS output with SS? Are they the same as the change in R squared from a full (all predictors) to reduced model?

A

Partial eta squared tells you: how much variance would factor A explain if it were the only variable in the model?
Formula for partial eta squared = SSeffect / (SSeffect + SSerror)
The sum of the partial eta squareds is not the same as the change in R squared, because the denominator for partial eta squared (SSeffect + SSerror) differs from SStotal; the partial values can even sum to more than 1.
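A sketch with hypothetical SS values, showing why partial eta squareds need not match classical eta squareds:

```python
# hypothetical SS values from an SPSS-style table
ss_a, ss_b, ss_error = 30.0, 20.0, 50.0
ss_total = ss_a + ss_b + ss_error  # assuming orthogonal effects for simplicity

partial_eta_a = ss_a / (ss_a + ss_error)  # denominator excludes SS for B
partial_eta_b = ss_b / (ss_b + ss_error)
eta_a = ss_a / ss_total                   # classical eta squared for comparison
```

Here the partial eta squareds sum to about 0.66 while the classical eta squareds sum to 0.50: different denominators, different totals.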

12
Q

In SPSS output, what are the SS terms for Corrected Model, Error, Total, and Corrected Total?

A

Corrected Model SS = Corrected Total SS - Error SS: the variance due to the main effects and the interaction.
Corrected Total SS is the SStotal we’re interested in. It comes from adding up the SS for each factor, the interaction, and the error; basically Corrected Model SS + Error SS.
Error SS is the sum of squares for error (within-cells error) -> MSE.
Total SS is the total sum of squares for the intercept, main effects, interaction, and error.
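The relationships can be sketched with hypothetical SS values for a balanced two-way design:

```python
# hypothetical SS values
ss_a, ss_b, ss_ab, ss_error, ss_intercept = 30.0, 20.0, 10.0, 40.0, 500.0

corrected_model = ss_a + ss_b + ss_ab         # effects only
corrected_total = corrected_model + ss_error  # the SStotal of interest
total = corrected_total + ss_intercept        # raw Total also includes the intercept
r_squared = corrected_model / corrected_total
```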

13
Q

In SPSS output, what is that F value for the corrected model testing? How is that R Squared calculated?

A

The F value for the corrected model is the F test for the change in error variance between the reduced (intercept-only) model and the full model with all predictors (i.e., a test of the change in R squared).
R squared is the amount of variance in the DV accounted for by the model = Corrected Model SS / Corrected Total SS.

14
Q

Howell states, “ANOVA tells us that three treatments have different means. Multiple regression tells us that means are related to treatments”. Explain in what ways these are the same thing.

A

These are the same because when we say means are related to treatments, we mean that group means are different depending on what treatment they receive – aka, the treatment groups have different means.

15
Q

If there were two groups, treatment and control, one could do a t-test, a one-way between ANOVA, or compute the correlation between treatment/control and the dependent variable. Are there any important differences between these three approaches? In what ways would the results be exactly the same?

A

The t-test and the ANOVA are the most direct approaches; rejecting those null hypotheses gives you direct information about whether there are group mean differences.
The t-test and ANOVA will give the same p-value, and the F statistic will be the square of the t statistic.
Computing a correlation tests whether variability in the DV is explained by treatment. This entails computing a point-biserial correlation, and its significance test yields the same p-value as the previous two tests.
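A quick simulation check that the three approaches agree (the data, group sizes, and effect size here are arbitrary, made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 30)
treatment = rng.normal(0.5, 1.0, 30)

t, p_t = stats.ttest_ind(control, treatment)
f, p_f = stats.f_oneway(control, treatment)

# point-biserial correlation: 0/1 group codes against the DV
codes = np.concatenate([np.zeros(30), np.ones(30)])
y = np.concatenate([control, treatment])
r, p_r = stats.pearsonr(codes, y)
```

F equals t squared, and all three p-values match.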

16
Q

If there were four treatment groups in a one-way ANOVA, and you had three effect codes (and thus three regression coefficients in the GLM), what would the effect be for the missing code, T4, and why?

A

T4 would be the reference group, coded -1 on all three effect-code predictors. Its effect is the negative sum of the other three coefficients, because effects coding constrains the effects to sum to zero.

17
Q

A researcher has five levels of an independent variable and runs both an ANOVA and a regression. Will the eta squared from the ANOVA be equal to the R2 from the regression? Why?

A

Yes, because both measure the proportion of variance accounted for by the model. With group membership coded as predictors, the regression model reproduces the cell means exactly, which is the same as the structural model in ANOVA, where each person’s score is predicted from his or her group mean.

18
Q

Explain how the interpretation of regression coefficients would change depending on the whether the treatment levels are effects coded versus dummy coded.

A

In dummy coding, the reference group is coded 0 on every predictor and the intercept is the mean of the reference group. Coefficients are the differences between each coded group’s mean and the reference group’s mean.
In effects coding, the reference group is coded -1 and the intercept is the grand mean. Coefficients are the differences between the mean of the group coded 1 on that predictor and the grand mean.
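A sketch using least squares on three hypothetical groups (group 2 as the reference in both schemes; all data made up):

```python
import numpy as np

# three hypothetical groups with means 4, 6, 11 (grand mean 7)
y = np.array([3.0, 5.0, 5.0, 7.0, 10.0, 12.0])
groups = np.array([0, 0, 1, 1, 2, 2])

def coefs(X, y):
    # least-squares fit with an intercept column
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# dummy coding: reference group coded 0 on both predictors
dummy = np.column_stack([(groups == 0).astype(float),
                         (groups == 1).astype(float)])
b_dummy = coefs(dummy, y)    # intercept = reference mean; coefs = group - reference

# effects coding: reference group coded -1 on both predictors
effects = dummy.copy()
effects[groups == 2] = -1.0
b_effects = coefs(effects, y)  # intercept = grand mean; coefs = group - grand mean
```

Dummy coding recovers [11, -7, -5]; effects coding recovers [7, -3, -1].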

19
Q

Given equal n in one-way or factorial ANOVA, what special properties will the design matrix have? What will the means and correlations of the design matrix look like with equal and unequal n with effects or dummy coding?

A

Special properties: in a balanced design, the columns of the design matrix for different effects are orthogonal.
Effects coding with equal n: different effects are uncorrelated in the design matrix; the codes for A are uncorrelated with the codes for B and with the interaction codes. With a balanced design, the mean of each effects code is zero (the means of dummy codes are some fraction instead).
Without equal n: different effects are correlated in the design matrix (e.g., A1 correlated with B1), effects codes no longer have means of 0, and you can’t uniquely partition the variance.

20
Q

How does the design matrix allow us to transition from the ANOVA model to the regression framework of the GLM?

A

The design matrix relates the predictors to the dependent variable. If we code each group in ANOVA as a predictor and put those codes into the design matrix, we have a method of getting from ANOVA framework to GLM framework.

21
Q

What is the degrees of freedom numerator and denominator for the F test in the GLM approach to ANOVA?

A

Numerator DF: difference in df between the reduced and full models (i.e., the difference in the number of parameters estimated)
Denominator DF: error df of the full model

22
Q
What are these incremental SS?
SS(AB | A, B)
SS(A | B, AB)
SS(B | A, AB)
SS(A | B)
SS(B | A)
A
These SS are all SSregression
SS(AB | A, B) = SS(A, B, AB) - SS(A, B)
SS(A | B, AB) = SS(A, B, AB) - SS(B, AB)
SS(B | A, AB) = SS(A, B, AB) - SS(A, AB)
SS(A | B) = SS(A, B) - SS(B)
SS(B | A) = SS(A, B) - SS(A)
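These identities can be checked numerically. A sketch for SS(A | B) with an unbalanced, hypothetical design (effect-coded predictors, so A and B are correlated; data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# unbalanced 2x2 layout: hypothetical cell sizes 5, 2, 3, 6
a = np.array([1] * 5 + [1] * 2 + [-1] * 3 + [-1] * 6, dtype=float)
b = np.array([1] * 5 + [-1] * 2 + [1] * 3 + [-1] * 6, dtype=float)
y = 2.0 + 0.8 * a + 0.5 * b + rng.normal(0.0, 1.0, 16)

def ss_reg(y, *predictors):
    # regression SS for an intercept-plus-predictors model
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta
    return np.sum((yhat - y.mean()) ** 2)

ss_a_given_b = ss_reg(y, a, b) - ss_reg(y, b)  # SS(A | B) = SS(A, B) - SS(B)
```

Because regression SS can only grow as predictors are added, every increment of this form is nonnegative.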
23
Q

A researcher is conducting a two-way ANOVA in a general linear model framework, but the effects are confounded. How would this impact eta squared under a Type I sum-of-squares versus a Type III sum-of-squares?

A

If the effects are confounded, we cannot uniquely partition eta squared into variance due to each factor and variance due to the interaction.
In Type I we adjust only for terms entered earlier, so the first factor is tested without controlling for the other factor, and its eta squared is enlarged by any shared variance.
In Type III, everything is adjusted for everything else: main effects are tested after controlling for the other main effect and the interaction, so each eta squared reflects only unique variance and decreases. The sum of these semi-partial eta squareds should therefore be smaller under Type III.

24
Q

What is the order of incremental SS for Type I SS?

A

Enter A first, B second, AB third.

SS(A) for factor A, then SS(B | A) for factor B, then SS(AB | B, A) for interaction AB

25
Q

A researcher is conducting a two-way ANOVA in a general linear model framework, but the effects are confounded. How would the eta squares be computed using a change in R squared?
Related: first define semi-partial eta squared

A

When each factor’s eta squared reflects unique effect for a given factor controlling for all other factors (like in Type III SS), this is called semi-partial
We compute these semi-partial correlations by comparing Rsq in full model vs Rsq in reduced model (literally just subtract: Rsq_full - Rsq_reduced)

26
Q

In an unbalanced design in factorial ANOVA, the effect of factor A is confounded by the effects of factor B and the AB interaction. What does that really mean?

A

This means that you can’t uniquely partition variance accounted for (SS) uniquely to each factor. It’s impossible to separate what is attributable to factor A and factor B.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Does Type I SS test weighted or unweighted means?

A

Weighted (ignores the effects of other variables, resulting in confounding in unbalanced designs)

28
Q

Does Type III SS test weighted or unweighted means?

A

Unweighted means: we treat the data as if they were balanced and don’t take cell sample size into account.
Confounded SS are not apportioned to any source of variation; we just take the variance for each factor controlling for the other factors.

29
Q

In an unequal n factorial ANOVA, which method of SS has no confounded SS?

A

Type III SS because testing unweighted means, controlling for other factors

30
Q

How are the different methods of computing SS essentially different ways of weighting the means, and thus testing different hypotheses?

A

Type I SS tests differences in weighted marginal means.
Type II SS tests differences in weighted marginal means.
Type III SS tests differences in unweighted means.
Testing weighted means is a different hypothesis than testing unweighted means

31
Q

Describe the three methods or types of SS, when each might be used.

A

Type I SS:
- Hierarchical/sequential
- Each term is adjusted only for the terms entered before it, so order of entry matters
- Use when you want to test differences in weighted marginal means
Type II SS:
- Tests main effects after controlling for the other main effects, but not the interaction (like Type I with each main effect entered last)
- Use when there is no interaction (the most powerful test of main effects) and you want to test weighted means
Type III SS:
- Everything is adjusted for everything else
- Tests each main effect after controlling for the other main effect and the interaction
- Use when you are interested in unweighted means and/or there is an interaction

32
Q

In an unbalanced design, will SSA be larger or smaller under Method II versus Method III (assuming no suppressor effects)?

A

SSA should be larger under Type II because Type II doesn’t control for the interaction, so SSA absorbs the variance that overlaps between SSA and SSAB.

33
Q

What is the order of incremental SS for Type II SS?

A

Adjust B for A, A for B, and then AB for A and B.
SS(A | B) for factor A
SS(B | A) for factor B
SS(AB | B, A) for factor AB

34
Q

What is the order of incremental SS for Type III SS?

A

Everything adjusted for everything else
SS(A | B, AB) for factor A
SS(B | A, AB) for factor B
SS(AB | A, B) for the interaction

35
Q

In the GLM approach to ANOVA, do the SS of the effects plus error sum to SS total in both balanced and unbalanced designs?

A

No. In balanced designs they sum to SStotal, but in unbalanced designs they do not, because you cannot uniquely partition the variance due to each factor.

36
Q

What are the main purposes of ANCOVA?

A

When an unmanipulated variable (covariate) plays a role in predicting the DV, we can model these effects to reduce error in our model and increase power. We want to know how much our research factors contribute to SSy after the effect of the CV has been partialed out.
If the CV is correlated with the IV, ANCOVA adjusts the group means on the DV to predict what the means would be if the groups were equal on the CV.

37
Q

ANCOVA differs somewhat depending on whether the covariate is related to only the dependent variable, or to both the dependent and independent variable. How is it different?

A

If CV is just related to DV, this is traditional ANCOVA where the variance of the DV is reduced after adding in the CV.
If CV is related to DV and IV, CV reduces variance of DV AND ALSO variance of IV, adjusting treatment group means assuming that groups are equal on the covariate

38
Q

What factors determine the degree of group mean adjustment in ANCOVA?

A

Groups farther from the grand mean of the CV get larger adjustments; the strength (slope) of the linear relationship between the CV and DV also determines the size of the adjustment. The groups will not change in order, but mean differences can get larger or smaller.

39
Q

What additional major assumption is included in ANCOVA and how is it tested?

A

Linearity and homogeneity of regression slopes. Homogeneity of slopes means the relationship between the DV and the covariate is the same at each level of the IV -> no interaction between CV and IV.
We test this in two ways:
1) Run a GLM including the CV x IV interaction and see if it is significant (an R squared change test for the model with and without the CV x IV interaction)
2) Treat the CV as the dependent variable and see if the IV predicts it

40
Q

What is the null hypothesis in ANCOVA?

A

No difference among the adjusted population means: when you adjust for the CV, the groups do not differ (i.e., the model with the IV fits no better than the model with the covariate alone).

41
Q

Describe the process of computing adjusted SS and testing the effects in ANCOVA, using regression models.

A

Full model includes the covariate, the IV, and the IV x CV interaction. Reduced model: drop the interaction. Test whether R squared drops significantly when the interaction is removed; if it does, keep the interaction.
To get the adjusted SS due to the IV: get the full model SSregression and the SSregression from the model with only the CV -> adjusted SS(IV) = SSreg_Full - SSreg_CVonly.
To get the adjusted SS due to the CV: get the full model SSregression and the SSregression from the model with only the IV -> adjusted SS(CV) = SSreg_Full - SSreg_IVonly.

42
Q

Given the results of ANCOVA, understand how to use the full model regression to compute adjusted means.

A

Get the grand mean of the CV, the group CV means, the group means of the DV, and the coefficient for the CV in your regression. Then use the formula Ybar_j’ = Ybar_j - B_cv * (CVbar_j - CVbar)
Or, literally plug the coefficients into the regression equation and evaluate each group at the grand mean of the CV.
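A sketch of the adjustment formula with hypothetical summary values (the means and the covariate coefficient are made up):

```python
# hypothetical summary values
group_dv_means = [10.0, 14.0]  # Ybar_j
group_cv_means = [4.0, 6.0]    # CVbar_j
grand_cv_mean = 5.0            # CVbar
b_cv = 2.0                     # regression coefficient for the covariate

# Ybar_j' = Ybar_j - B_cv * (CVbar_j - CVbar)
adjusted = [ybar - b_cv * (cvbar - grand_cv_mean)
            for ybar, cvbar in zip(group_dv_means, group_cv_means)]
```

In this made-up case the entire 4-point group difference is carried by the covariate, so both adjusted means come out to 12.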

43
Q

What is Howell’s recommended approach for an r-family effect size in one-way ANCOVA?

A

Use eta squared, knowing that it’s positively biased.
If CV varies naturally in the population, we can divide SSregression from ANCOVA by SStotal (unadjusted) from the ANCOVA. This gives us a percentage of normal variation accounted for by the IV.
Or take eta sq as difference between Rsq from model predicting DV from only CV and Rsq from model predicting DV from CV and IV. Increase in Rsq = what the IV contributes after controlling for CV.

44
Q

Know the change in R squared test and how it may be used in ANCOVA.

A

This is just an F test for the decrease in R squared going from the full model with the CV, IV, and CV x IV interaction to the reduced model without the interaction (or whichever term is being tested).

45
Q

Under what conditions would it be appropriate to use MSwithin from ANOVA for a d-family effect size for post-hoc mean comparisons?

A

MSwithin from ANOVA estimates the average variability within each group and standardizes the mean difference in the metric of the original measurements (vs. MSwithin from ANCOVA, which standardizes the mean difference in the metric of adjusted scores).
We use MSwithin from ANOVA when the CV varies normally in the population.

46
Q

What are the interpretational problems of ANCOVA in the non-equivalent groups designs?

A

Since participants are not randomly assigned, we can’t assume that differences on the CV are due to chance, or that the groups would have the same mean on the DV in the absence of IV effects.

47
Q

In an experiment with random assignment, the expected values of the group means on the covariate are equal, and equal the grand mean on the covariate. Any differences are due to chance random assignment. In this situation, in what sense does ANCOVA correct or unbias the results due to these chance differences?

A

ANCOVA adjusts all the observations as if they were at the mean of the CV

48
Q

In factorial ANCOVA with equal n, we test the adjusted effects of the covariate, A, B, and AB. In fact, however, the only factor that is adjusted for is the covariate. Why is that?

A

Because with equal n the effects of A, B, and AB are orthogonal to one another, so adjusting them for each other changes nothing; the covariate is the only term correlated with the factors, so it is the only adjustment that matters.

49
Q

What are those partial eta squares output by SPSS in factorial ANCOVA?

A

They are computed the same way we’ve learned: SSeffect / (SSeffect + SSerror)

50
Q

How would the results of an ANCOVA compare to an ordinary ANOVA (i.e., an analysis without the covariate)? Why can a result be significant in ANCOVA but not in ANOVA?

A

Usually (if the CV and DV are correlated) ANCOVA is more powerful, so you’re more likely to get statistically significant results. An effect can be significant in ANCOVA and not in ANOVA because the CV reduces error variance, giving a stronger test of the effect of the IV.

51
Q

What is the file drawer problem?

A

Negative (nonsignificant) results remain unpublished in researchers’ file drawers, so things like meta-analyses are limited to the published (disproportionately statistically significant) literature, biasing pooled effect sizes upward.

52
Q

What is a standard error for an effect size, such as d, r, odds ratio? That is, what is its interpretation?

A

An estimate of the variability of the statistic over an infinite number of replications of a specific study

53
Q

Why are odd-ratios, or risk-ratios converted to logs before being analyzed in a meta-analysis?

A

Because odds-ratios and risk-ratios have a restricted, asymmetric range (bounded below by 0, with 1 as the null value) and skewed sampling distributions. Taking logs maps them onto -inf to +inf and makes the distribution more symmetric, which is what our statistics assume.

54
Q

Distinguish between the random and fixed effects model for meta-analysis.

A

Fixed effects: we assume all the studies measure exactly the same thing, in the exact same population. In FE, we are mimicking pooling all data together and getting one effect size estimate of the population effect size.
Random effects: not estimating one true effect size, we are trying to describe a literature because we assume different studies are measuring different things so there are many different distributions of effect sizes. We are estimating a mean of a distribution of true effects.

55
Q

How is the distinction between random and fixed effects statistically validated?

A

Omnibus test for heterogeneity: the Q statistic.
It’s like a weighted SSbetween: larger values indicate more variability between studies.
Q ~ chi-square(k - 1), where k = number of studies
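A numeric sketch of Q with hypothetical per-study effect sizes and sampling variances:

```python
import numpy as np

# hypothetical per-study effect sizes and sampling variances
effects = np.array([0.30, 0.50, 0.10, 0.45])
variances = np.array([0.02, 0.03, 0.025, 0.04])

w = 1.0 / variances                        # inverse-variance weights
mean_fe = np.sum(w * effects) / np.sum(w)  # fixed-effect weighted mean
q = np.sum(w * (effects - mean_fe) ** 2)   # weighted SSbetween analog

df = len(effects) - 1  # Q ~ chi-square(k - 1) under homogeneity
```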

56
Q

What are the basic assumptions of the fixed and random effects models in terms of the observed effect sizes?

A

Fixed effects: assume all studies measure exactly the same thing
Random effects: assume all studies measure slightly different things

57
Q

In Howell, what does this equation represent, and what are the two critical variances? How is this model used in meta-analysis to evaluate fixed versus random models?
Y_i = mu + T_i + E_i

A

Y_i is the effect size observed in study i, mu is the overall mean true effect, T_i is the effect of being in study i (the difference between the true effect measured in study i and the overall mean true effect, mu), and E_i is sampling error.
The two critical variances are the variance of the true effects (the variance of the T_i, i.e., tau squared) and the sampling error variance (the variance of the E_i).
In the fixed-effects model we drop the T_i: all studies are assumed to estimate one common true effect.

58
Q

What is the formula for each treatment’s eta squared and how is eta squared for each treatment related to R squared?

A

Eta squared for each treatment = SSregression_effect / SStotal

The sum of the eta squareds (SSeffect / SStotal) is R squared.

59
Q

What is the standard approach to computing a confidence interval in meta-analysis, and how does that differ from the optimal (from a statistician’s perspective) approach?

A

Traditionally, CIs on effect sizes are computed using the noncentral t distribution, which is the most accurate approach. In meta-analysis, the z distribution is used instead because, given the number of studies combined, the overall effect estimate is likely quite accurate. Using the z approach to calculate CIs on d for individual studies with small N, however, may be questioned.

60
Q

What does the T^2 equation from the DerSimonian and Laird method measure? What does the numerator represent?

A

The equation measures the variance of the true effects if a random-effects model underlies our results. The numerator, Q - df, is the excess variability that cannot be attributed to random (sampling) differences among studies. C, the denominator, is a term analogous to the within-groups term in ANOVA; it also standardizes the result, much as dividing (Xbar - mu) by SD standardizes d.
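A sketch of the DerSimonian-Laird computation with hypothetical weights and a made-up Q:

```python
import numpy as np

# hypothetical inverse-variance weights and heterogeneity results
w = np.array([50.0, 100.0 / 3.0, 40.0, 25.0])
q, df = 8.2, 3  # pretend Q from the heterogeneity test, df = k - 1

# C: analogous to the within-groups term; puts tau^2 on the effect-size scale
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)

# excess variability (Q - df) over C, floored at zero when Q < df
tau_sq = max(0.0, (q - df) / c)
```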

61
Q

Why are study variables that are shown to be related to the effect sizes in meta-analysis research called moderators (e.g., duration of intervention)?

A

Moderation is always interactions (effect of A depends on B.)

If they’re related to the effect size, it means that they affect the size of the effect of the IV on the DV. Whatever predicts the effect size aside from the IV is a moderator.

62
Q

In what sense is Q – df an “excess” variance?

A

Q measures the differences among the effect size measures, and df (= k - 1) is the expected value of Q if the null hypothesis of homogeneity is true. So Q - df, the numerator of the T^2 estimate, is the excess variability that can’t be attributed to random sampling differences among studies.

63
Q

Between study variables like gender mix, length of intervention, and so on, can potentially explain what variance? What analytic tool would be used to explore the degree of parameter variance, and how it is reduced when including between study predictors?

A

These between-study variables can explain parameter variance (variance in the true effect sizes across studies), which is what Q and T^2 measure; you could also look at it as an ICC. Once you know there is between-study variance, you can attempt to explain it by using study characteristics as predictors of effect size and seeing how much the parameter variance is reduced.

64
Q

In testing simple effects in ANCOVA, why would we need to continually adjust the error term depending on the particular comparison?

A

Because the standard error of a predicted (adjusted) value depends on where the comparison falls relative to the mean of the covariate: there is more uncertainty farther from the mean.

65
Q

As explained in the Greg Miller article, what is the main interpretive problem with ANCOVA when it is used in non-experimental research and the covariate and independent variable are confounded? According to the article, can a researcher study the effects of depression (controlling for anxiety) and still be studying depression? Why or why not?

A

The main interpretive problem is that the more you try to control for things in observational research by adding covariates, the more unrealistic the universe you generalize to. ANCOVA compares treatment group means adjusted as if the groups were actually equal on the covariate; in the real world, it doesn’t make sense to equate groups that are naturally unequal. By the article’s argument, no: depression with its anxiety-related variance partialed out is no longer depression as it naturally exists, so such adjustments usually don’t answer the real questions of interest.