Flashcards in Exam 3 Deck (70):
The overall Fobserved in an Analysis of Regression is F(4,203) = 5.89.
a. How many predictors are there in this regression analysis?
Four: the numerator degrees of freedom (4) equal the number of predictors.
The overall Fobserved in an Analysis of Regression is F(4,203) = 5.89.
b. How many degrees of freedom are there for the t-test for the significance of each predictor?
203: each t-test on a single predictor uses the denominator (residual) degrees of freedom.
Suppose the observed value of a t-test for a single regression coefficient is t(39) = 3.00, alpha = .05, two-tailed. If the test of this regression coefficient were reported as an F test instead, what would be the numerical value of F observed? What would be the degrees of freedom of this F test?
F (1, 39) = 9.00
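The equivalence can be checked numerically. A minimal sketch (assuming scipy is available) confirming that F = t-squared and that the two tests yield the same p-value:

```python
# Hedged sketch: a two-tailed t-test on one coefficient is equivalent to
# an F test with 1 numerator df; F = t^2 and the p-values are identical.
from scipy import stats

t_obs, df = 3.00, 39
F_obs = t_obs ** 2                      # 9.00, with df = (1, 39)

p_t = 2 * stats.t.sf(t_obs, df)         # two-tailed p from t(39) = 3.00
p_F = stats.f.sf(F_obs, 1, df)          # p from F(1, 39) = 9.00

print(F_obs)                   # 9.0
print(abs(p_t - p_F) < 1e-9)   # True: same p-value from both tests
```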
Suppose you have a categorical variable with four groups, e.g. four regions of the country. You wish to use it as a predictor in a regression analysis. What is the general strategy for employing a categorical variable as a predictor in a regression analysis.
We create g-1 code variables to represent g groups, so that each group has its own pattern of codes. The code variables carry the information about the group membership of each case. There are numerous coding schemes we can use; each coding system carries all of the group information and represents the same nominal variable.
In any coding scheme for g groups, how many codes are required to characterize the g groups?
g - 1
A dummy variable coding scheme with group 3 as the baseline group
Group # C1 C2
1 1 0
2 0 1
3 (base) 0 0
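The interpretation of the dummy-code coefficients can be verified numerically. A minimal sketch with made-up toy data (three groups of three cases, group means 6, 10, and 2, group 3 as baseline):

```python
# Hedged sketch: dummy codes (group 3 = baseline) recover the group means.
# The data below are invented purely for illustration.
import numpy as np

y = np.array([5., 6., 7.,    # group 1 (mean 6)
              9., 10., 11.,  # group 2 (mean 10)
              1., 2., 3.])   # group 3, baseline (mean 2)
C1 = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0.])
C2 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0.])

X = np.column_stack([C1, C2, np.ones_like(y)])
b1, b2, b0 = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(b0, 6))  # 2.0 -> mean of the baseline group
print(round(b1, 6))  # 4.0 -> group 1 mean minus baseline mean (6 - 2)
print(round(b2, 6))  # 8.0 -> group 2 mean minus baseline mean (10 - 2)
```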
an unweighted effects coding scheme with group 3 as the baseline group. interpret coefficients:
Yhat = b1UE1 + b2UE2 + b0
Group # C1 C2
1 1 0
2 0 1
3 (base) -1 -1
b1=mean of group 1 minus grand mean
b2=mean of group 2 minus grand mean
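These interpretations can likewise be verified numerically for equal group sizes. A minimal sketch with made-up data (group means 3, 10, and 2, so the grand mean of the group means is 5):

```python
# Hedged sketch: unweighted effects codes (group 3 = base, equal n).
# b0 = grand mean of group means; each b = group mean minus grand mean.
# Toy data invented for illustration.
import numpy as np

y = np.array([2., 3., 4.,    # group 1 (mean 3)
              9., 10., 11.,  # group 2 (mean 10)
              1., 2., 3.])   # group 3, base (mean 2)
UE1 = np.array([1, 1, 1, 0, 0, 0, -1, -1, -1.])
UE2 = np.array([0, 0, 0, 1, 1, 1, -1, -1, -1.])

X = np.column_stack([UE1, UE2, np.ones_like(y)])
b1, b2, b0 = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(b0, 6))  # 5.0  -> grand mean of the group means
print(round(b1, 6))  # -2.0 -> group 1 mean minus grand mean (3 - 5)
print(round(b2, 6))  # 5.0  -> group 2 mean minus grand mean (10 - 5)
```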
a series of orthogonal contrast codes. For example, be able to code these contrasts: contrast code that contrasts the mean of the first group with the average of the means of the second and third groups; a contrast code that contrasts the mean of the second group with the mean of the third group.
Group # C1 C2
1 -2 0
2 1 1
3 1 -1
Group # C1 C2
1 1 0
2 -.5 .5
3 -.5 -.5
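Orthogonality of a contrast pair can be checked directly (for equal n): each code sums to zero across groups, and the sum of cross-products of the two codes is zero. A minimal sketch using the first code set above:

```python
# Hedged sketch: checking that two contrast codes form an orthogonal pair
# (equal n assumed): each sums to zero, and their cross-product sum is zero.
import numpy as np

C1 = np.array([-2, 1, 1])   # group 1 vs. average of groups 2 and 3
C2 = np.array([0, 1, -1])   # group 2 vs. group 3

print(C1.sum(), C2.sum())   # 0 0 -> each is a proper contrast
print(C1 @ C2)              # 0   -> the pair is orthogonal
```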
For dummy coding, be able to take the general regression equation and the codes and explain what each of the coefficients in the equation is measuring
B0 = the mean of the baseline group
B1 = the mean of group 1 minus the mean of the baseline group
B2 = the mean of group 2 minus the mean of the baseline group
What does it mean if two codes from a coding scheme are orthogonal? Under what condition will the following relationship hold:
r2multiple = r2y,c1 + r2y,c2
Orthogonal means that each code accounts for a portion of variance that does not overlap at all with the other codes in the set. The relationship above will only hold for equal sample sizes and orthogonal contrasts
Are dummy codes centered?
Dummy codes are not centered
Are the pairs of dummy codes in a dummy variable coding scheme orthogonal?
No, the dummy codes are correlated with one another: they share the same base group and account for overlapping proportions of variance in Y. You cannot simply add up the squared correlation of each dummy code with the criterion to obtain r2multiple.
What sort of data configuration lends itself to coding with dummy codes?
A configuration in which there is one definite control group or base group.
give the set of unweighted effects codes with group 3 as the baseline group
Group # C1 C2
1 1 0
2 0 1
3 (base) -1 -1
For unweighted effects coding, be able to take the general regression equation and the codes and explain what each of the coefficients in the equation is measuring if the groups are of equal size
The intercept = the grand mean of all group means
The regression coefficient for each unweighted effects code is the mean of the group coded 1 minus the grand mean.
In what way are unweighted effects codes, applied to equal group size data, intimately related to ANOVA?
In ANOVA, all the contrasts that go into computing the SStreatment are of each group mean with the grand mean. Each unweighted effect code in regression provides a measure of the difference between a group mean and the grand mean.
Are unweighted effects codes centered for equal group size? for unequal group size?
Unweighted effects codes are centered with equal group sizes, but not with unequal group sizes. With unequal group sizes, weighted effects codes can be used instead.
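A quick numeric illustration (toy group sizes, made up here): a code variable is centered when its mean over all cases is zero.

```python
# Hedged sketch: an unweighted effects code is centered (mean zero) with
# equal group sizes, but not with unequal group sizes. Toy sizes invented.
import numpy as np

# Equal group sizes (n = 2, 2, 2): code values 1, 0, -1 per group.
ue_equal = np.array([1, 1, 0, 0, -1, -1])
print(ue_equal.mean())    # 0.0 -> centered

# Unequal group sizes (n = 3, 2, 1): the same code is no longer centered.
ue_unequal = np.array([1, 1, 1, 0, 0, -1])
print(ue_unequal.mean())  # 1/3 -> not centered
```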
Are unweighted effects codes orthogonal?
No. Unweighted effects codes are not orthogonal because the sum of the squared validities does not equal r2multiple.
Return to thinking about dummy codes. Consider gender as a dummy coded variable, 1=male, 0=female. Suppose you have the coefficients for the overall regression equation, where X is continuous and D is a dummy code:
Yhat= b1 X + b2 D + b3 XD + b0
Yhat= .4X + .3 D + .2XD + 1.5
Explain what each of the four coefficients (including the intercept) measure.
B0: The intercept for females (the intercept for the group coded zero)
B1: the regression of Y on X for females is .4 (the regression of Y on X in the group coded zero)
B2: the difference in intercepts for males minus females is .3 (the difference in the intercepts for the group coded 1 minus for the group coded 0)
B3: the difference in slopes for males minus females is .2 (the difference in the slopes for the group coded 1 minus for the group coded 0)
Be able to write the simple regression equation for the group coded zero (female) from the overall equation.
Yhat= .4X + .3 D + .2XD + 1.5
It would be = .4X + 1.5.
How would you get a simple regression for the group coded one (males)
The easiest thing would be to reverse the codes, so that 1=female, 0=male, and rerun the regression equation
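Both simple regression equations can also be read directly off the overall equation by substituting D = 0 and D = 1. A small sketch using the coefficients from the card above:

```python
# Hedged sketch: simple regression lines per group, from the overall
# equation Yhat = .4X + .3D + .2XD + 1.5 (coefficients from the card).
b1, b2, b3, b0 = 0.4, 0.3, 0.2, 1.5

def simple_line(D):
    """Return (slope, intercept) of the regression of Y on X at value D."""
    return (round(b1 + b3 * D, 10), round(b0 + b2 * D, 10))

print(simple_line(0))  # (0.4, 1.5) -> females: Yhat = .4X + 1.5
print(simple_line(1))  # (0.6, 1.8) -> males:   Yhat = .6X + 1.8
```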
In coding of a two-group variable, we considered the use of unweighted effect codes (+1,-1) versus the codes (+.5, -.5). In the equation
Yhat= b1 X + b2 C + b3 XC + b0 when using the contrast code versus
Yhat= b1 X + b2 UE + b3 X*UE + b0 when using the unweighted effects codes,
explain how the numerical values of coefficients will change when you switch coding systems. Will the significance of the coefficients change when you change between these two coding systems?
With the unweighted effects codes (+1, -1), b2 and b3 will change: the two groups are two code units apart, so a one-unit change in the code only gets us halfway between the groups (to the value 0), and b2 and b3 equal half of the intercept and slope differences. With the contrast codes (+.5, -.5), the groups are exactly one unit apart, so b2 and b3 equal the full intercept and slope differences. The significance tests do not change between the two coding systems; only the scaling of the coefficients changes.
What is Type III partialing in SAS GLM, in SPSS GLM? Regression or "unique" partialing in SPSS MANOVA?
When we have unequal n’s, there are options for how we can analyze the data. We can use the “unique” partialing (in SPSS MANOVA) or Type III partialing (in SPSS GLM and SAS GLM) where each sums of squares is reported with all other effects partialed out.
SSA with B and AB partialed out
SSB with A and AB partialed out
SSAB with A and B partialed out.
Show the regression equation for a two-group experiment with a continuous variable included. In the analysis of covariance, what is the categorical variable called? the continuous control variables?
The continuous control variables are called covariates; the categorical treatment variable is the variate (the coded variable).
Yhat = b1X1 + b2X2 … bpC + b0
C is the categorical variate (the coded treatment variable)
The X's are the covariates (the continuous control variables; a covariate may be continuous or categorical)
Why is there no correlation between the covariate and the variate in the true experiment? With what two things can the covariate be correlated in a quasi-experiment with nonrandom assignment?
There is no correlation between the covariate and the variate (treatment) in the true experiment because random assignment guarantees it. In a quasi-experiment with nonrandom assignment, the covariate can be correlated with (1) treatment group membership, because subjects are not randomly assigned, and (2) the criterion Y.
How is power for the test of treatment affected by the covariate in the true experiment (increase power)? How is power in the quasi-experiment affected by the covariate (may increase or decrease power depending on the relationship of the covariate to treatment and criterion)?
Power is increased because the covariate partials out from the criterion a source of variation that is irrelevant to the predictor, which increases the power of the test for an effect. The ANCOVA with a quasi-experiment might increase or decrease power for the test of the effect depending on the relationship of the covariate to treatment and criterion. It’s possible that the covariate partials out error variation in the criterion or it may partial out pre-existing b/w group differences on the criterion that should not be attributed to the treatment.
What is a within class regression line in the ANCOVA? What assumption is made about within class regression lines in analysis of covariance? If the assumption is met, is the treatment effect constant over all levels of the covariate?
Within class regression lines are regressions of the criterion on the covariate in each of the conditions. It assumes homogeneity of within class regression meaning that the b1s are equal in the two groups (the slopes are the same). If that assumption is met, the treatment is constant over all levels (no interaction).
Show an alternative arrangement of equation
Yhat = b1 X + b2 C + b3 XC + b0
into a simple regression equation that shows the regression of Y on C at different values of X. This is an arrangement that focuses on differences between the means of the two groups on the dependent variable as a function of X.
Yhat = b1 X + b2 C + b3 XC + b0
Yhat = b2 C + b3 XC + b1 X + b0
Yhat = (b2 + b3 X) C + (b1X + b0)
The simple regression coefficient here (b2 + b3 X) gives the value of the difference between the intercept of the group coded 1 minus the group coded 0, at each specific value of X.
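A small sketch of this rearrangement, using the illustrative coefficients .4, .3, .2, and 1.5 from the earlier card: the group difference at any value of X is b2 + b3*X.

```python
# Hedged sketch: the treatment (group) difference at a specific X is the
# simple coefficient (b2 + b3*X); coefficients are illustrative values.
b1, b2, b3, b0 = 0.4, 0.3, 0.2, 1.5

def group_difference(X):
    """Difference between group 1 and group 0 intercepts at this X."""
    return round(b2 + b3 * X, 10)

print(group_difference(0))   # 0.3  -> the difference at X = 0 is just b2
print(group_difference(2))   # 0.7
print(group_difference(-3))  # -0.3 -> the difference changes sign with X
```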
What does the Johnson-Neyman procedure test?
This procedure provides cutoff values of X (the covariate) beyond which the treatment effect (the difference in elevation of the two regression lines) is significant. It’s the procedure of testing conditional effects of treatment C at particular values of X. It tests whether two lines have significant differences at various points on X.
If you use contrast coding for two groups (group 1 = .5, group 2 = -.5), and the contrast code interacts with the continuous variable in the equation, be able to indicate what each of the regression coefficients in the equation measures
B1: the unweighted mean of the slopes of the groups
B2: the difference between the intercepts of the high group versus the low coded group
B3: the difference between the slopes of the high coded group minus the low coded group
B0: the unweighted mean of the two intercepts
What is meant by a fixed effects regression model? Under what condition will all inferences be correct in a fixed effects multiple regression, even if we merely sample cases and observe both X and Y?
Fixed effects regression means that the predictors have specified values that are systematically included in the sample; there is no probability distribution for the predictors. We fix X, sample systematically at those values, and observe Y. All predictors must be 100% reliable (measured without error), and we decide the range of values for the predictors. If X and Y jointly follow a multivariate normal distribution, then all inferences will be correct even if we merely sample cases and observe both X and Y.
What are components of the residual in a regression equation
The residual contains random variation in Y and specification errors of excluding relevant variables or specifying the wrong form of the relationship of predictors to criterion.
What are meant by the mean structure and the variance structure of the regression model?
The mean structure refers to the coefficients and predicted scores. The variance structure refers to the error terms, the MS residual, and the standard error.
What two aspects of regression analysis are we concerned about when we consider violations of the assumptions of the OLS regression model?
We are concerned about (1) the estimates of the regression coefficients (the mean structure) and (2) the estimates of the standard errors and MSresidual (the variance structure).
Explain conditional variances of Y, given X. What is meant by homoscedasticity? The violation of the assumption is heteroscedasticity. What are the effects of heteroscedasticity on the regression analysis?
The conditional variance of Y for each fixed set of predictor values is an estimate of the error variance. We assume homoscedasticity: the conditional variances are equal across all combinations of values of the predictors. Heteroscedasticity leaves the estimates of the regression coefficients unbiased, but the OLS standard errors become biased; the direction of bias depends on the relationship of the error variance to the predictor. If the error variance increases as the predictor increases, the bias is negative and significance is over-estimated; if the error variance decreases as the predictor increases, the bias is positive and significance is under-estimated.
Consider the assumption of normally distributed errors. Where does the assumption come into play? What in the regression analysis is affected by violation of the assumption?
The assumption underlies the tests of significance of the multiple correlation and of the individual regression coefficients, as well as the confidence intervals; it comes into play in inference. Nonnormality of errors does not create bias in the regression coefficients, but may increase the standard errors relative to what they would be if the data were normally distributed. The t and F tests may be biased by nonnormality.
Consider the assumption of independent errors across observations. By independent errors I mean that there is no correlation among observations, no clustering. How can errors become correlated? With nonrepeated measures, the measures may be taken on people within groups. With repeated measures, repeated observations on the same individual over time will be correlated with one another.
Errors can become correlated in clustered and repeated measures designs. With nonrepeated measures, the measures may be taken on people within groups or who are related. With repeated measures, repeated observations on the same individual over time will be correlated with one another. Correlated errors do not bias the coefficients (the mean structure), but they distort the variance structure: we under-estimate the error variance and have a positive bias in all our tests.
What is the ICC? What does it measure
The ICC is the intraclass correlation. It is an index of how much clustering there is in the data (non-independence). An ICC of 0 means independence. Even a small correlation will give us alpha inflation
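A minimal sketch of computing the ICC from a one-way ANOVA decomposition, using the common estimator ICC = (MSB - MSW) / (MSB + (k - 1)*MSW) for k cases per cluster; the data below are made up to show heavy clustering:

```python
# Hedged sketch: intraclass correlation from ANOVA mean squares.
# Toy data: 3 clusters of 3 cases each, strongly separated cluster means.
import numpy as np

clusters = np.array([[1., 2., 3.],
                     [11., 12., 13.],
                     [21., 22., 23.]])
g, k = clusters.shape                       # g clusters, k cases each
grand = clusters.mean()
means = clusters.mean(axis=1)

MSB = k * ((means - grand) ** 2).sum() / (g - 1)          # between clusters
MSW = ((clusters - means[:, None]) ** 2).sum() / (g * (k - 1))  # within
icc = (MSB - MSW) / (MSB + (k - 1) * MSW)

print(round(icc, 3))  # 0.99 -> near-maximal clustering (non-independence)
```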
What are the effects of correlated errors on the OLS regression analysis?
What alternative regression model is appropriate with clustered data?
The effects of correlated errors (autocorrelation) are: the regression coefficients remain unbiased; the regression coefficients may be highly unstable across replications; and MS residual may substantially underestimate the true amount of residual variance in the population. The sample estimates of the standard errors of the regression coefficients may underestimate the corresponding parameters (t-tests for coefficients are positively biased). The fix is to do multilevel modeling
What are kernel density estimates? Are they applied to the distribution of a single variable or to pairs of variables? What do they help to illustrate?
Kernel density estimates are nonparametric smooths applied to the distribution of a single variable, typically shown as a histogram highlighted with the smooth. They are helpful in identifying skew in distributions and outliers.
Normal probability plots (p. 4-5, 8-9 of Plots handout). Are these applied to the distribution of a single variable or to pairs of variables? Describe what is on the X axis and Y axis of normal probability plots. If a variable is normally distributed, how will the plot appear? How will an outlier appear?
Normal probability plots detect nonnormality and outliers in the distribution of a single variable, such as the residuals. The set of scores, ranked from lowest to highest, is plotted as a function of the scores that would have been obtained if the variable were normally distributed. If the actual scores are normally distributed, the points fall on a straight line. Nonnormality appears as a light tail or a heavy tail; skewed distributions have one heavy and one light tail. Outliers appear as points toward the upper right or lower left.
Explain how partial regression leverage plots (also called added variable plots) are constructed . Can more than one variable be partialed at a time? What information do partial regression leverage plots add to plots of residuals against predicted scores.
Partial regression leverage plots are constructed one predictor at a time: the residuals of Y on all the other predictors are plotted against the residuals of the focal predictor on all the other predictors, so only one variable is partialed at a time. These plots let you identify the specific predictor that is leading to difficulties such as model misspecification, and they make it easier than plots of residuals against predicted scores to see how a particular case is distorting a specific regression coefficient.
What do we mean by the breakdown point of an estimator? What is the breakdown point for OLS estimators?
The breakdown point is the proportion of outlying points in a sample that it takes to change the values of estimates of regression coefficients away from those that would be obtained if no errant points were present. For OLS estimators, the breakdown point is said to be: 1/n
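The 1/n breakdown point can be illustrated with a toy simulation (data made up for illustration): corrupting a single point is enough to move the OLS slope far from its value with no errant points.

```python
# Hedged sketch: OLS has breakdown point 1/n -- one corrupted case out of
# ten is enough to move the slope substantially. Toy data invented.
import numpy as np

x = np.arange(10.)
y = 2 * x + 1                         # perfect line, slope 2

slope_clean = np.polyfit(x, y, 1)[0]  # slope with no errant points
y_bad = y.copy()
y_bad[-1] = 100.                      # corrupt a single point
slope_bad = np.polyfit(x, y_bad, 1)[0]

print(round(slope_clean, 3))  # 2.0
print(round(slope_bad, 3))    # 6.418 -> far from 2, from one bad point
```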
What are the three characterizations of errant data points?
Leverage, distance, and influence.
In what two ways is the term "outlier" used? How is it used in the context of regression diagnostics?
Outliers are extreme points in a distribution. In multiple regression, outliers are conditional as data points whose Y scores are unexpected, given their X scores or position in the predictor space. These are scores that do not follow the regression model.
Explain leverage. Is a model required to measure leverage? Does high leverage necessarily mean that a point is affecting the regression outcome?
Leverage is based on the predictors: it is the potential for a point to move the regression line, and it asks whether the case is extreme on the predictors (no model is needed). High leverage does not necessarily mean that the point is affecting the regression outcome.
Explain distance. Is a model required to measure distance? Does high distance necessarily mean that a point is affecting the regression outcome?
Distance is based on the residuals: it asks whether the point is extreme on Y, given its X scores. A model is needed. High distance does not necessarily mean that a point is affecting the regression outcome; it can also mean distortion of the standard errors.
Explain influence. Is a model required to measure influence? How does influence relate to leverage and distance. Does high influence necessarily mean that the point is affecting the regression outcome?
Influence is a function of both leverage and distance: influential points distort the regression equation (surface). A model is needed. High influence does mean that the point is affecting the regression outcome.
What is the general strategy for studying the effect of a point on the regression outcome
We delete the point and rerun the analysis, then compare the results with the case in versus the case out. DFBETAS, for example, is a measure of the standardized change in the regression coefficients when a case is deleted.
What measures are on the main diagonal of the hat matrix?
The hat diagonals, which measure the distance of each case from the centroid of the predictors (leverage).
What is the basis of all measures of distance?
The basis of all measures of distance are residuals
What is the problem in regression diagnostics with clusters of errant points?
Clustered errant points can mask each other in diagnostic analysis, and clusters can make it difficult to determine which cases to delete. If you have a cluster of influential cases, all working in the same way to change the regression outcome, removing one of the cases leaves the remaining cases in the cluster to continue exerting influence. Therefore removing a single case does not show the impact of that case on the analysis.
Define effect size in terms of the null hypothesis.
The effect size is the degree to which the treatment changes treated subjects relative to control subjects. It’s the degree to which phenomenon is present in the population or the degree to which the null hypothesis is false. It’s a measure of how far an effect is from the null value (typically zero) in the population.
In general, how is effect size measured? What is d as a measure of effect size? How is effect size defined by Cohen for multiple and partial correlations (squared)?
Effect size is a function of the ratio of systematic variance (due to predictors or the treatment manipulation) to error or residual variance: the proportion of Y variance accounted for by the source in question relative to the proportion of error. Effect sizes are unitless measures that do not depend on the scale of measurement.
Effect Size = Systematic variance accounted for / Error variance
Cohen’s d is the difference between the means of two groups (usually experimental and control) relative to the pooled within-group standard deviation: the systematic difference in the numerator divided by the random variability in the denominator.
Effect size for multiple regression is defined in squared terms as f2. For the full equation, f2 = R2 / (1 - R2). For a set of predictors B over and above a set A, f2 = (R2 for A and B together minus R2 for A alone) / (1 - R2 for A and B together).
Approximately what percent of the variance is accounted for in Cohen's (1988) definition of small, moderate, and large effect size for Pearson product moment correlations? What percent of variance is accounted for in Cohen's (1988) definition of small, moderate, and large effect size squared multiple correlations?
Pearson product moment correlations of .1 (small), .3 (mod), and .5 (large) account for 1%, 9%, and 25% of the variance respectively.
Multiple correlations (not squared) of .14, .36, and .51 correspond to small, moderate, and large effect sizes respectively. These translate to r2multiple values of about 2%, 13%, and 26% of the variance accounted for.
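These benchmarks can be cross-checked against Cohen's f2 = R2/(1 - R2); a quick sketch:

```python
# Hedged sketch: converting Cohen's small/moderate/large squared multiple
# correlations into his conventional f^2 benchmarks via f^2 = R^2/(1 - R^2).
for r2 in (0.02, 0.13, 0.26):
    f2 = r2 / (1 - r2)
    print(round(f2, 2))
# prints 0.02, 0.15, 0.35 -- Cohen's conventional f^2 values
```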
Define the power of a statistical test.
The power of a statistical test is the probability that you actually detect a non-zero effect. It’s the probability of rejecting the null hypothesis, given that the null hypothesis is false.
What four factors are involved in any instance of hypothesis testing? How can these factors be used to determine the number of subjects required for a research design?
The four factors involved in statistical inference are PANE
P= power of the test (i.e., the probability of rejecting a false null)
A= the level of significance chosen (alpha)
N= the sample size (n)
E= the effect size of the effect
The experimenter can specify in advance the power desired (.8)
The experimenter can specify the alpha level in advance
Estimates of effect size can come from past research or by using the small, mod, and large values set by Cohen.
Assume that with 100% reliable predictors (i.e., no measurement error, reliability of each measure = 1.00), the effect size of an effect in a regression analysis is ƒ2 = .18. If the predictors instead each had reliability of .80, would the effect size remain the same or decrease?
The effect size would decrease. This highlights the discrepancy that may exist between a true effect size in the population, and the estimate of the effect size in a sample with predictors measured with error.
If a covariate interacts with the categorical variable in ANCOVA, what does this tell us about the slopes of the within class regression lines? What is the difficulty in coming up with an estimate of the treatment effect in ANCOVA if the covariate interacts with the categorical treatment variable?
If we have an interaction between the covariate and a two-group categorical variable, the regression of Y on X differs within the two groups (the within class regression slopes differ). This violates the homogeneity of within class regression assumption, and the estimate of the treatment effect then depends on the value of the covariate; it becomes a much more nuanced estimate.
What does the phrase "measured without error" mean with respect to the reliability of a variable?
Variable is 100% reliable
What are meant by attenuation and negative bias with regard to estimates of population parameters.
Attenuation means that the parameter estimate is systematically closer to zero in the sample than in the population (it shrinks toward zero). Negative bias means that the expected value of the estimate is below the population parameter: the estimate is systematically too low.
What is the effect of measurement error in predictors in the one-predictor regression equation?
Measurement error attenuates regression coefficients in the one predictor case
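A hedged simulation of this attenuation (all values made up for illustration): with reliability .80 in the predictor, the observed one-predictor slope shrinks toward the reliability times the true slope.

```python
# Hedged simulation: measurement error in X attenuates the one-predictor
# slope toward rho_XX * b_true (here 0.8 * 1.0 = 0.8). Values illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, rho_xx, b_true = 200_000, 0.8, 1.0

true_x = rng.standard_normal(n)
y = b_true * true_x + rng.standard_normal(n)

# error variance chosen so var(true)/var(observed) = rho_xx = .8
err = rng.standard_normal(n) * np.sqrt((1 - rho_xx) / rho_xx)
x_obs = true_x + err

b_obs = np.polyfit(x_obs, y, 1)[0]
print(round(b_obs, 2))  # ~0.8: attenuated from the true slope of 1.0
```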
What is the effect of measurement error in the multiple regression equation? Is the direction of bias in coefficients known—can it be either positive or negative bias?
Say we have a regression equation: Yhat = b1 X + b2 Z + b0. Measurement error in predictor Z will affect the b2 regression coefficient, and it will also affect the b1 regression coefficient. The bias in the coefficients can go in either direction, increasing or decreasing a coefficient; in general, the direction of the bias is not known.
What is the effect of measurement error for the XZ interaction in a regression equation containing X, Z, and XZ as predictors? How is the reliability of the interaction term related to the reliabilities of the individual predictors?
The reliability of the XZ interaction term is exactly equal to the product of the reliabilities of the individual variables for uncorrelated X and Z, that is, rho(XZ,XZ) = rho(XX) * rho(ZZ). Thus the reliability of the product term is lower than the reliability of the variables of which it is composed, and the sample size required for detecting interactions is large.
What is the problem with simply dividing a residual by its own standard error to compute a studentized (or internally studentized) residual?
The case can move the regression plane toward itself, thereby reducing its own residual and increasing the residuals for all other cases, yielding a larger MSresidual.
What is the solution to the problem described in Q. 61? How is MSresidual(i) computed? I will refer to a residual divided by its deleted standard error as a studentized deleted or (externally studentized) residual.
Carry out the regression analysis with the cases removed; compute the predicted score and the standard error from the analysis with the case removed
What does DFFITS measure? Is there one or more than one measure of DFFITS for each case in a regression equation?
One measure for the case. The change in the predicted score of a case (in standardized z form) due to the inclusion of the case – shows that the case is moving the regression plane. DFFITS is a global measure of standardized change based on a predicted score.
What does DFBETAS measure? Is there one or more than one measure of DFBETAS for each case in a regression equation?
For each case, more than one measure: specifically, one measure of how the case is changing each regression coefficient, including the intercept. The change in each regression coefficient due to the inclusion of the case shows that the case is changing the regression model. DFBETAS measures the standardized change in the regression coefficients when a case is deleted.
Case IN MINUS case OUT
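A minimal sketch of the case-IN-minus-case-OUT idea on toy data; note that true DFBETAS additionally divides this raw change by the deleted standard error of the coefficient, which is omitted here:

```python
# Hedged sketch of the case-deletion idea behind DFBETAS: refit with the
# case deleted and compare coefficients (IN minus OUT). Toy data invented;
# real DFBETAS standardizes this change by the deleted standard error.
import numpy as np

x = np.arange(10.)
y = 2 * x + 1
y[-1] = 40.                               # one errant case

b_in = np.polyfit(x, y, 1)[0]             # slope with the case included
b_out = np.polyfit(x[:-1], y[:-1], 1)[0]  # slope with the case deleted

print(round(b_in - b_out, 3))  # 1.145 -> the single case moves the slope
```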
What is meant by under-adjustment bias in the analysis of covariance? What produces under-adjustment bias?
Under-adjustment bias in the analysis of covariance refers to the failure of the covariate(s) to adjust completely for pre-existing differences between groups. Under-adjustment bias is due to unreliability of the covariates.