Flashcards in Exam 2 Deck (46):
Explain what is meant by the tolerance of a predictor (from Study Guide 1). Explain how the measure of tolerance is computed from a regression model with more than two predictors.
The term [1 - r^2 Xi.(all other predictors)] for any predictor Xi is called the tolerance of the predictor: the proportion of the variance of Xi that is independent of all other predictors. It is computed by regressing Xi on all the remaining predictors in the model and subtracting the resulting squared multiple correlation from 1. For example, with predictors X1, X2, and X3, the tolerance of X1 is 1 - r^2 1.23.
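The computation can be sketched with made-up data (the variable names, coefficients, and sample size below are illustrative, not from the study guide):

```python
import numpy as np

# Made-up predictors: X1 deliberately overlaps with X2 and X3.
rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.6 * x2 + 0.3 * x3 + rng.normal(size=n)

# Regress X1 on the other predictors; tolerance = 1 - R^2 of that regression.
design = np.column_stack([np.ones(n), x2, x3])
b, *_ = np.linalg.lstsq(design, x1, rcond=None)
resid = x1 - design @ b
r2 = 1 - resid.var() / x1.var()
tolerance = 1 - r2
print(round(tolerance, 3))
```

A tolerance near 0 signals that the predictor is nearly redundant with the others (severe multicollinearity); a tolerance near 1 signals independence from them.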
When predictors overlap in their association with the criterion, what approach is used to apportion variance accounted for to individual predictors?
The approach used to apportion the variance accounted for by one versus another predictor is to specify an order of priority of the variables. We assign to the first variable all of its overlap with the criterion. We assign to the second variable only its unique overlap with the criterion (the part not redundant with the overlap of the first variable). The decision as to which variable comes first is made on the basis of theory.
How can you use the approach you described above in question 2 to make theoretical arguments about the importance of particular predictors or sets of predictors, taking into account other predictors?
If I were interested in how a person's beliefs about health determine his or her willingness to engage in a preventive health behavior, over and above a recommendation from their physician, I would make the first variable the physician's recommendation and the second variable the person's health beliefs. I would be answering the question of the unique contribution of the psychological factor of health beliefs to health behavior, with physician recommendation "partialed out," "held constant," "taken into account," or "over and above physician recommendation." It is a stronger argument for psychology to say that psychological factors account for health protective behavior above and beyond what the doctor recommends than to say that psychological factors account for health protective behavior without taking into account what physicians recommend.
What is measured by the squared semi-partial correlation of a predictor with the criterion?
The squared semi-partial correlation measures the proportion gain in prediction of the criterion by the addition of another predictor or predictors to a regression equation already containing at least one other predictor.
What is measured by the squared partial correlation of a predictor with the criterion?
The squared partial correlation is the proportion of variance not accounted for by the first predictor that is accounted for by the added predictor. Put another way, it is the proportion of the residual variance not accounted for by the first predictor that is accounted for by the second predictor.
Know how to compute the squared semi-partial correlation from two squared multiple correlations
Say you had r^2 y.123 and r^2 y.12. Then the squared semi-partial correlation of X3 with the criterion, over and above X1 and X2, is
r^2 y(3.12) = r^2 y.123 - r^2 y.12
where r^2 y(3.12) is the squared semipartial correlation of predictor X3 with the criterion above and beyond predictors X1 and X2. The subscript notation y(3.12) indicates that predictors X1 and X2 are partialed out of X3, but not partialed out of Y. Thus the squared semipartial correlation r^2 y(3.12) is the squared correlation between Y and the part of X3 that is independent of X1 and X2 (i.e., does not overlap with X1 and X2).
Know how to compute the squared partial correlation from two squared multiple correlations
r^2 y3.12 = (r^2 y.123 - r^2 y.12) / (1 - r^2 y.12)
The subscript notation y.12 denotes the proportion of variance accounted for by X1 and X2, so (1 - r^2 y.12) is the proportion of variance not accounted for by X1 and X2. r^2 y3.12 is the squared correlation between the part of Y with X1 and X2 partialed out and the part of X3 that also has X1 and X2 partialed out.
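A quick numeric sketch of both formulas (the R^2 values below are made up for illustration):

```python
# Hypothetical squared multiple correlations.
r2_y_123 = 0.40   # R^2 predicting Y from X1, X2, X3
r2_y_12 = 0.30    # R^2 predicting Y from X1, X2 only

# Squared semipartial: raw gain in proportion of Y variance from adding X3.
sr2 = r2_y_123 - r2_y_12
# Squared partial: that gain as a share of the variance X1 and X2 left over.
pr2 = sr2 / (1 - r2_y_12)
print(round(sr2, 4), round(pr2, 4))
```

The semipartial (.10) is a share of total Y variance; the partial (.10 / .70 = .1429) is a share of only the residual variance.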
Explain Horst's definition of a suppressor variable
Variables classically termed SUPPRESSOR VARIABLES (Paul Horst, 1941) are uncorrelated with the criterion but are correlated with other predictors; these variables increase the R^2 multiple when they are added to a regression equation. For these variables, the higher the absolute value of the correlation with the other predictor, the higher the multiple correlation. That is, the more they are correlated with another predictor, the better the overall prediction.
Will the regression coefficient for the suppressor be zero?
Regression weights for a suppressor variable will not be zero; they may be either negative or positive. The sign of the regression weight for a suppressor depends upon the sign of the validities of the other variables and the sign of the correlation between the predictors and the suppressor.
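Horst's scenario can be checked with the standard two-predictor formula R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2); the correlations below are made up:

```python
# Made-up correlations: X1 is a valid predictor, S is a classical suppressor.
r_y1 = 0.40   # validity of X1
r_ys = 0.00   # suppressor S is uncorrelated with the criterion
r_1s = 0.60   # but S is correlated with X1

r2_alone = r_y1 ** 2
r2_both = (r_y1**2 + r_ys**2 - 2 * r_y1 * r_ys * r_1s) / (1 - r_1s**2)
print(round(r2_alone, 3), round(r2_both, 3))  # adding S raises R^2: 0.16 -> 0.25
```

With r_ys = 0, the formula reduces to R^2 = r_y1^2 / (1 - r_1s^2), so the larger |r_1s|, the larger the multiple correlation, exactly as the card states.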
Explain the designation of Type I versus Type II partialing in SAS (sequential versus unique).
Type I sums of squares are generated sequentially: each predictor's effect is computed with all previously listed predictors in the MODEL statement partialed out.
Type II sums of squares are what we expect in multiple regression, the effect of each predictor with all other predictors partialed out.
What is a conditional distribution?
The distribution of a variable at one value of another variable.
What is a conditional mean?
The mean of a variable at one value of another variable.
What does it mean if a regression equation is "linear in the coefficients"?
Linearity in the coefficients means that the predicted score is a linear combination of the predictors, where the weights are regression coefficients.
The regression equation is in the form of a linear combination (weight times variable + weight times variable...):
Yhat = b1 X1 + b2 X2 + ... + bp Xp + b0
What does it mean if a regression is "linear in the variables"?
Linearity in the variables means that the regression of Y on X is constant across the range of X. The conditional means fall on a straight line
What does additivity mean in a regression equation containing predictors X and Z and the criterion Y?
Additivity means that the relationship of one predictor to the criterion does not depend on the specific values of the other variables. It means that the regression of Y on X is the same at every value of Z, so that if you talk about the regression of Y on X with Z held constant, you do not have to indicate the specific value of Z.
What is the general form of a polynomial equation?
Polynomial equations contain a series of higher order functions of a single variable X:
Yhat = b1 X + b2 X^2 + b3 X^3 + ... + bp X^p + b0
The polynomial of order p has at most (p - 1) bends or inflections. A 1st order polynomial has no bends (a straight line); a 2nd order polynomial has one bend (a parabola; quadratic).
How can a polynomial regression equation be used to test a prediction of a curvilinear relationship of a predictor to the criterion.
You can use a second order polynomial of the predictor, for instance Yhat = b1 X + b2 X^2 + b0. We test whether the second order term adds significant predictability to the equation containing the first order term. If it does, this lends support to the curvilinear hypothesis.
What do we mean by "higher" order terms in regression analysis? Give an example of a regression equation containing a higher order curvilinear term, a higher order interaction term.
Higher order terms are terms beyond the 1st order (linear) terms. Higher order terms can be interactions or quadratic, cubic, quartic, etc. polynomial terms. An example of a regression equation with a curvilinear term is: Yhat = b1 X + b2 X^2 + b0. An example of an equation with an interaction term is: Yhat = b1 X1 + b2 X2 + b3 X1X2 + b0.
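The hierarchical test behind these two cards can be sketched as follows, with made-up data containing a genuine quadratic trend (coefficients and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=200)

def r_squared(design, y):
    # OLS fit; R^2 = 1 - residual variance / total variance.
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return 1 - resid.var() / y.var()

ones = np.ones_like(x)
r2_linear = r_squared(np.column_stack([ones, x]), y)
r2_quadratic = r_squared(np.column_stack([ones, x, x**2]), y)
gain = r2_quadratic - r2_linear   # squared semipartial of X^2 over X
print(round(r2_linear, 3), round(r2_quadratic, 3), round(gain, 3))
```

In practice the gain would be tested with an F (or t) test; a significant gain supports the curvilinear hypothesis.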
What is the multicollinearity problem that arises when higher order terms are included in regression equations? Distinguish between essential and nonessential multicollinearity. What is the "fix" for the collinearity problem? Which type of multicollinearity can be eliminated by the fix?
Nonessential multicollinearity is the amount of correlation between a variable and a function of that variable (e.g., X and X^2, or X and XZ) that is produced merely by scaling: excess correlation due only to the location of the scale's zero point that does not reflect the true relationship. Essential multicollinearity is the amount of correlation between X and XZ that is due to skew in X and cannot be removed by centering. The fix for the collinearity problem is centering the predictors (subtracting each predictor's mean before forming the higher order terms); centering eliminates nonessential multicollinearity, but not essential multicollinearity.
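A small demonstration with made-up, symmetric scores (for a skewed X, some correlation between centered X and its square would remain; that residue is the essential part):

```python
import numpy as np

x = np.array([4.0, 5.0, 6.0, 7.0, 8.0])   # uncentered, all-positive scores
xc = x - x.mean()                          # centered scores

r_uncentered = np.corrcoef(x, x**2)[0, 1]
r_centered = np.corrcoef(xc, xc**2)[0, 1]
print(round(r_uncentered, 3), round(r_centered, 3))
```

Because these scores are symmetric about their mean, centering drives the correlation between X and X^2 to zero: the near-perfect uncentered correlation was entirely nonessential, induced by the position of the zero point.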
When there is a nonlinear or interactive relationship between predictors and criterion, what is the general strategy that is used so that the data can be analyzed with OLS regression?
We create a nonlinear function of X, like (X - Xbar)^2, that is linearly related to Y. The equation remains linear in the coefficients, which is how OLS regression can be used to model nonlinear relationships.
Will the b2, higher order coefficient change if the variables are centered or not? Explain your answer. Will the b1, lower order coefficient change.
The b2 coefficient will not change with centering because it is a higher order term that reflects the shape (curvature) of the curve, and curvature is unaffected by moving the zero point. The b1 coefficient will change with centering because b1 is the slope of the equation at X = 0, and centering moves the zero point.
Interpret the b1 coefficient in three ways, assuming X has been centered before analysis
1) b1 is the average of the simple slopes of Y on X taken across the range of X.
2) the slope of the curve of Y on X at the value of X for which (X - Xbar)^2 = 0, i.e., at X = Xbar.
3) the slope of Y on centered X at the mean of centered X.
Why must all lower order terms be included in a regression equation when the prediction from a higher order term is being examined for significance?
We have to include all lower order terms because the coefficient for the highest order term accurately reflects the curvilinearity at that order only if all lower order terms are partialed out.
What do interactions between predictors signify in multiple regression analysis? Is there additivity of two predictors, if they interact--explain your answer.
Interactions signify a multiplicative effect. There is no additivity. It signifies that the effect of your predictor on your criterion depends on the value of another predictor.
If an interaction coefficient is significant, what does that tell you?
A significant interaction coefficient signifies that the strength or shape of the relationship between your predictor and criterion depends on the value of another predictor
A strategy in data analysis with continuous variables is to dichotomize the variables (e.g., “median splits”) and analyze them in ANOVA. What are two problems with this strategy?
Dichotomizing variables results in a huge loss of power, and it can produce spurious main effects: we may find an effect of the X or Z variable that does not exist in the population. Say we want to predict performance on an experimental task from the ability required to do the task; we want to know whether people with high ability do better than people with low ability. We can look at this by correlating ability with performance on the task. If we are only interested in high versus low, we could do a median split and divide the distribution. This attenuates the correlation, and we end up with huge within-cell variation.
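The attenuation can be shown in a quick simulation (the data and effect size are made up):

```python
import numpy as np

# Made-up continuous data with a true linear relation of ability to performance.
rng = np.random.default_rng(3)
n = 2000
ability = rng.normal(size=n)
performance = 0.5 * ability + rng.normal(size=n)

r_continuous = np.corrcoef(ability, performance)[0, 1]

# Median split: replace ability with a high/low dichotomy, then correlate.
high_low = (ability > np.median(ability)).astype(float)
r_split = np.corrcoef(high_low, performance)[0, 1]
print(round(r_continuous, 3), round(r_split, 3))
```

For a normal predictor, splitting at the median attenuates the correlation by a factor of roughly .80, which translates directly into lost power.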
Assume that we are working with the equation
Y hat = b1 X + b2 Z + b3 X Z + b0
and that both X and Z are centered, and that the XZ term is the product of centered X times centered Z. What are three interpretations of the b1 coefficient, the b2 coefficient?
For b1:
1) the regression of Y on X at Z = 0 (this is true whether Z is centered or uncentered)
2) the regression of Y on X at the arithmetic mean of Z, since the mean of Z, after centering, equals zero
3) the average of all the regression slopes of Y on X at every value of Z, taken across the whole range of the predictor Z (the average regression of Y on X across the range of Z)
For b2:
1) the regression of Y on Z at X = 0
2) the regression of Y on Z at the arithmetic mean of X, since the mean of X, after centering, equals zero
3) the average of all the regression slopes of Y on Z at every value of X, taken across the range of the predictor X
Rearrange the equation
Y hat = b1 X + b2 Z + b3 X Z + b0
into a simple regression equation showing the regression of Y on X at values of Z. Explain how the regression coefficient in the rearranged equation shows that the regression of Y on X depends on the value of Z.
Y hat = (b1 + b3 Z)X + (b2 Z + b0)
Now, the value of z will determine the regression of Y on X.
What do we mean by simple regression equations, simple slopes?
Simple regression equations are the regressions of Y on X at single values of Z. Simple slopes are the regression coefficients (b1 + b3 Z) in those equations; we test the simple slopes for significance.
Explain how you would use the equation Y hat = (b1 + b3 Z)X + (b2 Z + b0) to generate three simple regression equations, one at ZHigh (one standard deviation above the mean of Z), one at ZMean (at the mean of Z), and one at ZLow (one standard deviation below the mean of Z). Be able to do this if given a numerical example.
1) estimate the overall regression equation
2) rearrange the equation into simple regression equations
3) gather centered statistics: Xbar, std dev of X, Zbar, std dev of Z
4) form simple regression equations: obtain the value of Z at one standard deviation above the mean (when centered), at the mean (0, when centered), and at one standard deviation below the mean (when centered), then substitute these values of Z into the simple regression equations
5) plot the simple regression lines
6) test for significance with t = simple slope / standard error of the simple slope
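The steps above can be sketched in code; the data, coefficients, and sample size are made up:

```python
import numpy as np

# Step 1: estimate Yhat = b0 + b1*X + b2*Z + b3*X*Z with centered predictors.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 2.0 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(size=n)

xc, zc = x - x.mean(), z - z.mean()
design = np.column_stack([np.ones(n), xc, zc, xc * zc])
(b0, b1, b2, b3), *_ = np.linalg.lstsq(design, y, rcond=None)

# Steps 2-4: the simple slope of Y on X is (b1 + b3*Z); evaluate at Z low/mean/high.
sd_z = zc.std(ddof=1)
for label, zv in [("Z low", -sd_z), ("Z mean", 0.0), ("Z high", sd_z)]:
    print(label, "simple slope of Y on X:", round(b1 + b3 * zv, 3))
```

Step 6 would divide each simple slope by its standard error (a function of the coefficient covariance matrix) to obtain a t test; that part is omitted here.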
What is meant by a linear x linear interaction?
A linear x linear interaction means that the regression of the criterion on a predictor is linear at every value of the other predictor, and that the regression coefficient changes at a constant rate as a function of changes in the other predictor.
What term in a regression equation would detect a curvilinear relationship of X to Y that varied in curvature as a function of the value of Z?
The X^2 Z term detects a curvilinear relationship of X to Y whose curvature varies with Z: the degree of curvature of the relationship of X to Y depends on the value of Z.
Still considering linear by linear interactions, how many degrees of freedom does such an interaction have in Regression Analysis? What is the analog in ANOVA?
Linear by linear interactions have 1 degree of freedom. This is the same number of degrees of freedom as a 2x2 interaction in ANOVA. With only two levels of each factor in an ANOVA, we can only detect linear effects.
Explain how to find an appropriate "standardized solution" in regression with interactions. Why is the standardized solution that accompanies the solution with the centered predictors an inappropriate standardized solution?
To get a standardized solution, we must 1) form the z score of each predictor and the criterion, 2) form the cross-product of the z scores, which becomes the predictor on which the standardized regression coefficient for the interaction is based, and 3) run the regression analysis using the standardized scores. The standardized solution on the computer printout is inappropriate because of the order of operations: instead of forming the cross-product of the z scores of the two predictors, the program forms the cross-product of the unstandardized predictors and then standardizes that product.
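The difference between the correct predictor and the one the printout uses can be shown with made-up scores:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 7.0, 9.0])
z = np.array([1.0, 1.0, 3.0, 5.0, 8.0])

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

right = zscore(x) * zscore(z)   # product of z scores: correct interaction predictor
wrong = zscore(x * z)           # z score of the raw product: what printouts use
print(np.allclose(right, wrong))  # the two predictors are not the same
```

Because the two predictors differ, the standardized interaction coefficient from the default printout is based on the wrong variable.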
How do we build a quadratic by linear interaction into a regression analysis?
We build a quadratic by linear interaction by:
Yhat = b1 X + b2 X^2 + b3 Z + b4 XZ + b5 X^2 Z + b0
What does the quadratic by linear interaction reflect about a curvilinear relationship of a predictor X to Y in the presence of another predictor Z?
How many degrees of freedom does such an interaction have in regression analysis?
The interaction of X with Z comes from the two terms involving X with Z: the XZ and X^2 Z terms. Together these represent the curvilinear by linear interaction. It signifies that the degree of curvature of the regression of Y on X depends on the value of Z.
The quadratic by linear interaction has 2 degrees of freedom. In ANOVA, too, interactions may have more than one degree of freedom; a two-way interaction has (p - 1) x (q - 1) degrees of freedom, where p is the number of levels of Factor A and q is the number of levels of Factor B.
Contrast how designs are built into ANOVA computer programs versus into regression analyses in terms of what the data analyst must do.
In ANOVA, if you merely list the two factor names, the default is to estimate the full factorial design, all effects up through the highest possible interaction.
In regression, you must build the complete equation yourself, term by term, and put all the higher order terms you wish to estimate into the regression equation. No terms are added by default.
Explain the difference between stepwise and hierarchical regression. Which is operating based on a theoretical model and which is a variable search procedure?
In hierarchical regression, predictors are entered cumulatively according to some pre-specified order which is dictated in advance by the purpose and logic of the research. The hierarchical model calls for a determination of R-squared and the partial regression coefficients of each variable or set of variables at the stage at which each variable block is added to the multiple regression.
Stepwise regression selects, at each stage, the predictor with the largest squared semi-partial correlation, and hence the one that makes the largest contribution to R-squared (largest t value). Hierarchical regression operates from a theoretical model; stepwise regression is a variable search procedure.
Define effect size in terms of the null hypothesis.
The effect size is the degree to which the treatment changes treated subjects relative to control subjects. It’s the degree to which phenomenon is present in the population or the degree to which the null hypothesis is false. It’s a measure of how far an effect is from the null value (typically zero) in the population.
In general, how is effect size measured? What is d as a measure of effect size? How is effect size defined by Cohen for multiple and partial correlations (squared)?
Effect size is a function of the ratio of systematic variance (due to the predictors or the treatment manipulation) to error or residual variance. It is the proportion of Y variance accounted for by the source in question relative to the proportion of error. Effect sizes are unitless measures that do not depend on the scale of measurement.
Effect Size = Systematic variance accounted for / Error variance
Cohen's d is a measure of the difference between the means of two groups (usually experimental and control) relative to the pooled within-group standard deviation: a systematic difference in the numerator divided by random variability in the denominator.
Effect size for multiple regression is defined in squared terms as f^2. For a squared multiple or partial correlation, f^2 = R^2 / (1 - R^2). For the gain due to a set B of predictors over and above a set A, f^2 = (R^2 for A and B together - R^2 for A alone) / (1 - R^2 for A and B together).
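A numeric sketch of the incremental f^2 (the R^2 values below are made up):

```python
# R^2 with sets A and B versus set A alone (hypothetical values).
r2_ab = 0.35
r2_a = 0.20

f2 = (r2_ab - r2_a) / (1 - r2_ab)   # f^2 for the gain due to set B
print(round(f2, 3))
```

The numerator is the squared semipartial for set B; the denominator is the error variance remaining with both sets in the equation.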
Approximately what percent of the variance is accounted for in Cohen's (1988) definition of small, moderate, and large effect size for Pearson product moment correlations
Pearson product moment correlations of .1 (small), .3 (mod), and .5 (large) account for 1%, 9%, and 25% of the variance respectively.
What percent of variance is accounted for in Cohen's (1988) definition of small, moderate, and large effect size squared multiple correlations
Squared multiple correlations accounting for 2%, 13%, and 26% of the variance define small, moderate, and large effects, respectively. The corresponding multiple correlations (not squared) are .14, .36, and .51.
Define the power of a statistical test.
The power of a statistical test is the probability that you actually detect a non-zero effect. It’s the probability of rejecting the null hypothesis, given that the null hypothesis is false.
What four factors are involved in any instance of hypothesis testing? How can these factors be used to determine the number of subjects required for a research design?
The four factors involved in statistical inference are PANE
P= power of the test (i.e., the probability of rejecting a false null)
A= the level of significance chosen (alpha)
N= the sample size (n)
E= the effect size of the effect
The experimenter can specify in advance the power desired (e.g., .80) and the alpha level. Estimates of effect size can come from past research or from the small, moderate, and large values set by Cohen. Given any three of the four factors, the fourth is determined, so specifying power, alpha, and effect size allows you to solve for the required sample size n.
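As a sketch of this logic, the n required to detect a moderate correlation (r = .30) at alpha = .05 (two-tailed) with power = .80 can be approximated with the Fisher z method, n = ((z_alpha + z_beta) / C(r))^2 + 3; the z values below are hard-coded normal quantiles:

```python
import math

z_alpha = 1.959964   # normal quantile for alpha = .05, two-tailed
z_beta = 0.841621    # normal quantile for power = .80
r = 0.30

c = 0.5 * math.log((1 + r) / (1 - r))             # Fisher's z transform of r
n = math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)  # required sample size
print(n)
```

This reproduces the familiar result that on the order of 85 subjects are needed to detect a moderate correlation under these conventions.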
What is the formula for squared semi-partial correlation?
r^2 y(set2.set1) = r^2 y.all - r^2 y.set1
where "all" denotes both sets of predictors together.