Exam 1 Flashcards Preview

Flashcards in Exam 1 Deck (64):

What are three uses of multiple regression?

Theory Testing


What three criteria must be met for making causal statements?

Covariation between the presumed cause and the effect
Temporal precedence
Ruling out alternative explanations


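How do you calculate sample variance?

s²x = SSx / (n - 1), where SSx = Σ(X - Mx)² is the variation (sum of squared deviations from the mean Mx) of X.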

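How do you calculate sample standard deviation?

sx = √(SSx / (n - 1)), the square root of the sample variance, where SSx = Σ(X - Mx)² and Mx is the mean of X.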

How do you calculate variation?

SSx = Σ(X - Mx)², the sum of squared deviations of the scores from their mean Mx.

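How do you calculate covariation?

SPxy = Σ(X - Mx)(Y - My), the sum of products of the deviations of X and Y from their means Mx and My.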

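How do you calculate covariance?

sxy = SPxy / (n - 1), where SPxy = Σ(X - Mx)(Y - My) is the covariation (sum of products) of X and Y.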

How do you calculate correlation?

rxy = SPxy / (√SSx × √SSy)
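
The variation, covariation, variance, covariance, and correlation asked about in the cards above build on one another. A minimal Python sketch, assuming made-up scores for X and Y and the usual n - 1 denominator for the sample variance and covariance (the variable names are illustrative only):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical criterion scores
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Variation (sum of squares) and covariation (sum of products)
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Variance, standard deviation, and covariance divide by n - 1
var_x = ss_x / (n - 1)
sd_x = math.sqrt(var_x)
cov_xy = sp_xy / (n - 1)

# Correlation: covariation over the product of the root variations
r_xy = sp_xy / (math.sqrt(ss_x) * math.sqrt(ss_y))
```

Because the n - 1 terms cancel, the correlation can equivalently be computed as the covariance divided by the product of the two standard deviations.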


What is a phi correlation?

Phi correlations are correlations between two true dichotomies. For example, we can correlate treatment condition and gender.


What are point-biserial correlations?

Point-biserial correlations are correlations between a continuous variable and a true (categorical) dichotomy. For example, we can correlate treatment condition with number of sick days.


What is the difference between true and artificial dichotomies?

True dichotomies are discrete categories, while artificial dichotomies are categories that are created by making a cut score on a continuum.


What does the notation in the subscript of byx signify?

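The subscript yx signifies the regression of Y on X: the first subscript names the criterion and the second names the predictor. byx is the slope of the regression line that summarizes the relationship between the predictor variable and the criterion. It’s also called the regression coefficient or the coefficient for the regression of Y on X.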


What is the function of the regression intercept?

The intercept is the regression constant: the y-intercept of the regression equation, that is, the predicted value of Y when X = 0. Its function is to produce equality of the means of the observed and predicted scores. The mean of the predicted scores always equals the mean of the observed scores.


If there is no relationship between predictor X and criterion Y, what is the best prediction of Y for any value of X?

The best prediction of Y for any value of X will be the mean of Y.


Into what two components can we partition any observed criterion score? What does each of these components measure? Are they orthogonal or do they share overlapping variance?

A criterion score can be partitioned into the part that can be predicted from X (the predicted score) and the part that cannot be predicted from X (the residual). These parts are orthogonal (non-overlapping). An observed score equals the predicted score plus the residual.
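
The partition of each observed score into a predicted part and a residual can be checked numerically. A minimal sketch, assuming made-up data and a one-predictor OLS fit; it shows that the two parts have zero covariation, that the mean of the predicted scores equals the mean of Y, and that SSy splits into SSyhat plus the residual variation:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical criterion scores
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# OLS slope and intercept in the one-predictor case
b_yx = sp_xy / ss_x
b_0 = mean_y - b_yx * mean_x

# Each observed score = predicted score + residual
y_hat = [b_0 + b_yx * xi for xi in x]
residual = [yi - yh for yi, yh in zip(y, y_hat)]

mean_y_hat = sum(y_hat) / n
# Covariation between predicted scores and residuals (zero if orthogonal)
sp_hat_resid = sum((yh - mean_y_hat) * e for yh, e in zip(y_hat, residual))

ss_y_hat = sum((yh - mean_y_hat) ** 2 for yh in y_hat)
ss_resid = sum(e ** 2 for e in residual)
r2 = ss_y_hat / ss_y  # proportion of variation in Y that is predictable
```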


What does r^2yyhat measure? What is it called?

This is the squared multiple correlation, a global effect size measure for the complete regression equation. There is only one r2multiple for the whole regression equation. It’s the proportion of variation in the criterion Y accounted for by the set of predictors.


Given SSy, SSyhat (predictable variation), and SSy-yhat (residual variation), how would you compute r2multiple?

You would divide SSyhat by SSy.


In general, what hypothesis is tested in the analysis of regression?

Hypothesis tests are always written in terms of population parameters. In the analysis of regression, our null hypothesis is that the population squared multiple correlation (rho squared) equals 0, meaning that the proportion of variance explained by the predictors equals 0. In the one-predictor case this is equivalent to the population regression coefficient equaling 0. The alternative hypothesis is that the population squared multiple correlation is greater than 0.


What is the structure of the F test and its degrees of freedom in the one-predictor case?

The F-test is the ratio of the mean square regression (predictable) to the mean square residual. Said differently, it is the systematic variance among the predicted scores plus the random variance among the predicted scores, divided by the random variance among the residual scores. df = (p, n - p - 1), which is (1, n - 2) in the one-predictor case.


What are meant by biased versus unbiased estimators?

An unbiased estimator is one whose expected value equals the corresponding population parameter. Examples are the sample mean and the sample regression coefficient. A biased estimator is one whose expected value differs from the corresponding population parameter. It can be negatively biased (on average smaller than the population value) or positively biased (on average greater than the population value). An example of a negatively biased estimator is the variance computed by dividing by n rather than by the degrees of freedom, n - 1. An example of a positively biased estimator is the sample r2multiple.


What is a sampling distribution? How would you create a sampling distribution of a regression coefficient? Is the sample regression coefficient an unbiased statistic or a biased statistic?

A sampling distribution is the relative frequency distribution of a sample statistic, built from many (j) samples of n people each drawn from the same population. We can create a sampling distribution of a regression coefficient by repeatedly drawing samples of size n, computing the sample regression coefficient in each, and forming the distribution of those coefficients. This shows how stable or unstable the sample regression coefficient will be from sample to sample. The sample regression coefficient is an unbiased statistic.
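
The idea of building a sampling distribution by repeated sampling can be simulated. A rough sketch, assuming a hypothetical population in which the regression coefficient is 0.5 and both X and the errors are standard normal (all names and values here are illustrative assumptions):

```python
import random
import statistics

random.seed(1)
TRUE_SLOPE = 0.5      # assumed population regression coefficient
TRUE_INTERCEPT = 2.0  # assumed population intercept

def sample_slope(n):
    """Draw one sample of n cases and return its OLS slope byx."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, 1) for x in xs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    return sp / ss_x

# Sampling distributions of the slope for two sample sizes
slopes_n20 = [sample_slope(20) for _ in range(2000)]
slopes_n80 = [sample_slope(80) for _ in range(2000)]

mean_slope = statistics.fmean(slopes_n20)  # close to 0.5 if unbiased
se_n20 = statistics.stdev(slopes_n20)      # empirical standard error, n = 20
se_n80 = statistics.stdev(slopes_n80)      # empirical standard error, n = 80
```

The mean of the simulated slopes should sit near the population value, and the spread of the distribution should shrink as n grows.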


What is the standard error, what does it measure? What two things can we do to decrease the standard error of the regression coefficient?

A standard error is the standard deviation of a sampling distribution. The standard error of the regression coefficient measures the instability of the sample regression coefficient as an estimate of the population regression coefficient. To decrease it, we can increase the sample size and sample widely on X.


What is the standard error of estimate?

This is the standard deviation of the residual scores. It is used to convert the residuals to z-scores, which helps identify extreme scores that may bias the outcome.


What is the general structure of all confidence intervals?

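C [ byx - A < Byx < byx + A ] = 1 - alpha, where byx is the sample statistic, Byx is the population parameter, and A is the allowance factor. In general: sample statistic plus or minus the allowance factor.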


What two values are contained in the allowance factor? What is another name for the allowance factor?

The allowance factor, also called the margin of error, is the product of the critical t value and the standard error of the regression coefficient.


If a confidence interval on a regression coefficient includes the value zero, what does that tell you about the significance of the regression coefficient?

If a confidence interval includes zero, it tells me that 0 is a plausible value of the population regression coefficient, which means that the sample regression coefficient is not significantly different from zero at the corresponding alpha level.
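
A worked example may help connect the confidence interval, the allowance factor, and the significance test. This is a minimal sketch on made-up data; it assumes the one-predictor standard error formula sbyx = √(MSresidual / SSx), and the critical t value is hard-coded from a t table (two-tailed, alpha = .05, df = 3) rather than computed:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical criterion scores
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b_yx = sp_xy / ss_x
b_0 = mean_y - b_yx * mean_x
ss_resid = sum((yi - (b_0 + b_yx * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of the slope (one-predictor case, df = n - 2)
ms_resid = ss_resid / (n - 2)
se_b = math.sqrt(ms_resid / ss_x)

T_CRIT = 3.182  # assumed table value: two-tailed, alpha = .05, df = 3

# Allowance factor (margin of error) and the interval limits
allowance = T_CRIT * se_b
lower = b_yx - allowance
upper = b_yx + allowance

includes_zero = lower < 0 < upper
t_observed = (b_yx - 0) / se_b
significant = abs(t_observed) > T_CRIT
```

With these numbers the interval includes zero and the observed t falls short of the critical value, so the two ways of testing the coefficient agree.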


Be able to explain what unstandardized byx measures.

The unstandardized regression coefficient measures the predicted change in Y, in raw Y units, for every 1-unit increase in X.


In what units are standardized regression coefficients measured?

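Standardized regression coefficients are measured in standard deviation units: the predicted change in Y, in standard deviations of Y, for a one standard deviation increase in X.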


What is the relationship between the standardized regression coefficient and the correlation between X and Y in the one-predictor case?

b*yx = rxy
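
The equality of the standardized coefficient and the correlation in the one-predictor case can be verified by regressing z-scores on z-scores. A small sketch with made-up data (function names are illustrative):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical criterion scores

def z_scores(v):
    """Convert raw scores to z-scores using the sample SD (n - 1)."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / (n - 1))
    return [(vi - m) / sd for vi in v]

def slope(pred, crit):
    """OLS slope for the regression of crit on pred."""
    mp = sum(pred) / len(pred)
    mc = sum(crit) / len(crit)
    sp = sum((p - mp) * (c - mc) for p, c in zip(pred, crit))
    ss = sum((p - mp) ** 2 for p in pred)
    return sp / ss

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sp = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ss_a = sum((ai - ma) ** 2 for ai in a)
    ss_b = sum((bi - mb) ** 2 for bi in b)
    return sp / math.sqrt(ss_a * ss_b)

b_yx = slope(x, y)                        # unstandardized coefficient
b_star = slope(z_scores(x), z_scores(y))  # standardized coefficient
r_xy = correlation(x, y)
```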


In general, what is meant by a “loss function?” State algebraically the loss function for OLS regression in the one predictor case, i.e. show specifically what is minimized.

A loss function is a function of the data that sets the criterion for the solution to the analysis: it defines what we want to make as small as possible. In OLS, we select the slope and the intercept so that the sum of the squared residuals around the line is as small as possible. Algebraically, OLS minimizes Σ(Y - Yhat)² = Σ(Y - byxX - b0)². We do this analytically by starting with the loss function and deriving equations for the slope (byx) and the intercept (b0) that minimize the sum of squared residuals and yield a unique solution.
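
To see that the OLS slope and intercept really do minimize the sum of squared residuals, one can compare the loss at the OLS solution with the loss at nearby values. A minimal sketch with made-up data; the perturbation sizes are arbitrary choices for illustration:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor scores
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical criterion scores
n = len(x)

def sum_squared_residuals(b0, b1):
    """The OLS loss function: sum of (Y - (b0 + b1 * X)) squared."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Analytic OLS solution
b_yx = sp_xy / ss_x
b_0 = mean_y - b_yx * mean_x
loss_at_ols = sum_squared_residuals(b_0, b_yx)

# Loss at eight nearby (intercept, slope) pairs
steps = (-0.1, 0.0, 0.1)
nearby_losses = [
    sum_squared_residuals(b_0 + d0, b_yx + d1)
    for d0 in steps
    for d1 in steps
    if (d0, d1) != (0.0, 0.0)
]
ols_is_minimum = all(loss > loss_at_ols for loss in nearby_losses)
```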


What is the difference between pairwise and listwise deletion? Be able to explain how each treats missing data. Are these approaches the current state of the art or are there newer approaches?

Listwise deletion (AKA casewise deletion, complete case deletion) throws out all the data of any case that is missing even one score on the variables listed for analysis. Pairwise deletion means that all the data points available for a particular correlation are entered into that correlation, so different correlations may be based on different cases. These are neither optimal nor state-of-the-art methods. Multiple imputation and maximum likelihood are much better approaches.
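
The difference between the two deletion schemes shows up in how many cases each analysis keeps. A small sketch, assuming a made-up data set in which `None` marks a missing score (the field names are hypothetical):

```python
rows = [
    {"x1": 1.0, "x2": 2.0, "y": 3.0},
    {"x1": 2.0, "x2": None, "y": 4.0},
    {"x1": 3.0, "x2": 1.0, "y": None},
    {"x1": 4.0, "x2": 5.0, "y": 6.0},
    {"x1": None, "x2": 3.0, "y": 5.0},
    {"x1": 5.0, "x2": 4.0, "y": 7.0},
]

# Listwise deletion: keep only cases with no missing score at all
listwise = [r for r in rows if all(v is not None for v in r.values())]

def pairwise(a, b):
    """Pairwise deletion: keep every case that has both a and b."""
    return [(r[a], r[b]) for r in rows
            if r[a] is not None and r[b] is not None]

n_listwise = len(listwise)
n_x1_x2 = len(pairwise("x1", "x2"))
n_x1_y = len(pairwise("x1", "y"))
n_x2_y = len(pairwise("x2", "y"))
```

Here listwise deletion retains three cases for every correlation, while pairwise deletion retains four cases per correlation, but a different four each time.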


What is a scatterplot matrix, i.e., what is contained in a scatterplot matrix and how is it laid out?

A scatterplot matrix is a matrix of scatterplots: a visual representation of a correlation matrix in which each correlation is replaced by a small scatterplot of the two variables that produce that correlation. It is laid out like the correlation matrix: the cell where rjk (the Pearson product moment correlation between the variable in row j and the variable in column k) would appear holds the scatterplot of those two variables.


What does the kernel density plot portray about a univariate distribution?

A kernel density estimate is a smoothing function that is applied to a sample frequency distribution in order to provide an estimate of the shape of the distribution in the population.


What is a lowess line? Is a specific model involved in generating the line? What does it show us in a bivariate plot? How is it affected by extreme scores at the ends of the continuum? How does the percent of points involved in smoothing (say 50%, 10%, 80%) affect the appearance of the lowess line?

A lowess line is a nonparametric smoothing function that helps us inspect whether a model is appropriate. No specific model is imposed: the line simply tracks the data, so it is pulled by extreme scores at the ends of the continuum. The percentage of points involved in the smoothing determines whether the line looks smooth or jagged: a lower percentage (say 10%) makes the line more jagged, whereas a higher percentage (say 80%) makes it smoother. Departures of the lowess line from a straight line help us detect a specification error.


Describe the structure of all t tests.

All t-tests have the difference between the sample statistic and the parameter value under the null hypothesis in the numerator, divided by an estimate of the standard error of the statistic in the denominator.

t = (byx - 0) / sbyx


If you test the unstandardized regression coefficient for significance with a t-test, and then you test the standardized regression coefficient for significance with a t-test, how do the values of the two t-tests relate to one another? What are the degrees of freedom of each t-test?

The two t-tests yield the identical t value: the same test applies to both the standardized and the unstandardized regression coefficient. Each has the same degrees of freedom, n - p - 1 (n - 2 in the one-predictor case).


What is estimation in statistics? What are point estimates? What are interval estimates?

Estimation is the use of sample statistics to make an “educated guess” about the value of the corresponding population parameters. A point estimate is a single sample statistic value used as a direct estimate of the population parameter (for example, byx as an estimate of Byx). An interval estimate is a range of values within which the parameter in question is likely to lie (a confidence interval).


What is the effect of unreliability in variables on the correlation between the variables?

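Unreliability attenuates the correlation: random measurement error in either variable makes the observed correlation smaller in absolute value than the correlation between the true scores.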


What is restriction of range? How, if at all, does restriction of range affect the correlation between two variables?

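Restriction (truncation) of range occurs when the sample covers only part of the range of scores found in the population. It attenuates the correlation between the variables, because the total variance is much smaller in a sample with a restricted range than in a sample with a greater range.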
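
The attenuating effect of restricting the range can be illustrated by simulation. A rough sketch, assuming a hypothetical population correlation of .60 and an arbitrary selection rule that keeps only cases with X above 0.5:

```python
import math
import random

random.seed(7)

def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sp = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    ss_a = sum((ai - ma) ** 2 for ai in a)
    ss_b = sum((bi - mb) ** 2 for bi in b)
    return sp / math.sqrt(ss_a * ss_b)

n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
# Y built so that the population correlation with X is .60
y = [0.6 * xi + 0.8 * random.gauss(0, 1) for xi in x]

r_full = correlation(x, y)

# Restrict the range: keep only cases scoring above 0.5 on X
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0.5]
x_kept = [pair[0] for pair in kept]
y_kept = [pair[1] for pair in kept]
r_restricted = correlation(x_kept, y_kept)
```

The correlation in the restricted subsample should come out noticeably lower than the correlation in the full sample.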


What two components comprise an observed score in true score theory? Using these components, define the reliability of a score in terms of true score theory.

An observed score is made up of the true score plus random error. The random error component weakens the score’s correlation with other variables. The reliability of a score is the proportion of the total variance that is true score variance, that is, the proportion of variance in the score that is available to correlate with other scores.


What are part-whole correlations? Can we use standard significance tests for testing the significance of part-whole correlations?

A part-whole correlation is a correlation computed between some variable j and another variable w, where w is the sum of scores on a set of variables that includes j. Standard significance tests should not be applied to these correlations: because j is part of w, the two are correlated to some degree by construction, and this built-in overlap is not removed from the part-whole correlation. The variables with the highest variance will correlate most highly with the total score.


In general what is meant by the Fixed Effects Regression Model?

It is the model underlying OLS regression. It assumes that predictors are measured without error and that the predictors take on a fixed set of values.


What does the notation by1.2 and by2.1 signify? Why are these coefficients called "partial regression coefficients?"

by1.2 is the coefficient for the regression of Y on predictor X1 holding X2 constant. by2.1 is the coefficient for the regression of Y on predictor X2 holding X1 constant. They are the coefficients of the predictors in our regression equation. They are called partial regression coefficients because each measures the unique contribution of one predictor to the prediction of the criterion, taking out any redundancy in prediction with the other predictor.


In a two-predictor regression analysis, describe how one would go about creating a score on predictor X2 that was completely independent of X1 (describe the two steps involved).

(1) In a regression equation, first you predict X2 from X1, which gives you the part of X2 that is redundant and overlapping with X1.
(2) In a second step, you take away the part of X2 that can be predicted from X1 (i.e., is redundant with X1), so that you leave only the part that is not redundant with X1.

In equation form

(1) X2predicted = b1 X1 + b0;

(2) X2 independent of X1 = X2 - X2predicted
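
The two steps can be sketched as follows, using made-up scores for X1 and X2. The residualized X2 has zero covariation with X1, which is what "completely independent of X1" means here in the linear sense:

```python
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical scores on predictor X1
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]  # hypothetical scores on predictor X2
n = len(x1)

mean_x1 = sum(x1) / n
mean_x2 = sum(x2) / n
ss_x1 = sum((a - mean_x1) ** 2 for a in x1)
sp_12 = sum((a - mean_x1) * (b - mean_x2) for a, b in zip(x1, x2))

# Step 1: predict X2 from X1
b1 = sp_12 / ss_x1
b0 = mean_x2 - b1 * mean_x1
x2_predicted = [b0 + b1 * a for a in x1]

# Step 2: subtract the predictable part, leaving the residual
x2_independent = [b - p for b, p in zip(x2, x2_predicted)]

# Covariation of X1 with the residualized X2 (should be zero)
mean_resid = sum(x2_independent) / n
sp_x1_resid = sum((a - mean_x1) * (e - mean_resid)
                  for a, e in zip(x1, x2_independent))
```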


Explain what is meant by "statistically controlling" or "partialing out" or "holding constant" a variable.

We want to obtain a regression coefficient for a predictor that reflects its unique prediction of Y that is unrelated to the other predictors. Therefore, we statistically control for or partial out the portion that is related to the other predictor. We are taking out any variance in a predictor that overlaps with the variance of another predictor to understand the unique variance it contributes.


Describe geometrically the formation of the data X1, X2, and Y in a two-predictor regression situation. What is the regression plane? What is the predictor plane?

The two predictors together define a predictor plane, like the floor of a room (a two-dimensional surface). Each pair of observed values of the predictors is represented by one point on this plane. The criterion Y is plotted along a third axis perpendicular to the predictor plane (like a dowel rod standing on the floor). For each case, Y is represented by a point directly above the point (X1, X2), with the height of the point given by the observed score on the criterion Y. The regression plane is the surface formed by the predicted scores: its height above any point (X1, X2) is the predicted score for that pair of values. The intercept says how high the regression plane is off the ground where X1 = 0 and X2 = 0.


What is contained in the data matrix?

The data matrix (X) contains the case numbers, the observed values of the predictors, and the values of the criterion.


What is contained in the covariation matrix?

The covariation matrix (P) contains the variation of X1, X2, and Y, and the covariation (sum of products) between each pair of variables: X1 with X2, X1 with Y, and X2 with Y. The main diagonal contains the variation of each predictor and of the criterion. The off-diagonal elements are the covariations between pairs of variables.


What is contained in the covariance matrix?

The covariance matrix (S) contains the variances of each predictor and the criterion on the main diagonal and the covariances off the main diagonal.


What is contained in the correlation matrix?

The correlation matrix (R) contains 1s along the main diagonal, representing the correlation of each variable with itself, and the correlations between pairs of variables off the main diagonal. The correlation of each predictor with the criterion is its validity. The column of correlations of each predictor with the criterion is the validity vector.
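
The covariation, covariance, and correlation matrices can each be derived from the data matrix in turn. A minimal sketch with made-up scores; the letters P, S, and R follow the cards above, and everything else is an illustrative assumption:

```python
import math

# Hypothetical data matrix: two predictors and the criterion
data = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "Y": [3.0, 2.0, 6.0, 5.0, 9.0],
}
names = list(data)
n = len(data["Y"])
k = len(names)

def deviations(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

dev = {name: deviations(scores) for name, scores in data.items()}

# P: variation on the diagonal, covariation (sum of products) off it
P = [[sum(a * b for a, b in zip(dev[r], dev[c])) for c in names]
     for r in names]

# S: variances on the diagonal, covariances off it
S = [[p / (n - 1) for p in row] for row in P]

# R: 1s on the diagonal, correlations off it
R = [[P[i][j] / math.sqrt(P[i][i] * P[j][j]) for j in range(k)]
     for i in range(k)]

# Validity vector: correlation of each predictor with the criterion
validity_vector = [R[i][k - 1] for i in range(k - 1)]
```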


Assume you have two predictors X1 and X2, and you predict Y from X1 and X2 in a single two predictor regression equation. Then you predict Y from X1 only. Under what condition, if any, will the unstandardized regression coefficient for predictor X1 in the two predictor regression equation, equal the unstandardized regression coefficient for X1 in the one predictor case? What about the standardized regression coefficient for X1 in the one predictor versus the two predictor case?

Only if X1 and X2 (the two predictors) are completely uncorrelated. The same condition holds for the standardized regression coefficient.


Explain the squared multiple correlation in terms of the observed criterion score and the predicted score.

The squared multiple correlation represents the square of the correlation between the observed criterion scores and the predicted scores. If it’s 0, there is no prediction.


What is a linear combination? If you use other than OLS weights in computing the predicted score, can you obtain a higher correlation between Y and the predicted score than with the OLS weights?

A linear combination is any combination of the form:
W = c1X1 + c2X2 + ... + ckXk.

The regression equation derived in ordinary least squares regression is a special case in which the weights are chosen to minimize the sum of squared residuals. These coefficients lead to a predicted score that is maximally correlated with the criterion Y. So no: no other set of weights can lead to a linear combination of the predictors that has a higher correlation with the criterion than does the predicted score from OLS regression.


Show how you can compute r2multiple from
(a) SSregression and SSy
(b) SSresidual and SSy
(c) observed criterion Y and predicted score
(d) standardized regression coefficients and validity vector

(a) SSyhat/SSy

(b) (SSy - SSy-yhat)/SSy

(c) correlate Y and Yhat, then square the correlation: (ryyhat)²

(d) (b*y1.2 × ry1) + (b*y2.1 × ry2)
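
All four routes should give the same value. The sketch below checks this on made-up data for two predictors. It assumes the standard two-predictor closed form for the standardized coefficients, b*y1.2 = (ry1 - ry2 r12) / (1 - r12²), which is not stated in the cards above:

```python
import math

# Hypothetical scores on two predictors and a criterion
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0, 2.0, 6.0, 4.0, 9.0, 6.0]

def mean(v):
    return sum(v) / len(v)

def sp(a, b):
    """Covariation of a and b; sp(a, a) is the variation of a."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def corr(a, b):
    return sp(a, b) / math.sqrt(sp(a, a) * sp(b, b))

r12, ry1, ry2 = corr(x1, x2), corr(y, x1), corr(y, x2)

# Standardized partial regression coefficients (assumed closed form)
beta1 = (ry1 - ry2 * r12) / (1 - r12 ** 2)
beta2 = (ry2 - ry1 * r12) / (1 - r12 ** 2)

# Unstandardized coefficients, intercept, and predicted scores
b1 = beta1 * math.sqrt(sp(y, y) / sp(x1, x1))
b2 = beta2 * math.sqrt(sp(y, y) / sp(x2, x2))
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
y_hat = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

ss_y = sp(y, y)
ss_y_hat = sp(y_hat, y_hat)
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r2_a = ss_y_hat / ss_y               # (a) SSyhat / SSy
r2_b = (ss_y - ss_resid) / ss_y      # (b) (SSy - SSy-yhat) / SSy
r2_c = corr(y, y_hat) ** 2           # (c) squared correlation of Y, Yhat
r2_d = beta1 * ry1 + beta2 * ry2     # (d) weights times validities
```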


How is r2multiple built into the standard error of a partial regression coefficient, and how does r2multiple affect the size of the standard error for each predictor?

The quantity (1 - r2multiple) appears in the numerator of the standard error. The higher the squared multiple correlation of the predictors with the criterion, the smaller the standard error. A good overall prediction, as measured by r2multiple, decreases the standard error of each predictor.


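How does the number of predictors affect the size of the standard error for each predictor?

The number of predictors p enters through the residual degrees of freedom, n - p - 1. All else being equal, the more predictors, the larger the standard error; the more cases, the smaller the standard error.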


What is the tolerance of a variable, what does it measure, where does it appear in the standard error, and how does it affect the standard error?

Tolerance concerns the relationship among the predictors (here, two predictors). It measures the proportion of variance in a predictor that does not overlap with (is independent of) the other predictor. In the two-predictor case it is (1 - r²12), which appears in the denominator of the standard error. Low tolerance means the predictor is highly redundant with the other predictor. High redundancy makes the regression coefficients unstable and increases the standard error.


What is the variance inflation factor, where does it appear in the standard error, and how does it affect the standard error?

The variance inflation factor (VIF) measures the extent to which the redundancy of one predictor with the other predictors causes an increase in the standard error. It is the reciprocal of tolerance: in the two-predictor case, VIF = 1 / (1 - r²12). It multiplies the rest of the standard error. As the correlation between predictors increases, VIF increases and the standard error of the regression coefficient increases.
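
The roles of r2multiple, the number of predictors, tolerance, and VIF can be seen together in one formula. The sketch below assumes the common textbook form of the standard error for a two-predictor partial regression coefficient, (sY / sX) × √VIF × √((1 - r2multiple) / (n - p - 1)); the function names and the numeric inputs are hypothetical:

```python
import math

def tolerance(r12):
    """Proportion of a predictor's variance not shared with the other."""
    return 1 - r12 ** 2

def vif(r12):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r12)

def se_partial_coefficient(sd_y, sd_x, r2_multiple, r12, n, p=2):
    """Assumed textbook form of the standard error of by1.2."""
    return ((sd_y / sd_x)
            * math.sqrt(vif(r12))
            * math.sqrt((1 - r2_multiple) / (n - p - 1)))

# Hold everything fixed and vary one ingredient at a time
se_low_overlap = se_partial_coefficient(1.0, 1.0, 0.4, 0.2, 100)
se_high_overlap = se_partial_coefficient(1.0, 1.0, 0.4, 0.9, 100)
se_better_fit = se_partial_coefficient(1.0, 1.0, 0.8, 0.2, 100)
se_more_cases = se_partial_coefficient(1.0, 1.0, 0.4, 0.2, 400)
```

Raising the correlation between predictors inflates the standard error, while a higher r2multiple or a larger n shrinks it.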


Why do we need the measure r2adjusted? What is shrinkage?

We need r2adjusted because the sample r2multiple is a positively biased estimate of the extent to which the predictors account for the criterion in the population. r2adjusted is not completely unbiased, but it is less biased. The larger the ratio of the number of predictors p to the sample size n, the greater the downward adjustment in the estimate of r2multiple. Shrinkage is the drop in r2multiple that we expect when generalizing from one sample to another sample: because the sample r2multiple is positively biased, the regression equation will account for less variation in a new sample than in the sample in which it was estimated.
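
One widely used adjustment, and the one most regression software reports, is r2adjusted = 1 - (1 - r2multiple)(n - 1) / (n - p - 1). Whether this exact formula is the one intended by the card above is an assumption. A short sketch showing how the adjustment grows with the ratio of p to n:

```python
def adjusted_r2(r2_multiple, n, p):
    """Adjusted squared multiple correlation (assumed common formula)."""
    return 1 - (1 - r2_multiple) * (n - 1) / (n - p - 1)

# Same sample r2multiple and n, different numbers of predictors
adj_two_predictors = adjusted_r2(0.5, 21, 2)
adj_ten_predictors = adjusted_r2(0.5, 21, 10)

# Same r2multiple and p, much larger sample
adj_large_sample = adjusted_r2(0.5, 2001, 2)
```

With many predictors relative to cases the adjustment is severe, and with a very large sample the adjusted value is nearly equal to the sample r2multiple.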


What does an overlay plot show?

It shows two or more dependent variables plotted as a function of a single independent variable.


What is meant by saying that in this graph the "linear trend has been removed"?

When we predict the residuals from X, the slope equals 0. The residuals are what is left of each observed Y score that does not fit the OLS regression line. They are the part of each observed score that does NOT follow a linear trend. When we plot these residuals as a function of predictor X, we no longer see the linear trend.
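
A small numeric example can make this concrete. The sketch below assumes a made-up, purely quadratic relationship (Y = X²), fits a straight line, and then regresses the residuals on X. The residual slope is zero, yet the residuals still show the curvature that the linear model missed:

```python
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [xi ** 2 for xi in x]  # hypothetical purely quadratic relationship

def slope_and_intercept(pred, crit):
    """One-predictor OLS slope and intercept."""
    mp = sum(pred) / len(pred)
    mc = sum(crit) / len(crit)
    sp = sum((p - mp) * (c - mc) for p, c in zip(pred, crit))
    ss = sum((p - mp) ** 2 for p in pred)
    b = sp / ss
    return b, mc - b * mp

# Fit the (misspecified) linear model
b_yx, b_0 = slope_and_intercept(x, y)
residuals = [yi - (b_0 + b_yx * xi) for xi, yi in zip(x, y)]

# Regress the residuals on X: the linear trend is gone
resid_slope, resid_intercept = slope_and_intercept(x, residuals)

# What remains is curved: positive at the ends, negative in the middle
ends_positive = residuals[0] > 0 and residuals[-1] > 0
middle_negative = residuals[3] < 0
```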


What is meant by a specification error?

A specification error means we have fitted the wrong model to our data. This can occur by having the wrong model content (omitting relevant variables) or by fitting the wrong form of a relationship. For example, we fit a linear model to our data, which resulted in an r2multiple of .187 (low correlation). When we fit a quadratic model to the same data, we came up with an r2multiple of .993, which clearly fits our data better.


In the ANOVA Table of a regression output, what do the following terms stand for?
R
R Square
Adjusted R Square
Std. Error of the Estimate

R: ryyhat = rmultiple, the correlation of the observed criterion scores with the predicted scores

R Square: ryyhat2 = r2multiple, squared multiple correlation, the square of the correlation of the observed criterion scores with the predicted scores; this is the effect size measure of the overall regression analysis

Adjusted R Square: an adjusted estimate of the overall squared multiple correlation in the population

Std. Error of the Estimate: standard deviation of the residual scores, used in computing z-scores of the residuals.


What does the standard error of any regression coefficient tell you?

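The standard error of a regression coefficient, like the standard error of any sample statistic, estimates how much the sample statistic varies from sample to sample around the true population value. It is the standard deviation of the sampling distribution of the coefficient.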