Linear Regression Models Flashcards

1
Q

What are some recommendations when selecting control variables?

A
  1. When in doubt, leave them out!
  2. Select conceptually meaningful CVs and avoid proxies
  3. When feasible, include CVs in hypotheses and models
  4. Clearly justify the measures of CVs and the methods of control.
  5. Subject CVs to the same standards of reliability and validity as are applied to other variables.
  6. If the hypotheses do not include CVs, do not include CVs in the analysis
  7. Conduct comparative tests of relationships between the IVs and CVs.
  8. Run results with and without the CVs and contrast the findings.
  9. Report standard descriptive statistics and correlations for CVs, and the correlations between the measured predictors and their partialled counterparts.
  10. Be cautious when generalizing results involving residual variables.
2
Q

What is regression analysis?

A

Regression analyses are a set of statistical techniques that allow one to assess the relationship between one dependent variable (DV) and several independent variables (IVs).

3
Q

What does a parameter estimate explain?

A

Parameter estimates in multiple regression are the unstandardized regression coefficients (B weights). The B weight for a particular IV represents the change in the DV associated with a one-unit change in that IV, with all other IVs held constant.
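As an illustration, a minimal numpy sketch with hypothetical data generated so the true B weights are known in advance:

```python
import numpy as np

# Hypothetical data generated so the true B weights are known:
# y = 2 + 3*x1 + 0.5*x2 exactly, with no error term.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2 + 3 * x1 + 0.5 * x2

# Design matrix with an intercept column; least squares returns the
# unstandardized regression coefficients (B weights).
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[1]: change in y per one-unit change in x1, holding x2 constant.
print(np.round(b, 3))  # recovers [2, 3, 0.5]
```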

4
Q

What is one important limitation of regression analyses?

A

Regression analyses reveal relationships among variables but do not imply that the relationships are causal.
An apparently strong relationship between variables could stem from many sources, including the influence of other, currently unmeasured variables.

5
Q

What are some assumptions of multiple regression?

A
  • Linearity: the population model is linear in its parameters.
  • Random sampling
  • No perfect collinearity/multicollinearity
  • Zero conditional mean of error: can be violated by misspecifying relationships or omitting important variables correlated with the IVs.
  • Homoscedasticity: the error term must have the same variance at all values of the IVs.
  • Normality of errors
  • No influential outliers
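The multicollinearity assumption can be screened with variance inflation factors (VIFs). A minimal numpy sketch on simulated data (treating a VIF above 10 as a red flag is a common rule of thumb, not a fixed standard):

```python
import numpy as np

def vif(X):
    # Variance inflation factor for each column of X (IVs only, no intercept):
    # VIF_j = 1 / (1 - R^2 of regressing IV j on the remaining IVs).
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r2 = 1 - (X[:, j] - A @ coef).var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

# Simulated IVs: x3 is nearly a copy of x1, so both get inflated VIFs.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))  # x1 and x3 large, x2 near 1
```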
6
Q

What does the assumption of homoscedasticity mean?

A

It is the assumption that the standard deviations of errors of prediction are approximately equal for all predicted DV scores.
Homoscedasticity means that the band enclosing the residuals is approximately equal in width at all values of the predicted DV.
Heteroscedasticity may occur when some of the variables are skewed and others are not.
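A rough way to see this band widening in code, in the spirit of a Goldfeld-Quandt check (a sketch on simulated data, not a formal test):

```python
import numpy as np

# Simulated violation: error spread grows with x, so residual variance
# should differ across the range of predicted values.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = 1 + 2 * x + rng.normal(scale=0.2 + 0.5 * x, size=300)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ b
resid = y - pred

# Compare residual variance in the lowest vs highest thirds of predictions;
# near 1 suggests homoscedasticity, far above 1 suggests heteroscedasticity.
order = np.argsort(pred)
ratio = resid[order[-100:]].var() / resid[order[:100]].var()
print(round(ratio, 1))
```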

7
Q

Mention the Major Types of Multiple Regression.

A

There are three major analytic strategies in multiple regression: standard multiple regression, sequential (hierarchical) regression, and statistical (stepwise) regression.
Differences among the strategies involve what happens to overlapping variability due to correlated IVs and who determines the order of entry of IVs into the equation.

8
Q

Explain Standard Multiple Regression

A

In the standard, or simultaneous, model, all IVs enter the regression equation at once.
Each IV is evaluated in terms of what it adds to prediction of the DV that is different from the predictability afforded by all the other IVs.

9
Q

Explain Sequential Multiple Regression

A

In sequential regression (hierarchical regression), IVs enter the equation in an order specified by the researcher.
Each IV is assessed in terms of what it adds to the equation at its own point of entry.
The researcher normally assigns order of entry of variables according to logical or theoretical considerations. Variables with greater theoretical importance could also be given early entry.
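A sketch of the sequential idea with numpy and hypothetical data: fit the first block, then record the R^2 increment when the second IV enters:

```python
import numpy as np

def r2(X, y):
    # R^2 of the OLS regression of y on the columns of X (intercept added).
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ coef).var() / y.var()

# Hypothetical data: x1 is entered first for theoretical reasons, x2 second.
rng = np.random.default_rng(3)
x1 = rng.normal(size=150)
x2 = rng.normal(size=150)
y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=150)

step1 = r2(x1.reshape(-1, 1), y)                 # x1's contribution at entry
step2 = r2(np.column_stack([x1, x2]), y)         # after x2 enters
print(round(step1, 2), round(step2 - step1, 3))  # R^2 at step 1, x2's increment
```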

10
Q

Explain Statistical (Stepwise) Regression

A

Statistical regression (stepwise regression) is a procedure in which the order of entry of variables is based solely on statistical criteria. It is typically used to develop a subset of IVs that is useful in predicting the DV, and to eliminate those IVs that do not add prediction beyond the IVs already in the equation.

There are three versions of statistical regression: forward selection, backward deletion, and stepwise regression.

In forward selection, the equation starts out empty and IVs are added one at a time, provided they meet the statistical criteria for entry. Once in the equation, an IV stays in. Entry often starts with the variable that has the highest simple correlation with the DV.

In backward deletion, the equation starts out with all IVs entered, and they are deleted one at a time if they do not contribute significantly to the regression.

Stepwise regression proceeds like forward selection, but variables that have become least useful in explaining variance can also be removed at each step.
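A toy forward-selection loop in numpy (the `min_gain` entry criterion here is a stand-in for the usual F-to-enter test):

```python
import numpy as np

def r2(cols, X, y):
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - (y - A @ coef).var() / y.var()

def forward_select(X, y, min_gain=0.02):
    # Add whichever remaining IV raises R^2 the most; stop when no IV
    # adds at least min_gain (a stand-in for the usual F-to-enter test).
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        gain, best = max((r2(chosen + [j], X, y) - r2(chosen, X, y), j)
                         for j in remaining)
        if gain < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Simulated data: only columns 2 and 0 actually predict y.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 2] + 1.5 * X[:, 0] + rng.normal(size=200)
selected = forward_select(X, y)
print(selected)  # column 2 enters first, then column 0
```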

11
Q

What are the reasons for choosing the different types of regression?

A

To simply assess relationships among variables and answer the basic question of multiple correlation, the method of choice is the standard multiple regression.

Reasons for using sequential regression are theoretical or for testing hypotheses. It allows the researcher to control the advancement of the regression process.

Statistical regression is a model-building rather than model-testing procedure. As an exploratory technique, it may be useful for such purposes as eliminating variables that are clearly superfluous to tighten up future research.

12
Q

When should you use centering in multiple regression?

A

When interactions of IVs are included in the prediction equation, the product terms can cause problems of multicollinearity unless the IVs have been centered: converted to deviation scores so that each variable has a mean of zero.
Centering an IV does not affect its simple correlation with other variables, but it does affect regression coefficients for interactions or powers of IVs included in the regression equation.
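A small numpy demonstration of both points: the raw product term is strongly correlated with its component IV, while the centered product term is not:

```python
import numpy as np

# Two IVs whose means are far from zero, as raw scores often are.
rng = np.random.default_rng(5)
x = rng.uniform(5, 10, 500)
z = rng.uniform(5, 10, 500)

# Raw product term: strongly correlated with its component IV.
r_raw = np.corrcoef(x, x * z)[0, 1]

# Centered (deviation-score) product term: the collinearity largely vanishes.
xc, zc = x - x.mean(), z - z.mean()
r_centered = np.corrcoef(xc, xc * zc)[0, 1]
print(round(r_raw, 2), round(r_centered, 2))
```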

13
Q

What is a type I error?

A

The error that occurs when one rejects a null hypothesis (H0) that is actually true.

14
Q

What is a type II error?

A

the error that occurs when one fails to reject a null hypothesis that is actually false

15
Q

What is used for goodness of fit test in multiple regression?

A

R^2 and adjusted R^2.

Adjusted R^2 is usually preferred because it penalizes the model for added IVs.

Both indicate how much of the variance in the DV is explained by the model.
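The adjustment is Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), for n cases and k IVs. A small sketch showing the penalty:

```python
def adjusted_r2(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1), n cases, k IVs.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2 of .50 with n = 100, but more IVs -> lower adjusted R^2.
print(round(adjusted_r2(0.50, 100, 2), 3))   # 0.49
print(round(adjusted_r2(0.50, 100, 10), 3))  # 0.444
```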

16
Q

Why use control variables?

A

Adding control variables to the model means that the variables of interest explain the “left over” variance, particularly when they are correlated with the control variables themselves.

17
Q

What does a test statistic indicate?

A

It is a number that summarizes how good our model is: how close it is to reality, and how much of the variance in the DV our model explains.

18
Q

Name some different test statistics.

A

F-test: variance explained by the model (between-group variance in ANOVA) divided by error variance.

Chi-squared: the result of the fitting function; if our CFA modeled reality perfectly, the fitting function would be 0.

t-statistic: in a regression, based on the variance of the beta coefficient estimate.
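For the overall regression F-test, the ratio can be written directly in terms of R^2; a minimal sketch:

```python
def f_stat(r2, n, k):
    # Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1)):
    # explained variance per model df over error variance per residual df.
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# R^2 = .40 with k = 2 IVs and n = 103 cases:
print(round(f_stat(0.40, 103, 2), 1))  # 33.3
```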

19
Q

What can be done if the assumption of normality is not fulfilled?

A

Look at the bootstrap table for the coefficient. If 0 is not included in the confidence interval, the effect of the IV is significant.
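A minimal percentile-bootstrap sketch in numpy, with hypothetical heavy-tailed (non-normal) errors:

```python
import numpy as np

# Hypothetical data with heavy-tailed (non-normal) errors.
rng = np.random.default_rng(6)
x = rng.normal(size=200)
y = 1.0 * x + rng.standard_t(df=3, size=200)

def slope(xs, ys):
    A = np.column_stack([np.ones_like(xs), xs])
    b, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return b[1]

# Percentile bootstrap: resample cases with replacement, refit, and take
# the 2.5th and 97.5th percentiles as the 95% confidence interval.
boot = [slope(x[idx], y[idx])
        for idx in (rng.integers(0, len(x), len(x)) for _ in range(2000))]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))  # the IV is significant if 0 is outside [lo, hi]
```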