Multiple Regression - Model Selection Flashcards

1
Q

How does a simultaneous regression model differ from hierarchical regression?

A

Simultaneous regression: all predictors are entered at the SAME time (in a single step), so each predictor is evaluated while controlling for all of the others.

In hierarchical regression, the predictors considered most important (or the covariates) are entered first; step 2 then adds the next predictor(s) to those from step 1, and so on. With this ordered entry, the first variable entered soaks up the shared variability: if we enter physical health first, it claims the variance it shares with mental health, leaving only the left-overs for mental health. Remove physical health and mental health suddenly becomes significant.
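A minimal R sketch of the two approaches. The variable names (wellbeing, physical, mental) and the simulated data are illustrative assumptions, not course data:

  # Toy data so the sketch runs; in practice d would be the real dataset
  set.seed(1)
  d <- data.frame(physical = rnorm(100), mental = rnorm(100))
  d$wellbeing <- 0.4 * d$physical + 0.3 * d$mental + rnorm(100)

  # Simultaneous: every predictor enters in one step; each coefficient
  # is evaluated controlling for all of the others
  summary(lm(wellbeing ~ physical + mental, data = d))

  # Hierarchical: predictors enter in a researcher-chosen order, one step at a time
  step1 <- lm(wellbeing ~ physical,          data = d)
  step2 <- lm(wellbeing ~ physical + mental, data = d)
  anova(step1, step2)   # tests the improvement from adding mental at step 2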

2
Q

What affects the attribution of shared variability in simultaneous and hierarchical regression?

A

The order in which the variables are entered.

3
Q

Why would we use the hierarchical model?

A

If we want to see effects over and above a covariate, we enter the uninteresting predictors (the covariates) first.

ex) If we add SES (a covariate) to the model first, the later steps show the effects we are actually interested in, after controlling for SES.
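A minimal sketch of entering a covariate first, then testing the predictor of interest. The names (SES, stress, outcome) and the simulated data are illustrative assumptions:

  # Toy data: SES is the covariate, stress is the predictor we care about
  set.seed(2)
  d <- data.frame(SES = rnorm(80), stress = rnorm(80))
  d$outcome <- 0.5 * d$SES - 0.4 * d$stress + rnorm(80)

  covariate_only <- lm(outcome ~ SES,          data = d)  # step 1: soak up SES
  with_predictor <- lm(outcome ~ SES + stress, data = d)  # step 2: add the predictor of interest
  anova(covariate_only, with_predictor)   # effect of stress over and above SES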

4
Q

If there’s no theory about predictors, what do we do?

A

We run all regressions!

That is, we either evaluate all possible regressions or employ a selection algorithm.

5
Q

If we compare all possible regressions (2 to the k) involving a set of predictors in a search for the “best” model, on what criteria might we compare the models? (5)

A
  1. R² (largest): the problem is inflation; a model with more predictors always looks at least as good.
  2. Adjusted R² (largest): we are penalized for adding predictors that aren't worth their weight in degrees of freedom.
  3. PRESS (smallest): the prediction sum of squares; each observation is deleted in turn, the model is refit, and the squared errors of predicting the deleted observations are summed.
  4. Mallows' Cp (smallest): compares the candidate model's residual error to the full model's error variance; for an unbiased candidate model, the expected value of Cp is about k + 1 (the number of predictors plus the intercept).
  5. Parsimony: the fewer predictors, the better. (Criteria 1-4 are computed in the sketch after this list.)
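A sketch of computing the four numerical criteria for one candidate model in R. The data frame, variable names, and the choice of candidate model are illustrative assumptions:

  # Toy data stand-in; x3 and x4 are noise, so a subset model is plausible
  set.seed(3)
  d <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60), x4 = rnorm(60))
  d$y <- 0.6 * d$x1 + 0.4 * d$x2 + rnorm(60)

  full <- lm(y ~ x1 + x2 + x3 + x4, data = d)   # largest model supplies the MSE for Cp
  cand <- lm(y ~ x1 + x2,           data = d)   # one candidate subset model

  summary(cand)$r.squared                        # 1. R^2 (largest wins, but inflates)
  summary(cand)$adj.r.squared                    # 2. adjusted R^2 (df penalty)
  sum((resid(cand) / (1 - hatvalues(cand)))^2)   # 3. PRESS (leave-one-out prediction SS)

  n <- nrow(d); p <- length(coef(cand))          # p = predictors + intercept
  sum(resid(cand)^2) / summary(full)$sigma^2 - (n - 2 * p)   # 4. Mallows' Cp (about p if unbiased)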
6
Q

What does a significant R2∆ indicate in a hierarchical regression output?

ex)
Step 1 R2∆ is significant.

Step 2 R2∆ is not significant.

Step 3 R2∆ is significant.

A

Step 1: physical health contributes to the model significantly.

Step 2: adding mental health does not significantly improve the model beyond step 1.

Step 3: adding stress significantly improves the model beyond step 2.
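A sketch of the corresponding steps in R, with the R² change computed by hand and tested with anova(). The simulated data and variable names mirror the card and are only illustrative:

  # Toy data with the card's variable names (simulated so the code runs)
  set.seed(4)
  d <- data.frame(physical = rnorm(100), mental = rnorm(100), stress = rnorm(100))
  d$wellbeing <- 0.5 * d$physical + 0.4 * d$stress + rnorm(100)

  step1 <- lm(wellbeing ~ physical,                   data = d)
  step2 <- lm(wellbeing ~ physical + mental,          data = d)
  step3 <- lm(wellbeing ~ physical + mental + stress, data = d)

  summary(step2)$r.squared - summary(step1)$r.squared   # R^2 change at step 2
  summary(step3)$r.squared - summary(step2)$r.squared   # R^2 change at step 3
  anova(step1, step2, step3)                             # F test of each R^2 change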

7
Q

Regarding the denominator degrees of freedom (df2) in the hierarchical regression output, how does it change as predictors are added?

A

As we add more predictors, the denominator DF (n-k-1) goes down.
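A quick worked example with a hypothetical sample of n = 100: one predictor gives df2 = 100 - 1 - 1 = 98, while three predictors give df2 = 100 - 3 - 1 = 96.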

8
Q

As we add more predictors to a model, what happens to the R2?

A

The R² value inflates, making it look like the model explains more than it really does.
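A small demonstration of the inflation in R: adding a pure-noise predictor still nudges R² upward. The toy data are an illustrative assumption:

  # R^2 never decreases when a predictor is added, even one unrelated to y
  set.seed(5)
  d <- data.frame(y = rnorm(50), x1 = rnorm(50), noise = rnorm(50))

  summary(lm(y ~ x1,         data = d))$r.squared
  summary(lm(y ~ x1 + noise, data = d))$r.squared   # never smaller, despite no real effect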

9
Q

What do selection algorithms do?

A

Instead of using simultaneous or hierarchical regression, a selection algorithm compares two models at a time (differing by a single predictor); this way the R² value is not inflated by predictors that don't earn their place, and we can actually see where the differences are.

10
Q

What’s the formula for Adjusted R2?

A

1 - [(1 - R²)(n - 1) ÷ (n - k - 1)]

or equivalently

1 - (SSres ÷ SStotal) × (dfTotal ÷ dfRes)
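A quick check of the formula in R, using the built-in mtcars data purely as a stand-in:

  m  <- lm(mpg ~ wt + hp, data = mtcars)
  n  <- nrow(mtcars)
  k  <- 2                                   # number of predictors
  r2 <- summary(m)$r.squared

  1 - (1 - r2) * (n - 1) / (n - k - 1)      # the formula above
  summary(m)$adj.r.squared                  # matches R's reported value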

11
Q

What is the major disadvantage to the all possible regression approach?

A

It is time-consuming: with just 5 predictors there are already 2^5 = 32 models to evaluate, and with 12 predictors there would be 2^12 = 4,096. Fitting and comparing that many models is daunting, and for models with many predictors the difference between R² and adjusted R² can become very large.

12
Q

What is the function of the Model Selection Algorithm?

What does R use to compute selection algorithms?

What are the 3 types of selection algorithm?

A

It does not compute all possible models; it uses an algorithm to arrive at a single "best" model.

R uses the AIC.

3 types (each illustrated in the step() sketch after this list):

  • Forward selection
  • Backward elimination
  • Stepwise
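In R, all three are available through the step() function, which adds or removes one term at a time using the AIC. A minimal sketch, using the built-in mtcars data purely as a stand-in:

  full <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)

  step(full, direction = "backward")                 # backward elimination
  step(lm(mpg ~ 1, data = mtcars),                   # forward selection from the empty model
       scope = formula(full), direction = "forward")
  step(full, direction = "both")                     # stepwise: add or remove at each step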
13
Q

What criterion does R use in its model selection algorithms?

A

It uses the AIC, or the Akaike Information Criterion.

14
Q

Why is AIC referred to as a measure of relative goodness of fit of a model?

A

AIC is not a free-standing (absolute) measure of model fit; its value is only meaningful RELATIVE to the AIC of another model fitted to the same data.
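A sketch of the comparison in R (mtcars as a stand-in): the absolute AIC numbers mean nothing on their own; only the difference between the two models matters.

  m1 <- lm(mpg ~ wt,      data = mtcars)
  m2 <- lm(mpg ~ wt + hp, data = mtcars)

  AIC(m1, m2)   # compare the two models; the smaller AIC is preferred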

15
Q

What’s the difference between R² and AIC?

A

R² is an absolute measure of model fit, while AIC has no absolute interpretation: it can only be used to compare models fitted to the same outcome variable.

16
Q

What’s the benefit of using AIC?

A

It trades off model accuracy against model complexity through the penalty term 2k, where k counts the estimated parameters.

It penalizes us for adding predictors that don't improve our model.
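The general formula is AIC = 2k - 2·ln(L). A sketch of computing it by hand for a linear model (mtcars as a stand-in; note that R's bookkeeping also counts the residual variance as an estimated parameter):

  m   <- lm(mpg ~ wt + hp, data = mtcars)
  n   <- nrow(mtcars)
  sse <- sum(resid(m)^2)
  k   <- length(coef(m)) + 1                          # coefficients plus the error variance
  logL <- -n / 2 * (log(2 * pi) + log(sse / n) + 1)   # Gaussian log-likelihood

  2 * k - 2 * logL   # AIC by hand
  AIC(m)             # matches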

17
Q

Which is the preferred model in regards to AIC?

A

The preferred model is the one with the smallest AIC: larger AIC values indicate worse fit, corrected for the number of parameters.

18
Q

How do the 3 types of model selection algorithms differ?

A

They differ in the way variables are added to and removed from the model.

19
Q

Describe forward selection.

A

Forward selection: start with the smallest model (usually the empty or covariate-only model); R then adds, one predictor at a time, whichever addition produces the model with the smallest AIC, stopping when no addition lowers the AIC further.
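A minimal call in R (mtcars as a stand-in; the candidate predictors are supplied through the scope argument):

  empty <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting model
  step(empty, scope = ~ wt + hp + disp + qsec, direction = "forward")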

20
Q

Describe backward elimination.

A

Starts with the full model that includes all predictors; R then removes, one predictor at a time, whichever removal produces the smallest AIC, stopping when no removal lowers the AIC further.
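A minimal call in R (mtcars as a stand-in):

  full <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
  step(full, direction = "backward")   # drops the term whose removal lowers AIC the most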

21
Q

What are the problems with the forward selection and backward elimination algorithms?

A

Due to shared variability, a previously added predictor can turn into a bad predictor (or a previously removed one can turn into a good predictor) as other predictors enter or leave the model.

The problem is that these algorithms cannot correct for this: forward selection can only add predictors, and backward elimination can only remove them.

22
Q

Describe Stepwise selection.

A

Stepwise selection combats the limitations of forward and backward by both adding and removing predictors, at each step making whichever change results in the model with the lowest AIC.
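A minimal call in R (mtcars as a stand-in):

  full <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
  step(full, direction = "both")   # "both" lets terms be dropped and added back in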

23
Q

How do we address the model selection algorithms' capitalization on chance?

What do I mean by capitalization on chance?

A

All of the model selection algorithms rely on sample-based estimates, and the sample R² overestimates the population R², so the selected model capitalizes on chance. Adjusted R² addresses this by estimating the loss of predictive power, or shrinkage.

Shrinkage:
We can also address the issue by applying the prediction equation developed in one sample to a second sample, which will result in a smaller R² (calculated as the squared correlation between Y and Y-hat) and a better estimate of the population value.

24
Q

How can we adjust for Shrinkage?

A
  • Cross-validation: collect data from a 2nd sample and check whether the prediction equation from the 1st sample still fits.
  • Double cross-validation: apply the estimated model from the old sample to the new sample, and from the new sample to the old.
  • Data splitting: split a large dataset in half and cross-validate within it (sketched below).
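A sketch of data splitting in R: fit on one half, then compute the cross-validated R² as the squared correlation between Y and Y-hat in the other half. The toy data are an illustrative assumption:

  set.seed(6)
  d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
  d$y <- 0.5 * d$x1 + 0.3 * d$x2 + rnorm(200)

  train <- d[1:100, ]
  test  <- d[101:200, ]

  m <- lm(y ~ x1 + x2 + x3, data = train)
  summary(m)$r.squared                          # sample R^2 (optimistic)
  cor(test$y, predict(m, newdata = test))^2     # cross-validated R^2: usually smaller (shrinkage)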