Model Selection Flashcards

(15 cards)

1
Q

What is the goal of model selection?

A

To choose the simplest model that explains the data well (parsimony), balancing underfitting (too few variables) and overfitting (too many variables).

2
Q

What is collinearity, and why is it a problem?

A

Collinearity occurs when predictors are correlated. It causes unstable coefficient estimates and inflated standard errors, making it hard to isolate individual effects.

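A minimal simulated sketch of the problem (the data frame dat and variables x1, x2, x3, y are illustrative, not from the card): x2 is built to be nearly identical to x1, and adding it inflates the standard error on x1.

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x3 = rnorm(100))           # illustrative simulated data
dat$x2 <- dat$x1 + rnorm(100, sd = 0.1)                       # x2 is almost a copy of x1
dat$y  <- 2 * dat$x1 + dat$x3 + rnorm(100)
summary(lm(y ~ x1 + x3, data = dat))$coefficients             # modest SE for x1
summary(lm(y ~ x1 + x2 + x3, data = dat))$coefficients        # x1's SE balloons once collinear x2 is added
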
3
Q

How do you detect collinearity?

A

Use the Variance Inflation Factor (VIF).

Rule of thumb: VIF > 5 (or 10) indicates high collinearity.

R code: car::vif(model)

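A sketch of the VIF check, reusing the simulated dat from the previous card (requires the car package):

m <- lm(y ~ x1 + x2 + x3, data = dat)
car::vif(m)        # x1 and x2 show very large VIFs here; x3 stays near 1
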
4
Q

How do you fix collinearity?

A

Remove one of the correlated predictors.

Combine them (e.g., average left/right leg length).

Use regularization (e.g., ridge regression).

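Sketches of the three fixes on the same hypothetical dat; glmnet is one common ridge implementation, with alpha = 0 selecting the ridge penalty.

m_drop <- lm(y ~ x1 + x3, data = dat)                   # 1) drop one of the correlated predictors
dat$x12 <- (dat$x1 + dat$x2) / 2                        # 2) or combine them into one composite
m_comb  <- lm(y ~ x12 + x3, data = dat)
library(glmnet)                                         # 3) or keep both and shrink with ridge
fit <- cv.glmnet(as.matrix(dat[, c("x1", "x2", "x3")]), dat$y, alpha = 0)
coef(fit, s = "lambda.min")                             # coefficients at the CV-chosen penalty
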
5
Q

What is the difference between forward selection and backward elimination?

A

Forward: Start with no predictors; add one at a time based on significance.

Backward: Start with all predictors; remove the least significant one at a time.

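A sketch with base R's step(), again on the hypothetical dat; note that step() adds or drops terms by AIC rather than p-values, which is the usual automated form of these procedures.

null <- lm(y ~ 1, data = dat)                                 # intercept-only starting point
full <- lm(y ~ x1 + x2 + x3, data = dat)                      # all candidate predictors
step(null, scope = formula(full), direction = "forward")      # forward selection
step(full, direction = "backward")                            # backward elimination
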
6
Q

What is AIC, and how do you interpret it?

A

Akaike Information Criterion balances model fit and complexity:

AIC = −2 log(likelihood) + 2P, where P is the number of estimated parameters.
Lower AIC = better model.

R code: AIC(model1, model2)

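A sketch of the comparison, assuming two candidate models fitted to the hypothetical dat:

m1 <- lm(y ~ x1, data = dat)
m2 <- lm(y ~ x1 + x3, data = dat)
AIC(m1, m2)        # lower AIC is preferred; differences under about 2 are usually treated as equivalent
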
7
Q

When should you use AICc instead of AIC?

A

Use AICc (corrected for small samples) when the sample size is small relative to the number of parameters (rule of thumb: n / P < 40).

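The correction is AICc = AIC + 2P(P + 1) / (n − P − 1), which shrinks toward plain AIC as n grows. A sketch using MuMIn, reusing the hypothetical m1 and m2 from the previous card:

library(MuMIn)
AICc(m1, m2)       # interpreted like AIC: lower is better
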
8
Q

What is the difference between AIC and BIC?

A

AIC: Favors better-fitting models (penalty of 2P).

BIC: Favors simpler models (penalty of P log(n), harsher for large n).

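Both criteria are available in base R; a sketch with the same hypothetical models:

AIC(m1, m2)        # penalty of 2P
BIC(m1, m2)        # penalty of P * log(n), so larger samples push toward simpler models
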
9
Q

How do you compare non-nested models (e.g., Y ~ X1 + X2 vs. Y ~ X1 + X3)?

A

Use AIC/BIC (F-tests only work for nested models).

10
Q

What does dredge() (from MuMIn) do?

A

It ranks all possible models by a fit criterion (e.g., AICc).

Key output: Models sorted by AICc; “delta” shows difference from the best model.

R code: options(na.action = "na.fail"); MuMIn::dredge(model)

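A sketch of a dredge() run on a hypothetical global model; MuMIn refuses to dredge unless na.action = "na.fail" is set, so rows with missing values cannot silently differ between submodels.

library(MuMIn)
options(na.action = "na.fail")
global <- lm(y ~ x1 + x2 + x3, data = dat)     # hypothetical global model
ms <- dredge(global)                           # fits every predictor subset, ranked by AICc
head(ms)                                       # delta = AICc difference from the best model
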
11
Q

How do you interpret AIC weights?

A

Weights sum to 1 across models; higher weights indicate stronger evidence for that model.

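Each weight is exp(−delta_i / 2) / sum_j exp(−delta_j / 2), where delta_i is a model's AIC difference from the best one. A hand-rolled sketch with the hypothetical m1 and m2 (MuMIn's Weights() packages the same calculation):

aic   <- AIC(m1, m2)$AIC
delta <- aic - min(aic)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 3)        # sums to 1; the largest weight marks the best-supported model
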
12
Q

Why might automated selection (e.g., stepwise) be risky?

A

It can overfit noise in the data and produce unstable models. Always validate with theory/hypotheses.

13
Q

What’s the difference between Type I and Type II SS in ANOVA?

A

Type I: Sequential (order matters).

Type II: Tests each predictor after accounting for others (order doesn’t matter).

R code: car::Anova(model, type = "II")

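A sketch contrasting the two on a hypothetical two-predictor model:

m <- lm(y ~ x1 + x3, data = dat)
anova(m)                        # Type I: x1 tested first, then x3 given x1 (order matters)
car::Anova(m, type = "II")      # Type II: each term tested after all the others (order-free)
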
14
Q

When should you use p-values vs. AIC for selection?

A

p-values: For hypothesis testing (e.g., “Is SST significant?”).

AIC: For prediction-focused model comparison.

15
Q

What is Occam’s Razor in model selection?

A

Among models with similar explanatory power, the simplest (fewest parameters) is best.
