model selection Flashcards
(15 cards)
What is the goal of model selection?
To choose the simplest model that explains the data well (parsimony), balancing underfitting (too few variables) and overfitting (too many variables).
What is collinearity, and why is it a problem?
Collinearity occurs when predictors are correlated. It causes unstable coefficient estimates and inflated standard errors, making it hard to isolate individual effects.
How do you detect collinearity?
Use the Variance Inflation Factor (VIF).
Rule of thumb: VIF > 5 (or 10) indicates high collinearity.
R code: car::vif(model)
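A minimal sketch, assuming a hypothetical data frame dat with response y and correlated predictors x1 and x2:
m <- lm(y ~ x1 + x2, data = dat)
car::vif(m)   # values above 5 (or 10) flag problematic collinearity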
How do you fix collinearity?
Remove one of the correlated predictors.
Combine them (e.g., average left/right leg length).
Use regularization (e.g., ridge regression).
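A minimal ridge-regression sketch with glmnet, assuming the same hypothetical dat (predictors must be supplied as a numeric matrix):
library(glmnet)
X <- model.matrix(y ~ x1 + x2 + x3, data = dat)[, -1]   # drop the intercept column
ridge <- cv.glmnet(X, dat$y, alpha = 0)                  # alpha = 0 selects the ridge penalty
coef(ridge, s = "lambda.min")                            # shrunken, more stable coefficients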
What is the difference between forward selection and backward elimination?
Forward: Start with no predictors; add one at a time based on significance.
Backward: Start with all predictors; remove the least significant one at a time.
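Both searches can be run with step(), which ranks candidate terms by AIC rather than raw p-values; a sketch assuming a hypothetical dat with predictors x1–x3:
null <- lm(y ~ 1, data = dat)
full <- lm(y ~ x1 + x2 + x3, data = dat)
step(null, scope = formula(full), direction = "forward")   # forward selection
step(full, direction = "backward")                          # backward elimination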
What is AIC, and how do you interpret it?
Akaike Information Criterion balances model fit and complexity:
AIC = −2 log(likelihood) + 2P, where P is the number of estimated parameters.
Lower AIC = better model.
R code: AIC(model1, model2)
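The formula can be checked by hand from the fitted log-likelihood (P is the "df" attribute, the number of estimated parameters):
ll <- logLik(model1)
-2 * as.numeric(ll) + 2 * attr(ll, "df")   # equals AIC(model1)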
When should you use AICc instead of AIC?
Use AICc (corrected for small samples) when the sample size is small relative to the number of parameters (roughly n/P < 40).
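A minimal sketch using MuMIn's implementation (assuming the fitted models from above):
MuMIn::AICc(model1)            # small-sample corrected AIC
MuMIn::AICc(model1, model2)    # compare several models at once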
What is the difference between AIC and BIC?
AIC: Favors better-fitting models (penalty of 2P).
BIC: Favors simpler models (penalty of P log(n), harsher as n grows).
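Both criteria are available for any fitted model; BIC's log(n) penalty exceeds AIC's 2 once n is greater than about 8:
AIC(model1)   # penalty of 2 per parameter
BIC(model1)   # penalty of log(n) per parameter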
How do you compare non-nested models (e.g., Y ~ X1 + X2 vs. Y ~ X1 + X3)?
Use AIC/BIC (F-tests only work for nested models).
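A minimal sketch, assuming both models are fit to the same rows of a hypothetical dat:
m1 <- lm(Y ~ X1 + X2, data = dat)
m2 <- lm(Y ~ X1 + X3, data = dat)
AIC(m1, m2)   # lower AIC wins; an F-test via anova(m1, m2) would be invalid because the models are not nested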
What does dredge() (from MuMIn) do?
It ranks all possible models by a fit criterion (e.g., AICc).
Key output: Models sorted by AICc; “delta” shows difference from the best model.
R code: options(na.action = "na.fail"); MuMIn::dredge(model)
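A fuller sketch, assuming a hypothetical global (full) model:
global <- lm(y ~ x1 + x2 + x3, data = dat)
ms <- MuMIn::dredge(global, rank = "AICc")
head(ms)   # ranked best-first; "delta" is the AICc difference from the top model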
How do you interpret AIC weights?
Weights sum to 1 across models; higher weights indicate stronger evidence for that model.
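The weights can be computed by hand from AIC differences, w_i = exp(−Δ_i / 2) / Σ exp(−Δ_j / 2), for hypothetical models m1–m3:
aics  <- c(AIC(m1), AIC(m2), AIC(m3))
delta <- aics - min(aics)
exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike weights; they sum to 1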
Why might automated selection (e.g., stepwise) be risky?
It can overfit noise in the data and produce unstable models. Always validate with theory/hypotheses.
What’s the difference between Type I and Type II SS in ANOVA?
Type I: Sequential (order matters).
Type II: Tests each predictor after accounting for others (order doesn’t matter).
R code: car::Anova(model, type = "II")
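A sketch showing the order dependence, assuming a hypothetical dat with correlated predictors a and b:
anova(lm(y ~ a + b, data = dat))                      # Type I: the table changes if you fit y ~ b + a instead
car::Anova(lm(y ~ a + b, data = dat), type = "II")    # Type II: the same table for either order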
When should you use p-values vs. AIC for selection?
p-values: For hypothesis testing (e.g., “Is SST significant?”).
AIC: For prediction-focused model comparison.
What is Occam’s Razor in model selection?
Among models with similar explanatory power, the simplest (fewest parameters) is best.