Linear Model Diagnostics Flashcards

(15 cards)

1
Q

What are the four key assumptions of linear regression?

A

Linearity: Relationship between predictors and response is linear.

Independence: Residuals are uncorrelated (no autocorrelation).

Homoscedasticity: Residuals have constant variance.

Normality: Residuals are normally distributed (critical for inference).
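
Example: a minimal sketch that checks all four at a glance with base R's diagnostic plots; the mtcars data and the formula mpg ~ wt + hp are illustrative choices, not part of the card.

fit <- lm(mpg ~ wt + hp, data = mtcars)  # illustrative model
par(mfrow = c(2, 2))                     # 2x2 grid of diagnostic plots
plot(fit)                                # Residuals vs Fitted, Q-Q, Scale-Location, Residuals vs Leverage

Independence is usually judged from how the data were collected or with a Durbin-Watson test (card 13) rather than from these plots.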

2
Q

How do you check for linearity?

A

Plot: Residuals vs. Fitted values (plot(model, which = 1)).

Look for: Random scatter around zero (no patterns/U-shapes).
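
Example: a quick sketch on an illustrative mtcars fit.

fit <- lm(mpg ~ wt + hp, data = mtcars)  # illustrative model
plot(fit, which = 1)                     # Residuals vs Fitted; a curve or U-shape suggests nonlinearity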

3
Q

What does heteroscedasticity look like, and how do you test for it?

A

Signs: Funnel shape in Residuals vs. Fitted plot.

Tests:

Breusch-Pagan test (lmtest::bptest(model)).

Scale-Location plot (plot(model, which = 3)).
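
Example: a sketch assuming the lmtest package is installed; the mtcars fit is illustrative.

fit <- lm(mpg ~ wt + hp, data = mtcars)
lmtest::bptest(fit)    # Breusch-Pagan: a small p-value suggests heteroscedasticity
plot(fit, which = 3)   # Scale-Location: look for a roughly flat trend, not a rising one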

4
Q

How do you assess normality of residuals?

A

Q-Q Plot: plot(model, which = 2) → Points should follow the dashed line.

Test: Shapiro-Wilk (shapiro.test(residuals(model))).
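
Example: a sketch on an illustrative mtcars fit.

fit <- lm(mpg ~ wt + hp, data = mtcars)
plot(fit, which = 2)              # Q-Q plot of standardised residuals
shapiro.test(residuals(fit))      # small p-value suggests non-normal residuals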

5
Q

What is multicollinearity, and how do you detect it?

A

Definition: Predictors are highly correlated.

Detection:

VIF > 5 or 10 (car::vif(model)).

High pairwise correlations in the predictor correlation matrix (cor(df)).
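
Example: a sketch assuming the car package is installed; the three mtcars predictors are illustrative.

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
car::vif(fit)                           # VIF > 5-10 flags multicollinearity
cor(mtcars[, c("wt", "hp", "disp")])    # pairwise correlations among the predictors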

6
Q

What are leverage points vs. influential points?

A

Leverage: Unusual predictor values (high hat values).

Influence: Changes model coefficients (high Cook’s distance).

Check: plot(model, which = 5) or influence.measures(model).
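
Example: an illustrative sketch of the separate leverage and influence measures.

fit <- lm(mpg ~ wt + hp, data = mtcars)
hatvalues(fit)             # leverage (hat values)
cooks.distance(fit)        # influence (Cook's distance)
plot(fit, which = 5)       # Residuals vs Leverage with Cook's distance contours
influence.measures(fit)    # combined table of influence diagnostics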

7
Q

How do you interpret Cook’s distance?

A

Rule of thumb: Values > 4/n (where n = number of observations) flag potentially influential points.

R code: plot(model, which = 4).
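
Example: applying the 4/n rule of thumb to an illustrative fit.

fit <- lm(mpg ~ wt + hp, data = mtcars)
d <- cooks.distance(fit)
which(d > 4 / nrow(mtcars))   # observations exceeding the 4/n cutoff
plot(fit, which = 4)          # Cook's distance for each observation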

8
Q

What is a partial residual plot, and why is it useful?

A

Purpose: Isolates the relationship between Y and X_j, adjusting for the other predictors.

R code: car::crPlots(model).

Interpretation: Linear trend supports linearity assumption.
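
Example: a sketch assuming the car package is installed; the mtcars fit is illustrative.

fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model
car::crPlots(fit)                         # one component + residual plot per predictor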

9
Q

How do you check for outliers?

A

Studentized residuals: |r_i| > 2 (or > 3).

R code: rstudent(model).
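
Example: flagging candidate outliers on an illustrative fit.

fit <- lm(mpg ~ wt + hp, data = mtcars)
r <- rstudent(fit)        # studentized residuals
which(abs(r) > 2)         # candidates; use > 3 for a stricter cutoff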

10
Q

What is adjusted R^2, and when should you use it?

A

Definition: R^2 penalised for the number of predictors, so it only increases when a new predictor genuinely improves the fit.

Use: Compare models with different numbers of predictors.
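
Example: comparing two illustrative nested models by adjusted R^2.

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared   # higher adjusted R^2 favours fit2 only if hp adds real explanatory power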

11
Q

What is MSE and how does it relate to predictive power?

A

Mean Squared Error: The average of the squared residuals.

Lower MSE = better fit on the training data; out-of-sample (e.g. cross-validated) MSE is the better guide to predictive power.
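
Example: computing in-sample MSE on an illustrative fit.

fit <- lm(mpg ~ wt + hp, data = mtcars)
mean(residuals(fit)^2)    # in-sample MSE; compute it on held-out data to judge prediction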

12
Q

How do you use AIC/BIC for model diagnostics?

A

Purpose: Balance fit and complexity (lower = better).

R code: AIC(model), BIC(model).
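
Example: comparing two illustrative models (lower values preferred).

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
AIC(fit1, fit2)   # table of df and AIC for each model
BIC(fit1, fit2)   # same for BIC, which penalises complexity more heavily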

13
Q

What is the Durbin-Watson test used for?

A

Tests: Independence of residuals (autocorrelation).

H₀: No autocorrelation.

R code: lmtest::dwtest(model).
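
Example: a sketch assuming the lmtest package is installed; the mtcars fit is illustrative.

fit <- lm(mpg ~ wt + hp, data = mtcars)   # illustrative model
lmtest::dwtest(fit)   # a DW statistic near 2 and a large p-value are consistent with H0 (no autocorrelation)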

14
Q

How do you fix non-constant variance (heteroscedasticity)?

A

Transformations: log(Y) or sqrt(Y).

Models: Use weighted least squares (WLS).
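
Example: two hedged sketches on an illustrative mtcars fit. The weighting scheme below (inverse squared fitted values from an auxiliary regression of |residuals| on fitted values) is one common choice, not the only one.

fit_ols <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)                  # transform the response
w <- 1 / fitted(lm(abs(residuals(fit_ols)) ~ fitted(fit_ols)))^2  # estimated inverse-variance weights
fit_wls <- lm(mpg ~ wt + hp, data = mtcars, weights = w)          # weighted least squares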

15
Q

What are component + residual plots (aka partial residual plots)?

A

Shows: Nonlinearity in predictors after accounting for others.

R code: car::crPlots(model).
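
Example: a toy simulation (made-up data; car package required) where nonlinearity in one predictor shows up as curvature in its component + residual plot.

set.seed(1)
x1 <- runif(100); x2 <- runif(100)
y  <- x1 + x2^2 + rnorm(100, sd = 0.1)    # y is quadratic in x2
car::crPlots(lm(y ~ x1 + x2))             # the x2 panel should show clear curvature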
