linear model diagnostic Flashcards

Question 1

Q

What are the four key assumptions of linear regression?

Answer

A

Linearity: Relationship between predictors and response is linear.

Independence: Residuals are uncorrelated (no autocorrelation).

Homoscedasticity: Residuals have constant variance.

Normality: Residuals are normally distributed (matters mainly for inference: p-values and confidence intervals, especially in small samples).

Question 2

Q

How do you check for linearity?

Answer

A

Plot: Residuals vs. Fitted values (plot(model, which = 1)).

Look for: Random scatter around zero (no patterns/U-shapes).
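A minimal sketch of what a failed linearity check can look like. The data are simulated with a deliberately curved relationship (an assumption for illustration, not part of the cards), so the residuals should show the U-shape described above.

```r
# Simulated data: y depends on x^2, but we fit a straight line on purpose.
set.seed(1)
x <- seq(0, 10, length.out = 101)
y <- x^2 + rnorm(length(x), sd = 2)

model <- lm(y ~ x)
res   <- residuals(model)
fit   <- fitted(model)

# Residuals vs. fitted values with a smoother.
# A smoother that stays flat near zero supports linearity; a U-shape does not.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(fit, res), col = "red")
```

Here the residuals are positive at both ends of the fitted range and negative in the middle, which is the pattern to watch for.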

Question 3

Q

What does heteroscedasticity look like, and how do you test for it?

Answer

A

Signs: Funnel shape in Residuals vs. Fitted plot.

Checks:

Breusch-Pagan test (lmtest::bptest(model)).

Scale-Location plot (plot(model, which = 3)).
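To see what the Breusch-Pagan test does, it can be sketched by hand in base R: regress the squared residuals on the predictors and compare n * R^2 to a chi-squared distribution. This is the studentized (Koenker) form, which should correspond to the default of lmtest::bptest; the built-in mtcars data and the chosen predictors are assumptions for illustration.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors.
d   <- transform(mtcars, e2 = residuals(model)^2)
aux <- lm(e2 ~ wt + hp, data = d)

n       <- nobs(model)
k       <- length(coef(aux)) - 1            # number of predictors
bp_stat <- n * summary(aux)$r.squared       # LM test statistic
p_value <- pchisq(bp_stat, df = k, lower.tail = FALSE)

# A small p-value suggests the residual variance depends on the predictors.
c(statistic = bp_stat, df = k, p.value = p_value)
```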

Question 4

Q

How do you assess normality of residuals?

Answer

A

Q-Q Plot: plot(model, which = 2) → Points should follow the dashed line.

Test: Shapiro-Wilk (shapiro.test(residuals(model))).
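A short sketch combining both checks, using the built-in mtcars data as an assumed example. Note that qqnorm() here plots raw residuals, whereas plot(model, which = 2) uses standardized residuals.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
res   <- residuals(model)

# Q-Q plot of the residuals with a dashed reference line.
qqnorm(res)
qqline(res, lty = 2)

# Shapiro-Wilk test: H0 is that the residuals are normally distributed.
sw <- shapiro.test(res)
sw$statistic
sw$p.value   # a small p-value suggests a departure from normality
```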

Question 5

Q

What is multicollinearity, and how do you detect it?

Answer

A

Definition: Predictors are highly correlated.

Detection:

VIF above 5 (stricter) or 10 (more lenient) is a common warning threshold (car::vif(model)).

High pairwise correlations between predictors in the correlation matrix (cor(df), numeric columns only).
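The VIF for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. A base-R sketch of that definition follows (mtcars and the chosen predictors are assumptions for illustration); for numeric predictors like these it should agree with car::vif.

```r
model <- lm(mpg ~ wt + hp + disp, data = mtcars)
X <- model.matrix(model)[, -1]   # predictor columns, intercept dropped

# Regress each predictor on the remaining ones and convert R^2 to a VIF.
vif_manual <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  r2 <- summary(lm(X[, j] ~ others))$r.squared
  1 / (1 - r2)
})

vif_manual          # compare against the 5 / 10 thresholds
round(cor(X), 2)    # pairwise correlations among predictors
```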

Question 6

Q

What are leverage points vs. influential points?

Answer

A

Leverage: Unusual predictor values (high hat values).

Influence: Changes model coefficients (high Cook’s distance).

Check: plot(model, which = 5) or influence.measures(model).
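A minimal sketch for pulling out hat values directly. The 2p/n cutoff is a common rule of thumb and an assumption here, not something stated on the card; mtcars is used only as an example.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

h <- hatvalues(model)        # leverage of each observation
p <- length(coef(model))     # number of coefficients, including the intercept
n <- nobs(model)

# Flag observations with unusually high leverage (assumed cutoff: 2p/n).
high_leverage <- h[h > 2 * p / n]
sort(high_leverage, decreasing = TRUE)
```

High leverage alone does not make a point influential; it only means its predictor values are unusual.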

Question 7

Q

How do you interpret Cook’s distance?

Answer

A

Rule of thumb: Observations with Cook’s distance > 4/n (n = number of observations) are flagged as potentially influential.

R code: plot(model, which = 4).
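The 4/n rule can be applied numerically as well as read off the plot. The sketch below (mtcars as an assumed example) also refits the model without the most influential observation to show how much the coefficients move.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

d      <- cooks.distance(model)
cutoff <- 4 / nobs(model)

# Observations exceeding the rule-of-thumb cutoff.
influential <- d[d > cutoff]
sort(influential, decreasing = TRUE)

# Refit without the single most influential observation and compare coefficients.
refit <- lm(mpg ~ wt + hp, data = mtcars[-which.max(d), ])
rbind(full = coef(model), without_top = coef(refit))
```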

Question 8

Q

What is a partial residual plot, and why is it useful?

Answer

A

Purpose: Isolates the relationship between Y and X_j, adjusting for other predictors.

R code: car::crPlots(model).

Interpretation: Linear trend supports linearity assumption.
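A hand-rolled sketch of a partial residual plot for one predictor, in case car is not available. The partial residual for X_j is the ordinary residual plus the fitted contribution of X_j; mtcars and the choice of wt are assumptions for illustration.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Partial residuals for wt: residual + (coefficient of wt) * wt.
pr_wt <- residuals(model) + coef(model)["wt"] * mtcars$wt

# Plot against wt; the dashed line has slope equal to the wt coefficient,
# and the smoother shows any curvature the linear term misses.
plot(mtcars$wt, pr_wt, xlab = "wt", ylab = "Partial residual (wt)")
line_fit <- lm(pr_wt ~ mtcars$wt)
abline(line_fit, lty = 2)
lines(lowess(mtcars$wt, pr_wt), col = "red")
```

If the smoother tracks the dashed line, the linear term for that predictor looks adequate.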

Question 9

Q

How do you check for outliers?

Answer

A

Studentized residuals: |r_i| > 2 or > 3.

R code: rstudent(model).
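A short sketch applying both cutoffs, with mtcars as an assumed example dataset.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

r <- rstudent(model)   # externally studentized residuals

# Observations beyond each cutoff; the stricter cutoff flags fewer points.
flag_2 <- r[abs(r) > 2]
flag_3 <- r[abs(r) > 3]

flag_2
flag_3
```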

Question 10

Q

What is adjusted R^2, and when should you use it?

Answer

A

Definition: R^2 penalised for the number of predictors in the model.

Use: Compare models with different numbers of predictors.
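The penalty can be made concrete with the usual formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where p counts predictors excluding the intercept. The helper name adj_r2 and the mtcars models below are hypothetical, for illustration only.

```r
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: adjusted R^2 computed from its definition.
adj_r2 <- function(m) {
  n  <- nobs(m)
  p  <- length(coef(m)) - 1          # predictors, excluding the intercept
  r2 <- summary(m)$r.squared
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Compare models with different numbers of predictors.
c(small = adj_r2(small), large = adj_r2(large))
```

The values should match summary(model)$adj.r.squared.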

Question 11

Q

What is MSE and how does it relate to predictive power?

Answer

A

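Mean Squared Error: Average of the squared residuals.

Lower MSE = Better fit. To judge predictive power, compute MSE on held-out data, because training MSE never increases when predictors are added.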
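A minimal sketch of training versus held-out MSE. The random split, the seed, and the use of mtcars are all assumptions for illustration.

```r
set.seed(42)
test_idx <- sample(nrow(mtcars), 8)   # hold out 8 of the 32 rows
train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

model <- lm(mpg ~ wt + hp, data = train)

# In-sample error: average squared residual on the training rows.
mse_train <- mean(residuals(model)^2)

# Out-of-sample error: average squared prediction error on the held-out rows.
mse_test <- mean((test$mpg - predict(model, newdata = test))^2)

c(train = mse_train, test = mse_test)
```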

Question 12

Q

How do you use AIC/BIC for model diagnostics?

Answer

A

Purpose: Balance fit and complexity (lower = better, when comparing models fitted to the same data).

R code: AIC(model), BIC(model).
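Both functions accept several models at once, which makes side-by-side comparison easy. The two mtcars models below are assumed examples.

```r
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Each call returns a small table with the parameter count (df) and the criterion.
aic_table <- AIC(m1, m2)
bic_table <- BIC(m1, m2)

aic_table
bic_table
```

BIC penalises each extra parameter by log(n) instead of 2, so it tends to prefer smaller models than AIC.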

Question 13

Q

What is the Durbin-Watson test used for?

Answer

A

Tests: Independence of residuals (autocorrelation).

H₀: No autocorrelation.

R code: lmtest::dwtest(model).
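The Durbin-Watson statistic itself is simple to compute by hand: the sum of squared successive residual differences divided by the sum of squared residuals. The sketch below uses simulated data with AR(1) errors (an assumption for illustration); values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation.

```r
# Simulated time-ordered data with positively autocorrelated errors.
set.seed(123)
t   <- 1:100
err <- as.numeric(arima.sim(model = list(ar = 0.8), n = 100))
y   <- 1 + 0.5 * t + err

model <- lm(y ~ t)
e     <- residuals(model)

# Durbin-Watson statistic; always lies between 0 and 4.
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

The statistic only makes sense when the rows have a meaningful order, such as time.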

Question 14

Q

How do you fix non-constant variance (heteroscedasticity)?

Answer

A

Transformations: log(Y) or sqrt(Y).

Models: Use weighted least squares (WLS).
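A minimal WLS sketch using the weights argument of lm(). The data are simulated so that the error standard deviation grows with x, and the weights 1/x^2 assume that variance structure is known; in practice it usually has to be estimated.

```r
# Simulated data: error standard deviation is proportional to x.
set.seed(7)
x <- runif(200, 1, 10)
y <- 2 + 3 * x + rnorm(200, sd = x)

ols <- lm(y ~ x)                       # ignores the unequal variance
wls <- lm(y ~ x, weights = 1 / x^2)    # weights proportional to 1 / variance

# Compare estimates and standard errors from the two fits.
summary(ols)$coefficients
summary(wls)$coefficients
```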

Question 15

Q

What are component + residual plots (aka partial residual plots)?

Answer

A

Shows: Nonlinearity in predictors after accounting for others.

R code: car::crPlots(model).

(15 cards)