linear model diagnostic Flashcards
(15 cards)
What are the four key assumptions of linear regression?
Linearity: Relationship between predictors and response is linear.
Independence: Residuals are uncorrelated (no autocorrelation).
Homoscedasticity: Residuals have constant variance.
Normality: Residuals are normally distributed (critical for inference).
How do you check for linearity?
Plot: Residuals vs. Fitted values (plot(model, which = 1)).
Look for: Random scatter around zero (no patterns/U-shapes).
What does heteroscedasticity look like, and how do you test for it?
Signs: Funnel shape in Residuals vs. Fitted plot.
Tests:
Breusch-Pagan test (lmtest::bptest(model)).
Scale-Location plot (plot(model, which = 3)).
How do you assess normality of residuals?
Q-Q Plot: plot(model, which = 2) → Points should follow the dashed line.
Test: Shapiro-Wilk (shapiro.test(residuals(model)).
What is multicollinearity, and how do you detect it?
Definition: Predictors are highly correlated.
Detection:
VIF > 5 or 10 (car::vif(model)).
High correlation matrix (cor(df)).
What are leverage points vs. influential points?
Leverage: Unusual predictor values (high hat values).
Influence: Changes model coefficients (high Cook’s distance).
Check: plot(model, which = 5) or influence.measures(model).
How do you interpret Cook’s distance?
Rule of thumb: Values > 4/n are influential.
R code: plot(model, which = 4).
hat is a partial residual plot, and why is it useful?
Purpose: Isolates the relationship between Y and Xj
, adjusting for other predictors.
R code: car::crPlots(model).
Interpretation: Linear trend supports linearity assumption.
How do you check for outliers?
Studentized residuals:
∣ri∣>2 or >3.
R code: rstudent(model).
What is adjusted R^2
, and when should you use it?
Definition: R2 penalised for unnecessary predictors.
Use: Compare models with different numbers of predictors.
What is MSE and how does it relate to predictive power?
Mean Squared Error: Average squared residuals.
Lower MSE = Better fit.
How do you use AIC/BIC for model diagnostics?
Purpose: Balance fit and complexity (lower = better).
R code: AIC(model), BIC(model).
What is the Durbin-Watson test used for?
Tests: Independence of residuals (autocorrelation).
H₀: No autocorrelation.
R code: lmtest::dwtest(model).
How do you fix non-constant variance (heteroscedasticity)?
Transformations: LogY or sqrtY
Models: Use weighted least squares (WLS).
What are component + residual plots (aka partial residual plots)?
Shows: Nonlinearity in predictors after accounting for others.
R code: car::crPlots(model).