Chapter 20 Flashcards
(14 cards)
What are common violations of linear model assumptions?
Non-linearity: Pattern in residuals vs. fitted plot.
Heteroscedasticity: Non-constant variance (funnel shape).
Non-normality: Skewed/heavy-tailed residuals (Q-Q plot deviation).
Correlated errors: Autocorrelation (e.g., time series data).
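A minimal sketch in R of how you might check these visually; the model and the mtcars data are just for illustration, not part of the card:
# Base-R diagnostic plots for a fitted lm() object.
fit <- lm(mpg ~ wt + hp, data = mtcars)
par(mfrow = c(2, 2))
plot(fit)  # residuals vs. fitted, Q-Q, scale-location, residuals vs. leverage
par(mfrow = c(1, 1))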
How can you fix non-linearity?
Transform predictors (e.g., log(X), X^2).
Add polynomial terms (e.g., y ~ x + I(x^2)).
Use generalized additive models (GAMs).
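A brief sketch of the last two fixes in R; the simulated data and variable names are placeholders, assuming the mgcv package for the GAM:
library(mgcv)

set.seed(1)
df <- data.frame(x = runif(100, 0, 10))
df$y <- 2 + 0.5 * df$x^2 + rnorm(100)

poly_fit <- lm(y ~ x + I(x^2), data = df)   # polynomial term
gam_fit  <- gam(y ~ s(x), data = df)        # smooth term via a GAM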
What is heteroscedasticity, and how do you address it?
Problem: Residual variance changes with fitted values.
Solutions:
Transform Y (e.g., log(Y)).
Use weighted least squares (WLS).
Robust standard errors (e.g., sandwich::vcovHC()).
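A minimal WLS sketch; the cars data and the weight choice (inverse of the squared estimated spread) are illustrative assumptions, not prescribed by the card:
fit <- lm(dist ~ speed, data = cars)

# Model the spread of the residuals, then weight each observation
# by the inverse of its squared fitted spread.
spread_fit <- lm(abs(residuals(fit)) ~ fitted(fit))
w <- 1 / fitted(spread_fit)^2

wls_fit <- lm(dist ~ speed, data = cars, weights = w)
summary(wls_fit)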
How do you handle non-normal residuals?
Mild cases: Robust methods (e.g., bootstrapping).
Severe cases:
Transform Y (Box-Cox).
Switch to non-parametric models (quantile regression).
What is bootstrapping in regression?
A resampling technique to estimate uncertainty when assumptions fail:
Repeatedly sample data with replacement.
Refit the model to each sample.
Compute confidence intervals from bootstrap distributions.
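A sketch of these three steps using the boot package; the cars data and function names are illustrative:
library(boot)

# Statistic: refit the model to a resampled data set, return the coefficients.
coef_fun <- function(data, idx) coef(lm(dist ~ speed, data = data[idx, ]))

set.seed(123)
boot_out <- boot(cars, coef_fun, R = 2000)

# Percentile confidence interval for the slope (coefficient index 2).
boot.ci(boot_out, type = "perc", index = 2)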
When should you use bootstrapping?
Sample size is small.
Residuals are non-normal.
Formulas for standard errors are unreliable.
What are residual bootstrap vs. case bootstrap?
Residual bootstrap: Resamples residuals (preserves predictors).
Case bootstrap: Resamples entire rows (more common, robust).
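A hand-rolled residual-bootstrap sketch to contrast with the case bootstrap above; the data and object names are assumptions for illustration:
# Keep the predictors fixed, resample residuals, rebuild y, and refit.
fit <- lm(dist ~ speed, data = cars)
fitted_vals <- fitted(fit)
res <- residuals(fit)

set.seed(42)
slopes <- replicate(2000, {
  y_star <- fitted_vals + sample(res, replace = TRUE)
  coef(lm(y_star ~ cars$speed))[2]
})
quantile(slopes, c(0.025, 0.975))  # percentile CI for the slope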
How does bootstrapping help with heteroscedasticity?
Provides valid CIs for coefficients without assuming constant variance.
Uses empirical distribution of data rather than theoretical formulas.
What is a robust regression method?
Example: Huber regression (MASS::rlm()).
Purpose: Less sensitive to outliers than OLS.
Use case: Heavy-tailed residuals.
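A short example of Huber regression with MASS::rlm(); the stackloss data is just a convenient built-in, not from the card:
library(MASS)

# Huber M-estimation downweights large residuals instead of squaring them.
ols_fit    <- lm(stack.loss ~ ., data = stackloss)
robust_fit <- rlm(stack.loss ~ ., data = stackloss)  # Huber psi by default

summary(robust_fit)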
How do you check for influential points?
Cook’s distance: plot(model, which = 4).
DFFITS/DFBETAS: influence.measures(model).
Rule of thumb: flag points with Cook's D > 4/n (or > 1 in stricter versions).
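A quick sketch putting these checks together; the cars data is an illustrative assumption:
fit <- lm(dist ~ speed, data = cars)

plot(fit, which = 4)              # Cook's distance plot
cd <- cooks.distance(fit)
which(cd > 4 / nrow(cars))        # observations exceeding the 4/n rule of thumb

influence.measures(fit)           # DFFITS, DFBETAS, hat values, etc.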
What is the Box-Cox transformation?
Finds the optimal λ to make Y more normal: Y(λ) = (Y^λ − 1)/λ for λ ≠ 0, and log(Y) for λ = 0.
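A minimal sketch using MASS::boxcox() to pick λ by profile likelihood; the model and data are placeholders:
library(MASS)

fit <- lm(dist ~ speed, data = cars)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1))  # profile likelihood over λ
lambda_hat <- bc$x[which.max(bc$y)]               # λ with the highest likelihood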
How do bootstrap confidence intervals differ from classical CIs?
Classical: Assume normality (e.g., β̂ ± t × SE).
Bootstrap: Empirical, distribution-free (e.g., 2.5%–97.5% percentiles).
What are sandwich estimators?
Robust standard errors for heteroscedasticity (sandwich::vcovHC()).
Use: lmtest::coeftest(model, vcov = vcovHC).
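A short worked example of that call; the cars data and the HC3 variant are assumptions for illustration:
library(sandwich)
library(lmtest)

fit <- lm(dist ~ speed, data = cars)

# Coefficient tests with heteroscedasticity-consistent (HC3) standard errors.
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))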
Why might you prefer quantile regression?
Models median (or other quantiles) instead of mean.
Robust to outliers, non-normality.
R code: quantreg::rq(y ~ x, tau = 0.5).