Chapter 20 Flashcards
(14 cards)
What are common violations of linear model assumptions?
Non-linearity: Pattern in residuals vs. fitted plot.
Heteroscedasticity: Non-constant variance (funnel shape).
Non-normality: Skewed/heavy-tailed residuals (Q-Q plot deviation).
Correlated errors: Autocorrelation (e.g., time series data).
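A minimal sketch in R of how you might check these visually; the model and the mtcars data are just for illustration, not part of the card:
# Base-R diagnostic plots for a fitted lm() object.
fit <- lm(mpg ~ wt + hp, data = mtcars)
par(mfrow = c(2, 2))
plot(fit)  # residuals vs. fitted, Q-Q, scale-location, residuals vs. leverage
par(mfrow = c(1, 1))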
How can you fix non-linearity?
Transform predictors (e.g., log(X), X^2).
Add polynomial terms (e.g., y ~ x + I(x^2)).
Use generalized additive models (GAMs).
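A brief sketch of the last two fixes in R; the simulated data and variable names are placeholders, assuming the mgcv package for the GAM:
library(mgcv)

set.seed(1)
df <- data.frame(x = runif(100, 0, 10))
df$y <- 2 + 0.5 * df$x^2 + rnorm(100)

poly_fit <- lm(y ~ x + I(x^2), data = df)   # polynomial term
gam_fit  <- gam(y ~ s(x), data = df)        # smooth term via a GAM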
What is heteroscedasticity, and how do you address it?
Problem: Residual variance changes with fitted values.
Solutions:
Transform Y (e.g., log(Y)).
Use weighted least squares (WLS).
Robust standard errors (e.g., sandwich::vcovHC()).
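A minimal WLS sketch; the cars data and the weight choice (inverse of the squared estimated spread) are illustrative assumptions, not prescribed by the card:
fit <- lm(dist ~ speed, data = cars)

# Model the spread of the residuals, then weight each observation
# by the inverse of its squared fitted spread.
spread_fit <- lm(abs(residuals(fit)) ~ fitted(fit))
w <- 1 / fitted(spread_fit)^2

wls_fit <- lm(dist ~ speed, data = cars, weights = w)
summary(wls_fit)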
How do you handle non-normal residuals?
Mild cases: Robust methods (e.g., bootstrapping).
Severe cases:
Transform Y (Box-Cox).
Switch to non-parametric models (quantile regression).
What is bootstrapping in regression?
A resampling technique to estimate uncertainty when assumptions fail:
Repeatedly sample data with replacement.
Refit the model to each sample.
Compute confidence intervals from bootstrap distributions.
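A sketch of these three steps using the boot package; the cars data and function names are illustrative:
library(boot)

# Statistic: refit the model to a resampled data set, return the coefficients.
coef_fun <- function(data, idx) coef(lm(dist ~ speed, data = data[idx, ]))

set.seed(123)
boot_out <- boot(cars, coef_fun, R = 2000)

# Percentile confidence interval for the slope (coefficient index 2).
boot.ci(boot_out, type = "perc", index = 2)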
When should you use bootstrapping?
Sample size is small.
Residuals are non-normal.
Formulas for standard errors are unreliable.
What are residual bootstrap vs. case bootstrap?
Residual bootstrap: Resamples residuals (preserves predictors).
Case bootstrap: Resamples entire rows (more common, robust).
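A hand-rolled residual-bootstrap sketch to contrast with the case bootstrap above; the data and object names are assumptions for illustration:
# Keep the predictors fixed, resample residuals, rebuild y, and refit.
fit <- lm(dist ~ speed, data = cars)
fitted_vals <- fitted(fit)
res <- residuals(fit)

set.seed(42)
slopes <- replicate(2000, {
  y_star <- fitted_vals + sample(res, replace = TRUE)
  coef(lm(y_star ~ cars$speed))[2]
})
quantile(slopes, c(0.025, 0.975))  # percentile CI for the slope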
How does bootstrapping help with heteroscedasticity?
Provides valid CIs for coefficients without assuming constant variance.
Uses empirical distribution of data rather than theoretical formulas.
What is a robust regression method?
Example: Huber regression (MASS::rlm()).
Purpose: Less sensitive to outliers than OLS.
Use case: Heavy-tailed residuals.
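A short example of Huber regression with MASS::rlm(); the stackloss data is just a convenient built-in, not from the card:
library(MASS)

# Huber M-estimation downweights large residuals instead of squaring them.
ols_fit    <- lm(stack.loss ~ ., data = stackloss)
robust_fit <- rlm(stack.loss ~ ., data = stackloss)  # Huber psi by default

summary(robust_fit)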
How do you check for influential points?
Cook’s distance: plot(model, which = 4).
DFFITS/DFBETAS: influence.measures(model).
Rule of thumb: flag points with Cook's D > 4/n (or > 1 in stricter versions).
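A quick sketch putting these checks together; the cars data is an illustrative assumption:
fit <- lm(dist ~ speed, data = cars)

plot(fit, which = 4)              # Cook's distance plot
cd <- cooks.distance(fit)
which(cd > 4 / nrow(cars))        # observations exceeding the 4/n rule of thumb

influence.measures(fit)           # DFFITS, DFBETAS, hat values, etc.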
What is the Box-Cox transformation?
Finds the optimal λ to make Y more normal: Y(λ) = (Y^λ − 1)/λ for λ ≠ 0, and log(Y) for λ = 0.
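A minimal sketch using MASS::boxcox() to pick λ by profile likelihood; the model and data are placeholders:
library(MASS)

fit <- lm(dist ~ speed, data = cars)
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1))  # profile likelihood over λ
lambda_hat <- bc$x[which.max(bc$y)]               # λ with the highest likelihood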
How do bootstrap confidence intervals differ from classical CIs?
Classical: Assume normality (e.g., β̂ ± t × SE).
Bootstrap: Empirical, distribution-free (e.g., 2.5%–97.5% percentiles).
What are sandwich estimators?
Robust standard errors for heteroscedasticity (sandwich::vcovHC()).
Use: lmtest::coeftest(model, vcov = vcovHC).
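A short worked example of that call; the cars data and the HC3 variant are assumptions for illustration:
library(sandwich)
library(lmtest)

fit <- lm(dist ~ speed, data = cars)

# Coefficient tests with heteroscedasticity-consistent (HC3) standard errors.
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))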
Why might you prefer quantile regression?
Models median (or other quantiles) instead of mean.
Robust to outliers, non-normality.
R code: quantreg::rq(y ~ x, tau = 0.5).