Chapter 20 Flashcards

(14 cards)

1
Q

What are common violations of linear model assumptions?

A

Non-linearity: Pattern in residuals vs. fitted plot.

Heteroscedasticity: Non-constant variance (funnel shape).

Non-normality: Skewed/heavy-tailed residuals (Q-Q plot deviation).

Correlated errors: Autocorrelation (e.g., time series data).
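
A minimal sketch of these checks in R. The data frame d and model fit are hypothetical, with a deliberately nonlinear signal so the violations show up:

set.seed(1)
d <- data.frame(x = runif(100, 0, 10))
d$y <- 2 + 0.5 * d$x^2 + rnorm(100)          # true relationship is quadratic
fit <- lm(y ~ x, data = d)                   # misspecified linear fit

plot(fit, which = 1)   # residuals vs. fitted: look for curvature
plot(fit, which = 2)   # Q-Q plot: look for skew/heavy tails
plot(fit, which = 3)   # scale-location: look for a funnel shape
acf(resid(fit))        # for time-ordered data: look for autocorrelation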

2
Q

How can you fix non-linearity?

A

Transform predictors (e.g., log(X), X^2).

Add polynomial terms (e.g., y ~ x + I(x^2)).

Use generalized additive models (GAMs).
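
A sketch of each fix on simulated quadratic data (d is hypothetical; mgcv is one common GAM package):

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 0.5 * d$x^2 + rnorm(100)

fit_log  <- lm(y ~ log(x), data = d)      # transformed predictor (needs x > 0)
fit_poly <- lm(y ~ x + I(x^2), data = d)  # polynomial term

library(mgcv)
fit_gam <- gam(y ~ s(x), data = d)        # smooth term learns the shape
plot(fit_gam)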

3
Q

What is heteroscedasticity, and how do you address it?

A

Problem: Residual variance changes with fitted values.

Solutions:

Transform Y (e.g., log(Y)).

Use weighted least squares (WLS).

Robust standard errors (e.g., sandwich::vcovHC()).
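
A minimal WLS sketch; the simulated errors have standard deviation proportional to x, so 1/x^2 weights are the natural (illustrative) choice:

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = d$x)           # funnel-shaped residuals

fit_ols <- lm(y ~ x, data = d)
fit_wls <- lm(y ~ x, data = d, weights = 1 / x^2)   # downweight noisy points
summary(fit_wls)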

4
Q

How do you handle non-normal residuals?

A

Mild cases: Robust methods (e.g., bootstrapping).

Severe cases:

Transform Y (Box-Cox).

Switch to non-parametric models (quantile regression).
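
A sketch of diagnosing and transforming, with a simulated log-normal response (names are illustrative):

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- exp(0.5 + 0.3 * d$x + rnorm(100))        # multiplicative errors: skewed Y

fit_raw <- lm(y ~ x, data = d)
qqnorm(resid(fit_raw)); qqline(resid(fit_raw))  # heavy right tail

fit_log <- lm(log(y) ~ x, data = d)             # roughly normal on log scale
qqnorm(resid(fit_log)); qqline(resid(fit_log))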

5
Q

What is bootstrapping in regression?

A

A resampling technique to estimate uncertainty when assumptions fail:

Repeatedly sample data with replacement.

Refit the model to each sample.

Compute confidence intervals from bootstrap distributions.
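
A minimal case-bootstrap sketch of those three steps (2000 replicates is an arbitrary choice):

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = d$x)

boot_slopes <- replicate(2000, {
  idx <- sample(nrow(d), replace = TRUE)     # 1. resample rows with replacement
  coef(lm(y ~ x, data = d[idx, ]))["x"]      # 2. refit, keep the slope
})
quantile(boot_slopes, c(0.025, 0.975))       # 3. percentile CI for the slope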

6
Q

When should you use bootstrapping?

A

Sample size is small.

Residuals are non-normal.

Formulas for standard errors are unreliable.

7
Q

What are residual bootstrap vs. case bootstrap?

A

Residual bootstrap: Resamples residuals (preserves predictors).

Case bootstrap: Resamples entire rows (more common; robust to heteroscedasticity).
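
A residual-bootstrap sketch for contrast: X stays fixed and only residuals are resampled, which assumes the errors are i.i.d. (one reason the case bootstrap is the safer default):

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100)
fit <- lm(y ~ x, data = d)
res <- resid(fit)

boot_slopes <- replicate(2000, {
  d$y_star <- fitted(fit) + sample(res, replace = TRUE)  # new Y, same X
  coef(lm(y_star ~ x, data = d))["x"]
})
quantile(boot_slopes, c(0.025, 0.975))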

8
Q

How does bootstrapping help with heteroscedasticity?

A

Provides valid CIs for coefficients without assuming constant variance.

Uses empirical distribution of data rather than theoretical formulas.

9
Q

What is a robust regression method?

A

Example: Huber regression (MASS::rlm()).

Purpose: Less sensitive to outliers than OLS.

Use case: Heavy-tailed residuals.
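
A minimal rlm() sketch with a few injected outliers to show the contrast with OLS:

library(MASS)
set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100)
d$y[1:5] <- d$y[1:5] + 50        # contaminate five observations

coef(lm(y ~ x, data = d))        # OLS is pulled toward the outliers
coef(rlm(y ~ x, data = d))       # Huber M-estimation downweights them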

10
Q

How do you check for influential points?

A

Cook’s distance: plot(model, which = 4).

DFFITS/DFBETAS: influence.measures(model).

Rule of thumb: Cook’s D > 4/n flags potentially influential points (D > 1 is a stronger red flag).
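
A sketch of those checks, using the 4/n rule above (data and names are illustrative):

set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100)
fit <- lm(y ~ x, data = d)

plot(fit, which = 4)                 # Cook's distance per observation
cd <- cooks.distance(fit)
which(cd > 4 / nrow(d))              # observations flagged by the 4/n rule
summary(influence.measures(fit))     # DFFITS, DFBETAS, hat values, Cook's D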

11
Q

What is the Box-Cox transformation?

A

Finds the optimal λ to make Y more normal:

Y(λ) = (Y^λ − 1) / λ if λ ≠ 0; log(Y) if λ = 0.

λ = 1 means no transformation is needed; λ = 0 corresponds to log(Y).
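
A minimal MASS::boxcox() sketch (Y must be positive; the data are simulated so that log(Y) is the right answer):

library(MASS)
set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- exp(0.5 + 0.3 * d$x + rnorm(100))   # positive, right-skewed response

fit <- lm(y ~ x, data = d)
bc <- boxcox(fit)                          # profiles log-likelihood over λ
bc$x[which.max(bc$y)]                      # best λ; near 0 here, i.e. log(Y)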

12
Q

How do bootstrap confidence intervals differ from classical CIs?

A

Classical: Assume normality (e.g., β̂ ± t × SE(β̂)).

Bootstrap: Empirical, distribution-free (e.g., 2.5%–97.5% percentiles).
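
A sketch comparing the two on heteroscedastic data, using the boot package (the statistic returns the slope):

library(boot)
set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = d$x)

slope <- function(data, idx) coef(lm(y ~ x, data = data[idx, ]))["x"]
b <- boot(d, slope, R = 2000)

confint(lm(y ~ x, data = d))["x", ]   # classical: normal-theory interval
boot.ci(b, type = "perc")             # bootstrap: 2.5%–97.5% percentiles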

13
Q

What are sandwich estimators?

A

Robust standard errors for heteroscedasticity (sandwich::vcovHC()).

Use: lmtest::coeftest(model, vcov = vcovHC).
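
A runnable version of that recipe; HC3 is one common variance-estimator choice (and vcovHC()'s default):

library(sandwich)
library(lmtest)
set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = d$x)        # heteroscedastic errors

fit <- lm(y ~ x, data = d)
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))  # robust t-tests
coefci(fit, vcov = vcovHC(fit, type = "HC3"))    # robust CIs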

14
Q

Why might you prefer quantile regression?

A

Models median (or other quantiles) instead of mean.

Robust to outliers, non-normality.

R code: quantreg::rq(y ~ x, tau = 0.5).
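
A minimal quantreg sketch fitting the median and an upper quantile (data are illustrative):

library(quantreg)
set.seed(1)
d <- data.frame(x = runif(100, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(100, sd = d$x)

fit_med <- rq(y ~ x, tau = 0.5, data = d)   # median regression
fit_q90 <- rq(y ~ x, tau = 0.9, data = d)   # 90th percentile
summary(fit_med, se = "boot")               # bootstrap SEs for rq fits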
