Multiple Regression Flashcards
(17 cards)
What is multiple linear regression?
A model predicting a continuous response (Y) using two or more predictors (X₁, X₂, …).
What is the multiple linear regression equation?
Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚXₚ + ε
How are categorical variables (e.g., TrawlDepth) handled in regression?
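Converted to dummy variables (0/1 coding): a factor with k levels becomes k − 1 dummies.
Baseline level: Omitted (e.g., “Bottom Trawl”) and absorbed into the intercept; each dummy coefficient is the difference from the baseline.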
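To see the coding concretely, here is a minimal sketch using model.matrix(), which shows the design matrix lm() would build. The data frame is made up for illustration, and the level names "Midwater" and "Surface" are assumptions; only "Bottom Trawl" comes from the card above.

```r
# Hypothetical data: three trawl types, with "Bottom Trawl" as the baseline
fish <- data.frame(
  TrawlDepth = factor(
    c("Bottom Trawl", "Midwater", "Surface", "Midwater", "Bottom Trawl"),
    levels = c("Bottom Trawl", "Midwater", "Surface")
  )
)

# Intercept plus k - 1 = 2 dummy columns; the baseline level gets no column
X <- model.matrix(~ TrawlDepth, data = fish)
print(X)

# relevel() changes the baseline if a different reference level is wanted
fish$TrawlDepth2 <- relevel(fish$TrawlDepth, ref = "Surface")
print(levels(fish$TrawlDepth2))
```

Rows for the baseline level have 0 in every dummy column, so their expected value is the intercept alone.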
What is multicollinearity, and how do you detect it?
Definition: Predictors are strongly correlated with each other, which inflates the standard errors of their coefficients and makes the estimates unstable.
Detection:
VIF > 5 (or 10) (from car::vif(model)).
High pairwise correlations.
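The VIF can also be computed by hand, which shows what the number means: VIF for predictor j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on the other predictors. The sketch below uses simulated data; Salinity is a made-up variable added for contrast. For a model with only numeric predictors, the result should agree with car::vif().

```r
set.seed(1)
n <- 100
Depth <- rnorm(n, mean = 50, sd = 10)
SST <- 20 - 0.1 * Depth + rnorm(n, sd = 0.5)  # deliberately correlated with Depth
Salinity <- rnorm(n, mean = 35, sd = 1)       # hypothetical, roughly independent
X <- data.frame(Depth, SST, Salinity)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing predictor j on the others
vif_manual <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  1 / (1 - summary(lm(f, data = X))$r.squared)
})
print(round(vif_manual, 2))

# Pairwise correlations as a quick first check
print(round(cor(X), 2))
```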
How do you check model assumptions in R?
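plot(model) # Produces four diagnostic plots:
# 1. Residuals vs. Fitted (linearity)
# 2. Normal Q-Q (normality of residuals)
# 3. Scale-Location (homoscedasticity)
# 4. Residuals vs. Leverage (influential observations)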
What’s the difference between R² and Adjusted R²?
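R²: Proportion of variance in Y explained by the model. It never decreases when a predictor is added, even a useless one.
Adjusted R²: Penalizes extra predictors, so it can fall when a predictor adds little. Prefer it when comparing models with different numbers of predictors.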
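A small simulated example makes the difference visible. The sketch adds a pure-noise predictor and recomputes adjusted R² from the formula 1 − (1 − R²)(n − 1)/(n − p − 1); the variable names are made up for illustration.

```r
set.seed(2)
n <- 50
d <- data.frame(x1 = rnorm(n), noise = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

small <- lm(y ~ x1, data = d)
big <- lm(y ~ x1 + noise, data = d)  # adds a predictor unrelated to y

r2 <- function(m) summary(m)$r.squared
adj_r2 <- function(m) summary(m)$adj.r.squared

# Adjusted R2 by hand for the larger model (p = 2 predictors)
p <- 2
manual_adj <- 1 - (1 - r2(big)) * (n - 1) / (n - p - 1)

print(c(R2_small = r2(small), R2_big = r2(big)))
print(c(adj_small = adj_r2(small), adj_big = adj_r2(big), manual_big = manual_adj))
```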
How do you test if a factor variable (e.g., Sediment) is significant?
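Use an ANOVA F-test on the whole factor. anova() is sequential, so list Sediment last to test it after Depth and SST:
anova(lm(WeightA ~ Depth + SST + Sediment, data=fish))
H₀: All Sediment dummy coefficients = 0.
Reject H₀ if p < 0.05 (at α = 0.05).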
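An equivalent way to run this test is to compare the models with and without the factor. The sketch below simulates a stand-in for the fish data; the Sediment levels ("Mud", "Sand", "Gravel") and all effect sizes are assumptions made for illustration.

```r
set.seed(3)
n <- 90
fish <- data.frame(
  Depth = runif(n, 10, 100),
  SST = rnorm(n, 15, 2),
  Sediment = factor(sample(c("Mud", "Sand", "Gravel"), n, replace = TRUE))
)
shift <- unname(c(Gravel = 0, Mud = 1, Sand = 2)[as.character(fish$Sediment)])
fish$WeightA <- 5 + 0.05 * fish$Depth + 0.3 * fish$SST + shift + rnorm(n)

reduced <- lm(WeightA ~ Depth + SST, data = fish)
full <- lm(WeightA ~ Depth + SST + Sediment, data = fish)

# Partial F-test: H0 is that all Sediment coefficients are 0
cmp <- anova(reduced, full)
print(cmp)

# Sequential table for the full model; the Sediment row carries the same F
seq_tab <- anova(full)
print(seq_tab)
```

The test has k − 1 numerator degrees of freedom (2 here), one per dummy variable.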
What is the matrix form of the regression equation?
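Y = Xβ + ε, with fitted values Ŷ = Xβ̂ (X is the n × (p + 1) design matrix, including a column of 1s for the intercept).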
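The matrix form leads directly to the least-squares estimate β̂ = (XᵀX)⁻¹XᵀY. A minimal sketch on simulated data (all names and values are made up) solves the normal equations and checks the result against lm():

```r
set.seed(4)
n <- 40
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
d$Y <- 2 + 1.5 * d$X1 - 0.5 * d$X2 + rnorm(n)

# Design matrix: a column of 1s for the intercept plus the predictors
X <- cbind(Intercept = 1, as.matrix(d[, c("X1", "X2")]))
Y <- d$Y

# Normal equations: (X'X) beta_hat = X'Y
beta_hat <- solve(crossprod(X), crossprod(X, Y))
Y_hat <- X %*% beta_hat

fit <- lm(Y ~ X1 + X2, data = d)
print(cbind(matrix = drop(beta_hat), lm = coef(fit)))
```

lm() itself uses a QR decomposition rather than inverting XᵀX, which is numerically safer; the estimates agree up to rounding.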
How do you calculate a 95% CI for a coefficient?
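β̂ⱼ ± t(α/2, df) × SE(β̂ⱼ), where t(α/2, df) is the upper α/2 critical value of the t distribution, α = 0.05 for a 95% CI, and df = n − p − 1 (residual degrees of freedom).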
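The formula can be checked against confint(). This sketch, on simulated data with made-up variable names, pulls the estimates and standard errors from the summary table and builds the intervals by hand:

```r
set.seed(5)
n <- 30
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.8 * d$x1 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

est <- coef(summary(fit))[, "Estimate"]
se <- coef(summary(fit))[, "Std. Error"]

# Critical value for alpha = 0.05 with df = n - p - 1 = 30 - 2 - 1 = 27
t_crit <- qt(0.975, df = df.residual(fit))

manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)
print(confint(fit, level = 0.95))
```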
What’s the difference between Type I and Type II SS?
Type I: Sequential (order matters).
Type II: Tests each predictor after accounting for all other terms that do not contain it (order-independent).
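The order dependence is easy to demonstrate with correlated predictors. The sketch below uses simulated data and base R only: anova() gives Type I tables, and drop1() tests each term adjusted for all others, which for a main-effects-only model corresponds to Type II. With the car package, car::Anova(model, type = 2) is the usual route.

```r
set.seed(6)
n <- 80
d <- data.frame(Depth = rnorm(n))
d$SST <- -0.7 * d$Depth + rnorm(n, sd = 0.7)  # correlated with Depth
d$y <- 1 + d$Depth + 0.5 * d$SST + rnorm(n)

# Type I (sequential): the SS for Depth depends on where it enters
a1 <- anova(lm(y ~ Depth + SST, data = d))
a2 <- anova(lm(y ~ SST + Depth, data = d))
print(a1["Depth", "Sum Sq"])  # Depth entered first
print(a2["Depth", "Sum Sq"])  # Depth entered after SST

# Each term adjusted for the other, regardless of order in the formula
d1 <- drop1(lm(y ~ Depth + SST, data = d), test = "F")
print(d1)
```

The sequential SS for the last term entered matches its adjusted SS, which is why putting the term of interest last is a common workaround.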
How do you handle non-linear relationships in multiple regression?
Add polynomial terms (e.g., Depth + Depth², written as I(Depth^2) in an R formula).
Use transformations (e.g., log(Y)).
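Both options can be sketched in R as follows. The data are simulated with an assumed curved relationship, so the coefficients are illustrative only.

```r
set.seed(7)
n <- 100
fish <- data.frame(Depth = runif(n, 10, 100))
fish$WeightA <- exp(1 + 0.04 * fish$Depth - 0.0003 * fish$Depth^2 +
                      rnorm(n, sd = 0.2))

# I() protects ^ so it means "square" rather than a formula operator
quad <- lm(WeightA ~ Depth + I(Depth^2), data = fish)

# poly() fits the same curve with orthogonal (less collinear) columns
quad_orth <- lm(WeightA ~ poly(Depth, 2), data = fish)

# Log-transformed response; coefficients are then on the log scale
log_fit <- lm(log(WeightA) ~ Depth + I(Depth^2), data = fish)
print(coef(log_fit))
```

The raw and orthogonal polynomial fits give identical fitted values; only the coefficient parameterization differs.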
What’s the key pitfall of stepwise selection?
Overfitting! It may capitalize on noise, and the p-values of the selected model are too optimistic because they ignore the search. Always validate with theory or a holdout dataset.
How do you compare non-nested models (e.g., Y ~ X1 + X2 vs. Y ~ X1 + X3)?
Use AIC/BIC (lower = better), with both models fit to the same response and the same observations. F-tests only work for nested models.
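A minimal sketch of such a comparison, on simulated data where X2 truly matters and X3 does not (an assumption built into the simulation):

```r
set.seed(8)
n <- 60
d <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
d$Y <- 1 + d$X1 + 0.8 * d$X2 + rnorm(n)

m_a <- lm(Y ~ X1 + X2, data = d)
m_b <- lm(Y ~ X1 + X3, data = d)

# Same response and same rows, so the criteria are comparable
cmp <- data.frame(
  model = c("Y ~ X1 + X2", "Y ~ X1 + X3"),
  AIC = c(AIC(m_a), AIC(m_b)),
  BIC = c(BIC(m_a), BIC(m_b))
)
print(cmp)
```

BIC penalizes each parameter by log(n) instead of 2, so it favors smaller models than AIC once n exceeds about 7.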
What’s the interpretation of the intercept?
Expected Y when all predictors = 0. Often meaningless if X=0 is unrealistic (e.g., depth=0).
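One common remedy, sketched here on simulated data, is to center the predictor. The intercept then becomes the expected Y at the mean of X, and the slope is unchanged. Depth_c is a made-up name for the centered variable.

```r
set.seed(9)
n <- 50
fish <- data.frame(Depth = runif(n, 20, 100))
fish$WeightA <- 3 + 0.05 * fish$Depth + rnorm(n, sd = 0.5)

raw <- lm(WeightA ~ Depth, data = fish)

# Center Depth at its sample mean
fish$Depth_c <- fish$Depth - mean(fish$Depth)
centered <- lm(WeightA ~ Depth_c, data = fish)

print(coef(raw))       # intercept = expected WeightA at Depth = 0
print(coef(centered))  # intercept = expected WeightA at mean Depth
```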
Why might a predictor change significance when others are added?
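Due to confounding or collinearity: each coefficient measures a predictor's effect with the others held fixed, so correlated predictors share explanatory power. Example: TrawlDepth was significant alone but not with SST/Depth.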
How do you fit a multiple regression model in R?
model <- lm(WeightA ~ Depth + SST + factor(TrawlDepth), data=fish)
summary(model) # Coefficients, p-values
confint(model) # 95% CIs
How do you check for multicollinearity?
car::vif(model)