Unit 2: Multiple Linear Regression Flashcards
What is the purpose of MLR?
To predict a response variable (Y) using a set of predictor variables (X1, X2, …, Xp)
What is the method of MLR?
Ordinary least squares (OLS). Under the Gauss–Markov assumptions, the OLS estimator is BLUE: the Best Linear Unbiased Estimator.
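A minimal OLS sketch in Python (NumPy only; the data and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): y = 1 + 2*x1 - 3*x2 + noise
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), x1, x2])

# OLS minimizes ||y - X b||^2; lstsq is a numerically stable way
# to compute beta_hat = (X'X)^{-1} X'y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to [1, 2, -3]
```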
What are the assumptions for MLR?
LINE
Linearity: Y can be modeled as a linear function of the independent variables.
Independence: Observations are independent and of equal importance; predictors are linearly independent of each other (no multicollinearity)
Normality of errors: Errors are independent and normally distributed with zero mean.
Equal variance: Errors have constant variance (homoscedasticity).
How can we check if a linear relationship is appropriate?
- Plot of the residuals against the fitted values (ŷ)
- Plot of the residuals against each predictor variable (xij)
How can we check if the error assumptions are appropriate?
- Plot of the residuals against the fitted values (ŷ)
- Plot of the residuals against each predictor variable (xij)
- Histogram and/or normal probability plot of the residuals
- Plot of the residuals against the index or order of data collection (to check independence)
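A rough numerical sketch of these residual checks (simulated, well-specified data; in practice you would look at the plots themselves):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated well-specified model (illustrative only)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted

# With an intercept in the model, OLS residuals have mean ~0 and zero
# correlation with the fitted values by construction; it is systematic
# *patterns* in the plots (curvature, funnels) that reveal violations.
print(resid.mean())
print(np.corrcoef(resid, fitted)[0, 1])
```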
What is the overall F-Test?
What does it mean when we reject the null of an overall F-test?
Tests that the entire collection of independent variables are associated with the outcome.
Rejecting H0 indicates that the model with all predictors is better than an intercept-only model; further testing may be needed.
(H0: β1 = β2 = … = βp = 0
H1: at least one βj ≠ 0)
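The overall F statistic can be sketched directly from the ANOVA decomposition (simulated data; SciPy is used only for the F-distribution p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n, p = 100, 2  # p = number of slope coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
sse = np.sum(resid ** 2)           # error sum of squares
ssm = sst - sse                    # model sum of squares

# F = (SSM / p) / (SSE / (n - p - 1))
f_stat = (ssm / p) / (sse / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)
print(f_stat, p_value)  # large F, tiny p: reject H0
```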
What is the partial T-Test?
What does it mean to reject the null of a partial t-test?
Tests that a specific independent variable is associated with the outcome, given the association with the other predictors has
already been accounted for
Rejecting H0: βj = 0 implies that there is significant evidence of a
linear association between Xj and Y, given all other predictors are
already included in the model
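Partial t statistics can be sketched from the usual formulas (simulated data; x2's true coefficient is 0 here, so its t statistic should be small):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # truly unrelated to y
y = 1 + 2 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1] - 1  # number of slope coefficients

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2 = np.sum(resid ** 2) / (n - p - 1)  # estimate of error variance

# Var(beta_hat) = sigma^2 (X'X)^{-1}; t_j = beta_hat_j / SE(beta_hat_j)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p - 1)
print(t_stats, p_values)
```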
What is the partial F-Test?
What are the hypotheses?
Tests that a specific collection of independent variables is associated with the outcome, given the association with the other predictors has already been accounted for.
The reduced has to be a nested version of the full model.
Hypotheses:
H0: Reduced model is adequate (the extra coefficients in the full model are all zero)
H1: Full model is the better model
Rejecting H0 indicates that the full model is better than the reduced model; further testing may be needed.
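A sketch of the partial F-test for nested models (simulated data; the full model adds x2 and x3, whose true coefficients are nonzero here, so H0 should be rejected):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

n = 120
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 + 1.5 * x2 - 1.0 * x3 + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return np.sum(r ** 2)

X_red = np.column_stack([np.ones(n), x1])           # reduced (nested) model
X_full = np.column_stack([np.ones(n), x1, x2, x3])  # full model

q = X_full.shape[1] - X_red.shape[1]  # number of extra predictors tested
df_full = n - X_full.shape[1]         # error df of the full model

# F = ((SSE_reduced - SSE_full) / q) / (SSE_full / df_full)
f_stat = ((sse(X_red, y) - sse(X_full, y)) / q) / (sse(X_full, y) / df_full)
p_value = stats.f.sf(f_stat, q, df_full)
print(f_stat, p_value)  # large F: the full model fits significantly better
```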
How can you check for multicollinearity?
Checking for multicollinearity problems:
Plot predictor variables against each other
Look for large sample correlation coefficients
Look for large variance inflation factors (VIFs)
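VIFs can be sketched directly from their definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors (simulated data; x2 is built to be nearly collinear with x1):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # independent of the others
preds = np.column_stack([x1, x2, x3])

def vif(preds, j):
    """VIF_j = 1 / (1 - R_j^2) from regressing predictor j on the rest."""
    others = np.delete(preds, j, axis=1)
    X = np.column_stack([np.ones(len(preds)), others])
    beta, *_ = np.linalg.lstsq(X, preds[:, j], rcond=None)
    resid = preds[:, j] - X @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((preds[:, j] - preds[:, j].mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(preds, j) for j in range(preds.shape[1])]
print(vifs)  # VIFs for x1 and x2 are large; x3's is near 1
```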
How can we solve for the unconditional variance of Y using the ANOVA table?
We can divide the SST by n − 1: SST/(n − 1) is the sample variance of Y.
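A quick numeric check of this identity on simulated data (SST/(n − 1) matches the usual sample variance):

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(loc=5, scale=2, size=50)  # illustrative response values

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
var_from_sst = sst / (len(y) - 1)

print(var_from_sst, np.var(y, ddof=1))  # the two values are identical
```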
Will SSM overlap for independent predictors?
No! Independent predictors will not have overlapping SSMs.
Can the Adjusted R2 be negative?
Yes! For really poor models with too many predictors, since adjusted R2 penalizes for the number of predictors
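A deterministic toy case where adjusted R² goes negative (x is constructed to be exactly uncorrelated with y, so R² = 0 and the predictor penalty pushes adjusted R² below zero):

```python
import numpy as np

# x and y are exactly orthogonal (both mean-zero, dot product 0)
x = np.array([1.0, -1.0, 1.0, -1.0])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, p = len(y), 1

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adj_r2)  # R^2 = 0, adjusted R^2 = -0.5
```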
Type I SSM Characteristics
- ‘Sequential sums of squares’
- Predictor-order matters
- Sums to the overall SSM
- Useful for conducting partial F-tests
Type III SSM Characteristics
- ‘Partial sums of squares’
- Predictor-order does not matter
- Does not sum to the SSM (unless predictors are independent)
- Useful for computing partial correlations and partial R2
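The contrast between the two types can be sketched by computing both by hand (simulated data with correlated predictors, so the Type III pieces will not add up to SSM):

```python
import numpy as np

rng = np.random.default_rng(7)

n = 150
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)  # correlated with x1
y = 1 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def sse(cols):
    """SSE from regressing y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n)] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

sst = np.sum((y - y.mean()) ** 2)
ssm_full = sst - sse([x1, x2])

# Type I (sequential): each predictor's SS given those entered before it
type1 = [sse([]) - sse([x1]), sse([x1]) - sse([x1, x2])]

# Type III (partial): each predictor's SS given all the others
type3 = [sse([x2]) - sse([x1, x2]), sse([x1]) - sse([x1, x2])]

print(sum(type1), ssm_full)  # Type I sums exactly to SSM
print(sum(type3), ssm_full)  # Type III falls short (predictors overlap)
```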
When is a variable a confounder?
Variable Z is a confounder (‘lurking variable’) if its inclusion changes the relationship between X and Y (e.g., department confounds the relationship between gender and admission rates)