Unit 2: Multiple Linear Regression Flashcards

Question 1

Q

What is the purpose of MLR?

Answer

A

To predict a response variable (Y) using a set of predictor variables (X1, X2, …, Xi)
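
A minimal sketch of predicting Y from two predictors. It assumes ordinary least squares as the fitting method (the cards do not state one) and uses NumPy; the data, seed, and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: n = 50 observations, two predictors (all values are made up).
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept.
design = np.column_stack([np.ones(n), X])

# Least-squares estimates of (b0, b1, b2).
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Predict the response for a new observation with X1 = 0.5, X2 = -1.0.
x_new = np.array([1.0, 0.5, -1.0])
y_pred = x_new @ coef
```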

Question 2

Q

What is the method of MLR?

Question 3

Q

What are the assumptions for MLR?

Answer

A

LINE
Linearity: Y can be modeled as a linear function of the independent variables.

Independence: Observations are independent and of equal importance; predictors are linearly independent of each other (no multicollinearity).

Normality of errors: Errors are independent and normally distributed with zero mean.

Equal variance: Errors have constant variance.

Question 4

Q

How can we check if a linear relationship is appropriate?

Answer

A

1. Plot of the residuals against the fitted values (ŷ)

2. Plot of the residuals against each predictor variable (xij)

Question 5

Q

How can we check if the error assumptions are appropriate?

Answer

A

Plot of the residuals against the fitted values (ŷ)
Plot of the residuals against each predictor variable (xij)
Histogram and/or normal probability plot of the residuals
Plot of the residuals against the index or order of data collection (to check independence)
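
The quantities behind these plots can be sketched numerically. This is an illustration only, assuming NumPy, the standard-library NormalDist, and made-up data; in practice you would plot these pairs rather than summarize them with a correlation.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical data that satisfies the model assumptions.
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 0.5 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
resid = y - fitted

# Residuals vs fitted values: in a least-squares fit with an intercept the
# residuals average to zero and have zero linear correlation with the fitted
# values, so the plot is read for curvature or a funnel shape instead.
corr_fitted = np.corrcoef(fitted, resid)[0, 1]

# Normal probability plot: sorted residuals paired with theoretical normal quantiles.
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(p) for p in probs])
qq_corr = np.corrcoef(theo_q, np.sort(resid))[0, 1]  # near 1 when residuals look normal

# Residuals vs collection order: lag-1 autocorrelation near 0 is consistent with independence.
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```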

Question 6

Q

What is the overall F-Test?

What does it mean when we reject the null of an overall F-test?

Answer

A

Tests whether the entire collection of independent variables is associated with the outcome.

Rejecting H0 indicates that the model with all predictors is better than an intercept-only model; further testing may be needed.
(H0: all βj = 0
H1: at least one βj ≠ 0)
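
A minimal sketch of the overall F statistic, computed from the sums of squares as F = (SSM / p) / (SSE / (n - p - 1)). It assumes NumPy and made-up data; the variable names are illustrative.

```python
import numpy as np

# Hypothetical data: n = 60 observations, p = 3 predictors.
rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum((y - fitted) ** 2)     # error (residual) sum of squares
ssm = sst - sse                     # model sum of squares

# Overall F statistic: full model vs intercept-only model.
f_stat = (ssm / p) / (sse / (n - p - 1))
```

The statistic is compared with an F distribution on p and n - p - 1 degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, p, n - p - 1)` would give the p-value.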

Question 7

Q

What is the partial T-Test?

What does it mean to reject the null of a partial t-test?

Answer

A

Tests whether a specific independent variable is associated with the outcome, given that the association with the other predictors has already been accounted for.

Rejecting H0: βj = 0 implies that there is significant evidence of a linear association between Xj and Y, given that all other predictors are already included in the model.
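
A minimal sketch of the partial t statistics, each computed as an estimated coefficient divided by its standard error. It assumes NumPy and made-up data in which the second predictor has no true effect.

```python
import numpy as np

# Hypothetical data: X1 has a real effect, X2 does not.
rng = np.random.default_rng(3)
n, p = 80, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
xtx_inv = np.linalg.inv(design.T @ design)
coef = xtx_inv @ design.T @ y
resid = y - design @ coef

mse = resid @ resid / (n - p - 1)          # estimate of the error variance
se = np.sqrt(mse * np.diag(xtx_inv))       # standard error of each coefficient
t_stats = coef / se                        # one t statistic per coefficient
```

Each statistic is compared with a t distribution on n - p - 1 degrees of freedom.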

Question 8

Q

What is the partial F-Test?

What are the hypotheses?

Answer

A

Tests whether a specific collection of independent variables is associated with the outcome, given that the association with the other predictors has already been accounted for.
The reduced model has to be nested within the full model.

Hypotheses:
H0: The reduced model is adequate (the coefficients of the extra predictors in the full model are all 0)
H1: The full model is better (at least one of those coefficients is not 0)

Rejecting H0 indicates that the full model is better than the reduced model; further testing may be needed.
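
A minimal sketch of the partial F statistic, F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p - 1)), where q is the number of coefficients being tested. It assumes NumPy and made-up data; the helper `sse` is a hypothetical name.

```python
import numpy as np

def sse(design, y):
    """Residual sum of squares from a least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))

# Hypothetical data: X1 and X3 matter, X2 does not.
rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

ones = np.ones(n)
full = np.column_stack([ones, X])            # X1, X2, X3
reduced = np.column_stack([ones, X[:, 0]])   # X1 only, nested in the full model

q = full.shape[1] - reduced.shape[1]         # number of coefficients tested
df_error = n - full.shape[1]                 # n - p - 1 for the full model
sse_full, sse_reduced = sse(full, y), sse(reduced, y)
f_partial = ((sse_reduced - sse_full) / q) / (sse_full / df_error)
```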

Question 9

Q

How can you check for multicollinearity?

Answer

A

Checking for multicollinearity problems:
Plot predictor variables against each other
Look for large sample correlation coefficients
Look for large variance inflation factors (VIFs)
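
A minimal sketch of the last two checks, using VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. It assumes NumPy and made-up data; the function name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                   # unrelated to the others
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
corr = np.corrcoef(X, rowvar=False)       # pairwise sample correlations
```

Here x1 and x2 get large VIFs and a correlation near 1, while x3 stays near the minimum VIF of 1.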

Question 10

Q

How can we solve for the unconditional variance of Y using the ANOVA table?

Answer

A

We can divide the SST by n-1: the sample variance of Y is SST / (n - 1).

Question 11

Q

Will SSM overlap for independent predictors?

Answer

A

No! Independent predictors will not have overlapping SSMs.
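
A minimal sketch of this property, assuming NumPy and a made-up balanced layout in which the two predictors have exactly zero sample correlation. The helper `ssm` is a hypothetical name.

```python
import numpy as np

def ssm(predictors, y):
    """Model sum of squares for a least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((design @ coef - y.mean()) ** 2))

rng = np.random.default_rng(6)
# Balanced 2x2 layout: x1 and x2 are uncorrelated in the sample.
x1 = np.tile([-1.0, -1.0, 1.0, 1.0], 10)
x2 = np.tile([-1.0, 1.0, -1.0, 1.0], 10)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=x1.size)

ssm_x1 = ssm(x1, y)                              # x1 alone
ssm_x2 = ssm(x2, y)                              # x2 alone
ssm_both = ssm(np.column_stack([x1, x2]), y)     # both together
```

With uncorrelated predictors, `ssm_x1 + ssm_x2` matches `ssm_both`, so neither predictor's share depends on the other.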

Question 12

Q

Can the Adjusted R2 be negative?

Answer

A

Yes! This can happen for very poor models with too many predictors, since adjusted R2 penalizes the number of predictors.
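
A small worked example using the usual formula, adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1). The R2, n, and p values are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical weak model: R^2 = 0.10 with p = 8 predictors and n = 20 observations.
weak = adjusted_r2(0.10, n=20, p=8)     # 1 - 0.9 * 19 / 11, which is below zero

# Hypothetical strong model with the same n and p.
strong = adjusted_r2(0.90, n=20, p=8)   # still positive, but lower than 0.90
```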

Question 13

Q

Type I SSM Characteristics

Answer

A

‘Sequential sums of squares’
Predictor-order matters
Sums to the overall SSM
Useful for conducting partial F-tests
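
A minimal sketch of sequential sums of squares, computed as the drop in SSE each time a predictor enters the model. It assumes NumPy and made-up correlated predictors; `sse` and `type1_ss` are hypothetical helper names.

```python
import numpy as np

def sse(cols, y):
    """Residual sum of squares for an intercept plus the given predictor columns."""
    design = np.column_stack([np.ones(len(y))] + list(cols))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))

def type1_ss(cols, y):
    """Sequential SS: the drop in SSE as each predictor enters, in the given order."""
    out, prev = [], sse([], y)           # intercept-only SSE equals SST
    for k in range(1, len(cols) + 1):
        cur = sse(cols[:k], y)
        out.append(prev - cur)
        prev = cur
    return out

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)       # correlated with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

ss_12 = type1_ss([x1, x2], y)            # x1 entered first
ss_21 = type1_ss([x2, x1], y)            # x2 entered first
ssm_total = sse([], y) - sse([x1, x2], y)
```

Both orderings add up to `ssm_total`, but the amount credited to x1 is larger when it enters first (`ss_12[0]`) than when it enters after x2 (`ss_21[1]`).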

Question 14

Q

Type III SSM Characteristics

Answer

A

‘Partial sums of squares’
Predictor-order does not matter
Does not sum to the SSM (unless predictors are independent)
Useful for computing partial correlations and partial R2
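
A minimal sketch of partial sums of squares, computed by dropping each predictor from the full model one at a time. It assumes NumPy and made-up correlated predictors; `sse` is a hypothetical helper name.

```python
import numpy as np

def sse(cols, y):
    """Residual sum of squares for an intercept plus the given predictor columns."""
    design = np.column_stack([np.ones(len(y))] + list(cols))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ coef) ** 2))

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)       # correlated with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

full_sse = sse([x1, x2], y)
ss3_x1 = sse([x2], y) - full_sse         # SS(x1 | x2)
ss3_x2 = sse([x1], y) - full_sse         # SS(x2 | x1)
ssm_total = sse([], y) - full_sse

# Partial R^2 for x1: share of the variation left unexplained by x2 that x1 explains.
partial_r2_x1 = ss3_x1 / sse([x2], y)
```

Because x1 and x2 are correlated here, `ss3_x1 + ss3_x2` falls short of `ssm_total`.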

Question 15

Q

When is a variable a confounder?

Answer

A

Variable Z is a confounder (‘lurking variable’) if its inclusion changes the relationship between X and Y (e.g., Department confounds the relationship between gender and admission rates).
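
A minimal simulated sketch of confounding, assuming NumPy and made-up data in which Z drives both X and Y while X has no direct effect on Y. It is not the admissions example itself.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
z = rng.normal(size=n)                        # confounder
x = z + rng.normal(scale=0.5, size=n)         # X depends on Z
y = 2.0 * z + rng.normal(scale=0.5, size=n)   # Y depends on Z only, not on X

ones = np.ones(n)
unadj, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
adj, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)

slope_unadjusted = unadj[1]   # picks up Z's effect through X
slope_adjusted = adj[1]       # close to 0 once Z is in the model
```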

Question 16

Q

When is there interaction/effect modifier?

Answer

A

‘The relationship between X and Y depends on the values of Z.’

Interactions (‘effect modifiers’) can be used to account for a relationship between Y and X that varies across the levels/values of Z.
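
A minimal sketch of an interaction term, assuming NumPy and made-up data with a binary Z. The model includes X, Z, and the product X*Z, so the slope of X is allowed to differ between the two levels of Z.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 400
x = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(float)   # binary effect modifier
# True slope of X is 1 when Z = 0 and 1 + 2 = 3 when Z = 1.
y = 0.5 + 1.0 * x + 0.5 * z + 2.0 * x * z + rng.normal(scale=0.5, size=n)

design = np.column_stack([np.ones(n), x, z, x * z])
b0, b_x, b_z, b_xz = np.linalg.lstsq(design, y, rcond=None)[0]

slope_when_z0 = b_x           # estimated slope of X at Z = 0
slope_when_z1 = b_x + b_xz    # estimated slope of X at Z = 1
```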

Question 17

Q

Why center predictors?

Answer

A

It changes the interpretation of β0: the average value of the response variable at the average value of the predictors (i.e., ȳ).

It helps to alleviate ‘variance inflation’ issues associated with fitting models with higher-order polynomial terms, a special case of multicollinearity.
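
Both points can be sketched numerically, assuming NumPy and a made-up predictor whose values sit far from zero.

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.uniform(10.0, 20.0, size=200)    # predictor far from zero
xc = x - x.mean()                        # centered predictor

# Correlation between the linear and quadratic terms, before and after centering.
corr_raw = np.corrcoef(x, x ** 2)[0, 1]           # very close to 1
corr_centered = np.corrcoef(xc, xc ** 2)[0, 1]    # much closer to 0

# With a centered predictor in a linear fit, the intercept equals the mean response.
y = 3.0 + 0.5 * x + rng.normal(size=x.size)
design_c = np.column_stack([np.ones(x.size), xc])
b0_centered = np.linalg.lstsq(design_c, y, rcond=None)[0][0]
```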

Question 18

Q

Why would you standardize your predictors?

Answer

A

The magnitudes of the coefficient estimates are comparable across predictors
It puts all predictors ‘on an equal playing field’ when building a model
Similar to centering, it helps alleviate a special type of multicollinearity issue introduced when fitting models with higher-order polynomial terms
You should only standardize continuous predictors with roughly Normal distributions - do not standardize categorical predictors!
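
A minimal sketch of the first point, assuming NumPy and two made-up continuous predictors on very different scales. The names `income` and `age` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 300
income = rng.normal(50_000, 10_000, size=n)   # hypothetical predictor in dollars
age = rng.normal(40, 10, size=n)              # hypothetical predictor in years
y = 1e-4 * income + 0.1 * age + rng.normal(size=n)

def standardize(v):
    """Subtract the mean and divide by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

raw = np.column_stack([np.ones(n), income, age])
std = np.column_stack([np.ones(n), standardize(income), standardize(age)])
b_raw = np.linalg.lstsq(raw, y, rcond=None)[0]   # slopes per dollar and per year
b_std = np.linalg.lstsq(std, y, rcond=None)[0]   # slopes per standard deviation
```

The raw slopes differ by orders of magnitude only because of the units; the standardized slopes are of similar size, and each equals the raw slope multiplied by that predictor's standard deviation.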

Question 19

Q

Why is it a problem if the predictors are correlated?

Answer

A

There are typically large changes between the regression coefficients in the unadjusted and adjusted models.
It is difficult to interpret the regression coefficients, because the ‘holding all other predictors constant’ statement is not reasonable.
The standard errors will be inflated, which causes problems with inference (i.e., p-values are too big).

Question 20

Q

What is the Hierarchical Principle?

Answer

A

Higher-order terms should only be included if the corresponding ‘main effects’ are also included
Categorical variables should enter the model ‘all or nothing’

Question 21

Q

What are the implications of the hierarchical principle?

Answer

A

Avoid splitting up dummy variables representing categorical predictors
Only consider additional polynomial terms if the lower-order terms are already included
Only consider interactions between variables that are included in your model

Question 22

Q

What is usually the result of using internal validation procedures?

Answer

A

Using the same data to both train and validate your model will result in measures of model fit that are too optimistic.

Question 23

Q

What are the types of model validation?

Answer

A

External Validation:
Assess prediction accuracy in a completely different sample (e.g., patients from a different hospital or geographic location). EV addresses the generalizability of the model.
Temporal Validation:
Utilize subsequent patients from the same study or center (e.g., build your model on the first 100 patients, validate it using the next 50).
Internal Validation:
Use data-splitting techniques
Construct training and validation datasets
Cross-validation (e.g., PRESS statistic)
IV is the most common approach to model validation but tends to be too optimistic
Cannot account for variability in the population that is not already captured by the sample
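
A minimal sketch of two internal-validation tools, assuming NumPy and made-up data: a simple train/validation split by row order, and the PRESS statistic computed from the hat matrix as the sum of (e_i / (1 - h_ii))^2. The split sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(13)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = 1.0 + X[:, 0] + rng.normal(size=n)        # only the first predictor matters
design = np.column_stack([np.ones(n), X])

# Data splitting: fit on the first 40 rows, validate on the remaining 20.
train, valid = slice(0, 40), slice(40, n)
coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
mse_train = np.mean((y[train] - design[train] @ coef) ** 2)
mse_valid = np.mean((y[valid] - design[valid] @ coef) ** 2)

# PRESS: leave-one-out prediction errors obtained without refitting n times.
hat = design @ np.linalg.inv(design.T @ design) @ design.T
coef_all, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef_all
press = np.sum((resid / (1.0 - np.diag(hat))) ** 2)
sse_all = np.sum(resid ** 2)
```

PRESS is never smaller than the ordinary SSE, and the validation MSE is typically larger than the training MSE, which reflects the optimism of judging a model on the data used to fit it.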

(23 cards)