Unit 2: Multiple Linear Regression Flashcards

Biostatistics II > Unit 2: Multiple Linear Regression

Flashcards in this deck (23):
1

What is the purpose of MLR?

To predict a response variable (Y) using a set of predictor variables (X1, X2, ..., Xp)

2

What is the method of MLR?

Ordinary least squares (OLS). Under the model assumptions, OLS gives the Best Linear Unbiased Estimator (BLUE) of the regression coefficients (Gauss-Markov theorem).
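The least-squares machinery can be sketched in a few lines of NumPy (hypothetical simulated data; the design matrix X carries an intercept column):

```python
import numpy as np

# Hypothetical data: response Y generated from two predictors plus noise.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS solves the normal equations (X'X) beta_hat = X'y; under the LINE
# assumptions this estimator is BLUE (Gauss-Markov).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With low noise, `beta_hat` lands close to the coefficients used to generate the data.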

3

What are the assumptions for MLR?

LINE
Linearity: Y can be modeled as a linear function of the independent variables.

Independence: Observations are independent and of equal importance; predictors are linearly independent of each other (no multicollinearity).

Normality of errors: Errors are independent and normally distributed with zero mean.

Equal variance: Errors have constant variance (homoscedasticity).

4

How can we check if a linear relationship is appropriate?

1. Plot of the residuals against the fitted values (ŷ)
2. Plot of the residuals against each predictor variable (xij)

5

How can we check if the error assumptions are appropriate?

1. Plot of the residuals against the fitted values (ŷ)
2. Plot of the residuals against each predictor variable (xij)
3. Histogram and/or normal probability plot of the residuals
4. Plot of the residuals against the index or order of data collection (to check independence)
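The quantities behind these diagnostic plots are simple to compute by hand; a minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data for a simple fit.
rng = np.random.default_rng(7)
n = 80
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# Quantities to plot: resid vs fitted, resid vs each predictor,
# a histogram of resid, and resid vs observation index.
# With an intercept in the model, OLS residuals sum to zero and are
# orthogonal to the fitted values.
```

Patternless residual plots support the linearity and error assumptions; curvature, funnels, or trends over the index suggest violations.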

6

What is the overall F-Test?
What does it mean when we reject the null of an overall F-test?

Tests whether the entire collection of independent variables is associated with the outcome.

Rejecting H0 indicates that the model with all predictors is better than an intercept-only model; further testing may be needed.
(H0: all Bj = 0
H1: at least one Bj not equal to 0)
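Numerically, the overall F statistic comes straight from the ANOVA decomposition SST = SSM + SSE; a sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data: intercept plus p = 2 predictors.
rng = np.random.default_rng(1)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 1.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssm = np.sum((y_hat - y.mean()) ** 2)  # model sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

# Overall F: mean square for the model over mean square error.
F = (ssm / p) / (sse / (n - p - 1))
```

Compare `F` to an F(p, n − p − 1) reference distribution to get the p-value.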

7

What is the partial T-Test?
What does it mean to reject the null of a partial t-test?

Tests whether a specific independent variable is associated with the outcome, given that the associations with the other predictors have already been accounted for.

Rejecting H0: Bj = 0 implies that there is significant evidence of a linear association between Xj and Y, given all other predictors are already included in the model.

8

What is the partial F-Test?
What are the hypotheses?

Tests whether a specific collection of independent variables is associated with the outcome, given that the associations with the other predictors have already been accounted for.
The reduced model must be a nested version of the full model.

Hypotheses:
H0: The reduced model is adequate (the coefficients on the extra predictors are all 0)
H1: The full model is better (at least one extra coefficient is nonzero)

Rejecting H0 indicates that the full model is better than the reduced model; further testing may be needed.
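The partial F statistic compares the error sums of squares of the nested models; a sketch with hypothetical data where the extra predictors are truly unrelated to Y:

```python
import numpy as np

# Hypothetical data: only x1 actually drives Y; x2 and x3 are noise.
rng = np.random.default_rng(2)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 + rng.normal(size=n)

def sse(X, y):
    """Error sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_red = np.column_stack([np.ones(n), x1])   # reduced model, nested in full

q = X_full.shape[1] - X_red.shape[1]        # number of predictors dropped
df_full = n - X_full.shape[1]               # error df of the full model

# Partial F: how much SSE drops per extra predictor, relative to full MSE.
F = ((sse(X_red, y) - sse(X_full, y)) / q) / (sse(X_full, y) / df_full)
```

Because the models are nested, the full model's SSE can never exceed the reduced model's, so `F` is nonnegative; compare it to F(q, df_full).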

9

How can you check for multicollinearity?

Checking for multicollinearity problems:
Plot predictor variables against each other
Look for large sample correlation coefficients
Look for large variance inflation factors (VIFs)
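VIFs can be computed directly from auxiliary regressions of each predictor on the others; a sketch with hypothetical data where two predictors are nearly collinear:

```python
import numpy as np

# Hypothetical predictors: x2 is almost a copy of x1; x3 is independent.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2_j), from regressing X_j on the other predictors."""
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(X)), others])
    beta = np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    resid = X[:, j] - Z @ beta
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
```

A common rule of thumb flags VIFs above roughly 5-10; here the collinear pair produces very large VIFs while the independent predictor stays near 1.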

10

How can we solve for the unconditional variance of Y using the ANOVA table?

We can divide the SST by n-1; this is just the sample variance of Y.
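As a quick numerical check, SST divided by n − 1 reproduces the usual (unconditional) sample variance of Y:

```python
import numpy as np

# Small hypothetical sample of Y values.
y = np.array([2.0, 4.0, 6.0, 8.0])
n = len(y)

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares from the ANOVA table
var_y = sst / (n - 1)              # unconditional sample variance of Y
```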

11

Will SSM overlap for independent predictors?

No! Independent predictors will not have overlapping SSMs.

12

Can the Adjusted R2 be negative?

YES! For very poor models with too many predictors, since adjusted R2 penalizes for the number of predictors.
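The penalty is explicit in the formula: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). A sketch with pure-noise predictors (hypothetical data):

```python
import numpy as np

# Hypothetical worst case: 10 noise predictors, response unrelated to all of them.
rng = np.random.default_rng(4)
n, p = 20, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Adjusted R^2 shrinks R^2 by a factor that grows with p,
# and can dip below zero for models this poor.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Since (n − 1)/(n − p − 1) > 1 whenever p ≥ 1, adjusted R² is always below the ordinary R² here.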

13

Type I SSM Characteristics

1. 'Sequential sums of squares'
2. Predictor-order matters
3. Sums to the overall SSM
4. Useful for conducting partial F-tests

14

Type III SSM Characteristics

1. 'Partial sums of squares'
2. Predictor-order does not matter
3. Does not sum to the SSM (unless predictors are independent)
4. Useful for computing partial correlations and partial R2

15

When is a variable a confounder?

Variable Z is a confounder ('lurking variable') if its inclusion changes the relationship between X and Y (e.g., Department confounds the relationship between gender and admission rates)

16

When is there interaction/effect modifier?

`The relationship between X and Y depends on the values of Z.'

Interactions ('effect modifiers') can be used to account for a relationship between Y and X that varies across the levels/values of Z

17

Why center predictors?

It changes the interpretation of B0: the average value of the response variable at the average value of the predictors (i.e., ȳ).

It helps to alleviate `variance inflation' issues associated with fitting models with higher-order polynomial terms, a special case of
multicollinearity.
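With a centered predictor, the fitted intercept equals the sample mean of Y exactly, since the intercept column becomes orthogonal to the centered predictor; a small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical predictor centered far from zero.
rng = np.random.default_rng(5)
n = 200
x = rng.normal(loc=10, size=n)
y = 3 + 0.5 * x + rng.normal(size=n)

xc = x - x.mean()                        # centered predictor
X = np.column_stack([np.ones(n), xc])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 is now the mean response at the average predictor value (y-bar);
# the slope b1 is unchanged by centering.
```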

18

Why would you standardize your predictors?

The magnitude of coefficient estimates is comparable across predictors.
It puts all predictors 'on an equal playing field' when building a model.
Similar to centering, it helps alleviate a special type of multicollinearity issue introduced when fitting models with higher-order polynomial terms.
You should only standardize continuous predictors with roughly normal distributions - do not standardize categorical predictors!

19

Why is it a problem if the predictors are correlated?

There are typically large changes between the regression coefficients in the unadjusted and adjusted models.
It is difficult to interpret the regression coefficients, because the 'holding all other predictors constant' statement is not reasonable.
The standard errors will be inflated, which causes problems with inference (i.e., p-values are too big).

20

What is the Hierarchical Principle?

Higher-order terms should only be included if the corresponding 'main effects' are also included.
Categorical variables should enter the model 'all or nothing'.

21

What are the implications of the hierarchical principle?

Avoid splitting up dummy variables representing categorical predictors
Only consider additional polynomial terms if the lower-order terms are already included
Only consider interactions between variables that are included in your model

22

What is usually the result of using internal validation procedures?

Using the same data to both train and validate your model will result in measures of model fit that are too optimistic.

23

What are the types of model validation?

1. External Validation:
Assess prediction accuracy in a completely different sample (e.g., patients from a different hospital or geographic location). EV addresses the generalizability of the model
2. Temporal Validation:
Utilize subsequent patients from the same study or center (e.g., build your model on the first 100 patients, validate it using the next 50)
3. Internal Validation:
Use data-splitting techniques
Construct training and validation datasets
Cross-validation (e.g., PRESS statistic)
IV is the most common approach to model validation but tends to be too optimistic
Cannot account for variability in the population that is not already captured by the sample
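A minimal internal-validation sketch, using a random train/validation split on hypothetical data:

```python
import numpy as np

# Hypothetical data for a single-predictor model.
rng = np.random.default_rng(6)
n = 120
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

# Data splitting: fit on a training set, score on a held-out validation set.
idx = rng.permutation(n)
train, valid = idx[:80], idx[80:]

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]

mse_train = np.mean((y[train] - X[train] @ beta) ** 2)
mse_valid = np.mean((y[valid] - X[valid] @ beta) ** 2)
# The training error is typically the more optimistic of the two,
# which is why purely internal measures of fit tend to overstate accuracy.
```

Cross-validation (e.g., via the PRESS statistic) extends this idea by rotating which observations are held out.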