Week 9: Assumptions of Multivariable Linear Regression Flashcards

(34 cards)

1
Q

How do you check the distribution of continuous variables?

A

<hist varname, freq normal> to create a histogram and overlay a normal distribution
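
A minimal sketch of this check, assuming Stata's bundled auto dataset stands in for the course data; price is only an illustrative continuous variable.

```stata
* Assumption: the bundled auto dataset replaces the course data
sysuse auto, clear

* Histogram with counts on the y-axis and a normal curve overlaid
histogram price, frequency normal
```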

2
Q

What do we call a coefficient when the dependent variable decreases as the independent variable increases?

A

Negative coefficient

3
Q

What does the coefficient represent?

A

The change in the dependent variable for a one-unit change in the predictor, holding other variables constant

4
Q

What is the significance of residuals in regression analysis?

A

Residuals measure the difference between observed and predicted values, indicating model fit

5
Q

How can you test if residuals are normally distributed using a kernel density plot?

A

<kdensity resid_varname, normal> plots the kernel density of the residuals with a normal curve overlaid; check how closely the two align

6
Q

How does a pnorm plot help in assessing normality of residuals?

A

It compares the cumulative distribution of residuals to a normal distribution; closer alignment suggests normality. Pnorm plots show deviations from normality in the middle range of the data

7
Q

What does deviation from the line in a qnorm plot represent?

A

Deviation from the line indicates non-normality. Qnorm plots are most sensitive at the extremities, so deviation at the tails suggests potential outliers or skewness
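
The three residual normality checks (kernel density, pnorm, qnorm) can be run together after a regression. This is a sketch only, assuming Stata's bundled auto dataset and an illustrative model; resid_price is a made-up variable name.

```stata
* Assumption: the bundled auto dataset and this model are illustrative only
sysuse auto, clear
regress price weight mpg

* Store the residuals in a new variable (hypothetical name)
predict resid_price, residuals

* Kernel density with a normal curve overlaid
kdensity resid_price, normal

* Standardised normal probability plot: sensitive to the middle of the data
pnorm resid_price

* Quantile plot against the normal: sensitive to the tails
qnorm resid_price
```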

8
Q

Why is normality of residuals important in linear regression?

A

Approximately normal residuals support valid hypothesis tests and confidence intervals, particularly in small samples

9
Q

What is the purpose of a residual vs fitted plot?

A

It checks for patterns that indicate violations of the linearity, equal variance, or normality assumptions

10
Q

How can you assess if the linearity and equal variance assumptions are met?

A

Look for random scatter in a residual vs fitted plot. A systematic pattern (e.g., a curve) suggests non-linearity, whereas fanning or a funnel shape suggests unequal variance (heteroscedasticity)

11
Q

What does the presence of leverage points indicate?

A

Leverage points are observations with extreme predictor values; they are potentially influential and can disproportionately affect model fit

12
Q

How can you identify leverage points in regression analysis?

A

Plot residuals or fitted values against predictors and look for isolated points

13
Q

What does multicollinearity indicate in a regression model?

A

High correlation between predictors can distort coefficients, making them unreliable

14
Q

How can multicollinearity be detected?

A

Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use the command <cor>
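
A short sketch of the correlation check, assuming Stata's bundled auto dataset; the three predictors are illustrative choices, not the course variables.

```stata
* Assumption: the bundled auto dataset replaces the course data
sysuse auto, clear

* Pairwise correlation matrix for candidate predictors;
* coefficients close to +1 or -1 flag possible multicollinearity
correlate weight length displacement
```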

15
Q

Why might adding two highly correlated predictors distort regression results?

A

The shared variance between predictors reduces the model’s ability to isolate individual effects

16
Q

What does recoding or transforming a variable achieve in regression analysis?

A

It can improve the fit by correcting for skewness or non-linearity, or by addressing data inaccuracies (e.g., age grouping)

17
Q

Why is it important to check for missing data after transformations?

A

Transformations can exclude cases (e.g., the log of zero or a negative value is missing), reducing sample size and potentially biasing results

18
Q

How can you improve the normality of residuals?

A

Apply transformations (e.g., log, square root) to the dependent variable to reduce skewness

19
Q

What does a funnel shape in a residual plot suggest?

A

It indicates heteroscedasticity: the variance of the residuals changes (typically increases) with the fitted values

20
Q

What is the implication of non-linearity in residual plots?

A

Non-linearity suggests that the relationship between predictors and outcome may not be adequately captured by the model

21
Q

Why might adding interaction terms improve model fit?

A

Interactions account for cases where the effect of one predictor depends on the level of another predictor
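
One way an interaction might be specified is with Stata's factor-variable notation. The cards do not say which syntax the course uses, so treat this as a sketch; the auto dataset and the choice of weight and foreign are assumptions.

```stata
* Assumption: the bundled auto dataset and these predictors are illustrative only
sysuse auto, clear

* ## includes both main effects and their interaction:
* the slope for weight is allowed to differ between domestic and foreign cars
regress price c.weight##i.foreign
```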

22
Q

What does a constant (_cons) represent?

A

It is the predicted value of the dependent variable when all predictors are zero

23
Q

How can transformations improve model assumptions?

A

They can stabilise variance, reduce skewness, and make relationships more linear

24
Q

What does a high peak in a histogram indicate about the distribution?

A

It may suggest a large concentration of data around a specific value, potentially indicating skewness or rounding

25
Q

Why is assessing the distribution of predictors and outcomes crucial in regression?

A

Non-normality or outliers can lead to biased estimates and affect the validity of the model

26
Q

How does excluding extreme observations affect model fit?

A

It can reduce leverage effects and improve stability, but may also remove meaningful data

27
Q

What is the purpose of fitting multiple models with different predictors?

A

It helps to identify the best combination of variables and assess the robustness of results

28
Q

Why is it useful to visualise residuals after each regression model?

A

It allows for continuous assessment of model assumptions and fit

29
Q

How do you generate residuals for a regression model?

A

<predict resid_varname, residuals> after fitting the model. This generates a new variable. It cannot be overwritten by a later residuals variable, so each subsequent variable must have a new name

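
A sketch of storing residuals from two successive models, assuming Stata's bundled auto dataset; resid_m1 and resid_m2 are hypothetical names chosen to show that each set of residuals needs its own variable.

```stata
* Assumption: the bundled auto dataset and both models are illustrative only
sysuse auto, clear

* First model and its residuals
regress price weight
predict resid_m1, residuals

* Second model: predict will not overwrite resid_m1, so use a new name
regress price weight mpg
predict resid_m2, residuals
```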
30
Q

How do you plot a pnorm and qnorm plot to examine the normality of residuals?

A

<pnorm resid_varname> and <qnorm resid_varname>

31
Q

What command do you use to check normality of residuals using a residual vs fitted value plot?

A

<rvfplot> after fitting the model. The <yline(0)> option draws a line at 0, where the residuals should be densest. The <msymbol()> option changes the default symbol (here set to a hollow circle), and <msize()> changes the size of the symbol

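
A sketch of a residual vs fitted plot with the three options described, assuming Stata's bundled auto dataset. The specific option values (small hollow circle, small marker size) are illustrative guesses, since the card's own command did not survive.

```stata
* Assumption: the bundled auto dataset and this model are illustrative only
sysuse auto, clear
regress price weight mpg

* yline(0): reference line at zero
* msymbol(oh): small hollow circle marker
* msize(small): marker size (value chosen for illustration)
rvfplot, yline(0) msymbol(oh) msize(small)
```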
32
Q

How can we make a variable containing the fitted values and why would we want to do this?

A

The fitted values can be plotted on a scatterplot to see where there may be issues with them (e.g., points far away from the others), or ...

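
A sketch of one way to create and plot fitted values, assuming Stata's bundled auto dataset; fitted_price is a hypothetical variable name, and the card's own command is not preserved.

```stata
* Assumption: the bundled auto dataset and this model are illustrative only
sysuse auto, clear
regress price weight mpg

* xb requests the linear prediction, i.e. the fitted values
predict fitted_price, xb

* Scatter the fitted values against a predictor to spot isolated points
scatter fitted_price weight
```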
33
Q

How can you check the options for transforming a variable?

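
The card's answer is missing. One possibility, offered as an assumption rather than the course's stated method, is Stata's ladder-of-powers commands, sketched here on the bundled auto dataset.

```stata
* Assumption: the bundled auto dataset replaces the course data
sysuse auto, clear

* Table of normality tests for each ladder-of-powers transformation
ladder price

* Histograms of each transformation with a normal curve overlaid
gladder price
```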
34
Q

How do you perform a log transformation on a variable?

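
The card's answer is missing. A minimal sketch of a natural-log transformation, assuming Stata's bundled auto dataset; ln_price is a hypothetical name. The missing-value count reflects the earlier card on checking for cases lost after a transformation.

```stata
* Assumption: the bundled auto dataset replaces the course data
sysuse auto, clear

* Natural log of the variable, stored under a new (hypothetical) name
generate ln_price = ln(price)

* Zero or negative values would become missing, so count them
count if missing(ln_price)

* Re-check the distribution after transforming
histogram ln_price, frequency normal
```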