Week 9: Assumptions of Multivariable Linear Regression Flashcards
(34 cards)
How do you check the distribution of continuous variables?
<hist varname, freq normal> to create a histogram and overlay a normal distribution
How do we call a coefficient when the dependent variable decreases as the independent variable increases?
Negative coefficient
What does the coefficient represent?
The change in the dependent variable for a one-unit change in the predictor, holding other variables constant
What is the significance of residuals in regression analysis?
Residuals measure the difference between observed and predicted values, indicating model fit
How can you test if residuals are normally distributed using a kernel density plot?
<kdensity resid_varname, normal> and overlay a normal curve to check for alignment
How does a pnorm plot help in assessing normality of residuals?
It compares the cumulative distribution of residuals to a normal distribution; closer alignment suggests normality. Qnorm plots show deviations from normality in the middle range of data
What does deviation from the line in a qnorm plot represent?
Deviation at the tails indicates non-normality, suggesting potential outliers or skewness. Shows deviations from normality at the extremities.
Why is normality of residuals important in linear regression?
Normal residuals ensure valid hypothesis testing and confidence intervals
What is the purpose of a residual vs fitted plot?
It checks for patterns that indicate violations of linearity, equal variance, or non-normality
How can you assess if the linearity and equal variance assumptions are met?
Look for random scatter in a residual vs fitted plot; fanning or patterns suggest heteroscedasticity
In a residual vs fitted plot, non-linearity is shown by a pattern, whereas unequal variance is shown by a funnel shape
What does the presence of leverage points indicate?
Leverage points are influential observations that can disproportionately affect model fit
How can you identify leverage points in regression analysis?
Plot residuals or fitted values against predictors and look for isolated points
What does multicollinearity indicate in a regression model?
High correlation between predictors can distort coefficients, making them unreliable
How can multicollinearity be detected?
Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use command <cor></cor>
Why might adding two highly correlated predictors distort regression results?
The shared variance between predictors reduces the model’s ability to isolate individual effects
What does recoding or transforming a variable achieve in regression analysis?
It can improve the fit by correcting for skewness, non-linearity, or address data inaccuracies (e.g., age grouping)
Why is it important to check for missing data after transformations?
Transformations can exclude cases, reducing sample size and potentially biasing results
How can you improve the normality of residuals?
Apply transformations (e.g., log, square root) to the dependent variable to reduce skewness
What does a funnel shape in a residual plot suggest?
It indicates heteroscedasticity - variance of residuals increases with fitted values
What is the implication of non-linearity in residual plots?
Non-linearity suggests that the relationship between predictors and outcome may not be adequately captured by the model
Why might adding interaction terms improve model fit?
Interactions account for cases where the effect of one predictor depends on the level of another predictor
What does a constant (_cons) represent?
It is the predicted value of the dependent variable when all predictors are zero
How can transformations improve model assumptions?
They can stabilise variance, reduce skewness, and make relationships more linear
What does a high peak in a histogram indicate about the distribution?
It may suggest a large concentration of data around a specific value, potentially indicating skewness or rounding