Week 9: Assumptions of Multivariable Linear Regression Flashcards by Amelia Jasinski

How do you check the distribution of continuous variables?

<hist varname, freq normal> to create a histogram and overlay a normal distribution

How well did you know this?

Not at all

Perfectly

How do we call a coefficient when the dependent variable decreases as the independent variable increases?

Negative coefficient

How well did you know this?

Not at all

Perfectly

What does the coefficient represent?

The change in the dependent variable for a one-unit change in the predictor, holding other variables constant

How well did you know this?

Not at all

Perfectly

What is the significance of residuals in regression analysis?

Residuals measure the difference between observed and predicted values, indicating model fit

How well did you know this?

Not at all

Perfectly

How can you test if residuals are normally distributed using a kernel density plot?

<kdensity resid_varname, normal> and overlay a normal curve to check for alignment

How well did you know this?

Not at all

Perfectly

How does a pnorm plot help in assessing normality of residuals?

It compares the cumulative distribution of residuals to a normal distribution; closer alignment suggests normality. Qnorm plots show deviations from normality in the middle range of data

How well did you know this?

Not at all

Perfectly

What does deviation from the line in a qnorm plot represent?

Deviation at the tails indicates non-normality, suggesting potential outliers or skewness. Shows deviations from normality at the extremities.

How well did you know this?

Not at all

Perfectly

Why is normality of residuals important in linear regression?

Normal residuals ensure valid hypothesis testing and confidence intervals

How well did you know this?

Not at all

Perfectly

What is the purpose of a residual vs fitted plot?

It checks for patterns that indicate violations of linearity, equal variance, or non-normality

How well did you know this?

Not at all

Perfectly

How can you assess if the linearity and equal variance assumptions are met?

Look for random scatter in a residual vs fitted plot; fanning or patterns suggest heteroscedasticity
In a residual vs fitted plot, non-linearity is shown by a pattern, whereas unequal variance is shown by a funnel shape

How well did you know this?

Not at all

Perfectly

What does the presence of leverage points indicate?

Leverage points are influential observations that can disproportionately affect model fit

How well did you know this?

Not at all

Perfectly

How can you identify leverage points in regression analysis?

Plot residuals or fitted values against predictors and look for isolated points

How well did you know this?

Not at all

Perfectly

What does multicollinearity indicate in a regression model?

High correlation between predictors can distort coefficients, making them unreliable

How well did you know this?

Not at all

Perfectly

How can multicollinearity be detected?

Calculate correlation coefficients between predictors; values near +/- 1 indicate multicollinearity. Use command <cor></cor>

How well did you know this?

Not at all

Perfectly

Why might adding two highly correlated predictors distort regression results?

The shared variance between predictors reduces the model’s ability to isolate individual effects

How well did you know this?

Not at all

Perfectly

What does recoding or transforming a variable achieve in regression analysis?

Study These Flashcards

It can improve the fit by correcting for skewness, non-linearity, or address data inaccuracies (e.g., age grouping)

Why is it important to check for missing data after transformations?

Study These Flashcards

Transformations can exclude cases, reducing sample size and potentially biasing results

How can you improve the normality of residuals?

Study These Flashcards

Apply transformations (e.g., log, square root) to the dependent variable to reduce skewness

What does a funnel shape in a residual plot suggest?

Study These Flashcards

It indicates heteroscedasticity - variance of residuals increases with fitted values

What is the implication of non-linearity in residual plots?

Study These Flashcards

Non-linearity suggests that the relationship between predictors and outcome may not be adequately captured by the model

Why might adding interaction terms improve model fit?

Study These Flashcards

Interactions account for cases where the effect of one predictor depends on the level of another predictor

What does a constant (_cons) represent?

Study These Flashcards

It is the predicted value of the dependent variable when all predictors are zero

How can transformations improve model assumptions?

Study These Flashcards

They can stabilise variance, reduce skewness, and make relationships more linear

What does a high peak in a histogram indicate about the distribution?

Study These Flashcards

It may suggest a large concentration of data around a specific value, potentially indicating skewness or rounding

Why is assessing the distribution of predictors and outcomes crucial in regression?

Non-normality or outliers can lead to biased estimates and affect the validity of the model

How does excluding extreme observations affect model fit?

It can reduce leverage effects and improve stability, but may also remove meaningful data

What is the purpose of fitting multiple models with different predictors?

It helps to identify the best combination of variables and assess the robustness of results

Why is it useful to visualise residuals after each regression model?

It allows for continuous assessment of model assumptions and fit

How do you generate residuals for a regression model?

This will generate a new variable. This cannot be overwritten by any other residuals variable created - subsequent variables must have a new name

How do you plot a pnorm and qnorm plot to examine the normality of residuals?

What command do you use to check normality of residuals using a residual vs fitted value plot?

The option draws a line at 0 where the residuals should be densest The option is used to change the default symbol which is now set at a hollow circle changes the size of the symbol

How can we make a variable containing the fitted values and why would we want to do this?

The fitted values can be plotted on a scatterplot to see where there may be issues with fitted values (may be far away from the others) - Or

How can you check the options for transforming a variable?

How do you perform a log transformation on a variable?

Week 9: Assumptions of Multivariable Linear Regression Flashcards

(34 cards)