Topic 3: Regression Diagnostics Flashcards
linear regression assumptions
- linearity
- normality
- homoscedasticity
- independence
- outliers
- multicollinearity
linearity
the relationship between x and y is linear
normality
the error term follows a normal distribution
homoscedasticity
the error term has a mean of 0 & a constant variance across all values of the IVs
independence
the error terms are not related to each other
outliers
there are no outliers
multicollinearity
there are no high correlations among IVs
testing normality
skewness & kurtosis, shapiro-wilk test, normal quantile plot
skewness
the asymmetry of the distribution (whether one tail is longer than the other)
kurtosis
how peaked (and heavy-tailed) the distribution is relative to a normal distribution
interpreting skewness & kurtosis
if the t-ratio for skewness or kurtosis (the statistic divided by its standard error) exceeds 3.2 in absolute value, the normality assumption is violated
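A minimal sketch of the t-ratio rule, assuming the bias-adjusted sample skewness (G1) and excess kurtosis (G2) together with the SPSS-style standard-error formulas; other software may use slightly different estimators. The helper name `skew_kurt_t` is hypothetical.

```python
import math
import random

def skew_kurt_t(data):
    """Hypothetical helper: return (t_skew, t_kurt) for a sample.

    Assumes bias-adjusted skewness G1, excess kurtosis G2, and the
    SPSS-style standard errors; other packages may differ slightly.
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd central moment
    if m2 == 0:
        raise ValueError("data have zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # 4th central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    big_g1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    big_g2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return big_g1 / se_skew, big_g2 / se_kurt

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(200)]
skewed_sample = [random.expovariate(1.0) for _ in range(200)]  # right-skewed

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    t_s, t_k = skew_kurt_t(sample)
    violation = abs(t_s) > 3.2 or abs(t_k) > 3.2
    print(f"{name}: t_skew={t_s:.2f}, t_kurt={t_k:.2f}, violation={violation}")
```

With a large sample, even mild non-normality can push the t-ratios past 3.2, so the rule is usually read alongside a plot.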
shapiro-wilk test
tests for normality
null hypothesis of shapiro-wilk test
the sample comes from a normal distribution
interpreting shapiro-wilk results
significant result (e.g., p < .05) = reject the null; the sample may not come from a normal distribution
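A short sketch of running the Shapiro-Wilk test, assuming SciPy is available; `scipy.stats.shapiro` returns the W statistic and a p-value. The 0.05 cutoff and the simulated samples are illustrative choices.

```python
import random
from scipy import stats

random.seed(2)
normal_sample = [random.gauss(50, 10) for _ in range(100)]
skewed_sample = [random.expovariate(0.1) for _ in range(100)]  # right-skewed

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    # H0: the sample comes from a normal distribution
    w, p = stats.shapiro(sample)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: W={w:.3f}, p={p:.4f} -> {verdict}")
```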
normal quantile plot
sorts observations from smallest to largest, calculates the z-scores expected for each rank position under a normal distribution (normal quantiles), and plots the observations against those z-scores
interpreting normal quantile plot
if close to normal, the points will lie close to some straight line
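A minimal sketch that builds the points of a normal quantile plot without a plotting library, assuming the common plotting position (i - 0.5) / n; other conventions exist. The correlation of the points is used only as a rough numeric stand-in for "how straight the line looks". Both helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def normal_quantile_points(data):
    """Hypothetical helper: (z, x) pairs for a normal quantile plot.

    Sorted observations are paired with standard-normal quantiles of
    (i - 0.5) / n, an assumed plotting position.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, sd 1
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def straightness(points):
    """Pearson correlation of the plotted points; near 1 = nearly straight line."""
    zs, xs = zip(*points)
    n = len(zs)
    mz, mx = sum(zs) / n, sum(xs) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / math.sqrt(szz * sxx)

random.seed(3)
normal_pts = normal_quantile_points([random.gauss(0, 1) for _ in range(100)])
skewed_pts = normal_quantile_points([random.expovariate(1.0) for _ in range(100)])
print(f"normal sample: r = {straightness(normal_pts):.3f}")
print(f"skewed sample: r = {straightness(skewed_pts):.3f}")
```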
dealing with non-normality
data transformation or resampling methods (e.g., bootstrap, jackknife)
bootstrap
uses resampling with replacement to emulate the process of obtaining new samples, so that we can estimate the variability of a parameter estimate without collecting additional samples
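A sketch of the bootstrap applied to a regression slope, assuming case resampling (drawing whole (x, y) pairs with replacement); a residual bootstrap is another common variant. The helper names and simulated data are hypothetical.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Simple-regression OLS slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slope_se(xs, ys, n_boot=2000, seed=0):
    """Hypothetical helper: bootstrap SE of the slope via case resampling."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(n_boot):
        sample = rng.choices(pairs, k=len(pairs))  # draw cases with replacement
        bx, by = zip(*sample)
        if len(set(bx)) < 2:  # slope undefined if every x is identical; skip
            continue
        slopes.append(ols_slope(bx, by))
    return statistics.stdev(slopes)  # spread of the bootstrap slopes

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [1.5 + 0.8 * x + rng.gauss(0, 1) for x in xs]
print(f"slope = {ols_slope(xs, ys):.3f}, bootstrap SE = {bootstrap_slope_se(xs, ys):.3f}")
```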
what happens if homoscedasticity is violated?
- the variances of regression coefficient estimates tend to be under-estimated
- thus, t-ratios tend to be inflated
testing homoscedasticity
residual plots
residuals
differences between the observed & predicted values: ei = Yi − Ŷi
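A minimal sketch of computing residuals from a simple (one-IV) OLS fit; the data are made up for illustration. With an intercept in the model, OLS residuals sum to zero, which is a quick sanity check.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) for the OLS line Y-hat = b0 + b1 * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def residuals(xs, ys):
    """e_i = Y_i - Y-hat_i for each observation."""
    b0, b1 = fit_simple_ols(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
print([round(e, 3) for e in residuals(xs, ys)])
```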
interpreting residual plots for homoscedasticity
funnel shape (the spread of the residuals widens or narrows across the predicted values) = violation of homoscedasticity
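A rough numeric companion to eyeballing a funnel, not a formal test: compare the residual SD in the upper half of the predicted values with the lower half (similar in spirit to the Goldfeld-Quandt idea). A ratio far from 1 suggests a funnel. The helper `spread_ratio` and the simulated data are hypothetical.

```python
import random
import statistics

def ols_fit(xs, ys):
    """Return (b0, b1) for a simple OLS line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def spread_ratio(xs, ys):
    """Hypothetical helper: SD(residuals, upper half of Y-hat) / SD(lower half)."""
    b0, b1 = ols_fit(xs, ys)
    # (fitted, residual) pairs sorted by fitted value
    pairs = sorted((b0 + b1 * x, y - (b0 + b1 * x)) for x, y in zip(xs, ys))
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    return statistics.stdev(high) / statistics.stdev(low)

rng = random.Random(7)
xs = [float(x) for x in range(1, 101)]
funnel_ys = [1 + 2 * x + rng.gauss(0, 0.2 * x) for x in xs]  # spread grows with x
flat_ys = [1 + 2 * x + rng.gauss(0, 5) for x in xs]          # constant spread
print(f"funnel data: ratio = {spread_ratio(xs, funnel_ys):.2f}")
print(f"flat data:   ratio = {spread_ratio(xs, flat_ys):.2f}")
```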
dealing with heteroscedasticity
data transformation, other estimation methods, other regression methods
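One of the "other estimation methods" is weighted least squares (WLS), which down-weights observations with larger error variance. A minimal sketch for one IV, assuming the error variance is proportional to x² so that the weights are 1 / x²; in practice the variance structure must be estimated or justified. With equal weights the same function reduces to ordinary least squares.

```python
import random

def wls_simple(xs, ys, weights):
    """Return (b0, b1) minimizing sum(w * (y - b0 - b1 * x) ** 2)."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw  # weighted mean of x
    my = sum(w * y for w, y in zip(weights, ys)) / sw  # weighted mean of y
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    b1 = sxy / sxx
    return my - b1 * mx, b1

rng = random.Random(11)
xs = [rng.uniform(1, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]  # error SD grows with x

weights = [1 / x ** 2 for x in xs]  # assumption: Var(error) proportional to x**2
b0_ols, b1_ols = wls_simple(xs, ys, [1.0] * len(xs))  # equal weights = OLS
b0_wls, b1_wls = wls_simple(xs, ys, weights)
print(f"OLS: b0={b0_ols:.3f}, b1={b1_ols:.3f}")
print(f"WLS: b0={b0_wls:.3f}, b1={b1_wls:.3f}")
```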
testing linearity
residual plots
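A small sketch of what a residual plot reveals when linearity is violated: fitting a straight line to curved (here exactly quadratic, made-up) data leaves residuals that are positive at both ends and negative in the middle, a U-shaped pattern instead of a random band around 0. Averaging the residuals within thirds of the X range stands in for reading the plot.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) for a simple OLS line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

xs = list(range(11))
ys = [x ** 2 for x in xs]  # curved relationship, so a straight line is misspecified

b0, b1 = fit_simple_ols(xs, ys)
res = [y - (b0 + b1 * x) for x, y in sorted(zip(xs, ys))]  # residuals ordered by x

k = len(res) // 3
low, mid, high = res[:k], res[k:len(res) - k], res[len(res) - k:]
for label, part in [("low X", low), ("middle X", mid), ("high X", high)]:
    print(f"{label}: mean residual = {sum(part) / len(part):.2f}")
```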