Topic 3: Regression Diagnostics Flashcards

1
Q

linear regression assumptions

A
  • linearity
  • normality
  • homoscedasticity
  • independence
  • outliers
  • multicollinearity
2
Q

linearity

A

the relationship between x and y is linear

3
Q

normality

A

the error term follows a normal distribution

4
Q

homoscedasticity

A

the error term has a mean of 0 and a constant variance

5
Q

independence

A

the error terms are not related to each other

6
Q

outliers

A

there are no outliers

7
Q

multicollinearity

A

there are no high correlations among IVs

8
Q

testing normality

A

skewness & kurtosis, shapiro-wilk test, normal quantile plot

9
Q

skewness

A

the asymmetry of the distribution (a long right tail = positive skew)

10
Q

kurtosis

A

how peaked the data are

11
Q

interpreting skewness & kurtosis

A

if the t-statistic for skewness or kurtosis (the estimate divided by its standard error) exceeds 3.2 in absolute value, the respective assumption is violated

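A minimal Python sketch of this rule, assuming the common large-sample standard errors sqrt(6/n) for skewness and sqrt(24/n) for kurtosis (one convention among several; the residuals here are simulated for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    residuals = rng.normal(size=200)        # stand-in for regression residuals

    n = len(residuals)
    t_skew = stats.skew(residuals) / np.sqrt(6 / n)
    t_kurt = stats.kurtosis(residuals) / np.sqrt(24 / n)   # excess kurtosis
    print("violation" if max(abs(t_skew), abs(t_kurt)) > 3.2 else "no violation")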
12
Q

Shapiro-Wilk test

A

tests for normality

13
Q

null hypothesis of the Shapiro-Wilk test

A

the sample comes from a normal distribution

14
Q

interpreting Shapiro-Wilk results

A

significant result = the sample may not come from a normal distribution

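A minimal sketch of the test in Python with scipy (residuals simulated for illustration; in practice, pass the model's residuals):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    residuals = rng.normal(size=100)        # replace with real model residuals

    W, p = stats.shapiro(residuals)
    print("reject normality" if p < 0.05 else "fail to reject normality")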
15
Q

normal quantile plot

A

sorts observations from smallest to largest, calculates z-scores of the sorted observations, and plots the observations against corresponding z-scores

16
Q

interpreting normal quantile plot

A

if the data are close to normal, the points will lie close to a straight line

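The plot described in card 15 can be built by hand in a few lines of Python; the (i − 0.5)/n plotting positions used below are one common convention (a sketch, with simulated data):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(2)
    x = np.sort(rng.normal(size=80))                 # sorted observations
    pp = (np.arange(1, len(x) + 1) - 0.5) / len(x)   # plotting positions
    z = stats.norm.ppf(pp)                           # corresponding z-scores

    plt.scatter(z, x)                                # near-normal data hug a line
    plt.xlabel("theoretical z-score")
    plt.ylabel("observed value")
    plt.show()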
17
Q

dealing with non-normality

A

data transformation or resampling methods (e.g., bootstrap, jackknife)

18
Q

bootstrap

A

uses resampling with replacement to emulate the process of obtaining new samples, so that we can estimate the variability of a parameter estimate without generating additional samples
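A minimal bootstrap sketch in Python: resample (x, y) pairs with replacement and re-fit the regression to estimate the slope's variability (data simulated for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=50)
    y = 2.0 * x + rng.normal(size=50)

    slopes = []
    for _ in range(1000):
        idx = rng.integers(0, len(x), size=len(x))   # resample with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        slopes.append(slope)
    print("bootstrap SE of slope:", np.std(slopes, ddof=1))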

19
Q

what happens if homoscedasticity is violated?

A
  • the variances of regression coefficient estimates tend to be underestimated
  • thus, t-ratios tend to be inflated
20
Q

testing homoscedasticity

A

residual plots

21
Q

residuals

A

differences between the observed values (Yi) and the predicted values (Ŷi)

22
Q

interpreting residual plots for homoscedasticity

A

funnel shape = violation of homoscedasticity
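A residual-plot sketch in Python with statsmodels (simulated data): a funnel shape here would signal heteroscedasticity, while a curved pattern would signal non-linearity (see card 25):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 10, size=100)
    y = 1.0 + 0.5 * x + rng.normal(size=100)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    plt.scatter(model.fittedvalues, model.resid)     # residuals vs. fitted values
    plt.axhline(0, linestyle="--")
    plt.xlabel("fitted values (Ŷi)")
    plt.ylabel("residuals (Yi − Ŷi)")
    plt.show()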

23
Q

dealing with heteroscedasticity

A

data transformation, other estimation methods, other regression methods

24
Q

testing linearity

A

residual plots

25
Q

interpreting residual plots for linearity

A

curve shape = violation of linearity

26
Q

dealing with non-linearity

A

data transformation, add another IV to the equation (non-linear function of one of the other IVs), use non-linear methods
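A sketch of the "add another IV" remedy in Python: include a squared term of an existing IV in the design matrix (data simulated for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(-3, 3, size=100)
    y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(size=100)

    X = sm.add_constant(np.column_stack([x, x**2]))  # x plus its square as a new IV
    print(sm.OLS(y, X).fit().params)                 # intercept, x, x² coefficients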
27
Q

testing independence

A

Durbin-Watson test (d) of autocorrelation

28
Q

Durbin-Watson test

A

tests the correlation between error terms ordered in time or space

29
Q

interpreting Durbin-Watson test results

A

d between 1.5 and 2.5 = normal; d below 1 or above 3 = abnormal
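The statistic is available in statsmodels; a sketch on simulated data (values near 2 fall in the card's 1.5–2.5 "normal" band):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(6)
    x = rng.normal(size=100)
    y = 1.0 + 2.0 * x + rng.normal(size=100)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    print("d =", durbin_watson(model.resid))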
30
Q

dealing with dependence

A

data transformation, use other estimation methods, use other regression methods

31
Q

outlier

A

a data point disconnected from the rest of the data

32
Q

checking outliers

A

Cook's distance

33
Q

interpreting Cook's distance

A

Cook's D > 4/n suggests potentially serious outliers
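A sketch of Cook's distance via statsmodels' influence diagnostics, on simulated data with a planted outlier; the 4/n cutoff below is one common rule of thumb:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    x = rng.normal(size=50)
    y = 1.0 + 2.0 * x + rng.normal(size=50)
    y[0] += 10                                       # plant an outlier

    model = sm.OLS(y, sm.add_constant(x)).fit()
    cooks_d, _ = model.get_influence().cooks_distance
    print("flagged points:", np.where(cooks_d > 4 / len(x))[0])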
34
Q

dealing with outliers

A

if an unusual case is not likely to reoccur, delete the case or use robust regression
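A robust-regression sketch with statsmodels' RLM (Huber M-estimation), one example of the robust methods the card alludes to (data simulated, with contamination):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    x = rng.normal(size=50)
    y = 1.0 + 2.0 * x + rng.normal(size=50)
    y[:3] += 15                                      # contaminate with outliers

    robust = sm.RLM(y, sm.add_constant(x), M=sm.robust.norms.HuberT()).fit()
    print(robust.params)                             # intercept, slope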
35
Q

consequences of multicollinearity

A
  • unstable regression coefficient estimates (lower t-ratios)
  • a high R² (or significant F) but few significant t-ratios
  • unexpected signs of regression coefficients
  • the matrix inversion problem

36
Q

checking multicollinearity

A

tolerance, VIF, condition index

37
Q

tolerance

A

1 − R² from the regression of each IV on the other IVs, ignoring the DV

38
Q

interpreting tolerance

A

values < 0.1 = multicollinearity problem

39
Q

variance inflation factor (VIF)

A

1/tolerance

40
Q

interpreting VIF

A

values > 10 = multicollinearity problem
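A sketch computing VIF (and tolerance = 1/VIF) for each IV with statsmodels; the simulated x2 is built to be nearly collinear with x1:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(9)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.1, size=100)        # nearly collinear with x1
    X = sm.add_constant(np.column_stack([x1, x2]))

    for j in (1, 2):                                 # skip the constant column
        vif = variance_inflation_factor(X, j)
        print(f"IV{j}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")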
41
Q

condition index

A

measure of dependency of one variable on the others

42
Q

interpreting condition index

A

values > 30 = multicollinearity
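Condition indices can be computed from the singular values of the column-scaled design matrix; a sketch on the same kind of near-collinear simulated data (the largest index is the condition number):

    import numpy as np

    rng = np.random.default_rng(10)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.1, size=100)        # nearly collinear with x1
    X = np.column_stack([np.ones(100), x1, x2])

    Xs = X / np.linalg.norm(X, axis=0)               # scale columns to unit length
    s = np.linalg.svd(Xs, compute_uv=False)          # singular values
    print("condition indices:", s[0] / s)            # > 30 flags multicollinearity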
43
Q

dealing with multicollinearity

A

drop a variable, incorporate more info (composite variable), or use other regression methods

44
Q

r² vs. R² in linear regression

A
  • simple linear regression: r² = R²
  • multiple linear regression: r² ≠ R²
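A quick numerical check of the simple-regression case on simulated data: the squared Pearson r matches the model R²:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    x = rng.normal(size=100)
    y = 1.0 + 2.0 * x + rng.normal(size=100)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    r = np.corrcoef(x, y)[0, 1]
    print(f"r² = {r**2:.4f}, R² = {model.rsquared:.4f}")   # identical here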