Flashcards in 6. Regression assumptions, Diagnostics and Influential cases Deck (43):

1

## How many assumptions are there for multiple linear regression?

###
9 mathematical

+2 design

=11

2

## What are the 2 design assumptions of multiple linear regression?

###
Independence (each participant only 1 score on each IV)

Interval Scale on IV and DV (or dichotomous IV)

3

## What are the 9 mathematical assumptions of multiple linear regression?

###
Normality (6 sub assumptions)

No multicollinearity (3 ways to check)

Linearity

Normal distribution of residuals

Independent Residuals

Residuals unrelated to predictors

Homogeneity of Variance

4

## What are the 6 tests of normality?

###
Symmetry

Modality

Skew

Kurtosis

Outliers

Shapiro-Wilk

5

## What do you check with the assumption of symmetry?

### Mean = Median = Mode
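
As an illustrative aside (Python with the standard library, not part of the SPSS workflow the deck assumes), the symmetry check can be sketched like this:

```python
from statistics import mean, median, mode

# A roughly symmetric, unimodal sample: mean, median, and mode coincide.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

m, md, mo = mean(scores), median(scores), mode(scores)
print(m, md, mo)  # all three are 4
```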

6

## What do you check for in modality?

### Only 1 most frequently occurring score (Unimodal not multi/bimodal)

7

## What do you check for in skew and kurtosis?

### Skew / SE Skew and Kurtosis / SE Kurtosis (treat each as a z-score: |z| > 1.96 indicates a significant problem)
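
A minimal sketch of the skew z-test, assuming the common approximation SE(skew) ≈ √(6/n) (SPSS computes an exact standard error, so values will differ slightly):

```python
import math

def skew_z(xs):
    """Sample skew divided by its approximate standard error, sqrt(6/n).
    Treat the result as a z-score: |z| > 1.96 suggests significant skew."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    skew = sum(((x - m) / sd) ** 3 for x in xs) / n
    return skew / math.sqrt(6 / n)

# A symmetric sample gives z close to 0; a right-skewed one gives z > 0.
print(skew_z([1, 2, 3, 4, 5]))   # close to 0
print(skew_z([1, 1, 1, 10]))     # positive
```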

8

## What constitutes an outlier?

###
95% of cases should have standardized scores within ±1.96

No more than 3% of cases should have |z| > 2.58

Cases beyond these limits are outliers
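
The same rule can be sketched as a quick count of standardized scores (illustrative Python, not SPSS output):

```python
from statistics import mean, pstdev

def outlier_counts(xs):
    """Count cases beyond the z cut-offs: compare the first count to 5% of n
    and the second to 3% of n (per the card's rule of thumb)."""
    m, sd = mean(xs), pstdev(xs)
    zs = [(x - m) / sd for x in xs]
    return (sum(abs(z) > 1.96 for z in zs),
            sum(abs(z) > 2.58 for z in zs))

# A tight sample has no cases beyond either cut-off:
print(outlier_counts([4, 5, 5, 6, 6, 6, 7, 7, 8]))  # (0, 0)
```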

9

## What do you check for in the Shapiro-Wilk statistic?

### That it is not sig. (>.05)

10

## What are the 3 checks for multicollinearity?

###
Pearson correlations between the IVs (very high correlations, e.g. r > .8, are a problem)

Tolerance (values below .1 are a problem)

VIF (values above 10 are a problem)

11

## What does VIF stand for?

### Variance inflation factor
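
For intuition: with exactly two predictors, the R² from regressing one IV on the other is just r², so VIF reduces to 1 / (1 − r²). A hedged sketch of that special case (the general VIF needs a full regression of each IV on all the others):

```python
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def vif_two_predictors(x1, x2):
    """VIF in the two-predictor case: 1 / (1 - r^2).
    Near 1 = little collinearity; above 10 is the usual cause for concern."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)  # perfectly collinear IVs would divide by zero

# Uncorrelated predictors give the minimum VIF of 1:
print(vif_two_predictors([1, 2, 3], [1, 2, 1]))  # 1.0
```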

12

## Where and what for do you look to check whether the residuals have a normal distribution?

###
Mean of residuals = 0

No skew (snaking) and no kurtosis (sagging) in the P-P plot and histogram

No outliers in the histogram

13

## Why are the residual statistics so important?

###
Because if they aren't normally distributed, we can't say that 68% of cases will fall within ±1 RMSE of the regression line

14

## How do we check linearity and why do we check it?

###
Using Pearson correlations between each IV and the DV,

because if an IV is not related to the DV then it can't be a good predictor

15

## How is the Independence of Residuals tested?

### Using the Durbin-Watson statistic

16

## What does the Durbin Watson show?

### The independence of residuals

17

## When reading the Durbin-Watson, what are we looking for to meet our assumption of independent residuals?

###
Values between 1.5 and 2.5

(The actual range runs from 0 = strong positive autocorrelation, through 2 = independence, to 4 = strong negative autocorrelation)
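
The statistic itself is simple enough to sketch (illustrative; SPSS reports it directly):

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences over sum of squared residuals.
    ~2 = independent; near 0 = positive autocorrelation; near 4 = negative."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Identical residuals = strong positive autocorrelation; alternating = negative:
print(durbin_watson([1, 1, 1, 1]))    # 0.0
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```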

18

## What do we look at to test homogeneity of variance?

###
The scatterplot of standardized predicted value against the studentized residual

(don't want any funnelling or patterns)

19

## What is it called if there is funnelling (and an unequal distribution on either side of x= 0) where the assumption of homogeneity is not met?

### Heteroscedasticity

20

## What is evidence of homoscedasticity (i.e. the assumption being met)?

###
No pattern or funnelling

Equal distribution on either side of x = 0 (divide the graph in half)

21

## How is the 'Residuals unrelated to predictors' assumption checked?

### By obtaining a Pearson correlation between each IV and the unstandardized residuals (RES_1)

22

## What should the correlation between the predictors and the residuals be?

### 0 and non sig.
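
A sketch of that check (note: when the model includes an intercept, OLS residuals are uncorrelated with the included predictors in-sample by construction, so this mostly guards against processing errors):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation, e.g. between an IV and the saved residuals (RES_1)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical IV values and residuals that are unrelated:
print(pearson_r([1, 2, 3], [0.5, -1.0, 0.5]))  # 0.0
```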

23

## What should we do if the assumptions are violated?

### Question the validity of the model and be cautious about interpreting it

24

## What three violations of assumptions cause the most problems for linear regression?

###
Normality (especially on the DV)

Homogeneity of Variance

Presence of outliers

25

## What are the 3 options if normality is violated?

###
Transformation: NO (even if there is sig. skew)

Bootstrapping: YES (less biased)

Check outliers FIRST (check their influence)

26

## What are considered extreme cases in a data set?

### >2SD from the mean

27

## Why are outliers a problem?

### They affect the values of the estimated regression coefficients, giving a biased model

28

## Where are the problem cases located in SPSS output?

### Casewise Diagnostics

29

## What should you look at to determine the amount of influence the outliers are having?

###
Studentized Residuals (Y-Ypred: Error)

Influential cases

30

## What does it mean if a case has a large residual?

### It doesn't fit the model well and should be checked as a possible outlier

31

## What are the 3 types of residuals?

###
Unstandardized

Standardized

Studentized (most precise)

32

## What are the 8 statistics that can be used to assess the influence of a particular case on a model?

###
Adjusted predicted value.

Deleted residual and the studentized deleted residual.

DFFit and standardized DFFit.

Cook’s distance.

Leverage.

Mahalanobis distances.

DFBeta and Standardized DFBeta.

Covariance ratio.

33

## What is the rule for Adj Pred Value?

### It should be roughly equal to the predicted value (a large difference suggests the case is influential)

34

## What is the rule for the studentized deleted residual?

### Within the range of -2 to 2

35

## What is the rule for Mahalanobis Distance?

###
Rough cut-offs depend on sample size (N) and number of predictors (k):

N = 500: 25+ = bad

N = 100 and k = 3: 15+ = bad

N = 30 and k = 2: 11+ = bad
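
The card's rules of thumb can be encoded as a rough lookup (illustrative; the cut-offs also depend on k, which this simplification ignores):

```python
def mahalanobis_cutoff(n):
    """Rough Mahalanobis-distance cut-offs from the card:
    a case above the cut-off is a potential multivariate outlier."""
    if n >= 500:
        return 25
    if n >= 100:
        return 15
    return 11

print(mahalanobis_cutoff(30))  # 11
```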

36

## What is the rule for Cook's distance?

###
1.0+ = bad

Close to 0 = good

37

## What is the rule for Leverage values?

###
A case is a problem if its leverage is more than 2x the average leverage value

Average leverage value = (k + 1) / n
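
That rule in miniature, with hypothetical numbers (2x the average is the stricter cut-off; some texts use 3x):

```python
def leverage_cutoff(k, n, multiplier=2):
    """Average leverage is (k + 1) / n; flag cases above multiplier x the average."""
    return multiplier * (k + 1) / n

# e.g. 3 predictors and 100 cases: average leverage .04, so flag values above .08
print(leverage_cutoff(3, 100))  # 0.08
```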

38

## What is the rule for the covariance ratio?

###
If above the upper limit of the range (> 1 + [3(k + 1)/n]): DON'T DELETE

If below the lower limit of the range (< 1 − [3(k + 1)/n]): deleting the case may improve the model

39

## What is the rule for DFFit?

### DFFit is in the units of the outcome, so judge it relative to the DV's scale and look for values close to 0: a DFFit of 0.5 is terrible on a 0-1 scale but negligible on a 1-100 scale

40

## What is the rule for the SD DFFit?

### Should be between -2 and 2

41

## What is the rule for SD Df Beta?

### Absolute values greater than 2 (i.e. beyond ±2) = bad

42

## What should we do if we remove the outliers?

### Run the regression again and compare the new and old

43