week 4: Multiple regressions Flashcards
(19 cards)
What is a regression?
expands on correlation, examining whether we can estimate the value of an outcome variable (Y) on the basis of our predictor (X)
- how does Y change in relation to X
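As a minimal sketch of the idea, the least-squares line for one predictor can be computed by hand. The function name `fit_line` and the data below are made up for illustration; statistics software does the same thing for you.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = how X and Y vary together, divided by how much X varies.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The line always passes through the point (mean of X, mean of Y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


hours = [1, 2, 3, 4]   # made-up predictor (X)
score = [3, 5, 7, 9]   # made-up outcome (Y)
intercept, slope = fit_line(hours, score)
print(intercept, slope)  # 1.0 2.0
```

Here the slope of 2.0 says that Y goes up by 2 for every 1-unit increase in X, and the intercept of 1.0 is the predicted Y when X is 0.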
What is forced entry regression?
- predictors based on previous research and theory
- do not state a particular order for the variables to be entered
- all variables are forced into the model at the same time
- the ‘enter method’
What is hierarchical regression?
- predictors based on previous research
- researcher decides the order in which the predictors are entered into the model
- enter known predictors from prior research first, and then the new predictors
- new predictors can be entered: all at once, in a hierarchical manner, in a stepwise manner
What is stepwise regression?
- the computer programme chooses the order of entry on purely statistical grounds
- it selects the predictor that best predicts the outcome and enters that into the model first
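A rough sketch of that first step only, assuming "best predicts" means the largest absolute correlation with the outcome. The names `pearson_r` and `first_predictor` and the data are hypothetical; real stepwise routines use further criteria for later steps.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def first_predictor(predictors, outcome):
    """Name of the predictor with the strongest (absolute) correlation with the outcome."""
    return max(predictors, key=lambda name: abs(pearson_r(predictors[name], outcome)))


predictors = {              # made-up data
    "hours": [1, 2, 3, 4],
    "sleep": [7, 5, 8, 6],
}
outcome = [2, 4, 5, 9]
print(first_predictor(predictors, outcome))  # hours
```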
What is unstandardised beta?
change in Y for 1 unit change in X
What is standardised beta?
change in Y, measured in standard deviations, for a 1 standard deviation change in X
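The two betas are linked by the standard deviations of X and Y. A small sketch, assuming a single predictor and made-up data (the function name `standardised_beta` is illustrative):

```python
from statistics import stdev


def standardised_beta(b, x, y):
    """Convert an unstandardised slope b into standard-deviation units."""
    # Rescale: multiply by the SD of the predictor, divide by the SD of the outcome.
    return b * stdev(x) / stdev(y)


hours = [1, 2, 3, 4]   # made-up predictor (X)
score = [3, 5, 7, 9]   # made-up outcome (Y); unstandardised slope is 2
print(round(standardised_beta(2.0, hours, score), 3))  # 1.0
```

Because standardised betas are all in the same units, they can be compared across predictors measured on different scales.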
What does the R squared value measure?
how much variance is accounted for by the model (effect size)
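As a sketch, R squared can be computed from the observed and predicted values of Y. The function name `r_squared` and the numbers are made up; the predicted values below come from the least-squares line Y = -0.5 + 2.2X fitted to X = 1, 2, 3, 4.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the outcome accounted for by the model."""
    mean_y = sum(observed) / len(observed)
    # Variation the model fails to explain (squared residuals).
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation in the outcome around its mean.
    ss_total = sum((o - mean_y) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


observed = [2, 4, 5, 9]            # made-up outcome scores
predicted = [1.7, 3.9, 6.1, 8.3]   # values predicted by the fitted line
print(round(r_squared(observed, predicted), 3))  # 0.931
```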
What are the assumptions of a multiple regression?
- sample size: enough participants for the number of predictors
- variable types: predictor variables should be quantitative (continuous or ordinal) or categorical; the outcome variable must be continuous
- non-zero variance: predictor variables should show variance
- independence: all values of the outcome variable should be independent
- linearity: assume that the predictor and outcome variables have a linear relationship
How many participants are needed for every 1 predictor variable?
10 participants
If you have 2 predictor variables, how many participants are needed for a small, medium and large effect size?
- small: 478
- medium: 67
- large: 31
If you have 3 predictor variables, how many participants are needed for a small, medium and large effect size?
- small: 543
- medium: 76
- large: 36
If you have 4 predictor variables, how many participants are needed for a small, medium and large effect size?
- small: 597
- medium: 84
- large: 39
What is multicollinearity?
strong correlation between predictor variables
What 2 statistics identify multicollinearity?
- VIF (variance inflation factor): if the average VIF is much greater than 1, the regression may be biased; if the largest VIF is greater than 10, there is a serious problem
- Tolerance (1/VIF): a tolerance below 0.2 indicates a potential problem, and below 0.1 a serious problem
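A minimal sketch of both statistics for the simplest case of exactly two predictors, where tolerance is 1 minus the squared correlation between them and VIF is 1 divided by tolerance. The function names and data are made up; with more predictors, each one is regressed on all the others instead.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2   # share of a predictor's variance NOT shared with the other
    vif = 1 / tolerance      # undefined if the predictors correlate perfectly
    return tolerance, vif


x1 = [1, 2, 3, 4, 5]   # made-up predictor scores
x2 = [2, 1, 4, 3, 5]   # correlates 0.8 with x1
tolerance, vif = tolerance_and_vif(x1, x2)
print(round(tolerance, 2), round(vif, 2))  # 0.36 2.78
```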
What is homoscedasticity?
at each level of the predictor variable, the variance of the residuals should be constant
- if the variance of the residuals differs across levels of the predictor, we have heteroscedasticity
What are residuals?
the vertical distances between the line of best fit and the individual data points, i.e. the differences between the observed and predicted values of Y
What are independent errors?
for any two observations (data points), the residuals should be uncorrelated, i.e. independent of each other
How do you test for independent errors?
use a Durbin-Watson test
- tests whether residuals next to each other are correlated
- test statistic varies between 0-4
- a value of 2 means the residuals are uncorrelated
- a value less than 2 indicates a positive correlation between adjacent residuals, and a value greater than 2 indicates a negative correlation
- values less than 1 or greater than 3 indicate an issue
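The statistic itself is simple to compute: the sum of squared differences between neighbouring residuals, divided by the sum of squared residuals. A sketch with made-up residuals (the function name `durbin_watson` is illustrative):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    # Squared differences between each residual and the one before it.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Neighbouring residuals tend to share the same sign.
print(durbin_watson([1, 1, -1, -1]))  # 1.0
# Neighbouring residuals flip sign at every step.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```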
How do you interpret a multiple regression?
model fit:
- R squared value
- ANOVA results (F and p-values)
to examine relationships:
- beta values
- intercept
Multiply the R squared value by 100 to get the percentage of variance in the outcome accounted for by the model.
F(3, 31) = 81.07, p < .001
F = 81.07 (the F-ratio, i.e. the test statistic)
3 = model degrees of freedom (the number of predictors)
31 = residual degrees of freedom
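F and R squared are two views of the same model fit, so one can be recovered from the other. The sketch below assumes an ordinary regression with an intercept, where residual degrees of freedom = participants - predictors - 1 (so 31 residual df with 3 predictors implies 35 participants). The function names are made up, and the R squared printed is only what the example F value implies, not a figure reported in these cards.

```python
def f_ratio(r2, n_predictors, n_participants):
    """F-ratio for the overall model, computed from R squared."""
    df_model = n_predictors
    df_residual = n_participants - n_predictors - 1
    # Explained variance per model df, over unexplained variance per residual df.
    return (r2 / df_model) / ((1 - r2) / df_residual)


def r_squared_from_f(f, df_model, df_residual):
    """R squared implied by an F-ratio and its two degrees of freedom."""
    return f * df_model / (f * df_model + df_residual)


r2 = r_squared_from_f(81.07, 3, 31)
print(round(r2, 3))          # 0.887
print(round(r2 * 100, 1))    # 88.7 (percentage of variance accounted for)
print(round(f_ratio(r2, 3, 35), 2))  # 81.07
```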