Week 4 (Multiple Regression) Flashcards
(39 cards)
What is a regression?
Extends a correlation (the relationship between two variables) by using one variable to predict scores on the other
Multiple Regression
-Extension of simple linear regression
-Explores impact of multiple predictor variables
-Tests these relationships in parallel
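In general form, a multiple regression with k predictors fits the equation Y = b0 + b1X1 + b2X2 + ... + bkXk + error, where each b (beta) weights one predictor's contribution to the outcome Y.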
Regression VS ANOVA
Regression
-Focuses on relationships between predictor variables and one outcome variable
Factorial ANOVA
-Focuses on differences in scores on the dependent variable, according to two or more independent variables.
Regression VS ANOVA (Requirements)
Regression
-Predictor variables can be continuous, ordinal, or binary; the outcome variable must be continuous.
-One hypothesis per predictor
Factorial ANOVA
-IVs must be categorical, each with 2+ conditions; the dependent variable must be continuous.
-One hypothesis per IV (e.g. 2 for two IVs) and one hypothesis for the interaction (so 3 in total)
Types of multiple regression
-Forced Entry
-Hierarchical Multiple Regression
-Stepwise Multiple Regression
Forced Entry Multiple Regression
-Predictors based on previous research and theory
-No particular order is specified for entering the variables
-All variables are forced into the model at the same time
-Known as Enter method in SPSS
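The deck refers to SPSS's Enter method; purely as an illustration outside SPSS, here is a minimal Python sketch (statsmodels) with made-up variable names and synthetic data, entering all predictors at once:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data: three made-up predictors and one continuous outcome
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(100, 3)), columns=['stress', 'sleep', 'support'])
    df['wellbeing'] = 0.5 * df['sleep'] - 0.4 * df['stress'] + rng.normal(size=100)

    # Forced entry: every predictor goes into the model at the same time
    X = sm.add_constant(df[['stress', 'sleep', 'support']])
    model = sm.OLS(df['wellbeing'], X).fit()
    print(model.summary())   # b coefficients, R-squared, and a p-value per predictor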
Hierarchical Regression
-Predictors based on previous research
-Researcher decides the order in which predictors are entered into the model
-Known predictors are entered first, then new predictors
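A minimal sketch of the same idea, assuming synthetic data: the known predictors go in as a first block, the new predictor as a second block, and the change in R-squared shows what the new block adds:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 150
    known = rng.normal(size=(n, 2))   # block 1: predictors known from past research
    new = rng.normal(size=(n, 1))     # block 2: the new predictor of interest
    y = known @ [0.6, 0.4] + 0.3 * new[:, 0] + rng.normal(size=n)

    step1 = sm.OLS(y, sm.add_constant(known)).fit()
    step2 = sm.OLS(y, sm.add_constant(np.column_stack([known, new]))).fit()

    print(step2.rsquared - step1.rsquared)   # R-squared change due to the new block
    print(step2.compare_f_test(step1))       # F test of that change: (F, p, df)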
Stepwise regression
-Based on mathematical criteria rather than previous research/theory
-Has both forward and backward methods
-The computer programme selects the predictor that best predicts the outcome and enters it into the model first
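A rough sketch of the forward method, using adjusted R-squared as the mathematical criterion (statistical packages may use other criteria such as F-to-enter); the helper function, data, and column names here are all made up:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    def forward_select(df, outcome, candidates):
        """Greedy forward selection: keep adding the candidate predictor that most
        improves adjusted R-squared; stop once no candidate improves it."""
        chosen, remaining, best = [], list(candidates), -np.inf
        while remaining:
            trials = []
            for var in remaining:
                X = sm.add_constant(df[chosen + [var]])
                trials.append((sm.OLS(df[outcome], X).fit().rsquared_adj, var))
            adj_r2, var = max(trials)   # best single addition this round
            if adj_r2 <= best:
                break
            best = adj_r2
            chosen.append(var)
            remaining.remove(var)
        return chosen

    rng = np.random.default_rng(2)
    df = pd.DataFrame(rng.normal(size=(200, 3)), columns=['a', 'b', 'c'])
    df['y'] = 0.7 * df['a'] + 0.2 * df['b'] + rng.normal(size=200)
    print(forward_select(df, 'y', ['a', 'b', 'c']))   # typically picks 'a' first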
Assumptions of multiple regression
- Sample Size
- Variable Types
- Non-zero variance
- Independence
- Linearity
- (Lack of) Multicollinearity
- Homoscedasticity
- Independent Errors
- Normally Distributed Errors
Variable types (Regression Assumption)
-All predictor variables should be quantitative
*Can be continuous, categorical, or ordinal
-Outcome variable must be quantitative and continuous
Non-zero variance (Regression Assumption)
-Predictor variables should have a variance
-In other words, they should not have a variance of zero
Independence (Regression Assumption)
-All values of the outcome variable should be independent
-Each value of the outcome variable should be a separate entity
Linearity (Regression Assumption)
-Assumes that the relationship between the predictors and the outcome variable is linear
-If the analysis is run on a non-linear relationship, the model can be unreliable
Sample Size of Regression (Regression Assumption)
More is better
-Field (2010) suggests the following equations to identify an appropriate sample size
50 + 8k (to test the overall model), where k = number of predictor variables
104 + k (to test individual predictors)
-Or a power analysis can be used
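As a worked example of those rules of thumb (50 + 8k for the overall model, 104 + k for individual predictors), a tiny sketch with a made-up helper name:

    def minimum_n(k):
        """Larger of the two rule-of-thumb sample sizes for k predictors."""
        return max(50 + 8 * k, 104 + k)

    print(minimum_n(3))   # max(50 + 24, 104 + 3) = max(74, 107) = 107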
Multicollinearity (Regression Assumption)
-Strong correlation between predictor variables
*Perfect collinearity occurs when there is a correlation of 1 between predictors
-Makes results difficult to interpret
-Untrustworthy beta values
-Can't identify the individual importance of each predictor
-Limits the size of R-squared
-Threatens the validity of the model produced
Identifying multicollinearity (Regression Assumption)
-VIF (Variance Inflation Factor)
*If the average VIF is substantially greater than 1, the regression might be biased
*If the largest VIF is greater than 10, there is definitely a problem
-Tolerance
*Tolerance below 0.1 indicates a serious problem
*Tolerance below 0.2 indicates a potential problem
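A minimal Python sketch (statsmodels) showing how VIF and tolerance flag the problem, using deliberately correlated synthetic predictors:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=200)
    x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1
    x3 = rng.normal(size=200)
    X = sm.add_constant(np.column_stack([x1, x2, x3]))

    # One VIF per predictor (column 0 is the constant); tolerance = 1 / VIF
    for i in range(1, X.shape[1]):
        vif = variance_inflation_factor(X, i)
        print(f'predictor {i}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}')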
What are residuals (Regression Assumption)
Distances between regression line and individual data points
Homoscedasticity (Regression Assumption)
-At each level of the predictor, the variance of the residuals should be constant
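A common way to eyeball this is a residuals-versus-fitted plot; a minimal sketch with synthetic data (an even, random spread around zero is what you want to see, while a funnel shape suggests heteroscedasticity):

    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(100, 2)))   # toy predictors
    y = X @ [1.0, 0.5, -0.3] + rng.normal(size=100)  # toy outcome
    model = sm.OLS(y, X).fit()

    plt.scatter(model.fittedvalues, model.resid, s=10)
    plt.axhline(0, linestyle='--')
    plt.xlabel('Fitted values')
    plt.ylabel('Residuals')
    plt.show()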
Independent Errors (Regression Assumption)
-For any two observations (data points), the residuals should not be correlated; they should be independent
-Whether this is an issue can be identified with the Durbin-Watson test
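A minimal sketch of the Durbin-Watson check in Python (statsmodels), on a toy model: the statistic runs from 0 to 4, values near 2 suggest independent errors, and values near 0 or 4 suggest correlated errors:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(5)
    X = sm.add_constant(rng.normal(size=(100, 2)))   # toy predictors
    y = X @ [1.0, 0.5, -0.3] + rng.normal(size=100)  # toy outcome
    model = sm.OLS(y, X).fit()

    print(durbin_watson(model.resid))   # near 2 here, since the toy errors are independent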
Normally Distributed errors (Regression Assumption)
-The residual values in the regression model are random and normally distributed, with a mean of 0. I.e., there is an even chance of points lying above and below the best-fit line
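One way to check this (not named in the deck, so treat it as an illustration) is to inspect the residuals directly, e.g. their mean and a normality test; a minimal sketch with synthetic data:

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(6)
    X = sm.add_constant(rng.normal(size=(100, 2)))   # toy predictors
    y = X @ [1.0, 0.5, -0.3] + rng.normal(size=100)  # toy outcome
    model = sm.OLS(y, X).fit()

    print(model.resid.mean())           # essentially 0
    print(stats.shapiro(model.resid))   # Shapiro-Wilk test of normality (W, p)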
How to check sample size (up-front assumption)
-Calculate the desired sample size in advance
How to check variable types (up-front assumption)
-Make sure your measures provide data appropriate for multiple regression
How to check non-zero variance (up-front assumption)
-Calculate the standard deviation of your variables and check that they have variance > 0
How to check independence (up-front assumption)
-A measurement issue (make sure your outcome scores are all from different people)
-Should not have two or more values on your outcome variable from the same person