shit Flashcards
(22 cards)
What is multiple regression
Used when there is more than one predictor (X) variable while there is one outcome (Y) variable
What are the two uses for multiple regression
To predict Y, given a combination of predictor (IV or X) variables
To assess the relative importance of each predictor variable in explaining the response variable Y
What is the goal of multiple regression
To test the accuracy of a linear model in which the DV is predicted or influenced by several IVs
Determines the proportion of the variance in the DV explained by the IVs
Determines the significance of the model based on R squared
Establish the relative predictive importance of the IVs as well as the strength and direction as a predictor
Comparing simple and multiple regression
Simple linear regression: Y’ = a + bX
Multiple linear regression:
Y’ = a + b1X1 + b2X2 + … + bkXk
where:
b1 = regression coefficient for first predictor variable X1
b2 = regression coefficient for the second predictor variable X2
a = intercept, the value of Y when all predictor variables are 0
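The equation above can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the data are hypothetical and were built without noise so that the recovered values of a, b1 and b2 are easy to check. It also computes R squared, the proportion of variance in Y explained by the predictors.

```python
import numpy as np

# Hypothetical data: two predictors (X1, X2) and one outcome (Y).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 1.0 + 2.0 * X1 + 0.5 * X2  # constructed so that a = 1, b1 = 2, b2 = 0.5

# Design matrix: a column of ones for the intercept a, then X1 and X2.
design = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates of (a, b1, b2).
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b1, b2 = coef

# R squared = 1 - (residual sum of squares / total sum of squares).
predicted = design @ coef
ss_res = np.sum((Y - predicted) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, R^2 = {r_squared:.3f}")
```

With real data the fit will not be perfect, so R squared will fall below 1 and the coefficients will only approximate the underlying relationship.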
What does a multiple regression do
Tests if the model is generalizable to the population
Running a regression analysis is not a simple matter of inputting data, clicking a button and obtaining a “fixed” model of the data
You create a model of your data
- it is a subjective process
- you shape the model you create
- the task is to create the model that best describes the data
- how will you create a visual representation of the research question to reach a model
How many types of multiple regression are there
multiple types
1. Standard Multiple Regression
2. Hierarchical Multiple Regression
3. Sequential/Stepwise Regression
- Forward addition
- Backward selection
- stepwise
4. Combinatorial
Each type will produce a different model, and a different way of explaining outcome variable from different predictor variable
What do we assess in multiple regression
Assesses the relative contribution of each predictor variable to response variable
- which variable contributes most
- which is the second biggest predictor
- which variables don’t seem to contribute to the prediction
What are the things to note in multiple regression
The order in which you enter variables into the analysis influences the model
The variable entered first is attributed more variance
By the time the last variable is entered, there might be very little variance left to explain
What is standard multiple regression
AKA Simultaneous multiple regression
All IVs are entered into the process at the same time
Each IV is evaluated in terms of its prediction of the DV over and above what is predicted by the other IVs
Computer package (SPSS) enters all predictor variables into model simultaneously
- creates a regression equation including all predictor variables
- allows you to assess the unique contribution of each predictor variable when all other variables are held constant
What are advantages and disadvantages of standard multiple regression
Advantage: easy to see which variables significantly predict the response variable
Disadvantage: may not create the best model for predicting Y, as it will include variables that don’t significantly predict Y
What is hierarchical multiple regression
Researcher decides the order in which the variables are entered
- order based on theory and prior research
- order of entry: follows the logic of the theory, with the most important variables entered first
What are other characteristics of hierarchical multiple regression
Allows you to assess whether each predictor adds anything to the model, given the predictors that are already in the model
IVs are entered into the equation in the order specified by the researcher (one at a time)
Each IV is assessed in terms of what it adds to the equation at its point of entry
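The idea of assessing each IV at its point of entry can be sketched as the change in R squared at each step. This is a minimal illustration, assuming NumPy; the variable names (age, stress, sleep), the simulated data and the helper functions are hypothetical. It also shows why order of entry matters: the same IV is credited with more variance when entered first than when entered last.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def hierarchical_r2_changes(predictors, y, entry_order):
    """Enter IVs one at a time and record the R^2 each adds at its point of entry."""
    entered, previous_r2, changes = [], 0.0, {}
    for name in entry_order:
        entered.append(name)
        X = np.column_stack([predictors[v] for v in entered])
        r2 = r_squared(X, y)
        changes[name] = r2 - previous_r2
        previous_r2 = r2
    return changes

# Simulated (hypothetical) data in which age and stress are correlated.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(size=n)
stress = 0.6 * age + rng.normal(size=n)
sleep = rng.normal(size=n)
y = 1.0 * age + 0.5 * stress + 0.3 * sleep + rng.normal(size=n)
predictors = {"age": age, "stress": stress, "sleep": sleep}

# Order chosen by the researcher, versus the same IVs with age entered last.
age_first = hierarchical_r2_changes(predictors, y, ["age", "stress", "sleep"])
age_last = hierarchical_r2_changes(predictors, y, ["sleep", "stress", "age"])

for name in predictors:
    print(f"{name}: entered age-first {age_first[name]:.3f}, age-last {age_last[name]:.3f}")
```

Both orders end at the same total R squared; only the way that total is divided among the IVs changes.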
What is sequential model
It aims to create the best model - combination of variables that best predicts the response variable
IVs are entered into the equation in an order specified by SPSS
The IV with the highest correlation with the DV is entered first, followed by the IV with the next highest correlation while controlling for the first, and so on, until all are entered
Builds several models in a series of steps, adding or deleting variables at each step, depending on contribution to predicting the response variable
Final model includes only variables which significantly and uniquely predict the response variable
What is forward addition in sequential models
Begins with only one variable in the model - the variable that makes the biggest contribution to response variable (highest r)
Adds the variable with the next highest contribution
continues to add variables until there are no more variables that make significant contributions to the response variable over and above the variables that are already in the equation
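Forward addition can be sketched as a loop that keeps adding the predictor with the largest gain. This is a rough illustration, assuming NumPy; note that it swaps in a minimum gain in R squared (the hypothetical `min_gain` parameter) as a simple stand-in for the significance test that a package such as SPSS would apply. The data and function names are made up for the example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def forward_addition(predictors, y, min_gain=0.01):
    """Add the predictor with the largest R^2 gain until no gain reaches min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        # R^2 gain from adding each remaining predictor to the current model.
        gains = {}
        for name in remaining:
            X = np.column_stack([predictors[v] for v in selected + [name]])
            gains[name] = r_squared(X, y) - best_r2
        candidate = max(gains, key=gains.get)
        if gains[candidate] < min_gain:
            break  # nothing left adds enough over and above the current model
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 += gains[candidate]
    return selected, best_r2

# Simulated (hypothetical) data: x1 and x2 predict y, "noise" does not.
rng = np.random.default_rng(1)
n = 300
predictors = {
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
y = 2.0 * predictors["x1"] + 1.0 * predictors["x2"] + rng.normal(size=n)

selected, final_r2 = forward_addition(predictors, y)
print(selected, round(float(final_r2), 3))
```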
What is backward selection in sequential models
Begins with all predictor variables in the model and successively deletes variables until only significant ones remain
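Backward selection can be sketched in the same spirit. This is a minimal illustration, assuming NumPy; instead of a significance test it drops the predictor whose removal costs the least R squared, as long as that loss stays below a hypothetical `max_loss` threshold. The data and function names are invented for the example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def backward_selection(predictors, y, max_loss=0.01):
    """Start with all predictors and drop the weakest while the R^2 loss is small."""
    selected = list(predictors)
    # This sketch always keeps at least one predictor in the model.
    while len(selected) > 1:
        full_r2 = r_squared(np.column_stack([predictors[v] for v in selected]), y)
        # R^2 lost by removing each predictor from the current model.
        losses = {}
        for name in selected:
            kept = [v for v in selected if v != name]
            reduced_r2 = r_squared(np.column_stack([predictors[v] for v in kept]), y)
            losses[name] = full_r2 - reduced_r2
        weakest = min(losses, key=losses.get)
        if losses[weakest] >= max_loss:
            break  # every remaining predictor contributes enough to stay
        selected.remove(weakest)
    return selected

# Simulated (hypothetical) data: only x1 and x2 are related to y.
rng = np.random.default_rng(2)
n = 300
predictors = {
    "x1": rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
}
y = 2.0 * predictors["x1"] + 1.0 * predictors["x2"] + rng.normal(size=n)

remaining = backward_selection(predictors, y)
print(remaining)
```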
What is stepwise regression in sequential models
Similar to the previous two, but more versatile
Generally moves forward, adding significant variables
BUT is able to move backward to eliminate a variable if it no longer significantly predicts when another variable is added
What are caveats about sequential methods
Inclusion in the model depends on mathematical criterion rather than psychological theory or research
Variable selection could depend upon tiny differences in correlation between each predictor variable and the response variable
Difficult to replicate results
Can be misleading especially for small sample sizes
Requires a large sample size (40 cases per IV needed)
What are combinatorial methods in sequential models
Best subsets method
Computes models with all possible combinations of the predictor variables and chooses the model that explains the most variance in the response variable
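A best subsets search can be sketched by fitting every combination of predictors. This is a minimal illustration, assuming NumPy and simulated, hypothetical data. One caveat worth seeing in code: raw R squared never decreases when a predictor is added, so ranking by it alone always picks the full model. As an assumption of this sketch (not something stated on the card), adjusted R squared is also computed, since it penalizes extra predictors.

```python
from itertools import combinations

import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n cases and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Simulated (hypothetical) data: x1 and x2 predict y, "noise" does not.
rng = np.random.default_rng(4)
n = 300
predictors = {
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "noise": rng.normal(size=n),
}
y = 2.0 * predictors["x1"] + 1.0 * predictors["x2"] + rng.normal(size=n)

# Fit one model per non-empty combination of predictors.
names = list(predictors)
results = {}
for size in range(1, len(names) + 1):
    for subset in combinations(names, size):
        X = np.column_stack([predictors[v] for v in subset])
        r2 = r_squared(X, y)
        results[subset] = (r2, adjusted_r_squared(r2, n, size))

best_by_r2 = max(results, key=lambda s: results[s][0])
best_by_adjusted = max(results, key=lambda s: results[s][1])
print("best by R^2:", best_by_r2)
print("best by adjusted R^2:", best_by_adjusted)
```

With k predictors there are 2^k - 1 non-empty subsets, so this approach becomes expensive quickly as predictors are added.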
What are assumptions of multiple regression
Continuous variables
Non-zero correlation between any IV and the DV - BUT, no multi-collinearity
Absence of outliers
Normal distribution - in the presence of skewness, try to get more data
Homoscedasticity
Linearity of relationship
Large sample size
What is multicollinearity
Multicollinear variables measure the same, or nearly the same, concept or common value
One variable may push down the contribution of the other, making it appear non-significant even when it is actually significant
Distortion of prediction
What is the rule of thumb for multicollinearity
If r > .80 between two predictor variables, the variables are multicollinear
How to solve: select just one variable and remove the other
we must control for multicollinearity
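The rule of thumb can be checked by scanning the correlations between every pair of IVs. This is a minimal sketch, assuming NumPy; the variable names and simulated data are hypothetical. As an assumption of the sketch, the absolute value of r is used, so that strong negative correlations are flagged as well.

```python
from itertools import combinations

import numpy as np

# Simulated (hypothetical) IVs: "worry" is built to overlap heavily with "anxiety".
rng = np.random.default_rng(3)
n = 200
anxiety = rng.normal(size=n)
worry = anxiety + 0.3 * rng.normal(size=n)
exercise = rng.normal(size=n)

ivs = {"anxiety": anxiety, "worry": worry, "exercise": exercise}
names = list(ivs)

# Correlation matrix of the IVs (each row passed to corrcoef is one variable).
corr = np.corrcoef(np.array([ivs[v] for v in names]))

# Flag every pair of IVs whose correlation exceeds the .80 rule of thumb.
threshold = 0.80
flagged = [
    (a, b)
    for a, b in combinations(names, 2)
    if abs(corr[names.index(a), names.index(b)]) > threshold
]
print(flagged)
```

For any flagged pair, the card's remedy applies: keep one of the two variables and remove the other before running the regression.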
What are the other assumptions of multiple regression
If the data fail to comply with the requirements:
- remove outliers
- transform skewed DV or IVs
If you cannot do either, report the violations of normality/assumptions
Risk that you cannot generalize to the whole population