week 4 Flashcards
(51 cards)
How many predictor variables in simple linear regression?
In simple linear regression, we only have one predictor variable to predict the criterion variable.
How many predictor variables in multiple linear regression?
In multiple linear regression, we can have two or more predictor variables to predict the criterion variable.
Application of Multiple Linear Regression
Applications of multiple linear regression are ubiquitous.
§ Psychology Research
§ use different personality traits as predictor variables to predict a life outcome (e.g., GPA).
§ use couple interaction measures to predict relationship satisfaction.
§ Marketing Research:
§ use expenditure, pricing, and market conditions to predict sales.
§ Finance and Economics:
§ use interest rates, trading volumes, and market sentiment measures to predict the stock market price.
§ Natural Language Processing:
§ use average sentence length, vocabulary richness, and frequency of complex words to predict the overall readability of a text.
Almost all advanced statistical methods are extensions of _________________________________
multiple linear regression.
§ e.g., structural equation modelling, multilevel modelling
What is a bivariate regression?
multiple regression with exactly two predictor variables (x1 and x2) to predict the criterion variable y.
The population bivariate regression model is denoted as:
µyi|xi = β0 + β1x1 + β2x2
x1 and x2 are scores on the predictor variables.
β0 is the population intercept.
β1 and β2 are population regression coefficients for x1 and x2, respectively.
µyi|xi is the predicted score on the criterion variable for participant i using the population regression model.
The sample bivariate regression model is denoted as:
yˆi = βˆ0 + βˆ1x1 + βˆ2x2
x1 and x2 are scores on the predictor variables.
§ βˆ0 is the estimate of the population intercept β0.
§ βˆ1 and βˆ2 are the estimates of the population regression coefficients β1 and β2, respectively.
§ yˆi is the predicted score on the criterion variable for participant i using the sample regression model.
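Below is a minimal sketch (not from the course materials) of how the sample bivariate regression could be estimated in Python; the simulated data and the variable names are assumptions for illustration only.

```python
# Sketch: least-squares estimation of yˆ = βˆ0 + βˆ1*x1 + βˆ2*x2 on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                                         # predictor 1 (e.g., stress)
x2 = rng.normal(size=n)                                         # predictor 2 (e.g., loneliness)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n)   # simulated criterion

X = np.column_stack([np.ones(n), x1, x2])       # design matrix with intercept column
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # [βˆ0, βˆ1, βˆ2]
y_hat = X @ b_hat                               # predicted scores yˆi
print(b_hat)
```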
In a simple linear regression, the regression equation represents a line in what dimension
2D (two dimensions)
Can a regression with more than two predictors be represented using graphs?
No. With more than two predictors, the regression equation represents a hyperplane in more than three dimensions, which cannot be shown in a graph.
In a bivariate linear regression, the regression equation represents a plane in what dimension
3D (three dimensions)
Least Square Estimation Method in Bivariate Regression
- What does it involve
- What does the residual represent?
The least-squares method in bivariate regression also involves minimizing the sum of squared residuals (SSresidual).
The residual represents the vertical distance between the regression plane and the data points.
Minimizing SSresidual is minimizing the sum of the squared vertical distances between the regression plane and the data points.
We obtain the regression plane for which the sum of the squared vertical distances is at its minimum.
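A sketch with simulated (assumed) data showing that the least-squares plane gives the smallest SSresidual; any other plane produces a larger sum of squared vertical distances.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares plane
ss_residual = np.sum((y - X @ b_ls) ** 2)      # SSresidual for that plane

b_other = b_ls + np.array([0.0, 0.1, 0.0])     # any other plane (βˆ1 nudged by 0.1)...
ss_other = np.sum((y - X @ b_other) ** 2)
print(ss_residual < ss_other)                  # ...has a larger SSresidual (True)
```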
The intercept is the amount of y when…
x1 and x2 are both at 0
The regression coefficient βˆ1 represents….
the slope between x1 and yˆ
What does βˆ1 = 0.4 really mean?
While holding x2 (loneliness) constant, for a one-unit increase in x1 (stress), there is a 0.4-unit increase in yˆ (predicted illness).
βˆ1 represents the effect of x1 on yˆ while controlling for x2 (see the sketch below).
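A small worked sketch (the estimates and variable names are assumed for illustration): increasing x1 by one unit while x2 is held at any fixed value changes yˆ by exactly βˆ1 = 0.4 units.

```python
# Assumed estimates for illustration only.
b0_hat, b1_hat, b2_hat = 2.0, 0.4, 0.25

x2_fixed = 3.0                                              # hold loneliness constant at any value
yhat_before = b0_hat + b1_hat * 5.0 + b2_hat * x2_fixed     # stress = 5
yhat_after  = b0_hat + b1_hat * 6.0 + b2_hat * x2_fixed     # stress = 6
print(yhat_after - yhat_before)                             # 0.4, regardless of x2_fixed
```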
The regression coefficient in the bivariate regression is a __________ coefficient
CONDITIONAL
(partialling out the effect of the other predictor in the model)
In multiple regression, the regression coefficient is also called the “partial regression coefficient”.
To further demonstrate what it means to partial out the effect of the other predictor on the criterion variable, we can compute βˆ1 in the bivariate regression using a two-stage method.
Stage 1: Find the part of x1 that is uncorrelated with x2, which we will call e1.
§ 1.1 Run a simple regression model using x2 to predict x1.
§ 1.2 Then find the residual vector, which we will call e1.
§ The residuals are the part of x1 that is uncorrelated with x2.
Stage 2: Use e1 to predict y.
§ In other words, we use the part of x1 that is uncorrelated with x2 to predict y, hence partialling out the effect of x2 on y (see the sketch below).
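A sketch of the two-stage computation on simulated (assumed) data, showing that the slope for e1 matches βˆ1 from the full bivariate regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                     # x1 correlated with x2
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

# Full bivariate regression: y on x1 and x2.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Stage 1: regress x1 on x2 and keep the residuals e1 (part of x1 uncorrelated with x2).
X2 = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(X2, x1, rcond=None)
e1 = x1 - X2 @ g

# Stage 2: use e1 to predict y; the slope equals βˆ1 from the full model.
E = np.column_stack([np.ones(n), e1])
b_stage2, *_ = np.linalg.lstsq(E, y, rcond=None)
print(np.isclose(b_full[1], b_stage2[1]))              # True
```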
How would you interpret βˆ0 in a multiple regression with many predictors?
When all predictors are at 0, the predicted score (yˆ) is βˆ0 units.
How would you interpret βˆ1 in a multiple regression with many predictors?
Holding all other predictors constant, for a one-unit change in x1, there is a βˆ1-unit change in yˆ.
When interpreting a standardized partial regression coefficient, the only thing you need to change is…
The unit is the standard deviation (SD) unit.
Holding zx2 (or x2) constant at a specific value, for a one-standard-deviation change in zx1 (or x1), there is a βˆ1 standard-deviation change in zˆy (or yˆ).
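A sketch on simulated (assumed) data: z-scoring y, x1, and x2 before fitting turns the partial regression coefficients into standardized coefficients expressed in SD units.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

def z(v):
    return (v - v.mean()) / v.std(ddof=1)    # convert scores to SD units

Z = np.column_stack([np.ones(n), z(x1), z(x2)])
b_std, *_ = np.linalg.lstsq(Z, z(y), rcond=None)
print(b_std)   # intercept ≈ 0; slopes = SD change in yˆ per SD change in each predictor
```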
Why do we want to partial out (or remove) the effect of all the other predictors?
One of the main reasons we want to partial out the effects of other predictors is to control for confounding variables.
Controlling for confounding variables by including them in a regression model is called statistical control.
Often you want to control for demographic variables: “statistically controlling for age, ethnicity, gender, etc”.
Other times, you may want to control for a substantive variable due to research interest.
§ e.g., study the effect of anxiety on performance controlling for depression.
§ i.e., interested in the part of the anxiety that is not related to depression.
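A sketch of statistical control on simulated data (all effect sizes are assumptions): depression confounds the anxiety-performance relation, so the anxiety coefficient changes once depression is included in the model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
depression = rng.normal(size=n)
anxiety = 0.7 * depression + rng.normal(size=n)              # anxiety related to depression
performance = -0.2 * anxiety - 0.5 * depression + rng.normal(size=n)

X_simple  = np.column_stack([np.ones(n), anxiety])                # no control
X_control = np.column_stack([np.ones(n), anxiety, depression])    # statistical control
b_simple,  *_ = np.linalg.lstsq(X_simple, performance, rcond=None)
b_control, *_ = np.linalg.lstsq(X_control, performance, rcond=None)
print(b_simple[1], b_control[1])   # anxiety coefficient without vs. with statistical control
```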
What are other ways of partialling out the effect of other variables?
Another way of controlling for other variables is through random assignment in an experiment.
§ By randomly assigning participants to different conditions, we are automatically holding all other variables constant across conditions.
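A sketch with simulated (assumed) data: with random assignment, the conditions end up nearly equal on a background variable even though it was never measured or entered into a model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000
age = rng.normal(40, 10, size=n)                          # an unmeasured background variable
condition = rng.permutation(np.repeat([0, 1], n // 2))    # random assignment to two conditions
print(age[condition == 0].mean(), age[condition == 1].mean())   # group means are nearly equal
```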
Statistical Control: advantages and disadvantages
Advantages:
§ Easy to include predictors as long as you measure them.
Disadvantages:
§ Cannot infer causation.
§ Need to measure the predictors accurately.
§ There is a potentially unlimited number of variables you may need to control for.
Experimental Control: advantages and disadvantages
Advantages:
§ Can infer causation in an experimental study.
§ Can control for all other variables.
Disadvantages:
§ Can’t randomly assign some variables due to ethical issues.
§ Demand characteristics: participants change their behaviour because they know they are being manipulated.
§ Simple regression: minimize the sum of squared vertical distances to a line in 2D.