week 4 Flashcards
(51 cards)
How many predictor variables in simple linear regression?
In simple linear regression, we only have one predictor variable to predict the criterion variable.
How many predictor variables in multiple linear regression?
In multiple linear regression, we can have two or more predictor variables to predict the criterion variable.
Application of Multiple Linear Regression
Applications of multiple linear regression are ubiquitous.
§ Psychology Research
§ use different personality traits as predictor variables to predict a life outcome (e.g., GPA).
§ use couple interaction measures to predict relationship satisfaction.
§ Marketing Research:
§ use expenditure, pricing, and market conditions to predict sales.
§ Finance and Economics:
§ use interest rates, trading volumes, and market sentiment measures to predict the stock market price.
§ Natural Language Processing:
§ use average sentence length, vocabulary richness, and frequency of complex words to predict the overall readability of a text.
Almost all advanced statistical methods are extensions of _________________________________
multiple linear regression.
§ e.g., structural equation modelling, multilevel modelling
What is a bivariate regression?
multiple regression with exactly two predictor variables (x1 and x2) to predict the criterion variable y.
The population bivariate regression model is denoted as:
µyi|xi = β0 + β1x1 + β2x2
x1 and x2 are scores on the predictor variables.
β0 is the population intercept.
β1 and β2 are population regression coefficients for x1 and x2, respectively.
µyi|xi is the predicted score on the criterion variable for participant i using the population regression model.
The sample bivariate regression model is denoted as:
yˆi = βˆ0 + βˆ1x1 + βˆ2x2
x1 and x2 are scores on the predictor variables.
§ βˆ0 is the estimate of the population intercept β0.
§ βˆ1 and βˆ2 are the estimates of the population regression coefficients β1 and β2, respectively.
§ yˆi is the predicted score on the criterion variable for participant i using the sample regression model.
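Below is a minimal sketch (not from the course materials) of how the sample bivariate regression could be estimated in Python; the simulated data and the variable names are assumptions for illustration only.

```python
# Sketch: least-squares estimation of yˆ = βˆ0 + βˆ1*x1 + βˆ2*x2 on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)                                         # predictor 1 (e.g., stress)
x2 = rng.normal(size=n)                                         # predictor 2 (e.g., loneliness)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n)   # simulated criterion

X = np.column_stack([np.ones(n), x1, x2])       # design matrix with intercept column
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # [βˆ0, βˆ1, βˆ2]
y_hat = X @ b_hat                               # predicted scores yˆi
print(b_hat)
```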
In a simple linear regression, the regression equation represents a line in what dimension
2D (two dimensions)
Can a regression with more than two predictors be represented using graphs?
No. With more than two predictors, the regression equation represents a hyperplane in more than three dimensions, which cannot be shown in a graph.
In a bivariate linear regression, the regression equation represents a plane in what dimension
3D (three dimensions)
Least Square Estimation Method in Bivariate Regression
- What does it involve
- What does the residual represent?
The least-squares method in bivariate regression also involves minimizing the sum of squared residuals (SSresidual).
The residual represents the vertical distance between the regression plane and the data points.
Minimizing SSresidual is minimizing the sum of the squared vertical distances between the regression plane and the data points.
We obtain the regression plane for which the sum of the squared vertical distances is at its minimum.
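A sketch with simulated (assumed) data showing that the least-squares plane gives the smallest SSresidual; any other plane produces a larger sum of squared vertical distances.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares plane
ss_residual = np.sum((y - X @ b_ls) ** 2)      # SSresidual for that plane

b_other = b_ls + np.array([0.0, 0.1, 0.0])     # any other plane (βˆ1 nudged by 0.1)...
ss_other = np.sum((y - X @ b_other) ** 2)
print(ss_residual < ss_other)                  # ...has a larger SSresidual (True)
```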
The intercept is the amount of y when…
x1 and x2 are both at 0
The regression coefficient βˆ1 represents….
the slope between x1 and yˆ
What does βˆ1 = 0.4 really mean?
While holding x2 (loneliness) constant, for a one-unit increase in x1 (stress), there is a 0.4-unit increase in yˆ (predicted illness).
βˆ1 represents the effect of x1 on yˆ while controlling for x2 (see the sketch below).
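A small worked sketch (the estimates and variable names are assumed for illustration): increasing x1 by one unit while x2 is held at any fixed value changes yˆ by exactly βˆ1 = 0.4 units.

```python
# Assumed estimates for illustration only.
b0_hat, b1_hat, b2_hat = 2.0, 0.4, 0.25

x2_fixed = 3.0                                              # hold loneliness constant at any value
yhat_before = b0_hat + b1_hat * 5.0 + b2_hat * x2_fixed     # stress = 5
yhat_after  = b0_hat + b1_hat * 6.0 + b2_hat * x2_fixed     # stress = 6
print(yhat_after - yhat_before)                             # 0.4, regardless of x2_fixed
```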
The regression coefficient in the bivariate regression is a __________ coefficient
CONDITIONAL
(partialling out the effect of the other predictor in the model)
In multiple regression, the regression coefficient is also called the “partial regression coefficient”.
To further demonstrate what it means to partial out the effect of the other predictor on the criterion variable, we can compute βˆ1 in the bivariate regression using a two-stage method.
Stage 1: Find the part of x1 that is uncorrelated with x2, which we will call e1.
§ 1.1 Run a simple regression model using x2 to predict x1.
§ 1.2 Then find the residual vector, which we will call e1.
§ The residuals are the part of x1 that is uncorrelated with x2.
Stage 2: Use e1 to predict y.
§ In other words, we use the part of x1 that is uncorrelated with x2 to predict y, hence partialling out the effect of x2 on y (see the sketch below).
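A sketch of the two-stage computation on simulated (assumed) data, showing that the slope for e1 matches βˆ1 from the full bivariate regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                     # x1 correlated with x2
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

# Full bivariate regression: y on x1 and x2.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Stage 1: regress x1 on x2 and keep the residuals e1 (part of x1 uncorrelated with x2).
X2 = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(X2, x1, rcond=None)
e1 = x1 - X2 @ g

# Stage 2: use e1 to predict y; the slope equals βˆ1 from the full model.
E = np.column_stack([np.ones(n), e1])
b_stage2, *_ = np.linalg.lstsq(E, y, rcond=None)
print(np.isclose(b_full[1], b_stage2[1]))              # True
```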
How would you interpret βˆ0 in a multiple regression with many predictors?
When all predictors are at 0, the predicted score (yˆ) is βˆ0 units.
How would you interpret βˆ1 in a multiple regression with many predictors?
Holding all other predictors constant, for a one-unit change in x1, there is a βˆ1-unit change in yˆ.
When interpreting a standardized partial regression coefficient, the only thing you need to change is…
The unit is the standard deviation (SD) unit.
Holding zx2 (or x2) constant at a specific value, for a one-standard-deviation change in zx1 (or x1), there is a βˆ1 standard-deviation change in zˆy (or yˆ).
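A sketch on simulated (assumed) data: z-scoring y, x1, and x2 before fitting turns the partial regression coefficients into standardized coefficients expressed in SD units.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

def z(v):
    return (v - v.mean()) / v.std(ddof=1)    # convert scores to SD units

Z = np.column_stack([np.ones(n), z(x1), z(x2)])
b_std, *_ = np.linalg.lstsq(Z, z(y), rcond=None)
print(b_std)   # intercept ≈ 0; slopes = SD change in yˆ per SD change in each predictor
```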
Why do we want to partial out (or remove) the effect of all the other predictors?
One of the main reasons we want to partial out the effects of other predictors is to control for confounding variables.
Controlling for confounding variables by including them in a regression model is called statistical control.
Often you want to control for demographic variables: “statistically controlling for age, ethnicity, gender, etc”.
Other times, you may want to control for a substantive variable due to research interest.
§ e.g., study the effect of anxiety on performance controlling for depression.
§ i.e., interested in the part of the anxiety that is not related to depression.
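A sketch of statistical control on simulated data (all effect sizes are assumptions): depression confounds the anxiety-performance relation, so the anxiety coefficient changes once depression is included in the model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
depression = rng.normal(size=n)
anxiety = 0.7 * depression + rng.normal(size=n)              # anxiety related to depression
performance = -0.2 * anxiety - 0.5 * depression + rng.normal(size=n)

X_simple  = np.column_stack([np.ones(n), anxiety])                # no control
X_control = np.column_stack([np.ones(n), anxiety, depression])    # statistical control
b_simple,  *_ = np.linalg.lstsq(X_simple, performance, rcond=None)
b_control, *_ = np.linalg.lstsq(X_control, performance, rcond=None)
print(b_simple[1], b_control[1])   # anxiety coefficient without vs. with statistical control
```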
What are other ways of partialling out the effect of other variables?
Another way of controlling for other variables is through random assignment in an experiment.
§ By randomly assigning participants to different conditions, we are automatically holding all other variables constant across conditions.
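A sketch with simulated (assumed) data: with random assignment, the conditions end up nearly equal on a background variable even though it was never measured or entered into a model.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000
age = rng.normal(40, 10, size=n)                          # an unmeasured background variable
condition = rng.permutation(np.repeat([0, 1], n // 2))    # random assignment to two conditions
print(age[condition == 0].mean(), age[condition == 1].mean())   # group means are nearly equal
```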
Statistical Control: advantages and disadvantages
Advantages:
§ Easy to include predictors as long as you measure them.
Disadvantages:
§ Cannot infer causation.
§ Need to measure the predictors accurately.
§ There is a potentially unlimited number of variables you may need to control for.
Experimental Control: advantages and disadvantages
Advantages:
§ Can infer causation in an experimental study.
§ Can control for all other variables.
Disadvantages:
§ Can’t randomly assign some variables due to ethical issues.
§ Demand characteristics: participants change their behaviour because they know they are being manipulated.
§ Simple regression: minimize the sum of squared vertical distances to a line in 2D.