Regression Flashcards

1
Q

Regression is…

A

used to understand the relationship between variables

2
Q

Independent variable (X)

A

Predictor or regressor

3
Q

Dependent variable (y)

A

Outcome or response

4
Q

Goal of regression

A

Predict changes in Y based on X

5
Q

Correlation vs regression

A

Correlation: measures the strength and direction of a linear relationship

Regression: predicts Y based on X

Shared variance (r^2): the proportion of Y's variance explained by X

6
Q

Simple linear Regression Equation

A

Y = B0 + B1 X

7
Q

B0

A

Intercept: value of Y when X is 0

8
Q

B1

A

Slope: Change in Y per unit X

9
Q

e

A

Error: Difference between observed and predicted Y

10
Q

Regression using a SAMPLE of the population

A

The sample provides estimates of the intercept, the slope, and the predicted values of Y

11
Q

Predicted values of Y are…

A

Points on the regression line that correspond to the given values of X

12
Q

Residuals (ê) are…

A

distances between observed and predicted values of Y for corresponding X

13
Q

Equations for the Slope (B1)

A

B1 = SPxy / SSx = rxy(sY / sX)
14
Q

Equation for the Intercept (B0)

A

B0 = Ȳ - B1X̄ (the mean of Y minus the slope times the mean of X)
15
Q

Correlation equation

A

rxy = SPxy / √(SSx × SSy)
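
A minimal Python/NumPy sketch (the data values and variable names are illustrative, not from the deck) that computes the slope, intercept, and correlation from the formulas on cards 13-15 and cross-checks them against NumPy's own routines:

```python
import numpy as np

# Hypothetical illustration data (not from the deck)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

sp_xy = np.sum((x - x.mean()) * (y - y.mean()))  # sum of cross-products, SPxy
ss_x = np.sum((x - x.mean()) ** 2)               # sum of squares of X, SSx
ss_y = np.sum((y - y.mean()) ** 2)               # sum of squares of Y, SSy

b1 = sp_xy / ss_x                    # slope (card 13)
b0 = y.mean() - b1 * x.mean()        # intercept (card 14)
r = sp_xy / np.sqrt(ss_x * ss_y)     # correlation (card 15)

# Cross-check against NumPy's built-in fits
assert np.allclose([b1, b0], np.polyfit(x, y, 1))
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(f"b0={b0:.3f}, b1={b1:.3f}, r={r:.3f}, r^2={r**2:.3f}")
```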
16
Q

What do you do with the r to find the proportion of shared variance?

A

rxy^2

Square it

17
Q

1 - rxy^2 is the…

A

The proportion of Y's variance that is independent of X

18
Q

Suppose we observe a high correlation between a child's weight and their reading ability. This correlation is likely due to age; how can we combat the confound?

A

We can control for the hypothesized influence of age on reading ability by removing the shared variance between age and weight
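
One hedged way to see "removing the shared variance" in practice is to residualize weight on age and then correlate what is left with reading ability (a semipartial correlation). The sketch below uses simulated Python/NumPy data; the variable names and numbers are illustrative, not the deck's example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustration: age drives both weight and reading ability
age = rng.uniform(5, 12, size=200)
weight = 2.5 * age + rng.normal(0, 2, size=200)
reading = 4.0 * age + rng.normal(0, 3, size=200)

def residualize(y, x):
    """Return the part of y that is linearly independent of x."""
    b1, b0 = np.polyfit(x, y, 1)
    return y - (b0 + b1 * x)

raw_r = np.corrcoef(weight, reading)[0, 1]                # inflated by the age confound
weight_resid = residualize(weight, age)                   # weight with age's share removed
controlled_r = np.corrcoef(weight_resid, reading)[0, 1]   # should be near zero here

print(f"raw r(weight, reading) = {raw_r:.2f}")
print(f"r after removing age from weight = {controlled_r:.2f}")
```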

19
Q

Squared Multiple Correlation (R-squared) formula:

A

The SMC represents the proportion of variance in Y shared with (or “explained by”) the set of all X variables

For two predictors: R^2 = (rY1^2 + rY2^2 - 2(rY1)(rY2)(r12)) / (1 - r12^2)

Numerator: proportion of non-redundant variance in Y shared with X1 and X2

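A hedged numerical check of the two-predictor formula above: fit Y on X1 and X2 by least squares, compute R^2 as SSM/SST, and compare it with the value built from the three pairwise correlations. The data and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                 # correlated predictors
y = 1.0 + 0.8 * x1 + 0.4 * x2 + rng.normal(size=n)

# R^2 from a least-squares fit
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
r2_fit = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# R^2 from the correlation-based formula on this card
r_y1 = np.corrcoef(y, x1)[0, 1]
r_y2 = np.corrcoef(y, x2)[0, 1]
r_12 = np.corrcoef(x1, x2)[0, 1]
r2_formula = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

assert np.isclose(r2_fit, r2_formula)
print(f"R^2 = {r2_fit:.3f}")
```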
20
Q

Shared variance in Prediction

A

In two-predictor regression, we are interested in imposing statistical control over X2 to test the unique effect of X1

21
Q

Goal of Multiple Regression

A
  1. Evaluate the unique effect of X predictors on Y outcomes (holding constant other X)
  2. Determine the incremental contribution of new X predictors to estimating variance in Y (in addition to X already in the model)
  3. Determine the amount of variance explained in Y from a set of X predictors
22
Q

To determine Incremental contribution to the model we use…

A

Squared semipartial correlation

23
Q

To determine variance explained in Y we use…

A

Squared multiple correlation
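
To make the two quantities above concrete: the squared semipartial correlation of a new predictor equals the increase in R^2 when it is added to the model. A minimal sketch (illustrative data and names, reusing ordinary least squares):

```python
import numpy as np

def r_squared(y, predictors):
    """Squared multiple correlation of y regressed on the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 0.7 * x1 + 0.5 * x2 + rng.normal(size=n)

r2_full = r_squared(y, [x1, x2])   # squared multiple correlation (card 23)
r2_reduced = r_squared(y, [x1])    # model without x2
sr2_x2 = r2_full - r2_reduced      # squared semipartial for x2 (card 22)
print(f"R^2 full = {r2_full:.3f}, incremental contribution of x2 = {sr2_x2:.3f}")
```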

24
Q

Regression is a method of finding an equation to describe…

A

The line of best fit for a set of data

25
How to define "best fitting" line when there are so many possibilities?
The line that best fits the actual data is the one that **minimizes prediction errors**
26
Error of prediction is...
the distance each point is from the regression line (Y- Ŷ)
27
Least-squared-error solution
Procedure that produces a line that minimizes the squared error of prediction
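
A small illustration of "minimizes the squared error of prediction": the least-squares line should have a smaller sum of squared errors than any nearby alternative line. Data values are hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.5, 8.0, 9.5])

def sse(b0, b1):
    """Sum of squared prediction errors for the line Y_hat = b0 + b1*X."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

b1_ls, b0_ls = np.polyfit(x, y, 1)   # least-squares slope and intercept

# The least-squares line beats perturbed versions of itself
assert sse(b0_ls, b1_ls) <= sse(b0_ls + 0.3, b1_ls)
assert sse(b0_ls, b1_ls) <= sse(b0_ls, b1_ls - 0.2)
print(f"minimum SSE = {sse(b0_ls, b1_ls):.3f}")
```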
28
Linear model with several predictors
The linear model can be expanded to include as many predictors as you like. Expanded formula: Yi = (b0 + b1X1i + b2X2i) + ei
29
r can be thought of as a standardized version of...
b (slope)
30
Model Estimation
Total sums of squares = Model (regression) sums of squares + Residual sums of squares: SST = SSM + SSR
31
Residual Sums of Squares (SSr)
Gauge of how well a particular line fits the data
32
Limitation of the Residual Sums of Squares (SSR)
Tells us how much error there is in the model, but not whether the model is a better fit than nothing. We need to compare our model against a baseline model; the mean is a model of no relationship
33
Sums of Squares Total (SST)
The differences between observed values and the values predicted by the mean
34
Sums of Squares Model (SSM)
The difference between SST and SSR
35
SSY and SST notation is
Sums of Squares Total, df_Y = n - 1. It consists of the Sums of Squares Regression (df_regression = 1, one predictor) plus the Sums of Squares Residual (df_residual = n - 2)
36
if SSM is large the regression model...
is very different from using the mean to predict the outcome variable. This implies that the regression model has made a big improvement to how well the outcome variable can be predicted. If it is small, then using the regression model is little better than using the mean
37
Variance explained by the regression model (R^2) formula
R^2 = SSM / SST
38
Mean Squares Regression Formula
MSM = SSM / df_M, where df_M = k (the number of predictors)
39
Mean Squares Residual Formula
MSR = SSR / df_R, where df_R = n - k - 1
40
F Statistic Formula
F = MSM / MSR
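
A hedged sketch of how the sums of squares, R^2, mean squares, and F statistic above fit together for simple regression (k = 1 predictor), using NumPy and SciPy with illustrative data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.2, 2.8, 4.5, 4.1, 6.0, 6.3, 7.9, 8.4])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_t = np.sum((y - y.mean()) ** 2)   # SST: deviations from the mean model
ss_r = np.sum((y - y_hat) ** 2)      # SSR: residual error around the line
ss_m = ss_t - ss_r                   # SSM: improvement over the mean

r_squared = ss_m / ss_t              # card 37
ms_m = ss_m / k                      # card 38 (df_M = k)
ms_r = ss_r / (n - k - 1)            # card 39 (df_R = n - k - 1)
f_stat = ms_m / ms_r                 # card 40
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f"R^2 = {r_squared:.3f}, F({k}, {n - k - 1}) = {f_stat:.2f}, p = {p_value:.4f}")
```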
41
Assessing individual predictors
Use a t-test on each regression coefficient: t = b / SE(b), df = n - k - 1 (tests whether b differs significantly from 0)
42
Bivariate observations variable measurement scale
Interval
43
Different notation for sample and population regression statistics
Population parameters are written with Greek letters (β0, β1, ε); sample estimates use Roman letters, often with hats (b0, b1, ê, Ŷ)
44
A test of (rho=0)
If rho = 0, then the sampling distribution of r is approximately normal, with an expected value of rho (here 0) and an estimated standard error Sr, given below (where n is the number of bivariate 'pairs' of observations):

Sr = √((1 - r^2) / (n - 2))
45
To test the hypothesis that rho = 0 (formula)
t = r / Sr = r√(n - 2) / √(1 - r^2), df = n - 2
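
A hedged check of this test against SciPy's built-in Pearson correlation test (illustrative simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
n = len(x)

r = np.corrcoef(x, y)[0, 1]
s_r = np.sqrt((1 - r**2) / (n - 2))      # estimated standard error of r (card 44)
t = r / s_r                              # equivalently r * sqrt(n - 2) / sqrt(1 - r^2)
p = 2 * stats.t.sf(abs(t), df=n - 2)     # two-tailed p, df = n - 2

r_scipy, p_scipy = stats.pearsonr(x, y)  # SciPy's version of the same test
assert np.isclose(p, p_scipy)
print(f"r = {r:.3f}, t({n - 2}) = {t:.2f}, p = {p:.4f}")
```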
46
Example of
47
Confidence Interval on r (Formula)
Convert r to Fisher's z' = 0.5 × ln((1 + r) / (1 - r)); the interval is z' ± z(crit) × 1/√(n - 3); then convert the limits back to the r scale
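
A minimal sketch of the Fisher z' interval described above, assuming a 95% confidence level (the r and n values are made up):

```python
import numpy as np
from scipy import stats

r, n = 0.45, 50                     # illustrative sample correlation and sample size
z_prime = np.arctanh(r)             # Fisher's z' = 0.5 * ln((1 + r) / (1 - r))
se = 1 / np.sqrt(n - 3)
z_crit = stats.norm.ppf(0.975)      # two-sided 95%

lower = np.tanh(z_prime - z_crit * se)   # convert the limits back to the r scale
upper = np.tanh(z_prime + z_crit * se)
print(f"95% CI for rho: ({lower:.2f}, {upper:.2f})")
```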
48
Confidence Interval on r (example)