Least Squares Regression Equation

Y’ = bX + a

Y’ = the predicted value

X = the known value

a and b = numbers calculated from the original correlation analysis.

b = r √(SSy / SSx)

a = Ȳ – bX̄

Give the 5 steps for determining the least squares regression equation

- Determine values of SSx, SSy, and r by referring to the original correlation analysis.
- Substitute numbers into the formula and solve for b.
- Assign values to X̄ and Ȳ by referring to the original correlation analysis.
- Substitute numbers into the formula and solve for a.
- Substitute numbers for b and a in the least squares regression equation.
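
The five steps above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the function name `least_squares_coefficients`, its parameter names, and the input values are all assumptions chosen for readability.

```python
import math

def least_squares_coefficients(r, ss_x, ss_y, mean_x, mean_y):
    """Return (b, a) for Y' = bX + a from correlation summary statistics."""
    # Steps 1-2: slope from r and the two sums of squares
    b = r * math.sqrt(ss_y / ss_x)
    # Steps 3-4: intercept from the two means and the slope
    a = mean_y - b * mean_x
    # Step 5: the caller substitutes b and a into Y' = bX + a
    return b, a

# Hypothetical summary values, for illustration only
b, a = least_squares_coefficients(r=0.5, ss_x=4.0, ss_y=16.0, mean_x=10.0, mean_y=20.0)
print(f"Y' = {b}X + {a}")
```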

Least Squares Regression Equation

The equation that minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.

Assume that an r of .30 describes the relationship between educational level (highest grade completed) and estimated number of hours spent reading each week. More specifically:

Educational Level (X): X̄ = 13, SSx = 25

Weekly Reading Time (Y): Ȳ = 8, SSy = 50

r = .30

Determine the least squares equation for predicting weekly reading time from educational level.

b = r √(SSy/SSx) = .30 √(50/25) = .42

a = Ȳ – bX̄ = 8 – (.42)(13) = 2.54

Y' = bX + a = .42X + 2.54
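
The intercept of 2.54 depends on rounding b to .42 before solving for a. The sketch below, which is an illustration and not part of the original notes, computes the coefficients both ways; the variable names are assumptions. At full precision the intercept comes out near 2.48, so small differences from the hand calculation are due to rounding, not error.

```python
import math

# Summary statistics from the reading-time example
r, ss_x, ss_y = 0.30, 25, 50
mean_x, mean_y = 13, 8

# Full-precision coefficients
b_exact = r * math.sqrt(ss_y / ss_x)
a_exact = mean_y - b_exact * mean_x

# Coefficients with b rounded to two decimals first, as in the hand calculation
b_rounded = round(b_exact, 2)
a_rounded = round(mean_y - b_rounded * mean_x, 2)

print(f"rounded b first: Y' = {b_rounded:.2f}X + {a_rounded:.2f}")
print(f"full precision:  Y' = {b_exact:.4f}X + {a_exact:.4f}")
```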

Faith’s education level is 15. What is her predicted reading time?

Y’ = .42X + 2.54

Y’ = (.42)(15) + 2.54 = 8.84

Keegan’s educational level is 11. What is his predicted reading time?

Y’ = .42X + 2.54

Y’ = (.42)(11) + 2.54 = 7.16
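
Both predictions can be reproduced with a short function. This is a minimal sketch; the name `predict_reading_time` is an assumption, and the default coefficients are the rounded values from the equation Y' = .42X + 2.54.

```python
def predict_reading_time(education_level, b=0.42, a=2.54):
    """Predicted weekly reading hours from Y' = bX + a."""
    return b * education_level + a

faith = round(predict_reading_time(15), 2)   # education level 15
keegan = round(predict_reading_time(11), 2)  # education level 11
print(faith, keegan)
```

Faith's level is above the mean of 13 and Keegan's is below it, so their predictions fall on opposite sides of the mean reading time of 8.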

Standard Error of Estimate (Definition Formula)

Sy|x = √[SSy|x / (n – 2)]

= √[∑(Y – Y’)² / (n – 2)]

where SSy|x = ∑(Y – Y’)² is the sum of squared prediction errors.

Standard Error of Estimate (Computation Formula)

Sy|x = √[SSy (1 – r²) / (n – 2)]

SSy = ∑Y² – (∑Y)²/n
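
The definition and computation formulas should give the same answer. The sketch below checks this on a small made-up data set; the data, the function name `standard_error_both_ways`, and the use of deviation-score sums of squares (rather than the raw-score form of SSy) are assumptions for illustration.

```python
import math

def standard_error_both_ways(xs, ys):
    """Return the standard error of estimate via both formulas."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sp_xy / math.sqrt(ss_x * ss_y)
    b = r * math.sqrt(ss_y / ss_x)
    a = mean_y - b * mean_x
    # Definition formula: sum of squared prediction errors over n - 2
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    by_definition = math.sqrt(ss_error / (n - 2))
    # Computation formula: uses only SSy, r, and n
    by_computation = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))
    return by_definition, by_computation

# Made-up scores, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
by_definition, by_computation = standard_error_both_ways(xs, ys)
print(round(by_definition, 4), round(by_computation, 4))
```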

Standard Error of Estimate (Sy|x)

A rough measure of the average amount of predictive error.

Give the 2 steps for the calculation of the standard error of estimate, Sy|x

- Assign values to SSy and r by referring to previous work with the least squares regression equation.
- Substitute numbers into the formula and solve for Sy|x

Calculate the standard error of estimate assuming that the correlation of .30 is based on n = 35 pairs of observations and supply a rough interpretation of the standard error of estimate.

Educational Level (X): X̄ = 13, SSx = 25

Weekly Reading Time (Y): Ȳ = 8, SSy = 50

r = .30

Sy|x = √[SSy (1 – r²) / (n – 2)]

Sy|x = √[50 (1 – .30²) / (35 – 2)] = √[50 (.91) / 33] = √(45.5/33) = √1.38 = 1.17

The standard error of 1.17 hours roughly indicates the average amount by which predicted weekly reading times are in error.
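
The same calculation can be checked in Python. This is a minimal sketch of the computation formula; the function name `standard_error_of_estimate` is an assumption.

```python
import math

def standard_error_of_estimate(ss_y, r, n):
    """Computation formula: sqrt(SSy * (1 - r^2) / (n - 2))."""
    return math.sqrt(ss_y * (1 - r ** 2) / (n - 2))

# Values from the reading-time example
s_yx = standard_error_of_estimate(ss_y=50, r=0.30, n=35)
print(round(s_yx, 2))
```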

Squared Correlation Coefficient (r²)

The proportion of the total variability in one variable that is predictable from its relationship with the other variable.

r² interpretation

r² (computation formula)

r² = SSy’ / SSy = (SSy – SSy|x) / SSy

r² = [SPxy / √(SSxSSy)]²
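
These expressions for r² agree because the total variability SSy splits into a predictable part, SSy’ = ∑(Y’ – Ȳ)², and an unpredictable part, SSy|x = ∑(Y – Y’)². The sketch below verifies this on a small made-up data set; the data and variable names are assumptions for illustration.

```python
import math

# Made-up scores, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
n = len(xs)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sp_xy / math.sqrt(ss_x * ss_y)
b = r * math.sqrt(ss_y / ss_x)
a = mean_y - b * mean_x
predicted = [b * x + a for x in xs]

ss_predictable = sum((p - mean_y) ** 2 for p in predicted)       # SSy'
ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))      # SSy|x

r2_from_predictable = ss_predictable / ss_y
r2_from_error = (ss_y - ss_error) / ss_y
r2_from_r = r ** 2
print(round(r2_from_predictable, 4), round(r2_from_error, 4), round(r2_from_r, 4))
```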

Assume that an r of .30 describes the relationship between educational level and estimated hours spent reading each week.

According to r², what percent of the variability in weekly reading time is predictable from its relationship with educational level?

9% is predictable, since r² = (.30)² = .09.

Assume that an r of .30 describes the relationship between educational level and estimated hours spent reading each week.

What percent of variability in weekly reading time is not predictable from this relationship?

91% is not predictable, since 1 – r² = 1 – .09 = .91.

Assume that an r of .30 describes the relationship between educational level and estimated hours spent reading each week.

Someone claims that 9 percent of “each” person’s estimated reading time is predictable from the relationship. What is wrong with this claim?

The 9% refers to the variability among all estimated reading times in the group, not to a portion of any one person’s reading time.

The correlation between the IQ scores of parents and children is .50, and that between the IQ scores of foster parents and foster children is .27.

Does this signify, therefore, that the relationship between foster parents and foster children is about one-half as strong as the relationship between parents and children?

Use r² to compare the strengths of these two correlations.

No.

The r² of .25 for parents and children is about three and a half times as large as the r² of .07 (.27² ≈ .07) for foster parents and foster children, so the foster relationship is considerably less than one-half as strong.
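
The comparison can be checked numerically. This is a minimal sketch; the variable names are assumptions. It contrasts the misleading ratio of the r values with the ratio of the r² values.

```python
r_parents = 0.50
r_foster = 0.27

r2_parents = r_parents ** 2   # proportion of predictable variability
r2_foster = r_foster ** 2

ratio_of_r = r_foster / r_parents      # looks like "about one-half"
ratio_of_r2 = r2_parents / r2_foster   # how many times larger the parents' r² is

print(round(r2_parents, 2), round(r2_foster, 2))
print(round(ratio_of_r, 2), round(ratio_of_r2, 1))
```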

Multiple Regression Equation

A least squares equation that contains more than one predictor, or X, variable.

Regression Toward the Mean

A tendency for scores, particularly extreme scores, to shrink toward the mean.

Regression Fallacy

Occurs whenever regression toward the mean is interpreted as a real, rather than a chance, effect.

After a group of college students attended a stress-reduction clinic, declines were observed in the anxiety scores of those who, prior to attending the clinic, had scored high on a test for anxiety.

Can this decline be attributed to the stress-reduction clinic? Explain your answer.

No, because the observed decline could be due to regression toward the mean, given that the students scored high on the anxiety test prior to attending the clinic.
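
Regression toward the mean can be demonstrated with a simulation in which no treatment occurs at all. The sketch below is an illustration under stated assumptions: each simulated student has a stable anxiety level plus independent chance error on each of two tests, and the function name `simulate_retest`, the score scale, and the top-10% cutoff are all made up for this example.

```python
import random
import statistics

def simulate_retest(n=10_000, seed=0):
    """Mean test-1 and test-2 scores for students who scored high on test 1."""
    rng = random.Random(seed)
    # Stable anxiety level for each student
    true_levels = [rng.gauss(50, 10) for _ in range(n)]
    # Two test scores: same stable level, independent chance error, no clinic
    test1 = [t + rng.gauss(0, 10) for t in true_levels]
    test2 = [t + rng.gauss(0, 10) for t in true_levels]
    # Select roughly the top 10% of scorers on the first test
    cutoff = sorted(test1)[int(0.9 * n)]
    high = [(s1, s2) for s1, s2 in zip(test1, test2) if s1 >= cutoff]
    mean_before = statistics.mean([s1 for s1, _ in high])
    mean_after = statistics.mean([s2 for _, s2 in high])
    return mean_before, mean_after

mean_before, mean_after = simulate_retest()
print(round(mean_before, 1), round(mean_after, 1))
```

The high scorers' second-test mean is lower than their first-test mean yet still above the overall mean of 50, even though nothing was done to them between tests.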

After a group of college students attended a stress-reduction clinic, declines were observed in the anxiety scores of those who, prior to attending the clinic, had scored high on a test for anxiety.

What type of study, if any, would permit valid conclusions about the effect of the stress-reduction clinic?

An experiment where students who score high on the anxiety test are randomly assigned either to attend the stress-reduction clinic or to be in a control group.