What does a Pearson Correlation Coefficient measure?

The degree of linear association between two numerical variables

What is the formula to find the Pearson Correlation Coefficient?

r=(Σ[(x-x̄)(y-ȳ)])/√(Σ(x-x̄)^{2}Σ(y-ȳ)^{2})
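
As a rough illustration, the formula can be evaluated directly from the deviations about the means. This is a minimal sketch: the function name `pearson_r` and the small data set are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of the x and y deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

Because the numerator and denominator carry the same units, they cancel, which is why r has no units.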

What are the units for the correlation coefficient?

It has no units

What is the range of possible values for the correlation coefficient?

-1 ≤ r ≤ 1

If two variables x and y are positively correlated, how will the data appear?

Large values of x will tend to be associated with large values of y, and small values of x with small values of y

If two variables x and y are negatively correlated, how will the data appear?

Large values of x will tend to be associated with small values of y, and small values of x with large values of y

If two variables are significantly correlated, can we conclude that one must be the cause of the other?

No. Correlation alone does not establish causation.

What two names are given to the equation of a ‘linear’ association line used to make predictions?

Regression equation

Regression line

What does a least squares regression do?

It fits the line that minimises the sum of squared residuals, where a residual is the vertical distance from a data point to the line.
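
To make the idea of minimising concrete, the sketch below compares the sum of squared residuals for one line against a few nearby lines. The data are made up, the helper name `sum_squared_residuals` is hypothetical, and the intercept 2.2 and slope 0.6 are simply the least squares values for this particular data set.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances between each point and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]

# Least squares line for this data (intercept 2.2, slope 0.6)
best = sum_squared_residuals(xs, ys, 2.2, 0.6)

# Nearby candidate lines: each shifts the slope or the intercept slightly
others = [
    sum_squared_residuals(xs, ys, 2.2, 0.7),
    sum_squared_residuals(xs, ys, 2.2, 0.5),
    sum_squared_residuals(xs, ys, 2.5, 0.6),
    sum_squared_residuals(xs, ys, 1.9, 0.6),
]

print(best, others)
```

Every alternative line gives a larger sum of squared residuals than the least squares line.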

What is the formula for the estimated equation of the regression line?

ŷ=b_{0}+b_{1}x

Where b_{0} = the y-intercept

and b_{1} = the slope

What is the formula for finding b_{1}?

b_{1}=(Σ[(x_{i}-x̄)(y_{i}-ȳ)])/(Σ(x_{i}-x̄)^{2})

What is the formula for finding b_{0}?

b_{0}=ȳ-b_{1}x̄
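
The two formulas can be applied in sequence: first the slope, then the intercept, which depends on the slope. The following is a minimal sketch with a hypothetical helper `fit_line` and made-up data.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # Intercept: chosen so the line passes through the point (x-bar, y-bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```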

What is b_{1}?

The slope: the change in y for a one-unit increase in x

What is b_{0}?

The y-intercept

What does the y-intercept, b_{0}, tell us?

The predicted value of y when x=0

What is the slope b_{1}?

The rate of change of y with respect to x.

What does the slope b_{1} tell us?

How much y will change when x increases by one unit.
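
Both interpretations can be checked numerically. In this sketch the coefficients 2.2 and 0.6 are hypothetical values chosen for illustration, and `predict` is a made-up helper name.

```python
def predict(x, b0, b1):
    """Predicted y from the regression line y-hat = b0 + b1*x."""
    return b0 + b1 * x

b0, b1 = 2.2, 0.6   # hypothetical intercept and slope

# Intercept: the prediction at x = 0 is b0 itself
at_zero = predict(0, b0, b1)

# Slope: raising x by one unit changes the prediction by b1
step = predict(4, b0, b1) - predict(3, b0, b1)

print(at_zero, round(step, 10))
```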

How can we determine how well a regression line fits the data?

Using the coefficient of determination R^{2}

What is the formula for the coefficient of determination R^{2}?

R^{2}=(S_{xy})^{2}/(S_{xx}S_{yy})

Where S_{xy}=Σ(x-x̄)(y-ȳ), S_{xx}=Σ(x-x̄)^{2} and S_{yy}=Σ(y-ȳ)^{2}

What is the coefficient of determination R^{2} in terms of the correlation coefficient r?

R^{2} is the square of the correlation coefficient r

How do we use the coefficient of determination R^{2} to determine how well the line fits the data?

The closer the value of R^{2} is to 1, the better the line fits the data.

What does the value of the coefficient of determination R^{2} tell us?

The proportion of the variability in the dependent y-variable that can be explained by the independent x-variable.
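
The three descriptions of R^{2} above (the S_{xy} formula, the square of r, and the proportion of variability explained) should agree. The sketch below checks this on made-up data. It assumes the usual definitions of S_{xy}, S_{xx} and S_{yy} as sums of deviation products, and it measures "explained" as one minus the ratio of residual to total sum of squares.

```python
import math

xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# R^2 from the S formula, and r from the correlation formula
r_squared = s_xy ** 2 / (s_xx * s_yy)
r = s_xy / math.sqrt(s_xx * s_yy)

# Proportion of the variability in y explained by the fitted line
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / s_yy

print(r_squared, r ** 2, explained)
```

All three printed values agree up to floating-point rounding.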

If the value of R^{2} is >90%, what does this tell us about the strength of linear association and thus the quality of the simple linear regression model?

The strength of linear association is very strong and thus the quality of the simple linear regression model is excellent

If the value of R^{2} is 75-90%, what does this tell us about the strength of linear association and thus the quality of the simple linear regression model?

The strength of linear association is strong and thus the quality of the simple linear regression model is very good

If the value of R^{2} is 50-75%, what does this tell us about the strength of linear association and thus the quality of the simple linear regression model?

The strength of linear association is reasonable and thus the quality of the simple linear regression model is good

If the value of R^{2} is 25-50%, what does this tell us about the strength of linear association and thus the quality of the simple linear regression model?

The strength of linear association is weak and thus the quality of the simple linear regression model is weak

If the value of R^{2} is <25%, what does this tell us about the strength of linear association and thus the quality of the simple linear regression model?

The strength of linear association is very weak and thus the quality of the simple linear regression model is poor
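
The five bands above can be collected into a single lookup. This is a sketch only: the function name `describe_fit` is hypothetical, R^{2} is taken as a fraction rather than a percentage, and the treatment of values that fall exactly on a boundary (25%, 50%, 75%, 90%) is an assumption, since the bands as stated do not say which side they belong to.

```python
def describe_fit(r_squared):
    """Map R^2 (a fraction from 0 to 1) to (strength of association, model quality)."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must lie between 0 and 1")
    # Assumption: a boundary value is placed in the band that starts at it
    if r_squared > 0.90:
        return ("very strong", "excellent")
    if r_squared >= 0.75:
        return ("strong", "very good")
    if r_squared >= 0.50:
        return ("reasonable", "good")
    if r_squared >= 0.25:
        return ("weak", "weak")
    return ("very weak", "poor")

print(describe_fit(0.60))
```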

What 4 assumptions are made in a linear regression?

The observations are independent (repeated observations on the same individual are not allowed)

The relationship is linear

The response varies Normally about the population regression line

The standard deviation (or variance) of the response about the population line is the same everywhere.

What is the problem with this data set?

The observations are not independent

What is the problem with this data set?

There is non-constant variance