regression and correlation Flashcards Preview

Flashcards in regression and correlation Deck (12)

What is correlation?

Correlation quantifies the strength of the association between two quantitative variables. Pearson's correlation coefficient measures how closely the points scatter around an underlying linear trend between the two variables.
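
As a rough illustration of what Pearson's coefficient measures, here is a minimal sketch in plain Python. The function name `pearson_r` and both data sets are made up for this example; they are not part of the original deck.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data: same x values, different amounts of scatter
xs = [1, 2, 3, 4, 5]
tight = pearson_r(xs, [2.1, 3.9, 6.2, 7.8, 10.1])  # points close to a line
loose = pearson_r(xs, [4, 1, 7, 3, 8])             # points widely scattered

print(round(tight, 3), round(loose, 3))
```

The tightly clustered points give a coefficient close to 1, while the widely scattered points give a noticeably smaller value even though both trends are upward.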


What is linear regression?

Linear regression studies the linear relationship between two quantitative variables when one is modelled as depending on the other. The model allows us to make specific predictions about what we expect to see among individuals who have not had the dependent variable measured.


Describe ρ (rho)

ρ measures the strength of the linear relationship between two variables on a scale from -1 to +1:
ρ = +1 is a perfect positive linear relationship
ρ = 0 means no linear relationship
ρ = -1 is a perfect negative linear relationship


What is 100r²?

The percentage of the variability in one variable (X or Y) that is explained by the linear relationship between them
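
To make the "percentage of variability explained" reading concrete, the sketch below checks numerically that 100 times the squared correlation matches the percentage of the variability in y accounted for by a least squares line. The data are made up for illustration.

```python
import math

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [4, 1, 7, 3, 8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variability in y

# Sample correlation, expressed as 100 * r squared
r = sxy / math.sqrt(sxx * syy)
r_squared_pct = 100 * r ** 2

# Least squares line, then the share of variability it accounts for
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained_pct = 100 * (1 - ss_res / syy)

print(round(r_squared_pct, 1), round(explained_pct, 1))
```

For these numbers both quantities come out at about 30%, so roughly 30% of the variability in y is accounted for by the linear relationship with x.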


What is the difference between r and ρ?

r is the sample correlation, which serves as the estimate of ρ, the population correlation.
A 95% confidence interval for ρ takes the form: estimate ± (1.96 × SE)
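
The card does not say which standard error is meant. One common approach, assumed here rather than taken from the deck, is to apply the "estimate ± 1.96 × SE" formula on Fisher's z scale and then transform back. The function name `rho_confidence_interval` and the inputs r = 0.6, n = 30 are hypothetical.

```python
import math

def rho_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for the population correlation.

    Assumption: uses Fisher's z transform, z = atanh(r),
    with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                # estimate on the z scale
    se = 1 / math.sqrt(n - 3)        # standard error on the z scale
    low_z = z - z_crit * se          # estimate - 1.96 * SE
    high_z = z + z_crit * se         # estimate + 1.96 * SE
    # Back-transform so the limits are correlations again
    return math.tanh(low_z), math.tanh(high_z)

# Hypothetical sample: r = 0.6 observed in n = 30 individuals
low, high = rho_confidence_interval(r=0.6, n=30)
print(round(low, 2), round(high, 2))
```

With these inputs the interval runs from roughly 0.31 to 0.79. Working on the z scale keeps both limits inside the range -1 to 1, which a direct "r ± 1.96 × SE" calculation would not guarantee.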


What are some important points about correlation?

The correlation coefficient is not dependent on the units of measurement of the variables
Always look at the data first (for example, in a scatter plot) to make sure there is some sort of linear relationship
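
Both points can be checked with a small sketch. The helper `pearson_r` and all of the numbers are made up for illustration: the first comparison changes the units of one variable, and the second uses a perfectly curved relationship that the coefficient fails to detect.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1. Units do not matter: heights in cm vs metres (made-up values)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 61, 66, 80, 85]
heights_m = [h / 100 for h in heights_cm]
r_cm = pearson_r(heights_cm, weights_kg)
r_m = pearson_r(heights_m, weights_kg)

# 2. A strong but non-linear relationship: y = x squared, symmetric about 0
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curved = pearson_r(xs, ys)

print(r_cm, r_m, r_curved)
```

The coefficient is the same in either unit, and it is zero for the curved data even though y is completely determined by x, which is why the data should be inspected before the coefficient is interpreted.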


What can be done if the sample is not normally distributed before calculating the Pearson coefficient?

transforming the data (for example, by taking logs)
using a rank correlation coefficient such as Spearman's
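
The two options can be compared on a skewed variable, as in this sketch. The data are made up (y grows exponentially with x), and the helpers `average_ranks` and `spearman_rho` are illustrative names; Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks, with tied values sharing their average rank.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: y is heavily right-skewed
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]

r_raw = pearson_r(xs, ys)                         # on the original scale
r_log = pearson_r(xs, [math.log(y) for y in ys])  # after taking logs
r_spearman = spearman_rho(xs, ys)                 # rank-based

print(round(r_raw, 3), round(r_log, 3), round(r_spearman, 3))
```

On the raw scale the Pearson coefficient is well below 1, whereas both the log transformation and the rank-based coefficient recover the perfect monotonic association.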


What is the best fitting line in regression?

The one which makes the sum of the squares of the residuals as small as possible. This is equivalent to minimising the variance of the residuals, and the line is known as the least squares linear regression line.
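
A minimal sketch of fitting such a line and using it for prediction, assuming the standard closed-form estimates for one predictor. The function names and data are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)

# Any other line has a larger sum of squared residuals
steeper = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
shifted = sum_squared_residuals(xs, ys, intercept + 0.5, slope)

# Prediction for an individual with x = 6 whose y was not measured
predicted = intercept + slope * 6
print(round(intercept, 2), round(slope, 2), round(predicted, 2))
```

Nudging either the slope or the intercept away from the fitted values increases the sum of squared residuals, which is the sense in which this line is the best fit.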


What are the assumptions for linear regression?

Constant variance
Independent observations
Normality of residuals
Error free values for x
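
Most of these assumptions are checked through the residuals, the differences between observed and fitted values. The sketch below only computes them, as the input a residual plot would need; the data are made up, and it is not a formal test of any assumption.

```python
# Illustrative (made-up) data
xs = [1, 2, 3, 4, 5, 6]
ys = [3.2, 4.8, 7.1, 9.0, 10.7, 13.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)

# Least squares line
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus fitted y
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# By construction, least squares residuals sum to zero
# and have no linear association with x
residual_sum = sum(residuals)
residual_x_product = sum(x * e for x, e in zip(xs, residuals))

for x, e in zip(xs, residuals):
    print(x, round(e, 3))
```

Plotting these residuals against x or the fitted values, and looking at their distribution, is the usual way to judge constant variance, normality, and independence by eye.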


In the residual plots, what do graphs A and B assess?

The assumption that the residuals are normally distributed


What does graph C assess?

Whether the relationship is linear and whether the spread of response is the same for all values of x


What does graph D show?

Some indication of a lack of independence