Correlation and Linear Regression Flashcards
(33 cards)
What type of graph would you use to visualise the relationship between two continuous variables?
Scatterplot
What are the two main uses of scatterplots?
▪️Investigate empirical relationship between X (independent) and Y (dependent)
▪️Attempt to predict Y from X
What is correlation?
How close two variables are to having a linear relationship
‘r’ is used to quantify its direction and magnitude
What are the two types of correlation coefficient?
▪️Pearson’s
▪️Spearman’s
What is the posh way of saying there is a correlation?
There is a linear association
What can you determine from the correlation coefficient?
▪️The direction of the effect
▪️The magnitude of the effect
When do you use Pearson’s correlation coefficient ‘r’?
To check the magnitude and direction of a linear relationship between two variables
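As a rough illustration of what ‘r’ measures, here is a minimal sketch that computes Pearson's coefficient from paired observations using only the standard library. The function name `pearson_r` and the `hours`/`score` data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for paired observations (hypothetical helper name)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: fails with ZeroDivisionError if either variable is constant
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only (not from the flashcards)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))  # 0.775
```

The sign of `r` gives the direction of the relationship and its absolute value gives the magnitude.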
What assumptions are needed for Pearson’s correlation coefficient?
▪️Variables are approx. normally distributed
▪️Variables are continuous
▪️Each observation should have a pair of values
▪️No significant outliers
▪️A straight line relationship should be formed (linearity)
When should we use Spearman’s correlation coefficient ‘rs’/‘ρ’?
When one or both of the variables are NOT normally distributed
Or if the data is ordinal
(less sensitive to extreme influential points)
What does Spearman’s Correlation coefficient measure?
▪️Strength and direction of MONOTONIC relationship between two ranked variables
▪️The variables move together in a consistent direction, but not necessarily at a constant rate as they would if the relationship were linear
What is the non-parametric version of the Pearson’s correlation coefficient?
Spearman’s
How Spearman’s Correlation coefficient is calculated depends on whether the data…
▪️Does not have tied ranks
▪️Does have tied ranks
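The two cases can be sketched as follows, assuming the common convention that tied values share the average of their ranks. With ties, Spearman's coefficient is taken here as Pearson's r applied to the ranks. Without ties, the shortcut 1 − 6Σd²/(n(n² − 1)) gives the same answer. All function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson's r computed on the ranks; usable with or without ties."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho_no_ties(xs, ys):
    """Shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only when there are no ties."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative data only: y = x^3 is monotonic but not linear
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(spearman_rho(xs, ys))  # 1.0
```

A perfectly monotonic but curved relationship still gives a Spearman coefficient of 1, because only the ranks are compared.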
What are the regression coefficients?
β0 (intercept) and β1 (slope)
What is the Y variable?
The dependent variable (outcome/response)
What is the X variable?
The independent variable (predictor/explanatory/covariate)
What is the best linear regression line?
The line closest to all the data points overall (the residuals ε are as small as possible)
How might we estimate the linear regression line?
Ordinary Least Squares (OLS) - minimises the sum of squared residuals to estimate β0 and β1
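For simple linear regression, the OLS estimates have a closed form: slope = Sxy / Sxx and intercept = ȳ − slope × x̄. A minimal sketch follows. The function name `ols_fit` and the data are assumptions for illustration.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # estimate of beta1
    intercept = mean_y - slope * mean_x  # estimate of beta0
    return intercept, slope

# Illustrative data only (not from the flashcards)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(hours, score)
print(f"{b0:.2f} {b1:.2f}")  # 2.20 0.60
```

The fitted line always passes through the point (x̄, ȳ), which is why the intercept follows directly from the slope.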
When do we use the Simple Linear Regression Model?
To measure to what extent there is a linear relationship between two variables
What is β1 in the null hypothesis?
0
(slope)
What assumptions are needed for the simple linear regression model?
▪️There’s a linear relationship
▪️Residuals are independent of one another
▪️Residuals follow normal distribution with mean 0
▪️Homogeneity of variance - size of error doesn’t change significantly across IV
What is R?
The simple correlation coefficient
What is R squared?
The proportion of the total variation in the DV that can be explained by the IV
E.g. 0.270 = 27%
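One way to see where R squared comes from is to compute it as 1 − SSres / SStot after fitting the line. The sketch below uses made-up data and a hypothetical function name `r_squared`. In simple linear regression the result equals the square of Pearson's r.

```python
def r_squared(xs, ys):
    """R^2 = 1 - SS_residual / SS_total for a simple OLS line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of the DV around its mean
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Illustrative data only (not from the flashcards)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r2 = r_squared(hours, score)
print(round(r2, 3))  # 0.6
```

Here 0.6 would be read as 60% of the variation in the DV being explained by the IV.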
How do you interpret a significant p-value in a simple linear regression ANOVA?
The regression model predicts the outcome variable statistically significantly better than a model with no predictor (i.e. the slope is unlikely to be 0)
What do you use to predict the AVERAGE Y for a specific value of X?
Confidence interval of the MEAN
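For reference, the usual textbook form of this interval at a chosen value x₀ is sketched below, assuming the regression assumptions listed earlier hold. The symbols s (residual standard error), S_xx and α are notation introduced here, not taken from the cards.

```latex
\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0,
\qquad
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2}}{n-2}},
\quad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The interval is narrowest at x₀ = x̄ and widens as x₀ moves away from the mean of X.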