Correlation & multiple regression Flashcards
(34 cards)
what is correlation?
- An association or dependency between two independently observed variables
- use a scatterplot to visualise a correlation
what does Pearson's correlation coefficient do?
- tells you how strong the linear correlation is between X and Y
- it's a number between -1 and 1
- 0: there is no linear relationship between them
- 1.0: a perfect positive linear relationship
- -1.0: a perfect negative (inverse) linear relationship
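As a rough illustration of the points above, here is a minimal sketch that computes Pearson's r by hand. The function name `pearson_r` and the data are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up data: y rises (or falls) in a perfectly straight line with x
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive relationship
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative relationship
```

With these values `r_pos` comes out at 1.0 and `r_neg` at -1.0. Real data will normally fall somewhere in between.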
when is covariance greater?
when the values of X and Y are more similar, i.e. they deviate from their means in the same direction
when do we conduct a Pearson’s coefficient (r)?
two interval/ratio variables
when do we conduct a Spearman’s rank coefficient?
two ordinal (rank) variables
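One way to see why Spearman's coefficient suits rank data: it can be computed as Pearson's r applied to the ranks of the values. The sketch below assumes that definition. The helper names `average_ranks` and `spearman_rho` and the data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up data: y = x squared, so the relationship is monotonic but not linear
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Because only the ordering matters, `rho` is 1.0 here even though the relationship is curved.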
when do we conduct a Kendall’s rank coefficient?
two ordinal (rank) variables, typically with a small sample or many tied ranks
when do we conduct a Phi coefficient?
two true dichotomy variables
when do we conduct a point-biserial coefficient?
one true dichotomy variable and one interval/ratio variable
what is partial correlation?
the correlation between two variables after controlling for one or more other variables whose information overlaps with them
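One common way to make this concrete is the first-order partial correlation, which removes from the X-Y correlation the part that overlaps with a third variable Z. The sketch below assumes the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The function name and the input correlations are made up.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the influence of Z removed."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations: X and Y both correlate fairly strongly with Z
r_partial = partial_correlation(r_xy=0.5, r_xz=0.6, r_yz=0.7)
```

Here the partial correlation is much smaller than the raw 0.5, which suggests most of the X-Y association was information shared with Z.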
what is multiple regression?
it describes the relationship between two or more predictor variables (X1, X2 etc) and a single criterion (Y)
linear regression equation
𝑌̂=𝛽0+𝛽1 𝑋1+𝛽2 𝑋2+…+𝛽𝑚 𝑋𝑚
𝒀̂ = the predicted value of the criterion variable 𝒀
𝜷𝟎 = the intercept term
𝜷𝒊 = the 𝑖th regression coefficient, indicating how strongly predictor variable 𝑿𝒊 can be used to predict 𝒀 in the model
𝒎 = the number of predictor variables in the model
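The equation above can be read as a simple recipe: start from the intercept and add each predictor multiplied by its coefficient. A minimal sketch follows, where the function name `predict` and the coefficient values are hypothetical.

```python
def predict(intercept, coefficients, predictors):
    """Return Y-hat = b0 + b1*X1 + ... + bm*Xm for one observation."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical model with m = 2 predictors: Y-hat = 1.0 + 2.0*X1 + 0.5*X2
y_hat = predict(1.0, [2.0, 0.5], [3.0, 4.0])
```

For X1 = 3 and X2 = 4 this gives 1.0 + 6.0 + 2.0 = 9.0.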
what is 𝑦=𝑎𝑥+𝑏 equivalent to?
𝑌̂=𝛽0+𝛽1 𝑋1
where a is the slope (𝛽1) and b is the y intercept (𝛽0)
what is the equation for residual error?
𝜀=𝑌−𝑌̂
what is the equation for the variance unexplained?
𝑆𝑆_𝑅=∑(𝑌−𝑌̂)^2
what is the equation for the variance explained?
𝑆𝑆_𝑀=∑(𝑌̂−𝑌̅)^2
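To see the two sums of squares side by side, here is a small sketch on made-up data. It assumes the line Y-hat = 2.2 + 0.6X, which is the least-squares fit for these five points. The total sum of squares `ss_t` is not on the cards and is added only for comparison.

```python
# Made-up observations and the least-squares line for them
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]   # predicted values
mean_y = sum(y) / len(y)               # Y-bar

# Unexplained (residual) variation: observed minus predicted
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# Explained (model) variation: predicted minus mean
ss_m = sum((yh - mean_y) ** 2 for yh in y_hat)
# Total variation: observed minus mean
ss_t = sum((yi - mean_y) ** 2 for yi in y)
```

For a least-squares fit with an intercept, `ss_m + ss_r` equals `ss_t` (here roughly 3.6 + 2.4 = 6.0). That equality is not guaranteed for an arbitrary line.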
what is prediction error?
the difference between the actual values 𝑌 and the predicted values 𝑌̂
𝜀=𝑌−𝑌̂
what is the goal of a regression?
to find the best fit between the model and the observations, by adjusting the values of 𝛽_𝑖 until the prediction error is minimised
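For a single predictor, the coefficients that minimise the summed squared prediction error have a closed form, so no iterative adjusting is needed. The sketch below assumes ordinary least squares. The function names and data are made up.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(x, y, intercept, slope):
    """Sum of squared prediction errors for a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
best = sum_squared_error(x, y, b0, b1)

# Nudging either coefficient away from the fitted value increases the error
worse_slope = sum_squared_error(x, y, b0, b1 + 0.1)
worse_intercept = sum_squared_error(x, y, b0 + 0.1, b1)
```

With more than one predictor the same idea applies, but the coefficients are usually found by solving a system of equations rather than with these two formulas.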
what is multiple correlation coefficient (R)?
Correlation between the predicted values 𝒀̂ and the observed values 𝒀
- in practice it is usually not calculated directly
- it is obtained as the square root of the coefficient of determination (R^2)
what is the coefficient of determination (R^2)?
- Proportion of variance in Y explained by the regression model
- This is simply the square of the multiple correlation coefficient
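A minimal sketch of how R^2 and R relate to the sums of squares, assuming a least-squares model with an intercept so that total variation is explained plus residual variation. The function name and the numbers are illustrative only.

```python
import math

def r_squared(ss_model, ss_residual):
    """Proportion of total variation that the model explains."""
    ss_total = ss_model + ss_residual
    return ss_model / ss_total

# Hypothetical sums of squares: explained = 3.6, unexplained = 2.4
r2 = r_squared(3.6, 2.4)
multiple_r = math.sqrt(r2)   # multiple correlation coefficient R
```

Here `r2` is about 0.6, so the model accounts for roughly 60% of the variation in Y, and `multiple_r` is about 0.77.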
F-ratio
the ratio of explained variance to residual variance (each divided by its degrees of freedom), allowing a statistical test
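As a sketch of the calculation, the F-ratio divides each sum of squares by its degrees of freedom before taking the ratio. This assumes the usual degrees of freedom for a regression with an intercept: m for the model and n - m - 1 for the residual. The function name and numbers are made up.

```python
def f_ratio(ss_model, ss_residual, n, m):
    """F = mean square of the model / mean square of the residual."""
    df_model = m               # one degree of freedom per predictor
    df_residual = n - m - 1    # observations minus predictors minus intercept
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual
    return ms_model / ms_residual

# Hypothetical values: n = 5 observations, m = 1 predictor
f_value = f_ratio(ss_model=3.6, ss_residual=2.4, n=5, m=1)
```

This gives roughly 4.5. The value is then compared against the F distribution with (m, n - m - 1) degrees of freedom to obtain a p-value.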
effect size for multiple linear regression (Cohen’s f²)
small effect size = Cohen’s f² of 0.02
medium effect size = Cohen’s f² of 0.15
large effect size = Cohen’s f² of 0.35
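These benchmarks are usually stated for Cohen's f², which can be computed from R^2 as f² = R^2 / (1 - R^2). The sketch below assumes that formula. The function names and the "negligible" label for values under 0.02 are additions for the example.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared from the coefficient of determination."""
    return r_squared / (1 - r_squared)

def effect_size_label(f2):
    """Map f-squared onto the conventional small/medium/large benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"   # label chosen for this sketch only

# Hypothetical model explaining half of the variance
f2 = cohens_f2(0.5)
label = effect_size_label(f2)
```

An R^2 of 0.5 gives f² = 1.0, well above the 0.35 benchmark for a large effect.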
what is a simultaneous (standard) multiple regression approach?
- No a priori model assumed
- All predictor variables are fit together
what is a stepwise approach to multiple regression?
- No a priori model
- Predictor variables are added/removed one at a time, to maximize fit
- Not a good approach because it capitalises on chance and tends to overfit the data
what is a hierarchical multiple regression approach?
- Based on a priori knowledge of variables – we may know a relationship exists for some variables, but are interested in the added explanatory power of a new variable
- Several subsequent regression models are analysed (adding or removing predictor variables)
- We can use this to assess how much better one model explains the criterion variable than another (∆𝑅^2)
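The comparison of nested models can be sketched as follows: fit a first model with the established predictor, fit a second model that adds the new predictor, and take the difference in R^2. This is a minimal illustration using the normal equations in plain Python. The helper names `fit_ols` and `r_squared` and the data are hypothetical, and a statistics package would normally be used instead.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y):
    """R^2 = 1 - SS_R / SS_T for a least-squares fit."""
    beta = fit_ols(rows, y)
    y_hat = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_t = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_r / ss_t

# Made-up data: x1 is the established predictor, x2 the new one
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [5, 6, 11, 13, 17, 18]

r2_step1 = r_squared([(a,) for a in x1], y)     # model 1: x1 only
r2_step2 = r_squared(list(zip(x1, x2)), y)      # model 2: x1 and x2
delta_r2 = r2_step2 - r2_step1                  # added explanatory power
```

Because the second model contains the first, `delta_r2` cannot be negative apart from rounding error. Whether the increase is meaningful is a separate question, usually answered with an F-test on the change.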