Correlation and Regression Flashcards
(16 cards)
What is correlation?
A measure of the strength of an association between two continuous variables. It doesn’t provide evidence for a causal association. Can be positive or negative.
What is a correlation coefficient and how do you interpret it?
A dimensionless measure of correlation scaled between -1 and +1 that describes the strength and direction of the association. It is reported with a p value that shows whether the association is statistically significant.
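A minimal sketch of how the coefficient can be computed by hand, assuming Pearson’s r as the method. The function names and the dose/response numbers are made up for illustration. The t statistic shown is the usual one compared against a t table with n - 2 degrees of freedom to get the p value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for testing r = 0 (undefined when r is exactly -1 or +1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up illustrative data
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(dose, response)      # close to +1: strong positive association
t_stat = t_for_r(r, len(dose))     # compare with t crit for n - 2 df
print(round(r, 3), round(t_stat, 1))
```

The sign of r gives the direction and its size gives the strength; the p value only says whether r differs from zero.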
What are the three methods of correlation?
Pearson’s r, Spearman’s and Kendall’s
What is Pearson’s?
Used for two variables with a simple linear association. An r value close to zero means little or no linear association. The data must be normally distributed and each value independent of the others.
No outliers or increasing variance
What is Spearman’s?
- Ranked data and non-parametric
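A rough sketch of the "ranked data" idea, assuming the common definition of Spearman’s coefficient as Pearson’s r applied to the ranks, with tied values given their average rank. The helper names and data are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # monotonic but not linear (y = x cubed)
rho = spearman_rho(x, y)  # ranks agree perfectly even though the line is curved
print(rho)
```

Because only the ordering matters, a single extreme value changes the result far less than it would for Pearson’s.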
What is Kendall’s?
- Ranked data, non-linear monotonic association, not normal and some outliers
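A small sketch of Kendall’s coefficient, assuming the simplest form (often called tau-a), which counts concordant and discordant pairs and makes no correction for ties. The function name and data are made up for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 is a tie and counts as neither
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)  # 8 concordant pairs, 2 discordant, 10 pairs in total
print(tau)
```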
How to choose the right correlation method?
Pearson’s: data is normally distributed and no outliers
Spearman’s: Normal and few outliers or not normal with no outliers (N>20).
Kendall’s: Not normal and a few outliers or a monotonic association that is not linear
What is the difference between correlation and regression?
Correlation shows an association, or lack of one, between two variables, whereas regression predicts the value of the dependent variable (y) based on the known value of the independent variable (x).
What are the steps of interpreting regression?
- Find eqn y = a + bx and calculate slope.
- Test if slope = 0 (null) to give p value and R squared
- Look at the residual plots
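The first two steps above can be sketched as follows, assuming ordinary least squares. The function name and the data are illustrative. The p value itself comes from comparing the t statistic with a t table at n - 2 degrees of freedom, which is not done here.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x.

    Returns intercept a, slope b, the t statistic for testing slope = 0,
    and R-squared. Needs at least 3 points and a fit that is not perfect.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    se_b = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    t_slope = b / se_b            # tests the null hypothesis slope = 0
    r_squared = 1 - ss_res / ss_tot
    return a, b, t_slope, r_squared

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, t_slope, r_squared = fit_line(x, y)
print(round(a, 2), round(b, 2), round(t_slope, 1), round(r_squared, 3))
```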
What is R-squared?
The coefficient of determination, which is the proportion of the total variability in y that is explained by the regression. A relationship can be statistically significant but still weak (low R-squared).
What do we look for in residual plots?
These tell us whether a linear eqn is appropriate. Versus Fits: used to check that we aren’t trying to fit a linear regression to a curvilinear pattern. Ideally the residuals are scattered randomly either side of the zero line.
What are residuals?
These are the differences between the observed and the predicted values, and they show how good a fit the eqn is. Regression isn’t appropriate if there is a pattern to the residuals.
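A short sketch of what "a pattern to the residuals" can look like, assuming a straight line is fitted by least squares to deliberately curved, made-up data (y = x squared). The function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residuals(x, y, a, b):
    """Observed minus predicted value for every point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]        # curvilinear data, not a straight line
a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print(res)  # positive at both ends, negative in the middle: a U-shaped pattern
```

The residuals still sum to zero, but they are not scattered randomly about the zero line, which is the warning sign that a linear eqn is the wrong choice.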
What are confidence and prediction bands?
The fitted line plot shows two narrow bands, the 95% confidence limits: we are 95% confident that the mean y value corresponding to a given x value falls between them. The two wider bands are the 95% prediction bands, within which about 95% of individual observations are expected to fall; observations outside these bands may be outliers.
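A hedged sketch of why the prediction bands are wider than the confidence bands, assuming the standard simple linear regression formulas for the two half-widths. The function name and data are made up, and the t value is an assumption taken from a t table (two-sided 95%, 3 degrees of freedom) rather than computed.

```python
import math

def bands_at(x, y, x0, t_crit):
    """Fitted value at x0 plus half-widths of the confidence and prediction bands."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))             # residual standard deviation
    spread = 1 / n + (x0 - mean_x) ** 2 / sxx   # grows as x0 moves away from mean x
    ci_half = t_crit * s * math.sqrt(spread)        # band for the mean of y at x0
    pi_half = t_crit * s * math.sqrt(1 + spread)    # band for one new observation
    return a + b * x0, ci_half, pi_half

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # assumed: two-sided 95% t value for n - 2 = 3 df, from a t table

fit_mid, ci_mid, pi_mid = bands_at(x, y, 3, T_CRIT)  # at the mean of x
fit_end, ci_end, pi_end = bands_at(x, y, 5, T_CRIT)  # at the edge of the data
print(round(ci_mid, 3), round(pi_mid, 3), round(ci_end, 3), round(pi_end, 3))
```

The extra "1 +" under the square root is the scatter of an individual observation, so the prediction band is always the wider one, and both bands are narrowest at the mean of x.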
2 Slope Regression
Used to test whether 2 regression lines are the same or different.
H0: no difference (B1 = B2); Ha: B1 ≠ B2
Use the t calc eqn and compare to t crit.
If the absolute value of t calc is greater than t crit, we reject the null.
t calc = (b1 - b2) / sqrt(SE1^2 + SE2^2), where SE1 and SE2 are the standard errors of the two slopes.
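A minimal sketch of the two-slope comparison, assuming t calc is the difference in slopes divided by the square root of the sum of the squared standard errors of the slopes. The function names and both data sets are made up for illustration; t crit still has to be looked up in a t table.

```python
import math

def slope_and_se(x, y):
    """Least-squares slope of y on x and the standard error of that slope."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(ss_res / (n - 2) / sxx)

def t_two_slopes(b1, se1, b2, se2):
    """t calc for H0: the two slopes are equal."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Two made-up data sets measured at the same x values
x = [1, 2, 3, 4, 5]
y_group1 = [2.1, 3.9, 6.2, 7.8, 10.1]
y_group2 = [1.0, 2.1, 2.9, 4.2, 4.9]

b1, se1 = slope_and_se(x, y_group1)
b2, se2 = slope_and_se(x, y_group2)
t_calc = t_two_slopes(b1, se1, b2, se2)
print(round(b1, 2), round(b2, 2), round(t_calc, 1))
```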
What are the assumptions of regression?
- The residuals (errors) have a mean of zero and constant variance
- The residuals are independent of each other (value of one not affected by value of another)
- The residuals are normally distributed
What are the three types of non-linear regression?
Logarithmic: used when the rate of change in the data increases or decreases quickly then levels out
Power: Used to fit a line to data sets that compare measurements that increase at a specific rate
Exponential: Used on data sets where the data values rise or fall at increasingly higher rates.
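A hedged sketch of one way to fit these three curves, assuming each is turned into a straight line by taking logs and then fitted with ordinary linear regression. This is a common shortcut, not the only method, and it needs positive x values (and positive y values for the power and exponential forms). The function names and the noise-free test data are made up.

```python
import math

def fit_line(u, v):
    """Least-squares intercept and slope for v = intercept + slope*u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    slope = suv / suu
    return mean_v - slope * mean_u, slope

def fit_logarithmic(x, y):
    """y = a + b*ln(x): regress y on ln(x)."""
    return fit_line([math.log(xi) for xi in x], y)

def fit_power(x, y):
    """y = A * x**k: regress ln(y) on ln(x), then undo the log on the intercept."""
    ln_a, k = fit_line([math.log(xi) for xi in x], [math.log(yi) for yi in y])
    return math.exp(ln_a), k

def fit_exponential(x, y):
    """y = A * exp(k*x): regress ln(y) on x, then undo the log on the intercept."""
    ln_a, k = fit_line(x, [math.log(yi) for yi in y])
    return math.exp(ln_a), k

x = [1, 2, 3, 4, 5]
a_log, b_log = fit_logarithmic(x, [1 + 2 * math.log(xi) for xi in x])
A_pow, k_pow = fit_power(x, [3 * xi ** 2 for xi in x])
A_exp, k_exp = fit_exponential(x, [2 * math.exp(0.5 * xi) for xi in x])
print(round(a_log, 3), round(b_log, 3))
print(round(A_pow, 3), round(k_pow, 3))
print(round(A_exp, 3), round(k_exp, 3))
```

With noise-free data each fit recovers the constants used to generate it; with real data the residual plots on the transformed scale still need checking.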