Correlation and Regression Flashcards

(16 cards)

1
Q

What is correlation?

A

A measure of the strength of an association between two continuous variables. It doesn’t provide evidence of a causal association. It can be positive or negative.

2
Q

What is a correlation coefficient and how do you interpret it?

A

A dimensionless measure of correlation, scaled between -1 and +1, that describes the strength and direction of the association. It is reported with a p-value that shows whether the association is statistically significant.
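As a minimal sketch of how such a coefficient is computed, the snippet below calculates Pearson’s r by hand using only the standard library. The function name `pearson_r` and the sample data are hypothetical, chosen for illustration; the p-value is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = pearson_r(x_values, y_values)
print(round(r, 3))  # positive value, so a positive association
```

The result is unit-free and always lies between -1 and +1; the sign gives the direction and the magnitude gives the strength.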

3
Q

What are the three methods of correlation?

A

Pearson’s r, Spearman’s rho and Kendall’s tau

4
Q

What is Pearson’s?

A

Used for two variables with a simple linear association. An r value close to zero indicates little or no linear association. The data must be approximately normally distributed and each observation independent of the others.
There should be no outliers and no increasing variance.

5
Q

What is Spearman’s?

A

A non-parametric correlation calculated on ranked data.
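One way to see what "ranked data" means in practice: Spearman’s coefficient can be obtained by replacing each value with its rank and then applying the Pearson formula to the ranks. The sketch below assumes that approach; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: monotonic but clearly non-linear, with one extreme value
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 100]
rho = spearman_rho(x_values, y_values)
```

Because only the ordering matters, the extreme value 100 has no more influence than any other largest value would.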

6
Q

What is Kendall’s?

A

Uses ranked data. Suited to a non-linear monotonic association, or to data that are not normal and contain some outliers.
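A rough sketch of the idea behind Kendall’s coefficient: count the pairs of observations that are ordered the same way on both variables (concordant) and those ordered oppositely (discordant). The version below is the simple tau-a variant, which makes no adjustment for ties; that choice, the function name `kendall_tau_a` and the data are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # pair ordered the same way on x and y
        elif product < 0:
            discordant += 1   # pair ordered oppositely on x and y
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical data with one swapped pair
x_values = [1, 2, 3, 4]
y_values = [1, 3, 2, 4]
tau = kendall_tau_a(x_values, y_values)
```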

7
Q

How to choose the right correlation method?

A

Pearson’s: data are normally distributed with no outliers
Spearman’s: normal with a few outliers, or not normal with no outliers (N > 20)
Kendall’s: not normal with a few outliers, or a monotonic association that is not linear

8
Q

What is the difference between correlation and regression?

A

Correlation shows whether there is an association (or a lack of one) between two variables, whereas regression predicts the value of the dependent variable (y) from the known value of the independent variable (x).

9
Q

What are the steps of interpreting regression?

A
  1. Find the equation y = a + bx and calculate the slope (b).
  2. Test whether the slope = 0 (the null hypothesis) to get a p-value and R-squared.
  3. Look at the residual plots.
10
Q

What is R-squared?

A

The coefficient of determination: the proportion of the total variability in y that is explained by the regression. A relationship can be statistically significant but still weak (low R-squared).
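A small sketch of the calculation, assuming a simple linear fit: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """Proportion of the variability in y explained by the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total
    return 1 - ss_res / ss_tot

# Hypothetical illustrative data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r2 = r_squared(x_values, y_values)
```

For a straight-line fit with one predictor, this value equals the square of Pearson’s r.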

11
Q

What do we look for in residual plots?

A

These tell us whether a linear equation is appropriate. Residuals versus fits: used to check that we aren’t trying to fit a linear regression to a curvilinear pattern. Ideally the residuals are scattered randomly on either side of the zero line.

12
Q

What are residuals?

A

Residuals are the differences between the observed and predicted values, and they show how well the equation fits. Linear regression isn’t appropriate if there is a pattern in the residuals.
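To make the definition concrete, the sketch below fits a line and lists observed minus predicted for each point. The function name `residuals` and the data are hypothetical.

```python
def residuals(x, y):
    """Observed minus predicted y for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical illustrative data
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
res = residuals(x_values, y_values)
for xi, ri in zip(x_values, res):
    print(xi, round(ri, 2))
```

Plotting these values against the fitted values gives the residuals-versus-fits plot; for a least-squares line with an intercept they always sum to zero, so it is the pattern, not the average, that matters.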

13
Q

What are confidence and prediction bands?

A

The fitted line plot shows two narrow bands, the 95% confidence limits: we are 95% confident that the mean y value for a given x value falls between them. The two wider bands are the 95% prediction bands, within which 95% of individual observations are expected to fall; observations outside these bands are potential outliers.
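For reference, the standard textbook formulas for the two bands in simple linear regression are sketched below. The symbols are assumptions introduced here, not taken from the cards: x_0 is the x value of interest, s is the residual standard deviation, S_xx is the sum of squared deviations of x, and t is the two-sided 95% critical value with n - 2 degrees of freedom.

```latex
\hat{y}_0 = a + b x_0, \qquad
s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}, \qquad
S_{xx} = \sum_i (x_i - \bar{x})^2

\text{95\% confidence band: } \quad
\hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{95\% prediction band: } \quad
\hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root is the variability of an individual observation around the line, which is why the prediction band is always wider than the confidence band.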

14
Q

2 Slope Regression

A

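Used to test whether two regression lines have the same or different slopes.
H0: no difference, B1 = B2. Ha: B1 ≠ B2.
Calculate t calc and compare it to t crit.
If |t calc| is greater than t crit, we reject the null.
t calc = (b1 - b2) / sqrt(SE_b1^2 + SE_b2^2), where SE_b1 and SE_b2 are the standard errors of the two slopes (so the denominator is the square root of the sum of the two slope variances).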
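A minimal sketch of this comparison, assuming the unpooled form in which the difference in slopes is divided by the square root of the sum of the squared standard errors of the two slopes. Some textbooks pool the residual variance instead, so treat this as one common variant. The function names and both data sets are hypothetical.

```python
import math

def slope_and_se(x, y):
    """Least-squares slope and its standard error for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    return b, se_b

def two_slope_t(x1, y1, x2, y2):
    """t statistic for H0: the two regression slopes are equal."""
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Two hypothetical data sets measured at the same x values
x_values = [1, 2, 3, 4, 5]
group_1 = [2, 4, 5, 4, 5]
group_2 = [1, 3, 2, 5, 4]
t_calc = two_slope_t(x_values, group_1, x_values, group_2)
```

The absolute value of `t_calc` would be compared with the critical t value, commonly taken with n1 + n2 - 4 degrees of freedom.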

15
Q

What are the assumptions of regression?

A
  1. The residuals (errors) have a mean of zero and constant variance
  2. The residuals are independent of each other (the value of one is not affected by the value of another)
  3. The residuals (and hence the y values at any given x) are normally distributed
16
Q

What are the three types of non-linear regression?

A

Logarithmic: used when the rate of change in the data increases or decreases quickly and then levels out
Power: used to fit a curve to data sets that compare measurements that increase at a specific rate
Exponential: used on data sets where the values rise or fall at increasingly higher rates
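One common way to fit these three curves is to transform the data so the relationship becomes a straight line, then use ordinary linear regression. The sketch below assumes that linearised approach, which is not the same as a full non-linear least-squares fit and requires positive values wherever a logarithm is taken. All function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_logarithmic(x, y):
    """y = a + b*ln(x): regress y on ln(x). Returns (a, b)."""
    return fit_line([math.log(v) for v in x], y)

def fit_power(x, y):
    """y = c * x**b: regress ln(y) on ln(x). Returns (c, b)."""
    ln_c, b = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(ln_c), b

def fit_exponential(x, y):
    """y = c * exp(b*x): regress ln(y) on x. Returns (c, b)."""
    ln_c, b = fit_line(x, [math.log(v) for v in y])
    return math.exp(ln_c), b

# Hypothetical noise-free data generated from known curves
x_values = [1, 2, 3, 4]
power_c, power_b = fit_power(x_values, [2 * v ** 3 for v in x_values])
exp_c, exp_b = fit_exponential(x_values, [3 * math.exp(0.5 * v) for v in x_values])
log_a, log_b = fit_logarithmic(x_values, [1 + 2 * math.log(v) for v in x_values])
```

With noise-free data each fit recovers the parameters used to generate it; with real data, the residual plot on the transformed scale is the place to check whether the chosen curve is appropriate.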