Week 4 - Correlations and linear regression Flashcards
If a research question has the words “relationship between X and Y” or “after controlling for Z, is there an association between X and Y?”
What kind of statistical analysis would be appropriate for these questions?
Correlations and regression
What is a correlation?
A statistical technique for measuring the extent to which two variables are associated/ related.
Measures the pattern of responses across variables.
Assumes linear association between variables
Changes in one variable are accompanied by predictable changes in the other variable
Usually a bivariate association
How is an association/ relationship determined?
When changes in one variable can show persistent and predictable changes in other variables
Does correlation always mean causation?
No, because a third variable might be causing the observed associations
What is the range of values for a correlation?
-1 (perfect negative) to +1 (perfect positive)
0 = no association; this is the value assumed under the null hypothesis
How is the significance of a correlation determined?
- sample size (n)
- alpha level (e.g. 0.05) and whether the test is one-tailed or two-tailed
- -> i.e. predicting one direction (positive or negative) or allowing for either direction
Which type of test has greater statistical power: one-tailed or two-tailed?
A one-tailed test, because it only tests in one direction (use it only when very confident the effect is in that direction, backed up by theory)
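A rough sketch of how sample size and tail choice feed into significance. It assumes the usual conversion of r to a t statistic with n - 2 degrees of freedom; the r and n values are hypothetical, and the critical values are copied from a standard t table for df = 8 at alpha = 0.05.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical result: r = 0.60 from n = 10 participants (df = 8).
t = t_from_r(0.60, 10)

# Critical t values for df = 8 at alpha = 0.05 (from a standard t table).
CRIT_ONE_TAILED = 1.860
CRIT_TWO_TAILED = 2.306

# One-tailed: the hypothesis predicted a positive correlation in advance.
significant_one_tailed = t > CRIT_ONE_TAILED
# Two-tailed: either direction counts, so the cut-off is further out.
significant_two_tailed = abs(t) > CRIT_TWO_TAILED

# The same r from a larger (hypothetical) sample gives a larger t.
t_bigger_sample = t_from_r(0.60, 30)

print(round(t, 3), significant_one_tailed, significant_two_tailed)
print(round(t_bigger_sample, 3))
```

In this made-up case the same r passes the one-tailed cut-off but not the two-tailed one, which is the extra power the card refers to.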
What does variance measure?
How much the scores deviate from the mean of the distribution (one-variable)
variance = average squared distance from the mean
What does covariance measure?
How much TWO variables vary together: whether deviations from the mean on one variable are matched by deviations from the mean on the other
Instead of the sum of squares, the sum of cross products is used
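A minimal sketch of the two cards above, assuming the sample versions of both formulas (dividing by n - 1 rather than n). The scores are hypothetical.

```python
def variance(x):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean_x = sum(x) / len(x)
    sum_of_squares = sum((a - mean_x) ** 2 for a in x)
    return sum_of_squares / (len(x) - 1)

def covariance(x, y):
    """Sum of cross products of deviations, divided by n - 1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross_products / (len(x) - 1)

# Hypothetical scores for five participants on two variables.
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]

var_x = variance(x)
cov_xy = covariance(x, y)

print(round(var_x, 2), round(cov_xy, 2))
```

The covariance of a variable with itself is just its variance, which is one way to see that the cross product replaces the square.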
What are the problems with covariance?
How are they fixed?
UNIT OF MEASUREMENT: covariance depends on the units used, e.g. the covariance of two variables measured in miles might be 4.25, but if both are converted to km it becomes 11
-> standardise it (divide by the product of the two variables' SDs)
The standardised version of a covariance is known as the ____
correlation coefficient
- unaffected by units of measurement
- puts both variables on the same scale (each has a variance of 1 once standardised)
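A small sketch of the unit problem and its fix. The data are hypothetical, chosen so that the covariance in miles comes out at 4.25; both variables are assumed to be distances, and 1.609 km per mile is used for the conversion.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of cross products divided by n - 1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

KM_PER_MILE = 1.609

# Hypothetical distances in miles for five cases.
x_miles = [5, 4, 4, 6, 8]
y_miles = [8, 9, 10, 13, 15]

# The same distances expressed in kilometres.
x_km = [KM_PER_MILE * v for v in x_miles]
y_km = [KM_PER_MILE * v for v in y_miles]

cov_miles = covariance(x_miles, y_miles)
cov_km = covariance(x_km, y_km)
r_miles = correlation(x_miles, y_miles)
r_km = correlation(x_km, y_km)

print(round(cov_miles, 2), round(cov_km, 2))  # covariance changes with the units
print(round(r_miles, 3), round(r_km, 3))      # correlation does not
```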
covariance = standardised/unstandardised
whereas Pearson correlation = standardised/unstandardised
covariance = unstandardised
Pearson correlation = standardised
What does Pearson Correlation tell us?
Direction + strength of linear relationship between two interval/ratio variables (continuous data)
What symbol denotes Pearson Correlation?
r
r = strength and direction
What does the size/magnitude of ‘r’ denote?
The degree to which the points fit a straight line. The closer r is to +1 or -1, the more closely the points follow a straight line (a stronger linear relationship)
+1 = perfect positive relationship
-1 = perfect negative relationship
0 = no linear relationship between the two variables
What is a correlation matrix?
A table showing the correlation for every pairwise combination of variables.
Can be used for descriptive statistics/ exploratory analysis
In a correlation matrix, each correlation is a separate test. What is the issue with multiple testing?
How can we fix this?
More tests (without separate, justifiable hypotheses) increase the risk of a false positive
-> post hoc analysis = reporting associations found after data collection
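A quick sketch of why the risk grows. It assumes the tests are independent (correlations in one matrix usually are not exactly independent, so treat the figure as a rough guide), and the choice of 5 variables is hypothetical.

```python
def n_pairwise_tests(n_variables):
    """Number of distinct correlations in a matrix of n_variables variables."""
    return n_variables * (n_variables - 1) // 2

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Hypothetical correlation matrix with 5 variables.
tests_for_5 = n_pairwise_tests(5)
risk = familywise_error_rate(tests_for_5)

print(tests_for_5, round(risk, 3))
```

With ten tests at alpha = 0.05 the chance of at least one false positive is already around 40% under this assumption.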
What are the assumptions of a Pearson Correlation?
- Parametric test, therefore it assumes the variables are normally distributed
- linear association
- variables measured on an interval or ratio scale
How to deal with violation to the assumption of normality in a Pearson Correlation?
- If N > 30, use the central limit theorem (CLT) to justify proceeding despite the violation
- Spearman correlation
How to deal with violation to the assumption of linearity in a Pearson Correlation?
- If relationship is monotonic, use Spearman correlation
- Otherwise, transformation to achieve linearity
What are the two situations where you can use Spearman Correlation (r_s or rho)?
- to find the association between two ordinal variables (X & Y consist of ranks)
- to measure the consistency of direction of the association between two interval/ratio variables
- -> variables are converted to ranks first, before Spearman is used
- Measures the degree of monotonic relationship between variables
Do the Spearman Correlation Coefficient and Pearson Correlation Coefficient use the same formula?
Does this make the analysis more or less powerful?
Yes, the formula is the same; in Spearman the calculations are simply performed on ranked data
Less powerful, because information is lost when the data are converted to ordinal (ranked) data
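A sketch of the card above: Spearman computed as Pearson on ranks, assuming tied scores share the average of their ranks. The scores are hypothetical; the two y lists differ only in one extreme value, so they have identical ranks.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Same formula as Pearson, applied to the ranks instead of the raw scores."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y_a = [3, 1, 4, 9, 7]
y_b = [3, 1, 4, 900, 7]  # same order as y_a, one extreme score

rho_a = spearman_rho(x, y_a)
rho_b = spearman_rho(x, y_b)
r_a = pearson_r(x, y_a)
r_b = pearson_r(x, y_b)

print(round(rho_a, 3), round(rho_b, 3))  # identical: ranks ignore the spacing
print(round(r_a, 3), round(r_b, 3))      # different: raw scores keep the spacing
```

The spacing between scores is exactly the information that ranking throws away.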
What is a monotonic relationship?
The data points do not fit on a straight line, but they generally move in the same direction.
Because Pearson correlation assumes linearity, you can use Spearman if the data are non-linear but monotonic (consistently increasing or consistently decreasing)
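A sketch of a non-linear but monotonic relationship, using hypothetical data where y is the cube of x. The simple ranking helper assumes there are no tied scores.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def simple_ranks(values):
    """Rank from 1 (smallest) upward. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = pearson_r(simple_ranks(x), simple_ranks(y))

print(round(r, 3))    # below 1: the points do not sit on a straight line
print(round(rho, 3))  # 1.0: the direction is perfectly consistent
```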
What do you use to find:
The proportion of variability in Y variable that can be attributed to variability in X
Coefficient of determination (r², r squared)
Shows how accurately one variable predicts the other
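A sketch linking r squared to "variability in Y attributed to X", assuming a simple least-squares regression of Y on X. The data are hypothetical.

```python
import math

# Hypothetical scores for five participants.
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

ss_x = sum((a - mean_x) ** 2 for a in x)                       # variability in X
ss_y = sum((b - mean_y) ** 2 for b in y)                       # total variability in Y
cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Pearson correlation.
r = cross / math.sqrt(ss_x * ss_y)

# Least-squares line predicting Y from X.
slope = cross / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

# Variability in Y left over after using X to predict it.
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))

# Proportion of Y's variability accounted for by X.
r_squared = 1 - ss_residual / ss_y

print(round(r, 3), round(r ** 2, 3), round(r_squared, 3))
```

Squaring r and computing the proportion of explained variability give the same number here, which is why the two descriptions are used interchangeably for a single predictor.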