Lecture 7 - Statistical Tests III: Correlations & Comparing Two Groups Flashcards
(50 cards)
what are the two statistical tests for correlative studies, one for parametric and one for non-parametric data?
correlative parametric data - pearson’s correlation
correlative non-parametric data - spearman’s rank correlation
what will pearsons correlation be used for?
pearson’s correlation is used for two continuous variables; the correlation coefficient “R” describes the strength and direction of their linear association and takes a value between -1 and 1
what does correlation describe?
correlation describes how closely the points in a scatter plot follow a trend, i.e. the amount of variation or scatter around it
the higher the scatter…
the lower the strength of correlation
R values for positive, negative & no correlations:
positive correlation: r > 0
negative correlation: r < 0
no correlation: r = 0
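The three cases above can be illustrated with a minimal R sketch. The data are invented purely for illustration (they are not from the lecture); cor() is base R's correlation function and uses Pearson's method by default.

```r
# Toy data invented for illustration only
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2.1, 3.9, 6.2, 8.1, 9.8)  # rises as x rises
y_down <- rev(y_up)                   # falls as x rises
y_flat <- c(2, 1, 3, 1, 2)            # no linear trend with x

r_up   <- cor(x, y_up)    # positive correlation: r > 0
r_down <- cor(x, y_down)  # negative correlation: r < 0
r_flat <- cor(x, y_flat)  # no correlation: r is (numerically) 0

print(c(up = r_up, down = r_down, flat = r_flat))
```

In all three cases the coefficient stays within -1 and 1; only its sign and size change.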
what is the difference between a linear regression and a pearsons correlation?
the difference is that a pearson’s correlation does not fit a line to the data, whereas a linear regression fits a regression line
pearsons assumptions:
both continuous variables are normally distributed
random sampling
independence of observations
pearsons null hypothesis:
there is no correlation between the variables: ρ (rho) = 0
if the p-value is larger than 0.05 the correlation is not significant, so it is not worth discussing the R value
regression or correlation?
how are x & y related? how much does y change with x? = regression
how well are x & y related? = correlation
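A minimal R sketch of the two questions, using invented toy data (an assumption, not lecture data). lm() fits the regression line and its slope answers "how much does y change with x?", while cor() answers "how well are x & y related?".

```r
# Toy data invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Regression: fit a line y = a + b*x and read off the slope b
fit   <- lm(y ~ x)
slope <- coef(fit)[["x"]]

# Correlation: one number for the strength and direction of the association
r <- cor(x, y)

# Swapping the variables changes the regression slope but not the correlation
slope_swapped <- coef(lm(x ~ y))[["y"]]
r_swapped     <- cor(y, x)

print(c(slope = slope, slope_swapped = slope_swapped, r = r, r_swapped = r_swapped))
```

The regression treats one variable as depending on the other, so the direction matters; the correlation treats both variables the same way.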
it is correlation rather than regression if:
neither of the two continuous variables is predicted to depend on the other (e.g. there may be no biological reason to assume such dependence, so there is little reasoning for treating one variable as the cause of the other)
it is regression rather than a correlation if:
your data come from an EXPERIMENT, as with experiments there is usually a direct relationship between the two variables [we assume y is dependent on x], therefore a linear regression should be fitted
how can we check to see if it is safe to use pearsons correlation?
after first deciding that the data call for a correlation and not a regression [i.e. there is no direct relationship as the result of an experiment], you must check that both variables are normally distributed using the shapiro.test function in R
how can we check for normal distribution of variable data before confirming if we can use pearsons correlation?
we attach our data frame with attach(data) and use names(data) to list the variable names
then for each name we input:
shapiro.test(variable_1_name)
shapiro.test(variable_2_name)
provided the p-values for both variables are ABOVE 0.05, we can assume the data are normally distributed
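The check above might look like this in R. This is only a sketch: the data frame `dat` and its columns `height` and `mass` are hypothetical, simulated names, and the columns are accessed with `$` instead of attach() so the example stays self-contained.

```r
# Simulated example data (hypothetical variable names, not lecture data)
set.seed(1)
dat <- data.frame(
  height = rnorm(30, mean = 170, sd = 10),
  mass   = rnorm(30, mean = 70,  sd = 8)
)

names(dat)  # lists the variable names in the data frame

# Shapiro-Wilk normality test for each variable separately
sw_height <- shapiro.test(dat$height)
sw_mass   <- shapiro.test(dat$mass)

# Both p-values must be above 0.05 before using pearson's correlation
both_normal <- sw_height$p.value > 0.05 && sw_mass$p.value > 0.05
print(both_normal)
```

If `both_normal` is FALSE, the non-parametric spearman's rank correlation is the alternative.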
how can you command R to give the pearsons correlation?
cor.test(variable_1, variable_2, method = "pearson")
note: it doesn’t matter which way round your variables are - the answer will be the same either way
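A quick sketch of the point about variable order, using simulated data (the values are an assumption for illustration; only the variable names follow the card above).

```r
# Simulated data for illustration only
set.seed(42)
variable_1 <- rnorm(15)
variable_2 <- variable_1 + rnorm(15)

# Same test with the variables given in either order
res_a <- cor.test(variable_1, variable_2, method = "pearson")
res_b <- cor.test(variable_2, variable_1, method = "pearson")

# Correlation coefficient and p-value are identical both ways round
print(c(r_a = unname(res_a$estimate), r_b = unname(res_b$estimate)))
print(c(p_a = res_a$p.value, p_b = res_b$p.value))
```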
how do we write up the results of a pearsons cor.test in R?
the (variable one) and (variable two) of (object) were negatively/positively correlated (pearson’s correlation; R = value, p = value, N = number of paired observations, e.g. 15)
what do we receive from a pearsons cor.test command and how do you infer it?
you will get a p-value and the correlation coefficient, which is found underneath “cor” in the sample estimates at the bottom of the output (the test statistic itself is reported as t)
(1) if the p-value is smaller than 0.05 then we can assume that the two variables are correlated
(2) if the cor value is positive it means there is a positive correlation, if the cor value is negative it means there is a negative correlation
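The two-step reading of the output can be sketched as a small helper. The function name `describe_correlation` and the toy data are hypothetical, made up for illustration; `$p.value` and `$estimate` are the components of the object that cor.test() returns.

```r
# Hypothetical helper: applies steps (1) and (2) to a cor.test result
describe_correlation <- function(x, y, alpha = 0.05) {
  res <- cor.test(x, y, method = "pearson")
  r <- unname(res$estimate)  # the value printed under "cor"
  p <- res$p.value
  if (p >= alpha) return("no significant correlation")       # step (1)
  if (r > 0) "positive correlation" else "negative correlation"  # step (2)
}

# Toy data invented for illustration only
x <- 1:10
y <- c(2.3, 3.1, 5.2, 6.8, 8.1, 9.9, 12.2, 13.1, 15.4, 16.9)

up   <- describe_correlation(x, y)    # y rises with x
down <- describe_correlation(x, -y)   # y falls with x
flat <- describe_correlation(c(1, 2, 3, 4, 5), c(2, 1, 3, 1, 2))  # no trend

print(c(up, down, flat))
```

The sign of the coefficient is only read once the p-value has passed the 0.05 threshold, matching the order of the steps above.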
if the shapiro.test results are greater/lower than 0.05 we:
> 0.05: data IS normally distributed
<0.05: data IS NOT normally distributed
what is the non-parametric equivalent of the pearsons correlation?
spearman’s rank
spearman’s rank overall function and assumptions:
- ranks both the x and y variables, then uses the ranks to calculate a measure of correlation
- assumptions: none about distribution of variables; random sampling; independence of observations
what does spearman’s rank correlation coefficient, r_s (also written R_s), describe?
it describes the strength and direction of the linear association between the ranks of the two variables, as a number between -1 & 1
what is different between the pearsons correlation and spearman’s rank?
pearson’s is a parametric test that uses the unranked (raw) data
spearman’s rank is a non-parametric test that uses the ranked data
what must be done to your variables when calculating spearman’s rank?
the data for each variable must be ranked separately from low to high - the lowest value gets rank 1 and larger values get progressively higher ranks
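The ranking step can be sketched in R with rank(). The numbers below are invented for illustration. Note that rank() gives tied values the average of their ranks by default, and that the spearman's coefficient is the pearson's coefficient calculated on the ranks.

```r
# Toy data invented for illustration only (no tied values)
x <- c(10.2, 3.5, 7.7, 15.0, 3.9)
y <- c(200, 150, 90, 310, 120)

# Rank each variable separately, lowest value = rank 1
rank_x <- rank(x)
rank_y <- rank(y)
print(rbind(rank_x, rank_y))

# Pearson's correlation of the ranks ...
rs_manual <- cor(rank_x, rank_y)
# ... equals R's built-in spearman's rank coefficient
rs_builtin <- cor(x, y, method = "spearman")

print(c(manual = rs_manual, builtin = rs_builtin))
```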
how can you use R to calculate your spearman’s rank values?
we once again use:
cor.test(variable_1, variable_2, method = "spearman")
how do we infer the results of our spearman’s rank values in R?
you are given a p-value: if it is greater than 0.05 we fail to reject the null hypothesis and cannot conclude that there is a correlation, if it is smaller than 0.05 we reject the null hypothesis and conclude that the variables are correlated
you are also given “rho” (Rs), the correlation coefficient, in the sample estimates at the bottom of the output (the test statistic itself is reported as S): ONLY if the p-value is < 0.05 do we look at this value - if it is positive it suggests a positive correlation and if it is negative it suggests a negative correlation
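A minimal sketch of reading a spearman's result in R, with toy data invented for illustration. It deliberately shows a case where rho is positive but the p-value is above 0.05, so the coefficient should not be interpreted.

```r
# Toy data invented for illustration only (5 pairs, no tied values)
x <- c(10.2, 3.5, 7.7, 15.0, 3.9)
y <- c(200, 150, 90, 310, 120)

res <- cor.test(x, y, method = "spearman")

rho <- unname(res$estimate)  # printed under "rho" in the sample estimates
p   <- res$p.value

# Only interpret the sign of rho when the p-value is below 0.05
if (p < 0.05) {
  conclusion <- if (rho > 0) "positive correlation" else "negative correlation"
} else {
  conclusion <- "no significant correlation"
}

print(c(rho = rho, p = p))
print(conclusion)
```

With so few pairs even a fairly large rho is not significant, which is why the p-value is checked first.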