Lecture 7 - Statistical Tests III: Correlations & Comparing Two Groups Flashcards by Charlie Davies

what are the two statistical tests for correlative studies with parametric and non-parametric data?

correlative parametric data - Pearson’s correlation

correlative non-parametric data - Spearman’s rank correlation

what is Pearson’s correlation used for?

Pearson’s correlation is used for two continuous variables; the correlation coefficient “r” describes the strength and direction of the linear association and lies between -1 and 1

what does correlation describe?

correlation describes how tightly the points in a scatter plot cluster around a straight line, i.e. the amount of scatter

the higher the scatter…

the lower the strength of correlation

r values for positive, negative & no correlation:

positive correlation: r >0

negative correlation: r <0

no correlation: r = 0
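
A minimal sketch of how the sign of r follows the direction of the association. The vectors below are made-up toy values (an assumption for illustration, not data from the lecture), and cor() uses Pearson's method by default.

```r
# Hypothetical toy data, invented for illustration only.
x <- c(1, 2, 3, 4, 5)
y_up <- c(2, 4, 5, 4, 6)     # tends to rise as x rises
y_down <- c(6, 4, 5, 4, 2)   # the same values reversed, so it tends to fall

r_up <- cor(x, y_up)         # Pearson's r (default method)
r_down <- cor(x, y_down)

print(r_up)    # positive
print(r_down)  # negative, same size as r_up
```

Because y_down is just y_up in reverse order, the two coefficients have the same magnitude and opposite signs.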

what is the difference between a linear regression and a Pearson’s correlation?

with a Pearson’s correlation no line is fitted, whereas a linear regression fits a regression line to the data

Pearson’s assumptions:

both continuous variables are normally distributed

random sampling

independence of observations

Pearson’s null hypothesis:

there is no correlation between the variables: ρ (rho) = 0

if the p-value is larger than 0.05, it is not worth discussing the r value

regression or correlation?

how are x & y related? how much does y change with x? = regression

how well are x & y related? = correlation
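
A short sketch contrasting the two questions on the same data. The x and y values are hypothetical, and the model name m1 is only a label chosen for this example: lm() answers "how much does y change with x" through the slope, while cor() answers "how well are they related" through r.

```r
# Hypothetical measurements, invented for illustration only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0)

m1 <- lm(y ~ x)              # regression: fits a line y = a + b * x
slope <- coef(m1)[["x"]]     # b: change in y per unit change in x
r <- cor(x, y)               # correlation: strength and direction only

print(slope)
print(r)

# The two quantities are linked: slope = r * sd(y) / sd(x)
print(r * sd(y) / sd(x))
```

The last line shows why a correlation coefficient alone cannot tell you the size of the change in y: the slope also depends on the spread of each variable.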

it is correlation rather than regression if:

neither of the two continuous variables is predicted to depend on the other (e.g. there may be no biological reason to assume such a dependence, so there is no clear reasoning for which variable would drive the other)

it is regression rather than correlation if:

your data come from an EXPERIMENT: with experiments there is usually a direct relationship between the two variables [we assume y is dependent on x], so a linear regression should be fitted

how can we check whether it is safe to use Pearson’s correlation?

after first establishing that neither variable is expected to depend on the other [i.e. the data are not the result of an experiment], check that both variables are normally distributed using the shapiro.test command in R

how can we check that the variables are normally distributed before deciding whether we can use Pearson’s correlation?

we attach our data frame with attach(data) and list its variable names with names(data)

then for each variable we input:

shapiro.test(variable_1_name)

shapiro.test(variable_2_name)

provided the p-values for both variables are ABOVE 0.05, there is no evidence against normality and we can treat the data as normally distributed
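
The check above can be sketched as follows. The data frame is simulated (an assumption, since no lecture data set is given here), the column names height and weight are hypothetical, and the sketch uses data$column rather than attach() purely to keep it self-contained.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Hypothetical data frame standing in for the real one.
data <- data.frame(
  height = rnorm(30, mean = 170, sd = 8),
  weight = rnorm(30, mean = 70, sd = 10)
)

print(names(data))  # lists the variable names

sw_height <- shapiro.test(data$height)
sw_weight <- shapiro.test(data$weight)

# Pearson's correlation is only considered if BOTH p-values exceed 0.05.
both_look_normal <- sw_height$p.value > 0.05 && sw_weight$p.value > 0.05
print(both_look_normal)
```

If both_look_normal is FALSE, Spearman's rank correlation is the fallback.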

how can you command R to give Pearson’s correlation?

cor.test(variable_1, variable_2, method = "pearson")

note: it doesn’t matter which way around your variables are - the answer will be the same either way
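
A small sketch showing that the order of the variables makes no difference. The vectors length_cm and mass_g are hypothetical measurements made up for this example.

```r
# Hypothetical paired measurements, invented for illustration only.
length_cm <- c(10.2, 11.5, 12.1, 13.8, 14.4, 15.9, 16.3, 17.7)
mass_g <- c(20.1, 22.4, 25.0, 24.7, 29.3, 30.8, 33.5, 34.9)

res_xy <- cor.test(length_cm, mass_g, method = "pearson")
res_yx <- cor.test(mass_g, length_cm, method = "pearson")

# Same coefficient and same p-value whichever way round the variables go.
print(res_xy$estimate)
print(res_yx$estimate)
print(res_xy$p.value)
print(res_yx$p.value)
```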

how do we write up the results of a Pearson’s cor.test in R?

the (variable one) and (variable two) of (object) were negatively/positively correlated (Pearson’s correlation; r = value, p = value, N = value, e.g. N = 15)

what do we receive from a Pearson’s cor.test command and how do you interpret it?

you will get a p-value and, underneath “cor” at the bottom of the output, the sample estimate of the correlation coefficient (r); the test statistic itself is reported as t

(1) if the p-value is smaller than 0.05 we can conclude that the two variables are correlated

(2) if the cor value is positive there is a positive correlation; if the cor value is negative there is a negative correlation
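
The two-step reading of the output can be sketched in code. The data are hypothetical, and the variable names (conclusion, report) are labels chosen for this example; p.value and estimate are the components that cor.test returns.

```r
# Hypothetical data with a clear downward trend, invented for illustration.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(9.8, 9.1, 8.3, 6.9, 6.4, 5.2, 4.8, 3.1, 2.7, 1.2)

res <- cor.test(x, y, method = "pearson")
p <- res$p.value
r <- unname(res$estimate)  # the value printed under "cor"

# Step 1: check the p-value. Step 2: only then read the sign of r.
if (p >= 0.05) {
  conclusion <- "no significant correlation"
} else if (r > 0) {
  conclusion <- "positive correlation"
} else {
  conclusion <- "negative correlation"
}
print(conclusion)

# One possible way to assemble the write-up string.
report <- sprintf("Pearson's correlation; r = %.2f, p = %.3g, N = %d",
                  r, p, length(x))
print(report)
```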

if the shapiro.test p-value is greater/lower than 0.05:

> 0.05: no evidence against normality - treat the data as normally distributed

< 0.05: data are NOT normally distributed

what is the non-parametric equivalent of Pearson’s correlation?

Spearman’s rank correlation

Spearman’s rank overall function and assumptions:

ranks both the x and y variables and uses the ranks to calculate a measure of correlation

assumptions: none about the distribution of the variables; random sampling; independence of observations

what does Spearman’s rank correlation coefficient, r_s, describe?

it describes the strength and direction of the linear association between the ranks of the two variables, as a number between -1 and 1

what is the difference between Pearson’s correlation and Spearman’s rank?

Pearson’s is for parametric data and uses the unranked values

Spearman’s rank is for non-parametric data and uses the ranked values

what must be done to your variables when calculating spearman’s rank?

the data from both variables must be ranked separately from low to high - the lowest value gets rank one and larger values get progressively higher ranks
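
A minimal sketch of the ranking step, assuming two small made-up vectors with no tied values. It also illustrates that Spearman's coefficient is Pearson's coefficient calculated on the ranks.

```r
# Hypothetical values, invented for illustration only (no ties).
x <- c(3.2, 1.5, 4.8, 2.9, 10.0)
y <- c(12, 9, 30, 15, 11)

rank_x <- rank(x)  # lowest value gets rank 1
rank_y <- rank(y)
print(rank_x)      # 3 1 4 2 5
print(rank_y)      # 3 1 5 4 2

rs_direct <- cor(x, y, method = "spearman")
rs_from_ranks <- cor(rank_x, rank_y, method = "pearson")

print(rs_direct)      # 0.3
print(rs_from_ranks)  # same value
```

In practice R does the ranking for you when method = "spearman" is given, so rank() is only needed to see what is happening underneath.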

how can you use R to calculate your spearman’s rank values?

once again, we use:

cor.test(variable_1, variable_2, method = "spearman")

how do we interpret the results of our Spearman’s rank test in R?

you are given a p-value: if it is greater than 0.05 we cannot reject the null hypothesis and assume no correlation; if it is smaller than 0.05 we reject the null hypothesis in favour of the alternative and assume a correlation

you are also given the “rho” estimate (r_s) at the bottom of the output: ONLY if the p-value is < 0.05 do we look at this value - if it is positive it suggests a positive correlation and if it is negative it suggests a negative correlation
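
A sketch of reading a Spearman result, using hypothetical dose and response values that rise together but not in a straight line (names and numbers are assumptions for illustration). There are no tied values, so R can compute an exact p-value.

```r
# Hypothetical data: strongly increasing but curved, invented for illustration.
dose <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
response <- c(1.0, 1.3, 2.1, 2.0, 4.5, 9.0, 8.7, 20.3, 41.0, 80.2)

res <- cor.test(dose, response, method = "spearman")

print(res$p.value)   # check this first
print(res$estimate)  # labelled "rho"; only interpret its sign if p < 0.05
```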

what is a crucial thing you must always do before statistically testing correlations to ensure you are using the right test?

you must always check whether the data for each variable are parametric or non-parametric using the shapiro.test(variable_name) command in R: parametric = Pearson’s and non-parametric = Spearman’s

what statistical tests do we use when investigating the difference between normally distributed samples?

for paired parametric samples: paired t-test

for independent parametric samples: t-tests

what statistical tests do we use when investigating the difference between non-parametric samples?

for non-parametric paired samples = Paired Wilcoxon Test

for non-parametric independent samples = Mann-Whitney U Test / Wilcoxon test

when is Student’s t-test used?

normal distribution of both groups and equal variances

when is Welch’s t-test used?

normal distribution of both groups and unequal variance

when is the Mann-Whitney U Test / Wilcoxon Test used?

non-normal distribution (no assumptions about the distribution)

how can we test for normality?

graphically: histograms or quantile plots

formal tests: Shapiro-Wilk Test [shapiro.test(variable_name)]

how can you use R and histograms to test for normality when you are comparing two groups?

hist(x_variable[male_type == "control"])

hist(x_variable[male_type == "knockout"])

F-test command:

var.test(y_variable ~ x_variable) [y = continuous variable, x = grouping variable]

what requirements do we need in order to do a t-test?

we need to ensure we have normally distributed data and variances that do not differ

test distribution using: shapiro.test(variable_name)

test for variance using: var.test(y_variable ~ x_variable) - if the p-value is over 0.05 in the F-test, the variances do not differ significantly
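
The two checks can be sketched together. The data frame mice and its column body_mass are hypothetical and simulated; the grouping column reuses the male_type name and levels from the histogram card. Running shapiro.test on each group separately via tapply is one possible approach, not necessarily the one used in the lecture.

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

# Hypothetical data: 20 simulated animals per group.
mice <- data.frame(
  male_type = factor(rep(c("control", "knockout"), each = 20)),
  body_mass = c(rnorm(20, mean = 25, sd = 2), rnorm(20, mean = 27, sd = 2))
)

# Normality: one Shapiro-Wilk p-value per group.
sw_by_group <- tapply(mice$body_mass, mice$male_type,
                      function(v) shapiro.test(v)$p.value)
print(sw_by_group)

# Equal variances: F-test.
f_res <- var.test(body_mass ~ male_type, data = mice)
print(f_res$p.value)

# Student's t-test is only appropriate if every check gives p > 0.05.
ok_for_student <- all(sw_by_group > 0.05) && f_res$p.value > 0.05
print(ok_for_student)
```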

what is the t-test command in R?

t.test(y_variable ~ x_variable, var.equal = TRUE)

note: you can only carry out Student’s t-test provided that the variances of the two groups are equal, something you can find out by doing an F-test with the command var.test(y ~ x) - your p-value must be > 0.05 for the variances not to differ
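
A sketch of the F-test followed by Student's t-test on made-up data. The data frame mice, its column body_mass and all of the values are hypothetical; the two groups were chosen to have similar spread.

```r
# Hypothetical body masses, 10 animals per group, invented for illustration.
mice <- data.frame(
  male_type = factor(rep(c("control", "knockout"), each = 10)),
  body_mass = c(24.1, 25.3, 23.8, 26.0, 24.7, 25.9, 24.4, 25.1, 23.5, 26.2,
                27.9, 26.5, 28.8, 27.1, 29.4, 26.9, 28.2, 27.6, 29.0, 28.5)
)

# Step 1: F-test. A p-value above 0.05 means the variances do not differ.
f_res <- var.test(body_mass ~ male_type, data = mice)
print(f_res$p.value)

# Step 2: Student's t-test, which assumes equal variances.
student <- t.test(body_mass ~ male_type, data = mice, var.equal = TRUE)
print(student$parameter)  # degrees of freedom: n1 + n2 - 2 = 18
print(student$p.value)
```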

how do we interpret the results of our t-test in R?

p-value: if your p-value is below 0.05, there is a significant difference between the means of the two groups

how can you get the mean results for different variable data sets?

you can get the mean of a variable for each group using the command: tapply(y_variable, x_variable, mean)
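
A minimal tapply sketch with hypothetical values: the first argument is the continuous variable, the second is the grouping factor, and the third is the function applied to each group.

```r
# Hypothetical data, invented for illustration only.
y <- c(4.1, 5.0, 4.6, 6.2, 6.8, 7.1)
x <- factor(c("a", "a", "a", "b", "b", "b"))

group_means <- tapply(y, x, mean)  # one mean per level of x
print(group_means)
```

Swapping mean for another function, such as sd or median, gives that summary per group instead.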

what statistical test would you use if the continuous variable in each of the groups was normally distributed but the variances were not equal?

[variances = not equal] & [distribution = normal] = Welch’s t-test

how do you do a Welch’s t-test in R, and how does it differ from a normal Student’s t-test command?

you simply write > t.test(y_variable ~ x_variable)

this differs from the Student’s t-test command in that it doesn’t have the additional “var.equal = TRUE”, as we only apply Welch’s test when the variances aren’t equal
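
A sketch of Welch's test on hypothetical data in which one group is far more variable than the other (data frame name, column names and values are all assumptions for illustration). Leaving out var.equal means t.test falls back to its default, var.equal = FALSE, which is Welch's test.

```r
# Hypothetical data: tight control group, widely spread knockout group.
mice <- data.frame(
  male_type = factor(rep(c("control", "knockout"), each = 8)),
  body_mass = c(24.8, 25.1, 24.9, 25.3, 24.7, 25.0, 25.2, 24.6,
                22.0, 31.5, 26.4, 35.2, 19.8, 29.9, 33.1, 24.6)
)

# The F-test shows the variances differ, so Student's t-test is not suitable.
f_res <- var.test(body_mass ~ male_type, data = mice)
print(f_res$p.value)

# No var.equal argument: R runs Welch's t-test.
welch <- t.test(body_mass ~ male_type, data = mice)
print(welch$method)
print(welch$parameter)  # Welch's degrees of freedom, below n1 + n2 - 2 = 14
print(welch$p.value)
```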

Mann-Whitney U test / Wilcoxon-Mann-Whitney test requirements:

non-parametric equivalent of the independent samples t-test; needs one continuous variable (response variable) and one categorical variable with two factor levels (explanatory variable)

Wilcoxon test in R:

(1) attach(data)

(2) names(data)

(3) wilcox.test(y ~ x)

how do we interpret the results of our Wilcoxon test in R?

you are given a test statistic (W) and a p-value; if your p-value is < 0.05 we reject the null hypothesis in favour of the alternative hypothesis - a significant difference is established
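
A sketch of the independent-samples Wilcoxon test on hypothetical scores (data frame name, column names and values are assumptions for illustration). The values contain no ties, and every control score is lower than every treated score.

```r
# Hypothetical scores, 6 per group, invented for illustration only.
d <- data.frame(
  group = factor(rep(c("control", "treated"), each = 6)),
  score = c(1.2, 3.4, 2.2, 5.1, 2.8, 4.0,
            6.3, 7.7, 5.9, 9.4, 8.1, 12.6)
)

res <- wilcox.test(score ~ group, data = d)

print(res$statistic)  # W = 0: no control score exceeds any treated score
print(res$p.value)    # below 0.05, so the groups differ significantly
```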

what can we construct once we have confirmed a statistically significant difference via a Wilcoxon test in R?

once a significant difference is confirmed (Wilcoxon p-value < 0.05) you can construct your plot via the command: > plot(y ~ x, las = 1)

paired Wilcoxon test requirements:

- non-parametric test, uses medians

- assumptions: none

- null hypothesis: median difference between measurements is 0

paired Wilcoxon test R command and interpretation:

wilcox.test(paired_variable_1, paired_variable_2, paired = TRUE)

p-value < 0.05 = reject the null hypothesis in favour of the alternative hypothesis
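
A sketch of the paired test using hypothetical before/after measurements on the same eight subjects (names and values are assumptions for illustration). Every subject's value drops, and no two differences are the same size.

```r
# Hypothetical paired measurements, invented for illustration only.
before <- c(12.1, 14.3, 11.8, 15.2, 13.4, 16.0, 12.9, 14.8)
after <- c(10.4, 12.1, 11.0, 12.2, 11.5, 13.3, 11.6, 12.0)

res <- wilcox.test(before, after, paired = TRUE)

print(res$statistic)  # V = 36: all 8 differences are positive (1 + 2 + ... + 8)
print(res$p.value)    # below 0.05, so the median difference is not 0
```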

Mann-Whitney U Test & Wilcoxon Tests are:

the exact same non-parametric test for two independent samples! (the paired Wilcoxon test is a separate test)

how can we check the assumptions of a linear regression in R?

after fitting your linear regression (stored here as m1) you can check for constant variance and normally distributed residuals via the command: > plot(m1)

providing the assumptions are met, the first two graphs will show (1) a “star-filled sky” of evenly scattered points and (2) dots lying along the line

the three statistical tests used for comparing two groups against a continuous variable:

Student’s t-test: both groups are parametric with equal variances

Welch’s t-test: both groups parametric but with unequal variances

Mann-Whitney U Test / Wilcoxon Test: non-parametric data (no assumptions about the distribution)

strongest statistical test out of Student’s t-test, Welch’s & Mann-Whitney / Wilcoxon:

Student’s t-test

correlations are used when:

- when we are interested in how WELL x and y are related

- if neither of the two variables is predicted to depend on the other (not clear which is the response variable and which is the explanatory)

(50 cards)