Lecture 7 - Statistical Tests III: Correlations & Comparing Two Groups Flashcards

1
Q

what are the two statistical tests for correlative studies, for parametric and non-parametric data respectively?

A

correlative parametric data - Pearson's correlation

correlative non-parametric data - Spearman's rank correlation

2
Q

what is Pearson's correlation used for?

A

Pearson's correlation is used for two continuous variables; the correlation coefficient r describes the strength and direction of the linear association, taking a value between -1 and 1

3
Q

what does correlation describe?

A

correlation describes the amount of variation or scatter in a scatter plot

4
Q

the higher the scatter…

A

the lower the strength of correlation

5
Q

R values for positive, negative & no correlations:

A

positive correlation: r > 0

negative correlation: r < 0

no correlation: r = 0

6
Q

what is the difference between a linear regression and a Pearson's correlation?

A

the difference is that with a Pearson's correlation no line is fitted, whereas a linear regression fits a regression line through the data

7
Q

Pearson's assumptions:

A

both continuous variables are normally distributed

random sampling

independence of observations

8
Q

Pearson's null hypothesis:

A

there is no correlation between the variables: ρ (rho) = 0

if the p-value is larger than 0.05, it is not worth interpreting the r value

9
Q

regression or correlation?

A

how are x & y related? how much does y change with x? = regression

how well are x & y related? = correlation

10
Q

it is correlation rather than regression if:

A

it is correlation rather than regression if neither of the two continuous variables is predicted to depend on the other (e.g. there may be no biological reason to assume such dependence - the correlation is observed without an obvious cause-and-effect direction)

11
Q

it is regression rather than a correlation if:

A

your data come from an EXPERIMENT, as with experiments there is usually a direct relationship between the two variables [we assume y is dependent on x], therefore a linear regression should be fitted

12
Q

how can we check whether it is safe to use Pearson's correlation?

A

after first deducing that it is an association rather than a direct relationship [i.e. not the result of an experiment], you must check whether both variable data sets are normally distributed using the shapiro.test() command in R

13
Q

how can we check for normal distribution of variable data before confirming we can use Pearson's correlation?

A

we attach our data frame and call names(data) to list the variables

then for each variable we input:

shapiro.test(variable_1_name)

shapiro.test(variable_2_name)

provided the p-values for both sets of data are ABOVE 0.05, we can assume a normal data distribution
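
for example, a minimal sketch - the data frame crabs and its columns weight and shell_width are hypothetical names, not from the lecture:

attach(crabs)               # make the columns available by name
names(crabs)                # lists the variable names, e.g. "weight", "shell_width"
shapiro.test(weight)        # p-value above 0.05 means weight looks normally distributed
shapiro.test(shell_width)   # p-value above 0.05 means shell_width looks normally distributed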

14
Q

how can you command R to give the Pearson's correlation?

A

cor.test(variable_1, variable_2, method = "pearson")

note: it doesn't matter which way around your variables go - the answer will be the same either way
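
for example, reusing the hypothetical crabs variables from the previous card:

cor.test(weight, shell_width, method = "pearson")
cor.test(shell_width, weight, method = "pearson")   # same r and p-value either way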

15
Q

how do we write up the results of a Pearson's cor.test in R?

A

the (variable one) and (variable two) of (object) were negatively/positively correlated (Pearson's correlation: r = value, p = value, N = sample size)

16
Q

what do we receive from a Pearson's cor.test command and how do we interpret it?

A

you get a p-value and, underneath "cor" at the bottom of the output, the sample estimate, which is our correlation coefficient

(1) if the p-value is below 0.05 we can assume that the two variables are correlated

(2) if the cor value is positive there is a positive correlation; if the cor value is negative there is a negative correlation
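
if you want to pull these values out directly, cor.test() returns a result object whose fields can be read by name - a sketch with the same hypothetical variables:

result <- cor.test(weight, shell_width, method = "pearson")
result$p.value    # (1) below 0.05 means the two variables are correlated
result$estimate   # (2) the "cor" value: its sign gives the direction of the correlation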

17
Q

if the shapiro.test results are greater/lower than 0.05 we:

A

> 0.05: data IS normally distributed

< 0.05: data IS NOT normally distributed

18
Q

what is the non-parametric equivalent of Pearson's correlation?

A

Spearman's rank

19
Q

Spearman's rank overall function and assumptions:

A
  • ranks both the x and y variables; the ranks are then used to calculate a measure of correlation
  • assumptions: none about the distribution of the variables; random sampling; independence of observations
20
Q

what does Spearman's rank correlation coefficient, rs, describe?

A

describes the strength and direction of the linear association between the ranks of the two variables, a number between -1 and 1

21
Q

what is different between Pearson's correlation and Spearman's rank?

A

Pearson's uses parametric data that is unranked

Spearman's rank uses non-parametric data that is ranked

22
Q

what must be done to your variables when calculating Spearman's rank?

A

the data from both variables must be ranked separately from low to high - the lowest value gets rank one, and progressively larger values get progressively higher ranks

23
Q

how can you use R to calculate your Spearman's rank values?

A

we once again use:

cor.test(variable_1, variable_2, method = "spearman")
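
for example, with the same hypothetical weight and shell_width variables used earlier:

cor.test(weight, shell_width, method = "spearman")   # output ends with rho, the rs statistic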

24
Q

how do we interpret the results of our Spearman's rank test in R?

A

you are given a p-value: if it is greater than 0.05 we retain the null hypothesis and assume no correlation; if it is smaller than 0.05 we accept the alternative hypothesis and assume a correlation

you are also given a "rho" test statistic (rs) at the bottom of the output: ONLY if the p-value is <0.05 do we look at this value - if it is positive it suggests a positive correlation, and if it is negative it suggests a negative correlation

25
Q

what is a crucial thing you must always do before statistically testing correlations to ensure you are using the right test?

A

you must always check whether the data for each variable are parametric or non-parametric using the shapiro.test(variable_name) command in R

as parametric = Pearson's
and non-parametric = Spearman's
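
a minimal decision sketch putting the two steps together (hypothetical weight and shell_width variables again):

if (shapiro.test(weight)$p.value > 0.05 && shapiro.test(shell_width)$p.value > 0.05) {
  cor.test(weight, shell_width, method = "pearson")    # both normal: parametric test
} else {
  cor.test(weight, shell_width, method = "spearman")   # otherwise: non-parametric test
}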

26
Q

what statistical tests do we use when investigating the difference between normally distributed samples?

A

for paired parametric samples: paired t-test

for independent parametric samples: t-test (Student's or Welch's)

27
Q

what statistical tests do we use when investigating the difference between non-parametric samples?

A

for non-parametric paired samples = Paired Wilcoxon Test

for non-parametric independent samples = Mann-Whitney U Test / Wilcoxon test

28
Q

when is the Student's t-test used?

A

normal distribution of both groups and equal variances

29
Q

when is Welch’s t-test used?

A

normal distribution of both groups and unequal variances

30
Q

when is the Mann-Whitney U Test / Wilcoxon Test used?

A

non-normal distribution (no assumptions)

31
Q

how can we test for normality?

A

graphically: histograms or quantile plots

formal tests: Shapiro-Wilk Test [shapiro.test(variable name)]
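
for example, on one hypothetical variable called weight:

hist(weight)           # roughly bell-shaped is consistent with normality
qqnorm(weight)         # quantile plot: points close to a straight line suggest normality
qqline(weight)         # adds the reference line to the quantile plot
shapiro.test(weight)   # formal test: p-value above 0.05 means normally distributed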

32
Q

how can you use R and histograms to test for normality when you are comparing two groups?

A

hist(x_variable[male_type=="control"])

hist(x_variable[male_type=="knockout"])

33
Q

F-test command:

A

var.test(y_variable ~ x_variable)

34
Q

what requirements do we need in order to do a t-test?

A

we need to ensure we have normally distributed data and non-differing variances

test distribution using: shapiro.test(variable_name)

test for variance using: var.test(y_variable ~ x_variable) - if the p-value is over 0.05 in the F-test it means that the variances do not differ

35
Q

what is the t-test command in R?

A

t.test(y_variable ~ x_variable, var.equal = TRUE)

note: you can only carry out the Student's t-test provided the variances are actually equal, something you can find out by doing an F-test with the command var.test(y ~ x) - your p-value must be >0.05 for the variances not to differ
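
putting the whole workflow together - a sketch where body_mass (continuous) is a hypothetical variable and male_type reuses the "control"/"knockout" grouping from the histogram card:

shapiro.test(body_mass[male_type == "control"])    # check normality within each group
shapiro.test(body_mass[male_type == "knockout"])
var.test(body_mass ~ male_type)                    # F-test: p > 0.05 means equal variances
t.test(body_mass ~ male_type, var.equal = TRUE)    # Student's t-test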

36
Q

how do we interpret the results of our t-test in R?

A

p-value = if your p-value is below 0.05 it means there is a significant difference between the two group means

37
Q

how can you get the mean results for different variable data sets?

A

you can get the mean for each group using the command: tapply(continuous_variable, grouping_variable, mean)
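
for example, with the hypothetical body_mass and male_type variables from the t-test sketch:

tapply(body_mass, male_type, mean)   # mean body_mass for each level of male_type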

38
Q

what statistical test would you use if the continuous variable in each of the groups were normally distributed but the variances were not equal?

A

[variances = not equal] & [distribution = normal] = Welch's t-test

39
Q

how do you do a Welch's t-test in R, and how does it differ from a normal Student's t-test command?

A

you simply write > t.test(y_variable ~ x_variable)

this differs from the Student's t-test in that the command doesn't include the additional "var.equal=TRUE", as we only apply Welch's test when the variances aren't equal

40
Q

Mann-Whitney U test / Wilcoxon-Mann-Whitney test requirements:

A

the non-parametric equivalent of the independent-samples t-test; requires one continuous variable (response variable) and one categorical variable with two factor levels (explanatory variable)

41
Q

wilcoxon test in R:

A

(1) attach(data_frame)

(2) names(data_frame)

(3) wilcox.test(y ~ x)
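
for example, a sketch with the same hypothetical body_mass and male_type variables as before:

wilcox.test(body_mass ~ male_type)   # reports the W statistic and a p-value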

42
Q

how do we interpret the results of our Wilcoxon test in R?

A

you are given a test statistic (W) and also a p-value; if your p-value is <0.05 we reject the null hypothesis in favour of the alternative - a significant difference is established

43
Q

what can we construct once we have confirmed a statistically significant difference via a Wilcoxon test in R?

A

once a significant difference is confirmed (Wilcoxon p-value < 0.05) you can then construct your plot via the command:

> plot(y~x, las = 1)

44
Q

paired wilcoxon test requirements:

A
  • non-parametric test, uses medians
  • assumptions: none
  • null hypothesis: the median difference between measurements is 0
45
Q

paired wilcoxon test R command and interpretation:

A

wilcox.test(paired_variable_1, paired_variable_2, paired = TRUE)

p-value <0.05 = reject the null hypothesis - the alternative hypothesis is accepted
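
for example, with hypothetical before and after measurements taken on the same individuals:

wilcox.test(before, after, paired = TRUE)   # p < 0.05 means the median difference is not 0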

46
Q

Mann-Whitney U Test & Wilcoxon Tests are:

A

the exact same non-parametric test!

47
Q

how can we check for the assumptions of a linear regression in R?

A

after fitting your linear regression you can check that constant variance and normal distribution are present via the command:

> plot(m1)

this will show you the diagnostic graphs, where (1) the residuals-vs-fitted plot should look like a star-filled sky (no pattern) and (2) the Q-Q plot dots should fall along the line - provided the assumptions are met
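
a sketch of the full sequence - the model name m1 matches the card, while the variables are hypothetical:

m1 <- lm(body_mass ~ shell_width)   # fit the linear regression
par(mfrow = c(2, 2))                # arrange the diagnostic plots together
plot(m1)                            # includes residuals-vs-fitted ("starry sky") and the Q-Q plot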

48
Q

the three statistical tests used for comparing two groups against a continuous variable:

A

Student's t-test: both groups are parametric with equal variances

Welch's t-test: both groups are parametric but with unequal variances

Mann-Whitney U Test / Wilcoxon Test: non-parametric data (no assumptions)

49
Q

strongest statistical test out of Student's t-test, Welch's & Mann-Whitney / Wilcoxon:

A

the Student's t-test (it has the greatest statistical power when its assumptions are met)

50
Q

correlations are used when:

A
  • when we are interested in how WELL x and y are related
  • if neither of the two variables is predicted to depend on the other (it is not clear which is the response variable and which is the explanatory)