Lecture 7 - Statistical Tests III: Correlations & Comparing Two Groups Flashcards

1
Q

what are the two statistical tests for correlative studies, for parametric and non-parametric data respectively?

A

correlative parametric data - Pearson's correlation

correlative non-parametric data - Spearman's rank correlation

2
Q

what is Pearson's correlation used for?

A

Pearson's correlation is used for two continuous variables; the correlation coefficient r describes the strength and direction of the linear association, taking a value between -1 and 1

3
Q

what does correlation describe?

A

correlation describes the amount of variation or scatter in a scatter plot

4
Q

the higher the scatter…

A

the lower the strength of correlation

5
Q

R values for positive, negative & no correlations:

A

positive correlation: r > 0

negative correlation: r < 0

no correlation: r = 0

6
Q

what is the difference between a linear regression and a Pearson's correlation?

A

the difference is that with a Pearson's correlation no line is fitted, whereas a linear regression fits a regression line through the data

7
Q

Pearson's assumptions:

A

both continuous variables are normally distributed

random sampling

independence of observations

8
Q

Pearson's null hypothesis:

A

there is no correlation between the variables: ρ (rho) = 0

if the p-value is larger than 0.05, it is not worth interpreting the r value

9
Q

regression or correlation?

A

how are x & y related? how much does y change with x? = regression

how well are x & y related? = correlation

10
Q

it is correlation rather than regression if:

A

it is correlation rather than regression if neither of the two continuous variables is predicted to depend on the other (e.g. there may be no biological reason to assume such dependence - the correlation is observed without an obvious cause-and-effect direction)

11
Q

it is regression rather than a correlation if:

A

your data come from an EXPERIMENT, as with experiments there is usually a direct relationship between the two variables [we assume y is dependent on x], therefore a linear regression should be fitted

12
Q

how can we check whether it is safe to use Pearson's correlation?

A

after first deducing that it is an association rather than a direct relationship [i.e. not the result of an experiment], you must check whether both variable data sets are normally distributed using the shapiro.test() command in R

13
Q

how can we check for normal distribution of variable data before confirming we can use Pearson's correlation?

A

we attach our data frame and call names(data) to list the variables

then for each variable we input:

shapiro.test(variable_1_name)

shapiro.test(variable_2_name)

provided the p-values for both sets of data are ABOVE 0.05, we can assume a normal data distribution
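
for example, a minimal sketch - the data frame crabs and its columns weight and shell_width are hypothetical names, not from the lecture:

attach(crabs)               # make the columns available by name
names(crabs)                # lists the variable names, e.g. "weight", "shell_width"
shapiro.test(weight)        # p-value above 0.05 means weight looks normally distributed
shapiro.test(shell_width)   # p-value above 0.05 means shell_width looks normally distributed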

14
Q

how can you command R to give the Pearson's correlation?

A

cor.test(variable_1, variable_2, method = "pearson")

note: it doesn't matter which way around your variables go - the answer will be the same either way
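
for example, reusing the hypothetical crabs variables from the previous card:

cor.test(weight, shell_width, method = "pearson")
cor.test(shell_width, weight, method = "pearson")   # same r and p-value either way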

15
Q

how do we write up the results of a Pearson's cor.test in R?

A

the (variable one) and (variable two) of (object) were negatively/positively correlated (Pearson's correlation: r = value, p = value, N = sample size)

16
Q

what do we receive from a Pearson's cor.test command and how do we interpret it?

A

you get a p-value and, underneath "cor" at the bottom of the output, the sample estimate, which is our correlation coefficient

(1) if the p-value is below 0.05 we can assume that the two variables are correlated

(2) if the cor value is positive there is a positive correlation; if the cor value is negative there is a negative correlation
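
if you want to pull these values out directly, cor.test() returns a result object whose fields can be read by name - a sketch with the same hypothetical variables:

result <- cor.test(weight, shell_width, method = "pearson")
result$p.value    # (1) below 0.05 means the two variables are correlated
result$estimate   # (2) the "cor" value: its sign gives the direction of the correlation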

17
Q

if the shapiro.test results are greater/lower than 0.05 we:

A

> 0.05: data IS normally distributed

< 0.05: data IS NOT normally distributed

18
Q

what is the non-parametric equivalent of Pearson's correlation?

A

Spearman's rank

19
Q

Spearman's rank overall function and assumptions:

A
  • ranks both the x and y variables; the ranks are then used to calculate a measure of correlation
  • assumptions: none about the distribution of the variables; random sampling; independence of observations
20
Q

what does Spearman's rank correlation coefficient, rs, describe?

A

describes the strength and direction of the linear association between the ranks of the two variables, a number between -1 and 1

21
Q

what is different between Pearson's correlation and Spearman's rank?

A

Pearson's uses parametric data that is unranked

Spearman's rank uses non-parametric data that is ranked

22
Q

what must be done to your variables when calculating Spearman's rank?

A

the data from both variables must be ranked separately from low to high - the lowest value gets rank one, and progressively larger values get progressively higher ranks

23
Q

how can you use R to calculate your Spearman's rank values?

A

we once again use:

cor.test(variable_1, variable_2, method = "spearman")
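
for example, with the same hypothetical weight and shell_width variables used earlier:

cor.test(weight, shell_width, method = "spearman")   # output ends with rho, the rs statistic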

24
Q

how do we interpret the results of our Spearman's rank test in R?

A

you are given a p-value: if it is greater than 0.05 we retain the null hypothesis and assume no correlation; if it is smaller than 0.05 we accept the alternative hypothesis and assume a correlation

you are also given a "rho" test statistic (rs) at the bottom of the output: ONLY if the p-value is <0.05 do we look at this value - if it is positive it suggests a positive correlation, and if it is negative it suggests a negative correlation

25
Q

what is a crucial thing you must always do before statistically testing correlations to ensure you are using the right test?

A

you must always check whether the data for each variable are parametric or non-parametric using the shapiro.test(variable_name) command in R

as parametric = Pearson's
and non-parametric = Spearman's
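
a minimal decision sketch putting the two steps together (hypothetical weight and shell_width variables again):

if (shapiro.test(weight)$p.value > 0.05 && shapiro.test(shell_width)$p.value > 0.05) {
  cor.test(weight, shell_width, method = "pearson")    # both normal: parametric test
} else {
  cor.test(weight, shell_width, method = "spearman")   # otherwise: non-parametric test
}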

26
Q

what statistical tests do we use when investigating the difference between normally distributed samples?

A

for paired parametric samples: paired t-test

for independent parametric samples: t-test (Student's or Welch's)

27
Q

what statistical tests do we use when investigating the difference between non-parametric samples?

A

for non-parametric paired samples = Paired Wilcoxon Test

for non-parametric independent samples = Mann-Whitney U Test / Wilcoxon test

28
Q

when is the Student's t-test used?

A

normal distribution of both groups and equal variances

29
Q

when is Welch’s t-test used?

A

normal distribution of both groups and unequal variances

30
Q

when is the Mann-Whitney U Test / Wilcoxon Test used?

A

non-normal distribution (no assumptions)

31
Q

how can we test for normality?

A

graphically: histograms or quantile plots

formal tests: Shapiro-Wilk Test [shapiro.test(variable name)]
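
for example, on one hypothetical variable called weight:

hist(weight)           # roughly bell-shaped is consistent with normality
qqnorm(weight)         # quantile plot: points close to a straight line suggest normality
qqline(weight)         # adds the reference line to the quantile plot
shapiro.test(weight)   # formal test: p-value above 0.05 means normally distributed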

32
Q

how can you use R and histograms to test for normality when you are comparing two groups?

A

hist(x_variable[male_type=="control"])

hist(x_variable[male_type=="knockout"])

33
Q

F-test command:

A

var.test(y_variable ~ x_variable)

34
Q

what requirements do we need in order to do a t-test?

A

we need to ensure we have normally distributed data and non-differing variances

test distribution using: shapiro.test(variable_name)

test for variance using: var.test(y_variable ~ x_variable) - if the p-value is over 0.05 in the F-test it means that the variances do not differ

35
Q

what is the t-test command in R?

A

t.test(y_variable ~ x_variable, var.equal = TRUE)

note: you can only carry out the Student's t-test provided the variances are actually equal, something you can find out by doing an F-test with the command var.test(y ~ x) - your p-value must be >0.05 for the variances not to differ
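
putting the whole workflow together - a sketch where body_mass (continuous) is a hypothetical variable and male_type reuses the "control"/"knockout" grouping from the histogram card:

shapiro.test(body_mass[male_type == "control"])    # check normality within each group
shapiro.test(body_mass[male_type == "knockout"])
var.test(body_mass ~ male_type)                    # F-test: p > 0.05 means equal variances
t.test(body_mass ~ male_type, var.equal = TRUE)    # Student's t-test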

36
Q

how do we interpret the results of our t-test in R?

A

p-value = if your p-value is below 0.05 it means there is a significant difference between the two group means

37
Q

how can you get the mean results for different variable data sets?

A

you can get the mean for each group using the command: tapply(continuous_variable, grouping_variable, mean)
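
for example, with the hypothetical body_mass and male_type variables from the t-test sketch:

tapply(body_mass, male_type, mean)   # mean body_mass for each level of male_type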

38
Q

what statistical test would you use if the continuous variable in each of the groups were normally distributed but the variances were not equal?

A

[variances = not equal] & [distribution = normal] = Welch's t-test

39
Q

how do you do a Welch's t-test in R, and how does it differ from a normal Student's t-test command?

A

you simply write > t.test(y_variable ~ x_variable)

this differs from the Student's t-test in that the command doesn't include the additional "var.equal=TRUE", as we only apply Welch's test when the variances aren't equal

40
Q

Mann-Whitney U test / Wilcoxon-Mann-Whitney test requirements:

A

the non-parametric equivalent of the independent-samples t-test; requires one continuous variable (response variable) and one categorical variable with two factor levels (explanatory variable)

41
Q

wilcoxon test in R:

A

(1) attach(data_frame)

(2) names(data_frame)

(3) wilcox.test(y ~ x)
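
for example, a sketch with the same hypothetical body_mass and male_type variables as before:

wilcox.test(body_mass ~ male_type)   # reports the W statistic and a p-value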

42
Q

how do we interpret the results of our Wilcoxon test in R?

A

you are given a test statistic (W) and also a p-value; if your p-value is <0.05 we reject the null hypothesis in favour of the alternative - a significant difference is established

43
Q

what can we construct once we have confirmed a statistically significant difference via a Wilcoxon test in R?

A

once a significant difference is confirmed (Wilcoxon p-value < 0.05) you can then construct your plot via the command:

> plot(y~x, las = 1)

44
Q

paired wilcoxon test requirements:

A
  • non-parametric test, uses medians
  • assumptions: none
  • null hypothesis: the median difference between measurements is 0
45
Q

paired wilcoxon test R command and interpretation:

A

wilcox.test(paired_variable_1, paired_variable_2, paired = TRUE)

p-value <0.05 = reject the null hypothesis - the alternative hypothesis is accepted
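
for example, with hypothetical before and after measurements taken on the same individuals:

wilcox.test(before, after, paired = TRUE)   # p < 0.05 means the median difference is not 0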

46
Q

Mann-Whitney U Test & Wilcoxon Tests are:

A

the exact same non-parametric test!

47
Q

how can we check for the assumptions of a linear regression in R?

A

after fitting your linear regression you can check that constant variance and normal distribution are present via the command:

> plot(m1)

this will show you the diagnostic graphs, where (1) the residuals-vs-fitted plot should look like a star-filled sky (no pattern) and (2) the Q-Q plot dots should fall along the line - provided the assumptions are met
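
a sketch of the full sequence - the model name m1 matches the card, while the variables are hypothetical:

m1 <- lm(body_mass ~ shell_width)   # fit the linear regression
par(mfrow = c(2, 2))                # arrange the diagnostic plots together
plot(m1)                            # includes residuals-vs-fitted ("starry sky") and the Q-Q plot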

48
Q

the three statistical tests used for comparing two groups against a continuous variable:

A

Student's t-test: both groups are parametric with equal variances

Welch's t-test: both groups are parametric but with unequal variances

Mann-Whitney U Test / Wilcoxon Test: non-parametric data (no assumptions)

49
Q

strongest statistical test out of Student's t-test, Welch's & Mann-Whitney / Wilcoxon:

A

the Student's t-test (it has the greatest statistical power when its assumptions are met)

50
Q

correlations are used when:

A
  • when we are interested in how WELL x and y are related
  • if neither of the two variables is predicted to depend on the other (it is not clear which is the response variable and which is the explanatory)