Revision - Everything after lecture 5! Flashcards

1
Q

What does ANOVA stand for?

A

analysis of variance

2
Q

What can you do if data is not normal and you still want to use a parametric test?

A

Transform the data, e.g. with a log transformation: log10(x)

If any values are zero, use
log10(x + 1)
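
A minimal sketch of this transformation in Python, assuming the values are non-negative counts or measurements. The function name `log_transform` is made up for illustration.

```python
import math

def log_transform(values):
    """Return log10(x), or log10(x + 1) if any value is zero."""
    # The +1 offset is only applied when a zero is present,
    # because log10(0) is undefined.
    offset = 1 if any(v == 0 for v in values) else 0
    return [math.log10(v + offset) for v in values]

print(log_transform([1, 10, 100]))  # no zeros: plain log10
print(log_transform([0, 9, 99]))    # contains a zero: log10(x + 1)
```

After transforming, the data should be checked for normality again before a parametric test is used.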

3
Q

What is the convenient form of variance?

A

Sum of squares (SS)

4
Q

What are sums of squares?

A
The sum of squared deviations from the mean. (The more values the bigger the SS)
e.g.
2, 5, 11 
Mean is (2+5+11)/3 = 6
Deviations from 6 are -4, -1, +5 
Squared deviations are 16, 1, 25 
Sum of squares is 16+1+25=42 
The SS for 2, 5, 11 is 42
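
The worked example above can be checked with a few lines of Python. This is only a sketch; `sum_of_squares` is an illustrative name.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

print(sum_of_squares([2, 5, 11]))  # 42.0
```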
5
Q

How do we account for the number of x values in the sums of squares? (standardise)

A

The mean square:

The sum of squares divided by the degrees of freedom

6
Q

Taking into account the sums of squares, how do we calculate analysis of variance (ANOVA)?

A

SS of all numbers =

SS within samples + SS between samples
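
A small sketch of this partition in Python. The three samples are invented purely so the arithmetic is easy to follow, and the function names are illustrative.

```python
def sum_of_squares(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def partition_ss(samples):
    """Return (SS total, SS within, SS between) for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum_of_squares(all_values)
    # Within: each sample's SS around its own mean, added together.
    ss_within = sum(sum_of_squares(s) for s in samples)
    # Between: squared distance of each sample mean from the grand mean,
    # weighted by the sample size.
    ss_between = sum(
        len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples
    )
    return ss_total, ss_within, ss_between

# Hypothetical data: three samples of three values each
samples = [[2, 5, 11], [4, 6, 8], [10, 12, 14]]
total, within, between = partition_ss(samples)
print(total, within, between)  # 130.0 58.0 72.0
```

Here 58 + 72 = 130, so the within and between parts add up to the total.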

7
Q

What does the F statistic test?

A

To find out if the variance between samples is greater than we would expect from the variance within samples.
F = mean square between samples / mean square within samples (error)
If the variances are equal, F = 1
Reported as F(sample df, error df) = __
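
As a rough illustration, F for a one-way ANOVA can be computed by hand as the between-samples mean square divided by the within-samples mean square. The data below are hypothetical and the function name `one_way_f` is made up for this sketch.

```python
def one_way_f(samples):
    """Return (F, df between samples, df within samples / error)."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(
        len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples
    )
    ss_within = sum(
        sum((v - sum(s) / len(s)) ** 2 for v in s) for s in samples
    )
    df_between = len(samples) - 1               # number of samples - 1
    df_within = len(all_values) - len(samples)  # total values - number of samples
    ms_between = ss_between / df_between        # mean square = SS / df
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical data: three samples of three values each
f, df1, df2 = one_way_f([[2, 5, 11], [4, 6, 8], [10, 12, 14]])
print(f"F({df1},{df2}) = {f:.2f}")  # F(2,6) = 3.72
```

Whether this F is significant still has to be judged against a critical value (or P value) for those degrees of freedom.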

8
Q

Within the one-way ANOVA, how can you test for differences between samples?

A

Use the Tukey test

9
Q

What is the correlation coefficient r?

A

the degree to which 2 variables are correlated

Varies between 1 (perfect positive) and -1 (perfect negative)

10
Q

What is the range for

a) Very weak correlation
b) modest correlation
c) very strong correlation

A

a) 0 - 0.2
b) 0.4-0.7
c) 0.9-1

11
Q

What is covariance and how is it calculated?

A

A measure of how two variables vary together

sum of products of the deviations of x and y from their means / degrees of freedom (n-1)
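
A short sketch of the calculation, with Pearson's r obtained by dividing the covariance by the two standard deviations. The x and y values are invented for illustration.

```python
import math

def covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_of_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sum_of_products / (len(x) - 1)

def pearson_r(x, y):
    """Covariance scaled by both standard deviations, so it lies in [-1, 1]."""
    sd_x = math.sqrt(covariance(x, x))  # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))           # 1.5
print(round(pearson_r(x, y), 3))  # 0.775
```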

12
Q

How is statistical significance of covariance checked?

A

By looking up the value of r for the given number of degrees of freedom in a table of critical values for r
- In Minitab this is a Pearson correlation

13
Q

What are the requirements for using r as a measure of correlation? (6)

A

> Data should be continuous or interval variables
> The distribution of each variable needs to be normal
> Check for normality, e.g. Anderson-Darling test and probability plot
> The relationship between x and y must be linear
> Check linearity using a plot
> If not linear, data transformations can be attempted

14
Q

What is the R-squared value?

A

The coefficient of determination.
Tells us how well the independent variable(s) in our model explain the variation in our dependent variable (the proportion of variation explained, from 0 to 1)

15
Q

What is the regression line equation and what does each part mean?

A
y = a + bx
y = constant (the intercept, i.e. the value of y when x = 0) + (slope X number of x units)
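
A minimal least-squares sketch showing where a and b come from, and how R-squared follows from the fitted line. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_of_products = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    ss_x = sum((p - mean_x) ** 2 for p in x)
    b = sum_of_products / ss_x  # slope
    a = mean_y - b * mean_x     # intercept (constant)
    return a, b

def r_squared(x, y):
    """Proportion of the variation in y explained by the fitted line."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_residual = sum((q - (a + b * p)) ** 2 for p, q in zip(x, y))
    ss_total = sum((q - mean_y) ** 2 for q in y)
    return 1 - ss_residual / ss_total

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))   # 2.2 0.6
print(round(r_squared(x, y), 2))  # 0.6
```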
16
Q

What does it mean when a horizontal line (bar) is above x or y?

A

The mean of that variable, i.e. the mean of x or the mean of y

17
Q

What are the main differences between regression and correlation?

A

Regression establishes an equation that assumes x affects y.
Correlation establishes how they co-vary.

Regression can be used for prediction, e.g. x is __ so y is __

Regression uses the F statistic and t test to give P values
Correlation uses a correlation coefficient to give P values

18
Q

What is the power of a statistical test?

A

The probability that it will yield a statistically significant result when there is a real effect to detect
The power of an analysis can vary from 0 to 1

19
Q

What things affect power?

A
  • Sample size
  • Strength of the effect under study (e.g. strong relationship etc.)
  • The variability of the data
20
Q

What is the power effect size? (d)

A

The strength of the biological effect and its variability are combined into a single measure
e.g. for the difference between 2 means:
d = (m1-m2)/SD
Usually falls between 0 and 1, but can be larger when the effect is very strong
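
A sketch of the effect size for the difference between two means. The card does not say which SD to use, so this assumes the pooled standard deviation of the two samples, which is one common choice. The data are invented.

```python
import math
import statistics

def effect_size_d(sample1, sample2):
    """Difference between two means divided by the pooled SD (an assumption)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance, n - 1 denominator
    var2 = statistics.variance(sample2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / pooled_sd

# Hypothetical samples: means 6 and 4, both with SD 2
print(effect_size_d([4, 6, 8], [2, 4, 6]))  # 1.0
```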

21
Q

What is the Mann-Whitney test used for?

A

To compare the medians of two unpaired non-parametric samples

22
Q

What is the Wilcoxon test used for?

A

To compare the medians of two paired non-parametric samples

23
Q

What is Spearman's rank used for?

A

A non-parametric measure of correlation, used with variables such as proportions/counts
All observations are converted to ranks
Significance is checked by looking up the value of Spearman's rank for a given number of observations in a table of critical values
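
A rough sketch of the rank conversion and the resulting coefficient. It assumes tied values share the average of their ranks, and computes the coefficient as Pearson's r on the ranks. Function names and data are illustrative only.

```python
import math

def ranks(values):
    """Rank observations (1 = smallest); ties get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rank(x, y):
    """Correlation coefficient calculated on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mean_rx = sum(rx) / len(rx)
    mean_ry = sum(ry) / len(ry)
    sp = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_rx) ** 2 for a in rx)
    ss_y = sum((b - mean_ry) ** 2 for b in ry)
    return sp / math.sqrt(ss_x * ss_y)

print(ranks([10, 20, 20, 5]))                                # [2.0, 3.5, 3.5, 1.0]
print(spearman_rank([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))     # 1.0
```

The second example is curved rather than linear, but because y always increases with x the ranks agree perfectly.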

24
Q

Is my data parametric?
If it is not continuous, it usually is parametric/non parametric
If it is non-normal it is parametric/non parametric

A

NOT parametric

NOT parametric

25
When is a chi square test used?
To test on counts or frequencies of things on nominal scales (rather than difference or relationship)
26
What are the types of data (e.g. Ordinal) and what do they mean?
>Categorical/nominal: Non numerical
>Ordinal: Obvious order
>Quantitative: Continuous numerical
>Discrete: Discontinuous (always a whole number)
27
What are the 4 types of distribution?
>Normal
>Binomial
>Negative binomial
>Poisson
28
What is also referred to as a homogeneity, randomness, association, independence and goodness of fit test
Chi-square test
29
For the chi-square test, If there is a big discrepancy between the frequency we expect from the null hypothesis and the frequency we observe, the value of the calculated test statistic will (be more/less) than the critical value at the appropriate ______.
The value of the calculated test statistic will be more than (exceed) the critical value at the appropriate number of degrees of freedom. We will have to reject the null hypothesis.
30
What are the assumptions of a chi-square test?
1) Random sampling | 2) Independent observations
31
How would you test homogeneity (similarity) of data in a program that generates random integers from 0 to 9?
1) Generate 100 integers
2) Compare the generated frequency of each digit to the expected frequency (10 of each)
3) Calculate chi-squared
4) Determine the degrees of freedom (number of categories - 1) = 10 - 1 = 9
5) Check the critical value in a table at the 0.05 significance level
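
The steps can be sketched in Python as below. The observed counts are invented for illustration rather than produced by a real generator, and 16.919 is the tabulated chi-square critical value for 9 degrees of freedom at the 0.05 level.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of the digits 0 to 9 in 100 generated integers
observed = [12, 8, 11, 9, 10, 13, 7, 10, 9, 11]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 of each

statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # 10 categories - 1 = 9
critical_value = 16.919                 # table value for df = 9, alpha = 0.05

print(round(statistic, 3), degrees_of_freedom)  # 3.0 9
print("reject H0" if statistic > critical_value else "do not reject H0")
```

With these counts the statistic is well below the critical value, so there is no evidence that the digits depart from the expected equal frequencies.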
32
Pearson's: What does p=0.008 actually mean?
The probability of getting the results obtained (or more extreme ones) if the null hypothesis is true is 0.008
33
What was the first immortal human cell line? | Where were the cells derived from?
HeLa | Derived from cervical cancer cells
34
What are the 3 R's in animal experimentation approval?
Replacement, Reduction, Refinement
35
When is ethical approval not required?
- For immortalised cell lines e.g. HeLa
- Using hair and nails from living persons
- On data that is freely available to the public and doesn't use personal data
36
Risk =
Risk = Likelihood X Hazard
37
The funding body ____ wants all UK Universities to have a data management policy in place by May 2015
EPSRC
38
Standard deviation and variance are very similar. Both are used to describe the typical or average distance of a value from the mean... but what is the difference between them?
The variance is the average of the squared differences from the mean. The standard deviation is the square root of the variance, so the only difference between the two is that for the variance you don't take the square root (which leaves it in squared units).
39
If your data is more spread out (has more variability) then you will have a higher/lower standard deviation
higher
40
What is the coefficient of variation?
Standard deviation / mean (helps interpret the magnitude of the standard deviation) e.g. If the standard deviation is .20 and the mean is .50, then the cv = .20/.50 = .4 or 40%
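The same calculation as a short Python sketch, first with the numbers from the card and then from raw data. The raw values are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import statistics

def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean

print(coefficient_of_variation(0.20, 0.50))  # 0.4, i.e. 40%

# From raw (hypothetical) data, using the sample standard deviation
data = [4, 6, 8]
print(coefficient_of_variation(statistics.stdev(data), statistics.mean(data)))
```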
41
What is the empirical rule of standard deviation?
That the bulk of the data cluster around the mean in a normal distribution:
68% of values fall within ± 1 standard deviation of the mean
95% fall within ± 2 standard deviations of the mean
99.7% fall within ± 3 standard deviations of the mean
42
What is a popular way to show Q1, Q3, median and IQR
Box plot or box and whisker graph
43
If the resulting P-value of Levene's test is less than some critical value (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is ________
rejected
44
In Levene's test, what does a test statistic of 0.56, p=0.651 suggest about the homogeneity of the variances?
There IS homogeneity (P>0.05) because the null hypothesis that the variances are equal is not rejected
45
What does 0.45 power mean?
A 45% chance of getting a significant result if the effect being tested for is real
46
```
Difference = mu (1) - mu (2)
Estimate for difference: -2.887
95% CI for difference: (-4.900, -0.874)
T-Test of difference = 0 (vs not =): T-Value = -2.91 P-Value = 0.006 DF = 36
Both use Pooled StDev = 3.0548
```
With this information, complete the following: t__ = ___, p=____. H0 is _____ and HA ______
t36 = -2.91, p=0.006. H0 is rejected and HA accepted
47
Pearson's r = 0.940, p < 0.001
The regression equation is
Hg in blood (ng/g) = - 20.6 + 0.641 Methyl Hg intake (µg/day)
What is parameter b?
0.641
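As an illustration of how a and b are used for prediction, the regression equation from this card can be written as a small function. The intake of 100 µg/day is a made-up input, and predictions should only be trusted within the range of the original data.

```python
def predict_blood_hg(intake_ug_per_day, a=-20.6, b=0.641):
    """Predicted Hg in blood (ng/g) from methyl Hg intake, using y = a + b*x."""
    return a + b * intake_ug_per_day

# Hypothetical intake of 100 micrograms per day
print(round(predict_blood_hg(100), 1))  # 43.5
```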
48
What would you conclude about the correlation and significance of the following: Pearson's r = -0.814, p < 0.001
There is a strong negative correlation that is highly significant
49
In a 2-way chi-square what is the rule for calculating the number of degrees of freedom?
For 2 variables: degrees of freedom = (number of columns – 1) x (number of rows – 1)
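A quick sketch of the rule for a table stored as a list of rows. The expected counts use the standard row total x column total / grand total calculation, which this card does not cover. The 2 x 2 table is hypothetical.

```python
def contingency_df(table):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def expected_counts(table):
    """Expected frequency for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Hypothetical 2 x 2 table of counts
table = [[10, 20], [30, 40]]
print(contingency_df(table))   # 1
print(expected_counts(table))  # [[12.0, 18.0], [28.0, 42.0]]
```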
50
What is the z value and how do you calculate it
(z is otherwise known as a standard score) It indicates how many standard deviations an element is away from the mean.
z = (sample proportion (p) - hypothesised proportion (p0)) / standard error
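A sketch of the calculation for a proportion. The card does not define the standard error, so this assumes the usual one-sample form sqrt(p0 x (1 - p0) / n). The numbers are invented.

```python
import math

def z_for_proportion(sample_p, hypothesised_p, n):
    """z = (p - p0) / standard error, with SE = sqrt(p0 * (1 - p0) / n)."""
    standard_error = math.sqrt(hypothesised_p * (1 - hypothesised_p) / n)
    return (sample_p - hypothesised_p) / standard_error

# Hypothetical: 60 successes out of 100 when 50% was expected
print(round(z_for_proportion(0.6, 0.5, 100), 2))  # 2.0
```

Here the sample proportion is about 2 standard errors above the hypothesised proportion.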
51
What is accuracy?
the closeness to the real value. | e.g. the units of measurement 5g vs 5.1g
52
What is precision?
the closeness of repeated measures to the same value... | e.g. using the same balance to weigh something
53
What are derived variables?
Usually calculated from two or more other variables.. for example, ratios or percentages
54
What is a distribution in stats?
An assumption of where the data will lie, i.e. how often each value (or range of values) is expected to occur
55
If the variance is greater than the mean then the population is more ______ than random distribution
clumped/aggregated
56
If the variance is less than the mean then it is more ____ than random
ordered/uniform
57
In binomial distribution, there is a ____ distribution of number of events. When there are 2 possible outcomes for an event the probability of each is ___.
In binomial distribution, there is a discrete distribution of number of events. When there are 2 possible outcomes for an event the probability of each is constant/equal.
58
If the distribution of individuals were highly clumped or aggregated, quadrats used to sample variance from this population would show that variance in number of individuals per quadrat would be greater/less than the mean
greater variance than the mean
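The variance-to-mean comparison can be sketched with made-up quadrat counts. A ratio above 1 suggests clumping and a ratio below 1 suggests a more uniform spread. The function name and data are illustrative.

```python
import statistics

def variance_to_mean_ratio(counts):
    """Sample variance of the counts per quadrat divided by their mean."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts of individuals in 8 quadrats (mean of 3 in both cases)
clumped = [0, 0, 0, 12, 0, 0, 0, 12]
uniform = [3, 3, 2, 3, 4, 3, 3, 3]

print(round(variance_to_mean_ratio(clumped), 2))  # 10.29
print(round(variance_to_mean_ratio(uniform), 2))  # 0.1
```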
59
Negative binomial distribution is a discrete distribution that can be used to describe _____ data, and therefore variance is ____ than the mean.
Negative binomial distribution is a discrete distribution that can be used to describe clumped/aggregated data, and therefore variance is greater than the mean.
60
What methods can you use to test for a normal distribution? | 4
Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk, Chi-square goodness of fit
61
In a normal distribution ____% of the observations fall within 1 standard deviation of the mean, ____% fall within 2 SD, ____% fall within 3 SD
68.27%, 95.45%, 99.73%
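These percentages can be checked against the standard normal distribution using Python's standard library. This is a small sketch; `percent_within` is an illustrative name.

```python
from statistics import NormalDist

def percent_within(k):
    """Percentage of a normal distribution within k SD of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))

for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```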
62
What is kurtosis?
A measure of the shape of a distribution: how peaked or flat it is compared with a normal distribution