Data analysis Flashcards

(95 cards)

1
Q

Why is statistics used in biology?

A

To understand and explain biological phenomena.

2
Q
A
3
Q

What is the process of using statistics?

A
  1. Formulate a precise, biological question
  2. Design the study -> variables, sampling, replicates, tests in mind
  3. Select/collect the appropriate data
  4. Graphically present the data
  5. Investigate the distribution of each sample, testing whether it fits the normal distribution
  6. Select the appropriate statistical test OR fit a model to the data
    (7.) Evaluate the model fit
  7. Run the statistical test/model
  8. Interpret the statistical outputs
  9. Write up findings with figures
  10. Discuss what findings mean from a biological point of view
4
Q

Why is graphical presentation important?

A

To visualise patterns, trends and variation in the data.

5
Q

Why must data distributions be investigated?

A

To check whether the data fit the assumptions of the test -> e.g. normal distribution.

6
Q

When should you select a statistical test or model?

A

After understanding the data type and distribution.

7
Q

What does it mean to evaluate the model fit?

A

Assessing how well the model explains the data.

8
Q

Why are there many different statistical tests?

A

Because data varies in type and biological questions differ.

9
Q

What are the main data types?

A
  • Measurement -> continuous or discrete
  • Rank -> ordinal, numbers in order
  • Categorical -> nominal, frequencies of observations
10
Q

When are parametric tests used?

A

For normally distributed measurement data.

11
Q

When are non-parametric tests used?

A

For rank, categorical or non-normally distributed measurement data.

12
Q

Why are replicated observations important?

A

To overcome variation and assess uncertainty.

13
Q

What does having few replicates mean?

A

Results are more likely due to chance.

14
Q

What does having more replicates mean?

A

Sample mean and SD are closer to real population values.

15
Q

In a normal distribution, what % of data falls within 1 SD, 2 SDs and 3 SDs of the mean?

A

About 68%, 95% and 99.7%.
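
These percentages can be checked from the normal distribution itself. The sketch below uses Python's standard-library `NormalDist` with a mean of 0 and an SD of 1; those values are an assumption chosen for convenience, and any other mean and SD give the same proportions.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k SDs of the mean
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.1%}")
```

The printed values round to the 68%, 95% and 99.7% quoted on the card.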

16
Q

What defines a normal distribution?

A

Mean (μ) = centre, SD (σ) = spread.

17
Q

Why is the normal distribution important?

A

It makes probability calculations exact and predictable.

18
Q

What are the steps of hypothesis testing?

A
  1. Formulate H₀
  2. Calculate the test statistic
  3. Calculate the p-value
  4. Decide whether to reject H₀
18
Q

How do effect size and sample size influence p?

A

Larger effect or larger sample size = smaller p-value.

18
Q

What is the null hypothesis (H₀)?

A

There is no difference or relationship.

18
Q

What skew values indicate moderate skew?

A

-1 to -0.5 or +0.5 to +1.

19
Q

When is H₀ rejected?

A

When p ≤ 0.05 (the conventional significance threshold).

19
Q

What defines parametric data?

A

Normally distributed, symmetrical, skew ≈ 0.

19
Q

What skew values indicate approximate symmetry?

A

-0.5 to +0.5.

19
What defines non-parametric data?
Mean pulled to tail, skew ≠ 0.
19
What is the formula for population mean (μ)?
μ = Σxᵢ / N
19
What skew values indicate high skew?
≤ −1 or ≥ +1.
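
The skew thresholds on these cards can be applied as a simple classification. This is a minimal sketch: the moment-based skewness formula is one common definition (an assumption, since the cards do not say which is used), the data are hypothetical, and a value of exactly ±0.5 is placed in the moderate band here because the cards leave that boundary open.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe_skew(skew):
    """Label a skew value using the thresholds from the cards."""
    size = abs(skew)
    if size >= 1:
        return "high skew"
    if size >= 0.5:
        return "moderate skew"
    return "approximately symmetric"


symmetric = [1, 2, 3, 4, 5]          # hypothetical, evenly spread values
right_tailed = [1, 1, 2, 2, 3, 12]   # hypothetical, one large value in the tail
print(describe_skew(skewness(symmetric)))
print(describe_skew(skewness(right_tailed)))
```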
19
What is the formula for sample mean (x̄)?
x̄ = Σxᵢ / n
19
What is the formula for population SD (σ)?
σ = √[Σ(xᵢ − μ)² / N]
19
What is variance (s²)?
SD squared, mean squared deviation from mean.
19
What is the formula for sample SD (s)?
s = √[Σ(xᵢ − x̄)² / (n−1)]
19
When is a Mann-Whitney U test used?
Instead of a two-sample t-test when data is not normally distributed.
19
What is the formula for variance?
s² = Σ(xᵢ − x̄)² / (n−1)
19
What is the formula for SE?
SE = s / √n
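
The formulas for mean, variance, SD and SE can be worked through on a small sample. The sketch below uses Python's standard `statistics` module; the eight values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical sample of n = 8 measurements (illustrative values only)
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = statistics.mean(sample)           # sample mean: sum of x_i / n
variance = statistics.variance(sample)   # s^2: sum of (x_i - mean)^2 / (n - 1)
sd = statistics.stdev(sample)            # s: square root of s^2
se = sd / math.sqrt(n)                   # SE = s / sqrt(n)

# Treating the same values as a whole population divides by N instead of n - 1
population_sd = statistics.pstdev(sample)

print(mean, variance, sd, se, population_sd)
```

Note that the sample SD comes out larger than the population SD for the same values, because it divides by n − 1 rather than N.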
19
What is a one-sample t-test used for?
Comparing a sample mean to an expected value.
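
As an illustration, the test statistic for a one-sample t-test is the difference between the sample mean and the expected value, divided by the SE. The function name and data below are hypothetical, and the sketch stops at the t value: turning it into a p-value needs the t-distribution (from a t-table, or a library routine such as `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def one_sample_t(sample, expected_mean):
    """Return (t, df), where t = (sample mean - expected value) / SE."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - expected_mean) / se
    return t, n - 1


# Hypothetical measurements compared against an expected value of 5.0
t, df = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], expected_mean=5.0)
print(f"t = {t:.3f}, df = {df}")
```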
19
What does standard error measure?
Expected variation (standard deviation) of sample means around the population mean.
19
What is a paired t-test for?
Testing differences between paired or matched measurements.
19
What is a two-sample t-test used for?
Testing differences between means of two independent groups.
19
What is a 95% confidence interval?
A range of plausible population values based on a sample statistic and SE.
19
What are key assumptions of two-sample t-tests?
Data is normally distributed, independent observations, equal variance.
19
How does the Mann-Whitney U test work?
Ranks observations and compares rank distributions.
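
The ranking step can be sketched as follows. This is a minimal illustration with hypothetical data and function names: it pools both groups, ranks them (tied values share their average rank), and returns the smaller U statistic. It does not calculate a p-value; a library routine such as `scipy.stats.mannwhitneyu` would normally be used for that.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics from the pooled ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # groups do not overlap at all
print(mann_whitney_u([1, 4, 5], [2, 3, 6]))  # groups are interleaved
```

A U of 0 means every value in one group is below every value in the other; more overlap between the groups gives a larger U.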
19
What does a smaller SE indicate?
More precise estimate of population mean.
19
How is a confidence interval calculated?
Sample mean ± critical t × SE
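
A worked example of this calculation, under stated assumptions: the sample is hypothetical, and the critical t value (2.365 for a two-tailed 95% interval with 7 degrees of freedom) is copied from a standard t-table rather than computed.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

# Critical t for a two-tailed 95% interval with n - 1 = 7 degrees of freedom,
# taken from a standard t-table (assumed here, not computed).
T_CRIT_95_DF7 = 2.365

lower = mean - T_CRIT_95_DF7 * se
upper = mean + T_CRIT_95_DF7 * se
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```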
19
What is the purpose of a Kruskal-Wallis test?
Test differences in medians among ≥3 groups.
19
What happens after a significant result from a Kruskal-Wallis test?
Dunn post-hoc test to identify which groups differ.
19
What are the assumptions for a Mann-Whitney U test?
Independent samples; the data do not need to be normally distributed.
19
When is a Kruskal-Wallis test used?
Instead of a one-way ANOVA for non-normal data.
19
What is the null hypothesis for a two-sample t-test?
Group means are not different.
20
What is the purpose of a Mann-Whitney U test?
Test differences in medians of two independent groups.
20
What is the null hypothesis for a Mann-Whitney U test?
No difference in medians.
20
What type of relationship does Spearman's rank detect?
Monotonic -> not necessarily linear.
20
What is the null hypothesis for a Kruskal-Wallis test?
No difference in group medians.
20
What is the purpose of Spearman's rank correlation?
To test whether ranks of 2 paired variables are correlated.
20
What is the null hypothesis for Spearman's rank?
There is no correlation or association.
20
When is Spearman used instead of Pearson?
In curved or non-linear relationships.
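
The difference between the two coefficients shows up clearly on a curved but steadily increasing relationship. In this sketch Spearman's coefficient is calculated as Pearson's r on the ranks; the data (y = x cubed) and function names are hypothetical, and the ranking helper assumes there are no tied values.

```python
def pearson_r(x, y):
    """Pearson's correlation coefficient for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simple_ranks(values):
    """Rank from 1 upward; this sketch assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))


# Hypothetical curved but steadily increasing relationship: y = x cubed
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson r = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's r falls below 1 because the points do not lie on a straight line.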
21
What does Pearson's correlation test?
Strength and direction of linear association.
22
Is Pearson's correlation symmetrical?
Yes, it has the same result when swapping X and Y.
23
Does correlation imply causation?
No.
24
What are the value meanings of r in a Pearson's correlation?
-1 = perfect negative; 0 = no correlation; +1 = perfect positive
25
What is the purpose of linear regression?
Model how changes in x predict changes in y.
26
What does regression allow that correlation does not?
Prediction and modelling.
27
What is the line equation for linear regression?
y = a + bx
28
In linear regression, what does the slope (b) represent?
Change in y per unit change in x.
29
In linear regression, what does the intercept (a) represent?
Expected y when x = 0.
30
What are the assumptions of linear regression?
Linear x-y relationship, independent observations, normal residuals, constant variance (homoscedasticity).
31
What is least squares estimation in linear regression?
Minimises the sum of squared residuals.
32
What does R² measure?
Proportion of variance in y explained by x.
33
What does R²=0 mean?
Model explains none of the variation in y.
34
What does R²=1 mean?
Model explains all the variation in y.
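
The regression cards above (y = a + bx, least squares, R²) can be tied together in one short calculation. This is a sketch with hypothetical data and a hypothetical function name, using the standard closed-form least-squares estimates for a single predictor.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx             # slope: change in y per unit change in x
    a = mean_y - b * mean_x   # intercept: expected y when x = 0
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # R squared: proportion of the variation in y explained by the line
    return a, b, 1 - ss_residual / ss_total


# Hypothetical data lying close to, but not exactly on, a straight line
a, b, r_squared = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"y = {a:.2f} + {b:.2f}x, R squared = {r_squared:.3f}")
```

Points that fall exactly on a line leave no residual variation, so R² is 1 in that case.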
35
What features should be assessed in scatter plots?
Direction, form, strength, outliers.
36
What are the possible directions of scatter plots?
Positive, negative, unclear.
37
What are the possible forms of scatter plots?
Linear, curvilinear, scattered.
38
What is the purpose of one-way ANOVA?
Test differences between means of ≥3 unrelated groups.
39
What are the key assumptions of an ANOVA?
Normally distributed data, equal variance, independent observations.
40
What does ANOVA partition?
Variance within and between groups.
40
What is the F-statistic?
Ratio of between-group variance to within-group variance.
40
What does a larger F-value mean?
Between-group variance is large relative to within-group variance, so it is more likely that the means are significantly different.
40
What is the null hypothesis for a one-way ANOVA?
All group means are equal.
41
How do you calculate F?
  1. Calculate the between-groups sum of squares
  2. Calculate the within-groups sum of squares
  3. Calculate the variance for each by dividing the sum of squares by the appropriate number of degrees of freedom
  4. Calculate F = between-group variance / within-group variance
42
How do you calculate between-group variance?
Between-groups sum of squares / (number of groups − 1)
43
How do you calculate within-group variance?
Within-groups sum of squares / (number of observations − number of groups)
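
The F calculation described on the last three cards can be followed step by step in code. This is a minimal sketch with hypothetical data and a hypothetical function name; it returns F and the two degrees of freedom but not a p-value, which would come from the F-distribution (for example via `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    observations = [value for group in groups for value in group]
    grand_mean = sum(observations) / len(observations)

    # 1. Between-groups sum of squares
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # 2. Within-groups sum of squares
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    # 3. Variances: sum of squares / degrees of freedom
    df_between = len(groups) - 1
    df_within = len(observations) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # 4. F = between-group variance / within-group variance
    return ms_between / ms_within, df_between, df_within


# Three hypothetical groups of three observations each
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

For these values the between-groups sum of squares is 26 and the within-groups sum of squares is 6, giving variances of 13 and 1 and an F of 13.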
44
What is the purpose of a two-way ANOVA?
Test the effects of 2 factors on one response variable.
45
What hypotheses are tested with a two-way ANOVA?
  • Main effect of factor 1
  • Main effect of factor 2
  • Interaction effect (factor 1 × factor 2)
46
What is an interaction effect?
Effect of one factor depends on the level of the other.
  • Synergistic -> the 2 factors combined amplify each other's effects
  • Inhibitory -> the 2 factors reduce or weaken each other's effects
47
When is repeated-measures ANOVA used?
When the same individual is measured multiple times.
48
When is nested ANOVA used?
When one factor is a subset within another.
49
What is the purpose of a Post-hoc Tukey test?
Pairwise comparisons after significant ANOVA.
50
When is a Post-hoc Tukey test required?
When a factor has more than 2 levels.
51
What is the purpose of χ² for differences?
Compare observed vs expected frequencies.
52
What are the data requirements for χ² for differences?
Count data; each observation falls into one category only.
53
What is the null hypothesis for χ² for differences?
Observed frequencies do not differ from expected frequencies.
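
The comparison of observed and expected frequencies uses the standard chi-squared statistic, the sum of (observed − expected)² / expected over all categories. The counts below are hypothetical, and the critical value of 3.841 (df = 1, 0.05 level) is taken from a standard chi-squared table rather than computed.

```python
def chi_squared(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 50 observations, expected to split equally
observed = [30, 20]
expected = [25, 25]
statistic = chi_squared(observed, expected)
df = len(observed) - 1

# Tabulated critical value for df = 1 at the 0.05 level (assumed, not computed)
CRITICAL_DF1 = 3.841
print(statistic, "reject H0" if statistic >= CRITICAL_DF1 else "do not reject H0")
```

Here the statistic is 2.0, which is below the critical value, so H₀ is not rejected for these counts.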
54
What is the purpose of χ² for association?
Test if two categorical variables are associated.
55
What is the null hypothesis for χ² for association?
Variables are independent.
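
For a test of association, the expected count in each cell under independence is row total × column total / grand total. The sketch below applies that to a hypothetical 2 × 2 table of counts; the function name is illustrative, and no continuity correction is applied.

```python
def chi_squared_association(table):
    """Chi-squared statistic and df for a table of counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of counts
statistic, df = chi_squared_association([[10, 20], [20, 10]])
print(f"chi-squared = {statistic:.2f}, df = {df}")
```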
56
What determines test choice first?
Measurement vs frequency data.
57
What violates the assumptions of a one-way ANOVA?
Repeated measures (pseudoreplication), paired data, related individuals.