Statistical tests - lecture slides Flashcards

(42 cards)

1
Q

What is a two-way ANOVA test used for?

A

To test the main effects of two independent categorical variables, and the interaction between them, on a continuous dependent variable.

2
Q

What test would we use to compare two related measurements?

A

A Paired t-test

3
Q

What test would we use to compare three or more independent groups?

A

One-way ANOVA

4
Q

What test would we use to examine two factors together?

A

Two-way ANOVA

5
Q

What is the Normal Distribution?

A

A bell-shaped curve where most values cluster around the mean.
Symmetrical, with equal probability of values occurring above or below the mean.

6
Q

What is the Central Limit Theorem?

A

As sample size increases, sample means approximate a normal distribution, regardless of the population’s shape. This is important as many statistical tests assume normality and it allows us to make inferences about populations from sample data.

7
Q

Non-parametric tests

A

Examples include the Wilcoxon tests and the Kruskal-Wallis test.

8
Q

What do t-tests do?

A

Based on the t-statistic (t).
Compares two groups (e.g., test scores of Group A vs. Group B).
A bigger absolute t-value means a bigger difference between groups relative to the variability within them.

9
Q

What do ANOVA tests do?

A

Based on the F-statistic (F).
Compares three or more groups.
A bigger F-value means more variation between groups than within groups.

10
Q

Common transformations

A

Log Transformation: used when data is right-skewed (e.g., income, rainfall).
Square Root Transformation: useful for count data and moderately right-skewed data.
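
In R these are one-liners (a sketch; df, income and counts are hypothetical names):

# log-transform right-skewed data (log1p() handles zeros)
df$log_income <- log(df$income)
# square-root transform count data
df$sqrt_counts <- sqrt(df$counts)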

11
Q

Key points of CLT

A

Larger samples reduce variability in sample means.
The distribution of sample means from large enough samples approximates a normal distribution.
The CLT allows us to use statistical tests even if the population is not normally distributed → fundamental to statistical inference.
Practical Implication: Even with non-normal data, large sample sizes ensure the validity of hypothesis testing.
This principle underpins many statistical tests used in social sciences.
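
A quick way to see the CLT in base R (a sketch; the right-skewed exponential population is an arbitrary choice):

# draw 1000 samples of size 30 from a skewed population and keep the means
means <- replicate(1000, mean(rexp(30, rate = 1)))
hist(means)  # roughly bell-shaped, centred near the population mean of 1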

12
Q

What is the Shapiro-Wilk Test?

A

A statistical test that checks if a dataset is normally distributed.
Compares the actual data distribution to a perfect normal distribution.
Interpretation: p > 0.05 → data is consistent with normality ✅; p ≤ 0.05 → data is not normal ❌
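
In base R (a sketch; df$values is a hypothetical numeric vector):

shapiro.test(df$values)  # check the p-value in the output against 0.05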

13
Q

What is Levene's Test?

A

A test that checks if different groups have similar variances (spread of data).
This is important because t-tests and ANOVA assume equal variances between groups.
If variances aren’t equal, results may be misleading.
Interpreting Results:
p > 0.05 → Variances are equal ✅ (Safe to use t-test or ANOVA).
p ≤ 0.05 → Variances are not equal ❌ (Use Welch’s t-test or Welch’s ANOVA)
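
Levene's test lives in the car package rather than base R (a sketch; df, score and group are hypothetical, with group a factor):

library(car)  # install.packages("car") if needed
leveneTest(score ~ group, data = df)  # p > 0.05 supports equal variances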

14
Q

When should you use a paired t-test?

A

Comparing two related measurements within the same group.

15
Q

Key assumptions of a paired t-test

A

Data is normally distributed.
Observations are dependent (paired data).
If assumptions fail: Wilcoxon Signed-Rank Test (non-parametric)
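
Both the paired t-test and its non-parametric fallback are in base R (a sketch; before and after are hypothetical paired vectors):

t.test(before, after, paired = TRUE)       # paired t-test
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed-rank test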

16
Q

When should you use a one-way ANOVA?

A

Comparing means across three or more independent groups.

17
Q

Key assumptions of a one-way ANOVA

A

Normality in each group.
Homogeneity of variances (Levene’s test).
Observations are independent.
Alternatives if Assumptions Fail: Kruskal-Wallis Test, Welch’s ANOVA.
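
In base R (a sketch; df, score and group are hypothetical, with group a factor):

fit <- aov(score ~ group, data = df)  # one-way ANOVA
summary(fit)                          # F-statistic and p-value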

18
Q

Properties of Kruskal-Wallis test

A

A non-parametric test.
Compares median ranks rather than means.
Used when data is not normal, when variances are unequal, or when using ordinal data.
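
In base R (a sketch; df, score and group are hypothetical):

kruskal.test(score ~ group, data = df)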

19
Q

Properties of Welch’s ANOVA

A

A parametric test.
Compares means.
Used when normality is assumed but variances are unequal.
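
Base R runs Welch's ANOVA through oneway.test (a sketch; df, score and group are hypothetical):

oneway.test(score ~ group, data = df, var.equal = FALSE)  # Welch's ANOVA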

20
Q

When should you use a two-way ANOVA?

A

Evaluating the effect of two independent variables simultaneously

21
Q

Key assumptions of a two-way ANOVA

A

Normality in each group.
Homogeneity of variances.
The interaction between the variables should be tested rather than assumed.

Alternatives if Assumptions Fail: Generalised Linear Models (GLM)
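
In base R the * operator fits both main effects and their interaction (a sketch; df, score, factorA and factorB are hypothetical):

fit <- aov(score ~ factorA * factorB, data = df)  # factorA + factorB + factorA:factorB
summary(fit)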

22
Q

When is a Chi-square test used? (+ properties)

A

Used where nominal variables are being compared.
There are two or more categories for each variable.
Non-parametric.
The test measures the extent to which observed data departs from expectation.

23
Q

Results in a Chi-square test

A

Where the p-value from a chi-square test is below 0.05, there is a high probability that the variables are associated.
Chi-square-based measures of association (e.g. Cramér's V) fall between 0 and 1, where:
0 shows no association between the two variables;
1 shows a perfect association between the variables.
The higher the value, the stronger the association.
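
In base R (a sketch; df, var1 and var2 are hypothetical nominal variables):

chisq.test(table(df$var1, df$var2))  # compares observed with expected counts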

24
Q

How do you calculate Pearson’s correlation in R?

A

> cor.test(variable1, variable2)
(Output header: Pearson's product-moment correlation)

25
Q

How do you calculate Spearman's correlation in R?

A

> cor.test(asthma$Asthma, asthma$PM10, method = "spearman")
(Output header: Spearman's rank correlation rho)

26
Q

What does regression measure?

A

Association where a causal relationship is believed to exist, e.g. based on scientific studies.
Compares one 'dependent' variable and one or more 'independent' variables.
E.g. regression rather than correlation should be used when comparing lung cancer rates (dependent variable) vs. number of cigarettes smoked per day (independent variable).

27
Q

Correlation

A

Allows you to identify some information about a relationship: the direction and significance.

28
Q

Regression

A

Allows you to model the relationship between the variables in detail: how much does y change with x?

29
Q

When should you use linear regression?

A

When you want to understand more than the strength and direction of a relationship between two continuous variables.
When your data meet the necessary assumptions.

30
Q

What assumptions do linear regression models make about the nature of the data/residuals?

A

The data are independent.
The measurement scale of the data is interval or ratio (not categorical).
The relationship between the variables is linear (but there are ways around this).
There is no significant measurement error in the x variable.
The variance in the residuals is constant (the model fit is similar across the data).
The residuals are normally distributed.

31
Q

How do you build a linear model in R?

A

Dependent ~ Independent (or Response ~ Explanatory), i.e. y ~ x.
This corresponds to y = mx + c, where:
c = constant or intercept - your model will include an intercept (unless you specify otherwise)
m = slope
E.g. lm(formula = Biomass ~ Temperature, data = data)

32
Q

What do the stars mean in R output?

A

They correspond to the level of significance: the smaller the p-value, the more significant the result.

33
Q

What do residuals measure?

A

The distance from your line of best fit to your data points.

34
Q

The smaller the residuals...

A

...the closer the line of best fit is to the data.

35
Q

What must residuals be?

A

Normally distributed, with equal variance throughout.

36
Q

What does no pattern in the residuals mean?

A

The model is succeeding in capturing the pattern in the data.

37
Q

What can you do if it is not appropriate to use a linear model for a dataset?

A

Log-transform the data - transformations express the same data on a different scale.
E.g. lm(Biomass ~ log(Temperature), data = data)

38
Q

What do logistic models model?

A

The log-odds of an event as a linear combination of one or more independent variables.

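In R a logistic model is a GLM with a binomial family (a sketch; df, outcome and exposure are hypothetical, with outcome coded 0/1):

fit <- glm(outcome ~ exposure, data = df, family = binomial)
summary(fit)  # coefficients are on the log-odds scale
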
39
Q

What is it called when there are correlated explanatory variables?

A

Collinearity

40
Q

What must you test for when there are many explanatory variables?

A

Multicollinearity - collinearity among multiple explanatory variables.

41
Q

VIF (variance inflation factor)

A

A VIF over 5 is problematic; a variable with a VIF over 10 should not be included in the model.

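VIFs come from the car package (a sketch; model is a hypothetical lm fit with several explanatory variables):

library(car)
vif(model)  # flag values over 5; exclude variables over 10
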
42
Q

ANCOVA

A

Analysis of COvariance: a linear relationship fitted alongside a categorical variable with multiple levels or 'treatments'.
E.g. lm(SLA ~ Altitude + Veg_type, data = data)