Statistical tests - lecture slides Flashcards
(42 cards)
What is a two-way ANOVA test used for?
To test the main effects of two independent categorical variables on a continuous dependent variable, and the interaction between them.
What test would we use to compare two related measurements?
A Paired t-test
What test would we use to compare three or more independent groups?
One-way ANOVA
What test would we use to examine two factors together?
Two-way ANOVA
What is the Normal Distribution?
A bell-shaped curve where most values cluster around the mean.
Symmetrical, with equal probability of values occurring above or below the mean.
What is Central Limit Theorem?
As sample size increases, sample means approximate a normal distribution, regardless of the population’s shape. This is important as many statistical tests assume normality and it allows us to make inferences about populations from sample data.
Non-parametric tests
Wilcoxon, Kruskal-Wallis
What do t-tests do?
t-statistic (t)
Compares two groups (e.g., test scores of Group A vs. Group B).
A bigger (absolute) t-value means a bigger difference between the group means, relative to the variability within the groups.
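A minimal R sketch, using made-up scores for two independent groups (t.test is base R):

# Hypothetical test scores for two independent groups
group_a <- c(72, 85, 78, 90, 66, 81)
group_b <- c(70, 75, 68, 83, 62, 74)

# Two-sample t-test; the reported t is the t-statistic
t.test(group_a, group_b)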
What do ANOVA tests do?
F-statistic (F)
Compares three or more groups.
A bigger F-value means more variation between groups than within groups.
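A minimal R sketch with made-up scores for three groups; summary() reports the F value:

# Hypothetical scores for three independent groups
score <- c(72, 85, 78, 70, 75, 68, 88, 92, 84, 90, 79, 83)
group <- factor(rep(c("A", "B", "C"), each = 4))

# One-way ANOVA; the F value compares between-group to within-group variation
summary(aov(score ~ group))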
Common transformations
Log Transformation: Used when data is right-skewed (e.g., income, rainfall).
Square Root Transformation: Useful for count data and moderately right-skewed data.
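A short R sketch of both transformations, using made-up values:

# Hypothetical right-skewed income data
income <- c(18000, 22000, 25000, 31000, 40000, 120000)
log_income <- log(income)        # log transformation to reduce right skew

# Hypothetical count data
counts <- c(0, 1, 1, 2, 3, 5, 8, 13)
sqrt_counts <- sqrt(counts)      # square-root transformation for counts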
Key points of CLT
Larger samples reduce variability in sample means.
Sample means from a large enough sample will always approximate a normal distribution.
The CLT allows us to use statistical tests even if the population is not normally distributed → fundamental to statistical inference.
Practical Implication: Even with non-normal data, large sample sizes ensure the validity of hypothesis testing.
This principle underpins many statistical tests used in social sciences.
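A quick R simulation sketch of the CLT, drawing samples from a deliberately skewed (exponential) population:

set.seed(1)
# Means of 1000 samples of size 50 from a skewed population
sample_means <- replicate(1000, mean(rexp(50, rate = 1)))

hist(sample_means)    # roughly bell-shaped despite the skewed population
qqnorm(sample_means)  # points lie close to a straight line
qqline(sample_means)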
What is the Shapiro-Wilk test?
A statistical test that checks if a dataset is normally distributed.
Compares the actual data distribution to a perfect normal distribution.
Interpretation: p > 0.05 → Data is likely normal ✅; p ≤ 0.05 → Data is not normal ❌
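In R (shapiro.test is base R; the data vector is made up):

x <- c(5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7)   # hypothetical sample
shapiro.test(x)   # p > 0.05: no evidence against normality; p <= 0.05: likely not normal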
What is Levene’s test?
A test that checks if different groups have similar variances (spread of data).
This is important because t-tests and ANOVA assume equal variances between groups.
If variances aren’t equal, results may be misleading.
Interpreting Results:
p > 0.05 → Variances are equal ✅ (Safe to use t-test or ANOVA).
p ≤ 0.05 → Variances are not equal ❌ (Use Welch’s t-test or Welch’s ANOVA)
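A minimal R sketch (leveneTest() comes from the car package, which must be installed; the data are made up):

library(car)   # install.packages("car") if needed

dat <- data.frame(
  score = c(72, 85, 78, 70, 75, 68, 88, 92, 84, 60),
  group = factor(rep(c("A", "B"), each = 5))
)

leveneTest(score ~ group, data = dat)   # p <= 0.05: variances are not equal
t.test(score ~ group, data = dat)       # Welch's t-test (R's default, var.equal = FALSE)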
When should you use a paired t-test?
Comparing two related measurements within the same group.
Key assumptions of a paired t-test
The differences between the paired measurements are normally distributed.
Observations are dependent (paired data).
If assumptions fail: Wilcoxon Signed-Rank Test (non-parametric)
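A minimal R sketch with made-up before/after measurements on the same participants:

before <- c(120, 115, 130, 140, 125, 118)
after  <- c(114, 112, 126, 133, 120, 117)

shapiro.test(before - after)               # check normality of the paired differences
t.test(before, after, paired = TRUE)       # paired t-test
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed-rank if normality fails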
When should you use a one-way ANOVA?
Comparing means across three or more independent groups.
Key assumptions of a one-way ANOVA
Normality in each group.
Homogeneity of variances (Levene’s test).
Observations are independent.
Alternatives if Assumptions Fail: Kruskal-Wallis Test, Welch’s ANOVA.
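A sketch of the workflow in R, with made-up data (leveneTest() is from the car package; everything else is base R):

dat <- data.frame(
  score = c(72, 85, 78, 81, 70, 75, 68, 73, 88, 92, 84, 90),
  group = factor(rep(c("A", "B", "C"), each = 4))
)

by(dat$score, dat$group, shapiro.test)       # normality within each group
car::leveneTest(score ~ group, data = dat)   # homogeneity of variances
summary(aov(score ~ group, data = dat))      # one-way ANOVA if assumptions hold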
Properties of Kruskal-Wallis test
A non-parametric test.
Compares ranks (often interpreted as comparing medians) rather than means.
Used when data is not normal, variances are unequal, or the data are ordinal.
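In R (kruskal.test is base R; the ordinal ratings are made up):

dat <- data.frame(
  rating = c(2, 3, 3, 4, 5, 4, 1, 2, 2),           # hypothetical ordinal ratings
  group  = factor(rep(c("A", "B", "C"), each = 3))
)

kruskal.test(rating ~ group, data = dat)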
Properties of Welch’s ANOVA
A parametric test.
Compares means.
Used when normality can be assumed but variances are unequal.
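In R, Welch's ANOVA is available as oneway.test (base R); the data below are made up:

dat <- data.frame(
  y     = c(4.1, 3.8, 4.5, 5.0, 7.2, 4.8, 6.1, 2.9, 6.4),
  group = factor(rep(c("A", "B", "C"), each = 3))
)

oneway.test(y ~ group, data = dat)   # var.equal = FALSE (Welch) is the default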
When should you use a two-way ANOVA?
Evaluating the effect of two independent variables simultaneously
Key assumptions of a two-way ANOVA
Normality in each group.
Homogeneity of variances.
No significant interaction between variables unless tested.
Alternatives if Assumptions Fail: Generalised Linear Models (GLM)
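A minimal R sketch with two made-up factors (the * in the formula fits both main effects and their interaction):

dat <- data.frame(
  score    = c(70, 75, 80, 85, 65, 72, 78, 88, 60, 68, 74, 82),
  teaching = factor(rep(c("lecture", "seminar"), each = 6)),
  sex      = factor(rep(c("F", "M"), times = 6))
)

summary(aov(score ~ teaching * sex, data = dat))   # main effects + interaction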
When is a Chi Square test used? + properties
Used where nominal variables are being compared.
There are two or more categories for each variable.
Non-parametric.
The test measures the extent to which the observed data depart from expectation.
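In R (chisq.test is base R; the contingency table is made up):

# Hypothetical 2 x 2 table of counts for two nominal variables
observed <- matrix(c(30, 10,
                     20, 40),
                   nrow = 2, byrow = TRUE)

chisq.test(observed)   # compares observed counts with expected counts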
Results in a Chi-square test
Where the p-value is below 0.05, there is evidence that the variables are associated.
The strength of the association is usually reported with a measure derived from chi-square (such as Cramér's V), which lies between 0 and 1, where:
0 shows no association between the two variables
1 shows a perfect association between the variables.
The higher the value, the stronger the association.
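A sketch of how such a 0-to-1 association measure (Cramér's V) can be computed from the chi-square statistic in base R, with a made-up 2 x 2 table:

observed <- matrix(c(30, 10, 20, 40), nrow = 2)     # hypothetical 2 x 2 table
res <- chisq.test(observed, correct = FALSE)        # uncorrected statistic for Cramér's V

n <- sum(observed)
k <- min(dim(observed))
sqrt(as.numeric(res$statistic) / (n * (k - 1)))     # 0 = no association, 1 = perfect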
How do you calculate Pearson’s correlation in R?
> cor.test(variable1, variable2)
(The output begins with the heading "Pearson's product-moment correlation" and the data used.)
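A self-contained sketch with made-up numeric vectors:

height <- c(150, 160, 165, 170, 175, 180)   # hypothetical data
weight <- c(50, 58, 63, 68, 74, 80)

cor.test(height, weight)   # Pearson's product-moment correlation is the default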