Statistical tests - lecture slides Flashcards

(42 cards)

1
Q

What is a two-way ANOVA test used for?

A

To test the main effects of two independent categorical variables, and the interaction between them, on a continuous dependent variable.

2
Q

What test would we use to compare two related measurements?

A

A Paired t-test

3
Q

What test would we use to compare three or more independent groups?

A

One-way ANOVA

4
Q

What test would we use to examine two factors together?

A

Two-way ANOVA

5
Q

What is the Normal Distribution?

A

A bell-shaped curve where most values cluster around the mean.
Symmetrical, with equal probability of values occurring above or below the mean.

6
Q

What is the Central Limit Theorem?

A

As sample size increases, sample means approximate a normal distribution, regardless of the population’s shape. This is important as many statistical tests assume normality and it allows us to make inferences about populations from sample data.

7
Q

Non-parametric tests

A

Examples include the Wilcoxon tests and the Kruskal-Wallis test.

8
Q

What do t-tests do?

A

Based on the t-statistic (t).
Compares two groups (e.g., test scores of Group A vs. Group B).
A bigger absolute t-value means a bigger difference between groups relative to the variability within them.

9
Q

What do ANOVA tests do?

A

Based on the F-statistic (F).
Compares three or more groups.
A bigger F-value means more variation between groups than within groups.

10
Q

Common transformations

A

Log Transformation: used when data is right-skewed (e.g., income, rainfall).
Square Root Transformation: useful for count data and moderately right-skewed data.
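
In R these are one-liners (a sketch; df, income and counts are hypothetical names):

# log-transform right-skewed data (log1p() handles zeros)
df$log_income <- log(df$income)
# square-root transform count data
df$sqrt_counts <- sqrt(df$counts)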

11
Q

Key points of CLT

A

Larger samples reduce variability in sample means.
The distribution of sample means from large enough samples approximates a normal distribution.
The CLT allows us to use statistical tests even if the population is not normally distributed → fundamental to statistical inference.
Practical Implication: Even with non-normal data, large sample sizes ensure the validity of hypothesis testing.
This principle underpins many statistical tests used in social sciences.
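
A quick way to see the CLT in base R (a sketch; the right-skewed exponential population is an arbitrary choice):

# draw 1000 samples of size 30 from a skewed population and keep the means
means <- replicate(1000, mean(rexp(30, rate = 1)))
hist(means)  # roughly bell-shaped, centred near the population mean of 1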

12
Q

What is the Shapiro-Wilk Test?

A

A statistical test that checks if a dataset is normally distributed.
Compares the actual data distribution to a perfect normal distribution.
Interpretation: p > 0.05 → data is consistent with normality ✅; p ≤ 0.05 → data is not normal ❌
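
In base R (a sketch; df$values is a hypothetical numeric vector):

shapiro.test(df$values)  # check the p-value in the output against 0.05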

13
Q

What is Levene's Test?

A

A test that checks if different groups have similar variances (spread of data).
This is important because t-tests and ANOVA assume equal variances between groups.
If variances aren’t equal, results may be misleading.
Interpreting Results:
p > 0.05 → Variances are equal ✅ (Safe to use t-test or ANOVA).
p ≤ 0.05 → Variances are not equal ❌ (Use Welch’s t-test or Welch’s ANOVA)
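
Levene's test lives in the car package rather than base R (a sketch; df, score and group are hypothetical, with group a factor):

library(car)  # install.packages("car") if needed
leveneTest(score ~ group, data = df)  # p > 0.05 supports equal variances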

14
Q

When should you use a paired t-test?

A

Comparing two related measurements within the same group.

15
Q

Key assumptions of a paired t-test

A

Data is normally distributed.
Observations are dependent (paired data).
If assumptions fail: Wilcoxon Signed-Rank Test (non-parametric)
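
Both the paired t-test and its non-parametric fallback are in base R (a sketch; before and after are hypothetical paired vectors):

t.test(before, after, paired = TRUE)       # paired t-test
wilcox.test(before, after, paired = TRUE)  # Wilcoxon signed-rank test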

16
Q

When should you use a one-way ANOVA?

A

Comparing means across three or more independent groups.

17
Q

Key assumptions of a one-way ANOVA

A

Normality in each group.
Homogeneity of variances (Levene’s test).
Observations are independent.
Alternatives if Assumptions Fail: Kruskal-Wallis Test, Welch’s ANOVA.
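
In base R (a sketch; df, score and group are hypothetical, with group a factor):

fit <- aov(score ~ group, data = df)  # one-way ANOVA
summary(fit)                          # F-statistic and p-value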

18
Q

Properties of Kruskal-Wallis test

A

A non-parametric test.
Compares median ranks rather than means.
Used when data is not normal, when variances are unequal, or when using ordinal data.
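
In base R (a sketch; df, score and group are hypothetical):

kruskal.test(score ~ group, data = df)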

19
Q

Properties of Welch’s ANOVA

A

A parametric test.
Compares means.
Used when normality is assumed but variances are unequal.
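
Base R runs Welch's ANOVA through oneway.test (a sketch; df, score and group are hypothetical):

oneway.test(score ~ group, data = df, var.equal = FALSE)  # Welch's ANOVA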

20
Q

When should you use a two-way ANOVA?

A

Evaluating the effect of two independent variables simultaneously

21
Q

Key assumptions of a two-way ANOVA

A

Normality in each group.
Homogeneity of variances.
The interaction between the variables should be tested rather than assumed.

Alternatives if Assumptions Fail: Generalised Linear Models (GLM)
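
In base R the * operator fits both main effects and their interaction (a sketch; df, score, factorA and factorB are hypothetical):

fit <- aov(score ~ factorA * factorB, data = df)  # factorA + factorB + factorA:factorB
summary(fit)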

22
Q

When is a Chi-square test used? (+ properties)

A

Used where nominal variables are being compared.
There are two or more categories for each variable.
Non-parametric.
The test measures the extent to which observed data departs from expectation.

23
Q

Results in a Chi-square test

A

Where the p-value from a chi-square test is below 0.05, there is a high probability that the variables are associated.
Chi-square-based measures of association (e.g. Cramér's V) fall between 0 and 1, where:
0 shows no association between the two variables;
1 shows a perfect association between the variables.
The higher the value, the stronger the association.
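
In base R (a sketch; df, var1 and var2 are hypothetical nominal variables):

chisq.test(table(df$var1, df$var2))  # compares observed with expected counts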

24
Q

How do you calculate Pearson’s correlation in R?

A

> cor.test(variable1, variable2)
(Output header: Pearson's product-moment correlation)

25
Q

How do you calculate Spearman's correlation in R?

A

> cor.test(asthma$Asthma, asthma$PM10, method = "spearman")
(Output header: Spearman's rank correlation rho)

26
Q

What does regression measure?

A

Association where a causal relationship is believed to exist, e.g. based on scientific studies.
Compares one 'dependent' variable and one or more 'independent' variables.
E.g. regression rather than correlation should be used when comparing lung cancer rates (dependent variable) vs. number of cigarettes smoked per day (independent variable).

27
Q

Correlation

A

Allows you to identify some information about a relationship: the direction and significance.

28
Q

Regression

A

Allows you to model the relationship between the variables in detail: how much does y change with x?

29
Q

When should you use linear regression?

A

When you want to understand more than the strength and direction of a relationship between two continuous variables.
When your data meet the necessary assumptions.

30
Q

What assumptions do linear regression models make about the nature of the data/residuals?

A

The data are independent.
The measurement scale of the data is interval or ratio (not categorical).
The relationship between the variables is linear (but there are ways around this).
There is no significant measurement error in the x variable.
The variance in the residuals is constant (the model fit is similar across the data).
The residuals are normally distributed.

31
Q

How do you build a linear model in R?

A

Dependent ~ Independent (or Response ~ Explanatory), i.e. y ~ x.
This corresponds to y = mx + c, where:
c = constant or intercept - your model will include an intercept (unless you specify otherwise)
m = slope
E.g. lm(formula = Biomass ~ Temperature, data = data)

32
Q

What do the stars mean in R output?

A

They correspond to the level of significance: the smaller the p-value, the more significant the result.

33
Q

What do residuals measure?

A

The distance from your line of best fit to your data points.

34
Q

The smaller the residuals...

A

...the closer the line of best fit is to the data.

35
Q

What must residuals be?

A

Normally distributed, with equal variance throughout.

36
Q

What does no pattern in the residuals mean?

A

The model is succeeding in capturing the pattern in the data.

37
Q

What can you do if it is not appropriate to use a linear model for a dataset?

A

Log-transform the data - transformations express the same data on a different scale.
E.g. lm(Biomass ~ log(Temperature), data = data)

38
Q

What do logistic models model?

A

The log-odds of an event as a linear combination of one or more independent variables.

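In R a logistic model is a GLM with a binomial family (a sketch; df, outcome and exposure are hypothetical, with outcome coded 0/1):

fit <- glm(outcome ~ exposure, data = df, family = binomial)
summary(fit)  # coefficients are on the log-odds scale
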
39
Q

What is it called when there are correlated explanatory variables?

A

Collinearity

40
Q

What must you test for when there are many explanatory variables?

A

Multicollinearity - collinearity among multiple explanatory variables.

41
Q

VIF (variance inflation factor)

A

A VIF over 5 is problematic; a variable with a VIF over 10 should not be included in the model.

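VIFs come from the car package (a sketch; model is a hypothetical lm fit with several explanatory variables):

library(car)
vif(model)  # flag values over 5; exclude variables over 10
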
42
Q

ANCOVA

A

Analysis of COvariance: a linear relationship fitted alongside a categorical variable with multiple levels or 'treatments'.
E.g. lm(SLA ~ Altitude + Veg_type, data = data)