Research methods Flashcards
The misuse of NHST (Null Hypothesis Significance Testing)
- The American Statistical Association (2016) outlined principles on the misuse of p values in significance testing
- P-values do not measure the probability that the results arose by chance, or the probability that a specific hypothesis is true
- Statistical significance is not the same as practical importance
- The p-value alone is not a good measure of evidence regarding model or hypothesis
Type 1 and Type 2 errors
- Type 1 = incorrectly rejecting the null hypothesis when it is true (a false positive)
- Type 2 = incorrectly retaining the null hypothesis when it is false (a false negative)
Power
- The probability of finding an effect assuming one exists in the population
- Calculated as 1 − β
- β is the probability of failing to find the effect, i.e. of making a Type 2 error (Cohen suggests β = 0.2, giving power of 0.8)
What affects power? 3 factors
- Effect size: an objective and standardised measure of the magnitude of an effect (larger value = bigger effect). Which measure is used depends on the test conducted – Cohen's d (t-tests), Pearson's r (correlation), partial eta squared (ANOVA)
- Number of participants: more participants = more 'signal', less 'noise'. You should choose sample size depending on the expected effect size (larger effect size = fewer participants needed, smaller effect size = more participants needed)
- Alpha level: the probability of making a Type 1 error. We compare our p-value to this criterion when testing significance
- Other factors: variability, design, test choice
Problems with alpha testing
- If we run multiple tests, this will increase the rate at which we might get a type 1 error (family wise experimental error rate)
- We can account for this by limiting the number of tests or by using corrections such as the Bonferroni correction (but this reduces statistical power)
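As a sketch of the arithmetic (with made-up p-values), the family-wise error rate for a family of five tests and the Bonferroni-corrected criterion can be computed directly:

```python
# Sketch: family-wise Type 1 error rate for m = 5 tests, and the
# Bonferroni correction that controls it (p-values are made up).
alpha = 0.05
m = 5  # number of tests in the family

# Probability of at least one Type 1 error if each test uses alpha = .05
familywise_error = 1 - (1 - alpha) ** m
print(round(familywise_error, 3))  # 0.226 - far above .05

# Bonferroni: compare each p-value against alpha / m instead
bonferroni_alpha = alpha / m  # 0.01
p_values = [0.004, 0.03, 0.012, 0.2, 0.008]  # hypothetical results
significant = [p < bonferroni_alpha for p in p_values]
print(significant)  # [True, False, False, False, True]
```

Note the cost mentioned above: 0.03 and 0.012 would have been significant at the uncorrected .05 level, illustrating the loss of power.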
What is the difference between one and two-tailed tests
- One-tailed- we hypothesise there will be a difference in scores, and we’re specific about which score will be higher (α=.05 at one end)
- Two-tailed- We hypothesise there will be a difference in scores, but this could be in either direction (α= .025 at both ends)
- For a one-tailed test, our p-value is half of the two-tailed p-value
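A minimal sketch of the halving relationship with made-up scores, using the `alternative` parameter of `scipy.stats.ttest_ind`:

```python
# Sketch: the same data tested two-tailed and one-tailed. When the
# effect lies in the predicted direction, the one-tailed p-value is
# exactly half the two-tailed p-value.
from scipy import stats

group_a = [5.1, 6.2, 5.8, 6.5, 5.9, 6.1]  # made-up scores
group_b = [4.8, 5.0, 4.6, 5.2, 4.9, 5.1]

two_tailed = stats.ttest_ind(group_a, group_b, alternative="two-sided")
one_tailed = stats.ttest_ind(group_a, group_b, alternative="greater")

print(two_tailed.pvalue, one_tailed.pvalue)
print(one_tailed.pvalue == two_tailed.pvalue / 2)  # True
```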
Which type of test do I run?
- One-tailed tests are more powerful because the whole of α sits in one tail
- However, there are several caveats and considerations, so in most cases it is recommended that you run a two-tailed test
Power and study design:
- Within-subjects studies are more powerful than between-subjects studies
- Example: to run a t-test with a two-tailed design, medium effect size, α level of 0.05 and power level of 0.8, we can:
- 1) Calculate the power we have obtained in a study post-hoc
- 2) Calculate how many participants we need to collect for a study a priori (this can be done using statistical programs such as G*Power)
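The a priori calculation described above can also be sketched in Python with statsmodels instead of G*Power; here the "medium" effect size is taken to be Cohen's d = 0.5:

```python
# Sketch of an a priori power analysis: participants per group needed
# for an independent-samples t-test with d = 0.5, two-tailed alpha = .05,
# power = .8 (the conventional Cohen specification).
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, alpha=0.05, power=0.8, alternative="two-sided"
)
print(math.ceil(n_per_group))  # 64 per group, as G*Power also reports
```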
What is analysis of variance?
- Analysis of variance (ANOVA) is an extension of the t-test
- It allows us to test whether three or more population means are the same in a single test, without the loss of power that multiple corrected t-tests would cause
Assumptions of ANOVA
- the scores were sampled randomly and are independent
- roughly normal distribution
- roughly equal number of participants in the groups
- roughly equal variance for each condition
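A hypothetical sketch of checking two of these assumptions on made-up data, using Shapiro–Wilk for normality and Levene's test for equality of variances:

```python
# Sketch: assumption checks before an ANOVA (scores are made up).
from scipy import stats

groups = [
    [4.1, 5.2, 4.8, 5.5, 4.9, 5.1],   # condition A
    [5.9, 6.3, 6.1, 5.7, 6.5, 6.0],   # condition B
    [5.0, 5.4, 4.7, 5.6, 5.2, 4.9],   # condition C
]

# Shapiro-Wilk: p < .05 would suggest a non-normal distribution
for g in groups:
    print(stats.shapiro(g).pvalue)

# Levene's test: p < .05 would suggest unequal variances
print(stats.levene(*groups).pvalue)
```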
The basis of the ANOVA test
- Analysis of variance is a way to compare multiple conditions in a single, powerful test
- It was invented by Fisher (so its test statistic is F)
- It compares the amount of variance explained by our experiment with the variance that is unexplained
Between-groups ANOVA
- The aim of ANOVA is to compare the ‘amount of variance explained by our experiment with the variance that is unexplained’
- For between-group designs:
- A) the explained variance is the variance between group
- B) the unexplained is the variance within a group
- Each of these variance estimates is a 'mean square' (MS); the within-group estimate is referred to as the MS error
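A minimal between-groups example with made-up scores for three groups, using `scipy.stats.f_oneway`:

```python
# Sketch: one-way between-groups ANOVA. F is the ratio of explained
# (between-group) variance to unexplained (within-group) variance.
from scipy import stats

low = [3.1, 2.8, 3.5, 3.0, 2.9]    # made-up scores for three groups
mid = [4.2, 4.5, 3.9, 4.1, 4.4]
high = [5.6, 5.9, 5.4, 6.1, 5.8]

result = stats.f_oneway(low, mid, high)
print(result.statistic, result.pvalue)  # large F, small p: means differ
```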
Degrees of freedom
- There are degrees of freedom associated with both variance values:
- A) degrees of freedom between conditions
- B) residual degrees of freedom
- ANOVA critical values require 2 d.f. values, one for each aspect of the variance
- We must report both
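A short sketch of the two d.f. values for a hypothetical between-groups design with k = 3 groups and N = 15 participants, and the critical F value they index:

```python
# Sketch: the two degrees-of-freedom values a between-groups ANOVA
# reports, and the critical F they look up (via scipy's F distribution).
from scipy import stats

k, N = 3, 15              # made-up design: 3 groups, 15 participants
df_between = k - 1        # d.f. between conditions
df_residual = N - k       # residual d.f.

critical_f = stats.f.ppf(1 - 0.05, df_between, df_residual)
print(df_between, df_residual, round(critical_f, 2))  # 2 12 3.89
```

An observed F(2, 12) larger than 3.89 would be significant at α = .05; both d.f. values are reported, e.g. F(2, 12) = 5.21.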
Pair-wise comparisons
- ANOVA tells us whether groups differ or not
- How do we know which particular conditions differ?
- Run the multiple comparisons (those we were trying to avoid)
- Some of these are ‘planned comparisons’, some are ‘post-hoc’
- Correct for multiple comparisons
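A hypothetical sketch of the follow-up step, running Bonferroni-corrected pairwise t-tests on made-up groups:

```python
# Sketch: pairwise comparisons after a significant ANOVA, each tested
# against a Bonferroni-corrected alpha of .05 / 3 (data are made up).
from itertools import combinations
from scipy import stats

groups = {
    "low":  [3.1, 2.8, 3.5, 3.0, 2.9],
    "mid":  [4.2, 4.5, 3.9, 4.1, 4.4],
    "high": [5.6, 5.9, 5.4, 6.1, 5.8],
}

pairs = list(combinations(groups, 2))   # 3 comparisons
corrected_alpha = 0.05 / len(pairs)

for name_a, name_b in pairs:
    p = stats.ttest_ind(groups[name_a], groups[name_b]).pvalue
    print(f"{name_a} vs {name_b}: p = {p:.5f}, "
          f"significant = {p < corrected_alpha}")
```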
Versions of ANOVA
- Analysis of variance (ANOVA) – one factor ANOVA and multifactor ANOVA
- Multivariate analysis of variance (MANOVA) – extension of ANOVA for multiple dependent variables
- Analysis of covariance (ANCOVA) – extension of ANOVA to control for continuous covariates (e.g. variables correlated with the DV)
What is ANOVA based on? (for between-groups)
A) the variance explained by the experiment (the effect)
B) the residual (remaining) variance that cannot be explained (noise)
- For between-group design, the variance comes from only two sources:
A) variance between groups (explained)
B) variance within groups (unexplained)
Between group vs repeated measures for ANOVA
- For repeated-measure design, there are three possible sources of variance:
A) variance between conditions
B) variance between subjects (individual differences)
C) residual (unexplained) variance
- In the between-group study, the variance between subjects fell under the category 'unexplained'
What are the F-ratio and MS unexplained formulas?
- F = MS explained / MS unexplained
- MS explained is the variance between conditions
- MS unexplained is the remaining variance after accounting for individual differences
- MS unexplained = MS total – MS explained – MS ind diffs (strictly, this subtraction is performed on the sums of squares; each MS is its sum of squares divided by its degrees of freedom)
- MS ind diffs is the variance between subjects within a condition
- MS total is the variance of all subjects in all conditions
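This partition can be sketched by hand on made-up repeated-measures data (4 subjects × 3 conditions); the subtraction is done on sums of squares, each MS being a sum of squares over its degrees of freedom:

```python
# Sketch: repeated-measures variance partition computed by hand.
# Rows are subjects, columns are conditions (scores are made up).
import numpy as np

scores = np.array([
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0],
    [3.0, 4.0, 6.0],
    [4.0, 6.0, 7.0],
])
n_subj, k = scores.shape
grand_mean = scores.mean()

# SS_total = SS_conditions (explained) + SS_subjects (ind diffs) + SS_residual
ss_total = ((scores - grand_mean) ** 2).sum()
ss_cond = n_subj * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_subj = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_resid = ss_total - ss_cond - ss_subj

ms_cond = ss_cond / (k - 1)                      # MS explained
ms_resid = ss_resid / ((k - 1) * (n_subj - 1))   # MS unexplained
print(round(ms_cond / ms_resid, 1))              # F-ratio: 45.0
```

Because the subject variance is removed from the unexplained term, this F is larger than a between-groups analysis of the same numbers would give, which is why within-subjects designs are more powerful.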
What is multi-factorial ANOVA?
- Like repeated-measures ANOVA
- Factors can all be within-subject, all between-group or a ‘mixed’ design
- We can have 'main' effects or a variety of 'interactions'
- Main effect = one of the factors (IVs) consistently affects the dependent variable (DV) in the same way
- Interaction = the effect of one factor depends on the level of another
What is 2x2 ANOVA?
- The multifactorial ANOVA is a single test
- It returns multiple F values (one for each main effect to be checked and one for the interaction)
- With only two levels per factor, there is no need for post-hoc tests
- So 2x2 is just a single test (no family-wise error)
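A hypothetical sketch of a 2x2 between-groups ANOVA in statsmodels (variable names and scores are made up); a single call returns an F for each main effect plus one for the interaction:

```python
# Sketch: 2x2 ANOVA via statsmodels - two factors (caffeine, sleep),
# one DV (score), four participants per cell, all data invented.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "caffeine": ["yes"] * 4 + ["no"] * 4 + ["yes"] * 4 + ["no"] * 4,
    "sleep":    ["full"] * 8 + ["deprived"] * 8,
    "score":    [7, 8, 7, 9, 6, 7, 6, 7, 5, 6, 5, 6, 3, 4, 3, 4],
})

model = smf.ols("score ~ C(caffeine) * C(sleep)", data=data).fit()
# One table: F values for caffeine, sleep, and caffeine x sleep
print(anova_lm(model, typ=2))
```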
What are contingency tables?
- A table of frequencies showing how often an observation occurs in each category
- Categories must be mutually exclusive and exhaustive
What is the Chi-Square test?
- Devised by Karl Pearson in 1900, also known as Pearson’s chi-square
- Compares how often a particular observation falls into a category with how often it would be expected to by chance
- Null hypothesis = the frequencies observed were expected by chance
- Alternative hypothesis = the frequencies observed reflect real differences in categories
- Assumptions:
- 1. Independence: each person can only contribute to one cell of a contingency table
- 2. Expected frequencies: all expected counts should be greater than 1, and no more than 20% of expected counts should be less than 5
Violating expected frequencies, and how to address it
- Results in a loss of power
- How to address this:
- A) use an ‘Exact’ test instead
- B) remove data across one variable
- C) collapse levels of one variable
- D) collect more data
- E) accept the loss of power
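Option A can be sketched with `scipy.stats.fisher_exact` on a made-up 2x2 table whose counts would be too small for a chi-square:

```python
# Sketch: Fisher's exact test for a small-sample 2x2 contingency table
# (counts are invented). Exact tests do not rely on the expected-
# frequency assumption.
from scipy import stats

#                  improved  not improved
table = [[8, 2],          # treatment group
         [3, 7]]          # control group

odds_ratio, p = stats.fisher_exact(table)
print(odds_ratio, p)
```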
Chi-square by hand: one IV
- Calculate expected frequencies
- Calculate Chi-Square value based on observed and expected frequencies
- Compare Chi-Square value against a critical values table
- To interpret a table, we need to know our degrees of freedom, and our desired alpha value (degrees of freedom = number of categories – 1)
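The steps above can be sketched by hand and checked against scipy, using made-up frequencies (60 people choosing between three categories, so 20 expected per category by chance):

```python
# Sketch: one-IV (goodness-of-fit) chi-square by hand, then via scipy.
from scipy import stats

observed = [30, 18, 12]   # made-up observed frequencies
expected = [20, 20, 20]   # expected by chance: 60 people / 3 categories

# Step 1-2: chi-square = sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1    # categories - 1
print(round(chi_sq, 1), df)  # 8.4 2

# Step 3: scipy gives the same statistic plus the p-value directly
print(stats.chisquare(observed, f_exp=expected))
```

With df = 2 and α = .05 the critical value is 5.99, so 8.4 is significant.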