significance and power Flashcards
(24 cards)
what are the principles of null hypothesis significance testing (NHST)?
assume H0 is true - fit a model to the data and get a test statistic - calculate the probability of getting a test statistic at least that large, assuming H0 is true (p)
how do you get the test statistic in NHST?
comparing amount of “signal” to “noise”
or “systematic variation” to “unsystematic variation”
or “effect” to “error”
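a minimal sketch of the signal-to-noise idea, assuming a one-sample t-test (the cards do not name a specific test here). the function name, the H0 value `mu0` and the scores are made up for illustration.

```python
import math
import statistics

def one_sample_t(scores, mu0=0.0):
    """Return a one-sample t statistic as signal / noise (hypothetical helper)."""
    n = len(scores)
    # signal ("effect"): how far the sample mean is from the value H0 predicts
    signal = statistics.mean(scores) - mu0
    # noise ("error"): standard error of the mean
    noise = statistics.stdev(scores) / math.sqrt(n)
    return signal / noise

scores = [2.0, 4.0, 6.0, 8.0]  # made-up data
print(round(one_sample_t(scores, mu0=3.0), 3))
```

a bigger mean difference (signal) or a smaller standard error (noise) both give a larger test statistic.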
how can NHST be misused?
the p-value does not measure the probability that the results occurred by chance, or that a specific hypothesis is true
statistical significance is not the same as practical importance
p-value alone is not a good measure of evidence regarding a model or hypothesis
when do you get a type I error?
experiment result - H1 true (H0 rejected)
reality - H0 true
probability = α
when do you get a type II error?
experiment result - H0 true (H0 not rejected)
reality - H1 true
probability = β
what is power?
probability of finding an effect assuming one exists in the population
how do you calculate power?
1 - β
1 = absolute certainty
β = probability of a type II error (not finding an effect that exists), usually set to the rate you are happy to accept
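a rough sketch of power = 1 - β, assuming a two-tailed one-sample z-test with known variance (a simplification; real studies usually use t-tests). the function name and the example values are illustrative only.

```python
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z-test (hypothetical helper).

    d: standardised effect size, n: number of participants.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value in each tail
    shift = d * n ** 0.5                # expected test statistic if H1 is true
    # beta: probability the statistic lands between the critical values under H1
    beta = z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)
    return 1 - beta

print(round(z_test_power(d=0.5, n=32), 2))
```

with no true effect (d = 0) the function returns α, because the only "detections" are type I errors.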
what is β typically?
0.2
what factors affect power?
effect size
number of participants
alpha level
variability
design
test choice
what is effect size?
objective and standardised measure of magnitude of an effect
larger value = bigger effect
can help to work out how many participants are needed
what is the name of the effect size for the different tests?
Cohen’s d = t-test
Pearson’s r = correlation
Partial eta squared = ANOVA
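a short sketch of Cohen's d for two independent groups, assuming the pooled-standard-deviation version (other variants exist). the function name and scores are made up.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (hypothetical helper)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    # difference between means, expressed in standard deviation units
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

print(cohens_d([4, 5, 6], [2, 3, 4]))
```

because d is in standard deviation units, it can be compared across studies that used different measurement scales.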
how does a larger number of participants affect power?
more “signal”, less “noise”
more powerful study
the more of the population you have, the less room for sampling error
how should you choose the effect size?
depending on expected effect size
larger effect size = fewer participants needed to detect “real” effect
smaller effect size = more participants needed to detect “real” effect
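a rough sketch of how expected effect size drives sample size, assuming a two-group comparison, a two-tailed test and the normal approximation n = 2((z for α/2 + z for power) / d)² per group. this is an approximation; dedicated power software uses the t distribution and can give slightly larger numbers. the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (hypothetical helper)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_power = z.inv_cdf(power)           # z for the desired power (1 - beta)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Cohen's conventional small, medium and large values of d
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

halving the expected effect size roughly quadruples the number of participants needed, because d is squared in the denominator.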
what is alpha level?
probability of making a type I error
compare p-value to this criterion when testing significance
when should an alpha level be chosen?
before running study
what is the general choice for alpha level?
.05
when are results statistically significant?
if p-value < α
what are the problems with alpha level?
balance of type I vs type II error
if you run multiple tests, the rate at which you might make a type I error increases - familywise (experimentwise) error rate
can account for this by limiting the number of tests or by using corrections such as the Bonferroni correction, but this reduces power
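a small sketch of why multiple tests inflate type I error, assuming the tests are independent (so the familywise rate is 1 - (1 - α)^m for m tests). function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests

m = 5  # illustrative number of tests
print(round(familywise_error_rate(0.05, m), 3))   # inflated rate without correction
print(bonferroni_alpha(0.05, m))                  # stricter per-test criterion
print(round(familywise_error_rate(bonferroni_alpha(0.05, m), m), 3))
```

the corrected per-test α is stricter, which keeps the familywise rate at or below the original α but makes each individual test harder to pass, hence the loss of power.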
how does design affect power?
within-subjects more powerful than between-subjects studies
design depends on type of study
what is a one-tailed test?
hypothesise there will be a difference in scores
specific about which score will be higher
α = .05 at one end
what is a two-tailed test?
hypothesise there will be a difference in scores
could be in either direction
α = .025 at each end
why does p-value change between one and two tailed test?
two-tailed test assesses both directions, so α is split across the two tails
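a minimal sketch of the difference, assuming a z statistic from a symmetric standard normal distribution and a one-tailed prediction in the positive direction. the function name and the observed value are illustrative.

```python
from statistics import NormalDist

def one_and_two_tailed_p(z_obs):
    """Return (one-tailed p, two-tailed p) for a z statistic (hypothetical helper)."""
    z = NormalDist()
    one_tailed = 1 - z.cdf(z_obs)              # area in the predicted (upper) tail only
    two_tailed = 2 * (1 - z.cdf(abs(z_obs)))   # area in both tails
    return one_tailed, two_tailed

one_p, two_p = one_and_two_tailed_p(1.8)  # made-up observed statistic
print(round(one_p, 3), round(two_p, 3))
```

when the effect is in the predicted direction, the two-tailed p is double the one-tailed p, so the same result can be significant at α = .05 one-tailed but not two-tailed.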
how does type of test affect power?
one-tailed test more powerful as α is higher in the predicted tail (.05 rather than .025)
several caveats and considerations
it is most often recommended to run two-tailed tests
why do power and the factors that affect power matter?
want to calculate the power obtained in a study post-hoc
want to calculate how many participants we need to collect for a study a priori (can be done using statistical programs like G*Power)