NHST Flashcards
sampling variation
every sample drawn at random from a population will be composed of different individuals and therefore have different means
- difference between means is termed sampling variation or sampling error
two samples drawn from a single population may occasionally have quite different means
- therefore possible to obtain a statistically significant t result, even though the two samples are from the same population
- false positive result, type 1 error
two samples drawn from quite different populations may occasionally have quite similar means
- possible to obtain a non-sig result, even though there’s a real difference between the two pops
- false negative result, or type 2 error
type 1 error
false positive results
type 2 error
false negative results
error is unavoidable
NHST generates a p value, which is the probability that we would have obtained data at least as extreme as ours if H0 is true
if p = 0.02, there's only a 2% probability that we would have obtained our data if H0 is true, so we reject H0 (p < 0.05)
if p = 0.1, there's a 10% probability that we would have obtained our data if H0 is true, so we fail to reject H0 (p > 0.05)
alpha and beta
alpha (false positive): acceptable probability of a type one error
- usually alpha= 0.05
- accept we will make a type 1 error up to 5% of the time
beta (false negative): acceptable probability of a type 2 error
- usually b= 0.20
- make a type 2 error up to 20% of the time
significant result
likely to lead to follow-up studies and the investment of considerable time and money, which is wasted if the result was a type 1 error
false negative result
- interesting results get missed
- less serious
visualizing alpha
NHST: directly tests the null hypothesis, not the alternate
H0: a single testable numerical prediction (e.g. the difference between means = 0)
H1: does not make a single testable numerical prediction
- there are infinitely many values for which difference ≠ 0 is true
t statistics in the tails are unlikely to be obtained if H0 is true
- if H0 is true and the observed data fall in a tail = type 1 error
visualizing beta
generate a distribution of t statistics that would be obtained if H1 is true
have to pick a specific size of the effect that we are expecting
effect size
Cohen’s d: continuous data consisting of two groups (T TESTS)
eta squared: continuous data consisting of >2 groups (ANOVA)
partial eta squared: continuous data with >1 predictor variable (factorial ANOVA or multiple regression)
Pearson’s r: relationship between two continuous variables (correlation or regression)
R^2: continuous data with a continuous or categorical predictor (correlation, regression or ANOVA)
odds ratio (OR): categorical data (uses X^2 or logistic regression)
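As a concrete instance of the last row, an odds ratio can be computed directly from a 2x2 table of counts; a minimal Python sketch (the counts here are made up for illustration):

```python
# hypothetical 2x2 table: outcome (yes/no) by group (treated/control)
treated_yes, treated_no = 30, 70
control_yes, control_no = 15, 85

odds_treated = treated_yes / treated_no    # odds of the outcome in the treated group
odds_control = control_yes / control_no    # odds of the outcome in the control group
odds_ratio = odds_treated / odds_control   # OR > 1: outcome more likely when treated
```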
general properties of effect size
quantifies the size of the effect of the predictor variable on the outcome variable (effect of x on y)
effect size is generally not affected by sample size
- large sample sizes do increase the probability of a statistically significant result
- large sample sizes do not systematically affect the effect size
the larger the effect size associated with the predictor variable, the easier it will be to obtain a statistically significant result
- probability of a false negative error will be reduced
Cohen’s d
simpler measure of effect size, used for continuous data consisting of two groups
- appropriate for data analyzed using paired and independent t-tests
expresses the difference between group means as the number of standard deviations between the means
Cohen’s d: repeated measures
expresses the difference between condition means (D-bar) as a number of standard deviations (Sd) of the difference scores
d= D-bar/ Sd
Sd: standard deviation of the difference scores (Di - D-bar)
- average residual or error from GLM
- if d=2, the difference between the conditions is twice the average error or residual
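The repeated-measures formula can be sketched with the Python stdlib (the paired scores are hypothetical, invented for illustration):

```python
from statistics import mean, stdev

# hypothetical paired scores: the same 6 participants in two conditions
cond1 = [10, 12, 9, 11, 13, 10]
cond2 = [12, 14, 10, 13, 16, 12]

diffs = [b - a for a, b in zip(cond1, cond2)]  # difference scores (Di)

d_bar = mean(diffs)      # mean difference (D-bar)
s_d = stdev(diffs)       # standard deviation of the differences (Sd)
cohens_d = d_bar / s_d   # d = D-bar / Sd
```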
Cohen’s d: independent groups
difference between group means (y-bar1 - y-bar0) as the number of standard deviations (Sp) between the means
d= (y-bar1 - y-bar0)/ Sp
Sp: pooled standard deviation, the average difference between each score (y1 or y0) and its group mean
- average residual or error from GLM
- if d=2, the difference between the group means is twice as large as the average error/residual
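The same idea for independent groups, again as a stdlib Python sketch with invented data:

```python
from statistics import mean

# hypothetical scores from two independent groups
group0 = [10, 12, 9, 11, 13]   # e.g. control
group1 = [14, 15, 12, 16, 13]  # e.g. treatment

def sum_sq(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n0, n1 = len(group0), len(group1)
# pooled standard deviation Sp
sp = ((sum_sq(group0) + sum_sq(group1)) / (n0 + n1 - 2)) ** 0.5
cohens_d = (mean(group1) - mean(group0)) / sp   # d = (y-bar1 - y-bar0) / Sp
```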
impacts on Cohen’s d
the greater the difference between the means, the greater Cohen's d
as Sp gets smaller, Cohen's d gets larger
interpreting Cohen’s d
small effect: 0.2 < d < 0.5
medium effect: 0.5 < d < 0.8
large effect: d > 0.8
Cohen’s d vs t
d: divides the mean difference by the standard deviation of the scores (unaffected by n)
t: divides the mean difference by the standard error (the standard deviation of the sampling distribution), which shrinks as n grows, so t is affected by n
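This difference can be made concrete: for two independent groups of n scores each, t = d * sqrt(n/2), so the same effect size yields a growing t as n grows. A small Python sketch:

```python
def t_from_d(d, n):
    # two independent groups of n each: SE = Sp * sqrt(2/n),
    # so t = (mean difference / Sp) / sqrt(2/n) = d * sqrt(n/2)
    return d * (n / 2) ** 0.5

# the effect size stays d = 0.8, but t quadruples when n is 16x larger
t_small = t_from_d(0.8, 10)   # n = 10 per group
t_large = t_from_d(0.8, 160)  # n = 160 per group
```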
visualizing beta
calculate the t distribution based on H0 (shows alpha)
- gives range of t stats that we would expect if H0 was true
randomly sample n scores per group from two normally-distributed populations
then calculate the t statistic
repeat a million times to generate the distribution of t statistics that would be expected if H1 were true with d= +0.8
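That simulation can be sketched with the Python stdlib; here 10,000 repeats instead of a million, n = 15 per group, and an assumed two-tailed critical value tcrit ≈ 2.05 for df = 28 (all of these numbers are illustrative choices, not from the notes):

```python
import random
from statistics import mean

random.seed(1)

def sum_sq(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def sim_t(n, d):
    # one simulated independent-groups t statistic when H1 is true with effect size d
    g0 = [random.gauss(0, 1) for _ in range(n)]
    g1 = [random.gauss(d, 1) for _ in range(n)]
    sp = ((sum_sq(g0) + sum_sq(g1)) / (2 * n - 2)) ** 0.5
    return (mean(g1) - mean(g0)) / (sp * (2 / n) ** 0.5)

t_crit = 2.05  # assumed critical value for alpha = .05, df = 28
ts = [sim_t(15, 0.8) for _ in range(10_000)]
# estimated power: the fraction of H1 t statistics beyond tcrit;
# the remainder of the distribution is the estimate of beta
power_est = sum(t > t_crit for t in ts) / len(ts)
```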
power
beta: probability of obtaining a negative result when H1 is true
- statistical analyses focus on power rather than beta
- power = 1 - beta: probability of a true positive result
aim for beta < 0.2, i.e. power > 80%
3 ways to increase power to 0.8
- increase alpha: makes it easier to obtain a sig result; reduces the probability of a false negative, but increases the probability of a false positive
- increase the sample size: reduces the standard error, and therefore the standard deviation of our probability distributions
- change H1 by increasing the expected effect size
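The three levers can be compared with a rough normal approximation to the two-sample t test, power ≈ Phi(d * sqrt(n/2) - z_crit); a stdlib Python sketch (the exact calculation uses the noncentral t distribution, which is what R's pwr package does, so these numbers are approximate):

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    # normal approximation for a two-sided test on two groups of n each
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * (n / 2) ** 0.5 - z_crit)

base         = approx_power(d=0.5, n=30)              # starting point
bigger_alpha = approx_power(d=0.5, n=30, alpha=0.10)  # lever 1: raise alpha
bigger_n     = approx_power(d=0.5, n=60)              # lever 2: more participants
bigger_d     = approx_power(d=0.8, n=30)              # lever 3: expect a larger effect
```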
changing n impact
increasing n reduces the standard error
- results in a larger value of t (if the mean of the distribution isn't 0)
- the mean of the beta distribution is shifted away from the alpha distribution
changing n also changes df and alters tcrit
- but it's mainly the shift of the beta distribution away from alpha that contributes to the increased power
2 explanations for a non-significant result
- no effect of the manipulation
- there is an effect of the manipulation, but no effect was detected due to a weak effect size, low power or bad luck
power calculations in R
pwr.t.test(n=12, d=0.8, sig.level=0.05, power=NULL, type="two.sample")   # requires library(pwr)
n: number of observations (per group)
d: effect size
sig.level: significance level (type 1 error probability)
type: type of t test ("two.sample", "one.sample" or "paired")
power: power of the test
whichever argument is set to NULL is the one the function solves for
thresholds of statistical significance (alpha)
without a threshold, there is no type 1 or 2 error