NHST Flashcards
sampling variation
every sample drawn at random from a population will be composed of different individuals and therefore have different means
- difference between means is termed sampling variation or sampling error
two samples drawn from a single population may occasionally have quite different means
- therefore possible to obtain a statistically significant t result, even though the two samples are from the same population
- false positive result, type 1 error
two samples drawn from quite different populations may occasionally have quite similar means
- possible to obtain a non-sig result, even though there’s a real difference between the two pops
- false negative result, or type 2 error
type 1 error
false positive results
type 2 error
false negative results
error is unavoidable
NHST generates a p value, which is the probability that we would have obtained data at least as extreme as ours if H0 is true
if p = 0.02, there's only a 2% probability that we would have obtained our data if H0 is true, so we reject H0 (p < 0.05)
if p = 0.1, there's a 10% probability that we would have obtained our data if H0 is true, so we fail to reject H0 (p > 0.05)
alpha and beta
alpha (false positive): acceptable probability of a type one error
- usually alpha= 0.05
- accept we will make a type 1 error up to 5% of the time
beta (false negative): acceptable probability of a type 2 error
- usually b= 0.20
- make a type 2 error up to 20% of the time
significant result
likely to lead to follow-up studies and the investment of considerable time and money, which is wasted if the result was a type 1 error
false negative result
- interesting results get missed
- less serious
visualizing alpha
NHST: directly tests the null hypothesis, not the alternate
H0: a single testable numerical prediction (e.g. the difference between means = 0)
H1: does not make a single testable numerical prediction
- there are infinitely many values for which difference ≠ 0 is true
t statistics in the tails are unlikely to be obtained if H0 is true
- if H0 is true and the observed data fall in a tail = type 1 error
visualizing beta
generate a distribution of t statistics that would be obtained if H1 is true
have to pick a specific size of the effect that we are expecting
effect size
Cohen’s d: continuous data consisting of two groups (T TESTS)
eta squared: continuous data consisting of >2 groups (ANOVA)
partial eta squared: continuous data with >1 predictor variable (factorial ANOVA or multiple regression)
Pearson’s r: relationship between two continuous variables (correlation or regression)
R^2: continuous data with a continuous or categorical predictor (correlation, regression or ANOVA)
odds ratio (OR): categorical data (uses X^2 or logistic regression)
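As a concrete instance of the last row, an odds ratio can be computed directly from a 2x2 table of counts; a minimal Python sketch (the counts here are made up for illustration):

```python
# hypothetical 2x2 table: outcome (yes/no) by group (treated/control)
treated_yes, treated_no = 30, 70
control_yes, control_no = 15, 85

odds_treated = treated_yes / treated_no    # odds of the outcome in the treated group
odds_control = control_yes / control_no    # odds of the outcome in the control group
odds_ratio = odds_treated / odds_control   # OR > 1: outcome more likely when treated
```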
general properties of effect size
quantifies the size of the effect of the predictor variable on the outcome variable (effect of x on y)
effect size is generally not affected by sample size
- large sample sizes do increase the probability of a statistically significant result
- large sample sizes do not systematically affect the effect size
the larger the effect size associated with the predictor variable, the easier it will be to obtain a statistically significant result
- probability of a false negative error will be reduced
Cohen’s d
simpler measure of effect size, used for continuous data consisting of two groups
- appropriate for data analyzed using paired and independent t-tests
expresses the difference between group means as the number of standard deviations between the means
Cohen’s d: repeated measures
expresses the difference between condition means (D-bar) as a number of standard deviations (Sd) of the difference scores
d= D-bar/ Sd
Sd: standard deviation of the difference scores (Di - D-bar)
- average residual or error from GLM
- if d=2, the difference between the conditions is twice the average error or residual
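The repeated-measures formula can be sketched with the Python stdlib (the paired scores are hypothetical, invented for illustration):

```python
from statistics import mean, stdev

# hypothetical paired scores: the same 6 participants in two conditions
cond1 = [10, 12, 9, 11, 13, 10]
cond2 = [12, 14, 10, 13, 16, 12]

diffs = [b - a for a, b in zip(cond1, cond2)]  # difference scores (Di)

d_bar = mean(diffs)      # mean difference (D-bar)
s_d = stdev(diffs)       # standard deviation of the differences (Sd)
cohens_d = d_bar / s_d   # d = D-bar / Sd
```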
Cohen’s d: independent groups
difference between group means (y-bar1 - y-bar0) as the number of standard deviations (Sp) between the means
d= (y-bar1 - y-bar0)/ Sp
Sp: pooled standard deviation, the average difference between each score (y1 or y0) and its group mean
- average residual or error from GLM
- if d=2, the difference between the group means is twice as large as the average error/residual
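The same idea for independent groups, again as a stdlib Python sketch with invented data:

```python
from statistics import mean

# hypothetical scores from two independent groups
group0 = [10, 12, 9, 11, 13]   # e.g. control
group1 = [14, 15, 12, 16, 13]  # e.g. treatment

def sum_sq(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n0, n1 = len(group0), len(group1)
# pooled standard deviation Sp
sp = ((sum_sq(group0) + sum_sq(group1)) / (n0 + n1 - 2)) ** 0.5
cohens_d = (mean(group1) - mean(group0)) / sp   # d = (y-bar1 - y-bar0) / Sp
```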
impacts on Cohen’s d
the greater the difference between the means, the greater Cohen's d
as Sp gets smaller, Cohen's d gets larger
interpreting Cohen’s d
small effect: 0.2 < d < 0.5
medium effect: 0.5 < d < 0.8
large effect: d > 0.8
Cohen’s d vs t
d: divides the mean difference by the standard deviation of the scores (unaffected by n)
t: divides the mean difference by the standard error (the standard deviation of the sampling distribution), which shrinks as n grows, so t is affected by n
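This difference can be made concrete: for two independent groups of n scores each, t = d * sqrt(n/2), so the same effect size yields a growing t as n grows. A small Python sketch:

```python
def t_from_d(d, n):
    # two independent groups of n each: SE = Sp * sqrt(2/n),
    # so t = (mean difference / Sp) / sqrt(2/n) = d * sqrt(n/2)
    return d * (n / 2) ** 0.5

# the effect size stays d = 0.8, but t quadruples when n is 16x larger
t_small = t_from_d(0.8, 10)   # n = 10 per group
t_large = t_from_d(0.8, 160)  # n = 160 per group
```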
visualizing beta
calculate the t distribution based on H0 (shows alpha)
- gives range of t stats that we would expect if H0 was true
randomly sample n scores per group from two normally-distributed populations
then calculate the t statistic
repeat a million times to generate the distribution of t statistics that would be expected if H1 were true with d= +0.8
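That simulation can be sketched with the Python stdlib; here 10,000 repeats instead of a million, n = 15 per group, and an assumed two-tailed critical value tcrit ≈ 2.05 for df = 28 (all of these numbers are illustrative choices, not from the notes):

```python
import random
from statistics import mean

random.seed(1)

def sum_sq(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def sim_t(n, d):
    # one simulated independent-groups t statistic when H1 is true with effect size d
    g0 = [random.gauss(0, 1) for _ in range(n)]
    g1 = [random.gauss(d, 1) for _ in range(n)]
    sp = ((sum_sq(g0) + sum_sq(g1)) / (2 * n - 2)) ** 0.5
    return (mean(g1) - mean(g0)) / (sp * (2 / n) ** 0.5)

t_crit = 2.05  # assumed critical value for alpha = .05, df = 28
ts = [sim_t(15, 0.8) for _ in range(10_000)]
# estimated power: the fraction of H1 t statistics beyond tcrit;
# the remainder of the distribution is the estimate of beta
power_est = sum(t > t_crit for t in ts) / len(ts)
```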
power
beta: probability of obtaining a negative result when H1 is true
- statistical analyses focus on power rather than beta
- power = 1 - beta: probability of a true positive result
aim for beta < 0.2, i.e. power > 80%
3 ways to increase power to 0.8
- increase alpha: makes it easier to obtain a sig result; reduces the probability of a false negative, but increases the probability of a false positive
- increase the sample size: reduces the standard error, and therefore the standard deviation of our probability distributions
- change H1 by increasing the expected effect size
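The three levers can be compared with a rough normal approximation to the two-sample t test, power ≈ Phi(d * sqrt(n/2) - z_crit); a stdlib Python sketch (the exact calculation uses the noncentral t distribution, which is what R's pwr package does, so these numbers are approximate):

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    # normal approximation for a two-sided test on two groups of n each
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * (n / 2) ** 0.5 - z_crit)

base         = approx_power(d=0.5, n=30)              # starting point
bigger_alpha = approx_power(d=0.5, n=30, alpha=0.10)  # lever 1: raise alpha
bigger_n     = approx_power(d=0.5, n=60)              # lever 2: more participants
bigger_d     = approx_power(d=0.8, n=30)              # lever 3: expect a larger effect
```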
changing n impact
increasing n reduces the standard error
- results in a larger value of t (if the mean of the distribution isn't 0)
- the mean of the beta distribution is shifted away from the alpha distribution
changing n also changes df and alters tcrit
- but it's mainly the shift of the beta distribution away from alpha that contributes to the increased power
2 explanations for a non-significant result
- no effect of the manipulation
- there is an effect of the manipulation, but no effect was detected due to a weak effect size, low power or bad luck
power calculations in R
pwr.t.test(n=12, d=0.8, sig.level=0.05, power=NULL, type="two.sample")   # requires library(pwr)
n: number of observations (per group)
d: effect size
sig.level: significance level (type 1 error probability)
type: type of t test ("two.sample", "one.sample" or "paired")
power: power of the test
whichever argument is set to NULL is the one the function solves for
thresholds of statistical significance (alpha)
without a threshold, there is no type 1 or 2 error