t tests Flashcards
two ways to generate two sets of scores
- repeated measures design
- independent groups design
repeated measures design
- every subject is exposed to each treatment condition and scores measured
- comparisons are between scores for the same individuals under different conditions
- analyze using the paired t test
independent groups design
- each subject is exposed to a single treatment condition and scores measured
- comparisons are between scores from different individuals under different conditions
- analyze using the independent t test
what are some sources of variation between scores?
effects of treatment
individual differences
- differences in baseline score
- differences in responsiveness to treatment
effects associated with uncontrolled variables
measurement error
repeated measures eliminates variations due to individual differences!
repeated measures: calculating difference (D) scores
Di = yBi - yAi
difference = (condition B) - (condition A)
this converts two sets of scores (conditions A and B) into a single set of scores (D)
- we can then look at variation among the D scores instead of variation among the raw scores
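A minimal sketch of the D-score calculation in R. The data are made up for illustration (16 simulated subjects), and the names y_a, y_b and d are hypothetical, not from the course data.

```r
# Made-up scores for n = 16 subjects, each measured under conditions A and B
set.seed(1)
y_a <- rnorm(16, mean = 50, sd = 10)        # condition A (simulated)
y_b <- y_a + rnorm(16, mean = 3, sd = 4)    # condition B (simulated)

# One D score per subject: Di = yBi - yAi
d <- y_b - y_a

d_bar <- mean(d)   # mean difference (D-bar)
s_d   <- sd(d)     # standard deviation of the D scores (Sd)

# Spread of the raw scores next to the spread of the D scores
c(sd_a = sd(y_a), sd_b = sd(y_b), sd_d = s_d)
```

Because each subject's baseline appears in both y_a and y_b, it cancels out of d, which is why the D scores can vary less than the raw scores.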
is there an effect of treatment?
Is there a difference between scores measured under the two conditions?
if there is no effect of treatment, the two sets of scores are identical
Di = yBi - yAi = 0
we wouldn’t expect every D score to be exactly zero, but the mean of all D scores should be approximately 0
null hypothesis: H0: muD=0
how to test null hypothesis
determine the probability that the observed sample mean would have been obtained from a population where muD=0
- p value is the probability of obtaining data at least as extreme as the observed data, if H0 is true
2 approaches to test H0
- approach 1 (95% CI): use sample data to obtain a sampling distribution (similar to the DSM) with a mean of D-bar
- determine the location of muD=0 (H0) within this distribution
- approach 2 (hypothesis testing): position the same sampling distribution with a mean of muD=0 (H0)
- determine the location of the observed D-bar within this distribution
GLM for D scores
GLM is:
Di= D-bar + error
based on D scores, not raw scores
bootstrapping 95% CI for muD
- use observed D scores to generate an infinite hypothetical population of D scores
- randomly sample n=16 D scores (match original sample size) from the population and calculate D-bar
- repeat many times, generating a new random sample and calculating D-bar each time
- generate a distribution of D-bar values
- instead of a distribution of sample means (DSM), this is the distribution of mean differences (DMD)
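The bootstrap steps above can be sketched by hand in R. This is an illustration on simulated D scores, not the course data. The percentile interval used here is one common choice and is an assumption, not necessarily what any particular package does internally.

```r
set.seed(2)
d <- rnorm(16, mean = 3, sd = 4)   # made-up D scores, n = 16

n_boot <- 10000                    # number of bootstrap samples (arbitrary)

# Sampling WITH replacement from the observed D scores stands in for sampling
# from an infinite population with the same make-up as the sample
dmd <- replicate(n_boot, mean(sample(d, size = length(d), replace = TRUE)))

se_boot <- sd(dmd)                                  # SD of the DMD = standard error
ci_boot <- quantile(dmd, probs = c(0.025, 0.975))   # central 95% of the DMD

hist(dmd, main = "Distribution of mean differences (DMD)", xlab = "D-bar")
```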
bootstrapping 95% CI in R
boot.t.test()
- calculates standard deviation of DMD (standard error)
- gives a p value: the probability of obtaining data at least as extreme as the observed data, if the null hypothesis is true
how to test H0
DMD estimates the distribution of all sample means that would be obtained from a population that matched our sample
we can locate 0 in the distribution and focus on the difference between D-bar and 0
subtract the value of D-bar from all values in the distribution
- the difference between D-bar and 0 is unchanged, but now 0 is the mean of the DMD
if D-bar - muD is small: observed data are likely if H0 is true
if D-bar - muD is large: observed data are unlikely if H0 is true
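A rough sketch of the shifting idea in R, using simulated D scores. The two-sided p value here is the share of shifted bootstrap means at least as far from 0 as the observed D-bar. This simple shift method is an assumption for illustration and may not match how boot.t.test computes its p value.

```r
set.seed(3)
d     <- rnorm(16, mean = 3, sd = 4)   # made-up D scores
d_bar <- mean(d)

# Bootstrap DMD, centred (approximately) on the observed D-bar
dmd <- replicate(10000, mean(sample(d, size = length(d), replace = TRUE)))

# Subtract D-bar so the distribution is centred on 0, as H0 (muD = 0) assumes
dmd_h0 <- dmd - d_bar

# Proportion of values under H0 that are at least as extreme as the observed D-bar
p_boot <- mean(abs(dmd_h0) >= abs(d_bar))
p_boot
```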
standardizing D-bar - muD
convert scores to z scores
z_yi = (yi - y-bar) / Sy
subtract the mean of the DMD (0) then divide by standard deviation of the DMD
standard deviation of DMD represents the average value of the D-bar - muD
- standard deviation seeks to calculate the average deviation score
the standardized value tells us how large the observed difference between D-bar and muD=0 is, relative to the average difference we would expect if H0 is true
- e.g. a standardized value of 2 means the observed difference between D-bar and muD is twice as large as the average difference expected due to sampling variation, if H0 is true
central limit theorem and t statistics
t = (D-bar - muD) / Sd-bar = D-bar / (Sd / sqrt(n)), since muD=0 under H0
if t=2, the observed difference between D-bar and H0 is twice as large as the average difference expected due to sampling variation
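As a quick check of the formula, the t statistic can be computed by hand in R and compared with the value from t.test. The D scores are simulated for illustration.

```r
set.seed(4)
d <- rnorm(16, mean = 3, sd = 4)   # made-up D scores

n     <- length(d)
d_bar <- mean(d)          # D-bar
s_d   <- sd(d)            # Sd
se    <- s_d / sqrt(n)    # Sd-bar, the estimated standard error

t_stat <- (d_bar - 0) / se   # muD = 0 under H0

# A one-sample t test on the D scores should give the same statistic
t_check <- unname(t.test(d, mu = 0)$statistic)
c(by_hand = t_stat, t_test = t_check)
```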
t distribution
start with a normally distributed population of D scores with muD=0
- pop has a normal distribution, an assumption of the CLT approach
- pop can have any standard deviation, as the t statistic standardizes the values of D-bar - muD based on the observed value of Sd
define sample size (n)
randomly sample n D scores from the population and calculate t based on the sample (t = (D-bar - muD) / Sd-bar)
repeat one million times and plot distribution
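The simulation described above can be sketched in R. To keep the run time short this sketch uses 100,000 repeats rather than one million. The population standard deviation (7) is arbitrary, since the t statistic standardizes it away.

```r
set.seed(5)
n       <- 16       # sample size
n_reps  <- 100000   # fewer than one million, for speed

sim_t <- replicate(n_reps, {
  d <- rnorm(n, mean = 0, sd = 7)   # sample from a normal population with muD = 0
  mean(d) / (sd(d) / sqrt(n))       # t = (D-bar - 0) / Sd-bar
})

hist(sim_t, breaks = 100, main = "Simulated t distribution, n = 16", xlab = "t")

# Simulated 2.5% and 97.5% cut-offs next to the theoretical ones for df = n - 1
sim_q    <- unname(quantile(sim_t, probs = c(0.025, 0.975)))
theory_q <- qt(c(0.025, 0.975), df = n - 1)
rbind(simulated = sim_q, theoretical = theory_q)
```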
what does the t distribution represent?
all values of t that would be expected for a given sample size if H0: muD=0 is true
what does the shape of the t distribution depend on?
sample size!
when calculating t, Sd is being used to estimate the corresponding population parameter
- estimate is less accurate for samples with smaller n
- t statistics will therefore be less precise for smaller n, resulting in some values of t that are unusually large or small
- result: t distributions for smaller n have wider (heavier) tails
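One way to see the effect of sample size is to look at the critical t value that cuts off the upper 2.5% of the distribution for several values of n. The sample sizes below are arbitrary examples.

```r
n_values <- c(4, 16, 64, 1000)             # example sample sizes
t_crit   <- qt(0.975, df = n_values - 1)   # upper 2.5% cut-off for each n

data.frame(n = n_values, df = n_values - 1, t_crit = t_crit)

# For comparison: the same cut-off for the standard normal distribution
z_crit <- qnorm(0.975)
z_crit
```

Smaller n gives a larger cut-off, which reflects the wider tails. As n grows, the cut-off approaches the normal value.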
applying the CLT: 95% CI from the t statistic
for the 95% CI we shift and rescale the same t distribution so it has mean = D-bar and standard deviation = Sd-bar
- 95% CI defined by the boundaries of the central 95% of this distribution
95% CI = D-bar +/- (tcrit x Sd-bar)
using R to find 95% CI
- find tcrit
qt(p = 0.025, df = 15)  # lower
qt(p = 0.975, df = 15)  # upper
- insert values into equation
95% CI = D-bar +/- (tcrit x Sd-bar)
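Putting the two steps together, a minimal sketch of the 95% CI calculation in R. The D scores are simulated (n = 16, so df = 15), and the result is compared with the interval reported by t.test.

```r
set.seed(6)
d <- rnorm(16, mean = 3, sd = 4)   # made-up D scores

n      <- length(d)
d_bar  <- mean(d)                  # D-bar
se     <- sd(d) / sqrt(n)          # Sd-bar
t_crit <- qt(p = 0.975, df = n - 1)   # upper critical value; the lower one is -t_crit

# 95% CI = D-bar +/- (tcrit x Sd-bar)
ci <- d_bar + c(-1, 1) * t_crit * se

# The interval from t.test should agree
ci_check <- as.numeric(t.test(d)$conf.int)
rbind(by_hand = ci, t_test = ci_check)
```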
paired test in R
t.test(study1$b, study1$a, paired = TRUE)
t.test(study2$b, study2$a, paired = TRUE)
- running a paired t test for differences between a and b
results:
t statistic, df, p value, 95% CI, mean difference (D-bar)
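A small sketch showing where each result lives in the object returned by t.test. The data frame study_demo is hypothetical and simulated here, standing in for a data set with columns a and b.

```r
set.seed(7)
# Hypothetical stand-in for a study data frame with paired columns a and b
study_demo   <- data.frame(a = rnorm(16, mean = 50, sd = 10))
study_demo$b <- study_demo$a + rnorm(16, mean = 3, sd = 4)

res <- t.test(study_demo$b, study_demo$a, paired = TRUE)

res$statistic   # t statistic
res$parameter   # df (n - 1)
res$p.value     # p value
res$conf.int    # 95% CI for muD
res$estimate    # mean difference (D-bar)

# A paired t test is the same as a one-sample t test on the D scores
res_d <- t.test(study_demo$b - study_demo$a, mu = 0)
```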
bootstrapping t test in R
boot.t.test(study1$b, study1$a, paired = TRUE, R = 100000)
boot.t.test(study2$b, study2$a, paired = TRUE, R = 100000)
- from MKinfer package
results:
p value, standard deviation, 95% CI
independent groups design
compare two sets of scores generated through independent groups design
ex. x= grouping variable (CTRL, DRUG), y= all measured scores
fitting GLM with independent groups designs
recode xi with 0 or 1
- CTRL= 0
- DRUG= 1
use the model that predicts the value of y:
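y-hat = b0 + b1*xi
GLM is a linear model
- b0 is the intercept (predicted y when xi = 0, i.e. the CTRL mean), b1 is the slope (change in predicted y from xi = 0 to xi = 1, i.e. DRUG mean - CTRL mean)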
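A minimal sketch of fitting this model in R with lm. The scores are simulated, and the group size of 16 per group is an arbitrary choice for illustration.

```r
set.seed(8)
group <- rep(c("CTRL", "DRUG"), each = 16)          # grouping variable
y     <- c(rnorm(16, mean = 50, sd = 10),           # made-up CTRL scores
           rnorm(16, mean = 58, sd = 10))           # made-up DRUG scores

# Recode the grouping variable: CTRL = 0, DRUG = 1
x <- ifelse(group == "DRUG", 1, 0)

fit <- lm(y ~ x)   # y-hat = b0 + b1*x
b   <- coef(fit)

mean_ctrl <- mean(y[x == 0])
mean_drug <- mean(y[x == 1])

# b0 equals the CTRL mean, b1 equals the difference between group means
c(b0 = unname(b[1]), mean_ctrl = mean_ctrl)
c(b1 = unname(b[2]), diff_means = mean_drug - mean_ctrl)
```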
hypothesis testing
CTRL is referred to as group 0 and DRUG as group 1
H0: no difference between groups
- no difference between the means of the populations
H0: mu1 - mu0 = 0
H1: mu1 - mu0 != 0
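A sketch of testing this H0 in R on simulated data. Note the assumption: var.equal = TRUE requests the classic equal-variance independent t test, which matches the test of the slope b1 in the linear model. R's default for t.test is the Welch version, which can give slightly different results.

```r
set.seed(9)
x <- rep(c(0, 1), each = 16)                 # 0 = CTRL, 1 = DRUG
y <- c(rnorm(16, mean = 50, sd = 10),        # made-up CTRL scores
       rnorm(16, mean = 58, sd = 10))        # made-up DRUG scores

# Independent t test of H0: mu1 - mu0 = 0 (equal variances assumed)
res <- t.test(y[x == 1], y[x == 0], var.equal = TRUE)
res$statistic
res$p.value

# Same test via the GLM: t and p value for the slope b1
slope <- summary(lm(y ~ x))$coefficients["x", ]
slope["t value"]
slope["Pr(>|t|)"]
```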