resampling statistics Flashcards

introduction, motivation, common uses of resampling, resampling for hypothesis testing, permutation tests, bootstrap resamples, other resample approaches, issues and concerns with resampling (26 cards)

1
Q

what do resampling techniques represent?

A

a relatively new family of methods that make fewer assumptions than parametric tests but retain statistical power

2
Q

what is the motivation for using resampling statistics?

A

fewer assumptions - so more accurate when the assumptions of parametric tests are not met

3
Q

what very generally is resampling statistics?

A

a few basic ideas that can be modified and reused

no equations or tables to look up - the maths is actually easier

thinking about the test forces us to think about our data

4
Q

why are resampling approaches not popular?

A

new (1979 is recent for stats) and assumed (incorrectly) to be more complex

parametric statistics do a reasonably good job and are discussed in simple (ish) language in textbooks

resampling requires a computer and some programming (not available in SPSS)

a lot of people don’t like thinking about their data

5
Q

what are common uses of resampling?

A

permutation tests

bootstrap resampling

6
Q

what are permutation tests?

A

for comparing groups/conditions (e.g. t-test replacement)

shuffle the data across your conditions (randomly reassign values to groups)

7
Q

what is bootstrap resampling?

A

for generating confidence intervals (e.g. make error bars)

resample-with-replacement the values in a sample

8
Q

what is the point of inferential statistics?

A

to determine how probable differences at least as large as those we measured would be if they were caused by sampling error alone

9
Q

what is the principle of resampling techniques?

A

to measure sampling error by repeating the sampling process a large number of times

can determine the likely error introduced by the sampling by looking at the variability across the resamples

10
Q

what are the different types of permutation tests?

A

between-subject randomisation tests

within-subjects randomisation tests

11
Q

how do between-subject randomisation tests work?

A

want to determine the likelihood of getting differences this extreme if the data all came from a single “population”

12
Q

what is the process of between-subject randomisation tests?

A

simulate running the experiment many times with the data coming from one population - check what range of values commonly occur

in practice - keep the measured values but shuffle them (randomly assigning them to two groups), and count how often the difference between the new means is bigger than the difference between the measured means

assume that these are real and sensible values but don’t assume anything about their distribution

repeat process a large number of times
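The shuffling procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `permutation_test`, the seed, and the data are all made up for the example, and ties with the measured difference are counted as "at least as extreme" (a common convention, not something the card specifies).

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided between-subjects randomisation test on the difference in means."""
    rng = random.Random(seed)                 # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)    # null: all values come from one population
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                   # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:                  # shuffled difference at least as extreme
            hits += 1
    return hits / n_resamples                 # proportion of shuffles = estimated p value

# Hypothetical measurements, invented for illustration only
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [5.6, 6.1, 5.2, 6.4, 5.9, 5.5]
p_value = permutation_test(control, treated)
print(p_value)
```

The list of shuffled differences is the null distribution; the p value is simply the fraction of it that is at least as extreme as the measured difference.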

13
Q

what is the null distribution in between-subject randomisation tests?

A

distribution of expected experiment results if the null were true

14
Q

what is the summary of between-subjects randomisation tests?

A

repeat simulated experiment a large number of times, forcing the null hypothesis to be true, and check how extreme the real value was

no equation needed except for the statistic of interest (e.g. mean)

no table needed, the data themselves give the p value

15
Q

what is generalisation in between-subjects randomisation tests?

A

if our hypothesis is that the groups differ in variability (standard deviation) rather than mean - run the same shuffling process with the standard deviation as the statistic, again repeating a large number of times

don’t need a whole new test if we change our opinion of what is interesting in the data

don’t need a parametric and non-parametric version of the test

very similar approach for within-subjects design
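One way to see this generalisation is to pass the statistic of interest in as a function, so the same shuffling loop serves for means, standard deviations, or anything else. A hedged sketch follows; `permutation_test_stat` and the data are hypothetical, chosen so the two groups share a mean but differ in spread.

```python
import random
from statistics import mean, stdev

def permutation_test_stat(group_a, group_b, statistic, n_resamples=5_000, seed=0):
    """Randomisation test on the between-group difference in any statistic."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # same shuffle as before; only the statistic changes
        if abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical data: same mean (5.0) in both groups, very different spread
narrow = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
wide = [3.1, 6.9, 4.0, 6.2, 2.8, 7.0]
p_spread = permutation_test_stat(narrow, wide, stdev)  # difference in standard deviation
p_mean = permutation_test_stat(narrow, wide, mean)     # difference in mean
print(p_spread, p_mean)
```

Only the `statistic` argument changes between the two calls, which is the point of the card: no new test is needed when the question about the data changes.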

16
Q

what is the within-subjects randomisation test?

A

now the populations that we randomise are within subjects

in each resample, values are shuffled for each subject rather than across the dataset

with two conditions per subject, this amounts to randomising the sign of the difference for each pair

repeat process a large number of times
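For paired data, the sign-randomisation described above might look like the following sketch. The function name `sign_flip_test` and the before/after scores are assumptions made for illustration; the test is two-sided and counts ties as extreme.

```python
import random
from statistics import mean

def sign_flip_test(before, after, n_resamples=10_000, seed=0):
    """Within-subjects randomisation test: randomly flip the sign of each paired difference."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # under the null, each subject's two values are interchangeable,
        # so each difference is equally likely to be positive or negative
        flipped = [d * rng.choice((-1, 1)) for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical scores for 8 subjects, invented for illustration only
before = [12.0, 15.5, 11.2, 14.8, 13.1, 12.7, 16.0, 13.9]
after = [13.1, 16.0, 12.5, 15.9, 13.0, 13.8, 17.2, 14.6]
p_paired = sign_flip_test(before, after)
print(p_paired)
```

Every resample keeps the same number of subjects as the original data, so the spread of the resampled mean differences reflects the sample size without n appearing in any formula.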

17
Q

how does number of participants affect the within-subjects randomisation test?

A

t-tests use n (number of subjects) in their equation

how is that accounted for here?

the sample size for the resamples has to be the same as the original data

the variance in the mean differences will automatically reflect the number of subjects

18
Q

what are bootstrap resamples?

A

used to calculate confidence intervals (confidence interval of a mean, standard error of the mean)

also determine whether some test value is inside or outside the 95% confidence interval (like a one-sample test)

used for confidence of simple values (like mean) or for fitted parameters (like gradient of a line)

resample with replacement

19
Q

how do you use bootstrap resamples to calculate SEM?

A

SEM = standard deviation of the means of all possible samples

can be estimated from the standard deviation of the bootstrap means

difference in SEM based on formula and SEM based on bootstrap resamples is due to fact that they are both estimates, calculated in two different ways
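A short sketch comparing the two estimates, assuming a made-up sample and a hypothetical helper `bootstrap_means`. The formula-based value uses the usual sample standard deviation divided by the square root of n.

```python
import random
from statistics import mean, stdev

def bootstrap_means(sample, n_resamples=5_000, seed=0):
    """Means of resamples drawn with replacement, each the same size as the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

# Hypothetical sample, invented for illustration only
sample = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.6, 6.1, 5.2, 6.4, 5.9, 5.5]

boot = bootstrap_means(sample)
sem_bootstrap = stdev(boot)                      # SD of the bootstrap means
sem_formula = stdev(sample) / len(sample) ** 0.5  # textbook SEM formula
print(sem_bootstrap, sem_formula)
```

The two numbers should be close but not identical, since each is an estimate of the same quantity obtained by a different route.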

20
Q

how can bootstrap resamples be used to calculate a confidence interval?

A

the 95% confidence interval from the bootstraps is the range that the central 95% of the bootstrap means fall in (from the 2.5th to the 97.5th percentile)
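A minimal sketch of this percentile approach, assuming a made-up sample; `bootstrap_ci` is a hypothetical helper, and the simple index-based percentile used here is one of several reasonable conventions.

```python
import random
from statistics import mean

def bootstrap_ci(sample, n_resamples=5_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2                            # 2.5% in each tail for a 95% interval
    lower = means[round(tail * n_resamples)]
    upper = means[round((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical sample, invented for illustration only
sample = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.6, 6.1, 5.2, 6.4, 5.9, 5.5]
lower, upper = bootstrap_ci(sample)
print(lower, upper)

# One-sample style check: is a test value inside the interval?
test_value = 4.0
inside = lower <= test_value <= upper
print(inside)
```

Checking whether a test value falls inside the interval is the one-sample use of the bootstrap mentioned on the earlier card.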

21
Q

what is bootstrapping with a model fit?

A

comparing the mean to a specific value is effectively having a very simple model of the world

bootstrapping generalises more easily to more complex models than just the mean
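As an illustration of a model more complex than a mean, the sketch below bootstraps the gradient of a least-squares line by resampling (x, y) pairs. The helper names `slope` and `bootstrap_slope_ci` and the data are assumptions for the example, and resampling pairs is only one of several ways to bootstrap a fit.

```python
import random
from statistics import mean

def slope(xs, ys):
    """Least-squares gradient of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_resamples=2_000, seed=0):
    """95% percentile interval for the gradient, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < n_resamples:
        rx, ry = zip(*rng.choices(pairs, k=len(pairs)))
        if len(set(rx)) < 2:      # gradient undefined if every x is the same; redraw
            continue
        slopes.append(slope(rx, ry))
    slopes.sort()
    return slopes[round(0.025 * n_resamples)], slopes[round(0.975 * n_resamples) - 1]

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.3, 3.9, 6.4, 7.8, 10.5, 11.7, 14.2, 16.3, 17.6, 20.4]
fitted = slope(xs, ys)
lo, hi = bootstrap_slope_ci(xs, ys)
print(fitted, lo, hi)
```

If the interval excludes zero, that plays the role of a test of whether the gradient differs from zero, in the same way that the mean was compared to a specific value.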

22
Q

what are the advantages of the bootstrap?

A

very general method - any type of model can be used and confidence intervals of any of its parameters can be estimated

can also be used to perform hypothesis testing (for one-sample tests)

not based on assumptions about the shape of the data's distribution (the sample must still be representative of the population)

no tables, no equations (except for the model)

23
Q

what are other resample approaches?

A

Jack-knife

Monte-Carlo method

24
Q

what is the Jack-knife?

A

similar to bootstrap

rather than “randomly sampling with replacement”, resampling done by “selecting all data except one”
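A hedged sketch of the leave-one-out idea, used here to estimate a standard error. The scaling factor (n - 1) / n is the standard jackknife formula and is not stated on the card; `jackknife_se` and the sample are made up for illustration.

```python
from statistics import mean, stdev

def jackknife_se(sample, statistic=mean):
    """Jackknife standard error: recompute the statistic leaving out one value at a time."""
    n = len(sample)
    # n resamples, each containing all data except one observation
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    # standard jackknife variance: (n - 1) / n times the sum of squared deviations
    return ((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)) ** 0.5

# Hypothetical sample, invented for illustration only
sample = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.6, 6.1, 5.2, 6.4, 5.9, 5.5]
se_jack = jackknife_se(sample)
se_formula = stdev(sample) / len(sample) ** 0.5
print(se_jack, se_formula)
```

Unlike the bootstrap, nothing here is random: there are exactly n resamples. For the mean, the jackknife standard error reproduces the formula-based SEM, so the two printed values agree up to rounding.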

25
Q

what is the Monte-Carlo method?

A

create data based on model simulations

compare these to real data
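A toy sketch of the simulate-then-compare idea, assuming a very simple model (a fair coin) and an invented observation of 16 heads in 20 flips. The function name `monte_carlo_p` and all numbers are hypothetical.

```python
import random

def monte_carlo_p(observed_heads, n_flips, p_heads=0.5, n_sims=10_000, seed=0):
    """Simulate data from the model and see how often it is as extreme as the real data."""
    rng = random.Random(seed)
    expected = n_flips * p_heads
    observed_dev = abs(observed_heads - expected)
    hits = 0
    for _ in range(n_sims):
        # generate one simulated experiment from the model
        heads = sum(rng.random() < p_heads for _ in range(n_flips))
        if abs(heads - expected) >= observed_dev:
            hits += 1
    return hits / n_sims

p_mc = monte_carlo_p(observed_heads=16, n_flips=20)
print(p_mc)
```

The difference from the bootstrap and permutation tests is that the simulated data come from a model rather than from reshuffling or resampling the measured values.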
26
Q

what are the issues and concerns with resampling?

A

how many data samples (participants) do I need? - no a priori answer, try and see

how many resamples must I generate? - 1000-10,000 depending on how accurate you want p

which type of resampling should I use? - whatever best simulates the original sampling

what if my data are not representative of the population? - garbage in = garbage out
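To make "how accurate you want p" concrete: a p value estimated from resamples is a proportion, so its precision can be approximated with the usual binomial standard error. That formula is an addition here, not something from the cards, and `p_value_standard_error` is a hypothetical helper.

```python
def p_value_standard_error(p, n_resamples):
    """Approximate standard error of a p value estimated as a proportion of resamples."""
    return (p * (1 - p) / n_resamples) ** 0.5

# Precision of an estimated p near 0.05 for the two ends of the suggested range
for n in (1_000, 10_000):
    print(n, round(p_value_standard_error(0.05, n), 4))
```

For p near 0.05, 1000 resamples leave an uncertainty of roughly 0.007, while 10,000 resamples bring it down to roughly 0.002, which is why more resamples are needed when p is close to the decision threshold.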