Resampling Statistics Flashcards
Why do we use resampling techniques?
-Fewer assumptions = more accurate when the assumptions of parametric tests (e.g. normality) aren’t met
-General = basic ideas can be modified and reused, no equations/tables to look up
-Retains statistical power
-Designing the test forces us to think about our data
Why aren’t they very popular?
-They are relatively new (the bootstrap dates from 1979)
-Assumed to be more complex
-Parametric stats are typically quite simple and do a good job
-Requires a computer and programming
-People don’t like having to think about their data
What are the 2 types of resampling techniques?
-Permutation tests
-Bootstrap resampling
What are permutation tests?
-Compare groups or conditions (an alternative to t-tests)
-Shuffle the data between conditions (reassign the condition labels at random)
What are bootstrap resampling tests?
-Generate confidence intervals
-Make error bars
-Resample the values in the sample with replacement
-Show the variability of a statistic (e.g. the mean)
What are the main points of inferential statistics?
-Estimate the probability that the observed differences were caused by sampling error
-Resampling = measure sampling error by repeating the sampling process many times
What would the process be for a between-subjects design?
-Run the experiment (in simulation) many times
-Check the range of values that typically occur
-Shuffle the values between the groups to simulate each new run
What distribution is created by shuffling?
-Null distribution
-Distribution of expected experimental results if the null hypothesis were true
Summarise the process of a between-subjects permutation test
-Repeat the experiment a large number of times
-Force the null hypothesis to be true (by shuffling)
-Check how extreme the real value was
-No equation
-No tables
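The steps above can be sketched in Python as follows. This is a minimal sketch, assuming a difference in means as the test statistic and a two-tailed comparison; the helper `permutation_test` and the example data are hypothetical, not taken from the original notes.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-tailed permutation test on the difference in group means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null_diffs = []  # the null distribution
    for _ in range(n_resamples):
        # Shuffling the pooled values forces the null hypothesis to be true:
        # group membership no longer carries any information.
        rng.shuffle(pooled)
        null_diffs.append(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
    # How extreme was the real value? Proportion of shuffles at least as extreme.
    n_extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
    return observed, n_extreme / n_resamples, null_diffs


# Hypothetical example data
observed, p, null_diffs = permutation_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(observed, p)
```

No equation or table is needed: the p-value is simply the share of shuffled results at least as extreme as the real one. Some texts add 1 to both the count and the number of resamples so that p is never exactly 0; that variant is not shown here.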
How can the between-subjects test be generalised?
-Change the statistic being compared: e.g. if the hypothesis is that the groups differ in variability (SD) rather than in mean, compare SDs instead of means
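A short sketch of this generalisation, assuming the same shuffling procedure with the statistic passed in as a function; the helper `permutation_test_stat` and the two groups are hypothetical. The groups have equal means but very different spreads, so a test on means finds nothing while a test on SDs does.

```python
import random
import statistics


def permutation_test_stat(group_a, group_b, stat, n_resamples=10_000, seed=0):
    """Two-tailed permutation test on stat(group_a) - stat(group_b) (hypothetical helper)."""
    rng = random.Random(seed)
    observed = stat(group_a) - stat(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # same shuffling as before; only the statistic changes
        diff = stat(pooled[:n_a]) - stat(pooled[n_a:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    return observed, n_extreme / n_resamples


# Hypothetical groups with equal means (10) but very different spreads
narrow = [9, 10, 10, 11, 10, 9, 11, 10]
wide = [2, 18, 5, 15, 10, 1, 19, 10]
_, p_mean = permutation_test_stat(narrow, wide, statistics.mean)
_, p_sd = permutation_test_stat(narrow, wide, statistics.stdev)
print(p_mean, p_sd)
```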
How does shuffling change for within subjects design?
-Values are shuffled within each subject rather than across the entire data set
-Randomise the sign of the difference for each pair
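For a within-subjects design, swapping a subject's two conditions only flips the sign of that subject's difference score, so the shuffle reduces to random sign flips. A minimal sketch, assuming a two-tailed test on the mean difference; the helper `paired_permutation_test` and the scores are hypothetical.

```python
import random
import statistics


def paired_permutation_test(cond_1, cond_2, n_resamples=10_000, seed=0):
    """Two-tailed sign-flip permutation test for a within-subjects design (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(cond_1, cond_2)]
    observed = statistics.mean(diffs)
    n_extreme = 0
    for _ in range(n_resamples):
        # Swapping one subject's two conditions just flips the sign of their
        # difference, so shuffling happens within each subject.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= abs(observed):
            n_extreme += 1
    return observed, n_extreme / n_resamples


# Hypothetical scores for 8 subjects measured in two conditions
before = [10, 12, 11, 14, 13, 15, 12, 11]
after = [12, 15, 13, 17, 15, 18, 14, 14]
print(paired_permutation_test(before, after))
```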
How do we handle the number of participants (ppts)?
-Each resample must have the same sample size as the original data
-The variance of the resampled mean differences reflects the number of subjects (more subjects = less variance)
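To see why the resample size must match the original data, this sketch (hypothetical helper `null_sd_of_mean_diff`, hypothetical difference scores) measures the spread of a sign-flip null distribution for 8 subjects and for the same values repeated to make 32 subjects. Using the wrong N would give a null distribution that is too wide or too narrow.

```python
import random
import statistics


def null_sd_of_mean_diff(diffs, n_resamples=5_000, seed=0):
    """SD of the sign-flip null distribution of the mean difference (hypothetical helper)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        means.append(statistics.mean(flipped))
    return statistics.stdev(means)


diffs = [1, -2, 3, -1, 2, -3, 1, 2]      # hypothetical, 8 subjects
sd_8 = null_sd_of_mean_diff(diffs)
sd_32 = null_sd_of_mean_diff(diffs * 4)  # same values, 32 subjects
print(sd_8, sd_32)  # the spread roughly halves when N is 4 times larger
```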
Describe bootstrap resamples
-Used to estimate confidence intervals and standard errors, e.g. the CI of the mean and the SE of the mean (SEM)
-Can determine whether a test value is inside or outside the confidence interval
-Resample with replacement
What is meant by resample with replacement?
-Each data point can appear once, more than once or not at all in a given resample
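A tiny illustration of resampling with replacement using Python's `random.Random.choices`; the sample values are hypothetical. Each resample has the same size as the original sample, and counting the draws shows values appearing zero, one or several times.

```python
import random
from collections import Counter

rng = random.Random(1)  # seeded so the example is repeatable
data = [3, 7, 8, 12, 15]  # hypothetical sample
resample = rng.choices(data, k=len(data))  # draw with replacement, same size as the sample
counts = Counter(resample)
for value in data:
    # A value may appear 0, 1 or several times in the resample.
    print(value, counts[value])
```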
What is the SEM (standard error of the mean)?
-Standard deviation of the means of all possible samples
-Estimated as the SD of the bootstrap means
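A minimal sketch of the bootstrap SEM, assuming 10,000 resamples and hypothetical scores; `bootstrap_means` is a hypothetical helper. The result is compared with the usual formula SD/sqrt(n).

```python
import math
import random
import statistics


def bootstrap_means(data, n_resamples=10_000, seed=0):
    """Means of n_resamples bootstrap resamples of data (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]


data = [96, 104, 110, 99, 105, 112, 101, 108, 97, 103]  # hypothetical scores
boot_sem = statistics.stdev(bootstrap_means(data))      # SD of the bootstrap means
formula_sem = statistics.stdev(data) / math.sqrt(len(data))
print(round(boot_sem, 2), round(formula_sem, 2))
```

The two values agree closely. In expectation the bootstrap value is slightly smaller, by a factor of sqrt((n - 1)/n), because resampling uses the sample's own spread with divisor n rather than n - 1; the gap matters only for very small samples.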
How do we calculate a confidence interval?
-A 95% bootstrap confidence interval is the range containing the middle 95% of the bootstrap means
-Sort the means and cut off the highest and lowest 2.5%
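The sort-and-cut procedure (the percentile method) can be sketched as below, assuming 10,000 bootstrap means of hypothetical scores; `bootstrap_means` and `percentile_ci` are hypothetical helpers.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=10_000, seed=0):
    rng = random.Random(seed)
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_ci(boot_stats, level=0.95):
    """Percentile CI: sort, then drop the lowest and highest (1 - level)/2 of values."""
    ordered = sorted(boot_stats)
    n = len(ordered)
    cut = round((1 - level) / 2 * n)  # e.g. 250 of 10,000 values in each tail
    return ordered[cut], ordered[n - cut - 1]


data = [96, 104, 110, 99, 105, 112, 101, 108, 97, 103]  # hypothetical scores
low, high = percentile_ci(bootstrap_means(data))
print(low, high)
```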
How do we link the one-sample t-test to bootstrapping?
-Count how often a mean of 100 or less (the test value) occurs among the bootstrap means
-Sort the bootstrap means and find the proportion that are less than or equal to 100; this proportion acts as the (one-tailed) p-value
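A sketch of the bootstrap replacement for a one-sample t-test against the test value of 100, assuming the sample mean lies above 100 so the lower tail is the relevant one. The helper `bootstrap_p_at_or_below` and the scores are hypothetical.

```python
import random
import statistics


def bootstrap_p_at_or_below(data, test_value=100, n_resamples=10_000, seed=0):
    """Proportion of bootstrap means <= test_value (hypothetical helper).

    Used as a one-tailed p-value when the sample mean lies above test_value.
    """
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return sum(m <= test_value for m in means) / n_resamples


data = [96, 104, 110, 99, 105, 112, 101, 108, 97, 103]  # hypothetical scores, mean 103.5
print(bootstrap_p_at_or_below(data, test_value=100))
```

Counting instead of sorting gives the same proportion; sorting only helps when reading off several cut-offs at once.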
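What is bootstrapping with a model fit?
-Resample the data, refit the model to each resample and look at the spread of the fitted parameters
-Works even with a very simple model
-Generalises easily to other models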
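A minimal sketch of bootstrapping a model fit. The card only mentions a very simple model; a straight-line least-squares fit is assumed here, with whole (x, y) pairs resampled with replacement. The helpers `fit_slope` and `bootstrap_slopes` and the data are hypothetical.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope of y on x: the 'very simple model' assumed here."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=5_000, seed=0):
    """Refit the slope on resampled (x, y) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_resamples:
        # Resample whole (x, y) pairs with replacement, keeping the original size.
        idx = rng.choices(range(len(xs)), k=len(xs))
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # all x values identical: slope undefined, draw again
        slopes.append(fit_slope(bx, [ys[i] for i in idx]))
    return slopes


xs = list(range(1, 11))  # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
slopes = sorted(bootstrap_slopes(xs, ys))
cut = round(0.025 * len(slopes))  # percentile 95% CI, as for the mean
print(fit_slope(xs, ys), slopes[cut], slopes[-cut - 1])
```

Swapping `fit_slope` for any other fitting function gives a CI for that model's parameter in the same way.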
What are the advantages of bootstrapping?
-General method = any model can be used and any CI can be estimated
-Can be used for hypothesis testing (e.g. in place of a one-sample t-test)
-Few assumptions (e.g. no assumption of normality)
-No tables or equations
What are the 2 other resample approaches?
-Jack-knife
-Monte-Carlo method
What is the Jack-knife approach?
-Similar to bootstrap
-Each resample uses all the data except one observation (leave-one-out)
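A sketch of the jackknife standard error, assuming the standard leave-one-out formula; `jackknife_se` and the scores are hypothetical. For the mean, the jackknife SE equals SD/sqrt(n) exactly, which makes a handy check.

```python
import math
import statistics


def jackknife_se(data, stat=statistics.mean):
    """Jackknife standard error of stat (hypothetical helper)."""
    n = len(data)
    # Each jackknife sample is all the data except observation i.
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))


data = [96, 104, 110, 99, 105, 112, 101, 108, 97, 103]  # hypothetical scores
print(jackknife_se(data), statistics.stdev(data) / math.sqrt(len(data)))
```

With n observations there are only n jackknife samples, so unlike the bootstrap the result involves no randomness.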
What is the Monte-Carlo approach?
-Create data from model simulations
-Compare to real data
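A minimal Monte Carlo sketch: data are generated from a model and compared with a real observation. The model (a participant guessing on 20 two-choice trials), the observed score of 16 and the helper `simulate_guesser` are all hypothetical.

```python
import random


def simulate_guesser(n_trials=20, p_correct=0.5, n_sims=20_000, seed=0):
    """Scores from a simulated guessing model (hypothetical model and helper)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_correct for _ in range(n_trials)) for _ in range(n_sims)]


real_score = 16  # hypothetical observed score out of 20
sims = simulate_guesser()
# Compare the real data with the model: how often does pure guessing do at least this well?
p = sum(s >= real_score for s in sims) / len(sims)
print(p)
```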
What are the issues and concerns around resampling?
-There is no exact number of resamples you have to generate; typically between 1,000 and 10,000, depending on how accurate the p-value needs to be
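Assuming each resample is independent, the estimated p-value is a binomial proportion, so its Monte Carlo standard error is sqrt(p(1 - p)/B) for B resamples. This short sketch (hypothetical helper `p_value_se`) shows how much more precise a p near 0.05 becomes with 10,000 resamples than with 1,000.

```python
import math


def p_value_se(p, n_resamples):
    """Monte Carlo standard error of a resampling p-value (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n_resamples)


for n_resamples in (1_000, 10_000):
    print(n_resamples, round(p_value_se(0.05, n_resamples), 4))
# 1000 0.0069
# 10000 0.0022
```

Ten times as many resamples shrinks the error by a factor of about sqrt(10), roughly 3.2, so more resamples are worth the cost mainly when p lies close to the significance threshold.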