Flashcards in Bootstrapping Deck (11):
what is bootstrapping?
any test/metric that uses random sampling with replacement
What is the empirical distribution function?
the distribution function associated with the empirical measure of a sample: a step function that jumps up by 1/n at each of the n data points
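A minimal sketch of the empirical distribution function, assuming the usual definition F_n(x) = (number of observations <= x) / n. The helper name `ecdf` and the data values are made up for illustration.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F_n, where F_n(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_n(x):
        # bisect_right gives the count of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return f_n

data = [2.0, 3.5, 3.5, 5.0, 7.0]  # illustrative values, not real data
F = ecdf(data)
print(F(1.0), F(3.5), F(10.0))  # prints: 0.0 0.6 1.0
```

Drawing with replacement from the sample is the same as drawing from this F_n, which is why the bootstrap is often described as sampling from the empirical distribution.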
What is resampling?
any method for:
- estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the data (jackknifing) or by drawing randomly with replacement (bootstrapping)
- validating models using random subsets (bootstrapping, cross-validation)
- exchanging labels on data points when performing significance tests (permutation tests)
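Two of the resampling methods above, sketched under simple assumptions: a leave-one-out jackknife standard error, and a two-sided permutation test for a difference in means. The function names, the choice of the mean as the statistic, and the data are illustrative only.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def jackknife_se(sample, statistic):
    """Standard error of `statistic` from leave-one-out subsets (jackknife)."""
    n = len(sample)
    # Recompute the statistic n times, each time leaving one observation out.
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    return math.sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided p-value for a difference in means, by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # exchanging labels = reshuffling group membership
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the Monte Carlo p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

control = [4.1, 5.0, 4.7, 5.3, 4.4]  # made-up values
treated = [5.9, 6.4, 5.6, 6.8, 6.1]  # made-up values
print(jackknife_se(control, mean))
print(permutation_p_value(control, treated))
```

For the mean, the jackknife standard error coincides with the familiar s / sqrt(n), which makes it a handy sanity check.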
intuition for bootstrap
- infer info about a population by resampling the sample data
- the sample plays the role of the 'population'; because that 'population' is fully known, the quality of inference from resampled data (resample -> sample) can be measured
what is variability?
aka dispersion, scatter
- is the extent to which a distribution is stretched or squeezed
- measures: variance, standard deviation, interquartile range (IQR = Q3 [75th percentile] - Q1 [25th percentile]), median absolute deviation
what is a statistic?
- a single measure of some attribute of a sample
- calculated by applying a function (statistical algorithm) to the set of data, i.e. the values of the items in the sample
- the term is restricted to cases where the same procedure can be applied to any number of data items
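A statistic is just a function applied to the sample values, and the dispersion measures listed earlier are examples. A rough sketch using Python's standard `statistics` module; the helper names `iqr` and `median_abs_deviation` are assumptions for illustration, and quartile values depend on the interpolation method (here the module's default).

```python
import statistics

def iqr(sample):
    """Interquartile range: Q3 - Q1 (default quantile method of the module)."""
    q1, _median, q3 = statistics.quantiles(sample, n=4)
    return q3 - q1

def median_abs_deviation(sample):
    """Median of the absolute deviations from the sample median."""
    centre = statistics.median(sample)
    return statistics.median([abs(x - centre) for x in sample])

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# Each line applies one function (statistic) to the same set of data.
print("variance:", statistics.variance(sample))  # sample variance (n - 1)
print("std deviation:", statistics.stdev(sample))
print("IQR:", iqr(sample))
print("MAD:", median_abs_deviation(sample))
```

The same functions work unchanged on a sample of any size, which is the sense in which the procedure "can be applied to any number of data items".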
what is point estimation?
- use of sample data to calculate a single value (a 'statistic') which is to serve as the best guess/best estimate of an unknown population parameter
recommendations for bootstrap
- when the distribution of the statistic of interest is unknown or complex
- when the sample size is insufficient for straightforward statistical inference
- when power calculations have to be performed, and a small pilot sample is available
- MUST be sure that the distribution is NOT power-law/heavy-tailed: a naive bootstrap can be unreliable when the underlying distribution lacks finite variance
how to do a bootstrap (simple case)?
- using a Monte Carlo algorithm: resample with replacement, keeping each resample the same size as the original data set; calculate the statistic of interest on the resample; repeat many times to increase the estimate's precision
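The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the default of 2000 resamples, the percentile interval, and the data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement."""
    rng = random.Random(seed)
    data = list(sample)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Same size as the original data set, drawn with replacement.
        resample = rng.choices(data, k=n)
        replicates.append(statistic(resample))
    return replicates

def percentile_interval(replicates, alpha=0.05):
    """Simple percentile interval taken from the sorted replicates."""
    ordered = sorted(replicates)
    lower = ordered[int(alpha / 2 * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper

data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9]  # made-up values
reps = bootstrap_replicates(data, statistics.median)

print("sample median:", statistics.median(data))
print("bootstrap standard error:", statistics.stdev(reps))
print("95% percentile interval:", percentile_interval(reps))
```

More resamples reduce the Monte Carlo noise in the estimate, but they cannot add information beyond what the original sample contains.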