Bootstrapping Flashcards Preview

Flashcards in Bootstrapping Deck (11):

what is bootstrapping?

any test/metric that uses random sampling with replacement


What is the empirical distribution function?

the distribution function associated with the empirical measure of a sample
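
As a rough illustration (not part of the original card), the empirical distribution function of a sample of size n is the step function F_n(x) = (number of observations <= x) / n. A minimal Python sketch, where the helper name `ecdf` and the example data are made up for illustration:

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F_n of `sample` (hypothetical helper)."""
    xs = sorted(sample)
    n = len(xs)

    def F(x):
        # Fraction of observations that are <= x.
        return bisect_right(xs, x) / n

    return F


F = ecdf([3, 1, 4, 1, 5])
print(F(0), F(1), F(3.5), F(5))  # 0.0 0.4 0.6 1.0
```

Drawing with replacement from the sample is the same as sampling from this F_n, which is why the EDF shows up in bootstrap discussions.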


What is resampling?

any method for:
- estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the data (jackknifing) or drawing randomly with replacement (bootstrapping)
- validating models using random subsets (bootstrapping, cross validation)
- exchanging labels on data points for significance tests (permutation tests)
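
The label-exchange idea in the last bullet can be sketched as a two-sample permutation test on the difference of means. This is only an illustration under assumptions: the function name `permutation_test`, the choice of test statistic, the number of permutations, and the data are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=2000, seed=0):
    """Approximate two-sided p-value for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to exchanging group labels.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself and keep p strictly positive.
    return (hits + 1) / (n_perm + 1)


p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(p_value)
```

A small p-value means that few random relabellings produce a difference as large as the observed one.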


intuition for bootstrap

- infer info about a population by resampling the sample data
- the sample stands in for the 'population'; because that 'population' is fully known, the quality of inference from resampled data can be measured


what is variability?

aka dispersion, scatter
- is the extent to which a distribution is stretched or squeezed
- measures: variance, standard deviation, interquartile range (IQR = Q3 [75%] - Q1 [25%]), median absolute deviation
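
The measures listed above can be computed with Python's standard `statistics` module. This is a sketch on made-up data; note that quartile conventions differ between tools, and `quantiles` is used here with its default method, so the IQR may not match other software exactly.

```python
from statistics import median, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data

variance = pvariance(data)        # population variance
std = pstdev(data)                # population standard deviation
q1, _, q3 = quantiles(data, n=4)  # quartile cut points (default method)
iqr = q3 - q1                     # interquartile range
med = median(data)
mad = median([abs(x - med) for x in data])  # median absolute deviation

print(variance, std, iqr, mad)
```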


consistent? consistency?

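- terms restricted to cases where essentially the same procedure can be applied to any number of data items
- a consistent estimator converges (in probability) to the true parameter value as the number of data items grows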


statistic/sample statistic

- single measure of some attribute of a sample
- calculated by applying a function (statistical algorithm) to the set of data = values of the items of the sample


what is point estimation?

- use of sample data to calculate a single value (a 'statistic') which is to serve as the best guess/best estimate of an unknown population parameter


recommendations for bootstrap

- when the distribution of the statistic of interest is unknown or complex
- when the sample size is insufficient for straightforward statistical inference
- when power calculations have to be performed, and a small pilot sample is available
- MUST check that the distribution is NOT heavy-tailed (e.g. a power law): the bootstrap can fail when the underlying distribution lacks finite variance


how to do bootstrap (simple case)?

- using a Monte Carlo algorithm: resample with replacement, keeping each resample the same size as the original data set; calculate the statistic of interest on each resample; repeat many times to increase the estimate's precision
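
The simple case above might look like the following in Python. This is a minimal sketch, not a reference implementation: the function name `bootstrap`, the number of resamples, the data, and the use of a plain percentile interval are all assumptions made for illustration.

```python
import random
from statistics import median, stdev


def bootstrap(sample, statistic, n_resamples=2000, seed=0):
    """Return bootstrap replicates of `statistic` (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Draw n items with replacement: same size as the original data set.
        resample = rng.choices(sample, k=n)
        replicates.append(statistic(resample))
    return replicates


sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]  # made-up data
reps = sorted(bootstrap(sample, median))

se = stdev(reps)  # bootstrap estimate of the standard error of the median
lo = reps[int(0.025 * len(reps))]      # lower end of a 95% percentile interval
hi = reps[int(0.975 * len(reps)) - 1]  # upper end of a 95% percentile interval
print(se, lo, hi)
```

More resamples reduce the Monte Carlo noise in `se`, `lo`, and `hi`, but they cannot add information beyond what the original sample contains.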


other bootstrap types?

- Bayesian
- parametric
- wild
- Gaussian process regression
- smooth