Flashcards in Bootstrapping Deck (11):

1

## what is bootstrapping?

### any test/metric that uses random sampling with replacement

2

## What is the empirical distribution function?

### the distribution function associated with the empirical measure of a sample: a step function that jumps by 1/n at each of the n data points
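A minimal sketch of this definition, assuming the usual convention that the empirical distribution function at x is the fraction of observations less than or equal to x. The helper name `ecdf` and the sample values are illustrative only.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F_n of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return F


sample = [2.0, 3.0, 3.0, 5.0, 8.0]  # made-up illustrative data
F = ecdf(sample)
print(F(1.0), F(3.0), F(8.0))  # prints: 0.0 0.6 1.0
```

Drawing with replacement from the sample is the same as drawing from this step function, which is why it appears in a bootstrapping deck.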

3

## What is resampling?

###
any method for:

- estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of data (jackknifing) or drawing randomly with replacement (bootstrapping)

- validating models using random subsets (bootstrapping, cross validation)

- exchanging labels on data points to perform significance tests (permutation tests)
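The label-exchange idea in the last bullet can be sketched as a two-sample permutation test. This is only an illustration under stated assumptions: the test statistic (absolute difference in means), the function name `permutation_test`, the add-one p-value correction, and the data are all choices made for this example, not part of the card.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled data reassigns the group labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the p-value above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up data: two clearly separated groups.
p_value = permutation_test([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])
print(p_value)
```

A small p-value says that a difference this large rarely arises when the labels carry no information.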

4

## intuition for bootstrap

###
- infer information about a population by resampling the sample data

- the true population is unknown, so the sample plays the role of the 'population'; because the sample is known, the quality of inference from resampled data about the sample can be measured

5

## what is variability?

###
aka dispersion, scatter

- is the extent to which a distribution is stretched or squeezed

- measures: variance, standard deviation, interquartile range (IQR = Q3 [75%] - Q1 [25%]), median absolute deviation
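The measures listed above can be computed with Python's standard `statistics` module. This is a sketch, assuming the population versions of variance and standard deviation and the module's default quartile convention (other conventions give slightly different quartiles). The data values are made up.

```python
from statistics import median, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data

variance = pvariance(data)        # population variance
std_dev = pstdev(data)            # population standard deviation

# Quartiles using the module's default ('exclusive') method.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1                     # interquartile range

# Median absolute deviation: median distance from the median.
med = median(data)
mad = median(abs(x - med) for x in data)

print(variance, std_dev, iqr, mad)
```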

6

## consistent? consistency?

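### - terms restricted to cases where the same procedure can be applied to any number of data items; a consistent procedure gives results that converge to the true value as the number of data items grows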

7

## statistic/sample statistic

###
- single measure of some attribute of a sample

- calculated by applying a function (statistical algorithm) to the set of data, i.e. the values of the items in the sample

8

## what is point estimation?

### - use of sample data to calculate a single value (a 'statistic') which is to serve as the best guess/best estimate of an unknown population parameter

9

## recommendations for bootstrap

###
- when the distribution of the statistic of interest is unknown or complex

- when the sample size is insufficient for straightforward statistical inference

- when power calculations have to be performed, and a small pilot sample is available

- MUST be sure that the underlying distribution is NOT a power law/heavy tailed; the naive bootstrap can fail to converge when the distribution lacks a finite variance

10

## how to do bootstrap (simple case)?

### - using a Monte Carlo algorithm: resample the data with replacement, keeping each resample the same size as the original data set; calculate the statistic of interest on the resample; repeat many times to increase the estimate's precision
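The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `bootstrap_replicates`, the choice of the median as the statistic, the number of resamples, the simple percentile interval, and the data are all assumptions made for this example.

```python
import random
from statistics import median, stdev


def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Return the statistic computed on each bootstrap resample."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data set.
        resample = rng.choices(data, k=n)
        replicates.append(statistic(resample))
    return replicates


# Made-up illustrative data.
data = [12.1, 9.8, 11.4, 10.2, 13.5, 9.1, 10.9, 12.7, 11.0, 10.4]
replicates = bootstrap_replicates(data, median)

# The spread of the replicates estimates the statistic's standard error.
standard_error = stdev(replicates)

# Simple percentile interval (roughly 95%) from the sorted replicates.
ordered = sorted(replicates)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]
print(standard_error, lower, upper)
```

More resamples reduce the Monte Carlo noise in the estimate, but they cannot add information beyond what the original sample contains.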

11