Sampling and Resampling Methods Flashcards
what are non-parametric statistics?
Nonparametric statistics are statistical methods in which the data are not required to fit a normal distribution. Nonparametric statistics often use ordinal data, meaning the analysis relies not on the numerical values themselves but on their rank or order.
what is an example of non-para alternatives to parametric tests?
Mann-Whitney U / Spearman
And what do these tests do to the data in order to work?
They rank the continuous data. We can always rank continuous data: the raw values might not be normally distributed, but their ranks follow a uniform distribution, and the rank-based tests work well on them.
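A minimal sketch of that rank transformation, using scipy.stats.rankdata on made-up values (the data here are an assumption for illustration only):

```python
# Hypothetical sketch: rank-transforming continuous data, as rank-based
# tests do internally. Tied values receive their average rank.
import numpy as np
from scipy.stats import rankdata

values = np.array([2.3, 5.1, 5.1, 0.7, 3.9])   # made-up continuous scores
ranks = rankdata(values)                        # -> [2. , 4.5, 4.5, 1. , 3. ]
print(ranks)
```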
what is the nonparametric alternative to the t test?
Mann-Whitney U or log-rank test
what is the nonparametric alternative to the one-way within-subjects (paired) t test?
Wilcoxon signed-ranks test
what is the nonparametric alternative to the ANOVA?
Kruskal-Wallis
what is the nonparametric alternative to linear regression?
Spearman rank correlation / logistic regression
what are the advantages of non-parametric tests?
- Far fewer assumptions and less restrictive, so they apply more widely
- If the parametric assumptions have been violated, they are more powerful than the parametric tests
what are the dis-advantages of non-parametric tests?
- They test more complex null hypotheses (although personally I feel this is an advantage)
- The test is less powerful than the parametric equivalent if the parametric assumptions have not been violated
when would we conduct non-para?
- Small sample
- Non-parametric (e.g. ordinal) data
- Weird (non-normal) distribution
What are Resampling – Randomisation tests?
Instead of working out a theoretical sampling distribution (as in conventional parametric statistics), resampling avoids that step.
Randomisation tests empirically generate the sampling distribution: computer power is used to estimate it numerically by randomly and repeatedly rearranging the data.
So, from our original data we generate a sampling distribution by running many pseudo-experiments. The question is: if, under the null hypothesis, you ran an enormous number of experiments like the one you actually got, how would the test statistic be distributed?
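As a concrete illustration, here is a minimal randomisation-test sketch (the data values and resample count are assumptions, not from the flashcards): group labels are shuffled repeatedly, as if H0 were true, to build the null sampling distribution of a mean difference and read off a p value.

```python
# Hypothetical randomisation test for a difference in means between two groups.
# Under H0 the group labels are exchangeable, so we shuffle the pooled data
# repeatedly and recompute the statistic to build an empirical null distribution.
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([4.2, 5.1, 3.8, 4.9, 5.5])      # made-up data
group_b = np.array([3.1, 3.9, 2.8, 3.5, 4.0])
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

observed = group_a.mean() - group_b.mean()

n_resamples = 10_000                                # see "how many times to resample?"
null_stats = np.empty(n_resamples)
for i in range(n_resamples):
    shuffled = rng.permutation(pooled)              # randomise without replacement
    null_stats[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided p value: how often a resampled difference is at least as extreme
# as the one actually observed.
p_value = np.mean(np.abs(null_stats) >= abs(observed))
print(observed, p_value)
```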
what is the sampling distribution ?
Sampling distribution: a sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes the frequencies of the different values that the statistic could take across those samples.
what is the difference between randomization tests and permutation tests?
Randomisation test = only a random subset of the possible rearrangements of the data is sampled
Permutation test = every possible permutation of the data is used
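For contrast with the randomised version above, a sketch of the exhaustive (permutation-test) approach on a deliberately tiny, made-up dataset, where every possible reassignment of observations to groups is enumerated rather than randomly sampled:

```python
# Hypothetical exact permutation test: enumerate every way of splitting the
# pooled data into groups of the original sizes (feasible only for small n).
from itertools import combinations
import numpy as np

group_a = np.array([3.1, 2.8, 3.6])                # made-up data
group_b = np.array([2.2, 2.5, 2.9, 2.4])
pooled = np.concatenate([group_a, group_b])

observed = group_a.mean() - group_b.mean()

null_stats = []
for idx_a in combinations(range(len(pooled)), len(group_a)):   # every split
    in_a = np.isin(np.arange(len(pooled)), idx_a)
    null_stats.append(pooled[in_a].mean() - pooled[~in_a].mean())

null_stats = np.array(null_stats)
p_exact = np.mean(np.abs(null_stats) >= abs(observed))         # exact two-sided p
print(p_exact)
```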
how do randomization tests and permutation tests draw from the sample in comparison to bootstrapping?
Without replacement.
Bootstrapping, by contrast, puts each value back after it has been sampled (resampling with replacement).
why does bootstrapping replace data after being sampled?
So that any effect present in the data is preserved in each resample (and the same values may be drawn repeatedly).
what is the key to the randomisation and sampling without replacement process?
Each resample is obtained by randomising the actual data without replacement - as if H0 were true.
after generating the sampling distribution, we…?
THEN use the artificially generated sampling distribution to test the statistic.
how many times to resample?
generally 5,000-10,000 times; if the null hypothesis is true, there will be no systematic difference in any conceivable numerically generated combination.
after we generate the sampling distribution we can
do normal inferential statistics on it (without parametric assumptions, and with freedom over which statistic you use, e.g. mean difference, total, median, whatever)
give some examples of what you could test them with?
difference scores
median
mean
chi 2 value
sum of squared deviations of the group means = SSQDEV (if the deviations were not squared they would sum to 0; squaring makes them positive, so more variation across the groups gives a larger value, much like, if not identical to, the model sum of squares in ANOVA; see the sketch below)
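A hedged sketch of that last statistic (the three groups of values are made up): the squared deviation of each group mean from the grand mean, summed across groups. Any such statistic could be plugged into the resampling loop shown earlier.

```python
# Hypothetical SSQDEV statistic: squared deviations of each group mean from
# the grand mean, summed over groups. Squaring makes every deviation positive,
# so more variation between the groups gives a larger value.
import numpy as np

groups = [np.array([4.2, 5.1, 3.8]),    # made-up data for three groups
          np.array([3.1, 3.9, 2.8]),
          np.array([5.5, 6.0, 5.2])]

grand_mean = np.concatenate(groups).mean()
ssqdev = sum((g.mean() - grand_mean) ** 2 for g in groups)
print(ssqdev)
```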
What is bootstrapping?
Resampling WITH replacement (some data points can occur more than once). The assumption is that the sample is the best available evidence about the population, and we simply draw from it many times. Rather than forcing each resample to reproduce exactly the same data, we allow values to occur more than once; values that cluster together will be better represented in the resamples, so we get a picture that looks like the real population.
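A minimal sketch (made-up data) of drawing a single bootstrap resample with replacement, showing that individual values can appear more than once while others are left out:

```python
# Hypothetical bootstrap resample: same size as the original sample, drawn
# WITH replacement, so some values repeat and some do not appear at all.
import numpy as np

rng = np.random.default_rng(1)
data = np.array([2.3, 5.1, 0.7, 3.9, 4.4])
resample = rng.choice(data, size=len(data), replace=True)
print(resample)
```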
bootstrapping ‘preserves the effect which is there’ …. what does this mean?
Any effect that is present in the data will also be present in each random resample, and to the same extent; random variation will be equally present too. This gives a measure of the effect and of the amount of variation around it that you could expect over many resamples.
bootstrapping does not give us a xxxx - but provides xxxx
p value
confidence intervals
it asks… across all these bootstrapped resamples….
Across all these bootstrap samples: would a 0 difference lie within the confidence limits?
The effect is still there in our resampling. We use it to calculate the possible range of values our statistic might take, the 95% confidence limits, and then ask: under the null hypothesis our statistic would have a value of, say, 0; does that value fall within those 95% limits?
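Putting the last few cards together, a hedged sketch (data, sample sizes, and resample count are assumptions) of bootstrapping a 95% confidence interval for a mean difference and asking whether 0 lies inside it:

```python
# Hypothetical bootstrap confidence interval for a difference in means.
# Each group is resampled WITH replacement, so any real effect is preserved
# in every resample; the 2.5th and 97.5th percentiles give the 95% limits.
import numpy as np

rng = np.random.default_rng(2)
group_a = np.array([4.2, 5.1, 3.8, 4.9, 5.5])      # made-up data
group_b = np.array([3.1, 3.9, 2.8, 3.5, 4.0])

n_resamples = 10_000
boot_diffs = np.empty(n_resamples)
for i in range(n_resamples):
    a = rng.choice(group_a, size=len(group_a), replace=True)
    b = rng.choice(group_b, size=len(group_b), replace=True)
    boot_diffs[i] = a.mean() - b.mean()

lower, upper = np.percentile(boot_diffs, [2.5, 97.5])
print(lower, upper)
# If 0 falls outside the 95% limits, a zero difference is not plausible.
print("0 inside 95% limits:", lower <= 0 <= upper)
```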