Sampling and Resampling Methods Flashcards
what are non-parametric statistics?
Nonparametric statistics are statistical methods in which the data are not required to fit a normal distribution. Nonparametric statistics often use ordinal data, meaning the analysis relies not on the numerical values themselves but on their rank or order.
what is an example of non-para alternatives to parametric tests?
Mann-Whitney U / Spearman
And what do these tests do to the data in order to work?
They rank the continuous data. We can always rank continuous data: the raw values might not be normally distributed, but their ranks follow a uniform distribution, and the rank-based tests work well on them.
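A minimal sketch of that rank transformation, using scipy.stats.rankdata on made-up values (the data here are an assumption for illustration only):

```python
# Hypothetical sketch: rank-transforming continuous data, as rank-based
# tests do internally. Tied values receive their average rank.
import numpy as np
from scipy.stats import rankdata

values = np.array([2.3, 5.1, 5.1, 0.7, 3.9])   # made-up continuous scores
ranks = rankdata(values)                        # -> [2. , 4.5, 4.5, 1. , 3. ]
print(ranks)
```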
what is the nonparametric alternative to the t test?
Mann-Whitney U or log-rank test
what is the nonparametric alternative to the one-way within-subjects (paired) t test?
Wilcoxon signed-ranks test
what is the nonparametric alternative to the ANOVA?
Kruskal-Wallis
what is the nonparametric alternative to linear regression?
Spearman rank correlation / logistic regression
what are the advantages of non-parametric tests?
- Far fewer assumptions and less restrictive, so they apply more widely
- If the parametric assumptions have been violated, they are more powerful than the parametric tests
what are the dis-advantages of non-parametric tests?
- They test more complex null hypotheses (although personally I feel this is an advantage)
- The test is less powerful than the parametric equivalent if the parametric assumptions have not been violated
when would we conduct non-para?
- Small sample
- Non-parametric (e.g. ordinal) data
- Weird (non-normal) distribution
What are Resampling – Randomisation tests?
Instead of working out a theoretical sampling distribution (as in conventional parametric statistics), resampling avoids that step.
Randomisation tests empirically generate the sampling distribution: computer power is used to estimate it numerically by randomly and repeatedly rearranging the data.
So, from our original data we generate a sampling distribution by running many pseudo-experiments. The question is: if, under the null hypothesis, you ran an enormous number of experiments like the one you actually got, how would the test statistic be distributed?
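As a concrete illustration, here is a minimal randomisation-test sketch (the data values and resample count are assumptions, not from the flashcards): group labels are shuffled repeatedly, as if H0 were true, to build the null sampling distribution of a mean difference and read off a p value.

```python
# Hypothetical randomisation test for a difference in means between two groups.
# Under H0 the group labels are exchangeable, so we shuffle the pooled data
# repeatedly and recompute the statistic to build an empirical null distribution.
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([4.2, 5.1, 3.8, 4.9, 5.5])      # made-up data
group_b = np.array([3.1, 3.9, 2.8, 3.5, 4.0])
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

observed = group_a.mean() - group_b.mean()

n_resamples = 10_000                                # see "how many times to resample?"
null_stats = np.empty(n_resamples)
for i in range(n_resamples):
    shuffled = rng.permutation(pooled)              # randomise without replacement
    null_stats[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided p value: how often a resampled difference is at least as extreme
# as the one actually observed.
p_value = np.mean(np.abs(null_stats) >= abs(observed))
print(observed, p_value)
```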
what is the sampling distribution ?
Sampling distribution: a sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes the frequencies of the different values that the statistic could take across those samples.
what is the difference between randomization tests and permutation tests?
Randomisation test = only a random subset of the possible rearrangements of the data is sampled
Permutation test = every possible permutation of the data is used
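For contrast with the randomised version above, a sketch of the exhaustive (permutation-test) approach on a deliberately tiny, made-up dataset, where every possible reassignment of observations to groups is enumerated rather than randomly sampled:

```python
# Hypothetical exact permutation test: enumerate every way of splitting the
# pooled data into groups of the original sizes (feasible only for small n).
from itertools import combinations
import numpy as np

group_a = np.array([3.1, 2.8, 3.6])                # made-up data
group_b = np.array([2.2, 2.5, 2.9, 2.4])
pooled = np.concatenate([group_a, group_b])

observed = group_a.mean() - group_b.mean()

null_stats = []
for idx_a in combinations(range(len(pooled)), len(group_a)):   # every split
    in_a = np.isin(np.arange(len(pooled)), idx_a)
    null_stats.append(pooled[in_a].mean() - pooled[~in_a].mean())

null_stats = np.array(null_stats)
p_exact = np.mean(np.abs(null_stats) >= abs(observed))         # exact two-sided p
print(p_exact)
```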
how do randomization tests and permutation tests draw from the sample in comparison to bootstrapping?
Without replacement.
Bootstrapping, by contrast, puts each value back after it has been sampled (resampling with replacement).
why does bootstrapping replace data after being sampled?
So that any effect present in the data is preserved in each resample (and the same values may be drawn repeatedly).
what is the key to the randomisation and sampling without replacement process?
Each resample is obtained by randomising the actual data without replacement - as if H0 were true.
after generating the sampling distribution, we…?
THEN use the artificially generated sampling distribution to test the statistic.
how many times to resample?
generally 5,000-10,000 times; if the null hypothesis is true, there will be no systematic difference in any conceivable numerically generated combination.
after we generate the sampling distribution we can
do normal inferential statistics on it (without parametric assumptions, and with freedom over which statistic you use, e.g. mean difference, total, median, whatever)
give some examples of what you could test them with?
difference scores
median
mean
chi 2 value
sum of squared deviations of the group means = SSQDEV (if the deviations were not squared they would sum to 0; squaring makes them positive, so more variation across the groups gives a larger value, much like, if not identical to, the model sum of squares in ANOVA; see the sketch below)
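A hedged sketch of that last statistic (the three groups of values are made up): the squared deviation of each group mean from the grand mean, summed across groups. Any such statistic could be plugged into the resampling loop shown earlier.

```python
# Hypothetical SSQDEV statistic: squared deviations of each group mean from
# the grand mean, summed over groups. Squaring makes every deviation positive,
# so more variation between the groups gives a larger value.
import numpy as np

groups = [np.array([4.2, 5.1, 3.8]),    # made-up data for three groups
          np.array([3.1, 3.9, 2.8]),
          np.array([5.5, 6.0, 5.2])]

grand_mean = np.concatenate(groups).mean()
ssqdev = sum((g.mean() - grand_mean) ** 2 for g in groups)
print(ssqdev)
```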
What is bootstrapping?
Resampling WITH replacement (some data points can occur more than once). The assumption is that the sample is the best available evidence about the population, and we simply draw from it many times. Rather than forcing each resample to reproduce exactly the same data, we allow values to occur more than once; values that cluster together will be better represented in the resamples, so we get a picture that looks like the real population.
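A minimal sketch (made-up data) of drawing a single bootstrap resample with replacement, showing that individual values can appear more than once while others are left out:

```python
# Hypothetical bootstrap resample: same size as the original sample, drawn
# WITH replacement, so some values repeat and some do not appear at all.
import numpy as np

rng = np.random.default_rng(1)
data = np.array([2.3, 5.1, 0.7, 3.9, 4.4])
resample = rng.choice(data, size=len(data), replace=True)
print(resample)
```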
bootstrapping ‘preserves the effect which is there’ …. what does this mean?
Any effect that is present in the data will also be present in each random resample, and to the same extent; random variation will be equally present too. This gives a measure of the effect and of the amount of variation around it that you could expect over many resamples.
bootstrapping does not give us a xxxx - but provides xxxx
p value
confidence intervals
it asks… across all these bootstrapped resamples….
Across all these bootstrap samples: would a 0 difference lie within the confidence limits?
The effect is still there in our resampling. We use it to calculate the possible range of values our statistic might take, the 95% confidence limits, and then ask: under the null hypothesis our statistic would have a value of, say, 0; does that value fall within those 95% limits?
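Putting the last few cards together, a hedged sketch (data, sample sizes, and resample count are assumptions) of bootstrapping a 95% confidence interval for a mean difference and asking whether 0 lies inside it:

```python
# Hypothetical bootstrap confidence interval for a difference in means.
# Each group is resampled WITH replacement, so any real effect is preserved
# in every resample; the 2.5th and 97.5th percentiles give the 95% limits.
import numpy as np

rng = np.random.default_rng(2)
group_a = np.array([4.2, 5.1, 3.8, 4.9, 5.5])      # made-up data
group_b = np.array([3.1, 3.9, 2.8, 3.5, 4.0])

n_resamples = 10_000
boot_diffs = np.empty(n_resamples)
for i in range(n_resamples):
    a = rng.choice(group_a, size=len(group_a), replace=True)
    b = rng.choice(group_b, size=len(group_b), replace=True)
    boot_diffs[i] = a.mean() - b.mean()

lower, upper = np.percentile(boot_diffs, [2.5, 97.5])
print(lower, upper)
# If 0 falls outside the 95% limits, a zero difference is not plausible.
print("0 inside 95% limits:", lower <= 0 <= upper)
```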