Inferential statistics Flashcards
(12 cards)
why is randomisation important
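To avoid bias and confounding: random selection and random allocation spread confounders evenly across groups, so differences are not driven by how subjects were chosen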
sampling methods
simple random sampling
stratified sampling
convenience sampling
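A minimal sketch of the three sampling methods, using only Python's standard library. The population (60 "urban" and 40 "rural" members) and the helper names are hypothetical, chosen for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    # every member has the same chance of being chosen
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    # group members by stratum, then draw the same fraction from each stratum
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def convenience_sample(population, n):
    # whoever is easiest to reach (here: simply the first n members); prone to bias
    return population[:n]

rng = random.Random(42)
# hypothetical population: 60 urban and 40 rural members
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, lambda p: p[0], 0.1, rng)
conv = convenience_sample(population, 10)
```

The stratified sample keeps the 60/40 split (6 urban, 4 rural), while this convenience sample contains only urban members, which illustrates how it can bias results.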
what is data provenance
Data provenance is the history of a dataset, including:
– how it was collected and by whom
– data cleaning processes
– imputations for missing data
– access to previous versions, etc.
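One way to keep this history is a small record that travels with the data. This is a hypothetical sketch: the class `ProvenanceRecord`, its fields and the example steps are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceRecord:
    source: str            # where the data came from
    collected_by: str      # who collected it
    collected_on: date     # when it was collected
    version: int = 1       # bumped each time the data are changed
    history: list = field(default_factory=list)  # ordered (version, step) entries

    def log(self, step):
        # record a processing step (cleaning, imputation, ...) and bump the version
        self.version += 1
        self.history.append((self.version, step))

# hypothetical dataset and processing steps
record = ProvenanceRecord("survey.csv", "field team A", date(2023, 5, 1))
record.log("removed duplicate rows")
record.log("imputed missing ages with the median age")
```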
standard error of the mean
SE = s/√n, where s is the sample standard deviation and n is the sample size
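A minimal sketch of the formula with Python's `statistics` module; `statistics.stdev` uses the n − 1 denominator, i.e. the sample standard deviation s. The data values are made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    # s: sample standard deviation (n - 1 denominator); n: sample size
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
se = standard_error(sample)         # sqrt(32/7) / sqrt(8) = sqrt(4/7), about 0.756
```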
Hypothesis testing
A hypothesis should be something that is testable and falsifiable
procedure for hypothesis test
• a random sample is drawn from a population
• a null hypothesis (H0) is formulated
• a test statistic is calculated whose probability distribution under H0 is known
• p-value: the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the observed one (found by comparing the observed value with that distribution)
• if the p-value < α (commonly 0.05), reject the null hypothesis
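The steps above can be sketched as a one-sample z test, assuming the population standard deviation σ is known. The function name, the sample values, μ0 = 5.0 and σ = 0.4 are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    # H0: population mean == mu0 (sigma assumed known)
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))   # test statistic
    # under H0, z follows the standard normal distribution
    p = 2 * (1 - NormalDist().cdf(abs(z)))               # two-sided p-value
    return z, p, p < alpha                               # True means reject H0

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.5]       # hypothetical, mean 5.5
z, p, reject = one_sample_z_test(sample, mu0=5.0, sigma=0.4)
```

Here z is about 3.54 and p is well below 0.05, so H0 would be rejected.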
normality testing
Shapiro-Wilk test
H0 = null hypothesis: the data are normally distributed
H1 = alternative hypothesis: the data are not normally distributed
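A sketch of the Shapiro-Wilk test, assuming SciPy is installed (`scipy.stats.shapiro`). The two simulated datasets, one normal and one exponential (skewed), are illustrative.

```python
import random
from scipy import stats

rng = random.Random(0)
normal_data = [rng.gauss(0, 1) for _ in range(200)]
skewed_data = [rng.expovariate(1.0) for _ in range(200)]

for name, data in [("normal", normal_data), ("skewed", skewed_data)]:
    result = stats.shapiro(data)   # H0: the data are normally distributed
    verdict = "reject H0" if result.pvalue < 0.05 else "fail to reject H0"
    print(f"{name}: W={result.statistic:.3f}, p={result.pvalue:.3g} -> {verdict}")
```

A large p-value only means there is no evidence against normality; it does not prove the data are normal.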
T-test & p value
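The p-value of a two-sample t-test is the probability, assuming the two population means are equal (H0), of observing a difference between the sample means at least as large as the one found. It is not the probability that the two samples come from the same population.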
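A sketch of a two-sample (Student's, pooled-variance) t-test, computing the statistic by hand and checking it against SciPy's `scipy.stats.ttest_ind`, which assumes equal variances by default. SciPy availability and the group values are assumptions.

```python
import math
from statistics import mean, variance
from scipy import stats

def pooled_t_statistic(a, b):
    # Student's two-sample t with pooled variance (assumes equal population variances)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]   # hypothetical measurements
group_b = [4.2, 4.8, 4.5, 4.9, 4.4, 4.6]

t_manual = pooled_t_statistic(group_a, group_b)
result = stats.ttest_ind(group_a, group_b)  # equal_var=True by default
```

The p-value in `result.pvalue` is small here, so H0 (equal means) would be rejected at α = 0.05.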
z test and t test definition
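• H0: the two means come from the same population (no difference between them)
• H1: the two means come from different populations
• z test: used when the population standard deviation is known (or n is large)
• t test: used when the standard deviation is estimated from the sample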
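A minimal two-sample z test sketch with the standard library, assuming both population standard deviations are known (here 0.4, an illustrative value). If σ had to be estimated from the samples instead, a t test would be the appropriate choice.

```python
import math
from statistics import NormalDist, mean

def two_sample_z_test(a, b, sigma_a, sigma_b):
    # H0: mu_a == mu_b; population standard deviations assumed known
    se = math.sqrt(sigma_a**2 / len(a) + sigma_b**2 / len(b))
    z = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p

group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]     # hypothetical data
group_b = [4.2, 4.8, 4.5, 4.9, 4.4, 4.6]
z, p = two_sample_z_test(group_a, group_b, sigma_a=0.4, sigma_b=0.4)
```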
Type I Error
– the result is statistically significant even though the null hypothesis is true (a false positive)
– e.g., a diagnosis of cancer (“positive”) for a healthy subject
Solution: lower α, e.g. from 5% to 1% (this in turn increases the risk of Type II errors)
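A simulation sketch: draw many samples where H0 is actually true (mean 0, σ = 1), run a two-sided z test on each, and count the false rejections. The rate should land near α, and lowering α from 5% to 1% lowers it. The function name and settings (2000 simulations, n = 30, seed) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha, n_sims=2000, n=30, seed=1):
    # every simulated sample comes from a population where H0 (mean == 0) is TRUE
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / math.sqrt(n))  # sigma = 1 is known
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / n_sims

rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
```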
Type II Error
– the result is NOT significant even though the null hypothesis is false (a false negative)
– e.g., a diagnosis of healthy for a subject who has cancer
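A sketch of the Type II error rate β for a two-sided one-sample z test, computed directly from the normal distribution. It assumes σ is known and the true mean differs from the null value by `effect`; the numbers (effect 0.5, σ = 1, n = 20) are illustrative. It shows the trade-off: lowering α raises β, and power = 1 − β.

```python
import math
from statistics import NormalDist

def type_ii_error_rate(effect, sigma, n, alpha):
    # beta: probability of NOT rejecting H0 when the true mean is mu0 + effect
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / math.sqrt(n))   # where the z statistic is centred under H1
    # probability the statistic falls inside the non-rejection region [-z_crit, z_crit]
    return std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)

beta_05 = type_ii_error_rate(effect=0.5, sigma=1.0, n=20, alpha=0.05)  # about 0.39
beta_01 = type_ii_error_rate(effect=0.5, sigma=1.0, n=20, alpha=0.01)  # about 0.63
```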
Sensitivity and Specificity
• Sensitivity (power): proportion of positives that a test correctly identifies as positive
• Specificity: proportion of negatives that a test correctly identifies as negative
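A minimal sketch computing both measures from confusion-matrix counts. The screening numbers (100 patients with cancer, 900 without) are hypothetical. False negatives correspond to Type II errors and false positives to Type I errors.

```python
def sensitivity(tp, fn):
    # proportion of actual positives the test flags as positive
    return tp / (tp + fn)

def specificity(tn, fp):
    # proportion of actual negatives the test flags as negative
    return tn / (tn + fp)

# hypothetical screening results
tp, fn = 90, 10    # 100 patients with cancer
tn, fp = 855, 45   # 900 patients without cancer

sens = sensitivity(tp, fn)   # 0.9
spec = specificity(tn, fp)   # 0.95
```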