Miscellaneous Flashcards
(34 cards)
central limit theorem
with large enough samples, the means of many repeated samples are approximately normally distributed, even if the underlying data are not normally distributed
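A minimal simulation sketch of the card above, assuming an exponential (skewed) population with mean 1; the sample counts and the function name are illustrative choices, not part of the original deck. The sample means should cluster around 1 with a spread near 1/sqrt(50).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def sample_means(n_samples, sample_size):
    """Draw many samples from a skewed (exponential) population and return their means."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(n_samples=2000, sample_size=50)
center = statistics.fmean(means)  # should sit near the population mean of 1.0
spread = statistics.stdev(means)  # should sit near 1 / sqrt(50), roughly 0.14
print(center, spread)
```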
alpha
- how much overlap you are willing to tolerate
- your tolerance for making a type 1 error
- the higher your alpha, the more likely you are to make a type 1 error
- typically 5%
- can be spread over 2 tails or 1 tail (2 is stricter than 1)
p value
- probability of getting a result at least as extreme as yours if the null hypothesis is true (i.e., by chance alone)
- want P to be less than or equal to alpha
- the smaller your P value, the less compatible your results are with chance alone
generalizability/external validity
- the degree to which you can extrapolate results from your sample to the population
- the more exclusive/tighter your inclusion criteria for subjects are, the lower the generalizability
- the looser your criteria for subjects are, the higher your generalizability
paired test
- each subject gets the intervention you are testing
- measure value before and after intervention (each subject acts as their own control)
Does standard error apply to population or sample?
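population (SE describes how precisely the sample mean estimates the population mean; SD describes the spread within the sample)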
what does 95% confidence interval mean?
- if you repeated the study many times, about 95% of the intervals built as mean + or - 2 SE (more precisely 1.96 SE) would contain the true population mean
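A short sketch of the interval calculation, assuming the normal approximation (1.96 SE); for small samples a t-based multiplier would be wider. The data values and the function name are hypothetical.

```python
import math
import statistics

def confidence_interval_95(values):
    """Return (low, high) for mean +/- 1.96 * SE, using the normal approximation."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))  # SE = SD / sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical measurements, for illustration only (their mean is 5.0)
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0]
low, high = confidence_interval_95(data)
print(low, high)
```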
5 steps of hypothesis testing
1) establish hypothesis (Ho, A=B)
2) establish alpha (usually 5%)
3) do the STAT MAGIC
4) compare your p value to alpha
5) reject or fail to reject Ho
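The five steps can be sketched in code. This example swaps in a permutation test for step 3, which is an assumption made to keep the example dependency-free; the deck does not name a specific test here. The data and the function name are hypothetical.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # add-one correction keeps the estimated p value above zero
    return (hits + 1) / (n_perm + 1)

# Step 1: Ho is that groups A and B have the same mean (hypothetical data)
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
group_b = [5.9, 6.1, 5.8, 6.2, 6.0, 5.7, 6.3, 6.0]
alpha = 0.05                                      # Step 2: set alpha
p_value = permutation_p_value(group_a, group_b)   # Step 3: run the test
reject_h0 = p_value <= alpha                      # Steps 4 and 5: compare and decide
print(p_value, reject_h0)
```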
a false positive is a type ___ error
1 (alpha)
a false negative is a type ___ error
2 (beta)
when is the likelihood of making a type 1 error at its lowest?
the first time you analyze your data (each additional analysis increases your chance of a type 1 error)
If you have an intervention that you believe will only slightly differ from the placebo, would you use a large or small sample?
large
- if the difference between intervention and placebo is very small, then you want the dispersion to be narrow enough to minimize the overlap
- SE = SD / sqrt(n), so if you increase n (sample size), you decrease the dispersion of your estimate of the population mean, and therefore decrease the overlap
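A quick check of SE = SD / sqrt(n), with a hypothetical SD of 10: quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sd, n))  # SE shrinks as n grows
```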
describe the x and y variables of the chi-squared/Fisher’s exact test.
- categorical x variable (can have multiple)
- categorical y variable (can have multiple)
describe the x and y variables of the correlation test.
- continuous x variable
- continuous y variable
- ALL CONTINUOUS VARIABLES MUST HAVE NORMAL DISTRIBUTION*
describe the x and y variables of the t-test and ANOVA tests
- categorical x variable (2 groups for t-test, 3+ groups for ANOVA)
- continuous y variable
- ALL CONTINUOUS VARIABLES MUST HAVE NORMAL DISTRIBUTION*
if your continuous variable does not follow normal distribution, what type of analysis should you use? what kind of graph do you use? what is the statistical test?
- survival analysis
- Kaplan-Meier curve
- log-rank test
what are two examples of dependent variables that are never normally distributed?
- death (survival)
- length of stay in hospital
bonferroni correction
if you are doing multiple analyses, you are increasing the risk of a type 1 error, so to correct it, divide alpha by the number of analyses you are doing and use that as your new alpha value
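A small sketch of the correction. The family-wise error formula assumes the analyses are independent, which is a simplifying assumption not stated on the card; the function names are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """New per-test alpha: the original alpha divided by the number of analyses."""
    return alpha / n_tests

def familywise_error(alpha, n_tests):
    """Chance of at least one type 1 error across n_tests independent analyses."""
    return 1 - (1 - alpha) ** n_tests

uncorrected = familywise_error(0.05, 10)            # roughly 0.40 with no correction
corrected_alpha = bonferroni_alpha(0.05, 10)        # 0.005 per analysis
corrected = familywise_error(corrected_alpha, 10)   # back under 0.05 overall
print(uncorrected, corrected_alpha, corrected)
```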
What sample size would be needed if there were a large variance in the outcome variable?
large sample size
What sample size would be needed if the investigator wants to have a study result that is extremely close to the true population mean?
large sample size
What sample size would be needed if the difference that the investigator wanted to be able to detect is extremely large?
small sample size
Explain: High tolerance for type I error results in small sample size
- a higher alpha lowers the bar for calling a difference significant, so fewer subjects are needed to detect the same difference
- sample size itself does not raise the risk of a type 1 error; that risk is set by alpha (and goes up with repeated analyses)
Explain: Smaller α results in larger sample size in order to detect a difference
- smaller alpha means lower tolerance for type 1 error, so the bar for significance is higher and you need a larger sample to detect the same difference
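The sample size cards above can be checked against the usual normal-approximation formula for comparing two means, n per group = 2 * ((z_alpha + z_beta) * SD / difference)^2. This is a sketch that assumes a two-sided test, equal group sizes, and 80% power; none of these settings come from the deck, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect `difference` between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # value corresponding to the chosen power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

baseline = n_per_group(sd=10, difference=5)                      # 63 per group
more_variance = n_per_group(sd=20, difference=5)                 # larger variance -> larger n
bigger_difference = n_per_group(sd=10, difference=10)            # larger difference -> smaller n
stricter_alpha = n_per_group(sd=10, difference=5, alpha=0.01)    # smaller alpha -> larger n
print(baseline, more_variance, bigger_difference, stricter_alpha)
```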
Explain: Paired data results in smaller sample size and more data points
- paired data gives two data points for each subject (before and after), so you need fewer subjects to get the same number of data points as an unpaired experiment