Stat Flashcards
(8 cards)
ANOVA (Analysis of Variance)
One-way ANOVA:
Dot plots or violin plots are excellent for visualizing data and group means, especially when there are multiple groups.
Ex - Comparing the average test scores of students from three different schools (one factor: school, with three levels: school 1, school 2, school 3).
Researchers could use one-way ANOVA to see if three different drugs have significantly different effects on reducing blood pressure.
Two-way ANOVA:
Column dot plots or violin plots can be used to show individual data points, group means, and how much the groups overlap.
Comparing the average test scores of students from three different schools, while also considering whether the students are male or female (two factors: school and gender).
Researchers could use two-way ANOVA to determine if there are significant differences in BMI when considering both diet and exercise levels.
Interval plots can display each group mean with its confidence interval.
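As an illustration, here is a minimal one-way ANOVA sketch in Python using scipy; the three groups of test scores are hypothetical placeholders. A two-way design (e.g., school and gender) would typically be fit as a linear model instead, for example with statsmodels' ols and anova_lm.

```python
from scipy import stats

# Hypothetical test scores from three schools (one factor, three levels)
school_1 = [78, 85, 90, 72, 88]
school_2 = [81, 79, 95, 84, 77]
school_3 = [69, 74, 80, 71, 76]

# One-way ANOVA: tests whether the group means differ significantly
f_stat, p_value = stats.f_oneway(school_1, school_2, school_3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```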
Box and whisker plots
Display the median, quartiles, range, and potential outliers within a dataset.
Common uses:
- Compare the distribution of an outcome variable (e.g., blood pressure, pain level) across different treatment groups in a clinical trial.
- Compare data across different subgroups within a clinical trial, such as age, gender, or disease severity.
- Compare the distribution of adverse event rates across treatment groups, helping to identify potential side effects.
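A minimal matplotlib sketch of side-by-side box plots for comparing an outcome across treatment groups; the blood-pressure values and group names are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical systolic blood pressure values for two treatment groups
placebo = [120, 135, 128, 142, 131, 125, 138]
treatment = [118, 122, 115, 130, 124, 119, 127]

# One box per group: median, quartiles, whiskers, and outliers are drawn automatically
plt.boxplot([placebo, treatment])
plt.xticks([1, 2], ["Placebo", "Treatment"])
plt.ylabel("Systolic blood pressure (mmHg)")
plt.title("Outcome distribution by treatment group")
plt.show()
```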
parameter vs statistic
A parameter describes a characteristic of an entire population, while a statistic describes a characteristic of a sample drawn from that population.
A sample statistic is a point estimator for the corresponding population parameter.
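A tiny simulation sketch of the distinction, assuming a hypothetical population of heights generated with numpy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of 100,000 heights (cm)
population = rng.normal(loc=170, scale=10, size=100_000)

# One random sample drawn from that population
sample = rng.choice(population, size=50, replace=False)

print("Population mean (parameter):", population.mean())
print("Sample mean (statistic, point estimate):", sample.mean())
```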
Confidence interval
A range of values, computed from sample data, that is likely to contain the population parameter (such as the population mean).
A 95% CI means that if we drew many samples and built an interval from each, about 95 out of every 100 such intervals would include the true population mean.
95% CI for the mean height of students as [165 cm, 175 cm].
Interpretation: “We are 95% confident that the true average height of all students is between 165 cm and 175 cm.”
Factors affecting CI width:
Sample Size (n): Larger n → Smaller SE → Narrower CI.
Population Variability (σ): More variability → Wider CI.
Confidence Level: Higher confidence → Wider CI (more certainty but less precision).
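A minimal sketch of computing a 95% t-based confidence interval for a mean in Python; the height measurements are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical student heights (cm)
heights = np.array([168, 172, 165, 175, 170, 169, 173, 171])

mean = heights.mean()
sem = stats.sem(heights)  # standard error of the mean

# 95% CI using the t distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, len(heights) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: [{ci_low:.1f}, {ci_high:.1f}]")
```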
Significance Level (α)
is a threshold used in hypothesis testing to determine whether to reject the null hypothesis (H0). It represents the maximum probability of making a Type I error (falsely rejecting H0 when it’s true).
α = 0.05 (5%) → Most common in research (default threshold).
Role in Hypothesis Testing
Compare α to the p-value:
If p-value ≤ α → Reject H0 (result is “statistically significant”).
If p-value > α → Fail to reject H0 (result is “not significant”).
Example: p-value = 0.03
If α = 0.05 → Reject H0 (0.03 ≤ 0.05).
If α = 0.01 → Fail to reject H0 (0.03 > 0.01).
| Error Type | Definition | Probability |
| --- | --- | --- |
| Type I Error (False Positive) | Rejecting H0 when it is true | α (controlled by the researcher) |
| Type II Error (False Negative) | Failing to reject H0 when it is false | β (depends on sample size and effect size) |
α relates to the confidence level (CL): CL=1−α
If α=0.05, then CL = 95%.
If α=0.01, then CL = 99%.
Example: Drug Effectiveness Test
Null Hypothesis (H0): New drug has no effect.
Alternative (H1): New drug is effective.
Set α = 0.05.
If p-value = 0.02:
Since 0.02 ≤ 0.05, reject H0 → the drug’s effect is statistically significant.
Rule
p-value = probability of the observed data (or something more extreme) if H0 is true.
Decision rule: Reject H0 if p ≤ α.
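A minimal sketch of the decision rule applied to a two-sample t-test in scipy; the blood-pressure reductions and the group labels are hypothetical.

```python
from scipy import stats

alpha = 0.05

# Hypothetical blood-pressure reductions (mmHg)
drug = [8.1, 6.4, 9.0, 7.2, 5.8, 8.5]
placebo = [2.0, 3.1, 1.4, 2.8, 0.9, 2.2]

# Two-sample t-test comparing mean reduction between groups
t_stat, p_value = stats.ttest_ind(drug, placebo)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject H0 (statistically significant)")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject H0")
```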
Multiple linear regression
To compare two samples across several dependent variables, create a separate regression model for each dependent variable and include a categorical variable representing the sample group. Comparing the coefficient of the group variable across the models shows whether the group has a significantly different effect on each outcome.
- Define Dependent Variables: Identify the multiple variables you want to predict (e.g., blood pressure, cholesterol levels, heart rate).
- Create a Grouping Variable: Create a categorical variable to represent the two samples (e.g., Group A vs. Group B).
- Separate Regression Models: For each dependent variable, run a separate multiple linear regression model.
- Include the Grouping Variable: In each regression model, include the grouping variable as one of the independent variables.
- Interpret Coefficients: Examine the coefficient (and p-value) of the grouping variable in each model to see whether the group effect is significant for that outcome (see the sketch below).
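A minimal sketch of this workflow using the statsmodels formula API; the column names (bp, chol, group, age) and the data frame are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: two groups, one covariate, two dependent variables
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"] * 5,
    "age":   [34, 45, 52, 38, 61, 47] * 5,
    "bp":    [128, 135, 142, 122, 150, 130] * 5,
    "chol":  [190, 210, 230, 180, 240, 200] * 5,
})

# One regression per dependent variable, each including the grouping variable
for outcome in ["bp", "chol"]:
    model = smf.ols(f"{outcome} ~ C(group) + age", data=df).fit()
    # C(group)[T.B] is the estimated effect of Group B relative to Group A
    coef = model.params["C(group)[T.B]"]
    pval = model.pvalues["C(group)[T.B]"]
    print(f"{outcome}: group effect = {coef:.2f}, p = {pval:.3f}")
```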
Non-parametric tests
- Common Non-Parametric Tests & When to Use Them
| Test | Purpose | Clinical Example |
| --- | --- | --- |
| Mann-Whitney U Test | Compare two independent groups | Comparing pain scores (VAS) between two treatment groups |
| Wilcoxon Signed-Rank Test | Compare two related (paired) groups | Pre- vs. post-treatment blood pressure in the same patients |
| Kruskal-Wallis Test | Compare three or more independent groups | Comparing HbA1c levels across three diabetic treatment regimens |
| Friedman Test | Compare three or more related groups | Repeated pain measurements (baseline, 1 month, 3 months) |
| Spearman’s Rank Correlation | Assess monotonic relationships | Correlation between age and disease severity score |
| Chi-Square Test | Test independence in categorical data | Association between smoking (yes/no) and lung cancer (yes/no) |
| Fisher’s Exact Test | Small-sample alternative to Chi-Square | Comparing rare adverse events between drug and placebo groups |
| McNemar’s Test | Compare paired categorical data | Change in diagnosis (positive/negative) |
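As one worked example from the table above, a Mann-Whitney U test in scipy on hypothetical VAS pain scores from two independent treatment groups.

```python
from scipy import stats

# Hypothetical VAS pain scores (0-10) for two independent treatment groups
group_a = [3, 5, 4, 6, 2, 5, 4]
group_b = [6, 7, 5, 8, 6, 7, 9]

# Mann-Whitney U test: compares the two distributions without assuming normality
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```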
- When to Use the Chi-Square Test?
✅ Comparing proportions between groups (e.g., drug A vs. drug B).
✅ Testing independence (e.g., is recovery rate independent of treatment type?).
✅ Binary or categorical outcomes (e.g., “recovered” vs. “not recovered”).
Types of Chi-Square Tests
Pearson’s Chi-Square Test (for large sample sizes, expected counts ≥5).
Fisher’s Exact Test (for small sample sizes, expected counts <5).
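A minimal sketch of both tests in scipy on a hypothetical 2x2 table of recovery counts by drug; Fisher's exact test is shown alongside for the small-sample case.

```python
from scipy import stats

# Hypothetical counts:        recovered  not recovered
table = [[30, 10],   # drug A
         [22, 18]]   # drug B

# Pearson's chi-square test of independence (large samples, expected counts >= 5)
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2 = {chi2:.2f}, p = {p:.3f}")

# Fisher's exact test (preferred when expected counts are < 5)
odds_ratio, p_exact = stats.fisher_exact(table)
print(f"Fisher's exact: p = {p_exact:.3f}")
```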