Stat Flashcards

1
Q

ANOVA (Analysis of Variance)

A

One-way ANOVA:
Dot plots or violin plots are excellent for visualizing data and group means, especially when there are multiple groups.
Example: comparing the average test scores of students from three different schools (one factor: school, with three levels: school 1, school 2, school 3).
Researchers could use one-way ANOVA to see if three different drugs have significantly different effects on reducing blood pressure.

Two-way ANOVA:
Column dot plots or violin plots can be used to show data points, group means, and how much the groups overlap.
Example: comparing the average test scores of students from three different schools, while also considering whether the students are male or female (two factors: school and gender).
Researchers could use two-way ANOVA to determine if there are significant differences in BMI when considering both diet and exercise levels.

An interval plot shows each group's mean together with its confidence interval (CI).
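The one-way case above can be sketched by computing the F statistic by hand. This is a minimal illustration using only the Python standard library; the function name `one_way_anova_f` and the test scores are made up for the example. It stops at the F statistic: turning F into a p-value requires the F distribution, which a statistics library would normally supply.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical test scores from three schools (made-up numbers).
school_1 = [70, 72, 74]
school_2 = [80, 82, 84]
school_3 = [90, 92, 94]

f_stat, df_b, df_w = one_way_anova_f([school_1, school_2, school_3])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by much more than the scatter within each group would suggest.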

2
Q

Box and whisker plots

A

Display the median, quartiles, range, and potential outliers of a dataset.

Typical uses in a clinical trial:

Compare the distribution of an outcome variable (e.g., blood pressure, pain level) across treatment groups.

Compare data across subgroups, such as age, gender, or disease severity.

Compare the distribution of adverse event rates across treatment groups, helping to identify potential side effects.
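The quantities a box plot draws can be computed directly. The sketch below assumes the common 1.5 × IQR convention for flagging outliers and the "inclusive" quartile method from Python's `statistics` module; other software may use slightly different quartile rules. The function name `box_plot_summary` and the blood pressure readings are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the numbers a box-and-whisker plot is built from."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # lowest value that is not an outlier
        "whisker_high": inside[-1],  # highest value that is not an outlier
        "outliers": outliers,
    }

# Hypothetical systolic blood pressure readings for one treatment group.
readings = [110, 115, 118, 120, 122, 125, 128, 130, 135, 180]
summary = box_plot_summary(readings)
print(summary)
```

Running the same function on each treatment group gives the side-by-side numbers that a grouped box plot displays.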

3
Q

parameter vs statistic

A

A parameter describes a characteristic of an entire population, while a statistic describes a characteristic of a sample drawn from that population.
A sample statistic is a point estimator of the corresponding population parameter.
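A small simulation can make the distinction concrete. The population below is synthetic (simulated heights with an assumed mean of 170 and standard deviation of 10), so its mean can be computed exactly, which is rarely possible in real studies.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population: 10,000 simulated heights in cm (assumed values).
population = [random.gauss(170, 10) for _ in range(10_000)]
mu = mean(population)      # parameter: describes the whole population

# Draw one random sample of 50 people from that population.
sample = random.sample(population, 50)
x_bar = mean(sample)       # statistic: describes the sample only

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The sample mean lands near the population mean but not exactly on it, and a different sample would give a different value. That sampling variation is what a confidence interval quantifies.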

4
Q

Confidence interval

A

A range of values, computed from a sample, that is likely to contain a population parameter (such as the population mean).

A 95% CI means that if the sampling were repeated many times, about 95% of the resulting intervals would include the true population mean.
Example: a 95% CI for the mean height of students is [165 cm, 175 cm].
Interpretation: “We are 95% confident that the true average height of all students is between 165 cm and 175 cm.”

What affects the width of the interval:
Confidence level: higher confidence → wider CI (more certainty but less precision).
Sample size (n): larger n → smaller standard error (SE) → narrower CI.
Population variability (σ): more variability → wider CI.
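The effect of the confidence level on width can be checked numerically. This sketch uses a normal (z) critical value for simplicity, which is an assumption; for small samples a t critical value is usually more appropriate and gives a somewhat wider interval. The function name `mean_confidence_interval` and the heights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for the mean using a z critical value (assumption)."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error shrinks as n grows
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar - z * se, x_bar + z * se

# Hypothetical student heights in cm.
heights = [162, 165, 168, 170, 171, 173, 175, 176, 178, 182]

ci95 = mean_confidence_interval(heights, 0.95)
ci99 = mean_confidence_interval(heights, 0.99)
print(f"95% CI: [{ci95[0]:.1f}, {ci95[1]:.1f}]")
print(f"99% CI: [{ci99[0]:.1f}, {ci99[1]:.1f}]")
```

The 99% interval comes out wider than the 95% interval for the same data, matching the rule that higher confidence costs precision.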

5
Q

Significance Level (α)

A

The threshold used in hypothesis testing to decide whether to reject the null hypothesis (H0). It is the maximum probability of a Type I error (rejecting H0 when it is true) that the researcher is willing to accept.
α = 0.05 (5%) → Most common in research (default threshold).
Role in Hypothesis Testing
Compare α to the p-value:

If p-value ≤ α → Reject H0 (result is “statistically significant”).
If p-value > α → Fail to reject H0 (result is “not significant”).

Example with p-value = 0.03:
α = 0.05: 0.03 ≤ 0.05 → Reject H0
α = 0.01: 0.03 > 0.01 → Fail to reject H0

Error type                     | Definition                            | Probability
Type I error (false positive)  | Rejecting H0 when it is true          | α (controlled by the researcher)
Type II error (false negative) | Failing to reject H0 when it is false | β (depends on sample size and effect size)
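That α is the long-run Type I error rate can be illustrated by simulation. In the sketch below H0 is true by construction (every sample is drawn from a normal distribution with mean 0 and a known standard deviation of 1), so every rejection is a false positive. The sample size, number of trials, and seed are arbitrary choices for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA = 0.05
N = 30          # observations per simulated study (arbitrary)
TRIALS = 2000   # number of simulated studies (arbitrary)

# Two-sided critical value for a z-test at level ALPHA.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

false_positives = 0
for _ in range(TRIALS):
    # H0 is true here: the real mean is 0 and sigma is known to be 1.
    sample = [random.gauss(0, 1) for _ in range(N)]
    z = mean(sample) / (1 / sqrt(N))
    if abs(z) > z_crit:
        false_positives += 1  # rejected a true H0: a Type I error

type_1_rate = false_positives / TRIALS
print(f"observed Type I error rate: {type_1_rate:.3f} (alpha = {ALPHA})")
```

The observed rate should sit close to α, with some run-to-run variation because the number of trials is finite.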

α relates to the confidence level (CL): CL=1−α
If α=0.05, then CL = 95%.
If α=0.01, then CL = 99%.

Example: Drug Effectiveness Test
Null Hypothesis (H0): New drug has no effect.
Alternative (H1): New drug is effective.

Set α = 0.05.
If p-value = 0.02:
Since 0.02 ≤ 0.05, reject H0 → the drug's effect is statistically significant.

Rule
p-value: the probability of the observed data (or more extreme) if H0 is true.
Decision rule: reject H0 if p ≤ α.
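The decision rule is simple enough to write as a function. The sketch below pairs it with a two-sided z-test p-value, which assumes a known population standard deviation; the function names `two_sided_p_value` and `decide` and the numbers in the last example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """p-value of a two-sided z-test (assumes sigma is known)."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 if p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# The same p-value leads to different decisions at different alpha levels.
print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))

# Hypothetical study: mean reduction of 5 units, sigma = 10, n = 16.
p = two_sided_p_value(sample_mean=5, null_mean=0, sigma=10, n=16)
print(f"p = {p:.4f} -> {decide(p)}")
```

Choosing α before looking at the data matters, since the first two calls show that the threshold alone can change the conclusion.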

6
Q

Multiple linear regression

A

To compare two sample groups on several dependent variables, fit a separate regression model for each dependent variable and include a categorical variable that identifies the sample group. Then compare the coefficient of the group variable across the models to see whether the group has a significantly different effect on each dependent variable.

  1. Define Dependent Variables:
    Identify the multiple variables you want to predict (e.g., blood pressure, cholesterol levels, heart rate).
  2. Create a Grouping Variable:
    Create a categorical variable to represent the two samples (e.g., Group A vs. Group B).
  3. Separate Regression Models:
    For each dependent variable, run a separate multiple linear regression model.
  4. Include the Grouping Variable:
    In each regression model, include the grouping variable as one of the independent variables.
  5. Interpret Coefficients:
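Steps 1 to 4 above can be sketched with ordinary least squares solved through the normal equations, using only the standard library. Everything here is illustrative: the helper names `solve_linear_system` and `fit_ols` are hypothetical, and the data are made up and noise-free so the fitted coefficients are easy to check. The sketch returns coefficients only; standard errors and significance tests would normally come from a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Step 2: grouping variable (0 = Group A, 1 = Group B) plus one covariate (age).
predictors = [(0, 30), (0, 40), (0, 50), (1, 35), (1, 45), (1, 55)]

# Step 1: several dependent variables (made-up, noise-free values).
outcomes = {
    "blood_pressure": [100 + 5 * g + 0.5 * age for g, age in predictors],
    "heart_rate": [60 + 2 * g + 0.1 * age for g, age in predictors],
}

# Steps 3 and 4: one model per dependent variable, each including the group.
coefs = {name: fit_ols(predictors, y) for name, y in outcomes.items()}
for name, (intercept, group, age) in coefs.items():
    print(f"{name}: intercept={intercept:.2f}, group={group:.2f}, age={age:.2f}")
```

The coefficient on the grouping variable in each model estimates the difference between Group B and Group A for that outcome, holding age fixed.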