Stats Part 3 Flashcards
(26 cards)
What is the chi-square test used for?
To assess whether observed categorical frequencies differ from expected frequencies.
What are the assumptions of a chi-square test?
Expected frequency of at least 5 in each cell (a common rule of thumb), independent observations, categorical data.
What is the chi-square statistic?
The sum, over all categories (cells), of each squared difference between observed and expected frequency divided by that expected frequency: χ² = Σ (O - E)² / E.
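A minimal Python sketch of the statistic, computed term by term over the categories. The function name `chi_square_statistic` and the counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 100 observations over 4 categories,
# compared against a uniform expectation of 25 per category.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 12.32
```

For a goodness-of-fit test with k categories and no estimated parameters, the statistic is compared with a χ² distribution with k - 1 degrees of freedom (3 here) to obtain a p-value.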
When would you use a chi-square test of independence?
To test whether two categorical variables are associated (the null hypothesis is that they are independent).
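A sketch of the test of independence on a contingency table, assuming the usual expected counts under independence, E = (row total × column total) / grand total, and (r - 1)(c - 1) degrees of freedom. The function name and the 2 × 2 table are hypothetical, and no continuity correction is applied.

```python
def independence_chi_square(table):
    """Chi-square test of independence on an r x c contingency table.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence: E_ij = row_total_i * col_total_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical 2 x 2 table: rows are groups A/B, columns are yes/no
table = [[10, 20],
         [30, 40]]
stat, df, expected = independence_chi_square(table)
print(expected)            # [[12.0, 18.0], [28.0, 42.0]]
print(round(stat, 4), df)  # 0.7937 1
```

Returning the expected counts makes it easy to check the "at least 5 per cell" rule of thumb before trusting the p-value.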
What is ANOVA?
Analysis of variance, used to compare means across three or more groups.
What is the null hypothesis in ANOVA?
That all group means are equal.
What does a significant F-statistic in ANOVA suggest?
At least one group mean differs from the others.
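The F-statistic is the between-group mean square divided by the within-group mean square; a large F means group means are spread out relative to the noise inside groups. A minimal one-way ANOVA sketch with hypothetical data (the p-value would come from an F distribution with k - 1 and n - k degrees of freedom, not computed here):

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
print(one_way_anova_f(groups))  # (27.0, 2, 6)
```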
What are assumptions of ANOVA?
Normally distributed residuals (within each group), homogeneity of variances across groups, independent observations.
What is the multiple comparisons problem?
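Running many hypothesis tests increases the chance of at least one false positive (the family-wise Type I error rate), even when each individual test uses α = 0.05.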
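A quick calculation shows how fast the problem grows, assuming m independent tests where every null hypothesis is true: P(at least one false positive) = 1 - (1 - α)^m. The function name is illustrative.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 20):
    print(m, round(family_wise_error_rate(0.05, m), 3))
# prints 1 0.05, then 5 0.226, then 20 0.642 (one line each)
```

With 20 tests at α = 0.05, the chance of at least one false positive is about 64%.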
What is the Bonferroni correction?
A method that controls the family-wise Type I error rate by testing each of m hypotheses at α / m (equivalently, multiplying each p-value by m, capped at 1, and comparing it with α).
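A minimal sketch of the Bonferroni correction: each p-value is multiplied by the number of tests m (capped at 1) and compared with α, which is the same as comparing the raw p-value with α / m. The function name and p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under the Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    # Equivalent to checking each raw p-value against alpha / m
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values from 4 tests
adjusted, reject = bonferroni(raw)
print(reject)  # [True, False, False, False]
```

Only p = 0.01 survives: the per-test threshold is 0.05 / 4 = 0.0125. The correction is simple but conservative when m is large.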
What is linear regression?
A model that describes the relationship between a dependent variable and one or more predictors.
What is the slope coefficient in linear regression?
It represents the expected change in the outcome for a one-unit change in the predictor, holding any other predictors constant.
What does the intercept mean in linear regression?
The expected value of the outcome when all predictors are zero.
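For a single predictor, ordinary least squares gives slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. A minimal sketch with hypothetical data lying exactly on y = 1 + 2x, so the recovered coefficients are easy to check:

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                  # expected change in y per one-unit change in x
    intercept = y_bar - slope * x_bar  # expected y when x = 0
    return intercept, slope


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # hypothetical data on the line y = 1 + 2x
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)  # 1.0 2.0
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the slope and intercept for this single-predictor case.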
What is R-squared?
The proportion of variance in the outcome explained by the model.
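R² can be computed as 1 - SS_res / SS_tot, the share of the total variation around the mean that the fitted values account for. This reading as "proportion explained" assumes an OLS fit with an intercept, evaluated on the data it was fitted to. The data below are hypothetical; 0.5 + 0.8x is their least-squares line.

```python
from statistics import fmean


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation around the mean
    return 1 - ss_res / ss_tot


# Hypothetical data and its least-squares line y_hat = 0.5 + 0.8x
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
y_hat = [0.5 + 0.8 * xi for xi in x]
print(round(r_squared(y, y_hat), 2))  # 0.64
```

With one predictor, R² equals the squared Pearson correlation between x and y (here 0.8² = 0.64).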
What are the assumptions of linear regression?
Linearity, independence, homoscedasticity, normality of residuals.
What is logistic regression?
A model used to predict the probability of a binary outcome.
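Logistic regression models the log-odds as a linear function of the predictors and maps it to a probability with the logistic (sigmoid) function, P(y = 1 | x) = 1 / (1 + e^-(b0 + b1·x)). A sketch of the prediction step only; the coefficients are hypothetical, not fitted to any data.

```python
import math


def predict_probability(x, b0, b1):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    z = b0 + b1 * x  # linear predictor on the log-odds scale
    # Two algebraically equal forms, chosen by sign to avoid overflow in exp()
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1 + exp_z)


b0, b1 = -3.0, 0.5  # hypothetical coefficients
for x in (0, 6, 20):
    print(x, round(predict_probability(x, b0, b1), 3))
# prints 0 0.047, then 6 0.5, then 20 0.999 (one line each)
```

Fitting b0 and b1 (by maximum likelihood) is a separate step not shown here.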
What is an odds ratio?
The odds of an event in one group divided by the odds of the same event in another group.
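A sketch computing an odds ratio from hypothetical 2 × 2 counts; odds are events divided by non-events within each group. The function name and counts are illustrative.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b


# Hypothetical counts: group A has 20 events / 80 non-events, group B has 10 / 90
print(round(odds_ratio(20, 80, 10, 90), 2))  # 2.25
```

An odds ratio of 1 means equal odds in both groups. In logistic regression, exp(b1) is the odds ratio for a one-unit increase in the predictor.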
Why can’t we use linear regression for binary outcomes?
Because it can predict probabilities outside the [0, 1] range.
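A small demonstration, assuming hypothetical 0/1 outcomes: a straight line fitted by least squares to binary data extrapolates to "probabilities" below 0 and above 1.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


x = [0, 1, 2, 3]
y = [0, 0, 1, 1]  # hypothetical binary outcome
b0, b1 = fit_simple_ols(x, y)
for new_x in (0, 5):
    print(new_x, round(b0 + b1 * new_x, 2))
# prints 0 -0.1, then 5 1.9 (one line each)
```

Both predictions are impossible as probabilities, which is what the logistic link avoids.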
What is bootstrapping?
A resampling method that draws repeated samples from the data with replacement.
What is permutation testing?
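A method that repeatedly shuffles the group labels to build the null distribution of a test statistic; it avoids distributional assumptions such as normality but still assumes the observations are exchangeable under the null hypothesis.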
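A minimal sketch of a two-sided permutation test for a difference in means, using random shuffles rather than all possible relabelings. The function name, number of permutations, seed, and data are hypothetical choices; the "+1" in the p-value is a common convention so the estimate is never exactly zero.

```python
import random
from statistics import fmean


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so ties with the observed value count despite rounding
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


a = [10, 11, 12, 13, 14]  # hypothetical group A
b = [1, 2, 3, 4, 5]       # hypothetical group B
diff, p = permutation_test(a, b)
print(diff, p)
```

Here only the two most extreme of the 252 possible splits match the observed gap, so the p-value lands near 2/252.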
Why use bootstrapping?
To estimate confidence intervals and standard errors when the sampling distribution of the statistic is unknown or hard to derive.
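A sketch of the bootstrap for the sample mean: draw n values from the data with replacement, recompute the mean, and repeat. The standard deviation of those resampled means estimates the standard error, and their 2.5th and 97.5th percentiles give a percentile interval (one of several bootstrap interval methods). The resample count, seed, and data are hypothetical.

```python
import random
from statistics import fmean, stdev


def bootstrap_mean(data, n_resamples=2000, seed=0):
    """Bootstrap standard error and 95% percentile interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from the data with replacement
    means = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    se = stdev(means)
    lower = means[round(0.025 * (n_resamples - 1))]
    upper = means[round(0.975 * (n_resamples - 1))]
    return se, (lower, upper)


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # hypothetical sample
se, (lower, upper) = bootstrap_mean(data)
print(round(se, 3), round(lower, 2), round(upper, 2))
```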
What is a Monte Carlo simulation?
A technique using repeated random sampling to model uncertainty in a process.
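A small Monte Carlo sketch, estimating P(sum of two fair dice ≥ 10) by simulation. The exact answer (6/36) is known here, which makes it easy to check the simulation; in practice the technique is used when no closed form is available. The error of such an estimate shrinks roughly as 1/√(number of trials). Function name, trial count, and seed are hypothetical.

```python
import random


def estimate_probability(trials=100_000, seed=0):
    """Monte Carlo estimate of P(sum of two fair dice >= 10)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if rng.randint(1, 6) + rng.randint(1, 6) >= 10
    )
    return hits / trials


estimate = estimate_probability()
exact = 6 / 36  # (4,6), (5,5), (6,4), (5,6), (6,5), (6,6)
print(round(estimate, 3), round(exact, 3))
```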
When is simulation useful?
When analytical solutions are hard or impossible to derive, or when exploring complex systems.
What is the Shapiro-Wilk test?
A hypothesis test of whether a sample comes from a normally distributed population; the null hypothesis is that the data are normal.