11 Flashcards
(10 cards)
What is a confidence interval, and how is it interpreted?
A confidence interval provides a range of plausible values for a population parameter (e.g., mean) at a specified confidence level (e.g., 95%). The level describes the procedure, not any single interval: if the study were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.
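The repeated-sampling reading can be checked with a small simulation. The sketch below is illustrative only: `mean_ci` and `coverage` are hypothetical helper names, the interval uses a large-sample normal approximation (a t-based interval would be more appropriate for small samples), and the population values are made up.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level=0.95
    m = mean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return m - half_width, m + half_width


def coverage(true_mean=10.0, sd=2.0, n=50, repeats=1000, seed=0):
    """Fraction of repeated-study intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_ci(sample)
        hits += low <= true_mean <= high
    return hits / repeats
```

Calling `coverage()` should return a fraction close to 0.95: most intervals contain the true mean, but any single interval either does or does not.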
When might nonparametric methods be preferable to parametric methods?
When data do not meet parametric assumptions (e.g., normality, equal variances), when samples are too small to check those assumptions reliably, or when variables are ordinal or categorical.
Give an example of a nonparametric test and its basic application.
The Wilcoxon rank-sum test (Mann–Whitney U) compares the central tendency of two independent groups when data are skewed or ordinal.
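A minimal sketch of the Mann-Whitney U statistic, using its pairwise-comparison definition. The function name `mann_whitney_u` is hypothetical, and the sketch returns only the statistic; a p-value would need the null distribution of U, which is not computed here.

```python
def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j.

    Tied pairs contribute 0.5, which matches the rank-based
    definition with average ranks.
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

U ranges from 0 to `len(x) * len(y)`. Values near either extreme suggest one group tends to have larger values than the other, while values near the midpoint suggest little separation.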
Explain the purpose of analysis of variance (ANOVA).
ANOVA tests whether there are statistically significant differences among the means of three or more independent groups by comparing between-group variability to within-group variability.
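The between-group versus within-group comparison can be sketched as a one-way F statistic. The function name `anova_f` and the example data are illustrative assumptions; the sketch computes only F, not the p-value.

```python
from statistics import mean


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For the made-up groups `[1, 2, 3]`, `[2, 3, 4]`, and `[5, 6, 7]`, the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13. A large F means the group means differ by more than the within-group scatter would explain.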
What is a chi-squared test used for?
To assess whether observed frequencies in categorical data differ from expected frequencies under a null hypothesis of no association or no difference.
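As a rough illustration for a test of independence, the sketch below computes the chi-squared statistic from a contingency table. The function name `chi_squared_statistic` and the table values are hypothetical; converting the statistic to a p-value requires the chi-squared distribution, which is omitted here.

```python
def chi_squared_statistic(table):
    """Chi-squared statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof
```

For the made-up table `[[10, 20], [20, 10]]`, every expected count is 15, so the statistic is 4 × 25/15 ≈ 6.67 with 1 degree of freedom.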
How does logistic regression differ from linear regression?
Logistic regression models the probability of a binary outcome using a logit link, whereas linear regression predicts a continuous outcome using a linear relationship.
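The contrast can be sketched with the same linear predictor used two ways. The function names and coefficient values below are illustrative assumptions, not fitted estimates.

```python
import math


def linear_prediction(x, intercept, slope):
    """Linear regression: the linear predictor is the predicted outcome itself."""
    return intercept + slope * x


def logistic_probability(x, intercept, slope):
    """Logistic regression: the linear predictor is the log-odds (logit) of the outcome."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(p):
    """Log-odds of a probability; the inverse of the logistic function."""
    return math.log(p / (1.0 - p))
```

With `intercept=0` and `slope=1`, `linear_prediction(10, 0, 1)` is unbounded and returns 10, whereas `logistic_probability(10, 0, 1)` stays strictly between 0 and 1. Applying `logit` to that probability recovers the linear predictor.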
Why is it important to assess residuals after fitting a regression model?
Checking residuals helps verify model assumptions—such as normality, homoscedasticity, and independence—and identify potential outliers or influential observations.
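A minimal sketch of computing residuals from a simple least-squares line, as a starting point for the checks described above. The names `fit_line` and `residuals` are hypothetical, and the sketch covers only one predictor.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def residuals(x, y):
    """Observed minus fitted values."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]
```

Plotting these residuals against the fitted values or against `x` is a common next step: a funnel shape hints at non-constant variance, a curve hints at a missed nonlinear trend, and isolated large values flag potential outliers.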
Define overfitting in the context of model building.
Overfitting occurs when a model captures noise or random fluctuations in the training data, performing well on that data but poorly on new, unseen data.
What strategies can help prevent overfitting?
Use simpler models, evaluate with cross-validation, apply regularization (e.g., Lasso, Ridge), and ensure an adequate sample size relative to the number of predictors.
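Cross-validation rests on splitting the data so that every observation is held out exactly once. The sketch below shows one way to generate k-fold index splits; `k_fold_indices` is a hypothetical helper, and model fitting and scoring are left out.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n observations."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A model would be fit on each `train` list and scored on the matching `test` list. A large gap between training and held-out performance is a typical sign of overfitting.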
Why should descriptive plots always accompany numerical summaries in data analysis?
Because visualizations (e.g., histograms, scatterplots) reveal patterns, outliers, and distribution shapes that numerical summaries alone might obscure.