11 Flashcards

Question 1

Q

What is a confidence interval, and how is it interpreted?

Answer

A

A confidence interval provides a range of plausible values for a population parameter (e.g., mean) at a specified confidence level (e.g., 95%). The confidence level describes the procedure, not a single interval: if the study were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.
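The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the true mean, standard deviation, sample size, and the helper names `mean_ci_95` and `coverage` are assumptions made for the example, and the interval uses the normal critical value 1.96 rather than a t quantile.

```python
import math
import random
import statistics


def mean_ci_95(sample):
    """Approximate 95% interval for the mean (normal critical value 1.96)."""
    centre = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sd=2.0, n=50, repeats=2000, seed=0):
    """Fraction of simulated studies whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = mean_ci_95(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    # Expected to land near 0.95, slightly lower because 1.96 is used
    # in place of the t critical value.
    print(coverage())
```

Each simulated study yields a different interval; the 95% refers to how often such intervals capture the fixed true mean.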

Question 2

Q

When might nonparametric methods be preferable to parametric methods?

Answer

A

When the data do not meet parametric assumptions (e.g., normality, equal variances), when the sample is too small to check those assumptions reliably, or when the variables are ordinal or categorical.

Question 3

Q

Give an example of a nonparametric test and its basic application.

Answer

A

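The Wilcoxon rank-sum test (Mann–Whitney U) compares two independent groups using the ranks of the pooled observations, testing whether values in one group tend to be larger than in the other. It is suited to skewed or ordinal data.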
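As a rough illustration of what the test statistic measures, the sketch below computes U by direct pairwise comparison. The function name `mann_whitney_u` and the two small samples are made up for this example; it returns only the statistic, and a p-value would normally come from a statistics library such as `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b.

    Ties contribute 0.5 each, following the usual convention.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical measurements, invented for illustration.
treated = [12, 15, 11, 19, 14]
control = [8, 9, 13, 7, 10]

u_treated = mann_whitney_u(treated, control)
u_control = mann_whitney_u(control, treated)

# The two U values always sum to len(treated) * len(control).
print(u_treated, u_control)
```

A U value close to the maximum (the product of the two sample sizes) means almost every value in the first group exceeds almost every value in the second.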

Question 4

Q

Explain the purpose of analysis of variance (ANOVA).

Answer

A

ANOVA tests whether there are statistically significant differences among the means of three or more independent groups by comparing between-group variability to within-group variability.
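The between-group versus within-group comparison can be sketched as an F ratio computed by hand. The function name `one_way_f` and the three groups are assumptions for illustration; the sketch stops at the F statistic and does not compute a p-value.

```python
from statistics import fmean


def one_way_f(groups):
    """F ratio for one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = fmean(all_values)

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_f(groups)
print(f_ratio)
```

A large F ratio indicates that the group means differ by more than the scatter within groups would explain.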

Question 5

Q

What is a chi-squared test used for?

Answer

A

To assess whether observed frequencies in categorical data differ from expected frequencies under a null hypothesis of no association or no difference.
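For a contingency table, the comparison of observed and expected frequencies can be sketched as follows. The function name `chi_squared_statistic` and the 2x2 counts are invented for the example; no continuity correction is applied, and the p-value lookup is left out.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts.

    Expected counts assume no association between rows and columns:
    expected = row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are two groups, columns are two outcomes.
observed = [[20, 30], [30, 20]]
statistic = chi_squared_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(statistic, degrees_of_freedom)
```

The statistic is compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom.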

Question 6

Q

How does logistic regression differ from linear regression?

Answer

A

Logistic regression models the probability of a binary outcome using a logit link, whereas linear regression predicts a continuous outcome using a linear relationship.
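The difference in what each model outputs can be sketched with one shared linear predictor. The coefficients and function names below are hypothetical and chosen only to illustrate the logit link; no model is actually fitted here.

```python
import math


def linear_prediction(x, intercept, slope):
    """Linear regression: the prediction is the linear predictor itself."""
    return intercept + slope * x


def logistic_probability(x, intercept, slope):
    """Logistic regression: the linear predictor is the log-odds,
    mapped to a probability in (0, 1) by the inverse logit."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Illustrative coefficients, not estimated from any data.
intercept, slope = -1.0, 0.8
for x in (-3, 0, 3):
    p = logistic_probability(x, intercept, slope)
    # logit(p) recovers the linear predictor, up to rounding error.
    print(x, linear_prediction(x, intercept, slope), p, logit(p))
```

The linear prediction is unbounded, while the logistic output always stays strictly between 0 and 1.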

Question 7

Q

Why is it important to assess residuals after fitting a regression model?

Answer

A

Checking residuals helps verify model assumptions (such as linearity, normality of errors, homoscedasticity, and independence) and identify potential outliers or influential observations.
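A minimal sketch of a residual check for a straight-line fit is given below. The data are invented, with one deliberately planted outlier, and the helper names `fit_line` and `residuals` are assumptions for the example. A fuller check would also plot residuals against fitted values.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data; the last point is a planted outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]

res = residuals(xs, ys)
largest = max(range(len(res)), key=lambda i: abs(res[i]))
print(res)
print("largest residual at index", largest)
```

Note that the outlier also pulls the fitted line toward itself, so the remaining residuals show a systematic pattern rather than random scatter.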

Question 8

Q

Define overfitting in the context of model building.

Answer

A

Overfitting occurs when a model captures noise or random fluctuations in the training data, performing well on that data but poorly on new, unseen data.

Question 9

Q

What strategies can help prevent overfitting?

Answer

A

Use simpler models, cross-validation, regularization techniques (e.g., Lasso, Ridge), and ensure adequate sample size relative to the number of predictors.
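To illustrate how regularization restrains a model, the sketch below applies a Ridge penalty to a single-predictor regression, where the slope has the closed form S_xy / (S_xx + penalty) once the intercept is left unpenalized. The data, the penalty values, and the name `ridge_slope` are assumptions made for this example.

```python
from statistics import fmean


def ridge_slope(xs, ys, penalty):
    """Ridge slope for one predictor with an unpenalized intercept.

    penalty = 0 reproduces ordinary least squares; larger values
    shrink the slope toward zero.
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / (s_xx + penalty)


# Hypothetical data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

for penalty in (0.0, 1.0, 10.0):
    print(penalty, round(ridge_slope(xs, ys, penalty), 3))
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.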

Question 10

Q

Why should descriptive plots always accompany numerical summaries in data analysis?

Answer

A

Because visualizations (e.g., histograms, scatterplots) reveal patterns, outliers, and distribution shapes that numerical summaries alone might obscure.

(10 cards)