11 Flashcards

(10 cards)

1
Q

What is a confidence interval, and how is it interpreted?

A

A confidence interval is a range of plausible values for a population parameter (e.g., a mean), computed at a stated confidence level (e.g., 95%). The level describes the procedure, not any single interval: if the study were repeated many times, about 95% of the intervals constructed this way would contain the true, fixed parameter.
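The repeated-sampling interpretation can be checked by simulation. Below is a minimal sketch. It assumes normally distributed data with a known population SD, so the simple z-interval applies. The population values, sample size, seed, and the helper names `z_interval` and `coverage` are illustrative, not from the source.

```python
import math
import random
import statistics


def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population SD `sigma` is known."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    center = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return center - half_width, center + half_width


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=2000, seed=1):
    """Share of simulated repeat studies whose interval contains the fixed true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        if low <= true_mean <= high:
            hits += 1
    return hits / reps


print(coverage())  # close to 0.95; the exact value depends on the seed
```

When the SD is unknown, a t-interval is the usual choice. The z-interval only keeps this sketch within the standard library.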

2
Q

When might nonparametric methods be preferable to parametric methods?

A

When data violate parametric assumptions (e.g., normality, equal variances), when sample sizes are too small to check those assumptions reliably, or when variables are ordinal or categorical.
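One reason median- and rank-based methods suit non-normal data is their robustness to extreme values. The small illustration below uses made-up numbers (not from the source): the largest value in the second list is replaced by an outlier.

```python
import statistics


def summarize(data):
    """Return (mean, median) of a list of numbers."""
    return statistics.fmean(data), statistics.median(data)


# Hypothetical measurements; the second list swaps the largest value for an extreme outlier.
clean = [3.1, 2.8, 3.5, 2.9, 3.3]
with_outlier = [3.1, 2.8, 350.0, 2.9, 3.3]

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    mean, median = summarize(data)
    print(f"{label}: mean={mean:.2f}, median={median}")
```

The mean jumps from 3.12 to 72.42 while the median stays at 3.1. Rank-based tests behave similarly, because the outlier keeps the same rank (largest) no matter how extreme it is.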

3
Q

Give an example of a nonparametric test and its basic application.

A

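The Wilcoxon rank-sum test (equivalent to the Mann–Whitney U test) compares two independent groups when data are skewed or ordinal. It tests whether values in one group tend to be larger than values in the other; if both distributions have the same shape, this amounts to comparing their medians.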
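Below is a minimal sketch of the U statistic, which counts pairwise "wins" (equivalent to the rank-sum formulation). The p-value uses a large-sample normal approximation. That approximation, the omission of tie and continuity corrections, and the sample data are all assumptions for illustration. For small samples, exact methods or a library routine such as SciPy's `scipy.stats.mannwhitneyu` are more appropriate.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """U for sample x: number of pairs with x_i > y_j, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_p(x, y):
    """Return (U, two-sided p) using the normal approximation to U
    (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical skewed measurements from two independent groups.
a = [1.2, 3.4, 2.2, 5.1, 4.0]
b = [6.3, 7.1, 5.5, 8.0, 6.8]
print(mann_whitney_p(a, b))
```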

4
Q

Explain the purpose of analysis of variance (ANOVA).

A

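ANOVA tests whether there are statistically significant differences among the means of three or more independent groups. It compares between-group variability to within-group variability through the F statistic; a large F suggests that at least one group mean differs, without saying which.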
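The variance comparison can be written out directly. The sketch below computes the one-way ANOVA F statistic and its degrees of freedom from hypothetical data. Turning F into a p-value requires the F distribution, which Python's standard library does not provide; SciPy's `scipy.stats.f_oneway` runs the full test.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three independent groups.
print(one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11]))  # F is about 19 on (2, 6) df
```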

5
Q

What is a chi-squared test used for?

A

To assess whether observed frequencies in categorical data differ from expected frequencies under a null hypothesis of no association or no difference.
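Below is a sketch of Pearson's chi-squared test of independence for a contingency table, using hypothetical counts. Expected counts come from the row and column totals. No continuity correction is applied, so results can differ from tools that apply Yates' correction to 2×2 tables by default. The p-value shortcut shown holds only for one degree of freedom.

```python
import math
from statistics import NormalDist


def chi_square_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency
    table given as a list of rows of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2x2 table: rows = exposure (yes/no), columns = outcome (yes/no).
chi2, df = chi_square_independence([[20, 30], [30, 20]])
# Only for df = 1: a chi-squared variable is a squared standard normal,
# so its upper-tail p-value is a two-sided normal tail at sqrt(chi2).
p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2))) if df == 1 else None
print(chi2, df, round(p, 4))
```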

6
Q

How does logistic regression differ from linear regression?

A

Logistic regression models the probability of a binary outcome using a logit link, whereas linear regression predicts a continuous outcome using a linear relationship.
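The sketch below fits both models to the same hypothetical binary outcome to make the contrast concrete. The straight line is fitted by least squares. The logistic model is fitted by plain gradient ascent on the log-likelihood, a simple stand-in for the iterative methods statistical software typically uses. The data, learning rate, and step count are assumptions.

```python
import math

# Hypothetical data: hours studied (x) and pass/fail outcome (y, 1 = pass).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sigmoid(t):
    """Inverse logit: maps any real log-odds value to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-t))


def fit_logistic(xs, ys, lr=0.01, steps=20000):
    """Maximum likelihood for logit(p) = a + b*x via plain gradient ascent."""
    a = b = 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(residuals)
        b += lr * sum(r * x for r, x in zip(residuals, xs))
    return a, b


lin_a, lin_b = fit_linear(xs, ys)
log_a, log_b = fit_logistic(xs, ys)
for x in (1, 5, 10):
    print(x, "linear:", round(lin_a + lin_b * x, 3), "logistic:", round(sigmoid(log_a + log_b * x), 3))
```

The straight line predicts values below 0 and above 1 at the ends of the range. The logistic predictions stay strictly between 0 and 1.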

7
Q

Why is it important to assess residuals after fitting a regression model?

A

Checking residuals helps verify model assumptions (such as linearity, normality, homoscedasticity, and independence) and identify potential outliers or influential observations.
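Residual plots (residuals against fitted values, Q-Q plots) are the usual check. A numeric complement is to flag residuals that are large relative to the residual standard error. The rough sketch below uses hypothetical data and a rule-of-thumb cutoff of 2. Studentized residuals and leverage measures are more rigorous, partly because an outlier inflates the residual standard error used here.

```python
import math


def ols_residuals(xs, ys):
    """Fit y = a + b*x by least squares and return (fitted values, residuals)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    return fitted, [y - f for y, f in zip(ys, fitted)]


def flag_large_residuals(residuals, n_params=2, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard errors
    (a rule of thumb, not a formal outlier test)."""
    dof = len(residuals) - n_params
    s = math.sqrt(sum(r * r for r in residuals) / dof)
    return [i for i, r in enumerate(residuals) if abs(r) / s > threshold]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 25.0, 14.2, 15.9]  # hypothetical; the point at x = 6 is an outlier
fitted, residuals = ols_residuals(xs, ys)
print(flag_large_residuals(residuals))  # [5], the observation at x = 6
```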

8
Q

Define overfitting in the context of model building.

A

Overfitting occurs when a model captures noise or random fluctuations in the training data, performing well on that data but poorly on new, unseen data.
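Here is a small deterministic illustration with hypothetical data. A degree-4 polynomial through five noisy training points (Lagrange interpolation) fits them exactly, while a straight line does not. On new points from the underlying trend, the ranking reverses.

```python
def lagrange_predict(train_x, train_y, x):
    """Evaluate the polynomial of degree len(train_x) - 1 through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        basis = 1.0  # Lagrange basis l_i(x): equals 1 at xi and 0 at the other nodes
        for j, xj in enumerate(train_x):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def linear_fit(train_x, train_y):
    """Least-squares straight line; returns a prediction function."""
    n = len(train_x)
    x_bar, y_bar = sum(train_x) / n, sum(train_y) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(train_x, train_y)) / sum(
        (x - x_bar) ** 2 for x in train_x
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mse(predict, xs, ys):
    """Mean squared prediction error."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: trend y = x, with training noise alternating +0.5 / -0.5.
train_x = [0, 1, 2, 3, 4]
train_y = [0.5, 0.5, 2.5, 2.5, 4.5]
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [0.5, 1.5, 2.5, 3.5]  # new, noise-free points from the same trend


def polynomial(x):
    return lagrange_predict(train_x, train_y, x)


line = linear_fit(train_x, train_y)
for name, model in [("degree-4 polynomial", polynomial), ("straight line", line)]:
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, test MSE={mse(model, test_x, test_y):.3f}")
```

In this example, the polynomial has zero training error but a test MSE of about 0.35. The line has a training MSE of 0.24 and a test MSE of 0.01.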

9
Q

What strategies can help prevent overfitting?

A

Use simpler models, apply regularization (e.g., Lasso, Ridge), use cross-validation to estimate performance on unseen data and guide model choice, and ensure an adequate sample size relative to the number of predictors.
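Two of these strategies combine naturally in a short sketch: ridge regularization on a single predictor, with k-fold cross-validation to choose the penalty. The data, candidate penalties, and fold assignment (by index modulo k) are hypothetical choices. With only one predictor, overfitting is mild, so this shows the mechanics rather than a realistic benefit.

```python
def ridge_fit(xs, ys, lam):
    """One-predictor ridge regression: minimizes squared error plus lam * slope**2
    (intercept unpenalized), which gives slope = Sxy / (Sxx + lam)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return y_bar - slope * x_bar, slope


def k_fold_mse(xs, ys, lam, k=5):
    """Mean squared error on held-out points, with each point's fold = index % k."""
    squared_errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        intercept, slope = ridge_fit([x for x, _ in train], [y for _, y in train], lam)
        squared_errors.extend((intercept + slope * x - y) ** 2 for x, y in test)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data and candidate penalties.
xs = list(range(10))
ys = [1.0, 2.9, 2.1, 4.8, 3.9, 6.2, 5.1, 7.9, 7.0, 9.1]
penalties = [0.0, 1.0, 10.0, 100.0]

scores = {lam: k_fold_mse(xs, ys, lam) for lam in penalties}
best = min(scores, key=scores.get)
print(scores, "best penalty:", best)
print("slope at each penalty:", [round(ridge_fit(xs, ys, lam)[1], 3) for lam in penalties])
```

Larger penalties shrink the slope toward zero. Cross-validation then picks the penalty with the lowest held-out error rather than the best in-sample fit.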

10
Q

Why should descriptive plots always accompany numerical summaries in data analysis?

A

Because visualizations (e.g., histograms, scatterplots) reveal patterns, outliers, and distribution shapes that numerical summaries alone might obscure.
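A compact demonstration uses two made-up datasets with identical mean and standard deviation but different shapes. Crude text histograms stand in for plots.

```python
from collections import Counter
from statistics import fmean, pstdev


def text_histogram(values):
    """One row per distinct value, with one '*' per occurrence."""
    counts = Counter(values)
    return "\n".join(f"{v:>3} | {'*' * counts[v]}" for v in sorted(counts))


# Hypothetical datasets: same mean (5) and population SD (2), different shapes.
one_peak = [2, 4, 4, 4, 5, 5, 7, 9]
two_clusters = [3, 3, 3, 3, 7, 7, 7, 7]

for name, data in [("one peak", one_peak), ("two clusters", two_clusters)]:
    print(f"{name}: mean={fmean(data)}, SD={pstdev(data)}")
    print(text_histogram(data))
```

Both datasets report mean=5.0 and SD=2.0. Only the histograms reveal that one has a single peak and the other two separated clusters.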
