flashcard 5

Question

What does kurtosis describe in a distribution?

Answer 1

Kurtosis measures the “tailedness” of a distribution compared to a normal distribution, indicating the propensity for outliers. It is often described as “peakedness”, but it is driven mainly by the tails rather than by the height of the peak.
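
As a rough illustration, excess kurtosis can be computed from the second and fourth central moments. The sketch below uses the population (biased) formula and two made-up data sets; the function name and the data are assumptions for illustration only.

```python
def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Hypothetical data: no tails at all versus a few extreme values.
light_tails = [-1, -1, 1, 1]
heavy_tails = [0] * 8 + [-10, 10]

light = excess_kurtosis(light_tails)  # negative: lighter tails than normal
heavy = excess_kurtosis(heavy_tails)  # positive: heavier tails than normal
```

A positive value suggests heavier tails than a normal distribution and a negative value lighter tails.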

Answer 2

Because measures of central tendency (location) alone don’t convey the spread or variability; dispersion measures (e.g., range, interquartile range, standard deviation) describe how data are distributed around the center.

Answer 3

A contingency table (cross-tabulation) displays frequencies of two categorical variables simultaneously, facilitating analysis of associations or interactions.

Answer 4

A statistic that quantifies the strength and direction of a linear relationship between two continuous variables, ranging from –1 (perfect negative) to +1 (perfect positive).

Answer 5

When the relationship is non-linear, when outliers heavily influence the data, or when the variables are not measured on an interval/ratio scale.

Answer 6

Spearman correlation assesses the strength of a monotonic relationship using ranked data; it is robust to outliers and appropriate for ordinal data or non-linear (but monotonic) associations.
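
One way to see the difference from Pearson correlation is to compute both on data that are monotonic but not linear. The sketch below treats Spearman correlation as Pearson correlation applied to ranks (with average ranks for ties); the helper names and the cubic example data are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]        # monotonic but clearly non-linear
r_pearson = pearson(x, y)      # below 1 because the relationship is curved
r_spearman = spearman(x, y)    # 1 because the ordering is perfectly preserved
```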

Answer 7

Use a histogram to visualize the detailed shape of a distribution (e.g., modality, skewness). Use a boxplot to summarize a distribution by its median and quartiles and to flag outliers.

Answer 8

The box spans the interquartile range (Q1 to Q3) with a line at the median; each whisker extends to the most extreme data value within 1.5×IQR of the box edge (below Q1 or above Q3); points beyond the whiskers are plotted individually as potential outliers.
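
The pieces of a boxplot can be computed by hand, as in the sketch below. It assumes the 1.5×IQR rule and the "inclusive" quartile method from Python's statistics module; plotting software may use a slightly different quartile definition. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(data):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme value still inside the fence
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Here the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 is reported as an outlier.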

Answer 9

Larger sample sizes reduce the standard error, making estimates more precise. Greater variability in data increases uncertainty around estimates, requiring larger samples to achieve the same precision.
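
For a sample mean, the relationship can be sketched with the usual formula SE = s / sqrt(n). The numbers below are made up for illustration.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return sample_sd / math.sqrt(n)

se_small_sample = standard_error(10.0, 25)    # 2.0
se_large_sample = standard_error(10.0, 100)   # 1.0: four times the data halves the SE
se_more_spread = standard_error(20.0, 100)    # 2.0: doubling the SD doubles the SE
```

In this example, data with twice the standard deviation need four times the sample size to reach the same standard error.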

Answer 10

Random sampling minimizes selection bias and makes the sample representative of the broader population on average, allowing valid inferences and generalizations.

Answer 11

A type I error is incorrectly rejecting a true null hypothesis (false positive). A type II error is failing to reject a false null hypothesis (false negative).

Answer 12

The significance level (α) is the probability threshold for rejecting the null hypothesis; setting α = 0.05 means there is a 5% chance of making a type I error when the null hypothesis is true.
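
A small simulation can make this concrete: when the null hypothesis is true, a test at α = 0.05 should reject in roughly 5% of repeated studies. The sketch below assumes a two-sided z-test with known standard deviation 1; the function name, sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=20, trials=2000, seed=0):
    """Fraction of simulated studies that reject a TRUE null hypothesis (mean = 0)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # null is true here
        z = fmean(sample) * n ** 0.5                      # known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()   # expected to land close to 0.05
```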

Answer 13

Statistical power is the probability of correctly rejecting a false null hypothesis (1 − type II error rate). It increases with larger sample size, larger effect size, and lower data variability.
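
For a simple case, power can be computed directly. The sketch below assumes a two-sided one-sample z-test with known standard deviation, where effect_size is the true mean shift divided by that standard deviation (so lower variability means a larger standardized effect). The function name is hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect size."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5   # how far the test statistic moves under the alternative
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

power_base = z_test_power(0.5, 32)
power_more_data = z_test_power(0.5, 64)     # larger sample
power_bigger_effect = z_test_power(0.8, 32) # larger effect, or same shift with less variability
```

With an effect size of zero the function returns α, since every rejection is then a type I error.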

Answer 14

The p-value is the probability of observing data at least as extreme as what was collected, assuming the null hypothesis is true.
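
A coin-flip example shows the "as extreme or more extreme" part of the definition. The sketch below computes an exact one-sided binomial p-value; the scenario (9 heads in 10 flips of a supposedly fair coin) and the function name are made up for illustration.

```python
from math import comb

def binomial_upper_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null), i.e. under the null."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Outcomes at least as extreme as 9 heads are 9 heads and 10 heads.
p_value = binomial_upper_p_value(9, 10)   # (10 + 1) / 1024
```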

Answer 15

A result can be statistically significant (unlikely to arise by chance alone if the null hypothesis were true) but have a trivial or clinically irrelevant effect size, so one must assess real-world impact.

Answer 16

A one-sided test assesses deviation in a specific direction (e.g., greater than); a two-sided test assesses deviations in both directions (greater or less than).

Answer 17

A confidence interval provides a range of plausible values for a population parameter (e.g., mean) with a specified confidence level (e.g., 95%). The confidence level describes the procedure: if the study were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.
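
The repeated-sampling reading of a confidence level can be checked by simulation: build an interval in each of many simulated studies and count how often it contains the true mean. The sketch below assumes normally distributed data with known standard deviation (a z-interval); the parameter values, function name, and seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(true_mean=10.0, sigma=2.0, n=30, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials

coverage = ci_coverage()   # expected to land close to 0.95
```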

Answer 18

When data do not meet parametric assumptions (e.g., normality, equal variances), have small sample sizes, or include ordinal/categorical variables.

Answer 19

The Wilcoxon rank-sum test (Mann–Whitney U) compares the central tendency of two independent groups when data are skewed or ordinal.
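
The U statistic behind the test can be sketched by counting, over all cross-group pairs, how often a value from one group exceeds a value from the other (ties count one half). This shows only the statistic, not the p-value; the function name and data are hypothetical.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): pairwise 'wins' for each group, ties counted as 0.5."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a   # the two statistics always sum to len(a) * len(b)
    return u_a, u_b

separated = mann_whitney_u([1, 2, 3], [4, 5, 6])   # (0.0, 9.0): no overlap at all
mixed = mann_whitney_u([1, 4, 6], [2, 3, 5])       # (5.0, 4.0): groups interleave
```

A U value near zero for one group indicates that its values tend to be smaller than the other group's.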

Answer 20

ANOVA tests whether there are statistically significant differences among the means of three or more independent groups by comparing between-group variability to within-group variability.
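
The comparison of between-group and within-group variability is summarized by the F statistic. A minimal sketch for one-way ANOVA follows; the three small groups are made-up data and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])   # 13.0 for this data
```

A large F means the group means differ by more than the spread within groups would suggest.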

Answer 21

To assess whether observed frequencies in categorical data differ from expected frequencies under a null hypothesis of no association or no difference.
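
For a contingency table, the statistic can be sketched as below: expected counts come from the row and column totals under the assumption of no association, and the statistic sums (observed − expected)² / expected over the cells. The table values and function name are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

chi2, dof = chi_square_statistic([[10, 20], [20, 10]])   # every expected count is 15
```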

Answer 22

Logistic regression models the probability of a binary outcome using a logit link, whereas linear regression predicts a continuous outcome using a linear relationship.
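
The logit link can be illustrated by the prediction step alone: the linear predictor gives the log-odds, and the inverse logit converts it to a probability between 0 and 1. The coefficients below are invented for illustration, not fitted to any data.

```python
import math

def predicted_probability(x, intercept, slope):
    """Inverse logit of the linear predictor intercept + slope * x."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients: log-odds = -1 + 0.5 * x
p_low = predicted_probability(0, -1.0, 0.5)
p_mid = predicted_probability(2, -1.0, 0.5)    # log-odds of 0 gives probability 0.5
p_high = predicted_probability(10, -1.0, 0.5)
```

A linear regression would instead return intercept + slope * x directly, which is not constrained to the range from 0 to 1.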

Answer 23

Checking residuals helps verify model assumptions (such as normality, homoscedasticity, and independence) and identify potential outliers or influential observations.

Answer 24

Overfitting occurs when a model captures noise or random fluctuations in the training data, performing well on that data but poorly on new, unseen data.

Answer 25

Use simpler models, cross-validation, regularization techniques (e.g., Lasso, Ridge), and ensure adequate sample size relative to the number of predictors.
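
The shrinking effect of regularization can be shown in the simplest case. The sketch below assumes Ridge regression with a single predictor and an unpenalized intercept, where the slope has the closed form Sxy / (Sxx + penalty); the data and function name are made up for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Ridge slope for one predictor: Sxy / (Sxx + penalty); penalty = 0 gives OLS."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sxx + penalty)

x = [-2, -1, 0, 1, 2]
y = [-3.9, -2.1, 0.2, 1.8, 4.0]

ols_slope = ridge_slope(x, y, penalty=0.0)      # ordinary least squares
shrunk_slope = ridge_slope(x, y, penalty=10.0)  # pulled toward zero by the penalty
```

A larger penalty shrinks the coefficient further, trading some bias for lower variance.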

Answer 26

Because visualizations (e.g., histograms, scatterplots) reveal patterns, outliers, and distribution shapes that numerical summaries alone might obscure.


(50 cards)