10 Flashcards
(20 cards)
Why might one choose the median over the mean as a measure of central tendency?
Because the median is less sensitive to extreme values or skewed data, providing a better “central” value when outliers are present.
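A small sketch of this effect, using made-up salary figures (the data are hypothetical and chosen only to include one extreme value):

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an extreme outlier.
salaries = [42, 45, 47, 50, 52, 400]

mean_salary = mean(salaries)      # pulled upward by the single outlier
median_salary = median(salaries)  # midpoint of the two middle sorted values

print(f"mean = {mean_salary}, median = {median_salary}")
```

Here the mean (106) sits above five of the six values, while the median (48.5) stays near the bulk of the data.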
How do you interpret the standard deviation of a dataset?
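Standard deviation measures the typical distance of data points from the mean (formally, the square root of the average squared deviation), reflecting the spread or variability in the dataset. It is expressed in the same units as the data.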
What is the standard error of the mean (SEM), and how does it differ from standard deviation?
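SEM is the standard deviation of the sampling distribution of the mean: it estimates how much sample means would vary if the same experiment were repeated, and is computed as the sample standard deviation divided by √n. Standard deviation, by contrast, measures the variability of individual observations around the sample mean.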
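The distinction can be illustrated with a short sketch on an invented five-point sample; the usual estimate SEM = s/√n is assumed:

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample of five measurements.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
n = len(sample)

sd = stdev(sample)   # sample standard deviation (n - 1 in the denominator)
sem = sd / sqrt(n)   # estimated standard error of the mean

print(f"SD = {sd:.4f}, SEM = {sem:.4f}")
```

The SD describes how far individual measurements fall from the sample mean; the SEM is smaller because it describes the uncertainty in the mean itself.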
Define skewness and explain what positive or negative skew indicates about a distribution’s shape.
Skewness measures asymmetry of a distribution. Positive skew (right-tailed) indicates a long tail on the higher side; negative skew (left-tailed) indicates a long tail on the lower side.
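As a rough illustration, the sketch below computes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second). This is one of several definitions in use, and the helper name `skewness` and the data are assumptions made for the example:

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 20]       # long tail on the higher side
left_tailed = [-x for x in right_tailed]    # mirror image: tail on the lower side
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
print(skewness(symmetric))     # zero
```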
What does kurtosis describe in a distribution?
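Kurtosis measures the “tailedness” of a distribution relative to a normal distribution: higher kurtosis indicates heavier tails and a greater propensity for outliers. It is often described as “peakedness,” but it mainly reflects tail weight rather than the height of the peak.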
Why is it important to assess both location and dispersion when summarizing data?
Because measures of central tendency (location) alone don’t convey the spread or variability; dispersion measures (e.g., range, interquartile range, standard deviation) describe how data are distributed around the center.
What is a contingency table, and when is it used?
A contingency table (cross-tabulation) displays frequencies of two categorical variables simultaneously, facilitating analysis of associations or interactions.
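A minimal sketch of building such a table from raw records; the category names and observations are hypothetical:

```python
from collections import Counter

# Hypothetical paired observations: (exposure, outcome).
records = [
    ("smoker", "disease"),
    ("smoker", "healthy"),
    ("non-smoker", "healthy"),
    ("non-smoker", "healthy"),
    ("smoker", "disease"),
    ("non-smoker", "disease"),
]

counts = Counter(records)                # frequency of each (row, column) pair
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Nested dict: table[row][column] -> frequency (0 for unseen combinations).
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, table[r])
```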
Define the Pearson correlation coefficient (r).
A statistic that quantifies the strength and direction of a linear relationship between two continuous variables, ranging from –1 (perfect negative) to +1 (perfect positive).
Under what circumstances might Pearson correlation be misleading?
When the relationship is non-linear, when outliers heavily influence the data, or when the variables are not measured on an interval/ratio scale.
What is Spearman rank correlation, and why is it used instead of Pearson correlation in some cases?
Spearman correlation assesses the strength of a monotonic relationship using ranked data; it is less sensitive to outliers than Pearson correlation and is appropriate for ordinal data or non-linear (but monotonic) associations.
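The contrast can be sketched on a monotonic but non-linear relationship (y = x³). The helper functions are written out for illustration, and the ranking helper assumes there are no tied values:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic, but not linear

print(f"Pearson  r   = {pearson(x, y):.3f}")   # below 1
print(f"Spearman rho = {spearman(x, y):.3f}")  # 1: the ordering is preserved
```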
Describe when you would use a histogram versus a boxplot to explore a dataset.
Use a histogram to visualize the detailed shape of a distribution (e.g., modality, skewness). Use a boxplot to summarize a distribution in terms of its median and quartiles and to flag potential outliers.
In a boxplot, what do the box, whiskers, and outliers represent?
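The box spans the interquartile range (Q1 to Q3) with a line at the median; under the common Tukey convention, the whiskers extend to the most extreme values that lie within 1.5×IQR of the box edges (below Q1 or above Q3); points beyond the whiskers are plotted individually as potential outliers.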
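The numbers behind a boxplot can be sketched as follows, assuming the 1.5×IQR whisker rule. The data are invented, and quartile values depend on the interpolation method (the "inclusive" method of the standard library is used here; plotting libraries may differ slightly):

```python
from statistics import quantiles

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 7, 8, 9, 11, 12, 14, 40]

q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is drawn as an individual point.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"box: {q1} to {q3}, median {med}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```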
How do sample size and variability affect the reliability of statistical estimates?
Larger sample sizes reduce the standard error, making estimates more precise. Greater variability in data increases uncertainty around estimates, requiring larger samples to achieve the same precision.
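A short sketch of both effects, assuming the standard error of a mean is sd/√n; the function name `standard_error` and the numbers are illustrative:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for a given SD and sample size."""
    return sd / sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(f"sd = 10, n = {n:>3}: SE = {standard_error(10, n):.2f}")

# Doubling the variability requires four times the sample for the same precision.
print(standard_error(10, 100), standard_error(20, 400))
```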
Explain why random sampling is important in statistical studies.
Random sampling minimizes selection bias and makes the sample representative of the broader population on average, allowing valid inferences and generalizations.
What are type I and type II errors in hypothesis testing?
A type I error is incorrectly rejecting a true null hypothesis (false positive). A type II error is failing to reject a false null hypothesis (false negative).
How does significance level (α) relate to type I error?
The significance level (α) is the probability threshold for rejecting the null hypothesis; setting α = 0.05 means that, when the null hypothesis is true, there is a 5% chance of rejecting it (a type I error).
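One way to see this is a small simulation in which the null hypothesis is true by construction, so every rejection is a type I error. The sketch assumes a two-sided z-test with known standard deviation; the sample size, trial count, and seed are arbitrary choices:

```python
import random
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
N = 30         # observations per simulated study
TRIALS = 5000  # number of simulated studies

rng = random.Random(0)  # fixed seed so the run is repeatable
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value

false_positives = 0
for _ in range(TRIALS):
    # Data come from a population with mean 0 and sd 1, so H0 (mean = 0) is true.
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    z = (sum(sample) / N) * sqrt(N)  # z statistic with known sd = 1
    if abs(z) > z_crit:
        false_positives += 1  # rejected a true null hypothesis

type_i_rate = false_positives / TRIALS
print(f"observed type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to α, with some simulation noise.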
What is statistical power, and what factors influence it?
Statistical power is the probability of correctly rejecting a false null hypothesis (1 − type II error rate). It increases with larger sample size, larger effect size, lower data variability, and a higher significance level (α).
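These influences can be sketched for the simplest case, a one-sided z-test with known standard deviation, where power = 1 − Φ(z₁₋α − effect·√n / sd). The function name `power_one_sided_z` and the example values are assumptions made for illustration:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sd, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mean > null value), sd assumed known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sd           # mean of the z statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)

base = power_one_sided_z(effect=0.5, sd=2.0, n=50)
print(f"baseline:           {base:.3f}")
print(f"larger sample:      {power_one_sided_z(0.5, 2.0, 200):.3f}")
print(f"larger effect:      {power_one_sided_z(1.0, 2.0, 50):.3f}")
print(f"higher variability: {power_one_sided_z(0.5, 4.0, 50):.3f}")
```

With a true effect of zero the function returns α itself, since every rejection is then a type I error.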
Define p-value in the context of hypothesis testing.
The p-value is the probability of observing a result at least as extreme as the one actually observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.
Why is it important to distinguish between statistical significance and practical significance?
A result can be statistically significant (unlikely to arise if the null hypothesis were true) yet have a trivial or clinically irrelevant effect size, particularly in very large samples, so one must also assess real-world impact.
Describe the difference between one-sided and two-sided hypothesis tests.
A one-sided test assesses deviation in a specific direction (e.g., greater than); a two-sided test assesses deviations in both directions (greater or less than).
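The difference shows up directly in the p-value. The sketch below assumes a z statistic that is standard normal under the null hypothesis; the function name `z_test_p_values` is illustrative:

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (one-sided upper-tail p, two-sided p) for a z statistic."""
    std_normal = NormalDist()
    p_upper = 1 - std_normal.cdf(z)                 # H1: parameter is greater
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H1: parameter differs
    return p_upper, p_two_sided

p_one, p_two = z_test_p_values(1.96)
print(f"one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

For a statistic in the hypothesized direction, the two-sided p-value is twice the one-sided value, which is why the direction of the test should be fixed before looking at the data.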