10 Flashcards

Question 1

Q

Why might one choose the median over the mean as a measure of central tendency?

Answer

A

Because the median is less sensitive to extreme values or skewed data, providing a better “central” value when outliers are present.
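
A minimal sketch of this effect, using made-up salary figures (the data are an assumption for illustration only):

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an extreme outlier.
salaries = [30, 32, 35, 38, 40, 500]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle of the sorted values

print(mean_salary, median_salary)
```

Here the single large value drags the mean far above every typical salary, while the median stays among the bulk of the data.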

Question 2

Q

How do you interpret the standard deviation of a dataset?

Answer

A

Standard deviation is the square root of the variance. It can be read as the typical distance of data points from the mean, reflecting the spread or variability in the dataset.

Question 3

Q

What is the standard error of the mean (SEM), and how does it differ from standard deviation?

Answer

A

SEM measures how much the sample mean would vary if the same experiment were repeated, whereas standard deviation measures the variability of individual observations around the sample mean.
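
As a rough illustration, the SEM is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sample values below are hypothetical:

```python
import math
import statistics

# Hypothetical sample of five measurements.
sample = [4.0, 5.0, 6.0, 7.0, 8.0]

# Sample standard deviation (n - 1 in the denominator): spread of observations.
sd = statistics.stdev(sample)

# Estimated standard error of the mean: spread of the sample mean itself.
sem = sd / math.sqrt(len(sample))

print(round(sd, 4), round(sem, 4))
```

The SEM is always smaller than the SD for samples with more than one observation, and it shrinks as the sample grows.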

Question 4

Q

Define skewness and explain what positive or negative skew indicates about a distribution’s shape.

Answer

A

Skewness measures asymmetry of a distribution. Positive skew (right-tailed) indicates a long tail on the higher side; negative skew (left-tailed) indicates a long tail on the lower side.
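
One way to check the sign of the skew is the moment-based coefficient (third central moment divided by the 1.5 power of the second). This is a sketch of one common definition; other estimators apply small-sample corrections. The function name and data are illustrative assumptions:

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 20]      # long tail on the higher side
left_tailed = [-x for x in right_tailed]   # mirror image: tail on the lower side

print(sample_skewness(right_tailed) > 0)   # positive skew
print(sample_skewness(left_tailed) < 0)    # negative skew
```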

Question 5

Q

What does kurtosis describe in a distribution?

Answer

A

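Kurtosis measures the “tailedness” of a distribution compared to a normal distribution, indicating the propensity for outliers. It is often described as “peakedness,” but it mainly reflects the weight of the tails rather than the shape of the peak.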

Question 6

Q

Why is it important to assess both location and dispersion when summarizing data?

Answer

A

Because measures of central tendency (location) alone don’t convey the spread or variability; dispersion measures (e.g., range, interquartile range, standard deviation) describe how data are distributed around the center.

Question 7

Q

What is a contingency table, and when is it used?

Answer

A

A contingency table (cross-tabulation) displays frequencies of two categorical variables simultaneously, facilitating analysis of associations or interactions.
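
A small sketch of building such a table by counting category pairs. The exposure and outcome labels are hypothetical:

```python
from collections import Counter

# Hypothetical observations: (exposure, outcome) pairs.
observations = [
    ("exposed", "ill"), ("exposed", "ill"), ("exposed", "well"),
    ("unexposed", "ill"), ("unexposed", "well"), ("unexposed", "well"),
    ("unexposed", "well"),
]

counts = Counter(observations)
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

# table[row][col] holds the joint frequency for that pair of categories.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

for r in rows:
    print(r, table[r], "total:", row_totals[r])
print("column totals:", col_totals)
```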

Question 8

Q

Define the Pearson correlation coefficient (r).

Answer

A

A statistic that quantifies the strength and direction of a linear relationship between two continuous variables, ranging from –1 (perfect negative) to +1 (perfect positive).

Question 9

Q

Under what circumstances might Pearson correlation be misleading?

Answer

A

When the relationship is non-linear, when outliers heavily influence the data, or when the variables are not measured on an interval/ratio scale.

Question 10

Q

What is Spearman rank correlation, and why is it used instead of Pearson correlation in some cases?

Answer

A

Spearman correlation assesses the strength of a monotonic relationship using ranked data; it is robust to outliers and appropriate for ordinal data or non-linear (but monotonic) associations.
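
The contrast with Pearson can be sketched by computing Spearman as the Pearson correlation of the ranks. The helper names and the exponential example data are assumptions for illustration:

```python
import math

def pearson(x, y):
    """Pearson r: covariance scaled by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # monotonic but strongly non-linear

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because y always increases with x, the ranks agree perfectly and Spearman is 1, while Pearson falls short of 1 because the relationship is not a straight line.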

Question 11

Q

Describe when you would use a histogram versus a boxplot to explore a dataset.

Answer

A

Use a histogram to visualize the detailed shape of a distribution (e.g., modality, skewness). Use a boxplot to summarize a distribution by its median and quartiles and to flag potential outliers.

Question 12

Q

In a boxplot, what do the box, whiskers, and outliers represent?

Answer

A

The box spans the interquartile range (Q1 to Q3) with a line at the median; whiskers extend to the most extreme values that lie within 1.5×IQR of the box edges (below Q1 or above Q3); points beyond the whiskers are plotted as outliers.
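
A sketch of the underlying arithmetic, assuming the common 1.5×IQR convention. Quartile definitions differ between tools, so the exact values depend on the method chosen (here the standard library's "inclusive" method); the data are hypothetical:

```python
import statistics

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the furthest a whisker may reach from the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as an individual outlier point.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```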

Question 13

Q

How do sample size and variability affect the reliability of statistical estimates?

Answer

A

Larger sample sizes reduce the standard error, making estimates more precise. Greater variability in data increases uncertainty around estimates, requiring larger samples to achieve the same precision.
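
Both effects follow from the standard error formula SD/√n. A minimal sketch with assumed numbers:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a given standard deviation and sample size."""
    return sd / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0

# Doubling the variability requires four times the sample for the same precision.
print(standard_error(20, 400))  # 1.0
```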

Question 14

Q

Explain why random sampling is important in statistical studies.

Answer

A

Random sampling minimizes selection bias and makes the sample representative of the broader population on average, allowing valid inferences and generalizations.

Question 15

Q

What are type I and type II errors in hypothesis testing?

Answer

A

A type I error is incorrectly rejecting a true null hypothesis (false positive). A type II error is failing to reject a false null hypothesis (false negative).

Question 16

Q

How does significance level (α) relate to type I error?

Answer

A

The significance level (α) is the probability threshold for rejecting the null hypothesis; setting α = 0.05 means that, when the null hypothesis is actually true, there is a 5% chance of making a type I error.
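
This can be checked with a small simulation: draw many samples from a population where the null hypothesis really is true and count how often a two-sided z-test rejects it. The setup (known standard deviation of 1, null mean of 0, the function name, and the seed) is an assumption for illustration:

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=5000, seed=0):
    """Fraction of true-null experiments rejected by a two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null is true: data come from a normal with mean 0 and SD 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land close to alpha
```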

Question 17

Q

What is statistical power, and what factors influence it?

Answer

A

Statistical power is the probability of correctly rejecting a false null hypothesis (1 − type II error rate). It increases with larger sample size, larger effect size, and lower data variability.
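
For a simple case, power has a closed form. The sketch below assumes a one-sided z-test of a mean with known standard deviation (null mean 0, alternative mean greater than 0); the function name and numbers are illustrative:

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a one-sided z-test when the true mean exceeds 0 by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under the null
    shift = effect * n ** 0.5 / sd          # how far the true mean moves the z statistic
    return 1 - std_normal.cdf(z_crit - shift)

print(round(z_test_power(effect=0.5, sd=1.0, n=20), 3))
print(round(z_test_power(effect=0.5, sd=1.0, n=80), 3))  # larger sample: more power
print(round(z_test_power(effect=0.5, sd=2.0, n=20), 3))  # more variability: less power
```

With an effect of zero the function returns α, since rejecting is then a type I error rather than a correct detection.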

Question 18

Q

Define p-value in the context of hypothesis testing.

Answer

A

The p-value is the probability of observing a result at least as extreme as the one actually obtained, assuming the null hypothesis is true.

Question 19

Q

Why is it important to distinguish between statistical significance and practical significance?

Answer

A

A result can be statistically significant (unlikely to arise by chance alone if the null hypothesis were true) yet have a trivial or clinically irrelevant effect size, so one must also assess its real-world impact.

Question 20

Q

Describe the difference between one-sided and two-sided hypothesis tests.

Answer

A

A one-sided test assesses deviation in a specific direction (e.g., greater than); a two-sided test assesses deviations in both directions (greater or less than).
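
The difference shows up directly in the p-value. A sketch assuming a z statistic that follows a standard normal distribution under the null hypothesis; the observed value 1.8 is hypothetical:

```python
from statistics import NormalDist

z_observed = 1.8  # hypothetical test statistic
std_normal = NormalDist()

# One-sided (greater than): only large positive values count as extreme.
p_one_sided = 1 - std_normal.cdf(z_observed)

# Two-sided: large values in either direction count as extreme.
p_two_sided = 2 * (1 - std_normal.cdf(abs(z_observed)))

print(round(p_one_sided, 4), round(p_two_sided, 4))
```

In this example the same statistic is below 0.05 for the one-sided test but above it for the two-sided test, which is why the direction should be chosen before looking at the data.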

(20 cards)