10 Flashcards

(20 cards)

1
Q

Why might one choose the median over the mean as a measure of central tendency?

A

Because the median is less sensitive to extreme values or skewed data, providing a better “central” value when outliers are present.
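A minimal sketch of this effect, using made-up salary figures (the numbers are illustrative assumptions, not data from the deck). A single extreme value drags the mean far from the bulk of the data while the median barely moves:

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an extreme outlier.
salaries = [42, 45, 47, 50, 52, 400]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # middle of the sorted values, barely affected

print(f"mean = {avg}, median = {mid}")
```

Here the mean lands above every value except the outlier, which is why the median is usually the better summary for skewed data.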

2
Q

How do you interpret the standard deviation of a dataset?

A

Standard deviation quantifies the typical distance of data points from the mean (formally, the square root of the average squared deviation), reflecting the spread or variability in the dataset.
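As a rough illustration, the standard deviation can be computed by hand as the square root of the mean squared deviation. The dataset below is an arbitrary example; the sketch shows both the population form (divide by n) and the sample form (divide by n − 1):

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

m = mean(data)
squared_devs = [(x - m) ** 2 for x in data]

# Population SD divides by n; sample SD divides by n - 1.
population_sd = sqrt(sum(squared_devs) / len(data))
sample_sd = sqrt(sum(squared_devs) / (len(data) - 1))

print(population_sd, pstdev(data))  # hand-rolled vs. library, population form
print(sample_sd, stdev(data))       # hand-rolled vs. library, sample form
```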

3
Q

What is the standard error of the mean (SEM), and how does it differ from standard deviation?

A

SEM estimates how much the sample mean would vary if the same experiment were repeated, and equals the standard deviation divided by the square root of the sample size; standard deviation measures the variability of individual observations around the sample mean.
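A short sketch of the distinction, assuming the usual estimate SEM = s / √n, where s is the sample standard deviation. The helper name `sem` and the sample values are illustrative:

```python
from math import sqrt
from statistics import stdev

def sem(sample):
    """Estimated standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

print(f"SD  = {stdev(sample):.3f}  (spread of individual observations)")
print(f"SEM = {sem(sample):.3f}  (uncertainty in the sample mean)")
```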

4
Q

Define skewness and explain what positive or negative skew indicates about a distribution’s shape.

A

Skewness measures asymmetry of a distribution. Positive skew (right-tailed) indicates a long tail on the higher side; negative skew (left-tailed) indicates a long tail on the lower side.
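One common way to quantify this is the moment-based skewness coefficient, the third central moment divided by the 1.5 power of the second. The sketch below uses that definition (other definitions and bias corrections exist) on small made-up datasets:

```python
from statistics import mean

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 10]        # long tail toward high values
left_tailed = [-x for x in right_tailed]     # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right", right_tailed), ("left", left_tailed), ("sym", symmetric)]:
    print(name, round(skewness(values), 3))
```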

5
Q

What does kurtosis describe in a distribution?

A

Kurtosis measures the “tailedness” of a distribution compared to a normal distribution, indicating its propensity to produce outliers; it is often loosely described as “peakedness,” but it is driven mainly by the tails.
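As a sketch, excess kurtosis can be computed as the fourth central moment divided by the squared second central moment, minus 3 (so a normal distribution scores about 0). The two datasets are contrived to contrast light and heavy tails:

```python
from statistics import mean

def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no small-sample correction)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

light_tails = [1, 2, 3, 4, 5]                     # evenly spread, no extremes
heavy_tails = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]    # mostly central, two extremes

print(excess_kurtosis(light_tails))  # negative: lighter tails than normal
print(excess_kurtosis(heavy_tails))  # positive: heavier tails than normal
```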

6
Q

Why is it important to assess both location and dispersion when summarizing data?

A

Because measures of central tendency (location) alone don’t convey the spread or variability; dispersion measures (e.g., range, interquartile range, standard deviation) describe how data are distributed around the center.

7
Q

What is a contingency table, and when is it used?

A

A contingency table (cross-tabulation) displays frequencies of two categorical variables simultaneously, facilitating analysis of associations or interactions.
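A minimal sketch of building such a table by counting pairs of category labels. The exposure and outcome labels and the observations are hypothetical:

```python
from collections import Counter

# Hypothetical (exposure, outcome) pairs, one per subject.
observations = [
    ("smoker", "disease"), ("smoker", "disease"), ("smoker", "healthy"),
    ("non-smoker", "disease"), ("non-smoker", "healthy"),
    ("non-smoker", "healthy"), ("non-smoker", "healthy"),
]

counts = Counter(observations)
row_labels = sorted({row for row, _ in observations})
col_labels = sorted({col for _, col in observations})

# table[row][col] holds the number of subjects with that combination.
table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

for r in row_labels:
    print(r, table[r])
```

Each cell is a joint frequency; row and column sums give the marginal totals used by tests of association such as chi-square.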

8
Q

Define the Pearson correlation coefficient (r).

A

A statistic that quantifies the strength and direction of a linear relationship between two continuous variables, ranging from –1 (perfect negative) to +1 (perfect positive).
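The definition can be sketched directly: r is the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name and data are illustrative:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfectly linear, increasing
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfectly linear, decreasing
print(pearson_r(x, [2, 1, 4, 3, 5]))    # positive but imperfect
```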

9
Q

Under what circumstances might Pearson correlation be misleading?

A

When the relationship is non-linear, when outliers heavily influence the data, or when the variables are not measured on an interval/ratio scale.

10
Q

What is Spearman rank correlation, and why is it used instead of Pearson correlation in some cases?

A

Spearman correlation assesses the strength of a monotonic relationship using ranked data; it is less sensitive to outliers than Pearson correlation and is appropriate for ordinal data or non-linear (but monotonic) associations.
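A sketch of the idea, treating Spearman's coefficient as the Pearson correlation of the ranks (ties share the average of their positions). The helper names are illustrative, and the cubic relationship is a contrived example of a monotonic but non-linear association:

```python
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but clearly non-linear

print(f"Pearson  r   = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the ranks of `x` and `y` agree exactly, Spearman reports a perfect monotonic association, while Pearson is pulled below 1 by the curvature.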

11
Q

Describe when you would use a histogram versus a boxplot to explore a dataset.

A

Use a histogram to visualize the detailed shape of a distribution (e.g., modality, skewness). Use a boxplot to summarize a distribution by its median and quartiles and to flag potential outliers.

12
Q

In a boxplot, what do the box, whiskers, and outliers represent?

A

The box spans the interquartile range (Q1 to Q3) with a line at the median; under the common Tukey convention, whiskers extend to the most extreme values within 1.5×IQR of the box edges; points beyond the whiskers are plotted as outliers.
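The quantities a boxplot draws can be sketched numerically. This assumes the Tukey 1.5×IQR rule and the "inclusive" quartile method from Python's `statistics` module; plotting libraries may use slightly different quartile conventions, and the data are made up:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # illustrative values; 30 is extreme

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences mark how far the whiskers are allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"box: {q1} to {q3}, median {q2}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```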

13
Q

How do sample size and variability affect the reliability of statistical estimates?

A

Larger sample sizes reduce the standard error, making estimates more precise. Greater variability in data increases uncertainty around estimates, requiring larger samples to achieve the same precision.
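Both effects follow from the standard error of a mean, SD / √n. A small sketch with assumed values for the standard deviation and sample size:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a sample mean for a given SD and sample size."""
    return sd / sqrt(n)

# Quadrupling n halves the standard error (SD held at 10).
for n in (25, 100, 400):
    print(f"sd=10, n={n:>3}: SE = {standard_error(10.0, n)}")

# Doubling the SD doubles the SE; recovering the same precision takes 4x the sample.
print(standard_error(20.0, 100), standard_error(20.0, 400))
```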

14
Q

Explain why random sampling is important in statistical studies.

A

Random sampling minimizes selection bias and makes it more likely that the sample represents the broader population, allowing valid inferences and generalizations.

15
Q

What are type I and type II errors in hypothesis testing?

A

A type I error is incorrectly rejecting a true null hypothesis (false positive). A type II error is failing to reject a false null hypothesis (false negative).

16
Q

How does significance level (α) relate to type I error?

A

The significance level (α) is the probability threshold for rejecting the null hypothesis; setting α = 0.05 means accepting a 5% chance of making a type I error when the null hypothesis is actually true.
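This can be checked with a small simulation: draw many samples from a population where the null hypothesis is true and count how often a test at level α rejects it. The sketch assumes a two-sided z-test with known standard deviation 1; the sample size, trial count, and seed are arbitrary choices:

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=25, trials=5000, seed=0):
    """Fraction of two-sided z-tests that reject a TRUE null (mean 0, sd 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean * sqrt(n)  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed type I error rate: {rate:.3f}")
```

The observed rate should sit close to α, with some simulation noise.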

17
Q

What is statistical power, and what factors influence it?

A

Statistical power is the probability of correctly rejecting a false null hypothesis (1 − type II error rate). It increases with larger sample size, larger effect size, and lower data variability.
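For a concrete feel, power has a closed form in the simplest case. The sketch below assumes a one-sided, one-sample z-test with known standard deviation, which is a simplification; the function name and parameter values are illustrative:

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known sd.

    effect is the true difference between the mean and its null value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under the null
    shift = effect * sqrt(n) / sd           # how far the truth moves the z statistic
    return 1 - std_normal.cdf(z_crit - shift)

baseline = z_test_power(effect=0.5, sd=1.0, n=20)
print(f"baseline:      {baseline:.3f}")
print(f"larger n:      {z_test_power(0.5, 1.0, 50):.3f}")
print(f"larger effect: {z_test_power(0.8, 1.0, 20):.3f}")
print(f"noisier data:  {z_test_power(0.5, 2.0, 20):.3f}")
```

With no true effect, the same formula returns α, which ties power back to the type I error rate.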

18
Q

Define p-value in the context of hypothesis testing.

A

The p-value is the probability of observing data at least as extreme as what was collected, assuming the null hypothesis is true.
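A worked sketch with a hypothetical coin experiment: 9 heads in 10 flips, with the null hypothesis that the coin is fair. The one-sided p-value is the probability of 9 or more heads under that null, computed exactly from the binomial distribution:

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 9 heads out of 10 flips; null hypothesis: fair coin.
p_value = binomial_upper_tail(9, 10)
print(f"p-value = {p_value:.4f}")
```

Only two outcomes (9 or 10 heads) are at least as extreme as the observation, so the p-value is (10 + 1) / 1024.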

19
Q

Why is it important to distinguish between statistical significance and practical significance?

A

A result can be statistically significant (unlikely to arise by chance alone if the null hypothesis were true) but have a trivial or clinically irrelevant effect size, so one must also assess real-world impact.

20
Q

Describe the difference between one-sided and two-sided hypothesis tests.

A

A one-sided test assesses deviation in a specific direction (e.g., greater than); a two-sided test assesses deviations in both directions (greater or less than).
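The practical consequence shows up in the p-value. The sketch below assumes a z statistic from some test (the value 1.8 is made up) and computes both versions: the one-sided test counts only the upper tail, while the two-sided test counts both tails:

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (upper-tail one-sided p, two-sided p) for a z statistic."""
    std_normal = NormalDist()
    one_sided = 1 - std_normal.cdf(z)             # H1: parameter is greater
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H1: parameter differs
    return one_sided, two_sided

one_sided_p, two_sided_p = z_test_p_values(1.8)
print(f"one-sided p = {one_sided_p:.4f}")
print(f"two-sided p = {two_sided_p:.4f}")
```

At α = 0.05 this hypothetical result is significant under the one-sided test but not under the two-sided one, which is why the direction should be chosen before seeing the data.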