flashcard 5
(50 cards)
What is the main purpose of statistics in research?
To collect, analyze, interpret, and present numerical data in order to distinguish real patterns from random variation and make informed decisions.
Name the four primary tasks involved in statistical analysis.
1) Designing experiments and collecting data, 2) Describing and summarizing data, 3) Testing hypotheses (inferential statistics), and 4) Building and evaluating predictive or explanatory models.
Why is it important to design experiments carefully before collecting data?
Because a well-designed experiment helps ensure that the data are representative, unbiased, and appropriate for answering the research question, which is what makes the subsequent analysis valid.
How do descriptive statistics differ from inferential statistics?
Descriptive statistics summarize and visualize the features of a dataset (e.g., mean, median, charts), whereas inferential statistics draw conclusions or make predictions about a population based on sample data.
Explain what a random variable is in probability theory.
A random variable is a function that assigns a numerical value to each outcome of a random experiment; the probabilities of the outcomes determine how likely each value is.
What distinguishes a discrete random variable from a continuous random variable?
Discrete variables take on countable values (e.g., number of successes), while continuous variables can take any value within an interval (e.g., height or weight).
How can probability be interpreted through the lens of long-run frequencies?
As the proportion of times an event occurs out of many repetitions of the same experiment, approaching a stable value as trials increase.
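The long-run frequency idea can be illustrated with a small simulation. This is a minimal sketch, assuming a simulated coin with a hypothetical success probability `p`; the function name `simulated_proportion` and the fixed seed are illustrative choices, not part of the card.

```python
import random

def simulated_proportion(n_trials, p=0.5, seed=0):
    """Simulate n_trials independent trials and return the observed
    proportion of successes (each trial succeeds with probability p)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials

# The observed proportion tends to settle near p as trials increase.
for n in (10, 1_000, 100_000):
    print(n, simulated_proportion(n))
```

With few trials the proportion can be far from 0.5; with many trials it should sit close to it.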
Define sensitivity in the context of diagnostic testing.
Sensitivity measures the ability of a test to correctly identify individuals who have the condition (true positives) out of all actual positives.
Define specificity in diagnostic testing.
Specificity measures the ability of a test to correctly identify individuals who do not have the condition (true negatives) out of all actual negatives.
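Sensitivity and specificity can both be computed from the four cells of a 2x2 table of test results. The sketch below assumes hypothetical counts (90 true positives, 10 false negatives, 950 true negatives, 50 false positives) chosen only for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people with the condition who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people without the condition who test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, for illustration only.
print(sensitivity(true_pos=90, false_neg=10))   # 90 of 100 actual positives detected
print(specificity(true_neg=950, false_pos=50))  # 950 of 1000 actual negatives cleared
```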
What do positive predictive value (PPV) and negative predictive value (NPV) convey about a diagnostic test?
PPV indicates the probability that someone with a positive test truly has the condition; NPV indicates the probability that someone with a negative test truly does not have the condition.
Why does the prevalence of a disease in a population affect PPV and NPV?
Because PPV and NPV depend on the proportion of true cases versus non-cases; in low-prevalence settings, even tests with high sensitivity and specificity can yield a low PPV.
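The effect of prevalence can be checked with Bayes' rule. This is a minimal sketch, assuming a hypothetical test with 95% sensitivity and 95% specificity; the function name `predictive_values` and the prevalence values are illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the
    given prevalence, using the expected proportion in each 2x2 cell."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same hypothetical test, two different prevalences.
for prevalence in (0.01, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 1% prevalence the PPV comes out at roughly 0.16, even though the test itself is accurate, because false positives from the large healthy group outnumber true positives.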
Describe the key differences between randomized controlled trials (RCTs) and cohort studies.
In RCTs, participants are randomly assigned to treatment or control, which balances known and unknown confounders across groups on average. In cohort studies, participants are grouped by exposure status and followed over time without random assignment, so confounding must be addressed in the analysis.
What is the distinction between prospective and retrospective study designs?
Prospective studies collect data moving forward from a defined point, while retrospective studies analyze existing data that were collected in the past.
Compare cross-sectional and longitudinal studies.
Cross-sectional studies measure variables at a single point in time, providing a “snapshot,” whereas longitudinal studies follow the same subjects over multiple time points to observe changes.
What defines quantitative data versus qualitative (categorical) data?
Quantitative data are numerical measurements (either discrete or continuous), while qualitative data describe categories or attributes (nominal or ordinal).
Give examples of nominal and ordinal qualitative variables.
Nominal examples: blood type, type of cuisine. Ordinal examples: pain severity scale, education level (high school, bachelor’s, master’s).
What is a probability distribution, and why is it useful?
A probability distribution describes how probabilities are assigned to different values of a random variable, helping to understand the variable’s behavior and make inferences.
Name three common probability distributions and their typical applications.
Normal distribution for continuous traits in populations; Binomial distribution for counts of successes in a fixed number of independent trials; Poisson distribution for counts of events (often rare ones) occurring over a fixed interval of time or space.
Explain the concept of a normal distribution and its key characteristics.
A normal distribution is symmetric, bell-shaped, and defined by its mean (center) and standard deviation (spread); most values cluster around the mean.
What does the empirical rule (“68–95–99.7 rule”) describe?
In a normal distribution, approximately 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
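The three percentages can be verified from the normal cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a normal value lies within k standard
    deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The results are approximately 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.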
Why might one choose the median over the mean as a measure of central tendency?
Because the median is less sensitive to extreme values or skewed data, providing a better “central” value when outliers are present.
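A short example makes the contrast concrete. The values below are hypothetical and chosen only to show the effect of one extreme observation.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]       # hypothetical measurements
with_outlier = values + [1000]      # same data plus one extreme value

print(mean(values), median(values))              # 35 and 35
print(mean(with_outlier), median(with_outlier))  # mean jumps, median barely moves
```

Adding the single outlier moves the mean from 35 to about 195.8, while the median only shifts from 35 to 36.5.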
How do you interpret the standard deviation of a dataset?
Standard deviation quantifies the typical distance of data points from the mean (formally, the square root of the average squared deviation), reflecting the spread or variability in the dataset.
What is the standard error of the mean (SEM), and how does it differ from standard deviation?
SEM measures how much the sample mean would vary if the same experiment were repeated, and is estimated as the standard deviation divided by the square root of the sample size; standard deviation measures the variability of individual observations around the sample mean.
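The relationship between the two quantities can be sketched in a few lines, using the usual estimate of SEM as the sample standard deviation divided by the square root of the sample size. The sample values are hypothetical and the function name `standard_error` is illustrative.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n),
    where s is the sample standard deviation (n - 1 denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
print("SD :", statistics.stdev(sample))
print("SEM:", standard_error(sample))
```

The SEM is always smaller than the SD for n > 1, and it shrinks as the sample grows, whereas the SD does not systematically shrink with sample size.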
Define skewness and explain what positive or negative skew indicates about a distribution’s shape.
Skewness measures asymmetry of a distribution. Positive skew (right-tailed) indicates a long tail on the higher side; negative skew (left-tailed) indicates a long tail on the lower side.
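One common way to quantify skewness is the moment coefficient: the third central moment divided by the cube of the standard deviation. The sketch below assumes that definition in its population form (other estimators exist), and the datasets are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness (population form):
    third central moment divided by the 1.5 power of the variance."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))        # symmetric data: zero
print(skewness([1, 1, 2, 2, 10]))       # long right tail: positive
print(skewness([-10, -2, -2, -1, -1]))  # long left tail: negative
```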