13 Flashcards
(15 cards)
Under what conditions can risk ratios (RR) be used instead of odds ratios (OR)?
Risk ratios can be calculated in prospective (cohort) studies where both the number of events and total individuals at risk in each exposure group are known, allowing direct computation of incidence proportions. In case-control studies, by contrast, outcome-based sampling makes incidence uncomputable, so only the odds ratio is valid.
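A minimal sketch of both computations from a hypothetical 2×2 cohort table (counts are made up for illustration):

```python
# Hypothetical cohort data (illustrative counts, not from a real study)
exposed_cases, exposed_total = 30, 100      # 30 events among 100 exposed
unexposed_cases, unexposed_total = 10, 100  # 10 events among 100 unexposed

# Incidence proportions (risks) in each exposure group
risk_exposed = exposed_cases / exposed_total        # 0.30
risk_unexposed = unexposed_cases / unexposed_total  # 0.10

# Risk ratio: direct comparison of incidence proportions
rr = risk_exposed / risk_unexposed

# Odds ratio, computed from the same table for comparison
odds_exposed = risk_exposed / (1 - risk_exposed)
odds_unexposed = risk_unexposed / (1 - risk_unexposed)
orr = odds_exposed / odds_unexposed

print(f"RR = {rr:.2f}, OR = {orr:.2f}")  # RR = 3.00, OR = 3.86
```

Note that with a common outcome (10–30% incidence) the OR is noticeably larger than the RR.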
How do odds ratios and risk ratios relate when the event of interest is rare?
When the event is rare (i.e., incidence is very low), the numerical values of OR and RR are very close because odds and probabilities converge when probabilities are small.
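The convergence can be checked numerically; the probabilities below are made up to contrast a common and a rare event:

```python
def rr_and_or(p1, p0):
    """Risk ratio and odds ratio for two event probabilities."""
    rr = p1 / p0
    orr = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return rr, orr

# Common event: the OR overstates the RR noticeably
print(rr_and_or(0.40, 0.20))    # RR = 2.0, OR ≈ 2.67
# Rare event: OR and RR nearly coincide
print(rr_and_or(0.004, 0.002))  # RR = 2.0, OR ≈ 2.004
```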
What key assumptions are shared by both linear and logistic regression models?
Both models assume that observations are independent, that predictors are measured without error, and that the specified link function (identity for linear, logit for logistic) correctly describes how predictors relate to the transformed outcome.
Why is it incorrect to infer causation solely from an observed association in regression or correlation analyses?
Regression and correlation reveal statistical association but do not control for unmeasured confounders or establish temporal precedence. Establishing causation requires controlled experimental design, such as randomized controlled trials, to rule out alternative explanations.
How does multicollinearity affect the interpretation of logistic regression coefficients?
Multicollinearity (high correlation between predictors) inflates the standard errors of logistic regression coefficients, making OR estimates unstable and reducing confidence in interpreting any single predictor’s effect.
Explain why logistic regression uses maximum likelihood estimation rather than ordinary least squares.
Because the outcome is binary, residuals are not normally distributed around a continuous response. Maximum likelihood estimation finds the parameter values that maximize the probability of observing the given binary outcomes under the logistic model.
What is the practical difference between a probability and an odds when interpreting logistic regression?
Probability is the chance of an event happening ([0,1] range). Odds are the ratio of the probability of the event to the probability of no event (p / (1–p)). Odds greater than 1 indicate p > 0.5, and vice versa.
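The two conversions can be written directly from the definitions:

```python
def prob_to_odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Invert: recover the probability from the odds."""
    return odds / (1 + odds)

print(prob_to_odds(0.75))  # 3.0  (3-to-1 odds, p > 0.5)
print(prob_to_odds(0.5))   # 1.0  (even odds)
print(odds_to_prob(3.0))   # 0.75
```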
How do confidence intervals around odds ratios inform the reliability of estimated associations?
The CI width reflects uncertainty: a narrow CI indicates a precise OR estimate, while a wide CI indicates less precision. If the entire CI lies above (or below) 1, the association is statistically significant at that confidence level, suggesting increased (or decreased) odds of the outcome; if the CI includes 1, no association can be concluded.
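A 95% CI for an OR is typically built on the log-odds scale and then exponentiated; the coefficient and standard error below are hypothetical:

```python
import math

# Hypothetical logistic regression output: coefficient and its standard error
beta, se = 0.6931, 0.25   # beta = ln(2), so the point estimate OR = 2.0

z = 1.96  # critical value for a 95% confidence interval
or_point = math.exp(beta)
ci_low = math.exp(beta - z * se)
ci_high = math.exp(beta + z * se)
print(f"OR = {or_point:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
# The whole interval lies above 1, consistent with a significant association
```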
Why might logistic regression include continuous and categorical predictors simultaneously?
Logistic regression can model a binary outcome as a function of any mix of predictor types, allowing adjustment for confounding variables, capturing nuanced relationships, and improving predictive accuracy by using all relevant information.
What does an R²-like measure (e.g., pseudo-R²) indicate in logistic regression?
Pseudo-R² measures (such as McFadden’s R²) quantify how well the logistic model explains outcome variability in a way analogous to linear R². Although they do not have exactly the same interpretation, higher values indicate better model fit.
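McFadden's R² is computed from two log-likelihoods: the fitted model's and the intercept-only (null) model's. The values below are made up for illustration:

```python
# McFadden's pseudo-R² = 1 - LL_model / LL_null
# Hypothetical log-likelihoods from a fitted logistic model:
ll_model = -120.5   # log-likelihood of the full model
ll_null = -160.0    # log-likelihood of the intercept-only model

mcfadden_r2 = 1 - ll_model / ll_null
print(f"McFadden's R² = {mcfadden_r2:.3f}")  # 0.247
```

Values between roughly 0.2 and 0.4 are often described as indicating a good fit on this scale, though the measure is not directly comparable to linear R².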
How does one interpret a regression coefficient of zero in both linear and logistic regression?
In linear regression, a coefficient of zero implies no change in the continuous outcome per unit change in predictor (no association). In logistic regression, a coefficient of zero implies an OR of 1 (no change in odds of the outcome), similarly indicating no association.
When constructing a standard curve via linear regression, why must one check that the relationship between concentration and signal is approximately linear?
Because linear regression assumes a linear relationship; if the calibration points do not align linearly (e.g., at very high concentrations saturation occurs), the fitted line will not accurately predict unknown concentrations, leading to biased estimates.
Why is the concept “association does not imply causation” emphasized when teaching regression and correlation?
Because it reminds practitioners that observed statistical relationships might arise from confounding variables, reverse causality, or chance, and that only controlled experimental designs can rigorously establish a causal link.
How can one use the fitted logistic regression model to predict an individual’s probability of an outcome?
First calculate the linear predictor (η = α + β₁ X₁ + … + βₖ Xₖ), then transform via the logistic function: probability = 1 / [1 + exp(–η)]. This yields a value between 0 and 1 representing the predicted probability of the event.
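The two steps can be sketched directly; the intercept and coefficients below are hypothetical, not from a real fitted model:

```python
import math

def predict_prob(intercept, coefs, xs):
    """Predicted probability from a fitted logistic regression model."""
    # Step 1: linear predictor eta = alpha + b1*x1 + ... + bk*xk
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    # Step 2: logistic transform maps eta onto (0, 1)
    return 1 / (1 + math.exp(-eta))

# Hypothetical model: alpha = -2.0, coefficients for age and smoking status
p = predict_prob(-2.0, [0.05, 0.7], [50, 1])  # age 50, smoker
print(f"predicted probability ≈ {p:.3f}")
```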
Explain why one would examine confidence intervals of regression coefficients rather than rely solely on p-values.
Confidence intervals convey both statistical significance (via inclusion/exclusion of a null value) and the magnitude and precision of effect estimates. P-values alone do not show how large or precise an effect is, only whether it is unlikely to be zero by chance.