Epi Methods 753 Flashcards by Francesca Marino

2 categories where 1 is reference group (typically “unexposed”)

Dichotomous Variable

Parameterization of variable into discrete categories

Categorical Variable

Categorical variable that assigns 0 or 1

Binary Variable

Categorical variable that doesn’t have ordering/order not of interest; collection of k-1 binary indicator variables

Nominal Variable

Categorical variable that has ordering/order of interest; categories assigned ordered scores & entered as a single score variable; step between adjacent categories constrained to be equal

Ordinal Variable

Test Ho that B=0, where B is coefficient for category score variable; if p<0.05 best estimate for step from one category to next is different from 0

Mantel Test for Trend
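
A minimal sketch, assuming a single unstratified 2 × K table: Mantel's extension chi-square (1 df) is closely related to the score test of Ho: B=0 for the category score. The function name and the counts below are hypothetical and are used only for illustration.

```python
import math

def mantel_trend(scores, cases, totals):
    """Mantel extension chi-square (1 df) for linear trend in a 2 x K table.

    scores: score for each ordered category (e.g., 0, 1, 2)
    cases:  number of cases (outcome present) in each category
    totals: number of persons in each category
    """
    n = sum(totals)                      # all persons
    m1 = sum(cases)                      # all cases
    m0 = n - m1                          # all non-cases
    t = sum(x * a for x, a in zip(scores, cases))
    sx = sum(x * nk for x, nk in zip(scores, totals))
    sxx = sum(x * x * nk for x, nk in zip(scores, totals))
    expected = m1 * sx / n               # E[t] under Ho of no trend
    variance = m1 * m0 * (n * sxx - sx ** 2) / (n ** 2 * (n - 1))
    chi2 = (t - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p

# Hypothetical data: 10%, 20%, 30% of 100 persons per category are cases
chi2, p = mantel_trend(scores=[0, 1, 2], cases=[10, 20, 30], totals=[100, 100, 100])
```

With equal case proportions in every category, the statistic is 0 and p = 1.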

Variable can take any value between lower & upper limit

Continuous Variable

Divide continuous variable by a constant (e.g., age/10); coefficient of variable is multiplied by that constant (now per 10 units), intercept unchanged

Rescaling

Subtract a constant (e.g., the mean) from continuous variable; intercept changes (now expected outcome at that value), coefficient unchanged

Centering
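
A small sketch of the two transformations using a hand-coded least-squares fit. The age and blood-pressure values are hypothetical. Dividing age by 10 multiplies the slope by 10 and leaves the intercept unchanged. Subtracting 50 leaves the slope unchanged and moves the intercept to the expected outcome at age 50.

```python
def ols(x, y):
    """Simple least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

age = [30, 40, 50, 60, 70]       # hypothetical exposure (years)
sbp = [118, 121, 129, 133, 140]  # hypothetical outcome (mmHg)

b0, b1 = ols(age, sbp)                          # slope per 1 year
b0_r, b1_r = ols([a / 10 for a in age], sbp)    # rescaled: slope per 10 years
b0_c, b1_c = ols([a - 50 for a in age], sbp)    # centered: intercept at age 50
```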

Closely related to counterfactual framework; compare observed outcome to unobserved (counterfactual) outcome; estimate measures of causal effect by measures of association assuming exchangeability (without it, the two differ because of confounding)

Potential Outcomes

Observed outcomes in unexposed are good stand-in for unobserved potential outcomes for exposed persons under no exposure & vice versa; not testable but met in expectation with randomization

Exchangeability Assumption

Comparison of pre-treatment covariates in exposed & unexposed groups; comparability doesn’t guarantee assumption met

Exchangeability Assessment

Relax exchangeability assumption to be conditional on covariates; assumes no unmeasured confounders

Conditional Exchangeability

Used to assess how average value of continuous outcome varies systematically with X’s; E[Y] = B0+B1X1+…; B1=average difference (cross-sectional) or change (longitudinal) in Y per 1-unit X1

Linear Regression

Used to assess how log odds binary outcome varies systematically with X’s; log(odds Y)=B0+B1X1+…; B1=difference in log(odds Y) per 1-unit X1; PrOR for cross-sectional or ROR for longitudinal

Logistic Regression

Risk or prevalence > 10%

OR Overestimates RR or PrR
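
A quick arithmetic check of the >10% rule of thumb. The risks below are hypothetical. When the outcome is rare, the OR is close to the RR. When the outcome is common, the OR exceeds the RR even though the RR is the same (2).

```python
def risk_ratio(r1, r0):
    """Risk (or prevalence) in exposed divided by risk in unexposed."""
    return r1 / r0

def odds_ratio(r1, r0):
    """Odds in exposed divided by odds in unexposed."""
    return (r1 / (1 - r1)) / (r0 / (1 - r0))

# Rare outcome (risks 2% vs 1%): OR close to RR
rare = (risk_ratio(0.02, 0.01), odds_ratio(0.02, 0.01))
# Common outcome (risks 40% vs 20%): OR overestimates RR
common = (risk_ratio(0.40, 0.20), odds_ratio(0.40, 0.20))
```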

Used to assess how log probability binary outcome varies systematically with X’s; log(Pr(Y=1))=B0+B1X1+…; B1=difference in log(prob Y) per 1-unit X1; PrR for cross-sectional or RR for longitudinal

Log-Binomial Regression

Path from E to O that starts with E & all arrows point in same direction

Causal Path

Any other path from E to O; unconditionally open backdoor paths are confounded vs. unconditionally closed backdoor paths are blocked at collider

Non-Causal Path

Covariate set that leaves all causal paths open & non-causal paths closed vs. does this without any extra variables

Sufficient vs. Minimally Sufficient

Variables only causally associated with exposure; decreases precision if put into model

Instrument

Not necessary for confounder control but may increase precision

Variables Associated with Outcome

Confounding is causal concept but collapsibility is statistical concept; depends on prevalence of outcome & type of measure of association

Problems with Collapsibility Definition

Stratify table by exposure, do not include outcome, & do not include p-values

Causal Inference Table 1

Give similar results when number of confounders is small & no confounders are continuous

Stratified Analysis vs. Regression

Confounding that remains after incomplete adjustment for confounders due to mismeasurement or misspecification (e.g., too-coarse categories)

Residual Confounding

Prioritize accurate representation & interpretation of exposure; for confounders, prioritize model fit

Causal Inference Modeling

2 or more risk factors modify effect of each other with regard to occurrence/level of outcome; effect of E on O differs across strata of X; potential outcomes indexed by E only & estimated conditional on X (1 exchangeability assumption)

Effect Measure Modifier

Risk of O in presence of both E & X differs from what would be expected based on effect of E alone & X alone; potential outcomes indexed by both E & X (2 exchangeability assumptions)

Causal Interaction

Difference of risk differences (linear), ratio of odds ratios (logistic), or ratio of risk ratios (log-binomial); for logistic & log-binomial models the ratio is the exponentiated coefficient

Coefficient of Product Term

Difference of risk differences expressed as proportion of reference risk (R00); RR11-RR10-RR01+1

Relative Excess Risk due to Interaction (RERI)
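
A minimal sketch under an assumed subscript convention: the first index is E and the second is X, so R10 is the risk with E only and R01 is the risk with X only. The function names and risks are hypothetical. The code computes RERI from risk ratios and checks that it matches the difference of risk differences divided by R00.

```python
def reri(r00, r10, r01, r11):
    """RERI = RR11 - RR10 - RR01 + 1, with every RR relative to r00."""
    rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00
    return rr11 - rr10 - rr01 + 1

def interaction_contrast(r00, r10, r01, r11):
    """Difference of risk differences on the additive scale."""
    return (r11 - r01) - (r10 - r00)

risks = dict(r00=0.10, r10=0.20, r01=0.15, r11=0.40)  # hypothetical risks
```

For these risks, RERI = 4 - 2 - 1.5 + 1 = 1.5. A value above 0 indicates positive interaction on the additive scale.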

P-value of Wald test for interaction coefficient or LRT (or F-test for linear models); often underpowered & likely to return false negatives

Test of Homogeneity

Used to identify potential associations with outcome; hypothesis generating; non-causal, potential for multiple comparisons, different across studies

Risk Factor Analysis

Stratify table by outcome, no p-values

Risk Factor Analysis Table 1

Prioritize interpretability & model fit of all covariates

Risk Factor Analysis Modeling

Assign predicted probability of condition based on baseline characteristics; use logistic regression then convert to probability

Prediction Model
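
The conversion step on this card is the inverse logit, p = 1 / (1 + exp(-linear predictor)). In the sketch below, the intercept and coefficients are hypothetical stand-ins for a fitted logistic model.

```python
import math

def predicted_probability(intercept, coefs, covariates):
    """Convert a logistic-regression linear predictor (log odds) to a probability."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical model: intercept -3.0, age per 10 years 0.5, smoker (0/1) 0.7
p = predicted_probability(-3.0, [0.5, 0.7], [6, 1])  # 60-year-old smoker
```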

Use baseline characteristics to predict current disease stage; useful if gold standard is invasive or expensive; assessed against gold standard

Diagnostic Model

Use baseline characteristics to predict future disease state; assessed by future outcomes data

Prognostic Model

Degree of closeness of measured/predicted quantity to actual/gold standard value

Accuracy

Accuracy of output from prediction model applied to data used to develop model; calibration & discrimination

Model Accuracy

Accuracy of output from prediction model applied to data not used to develop model; calibration & discrimination

Model Predictive Accuracy

Ability to correctly estimate disease state or risk/probability of future event

Calibration

Ability to separate persons with/without disease or various disease states

Discrimination

Prioritize model fit & parsimony

Prediction Modeling

Continuous outcome: R2 closer to 1 vs. intercept = 0 & slope = 1 when observed values are regressed on predicted values

Good Discrimination vs. Calibration

Plot sensitivity vs. 1-specificity for binary outcome; each point corresponds to different cutoff of what defines "positive test"

ROC Curve
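
A minimal sketch of how the ROC points arise. It assumes that a predicted score at or above the cutoff counts as a "positive test". Each distinct score serves as one cutoff. The function name and data are hypothetical.

```python
def roc_points(y_true, y_score):
    """(1 - specificity, sensitivity) at each cutoff, where score >= cutoff is positive."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]                       # cutoff above every score
    for c in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= c)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= c)
        points.append((fp / neg, tp / pos))
    return points

points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```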

Area under ROC curve; if one person with & one without disease were randomly selected, probability that person with disease has higher predicted probability

C-Statistic
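
The pairwise definition on this card can be computed directly. The sketch compares every case with every non-case and counts tied predictions as 1/2, which is a common convention and is assumed here. The data are hypothetical.

```python
def c_statistic(y_true, y_score):
    """Probability that a random case has a higher predicted risk than a random non-case."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    noncases = [s for y, s in zip(y_true, y_score) if y == 0]
    concordant = 0.0
    for sc in cases:
        for sn in noncases:
            if sc > sn:
                concordant += 1
            elif sc == sn:
                concordant += 0.5       # tie counts as half
    return concordant / (len(cases) * len(noncases))

c = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs concordant
```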

Measure of calibration for binary outcomes; measures closeness of distributions of observed & predicted values; tests Ho that observed=expected, p<0.05 indicates poor fit

Hosmer-Lemeshow X2 Goodness-of-Fit Test
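
A sketch of one common variant of the test: sort people by predicted risk, split them into equal-sized groups, and compare observed with expected events in each group. The result is referred to a chi-square distribution with (groups - 2) df. The helper for the p-value uses the closed form that holds only for even df, which covers the default of 10 groups (df = 8). Both function names are hypothetical.

```python
import math

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic using equal-sized groups of sorted predicted risk.

    Assumes len(y_true) >= groups and no group has mean predicted risk of 0 or 1.
    Returns (statistic, degrees of freedom = groups - 2).
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)      # observed events in group
        expected = sum(p for p, _ in chunk)      # expected events = sum of risks
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, groups - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
```

Usage: `stat, df = hosmer_lemeshow(y, probs)` followed by `chi2_sf_even_df(stat, df)`. A p-value below 0.05 points to poor calibration.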

Stratify by dataset, no p-values

Prediction Table 1

Describes models whose output reflects statistical "noise" in particular dataset rather than underlying, stable relationships that may be reproducible

Overfitting

Correlation between predictors high enough to degrade precision of regression coefficient estimates substantially for some/all correlated predictors; do not tolerate VIF > 10

Collinearity
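
In general, VIF for predictor j equals 1 / (1 - R2j), where R2j comes from regressing Xj on all other predictors. The sketch below assumes the two-predictor special case, in which R2j is the squared Pearson correlation. The function names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two predictors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 of x1 on x2 equals r^2."""
    return 1 / (1 - pearson_r(x1, x2) ** 2)

# Hypothetical, nearly collinear predictors
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = vif_two_predictors(x1, x2)   # far above the VIF > 10 threshold
```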

Measurement of predictive accuracy; discrimination/calibration of model on data not used to derive model

Validation

Predictive accuracy measured within same population (training set vs. validation set); e.g., split sample or k-fold cross-validation

Internal Validation
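
A minimal sketch of the k-fold split. Row indices are shuffled with a fixed seed (753, an arbitrary choice) and dealt into k folds. Each fold is held out once as the validation set, and the remaining folds form the training set. The model-fitting step is only indicated by a comment.

```python
import random

def k_fold_indices(n, k, seed=753):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = k_fold_indices(n=10, k=5)
for i, valid in enumerate(folds):
    train = [j for m, fold in enumerate(folds) if m != i for j in fold]
    # Fit the model on `train` rows, then assess calibration and
    # discrimination on the held-out `valid` rows.
    assert not set(train) & set(valid)
```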

Predictive accuracy measured within different population

External Validation

2-state, non-recurrent event

Outcome for Survival Analysis

Point at which individuals become both biologically & methodologically at risk; elapsed time measured from this point (aligns individuals)

Time Origin

Yardstick by which time is measured (e.g., time since enrollment, age); comparisons within risk sets implicitly control for this metric

Time Metric

Time at beginning of individual's observation in study

Entry Time

Study entry occurs after time origin; assumes late entrants are representative of all other participants & of those who never enter the study at all

Late Entry

Time during which study outcome cannot occur because individual not under observation; downwardly biased outcome rate & upwardly biased survival curve

Immortal Person-Time

Exclusion of prevalent cases

Left Censoring

Individual did not experience outcome under follow-up & can't be further observed (no longer methodologically at risk); administrative censoring, LTFU, or competing risk; assumed to be non-informative

Right Censoring

Assumption that risk of outcome at any given moment of follow-up is similar across individuals

Equivalence of Person-Time at Risk

Group of individuals aligned by time origin & at risk for event at time t; used for comparisons in survival analysis; assembled at each time of event (continuous) or period (discrete)

Risk Set

Instantaneous rate of event at time t among those who survived event-free to t; estimated using p(t)/width of interval

Continuous Time Hazard

Conditional probability of event in a time period among those who survived event-free to the start of that period; # events/# at risk; shape over time shows whether risk is increasing, decreasing, or constant

Discrete Time Hazard

Cumulative probability of surviving beyond time j; S(tj) = S(tj-1)(1-h(tj)) or S(tj-1)(1-p(tj)); plot using Kaplan-Meier

Survival

Cumulative probability of having event at or before time j; complement of survival function; plot using Kaplan-Meier

Cumulative Incidence

Accumulation of hazard between t0 & tj for individual; shape reflects behavior of hazard function in continuous time; estimated from Kaplan-Meier as -ln(S(tij)) & plotted

Cumulative Hazard
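
The last several cards can be traced through one life-table calculation. The sketch uses hypothetical counts, and `at_risk` already reflects censoring: 90 - 9 = 81, but only 80 remain at risk in period 3 because one person is assumed censored. Applying the same product-limit step at each distinct event time gives the Kaplan-Meier estimator.

```python
import math

def life_table(at_risk, events):
    """Discrete-time hazard, survival, cumulative incidence, & cumulative hazard.

    at_risk[j], events[j]: number at risk & number of events in period j.
    """
    rows, surv = [], 1.0
    for n_j, d_j in zip(at_risk, events):
        h = d_j / n_j                 # conditional probability of event in period j
        surv *= 1 - h                 # S(tj) = S(tj-1)(1 - h(tj))
        rows.append({"hazard": h, "survival": surv,
                     "cum_incidence": 1 - surv,       # complement of survival
                     "cum_hazard": -math.log(surv)})  # -ln S(tj)
    return rows

table = life_table(at_risk=[100, 90, 80], events=[10, 9, 8])
```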

One record for each person-period when individual at risk (often multiple rows of data per person); define late entries, exclude person-time prior to study entry or after study exit, & identify gaps

Discrete Time Data Setup
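
A sketch of the person-period expansion, assuming each person's follow-up is summarized as (id, first period at risk, last period observed, event indicator). Late entry is handled by starting at a later period. The record layout and function name are hypothetical. The sketch does not handle gaps in observation.

```python
def person_period(records):
    """Expand one row per person into one row per period at risk.

    records: (person_id, first_period, last_period, event); first_period > 1
    marks a late entry; event = 1 if the outcome occurred in last_period.
    """
    rows = []
    for pid, first, last, event in records:
        for period in range(first, last + 1):
            rows.append({"id": pid, "period": period,
                         "event": int(event and period == last)})
    return rows

rows = person_period([("A", 1, 3, 1), ("B", 2, 4, 0)])  # B enters late, censored
```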

Models discrete-time hazard function for truly discrete hazard; log hazard odds=[a1D1+...]+BXi; aj=log hazard odds for time period j when X's=0 (estimates hazard in each time period); B=log hazard OR in exposed vs. unexposed

Pooled Logistic Regression

Truly discrete hazard (hazard is conditional probability & constant within each time period) & proportional hazard odds (hazard OR constant across periods)

Assumptions of Pooled Logistic Regression

Models discrete-time hazard function for underlying continuous event processes; ln(-ln(1-h(tij|Xij)))=[a1D1+...]+BXij; B=log HR outcome in exposed vs. unexposed

Discrete Time Proportional Hazards Regression (cloglog)
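
A short sketch contrasting the cloglog and logit links for a single period. Solving ln(-ln(1-h1)) = ln(-ln(1-h0)) + B gives 1 - h1 = (1 - h0)^exp(B). Under the logit link, the hazard odds are multiplied by exp(B). The coefficient value is hypothetical.

```python
import math

def hazard_cloglog(h0, beta):
    """Exposed hazard under cloglog: 1 - h1 = (1 - h0) ** exp(beta); exp(beta) is the HR."""
    return 1 - (1 - h0) ** math.exp(beta)

def hazard_logit(h0, beta):
    """Exposed hazard under logit: hazard odds multiplied by exp(beta) (hazard OR)."""
    odds = h0 / (1 - h0) * math.exp(beta)
    return odds / (1 + odds)

beta = math.log(2)  # hypothetical coefficient: HR (cloglog) or hazard OR (logit) of 2
for h0 in (0.01, 0.3):
    print(h0, round(hazard_cloglog(h0, beta), 4), round(hazard_logit(h0, beta), 4))
```

With a baseline hazard of 0.01, the two links give nearly the same exposed hazard (about 0.0199 vs. 0.0198). At 0.3 they diverge (0.51 vs. about 0.46).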

Continuous-time hazard & proportional hazards

Assumptions of Discrete Time Proportional Hazards Regression

One record for each individual (can be multiple if time-varying covariates); define late entries & exclude person-time prior to study entry or after study exit

Continuous Time Data Setup

Models continuous-time hazard function; log(h(t))=log(h0(t))+B1X1+...; B1=log HR outcome in exposed vs. unexposed; semi-parametric; sensitive to ties

Cox Proportional Hazards Regression

Shape of hazard allowed to vary & proportional hazards

Assumptions of Cox Proportional Hazards Regression

Parallel lines for plot of ln(H(t)) vs. time (or ln(time)); horizontal line or correlation of 0 for plot of Schoenfeld residuals vs. time

Assessing Proportional Hazards Assumption

Tests Ho of no difference between survival functions; p<0.05 indicates survival differs in at least 1 group

Log-Rank Test
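
A minimal two-group sketch of the test. At each distinct event time, observed events in group 1 are compared with the number expected from the risk set, and the hypergeometric variances are summed. The result is referred to a chi-square distribution with 1 df. The function name and data layout are hypothetical, and the sketch assumes at least one event time with nonzero variance.

```python
import math

def log_rank(times, events, groups):
    """Two-group log-rank test.

    times: follow-up time; events: 1 = event, 0 = censored; groups: 0 or 1.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e, _ in data if e == 1}):   # distinct event times
        n = sum(1 for ti, _, _ in data if ti >= t)            # risk set size
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n                          # observed - expected, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```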

Unit of analysis is time period in which variable is constant; include additional rows of data for each transition time

Time-Varying Covariates
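
A sketch of the row splitting for a covariate that changes value once. It assumes follow-up starts at time 0 and uses counting-process (start, stop] rows. Only the last row can carry the event. The function name is hypothetical.

```python
def split_at_transition(pid, stop, event, change_time, before, after):
    """Counting-process rows (start, stop] for a covariate that changes once.

    The covariate equals `before` until `change_time` & `after` thereafter;
    the event indicator is attached only to the final interval.
    """
    if change_time >= stop:          # transition not observed during follow-up
        return [(pid, 0, stop, before, event)]
    return [(pid, 0, change_time, before, 0),
            (pid, change_time, stop, after, event)]

rows = split_at_transition("A", stop=10, event=1, change_time=4, before=0, after=1)
```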

Conditional logistic regression to calculate matched OR (discordant pairs); rare disease assumption met & OR is valid estimate of HR (representative subsample & cohort is reasonable size)

Analysis for Nested Case-Control Studies

Cox proportional hazards regression with late entries for cases outside subcohort; rare disease assumption met (cohort reasonable size & few ties)

Analysis for Case-Cohort Studies

Used to assess how log incidence rate of count outcome varies systematically with X's; log(IRk)=uj+B0+B1X1+... or log(A)=uj+B0+B1X1+...+log(T); B1=difference in log(IR Y) per 1-unit X1 (same as log IRR)

Poisson Regression

Equivalent to hazard when hazard is constant or average hazard when hazard isn't constant

Incidence Rate

Communication, no meaningful time origin, no multi-level data, or outcome is count

Reasons to Estimate IR

ln(person-time) term in Poisson Regression; coefficient fixed at 1

Offset

Each row corresponds to one bin of person-time; each row needs covariate(s) values, # events, & amount of person-time

Poisson Data Setup
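
A sketch built on hypothetical person-time bins. With one binary X, the Poisson model is saturated for the two exposure groups. Its MLEs are therefore B0 = log(IR in unexposed) and B1 = log(IRR), and the log(T) offset converts the modelled log count into a log rate. No model-fitting library is used; the estimates come from the closed form.

```python
import math

# Hypothetical bins of person-time: (exposed, events, person_years)
bins = [(0, 12, 4000.0), (0, 8, 2000.0), (1, 15, 2500.0), (1, 10, 1500.0)]

def incidence_rate(exposed):
    """Events divided by person-time, pooled over bins with this exposure value."""
    events = sum(a for x, a, _ in bins if x == exposed)
    person_time = sum(t for x, _, t in bins if x == exposed)
    return events / person_time

ir0, ir1 = incidence_rate(0), incidence_rate(1)
irr = ir1 / ir0
b0, b1 = math.log(ir0), math.log(irr)
# Expected events in a bin: exp(B0 + B1*X + log(T)); log(T) is the offset.
expected_bin3 = math.exp(b0 + b1 * 1 + math.log(2500.0))
```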

Constant multiplicative effect, constant average hazard, mean=variance

Assumptions of Poisson Regression

Used to assess how log incidence rate of count outcome varies systematically with X's; relaxes mean=variance assumption using dispersion parameter (a); log(IRk)=uj+B0+B1X1+... or log(A)=uj+B0+B1X1+...+log(T); B1=difference in log(IR Y) per 1-unit X1 (same as log IRR)

Negative Binomial Regression

LRT with Ho: a=0; if p<0.05 then use NB

Evaluating Overdispersion

Variance>mean in dataset where outcome assumed to be Poisson distributed; may occur if confounder not included in model or outcomes correlated across time bins; can produce underestimated SE & overestimated test statistics

Overdispersion
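
As a crude descriptive check, a variance-to-mean ratio well above 1 hints at overdispersion. This is not the LRT of a=0 described above. The sketch ignores covariates and unequal person-time across bins, and the counts are hypothetical.

```python
def mean_and_variance(counts):
    """Sample mean & (n - 1) variance of event counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var

counts = [0, 0, 1, 0, 7, 2, 0, 9, 1, 0]   # hypothetical counts per bin
mean, var = mean_and_variance(counts)
ratio = var / mean   # Poisson implies a ratio near 1
```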

(91 cards)