Epi Methods 753 Flashcards

(91 cards)

1
Q

2 categories where 1 is reference group (typically “unexposed”)

A

Dichotomous Variable

2
Q

Parameterization of variable into discrete categories

A

Categorical Variable

3
Q

Categorical variable that assigns 0 or 1

A

Binary Variable

4
Q

Categorical variable that doesn’t have ordering/order not of interest; collection of k-1 binary indicator variables

A

Nominal Variable

5
Q

Categorical variable that has ordering/order of interest; collection of binary variables assigned score; step between categories constrained to be equal

A

Ordinal Variable

6
Q

Test Ho that B=0, where B is the coefficient for the category score variable; if p<0.05, the estimated step from one category to the next differs from 0

A

Mantel Test for Trend

7
Q

Variable can take any value between lower & upper limit

A

Continuous Variable

8
Q

Divide continuous variable by factor; coefficient of variable affected

A

Rescaling

9
Q

Subtract constant from continuous variable; intercept affected

A

Centering
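As a rough illustration of the rescaling and centering cards, the sketch below fits a simple least-squares line three times: on the raw variable, after dividing by 10, and after subtracting 50. The data, the helper `ols`, and the constants 10 and 50 are all hypothetical, chosen only to show which coefficient moves.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: age in years and systolic blood pressure in mmHg.
age = [30, 40, 50, 60, 70]
sbp = [112, 118, 127, 131, 142]

b0, b1 = ols(age, sbp)                                      # raw scale
b0_scaled, b1_scaled = ols([a / 10 for a in age], sbp)      # per 10 years
b0_centered, b1_centered = ols([a - 50 for a in age], sbp)  # centered at 50

print(f"raw:      intercept={b0:.2f}, slope={b1:.2f}")
print(f"rescaled: intercept={b0_scaled:.2f}, slope={b1_scaled:.2f}")
print(f"centered: intercept={b0_centered:.2f}, slope={b1_centered:.2f}")
```

In this toy data, rescaling multiplies the slope by the factor and leaves the intercept unchanged, while centering leaves the slope unchanged and moves the intercept to the expected outcome at the centering value.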

10
Q

Closely related to counterfactual; compare observed outcome to non-observed (counterfactual) outcome; estimate measures of causal effect by measures of association assuming exchangeability (differences due to confounding)

A

Potential Outcomes

11
Q

Observed outcomes in unexposed are good stand-in for unobserved potential outcomes for exposed persons under no exposure & vice versa; not testable but met in expectation with randomization

A

Exchangeability Assumption

12
Q

Comparison of pre-treatment covariates in exposed & unexposed groups; comparability doesn’t guarantee assumption met

A

Exchangeability Assessment

13
Q

Relax exchangeability assumption to be conditional on covariates; assumes no unmeasured confounders

A

Conditional Exchangeability

14
Q

Used to assess how average value of continuous outcome varies systematically with X’s; E[Y] = B0+B1X1+…; B1=average difference (cross-sectional) or change (longitudinal) in Y per 1-unit X1

A

Linear Regression

15
Q

Used to assess how log odds binary outcome varies systematically with X’s; log(odds Y)=B0+B1X1+…; B1=difference in log(odds Y) per 1-unit X1; PrOR for cross-sectional or ROR for longitudinal

A

Logistic Regression

16
Q

Risk or prevalence > 10%

A

OR Overstates RR or PrR (Farther from the Null)
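A small numeric sketch of this card: the two hypothetical 2x2 tables below share the same risk ratio, but the odds ratio drifts away from it once the outcome is common. The counts and the function names `risk_ratio` and `odds_ratio` are made up for illustration.

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same 2x2 layout."""
    return (a * d) / (b * c)

# Hypothetical tables, both with a true risk ratio of 2.
rare = (20, 980, 10, 990)       # risks of 2% vs. 1%
common = (400, 600, 200, 800)   # risks of 40% vs. 20%

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR={risk_ratio(*table):.2f}, OR={odds_ratio(*table):.2f}")
```

With the rare outcome the OR stays close to the RR; with the common outcome it lies noticeably farther from the null.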

17
Q

Used to assess how log probability binary outcome varies systematically with X’s; log(Pr(Y=1))=B0+B1X1+…; B1=difference in log(prob Y) per 1-unit X1; PrR for cross-sectional or RR for longitudinal

A

Log-Binomial Regression

18
Q

Path from E to O that starts with E & all arrows point in same direction

A

Causal Path

19
Q

Any other path from E to O; unconditionally open backdoor paths are confounded vs. unconditionally closed backdoor paths are blocked at collider

A

Non-Causal Path

20
Q

Covariate set that leaves all causal paths open & non-causal paths closed vs. does this without any extra variables

A

Sufficient vs. Minimally Sufficient

21
Q

Variables only causally associated with exposure; decreases precision if put into model

A

Instrument

22
Q

Not necessary for confounder control but may increase precision

A

Variables Associated with Outcome

23
Q

Confounding is causal concept but collapsibility is statistical concept; depends on prevalence of outcome & type of measure of association

A

Problems with Collapsibility Definition

24
Q

Stratify table by exposure, do not include outcome, & do not include p-values

A

Causal Inference Table 1

25
Give similar results when number of confounders is small & no confounders are continuous
Stratified Analysis vs. Regression
26
Expresses incomplete adjustment of confounding variables due to mismeasurement or misspecification
Residual Confounding
27
Prioritize accurate representation & interpretation of exposure but fit for confounders
Causal Inference Modeling
28
2 or more risk factors modify effect of each other with regard to occurrence/level of outcome; effect of E on O differs across strata of X; potential outcomes indexed by E only & estimated conditional on X (1 exchangeability assumption)
Effect Measure Modifier
29
Risk of O in presence of both E & X differs from what would be expected based on effect of E alone & X alone; potential outcomes indexed by both E & X (2 exchangeability assumptions)
Causal Interaction
30
Difference of risk differences (linear), ratio of odds ratios (logistic), or ratio of risk ratios (log-binomial)
Coefficient of Product Term
31
Difference of risk differences expressed as proportion of reference risk (R00); RR11-RR10-RR01+1
Relative Excess Risk due to Interaction (RERI)
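To make the RERI arithmetic concrete, here is a minimal sketch using four hypothetical stratum risks. The function name `reri` and the risk values are assumptions for illustration; the second calculation checks that the same number comes out of the difference of risk differences divided by the reference risk.

```python
def reri(r00, r10, r01, r11):
    """RERI from the four joint-exposure risks, with r00 as the reference."""
    rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00
    return rr11 - rr10 - rr01 + 1

# Hypothetical risks: neither factor, E only, X only, both.
r00, r10, r01, r11 = 0.02, 0.04, 0.06, 0.12

value = reri(r00, r10, r01, r11)

# Same quantity as the difference of risk differences over the reference risk.
check = ((r11 - r01) - (r10 - r00)) / r00

print(f"RERI = {value:.2f}, via risk differences = {check:.2f}")
```

A RERI of 0 would correspond to exactly additive joint effects; values above 0 suggest more than additive risk under these assumptions.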
32
P-value of Wald test for interaction coefficient; LRT (or F-test for linear models); underpowered & likely to return false negatives
Test of Homogeneity
33
Used to identify potential associations with outcome; hypothesis generating; non-causal, potential for multiple comparisons, different across studies
Risk Factor Analysis
34
Stratify table by outcome, no p-values
Risk Factor Analysis Table 1
35
Prioritize interpretability & model fit of all covariates
Risk Factor Analysis Modeling
36
Assign predicted probability of condition based on baseline characteristics; use logistic regression then convert to probability
Prediction Model
37
Use baseline characteristics to predict current disease stage; useful if gold standard is invasive or expensive; assessed against gold standard
Diagnostic Model
38
Use baseline characteristics to predict future disease state; assessed by future outcomes data
Prognostic Model
39
Degree of closeness of measured/predicted quantity to actual/gold standard value
Accuracy
40
Accuracy of output from prediction model applied to data used to develop model; calibration & discrimination
Model Accuracy
41
Accuracy of output from prediction model applied to data not used to develop model; calibration & discrimination
Model Predictive Accuracy
42
Ability to correctly estimate disease state or risk/probability of future event
Calibration
43
Ability to separate persons with/without disease or various disease states
Discrimination
44
Prioritize model fit & parsimony
Prediction Modeling
45
Continuous outcome; R2 closer to 1 vs. intercept = 0 & slope = 1
Good Discrimination vs. Calibration
46
Plot sensitivity vs. 1-specificity for binary outcome; each point corresponds to different cutoff of what defines "positive test"
ROC Curve
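The sketch below shows how the points of an ROC curve could be tabulated by hand: each distinct predicted probability is treated as a cutoff, and sensitivity and 1-specificity are recomputed at that cutoff. The predicted probabilities, outcomes, and the helper `roc_points` are hypothetical.

```python
def roc_points(probs, outcomes):
    """Return (cutoff, 1 - specificity, sensitivity) for each distinct cutoff."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    points = []
    for cutoff in sorted(set(probs), reverse=True):
        # "Positive test" means predicted probability at or above the cutoff.
        tp = sum(1 for p, y in zip(probs, outcomes) if p >= cutoff and y == 1)
        fp = sum(1 for p, y in zip(probs, outcomes) if p >= cutoff and y == 0)
        points.append((cutoff, fp / n_neg, tp / n_pos))
    return points

# Hypothetical predicted probabilities and observed disease status (1 = diseased).
probs = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]

for cutoff, fpr, sens in roc_points(probs, outcomes):
    print(f"cutoff={cutoff:.2f}  1-specificity={fpr:.2f}  sensitivity={sens:.2f}")
```

Lowering the cutoff moves along the curve toward the upper right, trading specificity for sensitivity.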
47
Area under ROC curve; if one person with & one without disease were randomly selected, probability that person with disease has higher predicted probability
C-Statistic
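The pairwise definition on this card can be computed directly, as in the sketch below: every diseased/non-diseased pair is compared, and the C-statistic is the share of pairs in which the diseased person has the higher predicted probability. Counting ties as half a concordant pair is a common convention assumed here; the data and the name `c_statistic` are hypothetical.

```python
def c_statistic(probs, outcomes):
    """Proportion of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))

# Hypothetical predicted probabilities and observed disease status (1 = diseased).
probs = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]

c = c_statistic(probs, outcomes)
print(f"C-statistic = {c:.3f}")
```

A value of 0.5 would mean the model ranks pairs no better than chance, and 1.0 would mean every pair is ranked correctly.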
48
Measure of calibration for binary outcomes; measures closeness of distributions of observed & predicted values; tests Ho that observed=expected, p<0.05 indicates poor fit
Hosmer-Lemeshow χ² Goodness-of-Fit Test
49
Stratify by dataset, no p-values
Prediction Table 1
50
Describes models whose output reflects statistical "noise" in particular dataset rather than underlying, stable relationships that may be reproducible
Overfitting
51
Correlation between predictors high enough to degrade precision of regression coefficient estimates substantially for some/all correlated predictors; do not tolerate VIF > 10
Collinearity
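As a hedged illustration of the VIF mentioned on this card: for predictor j, VIF = 1/(1 - R²), where R² comes from regressing that predictor on the others. With only two predictors, R² is just their squared correlation, which is the special case sketched below. The data and the helper `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values of two correlated predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)  # two-predictor case: R^2 equals r^2
print(f"r = {r:.3f}, VIF = {vif:.2f}")
```

With more than two predictors, the R² would have to come from a full regression of each predictor on all the others rather than from a single correlation.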
52
Measurement of predictive accuracy; discrimination/calibration of model on data not used to derive model
Validation
53
Predictive accuracy measured within same population (training set vs. validation set); e.g. split sample or k-fold cross-validation
Internal Validation
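A minimal sketch of how cross-validation partitions a single dataset: each observation lands in the validation set exactly once and in the training set for every other fold. The function `k_fold_splits` is hypothetical, and the folds are assigned in index order for readability; in practice the rows would usually be shuffled first.

```python
def k_fold_splits(n, k):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    # Assign indices to folds round-robin: fold i gets i, i + k, i + 2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    for validation in folds:
        training = sorted(j for f in folds if f is not validation for j in f)
        yield training, validation

# Hypothetical dataset of 10 rows split into 5 folds.
splits = list(k_fold_splits(n=10, k=5))
for training, validation in splits:
    print(f"train on {training}, validate on {validation}")
```

The model would be refit on each training set, and discrimination or calibration summarized across the held-out folds.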
54
Predictive accuracy measured within different population
External Validation
55
2-state, non-recurrent event
Outcome for Survival Analysis
56
Reflects beginning of time individuals biologically & methodologically at risk; elapsed time measured from this point (aligns individuals)
Time Origin
57
Yardstick by which time is measured; controls for that measurement of time
Time Metric
58
Time at beginning of individual's observation in study
Entry Time
59
Time origin < study entry; assume individuals representative of all other participants & those who don't enter at all
Late Entry
60
Time during which study outcome cannot occur because individual not under observation; downwardly biased outcome rate & upwardly biased survival curve
Immortal Person-Time
61
Exclusion of prevalent cases
Left Censoring
62
Individual did not experience outcome under follow-up & can't be further observed (no longer methodologically at risk); administrative censoring, LTFU, or competing risk; assumed to be non-informative
Right Censoring
63
Assumption that risk of outcome at any given moment of follow-up is similar across individuals
Equivalence of Person-Time at Risk
64
Group of individuals aligned by time origin & at risk for event at time t; used for comparisons in survival analysis; assembled at each time of event (continuous) or period (discrete)
Risk Set
65
Instantaneous rate of event at a time point among those who survive without event to that time point; estimated using p(t)/width
Continuous Time Hazard
66
Conditional probability of event in a time period among those who survive without event to that time period; # events/# at risk; determines whether risk is increasing, decreasing, or constant
Discrete Time Hazard
67
Cumulative probability of surviving beyond time j; S(tj)=S(tj-1)(1-h(tj)) or S(tj)=S(tj-1)(1-p(tj)); plot using Kaplan-Meier
Survival
68
Cumulative probability of having event at or before time j; complement of survival function; plot using Kaplan-Meier
Cumulative Incidence
69
Cumulation of hazard between t0 & tj for individual; shape represents behavior of hazard function in continuous time; estimated using Kaplan-Meier --> plot -ln(S(tij))
Cumulative Hazard
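The hazard, survival, cumulative incidence, and cumulative hazard cards chain together, which the sketch below tries to show on a hypothetical three-period life table. The counts and the function `life_table` are made up; the number at risk drops by more than the number of events between periods because some people are assumed to be censored.

```python
import math

def life_table(at_risk, events):
    """Build per-period hazard, survival, cumulative incidence, cumulative hazard."""
    rows = []
    survival = 1.0
    for n, d in zip(at_risk, events):
        hazard = d / n              # events / number at risk in the period
        survival *= 1 - hazard      # S(tj) = S(tj-1) * (1 - h(tj))
        rows.append({
            "hazard": hazard,
            "survival": survival,
            "cum_incidence": 1 - survival,     # complement of survival
            "cum_hazard": -math.log(survival), # -ln(S(tj))
        })
    return rows

# Hypothetical risk sets and event counts for three periods.
table = life_table(at_risk=[100, 80, 60], events=[10, 8, 12])
for period, row in enumerate(table, start=1):
    print(period, {name: round(value, 3) for name, value in row.items()})
```

Note that the cumulative incidence computed this way ignores competing risks, in line with the non-informative censoring assumption on the right-censoring card.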
70
One record for each person-period when individual at risk (often multiple rows of data per person); define late entries, exclude person-time prior to study entry or after study exit, & identify gaps
Discrete Time Data Setup
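One way to picture the person-period layout is the sketch below, which expands a single person-level record into one row per period at risk. The function `person_periods`, its arguments, and the field names are hypothetical; a late entrant simply starts at a later period, so no rows exist for person-time before study entry.

```python
def person_periods(person_id, first_period, last_period, had_event):
    """Expand one person into one row per period at risk.

    The event indicator is 1 only in the last period, and only if the
    person had the event (otherwise the person is censored there).
    """
    rows = []
    for period in range(first_period, last_period + 1):
        event = 1 if (had_event and period == last_period) else 0
        rows.append({"id": person_id, "period": period, "event": event})
    return rows

# Hypothetical people: one followed from period 1 with an event in period 3,
# one late entrant followed over periods 2-4 and censored.
data = person_periods(1, 1, 3, True) + person_periods(2, 2, 4, False)
for row in data:
    print(row)
```

A dataset in this shape is what a pooled logistic or cloglog model would be fit to, with one indicator per period.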
71
Models discrete-time hazard function for truly discrete hazard; log hazard odds=[a1D1+...]+BXi; aj=log hazard odds for time period j when X's=0 (estimates hazard in each time period); B=log hazard OR in exposed vs. unexposed
Pooled Logistic Regression
72
Truly discrete hazard (hazard is conditional probability & constant within each time period) & proportional hazard odds (hazard OR constant across periods)
Assumptions of Pooled Logistic Regression
73
Models discrete-time hazard function for underlying continuous event processes; ln(-ln(1-h(tij|Xij)))=[a1D1+...]+BX1; B=log HR outcome in exposed vs. unexposed
Discrete Time Proportional Hazards Regression (cloglog)
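The sketch below illustrates why the complementary log-log link yields a hazard ratio while the logit link yields a hazard odds ratio. It assumes a hypothetical period hazard of 0.10 in the unexposed and a true continuous-time HR of 2, so that the exposed period hazard is 1-(1-0.10)^2; the function names are illustrative.

```python
import math

def logit(h):
    """Log hazard odds, the link used in pooled logistic regression."""
    return math.log(h / (1 - h))

def cloglog(h):
    """Complementary log-log, the link used in discrete-time PH regression."""
    return math.log(-math.log(1 - h))

# Hypothetical values: unexposed period hazard and true hazard ratio.
h_unexposed = 0.10
true_hr = 2.0
# Under proportional hazards in continuous time, survival through the
# period in the exposed equals unexposed survival raised to the HR.
h_exposed = 1 - (1 - h_unexposed) ** true_hr

b_cloglog = cloglog(h_exposed) - cloglog(h_unexposed)
b_logit = logit(h_exposed) - logit(h_unexposed)

print(f"exp(B) on cloglog scale = {math.exp(b_cloglog):.3f}")
print(f"exp(B) on logit scale   = {math.exp(b_logit):.3f}")
```

On the cloglog scale the exponentiated difference recovers the assumed HR, whereas the logit scale gives a hazard odds ratio that is somewhat farther from the null.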
74
Continuous-time hazard & proportional hazards
Assumptions of Discrete Time Proportional Hazards Regression
75
One record for each individual (can be multiple if time-varying covariates); define late entries & exclude person-time prior to study entry or after study exit
Continuous Time Data Setup
76
Models continuous-time hazard function; log(h(t))=log(h0(t))+B1X1+...; B1=log HR outcome in exposed vs. unexposed; semi-parametric; sensitive to ties
Cox Proportional Hazards Regression
77
Shape of hazard allowed to vary & proportional hazards
Assumptions of Cox Proportional Hazards Regression
78
Parallel lines for plot H(t) vs. time or ln(H(t)) vs. time; horizontal line or correlation of 0 for plot of Schoenfeld residuals vs. time
Assessing Proportional Hazards Assumption
79
Tests Ho of no difference between survival functions; p<0.05 indicates survival differs in at least 1 group
Log-Rank Test
80
Unit of analysis is time period in which variable is constant; include additional rows of data for each transition time
Time-Varying Covariates
81
Conditional logistic regression to calculate matched OR (discordant pairs); rare disease assumption met & OR is valid estimate of HR (representative subsample & cohort is reasonable size)
Analysis for Nested Case-Control Studies
82
Cox proportional hazards regression with late entries for cases outside subcohort; rare disease assumption met (cohort reasonable size & few ties)
Analysis for Case-Cohort Studies
83
Used to assess how log incidence rate of count outcome varies systematically with X's; log(IRk)=uj+B0+B1X1+... or log(A)=uj+B0+B1X1+...+log(T); B1=difference in log(IR Y) per 1-unit X1 (same as log IRR)
Poisson Regression
84
Equivalent to hazard when hazard is constant or average hazard when hazard isn't constant
Incidence Rate
85
Communication, no meaningful time origin, no multi-level data, or outcome is count
Reasons to Estimate IR
86
Offset
ln(person-time) in Poisson Regression
87
Each row corresponds to one bin of person-time; each row needs covariate(s) values, # events, & amount of person-time
Poisson Data Setup
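To connect the data setup with the offset, the sketch below uses two hypothetical person-time bins (unexposed and exposed). With a single binary covariate, the Poisson model's fitted rates reproduce the crude rates, so B0 and B1 can be written down directly; adding log(person-time) as the offset then returns the expected event count. All numbers and names are illustrative.

```python
import math

# Hypothetical bins: each row has covariate value, events, and person-time.
bins = [
    {"exposed": 0, "events": 12, "person_years": 4000.0},
    {"exposed": 1, "events": 30, "person_years": 5000.0},
]

# Crude incidence rate in each bin.
rates = {b["exposed"]: b["events"] / b["person_years"] for b in bins}

b0 = math.log(rates[0])             # log IR in the unexposed
b1 = math.log(rates[1] / rates[0])  # log IRR, exposed vs. unexposed

def expected_events(exposed, person_years):
    """log(A) = B0 + B1*X + log(T), with log(T) entering as the offset."""
    return math.exp(b0 + b1 * exposed + math.log(person_years))

print(f"IRR = {math.exp(b1):.2f}")
for b in bins:
    print(b["exposed"], round(expected_events(b["exposed"], b["person_years"]), 2))
```

Because the offset's coefficient is fixed at 1, doubling the person-time in a bin doubles the expected count without changing the estimated rate.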
88
Constant multiplicative effect, constant average hazard, mean=variance
Assumptions of Poisson Regression
89
Used to assess how log incidence rate of count outcome varies systematically with X's; relaxes mean=variance assumption using dispersion parameter (a); log(IRk)=uj+B0+B1X1+... or log(A)=uj+B0+B1X1+...+log(T); B1=difference in log(IR Y) per 1-unit X1 (same as log IRR)
Negative Binomial Regression
90
LRT with Ho: a=0; if p<0.05 then use NB
Evaluating Overdispersion
91
Variance>mean in dataset where outcome assumed to be Poisson distributed; may occur if confounder not included in model or outcomes correlated across time bins; can produce underestimated SE & overestimated test statistics
Overdispersion
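A crude first look at overdispersion is to compare the sample variance of the counts with their mean, as sketched below on hypothetical data. This is only a descriptive check on the raw counts, not the LRT described on the previous card, and it ignores covariates and person-time.

```python
from statistics import mean, variance

# Hypothetical event counts across ten person-time bins.
counts = [0, 1, 0, 2, 9, 0, 1, 7, 0, 0]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)
ratio = v / m

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {ratio:.2f}")
```

A ratio well above 1 in data like these would be one reason to consider a negative binomial model, subject to the formal test.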