Midterm Flashcards

(75 cards)

1
Q

Association vs. Causality

A

Causality requires meeting criteria such as a temporal relationship, strength of association, and a dose-response relationship. Experimental studies tend to look at causality.

Association is when there is limited knowledge and you cannot say for sure that the exposure causes the outcome. Observational studies tend to look at association.

A study of association states its hypothesis as “is associated with”, while a study of causality states it as “increases/decreases the risk”.

2
Q

Descriptive Study

A

A study that describes the distribution of disease (e.g. by person, place, or time).

The hypothesis is often implicit, such as “the distribution of disease varies by person, place, or time”, but it can also be explicit.

3
Q

Analytic Study

A

Motivation is often to identify a causal determinant and find an association between exposure and outcome.

4
Q

Relative Risk

A

RR can mean incidence rate ratio, risk ratio (cumulative incidence ratio), hazard ratio, and odds ratio

5
Q

Bias

A

Systematic error in the design or conduct of a study that results in a measure of association among study participants that is meaningfully different from the true measure of association (e.g. that in the source population).

6
Q

Information Bias

A

Error due to collection of incorrect information about study participants.

Participants are classified into incorrect exposure or disease categories (misclassification)

7
Q

Selection Bias

A

Error arising from 1) criteria or procedures used to select study participants or 2) nonparticipation (occurring at initial enrollment or due to losses to follow-up)

8
Q

Direction of bias for RR

A

Axis 1: Upward vs. downward (this does not indicate whether the strength of association is being over- or underestimated)

Axis 2: Toward the null vs. away from the null

When assessing direction of bias the reference point is always the true RR. (e.g. if the True OR is .8 and the Obs OR is .2, then the bias is downward and away from the null).
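A minimal Python sketch of the two axes, using the true RR as the reference point. The function name is hypothetical, and the "crosses the null" label is an assumption added here for the case where the observed and true RR fall on opposite sides of 1, which the card does not cover.

```python
import math

def bias_direction(true_rr, observed_rr):
    """Classify bias on both axes, with the true RR as the reference point."""
    if observed_rr == true_rr:
        return ("none", "none")

    # Axis 1: is the observed RR numerically above or below the true RR?
    axis1 = "upward" if observed_rr > true_rr else "downward"

    # Axis 2: compare distances from the null (RR = 1) on the log scale,
    # because ratio measures are multiplicative.
    log_true = math.log(true_rr)
    log_obs = math.log(observed_rr)
    if log_true * log_obs < 0:
        axis2 = "crosses the null"  # assumption: case not covered by the card
    elif abs(log_obs) > abs(log_true):
        axis2 = "away from the null"
    else:
        axis2 = "toward the null"
    return (axis1, axis2)

# The card's example: true OR = 0.8, observed OR = 0.2
print(bias_direction(0.8, 0.2))  # ('downward', 'away from the null')
```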

9
Q

Strength of Association

A

The further from the null, the stronger the association.

Bias away from null overestimates the strength of association

Bias towards the null underestimates the strength of association

10
Q

Source population in a cohort study

A

The population that gave rise to the study sample. (should always include calendar time)

11
Q

General cohort

A

Defined by a factor unrelated to any particular exposure

Typically a convenience sample based on logistical advantages (e.g. willingness to participate, ease of recruitment, and/or follow-up)

Use of an internal comparison group

Uses RR

12
Q

Specific-exposure cohort

A

Defined by a specific exposure

Use of an external comparison group (e.g. general population).

Method to analyze is indirect standardization

Uses RR

Susceptible to selection bias (such as the healthy worker effect). The main issue is that the exposed cohort and the nonexposed external comparison group are not selected in the same fashion from the same source population. Selection from different source populations may result in different disease risk for reasons other than the exposure under study.

13
Q

Sources of selection bias

A
  1. Different criteria are used to select exposed and unexposed participants
  2. Selection of exposed or nonexposed participants is related to the development of the outcome of interest
  3. Loss to follow-up is related to both the exposure and the outcome of interest (differential losses to follow-up)
14
Q

Susceptibility to selection bias

A

Cohorts with internal comparison groups are less prone to selection bias than specific-exposure cohorts. Study participants are selected before the development of the disease and it is unlikely that future events will bias selection process. Cohorts using internal groups could have selection bias due to differential losses to follow-up.

Cohort using an external comparison group - healthy worker effect - RR is biased downward

specific-exposure cohorts are extremely prone to selection bias

15
Q

Differential losses to follow-up

A

Losses to follow-up in which participants who drop out of a study differ from those who stay in it. They cause selection bias when the losses are related to both the exposure and the outcome of interest.

16
Q

Source Population in case-control

A

The population that gave rise to the cases. Essentially, the population of persons who would have been identified as cases if they had developed the condition of interest during the time period in which the cases were identified.

Calendar time should be included

17
Q

Types of Source populations

A

Primary source population - well-defined (e.g. by residence, calendar period) and specified a priori. Determines case ascertainment.
Examples include:
- residents of a defined geographic area
- members of a health plan
- members of a general cohort

Secondary source population (more prone to selection bias than primary) - theoretically defined and inferred based on the method of case ascertainment. The case ascertainment method is defined a priori. The “would/if criterion” is employed.
Examples include:
- cases ascertained through a hospital: “person who would attend the hospital if..”
- cases recruited through advertisements: “person who would answer the ad if they were…”

18
Q

Case-control studies

A

a method of sampling controls from the source population such that the controls reflect exposure distribution in the source population that gave rise to the cases. Controls should be randomly sampled and representative of source population.

Uses odds ratio.

Case selection: ideally includes all cases that arise in the source population. In practice, usually only a sample of cases is included, and it needs to be representative of all cases.

19
Q

Selection bias in case-control

A

Occurs if the exposure distribution among study cases differs from that among all cases that arose in the source population.

Occurs if the exposure distribution among study controls differs from that in the source population.

Case-control studies are prone to selection bias because cases and controls are often selected through fundamentally different processes.

Cases:
- imperfect method of case ascertainment
- case non-participation (case refusal, inability to locate cases, case too sick, case died)

Controls:
- control non-participation (control refusal, inability to locate)
- random sampling from a primary source population is hard
- a secondary source population is difficult to operationally define

Partial non-participation among cases and/or controls

20
Q

Timeline of case and control recruitment

A

ascertain and recruit incident cases

accumulate controls during the study period at same rate that cases are being accumulated

source population is restricted to persons at risk of becoming a case

a control who later becomes a case serves as both a control and a case

21
Q

2 x 2 table

A

              cases   controls
exposed         a        b
non-exposed     c        d

odds ratio = ad/bc
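As a quick check of the cross-product formula, here is a small Python sketch. The function name and the counts are made up for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a = exposed cases,      b = exposed controls,
    c = non-exposed cases,  d = non-exposed controls.
    """
    return (a * d) / (b * c)

# Hypothetical counts: 40 exposed cases, 20 exposed controls,
# 60 non-exposed cases, 80 non-exposed controls.
print(round(odds_ratio(40, 20, 60, 80), 2))  # (40*80)/(20*60) = 2.67
```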

22
Q

Types of Case-control Studies

A

Population-based
- primary source population
- cases: all new cases of disease x that arise
- controls: representative sample of the source pop with respect to exposure

Hospital-based
- secondary source population
- cases: same as above, but in a hospital
- controls: same as above, but it’s hard to achieve in a secondary source pop

Source pop can come from place of residence, insurance, access to a regular physician, etc.

One exception: if most residents of a defined geographic area would attend hospital A and no other hospitals if they contracted a disease, then cases could be considered population-based and population-based controls can be used.

Nested
- primary source population
- cases and controls: same as above
- typically conducted when the exposures of interest are measured by assay of stored biologic specimens

23
Q

Pros and Cons of hospital-based case-control studies

A

Pro
- easily accessible and high participation rate
- protect against recall bias

Con
- nonrandom sample of the source pop, most of whom are healthy
- some may not even be members of source population

24
Q

Strategies for selection of hospital controls

A

only include patients admitted for diseases for which there is no suspicion of an association with the exposure under study

include controls with a variety of diseases

include diseases thought to have a comparable source population as the disease under study

base exclusions on diagnosis at the current hospitalization, not on past medical history

25
Pros and cons of nested case-control studies
Pros
- exposure measured at baseline, before development of disease
- selection of controls by random sampling from a well-defined, primary source population
Sources of selection bias
- incomplete case ascertainment
- cohort losses to follow-up
- selection bias associated with participant selection in the entire cohort itself
26
Confounding
Occurs when a variable is associated with both the exposure and the outcome and is not a mediator on the causal pathway.
Can be caused by an imbalance between exposed and nonexposed groups in another, extraneous exposure (the confounder).
If there is confounding and the variable is identified and measured, it can be adjusted for, as long as there was no bias in the selection of cases or controls within each stratum of the covariate.
For example, if SES is associated only with the exposure, but high-SES controls are over-selected, then there is an artifactual inverse association between SES and the outcome, leading to confounding.
Can be addressed through stratification (Mantel-Haenszel method).
27
Key test of validity of a case control study
Controls and the source pop should be alike with respect to the exposure under study
28
The best indication of the presence of confounding is
A meaningful difference between the unadjusted RR and the adjusted RR.
Calculate and inspect the RR for each stratum of the potential confounder.
If the stratum-specific RRs are similar to each other (but differ from the unadjusted RR), then there is potential confounding.
If they are different from each other, it may be effect modification.
29
When and how to address confounding
Design phase - identify potential confounders by consulting the literature
Data collection - measure potential confounders accurately
Analysis - check theoretical confounders and other study variables; determine if there is confounding
30
Methods used to adjust for confounding in analysis stage
- methods based on stratification
- multivariable statistical models
- standardization (direct or indirect)
31
Magnitude of confounding
Magnitude of confounding (%) = (RR_unadj - RR_adj) / RR_adj x 100
A difference of more than 10% indicates meaningful confounding. There is no need to look at a p-value here.
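A short Python sketch of this percentage. The function names and the example RRs are hypothetical. Comparing the absolute value against 10% is an assumption made here so that confounding in either direction is flagged.

```python
def confounding_magnitude(rr_unadj, rr_adj):
    """Percent difference between the unadjusted and adjusted RR."""
    return (rr_unadj - rr_adj) / rr_adj * 100

def is_meaningful(rr_unadj, rr_adj, threshold_pct=10.0):
    # Assumption: use the absolute value so that a negative
    # percentage (adjusted RR larger than unadjusted) also counts.
    return abs(confounding_magnitude(rr_unadj, rr_adj)) > threshold_pct

# Hypothetical RRs: unadjusted 2.0, adjusted 1.6 -> about 25%
print(round(confounding_magnitude(2.0, 1.6), 1))
print(is_meaningful(2.0, 1.6))
```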
32
Mantel-Haenszel summary RRs
To calculate:
- Set up i 2x2 tables (where i is the # of strata or categories of a potential confounding variable)
- Compute the weighted average of the stratum-specific RRs (weights based on the # of subjects or person-time experience in each stratum)
For cohort studies - risk ratio or rate ratio
For case-control studies - odds ratio
OR_MH = [sum of (a_i x d_i / N_i)] / [sum of (b_i x c_i / N_i)]
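The ORmh formula can be sketched in Python as follows. The function name and stratum counts are hypothetical. Each stratum is a tuple (a, b, c, d) with a = exposed cases, b = exposed controls, c = non-exposed cases, d = non-exposed controls, and N = a + b + c + d.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of 2x2 tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d          # N_i, total subjects in stratum i
        numerator += a * d / n     # sum of a_i * d_i / N_i
        denominator += b * c / n   # sum of b_i * c_i / N_i
    return numerator / denominator

# Two hypothetical strata of a potential confounder
strata = [(10, 20, 30, 40), (50, 10, 20, 20)]
print(mantel_haenszel_or(strata))  # (4 + 10) / (6 + 2) = 1.75
```

With a single stratum the result reduces to the ordinary ad/bc odds ratio.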
33
When to use MH
- adjustment for a single confounder that is a categorical variable
- simultaneous adjustment for 2 or 3 confounders, as long as the number of strata for each confounder is relatively small
- MH is only to be used for categorical variables
- cannot be used with a large number of strata (too cumbersome)
34
Generalized linear models
Linear - no RR estimated
Y = b0 + b1*X1
To find the change in Y per 10 units of X1 (e.g. 10 years), just use b1 * 10
Null value is 0

The following models are all log-transformed:

Logistic - odds ratio
- used in case-control studies
- other studies with binary dependent variables
- risk prediction
- ignores time
ln(odds of Y) = b0 + b1*X...

Poisson (log-linear) - IRR
- cohort studies with person-time data
- incidence rate studies that use aggregate-level data
ln(incidence rate of Y) = b0 + b1*X...

Cox Proportional Hazards - HRR
- studies with binary outcome and person-time data
- cohort studies
- RCTs
- survival analysis
ln[h(t)] = ln[h0(t)] + b1*X...
35
Unconditional logistic regression vs conditional logistic regression
Unconditional - used in unmatched case-control studies (can also use stratification, such as the traditional Mantel-Haenszel method with stratification by the matching factors for unmatched case control)
Conditional - used in some matched case-control studies
36
Deriving RR (per N-year increase of age) in a log-transformed model
Age is just an example; it could be any continuous variable.
Example: N = 10
beta (per year) = .049
RR (per year) = e^.049 = 1.05
beta (per year) * 10 = .49
RR (per 10 years) = e^.49 = 1.63
The RR per N-year increase is the same whatever the reference level (it could be 10, it could be 20), because N = 10 in either case.
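A small Python sketch of the calculation: multiply the per-unit beta by N, then exponentiate. The function and variable names are hypothetical. The per-year beta is taken as ln(1.05) so that it matches the card's per-year RR of 1.05.

```python
import math

def rr_per_n_units(beta_per_unit, n):
    """RR for an n-unit increase in a log-transformed model: e^(beta * n)."""
    return math.exp(beta_per_unit * n)

beta_per_year = math.log(1.05)  # beta that corresponds to RR = 1.05 per year

print(round(rr_per_n_units(beta_per_year, 1), 2))   # 1.05
print(round(rr_per_n_units(beta_per_year, 10), 2))  # 1.63
```

Note that e^(beta * 10) is the same as (RR per year)^10, which is why the reference age does not matter.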
37
Categorical or continuous in model for a natural order variable? (test for trend)
Can be modeled as a single term; if the relationship is linear, the variable can be used as continuous.
Test for trend can be used to assess evidence of an exponential trend (linear on a log scale). It is only applied to exposures with a natural order.
To do this:
1) model the variable as a categorical variable to capture the shape of the dose-response relationship
2) model the variable as a single term in a separate regression model to test for trend
p-value for trend = p-value for b1 in the single-term model. If p < .05, then the trend is significant.
38
Standardization
Stratification-based method of comparing rates of an outcome between two populations that have different distributions of one or more confounders.
Purpose: to make the comparison fair (i.e. to remove confounding) by forcing the two populations to have the same covariate distribution.
39
Indirect Standardization
Used for retrospective (historical) cohorts with an external comparison group, such as specific-exposure cohorts.
Standardization covariates MUST be categorical.
40
Standardized incidence/mortality ratio
Standardized incidence ratio (SIR) = total observed cases / total expected cases
Standardized mortality ratio (SMR) = total observed deaths / total expected deaths
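A minimal Python sketch of the ratio. The card does not say how the expected count is obtained; the sketch assumes the usual indirect-standardization step of applying the external population's stratum-specific rates to the cohort's person-time. All names, rates, and counts are hypothetical.

```python
def expected_count(person_time, reference_rates):
    """Expected cases (or deaths): sum over strata of
    cohort person-time x reference-population rate."""
    return sum(person_time[s] * reference_rates[s] for s in person_time)

def standardized_ratio(observed, expected):
    """SIR or SMR = total observed / total expected."""
    return observed / expected

# Hypothetical cohort person-years and external reference rates by age stratum
person_years = {"40-49": 1000, "50-59": 2000}
reference_rates = {"40-49": 0.002, "50-59": 0.005}

expected = expected_count(person_years, reference_rates)  # about 2 + 10 = 12
sir = standardized_ratio(18, expected)                    # about 18 / 12 = 1.5
print(round(expected, 2), round(sir, 2))
```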
41
Residual confounding
When your study adjusts for a variable or set of related variables, but the adjustment does not completely remove the confounding by that variable or those variables.
Causes:
- Coarse categorization: the categories are too broad, so there are heterogeneous groups of people within each stratum. This is problematic because these heterogeneous groups could also differ with respect to their exposure prevalence and risk of the outcome.
- Suboptimal modeling of the confounder in a multivariable model (e.g. modeling a covariate as continuous when the true dose-response curve is U-shaped)
- Inadequate adjustment for complex, multidimensional confounders, such as smoking, SES, and health status
- Inadequate measurement of the confounder (measurement error from an unvalidated data collection instrument, collection of insufficiently detailed information)
**If confounding remains because a particular confounder was not adjusted for at all, this is NOT considered residual confounding.
42
Health status as confounder
Healthy vaccinee effect - seniors who are unvaccinated tend to be at high short-term risk of death, so health status confounds the association between vaccination and mortality
43
Addressing residual confounding
Measurement - measure potential confounders as carefully as the exposure under study, especially if they are multidimensional.
Data analysis - use sufficiently fine covariate categorization, optimize modeling of covariates in multivariable models, and strive to capture the full dimensionality of multidimensional confounders in multivariable models.
*However, need to take into account the statistical imperative of model parsimony: the ratio of # of outcomes to # of covariates should be more than 10.
Interpretation - be transparent about the residual confounding in interpretation and how it could be better accounted for.
44
Matching in Cohort Studies
Adjusts for one or more potential confounders in the design phase of your study.
Select non-exposed participants who are similar to the exposed participants with respect to the distribution of one or more potential confounders.
Potential confounders are called matching factors.
When matched, there is no need to account for the matching factors in the analysis phase, but only if there is complete follow-up.
45
Matching in case-control studies
Used to adjust for one or more potential confounders in the design phase of the study.
Selection of controls who are similar to cases with respect to their distribution of one or more potential confounders.
However, matching in the design phase alone does not completely remove confounding, so you still need to adjust in the analysis phase.
Matching intentionally introduces selection bias and creates a new, superimposed confounding toward the null.
Matching on a true confounder increases statistical efficiency by optimizing precision.
46
Frequency matching
Selection of controls such that the distribution(s) of one or more potential confounders is/are similar in cases and controls.
Often used when matching factors are demographic variables (e.g. age, sex, race).
For example, if some strata have 0 individuals, you risk not being able to use the data from all subjects in the study, leading to reduced statistical efficiency.
47
Individual Matching
Selection of one or more controls that are identical to a given case with respect to one or more potential confounders
Useful for:
- controlling for a confounder using "fine stratification" (mini strata)
- matching factors that are multidimensional confounders
- risk-set sampling of controls in nested case-control studies
The matched set is the stratum
Twin studies cannot be done as unmatched case-control studies
Must use:
- conditional logistic regression (don't need to include matching factors), OR
- stratification: Mantel-Haenszel matched analysis (McNemar test), which gives the matched OR
48
Nested Case-Control Studies with Matching
For each case, N matched controls can be randomly sampled from the case's risk set - the risk set can be restricted by matching factors.
Enables selection of controls from the same risk set as the case - same concurrent time at risk for development of the outcome.
49
Simplest Mantel-Haenszel matched analysis
Four possible combinations of matched pairs: concordant, concordant, discordant, and discordant. Only need to look at the discordant pairs.

                    control exposed   control non-exposed
case exposed               q                  r
case non-exposed           s                  t

r and s are the discordant pairs
r/s = Matched Odds Ratio
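A Python sketch of the discordant-pair calculation. It assumes r counts pairs where the case is exposed and the control is not, and s the reverse, which is the layout under which r/s is the matched OR. Function names and pair counts are hypothetical.

```python
from collections import Counter

def matched_pair_table(pairs):
    """Count matched pairs given as (case_exposed, control_exposed) booleans."""
    counts = Counter(pairs)
    q = counts[(True, True)]    # concordant: both exposed
    r = counts[(True, False)]   # discordant: case exposed, control not
    s = counts[(False, True)]   # discordant: control exposed, case not
    t = counts[(False, False)]  # concordant: neither exposed
    return q, r, s, t

def matched_odds_ratio(pairs):
    """Matched OR uses only the discordant pairs: r / s."""
    _, r, s, _ = matched_pair_table(pairs)
    return r / s

# Hypothetical data: 5 + 9 concordant pairs, 12 + 4 discordant pairs
pairs = ([(True, True)] * 5 + [(True, False)] * 12
         + [(False, True)] * 4 + [(False, False)] * 9)
print(matched_pair_table(pairs))   # (5, 12, 4, 9)
print(matched_odds_ratio(pairs))   # 12 / 4 = 3.0
```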
50
Can association between matching factors and disease be studied?
No. Matching forces controls to be the same as cases with respect to the matching factor, so there is no way to find the association.
51
Overmatching
Overmatching generally refers to matching that is counterproductive, by either causing bias or reducing efficiency.
It creates a new, superimposed confounding toward the null and leads to loss of statistical efficiency.
Overmatching must be corrected in the analysis phase.
Examples:
- Matching on a mediator in the causal pathway between exposure and disease biases the effect estimate toward the null
- Matching on a non-confounder that is associated with the exposure, but is not a risk factor for disease, reduces statistical efficiency
52
Survival analysis
Study of the distribution of time elapsed from a baseline time to an outcome (event)
Study of how exposures (including treatments) affect the distribution of time to event
Used for two study designs: cohort studies and RCTs
Baseline time examples: date of entry into a cohort, birth date, etc.
Outcome examples: death, incident disease, disease cure, etc.
It is better to experience a beneficial outcome earlier than later
It is better to experience an adverse outcome later than earlier
53
Cumulative incidence vs. cumulative survival
CI (0 to 1) is the proportion of a specified population at risk that experienced the outcome under study during a specified time period
- Probability (risk) of experiencing the outcome under study in the specified time period
CS (0 to 1) is the proportion of a specified population at risk that does NOT experience the outcome under study (i.e. "survives") during a specified time period
- Probability of NOT experiencing the outcome under study in the specified time period
Both can be calculated directly in a closed cohort
54
CS + CI =?
1
55
CI curve vs CS curve
CI curve is the proportion of subjects who have experienced the event as a function of time since baseline
CS curve is the proportion of subjects who have NOT experienced the event as a function of time since baseline
56
Median survival time
The time at which CI = CS = .5
57
How to plot cumulative incidence/survival as a step function
1. Rank survival times from lowest to highest
2. Create intervals that start when one or more events occur
3. Calculate cumulative incidence during each interval (cumulative events/total)
4. Cumulative survival is calculated the same way, but by subtracting events from the total instead of adding them up ((total - cumulative events)/total)
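These steps can be sketched in Python for a closed cohort (no censoring), where both quantities can be calculated directly. The function name and the event times are hypothetical.

```python
def cumulative_steps(event_times, cohort_size):
    """Return (time, cumulative incidence, cumulative survival) at each step.

    Assumes a closed cohort: everyone is followed for the whole period,
    so the denominator stays at cohort_size.
    """
    steps = []
    cumulative_events = 0
    # Steps 1-2: rank the event times; a new interval starts at each distinct time
    for t in sorted(set(event_times)):
        cumulative_events += event_times.count(t)
        # Step 3: addition / total
        ci = cumulative_events / cohort_size
        # Step 4: subtraction / total
        cs = (cohort_size - cumulative_events) / cohort_size
        steps.append((t, ci, cs))
    return steps

# Hypothetical cohort of 10 subjects with events at times 2, 5, 5 and 9
for step in cumulative_steps([5, 2, 9, 5], 10):
    print(step)
```

Plotting these points as a step function means holding each value flat until the next event time.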
58
Cumulative Incidence in open cohort
If calculated directly, cumulative incidence will be underestimated, because the calculation assumes that those who withdrew, were lost to follow-up, or died did not experience the outcome.
59
Kaplan Meier method
1. Rank survival times from lowest to highest
2. Divide survival time into intervals that start when one or more events occur, count the events (ei) and censored subjects (ci) in each interval, and calculate the # at risk at the start of each interval (ni)
3. Calculate the probability of surviving each interval (pi = (ni-ei)/ni)
4. Calculate cumulative survival during each interval (Si = Si-1 x pi) - cumulative survival in the first interval (before any event) is always 1
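A minimal Python sketch of these four steps, using the card's ni, ei, ci, pi, and Si. The function name and data are hypothetical. It assumes the common convention that subjects censored at the same time as an event are still counted as at risk for that event.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time for each subject
    events: 1 if the subject had the event at that time, 0 if censored
    Returns a list of (time, cumulative survival) starting at (0, 1.0).
    """
    n_i = len(times)          # number at risk at the start
    s_i = 1.0                 # cumulative survival before any event is 1
    curve = [(0, s_i)]
    for t in sorted(set(times)):                      # step 1: rank times
        e_i = sum(1 for tt, ev in zip(times, events) if tt == t and ev == 1)
        c_i = sum(1 for tt, ev in zip(times, events) if tt == t and ev == 0)
        if e_i > 0:                                   # step 2: interval starts at an event
            p_i = (n_i - e_i) / n_i                   # step 3: pi = (ni - ei) / ni
            s_i = s_i * p_i                           # step 4: Si = Si-1 x pi
            curve.append((t, s_i))
        n_i -= e_i + c_i      # events and censored subjects leave the risk set
    return curve

# Hypothetical data: 6 subjects, 3 events, 3 censored
times = [2, 3, 3, 5, 8, 8]
events = [1, 0, 1, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

In this example the estimate steps through 5/6, then 5/6 x 4/5, then 5/6 x 4/5 x 2/3, because censored subjects shrink the risk set without counting as events.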
60
Censoring
Termination of follow-up for a subject on a specified date; it is unknown whether the outcome occurred or would have occurred after that date.
Kaplan-Meier survival estimates calculate cumulative incidence/survival taking censoring into account, but assume that censoring is unbiased.
61
Log-rank test
Compares K-M curves for 2 or more groups.
62
Stratified log rank test
Compares K-M curves for 2 or more groups using stratification to control for confounding.
Limitation: the method breaks down if the data become too sparse.
63
Interpreting and presenting K-M curves
The further to the right, the fewer subjects at risk and the more uncertainty.
Good practice to end the plot at a follow-up time when only 10-20% of subjects are still at risk.
64
Two main survival analysis methods
KM survival curves (descriptive; cannot readily calculate RR or adjust for multiple covariates) and Cox proportional hazards regression
65
Cox proportional hazards regression
- allows the baseline hazard to vary over time
- assumes the hazard ratio is constant over time, which is equivalent to stating that the exposure-outcome relationship is NOT modified by follow-up time (follow-up time is therefore not an effect modifier)
- if the PH assumption is violated, then follow-up time is a modifier and stratification by follow-up time would be needed
- allows adjustment for multiple covariates and provides an RR
66
Hazard
the instantaneous incidence rate at a point in time (change in number of new cases at time point) - basically the slope between two points on the curve. Incidence rate could change with time
67
When is Proportional Hazards assumption not met
When the curves for the groups being compared cross one another (the hazard ratio is then not constant over time).
68
Cause-specific mortality is always ___ overall mortality
less than or equal to
69
cause-specific survival is always _____ overall survival
more than or equal to
70
Measuring cause-specific mortality is ____ logistically challenging than measuring overall mortality
more
71
Methods for assessing cause-specific mortality
Direct methods (gold standard) - determine the cause of death for each decedent. This can be done by review of medical records or death certificates, but medical records are better.
Indirect methods - take overall-mortality estimates and apply a correction to them, in order to estimate the number of deaths due to a specific cause (done through relative survival).
72
Relative survival
Provides an estimate of cause-specific survival in a cohort. Corrects for deaths from causes other than the disease under study.
RS = observed OS / expected OS
If expected OS = 1, RS = observed OS
If expected OS < 1, RS >= observed OS
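A tiny Python sketch of the ratio and the two special cases. The function name and the survival proportions are hypothetical.

```python
def relative_survival(observed_os, expected_os):
    """RS = observed overall survival / expected overall survival."""
    return observed_os / expected_os

# Hypothetical 5-year figures: 60% observed OS in the diseased cohort,
# 80% expected OS in a comparable disease-free population.
rs = relative_survival(0.60, 0.80)
print(round(rs, 2))  # 0.75

# If expected OS = 1, RS equals observed OS
print(relative_survival(0.60, 1.0))  # 0.6
```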
73
Expected OS
Usually the OS of persons of the same demographics and calendar period, taken from publicly available vital statistics data.
Key assumption: OS in the diseased cohort would be the same as the OS of the comparison population if the cohort members did not have the disease (i.e. the only difference between the two groups is the disease).
74
Effect modification
Variation in the magnitude of the association between an exposure and an outcome across strata of a second exposure (the effect modifier)
Has an underlying public health, clinical, biologic, or psychosocial basis; it is not merely a statistical phenomenon
Can be assessed through stratified analysis and multivariable models
Effect modification is reciprocal, since there is an interaction
75
Effect modification via stratified analysis
If you stratify and the RRs for the strata are not similar, then there is potential for effect modification.
If this is the case, you can calculate a p-value for heterogeneity (interaction); this is a likelihood ratio test.
If the p-value for heterogeneity is significant, then there is effect modification/interaction; if not, then there is no effect modification/interaction.
Calculate the p-value of the interaction term or, if there are multiple interaction terms, of the interaction terms in aggregate.