Data, Analysis, and Informatics Flashcards

(133 cards)

1
Q

Ratios

A

Compares 2 quantities; indicates their size in relation to each other.

2
Q

In a ratio, does the numerator need to be a subset of the denominator?

A

No

3
Q

Proportions

A

A unit or number that is considered in comparative relationship to a whole; a type of ratio in which the numerator is a subset of the denominator

4
Q

Rates

A

Calculated by dividing one number by another where the numerator is a subset of the denominator; the denominator includes a time component

5
Q

Incidence

A

Measures the number of NEW cases of a disease in a specific population during a given period of time

6
Q

Are pre-existing cases counted in incidence?

A

No

7
Q

How is incidence assessed?

A

Proportion or rate

8
Q

Is cumulative incidence a proportion or rate?

A

Proportion

9
Q

Is incidence rate a proportion or a rate?

A

Rate

10
Q

Cumulative incidence definition

A

Proportion of an at-risk population that develops new cases of a disease over a specified time period

11
Q

Cumulative incidence calculation

A

# of new cases / total # of people in the population who are at risk

12
Q

Incidence rate calculation

A

# of new cases / total person-time of observation
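
A minimal Python sketch of the cumulative incidence and incidence rate calculations. The cohort numbers and function names are hypothetical, chosen only to illustrate the two formulas.

```python
def cumulative_incidence(new_cases, at_risk):
    """Proportion of the at-risk population that develops disease."""
    return new_cases / at_risk


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time of observation."""
    return new_cases / person_time


# Hypothetical cohort: 1,000 people at risk, 50 new cases,
# and 4,800 person-years of total follow-up.
ci = cumulative_incidence(50, 1000)  # a proportion (no time unit)
ir = incidence_rate(50, 4800)        # cases per person-year

print(f"Cumulative incidence: {ci:.1%}")
print(f"Incidence rate: {ir * 1000:.1f} per 1,000 person-years")
```

The two results differ because the denominators differ: people at risk for the proportion, accumulated person-time for the rate.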

13
Q

Prevalence

A

Number of existing cases of a disease during a given time period

14
Q

Are pre-existing cases included in prevalence?

A

Yes

15
Q

Prevalence calculation

A

# of people with the disease / total # of people in the population

16
Q

Point prevalence

A

Proportion of the population that is diseased at a single point in time

17
Q

Period prevalence

A

Proportion of the population that is diseased during a specific duration of time
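
A short Python sketch contrasting point and period prevalence. The survey counts are hypothetical, and the single `prevalence` helper is an illustrative name, not a standard function.

```python
def prevalence(existing_cases, population):
    """Existing cases divided by the total population."""
    return existing_cases / population


# Hypothetical community of 2,000 people.
# 120 have the disease on the survey date (point prevalence).
# 180 had the disease at some time during the year (period prevalence).
point = prevalence(120, 2000)
period = prevalence(180, 2000)

print(f"Point prevalence:  {point:.1%}")
print(f"Period prevalence: {period:.1%}")
```

Period prevalence is at least as large as point prevalence for a date inside the period, because it also counts cases that resolved or began at other times in the window.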

18
Q

Endemic

A

A situation in which a community has a consistently elevated rate of a certain disease

19
Q

Epidemic

A

An increase in the number of cases of disease in a community, above what is expected in that geographic area at that time

20
Q

Pandemic

A

Worldwide epidemic

21
Q

Two types of epidemiologic study designs

A

Descriptive
Analytic

22
Q

Describe descriptive epidemiological studies

A

Observational

23
Q

Describe analytic epidemiological studies

A

Can be experimental or observational

24
Q

Types of descriptive studies

A

Case reports
Case series
Cross-sectional studies
Ecological studies

25
Case reports and case series
Studies that are used to alert people of a new illness or new association with illness. They are usually reports of only people with the condition of interest
26
Cross-sectional studies
Studies that include people who are representative of a given population. They are not selected based on illness or exposure and can be used to determine initial associations and identify the prevalence of either exposure or illness in a group
27
Ecological studies
Studies that are used to describe populations; data are not analyzed on the individual level but on the aggregate level
28
Ecological fallacy
When group-level data are interpreted at the individual level
29
Types of analytic studies
Case-control studies
Cohort studies
Randomized Controlled Trials
Systematic reviews and meta-analyses
30
Case-control studies
Studies that select people with or without disease and then proceed to look back over time to see if people had different rates of exposure.
31
When would you use a case-control study?
For studying rare diseases with long latency periods
32
Cohort studies
Studies that select people based on exposure and determine if people develop disease at different rates
33
When would you use a cohort study?
For rare exposures
34
Are cohort studies prospective or retrospective
They can be either
35
Are people who have the disease of interest at the time point when the study period begins included in cohort study calculations? Why?
No, cohort studies calculate incidence
36
Randomized Controlled Trials (RCT)
These trials test an intervention that the researcher gives to two or more groups. People are randomly assigned to groups; some receive the active item under study, whereas the other group(s) receive the usual treatment, nothing, or a placebo.
37
Single-blind studies
Research participants do not know which group they are in
38
Double-blind studies
Neither participants nor the researchers know who is getting which treatment
39
Systematic reviews and meta-analyses
These are studies that pool the results of multiple independent studies with established criteria to identify the evidence for associations
40
Relative Risk (RR) definition
A measure of the magnitude of an association between an exposure and a disease
41
Relative Risk calculation
Risk (incidence) of outcome in the exposed / Risk of outcome in the non-exposed
42
Relative Risk 2x2 formula
[A/(A+B)] / [C/(C+D)]
43
Odds ratio definition
odds of exposure among cases divided by the odds of exposure among controls, which equals the odds of disease among the exposed divided by the odds of the disease among the non-exposed
44
Odds ratio calculation
odds of outcome in the exposed / odds of outcome in the nonexposed
45
Odds ratio 2x2 formula
(A/B) / (C/D) = AD/BC
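
A minimal Python sketch of the two 2x2 formulas. It assumes the common cell layout, which the cards do not spell out: A = exposed with disease, B = exposed without disease, C = non-exposed with disease, D = non-exposed without disease. The counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the non-exposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds in the exposed divided by odds in the non-exposed (AD/BC)."""
    return (a * d) / (b * c)


# Hypothetical table: 30 of 100 exposed and 10 of 100 non-exposed
# people develop the disease.
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)  # 0.30 / 0.10 = 3.0
odds_r = odds_ratio(a, b, c, d)  # 2700 / 700, about 3.86

print(f"RR = {rr:.2f}, OR = {odds_r:.2f}")
```

Both measures are above 1 here, so this made-up exposure would be read as increasing the risk of the outcome. The OR sits further from 1 than the RR because the outcome is not rare in this example.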
46
In what types of studies would you calculate RR?
Cohort studies
47
In what types of studies would you calculate OR?
Case-control studies
Cross-sectional studies
48
Interpreting RR or OR = 1
There is NO association between the exposure and the outcome. The risk or odds in the exposed equals the risk or odds in the nonexposed.
49
Interpreting RR or OR > 1
The exposure increases the risk of the outcome. The risk or odds in the exposed is greater than the risk or odds in the nonexposed
50
Interpreting RR or OR < 1
The exposure decreases the risk of the outcome. The risk or odds in the exposed is less than the risk or odds in the nonexposed; the exposure is a protective factor
51
Three reasons epidemiologists may obtain a false association between an exposure and an outcome from sample data
The finding may be attributable to chance alone
Bias
Confounding
52
Bias
A systematic error that results from the study's methods and procedures as opposed to random error
53
Bias away from the null
When bias creates the appearance of an association where there is none, or makes an association appear stronger than it really is
54
Bias toward the null
When an association that actually exists is hidden or made to appear weaker than it really is
55
Selection bias
Results from procedures used to select participants into a study
56
In which types of studies is selection bias most likely to occur?
Case-control or retrospective cohort studies
Prospective cohort studies
57
Observational bias
Bias that arises from systematic differences in the way information on exposure or disease is obtained from the study groups; aka information bias
58
Types of observational bias
Recall bias
Interviewer bias
Misclassification
59
Recall bias
differential reporting of past events between study groups
60
Interviewer bias
may include the effects of the interviewer's body language, voice, or demeanor on the response
61
Misclassification error
happens when participants are incorrectly classified into the wrong population group, distorting the link between exposure and outcome
62
Differential misclassification
bias is different between groups
63
Nondifferential misclassification
bias is equal across groups
64
Confounding
occurs when a researcher is evaluating the relationship between an exposure and an outcome but a third variable that is associated with the exposure and the outcome distorts the finding
65
Methods of preventing confounding
Randomization
Restriction
Matching
Standardization
Stratification
Multivariable analysis
66
Randomization
Process by which a random mechanism is used to select a sample from a population or assign subjects to different groups
67
How does randomization reduce bias/confounding?
Minimizes selection bias
Enhances statistical validity
Controls for confounding due to measurable and unmeasurable causes
68
Restriction
Involves only including participants from specific categories of a confounder, thereby eliminating the confounding effect
69
Matching
Involves creating comparable groups with respect to the distribution of potential confounders
70
How does matching reduce bias/confounding?
Ensures groups being compared are similar in terms of confounding variables
71
Standardization
Involves adjusting rates to remove the confounding effect of variables that differ in the population being compared
72
Stratification
Includes creating separate tables of disease by exposure for each possible combination of confounders
73
Multivariable analysis
Involves considering multiple variables simultaneously through statistical models and allows researchers to adjust for the effects of multiple confounders
74
Effect modification
Occurs when the magnitude of the association between an exposure and an outcome varies by the presence or level of a third variable
75
If a variable is only a confounder, stratum specific estimates for the measure of association will be
Close to one another
The crude estimate will be outside the range of the stratum-specific estimates (crude estimate > all the stratum-specific estimates, or crude estimate < all the stratum-specific estimates)
76
If the variable is only an effect modifier, the stratum estimates for the measures of association will be
Significantly different from one another
The crude estimate will be within the range of the stratum-specific estimates
77
If the variable is a confounder and an effect modifier, stratum specific estimates for the measures of association will be
Significantly different from one another
The crude estimate will be considerably outside the range of the stratum-specific estimates
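
A Python sketch of the confounding-only pattern, comparing stratum-specific and crude relative risks. All counts are hypothetical and were chosen so that both strata give the same RR; the cell order (A, B, C, D) assumes A = exposed with disease, B = exposed without, C = non-exposed with disease, D = non-exposed without.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the non-exposed."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical 2x2 cells (A, B, C, D) for each level of a third variable.
strata = {
    "stratum 1": (2, 98, 9, 891),     # low baseline risk, few exposed
    "stratum 2": (180, 720, 10, 90),  # high baseline risk, many exposed
}

stratum_rr = {name: relative_risk(*cells) for name, cells in strata.items()}

# The crude table ignores the third variable: add the strata cell by cell.
crude_cells = [sum(cells) for cells in zip(*strata.values())]
crude_rr = relative_risk(*crude_cells)

for name, value in stratum_rr.items():
    print(f"{name}: RR = {value:.2f}")
print(f"crude: RR = {crude_rr:.2f}")
```

The stratum-specific estimates agree with each other while the crude estimate lies well above both, which is the signature of a variable acting only as a confounder.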
78
List Koch's Postulates
1. The microorganism must be found in abundance in all organisms suffering from the disease but should not be found in healthy organisms
2. It should be possible to isolate the causative microorganism from a diseased organism and grow it in pure culture
3. The organism from a pure culture should be able to cause disease when inoculated into a healthy host organism
4. The microorganism should then be able to be isolated from the new host and grown in pure culture
79
Hill's Nine Criteria of Causality
1. Analogy
2. Coherence
3. Reversibility
4. Specificity
5. Plausibility
6. Strength of the association
7. Consistency
8. Biological gradient
9. Temporality
80
Analogy
when a researcher can identify a similar relationship between another exposure and/or disease, it provides evidence for a similar biological pathway that might be due to causation
81
Coherence
This considers the entire picture of the association between exposure and outcome across different models
82
Reversibility
considers if removing the exposure diminishes the probability or occurrence of disease. If an individual is no longer exposed, does the disease diminish?
83
Specificity
Indicates that one exposure should cause one disease. It is more relevant to infectious diseases, although it does not always hold true
84
Plausibility
Refers to whether or not the exposure is likely to cause the disease. Is there a biological or social model that can explain the association? Does the association not conflict with current science?
85
Strength of the association
This refers to the fact that the stronger the association, the more likely it is attributable to a causal factor
86
Consistency
Considers regularity of an association across multiple studies
87
Biological gradient
A measure of the dose-response relationship. Does the risk of disease increase with the level of exposure? Do higher levels of exposure cause higher levels of disease?
88
Temporality
The exposure must come before the disease
89
Sufficient-Component Cause Model
Sufficient: the presence of a component always produces the outcome
Necessary: the outcome cannot occur without the component
90
Directed acyclic graphs (DAGs)
Causal graphs; graphical representations of associations between exposures and outcomes that also include variables that may bias the association (i.e. confounders) based on background knowledge of the topic
91
Primordial prevention
Earliest stage of prevention; concerned with preventing risk factors of disease by targeting lifestyles, behaviors, and exposure patterns that contribute to increased risk of disease
92
Primary prevention
Concerned with preventing the disease and occurs before biological onset of disease
93
Secondary prevention
Early detection; occurs in the preclinical phase after the disease is present but before symptoms appear. The focus is on early detection so that treatment can be provided before the disease progresses
94
Tertiary prevention
Focuses on rehabilitation and support; as the disease has already occurred, the goal is to improve quality of life and reduce symptoms
95
Reliability
Repeatability or consistency
96
Validity
Ability of a test to accurately identify diseased and non-diseased individuals
97
How is validity measured?
Sensitivity and specificity
98
Sensitivity
the ability of a test to correctly identify people with a disease; true positives
99
How is sensitivity expressed?
The percentage of people who test positive out of all those who have the disease; true positives / (true positives + false negatives)
100
Specificity (screening)
The ability of a test to correctly identify people without a disease
101
How is specificity expressed?
The percentage of people who test negative out of all those who truly do not have the disease; true negatives / (true negatives + false positives)
102
Positive predictive value (PPV)
# of people who test positive and actually have the disease / total # of people who test positive
103
Negative predictive value (NPV)
# of people who test negative for a disease and do not have the disease / total # of people who test negative
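
A minimal Python sketch of the four screening measures (sensitivity, specificity, PPV, NPV). The counts are hypothetical, and `screening_metrics` is an illustrative name.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 test counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among the diseased
        "specificity": tn / (tn + fp),  # negatives among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among positive tests
        "npv": tn / (tn + fn),          # non-diseased among negative tests
    }


# Hypothetical screening of 1,000 people, 100 of whom have the disease.
metrics = screening_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are computed within the diseased and non-diseased columns, whereas PPV and NPV are computed within the positive and negative test rows, so the predictive values shift with how common the disease is in the screened group.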
104
Receiver operating characteristic curve (ROC curve)
used to set the cutoff value of a continuous value test
105
What does the ROC curve show?
Trade-off between sensitivity and specificity
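
A Python sketch of how moving the cutoff of a continuous test trades sensitivity against specificity. The test values are invented, and it assumes that a value at or above the cutoff counts as a positive test.

```python
def roc_points(diseased, healthy, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each cutoff.

    A result is called positive when the value is >= the cutoff.
    """
    points = []
    for cutoff in cutoffs:
        sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
        specificity = sum(v < cutoff for v in healthy) / len(healthy)
        points.append((cutoff, sensitivity, specificity))
    return points


# Hypothetical biomarker values for people with and without the disease.
diseased = [6.1, 7.4, 8.0, 8.8, 9.5]
healthy = [4.0, 5.2, 5.9, 6.5, 7.0]

points = roc_points(diseased, healthy, cutoffs=[5, 6, 7, 8])
for cutoff, sens, spec in points:
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the cutoff lowers sensitivity and raises specificity. An ROC curve plots sensitivity against 1 - specificity across all such cutoffs.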
106
Lead time bias
Overestimation of survival duration attributable to earlier detection by screening than clinical presentation
107
Length bias
Screening is more likely to detect cases that are progressing slowly compared with those with rapid progression of disease who manifest clinically. The slow-progressing cases are usually milder and more likely to survive, leading to an overestimation of survival as a result of screening
108
Active surveillance
Involves having a research team go out into the community and look for cases of disease. It is very accurate but also expensive.
109
Passive surveillance
Relies on existing reporting systems such as the mandatory reporting of nationally notifiable diseases; may result in missed cases
110
Digital surveillance
Web crawling to identify reports of disease
111
Sentinel surveillance
Monitors a special community for changes in the distribution of disease
112
Dichotomous Variables
Have only 2 possible values
113
Nominal variables
Aka categorical variables; data are collected on what category the participant falls in. The categories of the variable have no inherent order
114
Ordinal variables
Categories have an inherent order
115
Continuous Variables
theoretically can be any value between a minimum and maximum value
116
Interval variables
Type of continuous variable that has a distinct order and clearly defined intervals but lack a true zero or a zero value that is equivalent to an absence of the variable. The values also fail to reveal ratios of amounts
117
Ratio variables
Have a true zero and values of the variable act as true ratios of one another
118
When do you use frequency tables?
Nominal and ordinal variables
119
Measures of central tendency
mean, median, mode
120
Measures of variability
Variance
Standard deviation
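
A short Python sketch of the measures of central tendency and variability, using the standard-library `statistics` module. The data set is hypothetical, and the sample (n - 1) versions of variance and standard deviation are used.

```python
import statistics

# Hypothetical sample of seven measurements.
data = [2, 4, 4, 5, 7, 9, 11]

center = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}
spread = {
    "variance": statistics.variance(data),  # sample variance (n - 1)
    "std dev": statistics.stdev(data),      # square root of the variance
}

for name, value in {**center, **spread}.items():
    print(f"{name}: {value:.2f}")
```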
121
When do you use pie charts?
Nominal variables
122
When do you use bar charts?
nominal variables
123
When do you use histograms?
Continuous variables
124
When do you use box and whisker plots?
Continuous variables
125
Central Limit Theorem
With repeated sampling, the means of sufficiently large samples form an approximately normal distribution, regardless of the shape of the underlying population distribution
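
A small Python simulation sketching the Central Limit Theorem. The exponential population, the sample size of 50, and the 2,000 repetitions are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
N_SAMPLES = 2000

# Draw from a strongly skewed population: exponential with mean 1 and SD 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5  # population SD / sqrt(n)

print(f"Mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```

Even though the individual values are skewed, the sample means cluster around the population mean with a spread close to the population SD divided by the square root of the sample size.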
126
Type 1 error
Rejecting a true null hypothesis, alpha
127
Type 2 error
Failing to reject a false null hypothesis, beta
128
Power
1 - β; probability of correctly rejecting a false null hypothesis
129
Ways to manipulate power
Increase alpha: shifts the cutoff for rejecting the null to the left. Power will increase, but so may the likelihood of a type 1 error
Increase the effect size: deals with how far your sample value is from the population value. Researchers usually have no control over the effect of the conditions that cause a deviation from the conditions of the null. In calculating power, a researcher will read past studies or conduct a pilot study to estimate how large an effect size they can expect to observe in their study
Increase the sample size: increases power
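
A Python sketch of how alpha, effect size, and sample size each move power. It assumes a one-sided (upper-tailed) z-test with a known standard deviation and a standardized effect size, which is a simplification the cards do not specify; `power_one_sided_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of an upper-tailed z-test for a standardized effect size."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cutoff for rejecting the null
    return 1 - NormalDist().cdf(z_crit - effect_size * sqrt(n))


baseline = power_one_sided_z(effect_size=0.5, n=25, alpha=0.05)
more_alpha = power_one_sided_z(effect_size=0.5, n=25, alpha=0.10)
more_effect = power_one_sided_z(effect_size=0.8, n=25, alpha=0.05)
more_n = power_one_sided_z(effect_size=0.5, n=50, alpha=0.05)

print(f"baseline:       {baseline:.3f}")
print(f"larger alpha:   {more_alpha:.3f}")
print(f"larger effect:  {more_effect:.3f}")
print(f"larger sample:  {more_n:.3f}")
```

Each of the three changes raises power relative to the baseline, though only the larger alpha does so at the cost of a higher type 1 error rate.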
130
Non-parametric tests
Distribution free tests; can be used when data are not normal
131
Pearson correlation coefficient (r)
ranges from -1 to +1. Reveals the fit of the data to the regression line and the direction of the association between the variables.
132
Coefficient of determination (r squared)
The square of r; the proportion of the variance in the outcome explained by the linear regression model. Because squaring removes the sign, it can be used as a metric of model fit for both positively and negatively associated variables.
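
A Python sketch computing r and r squared by hand for a small hypothetical data set, then flipping the sign of y to show that the direction of r changes while r squared does not. `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_flipped = [-v for v in y]  # same strength of association, opposite direction

r_pos = pearson_r(x, y)
r_neg = pearson_r(x, y_flipped)

print(f"r = {r_pos:.3f}, r squared = {r_pos ** 2:.3f}")
print(f"r = {r_neg:.3f}, r squared = {r_neg ** 2:.3f}")
```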
133
Kaplan Meier curves
Used in survival analysis, provide a graphical display with study time on x axis and probability of surviving on the y axis. Survivorship at each point of time is the probability an individual will survive to the next time point given that they have already survived up until now
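
A minimal Python sketch of the Kaplan-Meier calculation, multiplying the conditional survival probabilities at each event time. The follow-up data are hypothetical, and `kaplan_meier` is an illustrative helper rather than a library function.

```python
def kaplan_meier(observations):
    """Return [(time, survival_probability)] at each distinct event time.

    observations: list of (time, event) pairs, where event is True for
    the outcome (e.g. death) and False for a censored observation.
    """
    event_times = sorted({t for t, event in observations if event})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, event in observations if time == t and event)
        # Conditional probability of surviving past t, given survival to t.
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months: True = died, False = censored.
data = [(2, True), (3, False), (4, True), (4, True), (5, False), (6, True)]
curve = kaplan_meier(data)
for time, prob in curve:
    print(f"month {time}: S(t) = {prob:.3f}")
```

Censored participants count in the at-risk denominator until their censoring time and then drop out without lowering the survival estimate.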