Intro to Biostatistics Flashcards Preview

Flashcards in Intro to Biostatistics Deck (40):

Types of Study Design

Descriptive Study: Description of what is happening in a population.

Analytic Study: Quantification of the relationship between two factors (e.g., effect of an intervention on an outcome).

Experimental Study: Manipulation of the exposure via randomization to intervention or exposure.

Observational Study: Measurement of exposure as it occurs, without assignment by the investigator; comparison groups may be matched.


Types of Observational vs. Experimental Studies


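Observational: ecological study, cross-sectional study, cohort study, case-control study

Experimental: randomized controlled trial, community trial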


Ecological Study

Units of analysis are populations or groups of people and not individuals.

Focuses on the comparison of groups rather than individuals


Ecological Study: Advantages

Low cost


Not all measurements can be made on individuals

Ecologic effects are main interest (at the population level)

Simplicity of analyses and presentation

Hypotheses generating for future research


Ecological Study: Disadvantages

Prone to “ecological fallacy”:
Assumptions that relationships observed for groups hold true for individuals.
Such inferences made using group-level data may not always be correct at the individual level.

Cannot adjust for confounders because data on all potential covariates are lacking, which limits comparability between groups.
A covariate is a secondary variable that can affect the relationship between the dependent variable and other independent variables of primary interest.

Missing data


Cross-Sectional Study

Surveys exposures and disease status at a single point in time (a cross-section of the population)

Measures prevalence, not incidence of disease

Suitable for studying conditions that are relatively frequent with long duration of expression (nonfatal, chronic conditions)

Not suitable for studying rare or highly fatal diseases or a disease with short duration of expression

Example: community surveys


Incidence vs Prevalence

Incidence: rate at which new cases arise in a population over a period of time

Prevalence: number (or proportion) of existing cases alive in a population at one point in time
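
As a rough illustration of the difference, the sketch below computes both measures from made-up counts. All numbers and variable names are hypothetical, and prevalence is expressed here as a proportion of the population rather than a raw count.

```python
# Hypothetical counts for illustration only (not taken from the deck).
population_at_risk = 10_000      # people free of disease at the start of the year
new_cases_during_year = 50       # cases that developed during the year
existing_cases_today = 400       # everyone living with the disease on the survey date
total_population_today = 10_400  # everyone counted on the survey date

# Incidence: new cases divided by the population at risk over the period.
incidence = new_cases_during_year / population_at_risk

# Point prevalence: existing cases divided by the total population at one time.
prevalence = existing_cases_today / total_population_today

print(f"Incidence: {incidence * 1000:.1f} per 1,000 per year")
print(f"Prevalence: {prevalence * 100:.1f}%")
```

A chronic, nonfatal condition can show low incidence but high prevalence, because existing cases accumulate over time.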


Cross-Sectional Study: Advantages

Low cost


Less time-consuming than other designs

Allows study of several diseases/exposures

Provides estimates for population burden, health planning and priority setting of health problems


Cross-Sectional Study: Disadvantages

Weaker design because it measures prevalence, not incidence of disease. (Prevalent cases are survivors).

Temporal sequence of exposure and effect is difficult to determine.

Difficult to determine when disease occurred.

Rare diseases and quickly emerging diseases are difficult to study.


Cohort Study

One or more cohorts (i.e., samples) are followed prospectively.

Prospective studies follow people into the future, measuring exposures and outcomes over time, to determine which risk factors are associated with a condition, concern, or disease.


Cohort Study: Advantages

Exposure status determined before disease detection.

Study subjects selected before disease detection.

Study subjects can be matched to help control for confounding variables.

Ability to study several outcomes for each exposure


Cohort Study: Disadvantages



Not suitable for rare diseases or diseases with long latency

No randomization (imbalances in subject characteristics could exist)

Loss to follow-up


Case-Control Study

Compares exposures in disease cases versus healthy controls from same population

At one point in time, but looking back (retrospective)


Case-Control Study: Advantages

Low cost

Less time-consuming than other designs

Most feasible design for disease outcomes that are rare


Case-Control Study: Disadvantages

Not a suitable design when disease outcome for a specific exposure is not known at start of study.

Exposure measurements taken after disease occurrence (retrospective data).

Disease status can influence selection of study subjects


Randomized Controlled Trials (RCTs)

Experimental comparison study where participants are randomized to experimental or control groups.

Best for studying the effect of a treatment/test.

Gold standard for epidemiological research


Randomized Controlled Trials (RCTs): Purpose of Randomization

Primary purpose
Reduces selection bias in the allocation of intervention.
Each participant has an equal chance of being in experimental or control group.

Secondary purpose
With a large sample size, the experimental and control groups should have similar baseline characteristics.
Helps to control for known and unknown factors.
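
A minimal sketch of simple randomization, in which each participant has an equal chance of landing in either group. The function name `randomize`, the participant IDs, and the seed are assumptions made for illustration; real trials often use blocked or stratified schemes instead.

```python
import random

def randomize(participant_ids, seed=None):
    """Assign each participant to a group with equal probability (simple randomization)."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    return {pid: rng.choice(["experimental", "control"]) for pid in participant_ids}

# Hypothetical trial with 1,000 participants numbered 1..1000.
allocation = randomize(range(1, 1001), seed=42)
n_experimental = sum(1 for group in allocation.values() if group == "experimental")
print(f"Experimental: {n_experimental}, Control: {len(allocation) - n_experimental}")
```

Simple randomization does not guarantee equal group sizes, but with a large sample the two groups tend toward similar size and similar baseline characteristics.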


Advantages of RCTs

Randomization balances distribution of confounders.

Blinding of participants and researchers reduces bias in assessment of outcomes.

Detailed information collected at baseline and follow-up periods.

Populations of participating individuals are clearly identified

Results can be analyzed with well-known statistical tools


Disadvantages of RCTs

Expensive and time-consuming

Volunteer bias

Large sample size may be required

Participant exclusion may limit generalizability

Adherence may be an issue

Sponsor or funding source may be an issue

Ethical concerns


Community Trial

Experimental studies with whole communities (e.g., cities, states) as experimental units.

The intervention is assigned to all members in each of a number of communities.

Community trials follow the same procedures as RCTs (eligibility criteria, informed consent, randomization, follow-up measures).

Blinding and double blinding are not generally used in community trials


Community Trial: Advantages

Randomization balances distribution of confounders.

Detailed information collected at baseline and follow-up periods.

Results can be analyzed with well-known statistical tools.

Directly estimate the impact of change in behavior or modifiable exposure on the incidence of disease.


Community Trial: Disadvantages

Expensive, time-consuming

Difficulty controlling study entrance, intervention delivery, and monitoring of outcomes.

Fewer study units are capable of being randomized, which affects comparability.

Affected by population dynamics, secular trends, and nonintervention influences.


Systematic and Random Error

Errors can be systematic (differential) or random (non-differential)

Systematic error: Use of an invalid outcome measure that is consistently wrong in a particular direction (e.g., faulty measuring instrument)

Random error: Error in an outcome measure that has no apparent connection to any other measurement or variable; generally regarded as due to chance

Use of the term “Bias” should be reserved for systematic (differential) error


Selection vs. Detection Bias

Selection bias: systematic error in the ascertainment of study subjects; not random
Can lead to systematic differences between baseline characteristics of the groups that are compared.

Detection bias: systematic differences between groups in how outcomes are determined.
A potential artifact caused by use of a particular diagnostic technique or type of equipment.


Confounding and Membership Bias

Confounding bias: a third factor that is related to both exposure and outcome accounts for some/all of the observed relationship.

Membership bias: individuals who belong to an organized group (e.g. military, religious group) tend to differ systematically with regards to health from the general population.
Members of an organized group tend to be healthier and less prone to morbidity and premature mortality.


Recall and Instrument Bias

Recall bias: error in remembering past exposure differs over time or between cases and controls.

Instrument bias: occurs when the measuring instrument is not properly calibrated (e.g., a scale may be biased to give a reading higher or lower than the actual value).


Attrition, Social Desirability, and Lead Time Biases

Attrition bias: systematic differences between groups in withdrawals from a study.

Social desirability bias: tendency to respond to personally or socially sensitive questions in a socially acceptable direction.

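Lead time bias: apparent increase in survival that arises only because screening advances the time of diagnosis (the lead time is the time by which a diagnosis can be advanced by screening)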
In estimating survival time, acknowledge the point when early diagnosis is made versus usual diagnosis in order to control for lead time bias.
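
The arithmetic behind this correction can be sketched with one invented patient. Every age below is hypothetical, and the example assumes that earlier diagnosis does not change the age at death.

```python
# Hypothetical patient: all ages are invented for illustration.
age_at_screen_diagnosis = 64   # diagnosis advanced by screening
age_at_usual_diagnosis = 67    # diagnosis when symptoms would have appeared
age_at_death = 70              # assumed unchanged by earlier diagnosis

# Lead time: how far screening moved the diagnosis forward.
lead_time = age_at_usual_diagnosis - age_at_screen_diagnosis

survival_screened = age_at_death - age_at_screen_diagnosis  # apparent survival
survival_usual = age_at_death - age_at_usual_diagnosis

# Subtracting the lead time shows that survival did not actually improve.
survival_screened_adjusted = survival_screened - lead_time

print(lead_time, survival_screened, survival_usual, survival_screened_adjusted)
```

Measured from diagnosis, the screened patient appears to survive twice as long, yet the adjusted figure matches the unscreened one.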


Types of Data

Nominal data: Numbers used to categorize data.
e.g., Gender, race, marital status, etc.

Ordinal data: Numbers used to order or rank data.
e.g., "Is your health poor, reasonable, good, or excellent?"

Interval data: Numbers used to order data by equal intervals.
e.g., Time of day, temperature

Ratio data: Numbers that can be compared to an absolute zero (i.e., a point where none of the variable being measured exists).
e.g., Height, weight, age, income


Central Tendency

Where the center of the distribution tends to be located

Three measures of central tendency: mode, median, and mean

Which one you report is related to the scale of measurement and the shape of the distribution



Mode

The most frequently occurring score

Look at the simple frequency of each score

Unimodal or bimodal

With a nominal scale, report the mode as the most frequently occurring category

If the distribution is rectangular (all scores equally frequent), do not report the mode
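
A small sketch of finding the most frequent score (the mode) and labeling the distribution, using `statistics.multimode` from the Python standard library. The helper name `describe_mode` and the sample scores are assumptions made for illustration.

```python
from statistics import multimode

def describe_mode(scores):
    """Return a label and the mode(s) of a list of scores or categories."""
    modes = multimode(scores)   # every value tied for the highest frequency
    distinct = set(scores)
    if len(distinct) > 1 and len(modes) == len(distinct):
        return "rectangular", []   # all values equally frequent: report no mode
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(describe_mode([1, 2, 2, 3]))           # one most frequent score
print(describe_mode([1, 2, 2, 3, 3]))        # two scores tie for most frequent
print(describe_mode([1, 1, 2, 2, 3, 3]))     # every score equally frequent
print(describe_mode(["A", "B", "B", "O"]))   # works for nominal categories too
```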



Median

Score at the 50th percentile

In a normal distribution, the median is the same as the mode and mean

Arrange scores from lowest to highest. With an odd number of scores, the median is the middle score; with an even number of scores, average the two middle scores

Used with an ordinal scale and when the distribution is skewed
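
The sorting procedure for the 50th-percentile score can be written out directly. This is a hand-rolled sketch for study purposes (the function name `median_score` is hypothetical); the standard library's `statistics.median` does the same job.

```python
def median_score(scores):
    """Median by sorting: middle score if odd count, mean of the two middle scores if even."""
    if not scores:
        raise ValueError("median_score requires at least one score")
    ordered = sorted(scores)           # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # odd count: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle scores

print(median_score([5, 1, 3]))       # odd number of scores
print(median_score([4, 1, 3, 2]))    # even number of scores
```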



Mean

Score at the exact mathematical center of distribution (average)

Used with interval and ratio scales when the distribution is symmetrical and unimodal

Not accurate when distribution is skewed because it is pulled towards the tail
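
To see the pull toward the tail, the sketch below compares the average with the middle score for two invented samples that differ only in their largest value. The data are hypothetical.

```python
from statistics import mean, median

symmetric = [2, 3, 4, 5, 6]    # hypothetical symmetrical scores
skewed = [2, 3, 4, 5, 60]      # same scores, but one extreme value in the right tail

mean_symmetric = mean(symmetric)
mean_skewed = mean(skewed)
median_symmetric = median(symmetric)
median_skewed = median(skewed)

print(f"Symmetric: mean={mean_symmetric}, median={median_symmetric}")
print(f"Skewed:    mean={mean_skewed}, median={median_skewed}")
```

The middle score is unchanged by the extreme value, while the average moves far toward it, which is why the median is usually preferred for skewed data.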


Measures of Variability

Extent to which the scores differ from each other or how spread out the scores are

Tells us how accurately the measure of central tendency describes the distribution

Shape of the distribution

Types: range, variance, standard deviation



Range

Can report the lowest and highest value

Or report the maximum difference between the lowest and highest

Semi-interquartile range used with the median: one half the distance between the scores at the 25th and 75th percentile
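
A short sketch of these spread measures on invented scores. Quartile conventions differ between textbooks and software; this example assumes the default method of `statistics.quantiles`, so other tools may give slightly different 25th and 75th percentiles.

```python
from statistics import quantiles

scores = list(range(1, 12))   # hypothetical scores: 1, 2, ..., 11

lowest, highest = min(scores), max(scores)
score_range = highest - lowest        # difference between the extremes

# quantiles(..., n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, q2, q3 = quantiles(scores, n=4)
semi_interquartile_range = (q3 - q1) / 2   # half the distance between Q1 and Q3

print(f"Lowest={lowest}, highest={highest}, range={score_range}")
print(f"Q1={q1}, Q3={q3}, semi-interquartile range={semi_interquartile_range}")
```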



Variance

Statistical variance measures how the data are distributed about the mean or expected value.

If individual observations vary greatly from the group mean, the variance is large, and vice versa.

Unlike range that only looks at the extremes, the variance looks at all the data points and then determines their distribution.


Standard Deviation

Standard Deviation is a measure of variability of scores in a particular sample

σ = the population standard deviation

s = the sample standard deviation (square root of the sample variance)

Variance = s²
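
A worked sketch of the variance and standard deviation on invented scores. It assumes the usual convention that the population variance divides the summed squared deviations by n, while the sample variance divides by n - 1; the data are hypothetical.

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
n = len(scores)
mean_score = sum(scores) / n

# Sum of squared deviations from the mean uses every data point, not just the extremes.
sum_sq_dev = sum((x - mean_score) ** 2 for x in scores)

population_variance = sum_sq_dev / n        # sigma squared
sample_variance = sum_sq_dev / (n - 1)      # s squared

sigma = sqrt(population_variance)           # population standard deviation
s = sqrt(sample_variance)                   # sample standard deviation

print(f"sigma^2={population_variance}, sigma={sigma}")
print(f"s^2={sample_variance:.3f}, s={s:.3f}")
```

The hand-computed values agree with `statistics.pstdev` and `statistics.stdev`, which can be used directly in practice.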


Null Hypothesis

The null hypothesis (H0) is an essential part of any research design and is always tested.

It reflects that there will be no observed effect for the experiment.

The null hypothesis (H0) is a hypothesis which the researcher tries to reject.


Alternative Hypothesis

The alternative or experimental hypothesis (HA or H1) reflects that there will be an observed effect for the experiment


Type I Error

Type I error
Rejecting the null when it is true
“False positive”

Type I errors can be controlled
Alpha is the maximum probability that we have a type I error.
It is related to the level of significance selected.
For a 95% confidence level, alpha is 0.05.
There is a 5% probability that we will reject a true null hypothesis
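
A simulation sketch of what alpha = 0.05 means in practice: when the null hypothesis is true, roughly 5% of tests still reject it. The test used here (a two-sided one-sample z-test with known standard deviation), the sample size, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

ALPHA = 0.05
N_TRIALS = 2000
SAMPLE_SIZE = 30

rng = random.Random(1)          # fixed seed so the run is reproducible
standard_normal = NormalDist()  # mean 0, standard deviation 1

false_positives = 0
for _ in range(N_TRIALS):
    # The null hypothesis is true by construction: the population mean is 0.
    sample = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    z = fmean(sample) * sqrt(SAMPLE_SIZE)             # z statistic with known sigma = 1
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))   # two-sided p-value
    if p_value < ALPHA:
        false_positives += 1    # rejected a true null: a type I error

type_i_error_rate = false_positives / N_TRIALS
print(f"Observed type I error rate: {type_i_error_rate:.3f}")
```

The observed rate should land near 0.05, varying somewhat from run to run with different seeds.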


Type II Error

Type II Error

NOT rejecting the null when it is false
“False negative”
The probability of a type II error is denoted by beta.

This number is related to the power or sensitivity of the hypothesis test, denoted by 1 – β
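
The relationship between beta and power can be sketched for one simple case. The function `z_test_power`, the choice of a two-sided one-sample z-test with known standard deviation, and the effect size and sample size below are assumptions for illustration, not a general power formula.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    shift = effect * sqrt(n) / sigma             # mean of the z statistic if the effect is real
    # Probability that z lands in either rejection region when the effect is real.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Hypothetical study: true effect of 0.5 standard deviations, 32 participants.
power = z_test_power(effect=0.5, sigma=1.0, n=32)
beta = 1 - power   # probability of a type II error

print(f"Power = {power:.3f}, beta = {beta:.3f}")
```

When the true effect is zero, the same function returns alpha, since every rejection is then a type I error.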