Any factor that moves the findings of a study away from the truth


Binary data

Data where there are only two possible values such as survived/died; also known as dichotomous data


Blinding in a randomized controlled trial

When the treatment allocation is concealed from either the subject or the assessor or both


Case-control studies

Observational study that starts with cases with a disease and compares them with controls without the disease to investigate possible risk factors


Chi-squared goodness of fit test

A statistical test used to investigate whether a frequency distribution follows a specific theoretical distribution


Chi-squared test

A statistical test used to investigate the association between two categorical variables
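
As an illustration of the definition above, the following minimal sketch computes the chi-squared statistic and its degrees of freedom for a contingency table by comparing observed counts with the counts expected under no association. The function name `chi_squared_statistic` and the example table are hypothetical, and no continuity correction is applied.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x2 table: rows are treatment groups, columns are survived/died.
table = [[20, 30],
         [30, 20]]
statistic, df = chi_squared_statistic(table)
print(statistic, df)  # 4.0 1
```

With 1 degree of freedom, a statistic above 3.84 corresponds to P < 0.05.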


Cluster analysis

A statistical method used to identify groups or clusters of individuals who have common features in terms of known variables


Cluster randomization

When groups of individuals are allocated to treatments so that all subjects in a group receive the same treatment


Cohort study

Observational study that starts with a sample of individuals who are disease-free and measures possible causal factors at baseline and over time. The cohort of subjects is followed and their disease status is observed to investigate which factors are linked to the disease


Confidence interval (CI)

A range of values that indicates the precision of an estimate; a 95% CI is calculated so that, over repeated samples, 95% of such intervals would contain the true value
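
A minimal sketch of how a 95% CI for a mean might be calculated, assuming a large-sample Normal approximation (estimate plus or minus about 1.96 standard errors). The function name `mean_ci` and the data are hypothetical; for a small sample such as this one, a multiplier from the t distribution would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(data, level=0.95):
    """Return (lower, upper) limits of a Normal-approximation CI for the mean."""
    estimate = mean(data)
    standard_error = stdev(data) / sqrt(len(data))
    # Two-sided multiplier, approximately 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical measurements.
data = [4, 6, 8, 10, 12]
lower, upper = mean_ci(data)
print(round(lower, 2), round(upper, 2))
```

A narrower interval indicates a more precise estimate.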


Continuous data

Data that lie on a continuum and so can take any value between two limits


Cox proportional hazards regression

A multifactorial regression model used with a time-to-event outcome


Crossover trial

A single group study where each patient receives each of two or more treatments in turn so that they act as their own control


Degrees of freedom (DF or df)

A quantity used in statistical testing and modelling that is related to the size of the sample and the number of parameters that have been estimated


Dummy variables

Used in regression modelling to enable a categorical predictor variable to be included, by converting a variable with n categories into n–1 binary variables, where one category is the reference category
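
The conversion described above can be sketched as follows. The function name `dummy_code` and the smoking-status data are hypothetical and used only for illustration.

```python
def dummy_code(values, reference):
    """Convert a categorical variable into n-1 binary (0/1) variables."""
    # Every category except the reference gets its own indicator variable.
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical predictor with three categories; "never" is the reference.
smoking = ["never", "current", "ex", "never", "current"]
dummies = dummy_code(smoking, reference="never")
print(dummies)  # {'current': [0, 1, 0, 0, 1], 'ex': [0, 0, 1, 0, 0]}
```

Subjects in the reference category score 0 on every dummy variable, so each regression coefficient compares one category with the reference.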


Equivalence trial

A trial that aims to see if a new treatment is no better or worse than an existing one


Fisher’s exact test

A statistical test that can be used to investigate the association between two categorical variables when the sample is small
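
For a 2x2 table, one common form of the test can be sketched as below. This assumes the usual two-sided definition, which sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table. The function name `fisher_exact_two_sided` and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(smallest, largest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical small sample: 3 of 4 improved on treatment, 1 of 4 on control.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```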


Forest plot

A graph used to display individual study estimates and confidence intervals, and the pooled estimate and confidence interval in a meta-analysis


Funnel plot

A simple graphical method for exploring the results from studies to see if publication bias might be present


Gold standard test

A diagnostic test that is regarded as definitive, i.e. it gives the correct answer


Hazard ratio

In survival analysis, the ratio of hazards or risks of outcome in two groups



Heterogeneity

Where there is statistical variability between estimates such as may be found in a meta-analysis



Incidence

The number of new cases of a given condition occurring within a specific time period


Indirect standardization

Gives the standardized mortality ratio (SMR), which is the ratio of the observed number of deaths in the comparison population to the number expected if that population had the same age-specific death rates as the standard population
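
The calculation can be sketched as follows. The function name `standardized_mortality_ratio`, the age bands, and all the figures are hypothetical. The result is shown as a plain ratio; it is often multiplied by 100 when reported.

```python
def standardized_mortality_ratio(observed_deaths, population, standard_rates):
    """Observed deaths divided by the deaths expected under the standard rates."""
    # Expected deaths: apply each standard age-specific rate to the number
    # of people in that age band of the comparison population.
    expected_deaths = sum(
        population[age] * standard_rates[age] for age in population
    )
    return observed_deaths / expected_deaths


# Hypothetical comparison population and standard age-specific death rates.
population = {"40-59": 1000, "60-79": 500}
standard_rates = {"40-59": 0.002, "60-79": 0.01}

# Expected deaths = 1000 * 0.002 + 500 * 0.01 = 7, and 14 were observed.
smr = standardized_mortality_ratio(14, population, standard_rates)
print(round(smr, 2))  # 2.0
```

An SMR above 1 means that more deaths were observed than would be expected from the standard population's rates.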


Intention to treat analysis

Statistical analysis where patients are analysed in the treatment group to which they were originally randomly allocated even if they did not actually receive that treatment


Logistic regression

A multifactorial regression model used with a binary outcome


Logrank test

A statistical test used to compare time-to-event data in two or more groups



Meta-analysis

A statistical analysis that combines the results of several independent studies examining the same question


Multifactorial methods

Statistical models fitted to datasets with one outcome variable and several predictor variables; used to disentangle effects


Multiple regression

A multifactorial regression model used with a continuous outcome