Study designs Flashcards
(22 cards)
Diseases vary in time, in space, and by population group. Epidemiology takes a systematic approach to exploring this variation, following highly structured sequences of observation.
Bradford Hill criteria
1. Consistency of association - results are replicated in studies using different methods and in a variety of settings. Example: melanoma and UV exposure, where virtually all studies suggest an association.
2. Strength of association - the larger the magnitude of the excess risk or relative risk, the more persuasive the evidence. Examples: scrotal cancer, 200% higher in chimney sweeps (an occupational exposure), and lung cancer in the asbestos industry. (Association does not mean causality, but larger associations carry a higher probability of a causal explanation.)
3. Specificity - a suspected cause produces a single, well-defined effect (ideally a one-to-one relationship between cause and outcome), e.g. asbestos exposure and lung disease. The more specific the relationship, the more likely the exposure causes the disease.
4. Dose/response relationship - increasing the exposure produces larger excess or relative risks. Example: paracetamol doses above 4 g/day cause liver toxicity (paracetamol/acetaminophen is metabolised to the toxic intermediate NAPQI); the dose-response relationship may be linear.
5. Time sequence (temporality) - the cause/exposure must have preceded the event; the cause must precede the effect. Example: thalidomide taken by pregnant women preceded the birth defects and deaths among their children.
6. Biological plausibility - the association makes biological sense. Example: the Radium Girls at the watch factories, told the paint was safe, who would "lip, dip and paint" and also painted their nails, skin and teeth with radium, later suffering anaemia and fractures.
7. Coherence - the association 'explains' other facts known about the disease, such as its age and sex distribution; cause and effect should make sense and not contradict established scientific findings. Example: bronchial epithelial changes consistent with the carcinogenicity of cigarette smoke.
8. Experiment - if removal of the cause reduces the risk of disease
Epidemiological studies: design
There are four basic questions to ask about how disease is patterned in human populations, and to these correspond the four basic types of study design:
• How much of this is there? – descriptive surveys
• Why have some people got this and other people not got it? – case and comparator (control)
• What happens over time to people with different starting levels of some factor? – cohort;
• What happens if we change something? – intervention trials if we have introduced the change deliberately; interrupted time series if we are merely watching the effect of someone else’s change (such as a new law)
Cross sectional studies gather data about people / places etc. ONCE, at ONE point / period in time
They are excellent for giving us a picture, a snap shot, of the levels and distribution of characteristics in a population
By gathering information about both outcomes (i.e. disease) and exposure cross-sectional studies can be used to determine the association between exposures and outcomes
Remember, association means that the values of the exposure variable have a relationship with the value of the outcome variable, it does NOT mean that the exposure causes the disease.
This is an observational study design
For each subject, exposure and disease outcome are assessed simultaneously
e.g. Our study might determine if an individual has hypertension and what their consumption of cholesterol is
We then usually compare the prevalence of disease in persons with and without the exposure of interest
e.g. Are rates of hypertension higher among those who consume relatively large amounts of cholesterol than among those who consume relatively small amounts?
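A minimal sketch of that comparison, using invented counts (this yields a prevalence ratio, i.e. an association, not evidence of causation):

```python
# Invented survey counts, purely to illustrate the comparison
high_intake = {"hypertension": 40, "total": 200}
low_intake = {"hypertension": 15, "total": 150}

prev_high = high_intake["hypertension"] / high_intake["total"]  # 0.20
prev_low = low_intake["hypertension"] / low_intake["total"]     # 0.10
prevalence_ratio = prev_high / prev_low                         # 2.0

print(f"Prevalence, high intake: {prev_high:.2f}")
print(f"Prevalence, low intake:  {prev_low:.2f}")
print(f"Prevalence ratio:        {prevalence_ratio:.1f}")
```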
Are often used to study conditions that are relatively common with long duration (non-fatal, chronic conditions)
Are Not suitable for studying highly fatal diseases, or a disease with short duration of expression, or emerging diseases
Why not?
Advantages
Quick, easy, and cheap
Can study multiple exposures and disease outcomes simultaneously
Possible to get very, very large sample sizes and easier to include a wide variety of people
Good design for hypothesis generation
Good for describing the magnitude and distribution of health problems (prevalence)
Good for studying conditions that are relatively frequent with long duration of expression (nonfatal, chronic conditions)
No issue of subjecting anyone to interventions
Not everything can be studied experimentally
Disadvantages
Not a useful type of study for establishing causal relationships
Prevalent cases are survivors
Disease may 'cure' the exposure (e.g. alcoholism and recorded consumption: those diagnosed may stop or under-report drinking)
Rare events and quickly emerging diseases are problematic
Very hard to completely isolate the relationship between exposure and outcome (residual confounding)
Socio-economic position and cross-sectional studies
In public health, we are often interested in the relationships between health (or related behaviour) and socio-economic position
In a cross-sectional study, we can often measure socio-economic position and try to account for it in our models
We adjust the relationships of interest for socio-economic position, as measured by income for example
So, for example, we might adjust the relationship between active travel and street characteristics for socio-economic position
Yet, often, our results are plagued by what we call 'residual confounding'; some of the relationships we see might still be driven by socio-economic position
Why is this?
Income as measured at the time of survey might not reflect socio-economic position over the life course
Socio-economic position is a multi-factorial concept; it’s not possible to precisely measure it with one (or even several) variables
For example, residence in a nicer (more beautiful) neighbourhood might not depend only on income; it might depend on assets, accumulated wealth, family history, family structure and so on.
People often dismiss cross-sectional studies because
They can’t prove causality: “correlation is not causation”
There’s a fear about residual confounding
Yet, think back to the Bradford-Hill criteria…
Things that are causally related are usually (though not always) associated
Cross-sectional studies can tell us something useful, provided we understand their limitations
If you look at the MRC guidance for developing complex interventions (i.e. experimental studies on public health), its position on associational studies is very clear; you need them as a foundation for your experimental study. The association is part of the rationale.
• Scottish Health Survey
• SALSUS
What if your survey includes questions about what respondents used to do in the past?
Does that make it a longitudinal study?
Variety in the level and sophistication of ways to use a cross-sectional survey to capture data with a temporal element
Simple time-related questions
“When were you diagnosed with high blood pressure?”
“How long have you lived in this area?”
More complex techniques for reconstructing a person’s history
Seeks to guide the recollection of past events and experiences by
“…cross-referencing the dates of any changes in the areas of interest, for example occupation and housing, against dates in the subject’s personal life, such as marriage and death of mother, as well as against events in the external world, like coronations and wars.”
So, first, the interviewer builds a grid to represent time, and places key events in the respondent’s life on it.
Then, they use the grid to help the person describe the timing and duration of events in their life
A repeat cross-sectional study is one in which you do a cross-sectional study on a population, and then repeat it again (and again, and again) at later dates
Very often, population health surveys are like this, and the Scottish health survey is a prime example
What can you do with a repeat cross-sectional study?
Repeated assessment of prevalence = time series
Study designs are the frameworks we can use to plan our data capture and analysis
Cross-sectional studies are one type of study design, defined by the fact that data are gathered at one point in time
Ecological studies are those in which the units of analysis are areas, or groups of people, rather than individuals.
Eg: countries, regions, cities, neighbourhoods, school classes, ethnic groups etc
Geographic correlations between cancer mortality rates and alcohol-tobacco consumption in the United States. J Natl Cancer Inst. 1974 Sep;53(3):631-9
Ecological studies
• Examines relationship between exposure & outcome with population-level rather than individual-level data (usually defines groups by place, time, or both)
Examples
• eg John Snow and cholera in London
• eg Intersalt – comparison of BP and salt intake in different countries
Advantages
• Useful to generate hypotheses (e.g. salt and high BP, childhood influences on adult CHD)
• Inexpensive
• Less time-consuming
• Simple and easy to understand
• Examines community-, group-, or national-level data and trends
Disadvantages
• Subject to the ecological fallacy, which infers association at the population level whereas one may not exist at the individual level
• Difficult to detect complicated exposure-outcome relationships
The ecological fallacy is that relationships between exposure and disease which are derived from the analysis of groups or areas do not always hold true at individual level. For example, not everyone in an area with a high unemployment rate will be unemployed. Not everyone in a country with a high average BMI will be fat. Not everyone in Scotland is a fat, drunk smoker…
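A minimal numeric sketch of the fallacy, with invented figures: the area-level data suggest unemployment is associated with disease, yet within each area the unemployed are at no higher individual risk.

```python
# Toy illustration of the ecological fallacy (all numbers invented)
areas = {
    "A": dict(unemp=500, emp=500, cases_unemp=50, cases_emp=250),
    "B": dict(unemp=100, emp=900, cases_unemp=10, cases_emp=90),
}

area_level = {}        # (unemployment rate, disease prevalence) per area
individual_level = {}  # (risk if unemployed, risk if employed) within each area
for name, a in areas.items():
    n = a["unemp"] + a["emp"]
    area_level[name] = (a["unemp"] / n, (a["cases_unemp"] + a["cases_emp"]) / n)
    individual_level[name] = (a["cases_unemp"] / a["unemp"], a["cases_emp"] / a["emp"])

# Area level: A has higher unemployment (0.5 vs 0.1) AND higher prevalence
# (0.3 vs 0.1), so the group data suggest unemployment is linked to disease.
print(area_level)
# Individual level: within each area the unemployed are at equal or LOWER risk
# (A: 0.1 vs 0.5; B: 0.1 vs 0.1) - the association does not hold for individuals.
print(individual_level)
```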
Case series
• A set of cases of a disease or health problem. Often from one hospital or clinician.
• Aims to draw conclusions about common features of the condition or problem
• Needs a case definition & a method of identifying cases
Advantages
• Small series can often be assembled quickly and cheaply
• Can rapidly generate hypotheses about causes
• can help generate case definitions for new conditions
Disadvantages
• Often selected and therefore unrepresentative cases (eg from a specialist practice)
Read the paper: Richard Mitchell, Gerry Fowkes, David Blane and Mel Bartley. High rates of ischaemic heart disease in Scotland are not explained by conventional risk factors. 2005;59:565-567
What did the study do?
The study assessed the prevalence of obesity and investigated sex-specific associations between risk factors and youth obesity. Little justification from the literature is provided for undertaking the study; however, the authors justify it by suggesting that this information is required for designing policies and programmes that promote healthy weight. Note that this is a short communication, so the authors are limited in words and number of references.
Discuss its strengths and weaknesses
Study design – cross-sectional (repeat cross-sectional if the authors make comparisons between the different years); therefore it is observational and analytical (assessing associations between different groups defined by the responses to questions – internal comparison groups are made). Can only provide associations. Not possible to explore the temporal nature of the relationship – could those who are more obese deliberately avoid physical activity or could those who are obese be using physical activity and diet to reduce their weight.
Sample selection – uses existing cross-sectional studies. Recruitment into these cross-sectional studies was by random digit dialling. While this will reduce selection bias, there is still an issue with this approach being unable to contact those without a telephone or landline, denying them the opportunity to be in the study. Response rate is also an issue; however, the original survey report for 2011 shows a response rate of 86%, which is high for a study of this type. The original survey also shows that recruitment took place during school term-time (and at different times of day, to capture working parents/guardians), making responses more likely, although some families may be away on holiday during term-time. No information is provided about the non-responders and whether they were systematically different from the participants. It is generally a large, representative sample, but no power calculation is presented.
Exposures – an extensive questionnaire was used, and it is the same for both years studied. However, there is no indication in the paper that it is a validated questionnaire, though the original survey states that it was "validated by other researchers". The biggest issue with the determination of exposures is that parents were asked about children who are able to decide what they eat when away from their parents, or who may well use electronic devices without their parents' knowledge. This could be information bias: it may not be random and could lead to an over- or under-estimate of the associations, e.g. single working parents/guardians may be less likely to know about hidden activities.
Outcome measure – self-reported height and weight versus measured: this could be information bias if one group is more or less likely to under- or over-estimate. There is some evidence of this from the literature, but the authors report that a Government paper suggests self-report is a reasonable proxy. It was used because it is simpler and cheaper for such a large population.
Effect modification – could be an issue because the relationships between exposures and outcome could be different – there will be more on this in the alternative explanations session.
Consider how else it might be done
Is a cross-sectional design the only one possible for this question?
Cohort study could be used and could provide evidence of temporal relationship.
Is an observational design the only one possible?
Could you / how would you explore this issue in an experimental design?
I’d argue yes because for many of the risk factors it would be unethical or extremely challenging to undertake an experimental approach e.g. change marital status or encourage less PA when we know it is of benefit. However, it would be possible to look at the impact of improving modifiable adverse risk factors using an experimental design e.g. PA initiatives in schools.
Bradford Hill applied to Vit D/the Glasgow effect
Strength of association – yes (20% reduction in all cause mortality in those with highest compared to lowest levels Vit D.. But wide CIs)
Dose response, consistency, coherence – inconsistent and non-linear relationship between vit D level and mortality risk
Specificity – no specific hypothesis – “all cause mortality”
Temporality – inconsistent relationship between length of follow-up and outcome; reverse causality possible
Biological plausibility – main causes of excess premature mortality in Glasgow ‘external causes’, substance misuse, suicide… unlikely cause is as simple as low vitamin D (unfortunately)
Experimental evidence – very few RCTs, mostly observational evidence
Analogy – plenty of examples of nutritional deficiencies as causal factors in disease outcome
Does exposure to Zika virus in pregnancy cause microcephaly?
(Association between Zika virus and microcephaly in French Polynesia, 2013-15: a retrospective study. Lancet.2016 May 21;387(10033):2125-2132.)
Does economic recession cause suicide?
(Suicides associated with the 2008-10 economic recession in England: time trend analysis. BMJ2012;345)
Does living next to fast food outlets cause obesity?
(Associations between exposure to takeaway food outlets, takeaway food consumption, and body weight in Cambridgeshire, UK: population based, cross sectional study BMJ 2014;348:g1464)
Self Directed Learning Exercise Week 2
Reference: 25-hydroxyvitamin D levels and the risk of mortality in the general population.
Melamed ML1, Michos ED, Post W, Astor B. Arch Intern Med. 2008 Aug 11;168(15):1629-37. doi: 10.1001/archinte.168.15.1629.
Questions
- What are the aims of this study and how do the authors justify these in the introduction?
- What are the exposures and outcomes in this study?
- What type of study design is this (justify your answer)?
- What are the advantages and disadvantages of using national population surveys to carry out this type of research?
- Summarise the main findings of the study.
- The authors state that:
“Several authors have commented that the optimal levels of 25(OH)D should be greater than 30 ng/mL.19,20 In our observational study, we found that there was a lower risk of mortality at levels of 30 to 49 ng/mL, but that at levels greater than 50 ng/mL there was again a higher risk of mortality in women”.
Comment on the implications of this statement for public health practice.
Answers
- Vitamin D deficiency has been associated with several conditions including cardiovascular disease, cancer and diabetes and previous randomised controlled trials have suggested that supplementation with vitamin D can reduce mortality. This may be important as almost half of the US population (41% of men and 53% of women) have below the recommended level. This study aims to determine the association between vitamin D levels and mortality in a large sample of the general population.
- Exposure – 25 hydroxyvitamin D level Outcomes – all cause mortality, cause specific mortality (Table 4 - cardiovascular, cancer, infectious disease, external causes).
- This is a prospective cohort study design: prospective, because the exposure (vitamin D levels) was collected at baseline and the subjects followed up over a period of time to determine outcomes of interest.
- Advantages
o Large numbers of participants increases power to detect effects
o Sampling frame ensures representativeness of population in terms of age, sex, ethnicity, socioeconomic status which increases transferability of findings to the general population
o Large number of exposures and outcomes can be studied simultaneously
Disadvantages
o Selection bias may be present in form of non-response bias – this may be important if those who do and do not participate differ in other characteristics
o Data collected at baseline may not capture all of the information required to provide background necessary for current research question
o May be prone to bias from loss to follow-up – important if those who leave the study population differ from those who do not
o Risk of ‘data fishing’ – also called ‘data dredging’, this refers to performing multiple analyses on exposures and outcomes without a pre-defined research question or hypothesis. This increases the possibility of finding statistically significant associations through chance when in reality there is no causal relationship. Note that this can occur in other study types. Statistical techniques are available to address this issue at the analysis stage.
- At completion of follow-up, individuals who had vitamin D levels in the lowest quartile were 26% more likely to die compared to those with highest levels in the study population (Table 4, fully adjusted model). None of the associations between vitamin D level and cause specific mortality were statistically significant.
- It has been suggested that supplementation with vitamin D may be of benefit at a population level, however the optimum level of vitamin D for health remains undetermined. In this study and others, increased mortality was observed at highest as well as lowest levels of vitamin D compared to those with levels in the mid-range (Figures 1 and 2). Supplementation should be advised for those groups who are most at risk of deficiency (infants, pregnant and breastfeeding women, the elderly and those with restricted exposure to sunlight) but cannot be recommended for the population as a whole on the basis of observational studies alone.
Critical appraisal – Session 2
Introduction to Epidemiology
MPH, 2015-16
The following questions are based on the paper:
Hidaka, B.H., Depression as a disease of modernity: Explanations for increasing prevalence, J. Affect.
Disord. (2012), doi:10.1016/j.jad.2011.12.036
Notes
1. What are the exposure(s) and outcome(s) being studied in this paper? Comment on the methods of measuring their occurrence.
The exposure is “modernity” and the outcome is depression. Modernity is defined as a set of environmental factors on page 2 but the classification in section 4 is largely about individual factors that are different. It is therefore not really argued that modernity, as initially defined, is associated with depression. The author says that depression is defined only by current criteria based on clinical symptoms for his review. However, no definition for these symptoms is given and it is not clear that all studies being reviewed use consistent criteria.
2. The author describes “rates” of depression throughout the paper. Comment on the use of this term.
A rate is usually defined as the occurrence of an outcome of interest over a defined person-time – although the term is used somewhat inconsistently. In general, prevalence should be considered as a proportion of the at-risk population, not as a rate. The author mixes two measures of prevalence. The first is point prevalence and the second is period prevalence, specifically one-year and lifetime prevalence. By collecting occurrence of depression at any point over a period of time, its occurrence will be higher than at a single point in time and the longer the observation period, the higher the occurrence. The two measures of prevalence are used interchangeably but they should be separated as, for example, shorter episodes of mild depression may produce higher lifetime prevalences than single episodes of more severe depression but it could be argued that the latter is more significant. There are examples, however, where the same method (e.g. 1-year period prevalence) is used in repeat samples to suggest an increasing prevalence over time.
3. What type of epidemiologic study is shown in Figures 1 and 2? What are the strengths and weaknesses of using this approach in this case?
Both figures are ecologic studies, in that they use geographic areas, not individuals, as the unit of analysis. The advantages of this approach are that routinely available data (such as Gross Domestic Product or Gini coefficients) can be correlated, new data collection and follow-up time are usually not needed, and any observed associations may generate hypotheses that can be tested with analytic studies. The weaknesses include the ecological fallacy – in this case, inferring that income and inequality are associated with mood disorders at an individual level, when the group-level pattern may simply reflect that less affluent individuals are more likely to suffer from depression. These plots also do not show depression as such, which is the topic of the paper.
4. Do you agree with the conclusion on page 4 that “The above evidence suggests higher depression prevalence and risk is associated with general aspects of modernization.”?
Despite the author stating that only clinically symptomatic depression was being considered, it remains unclear what these are. If you look up some of the references (Colla, J., Buka, S., Harrington, D., Murphy, J.M., 2006. Depression and modernization: a cross-cultural study of women. Social Psychiatry and Psychiatric Epidemiology 41, 271–279.) the DSM-III criteria are used, which is a good, consistent definition of depressive symptoms. The Lundby study (Hagnell, O., 1989. Repeated incidence and prevalence studies of mental disorders in a total population followed during 25 years. The Lundby Study, Sweden. Acta Psychiatrica Scandinavica. Supplementum 348, 61–77 discussion 167–178) used DSM-IV criteria, which are very similar. So it seems probable that consistent definitions of depression have been used and a lower threshold for diagnosing depression is not obviously the reason for increasing prevalence. The question of risk is different, as it requires measures of incidence, not prevalence. Prevalence measures the likelihood of being depressed at a point in time, or over a defined period, but incidence (and thereby risk) measures the likelihood of becoming depressed over a given time period. As noted already, changes in both incidence and length of depressive illness will affect prevalence of depression.
One of the implications from this paper is that modernity causes depression. If it doesn’t then there is little practical value to observing an association. Criteria for deciding if an association is causal or not will be taught later in the course. The author says he will argue that modernisation – “the
conglomeration of a society’s urbanization, industrialization, technological advancement, secularization, consumerism, and westernization” – explains an increase in the prevalence of depression. What he actually argues is that a few factors (largely associated with overweight – which is not surprising given the provenance is a department of dietetics and nutrition), which are mainly individual behaviours or attributes, are associated with depression.
5. What are the main epidemiologic factors associated with depression?
It is useful to get into a habit of classifying epidemiologic exposures in terms of time, place and person plus artefact. Thus you could say
Artefact – differences in criteria for measuring depression over time or between geographic areas; differences in validity of criteria in different cultures (these are information/measurement biases). Selection biases may also be responsible. Confounding – that is, it is not modernity but other risk factors that are responsible for the association – may also be responsible.
Time – were prevalences standardised or adjusted for the different demographics (specifically age, under this heading) of the populations being compared?
Place – social environment and income inequalities have been discussed but as noted already, most aspects of modernization are said to be environmental so one would expect greater emphasis on relationships between depression and urbanisation, technology, westernization, etc.
Person – the author considers obesity, diet, work patterns as they affect exposure to daylight and sleep. Sex differences, genetic influences, spiritual beliefs and many other factors might also have been considered.
Cohort - a number of people banded together or treated as a group
Characteristics of cohort studies
Defined group with common characteristics (time, place, person)
Follow study population over period of time
Health outcome data may be obtained on the same individuals more than once
Study population may be general or have defined characteristics or disease
In a descriptive cohort study, the aim is to describe risk/incidence of an event
In an analytical cohort study, the aim is to examine relationship between exposure and the outcome of interest
Defining the study question
Establish inclusion criteria and select the study population – make sure it is representative
Select comparison group (if analytical cohort study)
Collection of baseline data including measurement of exposure
Collect follow-up data
Determine if outcome has occurred
Analyse data
Interpretation of results
Cohort studies may be prospective, retrospective or ambidirectional
Classification is based on temporal relationship between the initiation of the study and the occurrence of the outcome, i.e. outcome before initiation = retrospective
Prospective cohort
Retrospective cohort
Ambidirectional cohort
Framingham – prospective
Recruited 1948-1950 from Framingham without coronary heart disease and followed up over time to see if they developed it.
Increasingly used for both research and service purposes in settings with high quality routine data sources
Link research dataset to routine data
Record linkage
Deterministic/Probabilistic
Records linked on balance of probabilities: date of birth, sex, postcode
Use of CHI (Community Health Index number) or unique identifier allows deterministic linkage
Mandatory on all clinical communications, i.e. anyone who has received any healthcare in Scotland should have a CHI number
CHI seeding
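A toy sketch of the two linkage approaches (the records and the simple scoring rule are invented; real probabilistic linkage engines use formal match weights rather than a raw count of agreeing fields):

```python
# Invented research and routine records for illustration only
research = [
    {"chi": "0101801234", "dob": "1980-01-01", "sex": "F", "postcode": "G12 8QQ"},
    {"chi": None, "dob": "1975-06-15", "sex": "M", "postcode": "EH1 1AA"},
]
routine = [
    {"chi": "0101801234", "dob": "1980-01-01", "sex": "F", "postcode": "G12 8QQ", "outcome": "MI"},
    {"chi": "1506755678", "dob": "1975-06-15", "sex": "M", "postcode": "EH1 1AA", "outcome": "none"},
]

def link(rec, candidates):
    # Deterministic linkage: a unique identifier (CHI) match wins outright
    for c in candidates:
        if rec["chi"] is not None and rec["chi"] == c["chi"]:
            return c, "deterministic"
    # Probabilistic-style fallback: score agreement on dob, sex, postcode
    best, best_score = None, 0
    for c in candidates:
        score = sum(rec[k] == c[k] for k in ("dob", "sex", "postcode"))
        if score > best_score:
            best, best_score = c, score
    return (best, "probabilistic") if best_score >= 2 else (None, "unlinked")

for rec in research:
    match, method = link(rec, routine)
    print(method, match["outcome"] if match else "no link")
```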
Advantages of cohort studies
Useful for rare exposures
Useful for more than one outcome
Best way to study the incidence of an outcome
Temporal relationship between exposure and outcome is clear as exposure status defined at start of study
If prospective, minimises bias in measurement of exposure
Sometimes the only ethical or legal way
Disadvantages of cohort studies
Not good for study of rare outcomes
If retrospective they rely on the adequacy of records
Exposed may be followed more closely than unexposed
If prospective they can be very expensive and slow
As they are follow up studies, the validity of results is highly sensitive to losses to follow up (migration, withdrawal, lack of participation, death)
Assessing loss to follow-up
Is loss to follow-up likely to be random or non-random?
What proportion are lost to follow-up?
Is loss to follow-up different between the exposed and the non-exposed group?
What are the baseline characteristics of those who are lost to follow-up versus those not? (i.e. what do the missing people look like?)
How is loss to follow-up likely to impact on the results of the study (over-estimate or under-estimate association?)
Loss to follow-up is a very important consideration for cohort studies in particular
The number of losses is less important than how losses are related to outcome and risk factor:
If losses are random, only power is affected
If disease incidence is research question, loss to follow-up that is related to the outcome will bias results
If association of risk factor to disease is focus, bias results only if related to both the outcome and the risk factor
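These points can be illustrated with a small simulation (the 20% true risk and the loss probabilities are invented for the sketch): loss that is unrelated to the outcome barely moves the estimate and only costs power, while loss that is related to the outcome biases the observed incidence downwards.

```python
import random
random.seed(42)

# Simulate a cohort with an assumed true 1-year risk of 20%
N = 100_000
outcomes = [random.random() < 0.20 for _ in range(N)]
true_risk = sum(outcomes) / N

def observed_risk(outcomes, p_loss_case, p_loss_noncase):
    # Keep each person with a probability that depends on their outcome
    kept = [o for o in outcomes
            if random.random() > (p_loss_case if o else p_loss_noncase)]
    return sum(kept) / len(kept)

random_loss = observed_risk(outcomes, 0.30, 0.30)  # 30% lost, unrelated to outcome
biased_loss = observed_risk(outcomes, 0.50, 0.10)  # cases far more likely lost

print(f"True risk:                      {true_risk:.3f}")
print(f"After random loss (power only): {random_loss:.3f}")
print(f"After outcome-related loss:     {biased_loss:.3f}")  # under-estimate
```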
HPN Guidance 2012
Provision of a range of condoms in a range of settings
HIV treatment and support for treatment adherence
Brief and intensive interventions (inter-personal skills and motivation to adopt safe-sex)
New SMC Ruling 2017
- Pre-exposure prophylaxis for HIV (PrEP) approved for use by ‘high-risk’ MSM
The PROUD study
86% reduction in HIV among MSM (men who have sex with men) PrEP users
Other sexually transmitted infections increased (not statistically significant)
1. How would you design a cohort study to evaluate if PrEP is effective?
- How would you select participants?
MSM commenced on PrEP in NHS clinics
- Would you have an internal/external comparison group?
Regular vs non-regular users, before versus after, Scotland versus England
- How would you measure exposure?
Clinic records to check regularly using PrEP
- How would you measure outcomes?
Clinic records, national HIV database, HIV and other STIs
2. What would be the main types of bias in your study?
Selection bias, misclassification bias, follow-up bias
Person years of follow-up
Each individual contributes person years once they are recruited to the study
Recruitment may be period of years
Develop outcome of interest – may be censored
Death - ‘censored’, and no longer contribute person years to the study
Total number of person years are added up over course of study
100 people in a study lasting 3 years:
50 people are followed up for 2 years, and 50 are followed up for 3 years
Total PY = 250 person years
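The arithmetic above as a sketch (the 5 events are an invented figure, added only to show the incidence-rate step that follows):

```python
# Person-years from the example above: 50 people followed for 2 years
# and 50 for 3 years
follow_up_years = [2] * 50 + [3] * 50
total_py = sum(follow_up_years)        # 250 person-years

events = 5                             # hypothetical new cases during follow-up
incidence_rate = events / total_py     # 0.02 per person-year
print(f"{total_py} person-years; incidence rate "
      f"{incidence_rate * 1000:.1f} per 1,000 person-years")
```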
Incidence rates: the number of new events (cases of disease or other health outcome) occurring in a specified time period in a defined population (person-time at risk); excludes prevalent cases
Relative risk: ratio of incidence rates in exposed and non-exposed groups
Also called: rate ratio, risk ratio, relative risk ratio
Incidence rate = Number of events/Number of person years
Incidence of HIV among needle exchange users
= (10/712) × 1,000 = 14.0 per 1,000 person-years
Incidence of HIV among participants who did not use needle exchange
= (9/368) × 1,000 = 24.5 per 1,000 person-years
RR =Incidence rate exposed/Incidence rate non exposed
Ratio of the incidence rates between the exposed and non exposed groups
Can only be calculated from cohort studies
Interpretation:
RR > 1: Increased risk of outcome among “exposed” group
RR < 1: Decreased risk, or protective effects, among “exposed” group
RR = 1: No association between exposure and outcome
Relative risk for being infected with HIV among needle exchange versus non-needle exchange users
= 14.0/24.5 = 0.57
Needle exchange is associated with a reduction in HIV risk, i.e. There is a protective association
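The needle-exchange figures above, as a small Python sketch:

```python
def incidence_rate(events, person_years, per=1_000):
    """New events per `per` person-years of follow-up."""
    return events / person_years * per

rate_exposed = incidence_rate(10, 712)    # needle exchange users
rate_unexposed = incidence_rate(9, 368)   # non-users

rr = rate_exposed / rate_unexposed        # relative risk (rate ratio)

print(round(rate_exposed, 1))    # 14.0 per 1,000 person years
print(round(rate_unexposed, 1))  # 24.5 per 1,000 person years
print(round(rr, 2))              # 0.57 -> protective association
```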
Hazard ratios
A hazard ratio takes account of time, i.e. calculated using person years of follow-up
2x2 table is slightly different
What is the relative risk of death among people in the HIV outpatient cohort in the last quarter, compared to the first quarter?
Calculation of incidence rates
Incidence rate, first quarter = 16 events / 45.6 person years = 35.1 per 100 person years
Incidence rate, last quarter = 12 events / 136.4 person years = 8.8 per 100 person years
Relative risk of death = (12/136.4)/(16/45.6) = 0.25
i.e. HIV therapy is protective
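The quarter-by-quarter calculation above, as a minimal sketch:

```python
# Incidence rates per 100 person-years for the HIV outpatient cohort
rate_first = 16 / 45.6 * 100    # first quarter
rate_last = 12 / 136.4 * 100    # last quarter

rr = rate_last / rate_first     # last quarter relative to first

print(round(rate_first, 1))  # 35.1
print(round(rate_last, 1))   # 8.8
print(round(rr, 2))          # 0.25 -> death rate fell over the study
```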
Hazard ratio (HR) is a measure of an effect of an intervention on an outcome of interest over time. Hazard ratio is reported most commonly in time-to-event analysis or survival analysis (i.e. when we are interested in knowing how long it takes for a particular event/outcome to occur).
The outcome could be an adverse/negative outcome (e.g. time from treatment/surgery until death/relapse) or a positive outcome (e.g. time to cure/discharge/conceive/heal or disease-free survival).
Hazard Ratio (i.e. the ratio of hazards) = Hazard in the intervention group ÷ Hazard in the control group
Hazard represents the instantaneous event rate, which means the probability that an individual would experience an event (e.g. death/relapse) at a particular given point in time after the intervention, assuming that this individual has survived to that particular point of time without experiencing any event.
Confidence Interval (CI): is the range of values that is likely to include the true population value and is used to measure the precision of the study’s estimate (in this case, the precision of the Hazard Ratio). The narrower the confidence interval, the more precise the estimate. (Precision will be affected by the study’s sample size). If the confidence interval includes 1, then the hazard ratio is not significant.
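A 95% CI for a rate ratio is commonly obtained on the log scale, with SE(ln RR) approximated by the square root of the sum of the reciprocal event counts – a standard method, though not spelled out in these notes. Applied to the needle-exchange figures from earlier:

```python
import math

# Assumed log-transform method for a rate-ratio CI:
# SE(ln RR) = sqrt(1/a + 1/b), a and b = event counts in each group.

a, py_a = 10, 712   # events, person-years: exposed (needle exchange)
b, py_b = 9, 368    # events, person-years: unexposed

rr = (a / py_a) / (b / py_b)
se = math.sqrt(1 / a + 1 / b)
lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)

print(round(rr, 2), round(lo, 2), round(hi, 2))  # 0.57 0.23 1.41
# The interval includes 1, so this protective association is not
# statistically significant at the 5% level.
```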
Interpretation of Hazard Ratio
Because the hazard ratio is a ratio:
HR = 0.5: at any particular time, half as many patients in the treatment group are experiencing an event compared to the control group.
HR = 1: at any particular time, event rates are the same in both groups,
HR = 2: at any particular time, twice as many patients in the treatment group are experiencing an event compared to the control group.
Odds ratio – reported from univariable and multivariable logistic regression. Approximates the relative risk for smaller probabilities/rarer diseases.
Hazard ratio – used in survival analysis; similar to the relative risk but takes account of time to event and censoring, and represents instantaneous risk at a specific point in time.
Cohort studies are common in health services research
Terminology can be confusing
Numerous examples of large and influential cohort studies
Unique strengths and weaknesses that must be taken into consideration when interpreting findings
Remember particular risks of bias: selection, misclassification, and loss to follow-up; also confounding
Cohort Studies : Examines a possible association between exposure and outcome by following a group of individuals according to their exposure status (i.e. exposed vs unexposed) over a period of time (often years) to see whether they develop the disease or outcome of interest. Can measure incidence of disease.
• Participants selected on basis of exposure & should be free of outcome under investigation at start of study. In other words, all participants (exposed & unexposed) must be at risk of developing outcome
• Need a clear definition of exposure & outcome
• Prospective Cohort Study – participants identified and followed up over time. A temporal relationship between exposure and outcome can thus be established
• Retrospective Cohort Study – exposure and outcome have already occurred at the start of the study. Pre-existing data, such as medical notes, can be used to assess any causal links, so lengthy follow-up is not required.
• Ambi-directional Cohort Study – Goes both directions. Useful for exposures that have both short and long-term outcomes. E.g. the Air Force Health Study (AFHS) examined potential health effects of exposure to various chemicals. Short-term outcomes like rash were examined retrospectively whilst long-term outcomes including cancer were examined prospectively.
Examples
E.g. Doll & Hills study of Smoking of British Doctors (prospective)
Framingham Study - examines risk factors for CVD in a group of people living in Framingham
Whitehall II Study – explores the relationship between deprivation, stress and CVD in British Civil Servants.
Scottish Health and Ethnicity Linkage Study (Bhopal) (retrospective)
Advantages
• Can evaluate multiple outcomes of a single exposure (e.g. cigarette smoking (exposure) & cancer, heart disease, stroke etc (outcomes)
• More efficient for rare exposures
• Good way to directly measure incidence and gain insight into natural history of disorder
• Clear chronological relationship between exposure and outcome
• Can provide a life-course perspective on the causes of disease
• Allows for calculation of incidence rates, relative risk, CIs, survival curves, hazard ratios
Disadvantages
• Expensive (especially prospective)
• Time-consuming (retrospective are relatively quicker)
• Inefficient for rare outcomes with long induction or latency periods
• Susceptible to loss to follow up, causing selection bias
Case-control
Define the study question/hypothesis
Select cases
Select appropriate controls
Measure exposures
Analyse data
Interpret results and assess potential sources of error
Define the study question/hypothesis
Make sure that a case control study is the appropriate design to answer your study question
More detail in strength and limitations section
Select cases:
Clear case (outcome) definition
Time
Place
Person
Incident cases / Prevalent cases?
Description of population from which cases come
Description of methods used to identify cases
Select appropriate controls:
Population must be the same as for the cases
(i.e. if individuals selected as controls had developed the outcome of interest, they would have been identified as cases for the study)
Source – how to identify controls
Ratio of controls to cases – power for the study
Number of groups – different sources of controls to answer different questions
Matching?
Measure exposures:
Timeframe of interest?
Data source - routine v study specific
Data collection methods
How will data be collected?
Who by?
Analyse data:
Assess the strength of association between exposure and outcome – odds ratio Measures of effect and impact
Interpret results and assess potential sources of error
Chance
Bias (selection bias, information bias)
Confounding
True result?
What is the contribution of environmental and host-related risk factors to the development of TB in West Africa?
No. of h’holds in the compound
No. of people in the h’hold
Persons per room
Family history of TB
Material walls / floor built with
Ceiling
Number of windows
Water source (tap, well, other)
Electricity, latrines, waste
Animals
House ownership
Occupation, schooling
Religion
Sex
Marital status
Former TB, BCG
HIV status
Smoking, alcohol, drug use
History/treatment of worms
History/treatment of asthma
Diabetes
Anaemia
Design a multi-centre epidemiological case-control study
Discuss:
How to define and select cases
How to define and select controls
How to measure exposures
Select cases
Newly detected smear positive pulmonary TB, confirmed by two consecutive acid-fast bacilli (AFB) positive smears and/or positive culture (incident cases)
Age >15 years
Presented at major urban health centre
Recruitment within three countries (Guinea, Guinea-Bissau, and The Gambia) with similar ethnic mix, socioeconomic indicators, geographical environment, and burden of TB
Select appropriate controls
Two groups:
1. Healthy control selected at random within same household, age-matched to within 10 years of case
2. Healthy control selected at random within neighbourhood of case’s household, age-matched to within 10 years of case
Household control: allows investigation of the effect of individual-level factors (by controlling for environmental factors: same household – very similar environmental exposures)
Neighbourhood control: allows investigation of household-level exposures, whilst still controlling for environmental exposures associated with the wider neighbourhood
Measure exposures
Standardised questionnaire
Field assistants using appropriate colloquial language
Blood samples
Presence of BCG scar
Analysis: case control studies use odds ratios as the measure of association between the exposure and the outcome
Definition
Odds ratio – the ratio of the odds of exposure amongst cases to the odds of exposure amongst non-cases.
Odds ratio (OR) = odds of exposure in the cases/ odds of exposure in the controls
Definition odds = probability of something occurring/ probability of this not occurring
Amongst cases, the odds of having eaten beef burger to not having eaten beef burger are 16 to 4 (16/4 = 4/1)
Amongst controls, the odds of having eaten beef burger to not having eaten beef burger are 8 to 12 (8/12 = 2/3)
OR = odds of having eaten burger in the cases/ odds of having eaten burger in the controls
OR=(16/4)/(8/12)=6
OR = 6, means that the odds of having eaten beef burger was 6x greater in the cases than the controls
Interpretation of odds ratio:
OR > 1: higher odds of exposure for cases than controls – i.e. exposure is associated with the outcome and may be a risk factor for it
OR = 1: no difference in odds of exposure between cases and controls – no association between exposure and outcome
OR < 1: lower odds of exposure for cases than controls – i.e. exposure may be a protective factor for the outcome (e.g. a vaccine)
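The beef burger worked example can be checked with a small sketch:

```python
def odds_ratio(a, b, c, d):
    """2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    OR = (a/b) / (c/d) = a*d / (b*c)."""
    return (a * d) / (b * c)

# Beef burger example: 16/4 exposed/unexposed cases, 8/12 in controls
print(odds_ratio(16, 4, 8, 12))  # 6.0
```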
Analyse data
Index case & household control pairs to assess the effect of host-related factors
Index case & community control pairs to assess the effect of (household level) environmental factors
Odds ratios & confidence intervals using logistic regression
822 cases meeting criteria
687 with household control
816 with community control
OR = AD/BC = (163 × 546)/(126 × 509) ≈ 1.39
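Plugging in the numbers (the assignment of cells to A–D follows the formula as written; the full 2x2 table is not reproduced in these notes, so the layout is assumed):

```python
# OR = AD/BC for the TB case-control study, cell labels assumed
# to follow the formula as quoted in the notes.
A, B, C, D = 163, 126, 509, 546

or_ = (A * D) / (B * C)
print(round(or_, 2))  # 1.39
```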
Strengths
Relatively quick and less expensive than cohort studies
Good for low incidence conditions/outcomes
Can measure multiple exposures
More information on exposure-outcome relationship than descriptive methods
Weaknesses
Do not objectively measure temporality (essential criterion for causality)
Subject to biases – selection & information
Subject to confounding
Can only measure one outcome
Limited generalisability – cases (and therefore controls) not necessarily representative
Odds difficult to interpret
Risk of overmatching
What types of questions would/ would not be suitable for a case control study
Epidemiology: Study of the distribution and determinants of disease frequency in human populations
Determinants:
Observation → Hypothesis → Test
Interpreting the test results:
1. Is there an association?
2. Is it valid?
a) Chance
b) Bias
c) Confounding
3. Is the association causal?
Observational / intervention
Descriptive / analytical
Starting point is the exposure / outcome
Study looks forwards (prospective) / looks backwards (retrospective) in time
If a research question is asking how effective a new treatment or intervention is, then the randomised controlled trial is the appropriate design to use. It is the study design where researchers have the most control in reducing sources of bias.
A randomised controlled trial is a study design in which a sample of participants meeting pre-specified inclusion criteria are randomly assigned to two or more groups to test a specific drug treatment or other intervention.
Parallel group: Parallel arm design is the most commonly used study design. In this design, subjects are randomized to one or more study arms and each study arm will be allocated a different intervention. After randomization, each participant will stay in their assigned treatment arm for the duration of the study [Figure 5]. Parallel group design can be applied to many diseases and allows running experiments simultaneously in a number of groups, and groups can be in separate locations. The randomized patients in parallel groups should not inadvertently contaminate the other group by unplanned co-interventions or cross-overs.
Crossover RCTs: In this design, some participants start with drug A and then switch to drug B (AB sequence) in one trial arm, while subjects in the other trial arm start with drug B and then switch to drug A (BA sequence). An adequate washout period must be given before crossover to eliminate the effects of the initially administered intervention. Outcomes are then compared within the same subject (effect of A vs. effect of B). The requirements are twofold: (a) the disease must be chronic, stable, and incurable, and its characteristics must not vary over the two study periods and the interim washout period, and (b) the effect of each drug must not be irreversible. Bioequivalence and biosimilar equivalence studies usually utilise a crossover design. The duration of follow-up for the patient is longer than for a parallel design, and there is a risk that a significant number of patients drop out before completing the study, compromising study power. Each person serving as his/her own control balances the covariates in the treatment and control arms. Another advantage is the requirement of a smaller sample size.
Points to be factored into a crossover design
Effects of the intervention during the first period should not carry over into the second period. If carry-over effects are suspected, more complex sequences are needed, which increase study duration and thus the chance of drop-outs
The treatment effect should be relatively rapid in onset, with rapid reversibility of effect
The disease has to be chronic, stable, and non-self-resolving. This design is usually avoided in vaccine trials because the immune system is permanently affected. The patient’s health status must be identical at the beginning of each treatment period
Period effect – changes in patient characteristics due to the effect of the first drug, or extraneous factors to which the patients are exposed over time, lead to what is called the ‘period effect’. The internal and external trial environment must remain constant over time; this reduces the period effect
Before the crossover is implemented, a drug-free washout period must be ensured for complete reversibility of the drug administered in the first period, so as to avoid a cumulative or subtractive effect that piggybacks onto the drug administered in the second period. An accepted convention for the washout period is five half-lives of the drug involved
Stepped wedge
Crossover
Non-inferiority / equivalence
Adaptive
Factorial / Latin square
In an RCT, it is considered best practice to define a primary outcome, that is, the variable you have hypothesised will be impacted upon by the new intervention.
It is common to have multiple secondary outcomes but it is the primary one which determines the size of the study.
an RCT is only justified if allocating an exposure is ethical
an RCT is only justified if there is genuine equipoise about the treatment of interest
Remaining challenges: validity
Performance bias
researchers subvert the allocation process to allocate people to the group they would like them to be in
researchers treat people differently based on their knowledge of their group allocation (eg treatment provided, intensity of investigations, frequency of appointments, etc)
Information bias
researchers assess outcomes differently based on their knowledge of their group allocation
depending on inclusion and exclusion criteria, the trial may have more or less generalisability
eg Ruokoniemi (2014) found that only 57% of patients with diabetes would be eligible for large statin trials
Recruitment
PICO question needs to be clear at outset
what is the effect of (Intervention) on (Outcome) compared with (Comparison) among (Population)
Exclusions
increase the feasibility of the study
reduce the generalisability of the study
many trials don’t accurately report exclusion criteria (Blumle 2011)
Sample size calculation
important to ensure trial worthwhile
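A rough sketch of a standard two-proportion sample size formula (5% two-sided alpha, 80% power); the proportions below are hypothetical, purely for illustration:

```python
import math

def n_per_group(p1, p2, alpha_z=1.96, power_z=0.84):
    """Approximate sample size per arm for comparing two proportions,
    using the usual normal-approximation formula."""
    pooled_var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((alpha_z + power_z) ** 2 * pooled_var / (p1 - p2) ** 2)

# Hypothetical: detect a fall in event rate from 20% to 15%
print(n_per_group(0.20, 0.15))  # 902 per arm
```

Small expected differences between arms drive the required sample size up quickly, which is why the primary outcome determines the size of the study.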
allocation needs to be truly random
haphazard is not sufficient; “quasi-random” is not acceptable
computerised random number generation
simple randomisation
restricted (eg block) randomisation
stratified randomisation for uncommon high risk states
minimisation methods
need to take steps to avoid subversion of randomisation
eg sealed envelopes, central telephone allocation etc
compare baseline characteristics to assess the success of randomisation
but no point in statistical testing (Assmann 2000)
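A minimal sketch of restricted (block) randomisation, assuming blocks of four and two arms:

```python
import random

def block_randomise(n_blocks, block_size=4):
    """Restricted (block) randomisation: each block contains equal
    numbers of A and B, keeping group sizes balanced throughout
    recruitment."""
    allocation = []
    for _ in range(n_blocks):
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        random.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation

alloc = block_randomise(5)  # 20 participants in blocks of 4
print(alloc.count("A"), alloc.count("B"))  # 10 10 - arms stay balanced
```

In practice the sequence must be concealed (e.g. central telephone allocation) so recruiters cannot predict the next assignment, particularly near the end of a block.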
Delivery of intervention
intervention needs to be sufficiently described and standardised
important for quality of the trial itself
but also for replication
easier for drugs; hard for complex interventions
standardised training, training manuals etc may help
factorial designs may be useful to test several interventions
Equivalent treatment
groups should be treated the same in every way apart from the treatment of interest
eg similar programme of visits, similar monitoring tests, similar supportive care
similar efforts at follow up
comparison group needs to get the best available treatment/care (placebo alone rarely justified)
Blinding (allocation concealment)
of those allocating treatment
of those delivering treatment
of those receiving treatment
of those assessing the outcomes
easier for drugs than other interventions
may be subverted by non-obvious routes – eg taste of medicine, need for more frequent follow-up or particular investigations, occurrence of typical side-effects etc
ideally trials should assess success of blinding
Outcomes/ analyses
there should be a pre-defined statistical analysis plan
may be primary and secondary outcomes, or composite (eg MI or CHD death)
analysis resembles that of a cohort study – incidence of outcomes in each group
cumulative incidence or person-time approaches
can adjust for baseline characteristics
need to include reporting of harms
intention to treat analyses are usually more appropriate than “per-protocol”
subgroup analyses should be pre-specified, present a test for interaction and be treated with caution
the CONSORT guidelines are useful to assess reporting
Zelen design
most trials obtain consent to participate in the trial then allocate the person to a treatment/control group
but patients may be aware there is a new treatment and may want to receive it
in the Zelen method person is randomly allocated to a group and then asked for consent
may only be asked for consent if allocated the new treatment (single consent model)
may be more acceptable to clinicians and patients
however may remove blinding
Other issues for RCTs
half of trials aren’t published
interim analyses and early stopping
when are trials unnecessary?
is the RCT still the highest level of evidence?
could we be using RCTs more in the policy arena?
continuing arguments against relying exclusively on RCTs:
we need to learn to use and integrate all forms of evidence
it’s unlikely that the trial population is ever exactly like this individual patient
trials aren’t available to answer most questions
and for the primacy of the RCT for questions about effectiveness
qualitative research is not fully generalisable;
observational research has proven to be misleading when it comes to efficacy (eg the vitamin trials)
CASP appraisal questions
did the trial address a focused issue (PICO)?
was assignment randomised?
were patients and study personnel blinded?
were groups similar at the start of the trial?
were groups treated the same?
were all participants accounted for by end of trial?
how large was the treatment effect?
how precisely was it estimated?
are the results generalisable?
were the outcomes clinically important/relevant?
were harms reported?
RCTs remain one of the most powerful tools to answer questions about what works
RCTs are hard to do well
the biggest threats to RCTs are:
difficulties funding non industry trials
selective non-publication
increasing bureaucracy
under-use of RCTs to answer policy questions (eg see Test, Learn, Adapt)
Critical Appraisal Frameworks for Part A (Gemma Ward, 2013)
Summary
Brief statement of what the study is and what it’s looking at (1 sentence)
If you know what the study type is, you can make a 1 sentence statement about the design appropriateness – e.g. the question is well focussed and is considering the effectiveness of treatment X vs treatment Y, therefore an RCT is an appropriate study design. i.e. know the strengths and uses of each design
PICO (or population, exposure, control, outcome etc)
PH Importance
Why they’ve done the study (usually find this in the last paragraph of the introduction)
Internal validity
This will depend on your study type
For all papers, comment on the specific aspects of internal validity AND if they’re not discussed/described in the paper, mention this too. E.g. for a case-control study where the authors do not describe their method of case and control selection, you could say something like “the authors do not describe their method of selecting cases and controls. The selection of cases/controls is important to ensure the study is valid, and without such information we are unable to comment on the effect this may have had on the validity of the study” etc – discuss in what way it would have affected it (e.g. what would happen to the validity if controls had been selected from a different population to cases)……
i.e. explain why each aspect is important to the validity of the study and what effect it might have on the outcomes
Case-control
o Case-definition – explicit, valid, reliable
o Selection bias – how, where, who
o Control definition and selection – controls must be selected from a population representative of the one from which the cases came
o Measurement bias – definition of the exposure and how measured – includes recall bias
o Matching and that this needs to be accounted for in the analysis
o Sample size calculation
Cohort
o Selection of subjects
o Inclusion/exclusion criteria – this should be described so if it isn’t, point this out and why it’s important
o Exposure – classification and measurement
o Outcome – classification and measurement (may get misclassification bias if aware of exposure status)
o Selection bias- losses to follow-up most important in cohorts and whether those lost likely to be very different from those who stayed in the study
o Whether data on confounders have been collected
RCT
o How selected and recruited
o Clear inclusion/exclusion criteria
o Randomisation (and why so important)
o Allocation concealment including how this was done(and why so important)
o Blinding (and why so important)
o Losses to follow-up and whether those lost likely to be very different from those who stayed in the study
o CONSORT statement- flow diagram should be there
o Consent
o Sample size calculation
o Outcome assessment – definition, same for intervention and control arms, valid tool
o Comparisons between the control and intervention group i.e. did randomisation work
o If a cluster RCT, also the intraclass correlation coefficient and why it’s important - the “design effect” so ICC needs to be taken into account in the sample size calculation (need a bigger sample)
Cross-sectional
o Selection bias- sampling frame, refusers and whether different to participants, recruitment method
o Response rates – responder bias
o Information bias – recall bias, observer bias, methods of measurement of exposure and outcome (are these valid tools themselves?)
o Ecological fallacy (this may fit under results!)
Prognostic studies
o Similar to cohort studies
o Exposure –definition, classification and measurement
o Outcome –definition, classification and measurement
o Length of follow-up – is it long enough to see the outcome of interest?
o Losses to follow-up
o Confounders
o Inclusion/exclusions
o Blinding of assessors
Economic evaluation
o Effectiveness of the treatment/intervention
o if the design was an RCT then all the validity issues relating to RCTs etc
o what the outcome assessment was – valid, same between the 2 groups etc
o clear description of the intervention/treatment
o clear description of the comparison/control
o subject selection and recruitment
o Costs
o What the perspective of the evaluation is – direct health costs, societal etc
o Discounting – why you do it
o Methods of calculating costs – same for control and intervention, appropriate etc
Diagnostic studies
o Inclusion/exclusion criteria
o Recruitment/selection
o Reference test – is it well described, is it appropriate?
o STARD checklist of reporting
o Verification/work-up bias – did ALL subjects get the gold standard/ref test or only those with a positive index test (falsely inflates sensitivity)
o Spectrum bias (if disease status known at the outset)
Systematic reviews
o Search strategy – well defined and reproducible
o Databases searched and sources used – appropriate, adequate?
o Clear inclusion/exclusion criteria
o Data extraction – who, how
o Quality appraisal – who (better if >1 person, how did they resolve disagreements), how (valid, standardised?)
o Publication bias – comment on presence/lack of a funnel plot
o Flow diagram – PRISMA statement
Results (and analysis)
o What the main results are – report the main one and explain it, including 95% CIs and p-values and what these mean (e.g. “the main finding is an OR of X, which means that there is an X reduction in the risk/rate etc of Y in/due to… The 95% CI is X, so we can say that we can be 95% confident….. It does not include the null value (0 for a difference, 1 for a ratio), therefore we know the result is significant at p<0.05, which is confirmed by the reported p-value of…..”). If no CIs are presented, comment on this and why you need them for interpretation
o Presentation of results
o Types of analysis used
o Specifics depending on study design:
Case-control
o OR
o The fact that it’s not incidence
o χ² if categorical
Cohort
o Incidence (rate, risk, odds)
o Relative risk
o Adjusted results for confounders
RCT
o RR
o NNT
o Intention to treat analysis or not and why this is important and its effect on results
o If a cluster RCT, analysis needs to be at the group level
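RR, ARR and NNT can be sketched from a pair of arm risks (the risks below are hypothetical, for illustration only):

```python
# Hypothetical trial: 20% event risk in control arm, 15% with treatment
control_risk, treatment_risk = 0.20, 0.15

rr = treatment_risk / control_risk    # relative risk
arr = control_risk - treatment_risk   # absolute risk reduction
nnt = round(1 / arr)                  # number needed to treat

print(round(rr, 2), round(arr, 2), nnt)  # 0.75 0.05 20
# i.e. treat 20 patients to prevent one event
```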
Cross-sectional
o Prevalence, ORs
o χ²
o Correlation
Prognostic studies
o RR/hazard ratios
o Kaplan Meier
o Survival curves
o Log rank test
o Cox’s proportional hazards
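The Kaplan-Meier estimator underlying survival curves can be sketched as follows (numbers at risk and deaths are hypothetical):

```python
def kaplan_meier(at_risk_and_deaths):
    """Kaplan-Meier survival estimate: S(t) is the running product of
    (1 - d_i / n_i) over event times, where n_i are at risk and d_i
    die at time i. Censored individuals simply leave the risk set."""
    survival = 1.0
    curve = []
    for n_i, d_i in at_risk_and_deaths:
        survival *= 1 - d_i / n_i
        curve.append(survival)
    return curve

# Hypothetical cohort: 10 at risk with 2 deaths, then 8 with 1, then 7 with 2
print([round(s, 2) for s in kaplan_meier([(10, 2), (8, 1), (7, 2)])])
# [0.8, 0.7, 0.5]
```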
Economic evaluation
o Effectiveness: RR/OR etc
o Costs
o £ difference
o ICER
o £ per QALY
o Cost-effectiveness plane
o Sensitivity analysis
Diagnostic studies
o Sensitivity
o Specificity
o PPV
o NPV
o Likelihood ratios
o Inter-rater reliability (Kappa statistic)
o ROC curves and the area under the curve (AUC)
And what these all mean!
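A sketch computing these measures from a hypothetical 2x2 table (index test vs reference standard):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "lr_plus": sens / (1 - spec),    # positive likelihood ratio
        "lr_minus": (1 - sens) / spec,   # negative likelihood ratio
    }

# Hypothetical test: 90 TP, 20 FP, 10 FN, 80 TN
m = diagnostic_metrics(90, 20, 10, 80)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # 0.9 0.8
print(round(m["lr_plus"], 2), round(m["lr_minus"], 2))         # 4.5 0.12
```

Note that sensitivity, specificity and the likelihood ratios are properties of the test, whereas PPV and NPV also depend on the prevalence in the population tested.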
Systematic reviews
o Main findings- OR, RR, % etc
o Meta-analysis and forest plot – what they are and what they show here
o Sensitivity analysis
o Heterogeneity – Cochran Q and I2 and what these mean
o Fixed effect or random effects model used for analysis and what these mean
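I² is derived from Cochran's Q as (Q − df)/Q, bounded below at zero; a sketch with hypothetical Q and df values:

```python
def i_squared(q, df):
    """I^2: percentage of variability across studies attributable to
    heterogeneity rather than chance, from Cochran's Q and its
    degrees of freedom (number of studies - 1)."""
    return max(0.0, (q - df) / q) * 100

print(round(i_squared(20.0, 9), 1))  # 55.0 -> substantial heterogeneity
print(round(i_squared(8.0, 9), 1))   # 0.0  -> no observed heterogeneity
```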
External validity
o Can it be applied to your population or is it too different? – types of population, setting etc.
o Exclusions (e.g. if only looked at white British and your local population is predominantly Asian etc)
o Is it reproducible – especially for RCTs- can the intervention be applied outside of a trial setting?
o Are the main findings clinically relevant?
o Are costs presented (not relevant for all studies) If not, comment that would want these to evaluate the external validity/generalisability of the study
Discussion
o Summary statement (just a few lines) about the study – was it reasonably well conducted, any real issues with it, clinically relevant findings and whether these are of PH importance
Framework for assessing evidence
Describe authors, journal
Summarize study
• Design
• Study population
• Exposure
• Outcome
• Main result
Explanations for findings
• Bias - design-specific
cohort: losses to follow-up
case-control: selection, recall
cross-sectional: response rate
• Confounding - age, sex, socio-economic status, smoking, etc
design: restriction, matching, stratification
analysis: stratification, standardization, matched analysis, regression
• Chance - CIs, p-values
hypothesis testing or generating
clear prior hypothesis?
post hoc? (clusters)
multiple testing/subgroup analyses
trends
sample size, power (negative studies)
Non-causal association
Causal association?
• Bradford Hill
• Temporal sequence lags
control periods
• Strength of association - environmental exposures and associated risks are often low
exposure misclassification (ecological design, migration etc)
• Exposure-response
• Specificity
• Consistency
• Coherence
• Biological plausibility
• Experiment
External validity
• To the eligible population and source population
• Other populations (generalization)
Public health importance
• Clinical importance
• Population attributable risk
• Scope for prevention/amelioration cause and effect?
reversibility
• Public perceptions
Policy relevance
• Immediate steps protection
incident control /investigation team
media
• Further investigation immediate and longer term
exposure studies
risk assessment
epidemiological study
screening
surveillance
• Policy social and political context
key players
responsibilities
consequences of actions
legal vs voluntary framework
costs
D / P / I / C / O
(PICO for cohort and case-controls – Cases, Controls, Exposure, Outcome)
Suitability of design
PH importance
Internal validity
Chance – could precision be over-estimated?
Has primary outcome been specified?
Have multiple analyses been carried out – if so has ↓ precision been accounted for (Bonferroni correction, use of a stricter threshold e.g. p<0.01)
Cluster RCTs:
Design effect
Adequate accounting for clustering – intra-cluster correlation coefficient and then measures above.
Bias – could anything be overestimating effect?
Trials:
Selection bias (failure to randomise, failure to conceal allocation)
Outcome measurement bias (blinded)
Attrition bias
Dataflow – CONSORT
Quality score – Jadad score
Cohort:
Loss to follow up
Case-control:
Identification of controls – equal chance of being a case, over-matching; if only 1:1, could suggest ↑ ratio of controls to cases to ↑ power.
Case selection
Case assignment
Measurement bias – recall bias, classification bias, reverse causality
Cluster RCT:
As trials
Economic evaluation
Effectiveness data
Costs, discounting
Perspective
Sensitivity analysis
Prognosis:
Confounding
Adequate randomisation, subverted (non-differential misclassification -> could affect in either direction)
Baseline characteristics
Main Results
Has power calculation been done?
What is primary result – effect estimate, CI and P values
Chance of type I or II errors?
What do these results actually mean ie real benefits (NNT, ARR etc), clinical significance, how much is not accounted for
Post hoc analyses, Selective reporting bias
Association or causation?
External Validity
• Agree with previous evidence
• Generalisable to our context?
• Could it be delivered in wider context (ie rolled out)
• Are outcomes sufficiently important to implement?
• Economic evaluation
Bottom line
Prognostic studies:
Inception cohort (incident is ideal)
Can be prospective (best but long) or retrospective (quicker but lack of control over measurement of data – missing, changes in methods) or a retrospective look at data from a prospective study
Bias:
Selection – Representative sample? susceptibility bias (there may be some reason why some have prognostic factor of interest and some not eg a reason why some given medication and some not, age, severity)
Loss to follow up – are reasons for loss to follow up linked to prognostic group? Censoring.
Information bias – differently measure outcomes in prognostic groups, should be blind
Confounding:
Need to identify them and adjust in multivariate analysis
Results
Should have:
Kaplan Meier – log rank to test for diff
Cox’s proportional hazard/log rank test – will give hazard ratio + CI
Test for proportionality
Univariable then multivariable analysis
Diagnostic studies:
Population
Test
Setting
Outcomes
Comparison with appropriate reference standard
Is there a reference test and how good is it? Is there blind, independent comparison with this for all subjects?
Would like to see STARD diagram (standards for reporting of diagnostic accuracy studies)
QUADAS-2 tool for diagnostic study quality
Internal validity
Spectrum bias =when subjects used are not representative of patients the clinician would see. will give an over-estimation of effect (of sens)
Did the study include people with common presentations of the target disorder (not just severe/obvious Sx etc)
Verification bias (work up bias) = not everyone with index test gets reference test. Will ↑sens and ↓spec
Did all patients get the diagnostic test and the reference standard? Accuracy of a test can be over-estimated if you perform the index test initially in people you know have the disease and then separately in healthy people, rather than performing both index and reference tests in the same group without knowing who has the disease.
Incorporation bias = reference test includes part of index test. Will falsely inflate sens + spec
Differential reference bias = subjects get different reference tests depending on the results of the index test.
Was the reference standard applied regardless of the index diagnostic test result? Sometimes not done as unethical to do invasive test to confirm negative result, may instead apply alternative reference test of long term follow up to confirm doesn’t have condition. Need to have some confirmation that negative result is in fact negative
Observer bias –blind to results of index/reference test. Can over-estimate index test accuracy
(kappa statistic can measure inter-rater reliability: 1 = perfect agreement, 0 = chance agreement; >0.6 is OK but not great)
Results
Mention and define Sensitivity and specificity, do 2X2 if you can
CI – by chance?
How Sens + spec lead to likelihood ratios
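A minimal sketch of how a 2×2 table yields sensitivity, specificity and the likelihood ratios (Python; the counts are invented for illustration):

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)        # proportion of diseased who test positive
    spec = tn / (tn + fp)        # proportion of non-diseased who test negative
    lr_pos = sens / (1 - spec)   # LR+: how much a positive result raises the odds
    lr_neg = (1 - sens) / spec   # LR-: how much a negative result lowers the odds
    return sens, spec, lr_pos, lr_neg

# Illustrative table: 90 TP, 20 FP, 10 FN, 80 TN
sens, spec, lr_pos, lr_neg = diagnostic_stats(90, 20, 10, 80)
# sens = 0.9, spec = 0.8, LR+ = 4.5, LR- = 0.125
```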
External validity
Can I apply to my own setting
- Reproducibility – can look at kappa statistic (if rubbish then it is highly subjective as a test)
- Interpretation
Do results apply to mix of patients I see?
Costs mentioned?
Acceptability of tests?
Economic studies:
Needs to compare at least 2 options, otherwise it is just a costing study
Decision tree included of alternatives
Internal validity:
• Evidence that the programme would be effective
Evidence of effectiveness – ideal is systematic review of many RCTs (weak if only one RCT or not RCT)
Contamination of ‘intervention plus usual care’
• Were the effects of the intervention identified and measured correctly (eg QALYs)?
Outcomes of intervention measured and valued – is the outcome actually meaningful to the patient (ie QOL) and is there evidence of this?
• Were all important and relevant costs and consequences assessed and compared?
Relevant sources, costs credible?
Perspective – Health +/- PSS (personal social service – ie publically funded) or societal (better but hard to measure – what sources have been used?)
Time horizon – adequate to include all follow up/long-term effects?
• Were costs adjusted for the different times at which they occurred?
Discounting
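Discounting can be shown with a one-line sketch (Python; the 3.5% rate and the amounts are illustrative assumptions):

```python
def discounted_cost(cost, annual_rate, years_ahead):
    """Present value of a future cost at a constant annual discount rate."""
    return cost / (1 + annual_rate) ** years_ahead

# Illustrative: a cost of 1000 incurred 2 years from now, discounted at 3.5%
pv = discounted_cost(1000, 0.035, 2)   # ~933.5 in present-value terms
```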
Results:
• What is the bottom line, what units were used
• Was an incremental analysis of the costs and consequences performed?
• Was an adequate sensitivity analysis performed?
Sensitivity analysis – deterministic vs probabilistic
External validity:
• Is the programme likely to be equally effective in your setting
• Are the costs translatable to your setting
• Is it worth doing in your setting
Systematic reviews:
P
I
C
O
Suitability of design: systematic review with meta-analysis good method of increasing power
PH importance
Right type of studies included – ideally RCTs best as highest form of evidence and minimise confounding (cohort prone to confounding)
Should follow PRISMA (preferred reporting items for systematic reviews and meta-analyses) guidance – states there must be a flow diagram with reasons for study exclusion
Internal validity
- Did the reviewers try to identify all relevant studies
o Clear inclusion criteria
o Which databases, follow up from reference lists, personal contacts/experts, unpublished, non-English (eg LILACS database)
- Did the reviewers assess the quality of the included studies –ie risk of bias in each
o Pre-determined strategy, scoring system, more than one assessor
Should have 2 independent assessors not just one assessor ‘checking’ another as likely to over-estimate agreement
Should present in table and also produce a summary statement of quality of studies overall, and also discuss meta-analysis in relation to quality - If results of studies have been combined, was it reasonable to do so?
o Heterogeneity: qualitative and statistical approach
o If heterogeneity would need to use random effects model
Chi² and I² statistics for heterogeneity – look at the distribution of the effect estimates relative to their SEs
o Chi² p<0.1 suggests heterogeneity exists
o I² statistic quantifies the heterogeneity (proportion of variation due to heterogeneity rather than chance)
Publication bias – can be assessed with funnel plot and Egger's test
Results
- How are the results presented and what is the main result
o How are results expressed (eg OR, RR etc)
o How large is size of the result and how meaningful is it?
o Have results been reviewed in light of quality of study eg by stratification?
o Can do sensitivity analysis – splitting up studies by particular characteristic, looking to see whether pooled estimates are different from overall estimate. But: is number of studies in each pool sufficient?
Moderator effects: RCTs won't necessarily have randomised by moderator factors. There may not be sufficient studies to stratify by moderator variables/factors. With a small number of studies you can't rule out type I error due to moderator effects.
- How precise are results
o CI
o P value - Forest plot (graphical representation of meta-analysis)
External validity
- Can the results be applied to the local population?
o Population/setting
o Can same intervention be delivered?
o Costs considered
- Were all important outcomes considered?
o From point of view of individual, policy makers, family/carers, wider community
- Should policy or practice change as result of review?
o Benefits outweigh harms? Meaningful effect?
Cross-sectional studies:
Can be descriptive – cross-sectional survey to Ax disease burden
Can be Analytic – to Ix association between putative RF and health outcome (nb difficulty with temporality)
Internal validity
Bias:
- Non-response
- Recall bias (misclassification)
Results
Prevalence
WHAT IS A SYSTEMATIC REVIEW?
Systematic reviews are a type of literature review that uses systematic methods
to collect secondary data, critically appraise research studies, and synthesise
findings qualitatively or quantitatively. The methodology evolved in response to
the recognition of the multitude of research evidence being published, many
studies providing contradictory findings, and the need for a means by which to
systematically summarise findings from multiple studies examining the same or
similar research questions.
Cochrane Collaboration was formed in 1993. Its
aim was to facilitate systematic reviews of randomised control trials across all
areas of health care, providing the best up-to-date evidence. Now, the
Cochrane Collaboration is international, with multiple disease review groups
who use standard Cochrane methods to conduct the reviews and keep them
up-to-date.
A critical stage in conducting a Cochrane review is the publishing of the review
protocol, ensuring a clear and transparent review process. Protocols and final
reviews are peer reviewed by other review group members to maintain the
highest quality standards. Cochrane reviews aim to identify, assess,
synthesise, and apply the results of randomised control trials addressing a
defined question.
Not all reviews are conducted within Cochrane, so PROSPERO was set up and
has a wider remit. It is an international prospective register of systematic
reviews. It registers reviews with research questions requiring the synthesis of
primary papers from a wider range of study designs than simply RCTs. For
example, a systematic review may be conducted to examine outcome
measures used in studies of social inclusion of patients with severe mental
illness. Researchers aiming to conduct a systematic review can register their
review protocol on PROSPERO. When the final review is published, it is then possible for it to be compared to the protocol to assess the likelihood of reporting bias.
Characteristics of a systematic review
1. A clearly stated set of objectives with pre-defined eligibility criteria for studies;
2. An explicit, reproducible methodology;
3. A systematic search that attempts to identify all studies that would meet the eligibility criteria;
4. An assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias;
5. Finally, systematic presentation and synthesis of the characteristics and findings of the studies
Meta-analysis –quantitative pooling of results from selected studies
Stages in a Review
Formulate an answerable question
Develop the eligibility criteria
Develop a search strategy
Run the search strategy
Screen and select studies for inclusion
Extract data from the studies & assess study quality
Analyse the data
Presentation of summary data and interpretation
How effective is a smoking prevention workshop versus a smoking prevention leaflet in preventing initiation of smoking in young people?
PICO is a very useful tool to focus your question to enable you to develop a search strategy
2.Develop the eligibility criteria- study protocol
Inclusion/exclusion criteria
Sets out exactly what you will include in your review
Studies were included/excluded if:
P = Young people
I = Smoking prevention workshop
C = smoking prevention leaflet
O = prevention of smoking initiation
3. Develop a search strategy
Sources
Published literature databases
e.g. MEDLINE, EMBASE, CINAHL, PsycINFO, ASSIA, CENTRAL, Cochrane Library, LILACS, African Journals Online
https://guides.lib.strath.ac.uk/az.php
Grey literature databases
e.g. Open Grey, Social Care Online, WorldCat Dissertations and Theses, WHO International Clinical Trials Registry Platform (ICTRP)
Websites
Depends very much on your area of research
Other methods
Reference lists of included studies and other similar systematic reviews
Contact experts in the field
Hand searching specific journals
Google (?)
Text words (or free text)
Words that appear in the record of the article within a database
Whole text or limit to title and/or abstract
Index terms (controlled vocabulary or keywords)
Words (i.e. index terms) that the database has applied to a particular record
Different in different databases (e.g. MeSH terms in Medline, EMTREE terms in EMBASE)
Building the search strategy
OR broadens the search by combining all terms together
(“Adolescent” OR “Teenager” OR “Child” OR …)
AND narrows the search
({young person}) AND ({smoking prevention workshop})
NOT also narrows the search
Theoretically should exclude irrelevant records, but can also exclude relevant records
Refrain from using it – handle exclusions through the eligibility criteria instead
Remember PICOT/PICOS
Specific search strategies have been developed for particular study designs (ISSG Search Filter Resource)
Can be integrated with your specific topic search terms
Challenges in the search strategy
Database Bias: No single database likely to contain all published studies on a given subject
Publication Bias: Selective publication of articles that show positive treatment effects/statistical significance
Search for unpublished studies “by hand”
English-language bias: Exclude papers published in languages other than English
Citation bias: Studies with significant or positive results are referenced in other publications, compared with studies with inconclusive or negative findings
4. Run the search strategy
Run the search in each of the databases
Database-specific search interface
Imported articles from each search into reference management system (e.g. Endnote)
Apply the screening and selection process (i.e. PRISMA)
5. Screen and select studies for inclusion
Data Management
Reference management system (e.g. Endnote)
Directly download references and abstracts from databases
Useful for managing and sharing search results
Removes duplicates
Save excluded articles into separate files at each stage
Cite-Whilst-You-Write
Calculating selection reliability
Each article will be judged as ‘include’ (1) or ‘exclude’ (0)
An SPSS file will contain two columns of data (0 or 1) and one row of data for each article
Calculate Cohen’s kappa
https://www.youtube.com/watch?v=M8pWMpJqsGU (it talks about sensitivity and specificity – ignore this)
Kappa scores for each stage of screening/selection
Kappa between 0.61–0.80 = Substantial agreement
Kappa between 0.81–0.99 = Almost perfect agreement
0.7 and above is what we aim for
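Cohen's kappa for two screeners' include/exclude decisions can be computed as below (Python; a minimal sketch with invented decisions, not SPSS as in the notes):

```python
def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters' binary include(1)/exclude(0) decisions."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal 'include' proportion
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return (observed - expected) / (1 - expected)

# Two screeners disagreeing on 1 of 8 articles
kappa = cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0, 0, 0])
# 0.75: 'substantial agreement' on the scale above
```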
6a. Extract data from the studies
Design a bespoke data extraction tool
Depends on your research question(s)
Only extract what you need
Population
Intervention
Comparator
Outcomes (of interest)
Many others that are not relevant
Other (e.g. study design)
6b. Assess study quality
Beware of the term “quality”!
Used in the context of:
Reporting quality
Study Quality
Bias
Reporting quality
How well is a study reported?
Tools specifically for reporting quality (e.g. PRISMA, CONSORT, EQUATOR)
Poor reporting can hinder assessment of study quality and risk of bias
Well reported study is not necessarily high quality or free from bias
Need to look beyond reporting!
Study quality
“extent to which its design, conduct, analysis, and presentation were appropriate to answer its research question” (Higgins et al., 2011)
“Highest possible standards”
Not well defined!
Risk of bias
Extent to which study results can be believed
Internal validity
Biased study will produce distorted findings
Over- or underestimate effect sizes
Even if highest possible standards maintained
e.g. May not be feasible to blind/randomise participants
Quantitative studies: Measuring Risk of Bias
Domain based assessment
What is the risk of different forms of bias applicable to the study (e.g. selection bias, blinding)?
Different study designs need different tools
Higgins ROB
Newcastle-Ottawa
Effective Public Health Practice
Qualitative studies: Measuring Trustworthiness
Assess for methodological rigour across the following domains:
Transparency of data collection and methods of analysis
Respondent validation
Indications of typicality of views and presentation of verbatim quotes
Triangulation of data
Multiple researchers to analyse data
Identification of deviant cases
Reflexivity
Type of synthesis
Narrative (qualitative): Textual approach using words and text to summarise and explain the results of included studies
Meta-analytic (quantitative)
Meta-synthesis (qualitative)
8. Presentation of summary data and interpretation
Protocol
Background
Objectives
Pre-determined selection criteria
Planned search strategy
Planned data abstraction
Proposed method of synthesis of findings
Usually an advisory group is established
http://www.crd.york.ac.uk/PROSPERO/
Search bias
Inclusion criteria bias
Selector bias
Bias in scoring study quality
Extractor bias
Recording error bias
Outcome reporting bias
Language bias
Country/region bias
Temporal bias
Database bias
Systematic review of quantitative studies
Usually of intervention studies (may also be diagnostic accuracy, epidemiology)
Study data may/may not be pooled in a meta-analysis
Data can be transformed to a single standard effect measure (e.g. mean difference, odds ratio) and presented narratively
Systematic reviews of qualitative studies
Often look at views and experiences or try to explore a phenomenon
Meta-ethnography: Aims to develop new theory
Meta-synthesis: Aims to integrate data
Thematic analysis: identification of recurrent/important themes
Systematic reviews of quantitative and qualitative data
Often used to better understand what works for who and why
Mixed-methods reviews: intervention and views-and-experiences studies collected and analysed separately before being synthesised together
Realist reviews: Focus on causal mechanisms or theories that underpin interventions
Advantages of systematic reviews
Reduce bias
Replicability
Resolve controversy between conflicting studies
Identify gaps in current research
Provide reliable basis for decision making
Limitations
Complex and time-consuming
May create uncertainty in the decision making process (Hemingway & Brereton, 2009).
Methodological weaknesses
Heterogeneity of studies
Small sample size (may equal large 95% CI)
Side effects or cost may mean a treatment is not recommended despite effectiveness
Association vs. causation – Bradford Hill criteria
Strength of association
Dose-response relationship
Temporality
Consistency
Biological plausibility
Reversibility
Strength of association
Effect Size
Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale. For instance, if we have data on the heights of men and women and we notice that, on average, men are taller than women, the difference between the height of men and the height of women is the effect size. The greater the effect size, the greater the height difference will be. The effect size helps us determine whether a difference is real or due to chance. In hypothesis testing, effect size, power, sample size, and critical significance level are related to each other. In meta-analysis, effect sizes from different studies are combined into a single analysis. The effect size is usually measured in one of three ways: (1) standardised mean difference, (2) odds ratio, (3) correlation coefficient.
Types of effect size
Pearson r correlation: Pearson r correlation was developed by Karl Pearson, and it is the most widely used in statistics. This parameter of effect size is denoted by r. The value of the effect size of Pearson r correlation varies between -1 and +1. According to Cohen (1988, 1992), the effect size is low if the value of r varies around 0.1, medium if r varies around 0.3, and large if r is around 0.5 or more. The Pearson correlation is computed as the covariance of the two variables divided by the product of their standard deviations.
Meta-analysis: Statistical methods (meta-analysis) may or may not be used to analyze and summarize the results of the included studies. Meta-analysis refers to the use of statistical techniques in a systematic review to integrate the results of included studies.
Cochrane Collaboration
Strengths of meta-analysis
Imposes a discipline on the process of summarising research findings
Represents findings in a more differentiated and sophisticated manner
Identifies relationships across studies
Protects against over-interpretation of differences across studies
Handles large numbers of studies
Weaknesses of meta-analysis
Requires considerable resource
Mechanical aspects prevent capturing more qualitative distinctions between studies
“Apples and oranges” criticism
Most include “blemished” studies
Selection bias poses a continual threat
Negative and null finding studies that cannot be found
Negative or null findings that were not reported
Analysis of between study differences fundamentally correlational
When to do a meta-analysis?
All meta-analyses should be based on a Systematic review
Not all Systematic reviews will/can result in a meta-analysis
Number of studies
Heterogeneity
Type of data (i.e. qualitative)
Features of forest plot
Vertical line = line of ‘no effect’
If 95% CI crosses line = non-significant (p > .05)
Horizontal axis = measure
RD, MD, SMD
OR, RR
For each trial:
Square = point estimate (size indicates weighting)
Horizontal line = 95% CI
Diamond = combined estimate and 95% CI
Centre of diamond = pooled point estimate
effect size
Effect size serves as the “DV”
Represents the magnitude and direction of the relationship of interest
Any standardized index can be an “effect size” (e.g. Standardized mean difference, odds-ratio)
Comparable across studies
Estimates and confidence intervals
Individual studies and meta-analyses reported with a point estimate and confidence interval.
Point estimate: The best guess of the magnitude and direction of the experimental condition compared with the comparator condition
Confidence interval: The uncertainty inherent in the point estimate, and presents the range of values within which we can be reasonably sure that the true effect actually lies.
Dichotomous data
Participants who had the relevant outcome in each condition
Summary statistic:
Odds/Risk ratio
Risk
Refers to the probability of that event occurring.
RD = Risk difference
RR = Relative risk
Odds
Refers to the ratio of events to non-events
OR = Odds ratio
Continuous data
Mean and standard deviation of the effect for each condition
Summary statistic:
Mean difference
Calculating event rates
Experimental event rate (EER) risk = number of events/total number of people in the condition
25/100 = 0.25
Control event rate (CER) risk = number of events/total number of people receiving the control
45/100 = 0.45
Risk difference
RD = absolute change in risk attributable to the experimental condition
-1 ≤ RD ≤ 1
RD = 0 when experimental condition has an identical effect to comparator condition
RD < 0 when experimental condition reduces risk
RD > 0 when experimental condition increases risk
RD = EER - CER
EER = 25/100 = 0.25
CER = 45/100 = 0.45
RD = -0.2
Absolute reduction of risk by 20 percentage points
RR = Risk of the event in one group divided by the risk of the event in the other group (ratio)
RR ≥ 0
RR = 1, experimental condition has same effect as comparator
RR < 1, experimental condition reduces the risk of the event
RR > 1, experimental condition increases the risk of the event
RR = EER/CER
EER = 25/100 = .25
CER = 45/100 = .45
RR = 0.56
Risk of the event is 44% lower in the experimental condition
OR = Odds of the event occurring in one group divided by the odds of the event occurring in the other group
OR ≥ 0
OR = 1, experimental condition has same effect as the comparator
OR < 1, experimental condition reduces the odds of the event
OR > 1, experimental condition increases the odds of the event
OR = [EER/(1-EER)]/[CER/(1-CER)]
Experimental odds = 0.25/0.75 = 0.33
Control odds = 0.45/0.55 = 0.82
OR = 0.33/0.82 ≈ 0.41
The odds of the event occurring are about 59% lower in the experimental condition
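The worked example above (EER = 0.25, CER = 0.45) can be reproduced in a short sketch (Python; the function name is mine):

```python
def risk_measures(eer, cer):
    """Risk difference, relative risk and odds ratio from two event rates."""
    rd = eer - cer                                        # absolute change in risk
    rr = eer / cer                                        # ratio of risks
    odds_ratio = (eer / (1 - eer)) / (cer / (1 - cer))    # ratio of odds
    return rd, rr, odds_ratio

rd, rr, odds_ratio = risk_measures(0.25, 0.45)
# rd = -0.20, rr ~ 0.56, odds_ratio ~ 0.41
```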
95% CI
Often mis-interpreted as indicating a range within which we can be 95% certain the true effect lies
Actually: If a study were repeated infinite number of times, and on each occasion a 95% confidence interval calculated, then 95% of these intervals would contain the true effect.
Width of CI largely depends on sample size
Larger samples ≈ more precise estimates of effects = narrower 95% CI
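The sample-size/precision relationship can be seen in a quick sketch (Python, using the normal approximation; the proportions are illustrative):

```python
import math

def ci_95_proportion(p, n):
    """Approximate 95% CI for a proportion p observed in n participants."""
    se = math.sqrt(p * (1 - p) / n)          # standard error shrinks with n
    return p - 1.96 * se, p + 1.96 * se

lo_small, hi_small = ci_95_proportion(0.30, 50)     # wide: ~0.17 to ~0.43
lo_large, hi_large = ci_95_proportion(0.30, 5000)   # narrow: ~0.29 to ~0.31
```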
Averaging studies
A simple average gives each study equal weight
Some studies are more likely to give an answer closer to the ‘true’ effect than others
Weighting studies
More weight given to studies which give more information
More participants (key determinant)
More events
Lower variance
Weight is closely related to the width of the study confidence interval
Wider confidence interval = less weight
Heterogeneity
Variation between the studies’ results
Differences between studies results from
Participants
Manipulations
Outcomes
Quality and methodology
Identifying heterogeneity
Two main approaches to identifying statistical heterogeneity
Examine overlap of 95% CI’s in forest plot
e.g. If 95% CI’s of two studies don’t overlap, likely to be more variation between the study results than you would expect by chance
If there are lots of studies suspect heterogeneity!
Perform “chi-square” test (Q)
Quantifying heterogeneity: I²
I² = 100% × (Q − df)/Q
Q = heterogeneity χ² statistic (Cochran's Q); df = number of studies − 1
I²: the proportion of total variability explained by heterogeneity, rather than by chance
I² threshold values (Higgins et al., 2003)
25%, low heterogeneity
50%, moderate heterogeneity
75%, high heterogeneity
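Q and I² can be computed directly (Python; a minimal sketch with invented effect estimates and standard errors):

```python
def cochran_q(effects, ses):
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled effect."""
    w = [1 / se ** 2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))

def i_squared(q, n_studies):
    """I^2 = 100% x (Q - df)/Q, floored at zero."""
    df = n_studies - 1
    return max(0.0, 100 * (q - df) / q) if q > 0 else 0.0

# Three studies with effects 0.1, 0.3, 0.5 and equal SEs of 0.1
q = cochran_q([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])   # Q = 8, df = 2
# i_squared(q, 3) = 75 -> high heterogeneity on the thresholds above
```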
Dealing with heterogeneity
Do not pool at all
Ignore heterogeneity: fixed effects model
Allow for heterogeneity: random effects model
Explore heterogeneity: meta-regression
Fixed effects model
Assumes every study evaluating a common “fixed” manipulation effect
Effect of manipulation (allowing for chance) is same in all studies.
If heterogeneity exists and is ignored:
CI will be too narrow
Increased possibility of finding an effect that does not exist
Random effects model
Assumes that true manipulation effects may differ for each study
No single number to estimate in the meta-analysis, but rather a distribution of numbers
Many possible real values for the manipulation (depending on dose, duration, etc etc).
Each trial estimates its own real value
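A minimal sketch of both pooling approaches (Python; I have assumed the DerSimonian–Laird estimator for the between-study variance, which the notes do not name; function names are mine):

```python
def pool_fixed(effects, ses):
    """Fixed-effect pooled estimate: inverse-variance weighted average."""
    w = [1 / se ** 2 for se in ses]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)

def pool_random(effects, ses):
    """Random-effects pooled estimate (DerSimonian-Laird tau^2)."""
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]  # heterogeneity widens weights
    return sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
```

When heterogeneity is present, tau² > 0 is added to every study's variance, so the weights become more equal and the pooled confidence interval widens.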
Meta analysis and Funnel Plots
Reminder of some things:
• I2 is interpreted: <40% low (fixed); >60% high (random)
• I2 is included in a qualitative judgement.
• In meta-analysis, fixed-effects models assume all studies estimate a single common true effect; random-effects models assume the true effect varies between studies.
• Heterogeneity is both statistical and methodological.
• Sub-group sensitivity analyses help to unpick heterogeneity.
Drawing a funnel plot
effect estimate (e.g. odds ratio, on a log scale) on the x axis and precision (e.g. inverse standard error) on the y axis
1. Calculate the width of each confidence interval
2. Divide this by 3.92 (2*1.96) to get the SE
3. Plot this SE upside down on the y axis.
4. Plot the OR on the x-axis
5. Draw the studies on
6. Thick line for pooled effect estimate
7. Draw the funnel
8. Clear title and labels
9. Interpret. Small Study Effect could be:
o Publication bias
o Smaller studies less rigorous
o Trials reported more than once
o Studies not truly comparable (heterog.)
o ‘True’ treatment effect improved over time so ‘true’ mean has shifted
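Steps 1–2 of the drawing procedure above amount to one line (Python; the CI values are illustrative, and for ratio measures this should be done on the log scale):

```python
def se_from_ci(lower, upper):
    """Standard error recovered from a 95% CI: width / 3.92 (= 2 x 1.96)."""
    return (upper - lower) / 3.92

# Illustrative 95% CI around a study's effect estimate
se = se_from_ci(0.50, 1.70)   # ~0.306, plotted inverted on the y axis
```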
Egger Test
• The Egger Test is testing for Funnel plot asymmetry
• It is essentially a linear regression assessing whether Y intercept=0 (think diagrammatically)
• Should be at least 10 studies in the meta-analysis
• Assumes all studies independent (no sub-analyses);
• Recommended by Cochrane, caution recommended due to low power
• Alternatives including Begg
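The regression behind Egger's test can be sketched as below (Python; this is the unweighted formulation and returns only the intercept – a real analysis would also report its standard error and p value, and needs at least 10 studies):

```python
def egger_intercept(effects, ses):
    """Egger's regression: effect/SE (y) on precision 1/SE (x); returns the intercept.

    An intercept far from zero suggests funnel plot asymmetry.
    """
    x = [1 / se for se in ses]
    y = [e / se for e, se in zip(effects, ses)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx

# Symmetric case: the same effect at every precision -> intercept ~ 0
b0 = egger_intercept([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
```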
Funnel plots
In the funnel plot the X-axis represents the mean result (that may be an odds or risk ratio, or a percent difference) and the Y-axis shows the sample size or an index of precision (Egger et al., 1997). Sterne and Egger (2001) recommended the inverse standard error for the Y-axis and the log of the odds ratio for the X-axis. (The symmetry of the plot may vary depending on whether sample size or inverse standard error is used as an index of precision Tang and Liu, 2000.)
Because there are usually more small than large samples, the points that represent each mean value are widely spread at the base and narrow as they move to the top, thus resembling an inverted funnel or a fir tree (Fig. 36.2).
12.12 Methods to Assess Publication Bias
Publication bias (also called reporting bias) refers to absence of information caused by either nonpublication of entire studies (missing studies), or selective outcome reporting in published studies based on their results (missing outcomes). The latter problem is also called outcome reporting bias. Studies that report a statistically significant result (P<0.05) are more likely to be published, and published sooner, than studies that do not show a statistically significant result (P>0.05). Similarly, selective outcome reporting frequently occurs, which is biased and usually inconsistent with study protocols. This and other types of scientific misconduct are discussed in Chapter 2. Publication bias is particularly problematic in RCTs, as it leads to inflated and unreliable results regarding the benefits of different treatments. Identification and control of publication bias is essential to preserve the validity of a systematic review.
Missing studies may be found in protocol registries; however, the time period between publication of the protocol and publication of the study report varies, and in some cases can be quite long. Missing outcomes may be detected by comparing the published report with the protocol. Although publication of a protocol is now required for most RCTs before study recruitment starts (a prerequisite for publication in many journals), this practice is not equally common for observational studies. For this reason, it is difficult to verify the existence of missing outcomes in observational studies.
The main graphical tool used to identify publication bias in the form of missing studies is the funnel plot,19 which specifically targets small study bias, in which small studies tend to show larger estimates of effects and greater variability than larger studies. The funnel plot is a scatter plot with effect estimates on the x-axis and some measure of study precision (or study size) on the y-axis. The standard errors of the effect estimates are commonly used. The scale of the y-axis is reversed such that studies with low precision are placed at the bottom and studies with greater precision at the top of the plot. If the effect measure is a ratio measure (such as RR), the x-axis is log transformed. In the absence of missing studies, the shape of the scatter plot should resemble a symmetrical inverted funnel with a wide base (consisting of small studies with large effect estimate variability) and a narrow top (consisting of large studies with small effect estimate variability). Presence of large “holes”—most often seen close to the bottom—or asymmetry in the plot indicates publication bias, though these holes may have other causes, such as study heterogeneity.19 An ideal funnel plot based on 17 hypothetical studies with no signs of asymmetry or holes is shown in Figure 12.7. The dotted vertical line represents the estimated common effect.
The funnel plot is often used to assess bias (Ferrer, 1998; Tang and Liu, 2000; Song et al., 2002; Souza et al., 2007). Prominent causes of bias are publication bias, with studies giving positive results more frequently submitted for publication and more likely to be published (Roehr, 2012); English language bias – negative studies are less likely to be published in English language journals, although this has not always been observed; and citation bias – studies with positive conclusions are cited more frequently and thus are more easily identified and incorporated in the database. Bias may be deliberate, as occurs when a pharmaceutical company deliberately withholds studies that do not favour their product (Eyding et al., 2010). The basis of assessing bias is that if all the studies give random assessments of the same unbiased mean value, the plot should be symmetrical. If the studies are biased, for example by missing small studies with negative results and small effect sizes, then the funnel plot becomes asymmetrical, with a deficit near the bottom.
Concerning meta-analysis
• Fixed-effects analyses – assume no or low heterogeneity and a single underlying effect
• Fixed-effects analyses – usually weight the results by the reciprocal of their variance
• Random-effects analyses – allow for heterogeneity, to give an average treatment effect across studies
• Random-effects analyses – incorporate heterogeneity as well as study variance into the weights given to individual studies
• Publication bias may be suggested by funnel plot asymmetry