Basic statistics (MRCP) Flashcards

(140 cards)

1
Q

A study is evaluating the effect of agomelatine on postnatal depression at
a mother and baby unit. Which one of the following should be considered
when assessing the internal validity of this study?
A. Benefits of agomelatine in major depression outside the postpartum period
B. The degree to which the subjects adhered to the study protocol
C. The cost of using agomelatine compared with standard care
D. Consistency of the reported outcome in comparison with previous studies
E. Benefits of agomelatine in postpartum depression when used at an outpatient service

A

B. Internal validity is the degree to which a study establishes the cause-and-effect relationship
between the treatment and the observed outcome. External validity is the degree to which
the results of a study become applicable outside the experimental setting in which the study
was conducted. In other words, external validity refers to the generalizability of study results while
internal validity refers to the rigour of the research method. The benefit of agomelatine
in different populations (choices A and E) refers to external validity; the cost of the drug
and the consistency of results obtained from different studies are related to the applicability of the
intervention in a clinical setting. Assessment of adherence to study protocol is one of many ways
of analysing the quality of an intervention trial.

2
Q

A new clinician-administered test for assessing suicidal risk is studied in a
prison population in Canada, where a high suicide rate of 1 in 25 has been
recorded. Which of the following indicates that this test is NOT suitable for
your clinical population?
A. The positive predictive value is 80%
B. The likelihood ratio for a positive test is 14
C. The prevalence of suicide in your clinical sample is 1 in 890
D. The inter-rater reliability (kappa) of the test is 0.8
E. The literacy rate of the prison population is very low but comparable with your clinical
sample

A

C. A high positive predictive value, a likelihood ratio of more than 10, and good inter-rater
reliability as measured by kappa are desirable properties of an instrument. But when the
same instrument is applied to a population with a much lower prevalence of suicide (the studied
phenomenon), the post-test probability decreases substantially. Post-test probability is a measure
of positive predictive value in the target population; it depends on the pretest probability (i.e. the
prevalence) and the likelihood ratio.

3
Q
A new rating scale being evaluated for anxiety has a sensitivity of 80% and
specificity of 90% against the standard ICD-10 diagnosis. The likelihood ratio
of a positive result is
A. Nearly 2
B. Nearly 0.2
C. 0.08
D. 8
E. 0.5
A

D. The likelihood ratio of a positive test (LR+) is the ratio between the probability of a
positive test in a person with disease and the probability of a positive test in a person without
disease. It can also be expressed as
LR+ = sensitivity/(1 – specificity)
Here, sensitivity = 0.8; specificity = 0.9.
Hence LR+ = 0.8/(1 – 0.9) = 0.8/0.1 = 8.
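
The same arithmetic can be checked with a minimal Python sketch. The function name is only illustrative and is not part of the original card; sensitivity and specificity are assumed to be given as proportions between 0 and 1.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), both given as proportions."""
    return sensitivity / (1 - specificity)

# Values from the question: sensitivity 80%, specificity 90%.
lr_plus = positive_likelihood_ratio(0.8, 0.9)
print(round(lr_plus, 2))  # 8.0
```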

4
Q

A pharmaceutical company developed a new antidepressant ‘X’. They
conducted a randomized double-blind placebo controlled trial of the drug.
The study had two arms: an active medication arm and a placebo arm.
Each arm had 100 subjects. Over a 4-week period, a 50% drop in Hamilton
depression scale (HAMD) scores was seen in 40 subjects in the active
medication arm, while a similar drop was seen in only 20 subjects in the
placebo arm. What is the number needed to treat (NNT) from this trial for
the new antidepressant?
A. 1
B. 2
C. 3
D. 4
E. 5

A

E. The absolute benefit increase is 40/100 – 20/100 = 20%, so NNT = 100/20 = 5.

5
Q

During the same placebo controlled trial described in question 4, 20% of
people on X developed active suicidal ideas, while only 10% of patients on
placebo developed the same side-effect. What is the number needed to
harm (NNH) associated with the suicidal ideas from the trial data?
A. 5
B. 10
C. 15
D. 20
E. 25

A

B. The absolute risk increase is 20% – 10% = 10%, so NNH = 100/10 = 10.
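
Both the NNT in question 4 and the NNH in this question are the reciprocal of an absolute difference in event rates. A small Python sketch under that assumption follows; the helper name `number_needed` is hypothetical.

```python
def number_needed(rate_treatment, rate_control):
    """Reciprocal of the absolute difference in event rates (as proportions)."""
    return 1 / abs(rate_treatment - rate_control)

nnt = number_needed(40 / 100, 20 / 100)  # responders: 40 vs 20 per 100 subjects
nnh = number_needed(0.20, 0.10)          # suicidal ideas: 20% vs 10%
print(round(nnt), round(nnh))            # 5 10
```

When the result is not a whole number, NNT is conventionally rounded up to the next whole patient.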

6
Q

The prevalence of depression in patients with mild cognitive impairment
is 10%. On applying a depression rating scale with the likelihood ratio of a
positive test (LR+) equal to 10, a patient with mild cognitive impairment
becomes test positive. The probability that this patient is depressed is equal
to
A. 15%
B. 32%
C. 52%
D. 85%
E. 100%

A

C. This question tests one’s ability to calculate post-test probability from likelihood ratios.
The probability of having a disease after testing positive with a diagnostic test depends on
two factors: (a) the prevalence of the disease, (b) the likelihood of a positive test result using
the instrument. It is important to remember that baseline prevalence of a disease for which a
diagnostic instrument is being tested is taken as the pretest probability.
So pretest probability = 10%
Now, post-test odds = likelihood ratio × pretest odds
From a given probability odds can be calculated using the formula
odds = (probability)/(1 – probability)
Here pretest odds = (10%)/(1 – 10%) = 10/90 = 1/9.
Now post-test odds = likelihood ratio × pretest odds
= 10 × 1/9 = 10/9
Using the formula probability = odds/(1 + odds)
post-test probability = (10/9)/[1 + (10/9)] = 10/19 = 52.6%
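
The probability-to-odds-to-probability steps above can be sketched in Python as follows. The function name is illustrative only; probabilities are assumed to be proportions strictly between 0 and 1.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = likelihood_ratio * pretest_odds
    return post_test_odds / (1 + post_test_odds)

# Prevalence 10%, LR+ = 10, as in the question.
print(round(post_test_probability(0.10, 10), 3))  # 0.526
```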

7
Q

A multi-centre double blind pragmatic randomized controlled trial (RCT)
reported remission rates for depression of 65% for fluoxetine and 60% for
dosulepin. The number of patients that must receive fluoxetine for one
patient to achieve the demonstrated beneficial effect is
A. 60
B. 20
C. 15
D. 10
E. 5

A

B. This question tests one’s knowledge of the NNT (number needed to treat) concept. NNT
is the reciprocal of the absolute benefit increase (ABI) in therapeutic trials. ABI is
the difference between the benefit due to the experimental intervention and that due to the compared standard/
placebo. Here it is given by 65% – 60% = 5%. If ABI = 5%, NNT = 100/5 = 20.

8
Q

In a randomized double-blind trial two groups of hospitalized depressed
patients treated with selective serotonin reuptake inhibitors (SSRIs) are
evaluated for beneficial effects on insomnia of trazodone vs temazepam.
Which of the following is NOT an important factor when evaluating the
internal validity of results obtained from the above study?
A. Baseline differences in antidepressant therapy between the two groups
B. The method used to randomize the sample
C. Setting in which the study takes place
D. Sensitivity of the insomnia scale to pick up changes in severity
E. Inclusion of the data in final analysis from patients who have dropped out

A

C. Threats to internal validity of an experimental study include confounding, selection bias,
differential attrition, and quality of measurement. Having a significant difference in baseline SSRI
therapy could explain differential outcomes in the trazodone vs temazepam groups. Similarly,
poor randomization may lead to selection bias and influence the differences in outcome. Failure
to account for differential drop-out rates may spuriously inflate or deflate the difference in
outcome. Using a scale with poor sensitivity to change will reduce the magnitude of differences
that could be observed. Given that both groups are recruited from the same setting (hospital), the
setting should not influence internal validity; on the other hand, it might well influence generalizability of results
to the non-hospitalized population (external validity).

9
Q

While adapting the results of an RCT into clinical practice, a clinician wants
to calculate the new NNT values for his own clinical population using the
results of the RCT. Apart from the reported RCT which of the following is
needed to carry out the calculation of the new NNT?
A. The expected rate of spontaneous resolution of the treated condition in the clinical
population
B. The size of the clinical population
C. The case fatality rate for the treated condition in the clinical population
D. Lifetime prevalence of the disease in the clinical population
E. All of the above

A

A. Published RCTs may quote impressive outcomes in terms of NNT. Applying principles of
evidence-based medicine, one must check for the internal validity of a study and the degree of
generalizability before adapting the results to clinical practice. One must also be aware of the
fact that though clinically more meaningful, NNTs quoted in RCTs may not translate to the same
extent in actual clinical practice. One way of appreciating the usefulness of a newly introduced
drug is to calculate the NNT for one’s own clinical population (target population). To enable
this one may estimate the patient expected event rate (PEER), which is given by the expected
spontaneous resolution rate or the response rate for an existing standard treatment. This can
be obtained from the local audit data or clinical experience. The product of PEER and relative
benefit increase from the published RCT gives the new absolute benefit increase (ABI new)
value for the target population. The inverse of the new ABI gives the new NNT for the target
population. The disease prevalence rate or absolute size of the target population has no effect on
the new NNT.
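
The PEER adjustment described above can be sketched in Python as follows. The function name and the numbers in the example (a relative benefit increase of 0.5 and a PEER of 20%) are hypothetical and chosen only to show the arithmetic; they do not come from any trial.

```python
def nnt_for_target_population(peer, relative_benefit_increase):
    """New ABI = PEER x RBI; the new NNT is the reciprocal of the new ABI."""
    abi_new = peer * relative_benefit_increase
    return 1 / abi_new

# Hypothetical inputs: 20% local response rate on standard care (PEER),
# relative benefit increase of 0.5 taken from a published RCT.
print(round(nnt_for_target_population(peer=0.20, relative_benefit_increase=0.5)))  # 10
```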

10
Q

In an attempt to ensure equivalent distribution of potential effect-modifying
factors in treating refractory depression, a researcher weighs the imbalance
that might be caused whenever an individual patient enters one of the two
arms of the study. Every patient is assigned to the group where the least
amount of imbalance will be caused. This method is called
A. Stratification
B. Matching
C. Minimization
D. Randomization
E. Systematic sampling

A

C. In most treatment trials interventions are allocated by randomization. Block
randomization and stratified randomization can be used to ensure the balance between groups
in size and patient characteristics. But it is very difficult to stratify using several variables in a
small sample. A widely acceptable alternative approach is minimization. This method can be used
to ensure very good balance between groups for several confounding factors irrespective of the
size of the sample. With minimization the treatment allocated to the next participant enrolled in
the trial depends (wholly or partly) on the characteristics of those participants already enrolled.
This is achieved by a simple mathematical computation of magnitude of imbalance during each
allocation.
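
To make the idea concrete, here is a minimal Python sketch of one simple variant of minimization: for each candidate arm, it sums the range of marginal counts across the new patient's factor levels and picks the arm with the smallest total. The function name, the factor names, and the tie-breaking rule are all assumptions made for illustration; real trials typically add a random element.

```python
from collections import defaultdict

def minimization_allocate(new_patient, enrolled, arms=("A", "B")):
    """Return the arm that gives the smallest total imbalance.

    new_patient: dict of factor -> level, e.g. {"sex": "F", "age": "old"}
    enrolled: list of (patient_dict, arm) pairs already allocated
    """
    imbalance = {}
    for candidate in arms:
        total = 0
        for factor, level in new_patient.items():
            counts = defaultdict(int)
            for patient, arm in enrolled:
                if patient.get(factor) == level:
                    counts[arm] += 1
            counts[candidate] += 1  # hypothetically place the new patient here
            total += max(counts[a] for a in arms) - min(counts[a] for a in arms)
        imbalance[candidate] = total
    # Ties go to the first arm listed; this is a simplification.
    return min(arms, key=lambda a: imbalance[a])

enrolled = [
    ({"sex": "F", "age": "old"}, "A"),
    ({"sex": "F", "age": "young"}, "A"),
    ({"sex": "M", "age": "old"}, "B"),
]
print(minimization_allocate({"sex": "F", "age": "old"}, enrolled))  # B
```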

11
Q
The effectiveness of an intervention is measured by using pragmatic trials.
Which trial design is normally employed when carrying out a pragmatic
trial?
A. RCT
B. Meta analysis
C. Systematic review
D. Cohort study
E. Case series
A

A. RCTs provide high-quality evidence for or against proposed interventions. But RCTs
have a major limitation in terms of generalizability. This is because the trials are conducted in a
somewhat artificial experimental setting that is different from clinical practice. So RCTs have
high internal validity due to rigorous methodology but poor external validity. Pragmatic RCTs are
a type of RCTs introduced with the intention of increasing external validity, i.e. generalizability
of RCT results. But this takes place at the expense of internal validity. In pragmatic RCTs the
trial takes place in a setting as close as possible to natural clinical practice, i.e. the inclusion and
exclusion criteria are less fastidious, ‘treatment as usual’ is often employed for comparisons
instead of placebos, and real-world, functionally significant outcomes are considered.

12
Q

The probability of detecting the magnitude of a treatment effect from a
study when such an effect actually exists is called
A. Validity
B. Precision
C. Accuracy
D. Power
E. Yield

A

D. The power of a study refers to the ability of the study to show the difference in outcome
between studied groups if such a difference actually exists. The term power calculation is often
used while referring to sample size estimation before a study is undertaken. In order to carry out
power calculation one has to know the expected precision and variance of measurements within
the study sample (obtained from a literature search or pilot studies), the magnitude of a clinically
significant difference, the certainty of avoiding type 1 error as reflected by the chosen
p value, and the type of statistical test one will be performing. There is no point in calculating the
statistical power once the results of a study are known. On completion of trials, measures such as
confidence intervals indicate the power of a study and the precision of results.

13
Q

Power is the ability of a study to detect an effect that truly exists. Power can
also be defined as
A. Probability of avoiding type 1 error
B. Probability of committing type 1 error
C. Probability of committing type 2 error
D. Probability of detecting a type 2 error
E. Probability of avoiding type 2 error

A

E. Power refers to the probability of avoiding a type 2 error. To calculate power, one needs
to know four variables.
1. sample size
2. magnitude of a clinically significant difference
3. probability of type 1 error (significance level from which p value is derived)
4. variance of the measure in the study sample.
Underpowered trials are those that enrol too few participants to identify differences between
interventions (arbitrarily taken as at least 80% of the time) when such differences truly exist.
Underpowered RCTs are prone to false-negative conclusions (type 2 errors). Somewhat
controversially, underpowered trials are considered to be unethical, as they expose participants
to the ordeals of research without providing an adequate contribution to clinical development.

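As a rough illustration of how these variables interact, the sketch below uses the common normal-approximation formula for comparing two group means, n per group = 2 × (z for alpha + z for power)² × SD²/difference². This formula and the example numbers are assumptions added for illustration, not something stated on the card; it takes the standard deviation rather than the variance as input.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type 1 error, two-sided
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type 2 error
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Hypothetical inputs: clinically significant difference of 5 points, SD of 10.
print(sample_size_per_group(delta=5, sd=10))  # 63
```

Halving the clinically significant difference roughly quadruples the required sample size, which is why small expected effects so often lead to underpowered trials.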
14
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
The positive predictive value of this test is
A. 50%
B. 60%
C. 40%
D. 100%
E. 0%

A

D. It is useful to construct a 2 × 2 table for calculating properties of reported diagnostic
tests. From the given information we can draw the following:

                 Schizophrenia   Controls   Total
Test positive          60            0        60
Test negative          40          100       140
Total                 100          100       200

Now, positive predictive value = true positive/total positive = 60/60 = 100%.

15
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
How sensitive is this test in detecting schizophrenia?
A. 60%
B. 40%
C. 100%
D. 90%
E. 0%

A

A. Sensitivity = true positive/total diseased (schizophrenia subjects) = 60/100 = 60%.

16
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. How
accurate is this test in detecting schizophrenia?
A. 100%
B. 80%
C. 60%
D. 40%
E. 70%

A

B. Accuracy = all true observations/total population studied = (100 + 60)/200 = 160/200 = 80%.

17
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. What
are the chances that the test will turn negative in your next patient with
schizophrenia?
A. 100%
B. 70%
C. 60%
D. 40%
E. 30%

A

D. This question asks the candidate to calculate the probability of a negative test in
someone with the disorder, i.e. the false-negative rate (FNR).
This is given by FNR = false negative/total diseased = 40/100 = 40%.
FNR is the same as (1 – sensitivity); similarly, the false-positive rate (FPR) is the same as (1 – specificity).
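
The quantities asked for in questions 14 to 17 all come from the same four cell counts, so they can be computed together. A minimal Python sketch follows; the variable names are illustrative.

```python
# Cell counts from the question: 60 of 100 patients detected, no false positives.
tp, fn = 60, 40   # schizophrenia patients: test positive / test negative
fp, tn = 0, 100   # controls: test positive / test negative

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
accuracy = (tp + tn) / (tp + fn + fp + tn)
false_negative_rate = fn / (tp + fn)

print(sensitivity, specificity, ppv, accuracy, false_negative_rate)
# 0.6 1.0 1.0 0.8 0.4
```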

18
Q

Which of the following properties of a screening test increases with
increasing disease prevalence in the population?
A. Negative predictive value
B. Sensitivity
C. Specificity
D. Accuracy
E. Positive predictive value

A

E. Sensitivity and specificity reflect the characteristics of the test instrument and do not
vary with changes in disease prevalence. Overall accuracy is a prevalence-weighted average of
sensitivity and specificity, so it may rise or fall with prevalence depending on which of the two
is larger; it does not consistently increase. Positive
predictive value increases while negative predictive value decreases with rising population
prevalence of the disease studied. The prevalence can be interpreted as the probability before the
test is carried out that the subject has the disease, known as the prior probability of disease. The
positive and negative predictive values are the revised estimates of the same probability for those
subjects who are positive and negative on the test, and are known as posterior probabilities.
Thus the difference between the prior and posterior probabilities is one way of assessing the
usefulness of the test.
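
The dependence of predictive values on prevalence can be seen with a short Python sketch. The sensitivity and specificity used here (80% and 90%) and the three prevalence values are illustrative assumptions, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# PPV rises and NPV falls as prevalence increases.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.8, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```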

19
Q

Two observers are rating MRI scans for the presence or absence of white
matter hyperintensities. On a particular day from the records, they are
observed to have an agreement of 78%. If they could be expected to agree
50% of the time, even if the process of detecting hyperintensities is by pure
chance, then the value of the kappa statistic is
A. 50%
B. 44%
C. 56%
D. 78%
E. 22%

A

C. Agreement between different observers can be measured using the kappa (κ) statistic
for categorical measures such as the one highlighted in this question (presence or absence of
MRI hyperintensities). Kappa is a measure of the level of agreement in excess of that which would
be expected by chance. It is calculated as the observed agreement in excess of chance, expressed
as a proportion of the maximum possible agreement in excess of chance. In other words
kappa = the difference between observed and expected agreement/(1 – expected agreement).
In this example, the observed agreement is 78%. The expected agreement is 50%. Hence
kappa = (0.78 – 0.50)/(1 – 0.50) = 0.28/0.50 = 56%.
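
A minimal Python sketch of the same calculation follows, with agreement given as proportions; the function name is illustrative.

```python
def kappa(observed_agreement, expected_agreement):
    """Agreement in excess of chance, as a share of the maximum possible excess."""
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(round(kappa(0.78, 0.50), 2))  # 0.56
```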

20
Q

The number of days that a series of five patients had to wait before starting
cognitive behavioural therapy (CBT) at a psychotherapy unit is as follows:
12, 12, 14, 16, and 21. The median waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days

A

C. The median is calculated by placing observations in rank order (either ascending
or descending) and picking the most central value. If the number of observations is even,
then the median is taken as the arithmetic mean of the two middle values.
Here the ordered values are 12, 12, 14, 16, and 21, so the median is the third value, 14 days.

21
Q
The number of days that a series of five patients had to wait before starting
CBT at a psychotherapy unit is as follows: 12, 12, 14, 16, and 21. The mean
waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
A

A. The arithmetic mean is calculated from the sum of all individual observations divided
by the number of observations. Here the number of observations = 5. The sum of individual
observations = 12 + 12 + 14 + 16 + 21 = 75. The average = 75/5 = 15.
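
Both the median from question 20 and the mean from this question can be checked with Python's standard `statistics` module. This is a minimal sketch and the variable name is illustrative.

```python
from statistics import mean, median

waiting_days = [12, 12, 14, 16, 21]

print(median(waiting_days))  # middle value of the ordered list: 14
print(mean(waiting_days))    # sum of 75 divided by 5 observations
```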

22
Q

The most clinically useful measure that helps to inform the likelihood of
having a disease in a patient with positive results from a diagnostic test is
A. Accuracy
B. Positive predictive value
C. Sensitivity
D. Specificity
E. Reliability

A

B. The probability that a test will provide a correct diagnosis is not given by the sensitivity
or specificity of the test. Sensitivity and specificity are properties of the test instrument; they
are not functions of the target population/clinical sample. On the other hand, positive and
negative predictive values are functions of the population studied; they provide much more
clinically useful information. Predictive values observed in one study do not apply universally.
Positive predictive value increases with increasing prevalence of the disease; negative predictive
value decreases with increasing prevalence. Sensitivity and specificity, being properties of the
instrument used, do not vary with prevalence.

23
Q

Zarkin et al., 2008 reported the cost-effectiveness comparison of naltrexone
and placebo in alcohol abstinence. The mean effectiveness measured as
percentage days of abstinence was nearly 80% for naltrexone group while
it was 73% for the placebo group. The mean cost incurred for the placebo
group was $400 per patient. The naltrexone group incurred a cost of
$680 per patient. How much additional cost needs to be spent per patient
for each percentage point increase in total days of abstinence when using
naltrexone compared with placebo?
A. $40
B. $50
C. $7
D. $500
E. $2

A

A. The incremental cost-effectiveness ratio (ICER_AB) can be defined as the difference in
cost (C) of interventions A and B divided by the difference in mean effectiveness (E), i.e.
ICER_AB = (C_A – C_B)/(E_A – E_B), where intervention B is usually the placebo or standard intervention that is compared
with intervention A. In this example, the difference in costs = $680 – $400 = $280. The difference
in effectiveness as measured by percentage days of abstinence is 80% – 73% = 7 percentage points. Hence
ICER = 280/7 = $40 per patient per percentage point of days of abstinence.
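
The ratio can be sketched in Python as follows; the function and parameter names are illustrative, with costs in dollars and effectiveness in percentage points.

```python
def icer(cost_a, cost_b, effect_a, effect_b):
    """Incremental cost per unit of additional effectiveness (A versus B)."""
    return (cost_a - cost_b) / (effect_a - effect_b)

# Naltrexone ($680, 80% days abstinent) versus placebo ($400, 73%).
print(icer(cost_a=680, cost_b=400, effect_a=80, effect_b=73))  # 40.0
```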

24
Q

Two continuous variables A and B are found to be correlated in a nonlinear
fashion. All of the following can be considered as suitable statistical
techniques for examining this relationship except
A. Curvilinear regression
B. Logistic regression
C. Multiple linear regression
D. Polynomial regression
E. Exponential regression

A

C. When the relationship between two continuous variables is plotted in a graph,
the resulting distribution may be a straight line or a curve. If the relationship between the
independent (X) variable and dependent (Y) variable appears to follow a straight line, then linear
regression can be constructed to predict the dependent variable from the independent variable.
Otherwise, one can resort to one of the following methods:
1. Attempting to transform the available data to straighten the curved relationship.
2. One can try curvilinear regression, e.g. logarithmic regression, exponential regression, and
trigonometric regression.
3. Unless there is a theoretical reason for supposing that a particular form of equation,
such as logarithmic or exponential, is needed, we usually test for
non-linearity by using a polynomial regression equation.
4. Multiple linear regression is often used to examine the linear relationships when there is more
than one independent variable influencing a dependent variable.

25
A drug representative presents data on a new trial. The data show that drug A prevents annual hospitalization in 20% more dementia patients than placebo. You are very impressed but your consultant wants to know how many patients you need to treat to prevent one hospitalization. The correct answer is
A. 20
B. 5
C. 80
D. 1
E. 100
B. The answer to this question can be found by calculating the number needed to treat (NNT). The absolute increase in benefit (ABI) is given by the difference in outcome between two groups. This is 20% as quoted by the drug representative. Hence NNT = 100/20 = 5. You need to treat five patients with the new drug to prevent one annual hospitalization. How small must the NNT be to be clinically impressive? This depends on the availability of other interventions and their NNTs, the incremental cost of the proposed intervention, and the tolerability of the intervention. The last one can be partly deciphered by calculating the number needed to harm for a notable side-effect of the intervention.
26
A new study attempts to evaluate the benefits of regular exercise in preventing depression compared with unmodified lifestyle in a sample of 80 healthy elderly men. Which of the following is not possible in such a study design?
A. Randomized trial
B. Allocation concealed trial
C. Prospective trial
D. Double-blinded trial
E. Controlled trial
D. Blinding reduces differential assessment of outcomes of interest (ascertainment bias, information bias, or observer bias) that can occur if the investigator or participant is aware of the group assignment. Blinding can also improve compliance and retention of trial participants and reduce unaccounted supplemental care or treatment that may be sought by the participants. Single blinding refers to either the investigator or the patient being blind to group assignment. Double blinding refers to both the patient and the investigator remaining unaware of the group assignment after randomization. This is desirable but not always possible in RCTs. In the example above, the subjects who undertake the exercise schedule cannot be kept unaware of exercising! A single-blind trial is possible in such cases
27
When searching medical databases, the term MeSH refers to
A. Software that distributes all indexed articles
B. A keyword that will retrieve all published articles by an author
C. A thesaurus of medical subject headings
D. A keyword that stops ongoing search process
E. A database of mental health and social care topics
C. MeSH stands for medical subject headings. It is a thesaurus embedded in the Pubmed–Medline interface and can be used to search literature more effectively using recognized key words
28
Which of the following is strictly correct about a single-blind study design?
A. Only the patients, but not the researchers, do not know whether placebo or active drug is being administered
B. Only the researchers, but not the patients, do not know whether placebo or active drug is being administered
C. Both the patient and researchers do not know the treatment given
D. Only one group of the trial subjects is kept unaware of the treatment status
E. Either the patients or the researchers do not know whether placebo or active drug is administered
E. Single blind: either the patient or the clinician remains unaware of the intervention given. Double blind: both the patient and investigator are unaware of the given intervention. Open label: both researchers and the participants are aware of treatment being given after randomisation. Triple blind: apart from the patient and the researcher, those who measure the study outcomes (the assessors) are also unaware of the given intervention.
29
Which one of the following correctly describes a crossover trial?
A. Halfway through the treatment phase, the subjects from both arms interchange randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a different drug is evaluated simultaneously
D. The trial permits investigation of the effect of more than one independent variable on the clinical outcome
E. None of the above
B. If random interchange between treatment and placebo groups occurs halfway through the study, this will lead to chaos and failed randomization. This is termed contamination. It can occur when participants or their caregivers discover they are ‘controls’ and obtain the experimental treatment outside the trial, thus effectively becoming the active treatment group. Choice C is practically impossible; to share controls of one RCT with another means the trial is open label. When each subject in the trial receives both intervention and placebo with a washout period in between while remaining blind to the intervention, this is called a crossover RCT. Crossover trials are possible only if short-term outcomes are evaluated in chronic diseases. This is because the disease process must be sufficiently long for the subject to receive both interventions across its course. Any intervention applied in a crossover setting must not permanently alter the disease process.
30
A study evaluates the effect of various psychological interventions on bulimia. This study could be termed as a factorial design if
A. Halfway through the treatment phase, the subjects from two arms interchange randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a totally different psychotherapy is evaluated simultaneously
D. The trial permits investigation of the effect of more than one psychotherapy, both separately and combined, on the clinical outcome
E. None of the above
D. If one wishes to compare the effect of more than one intervention against placebo, either a multi-arm RCT or a factorial design can be chosen. A multi-arm RCT is a simple extension of the usual RCT where an extra arm of subjects is generated through randomization to allocate the second intervention in addition to the placebo and first intervention groups. A factorial RCT evaluates the effect of more than one intervention, independently and also in combination. In the above example the effect of two different psychotherapies independently and in combination could be studied using a factorial design.
31
A 2 × 2 contingency table is constructed to analyse the primary outcome data of a trial. The number of degrees of freedom for the chi-square statistic is
A. 1
B. 2
C. 3
D. 4
E. –4
A. ‘Degrees of freedom’ is defined as the number of values in the final calculation of a statistic that are free to vary. In a two-way chi-square test, this is given by
Degrees of freedom (d.f.) = (number of rows – 1) × (number of columns – 1)
In this question, for a 2 × 2 table, there are 2 rows and 2 columns. Hence d.f. = (2 – 1) × (2 – 1) = 1 × 1 = 1. Degrees of freedom cannot take negative values.
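The rule generalizes to any two-way table, as in this minimal Python sketch (the function name is illustrative):

```python
def chi_square_degrees_of_freedom(rows, columns):
    """d.f. for a two-way contingency table: (rows - 1) x (columns - 1)."""
    return (rows - 1) * (columns - 1)

print(chi_square_degrees_of_freedom(2, 2))  # 1
print(chi_square_degrees_of_freedom(3, 4))  # 6
```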
32
Which one of the following is correctly matched with the most suitable study method?
A. Diagnostic test: case–control study
B. Prognosis: prospective cohort study
C. Therapy: cross-sectional survey
D. Aetiology: case–series study
E. Epidemiology: RCT
B. No single study design is sufficient in itself to answer various clinical questions. For evaluation of a diagnostic test, a survey design that allows comparison with the gold standard is often used. For prognostic studies a prospective cohort design is useful. Therapeutic interventions are best evaluated using RCTs. Aetiological studies are often cohort or case–control studies; although the RCT is ideal it may not always be possible to conduct one. Epidemiological studies are often cross-sectional surveys.
33
Which of the following characteristics of a pragmatic RCT distinguishes it from an explanatory RCT?
A. Pseudo-randomization is practised in pragmatic trials
B. Type 1 error level is set to be higher in pragmatic trials
C. Descriptive rather than inferential statistics are used to report the outcome of pragmatic trials
D. Higher generalizability is achieved in pragmatic trials
E. Strict exclusion of patients with comorbid conditions is seen in pragmatic trials
D. The RCT has traditionally been considered as a study design that can yield results with a high degree of internal validity. But the major drawback of RCTs is that the process takes place under highly experimental conditions, which are not seen in clinical practice. So any results achieved from such RCTs, though valid, may not be reproducible in everyday practice. In order to circumvent this issue, more naturalistic trials that retain core principles of the RCT such as randomization, longitudinal follow-up, and controlled intervention are being increasingly used. Such real-world RCTs are called pragmatic trials or effectiveness trials. Such trials can be useful to find out if an intervention will be effective in clinical practice, although they may not be suitable to study the biological efficacy of a drug. A pragmatic RCT may reject various practices seen in an explanatory RCT, such as strict exclusion criteria, blinding, placebo use, fixed dose intervention, high follow-up care, per-protocol analysis, etc. But basic principles such as randomization and use of probability theory (hypothesis testing and p values) are retained.
34
Which one of the following statements with respect to bias is false? A. Bias is a systematic error B. Bias cannot be controlled for during the analysis stage of a trial C. The presence of bias always overestimates the final effect D. Blinding reduces measurement bias E. Randomization reduces selection bias
C. Bias is defined as any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. It can also be termed a systematic error that influences the result in either direction. Hence a biased study could either overestimate or underestimate the true effect, depending on the direction of the trend. Bias may be introduced by poor study design or poor data collection. Bias cannot be ‘controlled for’ at the analysis stage. In RCTs, randomization ensures a reduction in selection bias if the process is carried out in a strictly concealed manner. Blinding can reduce the measurement bias if properly executed.
35
Which one of the following is NOT a major disadvantage of a double blind, well-concealed RCT design? A. Very expensive to carry out B. May become time consuming C. Experimental results may not translate to clinical samples D. Randomization may be unethical and not possible in certain cases E. Introduction of recall bias
E. Recall bias refers to the systematic error produced by the tendency of subjects to recall an exposure differently when they are diseased compared with when they are not. Recall bias most often occurs in case–control studies. The remaining choices refer to genuine disadvantages of a well-conducted RCT.
36
The last observation carried forward (LOCF) method is not suitable for processing the data for which of the following RCTs with intention to treat analysis? A. Benzodiazepines for anxiety B. SSRIs for depression C. Venlafaxine for generalized anxiety disorder D. Memantine for Alzheimer’s disease E. Risperidone for bipolar disorder
D. In most drug trials, patients drop out because of non-efficacy or adverse events. If a number of participants drop out because of non-efficacy, excluding them from the analysis would project a favourable outcome for the drug in question. Hence the LOCF method takes each drop-out's last observation and uses it in the analysis. For illustration, take two subjects in a trial of antidepressants. Subject 1 improves significantly over the 4 weeks: his MADRS score drops to 1 from a baseline of 30. Subject 2 drops out of the study in the second week because of non-efficacy. If we remove subject 2 from the analysis, the mean score at the end would be 1 (a whopping improvement of 29 points on the MADRS), whereas if we carry forward his last observation score (week 2) of 30 to the end and take the mean of the two scores (15.5), the drop is only 14.5 points from the mean baseline score of 30. Trials of Alzheimer’s disease interventions are different, since we do not expect (although we most definitely would like to see) improvement in the cognitive score, but rather a slow decline in scores over time, in spite of the medications, due to the progressive nature of the illness. If a patient drops out early because of adverse effects, carrying forward his score to the endpoint analysis will falsely project a favourable outcome. Again to illustrate, consider a trial of cholinesterase inhibitors. Subject 1 experienced a decline of 19 points over 4 weeks, while the second subject dropped out in the first week, when his MMSE had not declined. If we carry forward his last observation of 20, it will look as if he did not deteriorate at all, and the mean decline over time would be diluted to about 10, rather than a drop of 19. As a corollary, the reason for drop-out is another important issue. 
In trials of Alzheimer’s disease interventions, early drop-outs are most probably due to adverse effects, while late drop-outs are due to non-efficacy. This can again project a favourable outcome for the drug.
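The antidepressant illustration above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two hypothetical subjects from the explanation; the function name and constant are illustrative, not part of any trial software.

```python
from statistics import mean

BASELINE_MADRS = 30  # both illustrative subjects start at a MADRS score of 30


def mean_improvement(endpoint_scores, baseline=BASELINE_MADRS):
    """Mean drop from baseline across the endpoint scores supplied."""
    return baseline - mean(endpoint_scores)


# Subject 1 completes the trial with a score of 1.
# Subject 2 drops out in week 2 with a score still at 30.
completers_only = mean_improvement([1])   # subject 2 discarded
locf = mean_improvement([1, 30])          # subject 2's last score carried forward

print(completers_only)  # 29
print(locf)             # 14.5
```

Discarding the drop-out roughly doubles the apparent benefit in this toy example, which is the distortion LOCF is meant to limit in trials where improvement is expected.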
37
All of the following measures can be used to decrease the heterogeneity in a meta-analysis except A. Transformation of the outcome variable in question B. Employing meta regression analysis C. Using a random effects model D. Doing a subgroup analysis E. Including data from smaller unpublished studies
E. There are a number of ways to manage heterogeneity. The easiest way would be to avoid it. This includes using strict inclusion criteria to include studies that are as similar as possible. In the case of continuous variables, one of the ways would be to transform the data so that all data look similar and are less heterogeneous. Meta regression is a collection of statistical procedures to assess heterogeneity, in which the effect size of each study is regressed on one or several covariates, with a value defined for each study. The fixed-effect model of meta-analysis considers the variability between the studies as exclusively due to random variation. The random-effects model assumes a different underlying effect for each study and takes this into consideration as an additional source of variation. The effects of the studies are assumed to be randomly distributed and the central point of this distribution is the focus of the combined (pooled) effect estimate. If there were some types of studies that were likely to be quite different from the others, a subgroup analysis may be done. And finally, one could exclude the studies that contribute a great deal to the heterogeneity. Locating unpublished studies may help reduce publication bias but will not have any predictable and constant effect on the degree of heterogeneity.
38
Both odds ratios and relative risk are often used as outcome measures in published studies. Which of the following statements is true regarding these measures? A. The odds ratio cannot be calculated in cohort studies B. Incidence rate is required to calculate the odds ratio C. Relative risk cannot be calculated for case–control studies D. If the outcome of interest is very common, the odds ratio approximates relative risk E. The odds ratio cannot be used to study dichotomous outcomes
C. Odds are the probability of an event occurring divided by the probability of the event not occurring. An odds ratio is the odds of the event in one group (e.g. intervention group) divided by the odds in another group (e.g. control group). Odds ratios tend to exaggerate the true relative risk to some degree. But this exaggeration is kept minimal and even negligible if the probability of the studied outcome is low (empirically, less than 10%); in such cases the odds ratio approximates the true relative risk. As the event becomes more common the odds ratio no longer remains a useful proxy for the relative risk. It is suggested that the use of odds ratios should probably be limited to case–control studies and logistic regression examining dichotomous variables. Risk refers to the probability of an event occurring at a time point; in other words, it is the same as the incidence rate. The inherent cross-sectional nature of a case–control study (where ‘existing cases’ are recruited) does not allow one to study ‘new’ incidences. Hence we cannot measure risk, and so relative risk, from case–control designs.
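The claim that the odds ratio tracks the relative risk only when the outcome is rare can be illustrated numerically. The sketch below uses made-up event probabilities (1% vs 2%, and 30% vs 60%) chosen so that the relative risk is 2 in both scenarios; the function names are illustrative.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


def relative_risk(risk_exposed, risk_control):
    return risk_exposed / risk_control


def odds_ratio(risk_exposed, risk_control):
    return odds(risk_exposed) / odds(risk_control)


# Rare outcome: the odds ratio stays close to the relative risk.
rare_rr = relative_risk(0.02, 0.01)
rare_or = odds_ratio(0.02, 0.01)

# Common outcome: the odds ratio exaggerates the relative risk.
common_rr = relative_risk(0.60, 0.30)
common_or = odds_ratio(0.60, 0.30)

print(round(rare_rr, 2), round(rare_or, 2))      # 2.0 2.02
print(round(common_rr, 2), round(common_or, 2))  # 2.0 3.5
```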
39
Which one of the following clinical questions can be correctly addressed by a case–control design? A. Is it effective to use hyoscine patches in treating clozapine-induced hypersalivation? B. How many inpatients in wards for elderly people suffer from untreated hypercholesterolaemia at any given time? C. How rapidly will lithium discontinuation produce relapse of schizoaffective disorder? D. Are we at local community team compliant with the NICE guidelines for prescribing antipsychotics? E. Do patients with depression have more academic examination failures than their healthy siblings?
E. Choice A refers to a clinical question related to therapeutic intervention; RCTs are best suited to answer this. Choice B is an epidemiological question: ‘how many in a population have a particular condition?’ A cross-sectional survey could answer this question. Choice C refers to a prognostic question: how long will it take for schizoaffective relapse following lithium discontinuation? A prospective cohort (or an RCT if ethically approved) is the most appropriate design for this question. Choice D requires a clinical audit, which is often closer to a cross-sectional survey in design. Choice E refers to defined cases and controls being compared for a possible exposure or risk factor that might have occurred in the past. Hence the case–control design is best suited to answer this question. Please note that it is possible to design a prospective cohort study by observing for a long time those with academic failure to detect development of depression.
40
A 50-year-old man sustained significant memory loss following near-fatal carbon monoxide poisoning. Following discussion he agreed to take part in a double-blinded trial of donepezil vs placebo administered in six separate 4-week modules with a 2-week washout period in between. Neuropsychological measures were obtained at regular pre-planned intervals to monitor changes. He was the sole subject on the trial and the randomization sequence was generated and maintained by the pharmacy. This study design could be best described as A. Uncontrolled trial B. N-of-1 trial C. Crossover RCT D. Pragmatic RCT E. Naturalistic observational study
B. N-of-1 trials are randomized double-blind multiple crossover comparisons of an active drug against placebo in a single patient. The design uses a series of pairs of treatment periods called modules. Within each module the patient receives active treatment during one period and either an accepted standard treatment or placebo in the other. Random allocation determines the order of the two treatment periods within each pair and both clinician and patient are blinded for the intervention. This design is mostly suited for chronic recurrent conditions for which long-term interventions exist that are not curative. Interventions with rapid onset and offset of effects are best suited for n-of-1 trials. This allows shorter treatment periods wherein multiple modules of intervention and placebo/standard treatment can be compared, increasing the chance of achieving a statistically significant result. It is also necessary that the interventions tested must be cleared from the patient’s system within a finite washout period.
41
While conducting a systematic review, publication bias could be determined using which of the following methods? A. Funnel plot B. Galbraith plot C. Failsafe N D. Soliciting and comparing published vs. unpublished data E. All of the above
E. Publication bias refers to the tendency of journals to accept and publish certain types of studies more often than others. In general, studies with results that are impressively significant or of higher quality by virtue of larger sample size are more successful in getting published. Publication bias can be considered as a form of selection bias when one attempts a systematic review or meta-analysis. Publication bias can be detected using a funnel plot: visual inspection of a graph drawn by plotting a measure of precision (often sample size) against treatment effect will reveal asymmetry of the two arms of the funnel-shaped graph if publication bias is present. Galbraith plot refers to a graph obtained by plotting a measure of precision such as (1/standard error) against standard normal deviate (log of odds ratio/standard error). The coordinates obtained from such a plot can be used to determine the extent of publication bias using linear regression. Failsafe N is another way of estimating publication bias. Consider a meta-analysis yielding a statistically significant difference in outcome between two interventions, despite suspected publication bias. Then failsafe N answers the question ‘How many missing studies are needed to reduce the effect to statistical non-significance?’ The higher the failsafe N, the lower the publication bias. If one could solicit and compare all unpublished data with published data, then publication bias would become obvious.
42
In a RCT the randomization sequence is protected before and until the randomization is completed. This is known as A. Concealment B. Double blinding C. Matching D. Masking E. Trial independence
A. Allocation concealment refers to the process used to prevent foreknowledge of the assignment before allocation is complete. So the investigator who recruits subjects for a trial will not know the nature of assignment of subsequent subjects that enter randomization. Allocation concealment seeks to prevent selection bias, protects the allocation sequence before and until assignment, and can almost always be successfully implemented in an RCT. It is often confused with blinding, which seeks to prevent ascertainment bias, protects the sequence after allocation, and cannot always be implemented.
43
Data collected for a study on antidepressant efficacy show the outcome as observations of the number of days needed to achieve remission. The standard deviation for such observations will be measured in which of the following units? A. No units B. Days C. Square root of days D. Days squared E. Person-years
B. The standard deviation has the same units as the primary variable. This is an advantage of standard deviation compared with variance, which is also a measure of dispersion.
44
In a study presenting outcome in terms of median days of hospital admission, the collected data show many observations substantially higher than the median. Which one of the following is correct regarding the above study? A. The results are negatively skewed B. Mean = median = mode C. The results are not skewed D. Mean > median E. Mode = median
D. If many observations are substantially higher than the median we can assume that the mean of the distribution might be greater than the median. This translates to a positively skewed distribution. No comments can be made on mode using the available information.
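A small numerical sketch can make the mean and median relationship concrete. The lengths of stay below are invented purely for illustration: most admissions are short, but two long admissions sit far above the median and pull the mean upwards.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days (illustrative values only).
stays = [2, 3, 3, 4, 5, 30, 45]

stay_median = median(stays)
stay_mean = mean(stays)

print(stay_median)          # 4
print(round(stay_mean, 1))  # 13.1
print(stay_mean > stay_median)  # True, consistent with positive skew
```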
45
A trial is conducted to evaluate the efficacy of lamotrigine in patients with symptoms of recurrent depersonalization. While calculating the number of patients needed in the trial to demonstrate a meaningful effect, α level is set at 0.05. Which of the following is true regarding alpha (α)? A. It is the probability of a type 2 error B. It is the threshold for defining clinical significance C. If α = 0.05, there is a 5% chance that the null hypothesis is rejected wrongly D. If α = 0.05, then 5% of treated subjects will show absence of treatment effect. E. None of the above
C. α is the probability of type 1 error. It is used to set the threshold for statistical (not clinical) significance, often arbitrarily set as p = 0.01–0.05 (α = 1–5%). If α = 0.05, there is a 1 in 20 or 5% chance that the null hypothesis is rejected wrongly.
46
Which of the following is an agreed method of assessing the quality of conducting and reporting systematic reviews and meta-analyses? A. ASSERT B. CONSORT C. QUOROM D. SIGN E. NICE
C. Despite the increasing importance and abundance of systematic reviews and meta-analyses in the scientific literature, the reporting quality of systematic reviews varies widely. To address the issue of suboptimal reporting of meta-analyses, an international group in 1996 developed guidance called the QUOROM Statement (QUality Of Reporting Of Meta-analyses). QUOROM focused on the standards of reporting meta-analyses of RCTs. A revision of these guidelines renamed as PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) includes several conceptual advances in the methodology of systematic reviews.
47
All of the following methods are used to assess heterogeneity in a meta-analysis except A. Q statistic B. I squared statistic C. Galbraith plot D. L’Abbé plot E. Paired t statistics
E. Meta-analysis is generally done to combine the results of different trials, as individual clinical trials are often too small and hence underpowered to detect treatment effects reliably. Meta-analysis increases the power of statistical analyses by pooling the results of all available trials. But this comes at a small cost. Although similar studies are chosen for inclusion in the meta-analysis, it is likely that the trials differ from one another just by chance. Sometimes the difference arises from foreseeable factors, e.g. the dosage of medication tested, the mean ages of the population tested, or the scales used may differ among studies. To measure whether this heterogeneity is more than the random heterogeneity we expect, statisticians resort to certain tests of heterogeneity. These include statistical tests such as the chi-square test (or Q statistic), which tests the ‘null hypothesis’ of homogeneity, and the I-squared statistic (which measures the amount of variability due to heterogeneity). Galbraith’s plot and l’Abbé plot are pictorial representations of heterogeneity. A paired t test is generally not used to calculate the heterogeneity.
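For readers who want to see how the Q statistic and I-squared relate, here is a minimal sketch using the usual inverse-variance formulation: Q is the weighted sum of squared deviations of each study effect from the pooled fixed-effect estimate, and I-squared is (Q − df)/Q expressed as a percentage, floored at zero. The three study effects and standard errors are hypothetical.

```python
def cochran_q_and_i2(effects, standard_errors):
    """Return (Q, degrees of freedom, I-squared in %) for a set of study effects."""
    weights = [1 / se ** 2 for se in standard_errors]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical log odds ratios from three trials with equal precision.
q, df, i2 = cochran_q_and_i2([0.10, 0.30, 0.50], [0.1, 0.1, 0.1])

print(round(q, 2), df, round(i2, 1))  # 8.0 2 75.0
```

Q is compared against a chi-square distribution with df degrees of freedom; I-squared describes the share of total variability attributed to heterogeneity rather than chance.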
48
Which one of the following types of data can have potentially infinite number of values? A. Continuous B. Categorical C. Nominal D. Ordinal E. Binary
A. Data can be qualitative or quantitative. Quantitative data refers to measures that often have a meaningful unit of expression. This can be either discrete or continuous. A discrete measure has no other observable value between two contiguous potentially observable values, i.e. there are ‘gaps’ between values. A continuous variable, on the other hand, can take potentially infinite values. The other choices in the question refer to qualitative measures whose value can only be described and counted but cannot be expressed in meaningful units.
49
A multi-centre RCT was conducted with strict inclusion criteria. Which one of the following properties of the study is most likely to be affected by the stringent inclusion criteria? A. Generalizability of results B. Precision of results C. Accuracy of the results D. Statistical significance of the results E. All of the above
A. A major disadvantage with RCTs is the poor generalizability of experimental findings to a clinical setting. Having strict inclusion and exclusion criteria may help choose a highly homogeneous population, increasing the internal validity of the study but at the expense of generalizability.
50
A researcher is interested in studying whether maternal smoking increases the risk of school refusal in children. Which one of the following is the correct null hypothesis for the above research question? A. School refusal increases the risk of maternal smoking B. Maternal smoking decreases the risk of school refusal C. Maternal smoking does not increase the risk of school refusal D. Maternal smoking increases the risk of school refusal E. None of the above
C. In scientific research, nothing can be proven; we can only disprove presumed facts. If one wants to prove maternal smoking causes school refusal, it is best to assume that maternal smoking does not cause school refusal to start with and then proceed to disprove this statement. Such statements waiting to be disproved during the course of a research study are called null hypotheses. The converse of the null hypothesis is called the alternative hypothesis. Research question: Does maternal smoking increase the risk of school refusal? Null hypothesis: Maternal smoking does not increase the risk of school refusal. Alternative hypothesis: Maternal smoking increases the risk of school refusal.
51
Of the following, the most important methodological challenge while conducting a cohort study is A. Statistical analysis of the results B. Randomization of the cohorts C. Identifying those who develop the outcome D. Identifying a suitable comparison group E. Concealment of cohort allocation
D. Subjects do not get randomized in a simple cohort study. Hence there is no question of allocation concealment. When valid instruments and a reasonable follow-up schedule are used, identification of those who develop the ‘event’ of interest/outcome is often not difficult in a cohort design. Often the most difficult part is to identify a reasonable control cohort that lacks the ‘exposure’ of interest. Internal controls refer to those who are ‘non-exposed’ but derived from the same study population as the ‘exposed’. External control refers to an independently recruited cohort without the exposure.
52
In a study investigating the mean cholesterol levels in 36 patients taking olanzapine, the mean was found to be 262 mg/dL. The standard deviation of this observation was 15 mg/dL. The 95% confidence interval for this observation is A. 232–292 mg/dL B. 247–277 mg/dL C. 259.5–264.5 mg/dL D. 257–267 mg/dL E. 226–298 mg/dL
D. The 95% confidence limits of a sample mean span the range from approximately two standard error units below the mean to two standard error units above it. Expressed mathematically, 95% confidence limits = mean ± (2 × standard error of mean). The standard error of the mean is calculated as SE = standard deviation/√sample size. In this question SE = 15/√36 = 15/6 = 2.5. Hence the 95% confidence limits are 262 ± (2 × 2.5) = 262 ± 5, i.e. 257 to 267 mg/dL.
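The calculation above can be reproduced in a short sketch. The multiplier of 2 follows the approximation used in the explanation; passing 1.96 instead gives the slightly narrower interval often quoted in textbooks. The function name is illustrative.

```python
from math import sqrt


def ci95_mean(sample_mean, sd, n, multiplier=2.0):
    """Approximate 95% confidence limits for a sample mean."""
    se = sd / sqrt(n)  # standard error of the mean
    return sample_mean - multiplier * se, sample_mean + multiplier * se


low, high = ci95_mean(262, 15, 36)
print(low, high)  # 257.0 267.0

# With the more precise multiplier of 1.96 the limits narrow slightly.
low_z, high_z = ci95_mean(262, 15, 36, multiplier=1.96)
print(round(low_z, 1), round(high_z, 1))  # 257.1 266.9
```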
53
In a normal distribution curve, 99% of observations will fall within which of the following values of standard deviation (SD)? A. –2 SD to +2 SD B. –3 SD to +3 SD C. –1 SD to +2 SD D. –1 SD to +1 SD E. +1 SD to +3SD
B. An important property of the normal distribution curve is the relationship between the SD of normally distributed observations and probability. Normal distribution curves are symmetric and bell-shaped. About 68.3% of the sampled population will lie within 1 SD of the mean on either side of the curve, about 95.4% within 2 SDs, and about 99.7% (that is, more than 99%) within 3 SDs. In other words, there is a less than 1% chance that an observation will fall outside +3 SD to –3 SD, a nearly 5% chance that it will fall outside +2 SD to –2 SD, and a nearly 32% chance that it will fall outside +1 SD to –1 SD.
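The coverage figures for 1, 2 and 3 SDs can be verified from the standard normal distribution. A minimal sketch using the error function from Python's standard library, relying on the identity P(|Z| < k) = erf(k/√2):

```python
from math import erf, sqrt


def within_k_sd(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```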
54
Confidence intervals are used to describe the range of uncertainty around the estimated value of an outcome from the sample studied. Which of the following statements about confidence intervals is incorrect? A. Sample size is used in calculating confidence intervals B. It includes a range of values above and below the point estimate C. If the confidence interval includes a null treatment effect, the null hypothesis can be rejected D. 95% confidence interval is often used in clinical studies E. When the estimated outcome is a ratio, a positive treatment effect is shown by confidence intervals remaining above one.
C. If the confidence interval includes a null treatment effect, the null hypothesis cannot be rejected within the set levels of confidence limits. Confidence intervals provide a measure of dispersion of the point estimate within stipulated confidence limits (arbitrarily 95% corresponds to a p value of 5%). In other words, confidence intervals provide the assured range within which the true value may lie. Confidence intervals are a measure of precision of the results obtained from a study. The larger the sample studied, the narrower the intervals. If the confidence interval crosses the value ‘0’ for the difference between means then the results are statistically not significant. If it crosses the value ‘1’ for ratio measures such as the odds ratio, it is not significant. If it crosses infinity for inverse ratios such as NNT then it is not significant.
55
A clinical researcher is examining the incidence of akathisia in two groups of patients. One group (n = 35) has been prescribed benzodiazepine for use as required while the other group (n = 35) is free from any benzodiazepine exposure. The outcome is measured as proportion of patients who develop akathisia in a dichotomous scale. Akathisia develops in 10 patients without benzodiazepines and in 20 patients with benzodiazepines. Which of the following statistical tests is best suited to analyse the statistical significance of the difference between the two groups? A. Chi square test B. Paired t test C. Multiple regression analysis D. Wilcoxon rank sum test E. Pearson coefficient test
A. In this study, the dependent variable is treated as a categorical outcome. In other words, the population has been categorized into ‘akathisia present’ or ‘akathisia absent’. This type of outcome yields frequency counts or proportions that can be analysed for significance using the chi square test. The t test is used for comparing means. The Wilcoxon rank sum test is a non-parametric equivalent of the t test. Pearson coefficients are used to analyse correlation. Regression analyses are used to predict one variable from another when they are correlated.
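As an illustration of how the chi square test would apply to the counts in this question, the sketch below computes the Pearson chi-square statistic for the 2 × 2 table by hand, without a continuity correction (an assumption; some texts apply Yates' correction to 2 × 2 tables). The p value uses the identity that, for one degree of freedom, P(χ² > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p value for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # valid for 1 degree of freedom only
    return chi2, p_value


# Rows: benzodiazepine group, no-benzodiazepine group.
# Columns: akathisia present, akathisia absent.
chi2, p_value = chi_square_2x2([[20, 15], [10, 25]])

print(round(chi2, 2))   # 5.83
print(p_value < 0.05)   # True
```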
56
Considering normal distribution, which one of the following statements is incorrect? A. It is a continuous distribution B. It is symmetrical in shape C. The mean, median, and mode are identical D. The shape of the distribution depends on the number of observations made E. Both tails of the distribution extend to infinity
D. Irrespective of the number of observations made, the shape of a normally distributed curve is symmetric and bell shaped. The exact shape of the normal distribution is defined by a function that has only two parameters: mean and standard deviation. For a given range of scores, when the standard deviation is small, the curve becomes leptokurtic, i.e. thin but still symmetric. When the standard deviation is larger, it becomes platykurtic.
57
In descriptive statistics, which of the following is the most widely used measure of dispersion of a frequency distribution? A. Range B. Median C. Standard deviation D. Variance E. p Value
C. Standard deviation is a widely used measure of dispersion of data in descriptive statistics. Other measures include range, interquartile range (usually accompanies median values), and variance. Standard deviation is obtained by the root mean square of differences between individual observations and the mean value. Note that standard error is often preferred as the measure of dispersion while making inferences from a sample of the population. Standard error is a measure of precision of sample estimate in comparison with the population value.
58
In qualitative research which of the following refers to modifying the research methods and hypothesis as and while one conducts the research? A. Triangulation B. Iterative approach C. Theoretical sampling D. Content analysis E. Deductive approach
B. The iterative approach in qualitative studies refers to the process of altering the research methods and building the hypothesis as the study progresses, in response to new information gained while conducting the research. This flexibility allows qualitative studies to follow an inductive rather than the deductive approach seen in quantitative research. Data come before theory is generated in inductive methods; a stated theory is tested using generated data in deductive methods.
59
Systolic blood pressure is known to be normally distributed across the population with a mean of 120 mmHg and standard deviation of 10 mmHg. How many out of 100 patients in a population will have systolic blood pressure between 120 and 130 mmHg? A. 68 B. 97 C. 48 D. 17 E. 34
E. In the above question, the mean is given as 120 mmHg. Assuming normal distribution with a standard deviation of 10 mmHg, we can find out the proportion of the population that will fall between two observed values. For values between –1 and +1 standard deviation from the mean, this will be nearly 68%. Nearly 34% will have values between the mean and 1 standard deviation. In other words 34% will have systolic blood pressure between 120 mmHg and 130 mmHg.
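The 34% figure can be confirmed with the cumulative distribution function of a normal distribution. A minimal sketch using `statistics.NormalDist` from Python's standard library, with the mean and SD taken from the question:

```python
from statistics import NormalDist

# Systolic blood pressure modelled as normal with mean 120 and SD 10 (from the question).
sbp = NormalDist(mu=120, sigma=10)

# Proportion of the population between the mean and +1 SD.
proportion = sbp.cdf(130) - sbp.cdf(120)

print(round(100 * proportion))  # 34
```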
60
In which of the following situations is intention to treat analysis deliberately not attempted even if there are significant numbers of drop-outs? A. A study that analyses the efficacy of an intervention itself B. A study that analyses the effectiveness of providing an intervention C. A study that compares two interventions for economic efficiency D. A study that compares an established standard treatment against a new treatment with the view of replacing the standard E. None of the above
A. Significant numbers of subjects recruited for trials often do not complete the trial as per protocol. The data generated from such drop-outs cannot be ignored as this will potentially lead to an attrition bias in favour of the intervention generally. Therefore, it is a standard practice to analyse the results of trials on an ‘intention to treat’ basis, i.e. data from subjects are analysed as per initial allocation irrespective of trial completion. In a few situations such as ‘efficacy studies’, intention to treat analysis is not used; instead ‘per-protocol analysis’ is carried out. An efficacy study is designed to explain the effects of the intervention itself. This is in contrast to effectiveness studies, which are designed to study the usefulness of making an intervention available (choices B, C and D).
61
A 24-week RCT of memantine in moderate–severe Alzheimer’s dementia was reported. The investigators recruited 126 subjects for the memantine arm and 126 for the placebo arm, out of which 100 in the memantine group and 100 in the placebo group completed the study. Using a categorical measure of treatment response it was shown that 40% of the patients in the memantine group responded while only 20% in placebo group showed a response. Calculate the relative risk reduction of using memantine A. 20 B. 5 C. 2 D. 1 E. 10
D. Relative risk reduction (or relative benefit increase) is calculated using the following expression: relative risk reduction = absolute risk reduction/control event rate (RRR = ARR/CER). The control event rate is 20%; the experimental event rate is 40%. Absolute risk reduction is the difference between the two event rates, i.e. 40 – 20 = 20%. RRR = 20/20 = 1.
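The same arithmetic as a short sketch. The function name is illustrative, and the absolute difference is used because the ‘event’ here is a beneficial response, so the experimental rate exceeds the control rate.

```python
def risk_reduction(control_rate, experimental_rate):
    """Return (absolute difference in event rates, that difference relative to control)."""
    arr = abs(experimental_rate - control_rate)
    rrr = arr / control_rate
    return arr, rrr


# Response rates from the question: 20% on placebo, 40% on memantine.
arr, rrr = risk_reduction(0.20, 0.40)

print(round(arr, 2), round(rrr, 2))  # 0.2 1.0
```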
62
Using the above study results calculate the number needed to treat (NNT) for patients receiving memantine compared with placebo A. 20 B. 5 C. 2 D. 10 E. 7
B. The NNT can be calculated from the absolute risk reduction (ARR): NNT = 1/ARR = 1/0.2 = 5. Five subjects must be treated with memantine to have one additional response.
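A minimal sketch of the NNT calculation, using exact fractions to avoid floating-point surprises. Rounding up to the next whole patient is a common convention and is an assumption here; the function name is illustrative.

```python
from fractions import Fraction
from math import ceil


def number_needed_to_treat(control_events, control_n, treated_events, treated_n):
    """NNT = 1 / absolute difference in event rates, rounded up to a whole patient."""
    arr = abs(Fraction(treated_events, treated_n) - Fraction(control_events, control_n))
    return ceil(1 / arr)  # raises ZeroDivisionError if the two rates are identical


# 20% response on placebo versus 40% on memantine, as in the question.
nnt = number_needed_to_treat(20, 100, 40, 100)
print(nnt)  # 5
```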
63
If the above study used a per protocol analysis of primary outcome, the odds ratio of having a response is A. 2.7 B. 7.2 C. 0.16 D. 6 E. 0.37
A. To calculate the odds ratio, it is useful to construct a 2 × 2 table. As per-protocol analysis is used, only those who completed the trial are included in the analysis: of the 100 completers in the memantine group, 40 responded and 60 did not; of the 100 completers in the placebo group, 20 responded and 80 did not. The odds ratio is obtained using the cross-product ratio ad/bc = (40 × 80)/(60 × 20) = 8/3 ≈ 2.7.
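The cross-product calculation can be sketched as follows, assuming the per-protocol counts implied by the question (100 completers per arm, 40 responders on memantine and 20 on placebo). The cell labels a to d are the conventional 2 × 2 labels, and the function name is illustrative.

```python
def odds_ratio_2x2(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 table.

    a: treated with event      b: treated without event
    c: control with event      d: control without event
    """
    return (a * d) / (b * c)


# Memantine: 40 responders, 60 non-responders. Placebo: 20 responders, 80 non-responders.
or_per_protocol = odds_ratio_2x2(40, 60, 20, 80)
print(round(or_per_protocol, 1))  # 2.7
```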
64
Which one of the following measures is used in correlation analysis for non-parametric data? A. Kappa statistics B. Pearson’s correlation C. Spearman’s correlation D. Cohen’s d E. Cronbach’s alpha
C. Spearman’s correlation is used for non-parametric correlation analysis. It is also called the rank correlation test. It can be used when one or both variables to be correlated consist of ranks (ordinal) or if they exist as quantitative data but do not have normal distribution. Pearson’s correlation is used for parametric correlation. Kappa is a measure of agreement not correlation. Cohen’s d is used to calculate effect size. The internal consistency of an instrument is tested using Cronbach’s alpha.
65
Parametric statistical methods make assumptions, which when satisfied make the final estimate precise and accurate. Which of the following is one such parametric assumption? A. The distribution of observations in the population is not known B. The variances of the compared samples are homogeneous C. The analysed variables are categorical measures D. Outliers are unequally distributed E. The sample size is at least 2% of the size of target population
B. To enable use of inferential statistics, standard sampling assumptions such as (1) the randomness of the sampled data and (2) the independent nature of the observations must be met. In addition, when parametric statistics are employed, the following assumptions must be met: (a) homogeneity of variance of the samples; (b) observations are obtained from continuous (interval/ratio) scales; (c) normal distribution of the observed variable. There is no set proportion of population size that must constitute the sample size in order to use parametric statistics. But in samples that are too small the distribution may not be normal and the central limit theorem may not be applicable. In conditions where such assumptions are not met non-parametric statistics are used. The latter are often considered to be less robust.
66
In a study comparing drug A and a placebo control, 20 out of 200 patients taking drug A die after 3 years. Twenty-five out of 225 patients taking the placebo die after 3 years. If death is the outcome of interest, the control event rate is given by A. 25/225 B. 20/200 C. (25 – 20)/200 D. (25 – 20)/225 E. 25
A. Drawing a 2 × 2 table will help in answering this question. The control event rate is the rate of death (the ‘event’ of interest) in the control group = 25/225.
67
In an RCT comparing the effect of exposure therapy versus cognitive restructuring, follow-up was carried out at 6, 11, 24, and 36 weeks. At weeks 6 and 11, after rating the patient, the outcome assessors tried to guess the treatment condition. Correctness of guesses did not differ significantly from that expected by chance. This was an attempt to demonstrate which of the following? A. Adequacy of randomization B. Concealment of allocation C. Blindness of assessor D. Blindness of patient E. Matching of two groups
C. Adequacy of blinding can be tested during or after completing a trial by asking the blinded parties to guess the allocation. Guess rates that are significantly higher than expected by chance indicate failure of blinding. Testing for ‘blindness’ may not generate valid answers all the time. This is because as participants begin to experience treatment response or outcomes of interest, they begin to generate ‘hunches’ about the efficacy of the treatments being tested. Hence tests for blinding can show spurious failure of blinding while in fact they test the ‘efficacy hunches’ that develop late in the process of a trial.
68
``` If the sample size is sufficiently large, mean values of repeated observations follow normal distribution irrespective of the distribution of original data in the population. This is known as A. Bayesian theorem B. Central limit theorem C. Bonferroni correction D. Transformation theorem E. Independent observations theorem ```
B. The central limit theorem explains why normal distributions are so frequent when considering most biological parameters. Consider repeated sampling from a population where the distribution of the observed variable is unknown. You intend to plot the distribution of the individual means of each sample from the population. As sample size increases, the sample means approach a normal distribution with its mean value being the same as the population mean and a standard deviation equal to the standard deviation of the population divided by the square root of the sample size. Usually 10 or more observations are sufficient to result in an approximate normal distribution.
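The behaviour described above can be illustrated with a small simulation. This is only a sketch: the skewed exponential population, the sample size of 30, the 5000 repetitions and the fixed seed are all assumptions chosen for illustration, not values from the question.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # observations per sample (illustrative choice)
num_samples = 5000       # number of repeated samples (illustrative choice)

# Exponential data are strongly skewed; with rate 1 the population
# mean is 1 and the population standard deviation is 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / n ** 0.5   # population SD / sqrt(sample size)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The mean of the sample means should sit close to the population mean, and their standard deviation close to the population SD divided by the square root of the sample size.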
69
The validity of a new instrument is compared with an external criterion. A conceptually related external criterion is identified to occur sometime in the future. If the correlation between current scores obtained using the instrument and the future expected outcome is studied, this is called A. Concurrent validity B. Incremental validity C. Predictive validity D. Inter-rater reliability E. Internal consistency
C. The term validity refers to the strength of our conclusions, or in the case of statistics, the strength of our inferences. It refers to applicability. The term reliability refers to the consistency of our measurements, or the reproducibility. An important subtype of validity is called criterion validity. If an instrument provides a result that withstands the test of an external criterion then the instrument is said to have high criterion validity. The external criterion may be a measurement that can be obtained more or less at the same time (concurrent validity) or it may be an outcome that is predicted to occur in the future (predictive validity). If a test offers something over and above what is offered by an existing instrument, then incremental validity can be established. Internal consistency of a test refers to looking at how consistent the results are for different items (measuring the same construct) within the instrument studied. This can be measured by undertaking item–item correlation, item–total score correlation or split half reliability (Cronbach’s alpha; see elsewhere in this chapter).
70
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The sensitivity of the overall screen using both items is approximately A. 100% B. 25% C. 67% D. 76% E. 91% ```
E. Questions similar to this are very common in the MRCPsych exam. Most such questions provide some data and require the candidate to do a series of calculations from the data. It is always advisable to redraw the presented data as soon as possible in a format that will fit the purpose. From the given table we can create a 2 × 2 table. One should be careful while constructing the 2 × 2 table; it is advisable to stick to one style of using columns and rows to indicate a particular group of data. Here, the 2 × 2 table has the gold standard results across the two data columns and the screening test results across the two rows, so that A = 39 (true positives), B = 40 (false positives), C = 4 (false negatives) and D = 84 (true negatives). Sensitivity is defined as the test’s ability to identify people who, according to the diagnostic (gold) standard, actually have the disorder (true positives). Sensitivity = A/(A + C) = 39/43 = 90.7%, i.e. 90.7% of subjects who really have depression according to DSM-IV criteria have a positive result on the screening test. In other words, sensitivity is the proportion of true positives (cases) correctly identified by the test.
71
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) The specificity of the overall screen is approximately A. 67% B. 95% C. 38% D. 25% E. 91%
A. Specificity is defined as the test’s ability to exclude people who, according to the diagnostic (gold) standard, do not actually have the disorder (true negatives). Specificity = D/(B + D) = 84/124 = 67.74%, i.e. 67.74% of the people who do not have depression will have a negative result on the two-question screen. Thus specificity is the proportion of true negatives among all non-diseased individuals. In other words, it is the ability of a test to rule out the disorder among people who do not have it.
72
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The predictive power of a positive test using the overall screen is A. 49% B. 91% C. 67% D. 25% E. 95% ```
A. Not all of the people who have been found to be ‘positive’ on the test actually have the disorder. Positive predictive value (PPV) gives the proportion of true positives among the test positives. It is calculated using the formula PPV = A/(A + B) = 39/79 = 49.4%, i.e. 49.4% of people who screen positive for depression actually have the illness.
73
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The predictive power of a negative test using the overall two-item screen is given by A. 49% B. 91% C. 67% D. 25% E. 95% ```
E. Not all of the people who have been found to be ‘negative’ on the test might actually be disease free. Negative predictive value (NPV) answers the question ‘Of those people who have been found to be ‘disease negative’ on the test, how many actually do not have the disorder?’ It is calculated using the formula, NPV = D/(C + D) = 84/88 = 95.45%, i.e. 95.45% of people diagnosed ‘normal’ on the test don’t have the disorder.
74
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The pretest probability of the overall two-item screen is A. 49% B. 91% C. 67% D. 25% E. 95% ```
D. The prevalence, also known as the pretest probability or base rate, refers to the proportion of people who have the disorder = (A + C)/N, i.e. 43/167 = 25.7%.
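The measures worked through in the preceding answers (sensitivity, specificity, PPV, NPV and prevalence) can all be recomputed from the four cell counts. The sketch below assumes the cell labelling used in the answers: A and B are the screen-positive counts, C and D the screen-negative counts, with the gold standard across the columns.

```python
# 2 x 2 cell counts read from the table in the question.
A, B = 39, 40   # screen positive: depressed, not depressed
C, D = 4, 84    # screen negative: depressed, not depressed
N = A + B + C + D

sensitivity = A / (A + C)   # true positives among the depressed
specificity = D / (B + D)   # true negatives among the non-depressed
ppv = A / (A + B)           # depressed among the screen positives
npv = D / (C + D)           # non-depressed among the screen negatives
prevalence = (A + C) / N    # pre-test probability

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
    ("prevalence", prevalence),
]:
    print(f"{name}: {value:.1%}")
```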
75
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The likelihood ratio of a positive test for the overall two-item screen is A. 2.8 B. 4.8 C. 6.8 D. 8.8 E. 10.8 ```
A. PPV and NPV depend on the prevalence of the illness, and the prevalence of an illness varies according to the population tested. For example, the prevalence of depression is likely to be higher in patients in a palliative care unit. Since the prevalence, and hence the PPV and NPV, changes with the population, one way of summarizing the findings of a study of a diagnostic test across settings with different prevalence is to use the likelihood ratio. The likelihood ratio for a positive test (LR+) is the likelihood that a positive test comes from a person with the disorder rather than one without the disorder. LR+ is calculated using the formula LR+ve = [A/(A + C)]/[B/(B + D)], or simply LR+ve = sensitivity/(1 – specificity). So, (39/43)/(40/124) = 0.90/0.322 = 2.8. Since the specificity and sensitivity of a test are considered to be constant for any particular test, the LR is also constant irrespective of prevalence rates.
76
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` The likelihood ratio of a negative test (LR–) for the overall two-item screen is A. 0.14 B. 0.34 C. 0.54 D. 0.74 E. 0.94 ```
A. The LR– represents the likelihood that a negative test comes from a person with the disorder rather than one without the disorder. LR– is calculated using the formula LR–ve = [C/(A + C)]/[D/(B + D)], or simply LR–ve = (1 – sensitivity)/specificity. So, (4/43)/(84/124) = 0.10/0.67 = 0.14. Similar to LR+ve, LR–ve is also constant irrespective of prevalence rates.
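Both likelihood ratios follow directly from sensitivity and specificity. A minimal sketch using the same cell counts; the variable names are illustrative.

```python
# Cell counts from the two-item screen (gold standard across the columns).
A, B, C, D = 39, 40, 4, 84

sensitivity = A / (A + C)
specificity = D / (B + D)

lr_pos = sensitivity / (1 - specificity)   # LR for a positive test
lr_neg = (1 - sensitivity) / specificity   # LR for a negative test

print(round(lr_pos, 1), round(lr_neg, 2))
```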
77
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) Using the nomogram below, calculate the post-test probability of a positive test when using the two-item depression screening test in the palliative care unit, using the figures indicated at the beginning of Question 70 A. 1 B. 2 C. 4 D. 10 E. 50
E. The post-test probability is the probability that a patient scoring positive on the test actually has the disorder (PPV). It can be calculated using the nomogram that is provided. Since we know the pre-test probability (prevalence) and the likelihood ratio, we should be able to find the post-test probability from the chart. A straight line drawn through the pre-test probability (25) and the likelihood ratio +ve (2.8) should yield a post-test probability of about 50.
78
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` Using the nomogram in Question 77, calculate the post-test probability of a negative test when using the two-item depression screening test in the palliative care unit A. 1 B. 4 C. 10 D. 50 E. 80 ```
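B. In this case, since the question is about the post-test probability of a negative test, the likelihood ratio –ve (0.14) is used. A straight line drawn through the pre-test probability (25) and the likelihood ratio –ve (0.14) passes through a post-test probability of about 4.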
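When no nomogram is to hand, the same reading can be obtained arithmetically by converting the pre-test probability to odds, multiplying by the likelihood ratio and converting back. The helper name `post_test_probability` is an assumption made for this sketch.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back to probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 43 / 167                 # pre-test probability in the unit
lr_pos = (39 / 43) / (40 / 124)       # sensitivity / (1 - specificity)
lr_neg = (4 / 43) / (84 / 124)        # (1 - sensitivity) / specificity

after_positive = post_test_probability(prevalence, lr_pos)
after_negative = post_test_probability(prevalence, lr_neg)
print(f"{after_positive:.1%} {after_negative:.1%}")
```

The value after a positive test equals the PPV (39/79), and the value after a negative test equals 1 – NPV (4/88), which is why the nomogram readings are about 50 and about 4.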
79
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) What is the false positive rate for the overall two-item screening test? A. 32% B. 9% C. 90% D. 67% E. 25%
A. The false positive (FP) rate is the proportion of people without the condition according to the gold standard who are nevertheless diagnosed with it by the new test; in this case, the percentage of non-depressed people falsely identified by the test as depressed. Using the 2 × 2 table, FP rate = B/(B + D) = 40/124 = 32%.
80
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) ``` What is the false negative rate for the overall two-item screening test? A. 32% B. 9% C. 90% D. 67% E. 25% ```
B. The false negative (FN) rate is the proportion of people with the condition according to the gold standard who are not diagnosed with it by the new test; in this case, the percentage of people in the depressed group falsely identified by the test as not depressed, i.e. C/(A + C) = 4/43 = 9.3%.
81
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) Taking into consideration the above screening test, we randomly pick 1000 people from the general population. Considering the prevalence of a major depressive disorder using DSM-IV in the general population as 10%, calculate the positive predictive value of the 2-item screening test in the population? A. 49% B. 91% C. 67% D. 31% E. 95%
D. In Question 75, we discussed how the prevalence of a condition can vary according to the population tested. Using the same screening test for depression in a general population sample of 1000 subjects (N), we are asked to calculate the positive predictive value. We need to make a fresh 2 × 2 table, on the assumption that sensitivity and specificity remain constant for the test. The prevalence (pre-test probability) is (A + C)/N = 10%; as N = 1000, A + C = 100 and B + D = 900. Sensitivity = A/(A + C) = A/100 = 0.91, so A = 91 and C = 9. Specificity = D/(B + D) = D/900 = 0.677, so D = 610 and B = 290. Using the formula for positive predictive value, PPV = A/(A + B) = 91/(91 + 290) = 91/381 = 24%. The keyed figure of 31% corresponds to 91/290, which divides by the false positives (B) alone rather than by all test positives (A + B); option D remains the closest of the choices offered.
82
A recent study conducted in a palliative care unit assessed the use of a two-item questionnaire to screen for the presence of depression. Given below is the table which compares the result of the screen to the gold standard (DSM-IV) diagnosis. In relation to this table, answer questions 70–82 Depressed- +ve 39, -ve 4 (43) Not depressed- +ve 40, -ve 84 (124) Taking into consideration the above screening test, we randomly pick 1000 people from the general population. Considering the prevalence of a major depressive disorder in the general population using DSM-IV as 10%, calculate the new negative predictive value of the two-item screening test in the population? A. 49% B. 91% C. 67% D. 30% E. 98%
E. See the table in Answer 81 (C = 9, D = 610). Using the formula for negative predictive value, NPV = D/(C + D) = 610/619 = 98.5%. Note that the same answer can be derived using pretest odds and likelihood ratios. Please see question 6.
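The fresh 2 × 2 table for the general population can be rebuilt in a few lines. This sketch assumes, as the answers do, that sensitivity and specificity carry over unchanged from the palliative care sample; the function name and the default of 1000 subjects are illustrative. Note that the test positives are A + B (true plus false positives), which gives a PPV of about 24% and an NPV of about 98.5%.

```python
def predictive_values(sensitivity, specificity, prevalence, n=1000):
    """Rebuild the 2 x 2 table for a new prevalence and return (PPV, NPV)."""
    diseased = prevalence * n
    healthy = n - diseased
    true_pos = sensitivity * diseased     # cell A
    false_neg = diseased - true_pos       # cell C
    true_neg = specificity * healthy      # cell D
    false_pos = healthy - true_neg        # cell B
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(
    sensitivity=39 / 43, specificity=84 / 124, prevalence=0.10
)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```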
83
The table below shows the adverse events reported during an RCT on sertraline for the prevention of relapse in detoxicated alcohol-dependent patients with a comorbid depressive disorder. Answer Questions 83–86 based on the data presented in the table ``` What proportion of patients develops dyspepsia after exposure to the sertraline? (6/44) A. 13.6% B. 5% C. 8.6% D. 63.2% E. 90.2% ```
A. This question looks at the chances of developing dyspepsia with sertraline. It is otherwise called the ‘experimental event rate’ (EER). This is calculated as A/(A + B); that is, 6/44 = 0.136 or 13.6%. Similarly, the chance of developing dyspepsia with placebo, or the ‘control event rate’ (CER), is C/(C + D), or 2/39 = 0.05 or 5%.
84
The table below shows the adverse events reported during an RCT on sertraline for the prevention of relapse in detoxicated alcohol-dependent patients with a co-morbid depressive disorder. Answer Questions 83–86 based on the data presented in the table ``` What proportion of dyspepsia will be eliminated if sertraline was not administered? (6/44 sert) (6/88) A. 13.6% B. 5% C. 8.6% D. 63.2% E. 90.2% ```
C. This is otherwise called the ‘attributable risk’ or the ‘risk difference’ or ‘absolute risk reduction’ (ARR). It is calculated as the difference in the absolute risks of developing dyspepsia between sertraline and placebo, that is 13.6 – 5 = 8.6%.
85
The table below shows the adverse events reported during an RCT on sertraline for the prevention of relapse in detoxicated alcohol-dependent patients with a comorbid depressive disorder. Answer Questions 83–86 based on the data presented in the table ``` How many times is a person on sertraline more likely to develop dyspepsia than a person on placebo? A. 1.7 B. 2.7 C. 3.7 D. 4.7 E. 5.7 ```
B. This question asks for the ‘relative risk’ or ‘risk ratio’ of dyspepsia with sertraline. It is an estimate of how much greater the risk of developing dyspepsia is with sertraline than with placebo. It is the ratio of the absolute risks or ratio of event rates, i.e. EER/CER = 13.6/5 = 2.7. This means that the risk of dyspepsia with sertraline is 2.7 times that of placebo. If there is no difference between sertraline and placebo, the relative risk would be 1. Expressed otherwise, relative risk values that are more than 1.0 represent increases in risk. Relative risk values that are less than 1.0 represent decreases in risk. If 95% confidence intervals are given, and if the range includes the value 1, then the elevation in risk can be considered statistically insignificant. The relative risk is used as a primary summary measure in RCTs and cohort studies. Remember: RR runs from exposure to outcome.
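The event rates, risk difference and relative risk for the sertraline table can be recomputed as below. The cell counts are those used in the answers (6 of 44 on sertraline and 2 of 39 on placebo developed dyspepsia); the variable names are illustrative.

```python
A, B = 6, 38   # sertraline: dyspepsia, no dyspepsia
C, D = 2, 37   # placebo:    dyspepsia, no dyspepsia

eer = A / (A + B)             # experimental event rate
cer = C / (C + D)             # control event rate
risk_difference = eer - cer   # attributable risk
relative_risk = eer / cer     # risk ratio

print(f"EER {eer:.1%}, CER {cer:.1%}")
print(f"difference {risk_difference:.1%}, RR {relative_risk:.1f}")
```

With unrounded rates the difference comes to about 8.5%; the 8.6% in the answer arises from rounding the CER to 5% before subtracting.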
86
The table below shows the adverse events reported during an RCT on sertraline for the prevention of relapse in detoxicated alcohol-dependent patients with a comorbid depressive disorder. Answer Questions 83–86 based on the data presented in the table ``` How many times are the odds of being dyspeptic on sertraline higher than the odds of being dyspeptic on placebo? A. 1.9 B. 2.9 C. 3.9 D. 4.9 E. 5.9 ```
B. This question looks at the odds ratio. It is an estimate of how many times more likely it was that a person who experienced a problem (dyspepsia) was exposed to the supposed cause (risk factor) than was a control subject (a person who did not experience the problem). Let us consider the data in the table in a different way: the number of people who developed dyspepsia is 8 and the number who did not develop dyspepsia is 75. The ‘odds’ of an event happening is the ratio of the probability of its occurrence to the probability of its non-occurrence. So in patients with dyspepsia, the probability of being on sertraline is A/(A + C) = 6/8 = 0.75. The probability of being on a placebo is C/(A + C) = 2/8 = 0.25. Therefore the odds of a person with dyspepsia being on sertraline are 0.75/0.25 = 3, or simply A/C. Similarly, we can calculate the odds of a person without dyspepsia being on sertraline. They are B/D = 38/37 = 1.03, i.e. the odds of having used sertraline in those who did not have dyspepsia are 1.03. The ratio of these odds is simply called the odds ratio. The ratio = (A/C)/(B/D) or AD/BC. That is, 3/1.03 or (6 × 37)/(2 × 38) = 222/76 = 2.92. The odds ratio is interpreted in a manner more or less similar to the relative risk. Confidence intervals are provided and interpreted in the same manner. Odds ratios are usually used in case control studies and in meta-analyses as primary summary measures. Remember: OR runs from outcome to exposure.
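The two routes to the odds ratio described above (the ratio of the two odds, and the cross-product AD/BC) give the same figure, as this short sketch shows. Variable names are illustrative.

```python
A, B = 6, 38   # sertraline: dyspepsia, no dyspepsia
C, D = 2, 37   # placebo:    dyspepsia, no dyspepsia

odds_exposed_given_dyspepsia = A / C      # 6/2
odds_exposed_given_no_dyspepsia = B / D   # 38/37

ratio_of_odds = odds_exposed_given_dyspepsia / odds_exposed_given_no_dyspepsia
cross_product = (A * D) / (B * C)         # AD/BC

print(round(ratio_of_odds, 2), round(cross_product, 2))
```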
87
The finding of a hypothetical cost-effectiveness analysis of a new model of psychotherapy in depression is shown in the table below AD- £5000, effect 45 weeks Psychotherapy (new) £10,000, effect 50 weeks Calculate the average cost-effectiveness ratio (ACER) for the new treatment. A. £200/week B. £100/week C. £50/week D. £111/week E. £20/week
A. As cost-effectiveness analysis has been applied to healthcare, researchers have predominantly used two methods of calculating the summary measure: the average cost-effectiveness ratio (ACER) and the incremental cost-effectiveness ratio (ICER). The ACER captures the average cost per effect, i.e. cost of treatment/effect of treatment. In this case, the cost of the new psychotherapy is £10,000 and the effect is 50 depression-free weeks, so the ACER for the new treatment (psychotherapy) is C/E = 10,000/50 = £200 per depression-free week. The ACER for antidepressants from the question is 5000/45 = £111 per depression-free week.
88
The finding of a hypothetical cost-effectiveness analysis of a new model of psychotherapy in depression is shown in the table below AD- £5000, effect 45 weeks Psychotherapy (new) £10,000, effect 50 weeks Calculate the incremental cost-effectiveness ratio (ICER) for the new treatment A. £1000 per additional depression-free week B. £200 per additional depression-free week C. £111 per additional depression-free week D. £89 per additional depression-free week E. £600 per additional depression-free week
A. In contrast to the ACER, the ICER reports the ratio of the change in cost to the change in effect, ΔC/ΔE. In plain language, this is the extra cost per extra effect. From the question, we can see ΔC = 10,000 – 5000 = 5000; ΔE = 50 – 45 = 5 weeks. So, ΔC/ΔE = 5000/5 = £1000. This means that, compared with antidepressants, the new treatment costs an average of £1000 more per added depression-free week. In many economic evaluations, the ICER indicates that a new treatment is relatively more costly (ΔC > 0) and relatively more effective (ΔE > 0) than usual care, as in the situation in the question. It is then for the decision makers to decide whether this additional money is worth spending.
89
The finding of a hypothetical cost-effectiveness analysis of a new model of psychotherapy in depression is shown in the table below AD- £5000, effect 45 weeks Psychotherapy (new) £10,000, effect 50 weeks What is the incremental net benefit (INB) if the health commissioners are willing to pay around £1500 per additional depression-free week? A. £500 B. £1000 C. £2500 D. 5 weeks E. 1 week
C. An INB calculation determines whether the net benefit of a new treatment outdoes that of usual care. In our case, the net benefit of psychotherapy surpasses the benefit of using antidepressants. In general, the INB is calculated by valuing ΔE in pounds and then subtracting the associated ΔC. This is where society’s willingness to pay for an additional depression-free week comes into play. INB is calculated using the formula (ΔE × λ) – ΔC, where λ is society’s willingness to pay for a 1-unit gain of effect. In our question, ΔE = 5 weeks, the commissioners are willing to pay around £1500 for each depression-free week (λ, the willingness to pay) and ΔC is £5000. So, INB = (5 × 1500) – 5000 = 7500 – 5000 = £2500. The INB equation computes the net value of patient outcome gained in pounds. When the INB is positive, the value of a new treatment’s extra benefits (ΔE × λ) outweighs its extra costs (ΔC). In short, society values the extra effect more than the extra cost (i.e. ΔE × λ > ΔC). Conversely, when the INB is less than 0, society (or your health service management) does not consider the extra benefit worth the extra cost.
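The three summary measures from the last three answers (ACER, ICER and INB) can be computed together. The sketch uses the costs and effects from the question and the £1500 willingness to pay; the variable names are illustrative.

```python
cost_ad, effect_ad = 5000, 45       # antidepressants: cost (GBP), depression-free weeks
cost_new, effect_new = 10000, 50    # new psychotherapy
willingness_to_pay = 1500           # lambda, GBP per extra depression-free week

acer_new = cost_new / effect_new    # average cost per depression-free week
acer_ad = cost_ad / effect_ad

delta_cost = cost_new - cost_ad
delta_effect = effect_new - effect_ad
icer = delta_cost / delta_effect                      # extra cost per extra week
inb = delta_effect * willingness_to_pay - delta_cost  # incremental net benefit

print(acer_new, round(acer_ad), icer, inb)
```

A positive INB at a given willingness to pay is equivalent to the ICER lying below that willingness to pay, provided the new treatment is more effective (ΔE > 0).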
90
The finding of a hypothetical cost-effectiveness analysis of a new model of psychotherapy in depression is shown in the table below AD- £5000, effect 45 weeks Psychotherapy £10,000, effect 50 weeks After critically appraising the above cost-effectiveness analysis paper, managers of an NHS foundation trust decide to choose psychotherapy over antidepressants as the first-line management for depression. Which of the following statements best defines the opportunity costs? A. The original cost incurred while providing psychotherapy as the first choice treatment B. The cost of providing psychotherapy instead of prescribing antidepressant drugs for depression C. The apparent cost of not providing antidepressants as the first choice of treatment. D. The cost of using antidepressants in the absence of psychotherapy for depression. E. The cost of conducting this trial in order to make treatment recommendations
C. Resources are scarce relative to needs. The use of resources in one way prevents their use in other ways. For example, if a city council decides to build a hospital on a huge piece of vacant land in the middle of the city, the city forgoes the opportunity to benefit from the next best alternative, such as selling the land to decrease the current debt or building a shopping mall that would generate additional income for the council. Opportunity cost is assessed not just in monetary or material terms, but in terms of anything which is of value. The opportunity cost of investing in a healthcare intervention is best measured by the health benefits that could have been achieved had the money been spent on the next best alternative intervention. In this example the cost of not providing the ‘next best alternative’, antidepressant therapy, is the opportunity cost of providing psychotherapy as the first choice treatment.
91
The finding of a hypothetical cost-effectiveness analysis of a new model of psychotherapy in depression is shown in the table below AD- £5000, effect 45 weeks Psychotherapy £10,000, effect 50 weeks The given cost-effectiveness acceptability curve (CEAC) is drawn using the data from the hypothetical study on treatment of depression. What is the probability of cost-effectiveness if society is willing to pay £150 for every depression-free day? A. >90% B. 75% C. 50% D. 25% E. <10%
A. How does a decision maker decide on the willingness to pay (λ)? The net benefit approach forces decision makers to directly consider the issue of valuing additional patient outcomes. The INB can be computed with various values of λ and analysed using multiple regression techniques. How sensitive the results are to the assumed λ value can be gauged using a cost-effectiveness acceptability curve (CEAC). The CEAC shows the probability that a new treatment is cost-effective for different values of λ. So in the given question, if λ is £150, the probability of it being cost-effective is >90%. But if λ is £10, the probability is less than 25%. At the same time, the probability of cost-effectiveness is also >90% if λ is £100. So, it would be sensible for the decision maker to pay £100 for every depression-free day, rather than £150.
92
12 being the highest degree of depression) was developed to screen for depression in a population of patients with dementia. The scale was tested against the gold standard of DSM-IV in a small study. The neurologists using the test wanted a score that would identify a depressed person from a non-depressed based on this instrument. A statistician involved in the development of this instrument mailed the following graph to the neurologists. Answer Questions 96–99 based on the graph below ``` What is the above graph called? A. Scatter plot B. Funnel plot C. Receiver operator characteristics curve D. Galbraith plot E. Forest plot ```
C. This is a receiver operating characteristic (ROC) curve. Scores on scales are usually considered to be continuous variables. Although dichotomizing continuous data leads to loss of information, in clinical practice it makes sense to deal with dichotomous variables. For instance, with the new scale in the question, it would make sense if we could differentiate a depressed patient from a non-depressed patient, rather than just saying patient A had a greater score than patient B. In this situation, we should know where the ideal cut-off for the scale is. However, because the distributions of the scores in these two groups most often overlap, any cut-off point that is chosen will result in two types of errors: false negatives (that is, depressed cases judged to be normal) and false positives (that is, normal cases judged to be depressed). Changing the cut-off point will change the numbers of wrong judgements but will not eliminate the problem. The cut-off point also depends on whether we want the test to be more sensitive (as in a screening test) or more specific (as in diagnostic tests). The ROC curve helps us to determine the ability of a test to discriminate between groups and to choose the optimal cut-off point.
93
What does 1 – specificity represent? A. False-positive rate B. False-negative rate C. True-positive rate D. True-negative rate E. None of the above
A. The test in question is a 12-item scale that has a potential score ranging from 1 to 12. The sensitivity and specificity of each cut-off score (in this case, there will be 11 possible cut-off scores, as shown in the figure) are calculated with reference to the gold standard used to diagnose depression (in this case, DSM-IV). These pairs of values are plotted, with (1 – specificity) on the x-axis and the sensitivity on the y-axis, yielding the curve in the figure in question. Note that the true positive rate is synonymous with the term sensitivity, the true negative rate is the same as specificity, and the false positive rate means the same as (1 – specificity); they’re simply alternative terms for the same parameters. For simplicity, the graph can be depicted as below
94
What does the dotted line represent? (ROC curve)
A. It is the curve of the test that best discriminates depressed from non-depressed people
B. It is the curve of a test that partially discriminates depressed from non-depressed people
C. It is the curve of a test that does not discriminate depressed from non-depressed people
D. It is the curve representing the application of the current screening instrument to the whole population
E. It is the curve of a test with maximum sensitivity but minimum specificity
C. The dotted line represents a test that is useless in discriminating a depressed from a non-depressed person. A perfect test would run straight up the y-axis until the top and then run horizontally to the right. The more the ROC deviates from the dotted line and tends towards the upper left-hand corner, the better the sensitivity and specificity of the test.
95
Which cut-off point provides the best acceptable combination of sensitivity and specificity?
A. 1/2
B. 8/9
C. 3/4
D. 5/6
E. 6/7
E. From the graph, we can see that the more the ROC curve deviates from the dotted line and tends toward the upper left-hand corner, the better the sensitivity and specificity of the test. Hence it is generally considered that the cut-off point closest to this corner is the one that minimizes the overall number of errors (‘the best trade-off’); in this case, it is 6/7. Since the scale in our question is a screening test for depression, we would want it to be more sensitive rather than specific. As we can see from the figure, a cut-off score of 11/12 would give excellent specificity but very poor sensitivity, thus increasing the false negative rate.
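The "closest to the upper left-hand corner" rule can be sketched in Python. This is only an illustration: the scores are made up (they are not the data behind the figure), the function names are hypothetical, and a cut-off written as 6/7 is assumed to mean that a score above 6 counts as a positive test.

```python
import math

def roc_points(scores, depressed, cutoffs):
    """Return (cutoff, sensitivity, 1 - specificity) for each cut-off.

    A score strictly above the cut-off is treated as a positive test.
    """
    n_pos = sum(depressed)
    n_neg = len(depressed) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for s, d in zip(scores, depressed) if d and s > c)
        fp = sum(1 for s, d in zip(scores, depressed) if not d and s > c)
        points.append((c, tp / n_pos, fp / n_neg))
    return points

def closest_to_corner(points):
    """Pick the ROC point nearest to (0, 1), the upper left-hand corner."""
    return min(points, key=lambda p: math.hypot(p[2], 1 - p[1]))

# Hypothetical scores on a 1-12 scale (assumed for illustration only).
non_depressed = [1, 2, 3, 4, 5, 6, 8]
depressed = [5, 7, 8, 9, 10, 11, 12]
scores = non_depressed + depressed
labels = [0] * len(non_depressed) + [1] * len(depressed)

points = roc_points(scores, labels, cutoffs=range(1, 12))
best = closest_to_corner(points)
print(f"best cut-off: {best[0]}/{best[0] + 1}")
```

Distance to the corner weights false positives and false negatives equally; for a screening test one might deliberately accept a lower cut-off to gain sensitivity.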
96
If the area under the curve (AUC) for the new test was found to be 0.5, what does it mean?
A. The test can discriminate a depressed from a non-depressed person with high accuracy
B. The test can discriminate a depressed from a non-depressed person with moderate accuracy
C. The test cannot discriminate a depressed from a non-depressed person
D. The test is half as good as the gold standard test
E. The test can identify 50% of depressed patients correctly
C. The primary statistical measure obtained from the ROC is the AUC. The AUC of the test can be compared with the AUC of the curve corresponding to the null hypothesis, that is, the curve that would be obtained if the test had no usefulness in discriminating those with the diagnosis from those without. This hypothetical curve has an AUC of 0.50, which corresponds to the area in the graph that falls below the dotted line. The difference between the two AUCs is the area of the graph between the dotted line and the curve. The AUC can be interpreted in another very useful way: it is the probability that the test will show a higher value for a randomly chosen individual with depression than for a randomly chosen individual without depression. That means that if the AUC for this particular test were 0.9 and we took two individuals at random, one with and one without depression, the probability that the first individual would have a higher score than the second is 90%. Fortunately, the AUC, the sensitivities and specificities, and the whole ROC are calculated by statistical software, sparing us the burden.
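The probability interpretation of the AUC can be checked directly by comparing every depressed/non-depressed pair. The sketch below uses a hypothetical function name and made-up scores, and assumes that a tied pair counts as half a "win".

```python
def auc_by_pairs(pos_scores, neg_scores):
    """AUC as the proportion of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Made-up scores: a perfectly separating test and a useless one.
print(auc_by_pairs([9, 10, 11], [1, 2, 3]))   # every pair is ordered correctly
print(auc_by_pairs([1, 2], [1, 2]))           # identical distributions
```

A test whose scores are distributed identically in both groups gives 0.5, matching the area under the dotted line.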
97
What is the name of the graph shown above? (for SRs)
A. Funnel plot
B. Galbraith plot
C. L’Abbé plot
D. Scatter plot
E. Forest plot
E. Meta-analyses are usually displayed in graphical form using forest plots, which present the findings for all studies plus (usually) the combined results. This allows the reader to visualize how much uncertainty there is around the results. The graph in question, modified below, presents a forest plot, sometimes called a ‘blobbogram’, identifying its basic components
98
How many studies in the meta-analysis show statistically significant advantage for the new antidepressant?
A. 1
B. 2
C. 4
D. 6
E. 7
C. As shown in the diagram above, the horizontal lines along with the ‘blobs’ show the 95% confidence intervals of the effect size of each study. If a confidence interval crosses the line of no effect (at 0 in this case), the effect is not statistically significant. Out of the seven studies, the confidence intervals of three trials (1, 2 and 5) cross the line of no effect, and those of four trials (3, 4, 6 and 7) do not. The summary measures in cases of dichotomous variables are usually odds ratios, and the line of no effect in that case will correspond to 1.
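The rule "a confidence interval that crosses the line of no effect is not significant" can be written as a small check. The trial names and intervals below are invented for illustration and do not correspond to the seven trials in the figure.

```python
def crosses_no_effect(ci_lower, ci_upper, no_effect=0.0):
    """True if the confidence interval includes the line of no effect.

    Use no_effect=0.0 for differences and no_effect=1.0 for ratios
    such as odds ratios.
    """
    return ci_lower <= no_effect <= ci_upper

# Hypothetical 95% confidence intervals for a mean difference.
trials = {
    "Trial A": (-0.3, 0.5),
    "Trial B": (0.1, 0.9),
    "Trial C": (0.2, 0.6),
}

significant = [
    name for name, (lower, upper) in trials.items()
    if not crosses_no_effect(lower, upper)
]
print(significant)
```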
99
Which of the trials has the greatest weight on the overall analysis?
A. Trial 1
B. Trial 3
C. Trial 4
D. Trial 6
E. Trial 7
D. The size of the blobs (lozenges) in the blobbogram usually represents the size of the study, or more exactly the proportion of the weight that the study contributes to the combined effect. In this case, the largest blob is that of trial 6
100
In which of the following situations is sensitivity analysis especially recommended while conducting a meta-analysis?
A. Presence of a high degree of homogeneity
B. Any meta-analysis of continuous data
C. Any meta-analysis of economic data
D. Presence of significant publication bias
E. Pooled outcome showing a large effect of intervention
D. A systematic exploration of the uncertainty in the data is known as sensitivity analysis. It is carried out to measure the effects of varying study variables, such as individual sample size, number of positive trials, number of negative trials, etc., on the expected summary outcome measure of a study (often a meta-analysis or economic study). Sensitivity analysis can be undertaken to answer the question, ‘Is the conclusion generated by a meta-analysis affected by the uncertainties in the methods used?’ One such uncertainty is publication bias, so we can use sensitivity analyses to find out the impact of having missed unpublished studies.
101
The number of independent values or quantities which can be assigned to a statistical distribution
degrees of freedom
102
An estimate of the between-study variance
Tau-squared (τ²)
103
broad analysis of continuous, ratio and interval data
generally normally distributed, and therefore, can use parametric tests using mean and SD
104
analysis of ordinal/ranked data (categories with an inherent order, but not quantifiably related)
non-parametric
105
binary/nominal analysis
compare in terms of modal values and frequency counts, however can be easily transformed into single comparative measure (eg odds ratio)
106
ratio
relationship between the numerator and denominator, instances of an observation in any reference group. number from 0 to infinity
107
proportion
type of ratio in which the numerator is included in the denominator, so it can be expressed as a percentage
108
rate
ratio that is, or should be, quoted in reference to time frame
109
point prevalence vs period prevalence, rate vs ratio
point is ratio, period is rate
110
variance
sum of the squared differences of each value from the mean, divided by the degrees of freedom (n − 1)
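As a worked check of the definition, the sketch below computes the sample variance by hand and compares it with Python's standard-library `statistics.variance`. The data values are arbitrary and `sample_variance` is a hypothetical helper name.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # arbitrary example values, mean = 5

print(sample_variance(data))       # hand calculation: 32 / 7
print(statistics.variance(data))   # standard-library equivalent
```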
111
how are skewed or bimodal data described?
by median and interquartile range
112
what does the standard error reflect?
reflects how much the sample mean would be likely to vary between repeated samples drawn from the general population
113
CI for population mean
95% CI = mean ± 1.96 × SE, where SE = SD/√n
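A minimal sketch of the formula in Python, using arbitrary data and a hypothetical helper name. It assumes the normal approximation behind the 1.96 multiplier is acceptable; with a sample this small a t-based multiplier would normally be used instead.

```python
import math
import statistics

def mean_ci_95(values):
    """95% confidence interval for the mean: mean +/- 1.96 * SE."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)   # SE = SD / sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

data = [2, 4, 4, 4, 5, 5, 7, 9]   # arbitrary example values
lower, upper = mean_ci_95(data)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```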
114
probability
likelihood of an event occurring relative to the total number of possibilities
115
Type 1 error
when null hypothesis is falsely rejected (false positive)
116
Type 2 error
when null hypothesis is falsely accepted (false negative)
117
Power
probability of correctly rejecting the null when a true difference exists = 1 − β (β is conventionally set at 0.2, giving a power of 0.8)
118
Effect sizes
difference between 2 group means divided by a standard deviation: the SD in controls (Glass's delta) or the pooled SD of the 2 groups (Cohen's d). The result is a standardised difference, numerically equivalent to z scores.
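Two common standardised effect sizes can be sketched as follows. The naming used here (Cohen's d with a pooled SD, Glass's delta with the control-group SD) follows the usual textbook convention rather than anything specific to this deck, and the data are invented.

```python
import math
import statistics

def cohens_d(group_1, group_2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    v1, v2 = statistics.variance(group_1), statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd

def glass_delta(treated, controls):
    """Mean difference divided by the SD of the control group only."""
    diff = statistics.mean(treated) - statistics.mean(controls)
    return diff / statistics.stdev(controls)

treated = [6, 7, 8, 9, 10]    # hypothetical scores
controls = [4, 5, 6, 7, 8]

print(round(cohens_d(treated, controls), 3))
print(round(glass_delta(treated, controls), 3))
```

The two measures coincide here only because both made-up groups have the same spread.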
119
identifying the truth?
in research, philosophically, one cannot prove anything from empirical observation, but one can disprove falsities. Identifying the truth is therefore achieved by moving further away from error, rather than by discovering truth directly.
120
Differences in means- the t-test
when data are approximately normally distributed and there are two groups = t test. Student's (independent-samples) t test if the subjects are different; paired t test if the observations are of the same group at different time points
121
F statistic
when data are normally distributed in 3+ groups = ANOVA, which is based on variance. F statistic = the variability between the group means compared with the variability around the mean within the groups. ANOVA only tells you whether there are differences between the groups; it doesn't tell you where the differences are.
122
Bonferroni correction
divides the significance level by the number of comparisons, so with 5 comparisons the threshold becomes 0.05/5 = 0.01; minimises Type 1 error when doing multiple significance testing
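The correction is simple enough to sketch in a few lines. The p-values are invented and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

p_values = [0.003, 0.02, 0.04, 0.008, 0.30]   # five hypothetical tests

print(bonferroni_threshold(0.05, len(p_values)))
print(significant_after_bonferroni(p_values))
```

Note that 0.02 and 0.04 would pass an uncorrected 0.05 threshold but fail the corrected one.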
123
Wilcoxon signed-rank
For paired data | Non-normally distributed | Non-parametric
124
Mann Whitney U
Non-parametric | 2 independent groups
125
benefits of parametric
more powerful | calculation easier for CI and more flexible when examining interrelationships between >2 variables
126
when to use chi 2
when comparing two proportions of dichotomous data
127
down side of the flexibility of chi2
Can do 2x2 contingency tables to test the significance of variables, i.e. to see if there is any difference. This is the basis of multivariate statistics and stratified analysis; however, it is very sensitive to sample size
128
measures of association
Odds ratio | Relative risk | Correlation | Regression
129
measure of agreement/concordance
Cohen's kappa: a measure of the reliability of a research assessment. It allows for chance agreement by comparing the actual agreement beyond chance with the potential agreement beyond chance, and is usually expressed as a fraction between 0 and 1 (negative values mean agreement worse than chance)
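A sketch of the kappa calculation from an agreement table, using the standard formula kappa = (observed − expected) / (1 − expected). The counts are invented and `cohens_kappa` is a hypothetical helper name.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of subjects placed in category i by rater 1
    and in category j by rater 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 50 patients as depressed / not depressed.
ratings = [[20, 5],
           [10, 15]]

print(round(cohens_kappa(ratings), 2))
```

Here the raters agree on 70% of patients, but half of that agreement is expected by chance, so kappa is well below 0.7.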
130
reliability between and within raters
Measures stability between raters = inter-rater reliability, and within a rater over time = intra-rater reliability. Use a correlation coefficient or, if 2+ raters at the same time, the intraclass correlation coefficient
131
validity
measuring what it is actually supposed to be measuring
132
difference between odds and probability
in probability, the denominator includes the numerator, whereas in odds it does not: P = O/(1 + O); O = P/(1 − P)
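The two conversions can be sketched directly; the function names are hypothetical and the 0.2 example is arbitrary.

```python
def odds_from_probability(p):
    """O = P / (1 - P); the numerator is excluded from the denominator."""
    return p / (1 - p)

def probability_from_odds(odds):
    """P = O / (1 + O); the numerator is included in the denominator."""
    return odds / (1 + odds)

# A probability of 0.2 (1 in 5) corresponds to odds of 1 to 4.
odds = odds_from_probability(0.2)
print(round(odds, 4))
print(round(probability_from_odds(odds), 4))
```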
133
odds ratio
odds of exposure in cases divided by odds of exposure in controls = (a/c) / (b/d) = ad/bc
where a = number of exposed cases, b = number of exposed non-cases, c = number of unexposed cases, d = number of unexposed non-cases
i.e. the odds of a person with the outcome having the exposure (odds of cases having the risk factor), relative to the same odds in controls
134
relative risk
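in cohort studies: risk of the outcome in the exposed divided by risk of the outcome in the unexposed = [a/(a+b)] / [c/(c+d)], where a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases. OR approximates to RR when the outcome is rare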
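The odds ratio and relative risk can be compared on the same 2x2 table to see the rare-outcome approximation. The sketch assumes the usual labelling (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases); the counts are invented and the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical cohort with a rare outcome: 1% of exposed, 0.5% of unexposed.
a, b, c, d = 10, 990, 5, 995

print(round(relative_risk(a, b, c, d), 3))
print(round(odds_ratio(a, b, c, d), 3))
```

With an outcome this rare the two values differ only in the second decimal place; the gap widens as the outcome becomes more common.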
135
attributable risk
difference between the disease outcome in exposed vs non exposed
136
Correlation coefficient measures
in parametric= Pearson's | in non-parametric= Spearman's
137
when to use multivariate versus logistic regression
multivariate (multiple linear) if the outcome is continuous | logistic if the outcome is binary
138
A systematic review differs from a literature review in that eligibility criteria are developed based on
Population and outcomes of interest, interventions and comparisons.
139
A systematic review of qualitative studies can be undertaken by a
Meta-synthesis
140
Qualitative data reports would NOT include:
A. Analysis by synthesis.
B. Discourse analysis.
C. Interpretive phenomenological analysis (IPA).
D. Thematic analysis.
analysis by synthesis