Basic statistics (MRCP) Flashcards
(140 cards)
A study is evaluating the effect of agomelatine on postnatal depression at
a mother and baby unit. Which one of the following should be considered
when assessing the internal validity of this study?
A. Benefits of agomelatine in major depression outside the postpartum period
B. The degree to which the subjects adhered to the study protocol
C. The cost of using agomelatine compared with standard care
D. Consistency of the reported outcome in comparison with previous studies
E. Benefits of agomelatine in postpartum depression when used at an outpatient service
B. Internal validity is the degree to which a study establishes the cause-and-effect relationship
between the treatment and the observed outcome. External validity is the degree to which
the results of a study become applicable outside the experimental setting in which the study
was conducted. In other words, external validity refers to the generalizability of study results while
internal validity refers to the rigour of the research method. The benefit of agomelatine
in different populations (choices A and E) refers to external validity; the cost of the drug
and consistency of results obtained from different studies are related to applicability of the
intervention in a clinical setting. Assessment of adherence to the study protocol is one of many ways
of analysing the quality of an intervention trial.
A new clinician-administered test for assessing suicidal risk is studied in a
prison population in Canada, where a high suicide rate of 1 in 25 has been
recorded. Which of the following indicates that this test is NOT suitable for
your clinical population?
A. The positive predictive value is 80%
B. The likelihood ratio for a positive test is 14
C. The prevalence of suicide in your clinical sample is 1 in 890
D. The inter-rater reliability (kappa) of the test is 0.8
E. The literacy rate of the prison population is very low but comparable with your clinical
sample
C. Having a high positive predictive value, a likelihood ratio more than 10, and good interrater
reliability as measured by kappa are desirable properties of an instrument. But when the
same instrument is applied to a population with much lower prevalence of suicide (the studied
phenomenon), the post-test probability decreases substantially. Post-test probability is a measure
of positive predictive value in the target population; it depends on pretest probability, i.e. the
prevalence and likelihood ratio.
A new rating scale being evaluated for anxiety has a sensitivity of 80% and
specificity of 90% against the standard ICD-10 diagnosis. The likelihood
ratio of a positive result is
A. Nearly 2
B. Nearly 0.2
C. 0.08
D. 8
E. 0.5
D. The likelihood ratio of a positive test (LR+) is the ratio between the probability of a
positive test in a person with disease and the probability of a positive test in a person without
disease. It can also be expressed as
LR+ = sensitivity/(1 – specificity)
Here, sensitivity = 0.8; specificity = 0.9.
Hence LR+ = 0.8/(1 – 0.9) = 8.
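The LR+ formula above can be checked with a short Python sketch (the helper name is ours, not from any standard library):

```python
# Likelihood ratio of a positive test: how much more likely a positive
# result is in someone with the disease than in someone without it.
def lr_positive(sensitivity, specificity):
    return sensitivity / (1 - specificity)

# Worked example from the question: sensitivity 80%, specificity 90%
print(round(lr_positive(0.8, 0.9), 2))  # 8.0
```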
A pharmaceutical company developed a new antidepressant ‘X’. They
conducted a randomized double-blind placebo controlled trial of the drug.
The study had two arms: an active medication arm and a placebo arm.
Each arm had 100 subjects. Over a 4-week period, a 50% drop in Hamilton
depression scale (HAMD) scores was seen in 40 subjects in the active
medication arm, while a similar drop was seen only in 20 subjects in the
placebo arm. What is the number needed to treat (NNT) from this trial for
the new antidepressant?
A. 1
B. 2
C. 3
D. 4
E. 5
E. The absolute benefit increase (ABI) is 40% – 20% = 20%. NNT = 1/ABI = 100/20 = 5.
During the same placebo controlled trial described in question 4, 20% of
people on X developed active suicidal ideas, while only 10% of patients on
placebo developed the same side-effect. What is the number needed to
harm (NNH) associated with the suicidal ideas from the trial data?
A. 5
B. 10
C. 15
D. 20
E. 25
B. The absolute risk increase for suicidal ideas is 20% – 10% = 10%. NNH = 1/10% = 100/10 = 10.
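NNT and NNH share the same arithmetic: the inverse of the absolute difference in event rates. A minimal sketch (the function name is illustrative):

```python
def number_needed(experimental_rate, control_rate):
    # NNT (for benefits) or NNH (for harms) = 1 / absolute rate difference
    return 1 / abs(experimental_rate - control_rate)

# Question 4: response in 40/100 on drug X vs 20/100 on placebo
print(round(number_needed(0.40, 0.20)))  # 5  -> NNT
# Question 5: suicidal ideas in 20% on X vs 10% on placebo
print(round(number_needed(0.20, 0.10)))  # 10 -> NNH
```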
The prevalence of depression in patients with mild cognitive impairment
is 10%. On applying a depression rating scale with the likelihood ratio of a
positive test (LR+) equal to 10, a patient with mild cognitive impairment
becomes test positive. The probability that this patient is depressed is equal
to
A. 15%
B. 32%
C. 52%
D. 85%
E. 100%
C. This question tests one’s ability to calculate post-test probability from likelihood ratios.
The probability of having a disease after testing positive with a diagnostic test depends on
two factors: (a) the prevalence of the disease, (b) the likelihood of a positive test result using
the instrument. It is important to remember that baseline prevalence of a disease for which a
diagnostic instrument is being tested is taken as the pretest probability.
So pretest probability = 10%
Now, post-test odds = likelihood ratio × pretest odds
From a given probability odds can be calculated using the formula
odds = (probability)/(1 – probability)
Here pretest odds = (10%)/(1 – 10%) = 10/90 = 1/9.
Now post-test odds = likelihood ratio × pretest odds
= 10 × 1/9 = 10/9
Using the formula probability = odds/(1 + odds)
post-test probability = (10/9)/[1 + (10/9)] = 10/19 = 52.6%
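The odds-conversion steps above can be wrapped into a single function; a sketch under the same definitions:

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    pretest_odds = pretest_prob / (1 - pretest_prob)  # odds = p / (1 - p)
    post_test_odds = likelihood_ratio * pretest_odds
    return post_test_odds / (1 + post_test_odds)      # probability = odds / (1 + odds)

# Prevalence 10%, LR+ of 10, as in the question
print(round(post_test_probability(0.10, 10) * 100, 1))  # 52.6 (%)
```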
A multi-centre double blind pragmatic randomized controlled trial (RCT)
reported remission rates for depression of 65% for fluoxetine and 60% for
dosulepin. The number of patients that must receive fluoxetine for one
patient to achieve the demonstrated beneficial effect is
A. 60
B. 20
C. 15
D. 10
E. 5
B. This question tests one’s knowledge of the NNT (number needed to treat) concept. NNT
is given by the inverse ratio of the absolute benefit increase (ABI) in therapeutic trials. ABI is
the difference between the benefit due to the experimental intervention and the compared standard/
placebo. Here it is given by 65% – 60% = 5%. If ABI = 5%, NNT = 100/5 = 20.
In a randomized double-blind trial two groups of hospitalized depressed
patients treated with selective serotonin reuptake inhibitors (SSRIs) are
evaluated for the beneficial effects on insomnia of trazodone vs temazepam.
Which of the following is NOT an important factor when evaluating the
internal validity of results obtained from the above study?
A. Baseline differences in antidepressant therapy between the two groups
B. The method used to randomize the sample
C. Setting in which the study takes place
D. Sensitivity of the insomnia scale to pick up changes in severity
E. Inclusion in the final analysis of data from patients who have dropped out
C. Threats to internal validity of an experimental study include confounding, selection bias,
differential attrition, and quality of measurement. Having a significant difference in baseline SSRI
therapy could explain differential outcomes in the trazodone vs temazepam groups. Similarly,
poor randomization may lead to selection bias and influence the differences in outcome. Failure
to account for differential drop-out rates may spuriously inflate or deflate the difference in
outcome. Using a scale with poor sensitivity to change will reduce the magnitude of differences
that could be observed. Given that both groups are recruited from the same setting (hospital),
this cannot influence internal validity; on the other hand, it might well influence the generalizability
of results to the non-hospitalized population (external validity).
While adapting the results of an RCT into clinical practice, a clinician wants
to calculate the new NNT values for his own clinical population using the
results of the RCT. Apart from the reported RCT which of the following is
needed to carry out the calculation of the new NNT?
A. The expected rate of spontaneous resolution of the treated condition in the clinical
population
B. The size of the clinical population
C. The case fatality rate for the treated condition in the clinical population
D. Lifetime prevalence of the disease in the clinical population
E. All of the above
A. Published RCTs may quote impressive outcomes in terms of NNT. Applying principles of
evidence-based medicine, one must check for the internal validity of a study and the degree of
generalizability before adapting the results to clinical practice. One must also be aware of the
fact that though clinically more meaningful, NNTs quoted in RCTs may not translate to the same
extent in actual clinical practice. One way of appreciating the usefulness of a newly introduced
drug is to calculate the NNT for one’s own clinical population (target population). To enable
this one may estimate the patient expected event rate (PEER), which is given by the expected
spontaneous resolution rate or the response rate for an existing standard treatment. This can
be obtained from the local audit data or clinical experience. The product of PEER and relative
benefit increase from the published RCT gives the new absolute benefit increase (ABI new)
value for the target population. The inverse of the new ABI gives the new NNT for the target
population. The disease prevalence rate or absolute size of the target population has no effect on
the new NNT.
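The PEER-based recalculation described above can be sketched in a few lines (the PEER and RBI values below are hypothetical, chosen only to illustrate the arithmetic):

```python
def local_nnt(peer, relative_benefit_increase):
    # ABI_new = PEER x relative benefit increase; NNT_new = 1 / ABI_new
    abi_new = peer * relative_benefit_increase
    return 1 / abi_new

# Suppose an RCT showed event rates of 40% vs 20%,
# i.e. RBI = (0.40 - 0.20) / 0.20 = 1.0,
# and local audit data suggest a PEER of 10%:
print(round(local_nnt(0.10, 1.0)))  # 10
```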
In an attempt to ensure equivalent distribution of potential effect-modifying
factors in treating refractory depression, a researcher weighs the imbalance
that might be caused whenever an individual patient enters one of the two
arms of the study. Every patient is assigned to the group where the least
amount of imbalance will be caused. This method is called
A. Stratification
B. Matching
C. Minimization
D. Randomization
E. Systematic sampling
C. In most treatment trials interventions are allocated by randomization. Block
randomization and stratified randomization can be used to ensure balance between groups
in size and patient characteristics. But it is very difficult to stratify using several variables in a
small sample. A widely accepted alternative approach is minimization. This method can be used
to ensure very good balance between groups for several confounding factors irrespective of the
size of the sample. With minimization the treatment allocated to the next participant enrolled in
the trial depends (wholly or partly) on the characteristics of those participants already enrolled.
This is achieved by a simple mathematical computation of magnitude of imbalance during each
allocation.
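A toy sketch of minimization for a two-arm trial. The scoring rule here (summed absolute count differences across factors) is one common choice; real trials often add factor weighting or a random element to the allocation:

```python
import random

def minimization_assign(patient, arms, counts):
    """Assign a patient (dict of factor -> level) to the arm that would
    cause the least total imbalance across all stratification factors.
    counts[arm][factor][level] holds the current allocations."""
    imbalance = {}
    for arm in arms:
        total = 0
        for factor, level in patient.items():
            n_here = counts[arm][factor][level] + 1  # if the patient joins this arm
            for other in arms:
                if other != arm:
                    total += abs(n_here - counts[other][factor][level])
        imbalance[arm] = total
    best = min(imbalance.values())
    chosen = random.choice([a for a in arms if imbalance[a] == best])  # break ties at random
    for factor, level in patient.items():
        counts[chosen][factor][level] += 1
    return chosen

# Example: arm A already has 3 males, arm B has 1 -> a new male goes to B
counts = {"A": {"sex": {"male": 3, "female": 0}},
          "B": {"sex": {"male": 1, "female": 0}}}
print(minimization_assign({"sex": "male"}, ["A", "B"], counts))  # B
```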
The effectiveness of an intervention is measured by using pragmatic trials.
Which trial design is normally employed when carrying out a pragmatic
trial?
A. RCT
B. Meta-analysis
C. Systematic review
D. Cohort study
E. Case series
A. RCTs provide high-quality evidence for or against proposed interventions. But RCTs
have a major limitation in terms of generalizability. This is because the trials are conducted in a
somewhat artificial experimental setting that is different from clinical practice. So RCTs have
high internal validity due to rigorous methodology but poor external validity. Pragmatic RCTs are
a type of RCT introduced with the intention of increasing external validity, i.e. the generalizability
of RCT results. But this comes at the expense of internal validity. In pragmatic RCTs the
trial takes place in a setting as close as possible to natural clinical practice: the inclusion and
exclusion criteria are less fastidious, ‘treatment as usual’ is often employed for comparison
instead of placebo, and real-world, functionally significant outcomes are considered.
The probability of detecting the magnitude of a treatment effect from a
study when such an effect actually exists is called
A. Validity
B. Precision
C. Accuracy
D. Power
E. Yield
D. The power of a study refers to the ability of the study to show the difference in outcome
between studied groups if such a difference actually exists. The term power calculation is often
used while referring to sample size estimation before a study is undertaken. In order to carry out
power calculation one has to know the expected precision and variance of measurements within
the study sample (obtained from a literature search or pilot studies), the magnitude of a clinically
signifi cant difference, the certainty of avoiding type 1 error as refl ected by the chosen
p value, and the type of statistical test one will be performing. There is no point in calculating the
statistical power once the results of a study are known. On completion of trials, measures such as
confi dence intervals indicate the power of a study and the precision of results
Power is the ability of a study to detect an effect that truly exists. Power can
also be defined as
A. Probability of avoiding type 1 error
B. Probability of committing type 1 error
C. Probability of committing type 2 error
D. Probability of detecting a type 2 error
E. Probability of avoiding type 2 error
E. Power refers to the probability of avoiding a type 2 error. To calculate power, one needs
to know four variables.
1. sample size
2. magnitude of a clinically significant difference
3. probability of type 1 error (significance level from which the p value is derived)
4. variance of the measure in the study sample.
Underpowered trials are those that enrol too few participants to identify differences between
interventions when such differences truly exist; adequate power is arbitrarily taken as detecting
such a difference at least 80% of the time. Underpowered RCTs are prone to false-negative
conclusions (type 2 errors). Somewhat controversially, underpowered trials are considered to be
unethical, as they expose participants to the ordeals of research without providing an adequate
contribution to clinical development.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
The positive predictive value of this test is
A. 50%
B. 60%
C. 40%
D. 100%
E. 0%
D. It is useful to construct a 2 × 2 table for calculating the properties of reported diagnostic
tests. From the given information we can draw the following:

                Schizophrenia   Controls
Test positive        60             0
Test negative        40           100

Now, positive predictive value = true positives/total test positives = 60/60 = 100%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
How sensitive is this test in detecting schizophrenia?
A. 60%
B. 40%
C. 100%
D. 90%
E. 0%
A. Sensitivity = true positives/total diseased (schizophrenia subjects) = 60/100 = 60%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. How
accurate is this test in detecting schizophrenia?
A. 100%
B. 80%
C. 60%
D. 40%
E. 70%
B. Accuracy = all true observations/total population studied = (100 + 60)/200 = 160/200 = 80%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. What
are the chances that the test will return negative in your next patient with
schizophrenia?
A. 100%
B. 70%
C. 60%
D. 40%
E. 30%
D. This question asks the candidate to calculate the probability of a negative test in
someone with the disorder, i.e. the false-negative rate (FNR).
This is given by FNR = false negatives/total diseased = 40/100 = 40%
The FNR is the same as (1 – sensitivity); similarly, the false-positive rate (FPR) is the same as (1 – specificity).
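All four questions on this diagnostic test follow from the same 2 × 2 table; a sketch computing each property (the function name is ours):

```python
def diagnostic_properties(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fnr": fn / (tp + fn),          # false-negative rate = 1 - sensitivity
    }

# 60 of 100 patients detected, no false positives among 100 controls
props = diagnostic_properties(tp=60, fn=40, fp=0, tn=100)
print(props)  # sensitivity 0.6, ppv 1.0, accuracy 0.8, fnr 0.4
```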
Which of the following properties of a screening test increases with
increasing disease prevalence in the population?
A. Negative predictive value
B. Sensitivity
C. Specificity
D. Accuracy
E. Positive predictive value
E. Sensitivity, specificity, and accuracy are measures that reflect the characteristics of the
test instrument. These measures do not vary with changes in the disease prevalence. Positive
predictive value increases while negative predictive value decreases with rising population
prevalence of the disease studied. The prevalence can be interpreted as the probability before the
test is carried out that the subject has the disease, known as the prior probability of disease. The
positive and negative predictive values are the revised estimates of the same probability for those
subjects who are positive and negative on the test, and are known as posterior probabilities.
Thus the difference between the prior and posterior probabilities is one way of assessing the
usefulness of the test.
Two observers are rating MRI scans for the presence or absence of white
matter hyperintensities. On a particular day from the records, they are
observed to have an agreement of 78%. If they could be expected to agree
50% of the time purely by chance, then the value of the kappa statistic is
given by
A. 50%
B. 44%
C. 56%
D. 78%
E. 22%
C. Agreement between different observers can be measured using the kappa (κ) statistic
for categorical measures such as the one highlighted in this question (presence or absence of
MRI hyperintensities). Kappa is a measure of the level of agreement in excess of that which would
be expected by chance. It is calculated as the observed agreement in excess of chance, expressed
as a proportion of the maximum possible agreement in excess of chance. In other words
kappa = the difference between observed and expected agreement/(1 – expected agreement).
In this example, the observed agreement is 78%. The expected agreement is 50%. Hence
kappa = (0.78 – 0.50)/(1 – 0.50) = 0.28/0.50 = 56%.
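The kappa arithmetic above, as a one-line helper:

```python
def cohen_kappa(observed, expected):
    # agreement in excess of chance, as a proportion of the maximum possible
    return (observed - expected) / (1 - expected)

print(round(cohen_kappa(0.78, 0.50), 2))  # 0.56
```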
The number of days that a series of five patients had to wait before starting
cognitive behavioural therapy (CBT) at a psychotherapy unit is as follows:
12, 12, 14, 16, and 21. The median waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
C. The median is calculated by placing observations in a rank order (either ascending
or descending) and picking the most central value. If the number of observations is even,
then the median is taken as the arithmetic mean of the two middle values.
The number of days that a series of five patients had to wait before starting
CBT at a psychotherapy unit is as follows: 12, 12, 14, 16, and 21. The mean
waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
A. The arithmetic mean is calculated from the sum of all individual observations divided
by the number of observations. Here the number of observations = 5. The sum of individual
observations = 12 + 12 + 14 + 16 + 21 = 75. The average = 75/5 = 15.
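Python's standard library computes both summary statistics directly:

```python
import statistics

waits = [12, 12, 14, 16, 21]  # waiting times in days
print(statistics.median(waits))  # 14 (middle of the ranked values)
print(statistics.mean(waits))    # 15 (75 / 5)
```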
The most clinically useful measure that helps to inform the likelihood of
having a disease in a patient with positive results from a diagnostic test is
A. Accuracy
B. Positive predictive value
C. Sensitivity
D. Specificity
E. Reliability
B. The probability that a test will provide a correct diagnosis is not given by the sensitivity
or specificity of the test. Sensitivity and specificity are properties of the test instrument – they
are not functions of the target population/clinical sample. On the other hand, positive and
negative predictive values are functions of the population studied; they provide much more
clinically useful information. Predictive values observed in one study do not apply universally.
Positive predictive value increases with increasing prevalence of the disease; negative predictive
value decreases with increasing prevalence. Sensitivity and specificity, being properties of the
instrument used, do not vary with prevalence.
Zarkin et al. (2008) reported a cost-effectiveness comparison of naltrexone
and placebo in alcohol abstinence. The mean effectiveness, measured as
percentage days of abstinence, was nearly 80% for the naltrexone group while
it was 73% for the placebo group. The mean cost incurred for the placebo
group was $400 per patient. The naltrexone group incurred a cost of
$680 per patient. How much additional cost needs to be spent per patient
for each percentage point increase in total days of abstinence when using
naltrexone compared with placebo?
A. $40
B. $50
C. $7
D. $500
E. $2
A. The incremental cost-effectiveness ratio (ICER) can be defined as the difference in
cost (C) of interventions A and B divided by the difference in mean effectiveness (E), (CA – CB)/
(EA – EB), where intervention B is usually the placebo or standard intervention that is compared
with intervention A. In this example, the difference in costs = $680 – 400 = $280. The difference
in effectiveness as measured by percentage days of abstinence is 80 – 73% = 7%. Hence
ICER = 280/7 = $40 per patient per percentage point of days of abstinence.
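The ICER calculation, sketched:

```python
def icer(cost_new, cost_comparator, effect_new, effect_comparator):
    # (C_A - C_B) / (E_A - E_B): extra cost per extra unit of effect
    return (cost_new - cost_comparator) / (effect_new - effect_comparator)

# $680 vs $400; 80% vs 73% days abstinent
print(icer(680, 400, 80, 73))  # 40.0 dollars per percentage point
```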
Two continuous variables A and B are found to be correlated in a nonlinear
fashion. All of the following can be considered as suitable statistical
techniques for examining this relationship except
A. Curvilinear regression
B. Logistic regression
C. Multiple linear regression
D. Polynomial regression
E. Exponential regression
C. When the relationship between two continuous variables is plotted in a graph,
the resulting distribution may be a straight line or a curve. If the relationship between the
independent (X) variable and the dependent (Y) variable appears to follow a straight line, then linear
regression can be constructed to predict the dependent variable from the independent variable.
Otherwise, one can resort to one of the following methods:
1. Attempting to transform the available data to straighten the curved relationship.
2. One can try curvilinear regression, e.g. logarithmic regression, exponential regression, and
trigonometric regression.
3. Unless there is a theoretical reason to suppose that a particular form of equation, such as
logarithmic or exponential, is needed, non-linearity is usually tested for by fitting a
polynomial regression equation.
4. Multiple linear regression is often used to examine linear relationships when more than
one independent variable influences a dependent variable.