Basic statistics (MRCP) Flashcards
(140 cards)
A study is evaluating the effect of agomelatine on postnatal depression at
a mother and baby unit. Which one of the following should be considered
when assessing the internal validity of this study?
A. Benefits of agomelatine in major depression outside the postpartum period
B. The degree to which the subjects adhered to the study protocol
C. The cost of using agomelatine compared with standard care
D. Consistency of the reported outcome in comparison with previous studies
E. Benefits of agomelatine in postpartum depression when used at an outpatient service
B. Internal validity is the degree to which a study establishes the cause-and-effect relationship
between the treatment and the observed outcome. External validity is the degree to which
the results of a study become applicable outside the experimental setting in which the study
was conducted. In other words, external validity refers to the generalizability of study results while
internal validity refers to the rigour of the research method. The benefit of agomelatine
in different populations (choices A and E) refers to external validity; the cost of the drug
and consistency of results obtained from different studies are related to applicability of the
intervention in a clinical setting. Assessment of adherence to the study protocol is one of many ways
of analysing the quality of an intervention trial.
A new clinician-administered test for assessing suicidal risk is studied in a
prison population in Canada, where a high suicide rate of 1 in 25 has been
recorded. Which of the following indicates that this test is NOT suitable for
your clinical population?
A. The positive predictive value is 80%
B. The likelihood ratio for a positive test is 14
C. The prevalence of suicide in your clinical sample is 1 in 890
D. The inter-rater reliability (kappa) of the test is 0.8
E. The literacy rate of the prison population is very low but comparable with your clinical
sample
C. Having a high positive predictive value, a likelihood ratio more than 10, and good interrater
reliability as measured by kappa are desirable properties of an instrument. But when the
same instrument is applied to a population with much lower prevalence of suicide (the studied
phenomenon), the post-test probability decreases substantially. Post-test probability is a measure
of positive predictive value in the target population; it depends on pretest probability, i.e. the
prevalence and likelihood ratio.
A new rating scale being evaluated for anxiety has a sensitivity of 80% and
specificity of 90% against the standard ICD-10 diagnosis. The likelihood
ratio of a positive result is
A. Nearly 2
B. Nearly 0.2
C. 0.08
D. 8
E. 0.5
D. The likelihood ratio of a positive test (LR+) is the ratio between the probability of a
positive test in a person with disease and the probability of a positive test in a person without
disease. It can also be expressed as
LR+ = sensitivity/(1 – specificity)
Here, sensitivity = 0.8; specificity = 0.9.
Hence LR+ = 0.8/(1 – 0.9) = 8.
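The LR+ formula above can be checked with a short Python sketch (the helper name is ours, not from any standard library):

```python
# Likelihood ratio of a positive test: how much more likely a positive
# result is in someone with the disease than in someone without it.
def lr_positive(sensitivity, specificity):
    return sensitivity / (1 - specificity)

# Worked example from the question: sensitivity 80%, specificity 90%
print(round(lr_positive(0.8, 0.9), 2))  # 8.0
```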
A pharmaceutical company developed a new antidepressant ‘X’. They
conducted a randomized double-blind placebo controlled trial of the drug.
The study had two arms: an active medication arm and a placebo arm.
Each arm had 100 subjects. Over a 4-week period, a 50% drop in Hamilton
depression scale (HAMD) scores was seen in 40 subjects in the active
medication arm, while a similar drop was seen only in 20 subjects in the
placebo arm. What is the number needed to treat (NNT) from this trial for
the new antidepressant?
A. 1
B. 2
C. 3
D. 4
E. 5
E. The absolute benefit increase (ABI) is 40% – 20% = 20%. NNT = 1/ABI = 100/20 = 5.
During the same placebo controlled trial described in question 4, 20% of
people on X developed active suicidal ideas, while only 10% of patients on
placebo developed the same side-effect. What is the number needed to
harm (NNH) associated with the suicidal ideas from the trial data?
A. 5
B. 10
C. 15
D. 20
E. 25
B. The absolute risk increase for suicidal ideas is 20% – 10% = 10%. NNH = 1/10% = 100/10 = 10.
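NNT and NNH share the same arithmetic: the inverse of the absolute difference in event rates. A minimal sketch (the function name is illustrative):

```python
def number_needed(experimental_rate, control_rate):
    # NNT (for benefits) or NNH (for harms) = 1 / absolute rate difference
    return 1 / abs(experimental_rate - control_rate)

# Question 4: response in 40/100 on drug X vs 20/100 on placebo
print(round(number_needed(0.40, 0.20)))  # 5  -> NNT
# Question 5: suicidal ideas in 20% on X vs 10% on placebo
print(round(number_needed(0.20, 0.10)))  # 10 -> NNH
```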
The prevalence of depression in patients with mild cognitive impairment
is 10%. On applying a depression rating scale with the likelihood ratio of a
positive test (LR+) equal to 10, a patient with mild cognitive impairment
becomes test positive. The probability that this patient is depressed is equal
to
A. 15%
B. 32%
C. 52%
D. 85%
E. 100%
C. This question tests one’s ability to calculate post-test probability from likelihood ratios.
The probability of having a disease after testing positive with a diagnostic test depends on
two factors: (a) the prevalence of the disease, (b) the likelihood of a positive test result using
the instrument. It is important to remember that baseline prevalence of a disease for which a
diagnostic instrument is being tested is taken as the pretest probability.
So pretest probability = 10%
Now, post-test odds = likelihood ratio × pretest odds
From a given probability odds can be calculated using the formula
odds = (probability)/(1 – probability)
Here pretest odds = (10%)/(1 – 10%) = 10/90 = 1/9.
Now post-test odds = likelihood ratio × pretest odds
= 10 × 1/9 = 10/9
Using the formula probability = odds/(1 + odds)
post-test probability = (10/9)/[1 + (10/9)] = 10/19 = 52.6%
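The odds-conversion steps above can be wrapped into a single function; a sketch under the same definitions:

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    pretest_odds = pretest_prob / (1 - pretest_prob)  # odds = p / (1 - p)
    post_test_odds = likelihood_ratio * pretest_odds
    return post_test_odds / (1 + post_test_odds)      # probability = odds / (1 + odds)

# Prevalence 10%, LR+ of 10, as in the question
print(round(post_test_probability(0.10, 10) * 100, 1))  # 52.6 (%)
```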
A multi-centre double blind pragmatic randomized controlled trial (RCT)
reported remission rates for depression of 65% for fluoxetine and 60% for
dosulepin. The number of patients that must receive fluoxetine for one
patient to achieve the demonstrated beneficial effect is
A. 60
B. 20
C. 15
D. 10
E. 5
B. This question tests one’s knowledge of the NNT (number needed to treat) concept. NNT
is given by the inverse ratio of the absolute benefit increase (ABI) in therapeutic trials. ABI is
the difference between the benefit due to the experimental intervention and the compared standard/
placebo. Here it is given by 65% – 60% = 5%. If ABI = 5%, NNT = 100/5 = 20.
In a randomized double-blind trial two groups of hospitalized depressed
patients treated with selective serotonin reuptake inhibitors (SSRIs) are
evaluated for the beneficial effects on insomnia of trazodone vs temazepam.
Which of the following is NOT an important factor when evaluating the
internal validity of results obtained from the above study?
A. Baseline differences in antidepressant therapy between the two groups
B. The method used to randomize the sample
C. Setting in which the study takes place
D. Sensitivity of the insomnia scale to pick up changes in severity
E. Inclusion in the final analysis of data from patients who have dropped out
C. Threats to internal validity of an experimental study include confounding, selection bias,
differential attrition, and quality of measurement. Having a significant difference in baseline SSRI
therapy could explain differential outcomes in the trazodone vs temazepam groups. Similarly,
poor randomization may lead to selection bias and influence the differences in outcome. Failure
to account for differential drop-out rates may spuriously inflate or deflate the difference in
outcome. Using a scale with poor sensitivity to change will reduce the magnitude of differences
that could be observed. Given that both groups are recruited from the same setting (hospital),
this cannot influence internal validity; on the other hand, it might well influence the generalizability
of results to the non-hospitalized population (external validity).
While adapting the results of an RCT into clinical practice, a clinician wants
to calculate the new NNT values for his own clinical population using the
results of the RCT. Apart from the reported RCT which of the following is
needed to carry out the calculation of the new NNT?
A. The expected rate of spontaneous resolution of the treated condition in the clinical
population
B. The size of the clinical population
C. The case fatality rate for the treated condition in the clinical population
D. Lifetime prevalence of the disease in the clinical population
E. All of the above
A. Published RCTs may quote impressive outcomes in terms of NNT. Applying principles of
evidence-based medicine, one must check for the internal validity of a study and the degree of
generalizability before adapting the results to clinical practice. One must also be aware of the
fact that though clinically more meaningful, NNTs quoted in RCTs may not translate to the same
extent in actual clinical practice. One way of appreciating the usefulness of a newly introduced
drug is to calculate the NNT for one’s own clinical population (target population). To enable
this one may estimate the patient expected event rate (PEER), which is given by the expected
spontaneous resolution rate or the response rate for an existing standard treatment. This can
be obtained from the local audit data or clinical experience. The product of PEER and relative
benefit increase from the published RCT gives the new absolute benefit increase (ABI new)
value for the target population. The inverse of the new ABI gives the new NNT for the target
population. The disease prevalence rate or absolute size of the target population has no effect on
the new NNT.
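The PEER-based recalculation described above can be sketched in a few lines (the PEER and RBI values below are hypothetical, chosen only to illustrate the arithmetic):

```python
def local_nnt(peer, relative_benefit_increase):
    # ABI_new = PEER x relative benefit increase; NNT_new = 1 / ABI_new
    abi_new = peer * relative_benefit_increase
    return 1 / abi_new

# Suppose an RCT showed event rates of 40% vs 20%,
# i.e. RBI = (0.40 - 0.20) / 0.20 = 1.0,
# and local audit data suggest a PEER of 10%:
print(round(local_nnt(0.10, 1.0)))  # 10
```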
In an attempt to ensure equivalent distribution of potential effect-modifying
factors in treating refractory depression, a researcher weighs the imbalance
that might be caused whenever an individual patient enters one of the two
arms of the study. Every patient is assigned to the group where the least
amount of imbalance will be caused. This method is called
A. Stratification
B. Matching
C. Minimization
D. Randomization
E. Systematic sampling
C. In most treatment trials interventions are allocated by randomization. Block
randomization and stratified randomization can be used to ensure balance between groups
in size and patient characteristics. But it is very difficult to stratify using several variables in a
small sample. A widely accepted alternative approach is minimization. This method can be used
to ensure very good balance between groups for several confounding factors irrespective of the
size of the sample. With minimization the treatment allocated to the next participant enrolled in
the trial depends (wholly or partly) on the characteristics of those participants already enrolled.
This is achieved by a simple mathematical computation of magnitude of imbalance during each
allocation.
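A toy sketch of minimization for a two-arm trial. The scoring rule here (summed absolute count differences across factors) is one common choice; real trials often add factor weighting or a random element to the allocation:

```python
import random

def minimization_assign(patient, arms, counts):
    """Assign a patient (dict of factor -> level) to the arm that would
    cause the least total imbalance across all stratification factors.
    counts[arm][factor][level] holds the current allocations."""
    imbalance = {}
    for arm in arms:
        total = 0
        for factor, level in patient.items():
            n_here = counts[arm][factor][level] + 1  # if the patient joins this arm
            for other in arms:
                if other != arm:
                    total += abs(n_here - counts[other][factor][level])
        imbalance[arm] = total
    best = min(imbalance.values())
    chosen = random.choice([a for a in arms if imbalance[a] == best])  # break ties at random
    for factor, level in patient.items():
        counts[chosen][factor][level] += 1
    return chosen

# Example: arm A already has 3 males, arm B has 1 -> a new male goes to B
counts = {"A": {"sex": {"male": 3, "female": 0}},
          "B": {"sex": {"male": 1, "female": 0}}}
print(minimization_assign({"sex": "male"}, ["A", "B"], counts))  # B
```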
The effectiveness of an intervention is measured by using pragmatic trials.
Which trial design is normally employed when carrying out a pragmatic
trial?
A. RCT
B. Meta-analysis
C. Systematic review
D. Cohort study
E. Case series
A. RCTs provide high-quality evidence for or against proposed interventions. But RCTs
have a major limitation in terms of generalizability. This is because the trials are conducted in a
somewhat artificial experimental setting that is different from clinical practice. So RCTs have
high internal validity due to rigorous methodology but poor external validity. Pragmatic RCTs are
a type of RCT introduced with the intention of increasing external validity, i.e. the generalizability
of RCT results. But this comes at the expense of internal validity. In pragmatic RCTs the
trial takes place in a setting as close as possible to natural clinical practice: the inclusion and
exclusion criteria are less fastidious, ‘treatment as usual’ is often employed for comparison
instead of placebo, and real-world, functionally significant outcomes are considered.
The probability of detecting the magnitude of a treatment effect from a
study when such an effect actually exists is called
A. Validity
B. Precision
C. Accuracy
D. Power
E. Yield
D. The power of a study refers to the ability of the study to show the difference in outcome
between studied groups if such a difference actually exists. The term power calculation is often
used while referring to sample size estimation before a study is undertaken. In order to carry out
power calculation one has to know the expected precision and variance of measurements within
the study sample (obtained from a literature search or pilot studies), the magnitude of a clinically
signifi cant difference, the certainty of avoiding type 1 error as refl ected by the chosen
p value, and the type of statistical test one will be performing. There is no point in calculating the
statistical power once the results of a study are known. On completion of trials, measures such as
confi dence intervals indicate the power of a study and the precision of results
Power is the ability of a study to detect an effect that truly exists. Power can
also be defined as
A. Probability of avoiding type 1 error
B. Probability of committing type 1 error
C. Probability of committing type 2 error
D. Probability of detecting a type 2 error
E. Probability of avoiding type 2 error
E. Power refers to the probability of avoiding a type 2 error. To calculate power, one needs
to know four variables.
1. sample size
2. magnitude of a clinically significant difference
3. probability of type 1 error (significance level from which the p value is derived)
4. variance of the measure in the study sample.
Underpowered trials are those that enrol too few participants to identify differences between
interventions when such differences truly exist; adequate power is arbitrarily taken as detecting
such a difference at least 80% of the time. Underpowered RCTs are prone to false-negative
conclusions (type 2 errors). Somewhat controversially, underpowered trials are considered to be
unethical, as they expose participants to the ordeals of research without providing an adequate
contribution to clinical development.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
The positive predictive value of this test is
A. 50%
B. 60%
C. 40%
D. 100%
E. 0%
D. It is useful to construct a 2 × 2 table for calculating the properties of reported diagnostic
tests. From the given information we can draw the following:

                Schizophrenia   Controls
Test positive        60             0
Test negative        40           100

Now, positive predictive value = true positives/total test positives = 60/60 = 100%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
How sensitive is this test in detecting schizophrenia?
A. 60%
B. 40%
C. 100%
D. 90%
E. 0%
A. Sensitivity = true positives/total diseased (schizophrenia subjects) = 60/100 = 60%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. How
accurate is this test in detecting schizophrenia?
A. 100%
B. 80%
C. 60%
D. 40%
E. 70%
B. Accuracy = all true observations/total population studied = (100 + 60)/200 = 160/200 = 80%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. What
are the chances that the test will return negative in your next patient with
schizophrenia?
A. 100%
B. 70%
C. 60%
D. 40%
E. 30%
D. This question asks the candidate to calculate the probability of a negative test in
someone with the disorder, i.e. the false-negative rate (FNR).
This is given by FNR = false negatives/total diseased = 40/100 = 40%
The FNR is the same as (1 – sensitivity); similarly, the false-positive rate (FPR) is the same as (1 – specificity).
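All four questions on this diagnostic test follow from the same 2 × 2 table; a sketch computing each property (the function name is ours):

```python
def diagnostic_properties(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fnr": fn / (tp + fn),          # false-negative rate = 1 - sensitivity
    }

# 60 of 100 patients detected, no false positives among 100 controls
props = diagnostic_properties(tp=60, fn=40, fp=0, tn=100)
print(props)  # sensitivity 0.6, ppv 1.0, accuracy 0.8, fnr 0.4
```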
Which of the following properties of a screening test increases with
increasing disease prevalence in the population?
A. Negative predictive value
B. Sensitivity
C. Specificity
D. Accuracy
E. Positive predictive value
E. Sensitivity, specificity, and accuracy are measures that reflect the characteristics of the
test instrument. These measures do not vary with changes in the disease prevalence. Positive
predictive value increases while negative predictive value decreases with rising population
prevalence of the disease studied. The prevalence can be interpreted as the probability before the
test is carried out that the subject has the disease, known as the prior probability of disease. The
positive and negative predictive values are the revised estimates of the same probability for those
subjects who are positive and negative on the test, and are known as posterior probabilities.
Thus the difference between the prior and posterior probabilities is one way of assessing the
usefulness of the test.
Two observers are rating MRI scans for the presence or absence of white
matter hyperintensities. On a particular day from the records, they are
observed to have an agreement of 78%. If they could be expected to agree
50% of the time purely by chance, then the value of the kappa statistic is
given by
A. 50%
B. 44%
C. 56%
D. 78%
E. 22%
C. Agreement between different observers can be measured using the kappa (κ) statistic
for categorical measures such as the one highlighted in this question (presence or absence of
MRI hyperintensities). Kappa is a measure of the level of agreement in excess of that which would
be expected by chance. It is calculated as the observed agreement in excess of chance, expressed
as a proportion of the maximum possible agreement in excess of chance. In other words
kappa = the difference between observed and expected agreement/(1 – expected agreement).
In this example, the observed agreement is 78%. The expected agreement is 50%. Hence
kappa = (0.78 – 0.50)/(1 – 0.50) = 0.28/0.50 = 56%.
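The kappa arithmetic above, as a one-line helper:

```python
def cohen_kappa(observed, expected):
    # agreement in excess of chance, as a proportion of the maximum possible
    return (observed - expected) / (1 - expected)

print(round(cohen_kappa(0.78, 0.50), 2))  # 0.56
```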
The number of days that a series of five patients had to wait before starting
cognitive behavioural therapy (CBT) at a psychotherapy unit is as follows:
12, 12, 14, 16, and 21. The median waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
C. The median is calculated by placing observations in a rank order (either ascending
or descending) and picking the most central value. If the number of observations is even,
then the median is taken as the arithmetic mean of the two middle values.
The number of days that a series of five patients had to wait before starting
CBT at a psychotherapy unit is as follows: 12, 12, 14, 16, and 21. The mean
waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
A. The arithmetic mean is calculated from the sum of all individual observations divided
by the number of observations. Here the number of observations = 5. The sum of individual
observations = 12 + 12 + 14 + 16 + 21 = 75. The average = 75/5 = 15.
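Python's standard library computes both summary statistics directly:

```python
import statistics

waits = [12, 12, 14, 16, 21]  # waiting times in days
print(statistics.median(waits))  # 14 (middle of the ranked values)
print(statistics.mean(waits))    # 15 (75 / 5)
```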
The most clinically useful measure that helps to inform the likelihood of
having a disease in a patient with positive results from a diagnostic test is
A. Accuracy
B. Positive predictive value
C. Sensitivity
D. Specificity
E. Reliability
B. The probability that a test will provide a correct diagnosis is not given by the sensitivity
or specificity of the test. Sensitivity and specificity are properties of the test instrument – they
are not functions of the target population/clinical sample. On the other hand, positive and
negative predictive values are functions of the population studied; they provide much more
clinically useful information. Predictive values observed in one study do not apply universally.
Positive predictive value increases with increasing prevalence of the disease; negative predictive
value decreases with increasing prevalence. Sensitivity and specificity, being properties of the
instrument used, do not vary with prevalence.
Zarkin et al. (2008) reported a cost-effectiveness comparison of naltrexone
and placebo in alcohol abstinence. The mean effectiveness, measured as
percentage days of abstinence, was nearly 80% for the naltrexone group while
it was 73% for the placebo group. The mean cost incurred for the placebo
group was $400 per patient. The naltrexone group incurred a cost of
$680 per patient. How much additional cost needs to be spent per patient
for each percentage point increase in total days of abstinence when using
naltrexone compared with placebo?
A. $40
B. $50
C. $7
D. $500
E. $2
A. The incremental cost-effectiveness ratio (ICER) can be defined as the difference in
cost (C) of interventions A and B divided by the difference in mean effectiveness (E), (CA – CB)/
(EA – EB), where intervention B is usually the placebo or standard intervention that is compared
with intervention A. In this example, the difference in costs = $680 – 400 = $280. The difference
in effectiveness as measured by percentage days of abstinence is 80 – 73% = 7%. Hence
ICER = 280/7 = $40 per patient per percentage point of days of abstinence.
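The ICER calculation, sketched:

```python
def icer(cost_new, cost_comparator, effect_new, effect_comparator):
    # (C_A - C_B) / (E_A - E_B): extra cost per extra unit of effect
    return (cost_new - cost_comparator) / (effect_new - effect_comparator)

# $680 vs $400; 80% vs 73% days abstinent
print(icer(680, 400, 80, 73))  # 40.0 dollars per percentage point
```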
Two continuous variables A and B are found to be correlated in a nonlinear
fashion. All of the following can be considered as suitable statistical
techniques for examining this relationship except
A. Curvilinear regression
B. Logistic regression
C. Multiple linear regression
D. Polynomial regression
E. Exponential regression
C. When the relationship between two continuous variables is plotted in a graph,
the resulting distribution may be a straight line or a curve. If the relationship between the
independent (X) variable and the dependent (Y) variable appears to follow a straight line, then linear
regression can be constructed to predict the dependent variable from the independent variable.
Otherwise, one can resort to one of the following methods:
1. Attempting to transform the available data to straighten the curved relationship.
2. One can try curvilinear regression, e.g. logarithmic regression, exponential regression, and
trigonometric regression.
3. Unless there is a theoretical reason to suppose that a particular form of equation, such as
logarithmic or exponential, is needed, non-linearity is usually tested for by fitting a
polynomial regression equation.
4. Multiple linear regression is often used to examine linear relationships when more than
one independent variable influences a dependent variable.