Statistics Flashcards

Question

Parametric data: which test?

Answer 1

Once the data are known to be parametric, determine how many groups there are. Two groups: paired or unpaired Student's t test, depending on whether the data are paired or not. More than two groups: ANOVA.

Answer 2

Used to analyse normally distributed data. Requires knowledge of the difference between the means of two samples and the SEM. t = difference between means / estimated SEM. The p value is read from tables using the t value and the sample sizes.
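A minimal sketch of the t calculation for two unpaired samples. It assumes the standard error of the difference is estimated as sqrt(s1²/n1 + s2²/n2), which is one common choice and is not stated on the card; the data and the name `t_statistic` are hypothetical.

```python
import math
import statistics

def t_statistic(a, b):
    """Difference between the sample means divided by its estimated standard error."""
    # Assumed estimate of the standard error of the difference (unequal-variance form).
    se_diff = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.2, 4.4, 4.0, 4.8, 4.1]
t = t_statistic(group_a, group_b)
print(f"t = {t:.2f}")
```

The p value would then be looked up in t tables (or computed by software) for the appropriate degrees of freedom.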

Answer 3

Analysis of variance (ANOVA): compares parametric quantitative data in more than two groups. The maths of ANOVA is complex and is handled by software.

Answer 4

Decide how many groups there are. Two groups: Wilcoxon signed rank test if paired, Mann-Whitney U test if unpaired. More than two groups: Friedman test if paired, Kruskal-Wallis test if unpaired.
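The decision rules on the parametric and non-parametric cards can be sketched as one small lookup. The function name `choose_test` and its arguments are hypothetical, and the sketch only returns the name of a test; it does not run one.

```python
def choose_test(parametric: bool, groups: int, paired: bool) -> str:
    """Return the test suggested by the cards for the given study design."""
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    if parametric:
        if groups == 2:
            return "paired Student's t test" if paired else "unpaired Student's t test"
        return "ANOVA"
    if groups == 2:
        return "Wilcoxon signed rank test" if paired else "Mann-Whitney U test"
    return "Friedman test" if paired else "Kruskal-Wallis test"

print(choose_test(parametric=False, groups=3, paired=False))
```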

Answer 5

The mode refers to the most frequently encountered value and in normally distributed data it coincides with the mean and median values. In skewed data the geometric mean is the most appropriate measure (not the arithmetic mean). Standard deviation (SD) is the square root of the variance and is a measure of the spread of the data. In positively skewed data the mean usually lies to the right of the mode (not left). In positively skewed data the mean usually lies to the right of the median (not left).

Answer 6

In double blind placebo control clinical trials neither the patient nor the clinician knows which treatment option the patient has received. It would not be blind to the patient otherwise. If everybody received both treatments then this would be a 'double blind crossover study'. The clinician remains blind to the treatments received by the patients until the study has finished.

Answer 7

It is not possible to say confidently that this drug trial was well designed without further information about the study and its conduct. The placebo effect is often higher than 5%, with rates between 20 - 30% being common. The result may indeed have occurred by chance alone in less than one in 20 occasions. This is the meaning of the 'p value', where 0.05 is equal to 1/20. Standard error is derived from the variance and 'probable error' is a fictitious term. A p value of less than 0.05 is the conventional level of statistical significance, thus the results should be regarded as reaching conventional levels of statistical significance. A p value of less than 0.05 indicates evidence against the null hypothesis: if the null hypothesis were true, a result at least this extreme would be expected by chance on fewer than 5% of occasions. It is not the probability that the null hypothesis is correct.

Answer 8

The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.

Answer 9

In an 'n of 1' trial the treatment and placebo are given at random treatment periods to the same patient. The results are specific to one drug and the patient studied and cannot usually be generalised. They are useful where the patient doubts the effectiveness of a treatment or where the practitioner has doubts. They are also useful for dosing or working out if a symptom is a side effect or not. Drugs with short lived effects are best, as long wash-out periods need to be included for long acting drugs.

Answer 10

Relative risk may be determined in prospective and retrospective studies and is a useful measure of the strength of association between disease and a risk factor. It is best assessed in prospective studies. In a prospective study of a population, participants are selected without reference to the presence or absence of disease. After excluding prevalent cases the population is followed over time. The number of new cases occurring thereafter is divided by the population at risk, giving an incidence rate. A relative risk of two indicates a doubling of risk between the groups.
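As a rough illustration, relative risk is the incidence in the exposed group divided by the incidence in the unexposed group. The counts below are hypothetical and the function names are illustrative only.

```python
def incidence(new_cases: int, population_at_risk: int) -> float:
    """New cases divided by the population at risk over the follow-up period."""
    return new_cases / population_at_risk

def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the incidence in the exposed group to that in the unexposed group."""
    return incidence(exposed_cases, exposed_total) / incidence(unexposed_cases, unexposed_total)

# Hypothetical cohort: 40 new cases per 1000 exposed, 20 per 1000 unexposed.
rr = relative_risk(40, 1000, 20, 1000)
print(f"relative risk = {rr:.1f}")
```

With these invented counts the risk in the exposed group is double that in the unexposed group.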

Answer 11

The Student's t test is inappropriate as we are comparing proportions not means. Pearson's coefficient of linear regression is inappropriate as there is no linear regression to plot. The data would be ideal for evaluation using the chi square test. The numbers are not too small to draw any statistical conclusions. No clinical drug trial is ever that obvious and statistical testing should be performed.

Answer 12

Sample sizes can be calculated for population studies, clinical trials and most forms of studies. Binary, ordered categorical and continuous variables can be used. It is very important before commencing a clinical trial to determine which variable will be the primary end point, what magnitude of difference is clinically relevant and have an estimate of the standard deviation (SD). These are combined with the chosen level of statistical significance (α), usually p = 0.05. The probability of correctly rejecting a false null hypothesis equals 1 − β and is called power. With the expected mean difference/SD and a decision on significance and power a sample size can be calculated. Maximum power is achieved by having equal groups, but unequal group size can be used.
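A sketch of one way these ingredients combine, assuming the common normal-approximation formula for comparing two means with equal groups: n per group = 2 × ((z for α + z for power) × SD / difference)². The formula and the example figures are assumptions for illustration, not taken from the card.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up to whole patients

# Hypothetical trial: clinically relevant difference 5 units, estimated SD 10 units.
n = sample_size_per_group(delta=5.0, sd=10.0)
print(n)
```

Halving the difference to be detected roughly quadruples the required sample size, because the difference enters the formula squared.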

Answer 13

Life table analysis is used in various contexts to follow a population until certain end points occur. Death is a suitable end point but development of disease or disability are also suitable. For example, the mortality in groups of smokers and non-smokers could be collected over a period of time and survival plotted as a function of time. The development of retinopathy in diabetics, or the time from treatment of multiple sclerosis patients treated with a interferon or placebo to next relapse would also be suitable for life table analysis. These incidence data are best collected prospectively. Life tables from two groups can be compared by calculating a chi-square statistic (Mantel-Haenszel procedure or log rank method). Relative risk can also be calculated from such data. Mathematical models can be applied to life table data to adjust for confounding variables (co-variables). An example of this is the Cox proportional hazards model.

Answer 14

The incidence refers to how often a situation occurs (not the prevalence).

Answer 15

The prevalence refers to how common is a situation (not the incidence).

Answer 16

The sensitivity of a clinical test refers to the ability of the test to correctly identify those patients with the disease.

Answer 17

The specificity of a clinical test refers to the ability of the test to correctly identify those patients without the disease. The sensitivity and specificity are independent of the population of interest subjected to the test.

Answer 18

However, the terms positive predictive value (PPV) and negative predictive value (NPV) are used when evaluating the usefulness of a test to a clinician and are dependent on the prevalence of a disease in the population being looked at.
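The dependence on prevalence can be seen in a short sketch. It applies the same hypothetical test (90% sensitivity, 90% specificity) to two invented populations; all figures and function names are illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard measures derived from a 2 x 2 table of test result against disease."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly identified
        "specificity": tn / (tn + fp),  # disease-free patients correctly identified
        "ppv": tp / (tp + fp),          # positive results that are true positives
        "npv": tn / (tn + fn),          # negative results that are true negatives
    }

def counts_for(prevalence, sensitivity, specificity, n=10_000):
    """Expected 2 x 2 counts when a test is applied to n people."""
    diseased = prevalence * n
    healthy = n - diseased
    tp = sensitivity * diseased
    tn = specificity * healthy
    return tp, healthy - tn, diseased - tp, tn  # tp, fp, fn, tn

low = diagnostic_metrics(*counts_for(0.01, 0.9, 0.9))   # 1% prevalence
high = diagnostic_metrics(*counts_for(0.20, 0.9, 0.9))  # 20% prevalence
print(f"PPV at 1% prevalence: {low['ppv']:.2f}, at 20%: {high['ppv']:.2f}")
```

Sensitivity and specificity are identical in both populations, while the PPV rises with prevalence.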

Answer 19

The reliability is the ability of a test to produce the same result when repeated under identical conditions.

Answer 20

The t test is used when dealing with normally distributed interval scale data (baldness is not such data, but height is). ANOVA compares normally distributed interval scale data in multiple groups. The chi square test measures differences between nominal data, which are usually yes/no or dead/alive. The Mann-Whitney test is used for analysis of ordinal data and blood loss is normally distributed interval scale data.

Answer 21

In a normal distribution of a large population (greater than 30), 95% confidence intervals can be calculated as ± 1.96 times the standard error of the mean. This means there is a 95% chance that the true population mean will lie within the range of values. If repeated samples were taken and the 95% confidence interval was computed for each sample, 95% of the intervals would contain the population mean. Ninety five percent confidence intervals can be calculated for non-parametric or interval data but this uses a different method than 1.96 × SEM. When comparing the effects of two treatments (for example, active drug and placebo or two populations) 95% confidence intervals indicate the size of any effect rather than just indicating if there was an effect as in significance testing. There is a close relationship between the use of confidence intervals and the two-sided hypothesis test.
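A minimal sketch of the large-sample calculation, mean ± 1.96 × SEM, using invented data; the name `ci95` is hypothetical.

```python
import math
import statistics

def ci95(sample):
    """95% confidence interval for the mean of a large sample (n > 30)."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))  # SEM = SD / sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem

# Hypothetical heights in cm: 40 observations, the values 160..169 four times each.
heights = [160 + (i % 10) for i in range(40)]
low, high = ci95(heights)
print(f"95% CI: {low:.1f} to {high:.1f}")
```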

Answer 22

The Association of Anaesthetists of Great Britain and Ireland (AAGBI) guidance states: "There is little evidence to show that wearing surgical attire outside the theatre and returning to the theatre without changing increases surgical wound infection." Also, in terms of wearing headgear there is "little evidence for the effectiveness of this practice except for scrub staff in close proximity to the operating field." The AAGBI state that controversy exists regarding masks and that local policies should be followed. It should be noted that Clostridium difficile is not removed with alcohol hand gel. High level disinfection of equipment kills vegetative bacteria (not all endospores), fungi and viruses.

Answer 23

The frequency distribution can be described by stem and leaf plots, histograms, tables of frequencies, and positive and negative skews (not medians and correlation coefficients).

Answer 24

Absolute risk reduction (ARR) is a means of measuring the difference between two treatments. Explanation: the 'absolute risk reduction' is 10% − 6% = 4%. The 'number needed to treat' to prevent a stroke therefore equals 100/4 = 25. 25 patients would need to be treated at a cost of £100/month for 12 months to prevent a stroke, which gives the total cost as £30,000.
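The arithmetic on this card can be checked with a few lines. The variable names are illustrative; the figures are those quoted above.

```python
control_risk_pct = 10      # stroke risk without treatment, %
treated_risk_pct = 6       # stroke risk with treatment, %
cost_per_month = 100       # pounds per patient per month
months = 12

arr_pct = control_risk_pct - treated_risk_pct   # absolute risk reduction, %
nnt = 100 / arr_pct                             # number needed to treat
total_cost = nnt * cost_per_month * months      # cost to prevent one stroke

print(arr_pct, nnt, total_cost)
```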

Answer 25

The correlation coefficient cannot be higher than 1 and always lies between −1 and +1. If the correlation coefficient is 0 there is no linear relationship between height and the PEFR. If the correlation coefficient is positive, the curve would have an upward slope. If a correlation can be made then figures can be extrapolated and 1.5 m is not too far from the lower height of 1.6 m. The PEFR is the dependent variable and is usually put on the Y (vertical) axis, whereas height, the independent variable, is on the X (horizontal) axis.
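For illustration, Pearson's correlation coefficient can be computed directly from its definition. The height and PEFR values below are invented, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]  # hypothetical heights, m (X axis)
pefr = [420, 450, 470, 510, 530, 560]           # hypothetical PEFR, L/min (Y axis)
r = pearson_r(heights, pefr)
print(f"r = {r:.3f}")
```

A value close to +1 here reflects the upward slope of the invented data; it says nothing about statistical significance, which also depends on the number of observations.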

Answer 26

Antibiotics are currently the leading cause of perioperative anaphylaxis in the UK. They are responsible for 46% of cases with identified causative agents. Co-amoxiclav and teicoplanin between them account for 89% of antibiotic-induced perioperative anaphylaxis. The second leading cause is neuromuscular blocking agents (NMBAs), responsible for 33% of cases. Rates by agent: patent blue dye (14.6/100,000 administrations), chlorhexidine (0.78/100,000 administrations), suxamethonium (11.1/100,000 administrations), teicoplanin (16.4/100,000 administrations), co-amoxiclav (8.7/100,000 administrations). Perioperative anaphylaxis to chlorhexidine poses a significant risk in the healthcare setting due to its widespread use, with some cases being fatal.

Answer 27

The standard error of the mean or SEM equals the standard deviation or SD divided by the square root of sample size. SEM is the standard deviation of all the means of large random samples of size n from a given population. It is of central importance in significance testing. If testing to see if there is a difference between two population means (for example, t test) then t=difference in means/SEM. The SD is a measure of observation variability and is greater than the standard error of the mean (SEM).

Answer 28

The null hypothesis is true if there are no significant differences in response. Increasing the number of patients involved in the trial will reduce the baseline differences between the groups. Patients who withdraw from the study or are lost to follow up may have suffered side effects or even have died from being given the drug, so cannot be excluded. In a clinical trial of a new drug randomisation attempts to ensure that each patient has an equal chance of being allocated a certain treatment. Stratified random allocation of treatment is appropriate where the number of patients is relatively small and can be by age, sex, disease duration, etc.

Answer 29

Type I error or α error is wrongly rejecting the null hypothesis, for example, interpretation of p<0.05 as being significant when it is not. Type II error or β error is accepting the null hypothesis when it is invalid, for example, when two treatments are compared and no significant difference (that is, p>0.05) is noted, assuming there is no difference between them when there is in fact a difference. Type II error is more likely to occur when small samples are used. Type I error is more likely to occur when multiple t tests are performed. Type II error increases with increasing variability of response to treatment (increasing standard deviation). A confidence interval provides an interval estimate of the population parameter (usually the mean or mean difference between two groups). A narrow 95% confidence interval means that the estimate is more precise. A narrow confidence interval indicates lower variability (SD) or higher statistical power or higher sample size, which makes type II error less likely. Whether a confidence interval is used will not affect the type II error rate.
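Why multiple t tests raise the type I error rate can be sketched numerically. The calculation assumes the tests are independent, which is a simplification not stated on the card; `familywise_error` is a hypothetical name.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 3, 10):
    print(k, round(familywise_error(0.05, k), 3))
```

Under this assumption, ten tests each at p = 0.05 give roughly a 40% chance of at least one spurious significant result.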

Answer 30

Ninety five per cent confidence intervals can be used for both distributional and distribution-free data. A 95% confidence interval looks at the range of values within which we are 95% confident that the true population parameter lies. Therefore, using the above definition, if we were to repeat the experiment many times, the interval would contain the true population mean on 95% of occasions. Confidence intervals increase the accuracy when comparing means with another population by looking at the spread of differences. A wide confidence interval indicates that the estimate is imprecise. If the 95% CI crosses zero it may indicate that the treatment has no effect, or it may be that the study is underpowered and not able to detect a difference (the difference may still exist). For large samples the interval can be calculated as the mean ± 1.96 times the standard error of the mean.

Answer 31

Verbal rating scales (VRS) and numerical rating scales (NRS) generate discontinuous data that are unsuitable for parametric tests of statistical significance and thus non-parametric techniques must be used. When the VRS is confined to only three levels, data can be summarised in contingency tables and either the χ2 test or exact tests used. Where VRS is divided into several levels or NRS used, the Mann-Whitney test or Wilcoxon rank sum test are appropriate. Visual analogue scales (VAS) yield continuous data and t tests can be used as long as less than 25% of the data are at extreme ends of the range. If there are doubts about the validity of a t test, non-parametric tests can be used. VAS data may be analysed using standard deviation and standard error. Some authors have used nonparametric tests considering the ordinal nature of the data. A time series of numerical rating scores are best analysed using some form of analysis of variance for repeated measures or even area under the curve. Measuring the area under the curve gives a summary measure for each patient that can be analysed by a single test. The Mann-Whitney test only compares two sets of data and cannot be used for multiple testing.

Answer 32

Chi square testing refers to count data (categorical). It therefore refers to 2 by 2 tables or larger. Explanation: these data would be ideal for a chi square test. It is a 2 × 2 contingency table for which there is a special chi squared formula that gives a value that can be looked up in a table giving the p value. The Student's t test cannot be used as we are comparing proportions not means. Pearson's coefficient cannot be calculated as there is no linear regression to plot. Nothing is ever so obvious that no statistical analysis is needed.
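A sketch of the 2 × 2 shortcut formula, χ² = n(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), without continuity correction. The table below is invented, since the original question's data are not reproduced on the card.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square statistic for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical trial: 30/100 respond on drug, 15/100 respond on placebo.
chi2 = chi_square_2x2(30, 70, 15, 85)
print(f"chi square = {chi2:.2f}")
```

For a 2 × 2 table there is one degree of freedom, so a value above 3.84 corresponds to p < 0.05.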

Answer 33

The mean, median and mode of a normal distribution are equal because the distribution curve of a normal distribution is bell shaped and equal on both sides. Mu (μ) and sigma (σ) symbolize the mean and the standard deviation respectively of a probability distribution. The probability that a normally distributed random variable x lies between (μ − 1.96 σ) and (μ + 1.96 σ) is 0.95. The probability that a normally distributed random variable x lies between (μ − σ) and (μ + σ) is 0.68. Ninety five per cent of the distribution of sample means lie within 1.96 standard errors of the population mean. A parametric test is a statistical test which assumes the data are normally distributed.
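These probabilities can be checked against the standard normal distribution using Python's standard library; `prob_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

print(round(prob_within(1.0), 3))
print(round(prob_within(1.96), 3))
```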

Answer 34

Meta-analyses of randomised, controlled trials are usually performed when individually the trials are too small to give reliable answers. There are a number of reasons for performing meta-analysis, which include: to examine variability between trials, to perform subgroup analysis, to identify the need for major trials, and to obtain a more stable estimate of the effect of treatment. Only randomised, controlled trials should be included in such analysis, but if only published studies (which tend to be positive) are used this will introduce bias. If unpublished but properly controlled studies are available they should be used in the analysis. It is important that patient selection and outcomes are comparable in the studies.

Answer 35

Incidence is the number of new cases of a disease in a defined time period or population. The number of cases of a disease in a population over a defined time period describes the prevalence of a disease - it is not the number of "new cases". The number of new cases of a disease does not stipulate a defined period of time or place (that is, there is no denominator from which to derive an incidence). The number of new cases of a disease seeking medical treatment describes the incidence of patients seeking medical treatment but not the incidence of the disease in a population; there will be some patients not seeking treatment who have the disease. The number of patients dying from a disease in a population describes the death rate from a disease.

Answer 36

The standard deviation (SD) is a measure of the scatter of observations about the mean and is a valid statistical parameter for observations that have a normal distribution. The SD of a group is the square root of the variance (not square). Standard error is the standard deviation divided by square root of n, and so the SD is numerically greater than the standard error of the mean. Chi square compares proportions.

Answer 37

The mode is the value that occurs most frequently. The median is that point on the scale of measurement above which exactly half the values lie and below which lie the other half. In a normal distribution, the arithmetic mean, the mode and the median are equal. In a normally distributed variable, the probability of attaining a value higher than two standard deviations above the mean is approximately 1 in 40 (p = approx. 0.025). This is one sided (higher); higher OR lower than two standard deviations would be 1 in 20. In a normal distribution, approximately 95% of the values will lie within the range between (mean + 2 standard deviations) and (mean - 2 standard deviations).

Answer 38

Some drugs are metabolised by enzymes susceptible to polymorphisms that affect their activity. This is the basis of fast and slow acetylation (e.g. hydralazine, procainamide, sulphonamides, and dapsone) and slow or poor metabolism (e.g. debrisoquine). The prevalence of these polymorphisms shows considerable variation between racial groups. The consequences of poor metabolism of a particular drug are clearly dependent on its pharmacological actions: drugs with a steep dose-response curve or a low therapeutic index may well produce toxic effects in poor metabolisers. Genetic polymorphisms are determined by abnormalities of gene expression and are not dependent on the pharmacological actions of the drug.

A number of commonly used drugs are broken down by phase I hepatic metabolism with the same enzyme, cytochrome P450 2D6 (CYP2D6), for example the beta‐blockers metoprolol and alprenolol; propafenone; codeine and tramadol; antipsychotics such as droperidol, thioridazine, and haloperidol; and ondansetron and tropisetron. Mutations in the genes that control the expression of CYP2D6 can result in complete deletion of the CYP2D6 gene, or replacement of a single nucleotide leading to aberrant gene splicing.

The following enzymes are genetically expressed in the kidneys and are therefore theoretically subject to genetic polymorphism. Because most biotransformation occurs in the liver, the contribution of the kidney is relatively small and not clinically important.
```
UGT1A6 (paracetamol)
UGT1A9 (propofol, furosemide)
UGT2B7 (NSAIDs, morphine, codeine)
CYP2B6 (ketamine, propofol)
CYP3A5 (midazolam)
```

Answer 39

The null hypothesis is that there is no significant difference between two groups or specified populations. The alternative hypothesis is that there is a difference (i.e. contrary to the null hypothesis). A type I error occurs when we reject the null hypothesis when we should have retained it. A type II error occurs when we fail to reject the null hypothesis when it is false. In other words, we believe that there isn't a genuine effect when actually there is one. Rejection of the null hypothesis depends on the probability. The significance level is usually set at p <0.05.

Answer 40

A perfect correlation is when r is either −1 or +1, but this may not be statistically significant. The significant p value is <0.05 (not 0.5). When t is >1.96 it may be significant but it depends on the degrees of freedom. Chi2 must be ≥3.84 to reach conventional level of significance (p <0.05). If degrees of freedom is >1, chi2 needs to be even higher to be statistically significant.

Answer 41

The geometric mean is the Nth root of the product of (a1 ... aN) and the arithmetic mean is (a1 + ... + aN)/N, hence for positive values the geometric mean will always be less than the arithmetic mean (or at most equal to it, if all values are equal). The geometric mean indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean, which uses their sum).
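A quick numerical check with invented, positively skewed values, using the standard library's `statistics.geometric_mean` (available from Python 3.8):

```python
import statistics

values = [1, 10, 100]  # hypothetical positively skewed data

arithmetic = statistics.mean(values)           # (1 + 10 + 100) / 3
geometric = statistics.geometric_mean(values)  # cube root of 1 * 10 * 100

print(arithmetic, round(geometric, 6))
```

The single large value pulls the arithmetic mean upwards, while the geometric mean stays at the middle value of this set.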

Answer 42

The standard error of the mean (SEM) = SD/√n. SD does not necessarily fall with sample size as the distribution of values may increase and hence SD increase. SEM would decrease with sample size as can be seen in the above calculation. Skewness does not depend on whether SD is greater than or less than the mean. Student's t test is a parametric test comparing normally distributed data.

Answer 43

Prevalence depends on the number of individuals who contract the disease in a particular time period. Because it looks at the number of individuals with a disease at a given point in time, or within a defined interval, if a patient has recovered from the illness in that duration, then they would not be included in the prevalence rate. It is expressed as a proportion. As cross-sectional studies are effectively a snap-shot they can be used to estimate the proportion of people with a disease at that time and thus the point prevalence. Prevalence is one measure that can assess the health needs of a community. P = I × D, where:
```
P = Prevalence
I = Incidence
D = Duration
```
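A tiny worked example of P = I × D with invented figures. The relationship is an approximation that holds best when the disease is fairly rare and incidence and duration are stable over time.

```python
def prevalence(incidence_per_person_year: float, mean_duration_years: float) -> float:
    """Approximate prevalence (a proportion) from incidence and average disease duration."""
    return incidence_per_person_year * mean_duration_years

# Hypothetical disease: 2 new cases per 1000 people per year, lasting 5 years on average.
p = prevalence(0.002, 5)
print(f"prevalence = {p:.1%}")
```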

Answer 44

Only about 5% of individuals will be beyond two standard deviations from the mean (not 10%).

Answer 45

Sample size influences level of significance through its use in the calculation of SD/SE. It does not affect: the level of acceptance; the alternative hypothesis, with a general level set at p<0.05; the test to be used.

Answer 46

The normal distribution is symmetrical about the mean, so that the mean, median and mode coincide. Sixty eight percent of observations lie within 1 SD (σ) of the mean (μ), that is x ± 1 SD; 95% lie between x ± 2 SD and 99.7% lie between x ± 3 SD. Because of this symmetry, about 34% of observations lie between x and x + 1 SD. Data from a normal distribution are suitable for parametric tests without prior transformation. Observations which do not conform to a normal distribution may be log-normally distributed and can be transformed to a normal distribution by converting values to log10. Counts of events (for example, bacterial colonies, radioactive counts) may follow a Poisson distribution and may be suitably transformed by taking the square root value. The 95% confidence interval gives information about the range of values within which the true population mean is likely to lie. The 95% confidence interval of the mean is calculated as the mean ± 1.96 times the standard error of the mean (SEM) for samples of greater than 30. For smaller samples the appropriate value of t can be taken from appropriate tables such that the 95% confidence limits are calculated as x − (t × SEM) to x + (t × SEM), where t is taken for the appropriate degrees of freedom associated with a confidence of 100(1 − α)%, that is, 95% when α = 0.05.

Answer 47

Level 1 - High-quality randomised controlled trial with statistically significant difference, or no statistically significant difference but narrow confidence intervals (prospective controlled). 1a: systematic review/meta-analysis of multiple RCTs. 1b: one RCT.
Level 2 - Prospective comparative study (prospective uncontrolled). 2a: one well-designed controlled non-randomised study. 2b: one well-designed experimental study (cohort).
Level 3 - Case-control study, retrospective comparative study (retrospective controlled).
Level 4 - Case series (retrospective uncontrolled).
Level 5 - Expert opinion.

Answer 48

The power of a study is the probability of rejecting the null hypothesis when it is false, that is, the probability of concluding a result is statistically significant when a real difference exists.

Answer 49

Sackett et al. state that "Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about care of individual patients. This means integrating individual clinical expertise with the best available external evidence". Clinical expertise involves proficiency and judgment gained by clinicians with time and the compassionate application of knowledge to individuals. Current best evidence comes from many sources including:
```
randomised controlled trials
meta-analysis
national expert guidelines (for example, hypertension and asthma)
patient-orientated studies, and
health economic assessment.
```
Evidence based medicine is not 'cook-book' medicine or a method for cost cutting, and does not rely solely on randomised controlled trials.

Answer 50

Ranked: subjects fall into mutually exclusive groups with an intrinsic ranking that is not part of a scale, for example ASA grade, GCS grade.

Answer 51

Not ranked: the groups bear no numerical relationship to one another, for example sex or blood group.

Answer 52

Discrete or continuous

Answer 53

Can take only certain values, for example number of sisters.

Answer 54

Can take any value, for example height, weight, BP.

Answer 55

Square root of SD

Answer 56

Interpreting the distribution of data: the line connecting the highest and lowest values represents the range; the perpendicular line represents the median; the box represents percentiles (2.5-97.5); outliers are marked with an asterisk.

Answer 57

Inferential statistics: if several samples are taken from a population and their means plotted, they will form a normal distribution around the true population mean. SEM = SD of the sample means. Calculation formula: SEM = SD of sample / square root of sample number. As sample size increases, SEM decreases, i.e. the sample mean more closely relates to the true population mean. The 95% confidence interval is the sample mean ± 1.96 × SEM.

Answer 58

Absolute risk reduction. Meta-analysis can be easily biased.

Answer 59

Rejecting a result as due to chance when there is a real difference; 20% is the maximum acceptable.

Answer 60

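Non-parametric tests can be used on any type of data. Parametric tests assume normally distributed data; provided the data are continuous and the deviation from normality is not too extreme, they can be used for non-normally distributed data. One error is rejecting the null hypothesis when there is no difference. More common is accepting the null hypothesis, i.e. not identifying a difference between groups when it exists, as happens with a small sample size.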

(84 cards)