Flashcards in EBM Deck (26):
One way of expressing the likelihood of an event or outcome (such as pancreatitis) occurring is with odds. The odds of a person having an outcome is the number of individuals with the outcome divided by the number of individuals without the outcome.
If you see 10 people in your clinic one morning and one of them has flu, then the odds would be 1 to 9. This can also be expressed as 1/9 = 0.11. If 64 out of 256 people in a treatment group had the event (e.g., an outcome such as a disease), the odds would be 64 / (256 - 64) = 64 / 192 = 0.33.
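Both worked examples can be checked with a couple of lines of Python (a minimal sketch; the function name is just illustrative):

```python
def odds(with_outcome, total):
    """Odds = number with the outcome / number without the outcome."""
    return with_outcome / (total - with_outcome)

flu = odds(1, 10)        # 1 person with flu, 9 without: 1/9
treated = odds(64, 256)  # 64 with the event, 192 without: 64/192
```

Here `flu` comes out at about 0.11 and `treated` at about 0.33, matching the examples above.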
expressing odds in words
There are various ways of describing odds. For example, if the odds are calculated as 0.33, this is the same as saying:
- odds of 1 in 3, or one person had the outcome for every three that didn’t
- the chances of the outcome were one third of the chances of not getting the outcome
- in betting terms: the chances of the outcome were 3 to 1 against, or 0.33 to 1
If the odds are greater than 1, then the event is more likely to occur than not. For example, if the odds are 3, this is the same as odds of 3 to 1 (in betting terminology). If the odds are 1.1, this is equivalent to 11 to 10, or 10% more likely to occur than not.
Odds ratios (also known as relative odds) are used to compare whether the likelihood of a certain event occurring is the same for two groups – e.g., smokers versus non-smokers, or a treatment group versus a control group. The odds ratio is the odds of the outcome in one group divided by the odds of the outcome in another group. If the OR = 1 there is no difference between the two groups (i.e., the event is equally likely in both groups).
odds ratio when treatment is compared to control
The odds ratio is the odds of the outcome in a patient in the treatment group divided by the odds of the outcome in the control group. If the OR = 1 there is no difference between the two groups in terms of the likelihood of the outcome. With an OR > 1, there is a greater likelihood of the outcome in the treatment group. With an OR < 1, there is a lower likelihood of the outcome in the treatment group.
odds ratio when exposure is compared to no exposure
The odds ratio is the odds of being exposed in subjects with the target disorder divided by the odds of being exposed in control subjects (without the target disorder). An OR of 1.0 implies no association between the exposure and the outcome of interest (e.g., disease). An OR > 1 implies that the outcome is associated with the exposure, and the odds of the outcome increase as the exposure increases.
example of odds ratio
In a randomised controlled trial, if 64 out of 256 people in the treatment group had the outcome, and 45 out of 180 in the control group also had the outcome, the odds of someone in the treatment group having the outcome will be 64/(256-64) = 64/192, and the odds for the control group will be 45/(180-45) = 45/135. In both cases this is 1 to 3, or 0.33. So the odds ratio is:
0.33 / 0.33 = 1
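The whole calculation can be sketched in Python (the function name is illustrative; the counts are the trial numbers from the example above):

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    odds_a = events_a / (total_a - events_a)  # e.g., 64/192
    odds_b = events_b / (total_b - events_b)  # e.g., 45/135
    return odds_a / odds_b

or_trial = odds_ratio(64, 256, 45, 180)  # treatment vs. control
```

Since both groups have odds of 1 to 3, `or_trial` is 1: the outcome is equally likely in both groups.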
NB: with a Chi-square table that’s 2x3 (e.g., drinking coded 0, 1, 2; pancreatitis coded 0, 1), you work out ORs separately (e.g., for no alcohol compared to low alcohol, and for no alcohol compared to high alcohol), essentially pretending you are working with 2 x 2 tables.
You can then perform a Chi-square analysis to determine whether the relationship shown in the table is significant. The Chi-square test measures the fit of the observed values to ‘expected’ values.
The Chi-square test (or Pearson’s Chi-square) tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. It’s used with categorical variables – e.g., things like whether or not someone has a disease, or has had a particular treatment, and also categories such as blood type.
chi square test of goodness of fit
• A Chi-square test of goodness of fit establishes whether or not an observed frequency distribution differs from a theoretical distribution. A simple application is to test the hypothesis that, in the general population, values would simply occur with equal frequency. But you might also want to test whether a sample from a population resembles the population. For example, researchers would use a Chi-square goodness of fit test to test whether the frequencies of blood groups in a sample match the frequencies seen in the population as a whole.
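The goodness-of-fit statistic is just the sum of (observed - expected)^2 / expected over the categories. A sketch in Python (the blood-group counts and population frequencies below are made up for illustration):

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical sample of 100 people by blood group (O, A, B, AB),
# against illustrative population frequencies of 44%, 42%, 10%, 4%.
observed = [40, 45, 11, 4]
expected = [44, 42, 10, 4]
stat = chi_square_gof(observed, expected)
```

A small statistic (here well under the critical value for 3 degrees of freedom) means the sample frequencies fit the population distribution well.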
chi square test of independence
• A Chi-square test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other. Here’s an example of a contingency table showing two variables: Stroke vs. controls, and hypertension vs. no hypertension
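A minimal sketch of the arithmetic behind the test of independence, with expected counts built from the row and column totals (the stroke/hypertension counts below are hypothetical, not taken from a real table):

```python
def chi_square_independence(table):
    """Chi-square statistic for an r x c contingency table.
    Expected count = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical 2 x 2 table: rows = stroke / controls,
# columns = hypertension / no hypertension
table = [[30, 20], [20, 30]]
stat = chi_square_independence(table)
```

Here every expected cell count is 25, so the statistic works out to 4.0, which exceeds the 5% critical value of 3.84 for 1 degree of freedom, suggesting the two variables are not independent.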
crude odds ratio
ORs calculated in this way (from a table such as those shown above) only consider the effect of one variable on the outcome, essentially ignoring the potential influence of other factors. For this reason, they are known as ‘crude’ odds ratios, because they don’t take into account the other variables that may be having an effect on the outcome.
A confounding variable, or confounder (e.g., alcohol consumption) has an effect on the outcome (e.g., disease), and is also correlated to the exposure (e.g., smoking) – e.g., people who smoke also tend to drink more.
Common confounders include age, socioeconomic status, and gender. One example that’s often given of a confounding variable comes from the observation that children born later in the birth order (born second, last etc.) are more likely to have Down’s syndrome. But this doesn’t mean we should conclude that birth order causes Down’s syndrome. The relationship between birth order and Down’s is confounded by the mother’s age. Older women are more likely to have children with Down’s. Older women are also likely to be having children who are late in the birth order. So, the mother’s age confounds the association between birth order and Down’s syndrome: it looks like there is an association when there is not.
Multiple (or multivariate) logistic regression (see below) controls for many potential confounders at one time.
In simplest terms, regression is a statistical procedure which attempts to predict the values of a given variable (termed the dependent, outcome, or response variable), based on the values of one or more other variables (called independent variables, predictors, or covariates). The result of a regression is usually an equation (or model) which summarises the relationship between the dependent and independent variable(s).
The type of regression used will be dictated by the type of response variable being analysed and by your eventual analytic goal. Linear regression is used to predict the values of a continuous outcome variable (such as height, weight, systolic blood pressure), based on the values of one or more independent predictor variables (and we have encountered simple linear regression in EBM Session 6). Logistic regression is intended for the modelling of dichotomous categorical outcomes (e.g., dead vs. alive, cancer vs. none, pain free vs. in pain).
Logistic regression is used to analyse relationships between a binary/dichotomous dependent variable and numerical or categorical independent variables. Logistic regression combines the independent variables to estimate the probability that a particular event will occur.
In general, logistic regression calculates the odds of someone getting a disease (e.g., pancreatitis) based on a set of covariates (e.g., based on how much someone drinks, smokes etc.)
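In other words, the model turns a weighted sum of the covariates (the log odds) into a probability via the logistic function. A minimal sketch in Python, with a made-up intercept and coefficients (the variable names and values are purely illustrative):

```python
import math

def predicted_probability(intercept, coefs, values):
    """Logistic model: probability = 1 / (1 + e^-(b0 + b1*x1 + b2*x2 + ...))."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical model: intercept -3, smoker (0/1) coefficient 0.7,
# alcohol units/week coefficient 0.05
p = predicted_probability(-3, [0.7, 0.05], [1, 20])  # a smoker drinking 20 units/week
```

The log odds here are -3 + 0.7 + 1.0 = -1.3, giving a predicted probability of roughly 0.21.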
simple/bivariate logistic regression
Simple logistic regression is used to explore associations between one (dichotomous) outcome and one (continuous, ordinal, or categorical) exposure variable. Simple logistic regression lets you answer questions like, “how does smoking affect the likelihood of having pancreatitis?” This approach is equivalent to that used above, using 2 x 2 tables to calculate an odds ratio, and Chi-square analysis to test the significance of this ‘crude’ OR.
Essentially: Bivariate logistic regression gives you an odds ratio showing the effect of a variable on the outcome, ignoring the effects of other variables.
multiple/multivariate logistic regression
Multiple logistic regression is used to explore associations between one (dichotomous) outcome and two or more exposure variables (which may be continuous, ordinal or categorical). The purpose of multiple logistic regression is to let you isolate the relationship between the exposure variable and the outcome variable from the effects of one or more other variables (covariates or confounders). Multiple logistic regression lets you answer the question, “how does smoking affect the likelihood of having pancreatitis, after accounting for (or ‘unconfounded by’ or ‘independent of’) alcohol consumption, BMI, etc.?” This process of accounting for covariates or confounders is also called adjustment.
Comparing the results of simple and multiple logistic regression can help to answer the question “how much did the covariates in the model alter the relationship between exposure and outcome (i.e., how much confounding was there)?”
Essentially: For each variable, multivariate logistic regression gives you an odds ratio showing the effect of a variable on the outcome, after controlling for the effects of the covariates. You can see what the effect of smoking is, regardless of whether a person drinks or not, or has a low or high BMI.
the output of a multivariate logistic regression
This is more complicated, but the details won’t be examined. It is just to show how you use the output of the multivariate logistic regression to calculate the effect of a change in the value of one or more of the variables.
The statistics of primary interest in logistic regression are the beta coefficients (β) and their p-values (which tell us whether each one is statistically significant). Individual coefficients are expressed in log-odds units and are not directly interpretable. However, if the coefficient is used as the power to which the base of the natural logarithm (2.71828) is raised, the result represents the change in the odds of the modelled event associated with a one-unit change in the independent variable.
If the independent variable is dichotomous with values of 1 or 0, then the coefficient represents the log of the odds ratio comparing a person with a value of 1 to a person with a value of 0. In a multivariate model, this coefficient is the independent effect of the variable on the outcome, after adjusting for all other covariates in the regression.
If the independent variable is continuous, then for every one-unit increase in that variable, the odds of having the outcome change by a factor of e^β, adjusting for all other covariates in the multivariate model (e = the base of the natural logarithm = 2.71828).
If the exponentiated coefficient (e^β) is greater than one, this means the outcome is more likely to occur. If it is less than one, then the odds of the event occurring decrease. If it is 1.0 (or not significantly different from 1), this means that this covariate does not change the odds of the event one way or the other.
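Converting coefficients to odds ratios is just exponentiation. A sketch in Python (the variable names and β values below are invented for illustration, not real regression output):

```python
import math

# Hypothetical beta coefficients from a fitted multivariate logistic model
betas = {"smoker": 0.69, "units_per_week": 0.05, "bmi": 0.0}

# The odds ratio for each variable is e raised to its coefficient
odds_ratios = {name: math.exp(b) for name, b in betas.items()}
```

With these made-up numbers, smoking roughly doubles the odds of the outcome (e^0.69 ≈ 2), each extra unit of alcohol per week multiplies the odds by about 1.05, and BMI (e^0 = 1) leaves the odds unchanged.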
Bottom line: Logistic regression analysis tells you how much an increment in a given exposure variable affects the odds of the outcome.
Sensitivity is a measure of the probability of correctly diagnosing a condition: true positives / (true positives + false negatives). Another way to describe sensitivity is the probability that a test result will be positive when the disease is present; it is also sometimes referred to as the true positive rate.
Specificity is a measure of the probability of correctly identifying a non-diseased person: true negatives / (true negatives + false positives).
Specificity can also be thought of as the probability that a test result will be negative when the disease is not present – it is also sometimes referred to as the true negative rate.
prevalence
Prevalence is how many people have the disease. So if 4 out of 100 have the disease, the prevalence is 4%.
positive predictive value
probability that the disease is present when the test is positive.
= true positives / (true positives + false positives)
= 3 / (3 + 7) = 0.3
negative predictive value
probability that the disease is not present when the test is negative.
= true negatives / (false negatives + true negatives)
= 89 / (1 + 89) = 0.99
calculating sensitivity (the true positive rate)
The sensitivity of the test is the proportion of people with the disease correctly identified by the test, or the true positives as a proportion of all the people who actually have the disease. Of the 4 people with the disease, 3 are picked up by the diagnostic test, and there is 1 false negative. The sensitivity of the test is:
True positives / (True positives + False negatives) = 3/4 = 0.75 (or 75%)
It may be easier to think of this as: True positives / Those with the disease
From this, we also get the proportion of false negatives (or the ‘false negative rate’):
False negatives / (True positives + False negatives) = 1/4 = 0.25 (or 25%).
Or, simply: 1 – sensitivity
calculating specificity (the true negative rate)
The specificity of the test is the proportion of people without the disease correctly identified by the test, or the true negatives as a proportion of all the people without the disease. So in this example there are 89 true negatives, out of 96 people who don’t have the disease. The specificity of the test is:
True negatives / (False positives + True negatives) = 89/96 = 0.93 (or 93%)
It may be easier to think of this as: True negatives / Those without the disease
From this, we also get the proportion of false positives (or the ‘false positive rate’):
False positives / (False positives + True negatives) = 7/96 = 0.07 (or 7%)
Or, simply: 1 – specificity
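All four quantities from the worked example (3 true positives, 7 false positives, 1 false negative, 89 true negatives) can be computed from the 2 x 2 table in one go. A sketch in Python:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard test-accuracy measures from a 2 x 2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

m = diagnostic_metrics(tp=3, fp=7, fn=1, tn=89)
```

This reproduces the values above: sensitivity 0.75, specificity 0.93, PPV 0.3, NPV 0.99.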
trade off between sensitivity and specificity
You can change the performance of a diagnostic test by adjusting the cut-off point for a positive outcome (the criterion). Either sensitivity or specificity will improve, but the other will decline. For example, if the criterion is made less stringent, then more people will be identified as having the disease. This will mean that sensitivity will increase, but specificity will decrease. (So, the test will correctly identify more people with the disease, but it will also falsely pick up more people who don’t have the disease).
For example, see the data in the table below. The numbers indicate those who are classified by the test as being hypothyroid; e.g. 36 = people who have a T-4 value between 7.1–9 who are classified by the test as being hypothyroid, but who are actually euthyroid.
These values of sensitivity and specificity can be presented graphically, as shown below. This type of graph is called a Receiver Operating Characteristics curve (or ROC curve). It is a plot of the true positive rate (i.e., sensitivity) against the false positive rate (i.e., 1 – specificity) for the different possible criteria (or cutoff points) of a diagnostic test. It shows the trade-off between sensitivity and specificity; any increase in sensitivity will be accompanied by a decrease in specificity.
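A sketch of how the points on an ROC curve arise: for each cutoff, count the proportion of diseased people flagged (sensitivity) and the proportion of healthy people wrongly flagged (1 – specificity). The T-4 values below are invented for illustration, not the table’s data; here a low T-4 counts as a positive (hypothyroid) result:

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """For each cutoff, 'test positive' means score <= cutoff.
    Returns (false positive rate, true positive rate) pairs."""
    points = []
    for t in thresholds:
        tpr = sum(s <= t for s in scores_pos) / len(scores_pos)  # sensitivity
        fpr = sum(s <= t for s in scores_neg) / len(scores_neg)  # 1 - specificity
        points.append((fpr, tpr))
    return points

# Hypothetical T-4 values: lower = more likely hypothyroid
hypothyroid = [3, 4, 5, 6, 8]
euthyroid = [6, 7, 8, 9, 10, 11]
pts = roc_points(hypothyroid, euthyroid, thresholds=[5, 7, 9])
```

As the cutoff is made less stringent (moved from 5 up to 9), both rates rise together: sensitivity improves, but so does the false positive rate, which is exactly the trade-off the ROC curve displays.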