Ordinal Logistic Regression Flashcards
What is ordinal logistic regression (OLR)?
Umbrella term for several types of regression model for ordinal outcomes
- A regression model used when the dependent variable has three or more ordered categories
- Estimates the relationship between predictors and an ordinal outcome using log odds
Proportional Odds Model (POM)
Assumes that ORs are the same across all cut-off points of the ordinal outcome i.e., the observed ORs are estimates of the same “true” OR.
Expressed as:
logit[P(Y > j)] = cj + β1X1 + β2X2 + … + βkXk
- The estimated coefficients include the slopes (β1, β2, …, βk)
- The OR for a unit increase in X1 is OR = exp(β1) when X1 is continuous. For a binary/dummy predictor, the OR compares a specific group to the reference group.
Also known as the parallel regression assumption
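The idea of one shared OR across cut-offs can be checked numerically. The following is a minimal Python sketch, not output from a fitted model: the cut-off intercepts and the slope are hypothetical values chosen only for illustration. It shows that when every cut-off shares the same slope β1, the OR for a unit increase in X1 is exp(β1) at each cut-off.

```python
import math

def inv_logit(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical values, for illustration only (not estimates from real data)
cutoffs = [0.5, -1.0]   # c1, c2: one intercept per cut-off
beta1 = 0.7             # a single slope shared by all cut-offs

odds_ratios = []
for c in cutoffs:
    p_x0 = inv_logit(c)           # P(Y > j) when X1 = 0
    p_x1 = inv_logit(c + beta1)   # P(Y > j) when X1 = 1
    odds_ratios.append(odds(p_x1) / odds(p_x0))

common_or = math.exp(beta1)       # the OR implied by the shared slope
print(odds_ratios, common_or)
```

The intercepts change the baseline probabilities at each cut-off, but they cancel out of the ratio, which is why only one OR per predictor is reported.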
Stata commands for OLR
gologit2 <outcome> <predictor(s)>, pl or - proportional odds model
gologit2 <outcome> <predictor(s)>, or - non-proportional odds model
ologit <outcome> <predictor(s)> - for Brant's test
Non-Proportional Odds Model
- Does not assume equal ORs across different dichotomisations
- Stata models each dichotomisation separately. This gives essentially the same results as fitting multiple BLRs (one for each possible dichotomisation)
Brant’s test for proportional odds assumption:
H0: The proportional odds assumption holds
If p > 0.05, there is no evidence against the assumption, so we assume proportional odds
Run in Stata using:
ologit <outcome> <predictor(s)>
brant, detail
How would you transform a continuous predictor for non-linearity?
Centring at the mean: gen <varname> = <predictor> - <mean>
Quadratic term may be added for a potential U-shaped relationship: gen <varname> = centred_predictor^2
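The two `gen` steps above can be sketched outside Stata as plain arithmetic. This is an illustrative Python version only: the ages are made-up values and the names `age_c` and `age_c2` are hypothetical.

```python
# Hypothetical ages, for illustration only
ages = [23, 35, 47, 52, 68]

# Step 1: centre the predictor at its mean
mean_age = sum(ages) / len(ages)
age_c = [a - mean_age for a in ages]

# Step 2: square the centred predictor to get the quadratic term
age_c2 = [a ** 2 for a in age_c]

print(mean_age, age_c, age_c2)
```

Squaring the centred variable rather than the raw one keeps the linear and quadratic terms less correlated with each other, which tends to make the coefficients easier to estimate and interpret.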
LRT for model comparison:
Compares a simpler model (e.g., linear age effect) with a more complex model (e.g., quadratic age effect)
If p < 0.05, the more complex model fits significantly better; if p > 0.05, the simpler model is adequate
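The arithmetic behind the LRT can be sketched as follows. This is a minimal Python illustration, assuming the two nested models differ by exactly one parameter (e.g., one added quadratic term), so the test statistic is compared with a chi-square distribution on 1 degree of freedom. The log-likelihood values are hypothetical, not taken from any real output, and the function name is made up for this sketch.

```python
import math

def lrt_pvalue_df1(loglik_simple, loglik_complex):
    """LRT statistic and p-value for nested models differing by one parameter."""
    # Statistic: twice the gain in log-likelihood from the extra parameter
    stat = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only
ll_linear = -512.4      # simpler model (linear age effect)
ll_quadratic = -508.1   # more complex model (adds the quadratic term)

stat, p = lrt_pvalue_df1(ll_linear, ll_quadratic)
print(stat, p)
```

With these made-up values the p-value is well below 0.05, which would favour the more complex model. A p-value above 0.05 would mean the extra term does not improve fit enough to justify it.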
Partial proportional odds model
Relaxes proportional odds assumption for specific variables while keeping it for others
In Stata:
gologit2 <outcome> <predictor(s)>, or pl(<predictors>) - pl() lists the predictors kept under the proportional odds constraint
Model selection considerations
- Use Brant’s test for proportional odds assumption
- Use LRT to compare nested models
- Consider adding interaction terms or nonlinear transformations
How is OLR related to BLR?
- A logit transformation (log odds) is used (on the left-hand side of the equation)
- The measure of effect size is the OR
What possible ways can depression be dichotomised if categorised as ‘none’, ‘moderate’ or ‘severe’?
- Cut-off 1: None / Moderate or severe
- Cut-off 2: None or moderate / Severe
So there are two possible dichotomisations
How many ways can depression be dichotomised if there are four categories: ‘none’, ‘mild’, ‘moderate’, ‘severe’?
Three ways:
- Mild/moderate/severe vs none
- Moderate/severe vs mild/none
- Severe vs none/mild/moderate
In general, what is the number of possible dichotomisations equal to?
The number of categories minus one
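The "categories minus one" rule can be seen by listing the splits directly. The Python sketch below is illustrative only, and the function name `dichotomisations` is hypothetical. It cuts an ordered list of categories at every position between two neighbouring categories.

```python
def dichotomisations(categories):
    """Return every (lower, higher) split of an ordered list of categories."""
    # One split per gap between neighbouring categories: len(categories) - 1 in total
    return [(categories[:i], categories[i:]) for i in range(1, len(categories))]

splits = dichotomisations(["none", "mild", "moderate", "severe"])
for lower, higher in splits:
    print(" / ".join(lower), "vs", " / ".join(higher))
```

Because the categories are ordered, only adjacent groupings make sense, so four categories give three splits and three categories give two.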
What does OLR do with all the dichotomisations of an outcome?
OLR dichotomises the ordinal outcome in all possible ways, and models the log odds of being in a higher outcome category
I.e., in the context of depression, it compares ‘none’ to ‘moderate’/‘severe’ (higher categories) or ‘none’/‘moderate’ to ‘severe’ (higher category)
What is the proportional odds assumption?
The assumption that the ORs are the same at all possible cut-offs, so that we only need to estimate one (common) OR for all cut-offs
E.g., with depression, the OR for cut-off 1 (‘moderate’ or ‘severe’ vs. ‘none’) is assumed to equal the OR for cut-off 2 (‘severe’ vs. ‘none’ or ‘moderate’)
If the ORs are different across dichotomisations, does this necessarily mean they are not proportional?
No. The observed ORs may differ through sampling variation while still being estimates of the same “true” OR
When using a non-proportional odds model for modelling the log-odds of the outcome depression (three categories: ‘none’, ‘moderate’, ‘severe’), how do we interpret the output?
Output is split into two tables, labelled ‘none’ and ‘moderate’
First table predicts the odds of being in a category higher than ‘none’ (‘moderate’/‘severe’ depression vs. ‘none’)
Second table predicts the odds of being in a category higher than ‘moderate’ (‘severe’ depression vs. ‘none’/‘moderate’)
What does Stata display in the output in a proportional odds model?
At the top, the constraint (“let the two ORs be the same”)
The OR estimates would be the same - due to the decision to constrain them to be equal
Unlike BLR, what does OLR estimate?
The odds of being in a higher category than a given cut-off (e.g., higher than ‘none’), rather than the odds of a single binary event
How would we report ORs if we thought the non-proportional odds model was true?
Separately
What’s the equation for a proportional odds model?
Consider an ordinal outcome, y, with J categories, labelled j = 1, 2, …, J
Let pj = P(y > j) be the probability of being in a category higher than j, for j = 1, 2, …, J-1
The proportional odds model is:
logit(pj) = cj + β1X1 + β2X2 + … + βkXk
How does the proportional odds equation differ from that of the BLR?
We now have ‘pj’ in the logit transformation.
There is one coefficient associated with each predictor (like in logistic regression). However, in logistic regression, we have only one intercept term (β0). In the proportional odds model, we have several intercepts, cj, one for each possible cut-off. The slope coefficients are the same at every cut-off under the proportional odds assumption
In a proportional odds model, what would the separate equations be if J = 3?
logit(p1) = c1 + β1x1 + β2x2 + … + βkxk
logit(p2) = c2 + β1x1 + β2x2 + … + βkxk
The only difference between the right-hand sides of the two equations is the intercept (c1 vs. c2). All slope coefficients β1, β2, etc. are the same in both equations
- The coefficients estimated are the slopes β1, β2, …, βk and the cut-offs c1, c2, …, cJ-1
- As in BLR, the OR for a unit increase in x1 is obtained by exponentiating its coefficient: ORx1 = exp(β1)
What would be the equations for a proportional odds model estimating the log-odds of depression with three categories: ‘none’, ‘moderate’, ‘severe’?
logit(P(Depression > None)) = c1 + β1 × Female
logit(P(Depression > Moderate)) = c2 + β1 × Female
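To see how the two equations translate into probabilities for each depression category, here is a minimal Python sketch. The values of c1, c2 and β1 are hypothetical, chosen only so that c1 > c2 (needed for the category probabilities to be positive), and the function name `category_probs` is made up for this illustration.

```python
import math

def inv_logit(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(c1, c2, beta1, female):
    """Category probabilities implied by the two cumulative equations."""
    p_gt_none = inv_logit(c1 + beta1 * female)       # P(Depression > None)
    p_gt_moderate = inv_logit(c2 + beta1 * female)   # P(Depression > Moderate), i.e. P(severe)
    return {
        "none": 1.0 - p_gt_none,
        "moderate": p_gt_none - p_gt_moderate,
        "severe": p_gt_moderate,
    }

# Hypothetical coefficients, for illustration only
c1, c2, beta1 = -0.5, -2.0, 0.4

probs_male = category_probs(c1, c2, beta1, female=0)
probs_female = category_probs(c1, c2, beta1, female=1)
print(probs_male)
print(probs_female)
```

With a positive β1, the probability of ‘none’ is lower and the probability of ‘severe’ is higher for females than for males, while the OR at both cut-offs remains exp(β1).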