Logistic Regression I Flashcards
What is the purpose of logistic regression?
It models the relationship between a binary outcome and one or more independent variables
How is logistic regression different from linear regression?
Unlike linear regression, logistic regression is used for binary outcomes and applies a logit transformation to ensure predicted probabilities stay between 0 and 1
What types of variables can be used as predictors in logistic regression?
Continuous and categorical
What is the formula for the logistic regression model?
To express the response probability as a linear function:
ln ( π / (1 - π) ) = β0 + β1X
Where:
- ln ( π / (1 - π) ) = log odds or logit
- π / (1 - π) = odds
- β0 = intercept/constant
- β1 = coefficient for predictor X/slope
How do you convert logistic regression coefficients to an OR?
Take the exponent of the coefficient:
Odds ratio = e^β1
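A minimal sketch of why exponentiating the slope gives an odds ratio: under ln(odds) = β0 + β1X, a one-unit increase in X multiplies the odds by e^β1. The coefficient values below are hypothetical and chosen only for illustration.

```python
import math


def odds(beta0, beta1, x):
    """Odds implied by the model ln(odds) = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)


# Hypothetical coefficients, not taken from any fitted model.
beta0, beta1 = -1.0, 0.4

# Odds ratio obtained directly from the coefficient.
odds_ratio = math.exp(beta1)

# Ratio of the odds at X = 3 to the odds at X = 2 (a one-unit increase).
ratio_of_odds = odds(beta0, beta1, 3) / odds(beta0, beta1, 2)

print(round(odds_ratio, 4), round(ratio_of_odds, 4))
```

The two printed values agree, and the intercept cancels out of the ratio.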
What is an OR?
The ratio of the odds of an event occurring in one group compared to another
Odds ratio = odds (group 1) / odds (group 2)
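As a rough illustration, the odds ratio can be computed from counts of events and non-events in two groups. The function name and the counts are assumptions made up for this sketch.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds ratio of group 1 relative to group 2 from simple counts."""
    odds_1 = events_1 / non_events_1  # odds of the event in group 1
    odds_2 = events_2 / non_events_2  # odds of the event in group 2
    return odds_1 / odds_2


# Hypothetical counts: 30 of 100 in group 1 and 20 of 100 in group 2 had the event.
or_value = odds_ratio(30, 70, 20, 80)
print(round(or_value, 2))
```

Here group 1 has odds 30/70 and group 2 has odds 20/80, so the odds ratio is (30 × 80) / (70 × 20).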
How do you interpret OR?
OR = 1: No effect (same odds in both groups)
OR > 1: Higher odds of the event occurring
OR < 1: Lower odds of the event occurring
What are probabilities and odds?
- Probability (risk): π = occurrences/opportunities (range 0 to 1)
- Odds: π / (1 - π) (probability of the event occurring / probability of the event not occurring; range 0 to plus infinity)
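A small sketch of converting between the two scales, assuming hypothetical helper names `prob_to_odds` and `odds_to_prob`:

```python
def prob_to_odds(p):
    """Convert a probability in [0, 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert non-negative odds back to a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Illustrative values only.
for p in (0.1, 0.25, 0.75, 0.9):
    o = prob_to_odds(p)
    print(p, round(o, 3), round(odds_to_prob(o), 3))
```

Odds grow without bound as the probability approaches 1, which is why the odds scale has no upper limit.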
What does an OR of 1.5 mean?
The odds of the event are 50% higher in the exposed group than in the reference group (the odds, not the probability, are multiplied by 1.5)
What does an OR of 0.75 mean?
The odds of the event are 25% lower in the exposed group than in the reference group (the odds, not the probability, are multiplied by 0.75)
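An OR acts on the odds scale, so the change in probability depends on the baseline risk. The sketch below assumes a hypothetical reference-group probability of 0.20 and shows what an OR of 1.5 or 0.75 does to it.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability in the exposed group given a baseline probability and an OR."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1 + exposed_odds)


baseline = 0.20  # hypothetical risk in the reference group

exposed_high = apply_odds_ratio(baseline, 1.5)
exposed_low = apply_odds_ratio(baseline, 0.75)

# Relative change in probability, for comparison with the change in odds.
print(round(exposed_high, 3), round(exposed_high / baseline, 3))
print(round(exposed_low, 3), round(exposed_low / baseline, 3))
```

With this baseline, an OR of 1.5 raises the probability from 0.20 to roughly 0.27, which is less than a 50% relative increase in probability. The two scales are close only when the event is rare.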
What is the likelihood ratio test (LRT) used for?
To compare two nested models and determine if adding a predictor improves the model
What are the hypotheses for the LRT?
H0: Adding the variable makes no difference
H1: Adding the variable improves the model
How do you calculate the LRT statistic?
LRT = 2 [ ln (L1) − ln (L0) ]
Where L1 is the likelihood of the full model and L0 is the likelihood of the reduced model nested within it
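A sketch of the calculation, assuming the full model adds exactly one parameter, so the statistic is compared with a chi-square distribution with 1 degree of freedom. The log-likelihood values are hypothetical, and `chi2_sf_df1` is a helper written for this example (valid for 1 degree of freedom only).

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """LRT = 2 * [ln(L1) - ln(L0)], taking log-likelihoods as inputs."""
    return 2 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical log-likelihoods, not from any real fitted models.
loglik_reduced = -120.5
loglik_full = -117.3

stat = lrt_statistic(loglik_full, loglik_reduced)
p_value = chi2_sf_df1(stat)
print(round(stat, 2), round(p_value, 4))
```

A small p-value is evidence against H0, that is, evidence that the added variable improves the model. If more than one parameter is added, the degrees of freedom equal the number of added parameters and this 1-df helper no longer applies.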
What are the key assumptions of logistic regression?
- Observations are independent
- The outcome variable is binary
- No multicollinearity (high correlation between predictors)
- The relationship between continuous predictors and log-odds is linear after adjusting for any other covariates
- No significant interactions unless explicitly modelled. The effect of an exposure is the same regardless of the value of any other independent variable, and vice versa.
- No unobserved confounding
Why do we use multivariable logistic regression?
To adjust for confounders like age, sex, and BMI
How do you fit a logistic regression model in Stata?
logistic <outcome> <predictor1> <predictor2> <predictork>
Note: Adding the 'i.' prefix to a variable (e.g., i.<predictor1>) tells Stata to treat it as a categorical variable
How do you compare two models using LRT in Stata?
Store each model in memory after fitting it:
estimates store <name1>
estimates store <name2>
Then run the LRT:
lrtest <name1> <name2>
What is the logit link function?
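A binary logistic regression (BLR) model uses a logit (or logistic) transformation of the probability π: logit(π) = ln ( π / (1 - π) ). Solving the model equation for π gives:
π = exp(β0 + β1x) / ( 1 + exp(β0 + β1x) )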
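A minimal sketch of the transformation in both directions, using hypothetical coefficients. `logit` maps a probability to the log odds, and `inv_logit` maps the linear predictor β0 + β1x back to a probability.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Probability implied by a linear predictor eta = beta0 + beta1 * x."""
    return math.exp(eta) / (1 + math.exp(eta))


# Hypothetical coefficients chosen for illustration.
beta0, beta1 = -2.0, 0.5

for x in (0, 4, 8):
    eta = beta0 + beta1 * x
    pi = inv_logit(eta)
    # Applying logit to pi recovers the linear predictor.
    print(x, round(pi, 3), round(logit(pi), 3))
```

Whatever value the linear predictor takes, the resulting π stays strictly between 0 and 1.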
What are the properties of logit transformation of π?
- Produces values of π between 0 and 1
- Symmetric about π = 0.5
- The curve is almost a straight line for 0.2 < π < 0.8
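These properties can be checked numerically. The sketch below tests the symmetry logit(π) = -logit(1 - π) and compares the logit with its tangent line at π = 0.5, which has slope 4. The tangent-line comparison is an illustration chosen for this example, not a formal definition of "almost straight".

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def tangent_at_half(p):
    """Straight line touching the logit curve at p = 0.5 (slope 4)."""
    return 4 * (p - 0.5)


# Symmetry about 0.5: probabilities equally far from 0.5 have opposite log odds.
for p in (0.1, 0.3, 0.45):
    assert math.isclose(logit(p), -logit(1 - p))

# Gap between the curve and the straight line inside and outside the middle range.
for p in (0.3, 0.5, 0.7, 0.95):
    print(p, round(logit(p), 3), round(tangent_at_half(p), 3))
```

The gap between the curve and the line is small near the middle of the range and grows quickly as π approaches 0 or 1.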
What does it mean if π = 0.5?
The event is equally likely to occur or not occur. Using the odds equation:
odds = 0.5 / (1 - 0.5) = 0.5 / 0.5 = 1
What does it mean if π < 0.5? E.g., if π = 0.4
The event is less likely to occur than not
Odds: 0.4 / (1 - 0.4) = 0.4 / 0.6 = 0.67
What is the log odds a result of?
The logit transformation of π. The model specifies a linear relationship between the predictors X and the logit of the probability of Y, so the quantity being modelled is the log odds, which can be positive or negative
What does it mean if π > 0.5? E.g., if π = 0.6
The event is more likely to occur than not
Odds: 0.6 / (1 - 0.6) = 0.6 / 0.4 = 1.5
How do you fit a logit model in Stata?
logit <outcome> <predictors>