Logistic regression Flashcards Preview

Flashcards in Logistic regression Deck (24)

Normal regression equation

• Ŷ = bX + c
• This is the linear regression model equation; make sure to know it
• Ŷ is the predicted value of the outcome variable. In logistic regression, by contrast, the outcome is “the probability of having one outcome or another based on a nonlinear function of the best linear combination of predictors” (Tabachnick and Fidell).
• Y − Ŷ is the residual (observed value minus predicted value)
• where X is the predictor variable
• The slope of the line is b
• c is the intercept (the value of y when x = 0)


Types of research questions for logistic regression

• Can we predict the presence or absence of a disorder/disease?
• Can we predict an outcome using a set of predictors?
o How good is the model?
• Does an individual predictor increase or decrease the probability of an outcome?
o Related to the importance of the predictors
• Can be used for classification and prediction
• Simple categorical outcomes
o Can we predict the outcomes using categorical predictors?


How does logistic regression differ from ordinary least squares regression?

• OLS has 3 important characteristics:
o The model is linear
o Residuals are assumed to be normally distributed with constant variance (homoscedastic)
o Predicted scores (Ŷ) are on the same scale as the data (Y)
• These characteristics don’t apply to logistic regression:
o The outcome is dichotomous, so a straight-line model is a poor fit; a sigmoidal 'logistic' function fits the data better
o If OLS regression is used on a dichotomous outcome, the residuals will be non-normal and heteroscedastic, which violates important assumptions of OLS
o The predicted value is a probability, and thus is on a different scale to the data


What is probability?

• Probability: the likelihood of an event occurring
o If p = .80, there is an 80% chance of that event occurring


What are predicted odds?

• Predicted odds: the probability of an event occurring divided by the probability of it not occurring
o Predicted odds = prob of event occurring/ prob of event not occurring
o Following on from p = .80, the probability of it not occurring is .20 (i.e. 1 − the probability of it occurring)
o .8/.2 = 4
o This means the odds are 4:1 in favour of the event occurring
o The logistic model gives p̂i, the estimated probability of the outcome occurring, so the predicted odds are p̂i/(1 − p̂i)


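• Odds are asymmetric: odds against the event are squeezed between 0 and 1, while odds in favour run from 1 to positive infinity
o So the observed odds ratio is not in the centre of its confidence interval, but we can use the natural log of the odds instead
o Log of odds = Logit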
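
A minimal sketch of the probability, odds and logit conversions described above. The function names `odds` and `logit` are illustrative choices, not part of the deck. It shows that complementary probabilities (.2 and .8) give asymmetric odds but logits that mirror each other around 0.

```python
import math

def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds (requires 0 < p < 1)."""
    return math.log(odds(p))

# Compare complementary probabilities: the odds are not symmetric
# around 1, but the logits are symmetric around 0.
for p in (0.2, 0.5, 0.8):
    print(f"p={p:.2f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

For p = .8 this reproduces the odds of 4 worked out above.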


What is the odds ratio?

The ratio of the odds of an event occurring at one level of another variable to the odds at a different level
o By how much do the odds of Y change as X increases by 1 unit?
o Essentially it is a ratio of ratios (one odds divided by another)
o It is a measure of effect size: a good way of measuring the strength of the relationship
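
As a rough illustration of the "ratio of ratios" idea, the sketch below divides the odds in one group by the odds in another. The two group probabilities are made-up values, and `odds_ratio` is a hypothetical helper name.

```python
def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def odds_ratio(p_group_b, p_group_a):
    """Odds in group B divided by odds in group A."""
    return odds(p_group_b) / odds(p_group_a)

# Hypothetical probabilities of the outcome in two groups.
p_a = 0.50  # odds of 1 (even odds)
p_b = 0.80  # odds of 4
print(odds_ratio(p_b, p_a))
```

An odds ratio above 1 means the odds are higher in group B, below 1 means they are lower, and exactly 1 means no difference.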


Logistic regression equation

$\hat{p}_i = \dfrac{1}{1 + e^{-(B_1 X_1 + C)}}$
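
A minimal sketch of the equation for a single predictor. The coefficient values are invented for illustration, and `predicted_probability` is a hypothetical name.

```python
import math

def predicted_probability(x1, b1, c):
    """p-hat = 1 / (1 + e^-(b1*x1 + c)) for one predictor."""
    return 1.0 / (1.0 + math.exp(-(b1 * x1 + c)))

# Hypothetical coefficients: slope b1 and intercept c.
b1, c = 0.5, -2.0
for x1 in (0, 4, 8):
    print(x1, round(predicted_probability(x1, b1, c), 3))
```

Whatever the value of the predictor, the result stays between 0 and 1, which is what makes the sigmoid suitable for modelling a probability.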


What is p̂i?

• Our model is of p̂i rather than Ŷ
o p̂i is the estimated probability of the outcome occurring for case i (this is different to the predicted odds, which have another equation)


Predicted odds vs logit

They are just transformations of each other
• Predicted odds: odds of being a case
o Odds = p/(1-p), which ranges from 0 to positive infinity
o When p is .50, the odds are 1 (even odds, 1:1)
 .50/(1-.50) = .50/.50 = 1
o When p > .50, the odds >1
o Varies exponentially, not linearly, with the predictor(s): each unit increase in a predictor multiplies the odds by a constant factor
• Logit: natural logarithm of the odds
o Ranges from negative infinity to positive infinity
o Reflects odds of being a case but varies linearly with predictor(s)
o Not very interpretable
 If p =.8, the odds = 4 but the logit = 1.386
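
The link between the three quantities can be written out as a short derivation, assuming the single-predictor logistic model with coefficients $B_1$ and $C$:

```latex
\begin{align*}
\hat{p}_i &= \frac{1}{1 + e^{-(B_1 X_1 + C)}}
  && \text{(probability)} \\
\frac{\hat{p}_i}{1 - \hat{p}_i} &= e^{B_1 X_1 + C}
  && \text{(predicted odds: exponential in } X_1\text{)} \\
\ln\!\left(\frac{\hat{p}_i}{1 - \hat{p}_i}\right) &= B_1 X_1 + C
  && \text{(logit: linear in } X_1\text{)}
\end{align*}
```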


2 kinds of regression coefficient in logistic regression

• Typical partial regression coefficients (B)
o Identical in function to OLS regression
o Indicates increment in the logit given unit increment in predictor
• Odds ratios (e^B)
o Exponential B (e^B) indicates the amount by which the odds of being a case are multiplied given a unit increment in the predictor (or a change in level of the predictor if the predictor is categorical)
o If B = 0, e^B = 1: the odds are unchanged, so the predictor has no relationship with the outcome
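
The relationship between B and e^B can be checked numerically. In this sketch the coefficients are hypothetical and `predicted_odds` is an illustrative helper: raising the predictor by one unit adds B to the logit and multiplies the odds by e^B.

```python
import math

def predicted_odds(x1, b1, c):
    """Odds implied by the logit b1*x1 + c."""
    return math.exp(b1 * x1 + c)

# Hypothetical coefficients.
b1, c = 0.7, -1.0

odds_ratio = math.exp(b1)  # e^B
# Ratio of the odds at x1 = 3 to the odds at x1 = 2 (a one-unit step).
ratio = predicted_odds(3, b1, c) / predicted_odds(2, b1, c)

print(round(odds_ratio, 4), round(ratio, 4))  # the two values agree
```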


Estimating parameters in logistic regression

• Logistic regression uses maximum likelihood estimation, which is an iterative solution
o Regression coefficients start from initial estimates and are gradually adjusted over repeated iterations until the fit stops improving
 Seeks to maximise the likelihood (L) of the observed values of Y given a model and using the observed values of the predictors
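
To make the iterative idea concrete, here is a small sketch that fits one predictor by Newton-Raphson updates. Newton-Raphson is swapped in here as one common scheme; the deck does not name a specific algorithm, and the data are invented. Each pass adjusts the slope and intercept so that the log-likelihood increases.

```python
import math

def p_hat(x, b, c):
    """Predicted probability for one case."""
    return 1.0 / (1.0 + math.exp(-(b * x + c)))

def log_likelihood(b, c, xs, ys):
    """Sum over cases of the log-probability of the observed outcome."""
    return sum(
        y * math.log(p_hat(x, b, c)) + (1 - y) * math.log(1.0 - p_hat(x, b, c))
        for x, y in zip(xs, ys)
    )

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson estimates of slope b and intercept c."""
    b, c = 0.0, 0.0  # starting values
    for _ in range(iterations):
        g_b = g_c = 0.0            # gradient of the log-likelihood
        h_bb = h_bc = h_cc = 0.0   # information matrix entries
        for x, y in zip(xs, ys):
            p = p_hat(x, b, c)
            w = p * (1.0 - p)
            g_b += (y - p) * x
            g_c += (y - p)
            h_bb += w * x * x
            h_bc += w * x
            h_cc += w
        det = h_bb * h_cc - h_bc * h_bc
        # Update = inverse information matrix times gradient (2x2 case).
        b += (h_cc * g_b - h_bc * g_c) / det
        c += (h_bb * g_c - h_bc * g_b) / det
    return b, c

# Invented, non-separable data: outcome becomes more likely as x rises.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b, c = fit_logistic(xs, ys)
print(round(b, 3), round(c, 3), round(log_likelihood(b, c, xs, ys), 3))
```

In practice statistical software handles this step, usually with a convergence check rather than a fixed number of iterations.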


What is the log-likelihood

Log Likelihoods
• To evaluate the model, a log likelihood (LL) value can be calculated for each model we test
• The LL is based on the probability the model assigns to the outcome actually observed for each case; the logs of these probabilities are summed over all cases
• We can directly compare the goodness-of-fit of different models using the log likelihoods
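
A short sketch of how a log-likelihood might be computed from observed outcomes and model-predicted probabilities. The outcomes and both sets of predictions are made up; the point is only that better predictions give an LL closer to 0.

```python
import math

def log_likelihood(ys, p_hats):
    """Sum of log-probabilities assigned to the observed outcomes (0 or 1)."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(ys, p_hats)
    )

ys = [1, 0, 1, 1, 0]                    # observed outcomes (hypothetical)
good = [0.9, 0.2, 0.8, 0.7, 0.1]        # predictions close to the outcomes
poor = [0.5, 0.5, 0.5, 0.5, 0.5]        # uninformative predictions

ll_good = log_likelihood(ys, good)
ll_poor = log_likelihood(ys, poor)
print(round(ll_good, 3), round(ll_poor, 3))
```

The LL is always zero or negative, and the model with the LL nearer zero fits the observed data better.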


How is model fit tested in logistic regression?

• log-likelihood ratio test - a test of model fit
• A significant likelihood ratio test tells us that the model is significantly worse with the corresponding predictor removed, so the predictor should be retained in the model. If the test is non-significant, you can probably remove that predictor.


How does the log-likelihood ratio test work?

Won’t be asked directly about this, but need to know it for questions where you have to report results. It will be good if you can interpret model fit statistics.

• In likelihood ratio test, we test the null deviance (including only the constant) against the model deviance (containing k predictors)
• As k increases, the model deviance will generally decrease, so the difference between the null and model deviance increases, reflecting improved model fit
• If there is no significant improvement in fit when we add the k predictors to the model, we need to question the inclusion of those predictors
• If there is no significant deterioration in fit when we remove k predictors from the model, then we need to question the inclusion of those predictors
o I.e. they are redundant in the context of this outcome variable
• Only accept more predictors if they significantly improve the fit of the model
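
The comparison above can be sketched numerically. This example assumes deviance = −2 × log-likelihood and a single added predictor (1 degree of freedom), for which the chi-square p-value can be obtained from the complementary error function. The deviance values are invented and `likelihood_ratio_test_df1` is a hypothetical helper.

```python
import math

def likelihood_ratio_test_df1(null_deviance, model_deviance):
    """Chi-square statistic and p-value for one added predictor (df = 1)."""
    chi_sq = null_deviance - model_deviance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Hypothetical deviances (each is -2 * log-likelihood).
null_deviance = 68.0    # constant only
model_deviance = 60.0   # constant plus one predictor

chi_sq, p_value = likelihood_ratio_test_df1(null_deviance, model_deviance)
print(chi_sq, round(p_value, 4))
```

With more than one added predictor the degrees of freedom equal k and a general chi-square distribution function would be needed instead of this shortcut.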


What are pseudo R²s, and what are their limitations?

These are analogous to R² in linear multiple regression and attempt to describe the model in terms of ‘variance accounted for’. However, we cannot literally interpret them as such, because of the heteroscedasticity that cannot be avoided with a dichotomous DV.
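
The deck does not name specific pseudo R² measures. As an illustration only, the sketch below computes three commonly reported ones (McFadden, Cox and Snell, Nagelkerke) from the null and model log-likelihoods. The log-likelihood values and sample size are invented.

```python
import math

def pseudo_r_squared(ll_null, ll_model, n):
    """Three common pseudo R-squared measures from log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox and Snell so its maximum is 1.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke

# Hypothetical log-likelihoods for a sample of 50 cases.
mcfadden, cox_snell, nagelkerke = pseudo_r_squared(-34.0, -30.0, 50)
print(round(mcfadden, 3), round(cox_snell, 3), round(nagelkerke, 3))
```

The three values generally differ for the same model, which is one reason they should be read as rough indices of fit rather than as literal proportions of variance explained.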


How is significance/contribution of individual predictors tested in logistic regression?

1) Contribution to the logit:
--> Significance of the regression coefficient
--> The Wald statistic: the predictor's contribution to the logit
2) Contribution to prediction:
--> Likelihood ratio test: compares the fit (log-likelihood) of the model with and without the predictor
--> Alternatively, the backwards stepwise method: see how good the model is when you remove the predictor
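
A minimal sketch of the Wald statistic for one coefficient, assuming the usual form (B divided by its standard error, squared) with 1 degree of freedom. The coefficient and standard error are invented, and `wald_test` is a hypothetical name.

```python
import math

def wald_test(b, se):
    """Wald chi-square (1 df) and two-sided p-value for a coefficient."""
    z = b / se
    wald = z ** 2
    # Two-sided normal p-value, equivalent to chi-square with 1 df.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

# Hypothetical coefficient and standard error from a parameter estimates table.
wald, p_value = wald_test(0.8, 0.4)
print(round(wald, 3), round(p_value, 4))
```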


How are categorical and continuous predictors referred to in multinomial logistic regression?

o Categorical predictors are called ‘factors’
o Continuous predictors are called ‘covariates’


Difference between Binary vs multinomial regression?

o Binary logistic: binary DV only
o Multinomial: more than 2 categories of DV can be analysed


Assumptions of logistic regression

o High ratio of cases to variables
o Adequacy of expected frequencies?
o Linearity in the logit
o Absence of multicollinearity
o Absence of outliers in the solution
o Independence of observations and errors


Testing the assumption of linearity in the logit

o Hosmer and Lemeshow method
--> Turn the covariate into quartiles and enter it as a factor. Do the Bs (the logit coefficients) show a roughly linear trend across the quartiles?
o Box-Tidwell method
--> Construct a new predictor by multiplying the covariate by its natural log [example: leadership*ln(leadership)] and add it to the model. If this extra predictor is significant, there is evidence of non-linearity in the logit
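
Constructing the Box-Tidwell term is a one-line transformation. The sketch below builds it for a made-up list of leadership scores; `box_tidwell_term` is a hypothetical helper, and it assumes all scores are positive because the natural log is undefined otherwise.

```python
import math

def box_tidwell_term(values):
    """Return x * ln(x) for each value (all values must be > 0)."""
    return [x * math.log(x) for x in values]

# Hypothetical leadership scores.
leadership = [2.0, 3.5, 5.0, 7.5]
leadership_bt = box_tidwell_term(leadership)
print([round(v, 3) for v in leadership_bt])
```

The new variable is entered alongside the original covariate, and only the significance of the new term is of interest.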


4 things to report when writing up logistic regression

1) Overall model fitting information
2) Predictor contribution
--> likelihood ratio test
--> contribution to logit (Wald statistic) in parameter estimates table
3) Odds ratios


How to test for moderation in logistic regression

- If you have continuous variables, standardise them and compute a product term (predictor 1 x predictor 2), then enter this at step 2 of the regression. A significant interaction term indicates moderation.
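
The preparation step can be sketched as follows, using invented scores for two continuous predictors. The helper names `standardise` and `interaction_term` are illustrative; the resulting product term would then be added to the model at step 2.

```python
import statistics

def standardise(values):
    """Convert scores to z-scores (mean 0, sample SD 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def interaction_term(x1, x2):
    """Product of the two standardised predictors, case by case."""
    return [a * b for a, b in zip(standardise(x1), standardise(x2))]

# Hypothetical scores on two continuous predictors.
predictor_1 = [10.0, 12.0, 15.0, 18.0, 20.0]
predictor_2 = [3.0, 1.0, 4.0, 2.0, 5.0]

product = interaction_term(predictor_1, predictor_2)
print([round(v, 3) for v in product])
```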