Logistic regression Flashcards
(24 cards)
Why is logistic regression different to other lectures?
Previous lectures examined predicting variance in a continuous, normally distributed dependent variable, i.e. linear regression.
This is very common, but it is not the only type of outcome we are interested in; some things cannot be measured that way, particularly in the medical world:
Alive vs. dead
Addicted vs. non-addicted
Relapse vs. non-relapse
This use of absolute categories is extremely common in clinical psychology
Difference between logistic and linear regression
Linear regression predicts a continuous outcome, while logistic regression predicts a categorical outcome (usually a probability).
Logistic regression
Logistic regression is a data analysis technique that uses mathematics to find the relationship between one or more predictors and a categorical outcome. It then uses this relationship to predict the outcome from the predictors. The prediction usually has a finite number of possible values, like yes or no.
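Under the hood, logistic regression passes a linear combination of the predictors through the logistic (sigmoid) function to get a probability. A minimal Python sketch, with made-up coefficients purely for illustration:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number onto a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: log-odds = b0 + b1 * x (coefficients invented)
b0, b1 = -2.0, 0.5
x = 6.0                       # a predictor value
p = sigmoid(b0 + b1 * x)      # predicted probability that the outcome is 1
# Classify at the usual 0.5 cut-off
predicted_group = 1 if p >= 0.5 else 0
```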
Categorical outcomes
Analysed with logistic regression based methods
Example of categorical vs continuous outcomes in depression
Categorical - depressed vs not depressed
Continuous - scores on BDI
Advantages of continuous outcomes
Inferences can be made with fewer data points - with a continuous outcome, fewer participants are needed for power
Continuous data often provides more statistical power compared to binary data, making it easier to detect meaningful differences between groups.
Higher sensitivity
More variety in analysis options - can look at people who are halfway, a bit depressed, severely depressed etc. - the whole continuum
Information on variability of a construct within a population
Give a better understanding of the variable in question.
Nonsensical distinctions avoided - binary outcomes can create distinctions that make no sense at all: with a cut-off, somebody with a score of 5 could be treated the same way as someone with a score of 9 but differently from someone with a score of 4 - an issue with dichotomising things
Why should we use binary outcomes
Increased scores on a scale do not always mean someone is depressed/addicted etc. The whole sample could be non-depressed; their scores could just go up a bit.
Binary outcomes have clinical relevance if we use diagnostic criteria to give formal diagnoses: identify people who meet the diagnostic threshold, then measure again after an intervention to see if they still meet it.
Summary of categorical outcomes
Categorical data has its limitations and lacks sensitivity
BUT it allows us to make decisions in relation to clinical outcomes, or whatever we decide is a relevant effect (doesn’t have to be a clinical diagnosis)
Logistic regression summary
We use logistic regression to explore what variables are associated with an outcome
This gives us model fit statistics (similar to a linear regression)
Regression coefficients for individual predictors (similar to a linear regression)
Odds ratios
Odds ratios
These describe the % change in the odds of the outcome attributable to a unit change in an IV
What does logistic regression allow us to do that linear does not?
To make absolute conclusions about concepts
e.g. relapse - we could report an increased frequency of drug use, but to make absolute conclusions we need to be able to say whether it specifically predicts relapse.
What does logistic regression do
Predicts membership of a group
It is called “binary” logistic regression because the outcome is dichotomous - TWO POSSIBLE OUTCOMES, e.g. relapse = 1, non-relapse = 0
Log-likelihood
How likely is a model to predict that someone is in the correct group
Looks at the discrepancy between what was observed and what the model predicted
Each participant has an observed value for the outcome (0/1) and a predicted value (ranging from 0 = certainly will not happen to 1 = certainly will happen). The discrepancies between observed and predicted values are summed across all participants. Its counterpart in linear regression is the sum of squared errors (how far each observation is from the prediction).
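The summed discrepancy described above can be sketched directly; the observed outcomes and predicted probabilities here are invented for illustration:

```python
import math

def log_likelihood(observed, predicted):
    # Sum of y*ln(p) + (1 - y)*ln(1 - p) across participants; always <= 0,
    # and closer to 0 the better the predictions match the observed 0/1 outcomes
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(observed, predicted))

observed  = [1, 0, 1, 0]           # actual group membership
predicted = [0.9, 0.2, 0.8, 0.1]   # model's predicted probabilities
ll = log_likelihood(observed, predicted)
```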
What does the logistic regression compare results to
A baseline model
What variation of R2 is used for logistic regression
McFadden’s R2 - a measure of how well the model fits the data compared to the null/baseline model. It compares the log-likelihood of the full model to that of the null model.
A higher McFadden = a better fit.
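As a sketch (log-likelihood values invented), McFadden’s R2 is one minus the ratio of the full model’s log-likelihood to the null model’s:

```python
def mcfadden_r2(ll_full, ll_null):
    # 1 - LL(full) / LL(null); both log-likelihoods are negative,
    # and a full model closer to 0 gives a higher (better) R2
    return 1 - ll_full / ll_null

r2 = mcfadden_r2(-80.0, -120.0)   # hypothetical values -> 1 - 80/120
```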
A confusion matrix
A table cross-tabulating observed group membership (0/1) against the group membership the model predicts
Expressed as a %
We will use the contents of the confusion matrix to tell us the % of cases that were correctly classified
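A minimal sketch of building those counts and the % correctly classified from 0/1 outcomes and predicted probabilities (data invented):

```python
def confusion_counts(observed, predicted, threshold=0.5):
    # Cross-tabulate observed 0/1 outcomes against predictions classified
    # at the threshold: true/false positives and true/false negatives
    tp = fp = tn = fn = 0
    for y, p in zip(observed, predicted):
        yhat = 1 if p >= threshold else 0
        if y == 1 and yhat == 1:
            tp += 1
        elif y == 0 and yhat == 1:
            fp += 1
        elif y == 0 and yhat == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

observed  = [1, 1, 0, 0, 1]
predicted = [0.8, 0.3, 0.2, 0.6, 0.9]
tp, fp, tn, fn = confusion_counts(observed, predicted)
pct_correct = 100 * (tp + tn) / len(observed)   # % of cases correctly classified
```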
What stats are used to report predictors in a logistic regression
The regression coefficient (b) and its SE and p value
This gives you the direction of an association and the variability in this association
A positive coefficient means high scores are associated with the group labelled 1; a negative coefficient means high scores are associated with the group labelled 0
Exp(B), which is an odds ratio
It’s called Exp(B) because it is an exponentiated regression coefficient
Odds ratios and their meaning
OR of 1 = no change in the odds of the event
OR of .5 = 50% decrease in the odds of the event
OR of 1.5 = 50% increase in the odds of the event
OR of 4.7 = 370% increase in the odds of the event
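The conversion is just (OR - 1) × 100, and the OR itself is the exponentiated coefficient. A sketch with an invented coefficient:

```python
import math

def or_to_percent(odds_ratio):
    # % change in the odds of the event per unit change in the IV
    return (odds_ratio - 1) * 100

b = 1.548                            # hypothetical regression coefficient
odds_ratio = math.exp(b)             # Exp(B), roughly 4.7 here
change = or_to_percent(odds_ratio)   # roughly a 370% increase in the odds
```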
Logistic regression assumptions
DV is categorical with two levels only (hence binary 0/1)
Neither of the DV “events” should be rare
E.g. 2 people getting a first, 548 not getting a first
Rare events cause a problem called “separation”, where you can get “perfect” predictors - and the best-guess baseline model (always predict the common outcome) is already extremely accurate and hard to beat with predictors
IVs continuous (ratio/interval) or categorical.
No multicollinearity - can assess with VIF
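The VIF check can be sketched by hand: regress each IV on all the others and take 1/(1 - R²). A sketch using NumPy least squares (data invented; VIF near 1 means no collinearity, values above roughly 5-10 are a concern):

```python
import numpy as np

def vif(X):
    # Variance inflation factor for each predictor column of X:
    # regress each IV on the others, then VIF = 1 / (1 - R^2)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]                                            # this IV as "outcome"
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        ss_res = ((y - A @ coef) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Two uncorrelated IVs -> both VIFs should be ~1
vals = vif([[1, 0], [0, 1], [1, 1], [0, 0]])
```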
Logistic regression model fit statistics reported
Chi Squared value (df) = , p= , pseudo R2 (McFadden)
Where are the odds ratios on the output
exp(Est. )
How to write up a logistic regression model
A logistic regression model was conducted to examine whether individuals were depressed or not. The following predictors were added to the model…
The model significantly predicted … correctly identifying % of cases
X2() = , p = , McFadden’s R2=
How to write up predictors
Was significantly positively/negatively associated with .. B = , SE= , p= , OR = , 95% CI to
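The OR and its 95% CI come straight from the coefficient and its SE: exponentiate b and b ± 1.96 × SE (a Wald interval). A sketch with invented numbers:

```python
import math

b, se = 0.405, 0.150          # hypothetical coefficient and its standard error

odds_ratio = math.exp(b)              # Exp(B), the odds ratio (~1.50)
ci_lower = math.exp(b - 1.96 * se)    # ~1.12
ci_upper = math.exp(b + 1.96 * se)    # ~2.01
# Write-up: OR = 1.50, 95% CI 1.12 to 2.01 (values rounded)
```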
Bayes factor