Week 7 - Logistic Regression Flashcards

1
Q

What are the 2 ways the same outcome can be measured?

A
  1. Two categories (either this or that)
  2. Continuous scores (testing whether predictors can explain variance in the scores)
2
Q

What are linear models?

A

Models that test whether we can predict variance in a continuous outcome. Linear methods include linear regression, ANOVA, t-tests, etc. This lecture is NOT about linear models: here the outcome is categorical, i.e., either this or that with no grey area (e.g., dead or alive, depressed or not).

3
Q

What are continuous outcomes?

A

-Analysed with linear regression-based methods

-(can also be analysed with ANOVA, correlation, etc.)

4
Q

What are categorical outcomes?

A

-Analysed with logistic regression-based methods

-(can also be analysed with chi-squared tests or the sign test)

5
Q

What are the 6 advantages of continuous outcomes?

A
  1. Inferences can be made with fewer data points (greater statistical power; roughly a 30% power discrepancy compared to categorical outcomes)
  2. Higher sensitivity (a range of scores shows who scores low, high, and in between, rather than a binary either/or outcome)
  3. More variety in analysis options
  4. Information on the variability of a construct within a population
  5. A better understanding of the variable in question
  6. Nonsensical distinctions are avoided (e.g., from an analysis perspective, people with 4 depressive symptoms vs. 5 (no depression vs. borderline depression) are treated as completely different, BUT 5 symptoms vs. 9 are treated as the same despite major differences in severity)
6
Q

What is the disadvantage of continuous outcomes?

A

Associations between variables such as alcohol use and depression scores may be treated as clinically significant/relevant when they are in fact not!

This is because an association alone is not proof of clinically relevant depressive symptoms; what should be said is that alcohol consumption is associated with increased scores on the BDI.

7
Q

What is an advantage of categorical outcomes?

A

When we use diagnostic criteria to give formal diagnoses, we can talk about interventions and/or predictors as having a clinically relevant impact.

8
Q

Categorical outcomes: What are criterion references?

A

■ Some questionnaires have cut-offs to group people:
– Hazardous drinking on the AUDIT questionnaire: use the cut-off designated by the AUDIT, scores above 8 = hazardous drinking.
– Beck Depression Inventory (BDI): scores of 9+ = depressed.

Problems:
-These cut-offs have not been validated in every sample, e.g., students typically score high on the AUDIT, so the cut-off may not be valid in a student sample (low ecological validity and reliability of the questionnaire)

■ Useless in certain groups (non-clinical samples rarely score above the cut-off on the BDI, as it tends to be clinically depressed people who score high).

■ Is a cut-off as effective as a true diagnosis?

9
Q

Categorical outcomes: What are normative references?

A

■ Compare to the norm of your sample.
■ Done using median splits.
■ E.g., number of units of alcohol drunk per week: participants above the median = heavy drinkers, below the median = light drinkers.

Problems:
■ Easy to do but arbitrary

■ Totally sample dependent (take a new sample and the median may well be very different)

■ Variants include tertile splits (top third vs. bottom third, discarding the middle, which reduces power further), quartile splits, and so on

■ Lacks sensitivity
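A median split can be sketched in a few lines of Python; the weekly alcohol units below are hypothetical, just to show the grouping step:

```python
# Median split (a normative reference): categorise a continuous
# variable by comparing each value to the sample median.
# The weekly alcohol units below are hypothetical.
units_per_week = [2, 5, 8, 12, 14, 20, 25, 30]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

cut = median(units_per_week)
groups = ["heavy" if u > cut else "light" for u in units_per_week]
```

Note how sample-dependent this is: add or remove a few participants and `cut` (and therefore everyone's group label) can change.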

10
Q

What does categorical data allow us to have?

A

It allows us to make decisions concerning clinical outcomes, or whatever we decide is a relevant effect (it doesn't have to be a clinical diagnosis; it could just be something we decide is a relevant effect)

For example, we may decide that losing 5kg is a clinically significant outcome in a weight loss trial, giving us two groups, successful weight loss (5kg or more) vs. unsuccessful (<5kg)

11
Q

What do we use logistic regression for?

A

■ We use logistic regression to explore which variables are associated with a categorical outcome

■ It gives us model fit statistics (similar to a linear regression, i.e., how well our model fits our data)

■ Regression coefficients for each predictor (similar to a linear regression, i.e., to see if there is an association with the DV)

■ Odds ratios: these express the change in the odds of the DV attributable to a unit change in an IV (i.e., how likely someone is to be in one group compared to the other, and how much this changes as the IV increases)

12
Q

What does logistic regression do?

A

It predicts membership of a group (i.e., what group does an individual belong to?)

■ It is called “binary” logistic regression because the outcome is dichotomous, e.g., relapse = 1, non-relapse = 0

13
Q

Why can’t we just fit a straight line in a regression?

A

-The line would go outside the plot: someone addicted for 20 years, for example, would be predicted above 100% relapse (which makes no sense!)

-Intermediate values make no sense either: someone addicted for 12 years might be predicted a relapse value of 0.5, but you can't be half a relapse; people either relapse or they don't

-A straight regression line, however, would happily produce these impossible in-between and out-of-range values
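A tiny sketch makes the problem concrete. The intercept and slope below are hypothetical, chosen only to show that a straight line predicts impossible "probabilities":

```python
# A straight line fitted to a 0/1 relapse outcome happily predicts
# impossible values. Intercept and slope are hypothetical.
def linear_pred(years_addicted, intercept=-0.2, slope=0.07):
    return intercept + slope * years_addicted

below_zero = linear_pred(0)    # negative predicted "probability"
above_one = linear_pred(20)    # predicts more than 100% relapse
```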

14
Q

What does logistic regression do if it is unwise to plot a straight line?

A

Rather than fitting a straight line of best fit, it fits an S-shaped curve.

-It tests whether the curve correctly categorises/predicts people as having relapsed or not, then fits another curve, and so on, keeping the curve that predicts group membership best (i.e., maximum likelihood: the curve most likely to have produced the observed relapsed vs. not-relapsed data)
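The S-shaped curve is the logistic (sigmoid) function, which squashes any predictor value into a probability strictly between 0 and 1. The coefficients below are hypothetical:

```python
import math

# The S-shaped (logistic) curve: whatever the predictor value,
# the predicted probability stays strictly between 0 and 1.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients: years addicted -> P(relapse)
def relapse_prob(years, b0=-3.0, b1=0.4):
    return sigmoid(b0 + b1 * years)
```

Unlike a straight line, even an extreme predictor value never pushes the prediction below 0 or above 1.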

15
Q

What do linear regressions test?

A

Tests how close the predicted line is to the actual data (for each data point).

16
Q

What do logistic regression calculations produce?

A

Logistic regression calculations produce a log-likelihood, i.e., how likely the model is to place someone in the correct group.

■ Log-likelihood (LL): each participant has an observed value for the outcome (0/1, what actually happened) and a predicted value (ranging from 0 = certainly will not happen to 1 = certainly will happen); the discrepancies between observed and predicted are summed across all participants. Its counterpart in linear regression is the sum of squared errors (how far each observation is from the prediction).
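The usual way this sum is computed is to take, for each participant, the log of the probability the model assigned to what actually happened. A minimal sketch with hypothetical predictions:

```python
import math

# Log-likelihood: for each participant, take the log of the predicted
# probability of what actually happened (outcome 0 or 1), then sum.
def log_likelihood(observed, predicted):
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(observed, predicted))

observed = [1, 1, 0, 0]            # 1 = relapse, 0 = no relapse
predicted = [0.9, 0.8, 0.2, 0.1]   # hypothetical model predictions
ll = log_likelihood(observed, predicted)   # closer to 0 = better fit
```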

17
Q

How is Log-Likelihood achieved?

A

■ Logistic regression compares results to a baseline model
– The baseline model simply predicts that every participant falls into the larger of the two groups (i.e., a best guess)
– E.g., with two groups, relapse = 292 participants and abstinent = 123 participants, the educated guess is that a randomly selected participant will relapse

■ We can then test whether our model with IVs is better than the baseline (best guess) model at predicting group membership

■ 2(LLnew - LLold)
■ FYI, the difference is multiplied by 2 so that it follows a chi-square distribution and can be tested for significance
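The likelihood-ratio comparison is just this arithmetic; the two log-likelihood values below are hypothetical:

```python
# Likelihood-ratio test statistic: 2 * (LL_new - LL_baseline).
# Log-likelihood values below are hypothetical.
ll_baseline = -120.5   # "best guess" (intercept-only) model
ll_new = -110.2        # model with predictors added

chi_square = 2 * (ll_new - ll_baseline)
# Compared against a chi-square distribution with
# df = number of predictors added.
```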

18
Q

What variations of the R2 have been designed for logistic regression?

A

■ Pseudo R² values. Why? (You don't strictly need them, as you already have sufficient information, but it is best to report one.)

  1. Cox & Snell's R²: based on LLnew - LLold and accounts for the sample size (but can never reach the theoretical R² maximum of 1)
  2. Nagelkerke's R²: similar, but can reach the maximum of 1.

■ People report either; I would generally report the Cox & Snell, as I think Nagelkerke can be misleadingly high.

19
Q

What range of stats does the Logistic regression produce for individual predictors?

A

The regression coefficient (b) and its SE (standard error):
– These give you the direction of an association and the variability in this association
– A positive coefficient means high scores are associated with the group labelled 1 (relapse); a negative coefficient means high scores are associated with the group labelled 0 (no relapse)

We also have some unique stats in logistic regression:
– Wald statistic
– Exp(B), which is an odds ratio
– It is called Exp(B) because it is the exponentiated regression coefficient
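The coefficient-to-odds-ratio step is a single exponentiation; the coefficient values below are hypothetical:

```python
import math

# Exp(B): exponentiating a regression coefficient gives the odds
# ratio. Coefficient values below are hypothetical.
b_positive = 0.47       # high scores associated with group 1
b_negative = -0.47      # high scores associated with group 0

or_positive = math.exp(b_positive)   # > 1: odds increase
or_negative = math.exp(b_negative)   # < 1: odds decrease
```

This is why a positive b always maps to an OR above 1 and a negative b to an OR between 0 and 1.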

20
Q

What does the classification table tell us?

A

Tells you:
-The number of cases your model correctly identified and the % of these that were correct (you will want to report this %)
-The true negatives (Observed No/Predicted No) and true positives (Observed Yes/Predicted Yes)
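The reportable percentage is just the correctly classified cells over the total; the counts below are hypothetical:

```python
# Classification table from hypothetical counts:
#                 Predicted No   Predicted Yes
# Observed No        50 (TN)        10 (FP)
# Observed Yes       15 (FN)        75 (TP)
tn, fp, fn, tp = 50, 10, 15, 75

total = tn + fp + fn + tp
percent_correct = 100 * (tn + tp) / total   # the % you would report
```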

21
Q

What is the Wald’s Statistic?

A

■ Just like linear regression, we get a b and its SE, but we do not generate a p value from these directly

■ The Wald statistic tests whether the amount of variance explained by a single predictor is significantly different from zero (what a normal regression does using a t statistic)

– Note: be cautious with the Wald test; when b is large the SE is inflated, which means the Wald statistic is underestimated.
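The Wald statistic is the coefficient divided by its standard error, squared; the b and SE values below are hypothetical:

```python
# Wald statistic: the squared ratio of a coefficient to its
# standard error. b and SE below are hypothetical.
b, se = 0.47, 0.15
wald = (b / se) ** 2
# Compared against a chi-square distribution with 1 df. Note the
# caveat: a large b inflates the SE, shrinking this statistic.
```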

22
Q

What is the Exp(B)/Odds Ratio (OR)?

A

■ The odds ratio (OR) indicates the change in odds resulting from a one-unit change in the IV

■ OR = odds after a unit change in the IV, divided by the original odds
* OR of 1 = no change in likelihood of the event
* OR of 0.5 = 50% decrease in likelihood of the event
* OR of 1.5 = 50% increase in likelihood of the event
* OR of 4.7 = 370% increase
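The OR-to-percentage conversion in the examples above is simply (OR - 1) x 100:

```python
# Converting an odds ratio into the % change in odds it implies
def percent_change(odds_ratio):
    return (odds_ratio - 1) * 100
```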

23
Q

Why can’t Exp(B)/Odds Ratio (OR) be negative?

A

■ It cannot be negative.
■ This is because 1 = no change, so values below 1 = a decrease in odds
■ ORs range from 0 to infinity
■ ORs less than 1 need to be treated with caution, as there is less numerical space (only 0 to 1) for them to operate in!

24
Q

What do confidence intervals reflect about Exp(B)/Odds Ratio (OR)?

A

■ Confidence intervals ("95% CI") can (and should) be reported after the odds ratio

■ The 95% confidence interval reflects how confident we can be in our odds ratio: it expresses the range in which our estimate will fall for 95% of samples from this population

■ Low variation = "tighter" CI = more accurate estimate

■ If the CI overlaps 1, the effect is not significant, as the range of predicted values includes 1 (1 = no change)
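One common way to form this interval is to exponentiate b plus or minus 1.96 standard errors; the b and SE values below are hypothetical:

```python
import math

# 95% CI for an odds ratio: exponentiate b +/- 1.96 * SE.
# b and SE below are hypothetical.
b, se = 0.47, 0.15
ci_lower = math.exp(b - 1.96 * se)
ci_upper = math.exp(b + 1.96 * se)

# If the interval contains 1 (no change), the effect is not significant
significant = not (ci_lower <= 1 <= ci_upper)
```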

25
Q

What are the assumptions of logistic regression?

A

■ DV is categorical with two levels only (hence binary 0/1 e.g., relapse or no relapse)

■ Neither of the DV “events” should be rare
– E.g. 2 people getting a first vs. 548 not getting a first (a rare event)
– Rare events cause a problem called “separation”, where you get “perfect” predictors

■ IVs can be continuous (ratio/interval) or categorical.

■ No multicollinearity