Logistic Regression Flashcards

(32 cards)

1
Q

The linear regression model is…

A
-Ŷ = bX + c
•Ŷ is the predicted outcome variable
•b is the slope of the line
•X is the explanatory/predictor variable
•c is the intercept
2
Q

The residuals are…

A

Y - Ŷ (the observed value minus the predicted value)

3
Q

Logistic regression

A

-A non-linear regression model
•Has a dichotomous or categorical DV
•Predictors can be either continuous or categorical

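Not from the cards: a minimal sketch in Python (statsmodels on synthetic data) of exactly this setup, a dichotomous DV predicted from one continuous predictor.

```
# A minimal sketch, assuming synthetic data: a dichotomous DV (0/1)
# predicted from one continuous predictor via logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=200)                 # continuous predictor
p = 1 / (1 + np.exp(-(0.8 * x - 0.2)))   # true probabilities of being a case
y = rng.binomial(1, p)                   # dichotomous DV

X = sm.add_constant(x)                   # adds the intercept column
model = sm.Logit(y, X).fit()             # maximum likelihood fit
print(model.params)                      # intercept and slope (B)
print(np.exp(model.params))              # odds ratios (e^B)
```
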
4
Q

Related methods

A

-Logistic analysis/multiway frequency table analysis:
•Multiple categorical predictors and one categorical DV

-Discriminant analysis:
•Multiple categorical or continuous predictors and one categorical DV
•More assumptions than logistic regression

-Linear regression:
•Multiple categorical or continuous predictors and one continuous DV

5
Q

Research questions

A

-Can we predict the presence or absence of a disorder/disease?
•E.g. label present as 1, absent as 0

-Can we predict an outcome using a set of predictors?
•How good is the model?

-Does an individual predictor increase or decrease the probability of an outcome?
•Related to importance of predictors

-Classification and prediction

-Simple categorical outcomes
•Can we predict the outcomes using categorical predictors?

6
Q

Ordinary Least Squares (OLS)

A

-All forms of multiple regression are based on the same structural model: OLS

-3 important characteristics:
•Model is linear
•Residuals are assumed to be normally and homogeneously distributed
•Predicted scores (Ŷ) are on the same scale as the data (Y)

-These characteristics don't apply to logistic regression

7
Q

When representing results of a logistic regression graphically…

A

-Use the non-linear/sigmoidal model, as it better represents the essence of the data

8
Q

Important concepts in logistic regression:

Probability

A

-The likelihood of an event occurring

•If p = .80, there is an 80% chance of that event occurring

9
Q

Important concepts in Logistic Regression

Predicted Odds

A

-The probability of an event occurring divided by the probability of it not occurring
•Odds = probability of the event happening / probability of the event not happening

-If p = .80, the probability of it not occurring is .20
•Odds = .80/.20
•= 4
~The odds are 4:1 in favour of the event occurring

10
Q

Important concepts in Logistic Regression

Logit

A

-Odds are asymmetric (odds below 1 are squeezed into the range 0 to 1, while odds above 1 are unbounded), so we can use the natural log of the odds instead
•Log of odds = logit

11
Q

Important concepts in Logistic Regression

Odds ratio

A

-The relationship between the odds of an event occurring across levels of another variable
•By how much do the odds of Y change as X increases by 1 unit?

•Essentially a ratio of ratios

•Central as a measure of effect size
~A good way of measuring the strength of the relationship

12
Q

The structural model

A
-Don't really need to know it in detail
-Taking the log odds turns a non-linear relationship into a linear one

-Our model is of p̂i rather than Ŷ
•p̂i is the estimated probability of outcome i occurring

-Base e is an irrational constant, roughly 2.718
-B and C are the model parameters
-The model relates our predictor(s) to the predicted scores
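
For reference, the standard single-predictor form of the model, assembled from the pieces named above (p̂i, base e, and parameters B and C), is:
•p̂i = e^(BXi + C) / (1 + e^(BXi + C))
•Equivalently: ln(p̂i / (1 - p̂i)) = BXi + C, which is linear in Xi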
13
Q

Predicted odds vs Logit

Predicted odds

A

-Odds of being a case

-Odds = p/(1-p), which ranges from 0 to positive infinity
•When p = .50, the odds are 1 (even odds, 1:1)
•When p>.50, the odds are >1

-Varies exponentially with the predictor(s)

14
Q

Predicted odds vs Logit

Logit

A

-Natural logarithm of the odds
•Ranges from negative to positive infinity

-Reflects odds of being a case but varies linearly with predictor(s)

-Not very interpretable
•If p = .8, the odds = 4
~But the logit = 1.386

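A quick numeric check of the probability, odds and logit cards, reusing their own p = .80 example:

```
# Check the cards' running example: p = .80 gives odds of 4 (4:1 in favour)
# and a logit of roughly 1.386.
import math

p = 0.80
odds = p / (1 - p)       # .80 / .20 = 4.0
logit = math.log(odds)   # natural log of the odds ~ 1.386
print(odds, logit)
```
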
15
Q

Predicted odds vs Logit

Essentially…

A
-Logit = the maths
-Predicted odds = the description

-They are basically the same thing, just transformations of each other

16
Q

Two kinds of regression coefficient in logistic regression

A

-Typical partial regression coefficients (B)
•Identical in function to those in OLS regression
•Indicate the increment in the logit given a unit increment in the predictor

-Odds ratios (e^B)
•Indicate the amount by which the odds of being a case are multiplied given a unit increment in the predictor (or a change in level of the predictor if it is categorical)
•If B = 0, then e^B = 1 and the predictor has no relationship with the outcome

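A small illustration of the two kinds of coefficient; the B value here is hypothetical:

```
# Hypothetical coefficient, used only to show the B vs e^B relationship.
import math

B = 0.5                    # partial regression coefficient (logit scale)
odds_ratio = math.exp(B)   # ~1.65: a unit increase in the predictor
                           # multiplies the odds of being a case by ~1.65
print(odds_ratio)
print(math.exp(0.0))       # B = 0 gives e^B = 1: no relationship
```
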
17
Q

Estimating the parameters in a Logistic Regression Model

A

-OLS uses an analytic solution
•Regression coefficients are calculated from known equations
•Seeks to minimise the sum of squared residuals

-Logistic regression uses maximum likelihood estimation, which is an iterative solution
•Regression coefficients are estimated by trial and error and gradual adjustment
~Seeks to maximise the likelihood (L) of the observed values of Y, given a model and the observed values of the predictors

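A sketch of the iterative idea, on synthetic data: a general-purpose optimiser gradually adjusts B and C to maximise the likelihood (implemented, as usual, by minimising the negative log likelihood).

```
# Maximum likelihood by iterative adjustment, sketched with scipy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))   # synthetic 0/1 outcomes

def neg_log_likelihood(params):
    B, C = params
    p = 1 / (1 + np.exp(-(B * x + C)))      # model-predicted probabilities
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])  # trial and error
print(result.x)                              # estimated B and C
```
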
18
Q

Evaluating the model: in OLS multiple regression

A

-For OLS, sums of squares are the building blocks of model evaluation
•The focus is the partitioning of variance
~SStotal = SSregression + SSresidual
~R^2 = SSregression/SStotal

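These identities are easy to verify numerically; the data below are made up:

```
# Check SStotal = SSregression + SSresidual and R^2 on a tiny dataset.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, c = np.polyfit(x, y, 1)        # OLS slope and intercept
y_hat = b * x + c                 # predicted scores, same scale as Y

ss_total = np.sum((y - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)
ss_regression = np.sum((y_hat - y.mean()) ** 2)

print(np.isclose(ss_total, ss_regression + ss_residual))  # True
print(ss_regression / ss_total)                           # R^2
```
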
19
Q

Evaluating the model: in Logistic Regression

A

-Logistic regression uses measures of deviance rather than sums of squares
•Deviance is essentially lack of fit

-The focus is the lack of fit
•Null deviance, Dnull, is similar to SStotal
~Reflects the amount of variability in the data and the amount of deviance that could potentially be accounted for

•Model deviance, Dk, is similar to SSresidual
~Reflects the amount of variability in the data after accounting for prediction from k predictors

20
Q

Log likelihoods

A

-A log likelihood (LL) value can be calculated for each model we test
•Essential for evaluating each model

-The LL is a function of the probabilities of the observed and model-predicted outcomes for each case, summed over all cases
-We can directly compare the goodness-of-fit of different models using their LLs
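
A sketch of how the LL is built up, with made-up outcomes and predicted probabilities: each case contributes log(p̂) if it was a case and log(1 - p̂) if it was not, summed over cases.

```
# Log likelihood summed over cases; y and p_hat values are hypothetical.
import numpy as np

y = np.array([1, 0, 1, 1, 0])                # observed outcomes
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # model-predicted probabilities

LL = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
print(LL)   # closer to 0 (less negative) = better fit
```
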
21
Q

Log Likelihood Ratio Tests

A
-Compute an LL value (LLs) for the smaller model (one with k parameters)
-Compute an LL value (LLb) for the bigger model (one with k + m parameters)

-Likelihood ratio test (LRT) statistic:
•Compares models hierarchically
•LRT = -2LLs - (-2LLb) = -2ln(Ls/Lb)
~If the smaller model is true, the LRT statistic is distributed as chi-squared with m df

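A sketch of the test with hypothetical LL values:

```
# Likelihood ratio test for nested models; LLs, LLb and m are made up.
from scipy.stats import chi2

LL_s = -50.0   # smaller model, k parameters
LL_b = -45.0   # bigger model, k + m parameters
m = 2          # number of extra parameters

lrt = -2 * LL_s - (-2 * LL_b)   # 10.0
p_value = chi2.sf(lrt, df=m)    # upper-tail chi-squared probability
print(lrt, p_value)
```
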
22
Q

Evaluating the model: deviances as likelihood ratios

A

-Deviance measures contrast LLs using likelihood ratios
•Dnull = -2ln(Lnull/Lperfect)
~This compares the maximum likelihood (L) for a model with no predictors (only an intercept) with that of a perfectly fitting model (aka the saturated model)

•Dk = -2ln(Lk/Lperfect)
~This compares the maximum likelihood (L) for a model with a set of k predictors with that of a perfectly fitting model

23
Q

Testing model fit

-Won’t be asked directly but need to know it

A

-In the likelihood ratio test, we test the null deviance (a model including only the constant) against the model deviance (a model containing k predictors)
•As k increases, the difference between the null and model deviance will generally increase, improving model fit

•If there is no significant improvement in fit when we add the k predictors to the model, we should question their inclusion

•If there is no significant deterioration in fit when we remove the k predictors from the model, we should question their inclusion
~I.e. they are redundant in the context of the outcome variable

-Only accept more predictors if they significantly improve model fit

24
Q

Different expressions of the equation for the likelihood ratio test

A
•Dnull - Dk
•-2LLnull - (-2LLk)
•-2ln(Lnull/Lk)
25
Q

Testing model fit (2)

A

-Always relate the prediction model to the null model
•The null model might not be interesting, but we still need to try to do better than it

-An example:
•Null model deviance = 20.28
•Deviance of the model with 3 predictors = 16.19
•LR test statistic: 20.28 - 16.19 = 4.09
•Evaluate this against the critical chi-squared value with 3 df
~3 df because we have 3 predictors
~This is like the overall R^2 test based on the F statistic
•Here this is not significant, so there is no improvement in model fit with the predictors included

-A low-power technique, so it needs a lot of participants
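
The card's arithmetic and conclusion can be checked directly:

```
# LRT = 20.28 - 16.19 = 4.09, tested against chi-squared with 3 df.
from scipy.stats import chi2

lrt = 20.28 - 16.19              # 4.09
critical = chi2.ppf(0.95, df=3)  # ~7.81
p_value = chi2.sf(lrt, df=3)     # ~0.25
print(lrt < critical, p_value)   # True: not significant, as the card says
```
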
26
Q

Testing model fit: Caveat

A

-Need to be wary of sample size:
•With very large samples, trivial differences in model fit are likely to be significant, so adjusted fit indices are sometimes needed
•Needs more participants than a linear regression
•The more complicated the model, the more participants are needed
27
Q

Pseudo-R^2s

-Don't need to know the formulae

A

-It is possible to evaluate a logistic regression model in a way analogous to standard MR using McFadden's ρ^2

-Variations on this:
•Cox and Snell Index:
~Reaches a maximum of .75 when there is equal n in each category of the DV
•Nagelkerke Index:
~Divides Cox and Snell's R^2 by its maximum in order to achieve a measure that ranges from 0 to 1

-These do not indicate "variance accounted for", because of the model's inherent heteroscedasticity
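
For reference only, since the card says the formulae are not required: McFadden's index compares model and null log likelihoods (values here are hypothetical):

```
# McFadden's pseudo-R^2 = 1 - (LL_model / LL_null); LL values are made up.
LL_null = -20.0
LL_model = -15.0

mcfadden = 1 - (LL_model / LL_null)  # 0.25
print(mcfadden)
```
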
28
Q

Testing predictor significance

A

-Significance of a regression coefficient:
•Wald statistic:
~Quite conservative
~Distributed as chi-squared with df = 1
~Similar interpretation to testing B or beta for significance in ordinary least squares regression
~SEb is often overestimated, giving a risk of Type II errors
•Alternatively, fit the model with all predictors and compare fit when each predictor is removed in turn
~If an individual predictor makes a significant contribution to the model, rely on that more than on the Wald statistic

-Contribution to prediction:
•Compares the likelihood ratio with and without the predictor
•Chi-squared = D(k-1) - Dk
•Distributed as X^2 with df = 1
•Gives similar information to sr^2 in OLS regression
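
A sketch of the Wald test for a single coefficient, with hypothetical B and SE values:

```
# Wald statistic = (B / SE_B)^2, distributed as chi-squared with df = 1.
from scipy.stats import chi2

B = 0.9
se_b = 0.4
wald = (B / se_b) ** 2          # 5.06
p_value = chi2.sf(wald, df=1)   # ~0.024
print(wald, p_value)
```
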
29
Q

Model building

A

-Also possible to examine:
•Main effects model:
~Main effects only
•Full factorial model:
~Main effects and interactions between factors; no interactions involving covariates
•Complete model:
~Main effects and all interactions, including interactions with covariates
•Saturated model:
~Same as complete, only covariates are treated as factors
30
Q

A common technique in logistic regression is backward-stepwise selection

A

-An iterative process, as used in the lab:
•Begin with the complete model
•Remove non-significant variables
•Re-run the model and compare fit
•End up with a model containing only significant predictors
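
A minimal sketch of the loop; the fit callable and the alpha threshold are assumptions, not from the card (fit is taken to return the LRT p-value for dropping a given predictor from the current model):

```
# Backward-stepwise elimination, sketched generically.
def backward_stepwise(predictors, fit, alpha=0.05):
    current = list(predictors)
    while current:
        # p-value for each predictor from an LRT that drops it from the model
        p_values = {v: fit(current, drop=v) for v in current}
        worst, p = max(p_values.items(), key=lambda item: item[1])
        if p <= alpha:            # every remaining predictor is significant
            break
        current.remove(worst)     # drop the least useful predictor, refit
    return current
```
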
31
Q

Assumptions and considerations

A

-Relatively assumption-free
-Assumptions regarding the distributions of predictor variables do not apply
32
Q

Assumptions and considerations: Ratio of cases to variables

A

-Too few cases relative to the number of predictor variables can be a problem
•May produce extremely large parameter estimates and standard errors
•Convergence can fail when combinations of discrete variables result in many cells with no cases
~Solution: collapse categories, delete the offending category, or delete the discrete variable if it is not important

-Extremely high parameter estimates and SEs indicate a problem
•Estimates increase with more iterations, or the solution does not converge while maximum likelihood estimation is running
~Solution: increase the number of cases or eliminate one or more predictors