Unit 3: Logistic Regression Flashcards by Emily Lemon

Which plot can be used to check the independence of observations assumption in logistic regression?

Scatterplot of residuals vs. the order of data collection

Sensitivity:

The probability that a test classifies someone as sick given that the person is truly sick
P(T+|D+)

Specificity:

The probability that a test classifies someone as healthy given that the person is truly healthy
P(T-|D-)

Accuracy:

The probability that a test correctly classifies someone
How to calculate:
Add the concordant cells (true positives and true negatives) and divide by the total.
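
The three screening measures defined so far (sensitivity, specificity, accuracy) can be sketched from a 2x2 table as follows. This is a minimal illustration; the function name and the counts are made up for the example.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from 2x2 table counts.

    tp: truly sick, test positive      fn: truly sick, test negative
    fp: truly healthy, test positive   tn: truly healthy, test negative
    """
    sensitivity = tp / (tp + fn)                 # P(T+ | D+)
    specificity = tn / (tn + fp)                 # P(T- | D-)
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # concordant cells / total
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = screening_metrics(tp=80, fn=20, fp=30, tn=170)
print(sens, spec, acc)
```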

Area Under the Curve (AUC):

Describes the overall predictive ability of the screening test (a coin flip has AUC = 0.5).
We want the AUC to be close to 1.
The AUC is the area under the ROC curve; ROC curves are useful for quantitative screening tests.
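
One way to see what the AUC measures: it equals the proportion of (sick, healthy) pairs in which the sick person has the higher test score, with ties counted as one half. The sketch below uses that pairwise definition; the function name and the scores are hypothetical.

```python
def auc_pairwise(scores_sick, scores_healthy):
    """AUC as the fraction of (sick, healthy) pairs ranked correctly."""
    wins = 0.0
    for s in scores_sick:
        for h in scores_healthy:
            if s > h:
                wins += 1.0      # sick person scored higher: correct ordering
            elif s == h:
                wins += 0.5      # tie counts as half
    return wins / (len(scores_sick) * len(scores_healthy))


# Hypothetical test scores, for illustration only.
auc_example = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
auc_uninformative = auc_pairwise([0.5, 0.5], [0.5, 0.5])  # behaves like a coin flip
print(auc_example, auc_uninformative)
```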

Which cutpoint (i.e., ‘decision rule’) is the best?

It depends on the purpose of the screening test and the cost of misclassification

Usually you desire a balance between sensitivity and specificity
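
As one possible way to balance sensitivity and specificity, the sketch below scans candidate cutpoints and keeps the one maximizing Youden's J (sensitivity + specificity - 1). Youden's J is an assumption brought in for illustration; the card does not name a specific rule, and the data are made up.

```python
def sens_spec_at(cut, scores, labels):
    """Sensitivity and specificity when 'score >= cut' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_youden(scores, labels):
    """Return (cutpoint, J) with the largest Youden's J among observed scores."""
    best = None
    for cut in sorted(set(scores)):
        sens, spec = sens_spec_at(cut, scores, labels)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best


# Hypothetical scores and true disease status (1 = sick, 0 = healthy).
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
best_cut, best_j = best_cut_youden(scores, labels)
print(best_cut, best_j)
```

If false negatives are much costlier than false positives (or the reverse), a weighted criterion would be more appropriate than equal weighting.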

When do we do logistic regression?

When the outcome is a binary/dichotomous variable.
The appropriate measure for describing a dichotomous (binary) outcome depends on the study design, but the ODDS RATIO is appropriate regardless of design

What are the three equivalent overall tests we can do in logistic regression?

Three asymptotically-equivalent tests:

(1) Likelihood ratio
(2) Score
(3) Wald

Rejecting H0 indicates that the model with all predictors is better than a model with no predictors (i.e., an intercept-only model)

Similar to Overall F-test in MLR

What is the Type 3 test for an individual predictor?

Type 3 test asks: Is the predictor variable associated with the outcome, given the association with the other predictors has already been accounted for?

Type 3 Test can accommodate multi-level categorical predictors, in addition to continuous and binary predictors

Hypotheses:
H0 : The predictor is not important (given all other predictors)
H1 : The predictor is important (given all other predictors)

Consider doing Type 3 after rejecting H0 in the Overall test.
Rejecting H0 of Type 3 implies that there is significant evidence of a linear association between the predictor and the binary response, given all other predictors are already included in the model.
Rejecting H0 of Type 3 Test implies an adjusted odds ratio not equal to 1
*Similar to partial F test in MLR

What is the difference between the estimated model and the predicted model?

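The estimated model is the logit (log-odds) model:
logit(p) = log(p/(1-p)) = B0 + B1X1 + B2X2 + … + BjXj

The predicted model solves the estimated model for the probability by exponentiating (the inverse logit):
phat = exp(B0 + B1X1 + … + BjXj) / (1 + exp(B0 + B1X1 + … + BjXj))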
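
The relationship between the log-odds scale and the probability scale can be sketched with the standard logit and inverse-logit functions. The function names and the coefficient values below are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1 / (1 + math.exp(-eta))


def predicted_probability(betas, xs):
    """betas = [B0, B1, ..., Bj]; xs = [X1, ..., Xj] for one subject."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))  # linear predictor
    return inv_logit(eta)


# Hypothetical coefficients: intercept -1.0, slope 0.5, evaluated at X1 = 2.
p_hat = predicted_probability([-1.0, 0.5], [2.0])
print(p_hat)
```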

What is the Individual coefficient test?

Tests a single coefficient Bj

H0 : Bj = 0 (given all other predictors)
H1 : Bj not equal to 0 (given all other predictors)

Rejecting H0 implies that there is significant evidence of an association between Xj and Y, given all other predictors are in the model
Depending on the problem, we may be interested in testing against other null values (e.g., H0 : Bj = 1)
Should not be used for multi-level categorical covariates
* Similar to Partial T-test in MLR
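
A common form of the individual coefficient test is the Wald z-test, sketched below under the assumption that the estimate is approximately normal in large samples. The estimate and standard error are made-up numbers, not output from a fitted model.

```python
import math


def wald_test(beta_hat, se, null_value=0.0):
    """Two-sided Wald z-test of H0: Bj = null_value."""
    z = (beta_hat - null_value) / se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimate and standard error.
z_stat, p_val = wald_test(beta_hat=0.693, se=0.25)
odds_ratio = math.exp(0.693)  # adjusted odds ratio implied by the estimate
print(z_stat, p_val, odds_ratio)
```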

What is the large sample assumption of logistic regression?

Hypothesis testing in Logistic Regression is based on large sample theory and asymptotics, so large sample sizes are recommended:
At least 100 observations
Enough observations in each category/group

Each Bj ‘costs’ about 10 observations to estimate

Why the odds ratio?

Regardless of the specific study design used to collect the data, it is always appropriate to report an odds ratio

Since we are actually modeling the log(odds) in logistic regression, odds ratios tend to ‘fall out’ naturally.

What are the two types of odds ratios?

Simple odds ratios associated with individual predictors can be obtained by exponentiating the corresponding regression coefficient (e.g., exp(Bj))

Complex odds ratios comparing any two predictor-profiles can be obtained by first determining the appropriate contrast and then exponentiating
OR1v2 = odds1/odds2
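
Both kinds of odds ratio can be sketched in a few lines. The coefficients and predictor profiles below are hypothetical (an age slope and a smoker indicator); the intercept is omitted because it cancels in the ratio odds1/odds2.

```python
import math


def simple_odds_ratio(beta_j):
    """Odds ratio for a one-unit increase in Xj, other predictors held fixed."""
    return math.exp(beta_j)


def complex_odds_ratio(betas, profile1, profile2):
    """Odds ratio comparing two predictor profiles (no intercept in betas)."""
    # The contrast is the difference in log-odds between the two profiles.
    contrast = sum(b * (x1 - x2) for b, x1, x2 in zip(betas, profile1, profile2))
    return math.exp(contrast)


# Hypothetical coefficients: age (per year) and smoker (1 = yes, 0 = no).
betas = [0.05, 0.7]
or_smoker = simple_odds_ratio(betas[1])
# 60-year-old smoker versus 50-year-old non-smoker.
or_profiles = complex_odds_ratio(betas, profile1=[60, 1], profile2=[50, 0])
print(or_smoker, or_profiles)
```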

What are the assumptions of Logistic Regression?

Linear Relationship: Logit(p) can be modeled as a linear function of the predictors
Large sample with independent observations of equal importance (implied by errors and/or design)
The predictors are independent of each other (no multicollinearity)

*No error assumptions, because the model has no error term

How do you check for independence of observations assumption?

Plot of the outcome (Y ) against the index or order of data collection (to check independence)
Check characteristics of study design

How do you check for multicollinearity?

Plot predictor variables against each other
Look for large sample correlation coefficients
Look for large variance inflation factors (VIFs) via PROC REG
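
For the special case of exactly two predictors, the VIF reduces to 1 / (1 - r^2), where r is their sample correlation. The sketch below covers only that case; with more predictors, each VIF comes from regressing one predictor on all the others. The data and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
r_12 = pearson_r(x1, x2)
vif_12 = vif_two_predictors(x1, x2)
print(r_12, vif_12)
```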

What are the model Selection Criteria for Logistic Regression?

Akaike information criterion (AIC) (smaller is better)
Bayesian information criterion (BIC) (smaller is better)
Area under the curve (AUC) (larger is better)
  Has a meaningful stand-alone interpretation
  Similar to R2 in MLR, the AUC tends to be overly optimistic regarding a model’s true predictive abilities when applied to an external dataset
Generalized R2 (larger is better)
  Does not have the same interpretation as in MLR (i.e., variability explained)
  Use as supporting evidence only when making decisions
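
The two information criteria can be computed directly from a model's maximized log-likelihood using the standard definitions AIC = -2logL + 2k and BIC = -2logL + k*log(n). The log-likelihoods, parameter counts, and sample size below are made up for illustration.

```python
import math


def aic(loglik, k):
    """Akaike information criterion; k = number of estimated coefficients."""
    return -2 * loglik + 2 * k


def bic(loglik, k, n):
    """Bayesian information criterion; n = number of observations."""
    return -2 * loglik + k * math.log(n)


# Hypothetical fits on the same n = 200 observations.
aic_small, bic_small = aic(-106.6, k=1), bic(-106.6, k=1, n=200)
aic_large, bic_large = aic(-103.4, k=2), bic(-103.4, k=2, n=200)
print(aic_small, aic_large, bic_small, bic_large)  # smaller is better
```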

What are the goodness of fit tests for logistic regression?

Pearson (categorical)
Deviance (categorical)
Hosmer-Lemeshow (continuous and/or categorical predictors)

What are the hypotheses for the goodness of fit tests in logistic regression?

H0 : The model fits the data well

H1 : The model does not fit the data well

Which goodness of fit test is comparing a saturated model with interaction terms with a reduced model with only the main effects for categorical predictors?

The deviance test
Hypotheses of Deviance Test:
H0 : the (reduced) model fits the data well
H1 : the saturated (full) model provides a better fit to the data

(The Saturated Model refers to the model with all main effects and all possible pairwise interactions)

Test statistic (D)
D = -2logL(reduced) - (-2logL(full))

Rejecting H0 implies that the saturated model is superior to the proposed reduced model
(The Deviance Test is actually a special case of the general Likelihood Ratio Test)

Which goodness of fit test compares what is predicted vs what is actually observed for categorical predictors?

The Pearson Chi-square test
H0 : the model fits the data well
H1 : the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well
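
For grouped data (one row per covariate pattern), the Pearson statistic compares observed and model-predicted event counts. The sketch below assumes each group supplies its size, observed events, and fitted probability; the numbers are hypothetical, and the reference degrees of freedom (which depend on the number of patterns and parameters) are not computed here.

```python
def pearson_chi_square(groups):
    """Pearson chi-square statistic for grouped binary data.

    groups: list of (n, observed_events, fitted_probability) tuples.
    Each term equals the usual sum of (O - E)^2 / E over the event
    and non-event cells of that group.
    """
    stat = 0.0
    for n, observed, p in groups:
        expected = n * p
        stat += (observed - expected) ** 2 / (n * p * (1 - p))
    return stat


# Hypothetical covariate patterns: (group size, events, fitted probability).
x2_stat = pearson_chi_square([(50, 10, 0.2), (50, 20, 0.5)])
print(x2_stat)
```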

Which goodness of fit test compares what is predicted vs what is actually observed for continuous predictors?

The Hosmer-Lemeshow test
H0 : the model fits the data well
H1 : the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well
Can be thought of as an extension of Pearson’s chi-square test
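
A rough sketch of the Hosmer-Lemeshow idea: sort subjects by fitted probability, split them into groups, and compare observed with expected events in each group. The grouping rule (equal-sized groups by rank) and the toy data are assumptions; statistical software may form groups differently, and the statistic is usually referred to a chi-square distribution with (number of groups - 2) degrees of freedom.

```python
def hosmer_lemeshow_statistic(probs, outcomes, n_groups=10):
    """Hosmer-Lemeshow statistic using equal-sized groups of sorted fitted values."""
    pairs = sorted(zip(probs, outcomes))          # order by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)       # observed events in the group
        expected = sum(p for p, _ in chunk)       # expected events in the group
        mean_p = expected / n_g                   # must be strictly between 0 and 1
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat


# Toy data: fitted probabilities that match, then contradict, the outcomes.
probs = [0.2] * 5 + [0.8] * 5
hl_good = hosmer_lemeshow_statistic(probs, [0, 0, 0, 0, 1, 1, 1, 1, 1, 0], n_groups=2)
hl_bad = hosmer_lemeshow_statistic(probs, [1, 1, 1, 1, 0, 0, 0, 0, 0, 1], n_groups=2)
print(hl_good, hl_bad)
```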

What is the General Likelihood Ratio Test?

Can be thought of as ‘the partial F-test of logistic regression’

Rejecting H0 implies that the full model is preferable to the reduced model

Reduced model must be ‘nested’ within the full model (i.e., a special case)

The most general type of hypothesis test for logistic regression

(The Deviance Test is actually a special case of the general Likelihood Ratio Test)
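
A minimal sketch of a likelihood ratio test for one nested comparison: an intercept-only (reduced) model versus a model with a single binary predictor (full). It assumes the full model's fitted probabilities equal the two group proportions, which holds for this simple case, and it uses made-up counts. The p-value formula applies only to a 1-degree-of-freedom chi-square reference.

```python
import math


def binom_loglik(events, n, p):
    """Log-likelihood of `events` out of n binary outcomes at probability p."""
    return events * math.log(p) + (n - events) * math.log(1 - p)


def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and p-value for a 1-df comparison."""
    g = -2 * loglik_reduced - (-2 * loglik_full)
    p_value = math.erfc(math.sqrt(g / 2))   # chi-square(1) upper tail
    return g, p_value


# Hypothetical data: 30/100 events when exposed, 15/100 when unexposed.
ll_full = binom_loglik(30, 100, 0.30) + binom_loglik(15, 100, 0.15)
ll_reduced = binom_loglik(45, 200, 45 / 200)  # one common probability
g_stat, g_pval = lrt_one_df(ll_reduced, ll_full)
print(g_stat, g_pval)
```

A small p-value here would favor the full model over the intercept-only model.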

What are 4 similarities between MLR and Logistic Regression?

1. General hypothesis testing for the overall model and individual predictors/coefficients
2. Assumptions about independence of observations/predictors
3. Checking assumptions and identifying outliers/influential observations
4. General model-selection strategies

What are differences between MLR and Logistic Regression?

1. The left-hand side (i.e., modeling logit(p) instead of y)
2. No error term in logistic regression
3. Need more observations; model complexity costs more
4. The distribution of test statistics under the null hypothesis (chi-square vs. t distribution)
5. Primarily interested in odds ratios
6. Likelihoods (not sums of squares) are used for model comparisons

(26 cards)