Flashcards in Unit 3: Logistic Regression Deck (26):

1

## Which plot can be used to check the independence of observations assumption in logistic regression?

###
Scatterplot of residuals vs. the order of data collection

2

## Sensitivity:

###
The probability that a test classifies someone as sick given that the person is truly sick

P(T+|D+)

3

## Specificity:

###
The probability that a test classifies someone as healthy given that the person is truly healthy

P(T-|D-)

4

## Accuracy:

###
The probability that a test correctly classifies someone

How to calculate:

Add the concordant cells and divide by total.
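
As a quick illustration of sensitivity, specificity, and accuracy, the sketch below computes all three from a 2x2 screening table. The cell counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical 2x2 screening table (counts are made up for illustration).
tp = 45   # test positive, truly sick     (T+, D+)
fn = 5    # test negative, truly sick     (T-, D+)
fp = 20   # test positive, truly healthy  (T+, D-)
tn = 130  # test negative, truly healthy  (T-, D-)

total = tp + fn + fp + tn

sensitivity = tp / (tp + fn)   # P(T+ | D+)
specificity = tn / (tn + fp)   # P(T- | D-)
accuracy = (tp + tn) / total   # concordant cells divided by total

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"accuracy    = {accuracy:.3f}")
```

Here the concordant cells are the true positives and true negatives, so accuracy is (45 + 130) / 200.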

5

## Area Under the Curve (AUC):

###
Describes the overall predictive ability of the screening test (a coin flip has AUC = 0.5)

We want AUC to be close to 1.

Shown on the ROC curve; ROC curves are useful for quantitative screening tests
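
One way to make the AUC concrete: it equals the proportion of (diseased, healthy) pairs in which the diseased person has the higher test score, with ties counted as one half. The sketch below uses that pairwise definition. The function name `auc` and the scores are hypothetical.

```python
def auc(scores_diseased, scores_healthy):
    """AUC as the share of (diseased, healthy) pairs ranked correctly.

    Assumes a higher score is more suggestive of disease.
    Ties count as half a correct ranking.
    """
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Hypothetical quantitative test scores.
diseased = [0.9, 0.8, 0.6, 0.55]
healthy = [0.7, 0.4, 0.3, 0.2, 0.1]

print(auc(diseased, healthy))
```

A test that ranks every diseased person above every healthy person gives AUC = 1, and a test whose scores carry no information gives AUC near 0.5.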

6

## Which cutpoint (i.e., 'decision rule') is the best?

###
It depends on the purpose of the screening test and the cost of misclassification

Usually you desire a balance between sensitivity and specificity
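
A minimal sketch of one possible way to balance the two: pick the cutpoint that maximizes sensitivity + specificity - 1 (Youden's index). This criterion is an assumption for illustration and not the only reasonable rule. It weights both kinds of misclassification equally, which may not match the real costs. The function names and scores are hypothetical.

```python
def sens_spec(cutpoint, diseased, healthy):
    """Call score >= cutpoint test-positive; return (sensitivity, specificity)."""
    sens = sum(s >= cutpoint for s in diseased) / len(diseased)
    spec = sum(s < cutpoint for s in healthy) / len(healthy)
    return sens, spec


def best_cutpoint_youden(diseased, healthy):
    """Return the observed score that maximizes sensitivity + specificity - 1."""
    candidates = sorted(set(diseased) | set(healthy))

    def youden(c):
        sens, spec = sens_spec(c, diseased, healthy)
        return sens + spec - 1

    return max(candidates, key=youden)


# Hypothetical quantitative test scores.
diseased = [0.9, 0.8, 0.6, 0.55]
healthy = [0.7, 0.4, 0.3, 0.2, 0.1]

cut = best_cutpoint_youden(diseased, healthy)
print(cut, sens_spec(cut, diseased, healthy))
```

If missing a sick person is much costlier than a false alarm, a lower cutpoint (higher sensitivity) may be preferable to the one this rule picks.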

7

## When do we do logistic regression?

###
When the outcome is a binary/dichotomous variable.

The appropriate measure for describing a dichotomous (binary) outcome depends on the study design, but the odds ratio is always appropriate

8

## What are the three equivalent overall tests we can do in logistic regression?

###
Three asymptotically-equivalent tests:

(1) Likelihood ratio

(2) Score

(3) Wald

Rejecting H0 indicates that the model with all predictors is better than a model with no predictors (i.e., an intercept-only model)

* Similar to Overall F-test in MLR

9

## What is the Type 3 test for an individual predictor?

###
Type 3 test asks: Is the predictor variable associated with the outcome, given that the association with the other predictors has already been accounted for?

Type 3 Test can accommodate multi-level categorical predictors, in addition to continuous and binary predictors

Hypotheses:

H0 : The predictor is not important (given all other predictors)

H1 : The predictor is important (given all other predictors)

Consider doing Type 3 after rejecting H0 in the Overall test.

Rejecting H0 of Type 3 implies that there is significant evidence of a linear association between the predictor and the binary response, given all other predictors are already included in the model.

Rejecting H0 of Type 3 Test implies an adjusted odds ratio not equal to 1

*Similar to partial F test in MLR

10

## What is the difference between the estimated model and the predicted model?

###
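The estimated model is the logit model, which is linear on the log-odds scale:

logit(p) = log(p / (1 - p)) = B0 + B1X1 + B2X2 + ... + BjXj

The predicted model is the logit model solved for the probability. Exponentiating the linear predictor gives the odds, and odds / (1 + odds) gives the predicted probability:

phat = exp(B0 + B1X1 + ... + BjXj) / (1 + exp(B0 + B1X1 + ... + BjXj))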
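
The step from the logit scale to a predicted probability can be sketched as below. The coefficient and predictor values are hypothetical, and `predicted_probability` is an illustrative helper, not output from any fitted model.

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def predicted_probability(intercept, coefs, x):
    """Turn the linear predictor B0 + B1*X1 + ... into a probability."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    odds = math.exp(eta)        # exponentiating the log odds gives the odds
    return odds / (1 + odds)    # odds -> probability


# Hypothetical coefficients and one subject's predictor values.
b0 = -2.0
b = [0.5, 1.2]
x = [1.0, 1.0]

p_hat = predicted_probability(b0, b, x)
print(round(p_hat, 4))         # predicted probability
print(round(logit(p_hat), 4))  # back on the log-odds scale
```

Applying `logit` to the predicted probability recovers the linear predictor, which shows that the two forms are the same model written on different scales.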

11

## What is the Individual coefficient test?

###
Tests a single Bj predictor

H0 : Bj = 0 (given all other predictors)

H1 : Bj not equal to 0 (given all other predictors)

Rejecting H0 implies that there is significant evidence of an association between Xj and Y, given all other predictors are in the model

Depending on the problem, we may be interested in testing against other null values (e.g., H0 : Bj = 1)

Should not be used for multi-level categorical covariates

* Similar to Partial T-test in MLR
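
A minimal sketch of a single-coefficient test, assuming it is carried out as a Wald z-test with a normal reference distribution (software may report the squared version as a chi-square statistic). The estimate and standard error are hypothetical.

```python
import math


def wald_test(estimate, std_error, null_value=0.0):
    """Two-sided Wald z-test of H0: Bj = null_value."""
    z = (estimate - null_value) / std_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimate of Bj and its standard error.
z, p = wald_test(estimate=0.9, std_error=0.3)
print(f"z = {z:.2f}, p = {p:.4f}")

# Testing against a different null value, e.g. H0: Bj = 1.
z1, p1 = wald_test(estimate=0.9, std_error=0.3, null_value=1.0)
print(f"z = {z1:.2f}, p = {p1:.4f}")
```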

12

## What is the large sample assumption of logistic regression?

###
Hypothesis testing in Logistic Regression is based on large sample theory and asymptotics - large sample sizes are recommended

At least 100 observations

Need enough observations for each category/group

each Bj 'costs' about 10 observations to estimate

13

## Why the odds ratio?

###
Regardless of the specific study design used to collect the data, it is always appropriate to report an odds ratio

Since we are actually modeling the log(odds) in logistic regression, odds ratios tend to 'fall out' naturally.

14

## What are the two types of odds ratios?

###
Simple odds ratios associated with individual predictors can be obtained by exponentiating the corresponding regression coefficient (e.g., exp(Bj))

Complex odds ratios comparing any two predictor-profiles can be obtained by first determining the appropriate contrast and then exponentiating

OR1v2 = odds1/odds2
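
Both kinds of odds ratio can be sketched with a few lines of arithmetic. The coefficients, predictor names, and profiles below are hypothetical.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
coefs = {"intercept": -1.5, "age": 0.04, "smoker": 0.9}

# Simple odds ratio: exponentiate a single coefficient.
or_smoker = math.exp(coefs["smoker"])


def log_odds(profile):
    """Linear predictor (log odds) for one predictor profile."""
    return coefs["intercept"] + sum(coefs[k] * v for k, v in profile.items())


# Complex odds ratio: compare two predictor profiles.
profile1 = {"age": 60, "smoker": 1}
profile2 = {"age": 50, "smoker": 0}

# The contrast is the difference in log odds; the intercept cancels.
contrast = log_odds(profile1) - log_odds(profile2)  # 10 * 0.04 + 0.9
or_1v2 = math.exp(contrast)                         # same as odds1 / odds2

print(f"simple OR (smoker)  = {or_smoker:.3f}")
print(f"complex OR (1 vs 2) = {or_1v2:.3f}")
```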

15

## What are the assumptions of Logistic Regression?

###
Linear Relationship: Logit(p) can be modeled as a linear function of the predictors

Large sample with independent observations of equal importance (implied by errors and/or design)

The predictors are independent of each other (no multicollinearity)

*No error assumptions, because the logistic regression model has no error term

16

## How do you check for independence of observations assumption?

###
1. Plot of the outcome (Y) against the index or order of data collection (to check independence)

2. Check characteristics of study design

17

## How do you check for multicollinearity?

###
1. Plot predictor variables against each other

2. Look for large sample correlation coefficients

3. Look for large variance inflation factors (VIFs) via PROC REG
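
For intuition about checks 2 and 3, the sketch below handles the simplest case of exactly two predictors, where the VIF of each predictor reduces to 1 / (1 - r^2) with r the sample correlation between them. With more predictors the VIF needs a regression of each predictor on all the others, which this sketch does not attempt. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

print(f"r   = {pearson_r(x1, x2):.2f}")
print(f"VIF = {vif_two_predictors(x1, x2):.2f}")
```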

18

## What are the model Selection Criteria for Logistic Regression?

###
1. Akaike information criterion (AIC) (smaller is better)

2. Bayesian information criterion (BIC) (smaller is better)

3. Area under the curve (AUC) (larger is better)

Has a meaningful stand-alone interpretation

Similar to R2 in MLR, the AUC tends to be overly optimistic regarding a model's true predictive abilities when applied to an external dataset

4. Generalized R2 (larger is better)

Does not have the same interpretation as in MLR (i.e., variability explained)

Use as supporting evidence only when making decisions
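
AIC and BIC are both penalized versions of -2 log-likelihood, so they are easy to compute by hand once the log-likelihood is known. The sketch below uses the standard definitions. The log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 logL + 2k."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 logL + k * ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical candidate models fitted to the same n = 200 observations.
n = 200
models = {
    "A (3 parameters)": (-120.5, 3),
    "B (5 parameters)": (-118.9, 5),
}

for name, (ll, k) in models.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")
```

In this made-up comparison the larger model has the better log-likelihood, but both criteria still favor the smaller model once the penalty for extra parameters is applied.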

19

## What are the goodness of fit tests for logistic regression?

###
1. Pearson (categorical)

2. Deviance (categorical)

3. Hosmer Lemeshow (continuous and/or categorical data)

20

## What are the hypotheses for the goodness of fit tests in logistic regression?

###
H0 : The model fits the data well

H1 : The model does not fit the data well

21

## Which goodness of fit test is comparing a saturated model with interaction terms with a reduced model with only the main effects for categorical predictors?

###
The deviance test

Hypotheses of Deviance Test:

H0 : the (reduced) model fits the data well

H1 : the saturated (full) model provides a better fit to the data

(The Saturated Model refers to the model with all main effects and all possible pairwise interactions)

Test statistic (D)

D = [-2logL(reduced)] - [-2logL(full)]

Rejecting H0 implies that the saturated model is superior to the proposed reduced model

(The Deviance Test is actually a special case of the general Likelihood Ratio Test)

22

## Which goodness of fit test compares what is predicted vs what is actually observed for categorical predictors?

###
The Pearson Chi-square test

H0 : the model fits the data well

H1 : the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well

23

## Which goodness of fit test compares what is predicted vs what is actually observed for continuous predictors?

###
The Hosmer Lemeshow test

H0 : the model fits the data well

H1 : the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well

Can be thought of as an extension of the Pearson's chi-square test
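
A simplified sketch of the idea behind the test: sort subjects by predicted probability, split them into groups, and compare observed and expected event counts in each group with a Pearson-type statistic. The equal-size grouping by sorted index is an assumption of this sketch. Statistical software may form groups and handle ties differently. The function name and data are hypothetical.

```python
def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow-style chi-square statistic.

    y      : list of 0/1 outcomes
    p_hat  : fitted probabilities, in the same order as y
    groups : number of risk groups (assumes len(y) >= groups)
    """
    pairs = sorted(zip(p_hat, y))  # order subjects by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in group
        expected = sum(pi for pi, _ in chunk)   # expected events in group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat


# Hypothetical outcomes and fitted probabilities.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print(hosmer_lemeshow(y, p_hat, groups=2))
```

The statistic is usually compared with a chi-square distribution with (groups - 2) degrees of freedom. A large value means observed and predicted counts disagree, which is evidence against H0.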

24

## What is the General Likelihood Ratio Test?

###
Can be thought of as 'the partial F-test of logistic regression'

Rejecting H0 implies that the full model is preferable to the reduced model

Reduced model must be 'nested' within the full model (i.e., a special case)

The most general type of hypothesis test for logistic regression

(The Deviance Test is actually a special case of the general Likelihood Ratio Test)
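
A minimal sketch of the likelihood ratio test for the simplest nested comparison, assuming the full model has exactly one more parameter than the reduced model (1 degree of freedom). Other degrees of freedom need a general chi-square tail probability, which is not implemented here. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test_df1(log_lik_reduced, log_lik_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns (statistic, p_value). The statistic is
    [-2 logL(reduced)] - [-2 logL(full)], which is non-negative
    when the reduced model is nested within the full model.
    """
    d = (-2.0 * log_lik_reduced) - (-2.0 * log_lik_full)
    if d < 0:
        raise ValueError("full model should not fit worse than reduced model")
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(d / 2.0))
    return d, p_value


# Hypothetical log-likelihoods from two nested fits.
d, p = likelihood_ratio_test_df1(log_lik_reduced=-120.5, log_lik_full=-118.0)
print(f"statistic = {d:.2f}, p = {p:.4f}")
```

A small p-value leads to rejecting H0, which means the full model is preferred to the reduced model.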

25

## What are 4 similarities between MLR and Logistic Regression?

###
1. general hypothesis testing for the overall model and individual predictors/coefficients

2. assumptions about independence of observations/predictors

3. checking assumptions and identifying outliers/influential observations

4. general model-selection strategies

26