Unit 3: Logistic Regression Flashcards


26 flashcards in this deck:
1

Which plot can be used to check the independence of observations assumption in logistic regression?

Scatterplot of residuals vs. the order of data collection

2

Sensitivity:

The probability that a test classifies someone as sick given that the person is truly sick
P(T+|D+)

3

Specificity:

The probability that a test classifies someone as healthy given that the person is truly healthy
P(T-|D-)

4

Accuracy:

The probability that a test correctly classifies someone
How to calculate:
Add the concordant cells (TP + TN) and divide by the total.
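The definitions in cards 2-4 can be sketched with a toy 2x2 table (the counts below are hypothetical, not from the deck):

```python
# Hypothetical 2x2 screening-test counts:
# tp = test+ & truly sick, fp = test+ & truly healthy
# fn = test- & truly sick, tn = test- & truly healthy
tp, fp = 90, 20
fn, tn = 10, 80

sensitivity = tp / (tp + fn)                 # P(T+ | D+)
specificity = tn / (tn + fp)                 # P(T- | D-)
accuracy = (tp + tn) / (tp + fp + fn + tn)   # concordant cells / total

print(sensitivity, specificity, accuracy)    # 0.9 0.8 0.85
```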

5

Area Under the Curve (AUC):

Describes the overall predictive ability of the screening test (a coin flip has AUC = 0.5)
We want AUC to be close to 1.
Shown on the ROC curve; ROC curves are useful for quantitative screening tests
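One common way to compute AUC is as the probability that a randomly chosen sick subject receives a higher test score than a randomly chosen healthy subject (ties count one-half). A minimal sketch with made-up scores:

```python
# Made-up screening-test scores for illustration only
sick = [0.9, 0.8, 0.7, 0.6]
healthy = [0.5, 0.4, 0.7, 0.2]

# AUC = P(score_sick > score_healthy), ties counted as 1/2
pairs = [(s, h) for s in sick for h in healthy]
auc = sum(1.0 if s > h else 0.5 if s == h else 0.0 for s, h in pairs) / len(pairs)
print(auc)  # 0.90625 -- well above the coin-flip value of 0.5
```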

6

Which cutpoint (i.e., 'decision rule') is the best?

It depends on the purpose of the screening test and the cost of misclassification

Usually you desire a balance between sensitivity and specificity

7

When do we do logistic regression?

When the outcome is a binary/dichotomous variable.
The appropriate measure for describing a dichotomous (binary) outcome depends on the study design, but the ODDS RATIO is always appropriate

8

What are the three equivalent overall tests we can do in logistic regression?

Three asymptotically-equivalent tests:
(1) Likelihood ratio
(2) Score
(3) Wald

Rejecting H0 indicates that the model with all predictors is better than a model with no predictors (i.e., an intercept-only model)

* Similar to Overall F-test in MLR

9

What is the Type 3 test for an individual predictor?

Type 3 test asks: Is the predictor variable
associated with the outcome, given the association with the other predictors has already been accounted for?

Type 3 Test can accommodate multi-level categorical predictors, in addition to continuous and binary predictors

Hypotheses:
H0 : The predictor is not important (given all other predictors)
H1 : The predictor is important (given all other predictors)


Consider doing Type 3 after rejecting H0 in the Overall test.
Rejecting H0 of Type 3 implies that there is significant evidence of a linear association between the predictor and the binary response, given all other predictors are already included in the model.
Rejecting H0 of Type 3 Test implies an adjusted odds ratio not equal to 1
*Similar to partial F test in MLR

10

What is the difference between the estimated model and the predicted model?

The estimated model is the logit model:
logit(p-hat) = ln(p-hat / (1 - p-hat)) = b0 + b1x1 + b2x2 + ... + bjxj

The predicted model gives the probability, obtained by exponentiating:
p-hat = exp(b0 + b1x1 + ... + bjxj) / (1 + exp(b0 + b1x1 + ... + bjxj))
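A minimal sketch of going from the fitted logit to a predicted probability, assuming hypothetical coefficients b0, b1, b2 and one predictor profile:

```python
import math

# Hypothetical fitted coefficients (intercept b0, slopes b1, b2)
b0, b1, b2 = -2.0, 0.5, 1.2
x1, x2 = 3.0, 1.0                    # one predictor profile

logit = b0 + b1 * x1 + b2 * x2       # estimated log-odds
p_hat = math.exp(logit) / (1 + math.exp(logit))  # predicted probability
print(round(logit, 3), round(p_hat, 3))          # 0.7 0.668
```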

11

What is the Individual coefficient test?

Tests a single Bj predictor

H0: Bj = 0 (given all other predictors)
H1: Bj != 0 (given all other predictors)

Rejecting H0 implies that there is significant evidence of an association between Xj and Y, given all other predictors are in the model
Depending on the problem, we may be interested in testing against other null values (e.g., H0: Bj = 1)
Should not be used for multi-level categorical covariates
* Similar to Partial T-test in MLR

12

What is the large sample assumption of logistic regression?

Hypothesis testing in logistic regression is based on large-sample theory and asymptotics, so large sample sizes are recommended.
At least 100 observations; you also need enough observations for each category/group.

Each Bj 'costs' about 10 observations to estimate
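A rough sample-size check based on the two rules of thumb above (the '+1 for the intercept' is an assumption, not from the deck):

```python
# Rule-of-thumb sample-size check (heuristic, not exact):
# need at least 100 observations, and about 10 per coefficient.
n_predictors = 12
n_coefficients = n_predictors + 1         # +1 for the intercept (assumption)
n_needed = max(100, 10 * n_coefficients)
print(n_needed)  # 130
```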

13

Why the odds ratio?

Regardless of the specific study design used to collect the data, it is always appropriate to report an odds ratio

Since we are actually modeling the log(odds) in logistic regression, odds ratios tend to 'fall out' naturally.

14

What are the two types of odds ratios?

Simple odds ratios associated with individual predictors can be obtained by exponentiating the corresponding regression coefficient (e.g., exp(Bj))

Complex odds ratios comparing any two predictor-profiles can be obtained by first determining the appropriate contrast and then exponentiating
OR1v2 = odds1/odds2
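A sketch of both types, using made-up coefficients and predictor profiles; the contrast is the difference between the two profiles on the log-odds scale, and exponentiating it gives odds1/odds2:

```python
import math

# Hypothetical fitted slope for a single predictor
b1 = 0.8
simple_or = math.exp(b1)           # simple OR for a one-unit increase in x1

# Complex OR comparing two predictor profiles via a contrast
b = [-1.0, 0.8, 0.3]               # b0, b1, b2 (made-up values)
profile1 = [1, 1, 2]               # intercept term, x1 = 1, x2 = 2
profile2 = [1, 0, 1]               # intercept term, x1 = 0, x2 = 1
contrast = sum(bj * (p1 - p2) for bj, p1, p2 in zip(b, profile1, profile2))
complex_or = math.exp(contrast)    # odds1 / odds2
print(round(simple_or, 3), round(complex_or, 3))
```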

15

What are the assumptions of Logistic Regression?

Linear relationship: logit(p) can be modeled as a linear function of the predictors
Large sample, with independent observations of equal importance (implied by the study design)
The predictors are independent of each other (no multicollinearity)

*No error assumptions, because there is no error term in logistic regression

16

How do you check for independence of observations assumption?

1. Plot of the outcome (Y ) against the index or order of data collection (to check independence)

2. Check characteristics of study design

17

How do you check for multicollinearity?

1. Plot predictor variables against each other
2. Look for large sample correlation coefficients
3. Look for large variance inflation factors (VIFs) via PROC REG
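A quick sketch of the second check, computing the sample correlation between two made-up predictors in plain Python (VIFs would still come from a regression routine such as PROC REG):

```python
# Made-up values for two predictors; a large |r| flags possible
# multicollinearity worth following up with VIFs.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x1)
mean1, mean2 = sum(x1) / n, sum(x2) / n
cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
var1 = sum((a - mean1) ** 2 for a in x1)
var2 = sum((b - mean2) ** 2 for b in x2)
r = cov / (var1 * var2) ** 0.5     # Pearson sample correlation
print(round(r, 3))                 # near 1, so these predictors are collinear
```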

18

What are the model Selection Criteria for Logistic Regression?

1. Akaike information criterion (AIC) (smaller is better)
2. Bayesian information criterion (BIC) (smaller is better)
3. Area under the curve (AUC) (larger is better)
Has a meaningful stand-alone interpretation
Similar to R2 in MLR, the AUC tends to be overly optimistic regarding a model's true predictive abilities when applied to an external dataset
4. Generalized R2 (larger is better)
Does not have the same interpretation as in MLR (i.e., variability explained)
Use as supporting evidence only when making decisions

19

What are the goodness of fit tests for logistic regression?

1. Pearson (categorical)
2. Deviance (categorical)
3. Hosmer-Lemeshow (continuous and/or categorical predictors)

20

What are the hypotheses for the goodness of fit tests in logistic regression?

H0 : The model fits the data well
H1 : The model does not fit the data well

21

Which goodness of fit test is comparing a saturated model with interaction terms with a reduced model with only the main effects for categorical predictors?

The deviance test
Hypotheses of the Deviance Test:
H0: the (reduced) model fits the data well
H1: the saturated (full) model provides a better fit to the data

(The saturated model refers to the model with all main effects and all possible pairwise interactions)

Test statistic (D):
D = [-2 log L(reduced)] - [-2 log L(full)]

Rejecting H0 implies that the saturated model is superior to the proposed reduced model
(The Deviance Test is actually a special case of the general Likelihood Ratio Test)
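A sketch of the deviance statistic with made-up -2 log-likelihood values; the assumed degrees of freedom equal the number of extra parameters in the saturated model:

```python
# Hypothetical -2 log-likelihoods (made-up values, not from the deck)
neg2ll_reduced = 210.4   # reduced model: main effects only
neg2ll_full = 205.1      # saturated model: main effects + interactions

D = neg2ll_reduced - neg2ll_full   # deviance test statistic
df = 3                             # assumed number of extra parameters
print(round(D, 1))                 # 5.3 -- compare to chi-square with df = 3
```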

22

Which goodness of fit test compares what is predicted vs what is actually observed for categorical predictors?

The Pearson Chi-square test
H0: the model fits the data well
H1: the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well

23

Which goodness of fit test compares what is predicted vs what is actually observed for continuous predictors?

The Hosmer-Lemeshow test
H0: the model fits the data well
H1: the model does not fit the data well

Rejecting H0 implies that the model does not fit the data well
Can be thought of as an extension of Pearson's chi-square test

24

What is the General Likelihood Ratio Test?

Can be thought of as 'the partial F-test of logistic regression'

Rejecting H0 implies that the full model is preferable to the reduced model

The reduced model must be 'nested' within the full model (i.e., be a special case of it)

The most general type of hypothesis test for logistic regression

(The Deviance Test is actually a special case of the general Likelihood Ratio Test)

25

What are 4 similarities between MLR and Logistic Regression?

1. general hypothesis testing for the overall model and individual predictors/coefficients
2. assumptions about independence of observations/predictors
3. checking assumptions and identifying outliers/influential observations
4. general model-selection strategies

26

What are differences between MLR and Logistic Regression?

1. the left-hand side (i.e., modeling logit(p) instead of y)
2. No error term in logistic regression
3. Need more observations; added model complexity 'costs' more observations
4. The distribution of test statistics under the null hypothesis (chi-square vs. t distribution)
5. Primarily interested in odds ratios
6. likelihoods (not sums-of-squares) are used for model comparisons