Lecture 10: Logistic regression Flashcards

Question 1

Q

What is a contingency table?

Answer

A

A table that shows the frequency distribution of each of two categorical variables and lets you assess the association between them

To create one in SPSS, use Crosstabs
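
Outside SPSS, the same kind of table can be built by counting each pair of categories. The sketch below uses only the Python standard library; the exposure/outcome records are made-up illustration data, not from the lecture.

```python
from collections import Counter

# Hypothetical records: (exposure, outcome) for eight participants.
records = [
    ("exposed", "disease"), ("exposed", "disease"), ("exposed", "healthy"),
    ("unexposed", "disease"), ("unexposed", "healthy"), ("unexposed", "healthy"),
    ("unexposed", "healthy"), ("exposed", "disease"),
]

# Count how often each (row, column) combination occurs.
cells = Counter(records)

rows = ["exposed", "unexposed"]
cols = ["disease", "healthy"]

# Print the 2x2 contingency table with row totals.
print(f"{'':10}" + "".join(f"{c:>10}" for c in cols) + f"{'total':>10}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:10}" + "".join(f"{n:>10}" for n in counts) + f"{sum(counts):>10}")
```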

Question 2

Q

What does the expected frequencies table represent?

Answer

A

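The frequencies (and proportions) that would be expected in each cell if the null hypothesis were true, i.e. if there were no association between the two variables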
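
As a rough illustration, the expected count for a cell under the null hypothesis is (row total * column total) / grand total. The observed counts below are hypothetical.

```python
# Hypothetical observed counts: rows = exposed / unexposed, columns = outcome yes / no.
observed = [[30, 70],
            [10, 90]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under the null hypothesis of no association:
# (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```

With equal row totals, both rows get the same expected counts, because under the null hypothesis the outcome proportion does not depend on exposure.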

Question 3

Q

What is the risk of an outcome?

Answer

A

The risk of an outcome is the number of times the outcome of interest occurred / the total number of observations (those where the outcome did and didn't occur)

Question 4

Q

How is a risk ratio calculated?

Answer

A

Calculate the risk of the outcome in each group (with exposure and without exposure)

Risk of outcome with exposure
= number of times outcome occurred with exposure / total number exposed (with & without outcome)

Risk of outcome without exposure
= number of times outcome occurred without exposure / total number not exposed (with & without outcome)

To calculate the risk ratio
Risk with exposure / risk without exposure

Risk ratio > 1: the risk of the outcome is higher with the exposure
Risk ratio = 1: there is no difference in risk between the groups
Risk ratio < 1: the risk of the outcome is lower with the exposure
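
The calculation can be sketched in a few lines of Python. The counts are hypothetical and the function name `risk` is chosen only for this example.

```python
def risk(events, total):
    """Risk = number with the outcome / everyone in the group."""
    return events / total

# Hypothetical counts for each group: outcome yes / outcome no.
exposed_yes, exposed_no = 30, 70
unexposed_yes, unexposed_no = 10, 90

risk_exposed = risk(exposed_yes, exposed_yes + exposed_no)
risk_unexposed = risk(unexposed_yes, unexposed_yes + unexposed_no)

# Risk ratio = risk with exposure / risk without exposure.
risk_ratio = risk_exposed / risk_unexposed

print(risk_exposed, risk_unexposed, risk_ratio)
```

Here the risk ratio is about 3, so in this made-up data the outcome is roughly three times as likely with the exposure.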

Question 5

Q

How does the odds calculation differ from risk?

Answer

A

Rather than dividing by the total number of observations, odds divide the number of times the outcome occurred by the number of times it did not occur

Odds ratio = odds in exposed group / odds in non-exposed group

OR = 1 - the odds of the outcome are the same in both groups, so the outcome is not associated with the exposure

OR > 1 - the odds of the outcome are higher in the exposed group, so the exposure is associated with increased odds of the outcome

OR < 1 - the odds of the outcome are lower in the exposed group, so the exposure is associated with reduced odds of the outcome
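
A minimal sketch of the odds and odds ratio calculation, using hypothetical counts. The cross-product form (a*d) / (b*c) is an equivalent shortcut for a 2x2 table.

```python
def odds(events, non_events):
    """Odds = number with the outcome / number without the outcome."""
    return events / non_events

# Hypothetical counts for each group: outcome yes / outcome no.
exposed_yes, exposed_no = 30, 70
unexposed_yes, unexposed_no = 10, 90

odds_exposed = odds(exposed_yes, exposed_no)
odds_unexposed = odds(unexposed_yes, unexposed_no)
odds_ratio = odds_exposed / odds_unexposed

# Equivalent shortcut: cross-product ratio of the 2x2 table.
cross_product = (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)

print(round(odds_ratio, 3), round(cross_product, 3))
```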

Question 6

Q

When can risk and odds ratios be used?

Answer

A

Risk ratio - cannot be used in a case-control study, i.e. when participants have already been selected on the outcome of interest. Only the odds ratio can be used here

Rare outcomes - the odds ratio and risk ratio take similar values, so either can be used

Odds ratios can be used in many study designs and form the basis for logistic regression
Risk ratios are often preferred in clinical practice
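
To see why rare outcomes are a special case, the sketch below compares the two ratios on hypothetical counts: one table where the outcome is rare and one where it is common. The function names and numbers are assumptions for illustration.

```python
def risk_ratio(a, b, c, d):
    """a, b = exposed with / without outcome; c, d = unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds in the exposed group / odds in the unexposed group."""
    return (a / b) / (c / d)

# Hypothetical rare outcome: 2% risk when exposed vs 1% when unexposed.
rare = (20, 980, 10, 990)
# Hypothetical common outcome: 60% risk when exposed vs 30% when unexposed.
common = (600, 400, 300, 700)

for label, counts in [("rare", rare), ("common", common)]:
    print(label, round(risk_ratio(*counts), 2), round(odds_ratio(*counts), 2))
```

In the rare table the two ratios nearly coincide; in the common table the odds ratio is noticeably further from 1 than the risk ratio.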

Question 7

Q

Why can’t simple linear regression be used for a binary outcome?

Answer

A

Linear regression assumes the response is normally distributed around the mean (for each value of X) - this cannot be the case when the response is binary

A linear relationship doesn't make sense for a binary outcome

The outcome is limited to 0 or 1, but some predictions from a straight line would fall outside this range
Our goal is to best separate the two groups rather than to minimise the squared error
Linear regression would be very sensitive to influential outliers
Homogeneity of variance would be violated

A sigmoid (S-shaped) curve is used instead

Question 8

Q

What function is used in logistic regression?

Answer

A

The logit link function g(p) - it links the probability of the outcome (left-hand side) to the linear predictor (right-hand side)

Linear predictor: η = alpha + betaX

Logit link: log(p / (1 - p)) = alpha + betaX

On the logit (log-odds) scale the relationship with X is linear, so a straight line can be fitted
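
A small sketch of the link function and its inverse may help. The coefficient values are hypothetical, and the names `logit` and `inverse_logit` are chosen for this example.

```python
import math

def logit(p):
    """Link function: probability -> log-odds."""
    return math.log(p / (1 - p))

def inverse_logit(linear_predictor):
    """Logistic (sigmoid) function: log-odds -> probability."""
    return 1 / (1 + math.exp(-linear_predictor))

# Hypothetical coefficients, chosen only for illustration.
alpha, beta = -2.0, 0.5

for x in [0, 2, 4, 6, 8]:
    linear_predictor = alpha + beta * x      # linear in x
    p = inverse_logit(linear_predictor)      # S-shaped in x, always between 0 and 1
    print(x, linear_predictor, round(p, 3), round(logit(p), 3))
```

The last column recovers the linear predictor, which shows that the logit undoes the sigmoid.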

Question 9

Q

If beta increases in a simple logistic regression, how will the log of the odds change?

Answer

A

The log of the odds will increase more for each one-unit increase in X, and the S-shaped curve will become steeper
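
The effect on steepness can be checked numerically. In this sketch alpha is fixed at 0 and the beta values are hypothetical; at the midpoint of the curve (p = 0.5) the slope works out to beta / 4.

```python
import math

def probability(x, alpha, beta):
    """Logistic curve: probability of the outcome at predictor value x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def midpoint_slope(beta, h=1e-6):
    """Numerical slope of the curve at its midpoint (x = 0 when alpha = 0)."""
    return (probability(h, 0.0, beta) - probability(-h, 0.0, beta)) / (2 * h)

# Hypothetical beta values; alpha is fixed at 0 so every curve passes p = 0.5 at x = 0.
for beta in [0.5, 1.0, 2.0]:
    log_odds_per_unit = beta   # change in log-odds for a one-unit step in x
    print(beta, log_odds_per_unit, round(midpoint_slope(beta), 3))
```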

Question 10

Q

What is used to estimate the beta coefficient and the constant in logistic regression?

Answer

A

Maximum likelihood - an iterative process in which candidate coefficient values are tried until the best fit is found

It finds the coefficient values that make the observed data most likely

(In simple linear regression, ordinary least squares is used)
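
The iterative idea can be sketched with Newton-Raphson updates on a tiny hypothetical data set. This is one common scheme for maximising the likelihood, shown as an illustration; it is not necessarily the exact algorithm SPSS runs.

```python
import math

# Hypothetical data: predictor x and binary outcome y (not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

def predicted(alpha, beta):
    """Predicted probability for every observation."""
    return [1 / (1 + math.exp(-(alpha + beta * xi))) for xi in x]

def log_likelihood(alpha, beta):
    """How likely the observed outcomes are under these coefficients (log scale)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, predicted(alpha, beta)))

alpha, beta = 0.0, 0.0   # starting guess
for step in range(25):
    p = predicted(alpha, beta)
    # Gradient of the log-likelihood with respect to alpha and beta.
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    # Information matrix (negative Hessian), built from weights p * (1 - p).
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    # Newton-Raphson update: solve the 2x2 system by hand.
    alpha += (h11 * g0 - h01 * g1) / det
    beta += (h00 * g1 - h01 * g0) / det

print(round(alpha, 4), round(beta, 4), round(log_likelihood(alpha, beta), 4))
```

Each pass moves the coefficients towards values with a higher log-likelihood; at the maximum the gradient terms `g0` and `g1` are essentially zero.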

Question 11

Q

How do we interpret the coefficient in logistic regression?

Answer

A

An increase in X of one unit changes the log(odds) by the value of the coefficient

To work out how the odds change for a one-unit increase in X, use the antilog (exponential) function

antilog = exp(beta) = e to the power of the beta coefficient = the odds ratio for a one-unit increase in X (e.g. concentration)
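
A quick numerical check of this interpretation, using hypothetical coefficient values: the ratio of the odds at x + 1 to the odds at x equals exp(beta), whatever x is.

```python
import math

# Hypothetical fitted coefficients (not from the lecture output).
alpha, beta = -3.0, 0.8

def odds_at(x):
    """Odds of the outcome at predictor value x: exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

odds_ratio = math.exp(beta)   # antilog of the coefficient

# The ratio of odds for any one-unit step in x equals exp(beta).
print(round(odds_ratio, 4),
      round(odds_at(3) / odds_at(2), 4),
      round(odds_at(11) / odds_at(10), 4))
```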

Question 12

Q

What are the assumptions of binary logistic regression?

Answer

A

A binary dependent variable, which follows a Bernoulli distribution

The binary variable is linearly related to the predictor variables only after transformation to the logit scale

The observations are independent

Continuous variables have a linear effect on the log-odds scale

Use for binary dependent variable with continuous predictors

Question 13

Q

What is the probability of an event in logistic regression?

Answer

A

p = exp(L) / (1 + exp(L))
= 1 / (1 + exp(-L))

Odds of an event: p / (1 - p)

Logit model: ln(p / (1 - p)) = alpha + betaX

L = alpha + betaX is the linear predictor

exp(L) = e^L is the odds of an event

Question 14

Q

How is binary logistic regression run in SPSS?

Answer

A

Select Regression - Binary Logistic

Choose the variables - for any categorical predictor, click Categorical, change the reference category from Last to First, then click Change

Request the confidence interval for exp(B)

Question 15

Q

What does the omnibus test of model coefficients explain?

Answer

A

The omnibus test of model coefficients tests whether including a block of variables gives a significantly better model fit

The coefficient of determination (R2) gives an indication of how much of the variation in y is explained by the model

In logistic regression a pseudo-R2, the Nagelkerke R2, is used

The classification table shows how much the percentage of correctly classified cases improves when the predictors are included in the model
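
For reference, Nagelkerke R2 can be computed from the log-likelihoods of the null (constant-only) model and the fitted model: it is the Cox and Snell R2 divided by its maximum possible value. The sketch below uses hypothetical -2 log likelihood values and sample size.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2 from the log-likelihoods of the null and fitted models."""
    cox_snell = 1 - math.exp((2 / n) * (ll_null - ll_model))
    max_cox_snell = 1 - math.exp((2 / n) * ll_null)
    return cox_snell / max_cox_snell

# Hypothetical -2 log likelihood values; halve and negate to get log-likelihoods.
minus2ll_null, minus2ll_model, n = 138.6, 110.2, 100
r2 = nagelkerke_r2(-minus2ll_null / 2, -minus2ll_model / 2, n)

print(round(r2, 3))
```

A model that fits no better than the null model gives 0, and the rescaling lets the statistic reach 1 for a perfect fit.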

Question 16

Q

What is the regression equation from the output?

Answer

A

ln(p / (1 - p)) = constant + betaX

Question 17

Q

Can a binary logistic regression involve more than one predictor?

Answer

A

Yes, these can be numerical or categorical

Categorical variables with more than two levels need to be recoded as dummy variables

Interaction terms should be considered for each dummy variable
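
Dummy coding can be sketched as follows: a variable with k levels becomes k - 1 indicator (0/1) columns, with the remaining level acting as the reference category. The variable, its levels and the helper `dummy_code` are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference level."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

# Hypothetical three-level predictor; "never" is the reference category.
smoking = ["never", "former", "current", "never", "current"]
dummies = dummy_code(smoking, reference="never")

for level, column in dummies.items():
    print(level, column)
```

Cases in the reference category score 0 on every dummy, so each dummy's coefficient compares its level against the reference.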

Question 18

Q

How can a prediction be estimated using the linear predictor?

Answer

A

Linear predictor (L) = constant + beta coefficient * X

Probability = exp(L) / (1 + exp(L))

This gives the probability of Y for a given value of X
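
A worked sketch of the prediction step. The constant and beta values are hypothetical stand-ins for the values read from the SPSS output.

```python
import math

def predict_probability(constant, beta, x):
    """Probability of the outcome for predictor value x."""
    linear_predictor = constant + beta * x
    return math.exp(linear_predictor) / (1 + math.exp(linear_predictor))

# Hypothetical coefficients, standing in for the constant and B from the output.
constant, beta = -4.0, 0.1

for x in [20, 40, 60]:
    print(x, round(predict_probability(constant, beta, x), 3))
```

With these values the linear predictor is 0 at X = 40, so the predicted probability there is 0.5; smaller X gives a lower probability and larger X a higher one.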

Question 19

Q

What is classification?

Answer

A

A probability cut-off is chosen: cases with a predicted probability above the cut-off are classified as positive, and the rest as negative

The predicted classes are then cross-tabulated against the true (observed) values
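
A minimal sketch of this step, assuming a cut-off of 0.5 and hypothetical predicted probabilities and observed outcomes.

```python
from collections import Counter

# Hypothetical predicted probabilities and observed outcomes (1 = positive).
predicted_prob = [0.91, 0.75, 0.62, 0.48, 0.35, 0.30, 0.12, 0.05]
observed = [1, 1, 0, 1, 0, 1, 0, 0]

cutoff = 0.5   # assumed cut-off; a different value changes the table

# Classify each case as positive (1) or negative (0).
predicted_class = [1 if p > cutoff else 0 for p in predicted_prob]

# Cross-tabulate observed (first element) against predicted (second element).
table = Counter(zip(observed, predicted_class))
print("observed=1:", table[(1, 1)], "predicted 1,", table[(1, 0)], "predicted 0")
print("observed=0:", table[(0, 1)], "predicted 1,", table[(0, 0)], "predicted 0")
```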

Question 20

Q

How can the classification table help?

Answer

A

Percentage accuracy in classification:

Sensitivity - percentage of cases with the outcome that were correctly predicted by the model (true positive rate)

Specificity - percentage of cases without the outcome that were correctly predicted by the model (true negative rate)

Positive predictive value - percentage of cases predicted as having the outcome that actually have it

Negative predictive value - percentage of cases predicted as not having the outcome that actually do not have it
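
These percentages can be computed directly from the four cells of the classification table. The counts and the helper name `classification_metrics` below are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Summary percentages from the four cells of a classification table."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": 100 * tp / (tp + fn),   # observed positives predicted positive
        "specificity": 100 * tn / (tn + fp),   # observed negatives predicted negative
        "ppv": 100 * tp / (tp + fp),           # predicted positives that are truly positive
        "npv": 100 * tn / (tn + fn),           # predicted negatives that are truly negative
    }

# Hypothetical counts: true positives, false negatives, false positives, true negatives.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are calculated across the observed groups, while the predictive values are calculated across the predicted groups.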

Question 21

Q

How do we interpret the Hosmer and Lemeshow goodness-of-fit test?

Answer

A

A non-significant result indicates good fit
A large chi-square value with a small p value indicates poor fit
A small chi-square value with a large p value (close to 1) indicates good logistic regression model fit
