Chapter 4: Logistic and Poisson Regressions Flashcards

Question 1

Q

Logistic Regression

Answer

A

Models of discrete choice have long been a topic in (micro-)econometrics and are nowadays widely used in marketing research.
Logit and probit models extend the principles of general linear models (e.g., regression) to better treat dichotomous and categorical target variables.
They focus on categorical dependent variables, looking at all levels of possible interaction effects.
McFadden received the 2000 Nobel Prize in Economics for fundamental contributions to discrete choice modeling.

Question 2

Q

Application of Logistic Regression

Answer

A

Why do commuters choose to fly or not to fly to a destination when there are alternatives?
Available modes = Air, Train, Bus, Car
Observed:
- Choice
- Attributes: Cost, terminal time, other
- Characteristics of commuters: Household income
Choose to fly iff U_fly > 0
- U_fly = β₀+β₁Cost + β₂Time + γIncome + ε

Question 3

Q

The Linear Probability Model

Answer

A

The predicted probabilities of the linear model can be greater than 1 or less than 0
ε is not normally distributed because Y takes on only two values
The error terms are heteroscedastic
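
A minimal sketch of the first problem, using a made-up toy data set (the `x` and `y` values are illustrative assumptions, not data from the lecture). An ordinary least squares line is fitted to a 0/1 outcome, and the fitted "probabilities" at the ends of the range fall outside the interval [0, 1].

```python
# Toy data (hypothetical): binary outcome y against a single predictor x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form OLS estimates for a simple linear regression y = a + b*x.
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean

# Fitted values of the linear probability model.
fitted = [intercept + slope * xi for xi in x]
print("smallest fitted value:", min(fitted))
print("largest fitted value:", max(fitted))
```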

Question 4

Q

Gauss-Markov-Assumptions

Answer

A

The OLS estimator is the best linear unbiased estimator (BLUE), iff
- there is a linear relationship between predictors x and y
- the errors have mean zero, E(ε)=0 (normality of ε is not needed for BLUE, only for exact t- and F-tests).
- the error variance is constant for all values of x (homoscedasticity).
- The errors ε are independent of each other.
- There is no multicollinearity (i.e., no high correlation) among the predictors.

Question 5

Q

The Logistic Regression Model

Answer

A

The “logit” model solves the problems of the linear model:
- ln[p/(1-p)] = β₀ + β₁X₁ + ε
p is the probability that the event Y occurs, Pr(Y = 1 | X₁)
p/(1 - p) describes the odds
- A 20% probability of winning corresponds to odds of 0.20/0.80 = 0.25
- A 50% chance of winning leads to odds of 1
ln[p/(1-p)] is the log odds, or “logit”
- p = 0.50, then logit = 0
- p = 0.70, then logit = 0.85
- p = 0.30, then logit = -0.85
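
The odds and logit values on this card can be checked with a few lines of Python. This is only a sketch; the helper names `odds` and `logit` are chosen here for illustration.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def logit(p):
    """Log odds (logit) of an event with probability p."""
    return math.log(odds(p))


# Print probability, odds and logit for the probabilities used on the card.
for p in (0.20, 0.50, 0.70, 0.30):
    print(p, round(odds(p), 4), round(logit(p), 4))
```

Note that the logit is symmetric around p = 0.50: the values for p = 0.70 and p = 0.30 differ only in sign.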

Question 6

Q

Logistic Function

Answer

A

The logistic function Pr(Y | X) constrains the estimated probabilities to lie between 0 and 1 (0 <= Pr(Y | X) <= 1).
- Pr(Y | X) = e^{β₀+β₁X₁} / (1 + e^{β₀+β₁X₁})
Pr(Y | X) is the estimated probability that the ith case is in a category and β₀ + β₁X₁ is the regular linear regression equation
This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function
- if you let β₀ + β₁X₁ = 0, then p = .50
- as β₀ + β₁X₁ gets really big, p approaches 1
- as β₀ + β₁X₁ gets really small, p approaches 0
The values in the regression equation β₁ and β₀ take on slightly different meanings.
- β₀ <- The regression constant (moves curve left and right)
- β₁ <- The regression slope (steepness of curve)
- -β₀/β₁ <- The threshold, where probability of success = .50
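
The behaviour described on this card can be sketched as follows. The coefficient values `b0 = -3.0` and `b1 = 0.5` are hypothetical and chosen only so that the threshold is a round number; the function uses the algebraically equivalent form 1 / (1 + e^{-z}).

```python
import math


def logistic(x, b0, b1):
    """Pr(Y = 1 | X = x) = e^z / (1 + e^z) with z = b0 + b1 * x."""
    z = b0 + b1 * x
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
b0, b1 = -3.0, 0.5

# Value of x at which the linear predictor is zero, so p = .50.
threshold = -b0 / b1

print("threshold:", threshold)
print("p at threshold:", logistic(threshold, b0, b1))
print("p for large x:", logistic(100, b0, b1))
print("p for small x:", logistic(-100, b0, b1))
```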

Question 7

Q

Odds and Logit

Answer

A

By algebraic manipulation, the logistic regression equation can be written in terms of an odds of success:

p/(1-p) = e^{β₀+β₁X₁}
Odds range from 0 to positive infinity
If p/(1-p) is
- less than 1, then less than .50 probability
- greater than 1, then greater than .50 probability

Question 8

Q

The Logit

Answer

A

Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds):

ln[p/(1-p)] = β₀ + β₁X₁

Probability is constrained between 0 and 1
Log-odds are a linear function of the predictors
Logit is now between -∞ and +∞ (as the dependent variable of a linear regression)
The regression coefficients go back to their old interpretation (kind of)
- β₁ is the amount the logit (log-odds) changes with a one-unit change in X₁

Question 9

Q

Estimating the Coefficients of a Logistic Regression

Answer

A

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the coefficients of a model
The likelihood function (l) measures the probability of observing the particular set of dependent variable values that occur in the sample
MLE involves finding the coefficients that make the log of the likelihood function (ll < 0) as large as possible
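
To make the idea concrete, here is a minimal sketch of maximum likelihood estimation for a logit model with one predictor. It uses Newton-Raphson iterations, which is one common way to maximize the log likelihood and not necessarily the method used in the course. The data are a made-up toy sample, and all names (`prob`, `log_likelihood`, `b0`, `b1`) are illustrative.

```python
import math

# Toy data (hypothetical), deliberately not perfectly separable.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def prob(b0, b1, xi):
    """Logistic probability Pr(Y = 1 | xi)."""
    return 1 / (1 + math.exp(-(b0 + b1 * xi)))


def log_likelihood(b0, b1):
    """Log likelihood of the sample under coefficients (b0, b1)."""
    return sum(
        yi * math.log(prob(b0, b1, xi))
        + (1 - yi) * math.log(1 - prob(b0, b1, xi))
        for xi, yi in zip(x, y)
    )


b0, b1 = 0.0, 0.0  # starting values
for _ in range(25):
    p = [prob(b0, b1, xi) for xi in x]
    # Gradient of the log likelihood.
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    # Negative Hessian (2x2, symmetric) built from weights p * (1 - p).
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    # Newton step: solve the 2x2 system with the explicit inverse.
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print("estimates:", b0, b1)
print("log likelihood at estimates:", log_likelihood(b0, b1))
print("log likelihood at zero:", log_likelihood(0.0, 0.0))
```

The log likelihood is negative at every point, and the fitted coefficients give a larger (less negative) value than the starting values.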

Question 10

Q

The Likelihood Function for Logit Model

Answer

A

Suppose 10 individuals make travel choices between auto (A) and public transit (T).
All travelers are assumed to possess identical attributes (unrealistic), so the probabilities are not functions of the β’s but simply of p, the probability of choosing auto.
- With x = 7 of the n = 10 travelers choosing auto: l = p^x (1 - p)^{n-x} = p⁷ (1-p)³
- ln(l) = 7ln(p) + 3ln(1-p), maximized at p = 0.7
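
The maximizer can be checked numerically. The sketch below evaluates the log likelihood on a grid of candidate values of p and picks the best one; the grid resolution and the function name `log_lik` are arbitrary choices made here.

```python
import math


def log_lik(p, x=7, n=10):
    """Log likelihood of x auto choices out of n travelers."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


# Candidate probabilities 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest log likelihood.
p_hat = max(grid, key=log_lik)
print("p_hat:", p_hat)
```

The same result follows analytically: setting the derivative 7/p - 3/(1-p) to zero gives p = 7/10.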

Question 11

Q

Evaluating the Logistic Regression

Answer

A

The log likelihood function (ll) is one metric to compare two logistic regression models (the higher, the better)
- Also AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) measure the goodness-of-fit
There are several measures intended to mimic the R² analysis (Pseudo-R², e.g., McFadden-R² or Nagelkerke-R²), but the interpretation is different
A Wald test or t-test is used to test the statistical significance of each coefficient in the model (null hypothesis: β_i = 0)
The Chi-Square statistic and associated p-value show whether the model coefficients as a group equal zero
- Larger Chi-squares and smaller p-values indicate greater confidence in rejecting the null hypothesis that all coefficients equal zero
Use also error rates and gain curves to evaluate the performance

Question 12

Q

McFadden R2 / Pseudo R2

Answer

A

R²_McFadden = 1 - (ll)/(ll₀) <- describes fit of model (ll₀ is the log likelihood of the constant-only model)

If the full model does much better than just a constant, in a discrete-choice model this value will be close to 1.
- 1 - (80.9658/123.757) = 0.3458 for the logit model on the second-to-last slide
If the full model doesn’t explain much at all, the value will be close to 0.
Typically, the values are lower than those of R² in a linear regression and need to be interpreted with care.
- >0.2 is acceptable, >0.4 is already ok
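
The calculation on this card can be written as a small helper. The sketch assumes that the two numbers on the card are the absolute values of the log likelihoods, so they are entered here with a negative sign; the ratio, and therefore the result, is the same either way. The function name `mcfadden_r2` is illustrative.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-squared: 1 - ll_model / ll_null."""
    return 1 - ll_model / ll_null


# Log likelihoods from the card, entered as negative values (assumption).
r2 = mcfadden_r2(-80.9658, -123.757)
print(round(r2, 4))
```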

Question 13

Q

Calculating Error Rates from a Logistic Regression

Answer

A

Assume that if the estimated p is greater than or equal to .5 then the event is expected to occur and not occur otherwise.
By assigning these probabilities 0s and 1s and comparing these to the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated.
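
A minimal sketch of this classification step follows. The estimated probabilities and actual outcomes are made-up values for illustration, and the variable names are arbitrary.

```python
# Hypothetical estimated probabilities and actual outcomes.
p_hat = [0.10, 0.40, 0.35, 0.80, 0.65, 0.50, 0.20, 0.90]
actual = [0, 0, 1, 1, 1, 0, 0, 1]

# Classify as 1 when the estimated probability is at least .5.
predicted = [1 if p >= 0.5 else 0 for p in p_hat]

pairs = list(zip(actual, predicted))
n_yes = sum(1 for a, _ in pairs if a == 1)
n_no = sum(1 for a, _ in pairs if a == 0)

# Share of actual 1s predicted as 1, and of actual 0s predicted as 0.
pct_correct_yes = 100 * sum(1 for a, q in pairs if a == 1 and q == 1) / n_yes
pct_correct_no = 100 * sum(1 for a, q in pairs if a == 0 and q == 0) / n_no
pct_correct_overall = 100 * sum(1 for a, q in pairs if a == q) / len(pairs)

print(pct_correct_yes, pct_correct_no, pct_correct_overall)
```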

Question 14

Q

Simple Interpretation of the Coefficients

Answer

A

If β₁ < 0 then an increase in X₁ => (0 < exp(β₁) < 1)
- then odds go down
If β₁ > 0 then an increase in X₁ => (exp(β₁) > 1)
- then odds go up
Always check for the significance of the coefficients
But can we say more than this when interpreting the coefficient values?
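
One common answer, offered here as a sketch rather than as the lecture's own continuation: exp(β₁) is the factor by which the odds are multiplied for a one-unit increase in X₁, whatever the starting value of X₁. The coefficients below are hypothetical.

```python
import math

# Hypothetical coefficients, for illustration only.
b0, b1 = -1.0, 0.4


def odds(x):
    """Odds p / (1 - p) implied by the logit model at X1 = x."""
    return math.exp(b0 + b1 * x)


# Ratio of the odds after and before a one-unit increase in X1.
odds_ratio = odds(3) / odds(2)
print(odds_ratio, math.exp(b1))
```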

Question 15

Q

Multicollinearity and Irrelevant Variables

Answer

A

The presence of multicollinearity will not lead to biased coefficients, but it will have an effect on the standard errors.
- If a variable which you think should be statistically significant is not, consult the correlation coefficients.
- If two variables are correlated at a rate greater than .6, .7, .8, etc. then try dropping the least theoretically important of the two.
The inclusion of irrelevant variables can result in poor model fit.
- You can consult your Wald statistics and remove irrelevant variables.

Question 16

Q

Multiple Logistic Regression

Answer

A

More than one independent variable
- Dichotomous, ordinal, nominal, continuous …
  ln(p/(1-p)) = β₀ + β₁X₁ + β₂X₂ + … + β_nX_n
Interpretation of β_i
- Increase in log-odds for a one-unit increase in X_i with all the other X's held constant
  p(Y=1) = 1 / (1 + e^{-(β₀+β₁X₁+…+β_nX_n)})
Effect modification
- Interaction effects can be modelled by including interaction terms, e.g. the interaction effect of age and income
  ln(p/(1-p)) = β₀ + β₁X₁ + β₂X₂ + β₃X₁X₂
Discrete choice models take many forms, including:
- Binary logit, multinomial logit, conditional logit (variables vary over alternatives), ordered logit (good/bad/ugly), etc.

Question 17

Q

Multinomial Logit Models

Answer

A

The dependent variable, Y, is a discrete variable that represents a choice, or category, from a set of mutually exclusive choices or categories.
- Examples are brand selection, transportation mode selection, etc.
- The residuals still need to be i.i.d.
Model:
- Choice between J > 2 categories
- Dependent variable y = 1,2,3, … J
If the characteristics vary over alternatives (e.g., prices, travel distances, etc.), the multinomial logit is often called “conditional logit”.
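
The card does not spell out the choice probabilities. In the standard multinomial logit they take the form P(y = j) = e^{V_j} / Σ_k e^{V_k}, where V_j is the systematic utility of alternative j. The sketch below assumes that form; the utility values for the four travel modes are made up.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit probabilities from systematic utilities V_j."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical utilities for four mutually exclusive alternatives.
utilities = {"air": 1.2, "train": 0.4, "bus": -0.3, "car": 0.9}
probs = choice_probabilities(utilities)

for alt, p in probs.items():
    print(alt, round(p, 3))
```

The probabilities sum to one, and the alternative with the highest utility receives the highest probability.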

Question 18

Q

Generalized Linear Models (GLM)

Answer

A

The models in this class are examples of generalized linear models
GLMs are a general class of linear models that are made up of three components: Random, Systematic, and Link Function
- Random component: Identifies dependent variable (Y) and its probability distribution
- Systematic Component: Identifies the set of explanatory variables (X₁, …, X_k)
- Link Function: Identifies a function of the mean that is a linear function of the explanatory variables
- g(μ)=α+β₁X₁ + … +β_kX_k
Link function:
- Identity link (form used in normal regression models): g(μ) = μ
- Log link (used when μ cannot be negative as when data are Poisson counts): g(μ) = log(μ)
- Logit link (used when μ is bounded between 0 and 1 as when data are binary): g(μ) = log(μ / (1 − μ))
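
The three link functions can be sketched together with their inverses, which map the linear predictor back to the mean μ. The dictionary layout and names are illustrative choices, not part of any particular library.

```python
import math

# Each entry: (link g, inverse link g^{-1}).
links = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
}

mu = 0.3
for name, (g, g_inv) in links.items():
    eta = g(mu)  # value on the scale of the linear predictor
    print(name, round(eta, 4), round(g_inv(eta), 4))
```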

Question 19

Q

Count Variables as Dependent Variables

Answer

A

Many dependent variables are counts: Non-negative integers
- # Crimes a person has committed in lifetime
- # Children living in a household
- # new companies founded in a year (in an industry)
- # of social protests per month in a city
Count variables can be modeled with OLS regression… but:
- 1. Linear models can yield negative predicted values… whereas counts are never negative
- 2. Count variables are often highly skewed
    * Ex: # crimes committed this year… most people are zero or very low; a few people are very high
    * Extreme skew violates the normality assumption of OLS regression.

Question 20

Q

Count Models

Answer

A

Two most common count models:
- Poisson regression model (aka. log-linear model)
- Negative binomial regression model
Both assume the observed count is distributed according to a Poisson distribution:
- μ = expected count (and variance)
- y = observed count
P(y | μ) = e^{−μ} μ^y / y!
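
The Poisson probability function is easy to evaluate directly. The sketch below also checks numerically that the mean and the variance both equal μ; the value μ = 2 and the cut-off of the sum at y = 59 are arbitrary choices.

```python
import math


def poisson_pmf(y, mu):
    """P(y | mu) = e^{-mu} * mu^y / y! for a non-negative integer y."""
    return math.exp(-mu) * mu**y / math.factorial(y)


mu = 2.0
support = range(60)  # the probability mass beyond y = 59 is negligible here

total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)

print(total, mean, variance)
```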

Question 21

Q

Poisson Regression for Count Data

Answer

A

Strategy: Model log of μ as a function of the X's
- Quite similar to modeling log odds in logit
- Again, the log form avoids negative values
- ln(μ_i) = ∑_{j=1}^{k} β_j X_ji
Which can be written as:
- μ_i = e^{∑_{j=1}^{k} β_j X_ji}
Distribution: Poisson (Restriction: E(Y) = V(Y))
- When the mean and variance are not equal (over-dispersion), often the Poisson distribution is replaced with a negative binomial distribution
Link Function: Can be identity link, but typically use the log link:
- g(μ)=ln(μ)=β₀ +β₁X₁ +…+β_kX_k
- μ(X₁…X_k) = e^{β₀ +β₁X₁ +…+β_kX_k}

Question 22

Q

Interpreting Coefficients Poisson regression

Answer

A

In Poisson Regression, μ is typically conceptualized as a rate…
- Positive coefficients indicate higher rate; negative = lower rate
Like logit, Poisson models are non-linear
- Coefficients don’t have a simple linear interpretation
Like logit, model has a log form; exponentiation aids interpretation
- Exponentiated coefficients are multiplicative
- Analogous to odds ratios… but called “incidence rate ratios”.
Exponentiated coefficients: indicate effect of unit change of X on rate
- e^b= 2.0 indicates that the rate doubles for each unit change in X
- e^b = 0.5 indicates that the rate drops by half for each unit change in X
Recall: Exponentiated coefs are multiplicative
- If e^b = 5.0, a 2-point increase in X does not multiply the rate by 10; it multiplies it by 5*5 = 25
- Also: you must invert to see opposite effects
  - If e^b = 5.0, a 1-point decrease in X does not change the rate by a factor of -5; it multiplies it by 1/5 = 0.2
Again, exponentiated coefficients (rate ratios) can be converted to % change
- Formula: (e^b - 1) * 100%
- Coefficient = -0.693
  - (e^-0.693 - 1) * 100% = -50%, i.e., a 50% decrease in rate
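
These rules can be sketched in a few lines. The helper names `rate_ratio` and `pct_change` are illustrative, and the coefficient ln(5) is chosen so that e^b = 5.0 as in the example on this card.

```python
import math


def rate_ratio(b, delta=1):
    """Factor by which the rate is multiplied when X changes by delta."""
    return math.exp(b * delta)


def pct_change(b):
    """Percent change in the rate for a one-unit increase in X."""
    return (math.exp(b) - 1) * 100


b = math.log(5.0)  # so that e^b = 5.0

print(rate_ratio(b, 2))    # two-unit increase
print(rate_ratio(b, -1))   # one-unit decrease
print(pct_change(-0.693))  # percent change for a coefficient of -0.693
```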

Question 23

Q

Poisson Model Assumptions

Answer

A

Poisson regression makes a big assumption: that the variance of Y equals μ (“equidispersion”)
- In other words, the mean and variance are the same
- This assumption is often not met in real data
- The variance is often greater than μ: overdispersion
Consequence of overdispersion: Standard errors will be underestimated
- Potential for overconfidence in results; rejecting H0 when you shouldn’t!
- Note: overdispersion doesn’t necessarily affect predicted counts (compared to alternative models).
Overdispersion is most often caused by highly skewed dependent variables
Often due to variables with high numbers of zeros
- Ex: Number of traffic tickets per year
- Most people have zero, some can have 50!
- Mean of variable is low, but SD is high
Other examples of skewed outcomes
- # of scholarly publications
- # cigarettes smoked per day
- # riots per year (for sample of cities in US).
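
A rough first check for overdispersion is to compare the sample mean and variance of the count variable. The sketch below uses a made-up sample of yearly traffic tickets. It is only a crude unconditional check: in a regression setting the relevant comparison is between the variance and the mean conditional on the predictors.

```python
import statistics

# Hypothetical counts of traffic tickets per year for 20 people.
tickets = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 50, 0, 0, 3, 0]

mean = statistics.mean(tickets)
variance = statistics.variance(tickets)  # sample variance (n - 1 denominator)

# Under equidispersion this ratio would be close to 1.
dispersion_ratio = variance / mean

print("mean:", mean)
print("variance:", variance)
print("variance / mean:", dispersion_ratio)
```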

Question 24

Q

General Remarks

Answer

A

Poisson & negative binomial models suffer all the same basic issues as “normal” regression, and you should be careful about
- Model specification / omitted variable bias
- Multicollinearity
- Outliers/influential cases
Also, they are estimated by maximum likelihood, so sample size matters
- N > 500 = fine; N < 100 can be worrisome
  - Results aren’t necessarily wrong if N < 100;
  - But it is a possibility; and hard to know when problems crop up
- Plus ~10 cases per independent variable.
Tobit regressions are relevant if the data are censored
- Example: estimating hours worked by employees from characteristics such as age, education and family status. For unemployed people we do not observe the number of hours they would have worked had they been employed.
