Chapter 4: Logistic and Poisson Regressions Flashcards Preview

Flashcards in Chapter 4: Logistic and Poisson Regressions Deck (24):

Logistic Regression 

  • Models of discrete choice have been a topic in (Micro-) Econometrics and are nowadays widely used in Marketing research.

  • Logit and probit models extend the principles of general linear models (e.g., regression) to better treat the case of dichotomous and categorical target variables.

  • They focus on categorical dependent variables, looking at all levels of possible interaction effects.

  • McFadden received the 2000 Nobel Prize in Economics for fundamental contributions to discrete choice modeling. 


Application of Logistic Regression

  • Why do commuters choose to fly or not to fly to a destination when there are alternatives?

  • Available modes = Air, Train, Bus, Car

  • Observed:

    • Choice

    • Attributes: Cost, terminal time, other

    • Characteristics of commuters: Household income

  • Choose to fly iff U_fly > 0

    • U_fly = β0 + β1 Cost + β2 Time + γ Income + ε 


The Linear Probability Model 

  • The predicted probabilities of the linear model can be greater than 1 or less than 0

  • ε is not normally distributed because Y takes on only two values

  • The error terms are heteroscedastic 


  • The OLS estimator is the best linear unbiased estimator (BLUE), iff
    • there is a linear relationship between predictors x and y
    • the error variable is a normally distributed random variable with E(ε)=0.
    • the error variance is constant for all values of x (homoscedasticity).
    • The errors ε are independent of each other.
    • No multicollinearity among predictors (i.e., high correlation). 


The Logistic Regression Model 

  • The "logit" model solves the problems of the linear model: 
    • ln[p/(1-p)] = β0 + β1X1 + ε
  • p is the probability that the event Y occurs, Pr(Y = 1 | X1)
  • p/(1 - p) describes the odds
    • A 20% probability of winning corresponds to odds of 0.20/0.80 = 0.25
    • A 50% chance of winning leads to odds of 1
  • ln[p/(1-p)] is the log odds, or "logit"
    • p = 0.50, then logit = 0
    • p = 0.70, then logit ≈ 0.85
    • p = 0.30, then logit ≈ -0.85
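
The odds and logit values in this list can be recomputed directly. A minimal sketch; the function names odds and logit are chosen here for illustration.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def logit(p):
    """Log odds of an event with probability p."""
    return math.log(odds(p))


for p in (0.20, 0.30, 0.50, 0.70):
    print(f"p={p:.2f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

Note the symmetry: the logits of p and 1 - p have the same magnitude and opposite signs.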


Logistic Function 

  • The logistic function Pr(Y | X) constrains the estimated probabilities to lie between 0 and 1 (0 <= Pr(Y | X) <= 1).

    • Pr(Y | X) = e^(β0 + β1X1) / (1 + e^(β0 + β1X1))

  • Pr(Y | X) is the estimated probability that the ith case is in a category and β0 + β1X1 is the regular linear regression equation

  • This means that the probability of a success (Y = 1) given the predictor variable (X) is a non-linear function, specifically a logistic function

    • if β0 + β1X1 = 0, then p = 0.50

    • as β0 + β1X1 becomes large and positive, p approaches 1

    • as β0 + β1X1 becomes large and negative, p approaches 0 

  • The values in the regression equation β1 and β0 take on slightly different meanings.

    • β0 

    • β1 

    • 0
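
The limiting behaviour of the logistic function described in this card can be sketched in a few lines. The coefficient values passed to predict_prob are hypothetical and only serve to show the shape of the curve.

```python
import math


def logistic(z):
    """Map a linear predictor z = b0 + b1*x1 to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))  # same as e^z / (1 + e^z)


def predict_prob(x1, b0, b1):
    """Pr(Y = 1 | x1) under a one-predictor logit model."""
    return logistic(b0 + b1 * x1)


# Hypothetical coefficients b0 = -2, b1 = 0.5 (not estimated from any data).
for x1 in (-20, 0, 4, 20):
    print(x1, round(predict_prob(x1, b0=-2.0, b1=0.5), 4))
```

With these assumed values the linear predictor is zero at x1 = 4, so the predicted probability there is exactly 0.5.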


Odds and Logit 

By algebraic manipulation, the logistic regression equation can be written in terms of the odds of success: 

  • p/(1-p) = e^(β0 + β1X1) 
  • Odds range from 0 to positive infinity
  • If p/(1-p) is
    • less than 1, then the probability is less than 0.50
    • greater than 1, then the probability is greater than 0.50 


The Logit 

Finally, taking the natural log of both sides, we can write the equation in terms of logits (log-odds): 

  • ln[p/(1-p)] = β0 + β1X1
  • Probability is constrained between 0 and 1
  • Log-odds are a linear function of the predictors
  • The logit now ranges between -∞ and +∞ (like the dependent variable of a linear regression)
  • The regression coefficients go back to their old interpretation (kind of)
  • β1 is the amount the logit (log-odds) changes with a one-unit change in X1 


Estimating the Coefficients of a Logistic Regression 

  • Maximum Likelihood Estimation (MLE) is a statistical method for estimating the coefficients of a model
  • The likelihood function (l) measures the probability of observing the particular set of dependent variable values that occur in the sample
  • MLE involves finding the coefficients that make the log of the likelihood function (ll, which is always < 0) as large as possible 


The Likelihood Function for Logit Model 

  • Suppose 10 individuals make travel choices between auto (A) and public transit (T).
  • All travelers are assumed to possess identical attributes (unrealistic), so the choice probabilities are not functions of the β's but simply of p, the probability of choosing auto. 
    • l = p^x (1 - p)^(n-x) = p^7 (1 - p)^3
      • ln(l) = 7 ln(p) + 3 ln(1 - p), maximized at p = 0.7
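
The maximum can be confirmed numerically. The sketch below uses a plain grid search over p as a stand-in for a real optimizer; x = 7 and n = 10 are read off the exponents in the likelihood above, and the function name log_likelihood is chosen here for illustration.

```python
import math


def log_likelihood(p, x=7, n=10):
    """Log-likelihood of observing x 'auto' choices out of n travelers."""
    return x * math.log(p) + (n - x) * math.log(1 - p)


# Evaluate ln(l) on a fine grid of candidate probabilities in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat, log_likelihood(p_hat))
```

The grid maximum coincides with the sample share x/n, and the log-likelihood at that point is negative, as it is for any p strictly between 0 and 1.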


Evaluating the Logistic Regression 

  • The log likelihood function (ll) is one metric to compare two logistic regression models (the higher, the better)
    • Also AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) measure the goodness-of-fit
  • There are several measures intended to mimic the R2 analysis (Pseudo-R2, e.g., McFadden-R2 or Nagelkerke-R2), but the interpretation is different
  • A Wald test or t-test is used to test the statistical significance of each coefficient in the model (null hypothesis: βi = 0)
  • The Chi-Square statistic and associated p-value show whether the model coefficients as a group equal zero
    • Larger Chi-squares and smaller p-values indicate greater confidence in rejecting the null hypothesis that all coefficients are zero
  • Use also error rates and gain curves to evaluate the performance 


McFadden R2 / Pseudo R2 

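R2_McFadden = 1 - ll/ll0, where ll is the log-likelihood of the full model and ll0 is the log-likelihood of the model with only a constant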

  • If the full model does much better than just a constant, in a discrete-choice model this value will be close to 1.

    • 1-(80.9658/123.757)=0.3458 for the logit model on the previous to last slide

  • If the full model doesn’t explain much at all, the value will be close to 0.

  • Typically, the values are lower than those of R2 in a linear regression and need to be interpreted with care.

    • >0.2 is acceptable, >0.4 is already ok 
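
The worked number in this card can be reproduced as follows. The deck quotes the two log-likelihoods as magnitudes; the sketch assumes they are actually negative (as log-likelihoods of discrete-choice models are), which leaves the ratio unchanged. The function name mcfadden_r2 is an illustration, not a library call.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R2 from the full-model and constant-only log-likelihoods."""
    return 1 - ll_model / ll_null


# Values from the card, with the sign assumed to be negative.
r2 = mcfadden_r2(ll_model=-80.9658, ll_null=-123.757)
print(round(r2, 4))
```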


Calculating Error Rates from a Logistic Regression 

  • Assume that if the estimated p is greater than or equal to 0.5, then the event is expected to occur, and not to occur otherwise.
  • By assigning these probabilities 0s and 1s and comparing these to the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated. 
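
A minimal sketch of this calculation. The predicted probabilities and actual outcomes are invented for illustration, and the helper classification_rates assumes both classes occur in the data.

```python
def classification_rates(probs, actual, threshold=0.5):
    """Return % correct Yes, % correct No and overall % correct."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    pairs = list(zip(actual, predicted))
    n_yes = sum(actual)
    n_no = len(actual) - n_yes
    hit_yes = sum(1 for a, f in pairs if a == 1 and f == 1)
    hit_no = sum(1 for a, f in pairs if a == 0 and f == 0)
    return {
        "correct_yes": 100 * hit_yes / n_yes,
        "correct_no": 100 * hit_no / n_no,
        "overall": 100 * (hit_yes + hit_no) / len(actual),
    }


# Hypothetical estimated probabilities and observed 0/1 outcomes.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1]
actual = [1, 1, 1, 1, 0, 0, 0, 0]
rates = classification_rates(probs, actual)
print(rates)
```

Moving the threshold away from 0.5 trades correct Yes predictions against correct No predictions, which is what gain curves summarize.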


Simple Interpretation of the Coefficients 

  • If β1 < 0 then an increase in X1 => (0 < exp(β1) < 1)

    • then odds go down

  • If β1 > 0 then an increase in X1 => (exp(β1) > 1)

    • then odds go up

  • Always check for the significance of the coefficients

  • But can we say more than this when interpreting the coefficient values? 
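
One thing that can be said is that exp(β1) is the factor by which the odds are multiplied for each one-unit increase in X1. The sketch below checks this on hypothetical coefficients (b0 and b1 are not estimates from any data set).

```python
import math


def odds_at(x1, b0, b1):
    """Odds p/(1-p) implied by a one-predictor logit model."""
    return math.exp(b0 + b1 * x1)


def odds_ratio(b1):
    """Multiplicative change in the odds per one-unit increase in X1."""
    return math.exp(b1)


b0, b1 = -1.0, 0.5  # hypothetical values
ratio = odds_at(3, b0, b1) / odds_at(2, b0, b1)
print(ratio, odds_ratio(b1))
```

The ratio does not depend on the starting value of X1, which is why odds ratios are the usual way to report logit coefficients.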


Multicollinearity and Irrelevant Variables 

  • The presence of multicollinearity will not lead to biased coefficients, but it will have an effect on the standard errors.
    • If a variable which you think should be statistically significant is not, consult the correlation coefficients.
    • If two variables are correlated at a rate greater than .6, .7, .8, etc. then try dropping the least theoretically important of the two.
  • The inclusion of irrelevant variables can result in poor model fit.
    • You can consult your Wald statistics and remove irrelevant variables. 


Multiple Logistic Regression 

  • More than one independent variable
    • Dichotomous, ordinal, nominal, continuous ... 
      ln(p/(1-p)) = β0 + β1X1 + β2X2 + ... + βnXn
  • Interpretation of βi

    • Increase in log-odds for a one-unit increase in Xi with all other Xj held constant
      p(Y=1) = 1 / (1 + e^-(β0 + β1X1 + ... + βnXn))

  • Effect modification

    • Interaction effects can be modelled by including interaction terms, e.g. the interaction effect of age and income
      ln(p/(1-p)) = β0 + β1X1 + β2X2 + β3X1X2

  • Discrete choice models take many forms, including:

    • Binary logit, multinomial logit, conditional logit (variables vary over alternatives), ordered logit (good/bad/ugly), etc. 
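
With an interaction term, the effect of X1 on the log-odds is no longer a single number: it equals β1 + β3X2 and so depends on X2. A small sketch with hypothetical coefficients (B0 to B3 are invented for illustration):

```python
import math

# Hypothetical coefficients, not estimated from data.
B0, B1, B2, B3 = -1.0, 0.8, 0.5, -0.3


def log_odds(x1, x2):
    """Linear predictor with an X1 * X2 interaction term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def prob(x1, x2):
    """p(Y = 1) implied by the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x1, x2)))


def effect_of_x1(x2):
    """Change in log-odds for a one-unit increase in X1 at a given X2."""
    return log_odds(1, x2) - log_odds(0, x2)


print(effect_of_x1(0), effect_of_x1(2), prob(1, 2))
```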


Multinomial Logit Models 

  • The dependent variable, Y, is a discrete variable that represents a choice, or category, from a set of mutually exclusive choices or categories.

    • Examples are brand selection, transportation mode selection, etc.

    • The residuals still need to be i.i.d.

  • Model:

    • Choice between J > 2 categories

    • Dependent variable y = 1,2,3, ... J

  • If the characteristics vary over alternatives (e.g., prices, travel distances, etc.), the multinomial logit is often called “conditional logit”. 


Generalized Linear Models (GLM) 

  • The models in this class are examples of generalized linear models
  • GLMs are a general class of linear models that are made up of three components: Random, Systematic, and Link Function
    • Random component: Identifies dependent variable (Y) and its probability distribution
    • Systematic Component: Identifies the set of explanatory variables (X1, ..., Xk)
    • Link Function: Identifies a function of the mean that is a linear function of the explanatory variables
    • g(μ)=α+β1X1 + ... +βkXk 
  • Link function:
    • Identity link (form used in normal regression models): g(μ) = μ

    • Log link (used when μ cannot be negative as when data are Poisson counts): g(μ) = log(μ)

    • Logit link (used when μ is bounded between 0 and 1 as when data are binary):

      g(μ) = log(μ / (1 − μ)) 
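
The three link functions and their inverses can be written out side by side. This is a plain-Python sketch, not the API of any GLM library; the dictionary name LINKS is chosen here for illustration.

```python
import math

# Each entry maps a link name to (link g, inverse link g^-1).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}

mu = 0.3
for name, (link, inverse) in LINKS.items():
    eta = link(mu)  # value on the scale of the linear predictor
    print(name, round(eta, 4), round(inverse(eta), 4))
```

The inverse link is what turns the linear predictor back into a mean: any real number for the identity link, a positive number for the log link, and a value between 0 and 1 for the logit link.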


Count Variables as Dependent Variables 

  • Many dependent variables are counts: Non-negative integers
    • # Crimes a person has committed in lifetime
    • # Children living in a household
    • # new companies founded in a year (in an industry)
    • # of social protests per month in a city 
  • Count variables can be modeled with OLS regression... but:

    • 1. Linear models can yield negative predicted values... whereas counts are never negative

    • 2. Count variables are often highly skewed

      • Ex: # crimes committed this year... most people are zero or very low; a few people are very high

      • Extreme skew violates the normality assumption of OLS regression. 


Count Models 

  • Two most common count models:
    • Poisson regression model (aka. log-linear model)  
    • Negative binomial regression model
  • Both assume the observed count is distributed according to a Poisson distribution:
    • μ = expected count (and variance)
    • y = observed count 
  • P(y | μ) = e^(−μ) μ^y / y! 
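
The Poisson probability function, and the fact that μ is both the mean and the variance, can be checked numerically. The value μ = 2 and the truncation of the support at 60 are arbitrary choices made for this sketch.

```python
import math


def poisson_pmf(y, mu):
    """P(y | mu) for a Poisson-distributed count y = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


mu = 2.0
support = range(60)  # the tail beyond 60 is negligible for mu = 2

total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)

print(total, mean, variance)
```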


Poisson Regression for Count Data 

  • Strategy: Model log of μ as a function of the Xs
    • Quite similar to modeling log odds in logit
    • Again, the log form avoids negative values 
    • ln(μ) = ∑_{j=1..k} βj Xji

  • Which can be written as: 

    • μ = e^(∑ βj Xji)

  • Distribution: Poisson (Restriction: E(Y) = V(Y))

    • When the mean and variance are not equal (over-dispersion), often the Poisson distribution is replaced with a negative binomial distribution

  • Link Function: Can be identity link, but typically use the log link: 

    • g(μ) = ln(μ) = β0 + β1X1 + ... + βkXk 

    • μ(X1...Xk) = e^(β0 + β1X1 + ... + βkXk)
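
A short sketch of how the log link turns a linear predictor into an expected count. The coefficient vector beta is hypothetical, with beta[0] playing the role of the intercept β0.

```python
import math


def expected_count(x, beta):
    """mu = exp(beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1])."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(eta)


beta = [0.5, -0.2, 0.1]  # hypothetical intercept and two slopes

print(expected_count([0, 0], beta))    # baseline count, exp(0.5)
print(expected_count([100, 0], beta))  # very small, but still positive
```

However negative the linear predictor becomes, the predicted count stays above zero, which is the advantage over an OLS fit mentioned earlier.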



Interpreting Coefficients Poisson regression

  • In Poisson Regression, μ is typically conceptualized as a rate...

    • Positive coefficients indicate higher rate; negative = lower rate

  • Like logit, Poisson models are non-linear

    • Coefficients don’t have a simple linear interpretation

  • Like logit, model has a log form; exponentiation aids interpretation

    • Exponentiated coefficients are multiplicative

    • Analogous to odds ratios... but called “incidence rate ratios”. 

  • Exponentiated coefficients: indicate effect of unit change of X on rate

    • e^b = 2.0 indicates that the rate doubles for each unit change in X

    • e^b = 0.5 indicates that the rate drops by half for each unit change in X

  • Recall: Exponentiated coefs are multiplicative

    • If e^b = 5.0, a 2-point change in X isn’t 10; it is 5 * 5 = 25

    • Also: you must invert to see opposite effects

      • If e^b = 5.0, a 1-point decrease in X isn’t -5; it is 1/5 = 0.2

  • Again, exponentiated coefficients (rate ratios) can be converted to % change

    • Formula: (e^b - 1) * 100%

    • Coefficient = -0.693

      • (e^(-0.693) - 1) * 100% = -50%, i.e. a 50% decrease in rate
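
The arithmetic on this card can be reproduced with two small helpers. The names rate_ratio and percent_change are chosen for this sketch.

```python
import math


def rate_ratio(b, delta_x=1.0):
    """Incidence rate ratio for a change of delta_x units in X."""
    return math.exp(b * delta_x)


def percent_change(b, delta_x=1.0):
    """Percentage change in the rate for a change of delta_x units in X."""
    return (rate_ratio(b, delta_x) - 1) * 100


b = math.log(5.0)  # a coefficient whose exponentiated value is 5
print(rate_ratio(b, 2))   # two-unit increase: 5 * 5
print(rate_ratio(b, -1))  # one-unit decrease: 1 / 5
print(percent_change(-0.693))
```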


Poisson Model Assumptions 

  • Poisson regression makes a big assumption: that the variance of Y equals its mean μ (“equidispersion”)
    • In other words, the mean and variance are the same
    • This assumption is often not met in real data
    • Dispersion is often greater than μ: overdispersion
  • Consequence of overdispersion: Standard errors will be underestimated
    • Potential for overconfidence in results; rejecting H0 when you shouldn’t!
    • Note: overdispersion doesn’t necessarily affect predicted counts (compared to alternative models). 
  • Overdispersion is most often caused by highly skewed dependent variables

  • Often due to variables with high numbers of zeros

    • Ex: Number of traffic tickets per year

    • Most people have zero, some can have 50!

    • Mean of variable is low, but SD is high

  • Other examples of skewed outcomes

    • # of scholarly publications

    • # cigarettes smoked per day

    • # riots per year (for sample of cities in US). 
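
A rough first look at overdispersion is to compare the sample mean and variance of the count variable. The ticket counts below are invented to mimic the "mostly zeros, a few very high" pattern described above; note that this raw comparison ignores any predictors, so it is only a hint and not a formal test.

```python
import statistics

# Hypothetical yearly traffic-ticket counts for 20 people.
tickets = [0] * 15 + [1, 1, 2, 3, 50]

mean = statistics.mean(tickets)
variance = statistics.variance(tickets)  # sample variance (n - 1 denominator)
dispersion = variance / mean             # about 1 under equidispersion

print(mean, variance, dispersion)
```

A ratio far above 1, as here, suggests that a plain Poisson model would understate the standard errors and that a negative binomial model may be worth considering.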


General Remarks 

  • Poisson & negative binomial models suffer all the same basic issues as “normal” regression, and you should be careful about
    • Model specification / omitted variable bias
    • Multicollinearity
    • Outliers/influential cases
  • Also, they use maximum likelihood estimation
    • N > 500 = fine; N
    • Results aren’t necessarily wrong if N < 100;
    • But it is a possibility; and hard to know when problems crop up
  • Plus ~10 cases per independent variable.
  • Tobit regressions are relevant if data is censored
    • Estimate hours worked by employees and characteristics of employees such as age, education and family status. For unemployed people we do not have the number of hours they would have worked had they had employment.