Models for Count Data II Flashcards
When is Poisson regression appropriate?
When the number of events (counts) follows a Poisson distribution, conditional on the predictors
What are the ways in which the assumptions for a Poisson regression can be violated?
- Overdispersion (variance > mean)
- Excess zeroes (more zeroes than expected under a Poisson distribution)
- No zeroes (the outcome can never take the value zero)
What is equidispersion?
An assumption of Poisson regression: the variance equals the mean (conditional on the predictors)
In social science, medicine, and health, in what way do count data typically violate the assumptions of Poisson regression?
Overdispersion (variance > mean)
Models for count data: equidispersion and zeroes as expected
Poisson
Models for count data: equidispersion and excess zeroes
Zero-inflated Poisson
Models for count data: Equidispersion and no zeroes
Zero-truncated Poisson
Models for count data: Overdispersion and zeroes as expected
Negative binomial
Models for count data: Overdispersion and excess zeroes
Zero-inflated negative binomial
Models for count data: Overdispersion and no zeroes
Zero-truncated negative binomial
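The six cards above form a two-way decision table (dispersion × zeroes). As a rough sketch, the table can be written as a lookup, together with a helper that computes the sample variance-to-mean ratio. The names `MODEL_TABLE`, `choose_model`, and `variance_to_mean_ratio`, and the example counts, are hypothetical. The ratio is only a marginal screen: the Poisson assumption concerns the variance conditional on the predictors, so it does not replace checking the fitted model.

```python
# Hypothetical lookup of the decision table: (dispersion, zeroes) -> model
MODEL_TABLE = {
    ("equidispersion", "as expected"): "Poisson",
    ("equidispersion", "excess"): "zero-inflated Poisson",
    ("equidispersion", "none"): "zero-truncated Poisson",
    ("overdispersion", "as expected"): "negative binomial",
    ("overdispersion", "excess"): "zero-inflated negative binomial",
    ("overdispersion", "none"): "zero-truncated negative binomial",
}


def choose_model(dispersion, zeroes):
    """Return the model suggested by the flashcards for a given situation."""
    return MODEL_TABLE[(dispersion, zeroes)]


def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean; about 1 under equidispersion.

    Marginal screen only: it ignores predictors, so a ratio above 1 may partly
    reflect differences in the mean that the predictors would explain.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return var / mean


counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12]  # made-up example counts
print(round(variance_to_mean_ratio(counts), 2))
print(choose_model("overdispersion", "as expected"))
```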
What about underdispersion?
Underdispersion (variance < mean) can occur in principle, but it is rare in practice
What happens if you use the Poisson distribution even though your data are overdispersed, or use a model that ignores excess zeroes (or the absence of zeroes) when it should account for them?
Coefficient estimates may be biased and/or misleading (i.e., slope coefficients may not be a good estimate of the relationship between the predictor(s) and the outcome)
What are the implications for SEs when not considering overdispersion?
They may be underestimated. This implies that your p-values would be too small and your CIs too narrow, increasing the risk of a Type I error
What choices do you have when the outcome is overdispersed?
- Negative binomial regression (or other models accounting for overdispersion)
- Poisson regression with robust SEs
What are robust SEs?
SEs adjusted so that they are robust to violations of the Poisson regression assumptions, in particular equidispersion. Robust SEs are usually larger than those from a standard Poisson regression, so this is considered a more cautious way of analysing the data
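One common way to compute robust SEs is the sandwich estimator: the model-based covariance (the inverse Fisher information, which assumes Var(y) = µ) is "sandwiched" around a middle term built from the squared residuals. The minimal pure-Python sketch below, with one predictor and made-up counts, fits a Poisson regression by Newton-Raphson and compares the two sets of SEs. It uses the basic HC0 form; statistical software may apply small-sample corrections, so its numbers can differ slightly. All function names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fit_poisson(x, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of log(mu_i) = b0 + b1 * x_i (one predictor)."""
    b = [math.log(sum(y) / len(y)), 0.0]  # start from the intercept-only fit
    for _ in range(iters):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for xi, yi in zip(x, y):
            row = (1.0, xi)
            mu = math.exp(b[0] + b[1] * xi)
            for j in range(2):
                score[j] += (yi - mu) * row[j]
                for k in range(2):
                    info[j][k] += mu * row[j] * row[k]
        inv = inv2(info)
        step = [inv[j][0] * score[0] + inv[j][1] * score[1] for j in range(2)]
        b = [b[j] + step[j] for j in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def standard_errors(x, y, b):
    """Model-based and robust (sandwich, HC0) SEs evaluated at estimates b."""
    info = [[0.0, 0.0], [0.0, 0.0]]  # sum of mu_i * x_i x_i'
    meat = [[0.0, 0.0], [0.0, 0.0]]  # sum of (y_i - mu_i)^2 * x_i x_i'
    for xi, yi in zip(x, y):
        row = (1.0, xi)
        mu = math.exp(b[0] + b[1] * xi)
        for j in range(2):
            for k in range(2):
                info[j][k] += mu * row[j] * row[k]
                meat[j][k] += (yi - mu) ** 2 * row[j] * row[k]
    cov_model = inv2(info)  # valid only if Var(y_i) = mu_i
    cov_robust = matmul2(matmul2(cov_model, meat), cov_model)
    se_model = [math.sqrt(cov_model[j][j]) for j in range(2)]
    se_robust = [math.sqrt(cov_robust[j][j]) for j in range(2)]
    return se_model, se_robust


# Made-up overdispersed counts for a binary predictor x
x = [0] * 6 + [1] * 6
y = [0, 0, 1, 2, 4, 5, 0, 1, 3, 6, 9, 11]
b = fit_poisson(x, y)
se_model, se_robust = standard_errors(x, y, b)
print("estimates:", [round(v, 3) for v in b])
print("model-based SEs:", [round(v, 3) for v in se_model])
print("robust SEs:", [round(v, 3) for v in se_robust])
```

The coefficient estimates are the same either way; only the SEs change. For these overdispersed example counts the robust SEs come out larger than the model-based ones.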
What is the most commonly used overdispersed distribution?
Negative binomial
What are the parameters of a negative binomial distribution?
Mean, µ, and a dispersion parameter α
The variance depends on the mean (unlike the normal distribution, where the mean and variance are separate, unrelated parameters): Var(Y) = µ + αµ^2
In the Poisson distribution there is only one parameter (the mean), because the variance equals the mean
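A quick numerical check of Var(Y) = µ + αµ^2: the sketch below writes the negative binomial probability mass function in terms of µ and α (the NB2 form, in which 1/α plays the role of the usual "number of successes" parameter), sums over a long range of counts, and compares the resulting mean and variance with µ and µ + αµ^2. The function names and the cut-off `y_max` are assumptions for illustration.

```python
import math


def nb2_pmf(y, mu, alpha):
    """P(Y = y) for a negative binomial with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))      # log of (1 / (1 + alpha*mu))^r
             + y * math.log(mu / (r + mu)))    # log of (alpha*mu / (1 + alpha*mu))^y
    return math.exp(log_p)


def nb2_moments(mu, alpha, y_max=2000):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [nb2_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, alpha = 4.0, 0.5
mean, var = nb2_moments(mu, alpha)
print(f"mean {mean:.4f}  (mu = {mu})")
print(f"variance {var:.4f}  (mu + alpha*mu^2 = {mu + alpha * mu ** 2})")
```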
What values can the dispersion parameter α take?
Values of 0 or larger (can never be negative)
- if α = 0, we have a Poisson distribution (with equidispersion)
- if α > 0, we have an overdispersed distribution
The larger the α, the larger the variance relative to the mean
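To see why α = 0 corresponds to the Poisson distribution, the sketch below compares NB2 and Poisson probabilities with the same mean as α shrinks towards 0. (α = 0 itself cannot be plugged into the NB2 formula, which involves 1/α, so the Poisson case is reached as a limit.) It also prints the variance-to-mean ratio, 1 + αµ, which follows from Var(Y) = µ + αµ^2. The helper names and the choice µ = 3 are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb2_pmf(y, mu, alpha):
    r = 1.0 / alpha  # NB2 written with mean mu and dispersion alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def max_pmf_gap(mu, alpha, y_max=60):
    """Largest absolute difference between NB2 and Poisson probabilities."""
    return max(abs(nb2_pmf(y, mu, alpha) - poisson_pmf(y, mu))
               for y in range(y_max + 1))


mu = 3.0
for alpha in (1.0, 0.1, 0.01, 0.001):
    ratio = (mu + alpha * mu ** 2) / mu  # variance-to-mean ratio = 1 + alpha*mu
    print(f"alpha={alpha:<6} var/mean={ratio:.3f}  max gap={max_pmf_gap(mu, alpha):.5f}")
```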
Are there other ways of relating the mean to the variance in negative binomial regression?
Yes. Different ways of relating the variance to the mean give slightly different models, and one of them may sometimes fit the data slightly better
In the negative binomial distribution, what does larger dispersion imply?
Larger variance (for a given mean)
What is the shape of the negative binomial distribution?
The tails are much heavier than when the dispersion equals 0 (Poisson), i.e., large counts are more likely
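The heavier tails can be made concrete by comparing upper-tail probabilities P(Y ≥ k) for a Poisson and an NB2 distribution with the same mean. The sketch assumes µ = 3 and α = 1 purely for illustration; with α = 1 the NB2 distribution reduces to a geometric distribution with P(Y ≥ k) = (µ/(1 + µ))^k, which gives a simple check. The helper names are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb2_pmf(y, mu, alpha):
    r = 1.0 / alpha  # NB2 with mean mu and dispersion alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def upper_tail(pmf, k):
    """P(Y >= k) computed as 1 - P(Y <= k - 1)."""
    return 1.0 - sum(pmf(y) for y in range(k))


mu, alpha = 3.0, 1.0
for k in (5, 10, 15):
    p_pois = upper_tail(lambda y: poisson_pmf(y, mu), k)
    p_nb = upper_tail(lambda y: nb2_pmf(y, mu, alpha), k)
    print(f"P(Y >= {k:2d}): Poisson {p_pois:.5f}   negative binomial {p_nb:.5f}")
```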
Overall comparison of properties of Poisson and negative binomial distributions:
Poisson:
- Equidispersed
- One parameter (µ = mean = variance)
- Var(Y) = µ
Negative binomial:
- Overdispersed
- Two parameters (µ = mean; α = dispersion)
- Var(Y) = µ + αµ^2
In the negative binomial distribution, what is this way of specifying the variance called? Var(Y) = µ + αµ^2
NB2 parameterisation
There are other options, e.g., the NB1 parameterisation: Var(Y) = µ + αµ
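The practical difference between the two parameterisations is how the variance-to-mean ratio behaves: under NB1 it is the constant 1 + α, whereas under NB2 it is 1 + αµ and so grows with the mean. The short sketch below tabulates both for an assumed α = 0.5; the function names are hypothetical.

```python
def var_nb1(mu, alpha):
    """NB1 variance function: Var(Y) = mu + alpha * mu."""
    return mu + alpha * mu


def var_nb2(mu, alpha):
    """NB2 variance function: Var(Y) = mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


alpha = 0.5
for mu in (1.0, 5.0, 20.0):
    print(f"mu={mu:5.1f}  NB1 var={var_nb1(mu, alpha):7.2f} (ratio {var_nb1(mu, alpha) / mu:.2f})"
          f"  NB2 var={var_nb2(mu, alpha):7.2f} (ratio {var_nb2(mu, alpha) / mu:.2f})")
```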
Negative binomial regression equation:
log(µi) = β0 + β1X1i + β2X2i + … + βkXki
yi ~ NegBin(µi, α), Var(yi) = µi + αµi^2
where i indexes the observations
- This looks similar to a Poisson regression. Again, we model the log of the expected count µi (a log link) rather than log-transforming the outcome itself. The difference is that the model now has an additional parameter, the dispersion α, which we need to estimate. The dispersion parameter governs the extent of overdispersion
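The equations above define the likelihood that is maximised when fitting a negative binomial regression. A minimal sketch, assuming NB2 and a single binary predictor with made-up counts, evaluates that log-likelihood and profiles α over a grid. With one binary predictor, the maximum-likelihood µi are simply the two group means whatever the value of α, so β can be held at those values here; in general, β and α are estimated jointly by numerical optimisation. The function name, data, and grid are hypothetical.

```python
import math


def nb2_loglik(beta, alpha, X, y):
    """Log-likelihood of log(mu_i) = x_i . beta with y_i ~ NegBin(mu_i, alpha), NB2."""
    r = 1.0 / alpha
    total = 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))  # inverse of the log link
        total += (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
                  + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu)))
    return total


# Made-up counts; each row of X is (intercept, binary predictor)
X = [(1.0, 0.0)] * 6 + [(1.0, 1.0)] * 6
y = [0, 0, 1, 2, 4, 5, 0, 1, 3, 6, 9, 11]
beta = [math.log(2.0), math.log(2.5)]  # group means 2 and 5, on the log scale

grid = [0.05 * k for k in range(1, 61)]  # candidate alphas 0.05 .. 3.0
best_alpha = max(grid, key=lambda a: nb2_loglik(beta, a, X, y))
print(f"profile estimate of alpha on the grid: {best_alpha:.2f}")
print(f"log-likelihood at that alpha: {nb2_loglik(beta, best_alpha, X, y):.3f}")
```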