Models for Count Data I Flashcards
What are count variables?
Count variables are discrete, take non-negative integer values (0, 1, 2, …), and represent the number of occurrences of an event
Give three examples of count variables:
Number of hospital visits, number of deaths by horse kick, and number of appointments with a counsellor
What must be considered when measuring count variables over different time periods or populations?
Counts should be adjusted using a rate (e.g., number of crimes per 100,000 people)
What is the formula for the Poisson probability mass function?
$P(Y = y) = \frac{\mu^{y} e^{-\mu}}{y!}, \quad y = 0, 1, 2, \ldots$
- Y is a random (count) variable indicating the number of times a certain event occurs
- μ is the expected (mean) count, i.e., the mean number of times the event occurs
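As a minimal sketch, the probability mass function can be computed directly with the Python standard library. The function name `poisson_pmf` is our own and is not part of any package. Working on the log scale with `math.lgamma` (log(y!) = lgamma(y + 1)) avoids overflow in y! for large counts; the sketch assumes μ > 0.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) = mu**y * exp(-mu) / y! for a Poisson variable with mean mu > 0."""
    if y < 0:
        return 0.0  # counts cannot be negative
    # log P(Y = y) = y*log(mu) - mu - log(y!), where log(y!) = lgamma(y + 1)
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

# Probabilities of 0, 1, 2 and 3 events when the mean count is 2
for y in range(4):
    print(y, round(poisson_pmf(y, 2.0), 4))
```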
What is a key property of the Poisson distribution?
The mean and variance are equal (equidispersion)
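Equidispersion can be checked numerically: summing over the probability mass function gives a mean and a variance that both equal μ. This is a sketch; the helper names and the truncation point `max_y` are illustrative choices, and the probability beyond `max_y` is negligible for the μ values shown.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    # Log-scale Poisson pmf, avoiding overflow in y! for large y
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def poisson_mean_var(mu: float, max_y: int = 500) -> tuple[float, float]:
    """Mean and variance of a Poisson(mu) distribution, summing the pmf up to max_y."""
    probs = [poisson_pmf(y, mu) for y in range(max_y + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

for mu in (0.5, 3.0, 12.0):
    mean, var = poisson_mean_var(mu)
    print(f"mu={mu}: mean={mean:.4f}, variance={var:.4f}")
```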
What happens when the mean μ of a Poisson distribution is large?
It is approximately normal, with mean μ and variance μ
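A rough illustration, under the assumption that "approximately normal" is judged by the largest gap between the Poisson probabilities and the normal density N(μ, μ) evaluated at the integers: the gap shrinks as μ grows. The helper names are hypothetical.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    # Log-scale Poisson pmf, avoiding overflow in y! for large y
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def max_gap(mu: float) -> float:
    """Largest |Poisson(mu) pmf - N(mu, mu) density| over y = 0, 1, 2, ..."""
    sd = math.sqrt(mu)  # Poisson variance equals the mean
    upper = int(mu + 10 * sd)
    return max(abs(poisson_pmf(y, mu) - normal_pdf(y, mu, sd)) for y in range(upper + 1))

for mu in (2.0, 20.0, 200.0):
    print(f"mu={mu:>5}: max gap = {max_gap(mu):.5f}")
```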
What is the general form of a Poisson regression model?
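$\log(\mu_i) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi}$
- Where $\mu_i$ is the expected count for observation $i$, $X_{1i}, \ldots, X_{pi}$ are its predictor values, and $\beta_0, \ldots, \beta_p$ are the regression coefficients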
Why do we use a log transformation in a Poisson regression?
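To ensure predicted counts are always positive: the model is linear on the log scale, so the expected count is the exponential of the linear predictor, which is greater than zero for any values of the predictors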
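A minimal sketch of how a fitted Poisson regression turns predictor values into an expected count. The coefficients below are made up for illustration, not estimated from any data. The third case has a negative linear predictor: an identity link would predict an impossible negative count, whereas the log link still gives a small positive one.

```python
import math

def predicted_count(intercept: float, coefs: list[float], x: list[float]) -> float:
    """Expected count mu_i = exp(b0 + b1*x1i + ... + bp*xpi) under a log link."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))  # linear predictor = log(mu_i)
    return math.exp(eta)

# Hypothetical coefficients for two predictors
b0, b = 0.5, [0.3, -1.2]
for x in ([0.0, 0.0], [2.0, 0.0], [0.0, 3.0]):
    eta = b0 + sum(bj * xj for bj, xj in zip(b, x))
    print(x, "linear predictor:", round(eta, 2), "expected count:", round(predicted_count(b0, b, x), 3))
```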
What assumptions must be met for Poisson regression?
- The outcome is a count variable (non-negative integers)
- The variance equals the mean (no overdispersion). This implies heteroscedasticity (unlike linear regression, which assumes a constant error variance): the predicted variance depends on the predicted mean.
- Observations are independent (e.g., no clustering)
- The transformed outcome (log(μ)) is linearly related to continuous predictors
- No multicollinearity
- Each subject’s count is measured over the same unit of time or space, or the same population size
What is overdispersion in Poisson regression?
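When the variance of the outcome (given the predictors) is larger than the mean. The Poisson model then underestimates the standard errors, which suggests a different model is needed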
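A quick descriptive check, sketched under a simplifying assumption: it compares the raw sample variance with the raw sample mean and ignores predictors, whereas overdispersion in regression concerns the variance given the predictors. A ratio well above 1 is a warning sign, not a formal test. The data are hypothetical.

```python
import statistics

def dispersion_ratio(counts: list[int]) -> float:
    """Sample variance / sample mean: about 1 for Poisson-like counts, > 1 hints at overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts (e.g. visits per person) with many zeros and a long tail
visits = [0, 0, 0, 0, 1, 0, 2, 0, 7, 0, 1, 12]
print(round(dispersion_ratio(visits), 2))
```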
What can cause overdispersion?
- Excess zeroes (zero-inflation)
- An important predictor is missing
- A highly skewed count variable
What models can be used to handle overdispersion?
Negative binomial regression (for general overdispersion) and zero-inflated Poisson models (when the overdispersion stems from excess zeros)
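To see why negative binomial regression helps, compare the variance functions. The sketch assumes the common "NB2" form, variance = μ + αμ², which is the default parameterization of Stata's `nbreg`; α ≥ 0 is the dispersion parameter, and α = 0 gives back the Poisson variance. The value α = 0.5 is hypothetical.

```python
def poisson_variance(mu: float) -> float:
    return mu  # equidispersion: variance equals the mean

def nb2_variance(mu: float, alpha: float) -> float:
    """Negative binomial (NB2) variance: mu + alpha * mu**2, with alpha >= 0."""
    return mu + alpha * mu ** 2

for mu in (1.0, 5.0, 20.0):
    print(mu, poisson_variance(mu), nb2_variance(mu, alpha=0.5))
```

The extra αμ² term lets the variance grow faster than the mean, which is exactly the pattern overdispersion describes.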
How do we interpret a coefficient β in Poisson regression?
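The exponentiated coefficient e^β is the incidence rate ratio (IRR): the factor by which the expected count (rate) is multiplied for a one-unit increase in the predictor, holding the other predictors constant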
What does an IRR indicate?
- IRR = 1: No effect of predictor
- IRR > 1: Predictor increases the outcome rate
- IRR < 1: Predictor decreases the outcome rate
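A small sketch converting hypothetical coefficients into IRRs and percentage changes in the expected count (in Stata, the `irr` option of `poisson` reports IRRs directly). For example, β = 0.18 gives an IRR of about 1.20, roughly a 20% higher rate per one-unit increase.

```python
import math

def irr(beta: float) -> float:
    """Incidence rate ratio: factor applied to the expected count per one-unit increase in the predictor."""
    return math.exp(beta)

def percent_change(beta: float) -> float:
    """Percentage change in the expected count per one-unit increase: 100 * (IRR - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical coefficients (not from any fitted model)
for beta in (0.0, 0.18, -0.35):
    print(f"beta={beta:+.2f}  IRR={irr(beta):.3f}  change={percent_change(beta):+.1f}%")
```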
How do we test for the significance of variables in Poisson regression?
Using a likelihood-ratio (LR) test that compares the model with and without the variable(s) of interest
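A minimal sketch of the LR test for a single added variable (one extra parameter), given the log-likelihoods of the two nested models. In Stata this is typically done by storing each model with `estimates store` and then running `lrtest`. The log-likelihood values below are hypothetical. For 1 degree of freedom, the chi-square upper-tail probability equals erfc(√(LR/2)), so no statistics library is needed.

```python
import math

def lr_test_1df(ll_reduced: float, ll_full: float) -> tuple[float, float]:
    """LR = 2 * (ll_full - ll_reduced), referred to a chi-square distribution with 1 df."""
    lr = max(2.0 * (ll_full - ll_reduced), 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(lr / 2.0))    # P(chi2(1) >= lr)
    return lr, p_value

# Hypothetical log-likelihoods for models without and with one predictor
lr, p = lr_test_1df(ll_reduced=-512.4, ll_full=-508.1)
print(f"LR chi2(1) = {lr:.2f}, p = {p:.4f}")
```

With several added variables, the degrees of freedom equal the number of extra parameters and a general chi-square distribution is needed.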
What is an offset in Poisson regression?
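A term added to the linear predictor to account for different observation periods, population sizes, or area sizes. It is the log of the exposure variable, with its coefficient fixed at 1
This effectively models a rate (the count per unit of exposure) rather than a raw count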
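The offset arithmetic, sketched with hypothetical numbers: adding log(exposure) to the linear predictor gives μi = exposure_i × exp(β0 + β1X1i + …), so the expected count scales with exposure while the rate per unit of exposure stays the same for identical predictor values.

```python
import math

def expected_count(exposure: float, intercept: float, coefs: list[float], x: list[float]) -> float:
    """mu_i = exp(log(t_i) + b0 + b1*x1i + ...) = t_i * exp(b0 + b1*x1i + ...).

    log(t_i) is the offset: it enters the linear predictor with its coefficient fixed at 1.
    """
    eta = math.log(exposure) + intercept + sum(b * xi for b, xi in zip(coefs, x))
    return math.exp(eta)

# Hypothetical areas with different populations but the same predictor value
b0, b = math.log(0.002), [0.4]   # made-up baseline rate: 2 per 1,000 residents when x = 0
for population in (1_000, 50_000):
    mu = expected_count(population, b0, b, [0.0])
    print(population, round(mu, 2), "expected crimes;", round(1000 * mu / population, 2), "per 1,000")
```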
How do we include an offset in Stata?
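poisson <outcome> <predictor(s)>, exposure(<exposure_variable>)
- exposure() takes the unlogged exposure (e.g., population size or time at risk) and Stata uses its log as the offset; use offset(<log_exposure_variable>) instead if the log has already been computed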
What is an example of using an offset?
Analysing the number of crimes per 1,000 residents rather than total crimes
What command is used for a basic Poisson regression in Stata?
poisson <outcome> <predictor(s)>
How do we check if a Poisson model fits the data well?
Compare the observed and predicted proportions of each count value (0, 1, 2, …), e.g., using the user-written prcounts command (part of the SPost package)
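A hand-rolled sketch of the same idea (not the prcounts implementation): for each count value y, compare the observed proportion with the average predicted probability P(Y = y | μi) across observations. Here the data are hypothetical and every μi is set to the sample mean, i.e. an intercept-only model; with a fitted model you would pass each observation's predicted mean instead. In this example the observed share of zeros (0.50) is far above the Poisson prediction (about 0.27), the kind of pattern that points to excess zeros.

```python
import math
from collections import Counter

def poisson_pmf(y: int, mu: float) -> float:
    # Log-scale Poisson pmf, avoiding overflow in y! for large y
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def observed_vs_predicted(counts: list[int], mus: list[float], max_y: int) -> list[tuple[int, float, float]]:
    """Rows of (y, observed proportion, mean predicted probability) for y = 0..max_y."""
    n = len(counts)
    freq = Counter(counts)
    rows = []
    for y in range(max_y + 1):
        observed = freq[y] / n
        predicted = sum(poisson_pmf(y, mu) for mu in mus) / n  # average of P(Y = y | mu_i)
        rows.append((y, observed, predicted))
    return rows

# Hypothetical data; intercept-only model, so every mu_i equals the sample mean
counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 6]
mean = sum(counts) / len(counts)
for y, obs, pred in observed_vs_predicted(counts, [mean] * len(counts), max_y=4):
    print(f"{y}: observed {obs:.2f}  predicted {pred:.2f}")
```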
How do we test for overdispersion?
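Compare a Poisson model with a negative binomial model: if the negative binomial dispersion parameter α is significantly greater than zero (e.g., by the likelihood-ratio test of α = 0 that Stata's nbreg reports), the data are overdispersed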
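One detail of that comparison, sketched with hypothetical log-likelihoods: because α cannot be negative, the null value α = 0 sits on the boundary of the parameter space. The usual correction, which Stata reports as "chibar2(01)", halves the ordinary chi-square(1) p-value when the LR statistic is positive.

```python
import math

def lr_test_alpha_zero(ll_poisson: float, ll_negbin: float) -> tuple[float, float]:
    """Boundary-corrected LR test of alpha = 0 (Poisson) against alpha > 0 (negative binomial)."""
    lr = max(2.0 * (ll_negbin - ll_poisson), 0.0)
    if lr == 0.0:
        return lr, 1.0  # no improvement at all: no evidence of overdispersion
    # 50:50 mixture of chi2(0) and chi2(1): half the usual 1-df upper-tail probability
    return lr, 0.5 * math.erfc(math.sqrt(lr / 2.0))

# Hypothetical log-likelihoods from a Poisson and a negative binomial fit of the same model
lr, p = lr_test_alpha_zero(ll_poisson=-840.2, ll_negbin=-791.6)
print(f"LR = {lr:.1f}, p = {p:.3g}")
```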
What two types of numeric variable are there?
- Continuous: e.g., age, height, blood pressure. They can take any value within a range, including fractions
- Discrete: e.g., number of siblings, number of hospital visits; i.e., things you can actually count
Some variables are, strictly speaking, discrete but can in practice be treated as continuous, such as household income
What chart do we use to display count variables?
A bar chart (not a histogram)
- Each bar represents one value of the count
- There are gaps between the bars because the values are discrete, not continuous
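A plain-text sketch of such a chart, assuming hypothetical hospital-visit data: one bar per integer value, including values nobody had, so each bar stands for a single count rather than an interval. A real analysis would use a plotting tool; this only shows the structure.

```python
from collections import Counter

def text_bar_chart(counts: list[int]) -> list[str]:
    """One bar per count value from 0 to the maximum (empty bars for unseen values), drawn with '#'."""
    freq = Counter(counts)
    return [f"{y:>3} | {'#' * freq[y]}" for y in range(max(counts) + 1)]

# Hypothetical number of hospital visits for 12 people
for line in text_bar_chart([0, 0, 0, 1, 1, 2, 0, 3, 1, 0, 5, 2]):
    print(line)
```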
What statistical distribution can we use to model count variables?
Poisson distribution