Lecture 14 Flashcards
LDV models refer to those where
the dependent variable’s range is restricted:
- binary response, e.g. Probit, Logit
- Censored
- Tobit
- Truncated
- Sample Selection
Binary response models
Used when the dependent variable takes only 2 possible values
Censored models
Used when some values of Y are only partially observed
Tobit models
Used when the dependent variable is continuous, but there’s a threshold below which all values are reported as the same
- e.g. spending on luxury goods: reported as either 0 or the actual amount spent.
Sample Selection Models
Used when the sample might not be randomly selected
- e.g. studying wages, but only having data for people who choose to work
- need to adjust for bias introduced by non-random selection
Main goals when studying binary response models like Probit and Logit
- Develop better models for binary outcomes: predict the probability of an action, Pr(y=1|x), using Logit or Probit
- Justify them with economic theory, leading to a latent utility model
- Develop estimation methods: OLS doesn't work (except for the linear probability model), so use NLS or MLE
- Interpret coefficients: how a one-unit change in xj affects the probability that y=1
- Generalise F tests
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- utility function and components.
- let y = 1 if they work, y = 0 if they dont
- person works if utility from working is greater than not working
U(y; x, ey) = ByT.x + ey, where:
- x are observed characteristics which might affect one's preference for work
- By are parameters which quantify how each characteristic affects utility under choice y
- ey represents unobserved taste shifters, like motivation.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- making the decision
Person will only work (y=1) iff:
- U(1; x, e1) > U(0; x, e0)
=> B1T.x + e1 > B0T.x + e0
=> (B1 - B0)T.x + (e1 - e0) > 0
So, defining B = B1 - B0 and e = e1 - e0: y = 1 if BT.x + e > 0, y = 0 otherwise
- since e is unobserved, we make assumptions about its distribution to estimate the model.
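The decision rule above can be simulated directly; a minimal sketch assuming Logistic(0,1) taste shifters and made-up coefficients beta0, beta1 (both illustrative, not from the lecture):

```python
import math
import random

random.seed(0)

# Hypothetical coefficients of the latent index BT.x + e (illustrative only)
beta0, beta1 = -1.0, 2.0

def simulate_work_decision(x):
    """Draw a Logistic(0,1) taste shifter e via inverse-CDF sampling,
    then apply the decision rule y = 1{beta0 + beta1*x + e > 0}."""
    u = random.random()
    e = math.log(u / (1.0 - u))   # inverse of the logistic CDF
    return 1 if beta0 + beta1 * x + e > 0 else 0

# People with a higher x (e.g. potential wage) should work more often:
# P(y=1|x) = G(beta0 + beta1*x), so roughly G(-1) vs G(1)
n = 10000
share_low = sum(simulate_work_decision(0.0) for _ in range(n)) / n
share_high = sum(simulate_work_decision(1.0) for _ in range(n)) / n
print(share_low, share_high)
```

The simulated shares should sit near G(-1) ≈ 0.27 and G(1) ≈ 0.73, matching the probability formula derived later in the deck.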
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- Logit and Probit, i.e. the choice of G: normal or logistic distribution for e
Probit: assume e ~ N(0,1),
then G(z) = integral from -infinity to z of (2pi)^(-0.5) e^(-u^2/2) du
Logit: assume e ~ Logistic(0,1),
then G(z) = (e^z)/(1 + e^z)
Both are bell shaped and symmetric, so you get similar qualitative results, but the tails differ slightly
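Both CDFs are cheap to compute; a quick sketch using the standard identity that the normal CDF can be written with the error function:

```python
import math

def probit_cdf(z):
    """Standard normal CDF: G(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic CDF: G(z) = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

# Both are symmetric around 0 with G(0) = 0.5;
# the logistic puts more mass in the tails
for z in (-2.0, 0.0, 2.0):
    print(z, round(probit_cdf(z), 4), round(logit_cdf(z), 4))
```

At z = -2 the logistic CDF (≈ 0.119) is well above the normal one (≈ 0.023), which is the "tails differ slightly" point on this card.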
Compute the PDF of the Logit model
- you need this for computing marginal effects and performing MLE
g(z) = G(z)(1 - G(z))
- the PDF is symmetric around 0 and bell shaped
- the Probit PDF (the standard normal density) is likewise symmetric and bell shaped, though the identity g = G(1 - G) is specific to the logistic
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- probability and estimation
- back to the general model
Using the symmetry of G(z), we get:
- P(y=1|x) = P(e > -BT.x) = 1 - G(-BT.x) = G(BT.x)
- can now estimate B using MLE; OLS won't work as the model is nonlinear in parameters
- this symmetry step applies to both Logit and Probit
MLE using a simple binary example:
- y=1 with probability p, y=0 with probability 1-p
- let's say we observed y1 = 0, y2 = 1, y3 = 0: independent draws from a Bernoulli distribution with unknown probability p
Use the likelihood function:
L(p) = (1-p) * p * (1-p) = p(1-p)^2
- maximise with respect to p: p^ = 1/3
- equivalently, maximise the log-likelihood, which gives the same p^ = 1/3
Core idea of ML is to find p which makes the observed data most likely.
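The same maximisation can be done numerically; a sketch using a simple grid search over p for the three observations on this card:

```python
import math

# Observed draws from the card: y1 = 0, y2 = 1, y3 = 0
y = [0, 1, 0]

def log_likelihood(p):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1-y)*ln(1-p)."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi in y)

# Grid-search the maximiser over (0, 1);
# analytically the MLE is the sample mean, p^ = 1/3
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

The grid maximiser lands at 0.333, matching the analytic answer p^ = 1/3 up to the grid resolution.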
Likelihood for Binary Response:
- we want to estimate parameters B for a binary model, y in {0,1}
The density of yi given xi is f(yi|xi) = G(BT.xi)^yi [1 - G(BT.xi)]^(1-yi): if yi = 1 this reduces to G(BT.xi), and if yi = 0 to 1 - G(BT.xi)
- this works for either Logit or Probit
Estimate B by maximising the log-likelihood L(B) = SUM(li(B)) with respect to B; this is MLE
- the estimator is consistent, asymptotically normal and asymptotically efficient.
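A minimal illustration of this MLE for the logit case, on simulated data with made-up "true" coefficients; it maximises the log-likelihood by gradient ascent, using the fact that the logit score is SUM_i (yi - G(BT.xi)) xi:

```python
import math
import random

random.seed(1)

def G(z):
    """Logistic CDF, so P(y=1|x) = G(BT.x)."""
    return 1.0 / (1.0 + math.exp(-z))

# Simulate from a logit model with illustrative coefficients (-0.5, 1.5)
beta_true = [-0.5, 1.5]
data = []
for _ in range(2000):
    x = [1.0, random.gauss(0, 1)]              # intercept + one regressor
    p = G(beta_true[0] * x[0] + beta_true[1] * x[1])
    data.append((x, 1 if random.random() < p else 0))

# Gradient ascent on the average log-likelihood:
# score per observation is (y - G(BT.x)) * x
beta = [0.0, 0.0]
for _ in range(600):
    g0 = g1 = 0.0
    for x, y in data:
        r = y - G(beta[0] * x[0] + beta[1] * x[1])
        g0 += r * x[0]
        g1 += r * x[1]
    beta[0] += g0 / len(data)                  # unit step on the average score
    beta[1] += g1 / len(data)

print(beta)
```

With n = 2000 the estimates land close to (-0.5, 1.5), illustrating consistency; a real application would use a packaged optimiser rather than hand-rolled gradient ascent.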
3 standard methods for testing Multiple Exclusion Restrictions in models like Logit or Probit
- Lagrange Multiplier, or Score Test
- Wald Test
- Likelihood Ratio Test
Lagrange Multiplier, or Score Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether adding the excluded variables would improve the fit of the restricted model
- only estimate the restricted model, i.e. the one without the excluded variables
- efficient when you're testing whether variables are needed before estimating the bigger model
- under H0 (restrictions are valid), the test statistic follows a chi-squared distribution
Wald Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether estimated parameters in the unrestricted model are statistically 0
- only estimate the unrestricted model
- common in regression packages, quick and based on SEs from full model
- under H0 the Wald statistic follows a chi-squared distribution; it may perform poorly if the null is near the boundary of the parameter space or the sample is small
LR test
- what it does
- key feature
- why useful
- test stat distribution
- compares the fit of restricted vs unrestricted models
- have to estimate both models
- LR = 2(lnL_ur - lnL_r), which follows a chi-squared distribution under H0.
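A sketch of the LR test mechanics with made-up log-likelihood values (the 5.99 critical value for a chi-squared with 2 degrees of freedom at the 5% level is standard):

```python
# Illustrative (made-up) log-likelihoods: the unrestricted model
# adds q = 2 regressors to the restricted one
lnL_ur = -401.3
lnL_r = -405.9
q = 2

# LR statistic: 2 * (lnL_ur - lnL_r),
# asymptotically chi-squared with q df under H0
LR = 2 * (lnL_ur - lnL_r)

critical_5pct = 5.99   # chi-squared(2) critical value at the 5% level
reject = LR > critical_5pct
print(LR, reject)
```

Here LR = 9.2 > 5.99, so the exclusion restrictions would be rejected at the 5% level in this made-up example.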
Coefficient Bj in Logit or Probit: the interpretation issue
What we model is:
- Pr(y=1|x) = G(BT.x)
- but BT.x is actually interpreted as the expected value of a latent variable y*, which itself is not observed
- means the coefficients Bj do not directly tell you the marginal effects of xj on prob y being 1
Coefficient Bj in Logit or Probit
- discrete vs continuous
If discrete: can't take a derivative, as the variable jumps
- so instead compute the difference in predicted probabilities when the variable switches from 0 to 1
- Effect = G(BT.x with xj=1) - G(BT.x with xj=0), where G is the CDF
If continuous:
- Dp(x)/Dxj = g(BT.x).Bj
- g is the PDF (the slope of the CDF), so this tells you how much the probability of y=1 changes when you increase xj slightly
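Both cases side by side, with illustrative (made-up) logit coefficients for an intercept, a continuous regressor (education) and a dummy:

```python
import math

def G(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic PDF: g(z) = G(z) * (1 - G(z))."""
    return G(z) * (1.0 - G(z))

# Hypothetical coefficients: intercept, education, dummy (e.g. union member)
b = [-2.0, 0.15, 0.4]
educ, dummy = 12.0, 0.0

# Continuous regressor: derivative g(BT.x) * b_educ
idx = b[0] + b[1] * educ + b[2] * dummy
effect_educ = g(idx) * b[1]

# Discrete regressor: difference in predicted probabilities at 1 vs 0
effect_dummy = G(b[0] + b[1] * educ + b[2] * 1.0) \
             - G(b[0] + b[1] * educ + b[2] * 0.0)

print(effect_educ, effect_dummy)
```

Note the continuous effect depends on where you evaluate it (through g), which is exactly why the PEA/APE distinction on the next cards matters.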
Key remarks on partial effects
- Generalises to other functional forms of the regressors, like nonlinear transformations (apply the chain rule)
- Elasticities can be calculated
- Be careful with interactions of variables: their partial effects involve more than one coefficient.
PEA - Partial Effect at the Average
Dp(x_bar)/Dxj = g(B^T.x_bar).Bj^, where x_bar is the sample average of x
- plug the average value of x into the marginal effect formula
- simple, easy to interpret
- but awkward for dummies: what does it even mean to be 48% female?
- for nonlinear models, E[f(x)] != f(E[x]), so it may not reflect reality well
APE - Average Partial Effect
E[Dp(x)/Dxj] estimated by Bj^.(1/n)SUM_i(g(B^T.xi))
- compute the marginal effect for each individual and then average across your sample
- more representative of the sample
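PEA vs APE on a small made-up sample, showing they differ precisely because E[g(BT.x)] != g(BT.x_bar) in a nonlinear model (coefficients and data are illustrative):

```python
import math

def G(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic PDF: g(z) = G(z) * (1 - G(z))."""
    return G(z) * (1.0 - G(z))

# Illustrative logit fit: intercept and one continuous regressor
b0, b1 = -3.0, 0.5
xs = [2.0, 4.0, 6.0, 8.0, 10.0]   # small made-up sample

# PEA: evaluate the marginal effect at the sample average of x
x_bar = sum(xs) / len(xs)
pea = g(b0 + b1 * x_bar) * b1

# APE: compute the marginal effect for each observation, then average
ape = sum(g(b0 + b1 * x) * b1 for x in xs) / len(xs)

print(pea, ape)
```

Here the PEA (0.125, evaluated at an index of exactly 0 where g peaks) overstates the APE (≈ 0.085), a typical pattern when observations are spread across the tails of g.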
PEA or APE?
Use PEA if you want simplicity, use APE for more accurate average marginal effects.
In binary models, regular R^2 isn’t meaningful, few alternatives
- percent correctly predicted
- fraction of successes in sample
- pseudo R^2
- Predict y^ = 1 if G(B^T.x) >= 0.5, otherwise 0; then compare predicted y to actual y and calculate the proportion of correct predictions
- Sometimes the sample has very few 1s; the threshold can be adjusted away from 0.5, e.g. to match the fraction of successes in the sample
- pseudo R^2 = 1 - (lnL_ur/lnL_0), where lnL_ur is the log-likelihood from the model with predictors and lnL_0 is the log-likelihood from the null (intercept-only) model
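Both measures are straightforward to compute from fitted probabilities; a sketch with made-up fitted values and outcomes:

```python
import math

# Made-up fitted probabilities G(BT.xi) and actual outcomes
p_hat = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
y = [1, 1, 0, 0, 0, 0]

# Percent correctly predicted at the 0.5 threshold
y_pred = [1 if p >= 0.5 else 0 for p in p_hat]
pcp = sum(int(a == b) for a, b in zip(y_pred, y)) / len(y)

# Pseudo R^2 = 1 - lnL_ur / lnL_0, where the null model
# predicts the sample success rate for everyone
lnL_ur = sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
             for yi, p in zip(y, p_hat))
p0 = sum(y) / len(y)
lnL_0 = sum(yi * math.log(p0) + (1 - yi) * math.log(1 - p0) for yi in y)
pseudo_r2 = 1 - lnL_ur / lnL_0

print(pcp, pseudo_r2)
```

In this toy sample 5 of 6 observations are classified correctly (one 0 with fitted probability 0.6 is misclassified), and the pseudo R^2 is about 0.42.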