Lecture 14 Flashcards
LDV models refer to those where
the dependent variable’s range is restricted:
- binary response, e.g. Probit, Logit
- Censored
- Tobit
- Truncated
- Sample Selection
Binary response models
Used when the dependent variable takes only 2 possible values
Censored models
Used when some values of Y are only partially observed
Tobit models
Used when the dependent variable is continuous, but there’s a threshold below which all values are reported as the same
- e.g. spending on luxury goods: reported as either 0 or the actual amount spent.
Sample Selection Models
Used when the sample might not be randomly selected
- e.g. studying wages, but only having data for people who choose to work
- need to adjust for bias introduced by non-random selection
Main goals when studying binary response models like Probit and Logit
- Develop better models for binary outcomes: predict the probability of an action, Pr(y=1|x), using Logit or Probit
- Justify them with economic theory, leading to a latent utility model
- Develop estimation methods: OLS doesn't work (except for the linear probability model), so use NLS or MLE
- Interpret coefficients: how a one-unit change in xj affects the probability that y=1
- Generalise F tests
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- utility function and components.
- let y = 1 if they work, y = 0 if they dont
- person works if utility from working is greater than not working
U(y; x, ey) = ByT.x + ey, where:
- x are observed characteristics which might affect one's preference for work
- By are parameters which quantify how each characteristic affects utility under choice y
- ey represents unobserved taste shifters, like motivation.
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- making the decision
Person will only work (y=1) iff:
- U(1; x, e1) > U(0; x, e0)
=> B1T.x + e1 > B0T.x + e0
=> (B1 - B0)T.x + (e1 - e0) > 0
So, defining B = B1 - B0 and e = e1 - e0: y = 1 if BT.x + e > 0, y = 0 otherwise
- since e is unobserved, we make assumptions about its distribution to estimate the model.
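The decision rule above can be simulated directly; a minimal sketch assuming Logistic(0,1) taste shifters and made-up coefficients beta0, beta1 (both illustrative, not from the lecture):

```python
import math
import random

random.seed(0)

# Hypothetical coefficients of the latent index BT.x + e (illustrative only)
beta0, beta1 = -1.0, 2.0

def simulate_work_decision(x):
    """Draw a Logistic(0,1) taste shifter e via inverse-CDF sampling,
    then apply the decision rule y = 1{beta0 + beta1*x + e > 0}."""
    u = random.random()
    e = math.log(u / (1.0 - u))   # inverse of the logistic CDF
    return 1 if beta0 + beta1 * x + e > 0 else 0

# People with a higher x (e.g. potential wage) should work more often:
# P(y=1|x) = G(beta0 + beta1*x), so roughly G(-1) vs G(1)
n = 10000
share_low = sum(simulate_work_decision(0.0) for _ in range(n)) / n
share_high = sum(simulate_work_decision(1.0) for _ in range(n)) / n
print(share_low, share_high)
```

The simulated shares should sit near G(-1) ≈ 0.27 and G(1) ≈ 0.73, matching the probability formula derived later in the deck.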
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- Logit and Probit, i.e. the choice of G: normal or logistic distribution for e
Probit: assume e ~ N(0,1),
then G(z) = integral from -infinity to z of (2pi)^(-0.5) e^(-u^2/2) du
Logit: assume e ~ Logistic(0,1),
then G(z) = (e^z)/(1 + e^z)
Both are bell shaped and symmetric, so you get similar qualitative results, but the tails differ slightly
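Both CDFs are cheap to compute; a quick sketch using the standard identity that the normal CDF can be written with the error function:

```python
import math

def probit_cdf(z):
    """Standard normal CDF: G(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic CDF: G(z) = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

# Both are symmetric around 0 with G(0) = 0.5;
# the logistic puts more mass in the tails
for z in (-2.0, 0.0, 2.0):
    print(z, round(probit_cdf(z), 4), round(logit_cdf(z), 4))
```

At z = -2 the logistic CDF (≈ 0.119) is well above the normal one (≈ 0.023), which is the "tails differ slightly" point on this card.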
Compute the PDF of the Logit model
- you need this for computing marginal effects and performing MLE
g(z) = G(z)(1 - G(z))
- the PDF is symmetric around 0 and bell shaped
- the Probit PDF (the standard normal density) is likewise symmetric and bell shaped, though the identity g = G(1 - G) is specific to the logistic
Economic intuition and mathematical setup behind binary response models like Logit and Probit, using example of:
Someone’s decision to work
- probability and estimation
- back to the general model
Using the symmetry of G(z), we get:
- P(y=1|x) = P(e > -BT.x) = 1 - G(-BT.x) = G(BT.x)
- can now estimate B using MLE; OLS won't work as the model is nonlinear in parameters
- this symmetry step applies to both Logit and Probit
MLE using a simple binary example:
- y=1 with probability p, y=0 with probability 1-p
- let's say we observed y1 = 0, y2 = 1, y3 = 0: independent draws from a Bernoulli distribution with unknown probability p
Use the likelihood function:
L(p) = (1-p) * p * (1-p) = p(1-p)^2
- maximise with respect to p: p^ = 1/3
- equivalently, maximise the log-likelihood, which gives the same p^ = 1/3
Core idea of ML is to find p which makes the observed data most likely.
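The same maximisation can be done numerically; a sketch using a simple grid search over p for the three observations on this card:

```python
import math

# Observed draws from the card: y1 = 0, y2 = 1, y3 = 0
y = [0, 1, 0]

def log_likelihood(p):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1-y)*ln(1-p)."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p) for yi in y)

# Grid-search the maximiser over (0, 1);
# analytically the MLE is the sample mean, p^ = 1/3
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

The grid maximiser lands at 0.333, matching the analytic answer p^ = 1/3 up to the grid resolution.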
Likelihood for Binary Response:
- we want to estimate parameters B for a binary model, y in {0,1}
The density of yi given xi is f(yi|xi) = G(BT.xi)^yi [1 - G(BT.xi)]^(1-yi): if yi = 1 this reduces to G(BT.xi), and if yi = 0 to 1 - G(BT.xi)
- this works for either Logit or Probit
Estimate B by maximising the log-likelihood L(B) = SUM(li(B)) with respect to B; this is MLE
- the estimator is consistent, asymptotically normal and asymptotically efficient.
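A minimal illustration of this MLE for the logit case, on simulated data with made-up "true" coefficients; it maximises the log-likelihood by gradient ascent, using the fact that the logit score is SUM_i (yi - G(BT.xi)) xi:

```python
import math
import random

random.seed(1)

def G(z):
    """Logistic CDF, so P(y=1|x) = G(BT.x)."""
    return 1.0 / (1.0 + math.exp(-z))

# Simulate from a logit model with illustrative coefficients (-0.5, 1.5)
beta_true = [-0.5, 1.5]
data = []
for _ in range(2000):
    x = [1.0, random.gauss(0, 1)]              # intercept + one regressor
    p = G(beta_true[0] * x[0] + beta_true[1] * x[1])
    data.append((x, 1 if random.random() < p else 0))

# Gradient ascent on the average log-likelihood:
# score per observation is (y - G(BT.x)) * x
beta = [0.0, 0.0]
for _ in range(600):
    g0 = g1 = 0.0
    for x, y in data:
        r = y - G(beta[0] * x[0] + beta[1] * x[1])
        g0 += r * x[0]
        g1 += r * x[1]
    beta[0] += g0 / len(data)                  # unit step on the average score
    beta[1] += g1 / len(data)

print(beta)
```

With n = 2000 the estimates land close to (-0.5, 1.5), illustrating consistency; a real application would use a packaged optimiser rather than hand-rolled gradient ascent.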
3 standard methods for testing Multiple Exclusion Restrictions in models like Logit or Probit
- Lagrange Multiplier, or Score Test
- Wald Test
- Likelihood Ratio Test
Lagrange Multiplier, or Score Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether adding the excluded variables would improve the fit of the restricted model
- only estimate the restricted model, i.e. the one without the excluded variables
- efficient when you're testing whether variables are needed before estimating the bigger model
- under H0 (restrictions are valid), the test statistic follows a chi-squared distribution
Wald Test
- what it does
- key feature
- why useful
- test stat distribution
- tests whether estimated parameters in the unrestricted model are statistically 0
- only estimate the unrestricted model
- common in regression packages, quick and based on SEs from full model
- under H0 the Wald statistic follows a chi-squared distribution; it may perform poorly if the null is near the boundary of the parameter space or the sample is small
LR test
- what it does
- key feature
- why useful
- test stat distribution
- compares the fit of restricted vs unrestricted models
- have to estimate both models
- LR = 2(lnL_ur - lnL_r), which follows a chi-squared distribution under H0.
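A sketch of the LR test mechanics with made-up log-likelihood values (the 5.99 critical value for a chi-squared with 2 degrees of freedom at the 5% level is standard):

```python
# Illustrative (made-up) log-likelihoods: the unrestricted model
# adds q = 2 regressors to the restricted one
lnL_ur = -401.3
lnL_r = -405.9
q = 2

# LR statistic: 2 * (lnL_ur - lnL_r),
# asymptotically chi-squared with q df under H0
LR = 2 * (lnL_ur - lnL_r)

critical_5pct = 5.99   # chi-squared(2) critical value at the 5% level
reject = LR > critical_5pct
print(LR, reject)
```

Here LR = 9.2 > 5.99, so the exclusion restrictions would be rejected at the 5% level in this made-up example.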
Coefficient Bj in Logit or Probit: the interpretation issue
What we model is:
- Pr(y=1|x) = G(BT.x)
- but BT.x is actually interpreted as the expected value of a latent variable y*, which itself is not observed
- means the coefficients Bj do not directly tell you the marginal effects of xj on prob y being 1
Coefficient Bj in Logit or Probit
- discrete vs continuous
If discrete: can't take a derivative, as the variable jumps
- so instead compute the difference in predicted probabilities when the variable switches from 0 to 1
- Effect = G(BT.x with xj=1) - G(BT.x with xj=0), where G is the CDF
If continuous:
- Dp(x)/Dxj = g(BT.x).Bj
- g is the PDF (the slope of the CDF), so this tells you how much the probability of y=1 changes when you increase xj slightly
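Both cases side by side, with illustrative (made-up) logit coefficients for an intercept, a continuous regressor (education) and a dummy:

```python
import math

def G(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic PDF: g(z) = G(z) * (1 - G(z))."""
    return G(z) * (1.0 - G(z))

# Hypothetical coefficients: intercept, education, dummy (e.g. union member)
b = [-2.0, 0.15, 0.4]
educ, dummy = 12.0, 0.0

# Continuous regressor: derivative g(BT.x) * b_educ
idx = b[0] + b[1] * educ + b[2] * dummy
effect_educ = g(idx) * b[1]

# Discrete regressor: difference in predicted probabilities at 1 vs 0
effect_dummy = G(b[0] + b[1] * educ + b[2] * 1.0) \
             - G(b[0] + b[1] * educ + b[2] * 0.0)

print(effect_educ, effect_dummy)
```

Note the continuous effect depends on where you evaluate it (through g), which is exactly why the PEA/APE distinction on the next cards matters.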
Key remarks on partial effects
- Generalises to other functional forms of the regressors, like nonlinear transformations (apply the chain rule)
- Elasticities can be calculated
- Be careful with interactions of variables: their partial effects involve more than one coefficient.
PEA - Partial Effect at the Average
Dp(x_bar)/Dxj = g(B^T.x_bar).Bj^, where x_bar is the sample average of x
- plug the average value of x into the marginal effect formula
- simple, easy to interpret
- but awkward for dummies: what does it even mean to be 48% female?
- for nonlinear models, E[f(x)] != f(E[x]), so it may not reflect reality well
APE - Average Partial Effect
E[Dp(x)/Dxj] estimated by Bj^.(1/n)SUM_i(g(B^T.xi))
- compute the marginal effect for each individual and then average across your sample
- more representative of the sample
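PEA vs APE on a small made-up sample, showing they differ precisely because E[g(BT.x)] != g(BT.x_bar) in a nonlinear model (coefficients and data are illustrative):

```python
import math

def G(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic PDF: g(z) = G(z) * (1 - G(z))."""
    return G(z) * (1.0 - G(z))

# Illustrative logit fit: intercept and one continuous regressor
b0, b1 = -3.0, 0.5
xs = [2.0, 4.0, 6.0, 8.0, 10.0]   # small made-up sample

# PEA: evaluate the marginal effect at the sample average of x
x_bar = sum(xs) / len(xs)
pea = g(b0 + b1 * x_bar) * b1

# APE: compute the marginal effect for each observation, then average
ape = sum(g(b0 + b1 * x) * b1 for x in xs) / len(xs)

print(pea, ape)
```

Here the PEA (0.125, evaluated at an index of exactly 0 where g peaks) overstates the APE (≈ 0.085), a typical pattern when observations are spread across the tails of g.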
PEA or APE?
Use PEA if you want simplicity, use APE for more accurate average marginal effects.
In binary models, regular R^2 isn’t meaningful, few alternatives
- percent correctly predicted
- fraction of successes in sample
- pseudo R^2
- Predict y^ = 1 if G(B^T.x) >= 0.5, otherwise 0; then compare predicted y to actual y and calculate the proportion of correct predictions
- Sometimes the sample has very few 1s; the threshold can be adjusted away from 0.5, e.g. to match the fraction of successes in the sample
- pseudo R^2 = 1 - (lnL_ur/lnL_0), where lnL_ur is the log-likelihood from the model with predictors and lnL_0 is the log-likelihood from the null (intercept-only) model
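Both measures are straightforward to compute from fitted probabilities; a sketch with made-up fitted values and outcomes:

```python
import math

# Made-up fitted probabilities G(BT.xi) and actual outcomes
p_hat = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
y = [1, 1, 0, 0, 0, 0]

# Percent correctly predicted at the 0.5 threshold
y_pred = [1 if p >= 0.5 else 0 for p in p_hat]
pcp = sum(int(a == b) for a, b in zip(y_pred, y)) / len(y)

# Pseudo R^2 = 1 - lnL_ur / lnL_0, where the null model
# predicts the sample success rate for everyone
lnL_ur = sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
             for yi, p in zip(y, p_hat))
p0 = sum(y) / len(y)
lnL_0 = sum(yi * math.log(p0) + (1 - yi) * math.log(1 - p0) for yi in y)
pseudo_r2 = 1 - lnL_ur / lnL_0

print(pcp, pseudo_r2)
```

In this toy sample 5 of 6 observations are classified correctly (one 0 with fitted probability 0.6 is misclassified), and the pseudo R^2 is about 0.42.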