L5: IV regression Flashcards by Chris Jenkins

When might we use instrumental variables? (3 cases, and what these issues have in common)

1) OVB from a variable that is correlated with X but is unobserved (tf cannot be incl. in regression eqn.)
2) Simultaneous causality bias (ie. X causes Y AND Y causes X)
3) Errors-in-variables bias (X is measured with error)

All 3 problems -> E(u|X) not equal to zero

What does IV regression do?

Eliminates bias when E(u|X) not equal to zero, using an instrumental variable, Z

What are endogenous and exogenous variables?

Endogenous - a variable correlated with u

Exogenous - a variable not correlated with u

What are the two conditions for a VALID INSTRUMENT?

1) Instrument relevance: corr(Zi, Xi) ≠ 0

2) Instrument exogeneity: corr(Zi, ui) = 0

Explain carefully how to estimate the model when using an IV.

2 stage least squares (TSLS):
1) ISOLATE the part of X that is uncorrelated with u by regressing X on Z using OLS:
EQN: Xi = π0 + π1Zi + vi
Because Zi is uncorrelated with ui, π0 + π1Zi is also uncorrelated with ui (the problematic part of Xi is vi). π0 and π1 are unknown, so estimate them by OLS and compute the predicted values of Xi: Xi(hat) = π0(hat) + π1(hat)Zi

2) Replace Xi by Xi(hat) in the regression of interest, and regress Y on Xi(hat) using OLS:
ie. Yi = B0 + B1Xi(hat) + ui

Since Xi(hat) is uncorrelated with ui (in large samples), E(u|X(hat)) = 0 tf OLS on this second stage works! The resulting estimator is B1(hat)(TSLS)
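
The two stages can be sketched on simulated data. This is only an illustration, not the example from the notes: the data-generating numbers (true B1 = 1.5, first-stage slope 0.8, and so on) and the helper names `simple_ols` and `tsls` are assumptions made up for the sketch.

```python
import random

def simple_ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on a single x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def tsls(y, x, z):
    """TSLS with one endogenous regressor x and one instrument z."""
    # Stage 1: regress X on Z and form the predicted values X-hat.
    pi0, pi1 = simple_ols(x, z)
    x_hat = [pi0 + pi1 * zi for zi in z]
    # Stage 2: regress Y on X-hat.
    return simple_ols(y, x_hat)

# Simulated data (all numbers are hypothetical).
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument: relevant and exogenous by construction
u = [rng.gauss(0, 1) for _ in range(n)]   # error term
# X is endogenous because it depends on u.
x = [1.0 + 0.8 * zi + 0.7 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]   # true B1 = 1.5

b0_ols, b1_ols = simple_ols(y, x)
b0_tsls, b1_tsls = tsls(y, x, z)
print("OLS  B1:", b1_ols)    # biased upwards here because corr(X, u) > 0
print("TSLS B1:", b1_tsls)   # should be close to 1.5
```

The point estimate from the second stage is fine, but the SEs printed by a plain second-stage OLS would not be.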

What does 2SLS require?

n to be large so π0 and π1 are estimated precisely

Show that the 2SLS estimator is equal to the ratio of the covariances: S(YZ)/S(XZ)

see notes bottom page 1 side 1
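
The notes themselves are not reproduced in these cards. A standard derivation, assuming one regressor and one instrument and writing s for sample variances and covariances, might run as follows:

```latex
\begin{align*}
\hat{\pi}_1 &= \frac{s_{XZ}}{s_Z^2}, \qquad \hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i \\
s_{Y\hat{X}} &= \hat{\pi}_1 s_{YZ}, \qquad s_{\hat{X}}^2 = \hat{\pi}_1^2 s_Z^2 \\
\hat{\beta}_1^{TSLS} &= \frac{s_{Y\hat{X}}}{s_{\hat{X}}^2}
  = \frac{\hat{\pi}_1 s_{YZ}}{\hat{\pi}_1^2 s_Z^2}
  = \frac{s_{YZ}}{\hat{\pi}_1 s_Z^2}
  = \frac{s_{YZ}}{s_{XZ}}
\end{align*}
```

The first line is the first-stage OLS slope, the second uses the fact that X(hat) is a linear function of Z, and the third is the second-stage OLS slope.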

Is the 2SLS estimator consistent?

YES see notes for why (ie. both sample covariances are consistent for the population covariances, tf S(YZ)/S(XZ) converges in probability to cov(Y,Z)/cov(X,Z), which equals the true value of B1 when the instrument is valid)
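
A sketch of the standard argument (not copied from the notes), assuming the model Yi = B0 + B1Xi + ui with a valid instrument Zi:

```latex
\begin{align*}
\operatorname{cov}(Y_i, Z_i)
  &= \operatorname{cov}(\beta_0 + \beta_1 X_i + u_i,\; Z_i)
   = \beta_1 \operatorname{cov}(X_i, Z_i) + \operatorname{cov}(u_i, Z_i)
   = \beta_1 \operatorname{cov}(X_i, Z_i) \\
\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}
  &\xrightarrow{\;p\;} \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(X_i, Z_i)} = \beta_1
\end{align*}
```

Instrument exogeneity removes the cov(u, Z) term, and instrument relevance guarantees the denominator cov(X, Z) is not zero.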

What is inference like using TSLS?

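Same as usual: in large samples the TSLS estimator is approximately normally distributed, so t-tests and confidence intervals are computed in the usual way (as long as the correct TSLS SEs are used)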

Why are OLS standard errors from the 2nd stage regression wrong?

They do not take into account that Xi(hat) is itself estimated in the first stage. Stata can solve this with a command that computes TSLS with the correct SEs (use heteroskedasticity-robust SEs)
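
A rough illustration of the problem on simulated data. For brevity this sketch uses the homoskedasticity-only SE formula rather than the heteroskedasticity-robust one, and all numbers and helper names are made up. The only difference between the two SEs is which residuals are used: the naive second-stage residuals are built from X(hat), the correct TSLS residuals from the actual X.

```python
import math
import random

def simple_ols(y, x):
    """Return (intercept, slope, sum of squared deviations of x)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - slope * xbar, slope, sxx

# Simulated data (hypothetical numbers; true B1 = 1.5).
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.7 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]

# Stage 1 and stage 2.
pi0, pi1, _ = simple_ols(x, z)
x_hat = [pi0 + pi1 * zi for zi in z]
b0, b1, sxx_hat = simple_ols(y, x_hat)

def slope_se(resid):
    """Homoskedasticity-only SE of the second-stage slope for given residuals."""
    sigma2 = sum(r * r for r in resid) / (n - 2)
    return math.sqrt(sigma2 / sxx_hat)

# Naive: residuals from the second-stage regression, which use X-hat.
se_naive = slope_se([yi - b0 - b1 * xh for yi, xh in zip(y, x_hat)])
# Correct: TSLS residuals use the actual X with the TSLS coefficients.
se_tsls = slope_se([yi - b0 - b1 * xi for yi, xi in zip(y, x)])

print("naive second-stage SE:", se_naive)
print("TSLS SE:", se_tsls)
```

In this particular simulation the naive SE comes out too large; the general point is only that it is wrong, not that it errs in a fixed direction.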

Why would a regression that relates quantity (Y) to price (X) likely suffer from bias? What type of bias would this be?

Market data only record price and quantity at equilibrium, ie. at the crossover of S and D. So the regression fits a scatter of equilibrium points and traces out neither the D function nor the S function. Price and quantity are determined jointly (price affects quantity and quantity affects price), tf this gives rise to simultaneous causality (simultaneity) bias

See

cigarette demand example in notes

See

General IV regression model notes

What is the problem in the generalised IV regression model with adding more IVs?

see notes

Explain the three cases of identification relevant to 2SLS? When can 2SLS be done?

Exact identification if m=k
Underidentified if m<k
Overidentified if m>k

Can only be done with exact/overidentification - where m is number of IVs and k is number of ENDOgenous regressors

See notes

Bottom of side 2: check I understand how to do TSLS with a single endogenous regressor (X) and multiple exogenous regressors (W1,…,Wr) (go over cig example too!)

If you have 2 suitable IVs, Z1 and Z2, that are both correlated with the endogenous variable and uncorrelated with the error, which should you use and why?

BOTH!
regress the endogenous variable on both Z1 and Z2 - this is a case of overidentification and therefore will reduce the SE of the results (so long as additional IVs are appropriate): more information -> BETTER ESTIMATES!
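
A small Monte Carlo sketch of this claim, under made-up assumptions (two independent, valid instruments with first-stage coefficients of 0.5 each; true B1 = 1.5). The helper names are hypothetical. The spread of the estimates across repeated samples is measured by the interquartile range, used here as a robust stand-in for the SE.

```python
import random
import statistics

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def tsls_one(y, x, z):
    """TSLS slope with a single instrument: S(YZ) / S(XZ)."""
    y, x, z = map(demean, (y, x, z))
    return dot(z, y) / dot(z, x)

def tsls_two(y, x, z1, z2):
    """TSLS slope using both instruments in the first stage."""
    y, x, z1, z2 = map(demean, (y, x, z1, z2))
    # First stage: solve the 2x2 normal equations for the coefficients on z1, z2.
    a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    b1, b2 = dot(z1, x), dot(z2, x)
    det = a11 * a22 - a12 * a12
    p1 = (b1 * a22 - b2 * a12) / det
    p2 = (a11 * b2 - a12 * b1) / det
    x_hat = [p1 * a + p2 * b for a, b in zip(z1, z2)]
    # Second stage: slope from regressing y on x_hat.
    return dot(x_hat, y) / dot(x_hat, x_hat)

def iqr(values):
    q = statistics.quantiles(values, n=4)
    return q[2] - q[0]

rng = random.Random(0)
one, both = [], []
for _ in range(500):                      # 500 simulated samples of size 200
    n = 200
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * a + 0.5 * b + 0.7 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, u)]
    y = [1.5 * xi + ui for xi, ui in zip(x, u)]
    one.append(tsls_one(y, x, z1))
    both.append(tsls_two(y, x, z1, z2))

print("spread using Z1 only:", iqr(one))
print("spread using Z1 and Z2:", iqr(both))
```

The gain depends on the extra instrument being valid and reasonably strong, which is the "so long as additional IVs are appropriate" caveat in the card.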

Under what assumptions is TSLS valid, with a t-statistic that is normally distributed in large samples?

1) E(ui|W1i,…,Wri) = 0, ie. the included exogenous regressors (the W's) are exogenous
2) (Yi,X1i,…,Xki,W1i,…,Wri,Z1i,…,Zmi) are i.i.d.
3) The X's, W's, Z's, and Y have nonzero, finite 4th moments
4) The instruments (Z1i,…,Zmi) are valid (ie. Corr(Zji,ui) = 0 and Corr(Zji,Xi) ≠ 0 for j = 1 to m)

In MRM generalised IVs, when are instruments said to be relevant? And when are they said to be weak?

In the first stage, if at least one π on the instruments (the Z's) is not equal to zero then the instruments are relevant
If the π's on the Z's are all equal to zero (or v. close to zero) the instruments are weak

What do weak instruments do?

They explain very little of the variation in X BEYOND what is explained by the W’s

What is a consequence of IVs being weak?

TSLS sampling distribution and t-stat are not at all normal, even when n is large!

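(Why? Because weak instruments make S(XZ) v. small, and since beta1(hat)TSLS = S(YZ)/S(XZ), dividing by a noisy, near-zero S(XZ) makes the estimate erratic and potentially very large!) (ie. almost no correlation between X and Z, tf Z does not explain X, tf Z does not explain Y either!) (see notes bottom of S2P2 and top of S1P3)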
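
A Monte Carlo sketch of this, with made-up numbers: the same model is simulated with a strong first-stage coefficient (1.0) and a weak one (0.02), and the spread of S(YZ)/S(XZ) across samples is compared. Function names and parameter values are assumptions for illustration.

```python
import random
import statistics

def iv_slope(y, x, z):
    """TSLS slope with one instrument, computed as S(YZ) / S(XZ)."""
    n = len(y)
    ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
    s_yz = sum((a - ybar) * (c - zbar) for a, c in zip(y, z))
    s_xz = sum((b - xbar) * (c - zbar) for b, c in zip(x, z))
    return s_yz / s_xz

def simulate(pi1, reps=500, n=200, seed=0):
    """Return TSLS estimates from `reps` samples; true B1 = 1.5."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]
        x = [pi1 * zi + 0.7 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [1.5 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(iv_slope(y, x, z))
    return estimates

def iqr(values):
    q = statistics.quantiles(values, n=4)
    return q[2] - q[0]

strong = simulate(pi1=1.0)
weak = simulate(pi1=0.02)

print("strong: median", statistics.median(strong), "IQR", iqr(strong))
print("weak:   median", statistics.median(weak), "IQR", iqr(weak))
print("weak:   most extreme estimate", max(weak, key=abs))
```

With the weak instrument the estimates are far more dispersed, with occasional extreme values, which is why the normal approximation fails.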

How do you test instrument strength?

First-stage F-test of the null hypothesis that Z1,…,Zm DO NOT ENTER the first stage regression (ie. their coefficients are all equal to zero)
Rule of thumb: if the F-stat is less than 10 then the set of instruments is weak! (tf -> biased 2SLS)
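
A sketch of the first-stage F-statistic for the simple case of two instruments and no W's, using the homoskedasticity-only formula that compares restricted and unrestricted sums of squared residuals. The simulated data and helper names are hypothetical.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(v, z1, z2):
    """Fitted values from OLS of v on z1 and z2 (all inputs already demeaned)."""
    a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    b1, b2 = dot(z1, v), dot(z2, v)
    det = a11 * a22 - a12 * a12
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return [c1 * p + c2 * q for p, q in zip(z1, z2)]

def first_stage_f(x, z1, z2):
    """F-statistic for H0: the coefficients on z1 and z2 are both zero."""
    n, m = len(x), 2
    x, z1, z2 = map(demean, (x, z1, z2))
    fitted = project(x, z1, z2)
    ssr_unrestricted = sum((a - f) ** 2 for a, f in zip(x, fitted))
    ssr_restricted = dot(x, x)            # intercept-only regression
    return ((ssr_restricted - ssr_unrestricted) / m) / (ssr_unrestricted / (n - m - 1))

def simulate(pi, n=500, seed=0):
    """First-stage F when both instruments have first-stage coefficient pi."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi * a + pi * b + 0.7 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, u)]
    return first_stage_f(x, z1, z2)

f_strong = simulate(pi=0.5)
f_weak = simulate(pi=0.02)
print("first-stage F, strong instruments:", f_strong)
print("first-stage F, weak instruments:", f_weak)
print("strong set passes the F > 10 rule of thumb:", f_strong > 10)
```

With W's in the model, the restricted regression would include the W's and the F-test would cover only the coefficients on the Z's.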

What does comparing to F=10 actually allow us to do?

Check whether the bias of TSLS, relative to the bias of OLS, is greater or less than roughly 10% (if F is less than 10, the relative bias is more than about 10%, and vice versa)

2 solutions to weak instruments?

1) Find better instruments/drop ones you think may be weak

2) Use other estimators (can be very complicated though)

What criteria must be fulfilled to test for instrument exogeneity? What is the consequence for TSLS if this assumption does not hold?

Criteria: the model must be overidentified to do this test! If the assumption of instrument exogeneity fails, then TSLS is INCONSISTENT!

When to use J-test of overidentifying restrictions?

If given, say, 2 IVs, Z1 and Z2, and you compute TSLS with each one separately and the estimates of beta are very different, then you know that at least one of Z1 or Z2 must be invalid

See

bottom of p2s2 on how to conduct a J-test

What are the hypotheses for a J-test?

H0: All instruments are exogenous H1: At least one instrument is not exogenous

J-statistic distribution? How many DofF in a J-test?

Chi-squared, with m-k DofF

Why must the model be overidentified to do a J-test?

Because otherwise the DofF, m-k, will equal 0!
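
A sketch of the J-statistic for the simplest overidentified case (m = 2 instruments, k = 1 endogenous regressor, no W's, so m-k = 1). It assumes the usual homoskedasticity-only construction J = mF, where F tests that the coefficients on the instruments are all zero in a regression of the TSLS residuals on the instruments. The simulated data, helper names and the choice to make Z2 invalid are made up for illustration.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(v, z1, z2):
    """Fitted values from OLS of v on z1 and z2 (all inputs already demeaned)."""
    a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    b1, b2 = dot(z1, v), dot(z2, v)
    det = a11 * a22 - a12 * a12
    c1 = (b1 * a22 - b2 * a12) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return [c1 * p + c2 * q for p, q in zip(z1, z2)]

def j_statistic(y, x, z1, z2):
    """J = m * F, computed from the TSLS residuals (m = 2, k = 1)."""
    n, m = len(y), 2
    y, x, z1, z2 = map(demean, (y, x, z1, z2))
    x_hat = project(x, z1, z2)                       # first stage
    b1 = dot(x_hat, y) / dot(x_hat, x_hat)           # second-stage slope
    u_hat = [yi - b1 * xi for yi, xi in zip(y, x)]   # TSLS residuals use actual X
    fitted = project(u_hat, z1, z2)                  # regress residuals on Z1, Z2
    ssr_u = sum((a - f) ** 2 for a, f in zip(u_hat, fitted))
    ssr_r = dot(u_hat, u_hat)
    f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * f_stat

def simulate(z2_loading_on_u, n=2000, seed=0):
    """J-statistic when Z2 = noise + loading * u (loading 0 means Z2 is exogenous)."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) + z2_loading_on_u * ui for ui in u]
    x = [0.8 * a + 0.8 * b + 0.7 * c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, u)]
    y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]
    return j_statistic(y, x, z1, z2)

CRITICAL_5PCT_DF1 = 3.84    # 5% critical value of chi-squared with m-k = 1 DofF

j_valid = simulate(0.0)     # both instruments exogenous
j_invalid = simulate(0.5)   # Z2 correlated with u
print("J, both instruments valid:", j_valid)
print("J, Z2 endogenous:", j_invalid, "reject H0:", j_invalid > CRITICAL_5PCT_DF1)
```

With m = k the residuals are exactly uncorrelated with the instruments by construction, so there is nothing left to test.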

What does it mean if the actual J statistic is in the critical region?

Means that H0 is rejected because there is at least one endogenous IV

Summary?

Slides 38 and 39 if needed

See

S3P3 in notes on cig demand bit

How can we interpret the J-test rejection?

Can use intuition to try work out which variable(s) is/are endogenous, then redo the model and try again

What points need to be considered when assessing the validity of a study?

1) OVB
2) Functional form misspecification
3) Simultaneous causality bias
4) Errors-in-variables bias
5) Selection bias (ie. have all states been used or just some?)
6) Are the IVs truly relevant and exogenous?
7) Old data: if using old data, need to consider whether it is externally valid to apply it to today's problems

(35 cards)