What are the two key assumptions required for instruments to be applicable?
1) Intrument relevance (correlation with the x variable)
2) Instrument exogeneity (not related to Y at all)
Why do we use IV regression?
We are worried about 1) reverse causality
2) endogeneity, where an unobserved variable Z impacts both X and Y (for example, cognitive ability being proxied by test scores and also affecting income later in life)
What is the reduced-form regression?
Run a regression of Yi on Zi (the instrumental variable)
What is the first stage regression?
Run a regression of Xi on Zi (the IV)