QE Flashcards
(96 cards)
Mean independent:
Independent:
E(u|X) = E(u) and E(u) = 0
E(g(u)|X) = E(g(u)) for any function g
E.g. variance of u spreading out as X increases: u can be mean independent of X but not independent
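A minimal simulation sketch of the distinction, assuming a made-up error u = X * noise (this data-generating process and all variable names are illustrative, not from the cards). The conditional mean of u stays near zero in both halves of the sample, while the conditional mean of g(u) = u^2 clearly depends on X.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000

# Assumed DGP: X uniform on [1, 5]; u = X * (standard normal noise).
# E(u|X) = 0 for every X, but Var(u|X) = X^2 grows with X.
x = [rng.uniform(1.0, 5.0) for _ in range(n)]
u = [xi * rng.gauss(0.0, 1.0) for xi in x]


def mean(values):
    return sum(values) / len(values)


# Split the errors by whether X is in the lower or upper half of its range.
u_low = [ui for xi, ui in zip(x, u) if xi < 3.0]
u_high = [ui for xi, ui in zip(x, u) if xi >= 3.0]

# g(u) = u: both conditional means should be close to zero.
mean_low, mean_high = mean(u_low), mean(u_high)

# g(u) = u^2: the conditional means differ, so u is not independent of X.
sq_low = mean([ui ** 2 for ui in u_low])
sq_high = mean([ui ** 2 for ui in u_high])

print(f"E(u | X<3)   = {mean_low:.3f}   E(u | X>=3)   = {mean_high:.3f}")
print(f"E(u^2 | X<3) = {sq_low:.3f}   E(u^2 | X>=3) = {sq_high:.3f}")
```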
Assumptions for consistency vs unbiasedness
Consistency: OR cov(e,X) = 0
Unbiasedness: mean independence
- E(e|X) = E(e) = 0
Regressions in both directions implications
Run regression both directions
- Both coefficients have descriptive interpretations
- Only one coefficient can have a causal interpretation
- in general β does not equal 1/γ, where γ is the slope from the reverse regression
- Inverting a LRM (or CEF) does not yield a LRM (or CEF)
- to argue for a causal interpretation, need to argue that OR is plausible
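A short sketch of why β is generally not 1/γ, using simulated data (the data-generating process Y = 2X + noise is an assumption for illustration). In any sample the two slopes multiply to the squared correlation, so they are reciprocals only when the fit is perfect.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
n = 5000

# Assumed DGP for illustration: Y = 2X + noise.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0.0, 2.0) for xi in x]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


beta_hat = cov(x, y) / cov(x, x)    # slope from regressing Y on X
gamma_hat = cov(x, y) / cov(y, y)   # slope from regressing X on Y
r_squared = cov(x, y) ** 2 / (cov(x, x) * cov(y, y))

print(f"beta_hat = {beta_hat:.3f}, 1/gamma_hat = {1 / gamma_hat:.3f}")
print(f"beta_hat * gamma_hat = {beta_hat * gamma_hat:.3f}, R^2 = {r_squared:.3f}")
```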
Define descriptive interpretation
“on average a unit increase in X1 is associated with a b* increase in Y, holding X2..Xk constant”
Standard error of regression
s = sqrt(SSR/(n-k-1))
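A worked sketch of the formula on a tiny made-up dataset with one regressor (k = 1). The numbers are hypothetical and only serve to show where SSR and the degrees-of-freedom correction enter.

```python
import math

# Hypothetical data, one regressor (k = 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for the simple regression of y on x.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Sum of squared residuals, then the standard error of the regression.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)
ser = math.sqrt(ssr / (n - k - 1))

print(f"slope = {slope:.3f}, SSR = {ssr:.4f}, s = {ser:.4f}")
```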
Least squares assumptions
1) error term is conditional mean 0
2) (X, Y) are iid draws from their joint distribution
3) nonzero finite fourth moments - large outliers are unlikely
4) no perfect multicollinearity
Talk about consistency
Means sample β is very close to the true β with high probability when n is large
- consequence of LLN
- distribution of sample β collapses to β
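A small Monte Carlo sketch of the "collapsing distribution" idea, assuming a simple made-up model Y = 1.5X + u with standard normal X and u (the model, sample sizes and replication count are all illustrative choices). The spread of the slope estimates across replications shrinks as n grows.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed for reproducibility
TRUE_BETA = 1.5         # assumed true slope in the simulated model


def ols_slope(n):
    """Draw one sample of size n from Y = TRUE_BETA * X + u and return the OLS slope."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [TRUE_BETA * xi + rng.gauss(0.0, 1.0) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate(n, reps=200):
    """Slope estimates from `reps` independent samples of size n."""
    return [ols_slope(n) for _ in range(reps)]


small = simulate(25)
large = simulate(2500)

sd_small = statistics.pstdev(small)
sd_large = statistics.pstdev(large)

print(f"n = 25:   mean = {statistics.mean(small):.3f}, sd = {sd_small:.3f}")
print(f"n = 2500: mean = {statistics.mean(large):.3f}, sd = {sd_large:.3f}")
```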
Talk about asymptotic normality
- Consequence of CLT
- √n (β hat - β) => N(0, σ^2), where σ^2 is the asymptotic variance
Talk about asymptotic variance of beta / se(B hat)
w^2 = σu^2 / Var (X) (write as sum)
- OLS is more precise the larger Var (X) and the smaller σu^2 (better fit - can get by adding more regressors)
- also valid for IV: a good instrument explains a large share of Var(X)
Talk about imperfect multicollinearity
- high correlation of X with the other regressors, so Var(X̄), the variation in X not explained by them, is very small
- β is estimated imprecisely (large standard error)
Hypothesis testing steps
1) state null and alternative
2) get t stat
3) Under the null t -> N(0,1)
4) Decision rule
5) Outcome
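The five steps can be sketched as follows for a two-sided test at the 5% level. The estimate, standard error and null value are hypothetical numbers chosen for illustration, and the p-value uses the large-sample N(0,1) approximation.

```python
import math

# Hypothetical inputs (not from the cards).
beta_hat = 0.52   # estimated coefficient
se = 0.20         # its standard error
beta_null = 0.0   # Step 1: H0: beta = 0 against H1: beta != 0

# Step 2: t statistic.
t_stat = (beta_hat - beta_null) / se

# Step 3: under H0, t is approximately N(0,1) in large samples,
# so the two-sided p-value is 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))

# Step 4: decision rule at the 5% level (two-sided critical value 1.96).
critical_value = 1.96
reject = abs(t_stat) > critical_value

# Step 5: outcome.
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```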
Talk about one-sided tests
- only makes sense if there’s an a priori reason (e.g. from economic theory) to exclude the other direction from consideration
- has more power to detect departures from the null in the positive direction but no power to detect departures in the negative direction
P-value definition and usefulness
- prob, under the null, of obtaining a value of t at least as adverse to the null as the one computed.
- summarising the weight of evidence against the null
Confidence interval interpretation
The collection of null hypothesised values for β that would not be rejected (by a 2-sided t test) at significance level α
- e.g. the set of null hypotheses that I couldn’t reject with a test at the 1% significance level (a 99% confidence interval)
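A sketch of the "inverted test" reading of a confidence interval, using a 95% interval (5% two-sided test) and hypothetical values for the estimate and standard error. Null values inside the interval are not rejected, and values outside it are.

```python
# Hypothetical inputs (not from the cards).
beta_hat = 0.52
se = 0.20
critical_value = 1.96  # two-sided 5% critical value from N(0,1)

# 95% confidence interval.
lower = beta_hat - critical_value * se
upper = beta_hat + critical_value * se


def rejects(beta_null):
    """Two-sided t test of H0: beta = beta_null at the 5% level."""
    return abs((beta_hat - beta_null) / se) > critical_value


print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
for candidate in (0.0, 0.2, 0.5, 0.9, 1.0):
    print(f"H0: beta = {candidate}: rejected = {rejects(candidate)}")
```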
Polynomial in regression vs linear
- polynomial: can look at the marginal effect of X on Y at each value of X - differentiate
- Linear: averages out these different marginal effects
- coefficient on x1x2 is the effect of a one-unit inc in x1 and x2 above and beyond the sum of the effects of a unit inc in each of them alone
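A minimal sketch of the "differentiate" point for a quadratic specification Y = b0 + b1*X + b2*X^2. The coefficient values are hypothetical; the point is only that the marginal effect b1 + 2*b2*X changes with X.

```python
# Hypothetical fitted coefficients for Y = b0 + b1*X + b2*X^2.
b0, b1, b2 = 1.0, 0.8, -0.05


def predicted_y(x):
    return b0 + b1 * x + b2 * x ** 2


def marginal_effect(x):
    """dY/dX for the quadratic specification: b1 + 2*b2*x."""
    return b1 + 2.0 * b2 * x


for x_value in (2.0, 5.0, 10.0):
    print(f"X = {x_value}: marginal effect = {marginal_effect(x_value):.2f}")
```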
Causes of endogeneity
1) omitted variable bias
2) measurement error
3) simultaneity
OVB formula and usefulness
β’ = β + γ Cov(X1, X2) / Var(X1), where γ is the coefficient on the omitted variable X2
- assess likely direction of the bias
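A numerical sketch of the formula on simulated data, assuming a made-up model Y = 1.0*X1 + 2.0*X2 + e with X2 positively correlated with X1. In the sample, the short-regression slope equals the long-regression slope plus γ hat times Cov(X1, X2)/Var(X1), so here the bias is upward.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility
n = 5000

# Assumed DGP: Y = 1.0*X1 + 2.0*X2 + e, with X2 = 0.5*X1 + noise.
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x1]
y = [1.0 * a + 2.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)

# Long regression (Y on X1 and X2): solve the 2x2 normal equations.
det = s11 * s22 - s12 ** 2
beta_long = (s22 * s1y - s12 * s2y) / det
gamma_long = (s11 * s2y - s12 * s1y) / det

# Short regression (Y on X1 only), which omits X2.
beta_short = s1y / s11

# OVB formula: short slope = long slope + gamma * Cov(X1, X2) / Var(X1).
beta_from_formula = beta_long + gamma_long * s12 / s11

print(f"long: beta = {beta_long:.3f}, gamma = {gamma_long:.3f}")
print(f"short: beta' = {beta_short:.3f}, formula gives {beta_from_formula:.3f}")
```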
Impact of measurement error in Y
- inference on β is still valid (if the measurement error is uncorrelated with X), but the estimate of β is less precise
Example for IV for demand elasticity of cigarettes (why a good one)
General sales tax:
- cov(t,p) not equal zero (relevance)
- cov(t,u) = 0 (exogeneity; assume not state specific)
Solutions to bad controls
1) Find an instrument for education and estimate model via 2SLS
2) omit from regression
- Interpretation: ‘total effect’ of labour market discrimination inclusive of its effects on educational attainment
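Why 2SLS is less efficient than OLS
- Var(X) = Var(X* + v) = Var(X*) + Var(v) > Var(X*), where X* is the part of X explained by the instrument and v is the first-stage residual
- 2SLS only uses the part of X explained by the instrument, so it is less precise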
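A sketch of 2SLS with one instrument on simulated data, assuming a made-up model in which X is endogenous (the data-generating process and all names are illustrative). It shows the variance decomposition of X into the first-stage fitted part and the residual, and that 2SLS only uses the smaller fitted variation.

```python
import random

rng = random.Random(4)  # fixed seed for reproducibility
n = 20000

# Assumed DGP: X = Z + v, u = 0.8*v + noise, Y = 1.0*X + u.
# X is endogenous because v enters both X and u; Z is a valid instrument.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [0.8 * vi + rng.gauss(0.0, 0.6) for vi in v]
x = [zi + vi for zi, vi in zip(z, v)]
y = [1.0 * xi + ui for xi, ui in zip(x, u)]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# First stage: regress X on Z, keep fitted values and residuals.
pi_hat = cov(z, x) / cov(z, z)
x_bar, z_bar = mean(x), mean(z)
x_star = [x_bar + pi_hat * (zi - z_bar) for zi in z]
v_hat = [xi - xs for xi, xs in zip(x, x_star)]

# Second stage: regress Y on the fitted values.
beta_2sls = cov(x_star, y) / cov(x_star, x_star)
beta_iv_ratio = cov(z, y) / cov(z, x)   # same estimator written as a ratio
beta_ols = cov(x, y) / cov(x, x)        # inconsistent here

# In-sample decomposition: Var(X) = Var(fitted) + Var(residual).
var_x = cov(x, x)
var_x_star = cov(x_star, x_star)
var_v_hat = cov(v_hat, v_hat)

print(f"OLS = {beta_ols:.3f}, 2SLS = {beta_2sls:.3f} (true slope 1.0)")
print(f"Var(X) = {var_x:.3f} = {var_x_star:.3f} + {var_v_hat:.3f}")
```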
Tension in choosing instruments
- Want Z to be highly correlated with X: increases Var(X*)
- But also need Z to be exogenous: Cov(Z,u) = 0
Test for relevance
First-stage F statistic: F > c = 10 (rule of thumb)
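A sketch of the relevance check with a single instrument, using the homoskedasticity-only first-stage F statistic, which with one instrument is F = R^2 (n-2) / (1 - R^2). The simulated first stages and the helper name first_stage_f are assumptions for illustration.

```python
import random

rng = random.Random(5)  # fixed seed for reproducibility
n = 1000


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def first_stage_f(strength):
    """Homoskedasticity-only F for the first stage X = strength * Z + v (one instrument)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [strength * zi + rng.gauss(0.0, 1.0) for zi in z]
    r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
    return r2 / (1.0 - r2) * (n - 2)


f_strong = first_stage_f(0.5)    # instrument strongly related to X
f_weak = first_stage_f(0.02)     # instrument barely related to X

for label, f_stat in (("strong", f_strong), ("weak", f_weak)):
    print(f"{label} instrument: F = {f_stat:.1f}, passes F > 10: {f_stat > 10}")
```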
Test for exogeneity
- Descriptive exogeneity
Can’t test for one Z
- Test for more than one Z:
H0: cov(z1, u) = ... = cov(zm, u) = 0
F test: F -> F(m-1, infinity) under H0
Z correlated with other unobserved determinants of Y