Midterm Flashcards

Question

What drives the variance of the OLS slope estimate? What makes it more precise?

Answer 1

The lower the variance of the errors, the greater the variation in the independent variable, or the greater the sample size (relatedly, because the total sample variation in x increases with sample size), the more precise the OLS estimates, on average.

Answer 2

Precision (efficiency) of the estimate beta1-hat: se-hat(beta1-hat) is lower (i.e., the estimate is more precise) when: the residuals are small; the variation of the independent variable is large; the number of observations is large.
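
A minimal sketch of the bivariate case, assuming a simple regression of y on one x with an intercept: se-hat(beta1-hat) = sqrt(sigma-hat^2 / SST_x), where sigma-hat^2 = SSR/(n - 2) and SST_x is the total sample variation in x. The function name and the small data set are made up for illustration. Feeding in the same points twice doubles SST_x and lowers the standard error.

```python
import math

def ols_bivariate(x, y):
    """Bivariate OLS of y on x with an intercept.

    Returns (b0, b1, se_b1), where se_b1 = sqrt(sigma2_hat / SST_x)
    and sigma2_hat = SSR / (n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)  # total sample variation in x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)  # estimated error variance
    return b0, b1, math.sqrt(sigma2_hat / sst_x)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(ols_bivariate(x, y))          # b1 = 0.6, se about 0.283
print(ols_bivariate(x * 2, y * 2))  # same points twice: se about 0.173
```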

Answer 3

1) Controlling for other factors: even if you are primarily interested in estimating one parameter, including other regressors controls for potentially confounding factors, so the zero conditional mean assumption is more likely to hold.
2) Better predictions: more independent variables can explain more of the variation in y, meaning a potentially higher R^2.
3) Estimating non-linear relationships: by including higher-order terms of a variable, we can allow for a more flexible, non-linear functional form between the dependent variable and an independent variable of interest.
4) Testing joint hypotheses on parameters: we can test whether multiple independent variables are jointly statistically significant.

Answer 4

OLS minimizes the sum of squared residuals (SSR): it picks the combination of all the betas that gives the lowest SSR.
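
A quick numerical check of this idea, assuming a bivariate model and made-up data: the closed-form OLS estimates give a lower SSR than nearby pairs of coefficients. The helper names ssr and ols_fit are hypothetical.

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def ols_fit(x, y):
    """Closed-form bivariate OLS estimates (the SSR minimizers)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
best = ssr(b0, b1, x, y)
# Nudging either coefficient away from the OLS solution raises the SSR.
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert ssr(b0 + d0, b1 + d1, x, y) > best
print(b0, b1, best)  # about 2.2, 0.6, 2.4
```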

Answer 5

Regress x3 on all the other regressors and obtain the residuals e-hat. e-hat contains the variation in x3 not explained by the other regressors from the initial population model, which effectively holds all else constant. Then conduct a bivariate regression of y on e-hat; its slope equals beta3-hat from the full multiple regression.
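
A sketch of this partialling-out result (the Frisch-Waugh-Lovell theorem) with two regressors, so the coefficient on x2 plays the role of beta3-hat above. It assumes hypothetical data and compares the partialling-out slope with the coefficient from the two-regressor closed-form OLS solution.

```python
def centered_cross(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    return centered_cross(x, y) / centered_cross(x, x)

def multiple_ols_b2(x1, x2, y):
    """Coefficient on x2 from the OLS regression of y on x1 and x2."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)

def partialling_out_b2(x1, x2, y):
    """Step 1: regress x2 on x1 and keep the residuals r.
    Step 2: bivariate regression of y on r."""
    c = slope(x1, x2)
    a = sum(x2) / len(x2) - c * sum(x1) / len(x1)
    r = [x2i - a - c * x1i for x1i, x2i in zip(x1, x2)]
    return slope(r, y)

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
noise = [0.1, -0.2, 0.0, 0.3, -0.1, -0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
print(multiple_ols_b2(x1, x2, y), partialling_out_b2(x1, x2, y))  # equal up to rounding
```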

Answer 6

The partial association between y and xk, holding x1, x2, ..., xk-1 constant. beta-k-hat is the slope of the multidimensional plane along the xk direction (i.e., the expected change in y when xk increases by one unit, holding all other x's constant).

Answer 7

A high R^2 does not mean that any of the regressors is a true cause of the dependent variable. It also does not mean that any of the coefficients are unbiased.

Answer 8

By including higher-order terms of an independent variable, we can allow for a non-linear, or "more flexible," functional form between the dependent variable and an explanatory factor. Include x^2 as a regressor (y = beta0 + beta1*x + beta2*x^2 + u) and take the partial derivative: dy/dx = beta1 + 2*beta2*x. This gives the total effect of x in two parts (linear and non-linear). If both terms are substantively and significantly different from 0, the sign and magnitude of the effect on wages can vary as x changes, i.e., a non-linear relationship (the "marginal return to x"). There is no ceteris paribus interpretation of the individual parameters here; we must choose a given level of x and then describe the trade-off: "For an individual with ten years of experience, accumulating an additional year of experience is expected to increase his/her hourly wage by $0.18."
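
A sketch of the marginal effect dy/dx = beta1 + 2*beta2*x and of the turning point x* = -beta1/(2*beta2) for a quadratic specification. The coefficient values are hypothetical, chosen so that the marginal effect at ten years of experience comes out near $0.18.

```python
def marginal_effect(b1, b2, x):
    """dy/dx for y = b0 + b1*x + b2*x**2 + u, evaluated at a given x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Value of x at which the marginal effect changes sign: -b1 / (2*b2)."""
    return -b1 / (2 * b2)

# Hypothetical estimates for hourly wage on experience (illustration only).
b1_hat, b2_hat = 0.298, -0.0061
print(marginal_effect(b1_hat, b2_hat, 10))  # about 0.176, i.e. roughly $0.18 per hour
print(turning_point(b1_hat, b2_hat))        # about 24.4 years
```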

Answer 9

When Assumptions 1-4 hold. In practice, however, the zero conditional mean (ZCM) assumption usually fails, although it is more likely to hold with multiple regression.

Answer 10

R-j^2 is the R^2 from regressing x-j on all of the other independent variables. A high R-j^2 is often the result of multicollinearity: high (but not perfect) correlation between regressors leads to imprecise estimates.
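
A sketch of how R-j^2 feeds into precision, assuming the usual formula Var(beta-j-hat) = sigma^2 / [SST_j (1 - R-j^2)], so 1/(1 - R-j^2) (often called the variance inflation factor) measures how much multicollinearity inflates the variance. For simplicity the sketch assumes only one other regressor, so R-j^2 comes from a bivariate regression; the data are made up.

```python
def r_squared_bivariate(x, z):
    """R^2 from regressing x on a single other regressor z (with intercept)."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_xz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_zz = sum((b - z_bar) ** 2 for b in z)
    return s_xz ** 2 / (s_xx * s_zz)

def variance_inflation(r2_j):
    """Factor 1 / (1 - R_j^2) by which Var(beta_j-hat) is scaled up."""
    return 1.0 / (1.0 - r2_j)

# Highly correlated regressors: variance inflated about 37-fold.
print(variance_inflation(r_squared_bivariate([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])))
# Uncorrelated regressors: no inflation (1.0).
print(variance_inflation(r_squared_bivariate([1, 2, 3, 4], [1, -1, -1, 1])))
```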

Answer 11

Two countervailing channels: (1) adding x3 will almost certainly reduce sigma-hat^2 (smaller squared residuals), which makes the estimate more precise; the size of this reduction depends on the extent to which x3 predicts y. (2) By adding x3, we also introduce some correlation, and perhaps multicollinearity, between x1 and x3 and/or x2 and x3, which works against a more precise estimate (it raises var-hat(beta-1-hat)); this depends on how correlated the regressors are.

Answer 12

When Assumptions 1-5 hold, the beta-hats are the best linear unbiased estimators (BLUE).

Answer 13

Whether the regressors jointly and significantly predict the variation in the squared residuals. H0: the error variance does not depend on the regressors, i.e., gamma1 = gamma2 = ... = gammak = 0 (the slope coefficients from the regression of u-hat^2 on the regressors).

Answer 14

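The White test is harder to pass because it tests for linear and non-linear relationships between u^2 and all x-j (their levels, squares, and interactions). You are therefore more likely to reject the null with the White test, so a rejection with the B-P test will usually, though not always, be accompanied by a rejection with the White test (the extra regressors in the White test also use up degrees of freedom).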

Answer 15

The product of beta-2-hat and gamma-1-hat, where beta-2-hat is the estimated slope coefficient for x2 in the true (long) model and gamma-1-hat is the slope coefficient from a bivariate regression of x2 on x1. Hence, if gamma-1-hat does not equal 0, there is a partial association between x2 and x1.

Answer 16

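Sign of the bias in the short-regression estimate of beta-1:
                   corr(x1, x2) > 0    corr(x1, x2) < 0
beta-2 > 0         positive bias       negative bias
beta-2 < 0         negative bias       positive bias
Positive bias: the estimate of beta-1 overestimates the true beta-1 (negative bias: it underestimates it). An omitted variable will usually cause a bias problem for all coefficients, not just beta-1.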

Answer 17

Bias is worse if you are making causal inferences (this is generally more important). If you are making predictive inferences, imprecision (a higher standard error) is worse.

Answer 18

Omitted variable bias (OVB) = (beta-hat on the newly included regressor) * (gamma-hat from regressing the new regressor on the original, biased variable, i.e., "regress new on old"). Equivalently: OVB = beta-hat-educ (without IQ in the model) - beta-hat-educ (after you include IQ as a regressor).

Answer 19

sigma^2 = Var(u) = E(u^2): the average of the (unobserved) squared errors. We estimate it with sigma-hat^2 = SSR/(n - k - 1): the average of the (observed) squared residuals, adjusted for the k + 1 estimated parameters.

Answer 20

Under Assumptions 1-5, the error term u, conditional on x, has a mean of 0 and a variance of sigma^2, and beta-hat has a mean of beta and a variance of Var(beta-hat).

Answer 21

Assumption 6: the error term is normally distributed. Then beta-hat ~ Normal(beta, Var(beta-hat)), i.e., beta-hat has a normal sampling distribution for any sample size, even in small samples. Standardized with its estimated standard error, (beta-hat - beta)/se-hat(beta-hat) is a t-distributed random variable, so we can use the t-statistic to evaluate hypotheses. If we had a small sample, we would additionally require the error term to be normally distributed. However, by virtue of the large size of the sample studied in this exercise, we can conclude that the OLS estimators have an approximately normal sampling distribution even if the error term is not normal. This follows from the Central Limit Theorem.

Answer 22

The probability that H0 is mistakenly rejected due to sampling error (a Type I error): the probability of obtaining a statistically significant association by chance when, in reality, there is no association.

Answer 23

NO, this is considered to bias inferences and goes against the spirit of hypothesis testing

Answer 24

"The association between x and y is (positive/negative) and statistically significant at the 5% level, ceteris paribus," or "We reject the null hypothesis at the 5% level" (after stating H0).

Answer 25

Economic significance tells us about the magnitude of the coefficient relative to the sample mean of the dependent variable. Ex.: if the coefficient of years of education on annual wages is $5,000 and the average wage is $30,000, we would say that the effect represents $5,000/$30,000 = 16.7% of the average wage.

Answer 26

"Under the null hypothesis, the probability of obtaining a result at least as extreme as ours by chance is 2%."

Answer 27

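At 95% confidence: beta-hat +/- c * se-hat(beta-hat), where c is the 97.5th percentile of the t distribution with n - k - 1 degrees of freedom (about 1.96 in large samples).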
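
A minimal sketch, assuming a large sample so that the standard normal quantile (about 1.96) stands in for the t critical value; with few degrees of freedom you would use the t(n - k - 1) quantile instead. The estimate and standard error in the example are hypothetical.

```python
from statistics import NormalDist

def conf_int(beta_hat, se_hat, level=0.95):
    """beta_hat +/- c * se_hat, with c the (1 + level)/2 quantile.

    Uses the standard normal quantile as a large-sample stand-in for the
    t(n - k - 1) critical value."""
    c = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for level=0.95
    return beta_hat - c * se_hat, beta_hat + c * se_hat

print(conf_int(0.6, 0.1))  # about (0.404, 0.796)
```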

Answer 28

If random samples were obtained over and over again and a CI were calculated for each of them, then the (unknown) beta would lie in the respective CIs of 95% of these samples. We often work with only one sample, so we do not know for sure whether our (random) sample is one of the 95% of samples where the interval estimate contains beta.
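
A small simulation of this repeated-sampling interpretation, under assumed settings (true beta0 = 1, beta1 = 2, standard normal x and errors, n = 50, normal critical value). The share of intervals that contain the true beta1 should come out near 95%. Everything here is illustrative.

```python
import math
import random
from statistics import NormalDist

def coverage(beta0=1.0, beta1=2.0, n=50, reps=1000, seed=0):
    """Share of simulated samples whose 95% CI for the slope contains beta1."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(0.975)  # large-sample critical value
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
        x_bar, y_bar = sum(x) / n, sum(y) / n
        sst_x = sum((xi - x_bar) ** 2 for xi in x)
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
        b0 = y_bar - b1 * x_bar
        ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(ssr / (n - 2) / sst_x)
        if b1 - c * se <= beta1 <= b1 + c * se:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95; the exact value depends on the seed
```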

Answer 29

"the regressors are jointly statistically significant at the alpha level"

Answer 30

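beta1-tilde = beta1-hat + beta2-hat * gamma1-hat, where x1 = included variable, x2 = omitted variable, beta1-tilde is the slope from the short regression of y on x1 alone, beta1-hat and beta2-hat come from the long regression of y on x1 and x2, and gamma1-hat is the slope from regressing x2 on x1 (the partial association between x1 and x2).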
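
A numerical check of this formula on made-up data: in any sample, the short-regression slope equals beta1-hat + beta2-hat * gamma1-hat exactly (up to rounding). Here x2 is positively related to x1 and beta2-hat > 0, so the short regression overstates the association of x1 with y.

```python
def cc(a, b):
    """Centered cross-product sum: sum of (a_i - mean(a)) * (b_i - mean(b))."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def long_regression(x1, x2, y):
    """(beta1_hat, beta2_hat) from OLS of y on x1 and x2 (with intercept)."""
    s11, s22, s12 = cc(x1, x1), cc(x2, x2), cc(x1, x2)
    s1y, s2y = cc(x1, y), cc(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]                  # "omitted" variable, positively related to x1
y = [9.1, 7.8, 19.0, 18.3, 28.9, 27.9]   # made-up: about 1 + 2*x1 + 3*x2 + noise

beta1_tilde = cc(x1, y) / cc(x1, x1)     # short regression: y on x1 only
beta1_hat, beta2_hat = long_regression(x1, x2, y)
gamma1_hat = cc(x1, x2) / cc(x1, x1)     # regression of x2 on x1
print(beta1_tilde, beta1_hat + beta2_hat * gamma1_hat)  # equal up to rounding
```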

Answer 31

Normality of the error term (Assumption 6). It is unlikely to always hold, so we relax this assumption in large samples: if the sample size is sufficiently large, you don't need Assumption 6 to run hypothesis tests.

Answer 32

Consistency describes how the distribution of an estimator changes as the number of sample observations from a population increases. It is the property in which the distribution of an estimator "converges," or becomes more concentrated, around the true population parameter as the sample size increases, even if the OLS estimator is biased. As the sample size increases, E(beta-hat) becomes increasingly close to beta, and the distribution of beta-hat becomes increasingly narrow. When the sample size goes to infinity, the distribution of the estimator collapses to a single point: the true beta.

Answer 33

It is theoretically possible to have an unbiased but inconsistent estimator. More commonly, we have a biased but consistent estimator: as the sample size increases, the mean of the distribution of beta-hat converges on beta. Law of Large Numbers: it can be shown that if Assumptions 1-4 hold, beta-hat is not only an unbiased estimator of beta but also a consistent one.

Answer 34

that the variance of the error term systematically varies with, or depends on the explanatory variables, or any combination or function thereof

Answer 35

Evaluates whether u^2 is linearly associated with (x1, x2, ..., xk):
1) Obtain the OLS residuals u-hat from the original regression.
2) Compute u-hat^2.
3) Regress u-hat^2 on (x1, x2, ..., xk): u-hat^2 = gamma0 + gamma1*x1 + gamma2*x2 + ... + gammak*xk + v, and get the R^2 from this regression.
4) Conduct an F-test for the joint significance of gamma1, gamma2, ..., gammak.
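
A minimal sketch of steps 2-4 with a single regressor (k = 1), assuming the residuals from step 1 are already in hand; the residual values below are hypothetical. It returns the R^2 of the auxiliary regression, the LM statistic n * R^2 (compared with a chi-square(k) critical value), and the F statistic (R^2/k) / [(1 - R^2)/(n - k - 1)].

```python
def breusch_pagan_one_regressor(x, resid):
    """Steps 2-4 of the BP test with a single regressor x.

    resid: OLS residuals from the original regression (step 1, done elsewhere).
    Returns (R^2 of u-hat^2 on x, LM = n * R^2, F statistic)."""
    n, k = len(x), 1
    u2 = [u ** 2 for u in resid]                      # step 2: squared residuals
    x_bar, u2_bar = sum(x) / n, sum(u2) / n
    s_xu = sum((a - x_bar) * (b - u2_bar) for a, b in zip(x, u2))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_uu = sum((b - u2_bar) ** 2 for b in u2)
    r2 = s_xu ** 2 / (s_xx * s_uu)                    # step 3: R^2 of u-hat^2 on x
    f = (r2 / k) / ((1 - r2) / (n - k - 1))           # step 4: F test of gamma1 = 0
    return r2, n * r2, f

x = [1, 2, 3, 4, 5, 6]
resid = [0.1, -0.3, 0.5, -0.9, 1.2, -1.6]  # hypothetical: spread grows with x
print(breusch_pagan_one_regressor(x, resid))  # R^2 about 0.88: signs of heteroskedasticity
```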

Answer 36

whether u^2 is systematically and jointly related to the regressors, their squares, and their interactions

Answer 37

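It can use up a lot of degrees of freedom when we have many independent variables. Shortcut (special form of the White test): regress u-hat^2 on the fitted values and their squares, u-hat^2 = gamma0 + gamma1*y-hat + gamma2*y-hat^2 + v, and test H0: gamma1 = gamma2 = 0.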

Answer 38

It is very often the case that the variance of the error term u depends on the regressors, but we don't know exactly what form the heteroskedasticity takes. We remedy this situation with heteroskedasticity-robust standard errors.

Answer 39

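Standard errors change, so test statistics change. Coefficient estimates will not change, and R^2 will not change. Don't use them if the errors are homoskedastic: the usual standard errors are then valid and, in small samples, preferable.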
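
A sketch for the bivariate case contrasting the usual standard error with the heteroskedasticity-robust (White/HC0) version, Var-hat(beta1-hat) = sum[(x_i - x-bar)^2 * u-hat_i^2] / SST_x^2. The data are made up; statistical packages often apply small-sample corrections to this formula, which the sketch omits.

```python
import math

def conventional_and_robust_se(x, y):
    """Slope, usual SE, and heteroskedasticity-robust (HC0) SE for OLS of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals
    se_usual = math.sqrt(sum(ui ** 2 for ui in u) / (n - 2) / sst_x)
    # HC0: sum of (x_i - x_bar)^2 * u_i^2, divided by SST_x^2
    se_robust = math.sqrt(sum(d ** 2 * ui ** 2 for d, ui in zip(dx, u)) / sst_x ** 2)
    return b1, se_usual, se_robust

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.7, 6.5, 7.1, 11.2, 10.4]
print(conventional_and_robust_se(x, y))  # same slope either way, different SEs
```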

Answer 40

the coefficient of educ represents the association between cigs and years of educ after partialling out the shared variation between educ and the other regressors. We can now identify how wages vary with education while holding the other covariates constant. This feature of multivariate regression allows us to adopt a ceteris paribus interpretation of educ.

Answer 41

Homoskedasticity is the assumption that the variance of the error term is independent of the model regressors. The Breusch-Pagan test evaluates the credibility of this assumption by evaluating the relationship between the squared residuals (in lieu of the squared errors, which we do not observe with sample data) and the model regressors. If the model regressors are strong joint predictors of the squared residuals, this would cast doubt on the assumption that squared errors are independent of the regressors.

Answer 42

The unbiasedness of OLS does not depend on the credibility of the homoskedasticity assumption. Consequently, if we had used heteroskedasticity-robust standard errors, the estimated coefficients would not have been different. However, the robust standard error estimates clearly would have differed.

Answer 43

γ11 measures the ... difference between the return to education in the West and that in the Northeast; γ1 measures the return to education in the Northeast only.

Answer 44

Limited dependent variables (LDVs) are restricted variables: they can take on only a few values (e.g., a few integers), have a restricted range (e.g., non-negative values), or are binary outcomes.

Answer 45

1) LPM predicts probabilities less than 0 (0%) and greater than 1 (100%).
2) LPM is not particularly good at “fitting” binary outcomes.
• LPM assumes that parents’ education will have the same effect at high levels (when a child is already likely to be placed in a good school) as at moderate levels → LPM doesn’t “bend” to fit the data.
3) LPM is heteroskedastic.
• This can be seen in the distribution of the residuals around the regression line.
• In fact, by construction, LPM always violates the homoskedasticity assumption.
• Unadjusted standard errors will be biased and, consequently, statistical inference will not be valid.
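
A sketch of why the LPM cannot be homoskedastic: for a binary y, Var(y | x) = p(x)(1 - p(x)), so the error variance changes with x whenever the predicted probability does. The LPM coefficients below are hypothetical.

```python
def lpm_prediction(b0, b1, x):
    """Predicted probability from a linear probability model: b0 + b1*x."""
    return b0 + b1 * x

def lpm_error_variance(p):
    """Var(u | x) for a binary outcome: p(x) * (1 - p(x))."""
    return p * (1 - p)

b0, b1 = 0.1, 0.15   # hypothetical LPM estimates
for x in [0, 3, 7]:
    p = lpm_prediction(b0, b1, x)
    print(x, p, lpm_error_variance(p))
# The variance differs across x (0.09 at x = 0, about 0.25 at x = 3).
# At x = 7 the "probability" is about 1.15 (> 1) and the implied variance is negative,
# another sign that the prediction is invalid there.
```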

Answer 46

Transparent and easy to interpret.
1) LPM predicts probabilities less than 0 (0%) and greater than 1 (100%).
• As long as Assumptions 1 through 4 hold, the estimators are still unbiased.
• While predicted probabilities less than 0 and greater than 1 are clearly problematic, LPM works well for values close to the average.
2) LPM is not particularly good at “fitting” binary outcomes.
• You can always model non-linear associations by including higher-order terms or specifying regressors as logs.
3) LPM is heteroskedastic.
• Heteroskedasticity-robust standard errors are one way of retrieving valid standard errors.

Answer 47

The problem is that if the particular specification you estimate does not capture the appropriate functional form for the relationship in which you’re interested, the zero conditional mean assumption will be violated

Answer 48

Adjusted R^2 (also written as R-bar^2)
Davidson-MacKinnon test
F-test for evaluating nested models
Regression Specification Error Test (RESET)

Answer 49

A more rigorous way of evaluating the predictive power of a particular specification: it penalizes the inclusion of additional variables in a model.
Adj R^2 = 1 - [SSR/(n - k - 1)] / [SST/(n - 1)]
In small samples, the adjusted R^2 allows us to compare the predictive power of different specifications. However, the adjusted R^2 is not particularly useful in large samples:
◦ The difference between R^2 and the adjusted R^2 is indistinguishable when n is very large.
Also note that the adjusted R^2 does not help us all that much in conducting hypothesis tests, particularly joint significance tests:
◦ The F-test of joint significance uses the R^2, not the adjusted R^2.
Only use the adjusted R^2 for comparing nested models and non-nested models that have the same outcome variable.
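
A sketch of the adjusted R^2 formula, with k the number of slope regressors; the SSR, SST, and n values are hypothetical. With the same fit, the penalty is large when n is small and nearly vanishes when n is very large.

```python
def r_squared(ssr, sst):
    """Ordinary R^2 = 1 - SSR/SST."""
    return 1 - ssr / sst

def adjusted_r_squared(ssr, sst, n, k):
    """1 - [SSR/(n - k - 1)] / [SST/(n - 1)]; k = number of slope regressors."""
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

print(r_squared(40, 100), adjusted_r_squared(40, 100, n=21, k=2))     # 0.6, about 0.556
print(r_squared(40, 100), adjusted_r_squared(40, 100, n=20001, k=2))  # 0.6, about 0.59996
```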

Answer 50

The Davidson-MacKinnon test can be used to evaluate nested and non-nested models with the same dependent variable. The intuition behind the Davidson-MacKinnon test: if a particular specification is appropriate, then the fitted values of some alternative specification should not be significant predictors of the outcome variable.

Answer 51

(1) y = β0 + β1x1 + β2x2 + u
(2) y = β0 + β1log(x1) + β2log(x2) + u
Estimate Equation (2), the alternative model, and compute its fitted values ŷ(2).
Estimate Equation (1) and add ŷ(2) as a regressor: y = β0 + β1x1 + β2x2 + δŷ(2) + u
Our null hypothesis is H0: δ = 0. Now conduct a t-test on δ.
• If we fail to reject H0 → there is evidence that Equation (1) is correctly specified.
• If we reject H0 → there is evidence that Equation (1) is misspecified.
There is a problem with this test, however: when we run it in both directions (also adding the fitted values of Equation (1) to Equation (2)), we might reject both tests or fail to reject both. In that case, referring to the adjusted R^2 can be a good idea.

Answer 52

Regression Specification Error Test (RESET): to evaluate the specification of any two nested or non-nested models, we can also implement the RESET. Suppose we have a standard population model y = β0 + β1x1 + β2x2 + ... + βkxk + u. The intuition behind RESET is that if a model is properly specified, then non-linear functions of the regressors (i.e., higher-order and interaction terms) should not be statistically significant predictors of the dependent variable.
Limitation of RESET: no clear guideline on how to proceed if we reject H0.
Advantage of RESET: we can conduct RESETs for a set of non-nested models and keep whichever model does not reject H0.

Answer 53

We can therefore conduct the RESET by:
1) Generating ŷ^2 and ŷ^3 from the model we want to evaluate, and then plugging them back into the model: y = β0 + β1x1 + β2x2 + ... + βkxk + δ1ŷ^2 + δ2ŷ^3 + u
2) Conducting an F-test with the null hypothesis H0: δ1 = δ2 = 0 (H1: H0 does not hold).
• If we fail to reject H0 → our original model captured all non-linear relationships between the dependent and independent variables, and our model is therefore correctly specified.
• If we reject H0 → there are non-linearities that we haven’t accounted for, and our model is therefore misspecified.
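
A sketch of the F statistic for the RESET computed from R^2 values, F = [(R^2_ur - R^2_r)/q] / [(1 - R^2_ur)/(n - k_ur - 1)], where q = 2 (for δ1 and δ2) and the unrestricted model has k + 2 regressors. The R^2 values and sample size are hypothetical; the result would be compared with an F(q, n - k_ur - 1) critical value.

```python
def f_stat_from_r2(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k_ur - 1)].

    For RESET: q = 2 and k_unrestricted = k + 2 (original regressors plus
    y-hat^2 and y-hat^3). Both models must share the same dependent variable."""
    num = (r2_unrestricted - r2_restricted) / q
    den = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return num / den

# Hypothetical: original model with k = 2 regressors (R^2 = 0.40),
# augmented with y-hat^2 and y-hat^3 (R^2 = 0.50), n = 101 observations.
print(f_stat_from_r2(0.50, 0.40, q=2, n=101, k_unrestricted=4))  # about 9.6
```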

Answer 54

We know that omitted variables are often a threat to the zero conditional mean assumption. But adding more regressors might not always be a good idea: it is possible to have a bad control problem, in which one of your controls is an outcome of another regressor. This can complicate the ceteris paribus interpretation of multiple regression. Ex.: we can't include both alcohol consumption and the alcohol tax in the same model, because the tax directly affects consumption. The ceteris paribus condition breaks down: we can’t hold alcohol consumption constant if taxes change.

Answer 55

• Most variables that we analyze in empirical research have been measured imprecisely for a variety of reasons:
◦ Mis-coding
◦ Inappropriate measures of specific concepts
◦ Limited or uneven capacity to measure social, political, economic phenomena
◦ Variables that represent averages of more complex, granular information
Consequences for OLS (measurement error in the dependent variable): the OLS estimates are less precisely estimated. This is because our error term ε contains both u and the measurement error e0, so that Var(ε) = Var(u) + Var(e0) > Var(u) (assuming u and e0 are uncorrelated). As a result, the error variance is higher when we have measurement error in the dependent variable. The higher the variance of the measurement error → the higher the estimated error variance → higher standard errors for all of the OLS estimates → lower statistical significance for all of the OLS estimates.

Answer 56

Measurement Error in the Independent Variables
* When we use the mismeasured independent variable educ, the ZCM assumption breaks down.
* This situation is called Classical Errors-in-Variables (CEV).
* Because the ZCM assumption is not satisfied, OLS is biased and inconsistent.
* It can be shown that the direction of this bias is predictable.
* Specifically, classical errors-in-variables gives rise to attenuation bias, which means that the estimate βˆj is biased toward 0 (i.e., towards finding a smaller association or effect).
* With CEV, the OLS coefficients are also imprecise.
This is not just a problem for the mismeasured regressor: if the mismeasured variable is correlated with the other regressors, which is very likely, then the coefficients on the other regressors will generally be biased as well.
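
A simulation sketch of attenuation bias, under assumed settings: true slope 2, Var(x*) = 1, classical measurement error with Var(e) = 1, and n = 20,000. Under these assumptions, the OLS slope on the mismeasured x should be near 2 * Var(x*)/(Var(x*) + Var(e)) = 1.

```python
import random

def attenuation_demo(beta1=2.0, var_xstar=1.0, var_e=1.0, n=20000, seed=1):
    """OLS slope of y on x = x* + e, where x* is the true regressor."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0, var_xstar ** 0.5) for _ in range(n)]
    y = [beta1 * xs + rng.gauss(0, 1) for xs in x_star]
    x_obs = [xs + rng.gauss(0, var_e ** 0.5) for xs in x_star]  # classical error e
    x_bar, y_bar = sum(x_obs) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x_obs, y))
    s_xx = sum((a - x_bar) ** 2 for a in x_obs)
    return s_xy / s_xx

b_hat = attenuation_demo()
# Probability limit: beta1 * var_xstar / (var_xstar + var_e) = 2 * 1/2 = 1
print(b_hat)  # near 1.0, not the true 2.0
```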

Answer 57

What can we do about measurement error?
* Data cleaning (i.e., check the data and re-code any identified mistakes)
* Instrumental variables estimation

Answer 58

◦ Then the random sampling assumption still holds ◦ OLS still unbiased and consistent ◦ But (randomly) missing observations reduce the sample size available for a regression → less precise estimates

Answer 59

Suppose that we are interested in identifying the effect of x1 on y, but an omitted variable has resulted in a biased OLS estimate βˆ1. Then an instrumental variable z is a variable that does not show up in Equation 3, but must relate to it in two different ways:
1) The instrumental variable z must be “exogenous” to the outcome y: Cov(z, u) = 0, i.e., the instrument is uncorrelated with the error term (“Instrument Exogeneity Assumption”).
• This condition means that z is “as good as random” with respect to the outcome variable.
• We can't directly test assumption 1: we don’t observe u, so we cannot definitively confirm or deny whether it is uncorrelated with a proposed instrument. Researchers have to make a good argument for why an instrument is exogenous, backed up by quantitative tests of the exogeneity assumption (which we will talk about in the next lecture).
2) The instrumental variable z must be good at predicting variation in the independent variable x1: Cov(z, x1) ≠ 0, i.e., the instrument is correlated with the independent variable for which we want to instrument (education in the previous example) (“Instrument Relevance Assumption”).
• By “correlated,” we mean that the instrument is a substantively and statistically significant predictor of the instrumented variable; the more correlated, the better.
• We can directly test assumption 2: if we regress the instrumented variable x1 on the instrument z (while controlling for the other regressors in the main population model), then we can directly observe the partial association between x1 and z. The direction of the relationship should make sense.
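
A simulation sketch, assuming one endogenous regressor and one instrument, in which case the IV estimator reduces to Cov(z, y)/Cov(z, x). The data-generating process is made up: a hypothetical omitted variable (ability) enters both x and the error term, while z shifts x and is independent of the error.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)
n, beta1 = 20000, 1.0
z = [rng.gauss(0, 1) for _ in range(n)]        # instrument: independent of u
ability = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical omitted variable, part of u
x = [0.8 * zi + ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]    # relevance: Cov(z, x) != 0
y = [beta1 * xi + ai + rng.gauss(0, 1) for xi, ai in zip(x, ability)]  # u = ability + noise

b_ols = cov(x, y) / cov(x, x)   # biased upward, since Cov(x, u) > 0
b_iv = cov(z, y) / cov(z, x)    # simple IV estimator
print(b_ols, b_iv)              # OLS well above 1; IV close to the true beta1 = 1
```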

Answer 60

The estimated coefficient βˆnjpost represents the change in the number of employees per restaurant in New Jersey relative to the change in the number of employees in Pennsylvania.
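
A sketch of the difference-in-differences arithmetic, assuming the model includes an NJ dummy, a post dummy, and their interaction, in which case the coefficient on the interaction equals this difference of group means. The employment figures are hypothetical, not the study's.

```python
def diff_in_diff(nj_pre, nj_post, pa_pre, pa_post):
    """(NJ after - NJ before) - (PA after - PA before)."""
    return (nj_post - nj_pre) - (pa_post - pa_pre)

# Hypothetical mean employees per restaurant (illustration only).
print(diff_in_diff(nj_pre=20.0, nj_post=21.0, pa_pre=23.0, pa_post=21.5))  # 2.5
```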

Answer 61

This is because it accounts for pre-existing differences between New Jersey and Pennsylvania that did not change over the course of the policy process, for instance, class composition or geography. In the previous model, we were not able to differentiate between differences in employment between New Jersey and Pennsylvania that were attributable to pre-existing differences and those attributable to the policy change enacted in New Jersey. With the difference-in-differences model, however, we can estimate how employment outcomes changed in New Jersey relative to Pennsylvania over time, giving us a more reliable estimate of the impact of the policy change.

Answer 62

The advantages of the Probit model (and maximum likelihood estimation models in general) relative to LPM are several. First, Probit models always predict probabilities bounded by 0% and 100%, while the LPM can predict probabilities outside of this range. Second, the Probit model automatically captures non-linear effects because it estimates changes in the probabilities implied by (linear) changes in the z-scores under a standard normal probability distribution. Third, the Probit model intrinsically accounts for the heteroskedasticity of a binary outcome, so its estimated standard errors are valid (in large samples, if the model is correctly specified) without a robust adjustment. The disadvantages of the Probit model relative to LPM are that Probit estimates do not have a straightforward interpretation (they correspond with z-scores under a standard normal probability distribution), and that they rely on the assumption of a normal distribution of the error term, an assumption that we do not need to make under OLS when the sample size is asymptotically large. One consideration in favor of the LPM is that, while it does not always predict probabilities bounded by 0% and 100%, it can reliably capture the ceteris paribus effect of a particular regressor on a binary outcome for observations with values close to the average for the regressor of interest.
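
A sketch contrasting Probit and LPM predictions, using Python's statistics.NormalDist for the standard normal CDF and density. The coefficients are hypothetical. The Probit marginal effect phi(b0 + b1*x) * b1 varies with x, while the LPM prediction is linear and can leave [0, 1].

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def probit_prob(b0, b1, x):
    """P(y = 1 | x) = Phi(b0 + b1*x): always between 0 and 1."""
    return PHI.cdf(b0 + b1 * x)

def probit_marginal_effect(b0, b1, x):
    """dP/dx = phi(b0 + b1*x) * b1: depends on the value of x."""
    return PHI.pdf(b0 + b1 * x) * b1

def lpm_prob(a0, a1, x):
    """LPM prediction: linear in x, so it can fall outside [0, 1]."""
    return a0 + a1 * x

# Hypothetical coefficients, for illustration only.
for x in [-4, 0, 4]:
    print(x, probit_prob(-0.2, 0.6, x), probit_marginal_effect(-0.2, 0.6, x),
          lpm_prob(0.45, 0.2, x))
```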

Answer 63

The intuition behind the RESET is that if a model is properly specified, then non-linear functions of the regressors (i.e., higher-order and interaction terms) should not be statistically significant predictors of the dependent variable. We can obtain particular non-linear transformations of the covariates without expending a large number of degrees of freedom by computing and including ŷ^2 and ŷ^3. Under the RESET, the null hypothesis is that the non-linear transformations of the regressors ŷ^2 and ŷ^3 are not systematically associated with the outcome variable. The test is an F-test comparing the R^2 of the restricted and unrestricted models.

Answer 64

Under the test of overidentifying restrictions, the dependent variable is the residual from the IV 2SLS estimation. This residual captures the variation in logged wages (lwage) not explained by the predictors from the second-stage equation, which are exogenous by assumption. Consequently, if the instrumental variables do not jointly and significantly predict the IV 2SLS residuals, after controlling for the exogenous controls from the structural equation, this provides evidence consistent with the instruments being exogenous.

Answer 65

IV 2SLS strategy relies on a portion of the total variation in the endogenous variable that is predicted by the instrumental variable, whereas OLS employs the total variation in the endogenous variable

Answer 66

For libcrd14 to be a valid instrument for educ, it must satisfy the instrument exogeneity and instrument relevance assumptions. For the instrument exogeneity assumption to hold, owning a library card at the age of 14 must be “as if” random to wage outcomes. That is, individuals with different income levels, educational achievements, and other relevant characteristics are equally likely to have held a library card at the age of 14. Yet another way of saying this is that library card ownership should not exhibit any direct effect on wages after controlling for the covariates in the structural equation. For the instrument relevance assumption to hold, possessing a library card at adolescence needs to exhibit a statistically and substantively significant association with subsequent educational achievement. If both of these assumptions hold, it follows that library card possession influences wage outcomes only through its effect on educational attainment.

(90 cards)