Questions from Course Manual Flashcards
How can we use the standard error to infer statistical significance of a coefficient?
The standard error measures how much variability "surrounds" a coefficient estimate. A coefficient is statistically significant when its estimate is large relative to its standard error: the t-statistic (coefficient divided by standard error) must exceed the critical value (roughly 1.96 in absolute value at the 5% level) to reject the hypothesis that the true coefficient is zero.
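As a minimal numeric sketch (the numbers are made up):

```python
# Minimal sketch of the t-test for one coefficient (numbers are illustrative).
beta_hat = 0.42   # estimated coefficient
se = 0.15         # its standard error

t_stat = beta_hat / se   # distance from zero in standard-error units
print(t_stat)            # 2.8 > 1.96 -> significant at the 5% level
```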
How do we need to interpret coefficients when independent variables are dummies?
A dummy variable's coefficient is always interpreted relative to the reference group. For example, in a regression of income on political affiliation dummies, a positive coefficient means that average income is higher for the group coded by the dummy than for the reference group, holding the other regressors constant.
What does controlling for other variables mean? What is the difference with interaction variables?
Controlling for a variable means including it as an additional regressor so that its confounding influence is held constant when estimating the effect of interest.
An interaction variable is constructed as the product of two regressors; it allows the effect of one variable to depend on the level of the other, whereas a control variable only shifts the level of the outcome (see the sketch below).
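A minimal simulated sketch in Python (statsmodels), with hypothetical variable names, where `educ` is a control and `female:educ` an interaction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data; 'female' is a dummy, 'educ' is years of education (illustrative names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"female": rng.integers(0, 2, n),
                   "educ": rng.uniform(8, 20, n)})
df["wage"] = 5 + 1.2 * df["educ"] - 2 * df["female"] \
             - 0.3 * df["female"] * df["educ"] + rng.normal(0, 2, n)

# 'educ' controls for education; 'female:educ' lets the education effect differ by group.
res = smf.ols("wage ~ educ + female + female:educ", data=df).fit()
print(res.summary())
```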
What does the R2 measure and mean?
R-squared measures the fraction of the variance of the dependent variable that is explained by the regressors; it ranges from 0 (no explanatory power) to 1 (perfect fit).
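A minimal sketch of the computation, with made-up observed and fitted values:

```python
import numpy as np

# Minimal sketch: R^2 = 1 - SSR/SST, the share of Var(y) explained by the fit.
y = np.array([3.0, 5.0, 4.0, 8.0, 7.0])       # observed values (illustrative)
y_hat = np.array([3.5, 4.5, 4.5, 7.5, 7.0])   # fitted values from some regression

ssr = np.sum((y - y_hat) ** 2)      # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
print(1 - ssr / sst)                # R-squared
```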
How can misspecification tests help to assess the validity of your OLS estimates?
A regression suffers from functional form misspecification when the functional form of the estimated regression model differs from that of the population regression function. Functional form misspecification leads to biased and inconsistent coefficient estimators; misspecification tests such as Ramsey's RESET detect it by checking whether nonlinear transformations of the fitted values add explanatory power.
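A hedged sketch of a RESET-style check on simulated data: if powers of the fitted values are jointly significant, the linear functional form is rejected:

```python
import numpy as np
import statsmodels.api as sm

# Minimal RESET sketch on simulated data where the true relation is quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 300)
y = 1 + x**2 + rng.normal(0, 1, 300)

X = sm.add_constant(x)
base = sm.OLS(y, X).fit()           # linear model: misspecified here

# Augment with yhat^2 and yhat^3 and F-test their joint significance.
yhat = base.fittedvalues
X_aug = np.column_stack([X, yhat**2, yhat**3])
aug = sm.OLS(y, X_aug).fit()
print(aug.compare_f_test(base))     # (F stat, p-value, df); small p -> misspecified
```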
Under which conditions do omitted variables or reverse causality bias coefficients?
For omitted variable bias to occur, two conditions must be fulfilled:
- X is correlated with the omitted variable
- The omitted variable is a determinant of the dependent variable Y
Reverse causality (Y also affects X) leads to correlation between X and the error term in the population of interest, so the coefficient on X is estimated with bias.
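A minimal simulation of both omitted-variable conditions at work (all names illustrative):

```python
import numpy as np
import statsmodels.api as sm

# 'ability' drives both x and y; omitting it biases the coefficient on x.
rng = np.random.default_rng(2)
n = 5000
ability = rng.normal(size=n)
x = ability + rng.normal(size=n)                    # condition 1: x correlated with omitted var
y = 2.0 * x + 3.0 * ability + rng.normal(size=n)    # condition 2: omitted var determines y

short = sm.OLS(y, sm.add_constant(x)).fit()                            # omits ability
long = sm.OLS(y, sm.add_constant(np.column_stack([x, ability]))).fit()
print(short.params[1])   # ~3.5, biased upward
print(long.params[1])    # ~2.0, unbiased
```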
What happens to the estimates when there is measurement error in the dependent variable?
If there is measurement error in the dependent variable and the measurement error is random, then there is no bias but only an increase in variance. If the error is systematic (correlated with the regressors), then there is bias.
What happens to the estimates when there is measurement error in the independent variable? In which direction does measurement error bias coefficients in this case?
When independent variables are measured imprecisely, we speak of errors-in-variables bias. This bias does not disappear if the sample size is large. If the measurement error has mean zero and is independent of the true value of the regressor (classical measurement error), the OLS estimator of the respective coefficient is biased towards zero (attenuation bias).
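A minimal simulation of the attenuation:

```python
import numpy as np
import statsmodels.api as sm

# Minimal errors-in-variables sketch with simulated data.
rng = np.random.default_rng(3)
n = 5000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(size=n)
x_obs = x_true + rng.normal(size=n)   # classical measurement error in the regressor

res = sm.OLS(y, sm.add_constant(x_obs)).fit()
# Attenuation: plim beta_hat = 2 * Var(x_true) / (Var(x_true) + Var(error)) = 1
print(res.params[1])                  # ~1.0, biased toward zero
```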
Under which conditions can panel data be used to solve omitted variable bias?
Regression using panel data may mitigate omitted variable bias when there is no data on variables that correlate with both the regressors of interest and the dependent variable, provided these omitted variables are constant in the time dimension or across entities: the corresponding fixed effects then absorb them.
What is meant by Pooled OLS? When is Pooled OLS appropriate? Can this be tested?
- POLS refers to the application of OLS to panel data. In POLS the data are treated as if they were cross-sectional, ignoring the time dimension.
- POLS is appropriate when the explanatory variables in each time period are uncorrelated with the idiosyncratic error (the time-varying part of the error).
- To test for poolability, a Breusch-Pagan test can be used, which tests for heteroskedasticity in the pooled residuals (see the sketch below).
- If the errors are heteroskedastic, this points to entity-specific variation in the error term; if that variation is correlated with the explanatory variables, POLS is biased.
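A hedged sketch of the Breusch-Pagan heteroskedasticity test on simulated data, using statsmodels' `het_breuschpagan`:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data whose error variance grows with x.
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 10, n)
y = 1 + 0.5 * x + rng.normal(0, x / 5)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(lm_pval)   # small p-value -> reject homoskedasticity
```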
What is meant by a fixed effects estimator?
In panel data, where longitudinal observations exist for the same subject, fixed effects represent the subject-specific means. In panel data analysis, the term fixed effects estimator (also known as the within estimator) refers to an estimator for the coefficients in the regression model including those fixed effects (one time-invariant intercept for each subject).
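A minimal sketch of the within transformation on simulated panel data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Within (fixed effects) estimator: demean within each entity, then run OLS.
rng = np.random.default_rng(5)
ids = np.repeat(np.arange(50), 10)        # 50 entities, 10 periods each
alpha = rng.normal(size=50)[ids]          # entity fixed effects
x = alpha + rng.normal(size=500)          # x correlated with the fixed effect
y = 1.5 * x + alpha + rng.normal(size=500)

df = pd.DataFrame({"id": ids, "x": x, "y": y})
demeaned = df[["x", "y"]] - df.groupby("id")[["x", "y"]].transform("mean")

res = sm.OLS(demeaned["y"], demeaned["x"]).fit()   # no constant: demeaned away
print(res.params["x"])   # ~1.5; pooled OLS would be biased upward here
```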
What is the difference between FE and RE estimator? How can you choose between the FE or RE estimates?
- There are two assumptions about the individual-specific effect: the FE assumption and the RE assumption.
The RE assumption is that the individual-specific effects are uncorrelated with the regressors.
The FE assumption is that the individual-specific effects may be correlated with the regressors.
The Hausman test checks whether FE and RE generate similar results; if the estimates differ significantly, the RE assumption is rejected and FE should be used (see the test statistic below).
The Sargan-Hansen test checks for correlation between the error term and the regressors via the overidentifying restrictions.
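As a sketch, the Hausman statistic compares the two coefficient vectors; under H0 (the RE assumption holds) it is chi-squared distributed with k degrees of freedom, where k is the number of coefficients compared:

```latex
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)'
    \left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
    \left(\hat\beta_{FE} - \hat\beta_{RE}\right)
    \overset{H_0}{\sim} \chi^2_k
```

A large H rejects the RE assumption in favour of FE.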
What is meant by a First Difference estimator?
The FD approach addresses endogeneity caused by unobserved, time-invariant heterogeneity, which leaves the OLS estimator biased and inconsistent. Taking first differences, Δy_it = β Δx_it + Δu_it, removes the time-invariant individual effect from the equation.
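A minimal sketch of the first-difference transformation on simulated panel data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# First-difference estimator: differencing removes the time-invariant effect.
rng = np.random.default_rng(6)
ids = np.repeat(np.arange(50), 10)
alpha = rng.normal(size=50)[ids]
x = alpha + rng.normal(size=500)
y = 1.5 * x + alpha + rng.normal(size=500)

df = pd.DataFrame({"id": ids, "x": x, "y": y})
fd = df.groupby("id")[["x", "y"]].diff().dropna()   # delta-x, delta-y within entity

res = sm.OLS(fd["y"], fd["x"]).fit()   # no constant; alpha is differenced away
print(res.params["x"])                 # ~1.5
```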
What are the criteria for a valid instrumental variable?
- Relevance: the instrument z must be correlated with the endogenous regressor x.
- Exogeneity: the instrument z must not be correlated with the error u.
- Exclusion: the instrument z affects the dependent variable y only through x.
What does a first stage mean? How do you determine whether an instrument, or a set of instruments, is strong? What is the danger of weak instruments?
- The instrument must be correlated with the endogenous explanatory variable; the regression of the endogenous variable on the instrument(s) is called the first stage. If the correlation is strong, the instrument is said to have a strong first stage; a common rule of thumb is a first-stage F-statistic above 10 on the excluded instruments (see the sketch below). Weak instruments may produce misleading inferences about parameter estimates and standard errors.
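A minimal simulated sketch of a first stage with a single instrument (where F equals the squared t-statistic):

```python
import numpy as np
import statsmodels.api as sm

# First-stage regression x ~ z and the F-statistic on the instrument.
rng = np.random.default_rng(7)
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + u + rng.normal(size=n)    # endogenous: x depends on the error u

first_stage = sm.OLS(x, sm.add_constant(z)).fit()
f_stat = first_stage.tvalues[1] ** 2    # with one instrument, F = t^2
print(f_stat)                           # rule of thumb: F > 10 -> strong instrument
```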
What does the exclusion restriction imply? Can this be tested?
Exclusion restriction:
The instrument z affects the dependent variable y only through x; z itself has no direct effect on y.
The exclusion restriction cannot be tested directly; it has to be argued. With more instruments than endogenous regressors, overidentification tests (e.g., Sargan's J) provide indirect evidence at best.
How does the IV estimator solve the potential endogeneity problem of OLS? Why does this create a trade off between consistency and efficiency? Can this be tested?
- IV estimation is applied because a valid instrument isolates variation in the independent variable that is uncorrelated with the error term
- if there is correlation between the instrument and the error term, the IV regression is itself not consistent
- the standard errors of IV are relatively large compared with OLS, which causes a loss of efficiency; IV thus trades efficiency for consistency (see the sketch below)
- the Hausman test checks the endogeneity condition to decide between OLS and IV: if OLS is already consistent, it is preferred for its efficiency
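A minimal two-stage least squares sketch on simulated data (just-identified case; note that the second-stage standard errors printed by plain OLS are not the correct IV standard errors):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)                  # structural error
x = 0.5 * z + u + rng.normal(size=n)    # x endogenous: correlated with u
y = 2.0 * x + u

# Stage 1: project x on the instrument; Stage 2: regress y on the fitted x.
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
iv = sm.OLS(y, sm.add_constant(x_hat)).fit()
ols = sm.OLS(y, sm.add_constant(x)).fit()
print(ols.params[1])   # ~2.4, biased upward by the endogeneity
print(iv.params[1])    # ~2.0, consistent
```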
How does Regression Discontinuity (RD) work? What is the difference with an IV approach?
In RD you take a subsample consisting of observations close to the cut-off (just below and just above it). The further you move away from the threshold (the larger the bandwidth gets), the more dissimilar the control and treatment groups become, so the estimate becomes more biased. For a small bandwidth, conclusions about causality can be drawn, as the groups are more similar and there are fewer disturbing factors influencing the model. The disadvantage of a small bandwidth is the small sample size, which increases the standard errors. A simulated sketch follows below.
- For IV, the instrument is random by assumption.
- For RD, we know our cut-off; assignment is as good as random only locally, so we need to restrict the bandwidth.
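A hedged sharp-RD sketch on simulated data, with a local linear regression inside the bandwidth:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 2000
running = rng.uniform(-1, 1, n)            # running variable, cut-off at 0
treated = (running >= 0).astype(int)       # sharp RD: treatment jumps at the cut-off
y = 1 + 0.8 * running + 2.0 * treated + rng.normal(0, 1, n)

h = 0.25                                   # bandwidth: the bias/variance trade-off
df = pd.DataFrame({"y": y, "running": running, "treated": treated})
local = df[df["running"].abs() < h]        # keep only observations near the cut-off

res = smf.ols("y ~ treated + running + treated:running", data=local).fit()
print(res.params["treated"])               # ~2.0 = jump in y at the cut-off
```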
What is the difference between a sharp and a fuzzy RD?
Sharp RD: treatment status is perfectly predicted by whether an observation is above or below the threshold.
Fuzzy RD: crossing the threshold strongly changes the probability of treatment, but the relationship is not perfectly predictive.
What kind of a trade-off does one need to make in the choice of bandwidth?
- a larger bandwidth increases the sample size and thus lowers the standard errors, but makes the groups less comparable and thus increases bias
- a smaller bandwidth reduces bias, but shrinks the sample and increases the standard errors
What is meant by Difference-in-Differences? How can DiD be obtained by regression techniques?
- DiD is used when a treatment or program has to be evaluated
- there have to be a treatment group and a control group
- both groups are observed before and after the treatment
- in regression form, y = β0 + β1·treat + β2·post + β3·(treat × post) + u, where the interaction coefficient β3 is the DiD estimate (see the sketch below)
Key assumption: the trend in the control group approximates what would have happened in the treatment group in the absence of the treatment
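A minimal simulated sketch of the regression form:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# The coefficient on treat:post is the DiD estimate.
rng = np.random.default_rng(10)
n = 2000
treat = rng.integers(0, 2, n)              # treatment vs. control group
post = rng.integers(0, 2, n)               # before vs. after period
effect = 1.5                               # true treatment effect
y = 2 + 0.5 * treat + 1.0 * post + effect * treat * post + rng.normal(0, 1, n)

df = pd.DataFrame({"y": y, "treat": treat, "post": post})
res = smf.ols("y ~ treat + post + treat:post", data=df).fit()
print(res.params["treat:post"])            # ~1.5
```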
Why is a common trend important in this technique? Can this be tested?
- Without a common trend, the control group does not provide a valid counterfactual, so the groups are not comparable
- The assumption can be assessed visually by graphing the outcome trends of both groups before and after the treatment
- A formal test, which is also suitable for multivalued treatments or several groups, is to interact the treatment variable with time dummies: the pre-treatment interactions should be insignificant
What is an autoregressive (AR) model? When can it be used?
An AR model predicts future behavior based on past behavior and is used for forecasting. The process is essentially a linear regression of the current value of the series on one or more past values of the same series: an AR(p) model has the form y_t = c + φ1·y_{t-1} + … + φp·y_{t-p} + ε_t.
AR models differ from simple linear regression in that the regressors are lagged values of Y itself rather than a separate variable X.
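A hedged AR(1) sketch using statsmodels' `AutoReg` on a simulated series:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(1) process: y_t = 0.7 * y_{t-1} + e_t.
rng = np.random.default_rng(11)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()

res = AutoReg(y, lags=1).fit()
print(res.params)                          # intercept ~0, lag coefficient ~0.7
print(res.predict(start=500, end=504))     # forecast the next 5 values
```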