Cross-Section Data Concepts Flashcards

Question 1

Q

Ordinary Least Squares (OLS)

Answer

A

Regression method used to analyze how variables are related to each other. OLS fits the line that minimizes the sum of squared residuals, i.e., the best-fitting line through the observations.
Control variables and interaction terms can be added to improve the OLS model.
Logarithms can be applied to the dependent and/or explanatory variables to better fit the data. The model is then non-linear in the original variables but still linear in the parameters, so it can still be estimated by OLS.
1. Log-Level (ln y on x): A 1-unit change in x is related to an approximately 100*β percent change in y.
2. Log-Log (ln y on ln x): A 1% change in x is related to a β% change in y.
3. Level-Log (y on ln x): A 1% change in x is related to a β/100 unit change in y.
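A quick numeric check of the three interpretations, as a sketch: the coefficient b = 0.05 and the starting value x = 10 are arbitrary assumptions, and the exact changes are compared with the rules of thumb.

```python
import math

def pct_change(before, after):
    """Percentage change from before to after."""
    return 100.0 * (after - before) / before

b = 0.05  # assumed slope coefficient (illustrative only)

# Log-level: ln(y) = 1 + b*x; raise x by one unit (10 -> 11).
log_level = pct_change(math.exp(1 + b * 10), math.exp(1 + b * 11))

# Log-log: ln(y) = 1 + b*ln(x); raise x by 1% (10 -> 10.1).
log_log = pct_change(math.exp(1 + b * math.log(10)), math.exp(1 + b * math.log(10.1)))

# Level-log: y = 1 + b*ln(x); raise x by 1% (10 -> 10.1), change measured in units of y.
level_log = (1 + b * math.log(10.1)) - (1 + b * math.log(10))

print(f"log-level: {log_level:.3f}% vs. approximation 100*b = {100 * b:.3f}%")
print(f"log-log:   {log_log:.4f}% vs. approximation b = {b:.4f}%")
print(f"level-log: {level_log:.6f} units vs. approximation b/100 = {b / 100:.6f}")
```

The log-level rule 100*β is an approximation that works for small β; the exact change is 100*(exp(β) - 1) percent.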
Common Problems with OLS:
1. Omitted Variables
2. Reverse Causality
3. Measurement Error
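In simple regression, OLS gives slope = cov(x, y) / var(x). The sketch below uses simulated data (all parameters are assumptions) to illustrate the measurement error problem: classical noise in x biases the slope towards zero (attenuation bias).

```python
import random

def ols_slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 20_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xt + random.gauss(0, 1) for xt in x_true]   # true slope = 2
x_obs = [xt + random.gauss(0, 1) for xt in x_true]        # x observed with classical noise

b_true = ols_slope(x_true, y)   # close to 2
b_noisy = ols_slope(x_obs, y)   # close to 2 * 1 / (1 + 1) = 1: attenuation bias
print(round(b_true, 2), round(b_noisy, 2))
```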

Question 2

Q

Identification Problem (OLS)

Answer

A

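Correlation does not imply causation: an OLS coefficient only has a causal interpretation if the explanatory variable is uncorrelated with the error term. Causation can be inferred from:

Difference-in-Differences (DiD)
Instrumental Variables (IV)
Regression Discontinuity (RD)
Structural VAR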

Question 3

Q

BLUE

Answer

A

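Best Linear Unbiased Estimator

Under the Gauss-Markov assumptions (linear in parameters, random sampling, no perfect collinearity, zero conditional mean of the error, homoskedasticity), OLS is BLUE: among all linear unbiased estimators it has the smallest variance.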
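A small Monte Carlo sketch of the "unbiased" part: across many simulated samples that satisfy the assumptions, the average OLS slope is close to the true value. The data-generating process (true slope 0.5, n = 30, 2,000 replications) is an assumption, and the simulation does not demonstrate the "best" (minimum variance) part.

```python
import random

def ols_slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(2)
true_beta = 0.5
estimates = []
for _ in range(2_000):                                           # repeated samples
    x = [random.uniform(0, 10) for _ in range(30)]
    y = [1 + true_beta * xi + random.gauss(0, 2) for xi in x]   # homoskedastic errors
    estimates.append(ols_slope(x, y))

mean_est = sum(estimates) / len(estimates)
print(f"average OLS slope over samples: {mean_est:.3f} (true value {true_beta})")
```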

Question 4

Q

Endogeneity

Answer

A

If an explanatory variable is correlated with the error term, the variable is endogenous. This can be due to omitted variables, but also to reverse causality or measurement error.

Question 5

Q

Linear Probability Model

Answer

A

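A linear probability model (LPM) is an OLS regression with a binary (dummy) dependent variable. A coefficient β is interpreted as the change in the probability that y = 1 for a one-unit change in x. Drawbacks: fitted values can lie outside [0, 1], and the error term is heteroskedastic by construction, so robust standard errors should be used.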
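A sketch with a made-up binary outcome (hypothetical data, x from 0 to 10) that shows both the probability interpretation of the slope and fitted values outside [0, 1].

```python
x = list(range(11))                     # hypothetical regressor 0..10
y = [1 if xi >= 5 else 0 for xi in x]   # binary (dummy) dependent variable

mx, my = sum(x) / len(x), sum(y) / len(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
fitted = [intercept + slope * xi for xi in x]

# Each extra unit of x raises P(y = 1) by about 13.6 percentage points (slope = 15/110).
print(f"slope = {slope:.4f}")
# Fitted "probabilities" at the ends: -0.136 at x = 0 and 1.227 at x = 10.
print(f"fitted at x=0: {fitted[0]:.3f}, at x=10: {fitted[-1]:.3f}")
```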

Question 6

Q

Multiple Regressions

Answer

A

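One way to address the omitted variables problem of OLS: include more factors (control variables) in the regression.
Adding relevant regressors can reduce the error variance and thus increase the efficiency (precision) of the estimates.
Does not necessarily solve omitted variable bias: factors that are unobserved, and therefore not included, still bias the estimates.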
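A simulation sketch of omitted variable bias (all coefficients are assumptions): w affects y and is correlated with x. Leaving w out shifts the slope on x by 3 × 0.5; including it as a control recovers the true value. The small normal-equations solver stands in for a regression library.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; X is a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(3)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
w = [0.5 * xi + random.gauss(0, 1) for xi in x]                          # w correlated with x
y = [1 + 2 * xi + 3 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]  # true slope on x = 2

short = ols([[1.0, xi] for xi in x], y)                  # w omitted
full = ols([[1.0, xi, wi] for xi, wi in zip(x, w)], y)   # w included as a control
print("slope on x, w omitted: ", round(short[1], 2))     # about 2 + 3 * 0.5 = 3.5
print("slope on x, w included:", round(full[1], 2))      # about 2
```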

Question 7

Q

Panel Data

Answer

A

Using panel data (the same units observed over several periods) allows one to control for all unit characteristics that do not change over time, even unobserved ones.
Disadvantages:
- One has to be sure that these characteristics indeed do not change over time.
- Factors that change over time can only be captured via dummies (e.g., time dummies), but with too many dummies the model might become overspecified.
- Panel data can lack variation within units over time, which makes estimates imprecise.
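A sketch of the within (fixed effects) estimator on simulated panel data. The unit effect alpha is unobserved and correlated with x (an assumption of the simulation); pooled OLS is biased, while subtracting each unit's own means removes alpha.

```python
import random
from collections import defaultdict

def ols_slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
units, periods = 200, 5
rows = []                                        # (unit id, x, y)
for i in range(units):
    alpha = random.gauss(0, 2)                   # unobserved, time-invariant unit effect
    for _ in range(periods):
        x = alpha + random.gauss(0, 1)           # x correlated with the unit effect
        y = alpha + 2 * x + random.gauss(0, 1)   # true effect of x on y is 2
        rows.append((i, x, y))

pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])  # biased, about 2.8

# Within transformation: subtract each unit's means of x and y, which removes alpha.
totals = defaultdict(lambda: [0.0, 0.0, 0])
for i, x, y in rows:
    totals[i][0] += x
    totals[i][1] += y
    totals[i][2] += 1
x_dm = [x - totals[i][0] / totals[i][2] for i, x, _ in rows]
y_dm = [y - totals[i][1] / totals[i][2] for i, _, y in rows]
within = ols_slope(x_dm, y_dm)                   # close to 2
print(f"pooled OLS {pooled:.2f}, fixed effects (within) {within:.2f}")
```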

Question 8

Q

Difference-in-Differences (DiD)

Answer

A

Using DiD, one divides the sample into two groups, the control group and the treatment group. The change in the outcome from before to after the treatment is then compared between the two groups.
Key assumptions:
- Parallel trends assumption -> if the treatment had not happened, the treatment group would have followed the same trend as the control group (cannot be tested directly; pre-treatment trends only indicate whether it is plausible)
- Random assignment to groups is not strictly required: the groups may differ in levels, as long as the parallel trends assumption holds
Advantage:
- can solve the causality problem of OLS, since time-invariant group differences and common time shocks are differenced out
Disadvantage:
- Assignment to treatment must not be related to different underlying trends, which cannot be verified.
- Problematic if the groups already followed different trends before the treatment.
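The basic 2x2 DiD calculation, as a sketch with made-up group means (not real data). The same number is the coefficient on the treat × post interaction in a regression of y on a constant, treat, post and treat × post.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; X is a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical group means of the outcome (illustrative numbers only).
means = {
    ("control", "before"): 8.0,
    ("control", "after"): 10.0,
    ("treatment", "before"): 10.0,
    ("treatment", "after"): 15.0,
}
change_treatment = means[("treatment", "after")] - means[("treatment", "before")]  # 5
change_control = means[("control", "after")] - means[("control", "before")]        # 2
did = change_treatment - change_control                                             # 3

# Same number from OLS of y on [1, treat, post, treat*post] over the four cells.
X, y = [], []
for (group, period), value in means.items():
    treat, post = float(group == "treatment"), float(period == "after")
    X.append([1.0, treat, post, treat * post])
    y.append(value)
coef = ols(X, y)
print(f"DiD = {did}, interaction coefficient = {coef[3]:.3f}")
```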

Question 9

Q

Instrumental Variable (IV)

Answer

A

Two conditions for an instrument to be valid:

Relevance: instrument z should be strongly correlated with the endogenous explanatory variable x
- Rule of thumb for a strong instrument: first-stage F-statistic > 10 (with one instrument F = t², i.e., |t| > √10 ≈ 3.16)
- With more than one instrument, the first-stage F-statistic for their joint significance has to be used
Exogeneity: instrument z should not be correlated with the error u
- cannot be tested directly, as u is unobservable
- SE of instrument can give indication on validity
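A simulated illustration of the two conditions (all parameters are assumptions): z shifts x but is unrelated to the unobserved u. With one instrument the IV estimator is cov(z, y) / cov(z, x), and the first-stage F is the squared t-statistic of z; the F here uses conventional (non-robust) standard errors.

```python
import math
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(5)
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]                           # instrument
u = [random.gauss(0, 1) for _ in range(n)]                           # unobserved part of the error
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]     # endogenous regressor
y = [1 + 2 * xi + ui + random.gauss(0, 1) for xi, ui in zip(x, u)]   # true slope = 2

beta_ols = cov(x, y) / cov(x, x)   # biased upward: about 2 + 1 / 2.64
beta_iv = cov(z, y) / cov(z, x)    # simple IV estimator: about 2

# First stage (x on z); with one instrument the first-stage F equals t^2 of the slope.
pi = cov(z, x) / cov(z, z)
mz, mx = sum(z) / n, sum(x) / n
resid = [xi - mx - pi * (zi - mz) for xi, zi in zip(x, z)]
s2 = sum(r * r for r in resid) / (n - 2)
se_pi = math.sqrt(s2 / sum((zi - mz) ** 2 for zi in z))
first_stage_f = (pi / se_pi) ** 2

print(f"OLS {beta_ols:.3f}, IV {beta_iv:.3f}, first-stage F {first_stage_f:.0f}")
```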

Advantages:
- IV can solve the three problems of OLS (omitted variables, reverse causality, measurement error)
- possible to include more than one instrument (compared to DiD)
- possible to make causal inferences
Disadvantage:
- SEs are relatively large -> loss of efficiency
- possible overidentification

Test to choose between IV and OLS:
Hausman test
H0 = no endogeneity (OLS and IV are both consistent; OLS is more efficient)
HA = endogeneity (only IV is consistent)

Question 10

Q

Overidentification

Answer

A

When more than one instrument is added, the model can become overidentified: there are more instruments than endogenous variables, i.e., more than needed to estimate the parameters consistently. Many instruments, especially weak ones, can worsen the finite-sample performance of IV. Overidentification also makes it possible to test the validity of the instruments.

Tests for overidentification:
1. Sargan test
- assumes that at least one instrument is valid
- an instrument that is invalid is correlated with the residual
H0 = all instruments are valid (overidentifying restrictions hold)
HA = at least one instrument is invalid
2. Hansen’s J-statistic
- robust to heteroskedasticity, so used when there is heteroskedasticity
- interpretation same as for Sargan test
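A sketch of the Sargan test for one endogenous regressor and two instruments (one overidentifying restriction): estimate 2SLS, regress the 2SLS residuals on all instruments, and compare n·R² with a χ²(1) distribution. The data-generating process, including the direct effect gamma of z2 on y that makes z2 invalid, is assumed for illustration; this is not Hansen's robust J.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; X is a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(X, b):
    return [sum(v * bj for v, bj in zip(row, b)) for row in X]

def sargan(y, x, Z):
    """2SLS slope and Sargan n*R^2 statistic for one endogenous x; Z rows = [1, z1, z2]."""
    n = len(y)
    xhat = predict(Z, ols(Z, x))                          # first stage
    b0, b1 = ols([[1.0, v] for v in xhat], y)             # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]   # 2SLS residuals use the actual x
    fitted = predict(Z, ols(Z, resid))                    # auxiliary regression on all instruments
    mean_r = sum(resid) / n
    sst = sum((r - mean_r) ** 2 for r in resid)
    ssr = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    stat = n * (1 - ssr / sst)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1): 2 instruments - 1 endogenous
    return b1, stat, p_value

def simulate(gamma, n=5_000, seed=6):
    """Hypothetical data; gamma != 0 lets z2 affect y directly, so z2 is invalid."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.7 * a + 0.7 * b + c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, u)]
    y = [1 + 2 * xi + gamma * b + c + rng.gauss(0, 1) for xi, b, c in zip(x, z2, u)]
    return y, x, [[1.0, a, b] for a, b in zip(z1, z2)]

_, stat_valid, p_valid = sargan(*simulate(gamma=0.0))
b_invalid, stat_invalid, p_invalid = sargan(*simulate(gamma=0.5))
print(f"valid instruments:   Sargan {stat_valid:.2f}, p = {p_valid:.3f}")
print(f"z2 invalid:          Sargan {stat_invalid:.1f}, p = {p_invalid:.3g}")
```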

Question 11

Q

LATE Theorem

Answer

A

The Local Average Treatment Effect (LATE) refers to the limitation that, when treatment effects differ across individuals, IV estimates are based only on the behavior of those whose treatment status is changed by the instrument (the compliers). Therefore no conclusions can be drawn about the behavior of others. Depending on the nature of the instrument, it may be impossible to identify any meaningful subpopulation whose behavior is being measured.
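A deterministic sketch with a hypothetical population of compliers, always-takers and never-takers whose treatment effects differ. The IV (Wald) estimate equals the compliers' effect, not the average effect in the whole population. All group sizes and effects are made up, and the population average depends on the never-takers' effect, which is simply assumed here.

```python
# Hypothetical population: count, outcome without treatment, treatment effect.
groups = {
    "complier": (400, 1.0, 2.0),      # treated only if z = 1
    "always_taker": (300, 3.0, 5.0),  # always treated
    "never_taker": (300, 0.0, 0.0),   # never treated; this effect is never observed
}

rows = []  # (z, d, y)
for kind, (count, y0, effect) in groups.items():
    for k in range(count):
        z = k % 2                     # instrument given to exactly half of each group
        d = {"complier": z, "always_taker": 1, "never_taker": 0}[kind]
        rows.append((z, d, y0 + effect * d))

def mean(values):
    return sum(values) / len(values)

# Wald / IV estimate: reduced form divided by first stage.
reduced_form = mean([y for z, _, y in rows if z == 1]) - mean([y for z, _, y in rows if z == 0])
first_stage = mean([d for z, d, _ in rows if z == 1]) - mean([d for z, d, _ in rows if z == 0])
wald = reduced_form / first_stage

# Average effect over everyone (needs the assumed never-taker effect).
ate = sum(c * eff for c, _, eff in groups.values()) / sum(c for c, _, _ in groups.values())
print(f"IV (Wald) estimate: {wald:.2f}; average effect in whole population: {ate:.2f}")
```

Here the Wald estimate is 2.00 (the compliers' effect) while the population average is 2.30.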

Question 12

Q

Regression Discontinuity

Answer

A

Similar to IV and can also be used to establish a causal relationship.
In RD you take a subsample of observations close to the threshold (cutoff) of the running variable, on both sides.
The further you move away from the threshold (the larger the bandwidth gets), the more dissimilar control and treatment groups become, so the bias increases (while the SE shrinks, as more observations are used).
For a small bandwidth, conclusions about causality can be drawn.
Disadvantage of a small bandwidth: small sample size, hence larger SEs.

The donut method:

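Observations very close to the threshold are removed, as they might be biased (e.g., units may manipulate the running variable to end up just on one side of the cutoff).
Reduces the bias from manipulation.
If manipulation is impossible, there is no need for the donut method.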
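A sketch of a sharp RD estimate on simulated data: fit a line on each side of the cutoff within the bandwidth and take the difference of the two intercepts. The curved outcome function and the 0.02 donut are assumptions; the simulated data contain no manipulation, so here the donut only costs observations.

```python
import random

def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def rd_jump(r, y, cutoff=0.0, bandwidth=0.1, donut=0.0):
    """Sharp RD: difference of the two sides' linear fits evaluated at the cutoff."""
    left = [(ri - cutoff, yi) for ri, yi in zip(r, y) if -bandwidth <= ri - cutoff < -donut]
    right = [(ri - cutoff, yi) for ri, yi in zip(r, y) if donut <= ri - cutoff <= bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left

random.seed(7)
n = 20_000
r = [random.uniform(-1, 1) for _ in range(n)]   # running variable, cutoff at 0
# True jump at the cutoff = 2; the cubic term makes a wide linear fit biased.
y = [1 + ri + 3 * ri ** 3 + 2 * (ri >= 0) + random.gauss(0, 0.5) for ri in r]

wide = rd_jump(r, y, bandwidth=1.0)                    # all data: biased downward (about 0.8)
narrow = rd_jump(r, y, bandwidth=0.1)                  # close to 2, fewer observations
donut_est = rd_jump(r, y, bandwidth=0.1, donut=0.02)   # also drops |r| < 0.02
print(f"wide {wide:.2f}, narrow {narrow:.2f}, donut {donut_est:.2f}")
```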

Question 13

Q

Natural Experiments

Answer

A

Conducted by selecting a random sample and randomly dividing it into a treatment group and a control group. The treatment group is offered the treatment, the control group is not.
Key assumptions:
- characteristics of the groups are similar
- ensured by random assignment if the sample is large enough
Advantages:
- no reverse causality due to random assignment
- possibility to establish a causal relationship
Disadvantages:
- not always feasible
- some characteristics cannot be changed or controlled
- possibility of treatment dilution
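A simulation sketch (all numbers assumed): random assignment balances a pre-treatment characteristic, and the difference in means estimates the effect of being offered the treatment. Incomplete take-up, used here as one assumed example of dilution, shrinks that difference below the effect on those actually treated.

```python
import random

def mean(values):
    return sum(values) / len(values)

random.seed(8)
n = 10_000
age = [random.uniform(20, 60) for _ in range(n)]            # hypothetical pre-treatment trait
assigned = [random.random() < 0.5 for _ in range(n)]        # random assignment
took_up = [a and random.random() < 0.6 for a in assigned]   # assumed 60% take-up if offered
# Outcome: effect of 3 for those who actually take the treatment.
y = [0.1 * ag + 3.0 * t + random.gauss(0, 2) for ag, t in zip(age, took_up)]

treat = [i for i in range(n) if assigned[i]]
control = [i for i in range(n) if not assigned[i]]
balance_gap = mean([age[i] for i in treat]) - mean([age[i] for i in control])  # close to 0
itt = mean([y[i] for i in treat]) - mean([y[i] for i in control])               # about 0.6 * 3
print(f"age gap {balance_gap:.2f}, effect of the offer {itt:.2f}")
```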

Question 14

Q

Misspecification

Answer

A

Specification of the model:

normality
homoskedasticity
-> Variance of error needs to be constant
functional form
-> a multiple regression suffers from a functional form misspecification when it does not properly account for the relationship between dependent and observed explanatory variables

Test for misspecification:
Ramsey RESET test
H0 = regression is well specified
HA = the model is misspecified (e.g., neglected nonlinear terms in the functional form)
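A sketch of the RESET idea, implemented by hand: add yhat^2 and yhat^3 from the original fit and F-test them jointly. With q = 2 added terms, the F(2, df) tail probability has the closed form (1 + 2F/df)^(-df/2). The simulated data are assumptions.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y; X is a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def ssr(X, y):
    """Sum of squared OLS residuals of y on the columns of X."""
    b = ols(X, y)
    return sum((yi - sum(v * bj for v, bj in zip(row, b))) ** 2 for row, yi in zip(X, y))

def reset_test(x, y):
    """RESET: add yhat^2 and yhat^3 to y = a + b*x and F-test them jointly."""
    n = len(y)
    X = [[1.0, xi] for xi in x]
    a, b = ols(X, y)
    yhat = [a + b * xi for xi in x]
    X_aug = [[1.0, xi, yh ** 2, yh ** 3] for xi, yh in zip(x, yhat)]
    ssr_r, ssr_u = ssr(X, y), ssr(X_aug, y)
    q, df = 2, n - 4                                  # 2 restrictions, n - 4 residual df
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / df)
    p_value = (1 + 2 * f_stat / df) ** (-df / 2)      # exact survival function of F(2, df)
    return f_stat, p_value

random.seed(9)
n = 2_000
x = [random.gauss(0, 1) for _ in range(n)]
y_quad = [1 + xi + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]   # true model is quadratic
y_lin = [1 + xi + random.gauss(0, 1) for xi in x]                    # true model is linear

f_quad, p_quad = reset_test(x, y_quad)   # linear fit misses x^2: large F, tiny p
f_lin, p_lin = reset_test(x, y_lin)      # correctly specified: typically a small F
print(f"quadratic truth: F {f_quad:.1f}, p {p_quad:.3g}; linear truth: F {f_lin:.2f}, p {p_lin:.3f}")
```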

Question 15

Q

Heteroskedasticity

Answer

A

The variance of the error term is not constant across observations.
Test for heteroskedasticity:
1. Breusch-Pagan test
H0 = homoskedasticity (no heteroskedasticity)
HA = heteroskedasticity
2. White Test
H0 = homoskedasticity (no heteroskedasticity)
HA = heteroskedasticity
If there is heteroskedasticity in the data, the OLS coefficients remain unbiased, but conventional standard errors are invalid, so it is necessary to use robust standard errors.
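A sketch of the n·R² (LM) form of the Breusch-Pagan test with one regressor: regress the squared OLS residuals on x and compare n·R² with χ²(1). The heteroskedastic data-generating process is an assumption.

```python
import math
import random

def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def breusch_pagan(x, y):
    """LM = n * R^2 from regressing squared OLS residuals on x (chi-square with 1 df)."""
    a, b = fit_line(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    c, d = fit_line(x, e2)
    mean_e2 = sum(e2) / len(e2)
    sst = sum((v - mean_e2) ** 2 for v in e2)
    ssr = sum((v - c - d * xi) ** 2 for xi, v in zip(x, e2))
    lm = len(y) * (1 - ssr / sst)
    return lm, math.erfc(math.sqrt(lm / 2))   # p-value from chi-square(1)

random.seed(10)
n = 2_000
x = [random.uniform(0, 2) for _ in range(n)]
y_het = [1 + 2 * xi + random.gauss(0, 0.2 + xi) for xi in x]   # error sd grows with x
y_hom = [1 + 2 * xi + random.gauss(0, 1) for xi in x]          # constant error variance

lm_het, p_het = breusch_pagan(x, y_het)
lm_hom, p_hom = breusch_pagan(x, y_hom)
print(f"heteroskedastic: LM {lm_het:.1f}, p {p_het:.3g}; homoskedastic: LM {lm_hom:.2f}, p {p_hom:.3f}")
```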

Question 16

Q

Robust Standard Errors

Answer

A

Standard errors that remain valid in the case of heteroskedasticity (heteroskedasticity-robust or White standard errors). Without heteroskedasticity, one can use “conventional” standard errors.
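A sketch comparing conventional and White (HC0) robust standard errors for a simple-regression slope, on simulated data where the error standard deviation equals |x| (an assumption). The robust formula weights each squared residual by (x_i - mean of x)², so it picks up that errors are larger where x is far from its mean.

```python
import math
import random

random.seed(11)
n = 5_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) * abs(xi) for xi in x]   # error sd = |x|: heteroskedastic

mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional SE assumes one common error variance s^2.
s2 = sum(ei ** 2 for ei in e) / (n - 2)
se_conventional = math.sqrt(s2 / sxx)
# White / HC0 robust SE lets each observation have its own error variance.
se_robust = math.sqrt(sum((xi - mx) ** 2 * ei ** 2 for xi, ei in zip(x, e))) / sxx
print(f"slope {slope:.3f}, conventional SE {se_conventional:.4f}, robust SE {se_robust:.4f}")
```

The slope estimate itself is the same under both; only the uncertainty attached to it changes.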

(16 cards)