RDD Flashcards

1
Q

Let $W$ be the treatment indicator. Use it in the potential outcome framework to express the sharp RDD.

A

$W = I[X \geq c]$

$Y = I[X \geq c]\,Y(1) + I[X < c]\,Y(0) = W\,Y(1) + (1-W)\,Y(0)$

See notion for details
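
As a minimal sketch of the two formulas above, the assignment rule and the observed outcome can be written as two small functions. The function names and the example numbers (cut-off 50, potential outcomes 10 and 13) are hypothetical and only serve as illustration.

```python
def treatment(x, c):
    """Sharp RDD assignment: W = 1 if the running variable is at or above c."""
    return 1 if x >= c else 0


def observed_outcome(x, y0, y1, c):
    """Observed outcome Y = W * Y(1) + (1 - W) * Y(0)."""
    w = treatment(x, c)
    return w * y1 + (1 - w) * y0


# Hypothetical unit with Y(0) = 10 and Y(1) = 13, cut-off c = 50.
for x in (49.9, 50.0, 62.0):
    print(x, treatment(x, 50.0), observed_outcome(x, 10.0, 13.0, 50.0))
```

Only one of the two potential outcomes is ever observed for a given unit, and which one is decided entirely by the side of the cut-off the unit falls on.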

2
Q

What is the main difference between a fuzzy and a sharp RD?

A

For a sharp RD, the probability of treatment jumps from zero to one exactly at $c$. For a fuzzy RD, the probability also jumps at $c$, but not all the way from zero to one. Rather, the individual has a “higher” probability of being treated once the threshold is crossed.
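
In symbols, the two designs differ only in the size of the jump in the treatment probability at $c$. The notation $P(W=1 \mid X=x)$ is an assumption introduced here, and the fuzzy case is written for treatment becoming more likely above the threshold, as in the card:

$$
\begin{aligned}
\text{Sharp:}\quad & \lim_{x \downarrow c} P(W=1 \mid X=x) - \lim_{x \uparrow c} P(W=1 \mid X=x) = 1 \\
\text{Fuzzy:}\quad & 0 < \lim_{x \downarrow c} P(W=1 \mid X=x) - \lim_{x \uparrow c} P(W=1 \mid X=x) < 1
\end{aligned}
$$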

3
Q

What type of effect do we estimate in a sharp RDD?

A

In an RDD we cannot estimate the $ATE = E(Y(1)-Y(0))$ or the $ATT = E(Y(1)-Y(0)|W=1)$. Instead we estimate a LATE, that is, a local ATE at one specific value of $X$, namely the cut-off $c$. Let’s call it $LATE_c = \tau_c$. (Note, however, that this is not a LATE in the strict sense. It is local in the sense that it concerns the subsample close to the threshold, rather than the subsample that takes up the treatment. In a fuzzy design, we get a LATE in its original meaning.)

4
Q

What is the main identification assumption in the RD world?

A

To identify $\tau_c$ we need $\mu_g(X) = E[Y(g)|X]$ to be continuous at $c$ for $g = 0, 1$. In practice, this means “no sorting”.

5
Q

What is the RDD estimand?

A

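Let $m^-(c) = \lim_{x \uparrow c} E[Y|X=x]$ be the limit of the conditional mean of $Y$ as $X$ approaches $c$ from below, and $m^+(c) = \lim_{x \downarrow c} E[Y|X=x]$ the limit as $X$ approaches $c$ from above. Our estimand is then $\tau_c = m^+(c)-m^-(c)$.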
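
A short derivation shows why this difference equals the treatment effect at the cut-off. It assumes a sharp design and the continuity of $\mu_0$ and $\mu_1$ at $c$ from the previous card, and it reads $m^+(c)$ and $m^-(c)$ as the right and left limits of $E[Y \mid X=x]$:

$$
\begin{aligned}
m^+(c) - m^-(c) &= \lim_{x \downarrow c} E[Y \mid X=x] - \lim_{x \uparrow c} E[Y \mid X=x] \\
&= \lim_{x \downarrow c} \mu_1(x) - \lim_{x \uparrow c} \mu_0(x) \\
&= \mu_1(c) - \mu_0(c) = E[Y(1)-Y(0) \mid X=c] = \tau_c
\end{aligned}
$$

The second line uses that only $Y(1)$ is observed above $c$ and only $Y(0)$ below it. The third line is where continuity is needed.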

6
Q

Write the two equations (which can be combined) that we estimate in the local linear RDD.

A

$Y(0) = \mu_0(c)+\beta_0(X-c)+e_0$

$Y(1) = \mu_1(c)+\beta_1(X-c)+e_1$

where $\mu_g$ is the conditional expectation function, e.g., $E[Y(1)|X] = \mu_1(X)$.
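
The two equations can be stacked into one regression by substituting them into $Y = W\,Y(1) + (1-W)\,Y(0)$ with $W = I[X \geq c]$ from the first card. In this sketch the combined error $e = e_0 + W(e_1 - e_0)$ is notation introduced here:

$$
Y = \mu_0(c) + \tau_c\,W + \beta_0\,(X-c) + (\beta_1-\beta_0)\,W\,(X-c) + e,
\qquad \tau_c = \mu_1(c) - \mu_0(c)
$$

The coefficient on $W$ is the jump at the cut-off, and the interaction term lets the slope differ on the two sides.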

7
Q

What is the problem with the local linear RDD approach?

A

We rely very heavily on the linearity assumption. I.e., we assume that the two regression lines, one on each side of the cut-off (the red and the blue lines), are linear. This is a strong assumption, and we prefer to make as few assumptions as possible.

Also, we know that linear regressions perform worst at the boundary points of the data, and in the RDD the boundary points are exactly what we care about.

8
Q

What is the non-parametric RDD approach?

A

In this approach, we do not make any assumptions about the functional form. This is because we are only interested in what is happening at the cut-off boundary points, not at any other places.

In general, we do this using a kernel regression. Here we use only the data in some neighborhood (window = bandwidth) around $c$. We generally put more weight on the observations closest to $c$. This is done through the choice of “kernel type”. The kernel is thus a weighting function that decides how we weigh different data points.
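
To make the idea of a kernel as a weighting function concrete, here is a minimal sketch of two common kernel types. The function names, the example data, and the choice of $c = 50$ and $h = 2$ are hypothetical. The weight depends only on the scaled distance $u = (x - c)/h$.

```python
def uniform_kernel(u):
    """Equal weight for every observation inside the window |u| <= 1."""
    return 0.5 if abs(u) <= 1 else 0.0


def triangular_kernel(u):
    """Weight falls linearly from 1 at the cut-off to 0 at the window edge."""
    return max(0.0, 1.0 - abs(u))


def kernel_weights(xs, c, h, kernel):
    """Weight of each observation, based on scaled distance u = (x - c) / h."""
    return [kernel((x - c) / h) for x in xs]


# Hypothetical running-variable values around a cut-off of 50, bandwidth 2.
xs = [48.0, 49.0, 49.5, 50.5, 51.0, 53.0]
print(kernel_weights(xs, c=50.0, h=2.0, kernel=triangular_kernel))
# prints [0.0, 0.5, 0.75, 0.75, 0.5, 0.0]
print(kernel_weights(xs, c=50.0, h=2.0, kernel=uniform_kernel))
# prints [0.5, 0.5, 0.5, 0.5, 0.5, 0.0]
```

Observations outside the bandwidth get weight zero under both kernels, so they drop out of the estimation entirely.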

9
Q

Is the kernel estimate an OLS estimate?

A

The kernel estimate differs from OLS because the kernel acts as a weighting scheme. Thus, we have weighted least squares (WLS) rather than OLS.
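
The WLS view can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a production estimator: the function names are hypothetical, the triangular kernel is one possible choice, and the toy data are noise-free and exactly linear on each side with a jump of 2 at $c = 0$.

```python
def wls_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def triangular_kernel(u):
    """Weight falls linearly from 1 at the cut-off to 0 at the window edge."""
    return max(0.0, 1.0 - abs(u))


def rd_estimate(xs, ys, c, h):
    """Jump at c: difference in intercepts of two kernel-weighted lines."""
    intercepts = {}
    for side in ("below", "above"):
        # Keep observations on this side of c and strictly inside the window.
        pts = [(x - c, y) for x, y in zip(xs, ys)
               if (x >= c) == (side == "above") and abs(x - c) < h]
        zs = [p[0] for p in pts]  # running variable centred at c
        vs = [p[1] for p in pts]
        ws = [triangular_kernel(z / h) for z in zs]
        intercepts[side], _ = wls_line(zs, vs, ws)
    return intercepts["above"] - intercepts["below"]


# Noise-free toy data: a line on each side, with a jump of 2 at c = 0.
xs = [i / 10 for i in range(-20, 21)]
ys = [1.0 + 0.5 * x + (2.0 + 0.3 * x if x >= 0 else 0.0) for x in xs]
print(rd_estimate(xs, ys, c=0.0, h=1.0))
```

Setting every weight to the same constant inside the window would turn `wls_line` into ordinary least squares on the windowed data, which is the uniform-kernel case.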

10
Q

What about bias and variance with a local linear kernel regression?

A

With local linear regression, we mechanically get less bias but more variance. We fit the data better, so the bias falls, but the variance rises since we now estimate two parameters, both the intercept and the slope. Local linear regression is the most common type we use.

11
Q

Describe the bias-variance trade-off when choosing the bandwidth (BW).

A

The smaller the BW $= h$ we choose, the less bias we get, since the fit only has to approximate the regression function in a narrow region near the threshold. However, as we reduce the BW we are left with fewer and fewer observations, so the variance increases.
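
The trade-off can be illustrated with a small sketch on toy data. Everything here is an assumption for illustration: the data are noise-free with a cubic trend and a true jump of 1 at $c = 0$, the fit uses a uniform kernel (plain least squares inside the window), and the function names are hypothetical. Because there is no noise, the estimate shows only the bias side, while the shrinking number of observations stands in for the variance side.

```python
def ols_intercept(xs, ys):
    """Intercept of an unweighted least squares line y = a + b * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar - (sxy / sxx) * xbar


def rd_by_bandwidth(xs, ys, c, h):
    """Return (estimated jump, number of observations used) for bandwidth h."""
    below = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    above = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    a0 = ols_intercept([p[0] for p in below], [p[1] for p in below])
    a1 = ols_intercept([p[0] for p in above], [p[1] for p in above])
    return a1 - a0, len(below) + len(above)


# Noise-free toy data with curvature: true jump of 1 at c = 0.
xs = [i / 100 for i in range(-100, 101)]
ys = [x ** 3 + (1.0 if x >= 0 else 0.0) for x in xs]
for h in (1.0, 0.5, 0.25):
    est, n = rd_by_bandwidth(xs, ys, c=0.0, h=h)
    print(h, n, abs(est - 1.0))  # bandwidth, observations used, absolute bias
```

Each halving of the bandwidth reduces the absolute bias and roughly halves the number of observations that the estimate rests on.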

12
Q

What is the result of choosing a triangular kernel?

A

Choosing a triangular kernel gives the most weight to the observations closest to the cut-off, with the weight declining linearly to zero at the edge of the bandwidth.

13
Q

What are the main threats to identification in the RDD?

A

The two main concerns are

  1. Other factors jump at the threshold
  2. Sorting (manipulation of the running variable)

Regarding (1), this is not a threat to identification as such. However, we only get a kind of reduced-form estimate, since it also captures the effects of the other factors that jump at the threshold.

Regarding (2), this is a threat to identification, but only if sorting is “perfect”, that is, if individuals can place themselves precisely on either side of the threshold. Being able to influence $X$ only imprecisely does not threaten our design. E.g., on a test we can always choose to score a little better or a little worse, but we may not be able to manipulate our score to land at a specific point.

14
Q

What is the RDD check-list? I.e., the things we need to consider and check for regarding robustness.

A
    • Is there actually a jump?
      • [ ] Plot the outcome against the running variable, without regression lines or a cut-off line, to show that there is a clear jump.
    • No bunching in the running variable
      • [ ] Show density graph
      • [ ] Run a McCrary density test
    • The potential outcome is continuous
      • [ ] Show the RDD graph with bin-averages
      • [ ] Placebo checks at other thresholds
    • Balancing checks: Pre-determined covariates (characteristics) are balanced around the threshold
      • [ ] Continuity checks (graphical or formal) for other covariates
    • BW sensitivity checks
      • [ ] Show graphically how the estimates and CIs change with different BWs. Show explicitly which BW is the optimal one.

    • Donut regression: re-estimate after excluding the observations closest to the cut-off
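
For the RDD graph with bin-averages in the check-list above, the averages can be computed as in the following sketch. The function name, the bin width, and the example data are hypothetical. Bins are aligned on the cut-off so that no bin mixes observations from both sides of it.

```python
import math


def bin_averages(xs, ys, c, width):
    """Mean outcome per bin of the running variable; no bin straddles c."""
    sums = {}
    for x, y in zip(xs, ys):
        # Bin k covers the interval [c + k * width, c + (k + 1) * width).
        k = math.floor((x - c) / width)
        total, count = sums.get(k, (0.0, 0))
        sums[k] = (total + y, count + 1)
    # Report each bin by its midpoint, ordered along the running variable.
    return [(c + (k + 0.5) * width, total / count)
            for k, (total, count) in sorted(sums.items())]


# Hypothetical data with a cut-off at 50 and bins of width 1.
xs = [47.0, 48.5, 49.0, 49.9, 50.0, 50.5, 51.2, 52.9]
ys = [1.0, 2.0, 2.0, 3.0, 6.0, 7.0, 7.0, 9.0]
for mid, avg in bin_averages(xs, ys, c=50.0, width=1.0):
    print(mid, avg)
```

Plotting these midpoints against the averages gives the raw picture of the jump, before any regression line is drawn.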
