Exam 1 Answers Flashcards

Question 1

Q

Draw a fork DAG. Label the variables and explain the essence of a fork DAG with an example.

Answer

A

Example:
* smoking (𝑋) → lung cancer (𝑌), but not the other way around (no reverse causality)
* genes (𝑍) → smoking (𝑋), but not the other way around (a two-way relationship would violate the acyclic property of a DAG)
* genes (𝑍) → lung cancer (𝑌): because genes affect both smoking and lung cancer, they confound the 𝑋–𝑌 relationship
The fork (𝑋 ← 𝑍 → 𝑌) represents common-cause confounding: 𝑍 causes both 𝑋 and 𝑌.
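The fork can also be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variables are continuous, the relationships are linear, and the coefficients (0.8, 0.5, 0.7) are hypothetical. It compares the naive slope of 𝑌 on 𝑋 with the slope after adjusting for 𝑍.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after regressing it on z (with an intercept)."""
    b = slope(z, v)
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    return [(vi - mv) - b * (zi - mz) for vi, zi in zip(v, z)]

rng = random.Random(0)
n = 20_000
genes = [rng.gauss(0, 1) for _ in range(n)]               # Z
smoking = [0.8 * z + rng.gauss(0, 1) for z in genes]      # Z -> X
cancer = [0.5 * x + 0.7 * z + rng.gauss(0, 1)             # X -> Y (true effect 0.5) and Z -> Y
          for x, z in zip(smoking, genes)]

# Naive regression of Y on X leaves the backdoor path X <- Z -> Y open.
naive = slope(smoking, cancer)

# Adjusting for Z: regress the Z-residuals of Y on the Z-residuals of X
# (equivalent to including Z as a control in a linear regression).
adjusted = slope(residualize(smoking, genes), residualize(cancer, genes))

print(f"naive slope: {naive:.2f}, slope adjusted for Z: {adjusted:.2f}")
```

In this setup the naive slope overstates the effect of 𝑋 because part of the association runs through 𝑍, while the adjusted slope should land close to the assumed true effect of 0.5.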

Question 2

Q

What is a mediator? Explain and draw the DAG.

Answer

A

The mediator is 𝑀 in the above DAG. Mediation is the process by which a third variable
(i.e., the mediator) transmits the effect of an independent variable on a dependent
variable.
To better understand mediators, let’s consider the effect of attending an elite high school
(𝑋) on going to an elite university (𝑌). A key reason/mechanism why attending an elite
high school usually enables a student to go to an elite university is that elite high
schools have better teachers. We denote better teachers with 𝑀, not 𝑍. We do so
because better teachers represent a pathway mechanism/mediator, not a confounder,
through which students can get into an elite university. In short, the three variables of
interest are:
* 𝑋: Attending an elite high school (main independent variable/treatment/exposure)
* 𝑌: Going to an elite university (dependent variable/outcome)
* 𝑀: Better teachers in the elite high school (mediator)
For 𝑀 to be a mediator, two essential characteristics must hold:
1. Elite high school (𝑋) → elite university (𝑌). On average, students who attend elite
high schools more frequently go to elite universities. This is very uncontroversial.
2. Better teachers (𝑀) must be on the path from elite high school (𝑋) to elite
university (𝑌): that is, 𝑋 → 𝑀 → 𝑌. The fact that better teachers lead to better
student academic outcomes is also very uncontroversial.
We also need to avoid reverse causality, yielding a third essential characteristic:
3. Elite university (𝑌) ↛ elite high school (𝑋). Students can’t get into university
without first going to high school, so such a path is impossible. More technically,
the no reverse causality assumption in mediation analysis with three variables is
called “sequential ignorability”.

Question 3

Q

What is a direct effect, and how does it relate to mediation?

Answer

A

A direct effect is the part of the effect of 𝑋 on 𝑌 that does not pass through the mediator 𝑀, i.e., the effect produced by every other pathway except 𝑀. In the example from Question 2, it would be every reason besides better teachers why going to an elite high school helps with getting into an elite university.
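The split between the total, direct, and indirect (mediated) effect can be sketched with simulated data. This is an illustration only, assuming linear relationships, a randomly assigned 𝑋, and no confounding between 𝑀 and 𝑌; the coefficients (0.6, 0.5, 0.3) are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after regressing it on z (with an intercept)."""
    b = slope(z, v)
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    return [(vi - mv) - b * (zi - mz) for vi, zi in zip(v, z)]

rng = random.Random(0)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]                    # X: treatment (assumed exogenous)
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]               # X -> M
y = [0.3 * xi + 0.5 * mi + rng.gauss(0, 1)                 # direct X -> Y (0.3) and M -> Y (0.5)
     for xi, mi in zip(x, m)]

total = slope(x, y)                                        # direct + indirect, about 0.3 + 0.6 * 0.5
direct = slope(residualize(x, m), residualize(y, m))       # effect of X holding M fixed
indirect = total - direct                                  # part transmitted through M

print(f"total: {total:.2f}, direct: {direct:.2f}, indirect: {indirect:.2f}")
```

Under these assumptions the direct effect is what remains of the 𝑋–𝑌 relationship once 𝑀 is held fixed, and the indirect effect is the remainder of the total effect.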

Question 4

Q

In the language of DAGs, what does it mean to close all relevant backdoor paths?

Answer

A

It means closing only the backdoor paths associated with confounders, and not adjusting for colliders or mediators. This assumes that the estimand of interest is some form of an average treatment effect.

Question 5

Q

What is a collider? Draw the DAG and explain.

Answer

A

Colliders (𝐶) are variables caused by both 𝑋 and 𝑌 (𝑋 → 𝐶 ← 𝑌) that, if adjusted for,
can introduce a spurious relationship between 𝑋 and 𝑌.
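Collider bias can be demonstrated with a short simulation. This is a sketch under made-up assumptions: 𝑋 and 𝑌 are generated independently, and the collider 𝐶 is simply their sum plus noise.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after regressing it on z (with an intercept)."""
    b = slope(z, v)
    mv, mz = sum(v) / len(v), sum(z) / len(z)
    return [(vi - mv) - b * (zi - mz) for vi, zi in zip(v, z)]

rng = random.Random(0)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]                    # X
y = [rng.gauss(0, 1) for _ in range(n)]                    # Y, independent of X by construction
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]    # collider: X -> C <- Y

unadjusted = slope(x, y)                                   # should be near zero
adjusted = slope(residualize(x, c), residualize(y, c))     # conditioning on C opens the path

print(f"unadjusted: {unadjusted:.2f}, adjusted for collider: {adjusted:.2f}")
```

Even though 𝑋 has no effect on 𝑌 here, adjusting for 𝐶 produces a clearly negative slope, which is the spurious relationship the definition warns about.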

Question 6

Q

What is selection bias and how do you indicate it in a DAG?

Answer

A

Selection bias arises when the data available in your sample do not represent the population about which you are making your inference. You indicate selection bias in a DAG with a selection node, 𝑆.

Question 7

Q

Draw a DAG with selection bias on the dependent variable. What can we learn in such instances? Explain with an example.

Answer

A

In our smoking example, sample selection bias on the dependent variable (𝑌) entails having a sample made up mostly of people with lung cancer or mostly of people without lung cancer. In either case, it is difficult to make any within-sample inference with respect to causality or prediction, because there is not enough variation in the outcome across people. For external validity, we can’t make inferences to a larger population if we don’t have (1) data representative of that population; or (2) variables in our sample to adjust for the sample-population differences.
Now, you may be wondering: even if we don’t have variation in the dependent variable, lung cancer (𝑌), can’t we still figure out whether the independent variable, smoking (𝑋), changes 𝑌 in some way? Technically, we can. As Brady and Collier (2010) highlight, selection on the dependent variable (𝑌) is core to the multiple case study approach. To that end, qualitative researchers often select some cases where 𝑌 is stronger, others where 𝑌 is weaker, and analyze the mechanisms/reasons explaining 𝑌 under each scenario. As some of you may have noticed, the multiple qualitative case study approach resembles mediation analysis and front door adjustment.
The catch is that the qualitative approach is not as inferentially robust if we wish to generalize or transport results beyond the sample. As Bennett (2006) and Brady and Collier (2010) argue, it can be valuable to learn more about specific cases, even if external validity is not the goal. However, mediation analysis is a better option if we wish to generalize or transport results, because we can rely on the central limit theorem and the law of large numbers. By contrast, the set of potential outcomes is limited with the smaller samples in qualitative analysis.

Question 8

Q

You are presented with a regression in which the author uses female literacy as the independent variable (𝑋) and overall literacy (𝑌) as the dependent variable. Does this seem like a valid setup? Why or why not?

Answer

A

No, it is not a valid setup. Female literacy is part and parcel of overall literacy, so
the two are related by construction. Accordingly, there is no point in running a
regression in which the independent variable already partly explains the dependent
variable by design.

Question 9

Q

What is positivity in the context of internal validity? Provide an example to show that you understand it. It may also be helpful to draw something.

Answer

A

Positivity asks whether the different manifestations of the independent variable (𝑋) overlap across the subgroups/strata of the treatment and control groups, taking into account selection bias (𝑆) that can result in under- or over-coverage.

Question 10

Q

What is positivity in the context of external validity? Provide an example to show that you understand it. It may also be helpful to draw something.

Answer

A

Positivity asks whether the different manifestations of the independent variable (𝑋) and
effect modifiers (𝑉) overlap across the sample and population, taking into account
selection bias (𝑆) that can result in under- or over-coverage.

Question 11

Q

What are generalizability and transportability?

Answer

A

Generalizability is when the sample is embedded within the population of interest, and transportability is when the sample corresponds to another population of interest.

Question 12

Q

What is an Intent to Treat (ITT) effect?

Answer

A

The effect of assigning the treatment, even if people did not comply with their treatment assignment. The ITT is often a very conservative estimand.
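A small simulation can show why the ITT tends to be conservative. This is a sketch under hypothetical assumptions: assignment is random, only 60% of those assigned actually take the treatment, nobody in the control group takes it, compliance is unrelated to the outcome, and the effect of actually taking the treatment is 2.0.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 20_000
true_effect = 2.0        # hypothetical effect of actually taking the treatment
compliance_rate = 0.6    # hypothetical share of assigned units who take it

assigned_outcomes, control_outcomes = [], []
for _ in range(n):
    assigned = rng.random() < 0.5                          # random assignment
    takes = assigned and rng.random() < compliance_rate    # one-sided noncompliance
    y = rng.gauss(0, 1) + (true_effect if takes else 0.0)
    (assigned_outcomes if assigned else control_outcomes).append(y)

# ITT compares units by what they were ASSIGNED, regardless of what they took.
itt = mean(assigned_outcomes) - mean(control_outcomes)

print(f"ITT: {itt:.2f} (effect of taking treatment: {true_effect})")
```

Because the assigned group includes people who never took the treatment, the ITT in this setup is diluted to roughly the compliance rate times the effect of treatment (about 1.2 rather than 2.0).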

Question 13

Q

What are the treatment (𝑋), instrument (𝑄), dependent variable (𝑌), and confounder (𝑍) in Column (2) of Table 4 of Acemoglu, Johnson, and Robinson (2001)? Also, draw the DAG.

Answer

A

𝑋: average protection against expropriation risk (quality of institutions)
𝑄: settler mortality in the colonial era
𝑌: log GDP per capita
𝑍: latitude

Question 14

Q

What is the exclusion restriction in Acemoglu, Johnson, and Robinson (2001)?

Answer

A

Settler mortality at the time of colonization (𝑄) must not be directly related to current
GDP per capita (𝑌); it may affect 𝑌 only through the quality of institutions (𝑋). If settler mortality at the time of colonization were related to current GDP through any other channel, it would violate the exclusion restriction assumption.

Question 15

Q

Explain two-stage least squares in instrumental variables. Show the math.

Answer

A

The idea is to use the exogenous variation in the first stage between the instrument (𝑄)
and the treatment (𝑋) to overcome the endogenous/reverse causal relationship between
the treatment (𝑋) and outcome (𝑌). We attain the exogenous variation in the first
stage by putting 𝑋 as the dependent variable:
$$X_i = \alpha + \beta_1 Q_i + \varepsilon_i \tag{1}$$
where 𝛼 is an intercept and 𝜀 is an error term. Note how the independent variable (𝑋)
serves as the dependent variable in the first stage. Now that we have the first stage
figured out, let’s specify the second stage:
$$Y_i = \gamma + \delta \hat{X}_i + \eta_i \tag{2}$$
Above, the hat on top of 𝑋 indicates that we are not using all of 𝑋; we are only using the
variation predicted from the first stage, $\hat{X}_i = \hat{\alpha} + \hat{\beta}_1 Q_i$.
You may also note that we have used 𝛾 (gamma) for the intercept and 𝛿 (delta) for the
coefficient in the second stage to prevent confusion with 𝛼 and 𝛽₁ from the first stage.
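The two stages can be sketched by hand on simulated data. This is an illustration only: the data-generating process, the unobserved confounder `u`, and the coefficients (0.8, 1.5, 2.0) are hypothetical, and the instrument is assumed to satisfy the exclusion restriction by construction.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(0)
n = 20_000
q = [rng.gauss(0, 1) for _ in range(n)]                    # instrument Q
u = [rng.gauss(0, 1) for _ in range(n)]                    # unobserved confounder (hypothetical)
x = [0.8 * qi + ui + rng.gauss(0, 1) for qi, ui in zip(q, u)]          # treatment X
y = [1.5 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]    # outcome Y, true effect 1.5

# Stage 1 (equation 1): regress X on Q and keep the predicted values.
alpha, beta1 = ols(q, x)
x_hat = [alpha + beta1 * qi for qi in q]

# Stage 2 (equation 2): regress Y on the predicted X.
gamma, delta = ols(x_hat, y)

# Naive OLS of Y on X for comparison; biased because u drives both X and Y.
_, naive = ols(x, y)

print(f"first-stage beta1: {beta1:.2f}, 2SLS delta: {delta:.2f}, naive OLS: {naive:.2f}")
```

In this setup 𝛿 should land near the assumed true effect of 1.5, while the naive OLS slope is pulled upward by the confounder. Note that running the second stage by hand like this gives a valid point estimate but not valid standard errors; dedicated IV routines correct them.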

Question 16

Q

What should you look for in the first stage of instrumental variables estimation?

Answer

A

* F-statistic above 10 or 11 (sufficient for full points)
* High 𝑅² and/or high correlation (helpful for external validity, but only worth half points)
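With a single instrument, the first-stage F-statistic can be computed directly from the first-stage 𝑅². The sketch below assumes a first stage with one instrument and an intercept; the simulated coefficients (0.5 for a strong instrument, 0.02 for a weak one) and the helper names are hypothetical.

```python
import random

def first_stage_f(q, x):
    """F-statistic for a first stage with one instrument: F = R^2 / (1 - R^2) * (n - 2)."""
    n = len(q)
    mq, mx = sum(q) / n, sum(x) / n
    sqq = sum((a - mq) ** 2 for a in q)
    sxx = sum((b - mx) ** 2 for b in x)
    sqx = sum((a - mq) * (b - mx) for a, b in zip(q, x))
    r2 = sqx ** 2 / (sqq * sxx)          # R^2 of the regression of X on Q
    return r2 / (1 - r2) * (n - 2)

def simulate(beta1, n, seed):
    """Draw an instrument Q and a treatment X = beta1 * Q + noise."""
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(n)]
    x = [beta1 * qi + rng.gauss(0, 1) for qi in q]
    return q, x

f_strong = first_stage_f(*simulate(beta1=0.5, n=500, seed=2))
f_weak = first_stage_f(*simulate(beta1=0.02, n=500, seed=2))

print(f"strong instrument F: {f_strong:.1f}, weak instrument F: {f_weak:.1f}")
```

The strong instrument should clear the rule-of-thumb threshold of 10 comfortably, whereas a weak instrument like the second one will typically fall well below it.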

Question 17

Q

Draw the two potential DAGs for standard natural experiments and explain your rationale.

Answer

A

The experiment DAG is possible because 𝑍 is irrelevant if 𝑋 is randomly assigned, even if the researcher doesn’t control the assignment in a natural experiment. The fork DAG is possible because the as-if random assignment is not always perfect in natural experiments, so it is often necessary to control for 𝑍.

Question 18

Q

Explain in detail at least one limitation of standard natural experiments.

Answer

A

1. Nature doesn’t assign every treatment as-if randomly:
a. Randomized experiments have a similar problem:
* Not everything can be randomly assigned, for ethical or feasibility reasons
b. Many major social science phenomena are not assigned as-if randomly:
* Democracy/autocracy (political science)
* Social capital (sociology)
* GDP (economics)
2. Overclaiming:
a. Everyone wants to say that their treatment is as-if randomly assigned
* It makes life easy when we can ignore the confounding (𝑍) variables, but
that is frequently not the case
b. Standard natural experiments are hard to verify
* They necessitate great qualitative knowledge of treatment-assignment
causal process observations (CPOs). If we can’t show through qualitative
knowledge of the case that the treatment assignment is as good as
random, it is really not a natural experiment.
Balance tables are the main method that we have to show that observations are assigned as-if randomly, but balance tables are only useful if (i) we have access to all of the potential 𝑍 variables; and (ii) they are measured correctly. Often, we can’t meet those two criteria.

Question 19

Q

What is the main distinction between a standard natural experiment and a field experiment?

Answer

A

The researcher controls the randomization device in a field experiment, whereas nature or happenstance does the randomizing in a standard natural experiment.

Question 20

Q

What is ignorability in the context of internal validity?

Answer

A

No unmeasured confounders.

Question 21

Q

What is ignorability in the context of external validity?

Answer

A

No unaccounted-for selection biases that generate differences between the sample and the population in terms of effect modifiers.

Question 22

Q

What are strata?

Answer

A

Strata are subgroups that divide the population for meaningful analysis. Technically, strata are
also non-overlapping.
(Note to grader: I didn’t stress the non-overlapping part in my lecture notes, so you can give full points for the first part.)

(22 cards)