Exam 2 Trivia Flashcards
(14 cards)
What does the bind_rows() command do in R?
It stacks two data frames on top of each other, matching columns by name. Unlike rbind(), bind_rows() does not require both data frames to contain exactly the same variables: any column missing from one data frame is filled in with NA.
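A minimal sketch of the difference, using two made-up toy data frames:

```r
library(dplyr)

df1 <- data.frame(id = 1:2, score = c(10, 20))
df2 <- data.frame(score = c(30, 40), id = 3:4, grade = c("A", "B"))

bind_rows(df1, df2)  # matches columns by name; df1's rows get NA for grade
# rbind(df1, df2)    # would error: df1 lacks the grade column
```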
How would you know if it is necessary to take the log of a variable?
There are a few different ways, including analyzing a variable's summary statistics (e.g., a mean far above the median suggests right skew) or examining its Q-Q plot.
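A quick sketch of both checks, using a simulated right-skewed variable (income here is made up for illustration):

```r
set.seed(42)
income <- rlnorm(500, meanlog = 10, sdlog = 1)  # right-skewed by construction

summary(income)  # mean well above the median suggests right skew

par(mfrow = c(1, 2))
qqnorm(income);      qqline(income)       # curved: raw variable is skewed
qqnorm(log(income)); qqline(log(income))  # roughly straight after logging
```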
What is a completely pooled regression?
It happens when there are no fixed effects or random intercepts, although complete pooling can still include adjustments to the standard errors. There is only one intercept in the model, and every estimate is based on that single intercept.
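A minimal sketch, assuming a toy data frame dat with an outcome y, a predictor x, and a grouping variable group (all simulated for illustration):

```r
set.seed(1)
dat <- data.frame(group = rep(letters[1:5], each = 20), x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rnorm(100)

# Complete pooling: one intercept shared by every observation,
# ignoring the group structure entirely
pooled <- lm(y ~ x, data = dat)
coef(pooled)
```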
What is a no pooling regression?
It occurs in fixed effects models, most visibly in least-squares dummy variable (LSDV) regression, which produces a different estimate for each group.
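A minimal sketch of LSDV, using the same kind of simulated toy data:

```r
set.seed(1)
dat <- data.frame(group = rep(letters[1:5], each = 20), x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rep(rnorm(5), each = 20) + rnorm(100)

# No pooling: LSDV fits a separate intercept (dummy) for each group
lsdv <- lm(y ~ x + factor(group), data = dat)
coef(lsdv)  # one dummy coefficient per group, relative to the reference group
```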
What is partial pooling?
It happens in random intercept or random slope models. Based on a normal distribution, the random effects model shrinks each group’s estimate toward the overall/grand mean, especially when groups have few observations or are outliers. By doing so, random effect estimates from multilevel models share information about variance between groups, avoid overfitting, and often yield more stable within-group estimates than modeling each group’s intercept independently, as in fixed effects.
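A minimal sketch using the lme4 package (same kind of toy data as above):

```r
library(lme4)

set.seed(1)
dat <- data.frame(group = rep(letters[1:5], each = 20), x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rep(rnorm(5), each = 20) + rnorm(100)

# Partial pooling: random intercepts shrink each group toward the grand mean
mlm <- lmer(y ~ x + (1 | group), data = dat)
coef(mlm)$group  # group intercepts, pulled toward the overall intercept
```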
What happens to the intercept in Least Squares Dummy Variable (LSDV) regression?
It is no longer the starting point for all observations. Instead, it is the starting point for the reference category, i.e., the category whose dummy variable R leaves out of the regression to avoid perfect collinearity.
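A small sketch showing how the intercept tracks the reference category, and how relevel() changes which category R leaves out (toy vectors, made up for illustration):

```r
g <- factor(c("a", "a", "b", "b", "c", "c"))
y <- c(1, 2, 4, 5, 8, 9)

coef(lm(y ~ g))                      # intercept = mean of reference group "a"
coef(lm(y ~ relevel(g, ref = "c")))  # intercept = mean of group "c" instead
```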
Why is it often useful to cluster standard errors by group?
Observations from the same group may have correlated residuals; clustering adjusts the size of the standard errors to account for that within-group correlation.
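A minimal sketch using the sandwich and lmtest packages (toy data, simulated for illustration):

```r
library(sandwich)
library(lmtest)

set.seed(1)
dat <- data.frame(group = rep(1:10, each = 10), x = rnorm(100))
dat$y <- 1 + 0.5 * dat$x + rep(rnorm(10), each = 10) + rnorm(100)

fit <- lm(y ~ x, data = dat)
coeftest(fit, vcov = vcovCL(fit, cluster = ~group))  # cluster-robust SEs
```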
What does the adjustmentSets() command from the dagitty package do in R?
It helps us figure out which variables we need to control for in our regression. While there are always many potential control variables, we don’t need to control for them all.
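A minimal sketch with a made-up DAG in which z confounds the x -> y relationship and m is a mediator:

```r
library(dagitty)

dag <- dagitty("dag {
  x -> y
  z -> x
  z -> y
  x -> m
  m -> y
}")

adjustmentSets(dag, exposure = "x", outcome = "y")
# returns { z }: controlling for z alone suffices; do not control for m
```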
How might we check if matching is doing what it is supposed to do?
We can and should run density plots to see whether the treatment and control distributions are more similar after matching, i.e., whether there is common support.
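A sketch using the MatchIt package and its bundled lalonde data (the covariates and plot arguments here are illustrative):

```r
library(MatchIt)

data("lalonde", package = "MatchIt")
m.out <- matchit(treat ~ age + educ + re74, data = lalonde,
                 method = "nearest")

# Density plots of covariates, before vs. after matching
plot(m.out, type = "density", which.xs = c("age", "re74"))
```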
Is matching better used as a method to directly test a hypothesis, or is matching more useful as a pre-processing method?
It is more useful as a pre-processing method to reduce imbalance and make estimates less model-dependent, prior to running whatever parametric estimation method you were going to run anyway (e.g., linear regression or logit).
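A sketch of that workflow, again using MatchIt’s bundled lalonde data (the model specification is illustrative):

```r
library(MatchIt)

data("lalonde", package = "MatchIt")
m.out <- matchit(treat ~ age + educ + re74, data = lalonde)
md <- match.data(m.out)  # matched sample, with a weights column

# Then run the parametric model you planned anyway, on the matched data
lm(re78 ~ treat + age + educ + re74, data = md, weights = weights)
```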
What is the idea behind the matching frontier?
The idea is that there is a trade-off between balance and sample size. As you decrease the sample size, you increase balance, but only to a point. Beyond that point, there isn’t much utility, from a balance/causal inference perspective, in dropping more units.
From an inference perspective, the idea is to estimate the effects for all points along the frontier to get a sense of the external validity of the estimates. This way, we can see how sensitive the estimates are to which samples are picked.
What is a propensity score? How do you estimate it?
The propensity score captures the probability of receiving the treatment (X) given the covariates (Z): P(X = 1 | Z). We usually estimate it using logistic regression, where X is the dependent variable and Z are the covariates.
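A minimal sketch, again using MatchIt’s bundled lalonde data, with treat standing in for X and an illustrative set of covariates for Z:

```r
library(MatchIt)  # only for its bundled lalonde data

data("lalonde", package = "MatchIt")

# Logistic regression of the treatment indicator on the covariates
ps_model <- glm(treat ~ age + educ + re74, data = lalonde,
                family = binomial)
lalonde$pscore <- predict(ps_model, type = "response")  # P(X = 1 | Z)
```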
Is a propensity score more useful for matching or weighting?
The propensity score is more useful for weighting, specifically inverse probability weighting. When used for matching, the propensity score in an ideal scenario gives us the equivalent of complete random sampling, when we usually want blocked/stratified random sampling.
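A sketch of inverse probability weighting under the same illustrative setup:

```r
library(MatchIt)  # only for its bundled lalonde data

data("lalonde", package = "MatchIt")
ps <- predict(glm(treat ~ age + educ + re74, data = lalonde,
                  family = binomial), type = "response")

# ATE-style inverse probability weights: 1/ps for treated, 1/(1-ps) for control
w <- ifelse(lalonde$treat == 1, 1 / ps, 1 / (1 - ps))
lm(re78 ~ treat, data = lalonde, weights = w)  # weighted outcome model
```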
What is the parallel trends assumption in difference-in-differences? Draw and explain.
It is the idea that the treated and control groups’ outcomes would have travelled on parallel courses in the absence of the treatment. In a drawing, the two groups’ trend lines move in parallel before treatment, and the treatment effect is the post-treatment gap between the treated group’s actual path and the parallel path it would otherwise have followed.
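A minimal sketch with simulated data, where the treated x post interaction recovers the difference-in-differences estimate:

```r
set.seed(1)
dd <- expand.grid(id = 1:100, post = 0:1)
dd$treated <- as.integer(dd$id <= 50)
dd$y <- 1 + 0.5 * dd$post + 1 * dd$treated +
  2 * dd$treated * dd$post + rnorm(nrow(dd))

# The treated:post interaction is the difference-in-differences estimate
did <- lm(y ~ treated * post, data = dd)
coef(did)["treated:post"]  # close to the true effect of 2
```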