Trivia Flashcards
(12 cards)
How would you know if it is necessary to take the log of a variable?
There are a few different ways, including analyzing the variable's summary statistics or examining its Q-Q plot.
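A minimal base-R sketch of both checks, assuming a hypothetical data frame df with a skewed numeric variable x:

    # Summary statistics: a mean far above the median suggests a long
    # right tail, which a log transform often tames.
    summary(df$x)
    summary(log(df$x))

    # Q-Q plots: points hugging the reference line indicate the logged
    # version is closer to normal.
    qqnorm(df$x);      qqline(df$x)       # raw
    qqnorm(log(df$x)); qqline(log(df$x))  # logged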
What is a completely pooled regression?
It is a regression with no fixed effects or random intercepts (though complete pooling can still include adjustments to the standard errors). There is only one intercept in the model, on which every estimate is based.
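A minimal sketch, assuming a hypothetical data frame df with outcome y and predictor x:

    # Complete pooling: the grouping structure is ignored entirely, so a
    # single intercept and slope apply to every observation.
    pooled <- lm(y ~ x, data = df)
    summary(pooled)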
What is a no pooling regression?
It happens in fixed effects models, most visibly in least-squares dummy variable (LSDV) regression, which produces a different estimate for each group.
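A minimal sketch of the LSDV version, using the same hypothetical df plus a group variable:

    # No pooling: factor(group) adds a dummy for each group, so every
    # group gets its own independently estimated intercept.
    lsdv <- lm(y ~ x + factor(group), data = df)
    summary(lsdv)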
What is partial pooling?
It happens in random intercept or random slope models. Because group effects are modeled as draws from a shared normal distribution, the random effects model shrinks each group's estimate toward the overall/grand mean, especially when a group has few observations or is an outlier. By doing so, random effect estimates from multilevel models share information across groups, avoid overfitting, and often yield more stable within-group estimates than modeling each group's intercept independently, as in fixed effects.
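A minimal random-intercept sketch with the lme4 package, using the same hypothetical df:

    library(lme4)

    # Partial pooling: each group gets its own intercept, shrunk toward
    # the grand mean in proportion to how little data the group has.
    mlm <- lmer(y ~ x + (1 | group), data = df)

    fixef(mlm)  # the overall (grand-mean) intercept and slope
    coef(mlm)   # the shrunken group-specific intercepts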
What happens to the intercept in Least Squares Dummy Variable (LSDV) regression?
It is no longer the starting point for all observations. Instead, it is the starting point for the reference category, which is the category whose dummy variable R leaves out of the regression to avoid perfect collinearity.
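A minimal sketch of that behavior, assuming df$group is a factor with hypothetical levels "A" and "B":

    # R drops the first level by default, so the intercept is the
    # predicted value for the reference group "A"; each dummy
    # coefficient is that group's offset from the reference.
    m <- lm(y ~ factor(group), data = df)

    # To make "B" the reference category instead:
    df$group <- relevel(factor(df$group), ref = "B")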
Why is it often useful to cluster standard errors by group?
Clustering addresses the fact that observations from the same group may have correlated residuals. Ignoring that correlation typically understates uncertainty, so clustering adjusts the size of the standard errors accordingly.
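One common way to do this in R, assuming the sandwich and lmtest packages and a hypothetical df with a group column:

    library(sandwich)
    library(lmtest)

    m <- lm(y ~ x, data = df)

    # Re-test the coefficients with a cluster-robust covariance matrix;
    # the point estimates are unchanged, only the standard errors adjust.
    coeftest(m, vcov. = vcovCL(m, cluster = ~ group))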
What does the adjustmentSets() command from the dagitty
package do in R?
It helps us figure out which variables we need to control for in our regression. While there are always many potential control variables, we don't need to control for them all.
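A minimal sketch with a toy DAG (the variable names are hypothetical):

    library(dagitty)

    # z confounds the x -> y relationship; m is a mediator on the causal
    # path, which we should not control for when estimating the total effect.
    dag <- dagitty("dag { z -> x ; z -> y ; x -> m ; m -> y }")

    adjustmentSets(dag, exposure = "x", outcome = "y")
    # should return { z }: controlling for z alone suffices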
Is matching better used as a method to directly test a hypothesis, or is matching more useful as a pre-processing method?
It is more useful as a pre-processing method to reduce imbalance and make estimates less model-dependent, prior to running whatever parametric estimation method you were going to run anyway (e.g., linear regression or logit).
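A minimal sketch of that workflow with the MatchIt package, assuming hypothetical covariates z1 and z2 and a binary treat:

    library(MatchIt)

    # Pre-process: nearest-neighbor matching on the covariates.
    m.out <- matchit(treat ~ z1 + z2, data = df, method = "nearest")
    matched <- match.data(m.out)

    # Then run the parametric model you were going to run anyway,
    # now on the matched (more balanced) data.
    lm(y ~ treat + z1 + z2, data = matched, weights = weights)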
What is the idea behind the matching frontier?
The idea is that there is a trade-off between balance and sample size. As you decrease the sample size, you increase balance, but only up to a point. Beyond that point, there isn't much utility, from a balance/causal inference perspective, in dropping more units.
From an inference perspective, the idea is to estimate the effects at all points along the frontier to get a sense of the external validity of the estimates. This way, we can see how sensitive the estimates are to which units end up in the sample.
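A toy greedy sketch of the trade-off (not the MatchingFrontier package's actual algorithm), assuming a hypothetical df with a numeric 0/1 treat, covariate z, and outcome y:

    # Repeatedly drop the control unit farthest from the treated group on z,
    # recording sample size, imbalance, and the effect estimate each time.
    frontier <- data.frame()
    work <- df
    while (sum(work$treat == 0) > 10) {
      imb <- abs(mean(work$z[work$treat == 1]) - mean(work$z[work$treat == 0]))
      est <- coef(lm(y ~ treat + z, data = work))["treat"]
      frontier <- rbind(frontier, data.frame(n = nrow(work), imb = imb, est = est))
      dist <- abs(work$z - mean(work$z[work$treat == 1]))
      dist[work$treat == 1] <- -Inf  # never drop treated units
      work <- work[-which.max(dist), ]
    }
    plot(frontier$n, frontier$imb, type = "l",
         xlab = "Sample size", ylab = "Imbalance")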
What is a propensity score? How do you estimate it?
The propensity score captures the probability of receiving the treatment (X) given the covariates (Z): P(X = 1 | Z). We usually estimate it using logistic regression, where X is the dependent variable and Z are the covariates.
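A minimal sketch, assuming hypothetical covariates z1 and z2 and a binary treat:

    # Logistic regression with the treatment as the dependent variable.
    ps_model <- glm(treat ~ z1 + z2, family = binomial, data = df)

    # The fitted probabilities are the propensity scores.
    df$pscore <- predict(ps_model, type = "response")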
Is a propensity score more useful for matching or weighting?
The propensity score is more useful for weighting, specifically inverse probability weighting. Using the propensity score for matching gives us the equivalent of complete random sampling in an ideal scenario, when we usually want blocked/stratified random sampling instead.
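A minimal inverse-probability-weighting sketch, building on the hypothetical pscore column above:

    # Treated units are weighted by 1/p, control units by 1/(1 - p).
    df$ipw <- ifelse(df$treat == 1, 1 / df$pscore, 1 / (1 - df$pscore))

    # Weighted outcome model; in practice you would also want robust or
    # bootstrapped standard errors that account for the estimated weights.
    lm(y ~ treat, data = df, weights = ipw)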
What is the parallel trends assumption in difference-in-differences? Draw and explain.
It is the idea that, in the absence of the treatment, the outcomes for the treated and control groups would have followed parallel paths over time.
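A minimal two-period DiD sketch, assuming hypothetical 0/1 indicators treated and post:

    # The interaction coefficient is the difference-in-differences
    # estimate; it has a causal reading only if parallel trends holds.
    did <- lm(y ~ treated * post, data = df)
    coef(did)["treated:post"]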