Flashcards in Final Deck (36):
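A residual is the vertical distance between the regression line and any given data point.
The sum of squared errors takes the residuals, squares them, and adds them up.
The regression shows us the best-fitting line in terms of the sum of squared errors: no other line gives a smaller sum.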
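A minimal sketch of residuals and the sum of squared errors in R. The data values and the names `x`, `y`, `fit`, `res`, and `sse` are made up for illustration and are not part of the course material.

```r
# Toy data (made up for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit the least-squares regression line
fit <- lm(y ~ x)

# Residuals: observed y minus the value the line predicts
res <- y - fitted(fit)

# Sum of squared errors: square each residual, then add them up
sse <- sum(res^2)

# Any other line does worse; here the slope is nudged up by 0.1
other_line <- coef(fit)[1] + (coef(fit)[2] + 0.1) * x
sse_other <- sum((y - other_line)^2)

print(sse)
print(sse_other)
```

The second number printed should be larger than the first, which is what "best fitting in terms of sum of squared errors" means in practice.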
Homoskedasticity is when the error term has the same variance for all values of X. This isn’t a problem.
Heteroskedasticity is when the error term does not have the same variance for all values of X.
Multicollinearity is when an independent variable is highly related to other independent variables. The variance of the coefficient we estimate for that variable will then be high.
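To see the effect of highly related independent variables on coefficient variance, here is a small simulation sketch in R. Everything in it (the sample size, the true coefficients, the seed, and all variable names) is an assumption chosen for illustration.

```r
set.seed(1)  # fixed seed so the simulation is reproducible
n <- 200
x1 <- rnorm(n)

# Case 1: the second variable is unrelated to x1
x2_indep <- rnorm(n)
# Case 2: the second variable is almost a copy of x1
x2_coll <- x1 + rnorm(n, sd = 0.1)

noise <- rnorm(n)
y_indep <- 1 + 2 * x1 + 1 * x2_indep + noise
y_coll  <- 1 + 2 * x1 + 1 * x2_coll + noise

fit_indep <- lm(y_indep ~ x1 + x2_indep)
fit_coll  <- lm(y_coll ~ x1 + x2_coll)

# Standard error of the x1 coefficient in each model
se_indep <- summary(fit_indep)$coefficients["x1", "Std. Error"]
se_coll  <- summary(fit_coll)$coefficients["x1", "Std. Error"]

print(c(independent = se_indep, collinear = se_coll))
```

The standard error on `x1` should come out noticeably larger in the second model, even though the true coefficient and the noise are the same in both.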
Variables in a data set come in different forms:
• Dummy (AKA binary or dichotomous) variables
• Discrete versus continuous variables
• Ordinal variables
• Nominal variables
Data sets themselves also come in different forms:
• Cross-sectional data
• Panel (time series) data
one version of a transformed variable
Binary (dummy) variables are very useful.
• They are regularly used to control for specific effects (Think about the South in Larry Bartels’ article)
• They are used in experiments to identify the treated (1) and control (0) units.
Binary variables make differences in means across two groups, or ‘average treatment effects’, very easy to calculate.
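A short R sketch of why a binary variable makes the difference in means easy to get: the slope on the dummy in a bivariate regression is the difference in group means. The numbers and the names `D`, `y`, `diff_means`, and `slope` are made up for illustration.

```r
# Toy experiment (made-up numbers): D = 1 for treated, 0 for control
D <- c(1, 1, 1, 1, 0, 0, 0, 0)
y <- c(12, 15, 11, 14, 9, 10, 8, 9)

# Difference in means, computed directly
diff_means <- mean(y[D == 1]) - mean(y[D == 0])

# The same quantity is the slope on D in a bivariate regression
fit <- lm(y ~ D)
slope <- unname(coef(fit)["D"])

print(diff_means)
print(slope)
```

Here the treated mean is 13 and the control mean is 9, so both approaches should report a difference of 4.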
Discrete data comes in ‘bins’ or groups.
• Example: On a scale of 1 to 5, how much do you like this class? 1, 2, 3, 4, 5. (5! Obv.)
• Polity score is another example.
• Discrete data lacks precision. The bins may be clear, but they may not be. We may not know what going from 1 to 2 means.
Continuous data can take any value within a range. Examples:
• Annual income
• Votes for each candidate
• Percentages
Categorical data describes how the world is. Often, categorical data comes from qualitative research.
Categorical data can be ordinal or nominal.
• Ordinal can be ordered (low, medium, high; comparable to discrete data).
• Nominal cannot be ordered (majors: political science, economics, sociology; states).
to find out if something is a factor
to find out the levels of that factor and store it as an object
ifelse(data$factor == lev | data$factor == lev, 1, 0)
If a variable is at these levels, code it as 1 (treated); otherwise, code it as 0 (control).
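The three R cards above can be sketched end to end as follows. This assumes the first two cards refer to the base R functions `is.factor()` and `levels()`; the data frame `data`, the column `region`, and the choice of "South" and "West" as the treated levels are hypothetical and only there to make the example runnable.

```r
# Hypothetical data frame; `region` plays the role of data$factor in the card
data <- data.frame(
  region = factor(c("South", "West", "Midwest", "South", "Northeast"))
)

# Find out if the variable is a factor
print(is.factor(data$region))

# Find out the levels of the factor and store them as an object
lev <- levels(data$region)
print(lev)

# Code units at the chosen levels as 1 and everything else as 0.
# Picking "South" and "West" is an arbitrary, illustrative choice.
data$treated <- ifelse(data$region == "South" | data$region == "West", 1, 0)

# Equivalent, and shorter when many levels are involved
data$treated_alt <- ifelse(data$region %in% c("South", "West"), 1, 0)

print(data)
```

Note that the two sides of the `|` need to name different levels; comparing against the same level twice is the same as a single comparison.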
Cross-sectional data:
• A sample of a population in a given period of time. You observe a bunch of units at one time period.
• Example: Representative public opinion poll before an election.
Repeated cross-sectional data:
• Taking different samples of a population over time. You observe different units over time.
• Example: Multiple waves of a representative public opinion poll, where different people respond (pick up the phone) in each wave (sample).
Panel (time-series) data:
• Seeing the same population repeatedly over time. You see the same units over given time periods.
• Example: countries’ GDP and wars by year from 1980 to 2015, turnout in each CA precinct between 2000 and 2014, survey efforts that repeatedly target the same group of people annually.
Fixed effects models
control for unit-specific effects: they net out the average way that a unit behaves over time.
• You could think of this as the ‘culture’ of a unit.
Fixed effects models also
control for time period effects
• they control for the average behavior in a given year.
• Example: average income during recession years, average turnout in each presidential election year.
Fixed effects approach
allows us to control for any factor that is fixed within a unit (or a time period) across the entire panel, regardless of whether we observe the factor.
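A minimal R sketch of what "netting out" a unit's average looks like. It compares two standard ways of estimating unit fixed effects, one dummy per unit and subtracting each unit's own mean, on a made-up panel. All numbers and names here are illustrative assumptions, not from the course.

```r
# Toy panel (made-up numbers): 3 units observed in 4 periods each
unit <- factor(rep(c("A", "B", "C"), each = 4))
x <- c(1, 2, 3, 4,  2, 3, 4, 5,  0, 1, 2, 3)

# Each unit has its own fixed level that we pretend not to observe
unit_effect <- unname(c(A = 10, B = 0, C = -5)[as.character(unit)])
noise <- c(0.1, -0.2, 0.2, -0.1,  0.0, 0.1, -0.1, 0.0,  -0.2, 0.2, 0.1, -0.1)
y <- unit_effect + 2 * x + noise

# Approach 1: include one dummy per unit
fe_dummy <- lm(y ~ x + unit)

# Approach 2: subtract each unit's own average from x and y
x_dm <- x - ave(x, unit)
y_dm <- y - ave(y, unit)
fe_within <- lm(y_dm ~ x_dm)

# Both approaches give the same slope on x
b_dummy  <- unname(coef(fe_dummy)["x"])
b_within <- unname(coef(fe_within)["x_dm"])
print(c(dummy = b_dummy, within = b_within))
```

Neither approach needs to know what made unit A sit 10 points higher than unit B; anything constant within a unit drops out. The sketch covers unit effects only; time period effects could be handled the same way with a period dummy.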
Experiments deal with the endogeneity problem by creating exogeneity.
What factors do we need to control for, as covariates, in an experiment?
None! If randomization was done properly, and you have a large enough sample size, in expectation we should have balance across all the covariates.
Average Treatment Effect
Since D can only take two values, a one unit shift in D is the effect of a unit moving from control (0) to treated (1). It’s an average effect calculated across the full sample.
- Attrition is when units drop out of our experiment. For this reason, we never see the result of the experiment for these units. In other words, the outcome variable is censored.
- Potential causes of attrition: frustration with experiment, busyness, moving, death
- We might expect that the longer the experiment lasts, the more likely we are to see attrition.
- What is the problem with attrition? It’s likely not random. Some part of our sample will be more likely to attrit than others. This undermines the exogeneity created through random assignment.
- Potential solutions to attrition? 1) Trimming the data set to make the treated and control units look the same on observable covariates (remember: this might not address unobservables). 2) Modeling the attrition to control for it.
- Balance is when your treated and control units look the same on other variables (covariates).
- You can check this with a balance table: a table that describes the mean and variance for relevant control variables.
- If your experiment was properly randomly assigned, and you had a large enough sample size (e.g. the Gerber, Green and Larimer experiment), this shouldn’t be a problem.
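A balance table can be sketched in a few lines of R. The data frame `df` and its columns `treated`, `age`, and `income` are hypothetical, with numbers chosen so the two groups happen to be balanced on the means.

```r
# Hypothetical experimental data (made-up numbers)
df <- data.frame(
  treated = c(1, 1, 1, 1, 0, 0, 0, 0),
  age     = c(34, 51, 45, 62, 36, 49, 47, 60),
  income  = c(40, 55, 62, 48, 42, 53, 60, 50)  # in thousands
)

# Mean and variance of each covariate, by treatment status
balance_mean <- aggregate(cbind(age, income) ~ treated, data = df, FUN = mean)
balance_var  <- aggregate(cbind(age, income) ~ treated, data = df, FUN = var)

print(balance_mean)
print(balance_var)
```

If the rows for `treated = 0` and `treated = 1` look similar, the groups are balanced on those covariates. This says nothing about covariates you did not measure.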
- Compliance refers to whether experimental subjects actually take the treatment if they were assigned to it (and don’t if they weren’t).
- How could this happen?
Units sneak into the treated group (e.g. the job training program) or they decline to take the treatment (e.g. they are assigned to come to a class to learn about statistics but, sadly, do not show up, thereby missing their one shot at greatness!).
- What’s one solution if you find you have non-compliance? Run an intention-to-treat model: compare the groups by assignment, regardless of whether they took the treatment.
- What should we expect this to do to our ATE?
Shrink it towards zero. As non-compliance grows, ATE gets closer to zero. This is like noise.
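The shrinking can be seen in a stylized R sketch. The assumptions are loud ones: 100 units per arm, a true effect of 10 for anyone who actually takes the treatment, no noise, and nobody in the control arm sneaking into treatment. The function `itt_estimate` and its arguments are hypothetical names.

```r
# Returns the intention-to-treat estimate for a given compliance rate.
# Illustrative assumptions: n units per arm, a fixed effect for anyone who
# takes the treatment, no noise, and no control units taking the treatment.
itt_estimate <- function(compliance_rate, n = 100, effect = 10) {
  n_takers <- round(n * compliance_rate)
  took_treatment <- c(rep(1, n_takers), rep(0, n - n_takers))

  y_assigned_treat   <- 50 + effect * took_treatment
  y_assigned_control <- rep(50, n)

  # Compare groups by assignment, ignoring who actually took the treatment
  mean(y_assigned_treat) - mean(y_assigned_control)
}

rates <- c(1, 0.75, 0.5, 0.25)
itt <- sapply(rates, itt_estimate)
print(data.frame(compliance = rates, itt = itt))
```

With full compliance the estimate equals the true effect of 10; at 50% compliance it is halved, because half of the assigned-to-treatment group behaves exactly like the control group.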
Designing your treatment and control units in advance to ensure you will end up with balance in your covariates, particularly important confounders. This is most important for small sample sizes.
What is the overarching puzzle that Gerber, Green and Larimer want to answer?
“To what extent do social norms cause voter turnout?” (p. 33)
What is our independent variable? Social norms
What is our dependent variable? Electoral turnout
What are our hypotheses?
H0: Social norms do not shape electoral turnout.
HA: Social norms increase (or decrease) electoral turnout.
Who makes up the experimental sample? 180,002 households in the state of Michigan
How many treatment conditions are there in this experiment? There were four separate treatments (CIVIC DUTY, HAWTHORNE, SELF, NEIGHBORS).
Why is the CIVIC DUTY treatment a "baseline" measure? Because it is common to all treatment mailings!
What are the main differences between the HAWTHORNE, SELF, and NEIGHBORS treatments?
They progressively increase the level of social pressure. HAWTHORNE makes people aware they are being studied, SELF includes information on an individual’s voting record, and NEIGHBORS also includes information on the voting records of neighbors.
What is the control condition?
The control group did not receive ANY mailers.
Why do we need a control group in this experiment?
Natural experiments occur when a researcher identifies a situation in which values of the independent variable have been determined by a random, or at least exogenous, process.
Although quantitative research can answer many questions, it has limits. What are some limits?
- Data limitations: Sometimes we can’t get or don’t have quantitative measures, or the data is of poor quality.
- Causal inference: We can’t create an experiment, so cross-sectional regression results may not be robust.
- Outliers: We may want to understand why some cases don’t fit the overall pattern or relationship in the data.
- Variable-oriented: Flattens the world into variables.
- Theory-driven: Imposes the researcher’s categories onto the world rather than allowing the world to tell the researcher new ideas.
- Mechanisms: Quantitative work is not very good at identifying how X caused Y.
intensive study of a single spatial and temporal phenomenon; "within-case analysis"
study of several cases to compare a phenomenon across space and time; "between-case (cross-case) analysis"
Seawright & Gerring provide us with a range of case selection options. What are they?
• Random (they argue against this, in favor of purposive sampling)
• Typical
• Most similar (for cross-case)
• Most different (for cross-case)