Midterm Flashcards Preview

Flashcards in Midterm Deck (66):

Dependent Variable

Outcome we are interested in
Depends on the other variable
Goes on the y-axis


Independent Variable

Intervention or treatment
The "cause"
Goes on the x-axis



Confounder

Something related to both the IV and the DV



In the absence of treatment


Treated Group

Those who get some treatment of interest


Control Group

Those who do not get the treatment of interest


Observational Study

General term for research where you don't get to randomize who gets the treatment
Instead, you just observe some relationship in the world


Experimental Study & Randomized Control Trial (RCT)

Common terms for research designs in which you do randomize who gets the treatment

Typically you can make causal claims from experimental studies


Quasi-experimental research

Research in which you have observational data but find ways to ensure that the treatment was effectively randomly distributed


Internal Validity

Is the experiment well designed? Is it free from confounders or bias?


External Validity

Is the finding generalizable to other populations, situations or cases? Does it apply outside of the context in which the finding was generated?


Problems with experiments

Not everything can be randomized (democracy, gender)
Not everything should be randomized (wars, right to vote and medication for birth defects)
Ethical dilemma: randomizing treatment means denying treatment to some, but not randomizing means we don't really know if it's effective or not
Running experiments is expensive



Y

Dependent variable, outcome variable, the thing we want to predict


X

Independent variable, the thing that predicts the DV



Error term
The part of the DV that the IV doesn't explain
Everything NOT in our model



β1

Slope coefficient
Relationship between X & Y
Indicates how much change in Y is expected if X increases by 1 unit



β0

Intercept: the value of Y when X is zero
Indicates where the regression line crosses the Y-axis



Endogeneity

The IV is correlated with the error term
Confounder: there is another unmeasured variable (a confounder) that affects both the IV and the DV
We haven't included this confounding variable in our model


To get our causal estimate

We need to create exogeneity



Exogeneity

The IV is unrelated to, or uncorrelated with, the error term
Accomplish this through random assignment



Noise in the data
Could go away with larger sample sizes
We can address some of these concerns with t-tests (p-values) and confidence intervals



Random assignment

Using a coin toss to create treatment and control groups, which creates exogeneity


Population

The overall collection of individuals, beyond just the sample


Sample

The collection of individuals on which statistical analyses are performed, and from which general trends for the population are inferred


Observation

Also called an "object" or "unit"
A single data point contributing to the sample


Mean

Average of a variable
(X bar)


Minimum

The lowest value of a variable (min)


Maximum

The highest value of a variable (max)


Sample size

the number of observations (N)


Standard deviation

How widely dispersed the values of the variable are (population or sample standard deviation)
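As a rough illustration of the summary statistics on the cards above (mean, minimum, maximum, sample size, standard deviation), here is a minimal sketch using Python's standard statistics module. The five values in x are made up for the example and do not come from the course.

```python
import statistics

# Made-up sample of five observations, purely for illustration.
x = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(x)                             # sample size (N)
x_bar = statistics.mean(x)             # mean (X bar)
x_min, x_max = min(x), max(x)          # minimum and maximum
sd_sample = statistics.stdev(x)        # sample SD: divides by N - 1
sd_population = statistics.pstdev(x)   # population SD: divides by N

print(n, x_bar, x_min, x_max, sd_sample, sd_population)
```

The sample version is slightly larger than the population version because it divides by N - 1 rather than N.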


Probability Theory

Population distribution (Y)
Then goes to sample (Y1, Y2,...Yn)
Then to Estimator/Statistic g(Y1, Y2,...Yn)
Finally to Estimate and back to population distribution



the best guess about what number will be drawn from the distribution



how far the numbers you drew tend to be from the best guess



something that exists for any random variable
if you could draw repeatedly from a distribution and take an average, it would get closer and closer to the expectation



Measure of spread, or how far you expect a random draw to be from the expectation


Draw a bunch of numbers from the same distribution, square the difference between each number and the mean, and average those squared differences


E[X] & Var[X]

are properties of the distribution of X, not your data
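To make the distinction concrete, here is a small sketch that assumes X is one roll of a fair six-sided die (an example chosen for illustration, not taken from the deck). E[X] and Var[X] are computed from the distribution alone; the sample mean and sample variance are computed from one batch of simulated draws and only approximate them.

```python
import random

# Assumption for illustration: X is one roll of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]

# Properties of the distribution itself (no data involved).
expectation = sum(faces) / len(faces)                                # E[X]
variance = sum((f - expectation) ** 2 for f in faces) / len(faces)   # Var[X]

# Estimates computed from one particular sample of draws.
rng = random.Random(0)
draws = [rng.choice(faces) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

print("distribution:", expectation, variance)
print("this sample: ", sample_mean, sample_var)
```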


Normal Distribution

X ~ N(E[X], Var(X))

X ~ N(µ, σ²), where µ is the expectation and σ² is the variance.


Sample Mean

The sample mean of X1, X2, ..., XN is

X bar = (1/N) Σ Xi = (1/N) [X1 + X2 + ... + XN]

This is an example of an estimator


Law of Large Numbers

For RVs X1, X2, ..., XN drawn from the same distribution, the sample mean (X bar) gets closer and closer to E[X]

As N grows the estimator gets better
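A quick simulation sketch of this idea, assuming a fair coin coded as 0/1 so that E[X] = 0.5. The coin, the seed, and the sample sizes are illustrative choices.

```python
import random

rng = random.Random(1)

def sample_mean_of_coin_tosses(n):
    """Mean of n tosses of a fair coin coded as 0/1, so E[X] = 0.5."""
    return sum(rng.randint(0, 1) for _ in range(n)) / n

# Sample means for increasingly large N.
means_by_n = {n: sample_mean_of_coin_tosses(n) for n in (10, 1_000, 100_000)}

for n, m in means_by_n.items():
    print(f"N = {n:>7}: X bar = {m:.4f}, distance from E[X] = {abs(m - 0.5):.4f}")
```

Any single small sample can land far from 0.5; the guarantee is about what happens as N grows.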


Distribution of the mean

Property 1: the mean’s distribution is centered around E[X]
▪ You only get a mean once, but you know it takes a value from a distribution centered on E[X]
▪ We don’t know E[X] (though we know X bar gets closer and closer to it as N grows)

Property 2: the variance of the mean, Var(X bar), is Var(X) divided by N
• So we can estimate Var(X bar) as Varbar(X)/N


How is the mean distributed

o There is one more piece to understanding the distribution of the mean: we know its expectation and variance, but what about the shape?
o The shape must depend on the distribution of the underlying X, you would think
▪ Fortunately it does not
o Property 3: Central limit theorem (CLT): the distribution of the mean tends toward a normal distribution
▪ This is magical: regardless of how the original X is distributed, when you take the mean of multiple RVs drawn from the same distribution, it starts to look normal
▪ You do need N to be big enough for this to work, but that’s often not a problem
▪ We will discuss some rules for deciding if N is big enough and adjustments to use when it is not
▪ As N increases, the distribution starts to turn into a triangle shape, with the peak in the middle.
▪ The more N increases, the narrower each box gets
o Theoretical understanding of distribution
▪ Take the N=50 case
▪ We can estimate the center using X bar, which is 0.5
▪ We can estimate the variance of X bar using (1/N) Varbar(X)
▪ We know the shape of the distribution of the mean is normal
▪ Using all this, we estimate that the mean should be distributed
• X bar ~ N(µ = 0.5, σ² = 0.08/N)
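The three properties can be checked with a short simulation. This sketch assumes X ~ Uniform(0, 1), which has E[X] = 0.5 and Var(X) = 1/12 ≈ 0.083; that is a guess at the example behind the 0.5 and 0.08 figures above, not something the card states. The number of repeated samples is arbitrary.

```python
import math
import random

rng = random.Random(2)
N = 50          # observations per sample, as in the N = 50 case
REPS = 5_000    # number of repeated samples (arbitrary choice)

# Assumption: X ~ Uniform(0, 1), so E[X] = 0.5 and Var(X) = 1/12.
sample_means = [sum(rng.random() for _ in range(N)) / N for _ in range(REPS)]

# Property 1: the means are centered on E[X].
center = sum(sample_means) / REPS

# Property 2: the variance of the mean is Var(X) / N.
observed_var = sum((m - center) ** 2 for m in sample_means) / REPS
theoretical_var = (1 / 12) / N

# Property 3: if the means are roughly normal, about 95% of them
# should fall within 1.96 standard deviations of 0.5.
sd = math.sqrt(theoretical_var)
share_within = sum(abs(m - 0.5) <= 1.96 * sd for m in sample_means) / REPS

print(center, observed_var, theoretical_var, share_within)
```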



Covariance

The mean value of the product of the deviations of two variables from their respective means

Measure of the joint variability of two random variables

If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for lesser values, the variables tend to show similar behavior and the covariance is positive

Positive association: when X is higher, we expect Y is usually higher
Negative association: when X is higher, we expect Y is usually lower
Not associated: when X is higher, it doesn't tell us anything about Y

Problem with covariance: scale is not very natural



Correlation

Statistical technique that shows whether and how strongly pairs of variables are related

Correlation ranges between -1 & 1

A perfect positive relationship has Cor(X,Y) = 1
A perfect negative relationship has Cor(X,Y) = -1
Two perfectly unrelated variables have Cor(X,Y) = 0


When we divide by the standard deviations

We are in effect standardizing the covariance, rescaling it to range from -1 to 1.
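A minimal sketch of why the standardization helps, using made-up data. It uses the sample covariance with an N - 1 divisor, which is an assumption; the card only says "mean value". Changing the units of X changes the covariance a hundredfold but leaves the correlation untouched.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: averaged product of deviations from the means (N - 1 divisor)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the two standard deviations."""
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]   # the same variable in different units

cov_xy = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
cor_xy = correlation(x, y)
cor_rescaled = correlation(x_rescaled, y)

print(cov_xy, cov_rescaled)   # covariance depends on the units of X
print(cor_xy, cor_rescaled)   # correlation does not
```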


Correlation only sees

linear relationships
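A small made-up example of this limitation: Y is completely determined by X through Y = X², yet the correlation is zero because the relationship is not linear. A straight-line relationship on the same X values gives a correlation of 1.

```python
import math

def correlation(x, y):
    """Pearson correlation computed from deviations around the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x]      # perfectly related to X, but not linearly
y_line = [2 * v + 1 for v in x]    # perfectly linear in X

cor_curve = correlation(x, y_curve)
cor_line = correlation(x, y_line)

print(cor_curve, cor_line)
```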


Regression logic

o How can we best figure out the relationship between X & Y?
o Suppose we guess the slope and intercept. How do we assess whether it is a good guess for the relationship between X & Y?
o We use the sum of squared errors (the residuals squared and added up) to see how well we are doing
o It turns out the best way to estimate this relationship is to choose the slope and intercept that minimize the sum of squared errors.
o The residual is the vertical distance between the line and any given point
o The SSE takes those residuals, squares them, and adds them up
o The regression shows us the best fitting line in terms of sum of squared errors
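The logic above can be sketched in a few lines. The data are made up, and the closed-form slope and intercept used here are the standard OLS solution for one X variable, which the card does not spell out. The arbitrary guess (intercept 2.0, slope 0.5) is only there for comparison.

```python
def sse(intercept, slope, x, y):
    """Sum of squared errors: square each residual, then add them up."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def ols(x, y):
    """Intercept and slope that minimize the SSE (standard closed-form solution)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols(x, y)
best_sse = sse(b0, b1, x, y)
guess_sse = sse(2.0, 0.5, x, y)   # an arbitrary guess at the line

print(b0, b1, best_sse, guess_sse)
```

Any other line, including the guess, has a larger SSE than the fitted one.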


Standard Error

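o If we took another sample of data from the same source and estimated β̂ again, we would get a somewhat different value. Across many such samples, the estimates of β̂ form a distribution
o SE(β̂) estimates the standard deviation of that distribution of β̂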


Compare modeled outcome to the simple mean

We can think of regression as a prediction machine that tells us our best guess of Y (growth) given our knowledge of X (yearsschool) for an observation


Understanding the variance explained or R squared

Spread of points around the regression line should be smaller than the spread of points around the mean line
▪ If we average up the squared distances around the mean, we get the variance of Y.
▪ If we average up the squared distances around the regression line, we get the mean squared error for the regression
▪ We were trying to minimize this form of prediction error in our choice of Beta.
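A sketch of the comparison on made-up data. The card does not give a formula, so this uses the standard definition R² = 1 - SSE/SST, where SST is the sum of squared distances around the mean.

```python
# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Best-fitting line (standard OLS formulas for one X variable).
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Squared distances around the mean line vs. around the regression line.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

variance_y = sst / n            # average squared distance around the mean
mean_squared_error = sse / n    # average squared distance around the line
r_squared = 1 - sse / sst       # share of the variance in Y that is explained

print(variance_y, mean_squared_error, r_squared)
```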


Causal Inference

o Just as a difference in means reflects an association, correlations, covariances, and regression coefficients only reflect an observed relationship in the data
▪ The setup makes us focus on variation in the IV as an explanation for the DV
▪ But those countries that are higher or lower on the IV are probably higher or lower on other things too
▪ These other things may be why the DV differs, not just the IV
▪ In other words, they are confounders


What is bias

Unbiased estimate: on average, our estimate is equal to the true parameter

Biased estimate: our coefficient is systematically wrong, either higher or lower than the true parameter


Omitted Variable Bias

o OLS does not necessarily create unbiased estimates
o Omitted variable bias: this is a specific form of endogeneity
▪ X is correlated with something else that influences Y, so the error term is correlated with X
▪ This is often the reason why, if you change the model specification, your estimates change. We say our model is not robust to alternative specifications
▪ Therefore our model is missing a key confounder, and we haven’t estimated a causal relationship
▪ Theoretically, you could include this confounder in your model (multi-variable regression), but this is often hard to do: maybe you don’t know what the confounder is, or you don’t have data on it.
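A simulation sketch of omitted variable bias. Everything here is invented for illustration: a confounder Z drives both X and Y, and the true effect of X on Y is set to 1. Regressing Y on X alone gives a slope near 2. To control for Z without a full multi-variable regression routine, the sketch first removes the part of X explained by Z and then regresses Y on what is left, which yields the same coefficient on X as including Z in the model.

```python
import random

rng = random.Random(3)
n = 20_000

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Invented data-generating process: Z is the confounder.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]                                  # Z affects X
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]    # Z affects Y

# Model that omits Z: the slope absorbs part of Z's effect.
naive_slope = slope(x, y)

# Controlling for Z: strip out the part of X predicted by Z,
# then regress Y on the remaining variation in X.
b_xz = slope(z, x)
mz, mx = sum(z) / n, sum(x) / n
x_resid = [xi - mx - b_xz * (zi - mz) for xi, zi in zip(x, z)]
controlled_slope = slope(x_resid, y)

print(naive_slope, controlled_slope)
```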



Homoskedasticity

When the error term HAS the same variance for all values of X. This isn’t a problem



Heteroskedasticity

When the error term DOES NOT have the same variance for all values of X. This is a fixable problem
o Remember this only affects our standard errors, not our slope estimate. Therefore it doesn’t cause bias
o So what’s the solution? We use a slightly different estimator for our standard error calculations. Intuitively, we estimate the variance in our standard errors.
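A rough sketch of the idea on simulated data. The card does not name the alternative estimator; this example assumes the simplest heteroskedasticity-robust ("White" or HC0) standard error for the slope and compares it with the classical one. The data-generating process, in which the noise grows with |X|, is invented for illustration.

```python
import math
import random

rng = random.Random(4)
n = 5_000

# Invented data: the spread of the error grows with |x| (heteroskedasticity).
x = [rng.uniform(-5, 5) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, abs(xi)) for xi in x]

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# The slope estimate itself is computed the usual way.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Classical SE: assumes one common error variance for every observation.
error_variance = sum(e ** 2 for e in residuals) / (n - 2)
classical_se = math.sqrt(error_variance / sxx)

# Robust (HC0) SE: lets each observation contribute its own squared residual.
robust_se = math.sqrt(
    sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, residuals)) / sxx ** 2
)

print(slope, classical_se, robust_se)
```

The slope stays close to the true value of 2 in both cases; only the standard error changes.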



Outlier

o An observation that is extremely different from the rest of the observations in the sample. “One of these things is not like the others”
o Reminder, what does an outlier do to our estimate of the mean? It drags it toward the outlier. The mean is sensitive to outliers
o Intuitively, then, what would outliers do to our regression estimates? They would also drag our estimate of the slope toward the outlier.
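A tiny made-up example of both effects: five points lie exactly on the line Y = 2X, and then a single extreme point is added.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """OLS slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Five made-up points lying exactly on the line Y = 2X.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# The same data plus one extreme observation.
x_out = x + [6]
y_out = y + [40]

mean_clean, mean_outlier = mean(y), mean(y_out)
slope_clean, slope_outlier = slope(x, y), slope(x_out, y_out)

print(mean_clean, mean_outlier)     # the mean of Y is dragged upward
print(slope_clean, slope_outlier)   # so is the slope
```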


Two sample tests:

o How can we compare one sample to another sample to ask: “Are these samples from the same population or different populations?”
▪ We call these two-sample tests, comparing the samples
• We will try to understand how likely we are to observe some difference if there really is no difference
• That will require thinking about a null distribution, describing how “weird” our result is if there really is no difference (our null hypothesis)


Difference in means

o The fundamental quantity of interest today will be the difference in means between two groups on some outcome
▪ Difference in mean income across two states
▪ Difference in voter turnout between two groups of people (students vs. employed)
▪ Difference in probability of war in two groups of countries (autocracies vs. democracies)
▪ Difference in proportion of heads after tossing one coin N1 times and another coin N2 times.


One sided vs. Two sided tests

Two sided: is one group different, either bigger or smaller, than the other?

One sided: is one group bigger (smaller) than the other?



o A good null: H0: VoteR = VoteD
o Alternative: VoteR ≠ VoteD
o We find that:
▪ Among Republicans, 20/25 report they will vote (VoteR = 0.8)
▪ Among Democrats, 22/35 report they will vote (VoteD = 0.63)
▪ Thus VoteR – VoteD = 0.17
▪ What is the key thing we need to assess how “weird” this result is?
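One way to answer that question is to compare the difference with its standard error under the null. The sketch below uses the counts from the card; the pooled standard error and the normal approximation are assumptions about the method, since the card does not say which formula the course uses.

```python
import math
from statistics import NormalDist

# Counts from the card.
yes_r, n_r = 20, 25   # Republicans who report they will vote
yes_d, n_d = 22, 35   # Democrats who report they will vote

vote_r = yes_r / n_r
vote_d = yes_d / n_d
diff = vote_r - vote_d

# Under the null (VoteR = VoteD) both groups share one turnout rate,
# so pool the two samples to estimate it (assumed approach).
pooled = (yes_r + yes_d) / (n_r + n_d)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_r + 1 / n_d))

z = diff / se                                        # how many SEs from zero
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))     # normal approximation
reject_null = abs(z) > 1.96

print(round(diff, 3), round(se, 3), round(z, 3), round(p_two_sided, 3), reject_null)
```

With samples this small, a gap of 0.17 is not far enough from zero to reject the null at the 1.96 cut-off.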


What is a p-value?

o A p-value is the probability of observing a difference in means or a coefficient at least as big as what we observed, if the null were true.
o First, you need to decide: is this a one-sided or two-sided test?
o Then you need to set your critical value: how different would I want these two distributions to be before I conclude they probably weren’t from the same distribution?
o Based on this critical value, you can say whether it seems like these two groups are significantly different


For a two-sample test with a difference in means

▪ Null hypothesis: the difference in means between the two groups is zero
▪ Alternative: the two samples are probably from different groups
▪ Critical value: how certain do I want to be that they’re really different? Often 1.96
▪ P-value: how often you would get a result at least as extreme as the one you got under the null. Usually we set the cut-off for the significance level to 0.05, or 1 out of 20 cases


Hypothesis testing with regression

o What if we wanted to know if the relationship we found between X and Y was real, versus we saw it just by chance?
o It could be that the sample we draw shows a relationship between X & Y but if we had a slightly different sample we wouldn’t see any relationship
o We need to set decision criteria: how different from zero do we want the relationship to be before we decide we’ve really uncovered a real relationship. We call this the critical value.


Hypothesis testing with regression

o Null: β1 = 0: there is no relationship between X and Y
o Alternative: β1 ≠ 0: there is a relationship between X and Y
o How would we set our cut-off for making sure we don’t make mistakes? Pick our critical value


The standard normal distribution

o Up to 1.64 SDs covers 95% of the distribution (one-sided)
o -1.64 to 1.64 SDs covers the central 90% (90% confidence interval)
o Up to 1.96 SDs covers 97.5% of the distribution (one-sided)
o -1.96 to 1.96 SDs covers the central 95% (95% confidence interval)
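These figures can be checked with the standard library's NormalDist class. A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution below z, and share between -z and z.
below = {z: std_normal.cdf(z) for z in (1.64, 1.96)}
central = {z: std_normal.cdf(z) - std_normal.cdf(-z) for z in (1.64, 1.96)}

for z in (1.64, 1.96):
    print(f"z = {z}: below = {below[z]:.4f}, central = {central[z]:.4f}")

# Going the other way: the cut-off that leaves 2.5% in each tail.
z_for_central_95 = std_normal.inv_cdf(0.975)
print(z_for_central_95)
```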


Hypothesis Testing

o State null hypothesis
o Determine if hypothesis test is one sided or two sided
o State alternative hypothesis
o Run regression
o Test statistic: take coefficient and standardize it
▪ Divide coefficient by standard error
▪ We do this so we can compare the coefficient against the standard normal distribution
▪ This will let us know how likely it would be that we got this coefficient just by chance (if the RA scrambled our data by accident)
o Decide on a critical value (typically 1.96 for two sided test)
o State whether we can reject the null in favor of the alternative
▪ If critical value < |test statistic|: we reject the null
▪ If critical value > |test statistic|: we fail to reject the null
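The last few steps can be written as a small helper. The function name and the two example coefficients are hypothetical, chosen only to show one case on each side of the 1.96 cut-off.

```python
def z_test_decision(coefficient, std_error, critical_value=1.96):
    """Standardize a coefficient and compare it with the critical value."""
    test_statistic = coefficient / std_error
    reject_null = abs(test_statistic) > critical_value
    return test_statistic, reject_null

# Hypothetical regression output, for illustration only.
stat_a, reject_a = z_test_decision(coefficient=0.5, std_error=0.3)
stat_b, reject_b = z_test_decision(coefficient=0.9, std_error=0.3)

print(stat_a, reject_a)   # |test statistic| below 1.96: fail to reject the null
print(stat_b, reject_b)   # |test statistic| above 1.96: reject the null
```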


Fearon and Laitin Dataset

o Research question: does ethnic fractionalization explain how long civil wars last?
▪ Dependent variable? Years of civil war
▪ Independent variable? Ethnic fractionalization
▪ Null hypothesis?
• There is no association between ethnic fractionalization and years of civil war
▪ One or two sided? Let's go with two-sided to be safe
▪ Alternative hypothesis?
• There is an association (either positive or negative) between ethnic fractionalization and years of civil war
▪ Let's get the test statistic
• Step 1: get β1 (coefficient for ethnic fractionalization)
• Step 2: standardize β1
o Estimate / std. error
o For years of civil war (intercept): estimate (44.571), std. error (2.079), t-value (21.439), p-value (2e-16)
o For ethfrac: estimate (-9.830), std. error (4.205), t-value (-2.338), p-value (0.0207)
o For standardizing β1: (-9.830/4.205) = -2.338
▪ Which is your test statistic for ethfrac!
• Let's decide on a critical value: 1.96
• |test statistic| = 2.338
• Critical value = 1.96
• Therefore, critical value < |test statistic|
o In other words, if the null were true, we would be very unlikely to see such a large estimate with this data due to chance. How unlikely? Such an estimate will come up only about 2% of the time
o We reject the null hypothesis
▪ Why do we use p-values?
• Intuitive: it's the probability of seeing a coefficient at least this large just by chance if the null were true
• Easy shorthand: run the regression; if your p-value is below 0.05, reject the null. If not, you fail to reject the null
o Confidence intervals
▪ Given we have just one sample, and we know there is variability from just having one sample rather than the full population of data, how confident should we be in our results, that is, our point estimate?
▪ What factors would make you more confident in your estimates? Larger sample size, smaller variance in your data
▪ What factors would make you less confident in your estimates? Smaller sample size, larger variance in your data
o Calculating confidence intervals
▪ As long as you have a relatively big sample size, you can use the following basic formula
▪ Notice this formula is also dependent on choosing your critical value
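The formula itself did not survive on this card. A common large-sample form, assumed here, is estimate ± critical value × standard error. The sketch below applies it to the ethfrac numbers above and also recomputes the test statistic. The p-value uses a normal approximation, so it comes out slightly below the 0.0207 reported in the regression output, which presumably comes from a t distribution.

```python
from statistics import NormalDist

# Regression output for ethfrac, as reported on the card.
estimate = -9.830
std_error = 4.205
critical_value = 1.96   # two-sided test

# Test statistic and decision.
t_stat = estimate / std_error
reject_null = abs(t_stat) > critical_value

# Two-sided p-value under a normal approximation (assumption).
p_normal = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# Assumed large-sample confidence interval:
# estimate ± critical value × standard error.
ci_low = estimate - critical_value * std_error
ci_high = estimate + critical_value * std_error

print(round(t_stat, 3), reject_null, round(p_normal, 4))
print(round(ci_low, 3), round(ci_high, 3))
```

The interval excludes zero, which matches the decision to reject the null at the 1.96 critical value.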