Flashcards in Midterm Deck (66):

1

## Dependent Variable

###
Outcome we are interested in

Depends on the other variable

Goes on the y-axis

2

## Independent Variable

###
Intervention or treatment

The "cause"

Goes on the x-axis

3

## Confounder

### Something related to both the IV and DV

4

## Comparability

### Treated and control groups would look the same in the absence of treatment

5

## Treated Group

### Those who get some treatment of interest

6

## Control Group

### Those who do not get the treatment of interest

7

## Observational Study

###
General term for research where you don't get to randomize who gets the treatment

Instead you just observe some relationship in the world

8

## Experimental Study & Randomized Control Trial (RCT)

###
common terms for research designs in which you do randomize who gets the treatment

Typically you can make causal claims from experimental studies

9

## Quasi-experimental research

### research in which you have observational data, but you find ways to ensure that the treatment was effectively randomly distributed

10

## Internal Validity

### Is the experiment well designed? Is it free from confounders or bias?

11

## External Validity

### Is the finding generalizable to other populations, situations or cases? Does it apply outside of the context in which the finding was generated?

12

## Problems with experiments

###
Not everything can be randomized (democracy, gender)

Not everything should be randomized (wars, right to vote and medication for birth defects)

Ethical dilemma: randomizing treatment means denying treatment to some, but not randomizing means we don't really know whether it's effective

Running experiments is expensive

13

## Yi

### dependent variable, outcome variable, the thing we want to predict

14

## Xi

### independent variable, the thing that predicts the DV

15

## Ei

###
error term

the part of the DV that the IV doesn't explain

everything NOT in our model

16

## β1

###
slope coefficient

relationship between X & Y

Indicates how much change in Y is expected if X increases by 1 unit

17

## β0

###
constant

Value of Y when X is zero (intercept)

Indicates where the regression line crosses the Y-axis


18

## Endogeneity

###
the IV is correlated with the error term

Confounder: there is another unmeasured variable (a confounder) that affects both the IV and the DV

We haven't included this other confounding variable in our model

19

## To get our causal estimate

### we need to create exogeneity

20

## Exogeneity

###
IV is unrelated to or uncorrelated with the error term

Accomplish this through random assignment
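
To make the contrast between endogeneity and exogeneity concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than something from the course: the true slope of 2, the unmeasured confounder `z` with an effect of 3, and the helper names `ols_slope` and `simulate`.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(randomized, n=20000, seed=0):
    """Hypothetical world: Y = 2*X + 3*Z + noise, where Z is never measured."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # confounder, left out of the model
        if randomized:
            x = rng.gauss(0, 1)          # X assigned independently of Z (exogenous)
        else:
            x = z + rng.gauss(0, 1)      # X depends on Z (endogenous)
        y = 2 * x + 3 * z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)

# Without randomization the slope absorbs part of Z's effect:
# 2 + 3 * Cov(X, Z) / Var(X) = 2 + 3 * (1 / 2) = 3.5 in this setup.
biased = simulate(randomized=False)
# With random assignment X is unrelated to the error term, so we recover about 2.
unbiased = simulate(randomized=True)
print(round(biased, 2), round(unbiased, 2))
```

The only difference between the two runs is how X is assigned, which is the point of the card: random assignment breaks the link between the IV and everything sitting in the error term.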

21

## Randomness

###
noise in the data

could go away with larger sample sizes

address some of these concerns with t-tests (p-values) and confidence intervals

22

## Randomization

### using a coin toss to create treatment and control groups which creates exogeneity

23

## Population

### the overall collection of individuals, beyond just the sample

24

## Sample

### the collection of individuals on which statistical analyses are performed, and from which general trends for the population are inferred

25

## Individual

###
also called an "object" or "unit"

a single data point contributing to the sample

26

## Mean

###
average of a variable

(X bar)

27

## Minimum

### lowest value of a variable (min)

28

## Maximum

### the highest value of a variable (max)

29

## Sample size

### the number of observations (N)

30

## Standard deviation

### how widely dispersed the values of the variable are (reported as either the population or the sample standard deviation)

31

## Probability Theory

###
Population distribution (Y)

---Estimand/Parameter

Then goes to sample (Y1, Y2,...Yn)

Then to Estimator/Statistic g(Y1, Y2,...Yn)

Finally to Estimate and back to population distribution

32

## Expectation

### the best guess about what number will be drawn from the distribution

33

## Variance

### how far the numbers you drew tend to be from the best guess

34

## E[X]

###
something that exists for any random variable

if you could draw repeatedly from a distribution and take an average, it would get closer and closer to the expectation

35

## Variance

###
measure of spread, or how far you expect a random draw to be from the expectation

$\mathrm{Var}[X] = E[(X - E[X])^2]$

draw a bunch of numbers from the same distribution, square the difference between each one and the mean, and average those squared differences
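
The recipe on this card can be written out directly. This is a small sketch that treats a list of numbers as if it were the whole distribution (so it divides by N, not N - 1); the function names and the example numbers are made up for illustration.

```python
def expectation(xs):
    """Average of the values, each weighted equally."""
    return sum(xs) / len(xs)

def variance(xs):
    """Average squared distance from the expectation: E[(X - E[X])^2]."""
    mu = expectation(xs)
    return expectation([(x - mu) ** 2 for x in xs])

draws = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical draws
print(expectation(draws), variance(draws))
```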

36

## E[X] & Var[X]

### are properties of the distribution of X, not your data

37

## Normal Distribution

###
$X \sim N(E[X], \mathrm{Var}[X])$

$X \sim N(\mu, \sigma^2)$, where $\mu$ is the expectation and $\sigma^2$ is the variance.

38

## Sample Mean

###
The sample mean of $X_1, X_2, \dots, X_N$ is

$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1}{N}[X_1 + X_2 + \dots + X_N]$

This is an example of an estimator

39

## Law of Large Numbers

###
For random variables (RVs) X1, X2, ..., XN drawn from the same distribution, the sample mean (X bar) gets closer and closer to E[X]

As N grows the estimator gets better
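
A quick way to see the law of large numbers is to simulate it. The sketch below assumes draws from Uniform(0, 1), whose expectation is 0.5; the distribution, the seed, and the sample sizes are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1), whose expectation is 0.5."""
    return sum(rng.random() for _ in range(n)) / n

# The sample mean tends to land closer to 0.5 as N grows.
for n in (10, 1000, 100000):
    print(n, round(sample_mean(n), 4))
```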

40

##
Distribution of the mean

Properties

###
Property 1: the mean's distribution is centered around E[X]

You only get a mean once, but you know it takes a value from a distribution centered on E[X]

We don’t know E[X] (though we know X bar gets closer and closer as N grows)

Property 2: the variance of the mean, Var(X bar), is Var(X) divided by N

• So we can estimate Var(X bar) with the estimated variance of X divided by N: Varbar(X)/N

41

## How is the mean distributed

###
o There is one more piece to understanding the distribution of the mean: we know its expectation and variance, but what about the shape?

o The shape must depend on the distribution of the underlying X, you would think

Fortunately it does not

o Property 3: Central limit theorem (CLT): the distribution of the mean tends toward a normal distribution

This is magical: regardless of how the original X is distributed, when you take the mean of multiple RVs drawn from the same distribution, it starts to look normal

You do need N to be big enough for this to work, but that’s often not a problem

We will discuss some rules for deciding if N is big enough and adjustments to use when it is not

As N increases, the distribution starts to turn into a triangle shape, with the peak in the middle.

The more N increases, the width of each box gets smaller

o Theoretical understanding of distribution

Take the N=50 case

We can estimate the center using X bar which is 0.5

We can estimate the variance of X bar using (1/N) Varbar(X)

We know the shape of the distribution of the mean is normal

Using all this, we estimate that the mean should be distributed

• $\bar{X} \sim N(\mu = 0.5, \sigma^2 = 0.08/N)$
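
The three properties of the mean can be checked by simulation. The sketch below assumes, since the card does not say, that X is Uniform(0, 1): that distribution has expectation 0.5 and variance 1/12 (about 0.08), which matches the numbers above. The number of repetitions and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(1)
N = 50          # draws per sample, as in the N=50 case
REPS = 5000     # number of repeated samples (an assumption for this sketch)

# Each entry is the mean of one sample of N uniform draws.
means = [sum(rng.random() for _ in range(N)) / N for _ in range(REPS)]

center = statistics.fmean(means)             # Property 1: near E[X] = 0.5
var_of_mean = statistics.pvariance(means)    # Property 2: near Var(X)/N = (1/12)/50
sd = var_of_mean ** 0.5
# Property 3: if the shape is roughly normal, about 95% of the means
# should fall within 1.96 standard deviations of the center.
share_within = sum(abs(m - center) <= 1.96 * sd for m in means) / REPS

print(round(center, 3), round(var_of_mean, 5), round(share_within, 3))
```

Even though each individual draw is flat (uniform), the collection of means behaves like a normal distribution, which is what the CLT predicts.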

42

## Covariance

###
the mean value of the product of the deviations of two variables from their respective means

measure of the joint variability of two random variables

If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for lesser values, the variables tend to show similar behavior and the covariance is positive

Positive association: when X is higher, we expect Y is usually higher

Negative association: when X is higher, we expect Y is usually lower

Not associated: when X is higher it doesn't tell us anything about Y

Problem with covariance: scale is not very natural

43

## Correlation

###
statistical technique that shows whether and how strongly pairs of variables are related

Correlation ranges between -1 & 1

A perfect positive relationship has Cor(X,Y) = 1

A perfect negative relationship has Cor(X,Y) = -1

Two perfectly unrelated variables have Cor (X,Y) = 0

44

## When we divide by the standard deviations

### We are in effect standardizing the covariance and rescaling it from -1 to 1.

45

## Correlation only sees

### linear relationships
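
The covariance and correlation cards above can be illustrated with a few lines of code. This is a sketch using the population (divide by N) versions; the function names and the tiny data sets are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Mean of the product of the deviations from each variable's mean."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """Covariance divided by both standard deviations, so it lies in [-1, 1]."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
y_rescaled = [100 * v for v in y]     # same relationship, different units

# Covariance depends on the units; correlation does not.
print(covariance(x, y), covariance(x, y_rescaled))
print(correlation(x, y), correlation(x, y_rescaled))

# A perfect but non-linear (U-shaped) relationship has zero correlation.
x_sym = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_sym]
print(correlation(x_sym, y_curve))
```

The last example is the sense in which correlation only sees linear relationships: Y is completely determined by X, yet the correlation is zero.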

46

## Regression logic

###
o How can we best figure out this relationship between X & Y

o Suppose we guess the slope and intercept. How do we assess whether it is a good guess for the relationship between X & Y?

o We use the sum of squared errors (the residuals squared and then added up) to see how well we are doing

o It turns out the best way to estimate this relationship is to choose the slope and intercept that minimize the sum of squared errors.

o The residual is the vertical distance between the line and any given point

o The SSE takes those residuals, squares them, and adds them up

o The regression shows us the best fitting line in terms of sum of squared errors
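
The logic above can be sketched in code: score any guessed line by its sum of squared errors, then compare it with the least-squares line. The data points and the guessed line are hypothetical; the slope and intercept formulas are the standard closed-form solution for a single X.

```python
def sse(intercept, slope, xs, ys):
    """Sum of squared residuals for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def ols(xs, ys):
    """Intercept and slope that minimize the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

xs = [0, 1, 2, 3, 4]          # hypothetical data
ys = [1, 3, 4, 6, 6]

b0, b1 = ols(xs, ys)
best_sse = sse(b0, b1, xs, ys)
guess_sse = sse(1.0, 1.5, xs, ys)    # an arbitrary guessed line

print(round(b0, 2), round(b1, 2), round(best_sse, 2), round(guess_sse, 2))
```

Any other line you try will have an SSE at least as large as `best_sse`; that is what "best fitting" means here.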

47

## Standard Error

###
o If we took another sample of data from the same source and estimated β̂ again, we would get a somewhat different value, so β̂ has a distribution across samples.

o SE(β̂) estimates the standard deviation of that distribution of β̂
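
One way to see what the standard error is estimating is to actually take many samples. The sketch below assumes a made-up data source (Y = 1 + 2X + noise) and uses the classical standard error formula for a one-variable regression; the sample size, number of samples, and seed are arbitrary.

```python
import random
import statistics

def slope_and_se(xs, ys):
    """OLS slope and its classical standard error for one X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = (sse / (n - 2) / sxx) ** 0.5
    return slope, se

rng = random.Random(3)
slopes, ses = [], []
for _ in range(1000):                       # 1000 hypothetical samples
    xs = [rng.gauss(0, 1) for _ in range(100)]
    ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
    b, se = slope_and_se(xs, ys)
    slopes.append(b)
    ses.append(se)

sd_of_slopes = statistics.stdev(slopes)     # actual spread of the estimates
average_se = statistics.fmean(ses)          # what a single sample's SE reports
print(round(sd_of_slopes, 3), round(average_se, 3))
```

In practice we only get one sample, so we cannot compute `sd_of_slopes`; the standard error from that single sample stands in for it.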

48

## Compare modeled outcome to the simple mean

### We can think of regression as a prediction machine that tells us our best guess of Y (growth) given our knowledge of X (yearsschool) for an observation

49

## Understanding the variance explained or R squared

###
Spread of points around the regression line should be smaller than the spread of points around the mean line

If we average up the squared distances around the mean we get the variance of Y.

If we average up the squared distances around the regression line we get mean squared error for the regression

We were trying to minimize this form of prediction error in our choice of Beta.
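
The comparison between the two spreads is usually summarized as R squared = 1 - (squared distances around the regression line) / (squared distances around the mean line). The sketch below computes it for a small hypothetical data set; the function name and the numbers are illustrative only.

```python
def r_squared(xs, ys):
    """Share of the variance of Y explained by the regression line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # Squared distances around the mean line (n times the variance of Y).
    around_mean = sum((y - my) ** 2 for y in ys)
    # Squared distances around the regression line (n times the mean squared error).
    around_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - around_line / around_mean

xs = [0, 1, 2, 3, 4]          # hypothetical data
ys = [1, 3, 4, 6, 6]
r2 = r_squared(xs, ys)
print(round(r2, 3))
```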

50

## Causal Inference

###
o Just like difference in means reflected an association, correlations, covariance, and regression coefficients only reflect an observed relationship in the data

The setup makes us focus on variation in IV as an explanation for DV

But those countries that are higher or lower on IV are probably higher on other things

These other things, not just the IV, may be why the DV differs

In other words, they are confounders

51

## What is bias

###
Unbiased estimate: on average, our estimate is equal to the true parameter

Biased estimate: our coefficient is systematically wrong, either higher or lower than the true parameter

52

## Omitted Variable Bias

###
o OLS does not necessarily create unbiased estimates

o Omitted variable bias: this is a specific form of endogeneity

X is correlated with something else that influences Y; because that something is left out of the model, it sits in the error term, so the error term is correlated with X

This is often the reason why, if you change the model specification, your estimates change. We say, our model is not robust to alternative specifications

Therefore our model is missing a key confounder, and we haven’t estimated a causal relationship

Theoretically, you could include this confounder in your model (multi-variable regression) but this is often hard to do: maybe you don’t know what the confounder is, or you don’t have data on it.

53

## Homoscedasticity

### when the error term has the same variance for all values of X. This isn’t a problem

54

## Heteroscedasticity

###
when the error term DOES NOT have the same variance for all values of X. This is a fixable problem

o Remember this only affects our standard errors, not our slope estimate. Therefore it doesn’t cause bias

o So what’s the solution? We use a slightly different estimator for our standard error calculations. Intuitively, we estimate the variance in our standard errors.
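
As a rough illustration of a "slightly different estimator", the sketch below compares the classical standard error of the slope with one common heteroscedasticity-robust version (often called HC0 or White standard errors). The card does not say which robust estimator the course uses, so HC0 is an assumption here, as is the simulated data, in which the noise gets larger as X moves away from zero.

```python
import random

def slope_standard_errors(xs, ys):
    """OLS slope with its classical and HC0 robust standard errors (one X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Classical: assumes one common error variance for every observation.
    classical = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    # HC0: lets each observation contribute its own squared residual.
    robust = sum(((x - mx) ** 2) * e * e for x, e in zip(xs, resid)) ** 0.5 / sxx
    return slope, classical, robust

rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(5000)]
# Hypothetical heteroscedastic data: the noise SD equals |x|.
ys = [1 + 2 * x + rng.gauss(0, abs(x)) for x in xs]

slope, classical_se, robust_se = slope_standard_errors(xs, ys)
print(round(slope, 3), round(classical_se, 4), round(robust_se, 4))
```

The slope estimate stays close to the true value of 2, which matches the card: heteroscedasticity distorts the standard errors, not the slope.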

55

## Outliers

###
o An observation that is extremely different from the rest of the observations in the sample. “One of these things is not like the others”

o Reminder, what does an outlier do to our estimate of the mean? It drags it toward the outlier. The mean is sensitive to outliers

o Intuitively, then, what would outliers do to our regression estimates? They would also drag our estimate of the slope toward the outlier.
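
A tiny numerical sketch of both effects, using made-up data: five points that lie exactly on a line with slope 1, plus one hypothetical outlier at (6, 30).

```python
def mean(xs):
    return sum(xs) / len(xs)

def ols_slope(xs, ys):
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
mean_clean = mean(ys)
slope_clean = ols_slope(xs, ys)

# Add a single outlier far above the pattern of the other points.
xs_out = xs + [6]
ys_out = ys + [30]
mean_out = mean(ys_out)
slope_out = ols_slope(xs_out, ys_out)

print(mean_clean, mean_out)
print(slope_clean, round(slope_out, 2))
```

One observation out of six is enough to more than double the mean and to make the fitted slope several times steeper.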

56

## Two sample tests:

###
o How can we compare one sample to another sample to ask: “Are these samples from the same population or different populations?”

We call these two-sample tests, comparing the samples

• We will try to understand how likely we are to observe some difference if there really is not a difference

• That will require thinking about a null distribution, describing how “weird” our result is if there really is no difference (our null hypothesis)

57

## Difference in means

###
o The fundamental quantity of interest today will be the difference in means between two groups on some outcome

Difference in mean income across two states

Difference in voter turnout among two groups of people (Students vs employed)

Difference in probability of war in two groups of countries (autocracies vs democracies)

Difference in proportion of heads after tossing one coin N1 times and another coin N2 times.

58

## One sided vs. Two sided tests

###
Two sided: is one group different from the other, either bigger or smaller?

One sided: is one group bigger (smaller) than the other

59

## Hypothesis

###
o A good null: H0: VoteR = VoteD

o Alternative: VoteR ≠ VoteD

o We find that:

Among Republicans, 20/25 report they will vote. (VoteR=0.8)

Among Democrats, 22/35 report they will vote (VoteD=0.63)

Thus VoteR – VoteD = 0.17

What is the key thing we need to assess how “weird” this result is?

60

## What is a p-value?

###
o A p-value is the probability of observing a difference in means or a coefficient at least as big as what we observed, if the null were true.

o First, you need to decide whether this is a one-sided or two-sided test.

o Then you need to set your critical value: how different would I want these two distributions to be before I conclude they probably weren’t from the same distribution?

o Based on this critical value, you can say whether it seems like these two groups are significantly different

61

## For two sample test with a difference in means

###
Null Hypothesis: difference in means between two groups is zero

Alternative: the two samples are probably from different groups

Critical value: how certain do I want to be that they’re really different? Often 1.96

P-value: how often you would get a result at least as extreme as the one you got if the null were true. Usually we set the cut-off for the significance level at 0.05 (1 out of 20 cases)
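
These pieces can be put together for the voting example from the Hypothesis card (20 of 25 Republicans and 22 of 35 Democrats say they will vote). The sketch below is one reasonable way to do it, not necessarily the course's exact method: it assumes an unpooled standard error for the difference in proportions and a normal approximation, which is rough for samples this small.

```python
from statistics import NormalDist

def two_sample_z(successes1, n1, successes2, n2):
    """Difference in proportions, its standard error, z statistic, two-sided p-value."""
    p1, p2 = successes1 / n1, successes2 / n2
    diff = p1 - p2
    # Unpooled standard error of the difference (an assumption of this sketch).
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z = diff / se
    # Two-sided: probability of a result at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value

diff, se, z, p_value = two_sample_z(20, 25, 22, 35)
reject_null = abs(z) > 1.96       # critical value for a two-sided 0.05 test

print(round(diff, 2), round(se, 3), round(z, 2), round(p_value, 3), reject_null)
```

Under these assumptions the test statistic is about 1.5, which is below 1.96, so the 0.17 difference is not "weird" enough to reject the null.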

62

## Hypothesis testing with regression

###
o What if we wanted to know if the relationship we found between X and Y was real, versus we saw it just by chance?

o It could be that the sample we draw shows a relationship between X & Y but if we had a slightly different sample we wouldn’t see any relationship

o We need to set decision criteria: how different from zero do we want the relationship to be before we decide we’ve really uncovered a real relationship. We call this the critical value.

63

## Hypothesis testing with regression

###
o Null: β1 = 0 (there is no relationship between X and Y)

o Alternative: β1 ≠ 0 (there is a relationship between X and Y)

o How would we set our cut off for making sure we don’t make mistakes? Pick our critical value

64

## The standard normal distribution

###
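o Everything up to 1.64 SDs above the mean covers 95% of the distribution (one-sided)

o -1.64 to 1.64 SDs covers the central 90% of the distribution (90% confidence interval)

o Everything up to 1.96 SDs above the mean covers 97.5% of the distribution (one-sided)

o -1.96 to 1.96 SDs covers the central 95% of the distribution (95% confidence interval)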
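
These cut-offs can be checked with Python's standard library, which includes a `NormalDist` class. The variable names are just labels for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

below_164 = std_normal.cdf(1.64)                               # area up to 1.64 SDs
central_164 = std_normal.cdf(1.64) - std_normal.cdf(-1.64)     # area between -1.64 and 1.64
below_196 = std_normal.cdf(1.96)                               # area up to 1.96 SDs
central_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)     # area between -1.96 and 1.96

# Going the other way: which cut-off leaves 2.5% in the upper tail?
two_sided_cutoff = std_normal.inv_cdf(0.975)

print(round(below_164, 3), round(central_164, 3))
print(round(below_196, 3), round(central_196, 3))
print(round(two_sided_cutoff, 2))
```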

65

## Hypothesis Testing

###
o State null hypothesis

o Determine if hypothesis test is one sided or two sided

o State alternative hypothesis

o Run Regression

o Test statistic: take coefficient and standardize it

Divide coefficient by standard error

We do this so we can compare the standardized coefficient to the standard normal distribution

This will let us know how likely it would be that we got this coefficient just by chance (if the RA scrambled our data by accident)

o Decide on a critical value (typically 1.96 for two sided test)

o State whether we can reject the null in favor of the alternative

If critical value < |test statistic|: we reject the null

If critical value > |test statistic|: we fail to reject the null
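
The last few steps can be sketched as a small function. The name `evaluate_coefficient` and the coefficient and standard error values are hypothetical; in practice they would come from your regression output.

```python
def evaluate_coefficient(coef, std_error, critical_value=1.96):
    """Standardize the coefficient and compare it with the critical value."""
    test_statistic = coef / std_error
    reject_null = abs(test_statistic) > critical_value
    return test_statistic, reject_null

# Hypothetical regression results.
t_a, reject_a = evaluate_coefficient(coef=0.5, std_error=0.2)    # |t| = 2.5
t_b, reject_b = evaluate_coefficient(coef=-0.3, std_error=0.2)   # |t| = 1.5

print(t_a, reject_a)
print(t_b, reject_b)
```

The absolute value is what makes this a two-sided test: a large negative coefficient counts as evidence against the null just as much as a large positive one.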

66