Flashcards in Midterm Deck (90)

1

## u (the error term)

### the variation in yi that is not captured or explained by xi → this could include both unsystematic predictors of yi (e.g., a job application randomly landing on the top or bottom of a stack of other applications) and systematic determinants of yi (e.g., years of prior experience) that are omitted from the model

2

## PRF vs. SRF

###
PRF: E(yi) = β0 + β1xi

SRF: ŷi = β̂0 + β̂1xi

3

## Errors and residuals for SRF

###
Notice that the SRF contains no ûi term because ŷi, by definition, lies on the regression line (the same logic applies to the PRF and ui)

The estimates of the errors, which are called the residuals, are the differences between observed values of yi and the predicted values yˆi :

ûi = yi − ŷi = yi − β̂0 − β̂1xi

4

## OLS

###
OLS is the most commonly used estimator in the social sciences (used to estimate the β parameters)

OLS will be our workhorse estimator in this course

OLS obtains estimates of the “true” population parameters β0 and β1, which we typically do not observe

The logic of the OLS estimation procedure: choose β̂0 and β̂1 to minimize the sum of squared residuals, Σûi²
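In the bivariate case this minimization has a closed-form solution. A minimal sketch with made-up data (the `ols` helper and the numbers are illustrative, not from the course):

```python
# Illustrative sketch: bivariate OLS via the closed-form solution that
# minimizes the sum of squared residuals (data below are made up).
def ols(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # slope: sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar          # the fitted line passes through the means
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]                     # lies exactly on y = 1 + 2x
b0, b1 = ols(x, y)                   # recovers b0 = 1, b1 = 2
```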

5

## Why minimize the sum of squared residuals, instead of the sum of residuals or the absolute value of residuals?

###
If we use the sum of residuals, then residuals with different signs but similar magnitudes will cancel each other out

Minimizing the sum of absolute values is a viable alternative but does not yield closed-form formulas for the resulting estimators

6

## Relationship between PRF and SRF through residuals

### yi = ŷi + ûi

7

## SST

### Total Sum of Squares (SST): Measure of sample variation in y

8

## SSE

###
Explained Sum of Squares (SSE): Measure of the part of the variation in y explained by x

9

## SSR

### Residual Sum of Squares (SSR): Part of the variation in y unexplained by x
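The three sums of squares fit together as SST = SSE + SSR, with R^2 = SSE/SST. A small sketch with made-up data (helper and numbers are illustrative):

```python
# Illustrative sketch (made-up data): SST decomposes into SSE + SSR,
# and R^2 = SSE / SST.
def sums_of_squares(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained part
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained part
    return sst, sse, ssr

sst, sse, ssr = sums_of_squares([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
r_squared = sse / sst    # SST = SSE + SSR holds for these data
```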

10

## R^2 and magnitude of relationship between y and x

###
As a measure of correlation, R^2 should not be confused with the magnitude of the relationship between a DV and IV

You can have a bivariate relationship with a high R^2 (i.e., high correlation) but a slope that is close to 0

You can also have a bivariate relationship with a low R^2 (i.e., low correlation) but a slope that is high in magnitude
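Both cases can be demonstrated numerically. A sketch with two made-up datasets (helper and numbers are illustrative):

```python
# Illustrative sketch: R^2 measures fit, not magnitude of the slope.
def slope_and_r2(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = b1 ** 2 * sxx              # explained sum of squares, bivariate case
    return b1, sse / sst

x = [1, 2, 3, 4, 5]
# Perfect fit (R^2 = 1) but a tiny slope:
b1_flat, r2_flat = slope_and_r2(x, [0.001 * xi for xi in x])
# Steep slope but a noisy, weak fit (low R^2):
b1_steep, r2_steep = slope_and_r2(x, [50, -30, 90, -5, 85])
```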

11

## What changes when you transform a regressor?

### Bottom line: if we transform a regressor, then only the slope coefficient for that regressor is transformed
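For example, rescaling a regressor by a constant c divides its slope by c and leaves the intercept unchanged. A sketch with made-up data (helper and numbers are illustrative):

```python
# Illustrative sketch: rescaling x by 100 divides the slope by 100;
# the intercept is unchanged (data below are made up).
def ols(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b1 * x_bar, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]                              # lies exactly on y = 1 + 2x
b0, b1 = ols(x, y)                            # slope in original units
b0_s, b1_s = ols([100 * xi for xi in x], y)   # x measured in rescaled units
```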

12

## What happens if the relationship between wage and education is non-linear?

###
These patterns can be nicely modeled by re-defining the dependent and/or independent variables as natural logarithms

The linear regression model must be linear in the parameters, but not necessarily linear in the variables, so logging y or x doesn’t violate the requirement of a linear relationship between the dependent variable and its determinants

13

## Linear regression models?

y = β0 + β1x + u

log(y) = β0 + β1x + u

log(y) = β0 + β1log(x) + u

y = log(β0 + β1x + u)

e^y = β0 + β1√x + u

y = β0 + (β1x1)/(1 + β2x2) + u

###
y = β0 + β1x + u: Yes

log(y) = β0 + β1x + u: Yes

log(y) = β0 + β1log(x) + u: Yes

y = log(β0 + β1x + u): Yes¹

e^y = β0 + β1√x + u: Yes

y = β0 + (β1x1)/(1 + β2x2) + u: No

¹ If we exponentiate both sides of this equation, we get e^y = β0 + β1x + u, which is linear in the parameters

14

## wage = β0 + β1educ + u

###
1 additional year of education is associated with an increase in wages of β1 units
of β1 units

15

## wage = β0 + β1log(educ) + u

###
1% increase in education is associated with an increase in wages of β1/100 units

decreasing returns

16

## log(wage)= β0 + β1educ + u

###
1 additional year of education is associated with a (100 · β1)% increase in wages

increasing returns

17

## log(wage)= β0 + β1log(educ) + u

### 1% increase in education is associated with a β1% increase in wages

18

## Assumption 1

###
Linearity in the parameters

the population model can be non-linear in the variables but must be linear in the parameters

19

## Assumption 2

###
Random Sampling

individual observations are identically and independently distributed (i.e., observations are randomly selected from a population such that each observation has the same probability of being selected, independent of which other observations were selected.)

20

## Assumption 3

###
Sample variation in the explanatory variable

the sample standard deviation in xi must be greater than 0 (need some variance in order to get an estimate)

21

## Assumption 4

###
Zero Conditional Mean

E(u | X) = 0

if it holds, then the error term u is uncorrelated with the regressor X

this assumption is usually the biggest area of concern in empirical analysis

22

## Assumption 5

###
Homoskedasticity assumption

Var(u | X) = σ²

the variance of the unobservable error term, conditional on x, is assumed to be constant

Var(u) is independent of x

23

## If assumptions 1-4 hold....

###
the OLS estimator is unbiased, meaning that on average the estimate equals the true parameter: E(β̂) = β
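Unbiasedness can be illustrated by simulation: across many samples drawn from the same population model, the slope estimates average out to the true β1. A sketch with a made-up data-generating process (all numbers are illustrative):

```python
import random

# Illustrative Monte Carlo sketch (made-up DGP): y = 1 + 2x + u, u ~ N(0, 1).
# Averaging the OLS slope over many samples recovers the true beta1 = 2.
random.seed(0)

def ols_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
           sum((xi - x_bar) ** 2 for xi in x)

x = list(range(1, 11))
estimates = [
    ols_slope(x, [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x])
    for _ in range(2000)
]
mean_b1 = sum(estimates) / len(estimates)   # close to 2 on average
```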

24

## If assumptions 1-5 hold

###
then, in addition, we can derive a formula for the variance of the coefficient estimates, Var(β̂1)

25

## What drives the variance of the OLS slope estimate? What makes it more precise?

###
the lower the variation in the errors,

the greater the variation in the independent variable,

or the greater the sample size (relatedly, because sample variation in x increases with sample size),

the more precise the OLS estimates are, on average

26

## Standard errors measure... relationship with precision

###
precision or efficiency of the estimate beta1-hat

se-hat (beta1-hat) is lower (i.e. more precise) when:

the residuals are small

the variation of the independent variable is large

the number of observations is large
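These three drivers show up directly in the usual bivariate formula ŝe(β̂1) = sqrt(σ̂²/Sxx), with σ̂² = SSR/(n − 2). A sketch with made-up data (helper and numbers are illustrative); note that doubling the spread of x leaves the residuals unchanged here and halves the standard error:

```python
import math

# Illustrative sketch (made-up data): standard error of the bivariate
# OLS slope, se(b1) = sqrt(sigma_hat^2 / Sxx), sigma_hat^2 = SSR / (n - 2).
def se_slope(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt((ssr / (n - 2)) / sxx)

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
se = se_slope(x, y)
se_wide = se_slope([2 * xi for xi in x], y)   # larger spread in x: se halves
```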

27

## Motivation for multiple regression analysis

###
1) controlling for other factors:

even if you are primarily interested in estimating one parameter, including other variables in the regression controls for potentially confounding factors, making the zero conditional mean assumption more likely to hold

2) better predictions:

more independent variables can explain more of the variation in y, meaning a potentially higher R^2

3) Estimating non-linear relationships:

by including higher order terms of a variable, we can allow for a more flexible, non-linear functional form between the dependent variable and an independent variable of interest

4) Testing joint hypotheses on parameters:

can test whether multiple independent variables are jointly statistically significant

28

## How does OLS make estimates?

### it minimizes the sum of squared residuals: OLS chooses the combination of all the βs that gives the lowest sum of squared residuals

29

## How do you isolate the variation that is unique to x3?

###
regress x3 on all the other regressors and obtain the residuals; these residuals (ê) contain the variation in x3 not explained by the other regressors in the initial population model, effectively holding all else constant

then conduct a bivariate regression of y on ê
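This partialling-out procedure (the Frisch-Waugh-Lovell result) can be verified numerically. A sketch with made-up data where the true coefficient on x2 is 3 (helper and numbers are illustrative):

```python
# Illustrative sketch (made-up data): the coefficient on x2 in a multiple
# regression equals the slope from a bivariate regression of y on the
# residuals of x2 regressed on the other regressor (partialling out).
def ols(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - b1 * x_bar, b1

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # true coefficient on x2 is 3

# Step 1: regress x2 on x1, keep the residuals e (variation unique to x2).
a0, a1 = ols(x1, x2)
e = [b - (a0 + a1 * a) for a, b in zip(x1, x2)]

# Step 2: a bivariate regression of y on the residuals recovers 3.
_, b2 = ols(e, y)
```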
