Midterm Flashcards

Flashcards in Midterm Deck (90)


What does the error term ui represent?

The variation in yi that is not captured or explained by xi. This could include both unsystematic predictors of yi (e.g., a job application randomly landing on the top or bottom of a stack of other applications) and systematic determinants of yi (e.g., years of prior experience) that are omitted from the model



PRF vs. SRF

PRF: E(yi | xi) = β0 + β1xi
SRF: ŷi = β̂0 + β̂1xi


Errors and residuals for SRF

Notice that the SRF itself contains no residual term: ŷi is, by definition, a point on the fitted regression line (the same logic applies to the PRF and the error term ui)

The estimates of the errors, which are called the residuals, are the differences between observed values of yi and the predicted values yˆi :

ûi = yi − ŷi = yi − β̂0 − β̂1xi



OLS is the most commonly used estimator in the social sciences (to find beta)

OLS will be our workhorse estimator in this course

OLS obtains estimates of the “true” population parameters β0 and β1, which we typically do not observe

The logic of the OLS estimation procedure: choose β̂0 and β̂1 to minimize the sum of squared residuals, Σ ûi²
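A minimal numerical sketch of this procedure (the data and the "true" coefficients 3.0 and 0.5 are made up for illustration): the closed-form bivariate OLS formulas give the β̂'s, and the resulting residuals sum to zero by construction.

```python
import numpy as np

# Made-up data: the "true" parameters 3.0 and 0.5 are assumptions for this sketch
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=200)
y = 3.0 + 0.5 * x + rng.normal(0, 1, size=200)

# Closed-form bivariate OLS: b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()

residuals = y - (b0_hat + b1_hat * x)
print(b0_hat, b1_hat)    # estimates near the true 3.0 and 0.5
print(residuals.sum())   # OLS residuals sum to (numerically) zero
```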


Why minimize the sum of squared residuals, instead of the sum of residuals or the absolute value of residuals?

If we use the sum of residuals, then residuals with different signs but similar magnitudes will cancel each other out

Minimizing the sum of the absolute values of the residuals (least absolute deviations) is a viable alternative, but it does not yield closed-form formulas for the resulting estimators


Relationship between PRF and SRF through residuals

yi = ŷi + ûi



Total Sum of Squares (SST): measure of the total sample variation in y: SST = Σ(yi − ȳ)²



Explained Sum of Squares (SSE): measure of the part of the variation in y explained by x: SSE = Σ(ŷi − ȳ)²



Residual Sum of Squares (SSR): the part of the variation in y left unexplained by x: SSR = Σ ûi². These satisfy SST = SSE + SSR, and R² = SSE/SST
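The three sums of squares can be checked numerically; a sketch on made-up data (all coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 100)
y = 2 + 1.5 * x + rng.normal(0, 1, 100)   # illustrative coefficients

# Bivariate OLS fit
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

SST = ((y - y.mean()) ** 2).sum()      # total variation in y
SSE = ((yhat - y.mean()) ** 2).sum()   # variation explained by x
SSR = (uhat ** 2).sum()                # unexplained variation

print(np.isclose(SST, SSE + SSR))      # the decomposition SST = SSE + SSR holds
print(SSE / SST)                       # R-squared
```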


R² and magnitude of relationship between y and x

As a measure of correlation, R² should not be confused with the magnitude of the relationship between a DV and an IV

You can have a bivariate relationship with a high R² (i.e., high correlation) but a slope that is close to 0

You can also have a bivariate relationship with a low R² (i.e., low correlation) but a slope that is large in magnitude
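A quick simulation of both cases (the slopes, noise levels, and sample size are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)

def fit(y, x):
    # Returns the bivariate OLS slope and the R-squared of the fit
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    r2 = 1 - ((y - b0 - b1 * x) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return b1, r2

# Tiny slope, almost no noise: R^2 near 1 even though the effect is negligible
y_small = 0.001 * x + rng.normal(0, 1e-5, 500)
# Large slope, very noisy: R^2 is low even though the effect is large
y_big = 10 * x + rng.normal(0, 50, 500)

b1_small, r2_small = fit(y_small, x)
b1_big, r2_big = fit(y_big, x)
print(b1_small, r2_small)   # slope near 0.001, R^2 near 1
print(b1_big, r2_big)       # slope near 10, R^2 well below 1
```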


What changes when you transform a regressor?

Bottom line: if we transform a regressor, then only the slope coefficient for that regressor is transformed
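A numerical check of the bottom line above (the rescaling factor of 100, e.g., a change of units, and the data are illustrative): multiplying a regressor by a constant divides its slope by that constant and leaves the intercept unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(5, 1, 100)
y = 1 + 2 * x + rng.normal(0, 1, 100)   # illustrative coefficients

def ols(y, x):
    # Bivariate OLS intercept and slope
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(y, x)
b0_s, b1_s = ols(y, 100 * x)        # transform the regressor: x -> 100x

print(np.isclose(b1_s, b1 / 100))   # True: only this slope is rescaled
print(np.isclose(b0_s, b0))         # True: the intercept is unchanged
```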


What happens if the relationship between wage and education is non-linear?

These patterns can be nicely modeled by re-defining the dependent and/or independent variables as natural logarithms

The linear regression model must be linear in the parameters, but not necessarily linear in the variables, so logging y or x does not violate the requirement of a linear relationship between the dependent variable and its determinants


Which of the following are linear regression models (i.e., linear in the parameters)?

y = β0 + β1x + u
log(y) = β0 + β1x + u
log(y) = β0 + β1log(x) + u
y = log(β0 + β1x + u)
e^y = β0 + β1√x + u
y = β0 + (β1x1)/(1 + β2x2) + u

y = β0 + β1x + u → Yes
log(y) = β0 + β1x + u → Yes
log(y) = β0 + β1log(x) + u → Yes
y = log(β0 + β1x + u) → Yes (1)
e^y = β0 + β1√x + u → Yes
y = β0 + (β1x1)/(1 + β2x2) + u → No

(1) If we exponentiate both sides of this equation, we get e^y = β0 + β1x + u, which is linear in the parameters


wage = β0 + β1educ + u (level-level)

1 additional year of education is associated with an increase in wages of β1 units


wage = β0 + β1log(educ) + u (level-log)

A 1% increase in education is associated with an increase in wages of β1/100 units

decreasing returns


log(wage) = β0 + β1educ + u (log-level)

1 additional year of education is associated with an approximate (100 · β1)% increase in wages

increasing returns


log(wage) = β0 + β1log(educ) + u (log-log)

A 1% increase in education is associated with a β1% increase in wages (β1 is an elasticity)
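These interpretations can be verified numerically; a sketch of the log-log case (the parameter values and the education level are made up): a 1% increase in educ moves wage by approximately β1 percent.

```python
import numpy as np

# Hypothetical log-log model: log(wage) = b0 + b1 * log(educ)
b0, b1 = 1.0, 0.8   # illustrative parameter values
educ = 12.0         # illustrative education level

wage = np.exp(b0 + b1 * np.log(educ))
wage_up = np.exp(b0 + b1 * np.log(educ * 1.01))   # 1% more education

pct_change = 100 * (wage_up / wage - 1)
print(pct_change)   # approximately b1 percent, i.e., about 0.8
```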


Assumption 1

Linearity in the parameters

the population model can be non-linear in the variables but must be linear in the parameters


Assumption 2

Random Sampling

individual observations are independently and identically distributed (i.i.d.): observations are randomly selected from the population such that each observation has the same probability of being selected, independent of which other observations were selected


Assumption 3

Sample Variation in the explanatory Variable

the sample standard deviation of xi must be greater than 0 (we need some variation in x in order to estimate a slope)


Assumption 4

Zero Conditional Mean

E(u | X) = 0

if it holds, then the error term u has mean zero at every value of X, which implies that u is uncorrelated with the regressor X

this assumption is usually the biggest area of concern in empirical analysis


Assumption 5

Homoskedasticity assumption

Var(u | X) = σ²

the variance of the unobservable error term, conditional on x, is assumed to be constant

Var(u) is independent of x


If assumptions 1-4 hold....

the OLS estimator is unbiased, meaning that on average across repeated samples, E(β̂) = β


If assumptions 1-5 hold

if 1-5 hold, then we can derive a formula for the sampling variance of the coefficient estimates: Var(β̂1) = σ² / Σ(xi − x̄)²


What drives the variance of the OLS slope estimate? What makes it more precise?

the lower the variation in the errors,

or the greater the variation in the independent variable,

or the greater the sample size (related to the previous point, because the total sample variation in x increases with the sample size),

then the more precise the OLS estimates, on average


Standard errors measure... (relationship with precision)

the precision (efficiency) of the estimate β̂1

ŝe(β̂1) is lower (i.e., the estimate is more precise) when:

the residuals are small
the variation of the independent variable is large
the number of observations is large
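These drivers of precision can be seen in a small Monte Carlo simulation (all parameter values are arbitrary choices for this sketch): the sampling standard deviation of the slope rises with the error variance and falls with the variance of x and with the sample size.

```python
import numpy as np

rng = np.random.default_rng(4)

def slope_sd(n, x_sd, u_sd, reps=2000):
    # Sampling standard deviation of the bivariate OLS slope across repeated samples
    slopes = []
    for _ in range(reps):
        x = rng.normal(0, x_sd, n)
        y = 1 + 2 * x + rng.normal(0, u_sd, n)
        slopes.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
    return np.std(slopes)

base = slope_sd(n=50, x_sd=1, u_sd=1)
noisy = slope_sd(n=50, x_sd=1, u_sd=3)    # noisier errors -> less precise
wide_x = slope_sd(n=50, x_sd=3, u_sd=1)   # more variation in x -> more precise
big_n = slope_sd(n=500, x_sd=1, u_sd=1)   # larger sample -> more precise
print(base, noisy, wide_x, big_n)
```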


Motivation for multiple regression analysis

1) controlling for other factors:
even if you are primarily interested in estimating one parameter, including others in the regression controls for potentially confounding factors, so the zero conditional mean assumption is more likely to hold

2) better predictions:
more independent variables can explain more of the variation in y, meaning a potentially higher R²

3) Estimating non-linear relationships:
by including higher order terms of a variable, we can allow for a more flexible, non-linear functional form between the dependent variable and an independent variable of interest

4) Testing joint hypotheses on parameters:
can test whether multiple independent variables are jointly statistically significant


How does OLS make estimates?

it minimizes the sum of the squared residuals: OLS chooses the combination of all the β̂'s that yields the lowest sum of squared residuals


How do you isolate the variation that is unique to x3?

regress x3 on all the other regressors and obtain the residuals ê; these residuals contain the variation in x3 not explained by the other regressors in the initial population model, effectively holding all else constant

then we conduct a bivariate regression of y on ê
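A sketch of this partialling-out procedure on made-up data (the regressor correlations and coefficients are illustrative): the bivariate slope of y on the residualized x3 matches the multiple-regression coefficient on x3 (the Frisch-Waugh-Lovell result).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)   # correlated with x1 on purpose
x3 = 0.3 * x1 + rng.normal(0, 1, n)
y = 1 + 2 * x1 + 0.5 * x2 - 1.5 * x3 + rng.normal(0, 1, n)

# Full multiple regression of y on (1, x1, x2, x3)
X = np.column_stack([np.ones(n), x1, x2, x3])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x3 on the other regressors and keep the residuals e_hat
Z = np.column_stack([np.ones(n), x1, x2])
e_hat = x3 - Z @ np.linalg.lstsq(Z, x3, rcond=None)[0]

# Step 2: a bivariate regression of y on e_hat recovers the coefficient on x3
b3 = (e_hat @ y) / (e_hat @ e_hat)
print(np.isclose(b3, beta[3]))   # True
```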


What does β̂k represent, in terms of slope?

the partial association between y and xk, holding x1, x2, ..., x(k−1) equal

β̂k is the slope of the multi-dimensional regression plane along the xk direction (i.e., the expected change in y when xk increases by one unit, holding all other x's constant)