Tutorial 1: Introduction & Linear Regression Model Flashcards
What does an OLS model (for all i = 1,…,N observations) look like?
y = X β + ε where….
- ….y = column vector = (y₁ … yₙ)' = N × 1
- ….X = matrix of observations = N × K
- ….β = column vector = (β₁ … βₖ)' = K × 1
- ….ε = column vector = (ε₁ … εₙ)' = N × 1
What are the OLS residuals?
e = ε̂ = y − ŷ = y − Xβ̂
Sum of squared residuals?
Σeᵢ² = e’e = (y - Xβ̂)’(y - Xβ̂)
What’s the OLS estimator β̂ₒₗₛ?
β̂ₒₗₛ = arg minᵦ (y − Xβ)'(y − Xβ) = (X'X)⁻¹X'y
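A minimal numerical sketch of this formula (simulated data; all names are illustrative), computing β̂ = (X'X)⁻¹X'y and the residuals e = y − Xβ̂:

```python
# Sketch: compute the OLS estimator beta_hat = (X'X)^{-1} X'y on simulated data.
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # N x K, with intercept
beta = np.array([1.0, 2.0, -0.5])                               # true K x 1 parameter vector
y = X @ beta + rng.normal(size=N)                               # y = X beta + eps

# OLS estimator: solve the normal equations (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ beta_hat          # residuals e = y - X beta_hat
ssr = e @ e                   # sum of squared residuals e'e
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; `np.linalg.lstsq(X, y, rcond=None)` gives the same estimate.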
What is the OLS Estimator (conceptually)?
OLS estimator minimizes the sum of squared residuals
What is K?
# of parameters = # rows of β = # columns of X
What is N?
sample size = # rows of X = # rows of Y = # rows of ε
What is the relationship between K and N in order to be able to estimate β? What if that is not the case?
K ≤ N
Otherwise X'X is not invertible and β cannot be estimated by OLS; other estimators like lasso or ridge must be used.
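A small sketch of why K ≤ N is needed (simulated data; names are illustrative): with more parameters than observations, X'X is rank-deficient, so (X'X)⁻¹ does not exist.

```python
# Sketch: when K > N, the K x K matrix X'X has rank at most N < K,
# so it is singular and the OLS formula (X'X)^{-1} X'y breaks down.
import numpy as np

rng = np.random.default_rng(1)
N, K = 5, 8                        # more parameters than observations
X = rng.normal(size=(N, K))

XtX = X.T @ X                      # 8 x 8, but rank at most 5
print(np.linalg.matrix_rank(XtX))  # 5, not 8: singular, no inverse
```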
How can you interpret a coefficient?
How the linear prediction of y changes if we increase variable xₖ by one unit, holding the other variables fixed.
What is the variance?
Variance (σ²):
A measure of the spread between the numbers in a data set: it measures how far each number lies from the mean, and therefore from every other number in the set.
How does the variance of β̂ₒₗₛ look like (in matrix form)?
K × K variance-covariance matrix (K = number of parameters in the model):
Var(β̂ₒₗₛ | X) = E[(β̂ₒₗₛ − β)(β̂ₒₗₛ − β)' | X]
How can β̂ₒₗₛ be rewritten in terms of β?
β̂ₒₗₛ = (X'X)⁻¹X'y = (X'X)⁻¹X'(Xβ + ε) = β + (X'X)⁻¹X'ε
How can the variance of β̂ₒₗₛ be simplified?
Substituting β̂ₒₗₛ − β = (X'X)⁻¹X'ε:
Var(β̂ₒₗₛ | X) = (X'X)⁻¹X' E[εε' | X] X(X'X)⁻¹
What is the variance of the error term in a heteroskedastic OLS model?
Variance of the error term is NOT constant across i; it depends on xᵢ:
Var(εᵢ | xᵢ) = σᵢ² = σ²(xᵢ)
What is the variance of the error term in a homoskedastic OLS model?
Variance of the error term is constant for all i and does not depend on xᵢ:
Var(εᵢ | xᵢ) = σ²
What condition do we assume to hold for both homoskedastic and heteroskedastic models?
We always assume that the error terms of two observations are not correlated:
Cov(εᵢ, εⱼ | X) = 0 for all i ≠ j
What are the consequences of heteroskedasticity?
Consequences of heteroskedasticity:
- if the other OLS assumptions are fulfilled, β̂ is still unbiased + consistent
- β̂ is no longer efficient (not BLUE any more); others like GLS will be better (= more efficient)
- the non-robust estimator σ̂²(X'X)⁻¹ is no longer a consistent estimator of Var(β̂ | X)
How can the variance of β̂ be simplified with homoskedasticity?
Under homoskedasticity E[εε' | X] = σ²Iₙ, so:
Var(β̂ | X) = σ²(X'X)⁻¹
What is a consistent estimator of the variance of β̂ under homoskedasticity?
V̂ar(β̂ | X) = σ̂²(X'X)⁻¹ with σ̂² = e'e / (N − K)
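A minimal sketch (simulated homoskedastic data; names are illustrative) of the classical variance estimator σ̂²(X'X)⁻¹ with σ̂² = e'e/(N − K):

```python
# Sketch: classical (non-robust) variance-covariance matrix of beta_hat
# under homoskedasticity, and the resulting standard errors.
import numpy as np

rng = np.random.default_rng(2)
N, K = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=N)   # homoskedastic errors

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sigma2_hat = e @ e / (N - K)                        # sigma2_hat = e'e / (N - K)
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)         # K x K variance-covariance matrix
se = np.sqrt(np.diag(V_hat))                        # classical standard errors
```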
What are solutions to heteroskedasticity?
- Heteroskedasticity-robust standard errors: β̂ will then still not be the most efficient estimator, but inference based on the robust standard errors is valid
- GLS (= OLS on transformed data)
How does heteroskedasticity-robust variance look like?
Var(β̂ | X) = (X'X)⁻¹X'ΩX(X'X)⁻¹ with Ω = diag(σ₁², …, σₙ²)
This cannot be simplified further. The non-robust standard errors will be inconsistent!
What is the sandwich estimator?
- proposed by White (1980)
- replaces the unknown σᵢ² with the squared OLS residuals eᵢ²
- consistent under all forms of heteroskedasticity (also homoskedasticity!)
- = “heteroskedasticity-robust standard errors”
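A minimal sketch of White's sandwich estimator (X'X)⁻¹(Σᵢ eᵢ² xᵢxᵢ')(X'X)⁻¹ on simulated heteroskedastic data (names are illustrative):

```python
# Sketch: heteroskedasticity-robust (White / HC0) variance of beta_hat,
# plugging squared OLS residuals e_i^2 into the "meat" of the sandwich.
import numpy as np

rng = np.random.default_rng(3)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=N)  # error variance depends on x

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (e[:, None] ** 2 * X)           # sum_i e_i^2 x_i x_i'
V_robust = XtX_inv @ meat @ XtX_inv          # the "sandwich": bread, meat, bread
robust_se = np.sqrt(np.diag(V_robust))       # heteroskedasticity-robust standard errors
```

This is the basic HC0 variant; small-sample corrections (HC1–HC3) rescale the residuals but keep the same sandwich structure.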
What does it mean for an estimator to be consistent?
An estimator is consistent if, as the sample size increases, the estimates (produced by the estimator) “converge” to the true value of the parameter being estimated. To be slightly more precise - consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
What does it mean for an estimator to be unbiased?
An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
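A small Monte Carlo sketch of both properties (simulated data; names and sample sizes are illustrative): across many replications, the mean of the OLS slope estimates sits near the true value (unbiasedness), and their spread shrinks as N grows (consistency).

```python
# Sketch: Monte Carlo illustration of unbiasedness and consistency
# of the OLS slope estimator in a one-regressor model without intercept.
import numpy as np

rng = np.random.default_rng(4)
beta_true = 2.0

def slope_estimates(N, reps=500):
    """Draw `reps` samples of size N and return the OLS slope from each."""
    out = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=N)
        y = beta_true * x + rng.normal(size=N)
        out[r] = (x @ y) / (x @ x)        # OLS slope: (x'x)^{-1} x'y
    return out

small, large = slope_estimates(20), slope_estimates(2000)
print(small.mean(), large.mean())   # both close to 2.0 (unbiasedness)
print(small.std(), large.std())     # spread shrinks with N (consistency)
```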