Econometrics - OLS Flashcards
What is the general equation for an ordinary least squares (OLS) regression?
Yi = b0 + b1Xi + ui, where b0 is the intercept, b1 is the slope and ui is the error term
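A minimal sketch of how b0 and b1 can be estimated for this equation, using the standard closed-form least squares solution. The function name fit_simple_ols and the data are made up for illustration; the data satisfy Y = 1 + 2X exactly, so every ui is 0.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: sample covariance of X and Y divided by sample variance of X
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: forces the fitted line through the point of means
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical data generated as Y = 1 + 2X with no error
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)
```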
What is the total sum of squares equal to?
Explained sum of squares + residual sum of squares (TSS = ESS + RSS, where RSS is the sum of the squared residuals)
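The decomposition can be checked numerically. This is a sketch on made-up data, and the identity relies on the regression including an intercept.

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
resid = y - y_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum(resid ** 2)               # residual sum of squares

print(tss, ess, rss)
```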
What is the interpretation of the intercept coefficient in OLS?
When all X variables are 0, the expected value of Y equals the intercept (b0)
What is the interpretation of OLS when Y and X are in levels?
A one unit increase in X is associated with a b1 unit change in Y, on average
What are the 5 classical assumptions?
- Error has a zero conditional mean (implies no relationship between error and explanatory variables)
- Linear in the parameters (the coefficients enter additively; e.g. b1 cannot appear as a power or exponent, although X itself may be transformed)
- Error term has constant variance (homoskedasticity)
- Error terms cannot be correlated (serial correlation if correlation of errors is not zero)
- Independence of X and u for all periods
What are 3 practical assumptions?
- Number of observations should be larger than the number of estimated parameters, so that the degrees of freedom are positive
- X must take different values
- Normality of random error
What is the additional OLS assumption needed for time series data?
Strict exogeneity - the error term is uncorrelated with each explanatory variable in every time period, so that unbiased estimates are obtained
What is the Gauss Markov Theorem?
OLS is BLUE - Best Linear Unbiased Estimator
Best = minimum variance
Linear = the OLS estimator is a linear function of the observed Y values
Unbiased = the expected value of the estimator equals the true underlying parameter value
What is the problem with the Gauss Markov Theorem?
In practice, 3 of the assumptions required for OLS to be BLUE often fail
What other property is relied on when Gauss Markov is violated?
Consistency - as the sample size grows, the variance of the estimator shrinks AND the value of the estimator approaches the true parameter value
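Consistency can be illustrated with a small simulation. This is only a sketch: the true parameter values, sample sizes, number of replications and random seed are arbitrary assumptions, and the data are generated so that the classical assumptions hold.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_B0, TRUE_B1 = 1.0, 2.0  # assumed "true" parameters for the simulation

def slope_estimates(n, reps=200):
    """Estimate the slope on `reps` simulated samples of size n."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)  # error drawn independently of x
        y = TRUE_B0 + TRUE_B1 * x + u
        est[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return est

small = slope_estimates(20)
large = slope_estimates(2000)

# The spread of the estimates around the true slope shrinks as n grows
spread_small, spread_large = small.std(), large.std()
print(spread_small, spread_large, large.mean())
```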
How is the t-stat for a coefficient estimate calculated?
Estimate of coefficient / standard error of the estimate (for the null hypothesis that the coefficient is 0; compared against a t-distribution with n-2 degrees of freedom in a simple regression with one X)
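A sketch of the calculation for the slope of a simple regression, on made-up data. The null hypothesis is assumed to be b1 = 0, and the 5% two-sided critical value for 3 degrees of freedom (3.182) is taken from standard t-tables rather than computed.

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# OLS estimates for Yi = b0 + b1*Xi + ui
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)

# Standard error of the slope, then the t-stat for H0: b1 = 0
se_b1 = np.sqrt(sigma2_hat / sxx)
t_stat = b1 / se_b1

# Two-sided 5% critical value for 3 degrees of freedom, from t-tables
t_crit = 3.182
reject_null = abs(t_stat) > t_crit
print(t_stat, reject_null)
```

With only 5 observations the t-stat here falls below the critical value, so the null is not rejected at the 5% level.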
When can you reject the null hypothesis using 5% sig level?
if p-value < 0.05 then reject null
What does R-squared show? How can you interpret its value?
Goodness of fit: if R-squared is large, the best fit line “fits” the sample data closely.
R-squared can be interpreted as the % of variation in Y explained by the model, e.g. 0.6 = 60% explained
How is R-squared calculated?
Explained sum of squares / total sum of squares
What is adjusted R-squared?
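R-squared adjusted for the number of observations and the number of explanatory variables, so that adding a regressor with little explanatory power does not automatically raise it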
What is the formula for the adjusted R-squared?
1 - (1 - R-squared) * ((n-1)/(n-p-1)), where n = number of observations and p = number of explanatory variables
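A short sketch computing R-squared and adjusted R-squared on made-up data. Here n is taken to be the number of observations and p the number of explanatory variables (excluding the intercept); the helper name adjusted_r2 is hypothetical.

```python
import numpy as np

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# R-squared = explained sum of squares / total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = ess / tss

adj = adjusted_r2(r2, n=len(y), p=1)
print(r2, adj)
```

On this data R-squared is 0.6, and the adjustment pulls it down because the sample is so small.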
How do you interpret a multiple linear regression (multiple OLS)?
If X1 changes by 1 unit, then Y will change by b1 units, holding all other X fixed
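The "holding all other X fixed" reading can be checked directly. In this sketch the data are made up and generated exactly as Y = 1 + 2*X1 - 3*X2, so the fitted coefficient on X1 should be 2.

```python
import numpy as np

# Made-up data generated as Y = 1 + 2*X1 - 3*X2 with no error
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# Raise X1 by one unit while keeping X2 at the same value
base = b0 + b1 * 2.0 + b2 * 1.0
bumped = b0 + b1 * 3.0 + b2 * 1.0
change_in_y = bumped - base  # equals the coefficient on X1
print(beta, change_in_y)
```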
What is the general F-test reported by software?
Test on overall significance of all explanatory variables in regression
What test compares a restricted and an unrestricted model to check whether additional variables have explanatory power?
F-test
How do you calculate the F-stat?
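F-stat = [(RSSr - RSSu) / d] / [RSSu / (n-k)]
where r = restricted, u = unrestricted, d = number of restrictions, n = number of observations and k = number of parameters in the unrestricted model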
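A sketch of the restricted/unrestricted comparison on made-up data. It reads the formula as [(RSSr - RSSu) / d] / [RSSu / (n-k)], with d assumed to be the number of restrictions and k the number of parameters in the unrestricted model (intercept included). The helper names rss_of_fit and f_stat are hypothetical.

```python
import numpy as np

def rss_of_fit(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_stat(rss_r, rss_u, d, n, k):
    """F-stat for d restrictions; k parameters in the unrestricted model."""
    return ((rss_r - rss_u) / d) / (rss_u / (n - k))

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(y)

X_r = np.ones((n, 1))                      # restricted: intercept only
X_u = np.column_stack([np.ones(n), x])     # unrestricted: intercept and X

rss_r = rss_of_fit(X_r, y)
rss_u = rss_of_fit(X_u, y)
F = f_stat(rss_r, rss_u, d=1, n=n, k=2)
print(rss_r, rss_u, F)
```

With a single restriction the F-stat equals the square of the t-stat on the excluded variable, which is a useful sanity check.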
What is multicollinearity?
A strong linear relationship between explanatory variables: movements in one X are closely matched by movements in another X
What are the problems with multicollinearity?
It isn’t possible to estimate the separate effects of the X variables precisely, as standard errors become large.
Variances will be large, confidence intervals will be wide, and coefficients may appear statistically insignificant
What is the difference between perfect and imperfect multicollinearity?
Perfect is where 2 Xs are exactly linearly related; it is very rare and usually due to a dataset compilation error. Imperfect is where Xs are linearly related to a high degree but the relationship is not exact
What are the 2 ways you can detect multicollinearity?
Partial (pairwise) correlation coefficients OR variance inflation factors (VIFs)
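A sketch of both detection methods on made-up data, where x2 is x1 plus a small perturbation. The VIF for each X is computed as 1 / (1 - R-squared) from regressing that X on the other explanatory variables; the function name vif is hypothetical, and a VIF above 10 is only a common rule of thumb for concern.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no constant column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        # Regress column j on a constant plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Made-up data: x2 tracks x1 almost exactly
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1])

pairwise_corr = np.corrcoef(x1, x2)[0, 1]
vifs = vif(np.column_stack([x1, x2]))
print(pairwise_corr, vifs)
```

With only two explanatory variables the VIF is just 1 / (1 - r squared) of the pairwise correlation; with more variables the VIF can pick up collinearity that pairwise correlations miss.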