Chapter 5 - CLRM assumptions and testing Flashcards
Recall the assumptions of the classical linear regression model (CLRM)
1) The residuals have expected value zero
2) The residuals have constant variance sigma^2
3) The residuals have zero covariance with one another (no autocorrelation)
4) The residuals have zero covariance with the explanatory variables
5) The residuals are normally distributed, u_t ~ N(0, sigma^2)
Considering the assumptions, what do we want to understand about them?
We need to detect violations of the assumptions.
We need to understand what violations do to our CLRM
We need to know some classic cases that violate the assumptions
What are the main outcomes of using a model even though the assumptions are violated?
1) Coefficient estimates can be wrong
2) the standard errors can be wrong
3) The distributions used for test statistics are wrong
In short, any inference drawn from the model may be seriously misleading.
What do we call tests that are concerned with checking the validity of the assumptions?
Diagnostic tests.
What alternative approaches do we have for constructing these tests?
1) LM (Lagrange multiplier)
2) Wald
Briefly elaborate on LM testing
In this context, the test statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions placed on the model.
Briefly elaborate on Wald testing
The test statistic follows an F distribution with (m, T - k) degrees of freedom, where m is the number of restrictions, T the number of observations and k the number of parameters in the unrestricted model.
What can we say about the LM and Wald approaches?
Asymptotically they are equivalent, but their results can differ in small samples.
elaborate on the first assumption
E[u_t] = 0
This will never be violated if we include a constant term.
However, if we force the regression line to pass through the origin (i.e. omit the intercept), we can get badly misleading results.
What happens is that the slope has to compensate for the poor position of the line: the estimated slope can end up nowhere near the true value, because it is chosen purely to minimise the residual sum of squares given the missing intercept.
The fitted line can even end up being a worse fit than simply using the sample mean of y, in which case R^2, computed as 1 - RSS/TSS, becomes negative.
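A rough sketch of this effect (the data, coefficients and seed are made up for illustration): when the true intercept is far from zero, forcing the line through the origin can make R^2, measured as 1 - RSS/TSS against the sample mean, go negative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 20 - 0.5 * x + rng.normal(0, 1, 100)       # true intercept far from zero

# Regression forced through the origin (no constant term)
fit_no_const = sm.OLS(y, x).fit()
rss = np.sum(fit_no_const.resid ** 2)
tss = np.sum((y - y.mean()) ** 2)
print("R^2 against the mean benchmark:", 1 - rss / tss)   # typically negative here

# With an intercept, R^2 stays between 0 and 1
fit_const = sm.OLS(y, sm.add_constant(x)).fit()
print("R^2 with intercept:", fit_const.rsquared)
```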
elaborate on detecting heteroscedasticity
Plotting methods are difficult to use because one rarely knows the form the heteroscedasticity will take.
However, there are statistical tests.
Goldfeld-Quandt: split the total sample into two subsamples and estimate the regression model on each. Compute the residual variance (s^2) of each subsample using the usual formula for the sample variance of a regression (residual sum of squares divided by residual degrees of freedom). The null hypothesis is that the two variances are equal.
The test statistic is the ratio of sample variances. It is F-distributed.
The weakness of Goldfeld-Quandt is that it requires a sensible choice of split point. The split is typically made at a known event, such as a suspected structural break.
One can also drop a number of central observations around the split point to make the contrast between the two subsamples more evident.
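A minimal sketch of the mechanics, assuming y and X are numpy arrays and the split index is chosen by the researcher (the function name is illustrative; statsmodels also provides a ready-made het_goldfeldquandt):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def goldfeld_quandt(y, X, split):
    """F test of equal residual variances in two subsamples split at `split`."""
    X = sm.add_constant(X)
    fit1 = sm.OLS(y[:split], X[:split]).fit()
    fit2 = sm.OLS(y[split:], X[split:]).fit()
    s1 = fit1.ssr / fit1.df_resid              # residual variance, first subsample
    s2 = fit2.ssr / fit2.df_resid              # residual variance, second subsample
    f_stat = max(s1, s2) / min(s1, s2)         # larger variance in the numerator
    df_num = fit1.df_resid if s1 >= s2 else fit2.df_resid
    df_den = fit2.df_resid if s1 >= s2 else fit1.df_resid
    p_value = 1 - stats.f.cdf(f_stat, df_num, df_den)
    return f_stat, p_value
```

Putting the larger variance in the numerator makes the comparison one-sided against the F distribution with the corresponding residual degrees of freedom.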
what do we mean by heteroscedasticity?
Anything that means the residuals do not have constant variance. For instance, if the residuals grow in magnitude as one of the explanatory variables increases, the average error can still be zero while the error variance differs with the value of that variable.
What term is used to describe assumption 2?
Homoscedasticity
elaborate on White’s test
White's test is a test for heteroscedasticity.
It is useful because it makes no assumptions on the shape of the heteroskedasticity.
Assume we have a regular linear regression.
We want to test var(u_t) = sigma^2.
Estimate the model. Get the residuals.
Then we create an auxiliary regression in which the squared residuals are the dependent variable. As independent variables we include the original regressors, their squares, their cross products, etc. The goal is to see whether movement in the squared residuals can be explained by the explanatory variables.
We could then use the F-test approach, but this requires running additional regressions; the LM approach is typically easier.
The LM approach for White's test exploits the fact that if one or more of the coefficients in the auxiliary regression are statistically significant, this shows up in that regression's R^2, which will be larger than it would be if none of the coefficients mattered.
We obtain R^2 from the auxiliary regression and multiply it by the number of observations, T. The statistic T*R^2 is chi-squared distributed with m degrees of freedom (m being the number of regressors in the auxiliary regression, excluding the constant), under the null that all the coefficients are jointly zero.
So we want to see a low value of the statistic, since that indicates no evidence that the auxiliary R^2 is large, i.e. no evidence of heteroscedasticity.
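A minimal sketch for a model with two made-up regressors x1 and x2 (all names illustrative; statsmodels also offers het_white for the same purpose): estimate the original model, regress the squared residuals on levels, squares and the cross product, and compare T*R^2 with a chi-squared distribution.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def whites_test(y, x1, x2):
    # Original regression and its residuals
    X = sm.add_constant(np.column_stack([x1, x2]))
    resid = sm.OLS(y, X).fit().resid

    # Auxiliary regression: squared residuals on levels, squares and cross product
    Z = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
    aux = sm.OLS(resid**2, Z).fit()

    lm_stat = len(y) * aux.rsquared        # T * R^2
    m = Z.shape[1] - 1                     # auxiliary regressors, excluding the constant
    p_value = 1 - stats.chi2.cdf(lm_stat, m)
    return lm_stat, p_value
```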
What happens to OLS if there is heteroskedasticity present?
The coefficient estimates are still unbiased, but OLS is no longer BLUE (it no longer has minimum variance), and the usual standard errors are incorrect, so inference can be misleading.
what is assumption 3?
Assume no autocorrelation between residuals
what tests do we have for autocorrelation in CLRM?
1) Durbin Watson
2) Breusch Godfrey
elaborate on Durbin Watson
DW is a first-order autocorrelation test: it tests only the first lag.
The idea of Durbin-Watson is to take the residuals from the original regression and, in effect, regress each residual on its own previous value, then check whether the coefficient in that regression is statistically significant. The null hypothesis is that the coefficient is zero.
u_t = rho * u_(t-1) + v_t
We are only testing whether rho = 0.
In practice we do not have to run this regression: the DW statistic is computed directly from the residuals of the original regression, and approximately DW ≈ 2(1 - rho_hat).
The DW statistic does not follow a standard tabulated distribution. Instead, its value (which lies between 0 and 4, with 2 indicating no autocorrelation) is compared against lower and upper critical values that divide the range into rejection, inconclusive and non-rejection regions.
DW tests only whether consecutive errors are related, so it is somewhat limited.
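As a rough sketch (the helper name below is made up; statsmodels also provides durbin_watson in statsmodels.stats.stattools), the statistic can be computed directly from the residual series:

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum of squared first differences of the residuals / sum of squared residuals."""
    resid = np.asarray(resid)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# A value near 2 suggests no first-order autocorrelation, values near 0 suggest
# positive autocorrelation, and values near 4 suggest negative autocorrelation.
```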
DW is a test to see whether consecutive errors are related. Can we do better?
Yes, with Breusch-Godfrey.
In theory we could also use a DW-style approach but with higher-order lags instead of just lag 1; however, this is not practical.
Breusch Godfrey is a joint test for multiple lags at once.
1) Estimate OLS like always to find the residuals
2) Use the residuals to build an auxiliary regression. The explanatory variables are the lagged residuals, but we also include the intercept and the regular explanatory variables from the original regression; this removes any dependence of the residuals on them.
3) Obtain R^2 from the new regression.
4) Compute (T - r)R^2, where T is the number of observations and r is the order of the lags. This statistic is chi-squared distributed with r degrees of freedom.
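A minimal sketch on made-up data (the AR(1) error process, seed and lag order are illustrative, not from the text), using statsmodels' acorr_breusch_godfrey; the library's version of the statistic may differ slightly from (T - r)R^2 in how it handles the initial lagged residuals:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):                     # errors follow an AR(1) process
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

results = sm.OLS(y, sm.add_constant(x)).fit()          # step 1: original regression
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=4)
print(lm_stat, lm_pval)   # a small p-value rejects "no autocorrelation up to lag 4"
```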
Why do we multiply R^2 by (T - r) in Breusch-Godfrey, and not just by T as in the White test?
When we include lagged residuals up to order r, we lose the first r observations, for which the lags are not available.
How do we test for normality of the residuals?
Bera-Jarque test. BJ test.
elaborate on the Bera-Jarque test
It uses the fact that a normal distribution is completely characterised by its first two moments, so its higher moments are fixed: the skewness is always 0 and the kurtosis is always 3.
We define excess kurtosis as kurtosis minus 3. The test then tests the joint hypothesis that both the skewness and the excess kurtosis of the residuals are zero. If that hypothesis is not rejected, normality is assumed. Extreme values of the test statistic indicate that the observed skewness and kurtosis would be highly unlikely under normality, which we take as evidence that the residuals are not normally distributed.
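A minimal sketch of the statistic computed from a residual series (the function name is made up; scipy and statsmodels also ship ready-made jarque_bera routines), using the standard form W = T(S^2/6 + (K - 3)^2/24), which is chi-squared with 2 degrees of freedom under normality:

```python
import numpy as np
from scipy import stats

def bera_jarque(resid):
    """Joint test that skewness = 0 and excess kurtosis = 0."""
    resid = np.asarray(resid)
    T = len(resid)
    S = stats.skew(resid)
    K = stats.kurtosis(resid, fisher=False)   # raw kurtosis; equals 3 under normality
    W = T * (S ** 2 / 6 + (K - 3) ** 2 / 24)
    p_value = 1 - stats.chi2.cdf(W, df=2)
    return W, p_value
```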
What happens if we find that the residuals are not normally distributed?
It is not straightforward to know what to do.
If the sample is very large, non-normality matters little and we can use the model without much worry.
Sometimes, taking logs of the variables can help.
Other times, a few extreme outliers that are not really representative of the overall pattern will distort the test; removing or neutralising these outliers can help the model.
In general, removal of outliers is dangerous. Perhaps the observation is not actually an outlier; we may simply lack data in that region. We usually say that removal is only justifiable if there is evidence suggesting the event was a one-off.
how can we remove outliers?
We can neutralise an outlier by adding a dummy (binary) variable that equals 1 for that observation only and 0 elsewhere; this forces that observation's residual to zero.
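A small sketch with simulated data (the data, seed and outlier position are made up): the dummy variable equals 1 only for the suspect observation, which forces its residual to zero and removes its influence on the other coefficients.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=100)
y[57] += 8.0                                  # inject one large outlier

dummy = np.zeros(100)
dummy[57] = 1.0                               # 1 only for the outlying observation
X = sm.add_constant(np.column_stack([x, dummy]))
fit = sm.OLS(y, X).fit()
print(fit.resid[57])                          # essentially zero: the outlier is neutralised
```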
elaborate on multicollinearity
An implicit assumption made when using OLS is that the explanatory variables are not correlated with one another.
Multicollinearity refers to cases where there is a high (though not perfect) degree of correlation between the explanatory variables.
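As a quick illustration on made-up data, the simplest check is the pairwise correlation matrix of the explanatory variables; x1 and x2 below are constructed to be nearly (but not perfectly) collinear:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)     # highly correlated with x1
x3 = rng.normal(size=200)

# Off-diagonal entries close to 1 in absolute value signal near multicollinearity
print(np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False))
```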