Chapter 5 - Classical Linear Regression Assumptions and Diagnostic Tests Flashcards
(75 cards)
Recall the assumptions we make when doing the classical linear regression model
E[u_t] = 0 (residuals have expected value 0)
var(u_t) = sigma^2 < infinity
cov(u_i, u_j) = 0 for i != j (residuals are uncorrelated with one another)
cov(u_t, x_t) = 0 (residuals are uncorrelated with the explanatory variables)
u_t is normally distributed N(0, sigma^2)
why do we need those assumptions?
They serve 2 primary purposes:
1) They are needed in order to show that the OLS estimator has a number of desirable properties.
2) They make hypothesis testing about the estimated coefficients possible.
what sort of questions are we interested in in this chapter?
How can we detect violations of the CLRM assumptions?
What are the most likely causes of violations in practice?
What happens if we choose to ignore a certain assumption, and continue with the model nonetheless?
name the two test statistic approaches that we use
The Lagrange multiplier (LM) test
The Wald test
what do we need to know regarding the LM test and the Wald test?
The LM test statistic in the context of diagnostic tests follows a chi-squared distribution with m degrees of freedom, where m is the number of constraints/restrictions placed on the model.
The Wald version of the test follows an F-distribution with (m, T-k) degrees of freedom.
What can we say about comparisons of LM and Wald tests
Asymptotically, their behaviour is the same, but in small samples they can give different results.
This follows from how the F-distribution is related to the chi-squared: an F(m, T-k) variable is the ratio of a chi-squared(m) variable divided by m to an independent chi-squared(T-k) variable divided by T-k. As the number of observations T increases, the denominator converges to 1, so m times the F statistic converges to a chi-squared variable with m degrees of freedom.
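A quick numerical illustration of this (my own sketch, not from the book): compare the 5% chi-squared critical value with m times the F critical value as the second degrees-of-freedom parameter grows; m = 3 and the T-k values are arbitrary choices.

```python
# Sketch: m * F critical value with (m, T-k) df approaches the chi-squared(m)
# critical value as T - k grows. m and the T-k values below are arbitrary.
from scipy.stats import chi2, f

m = 3
print("chi-squared 5% critical value:", round(chi2.ppf(0.95, m), 3))
for dfd in (10, 50, 500, 5000):          # stand-ins for T - k
    print(f"T-k = {dfd:5d}: m * F critical value = {m * f.ppf(0.95, m, dfd):.3f}")
```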
what actually is a diagnostic test?
a test concerning the validity of a model
elaborate on the first assumption of CLRM
E[u_t] = 0
We assume that in our model, the average error is 0.
Is this reasonable?
For OLS it is, provided we include a constant (intercept) term in the regression. Whatever the slope estimates are, the constant can always adjust so that the average residual is exactly zero.
Problems can occur if we force the regression line through the origin, that is, if we drop the intercept even though the intercept that would make the mean error zero is not actually zero.
If we remove the intercept term from our model for some reason, these problematic cases can arise.
Perhaps the biggest issue is that the sample mean of the dependent variable may then explain the variation about the mean better than the fitted regression line does. This can make R^2 negative, which gives us undesirable results.
elaborate on the problematic cases that may occur as a result of omitting the intercept term in the CLRM
First of all, recall that R^2 can be defined as ESS/TSS, the explained sum of squares divided by the total sum of squares. ESS is the variation around the sample mean that the model captures, while TSS is the total variation around the mean, including what the model does not capture. The difference, TSS - ESS, is the residual sum of squares (the unexplained part).
If we remove the intercept term from the regression, R^2 can become negative. The interpretation is that the sample average explains more of the variation in the dependent variable than the explanatory variables do.
An even worse consequence of omitting the intercept is that the slope estimates can be severely biased. The true relationship may be perfectly linear but offset from the origin in y; without an intercept we cannot model this correctly.
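A small sketch of the negative-R^2 point (all data is made up; R^2 is computed directly as 1 - RSS/TSS with TSS measured around the sample mean):

```python
# Minimal sketch: fitting a line with and without an intercept on data whose
# true intercept is far from zero. Without the intercept, R^2 goes negative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 50 - 0.5 * x + rng.normal(0, 1, 200)    # hypothetical DGP with a large intercept

def r_squared(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

# With intercept: least squares on [1, x]
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print("with intercept:   ", r_squared(y, X @ b))

# Without intercept: regression forced through the origin
b0 = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0]
print("without intercept:", r_squared(y, x * b0[0]))
```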
elaborate on homoscedasticity
constant variance assumption (error terms have constant variance)
what do we say if the second assumption (constant variance of residuals) is violated?
We say that the errors are heteroscedastic: their variance is not constant.
how can we test for heteroscedasticity?
The simplest method is the Goldfeld-Quandt test.
Roughly speaking, the Goldfeld-Quandt test splits the sample of T observations into two subsamples and computes the residual variance of each. Under the null, each (scaled) variance estimate follows a chi-squared distribution.
Therefore, we can use the F-distribution to test whether the two variances are equal. The null hypothesis is that the variance is the same in both subsamples.
As with the regular F-test, this test checks the ratio of the two variances: if they are actually the same, the ratio should be close to one and should not fall in the rejection region.
is the Goldfeld-Quandt test good?
It is decent, but it has weaknesses. Most of the weaknesses are related to the choice of where to split the sample.
Things can be done about this. For instance, the observations can be ordered not by time but by the values of a third variable that is thought to affect the variance, before splitting.
A second approach is to omit a number of observations in the centre of the original sample. This creates a clearer separation between the two subsamples, which can be beneficial.
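A sketch of how the test might be run in Python with statsmodels' het_goldfeldquandt (the data here is invented, and the drop argument omits the middle 20% of observations, as described above):

```python
# Minimal sketch with made-up data where the error variance grows with x.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
u = rng.normal(0, 0.5 + 0.3 * x)           # hypothetical heteroscedastic errors
y = 1.0 + 2.0 * x + u
X = sm.add_constant(x)

# Splits the sample, fits OLS on each part and compares the residual variances.
f_stat, p_value, _ = het_goldfeldquandt(y, X, drop=0.2)   # drop middle 20%
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
```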
Do we have alternatives to the Goldfeld-Quandt test for testing heteroscedasticity?
“White’s test”
elaborate on “White’s test”
We are testing for heteroscedasticity. Therefore, we want to see whether the variance of the residuals can be explained by the variables we use as explanatory variables, which would mean it varies with them rather than being constant.
We first obtain the residuals from regular regression.
Then we want to see whether the squared residuals (from our sample, essentially) can be described, beyond a constant mean, by our explanatory variables, the squares of our explanatory variables, and the cross products of the explanatory variables. Each of these terms has a parameter coefficient, and those coefficients are what White's test is really interested in.
To understand why the squared residuals (obtained from the original regression) are used as the dependent variable in this auxiliary regression, consider the formula for the variance of the residuals:
var(u_t) = E[(u_t - E[u_t])^2]
var(u_t) = E[(u_t - 0)^2]
var(u_t) = E[u_t^2]
So the variance of u_t is simply the expected value of u_t^2. A regression models a conditional expectation, which is why the squared residuals serve as the dependent variable in the auxiliary regression.
After we perform the auxiliary regression, we have a choice of testing method. We could use the regular F-test framework, where we would compare the RSS from the restricted and unrestricted models. The restricted model would be the auxiliary regression with only a constant as regressor (no variables), while the full auxiliary regression is the unrestricted model.
However, it is perhaps easier to use the Lagrange multiplier approach, which centers on the value of R^2. Recall that R^2 is explained variation divided by total variation. The idea is that if the auxiliary regression gets a high R^2, the variables are good predictors of the squared residuals. This is "bad" because it indicates that the variance is not constant but depends on the values of the variables. So we obtain R^2 from the auxiliary regression and multiply it by T, and it can be shown that T*R^2 follows a chi-squared distribution with m degrees of freedom, where m is the number of regressors in the auxiliary regression, excluding the constant.
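A sketch of White's test in Python using statsmodels' het_white, which constructs the auxiliary regression with the levels, squares and cross products automatically (the data below is made up so that the error variance depends on x1):

```python
# Minimal sketch with made-up data: White's test should reject homoscedasticity here.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 300)
x2 = rng.normal(size=300)
u = rng.normal(0, 0.5 + 0.4 * x1)            # hypothetical heteroscedastic errors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
X = sm.add_constant(np.column_stack([x1, x2]))

resid = sm.OLS(y, X).fit().resid             # step 1: residuals from the usual OLS fit
lm_stat, lm_pval, f_stat, f_pval = het_white(resid, X)   # auxiliary regression inside
print(f"LM statistic (T*R^2) = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
print(f"F version            = {f_stat:.2f}, p-value = {f_pval:.4f}")
```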
what happens if the errors are heteroscedastic but we carry on nonetheless?
The estimators are still unbiased, but they are no longer the best (minimum variance) we can get. The usual formulae for the standard errors are no longer appropriate, so inference based on them can be misleading.
Elaborate on dealing with heteroscedasticity
If the form of the heteroscedasticity is known, we can use generalised least squares (GLS).
GLS can be considered weighted least squares. This is because for GLS, we are minimizing the weighted sum of squared residuals, whereas with OLS it is simply an unweighted sum of squared residuals that we are minimizing.
Regarding "the form of the heteroscedasticity": we mean knowing the variance as a function of something. For instance, if the variance of the residuals is sigma^2 times z_t^2, we could remove the heteroscedasticity by dividing every term in the regression by z_t. The resulting least squares problem minimizes a weighted sum of squared residuals.
Another way of dealing with heteroscedasticity is to transform the data into something that makes the variance more constant, for example by taking logs. Logging pulls in extreme values and thus tends to reduce the variance.
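A sketch of the weighted least squares idea, under the assumption that var(u_t) = sigma^2 * z_t^2 for some observable z_t (all data here is made up):

```python
# Minimal sketch, assuming var(u_t) = sigma^2 * z_t^2 with z_t observable.
# Weighting each observation by 1/z_t^2 is equivalent to dividing the whole
# regression equation through by z_t.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300)
z = 0.5 + rng.uniform(0, 2, 300)             # hypothetical variable driving the variance
y = 1.0 + 2.0 * x + rng.normal(0, 1.0 * z)   # errors with sd proportional to z
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                     # unweighted: unbiased but inefficient
wls = sm.WLS(y, X, weights=1.0 / z**2).fit() # weighted: uses the known form of the variance
print("OLS coefficients:", ols.params, "std errors:", ols.bse)
print("WLS coefficients:", wls.params, "std errors:", wls.bse)
```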
what do we typically say if the residuals do not satisfy cov(u_i, u_j) = 0?
They are not uncorrelated, which means that they are autocorrelated.
Another term for autocorrelation is “serially correlated”.
if we use differences in variable values ∆y_t instead of the original values, what is important to remember?
We lose the first observation. The first difference uses the first and second values, and the final difference uses the last and the second-to-last values. The result is that our sequence of values shrinks by one, through the removal of the first observation.
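A tiny illustration (not from the book) of losing one observation when differencing:

```python
# First-differencing a series of T values leaves T - 1 differences.
import numpy as np

y = np.array([10.0, 12.0, 11.0, 15.0])
dy = np.diff(y)          # [y_2 - y_1, y_3 - y_2, y_4 - y_3]
print(len(y), len(dy))   # 4 3
```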
what is the first thing we can do to check the validity of the cov(u_i, u_j) = 0 assumption?
We fit the model and obtain the residuals. Then we plot the residuals against their own lagged values.
When we plot the residuals’ lag components against each other, what do we wish to see?
A random scatter plot with no pattern
In practice, do we use the residual plot to make any decisions?
The plot is a first-phase sort of thing. It doesn't really provide any information other than intuition about the data.
The simplest thing we can do that is more professional, is the Durbin-Watson test.
elaborate on the Durbin-Watson test
The Durbin-Watson (DW) test is a test for first-order autocorrelation.
First-order autocorrelation means that the test only considers the relationship between a residual and its immediately preceding (first-lag) residual.
The null hypothesis will be that the autocorrelation between residual at time step t and residual at time step t-1 is 0. The alternative hypothesis is the two-tailed option of lag-1 autocorrelation being not zero.
The test statistic (the DW test statistic) is defined as:
DW = ∑_{t=2}^{T} (u_t - u_{t-1})^2 / ∑_{t=2}^{T} u_t^2
So, DW is the ratio of the sum of squared differences of consecutive residuals to the sum of squared residuals. Although squared (approximately normal) residuals appear in both parts, the numerator and denominator are built from the same residuals and are not independent, so DW does not follow one of the standard distributions.
IMPORTANT: All of the residuals here are the estimates that we get from fitting a model, and computing the residuals.
Recall that the denominator is essentially an estimate of the variance of the residuals (scaled by the sample size), because their mean is expected to be 0.
The numerator keeps track of how strongly consecutive residuals move together.
DW is approximately equal to 2(1 - p), where p is the estimated first-order autocorrelation coefficient from the regression u_t = p*u_{t-1} + v_t.
The DW test is unusual because the statistic is not compared against a standard statistical distribution. Instead, it is compared with tabulated critical values (a lower and an upper bound, with an inconclusive region in between).
Firstly, if we use that DW =2(1-p), we get that if p=1, DW = 0.
if p=0, DW=2
if p=-1, DW = 4
thus, the case of no correlation is in the middle range.
The values DW use as critical values are listed in the book’s appendix.
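A sketch of computing the DW statistic with statsmodels' durbin_watson on made-up data with positively autocorrelated errors (the resulting value is then compared with the tabulated critical values):

```python
# Minimal sketch: AR(1) errors should push DW well below 2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
T = 200
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):                        # u_t = 0.7 * u_{t-1} + v_t
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print("DW =", durbin_watson(resid))          # well below 2 => positive autocorrelation
```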
Elaborate on Breusch-Godfrey test
The Breusch-Godfrey test is a more general test for autocorrelation up to the r'th order (lag).
We pick an order, and use the following model as the baseline for the test:
u_t = p_1 u_{t-1} + p_2 u_{t-2} + ... + p_r u_{t-r} + v_t, where v_t is normally distributed N(0, sigma_v^2).
The null hypothesis is that all of the correlations (the p’s) are 0. Thus, this test is trying to answer the question “Is there any autocorrelation among the first ‘r’ lags?”.
The alternative hypothesis is that at least one of the p's is non-zero (a logical OR of the statements p_x != 0).
NOTE: When performing the test below, we include the original X's as regressors as well, because this keeps the test valid even if the regressors are not strictly exogenous. Thus, this test ONLY answers the question: is there autocorrelation?
This means that we obviously need to check for correlation between residuals and explanatory variables as well later.
The test proceeds as follows:
Step 1)
Perform the regular regression to obtain the residuals.
Step 2)
Perform a new (auxiliary) regression where the dependent variable is u_t and the regressors are the original x's from step 1 together with the lagged residuals u_{t-1}, ..., u_{t-r} from the model above.
Then we obtain R^2 from this new modified regression.
Step 3)
If T is the number of observations, the test statistic is given by:
(T - r)R^2, which is chi-squared distributed with r degrees of freedom. We can then carry out a simple chi-squared test.
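A sketch using statsmodels' acorr_breusch_godfrey on made-up data with AR(1) errors. It runs the auxiliary regression with the lagged residuals and the original regressors and reports both a chi-squared and an F version (its exact small-sample scaling of the chi-squared statistic may differ slightly from the (T - r)R^2 formula above):

```python
# Minimal sketch with made-up autocorrelated errors; r = 4 lags is arbitrary.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
T = 200
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()     # autocorrelated errors
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()    # step 1: the regular regression
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"chi-squared version: {lm_stat:.2f} (p = {lm_pval:.4f})")
print(f"F version:           {f_stat:.2f} (p = {f_pval:.4f})")
```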