Chapter 5 - Classical Linear Regression Assumptions and Diagnostic Tests Flashcards
(75 cards)
Recall the assumptions we make when doing the classical linear regression model
E[u_t] = 0 (residuals have expected value 0)
var(u_t) = sigma^2 < infinity
cov(u_i, u_j) = 0 for i != j (residuals are uncorrelated with one another)
cov(u_t, x_t) = 0 (residuals are uncorrelated with the explanatory variables)
u_t is normally distributed N(0, sigma^2)
why do we need those assumptions?
They serve 2 primary purposes:
1) They are needed in order to show that the OLS estimator has a number of desirable properties.
2) They make hypothesis testing about the estimated coefficients possible.
what sort of questions are we interested in in this chapter?
How can we detect violations of the CLRM assumptions?
What are the most likely causes of violations in practice?
What happens if we choose to ignore a certain assumption, and continue with the model nonetheless?
name the two test statistic approaches that we use
The Lagrange multiplier (LM) test
The Wald test
what do we need to know regarding the LM test and the Wald test?
The LM test statistic in the context of diagnostic tests follows a chi-squared distribution with m degrees of freedom, where m is the number of constraints/restrictions placed on the model.
The Wald version of the test follows an F-distribution with (m, T-k) degrees of freedom.
What can we say about comparisons of LM and Wald tests
Asymptotically, their behaviour is the same, but in small samples they can give different results.
This follows from how the F-distribution is related to the chi-squared: an F(m, T-k) variable is the ratio of a chi-squared(m) variable divided by m to an independent chi-squared(T-k) variable divided by T-k. As the number of observations T increases, the denominator converges to 1, so m times the F statistic converges to a chi-squared variable with m degrees of freedom.
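A quick numerical illustration of this (my own sketch, not from the book): compare the 5% chi-squared critical value with m times the F critical value as the second degrees-of-freedom parameter grows; m = 3 and the T-k values are arbitrary choices.

```python
# Sketch: m * F critical value with (m, T-k) df approaches the chi-squared(m)
# critical value as T - k grows. m and the T-k values below are arbitrary.
from scipy.stats import chi2, f

m = 3
print("chi-squared 5% critical value:", round(chi2.ppf(0.95, m), 3))
for dfd in (10, 50, 500, 5000):          # stand-ins for T - k
    print(f"T-k = {dfd:5d}: m * F critical value = {m * f.ppf(0.95, m, dfd):.3f}")
```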
what actually is a diagnostic test?
a test concerning the validity of a model
elaborate on the first assumption of CLRM
E[u_t] = 0
We assume that in our model, the average error is 0.
Is this reasonable?
For OLS it is, provided we include a constant (intercept) term in the regression. Whatever the slope estimates are, the constant can always adjust so that the average residual is exactly zero.
Problems can occur if we force the regression line through the origin, that is, if we drop the intercept even though the intercept that would make the mean error zero is not actually zero.
If we remove the intercept term from our model for some reason, these problematic cases can arise.
Perhaps the biggest issue is that the sample mean of the dependent variable may then explain the variation about the mean better than the fitted regression line does. This can make R^2 negative, which gives us undesirable results.
elaborate on the problematic cases that may occur as a result of omitting the intercept term in the CLRM
First of all, recall that R^2 can be defined as ESS/TSS, the explained sum of squares divided by the total sum of squares. ESS is the variation around the sample mean that the model captures, while TSS is the total variation around the mean, including what the model does not capture. The difference, TSS - ESS, is the residual sum of squares (the unexplained part).
If we remove the intercept term from the regression, R^2 can become negative. The interpretation is that the sample average explains more of the variation in the dependent variable than the explanatory variables do.
An even worse consequence of omitting the intercept is that the slope estimates can be severely biased. The true relationship may be perfectly linear but offset from the origin in y; without an intercept we cannot model this correctly.
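A small sketch of the negative-R^2 point (all data is made up; R^2 is computed directly as 1 - RSS/TSS with TSS measured around the sample mean):

```python
# Minimal sketch: fitting a line with and without an intercept on data whose
# true intercept is far from zero. Without the intercept, R^2 goes negative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 50 - 0.5 * x + rng.normal(0, 1, 200)    # hypothetical DGP with a large intercept

def r_squared(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

# With intercept: least squares on [1, x]
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print("with intercept:   ", r_squared(y, X @ b))

# Without intercept: regression forced through the origin
b0 = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0]
print("without intercept:", r_squared(y, x * b0[0]))
```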
elaborate on homoscedasticity
constant variance assumption (error terms have constant variance)
what do we say if the second assumption (constant variance of residuals) is violated?
We say that the errors are heteroscedastic: their variance is not constant.
how can we test for heteroscedasticity?
The simplest method is the Goldfeld-Quandt test.
Roughly speaking, the Goldfeld-Quandt test splits the sample of T observations into two subsamples and computes the residual variance of each. Under the null, each (scaled) variance estimate follows a chi-squared distribution.
Therefore, we can use the F-distribution to test whether the two variances are equal. The null hypothesis is that the variance is the same in both subsamples.
As with the regular F-test, this test checks the ratio of the two variances: if they are actually the same, the ratio should be close to one and should not fall in the rejection region.
is the Goldfeld-Quandt test good?
It is decent, but it has weaknesses. Most of the weaknesses are related to the choice of where to split the sample.
Things can be done about this. For instance, the observations can be ordered not by time but by the values of a third variable that is thought to affect the variance, before splitting.
A second approach is to omit a number of observations in the centre of the original sample. This creates a clearer separation between the two subsamples, which can be beneficial.
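A sketch of how the test might be run in Python with statsmodels' het_goldfeldquandt (the data here is invented, and the drop argument omits the middle 20% of observations, as described above):

```python
# Minimal sketch with made-up data where the error variance grows with x.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
u = rng.normal(0, 0.5 + 0.3 * x)           # hypothetical heteroscedastic errors
y = 1.0 + 2.0 * x + u
X = sm.add_constant(x)

# Splits the sample, fits OLS on each part and compares the residual variances.
f_stat, p_value, _ = het_goldfeldquandt(y, X, drop=0.2)   # drop middle 20%
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
```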
Do we have alternatives to the Goldfeld-Quandt test for testing heteroscedasticity?
“White’s test”
elaborate on “White’s test”
We are testing for heteroscedasticity. Therefore, we want to see whether the variance of the residuals can be explained by the variables we use as explanatory variables, which would mean it varies with them rather than being constant.
We first obtain the residuals from regular regression.
Then we want to see whether the squared residuals (from our sample, essentially) can be described, beyond a constant mean, by our explanatory variables, the squares of our explanatory variables, and the cross products of the explanatory variables. Each of these terms has a parameter coefficient, and those coefficients are what White's test is really interested in.
To understand why the squared residuals (obtained from the original regression) are used as the dependent variable in this auxiliary regression, consider the formula for the variance of the residuals:
var(u_t) = E[(u_t - E[u_t])^2]
var(u_t) = E[(u_t - 0)^2]
var(u_t) = E[u_t^2]
So the variance of u_t is simply the expected value of u_t^2. A regression models a conditional expectation, which is why the squared residuals serve as the dependent variable in the auxiliary regression.
After we perform the auxiliary regression, we have a choice of testing method. We could use the regular F-test framework, where we would compare the RSS from the restricted and unrestricted models. The restricted model would be the auxiliary regression with only a constant as regressor (no variables), while the full auxiliary regression is the unrestricted model.
However, it is perhaps easier to use the Lagrange multiplier approach, which centers on the value of R^2. Recall that R^2 is explained variation divided by total variation. The idea is that if the auxiliary regression gets a high R^2, the variables are good predictors of the squared residuals. This is "bad" because it indicates that the variance is not constant but depends on the values of the variables. So we obtain R^2 from the auxiliary regression and multiply it by T, and it can be shown that T*R^2 follows a chi-squared distribution with m degrees of freedom, where m is the number of regressors in the auxiliary regression, excluding the constant.
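A sketch of White's test in Python using statsmodels' het_white, which constructs the auxiliary regression with the levels, squares and cross products automatically (the data below is made up so that the error variance depends on x1):

```python
# Minimal sketch with made-up data: White's test should reject homoscedasticity here.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 300)
x2 = rng.normal(size=300)
u = rng.normal(0, 0.5 + 0.4 * x1)            # hypothetical heteroscedastic errors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
X = sm.add_constant(np.column_stack([x1, x2]))

resid = sm.OLS(y, X).fit().resid             # step 1: residuals from the usual OLS fit
lm_stat, lm_pval, f_stat, f_pval = het_white(resid, X)   # auxiliary regression inside
print(f"LM statistic (T*R^2) = {lm_stat:.2f}, p-value = {lm_pval:.4f}")
print(f"F version            = {f_stat:.2f}, p-value = {f_pval:.4f}")
```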
what happens if the errors are heteroscedastic but we carry on nonetheless?
The estimators are still unbiased, but they are no longer the best (minimum variance) we can get. The usual formulae for the standard errors are no longer appropriate, so inference based on them can be misleading.
Elaborate on dealing with heteroscedasticity
If the form of the heteroscedasticity is known, we can use generalised least squares (GLS).
GLS can be considered weighted least squares. This is because for GLS, we are minimizing the weighted sum of squared residuals, whereas with OLS it is simply an unweighted sum of squared residuals that we are minimizing.
Regarding "the form of the heteroscedasticity": we mean knowing the variance as a function of something. For instance, if the variance of the residuals is sigma^2 times z_t^2, we could remove the heteroscedasticity by dividing every term in the regression by z_t. The resulting least squares problem minimizes a weighted sum of squared residuals.
Another way of dealing with heteroscedasticity is to transform the data into something that makes the variance more constant, for example by taking logs. Logging pulls in extreme values and thus tends to reduce the variance.
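A sketch of the weighted least squares idea, under the assumption that var(u_t) = sigma^2 * z_t^2 for some observable z_t (all data here is made up):

```python
# Minimal sketch, assuming var(u_t) = sigma^2 * z_t^2 with z_t observable.
# Weighting each observation by 1/z_t^2 is equivalent to dividing the whole
# regression equation through by z_t.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300)
z = 0.5 + rng.uniform(0, 2, 300)             # hypothetical variable driving the variance
y = 1.0 + 2.0 * x + rng.normal(0, 1.0 * z)   # errors with sd proportional to z
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                     # unweighted: unbiased but inefficient
wls = sm.WLS(y, X, weights=1.0 / z**2).fit() # weighted: uses the known form of the variance
print("OLS coefficients:", ols.params, "std errors:", ols.bse)
print("WLS coefficients:", wls.params, "std errors:", wls.bse)
```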
what do we typically say if the residuals do not satisfy cov(u_i, u_j) = 0?
They are not uncorrelated, which means that they are autocorrelated.
Another term for autocorrelation is “serially correlated”.
if we use differences in variable values ∆y_t instead of the original values, what is important to remember?
We lose the first observation. The first difference uses the first and second values, and the final difference uses the last and the second-to-last values. The result is that our sequence of values shrinks by one, through the removal of the first observation.
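A tiny illustration (not from the book) of losing one observation when differencing:

```python
# First-differencing a series of T values leaves T - 1 differences.
import numpy as np

y = np.array([10.0, 12.0, 11.0, 15.0])
dy = np.diff(y)          # [y_2 - y_1, y_3 - y_2, y_4 - y_3]
print(len(y), len(dy))   # 4 3
```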
what is the first thing we can do to check the validity of the cov(u_i, u_j) = 0 assumption?
We fit the model and obtain the residuals. Then we plot the residuals against their own lagged values.
When we plot the residuals’ lag components against each other, what do we wish to see?
A random scatter plot with no pattern
In practice, do we use the residual plot to make any decisions?
The plot is a first-phase sort of thing. It doesn't really provide any information other than intuition about the data.
The simplest thing we can do that is more professional, is the Durbin-Watson test.
elaborate on the Durbin-Watson test
The Durbin-Watson (DW) test is a test for first-order autocorrelation.
First-order autocorrelation means that the test only considers the relationship between a residual and its immediately preceding (first-lag) residual.
The null hypothesis will be that the autocorrelation between residual at time step t and residual at time step t-1 is 0. The alternative hypothesis is the two-tailed option of lag-1 autocorrelation being not zero.
The test statistic (the DW test statistic) is defined as:
DW = ∑_{t=2}^{T} (u_t - u_{t-1})^2 / ∑_{t=2}^{T} u_t^2
So, DW is the ratio of the sum of squared differences of consecutive residuals to the sum of squared residuals. Although squared (approximately normal) residuals appear in both parts, the numerator and denominator are built from the same residuals and are not independent, so DW does not follow one of the standard distributions.
IMPORTANT: All of the residuals here are the estimates that we get from fitting a model, and computing the residuals.
Recall that the denominator is essentially an estimate of the variance of the residuals (scaled by the sample size), because their mean is expected to be 0.
The numerator keeps track of how strongly consecutive residuals move together.
DW is approximately equal to 2(1 - p), where p is the estimated first-order autocorrelation coefficient from the regression u_t = p*u_{t-1} + v_t.
The DW test is unusual because the statistic is not compared against a standard statistical distribution. Instead, it is compared with tabulated critical values (a lower and an upper bound, with an inconclusive region in between).
Firstly, if we use that DW =2(1-p), we get that if p=1, DW = 0.
if p=0, DW=2
if p=-1, DW = 4
thus, the case of no correlation is in the middle range.
The values DW use as critical values are listed in the book’s appendix.
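A sketch of computing the DW statistic with statsmodels' durbin_watson on made-up data with positively autocorrelated errors (the resulting value is then compared with the tabulated critical values):

```python
# Minimal sketch: AR(1) errors should push DW well below 2.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
T = 200
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):                        # u_t = 0.7 * u_{t-1} + v_t
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print("DW =", durbin_watson(resid))          # well below 2 => positive autocorrelation
```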
Elaborate on Breusch-Godfrey test
The Breusch-Godfrey test is a more general test for autocorrelation up to the r'th order (lag).
We pick an order, and use the following model as the baseline for the test:
u_t = p_1 u_{t-1} + p_2 u_{t-2} + ... + p_r u_{t-r} + v_t, where v_t is normally distributed N(0, sigma_v^2).
The null hypothesis is that all of the correlations (the p’s) are 0. Thus, this test is trying to answer the question “Is there any autocorrelation among the first ‘r’ lags?”.
The alternative hypothesis is that at least one of the p's is non-zero (a logical OR of the statements p_x != 0).
NOTE: When performing the test below, we include the original X's as regressors as well, because this keeps the test valid even if the regressors are not strictly exogenous. Thus, this test ONLY answers the question: is there autocorrelation?
This means that we obviously need to check for correlation between residuals and explanatory variables as well later.
The test proceeds as follows:
Step 1)
Perform the regular regression to obtain the residuals.
Step 2)
Perform a new (auxiliary) regression where the dependent variable is u_t and the regressors are the original x's from step 1 together with the lagged residuals u_{t-1}, ..., u_{t-r} from the model above.
Then we obtain R^2 from this new modified regression.
Step 3)
If T is the number of observations, the test statistic is given by:
(T - r)R^2, which is chi-squared distributed with r degrees of freedom. We can then carry out a simple chi-squared test.
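A sketch using statsmodels' acorr_breusch_godfrey on made-up data with AR(1) errors. It runs the auxiliary regression with the lagged residuals and the original regressors and reports both a chi-squared and an F version (its exact small-sample scaling of the chi-squared statistic may differ slightly from the (T - r)R^2 formula above):

```python
# Minimal sketch with made-up autocorrelated errors; r = 4 lags is arbitrary.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(5)
T = 200
x = rng.uniform(0, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()     # autocorrelated errors
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()    # step 1: the regular regression
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(f"chi-squared version: {lm_stat:.2f} (p = {lm_pval:.4f})")
print(f"F version:           {f_stat:.2f} (p = {f_pval:.4f})")
```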