Chapter 5 - CLRM assumptions and testing Flashcards

1
Q

Recall the assumptions for classical linear regression model CLRM

A

1) The residuals have expected value zero: E[u_t] = 0
2) The residuals have constant variance: var(u_t) = sigma^2 < infinity (homoscedasticity)
3) The residuals are uncorrelated with one another: cov(u_i, u_j) = 0 for i != j (no autocorrelation)
4) The residuals are uncorrelated with the explanatory variables: cov(u_t, x_t) = 0
5) The residuals are normally distributed: u_t ~ N(0, sigma^2)

2
Q

Considering the assumptions, what do we want to understand about them?

A

We need to detect violations of the assumptions.

We need to understand what the consequences of a violation are for the CLRM estimates and the inference based on them

We need to know some classic cases that violate the assumptions

3
Q

What are the main outcomes of using a model even though the assumptions are violated?

A

1) The coefficient estimates can be wrong (biased)
2) The standard errors can be wrong
3) The distributions assumed for the test statistics are inappropriate

In short, both the estimates and any inference drawn from them can be unreliable.

4
Q

what do we call testing that concerns itself with checking the validity of the assumptions?

A

diagnostic test

5
Q

What alternative approaches do we have for carrying out such tests?

A

1) Lagrange multiplier (LM)
2) Wald

6
Q

briefly elaborate on LM testing

A

in this context, the LM test statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions placed on the model

7
Q

elaborate briefly on Wald testing

A

The Wald version follows an F distribution with (m, T - k) degrees of freedom, where m is the number of restrictions, T the number of observations and k the number of regressors.

8
Q

what can we say about LM and Wald?

A

Asymptotically they are equivalent, but their results can differ in small samples.

9
Q

elaborate on the first assumption

A

E[u_t] = 0

This will never be violated as long as we include a constant term (intercept) in the regression.

However, if we force the line to go through the origin, we can get badly misleading results.

Basically, the slope has to compensate for the poor position of the line: it can end up nowhere near the true slope, but it is placed like that because it minimises the residual sum of squares given the missing intercept.

The fitted regression line can then end up being a worse fit than a simple horizontal line at the sample mean of y, which shows up as a negative R^2.

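Below is a minimal numeric sketch of this point in Python (the data-generating process, seed and variable names are illustrative assumptions, not from the card): when the true intercept is far from zero and we force the line through the origin, the fitted line can explain less than the sample mean does, so R^2 goes negative.

```python
# Sketch: regression forced through the origin fitting worse than the mean of y.
import numpy as np

rng = np.random.default_rng(8)
T = 100
x = rng.uniform(0, 5, T)
y = 10.0 - 2.0 * x + rng.normal(scale=0.5, size=T)   # true intercept (10) far from 0

b = (x @ y) / (x @ x)                 # OLS slope with no intercept: b = sum(x*y) / sum(x^2)
rss = np.sum((y - b * x) ** 2)        # residual sum of squares of the origin-constrained fit
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares around the sample mean
print(f"slope = {b:.2f}, R^2 = {1 - rss / tss:.2f}")  # R^2 comes out negative
```
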
10
Q

elaborate on detecting heteroscedasticity

A

It is difficult to use graphical methods, because one rarely knows the form the heteroscedasticity takes.

However, there are statistical tests.

Goldfeld-Quandt: split the total sample into two subsamples. The regression model is estimated separately on each subsample, and the residual variance s^2 of each is computed using the usual formula for the sample variance of a regression. The null hypothesis is that the two variances are equal.
The test statistic is the ratio of the two sample variances (conventionally the larger over the smaller). It is F-distributed.

The weakness of Goldfeld-Quandt is that it requires a sensible choice of split point. Typically the split is placed at a known structural event.
One can also remove a central portion of the observations around the split point to make the contrast between the two subsamples more evident.

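A minimal sketch of the Goldfeld-Quandt mechanics in Python (the simulated data, the even split and the variable names are illustrative assumptions, not part of the card):

```python
# Sketch: Goldfeld-Quandt test computed by hand with numpy/scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T, k = 200, 2                        # observations and regressors (incl. intercept)
x = np.sort(rng.uniform(0, 10, T))
X = np.column_stack([np.ones(T), x])
u = rng.normal(scale=0.5 + 0.3 * x)  # error variance grows with x -> heteroscedastic
y = X @ np.array([1.0, 2.0]) + u

def resid_variance(y_sub, X_sub):
    """s^2 = RSS / (T - k) from an OLS fit on a subsample."""
    b, *_ = np.linalg.lstsq(X_sub, y_sub, rcond=None)
    resid = y_sub - X_sub @ b
    return resid @ resid / (len(y_sub) - X_sub.shape[1])

half = T // 2                         # split point; in practice chosen from a structural event
s2_1 = resid_variance(y[:half], X[:half])
s2_2 = resid_variance(y[half:], X[half:])

GQ = max(s2_1, s2_2) / min(s2_1, s2_2)        # ratio of residual variances, larger on top
df = half - k
p_value = 1 - stats.f.cdf(GQ, df, df)
print(f"GQ = {GQ:.2f}, p = {p_value:.4f}")    # small p -> reject equal variances
```
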
10
Q

what do we mean by heteroscedasticity?

A

Any situation in which the variance of the residuals is not constant. For instance, if the residuals grow in magnitude as one of the explanatory variables increases, the average error can still be zero while the error variance differs systematically with the value of that variable.

10
Q

what term is used to describe assumption 2?

A

Homoscedasticity

11
Q

elaborate on White’s test

A

White's test is a test for heteroskedasticity.

It is useful because it makes no assumptions about the form the heteroskedasticity takes.

Assume we have a regular linear regression.
We want to test whether var(u_t) = sigma^2 is constant.

Estimate the model. Get the residuals.

Then we create an auxiliary regression in which the squared residuals are the dependent variable. As independent variables we include the original regressors, their squares, their cross products, etc. The goal is to see whether movements in the (squared) residuals can be explained by the explanatory variables.

We could then use the F-test approach, but this requires estimating additional regressions. The LM approach is typically easier.

The LM approach to White's test exploits the fact that if one or more of the coefficients in the auxiliary regression are statistically significant, this shows up in that regression's R^2: it will be larger than the R^2 we would get if no coefficient were significant.

We obtain R^2 from the auxiliary regression and multiply it by the number of observations T. The statistic T*R^2 is chi-squared distributed with m degrees of freedom under the null that all the auxiliary slope coefficients are jointly zero.

So we want to see a low value of the statistic, because that indicates there is no evidence that the auxiliary R^2 is large, i.e. no evidence of heteroskedasticity.

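A minimal sketch of the LM (T*R^2) version of White's test (the data-generating process and variable names below are illustrative assumptions, not part of the card):

```python
# Sketch of White's test via the LM route, using numpy/scipy only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T = 300
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=1 + np.abs(x1), size=T)

# Step 1: original regression, keep the residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b

# Step 2: auxiliary regression of u^2 on levels, squares and cross products.
Z = np.column_stack([np.ones(T), x1, x2, x1**2, x2**2, x1 * x2])
g, *_ = np.linalg.lstsq(Z, u**2, rcond=None)
fitted = Z @ g
R2 = 1 - np.sum((u**2 - fitted) ** 2) / np.sum((u**2 - np.mean(u**2)) ** 2)

# Step 3: LM statistic T*R^2 ~ chi^2(m), m = auxiliary regressors excluding the constant.
m = Z.shape[1] - 1
LM = T * R2
p_value = 1 - stats.chi2.cdf(LM, m)
print(f"LM = {LM:.2f}, p = {p_value:.4f}")   # small p -> evidence of heteroskedasticity
```
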
12
Q

What happens to OLS if there is heteroskedasticity present?

A

The OLS estimators remain unbiased, but they are no longer efficient (no longer BLUE), and the usual standard errors are wrong, so inference can be misleading.

13
Q

what is assumption 3?

A

Assume no autocorrelation between residuals

14
Q

what tests do we have for autocorrelation in CLRM?

A

1) Durbin-Watson
2) Breusch-Godfrey

15
Q

elaborate on Durbin Watson

A

DW is a test for first-order autocorrelation only, i.e. it tests only the first lag.

The idea of Durbin-Watson is to use the residuals to form an auxiliary regression and check whether its coefficient is statistically significant. The null hypothesis is that the coefficient is 0.

u_t = rho * u_(t-1) + v_t

We are only testing whether rho = 0.

In practice we do not actually have to run this regression; the statistic can be computed directly from the residuals of the original regression, since DW = sum (u_t - u_(t-1))^2 / sum u_t^2, which is approximately 2(1 - rho_hat).

DW does not follow a standard statistical distribution. Instead, the decision is based on regions defined by upper and lower critical values.

DW only tests whether consecutive errors are related, which is rather limited.

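A minimal sketch of the DW statistic computed directly from a residual series (the AR(1)-style simulated residuals below are an illustrative assumption):

```python
# Sketch: Durbin-Watson statistic from residuals.
import numpy as np

rng = np.random.default_rng(2)
T = 250
u = np.zeros(T)
for t in range(1, T):                 # residuals with positive first-order autocorrelation
    u[t] = 0.7 * u[t - 1] + rng.normal()

# DW = sum (u_t - u_{t-1})^2 / sum u_t^2, roughly 2 * (1 - rho_hat)
dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)
print(f"DW = {dw:.2f}")               # ~2 -> no autocorrelation, ~0 -> positive, ~4 -> negative
# statsmodels.stats.stattools.durbin_watson(u) should give the same number.
```
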
16
Q

DW is a test to see whether consecutive errors are related. Can we do better?

A

Yes, with Breusch-Godfrey.

In theory we could also use the DW idea but replace the single lag with lags of many orders; this is not practical, however.

Breusch-Godfrey is a joint test for autocorrelation at several lags at once. The steps are as follows (a sketch in code follows the list):

1) Estimate the model by OLS as usual and obtain the residuals.
2) Use the residuals to build an auxiliary regression. The dependent variable is the residual, and the explanatory variables are the lagged residuals, plus the intercept and the original explanatory variables from the regular regression; including the original regressors removes any dependence of the residuals on them.
3) Obtain R^2 from this auxiliary regression.
4) The statistic (T - r)*R^2, where T is the number of observations and r the order of lags tested, is chi-squared distributed with r degrees of freedom under the null of no autocorrelation.

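A minimal sketch of the Breusch-Godfrey steps above (the data, the lag order r = 3 and all variable names are illustrative assumptions):

```python
# Sketch: Breusch-Godfrey test with r lags, using numpy/scipy only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, r = 300, 3
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()          # autocorrelated errors
y = 1.0 + 2.0 * x + u

# Step 1: original OLS, keep residuals.
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ b

# Step 2: auxiliary regression of u_t on intercept, original regressors and r lagged residuals.
lags = np.column_stack([uhat[r - j - 1:T - j - 1] for j in range(r)])  # uhat_{t-1} .. uhat_{t-r}
Z = np.column_stack([X[r:], lags])
v = uhat[r:]
g, *_ = np.linalg.lstsq(Z, v, rcond=None)
R2 = 1 - np.sum((v - Z @ g) ** 2) / np.sum((v - v.mean()) ** 2)

# Step 3: (T - r) * R^2 ~ chi^2(r) under the null of no autocorrelation.
BG = (T - r) * R2
p_value = 1 - stats.chi2.cdf(BG, r)
print(f"BG = {BG:.2f}, p = {p_value:.4f}")
```
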
17
Q

why do we multiply R^2 by (T - r) in Breusch-Godfrey, and not just by T as in White's test for heteroskedasticity?

A

Because when we construct lags up to order r, the first r observations are lost, so only T - r observations are actually available for the auxiliary regression.

18
Q

how do we deal with testing for normality of residuals?

A

The Bera-Jarque (BJ) test.

19
Q

elaborate on the Bera-Jarque test

A

It exploits the fact that the normal distribution is fully characterised by its first two moments: its skewness is always 0 and its kurtosis is always 3.

We define excess kurtosis as kurtosis minus 3. The test then tests the joint hypothesis that the skewness and the excess kurtosis of the residuals are both 0. If we cannot reject this, normality is assumed. An extreme value of the test statistic indicates that the observed skewness and kurtosis would be highly unlikely if the residuals really were normally distributed, which we take as evidence against normality.

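A minimal sketch of the Bera-Jarque statistic (the fat-tailed simulated residuals are an illustrative assumption):

```python
# Sketch: Bera-Jarque statistic computed from a residual series.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
u = rng.standard_t(df=4, size=500)        # fat-tailed residuals -> excess kurtosis

T = len(u)
S = stats.skew(u)                         # sample skewness
K = stats.kurtosis(u, fisher=False)       # sample kurtosis (normal value is 3)

# BJ = T/6 * (S^2 + (K - 3)^2 / 4), chi-squared with 2 degrees of freedom under normality.
BJ = T / 6 * (S**2 + (K - 3) ** 2 / 4)
p_value = 1 - stats.chi2.cdf(BJ, 2)
print(f"BJ = {BJ:.2f}, p = {p_value:.4f}")   # small p -> reject normality
# scipy.stats.jarque_bera(u) computes an equivalent statistic.
```
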
20
Q

what happens if we find that the residuals are not normally distributed?

A

It is not straightforward to know what to do.

If the sample is very large, non-normality matters little and we can use the model without much worry.

Sometimes, taking logs of the variables can help.

Other times, a few extreme outliers that are not really representative of the general pattern will cause the test to reject; removing these outliers can help the model.

In general, removal of outliers is dangerous. Perhaps the observation is not actually an outlier; we may simply lack data in that region. We usually say that removal is only justifiable if we have some evidence suggesting that the event was a genuine one-off.

21
Q

how can we remove outliers?

A

We can effectively remove an outlier by adding a binary (dummy) variable that equals 1 only for the outlying observation and 0 otherwise; this forces that observation's residual to zero.

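A minimal sketch of the dummy-variable trick (the data set and the position of the outlier are illustrative assumptions):

```python
# Sketch: "removing" an outlier with an observation-specific dummy variable.
import numpy as np

rng = np.random.default_rng(5)
T = 100
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=T)
y[40] += 15.0                                # inject one large outlier

dummy = np.zeros(T)
dummy[40] = 1.0                              # 1 only for the outlying observation

X = np.column_stack([np.ones(T), x, dummy])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
print("coefficients:", np.round(b, 3))                # dummy coefficient absorbs the outlier
print("residual at outlier:", round(resid[40], 10))   # forced to (numerically) zero
```
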
22
Q

elaborate on multicollinearity

A

It concerns an implicit assumption made when using OLS:

the explanatory variables are assumed not to be correlated with one another.

Multicollinearity refers to cases where there is a high degree of correlation between the explanatory variables.

23
Q

how do we know if variables are truly orthogonal to each other?

A

Run the regression, then remove a variable and run it again. If the remaining coefficients do not change from their first estimates, this indicates orthogonality.

24
Q

what happens when there is perfect multicollinearity?

A

1) There is an exact linear relationship between two or more variables.
2) The model cannot be estimated, because the X matrix is linearly dependent and (X'X) cannot be inverted. We would only have enough information to estimate as many parameters as there are linearly independent columns of X, not all of them.

25
Q

how do we measure/test for multicollinearity?

A

It is difficult. In practice we mainly investigate by inspecting the correlation matrix of the explanatory variables; high values indicate that multicollinearity may be present.

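A minimal sketch of that inspection (the simulated regressors and their names are illustrative assumptions):

```python
# Sketch: checking the correlation matrix of the regressors for multicollinearity.
import numpy as np

rng = np.random.default_rng(6)
T = 500
x1 = rng.normal(size=T)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=T)   # nearly collinear with x1
x3 = rng.normal(size=T)

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)          # pairwise correlations of the regressors
print(np.round(corr, 2))                     # an off-diagonal value close to 1 is a warning sign
```
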
26
Q

what happens if we have multicollinearity present but ignore it?

A

We might get a good R^2 value, but the standard errors will be high as well. This means the individual coefficients are likely not statistically significant. It arises from the difficulty of observing each variable's independent contribution.

27
Q

how do we deal with multicollinearity?

A

Transform the data into a better set of variables, for example by using principal component analysis (PCA).

28
Q

what does it mean when we say that "under these circumstances, the estimate of the coefficient will be biased"?

A

It simply means that the expected value of the estimator is not equal to the true value. So, in the context of linear regression coefficients, if the estimates are biased we are systematically missing the true values.

29
Q

what happens if we exclude an important variable from our model?

A

There are two ways this can go.
1) The excluded variable is correlated with some of the variables that remain. The coefficients of those remaining variables will then be biased, because they have to account for the missing variable's effect.
2) If the excluded variable is completely uncorrelated with the included ones, then only the constant term (the intercept) will be biased.

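A minimal sketch of case 1, omitted-variable bias (the data-generating process is an illustrative assumption): dropping x2, which is correlated with x1, pushes the bias into x1's coefficient.

```python
# Sketch: omitted-variable bias when the dropped regressor is correlated with a kept one.
import numpy as np

rng = np.random.default_rng(9)
T = 5000
x1 = rng.normal(size=T)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=T)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=T)

def ols(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

full = ols(y, np.column_stack([np.ones(T), x1, x2]))
short = ols(y, np.column_stack([np.ones(T), x1]))    # x2 omitted
print("full model coefficients :", np.round(full, 2))   # close to [1, 2, 3]
print("omitting x2             :", np.round(short, 2))  # x1 coefficient biased upward
```
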
30
Q

recall the formula for sample variance of regression

A

s^2 = RSS / (T - k)

31
Q

recall the formula for standard error

A

The variance of a coefficient estimator is given by var(b) = s^2 (X'X)^(-1). Note that this gives a full matrix; the variances of the individual coefficients are along the main diagonal, and their square roots are the standard errors.

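A minimal sketch of these two formulas in numpy (the simulated data set is an illustrative assumption):

```python
# Sketch: coefficient standard errors from var(b) = s^2 (X'X)^(-1).
import numpy as np

rng = np.random.default_rng(7)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=T)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (T - k)                 # sample variance of the regression
cov_b = s2 * np.linalg.inv(X.T @ X)          # full covariance matrix of the estimator
std_err = np.sqrt(np.diag(cov_b))            # standard errors on the main diagonal
print(np.round(b, 3), np.round(std_err, 3))
```
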
32
Q

what is important about Hendry's strategy?

A

The model must be consistent both with the data and with the theory. The specific-to-general approach does not really use theory in the correct way, and represents more of a "try things out" strategy.

33
Q

what is the encompassing principle?

A

A model should be able to explain everything that a rival model can. As a result, the chosen model should not be a subset of a better model.

34
Q

why general-to-specific?

A

I believe it is mostly about grounding the model in valid theory: include all the variables we believe are important, test for statistical validity, and then impose restrictions guided by the encompassing principle. The other approach is more path dependent and naturally inclined to test many more models, which risks falling into the trap where statistical adequacy is passed by chance.

35
Q

elaborate on the easiest testing for heteroskedasticity

A

The classic Goldfeld-Quandt test.

We split the sample into two subsamples based on a specific ordering of the data, typically time or the value of some other variable. The regression is then estimated separately on each subsample, and we find the residuals of both. The residuals are used to compute the residual variance for each subsample, which is the regression variance:

s^2 = u'u / (T - k)

Since the residuals are assumed normally distributed, the scaled residual sum of squares is chi-squared distributed; the degrees of freedom are T - k rather than T because k parameters have been estimated. We then take the ratio of the two regression sample variances to get an F-distributed statistic:

GQ = s_1^2 / s_2^2

The null is that the two variances are equal.

NOTE: it requires a good understanding of where to place the break point between the two subsamples.

36
Q

Elaborate on the second way to test for heteroskedasticity

A

White's test. It is sometimes considered better than Goldfeld-Quandt because it does not rely on break-point information.

Run the regular regression and find the residuals. Use the squared residuals as the dependent variable in a new regression, where the independent variables are the original regressors together with their squares and cross products. The goal is to capture relationships between the residuals' magnitude and the explanatory variables.

NB: the new regression predicts squared residuals, not the residuals themselves. There is a crucial reason for this: what we really want to predict with the new regression is the variance of the residuals. We want to see whether the variance of the residuals can be explained to any degree by the explanatory variables, their squares and cross products. Using the assumption that the expected value of the residual is 0, the variance formula reduces to var(u_t) = E[u_t^2] - (E[u_t])^2 = E[u_t^2], so the squared residual stands in for the variance.

We then have two options:
1) F-test (Wald)
2) Chi-squared (LM)

If we use the F-test, we need two regressions. One is the auxiliary regression just described for the variance of the residuals; the other is a regression containing only the constant (intercept) term. The idea is that if there is no heteroskedastic pattern, only the constant term provides any explanatory power and there is no real difference between the two models. We therefore test them against each other, with the constant-only model as the restricted model, making use of the fact that the residual sum of squares is chi-squared distributed with T - k degrees of freedom.

It is often easier to use the chi-squared (LM) variant. This involves taking R^2 from the auxiliary regression, multiplying it by the number of observations, and using the fact that T*R^2 is chi-squared distributed with m degrees of freedom, where m is the number of regressors in the auxiliary regression. The point is that R^2 should be low, because the auxiliary regression should not be able to explain anything beyond the average level.

37
Q

what happens if we ignore heteroskedasticity?

A

The OLS estimators are still unbiased, but they are no longer efficient.

38
Q

what does DW test for?

A

First-order autocorrelation in the residuals.

39
Q

elaborate on DW

A

The idea is to test for first-order autocorrelation in the residuals.

Consider a case where each residual is either +1 or -1. If the pattern is literally 1, -1, 1, -1, ..., then the squared difference between each residual and its first lag is 4, 4, 4, ... If the pattern is instead 1, 1, 1, 1, -1, -1, -1, the squared differences are 0, 0, 0, 4, 0, 0, ... And if we remove the hard break and say that each residual is very close to the previous one, the squared differences are all very small.

So there are basically two cases: 1) the next residual is repeatedly far from the current one (negative autocorrelation), and 2) the next residual is always close to the current one (positive autocorrelation). Both represent high autocorrelation; the no-autocorrelation scenario looks more like white noise.

The DW test statistic is

DW = ∑(u_t - u_(t-1))^2 / ∑ u_t^2

DW does not follow a standard distribution; instead we look up critical values based on the number of observations and the number of regressors.

NB: if DW lies within [d_U, 4 - d_U] we do not reject the null of no autocorrelation; if DW < d_L or DW > 4 - d_L we reject; values in the remaining regions are inconclusive.

40
Q

elaborate on BG

A

Breusch-Godfrey: the joint test for autocorrelation at several lags described in the earlier card.

41
Q

elaborate on what happens if we ignore autocorrelation

A

If autocorrelation is present and we ignore it, our estimators are still unbiased. However, as with heteroskedasticity, they are no longer efficient (no longer the best available linear estimators).

42
Q

elaborate on OLS and violating assumptions in general

A

The idea is that OLS will always produce a model, but not necessarily a good model. Under a certain set of assumptions, the OLS estimator has certain properties, which we remember with the acronym BLUE: Best Linear Unbiased Estimator, where "best" means efficient (minimum variance among linear unbiased estimators). All this means is that if the data satisfy the assumptions, then OLS gives us an estimator that is BLUE. This says nothing about performance; the model could still be poor. In a white-noise case the model would be poor, but we would not be able to find a linear estimator that is better.

When assumptions are violated, certain things happen. With autocorrelation, for instance, there are structures in the data that a linear estimator might not be able to capture, so OLS can no longer claim to be "best". In the specific autocorrelation case of two linear segments, I suppose we could, given that we know this structure, define a dummy variable that effectively adds a second intercept. However, OLS is about not knowing anything about the data while still providing certain guarantees; by adding the dummy we would (I believe) recover the BLUE case, but only by altering the specification, so it is not really the same setting.

Regarding the assumption of constant variance: if the variance increases with larger values of one explanatory variable, we can get a pattern that looks like Brownian motion with drift. The OLS line would still capture the average trend and remain unbiased, but it would not capture the pattern in the best way, because there is structure in the variance which, if exploited, would give better estimates.

43
Q

elaborate on multicollinearity

A

An implicit assumption made when using OLS is that the explanatory variables are uncorrelated with one another. In practice there is usually some degree of correlation, but this is not really an issue unless the correlation is large.

If we have perfect multicollinearity, the system is simply not solvable: we cannot invert the (X'X) matrix because it does not have full rank, as the columns are not all linearly independent of each other.

If the problem is severe but not perfect, we speak of "near multicollinearity". This is a big problem.

44
Q

reflect on X'X

A

We have a data matrix X. Premultiplying X by its own transpose gives X'X, whose (i, j) element is the inner product of columns i and j of X. If the columns of X are mean-centred (zero mean for each column), then multiplying X'X by 1/n gives the sample covariance matrix of the regressors.

45
Q

what is the main issue of multicollinearity?

A

Large standard errors. These can make it difficult to perform meaningful tests on the individual coefficients. They arise from the difficulty of attributing the contribution to the dependent variable to each individual (correlated) variable.

46