Regression Flashcards
(45 cards)
define regression analysis
regression analysis is the process of describing and evaluating the relationship between a given variable and one or more other variables.
Specifically, we single out one variable and try to understand how it moves as a result of movements in a set of other variables.
elaborate on correlation
correlation is a measure of the tendency of two variables to move together. It does not imply causality. Often there is a third factor that affects both variables, which makes it look like movement in one of them causes the other to move as well.
elaborate on how we treat the variables in regression analysis
the dependent variable is a random variable subject to a probability distribution.
The independent variables are assumed to have “fixed values in repeated samples”.
elaborate on the fixed regressor assumption
Same as “fixed in repeated samples”.
It is primarily a pedagogical simplification.
It assumes that the values of the independent variables are the same across repeated samples. Therefore, there is no uncertainty related to the sampling process.
This means that the only uncertainty comes from the error term, which captures everything we are unable to see or measure.
elaborate on the random disturbance term
The idea is that it is essentially impossible to represent the relationship with an exact straight line: there are very likely variables we have not accounted for, measurement error, and so on.
Since we are trying to model a relationship that is not perfectly linear, we cannot use a perfect line to do it.
Therefore, we add a random disturbance term, u_t, which is specific to each observation. This allows us to keep the exact line a + bx_t while accounting for the differences between the line and the individual points:
y_t = a + bx_t + u_t
how do we determine alpha and beta?
minimizing the vertical distances between each sample point and the exact line.
why minimize vertical distances? Why not horizontal? Why not perpendicular?
we are assuming fixed regressors. Therefore, the task becomes minimizing the vertical distances.
perpendicular comes into play later, when we include error in the sampling process.
elaborate on OLS
minimizing the sum of squared errors. Squaring the errors penalizes outliers harder.
y_t is the observed value.
ŷ_t ("y hat") is the prediction from the line.
The residual, û_t, represents the error: û_t = y_t - ŷ_t
what is RSS
Residual Sum of Squares
∑(y_t - ŷ_t)^2
elaborate on how we derive the OLS estimators
differentiate the expression with respect to the estimators and set the derivatives to zero:
RSS = ∑(y_t - ŷ_t)^2 = ∑(y_t - â - b̂x_t)^2
Recall why this works: the loss function is convex, so the first-order conditions give the global minimum.
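Solving the first-order conditions gives the familiar closed-form estimators b̂ = ∑(x_t - x̄)(y_t - ȳ) / ∑(x_t - x̄)^2 and â = ȳ - b̂x̄. A minimal sketch in Python (numpy; the data points are made up for illustration):

```python
import numpy as np

# Illustrative sample; x is treated as fixed, y is observed
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimators obtained by minimizing RSS
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Fitted values and the residual sum of squares
y_hat = a_hat + b_hat * x
rss = np.sum((y - y_hat) ** 2)
print(a_hat, b_hat, rss)
```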
elaborate on the reliability of the intercept
In many cases, all the observed data points are far away from the intercept. This means that we have no data close to the intercept, so we cannot trust predictions made there.
This generalizes to any region along the fitted line where data are missing. We should be aware of the range of our data points, thereby defining a sort of "operating range" of values within which we can be fairly confident.
elaborate on the PRF
PRF = Population Regression Function
It is the model that we "consider" to be the true data generating process.
y_t = a + bx_t + u_t
the PRF represents the true relationship between the independent variables and the dependent variable.
elaborate on the SRF
Sample regression function.
It is the estimated population regression function.
the SRF has no error term.
ŷ_t = â + b̂x_t
elaborate on linearity we require
linearity in parameters (not necessarily variables).
This is the requirement to use OLS.
estimator vs estimate
an estimator is a function (a formula applied to the sample).
An estimate is the output of an estimator for a particular sample.
elaborate on CLRM
Classical Linear Regression Model.
CLRM is the classical linear model y_t = a + bx_t + u_t.
y_t depends on u_t; therefore, we must specify some assumptions on how this random disturbance term is generated:
1) E(u_t) = 0 (zero mean)
2) var(u_t) = sigma^2 < infinity (constant, finite variance)
3) cov(u_i, u_j) = 0 for i ≠ j (errors independent of each other)
4) cov(u_t, x_t) = 0 (errors independent of the regressors)
5) u_t ~ N(0, sigma^2) (normally distributed)
what happens if assumptions 1-4 hold?
BLUE
the estimators will have desirable properties:
1) Best: lowest variance among the class of linear unbiased estimators
2) Linear
3) Unbiased
4) Estimators of the true value
how do we know that the estimators have the lowest variance in class?
Gauss-Markov theorem
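Unbiasedness can be illustrated with a small Monte Carlo experiment: keep the regressors fixed (matching the fixed-regressor assumption), redraw only the disturbances, and average the slope estimates, which settle near the true value. The parameter values (a = 1, b = 2, sigma = 1) are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 10, 50)  # fixed in repeated samples

estimates = []
for _ in range(2000):
    u = rng.normal(0, sigma, size=x.size)  # assumptions 1, 2, 5
    y = a + b * x + u
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b_hat)

print(np.mean(estimates))  # close to the true b = 2
```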
elaborate on consistency
consistency refers to an estimator approaching the true value as the number of samples grows large, but not necessarily for small sample sizes.
what property do we say that consistency is?
An asymptotic property
are all unbiased estimators consistent?
Not necessarily. If the variance does not shrink toward zero as the sample size increases, the estimator will not be consistent: the probability of observing large differences between the estimated value and the true value remains relatively large.
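A quick sketch of consistency: the spread of the OLS slope estimator shrinks as the sample size grows. The parameter values and data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 2.0

# The spread of the OLS slope estimate shrinks as the sample grows
for n in (10, 100, 1000):
    b_hats = []
    for _ in range(500):
        x = rng.uniform(0, 10, n)
        u = rng.normal(0, 1, n)
        y = a + b * x + u
        b_hats.append(np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2))
    print(n, np.std(b_hats))  # standard deviation falls toward zero
```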
elaborate a little on standard errors
Standard error is a precision measure used to indicate how reliable an estimate is.
It is found by taking the standard deviation of a statistic.
Note that the standard error does not tell us anything about the goodness of a specific estimate. It only tells us what we can expect from an arbitrary estimate: a very low standard error means that estimates are, in general, very precise.
The standard error is a function of x, the sample variance, and the sample size.
give the matrix form multivariable linear regression
y = Xb + u
y and u are vectors of size Tx1.
X is a matrix of size TxK
b is a vector of size Kx1
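In matrix form, the OLS estimator is b̂ = (X'X)^{-1} X'y. A minimal sketch with T = 100 observations and K = 3 (an intercept column plus two made-up regressors):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100

# Design matrix X (T x K): intercept column plus two regressors
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + rng.normal(0, 1, T)

# OLS: b_hat = (X'X)^{-1} X'y  (solving the normal equations is
# numerically preferable to forming an explicit inverse)
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)
```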
for the general linear regression model, how do we represent the standard errors of the parameters?
we use the square roots of the diagonal elements of the estimated variance-covariance matrix:
var(b̂) = s^2 (X'X)^{-1}, where s^2 = û'û / (T - K)
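A minimal sketch computing standard errors as the square roots of the diagonal of s^2 (X'X)^{-1}, with s^2 = û'û / (T - K); the data are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 200, 3

# Simulated design matrix and response (illustrative parameter values)
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, T)

# OLS estimates and residuals
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b_hat

# s^2 = u'u / (T - K); var-cov matrix is s^2 (X'X)^{-1}
s2 = u_hat @ u_hat / (T - K)
var_cov = s2 * np.linalg.inv(X.T @ X)
std_errs = np.sqrt(np.diag(var_cov))
print(std_errs)
```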