Chapter 4 - CLRM Flashcards
(30 cards)
elaborate on the usefulness of the bivariate regression model in finance
It is the basis for everything, but it needs to be generalized. It works well for theories built on a single explanatory variable, like the CAPM (where the market factor is the only regressor), but arbitrage pricing theory requires multiple factors, and hence more variables.
if we take the bivariate linear regression model, and generalize it to multivariate, what happens to the interpretation of the coefficient estimates?
Since there is more than one regressor, we need to make sure that we are holding "everything else constant" (ceteris paribus). This allows us to look at the effect of a change in one variable in isolation by considering the coefficient of that variable.
Specifically, what does an estimate for a variable coefficient represent?
It represents the average change in the explained variable per unit change in the specific explanatory variable, holding all other regressors constant. The keyword is "average change per unit": OLS gives us the average relationship in the sample, not an exact one for each observation.
is the constant a constant?
Not really. When we generalize the model, we treat the constant as the coefficient on an extra variable as well. However, this variable takes the value 1 for every observation, which makes its coefficient behave as a constant (the intercept).
what do we mean by the number “k”?
The number of coefficients that we are solving for. Since we are also trying to find the best intercept term, the constant term counts as part of k.
Can we call the constant term an explanatory variable?
No, this makes no sense: it does not explain anything. The reason we call the other variables "explanatory" is that changes in them relate to changes in the variable of interest. Since the constant term never changes, it does not explain any movement.
However, one could say that it captures a certain base level of the explained variable.
y = Xb + u
Elaborate on the dimensions of all parts
X : T x k matrix
b : k x 1
u : T x 1
y : T x 1
NB: Regarding b, this is correct because, by convention, we always treat vectors as column vectors by default. Transposes are taken when needed (e.g. to form inner products), but by default all vectors are column vectors.
how do we find the estimators for regression coefficients?
We need to minimize a loss function.
The loss function is the sum of squared errors, u'u = (y - Xb)'(y - Xb); minimizing it gives the OLS estimator b_hat = (X'X)^(-1) X'y.
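A minimal numpy sketch (my own illustration with simulated data, not from the chapter): it builds X with a column of ones for the constant, matches the dimensions above, and solves the normal equations that come out of minimizing the sum of squared errors.

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 100, 3                                                   # T observations, k coefficients (incl. intercept)
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # X: T x k, first column all ones
    b_true = np.array([0.5, 1.0, -2.0])                             # b: k x 1
    u = rng.normal(size=T)                                          # u: T x 1
    y = X @ b_true + u                                              # y = Xb + u, y: T x 1

    # Minimizing the sum of squared errors (y - Xb)'(y - Xb) yields the
    # normal equations X'X b = X'y, solved here directly:
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_hat)                                                    # close to b_true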
what is this: s^2 = u'u / (T - k)?
The estimator for the variance of the residuals from the generalized (multivariate) linear regression model.
It is called the sample variance estimator.
T is the number of sample points.
k is the number of parameters we are solving for, so T - k is the residual degrees of freedom.
This formula is very useful because it is required in order to find the variances of the coefficient estimators, var(b_hat) = s^2 (X'X)^(-1). These are then square-rooted to find the standard errors of the coefficients.
These are useful because they tell us how much precision there is in our estimates given our data. Note that this has nothing to do with the performance of the model in general; it only tells us how precisely the coefficients are pinned down by the specific sample.
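A short sketch of the computation (my own illustration, same simulated setup as above):

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 100, 3
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
    y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=T)

    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ b_hat                        # residuals
    s2 = u_hat @ u_hat / (T - k)                 # s^2 = u'u / (T - k)
    var_b = s2 * np.linalg.inv(X.T @ X)          # var(b_hat) = s^2 (X'X)^(-1)
    se_b = np.sqrt(np.diag(var_b))               # standard errors of the coefficients
    print(se_b)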
what does the standard error actually tell us?
How much the coefficient estimate is expected to vary if we repeat the sampling process.
We assume that the errors are iid (and typically normal). We then take the estimator for the coefficient and use the fact that the estimator is a function of these errors. The estimator itself therefore has a distribution, and we are essentially finding the standard deviation of this distribution.
The standard error is defined as the standard deviation of the sampling distribution of a statistic. The sampling distribution of a statistic is the distribution of that statistic across random samples.
So when we use the estimator for the regression coefficients, we get what the estimator believes to be the best value based on the sample. However, the sample also tells us how uncertain that estimate is (the standard error), i.e. how much movement we would expect to see if we were to change the sample.
what is a sampling distribution
distribution we would get by repeatedly drawing sample points from some population. Typically associated with a statistic, like the mean etc.
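A small simulation (my own illustration, not from the chapter) makes this concrete: re-draw samples from the same population many times, re-estimate the slope each time, and the spread of the estimates is the sampling distribution; its standard deviation is the standard error.

    import numpy as np

    rng = np.random.default_rng(1)
    T = 100
    slopes = []
    for _ in range(5000):                          # repeatedly draw samples from the same population
        x = rng.normal(size=T)
        y = 0.5 + 1.0 * x + rng.normal(size=T)
        X = np.column_stack([np.ones(T), x])
        slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])

    # std of the sampling distribution of the slope (about 0.1 here)
    # is exactly what the standard error estimates from a single sample
    print(np.std(slopes))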
name the main reason why we need standard errors
Because they are defined as the standard deviation of the sampling distribution of a statistic (and the statistic is what we are interested in), they are required for hypothesis testing.
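For example (a standard result, not specific to this chapter): to test H0: b_i = 0 we form the t-ratio t = b_hat_i / SE(b_hat_i) and compare it against a t-distribution with T - k degrees of freedom.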
briefly discuss the f-test
It builds on the f-distribution.
The f-distribution is created by dividing one chi-squared distribution by another, each divided by its corresponding degrees of freedom (the two must be independent).
The benefit of the f-test is the ability to test multiple hypotheses at once, as a joint (conjunction-style) test.
we need:
1) Unrestricted regression
2) Restricted regression
The unrestricted regression has no requirements on what the coefficient values can be.
The restricted regression has some sort of constraint imposed on the regression coefficients.
elaborate on the workings of the f-test
We have two regressions: one with a constraint, one without.
Then we find the residual sum of squares for both the unrestricted and the restricted regression. This gives us the ability to compute the test statistic:
statistic = (RRSS - URSS)/URSS x ((T - k)/m), where m is the number of restrictions
Notice what happens here: if there is no difference between using the constraint and not using it, URSS is equal to RRSS, and we get a statistic close to 0. However, if imposing the constraint severely inflates the errors, then the unrestricted residual sum of squares will be much lower than the restricted one. This makes the value of the test statistic larger.
elaborate on the work we need to do to enforce the constraints
Firstly, we need the residuals of the constrained regression. The easiest way to get them is to impose the constraint and then perform a variable substitution that lets us use OLS with exactly the same coefficients as in the unrestricted regression. It is important that the coefficients remain exactly the same, so rather than dropping variables we define new transformed variables. If we do not do this, the comparison makes no sense, as we would ultimately end up comparing regressions that are not related. See the sketch below.
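A sketch of the whole procedure under assumptions (simulated data, a single restriction b1 + b2 = 1, the helper name rss is mine): impose the constraint by substitution, regress with transformed variables, then compare the two residual sums of squares.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    T, k, m = 200, 3, 1                                   # k coefficients, m = 1 restriction
    x1, x2 = rng.normal(size=T), rng.normal(size=T)
    y = 0.2 + 0.7 * x1 + 0.3 * x2 + rng.normal(size=T)    # true coefficients satisfy b1 + b2 = 1

    def rss(X, y):
        """Residual sum of squares from an OLS fit."""
        b = np.linalg.solve(X.T @ X, X.T @ y)
        u = y - X @ b
        return u @ u

    # unrestricted: y = b0 + b1*x1 + b2*x2 + u
    urss = rss(np.column_stack([np.ones(T), x1, x2]), y)

    # restricted by b1 + b2 = 1: substitute b2 = 1 - b1, giving the transformed
    # regression (y - x2) = b0 + b1*(x1 - x2) + u -- still plain OLS
    rrss = rss(np.column_stack([np.ones(T), x1 - x2]), y - x2)

    F = (rrss - urss) / urss * (T - k) / m
    p = stats.f.sf(F, m, T - k)                           # p-value under the null (constraint holds)
    print(F, p)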
elaborate on what it actually entails to use a certain distribution for a test statistic
When we, for instance, use the f-statistic in an f-test, we rely on the fact that it is f-distributed under the null hypothesis. When the null hypothesis is a constraint, we are saying: "assuming that the constraint holds, we get a p-value of X associated with our observed values". If the value of the statistic is large, we likely reject. The null hypothesis is therefore the restriction itself.
NB: the constraint does not have to say that the sum of the coefficients must be 0 or anything like that; the f-test framework is very flexible and versatile. We are basically checking differences in error sums to see whether a constraint is backed by the data or not.
what do we mean by “test for junk regressors”?
Using the f-test on all coefficients except the constant, testing whether they are all equal to 0. This basically tests whether the variables are suitable at all. If we reject the null hypothesis, it means that at least some of the variable coefficients are not useless. However, if we cannot reject the null, the entire regression is useless.
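As a worked form (a standard result): since the restricted regression here is just y on a constant, the statistic can be rewritten in terms of R^2 as F = (R^2 / (k - 1)) / ((1 - R^2) / (T - k)), with m = k - 1 restrictions.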
does the f-test have any limitations?
It cannot test non-linear restrictions; the hypotheses must be linear in the coefficients.
what is the size of the test?
Alpha: the probability of rejecting the null hypothesis even though it is true (a Type I error).
what is data snooping?
Testing all kinds of variables and looking for significance without actually having a reason to check them, other than to see if there is something there.
The issue is that when this is done a lot, some variables will come out significant purely by chance, because the size of the test (e.g. 5% or 1%) guarantees that fraction of false rejections on average.
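A quick illustration (my own simulation, not from the chapter): regress pure noise on pure noise 1,000 times at a 5% size, and roughly 5% of the t-tests come out "significant" anyway.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    T, trials, hits = 100, 1000, 0
    crit = stats.t.ppf(0.975, T - 2)               # 5% two-sided critical value
    for _ in range(trials):
        x = rng.normal(size=T)
        y = rng.normal(size=T)                     # y is unrelated to x by construction
        X = np.column_stack([np.ones(T), x])
        b = np.linalg.solve(X.T @ X, X.T @ y)
        u = y - X @ b
        se = np.sqrt(u @ u / (T - 2) * np.linalg.inv(X.T @ X)[1, 1])
        hits += abs(b[1] / se) > crit
    print(hits / trials)                           # close to 0.05: significance by pure chance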
how to deal with qualitative variables?
Use dummy variables to turn them into quantitative variables.
elaborate on dummy variables
Usually binary, but they can be integers in some cases.
They are used like any other variable, but there is a danger when using integers: one may model a relationship as if the variable's domain is ordinal when it actually is not. In such cases, it is more appropriate to add multiple binary variables instead of a single integer variable.
There is also the more subtle issue of the dummy variable trap: by including all categories of a one-hot variable together with the intercept, the design matrix exhibits perfect collinearity. X'X will not be invertible and the system cannot be solved; in other words, the columns are no longer linearly independent. See the sketch below.
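A minimal sketch of the trap (my own illustration with a 3-category variable): keep the intercept plus all three dummies and X'X is singular; drop one category as the baseline and it is fine.

    import numpy as np

    rng = np.random.default_rng(4)
    T = 90
    cat = rng.integers(0, 3, size=T)                  # a qualitative variable with 3 categories
    D = np.eye(3)[cat]                                # one-hot encoding: all three dummies

    X_trap = np.column_stack([np.ones(T), D])         # the dummy columns sum to the intercept column
    print(np.linalg.matrix_rank(X_trap.T @ X_trap))   # 3 < 4 columns: X'X singular, no unique OLS solution

    X_ok = np.column_stack([np.ones(T), D[:, 1:]])    # drop one category as the baseline
    print(np.linalg.matrix_rank(X_ok.T @ X_ok))       # 3 = full rank: invertible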
what are intercept dummies?
Dummy variables that basically add (shift) a contribution to the intercept term when activated, leaving the slopes unchanged.
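As a worked form (a standard setup, not from the chapter): in y_t = b0 + d*D_t + b1*x_t + u_t, the intercept is b0 when D_t = 0 and b0 + d when D_t = 1, so the dummy shifts the regression line up or down without changing its slope.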
It is desirable to have a measure of how well a regression model fits the data. What should we consider?
Goodness of fit statistics