Chapter 4 - CLRM Flashcards
(30 cards)
elaborate on the usefulness of the bivariate regression model in finance
It is the basis for everything, but it needs to be generalized. It works well for theories built on a single explanatory variable, like the CAPM (where the market factor is the only regressor), but arbitrage pricing theory requires multiple factors, and hence more variables.
if we take the bivariate linear regression model, and generalize it to multivariate, what happens to the interpretation of the coefficient estimates?
Since there is more than one regressor, we need to make sure that we are holding "everything else constant" (ceteris paribus). This allows us to look at the effect of a change in one variable in isolation by considering the coefficient of that variable.
Specifically, what does an estimate for a variable coefficient represent?
It represents the average change in the explained variable per unit change in the specific explanatory variable, holding all other regressors constant. The keyword is "average change per unit": OLS gives us the average relationship in the sample, not an exact one for each observation.
is the constant a constant?
Not really. When we generalize the model, we treat the constant as the coefficient on an extra variable as well. However, this variable takes the value 1 for every observation, which makes its coefficient behave as a constant (the intercept).
what do we mean by the number “k”?
The number of coefficients that we are solving for. Since we are also trying to find the best intercept term, the constant term counts as part of k.
Can we call the constant term an explanatory variable?
No, this makes no sense: it does not explain anything. The reason we call the other variables "explanatory" is that changes in them relate to changes in the variable of interest. Since the constant term never changes, it does not explain any movement.
However, one could say that it captures a certain base level of the explained variable.
y = Xb + u
Elaborate on the dimensions of all parts
X : T x k matrix
b : k x 1
u : T x 1
y : T x 1
NB: Regarding b, this is correct because, by convention, we always treat vectors as column vectors by default. Transposes are taken when needed (e.g. to form inner products), but by default all vectors are column vectors.
how do we find the estimators for regression coefficients?
We need to minimize a loss function.
The loss function is the sum of squared errors, u'u = (y - Xb)'(y - Xb); minimizing it gives the OLS estimator b_hat = (X'X)^(-1) X'y.
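A minimal numpy sketch (my own illustration with simulated data, not from the chapter): it builds X with a column of ones for the constant, matches the dimensions above, and solves the normal equations that come out of minimizing the sum of squared errors.

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 100, 3                                                   # T observations, k coefficients (incl. intercept)
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # X: T x k, first column all ones
    b_true = np.array([0.5, 1.0, -2.0])                             # b: k x 1
    u = rng.normal(size=T)                                          # u: T x 1
    y = X @ b_true + u                                              # y = Xb + u, y: T x 1

    # Minimizing the sum of squared errors (y - Xb)'(y - Xb) yields the
    # normal equations X'X b = X'y, solved here directly:
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_hat)                                                    # close to b_true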
what is this: s^2 = u'u / (T - k)?
The estimator for the variance of the residuals from the generalized (multivariate) linear regression model.
It is called the sample variance estimator.
T is the number of sample points.
k is the number of parameters we are solving for, so T - k is the residual degrees of freedom.
This formula is very useful because it is required in order to find the variances of the coefficient estimators, var(b_hat) = s^2 (X'X)^(-1). These are then square-rooted to find the standard errors of the coefficients.
These are useful because they tell us how much precision there is in our estimates given our data. Note that this has nothing to do with the performance of the model in general; it only tells us how precisely the coefficients are pinned down by the specific sample.
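A short sketch of the computation (my own illustration, same simulated setup as above):

    import numpy as np

    rng = np.random.default_rng(0)
    T, k = 100, 3
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
    y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=T)

    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ b_hat                        # residuals
    s2 = u_hat @ u_hat / (T - k)                 # s^2 = u'u / (T - k)
    var_b = s2 * np.linalg.inv(X.T @ X)          # var(b_hat) = s^2 (X'X)^(-1)
    se_b = np.sqrt(np.diag(var_b))               # standard errors of the coefficients
    print(se_b)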
what does the standard error actually tell us?
How much the coefficient estimate is expected to vary if we repeat the sampling process.
We assume that the errors are iid (and typically normal). We then take the estimator for the coefficient and use the fact that the estimator is a function of these errors. The estimator itself therefore has a distribution, and we are essentially finding the standard deviation of this distribution.
The standard error is defined as the standard deviation of the sampling distribution of a statistic. The sampling distribution of a statistic is the distribution of that statistic across random samples.
So when we use the estimator for the regression coefficients, we get what the estimator believes to be the best value based on the sample. However, the sample also tells us how uncertain that estimate is (the standard error), i.e. how much movement we would expect to see if we were to change the sample.
what is a sampling distribution
distribution we would get by repeatedly drawing sample points from some population. Typically associated with a statistic, like the mean etc.
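A small simulation (my own illustration, not from the chapter) makes this concrete: re-draw samples from the same population many times, re-estimate the slope each time, and the spread of the estimates is the sampling distribution; its standard deviation is the standard error.

    import numpy as np

    rng = np.random.default_rng(1)
    T = 100
    slopes = []
    for _ in range(5000):                          # repeatedly draw samples from the same population
        x = rng.normal(size=T)
        y = 0.5 + 1.0 * x + rng.normal(size=T)
        X = np.column_stack([np.ones(T), x])
        slopes.append(np.linalg.solve(X.T @ X, X.T @ y)[1])

    # std of the sampling distribution of the slope (about 0.1 here)
    # is exactly what the standard error estimates from a single sample
    print(np.std(slopes))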
name the main reason why we need standard errors
Because they are defined as the standard deviation of the sampling distribution of a statistic (and the statistic is what we are interested in), they are required for hypothesis testing.
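For example (a standard result, not specific to this chapter): to test H0: b_i = 0 we form the t-ratio t = b_hat_i / SE(b_hat_i) and compare it against a t-distribution with T - k degrees of freedom.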
briefly discuss the f-test
It builds on the f-distribution.
The f-distribution is created by dividing one chi-squared distribution by another, each divided by its corresponding degrees of freedom (the two must be independent).
The benefit of the f-test is the ability to test multiple hypotheses at once, as a joint (conjunction-style) test.
we need:
1) Unrestricted regression
2) Restricted regression
The unrestricted regression has no requirements on what the coefficient values can be.
The restricted regression has some sort of constraint imposed on the regression coefficients.
elaborate on the workings of the f-test
We have two regressions: one with a constraint, one without.
Then we find the residual sum of squares for both the unrestricted and the restricted regression. This gives us the ability to compute the test statistic:
statistic = (RRSS - URSS)/URSS x ((T - k)/m), where m is the number of restrictions
Notice what happens here: if there is no difference between using the constraint and not using it, URSS is equal to RRSS, and we get a statistic close to 0. However, if imposing the constraint severely inflates the errors, then the unrestricted residual sum of squares will be much lower than the restricted one. This makes the value of the test statistic larger.
elaborate on the work we need to do to enforce the constraints
Firstly, we need the residuals of the constrained regression. The easiest way to get them is to impose the constraint and then perform a variable substitution that lets us use OLS with exactly the same coefficients as in the unrestricted regression. It is important that the coefficients remain exactly the same, so rather than dropping variables we define new transformed variables. If we do not do this, the comparison makes no sense, as we would ultimately end up comparing regressions that are not related. See the sketch below.
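A sketch of the whole procedure under assumptions (simulated data, a single restriction b1 + b2 = 1, the helper name rss is mine): impose the constraint by substitution, regress with transformed variables, then compare the two residual sums of squares.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    T, k, m = 200, 3, 1                                   # k coefficients, m = 1 restriction
    x1, x2 = rng.normal(size=T), rng.normal(size=T)
    y = 0.2 + 0.7 * x1 + 0.3 * x2 + rng.normal(size=T)    # true coefficients satisfy b1 + b2 = 1

    def rss(X, y):
        """Residual sum of squares from an OLS fit."""
        b = np.linalg.solve(X.T @ X, X.T @ y)
        u = y - X @ b
        return u @ u

    # unrestricted: y = b0 + b1*x1 + b2*x2 + u
    urss = rss(np.column_stack([np.ones(T), x1, x2]), y)

    # restricted by b1 + b2 = 1: substitute b2 = 1 - b1, giving the transformed
    # regression (y - x2) = b0 + b1*(x1 - x2) + u -- still plain OLS
    rrss = rss(np.column_stack([np.ones(T), x1 - x2]), y - x2)

    F = (rrss - urss) / urss * (T - k) / m
    p = stats.f.sf(F, m, T - k)                           # p-value under the null (constraint holds)
    print(F, p)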
elaborate on what it actually entails to use a certain distribution for a test statistic
When we, for instance, use the f-statistic in an f-test, we rely on the fact that it is f-distributed under the null hypothesis. When the null hypothesis is a constraint, we are saying: "assuming that the constraint holds, we get a p-value of X associated with our observed values". If the value of the statistic is large, we likely reject. The null hypothesis is therefore the restriction itself.
NB: the constraint does not have to say that the sum of the coefficients must be 0 or anything like that; the f-test framework is very flexible and versatile. We are basically checking differences in error sums to see whether a constraint is backed by the data or not.
what do we mean by “test for junk regressors”?
Using the f-test on all coefficients except the constant, testing whether they are all equal to 0. This basically tests whether the variables are suitable at all. If we reject the null hypothesis, it means that at least some of the variable coefficients are not useless. However, if we cannot reject the null, the entire regression is useless.
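As a worked form (a standard result): since the restricted regression here is just y on a constant, the statistic can be rewritten in terms of R^2 as F = (R^2 / (k - 1)) / ((1 - R^2) / (T - k)), with m = k - 1 restrictions.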
does the f-test have any limitations?
It cannot test non-linear restrictions; the hypotheses must be linear in the coefficients.
what is the size of the test?
Alpha: the probability of rejecting the null hypothesis even though it is true (a Type I error).
what is data snooping?
Testing all kinds of variables and looking for significance without actually having a reason to check them, other than to see if there is something there.
The issue is that when this is done a lot, some variables will come out significant purely by chance, because the size of the test (e.g. 5% or 1%) guarantees that fraction of false rejections on average.
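A quick illustration (my own simulation, not from the chapter): regress pure noise on pure noise 1,000 times at a 5% size, and roughly 5% of the t-tests come out "significant" anyway.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    T, trials, hits = 100, 1000, 0
    crit = stats.t.ppf(0.975, T - 2)               # 5% two-sided critical value
    for _ in range(trials):
        x = rng.normal(size=T)
        y = rng.normal(size=T)                     # y is unrelated to x by construction
        X = np.column_stack([np.ones(T), x])
        b = np.linalg.solve(X.T @ X, X.T @ y)
        u = y - X @ b
        se = np.sqrt(u @ u / (T - 2) * np.linalg.inv(X.T @ X)[1, 1])
        hits += abs(b[1] / se) > crit
    print(hits / trials)                           # close to 0.05: significance by pure chance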
how to deal with qualitative variables?
Use dummy variables to turn them into quantitative variables.
elaborate on dummy variables
Usually binary, but they can be integers in some cases.
They are used like any other variable, but there is a danger when using integers: one may model a relationship as if the variable's domain is ordinal when it actually is not. In such cases, it is more appropriate to add multiple binary variables instead of a single integer variable.
There is also the more subtle issue of the dummy variable trap: by including all categories of a one-hot variable together with the intercept, the design matrix exhibits perfect collinearity. X'X will not be invertible and the system cannot be solved; in other words, the columns are no longer linearly independent. See the sketch below.
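A minimal sketch of the trap (my own illustration with a 3-category variable): keep the intercept plus all three dummies and X'X is singular; drop one category as the baseline and it is fine.

    import numpy as np

    rng = np.random.default_rng(4)
    T = 90
    cat = rng.integers(0, 3, size=T)                  # a qualitative variable with 3 categories
    D = np.eye(3)[cat]                                # one-hot encoding: all three dummies

    X_trap = np.column_stack([np.ones(T), D])         # the dummy columns sum to the intercept column
    print(np.linalg.matrix_rank(X_trap.T @ X_trap))   # 3 < 4 columns: X'X singular, no unique OLS solution

    X_ok = np.column_stack([np.ones(T), D[:, 1:]])    # drop one category as the baseline
    print(np.linalg.matrix_rank(X_ok.T @ X_ok))       # 3 = full rank: invertible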
what are intercept dummies?
Dummy variables that basically add (shift) a contribution to the intercept term when activated, leaving the slopes unchanged.
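As a worked form (a standard setup, not from the chapter): in y_t = b0 + d*D_t + b1*x_t + u_t, the intercept is b0 when D_t = 0 and b0 + d when D_t = 1, so the dummy shifts the regression line up or down without changing its slope.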
It is desirable to have a measure of how well a regression model fits the data. What should we consider?
Goodness of fit statistics