Chapter 4 - Brooks Flashcards

(26 cards)

1
Q

in regards to finance and econometrics, name a reason why we would consider it beneficial to extend the bivariate CLRM to more variables

A

No-arbitrage (arbitrage pricing) theory opens up the possibility of multiple variables affecting the return, so a single explanatory variable is generally too restrictive.

2
Q

what is the difference in interpretation of the coefficients in the multivariate CLRM vs the bivariate one?

A

Now they are called "partial regression coefficients", because each coefficient captures only part of the explanation of variability in the dependent variable: the effect of its own regressor, holding all the other regressors constant.

3
Q

when we say that we have k variables, what does it include?

A

All variables, including the intercept term

4
Q

how do we derive the OLS estimator?

A

The classic way is to minimize the sum of squared errors. The sum of squared errors is a convex function of the coefficients, so we know that if we differentiate it, set the derivatives to zero and solve (the normal equations), we get the error-minimizing result.

However, MLE and method of moments also work and yield the same answer.
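A minimal numerical sketch (my own illustration, not from Brooks; all data and names are made up): solving the normal equations (X'X)b = X'y and checking that it matches a generic least-squares solver.

```python
import numpy as np

# OLS via the normal equations, beta_hat = (X'X)^{-1} X'y,
# compared against numpy's least-squares solver.
rng = np.random.default_rng(0)
T = 200
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(scale=0.3, size=T)

X = np.column_stack([np.ones(T), x1, x2])   # k = 3 including the intercept column

beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)   # solve (X'X) b = X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes the sum of squared errors directly

print(beta_normal_eq)   # roughly [1.0, 0.5, -2.0]
print(beta_lstsq)       # same values
```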

5
Q

elaborate deeply on method of moments

A

Create a system of equations, one equation per unknown, where each unknown is a parameter of the distribution we are working with.

Each equation is found by equating the theoretical k-th moment with the corresponding sample k-th moment.

Solve the system for the parameters.
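A minimal sketch (hypothetical example, not from the book): method of moments for a gamma distribution, equating the first two theoretical moments to their sample counterparts and solving for the two parameters.

```python
import numpy as np

# Method of moments for gamma(shape, scale), two unknown parameters.
# Theoretical moments:  E[X] = shape * scale,  Var[X] = shape * scale^2.
# Equate them to the sample mean and sample variance, then solve.
rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=3.0, size=10_000)

m1 = data.mean()          # sample first moment
v = data.var()            # sample (central) second moment

scale_hat = v / m1        # from (shape * scale^2) / (shape * scale)
shape_hat = m1 / scale_hat

print(shape_hat, scale_hat)   # roughly 2.0 and 3.0
```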

6
Q

what can we do to create a broader testing framework than the t-ratio?

A

Due to the downside of the t-ratio only testing a single parameter at a time, we want a test that can test multiple hypotheses at once.

This is nice because it allows for a broader class of restrictions, testing things like "this and this and this" jointly as a combination.

The outcome is the F-test.

7
Q

elaborate on the F-test

A

Requires 2 regressions.

1) Unrestricted
2) Restricted

The restricted regression requires that we impose the hypothesis we want to test. In essence, this makes the model more restrictive.

We find the residual sum of squares from each regression. Then we do this:

test statistic = (RRSS - URSS)/URSS x (T - k)/m

Under the null hypothesis this statistic is F-distributed with (m, T - k) degrees of freedom.
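A minimal sketch (made-up data and my own variable names): computing the F-statistic from a restricted and an unrestricted regression, here testing the joint restriction that both slope coefficients are zero (so m = 2).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
T = 120
x1, x2 = rng.normal(size=(2, T))
y = 0.4 + 0.8 * x1 + 0.0 * x2 + rng.normal(size=T)

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

X_unrestricted = np.column_stack([np.ones(T), x1, x2])   # k = 3 (incl. intercept)
X_restricted = np.ones((T, 1))                           # imposes beta_1 = beta_2 = 0, m = 2

URSS = rss(y, X_unrestricted)
RRSS = rss(y, X_restricted)

k, m = 3, 2
F = (RRSS - URSS) / URSS * (T - k) / m
p_value = 1 - stats.f.cdf(F, m, T - k)
print(F, p_value)   # large F / small p => reject the joint restriction
```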

8
Q

What is “k” in the F-test?

A

The number of explanatory variables in the unrestricted regression, including the intercept term.

9
Q

what is “m” in F-test?

A

The number of restrictions for the restricted regression.

10
Q

What is “T” in the F-test?

A

number of data points

11
Q

why does the F-test work?

A

The numerator, RRSS - URSS, represents a difference. If the restrictions we impose are insignificant, this difference is close to 0.

If the difference is large, meaning the restrictions have an impact, we get a value that says "this value is unlikely to be observed if the restrictions were valid". Thus, large values indicate that the parameters we placed restrictions on are actually important, and that the restrictions should be rejected.

A restriction that often makes sense is to set all slope coefficients to 0. This quickly tests whether the regression has any explanatory power at all.

More formally, (RRSS - URSS) behaves as a chi-squared variable. The restricted regression estimates k - m parameters, so RRSS has T - k + m degrees of freedom; URSS has T - k degrees of freedom. The difference therefore gets m degrees of freedom, while URSS keeps its T - k degrees of freedom, which is what gives the ratio its F(m, T - k) distribution.

12
Q

What is the F-statistic actually?

A

An F-distributed random variable is defined as the ratio of two independent chi-squared variables, each divided by its degrees of freedom:

X = (X_1/df1)/(X_2/df2)

There is little else exciting about the F-statistic beyond its use in tests such as ANOVA and joint restriction tests.

13
Q

discuss the “issue” with the F-test

A

It is not really an issue, just something to be aware of: the F-test is a one-sided test, so we only care whether the statistic exceeds a certain critical value.

14
Q

what relationships cannot be tested with either t-ratio or F-test?

A

Non-linear restrictions, e.g. b1 * b2 = 4. Both the t-ratio and the F-test can only handle restrictions that are linear in the coefficients.

15
Q

what is the size of the test?

A

Alpha: the probability of committing a Type I error, i.e. the probability of rejecting the null hypothesis even though it is true.

16
Q

what is the issue surrounding the size of the test?

A

If we try enough specifications, we will eventually (by sheer luck) find relationships that appear significant. They are not genuinely significant; they merely appear significant. This is the data snooping problem.

The issue is most visible when we use a large number of regressors. Say we have 20 regressors and the size of the test is 5%. Then we should expect about one of the regressors to appear significant purely by chance.
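A hypothetical simulation (my own illustration, not from the book): the dependent variable is pure noise, yet on average roughly one of the 20 slope t-tests at the 5% level rejects its (true) null in each regression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T, n_regressors, n_sims = 100, 20, 500
false_rejections = 0

for _ in range(n_sims):
    X = np.column_stack([np.ones(T), rng.normal(size=(T, n_regressors))])
    y = rng.normal(size=T)                        # no true relationship at all
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (T - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    crit = stats.t.ppf(0.975, T - X.shape[1])
    false_rejections += np.sum(np.abs(t_stats[1:]) > crit)   # slopes only

print(false_rejections / n_sims)   # close to 20 * 0.05 = 1 spurious "hit" per regression
```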

17
Q

how to solve the issue of data snooping?

A

Use an out-of-sample test set to get a measure of the performance.

This may be difficult in cases where we are limited in data.

18
Q

elaborate on dummy variables

A

Used for qualitative variables.

Typically they take only binary values (0 or 1).

Be aware of the dummy variable trap (next card).

19
Q

elaborate on the traps with dummy variables

A

There are several.

One is the case where one tries to model attributes that are not interval-scaled as a single integer-valued variable. This creates a badly specified model. For instance, using one integer variable to represent "location". One should instead use separate binary variables.

The other is "the dummy variable trap", which entails making the regressor matrix unsolvable. This happens if we get a system of equations that is linearly dependent in its columns (perfect multicollinearity).

The way to avoid the trap is to not include an exhaustive set of dummy variables.

We NEED each group of dummies to leave one category out if we want to include the intercept.
If we remove the intercept, we can include all the dummies, but then we might lose some info. (See the sketch below.)
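A minimal sketch (hypothetical data, using pandas' get_dummies): dropping one reference category per qualitative variable so the dummy columns plus the intercept are not perfectly collinear.

```python
import numpy as np
import pandas as pd

# Made-up toy data: a qualitative "location" attribute and a price.
df = pd.DataFrame({
    "location": ["city", "suburb", "rural", "city", "rural", "suburb"],
    "price":    [300.0, 250.0, 180.0, 320.0, 175.0, 240.0],
})

# drop_first=True leaves out one reference category ("city" here),
# so the dummy columns plus the intercept are not perfectly collinear.
dummies = pd.get_dummies(df["location"], drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), dummies.to_numpy()])

print(np.linalg.matrix_rank(X), X.shape[1])   # full column rank: 3 == 3

# Including ALL categories together with the intercept would make the columns
# linearly dependent (their sum equals the intercept column), so X'X would be singular.
```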

20
Q

what is the point of R^2?

A

Goodness of fit. R^2 is a TYPE of goodness-of-fit statistic.

The point is to get an understanding of how well the model performs.

Specifically, R^2 answers “how well is the model able to explain deviations from the mean level”.

21
Q

elaborate on R^2

A

R^2 gives a number for how much of the variance the model is able to explain. We therefore relate the "total sum of squares" to the "residual sum of squares" (the unexplained sum of squares) and the "explained sum of squares".

The total sum of squares, TSS, is the sum of squared differences between the data points and the constant mean level.

The ESS, explained sum of squares, is the sum of squared differences between the constant mean level and the fitted values on the SRF line.

The RSS represents the remaining, unexplained sum of squares. The smaller this value is, the better.

The ratio ESS/TSS represents how much of the total variation has been captured by the model:

R^2 = ESS/TSS

We can also write it as:

R^2 = (TSS - RSS)/TSS

R^2 = TSS/TSS - RSS/TSS

R^2 = 1 - RSS/TSS
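A minimal sketch (made-up data): computing TSS, ESS and RSS from an OLS fit and verifying that ESS/TSS and 1 - RSS/TSS give the same R^2.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 50
x = rng.normal(size=T)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=T)

X = np.column_stack([np.ones(T), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

TSS = np.sum((y - y.mean()) ** 2)       # variation around the constant mean level
ESS = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the fitted line
RSS = np.sum((y - y_hat) ** 2)          # leftover, unexplained variation

print(ESS / TSS)       # R^2
print(1 - RSS / TSS)   # same number, since TSS = ESS + RSS for OLS with an intercept
```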

22
Q

elaborate on the issues with R^2

A

1) It favors more regressors, as there is no penalty for adding more regressors.

2) R^2 depends heavily on the form of the dependent variable. Therefore, if the model is re-parameterized so that the dependent variable changes, it makes no sense to use R^2 for comparison.

3) It is of limited use for time series models, where values close to 1 are common for trending data, so it discriminates poorly between models.

23
Q

elaborate on R^2 as dependent on the dependent variable

A

We must be careful if we re-parameterize.

If the re-parameterization rescales the function so that the y-values are placed, relatively speaking, at different positions compared to each other, then we cannot use R^2 to compare fits. For instance, if we log-transform the dependent variable, we cannot use R^2 for comparison; likewise if we divide it by "n".

We can, however, shift it.

24
Q

Elaborate on overcoming the issue of “more regressors is better”

A

Use the adjusted R^2.

It adds a penalty for including more regressors:

adjusted R^2 = 1 - (1 - R^2)(T - 1)/(T - k)
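A tiny sketch (made-up numbers) of the adjusted R^2 formula above, showing how a small rise in R^2 from an extra regressor can still lower the adjusted R^2.

```python
# adj_R2 = 1 - (1 - R2) * (T - 1) / (T - k), with k including the intercept
def adjusted_r2(r2: float, T: int, k: int) -> float:
    return 1 - (1 - r2) * (T - 1) / (T - k)

# Adding a regressor nudges R^2 up slightly (0.40 -> 0.41), but the
# adjusted R^2 falls because of the penalty for the extra parameter.
print(adjusted_r2(0.40, T=60, k=3))   # ~0.379
print(adjusted_r2(0.41, T=60, k=4))   # ~0.378
```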

25
Q

elaborate on hedonic pricing models

A

Hedonic refers to placing a value on each individual component or characteristic of an asset (for instance, the characteristics of a house). Typically these models are linear, but they don't have to be.

26