Chapter 3 Brooks Flashcards

(32 cards)

1
Q

what is regression

A

evaluating the relationship between one variable and movements in one or more other variables

2
Q

What are we trying to do with regression

A

We are trying to explain movements in some dependent variable, y. We are trying to explain its movements by using explanatory variables.

3
Q

what is correlation

A

The degree of linear association between two variables

4
Q

what is the danger with correlation

A

1) Assuming causality
2) Misunderstanding what linear association is

Correlation is given as:

cov(X,Y)/(sigma_X sigma_Y)

cov(X,Y) = E[(X - mu_x)(Y - mu_y)]
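
A minimal sketch of the two formulas above, using population moments. The function names and the data are hypothetical, chosen only for illustration. The second data set depends on x perfectly, but not linearly, which illustrates danger 2.

```python
import math

def covariance(xs, ys):
    """Population covariance: average of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """cov(X, Y) / (sigma_X * sigma_Y), with population standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

# Hypothetical data, symmetric around zero.
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # exact linear function of x
y_quadratic = [v ** 2 for v in x]   # exact, but nonlinear, function of x

print(correlation(x, y_linear))     # close to 1: perfect linear association
print(correlation(x, y_quadratic))  # close to 0: no linear association
```

A correlation near zero for the quadratic case does not mean the variables are unrelated, only that the relationship is not linear.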

5
Q

what is the interpretation of covariance

A

Covariance is the expected value of the product of X's deviation from its mean and Y's deviation from its mean. Meaning, the covariance of X and Y represents how X tends to move away from its mean in relation to how Y moves away from its mean: it is positive when the two tend to be on the same side of their means at the same time, and negative when they tend to be on opposite sides.

6
Q

elaborate on the role of correlation

A

The role of correlation is to provide an understanding of the linear relationship between two variables. If the linear relationship is perfect, movement in one variable perfectly explains movement in the other.

HOWEVER: The correlation coefficient cannot be interpreted beyond this. It only tells us the degree of the linear relationship, not how much one variable changes when the other changes.

To capture the linear relationship itself, we need a linear model. This is where linear regression comes into play.

7
Q

first step to see if there is a relationship between two variables

A

Plot it visually

8
Q

elaborate on:

y = a + bx

A

This is an exact line. The problem is that it doesn't account for errors.

It can describe a best-fitting line, but as a model of the data it is not realistic, since observed points will not lie exactly on it.

9
Q

What is a better model than

y = a + bx?

A

y_t = a + bx_t + u_t

This model assumes a linear relationship plus a random disturbance term u_t that is always present, for example because it is impossible to capture every influence on y_t.

10
Q

generally speaking, how do we find values for alpha and beta

A

We need to find the alpha and beta (the intercept and slope) that make the total vertical distance between the data points y_t and the fitted line as small as possible.

11
Q

why vertical and not horizontal distances?

A

We use the assumption that the x-values are fixed in repeated samples, so there is no random element in x. All of the randomness is in y, so the task reduces to fitting the y_t's, which means measuring distances in the vertical direction.

12
Q

Most common line fitting method

A

OLS

13
Q

general procedure of OLS

A

Square the vertical distance between each y_t and the fitted line. Sum the squares. Choose the line that makes this sum the smallest.

14
Q

with correct notation how do we describe the method of OLS

A

Minimize the sum of squared differences between y_t and y_t_pred
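
Written out for the two-variable case, the minimization and its standard closed-form solution look as follows. The hats correspond to the "_pred" suffix used in these cards, T is the number of observations, and the bars denote sample means. This notation is an assumption added here, not taken from the cards.

```latex
\min_{\hat{\alpha},\,\hat{\beta}} \; \sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2
  = \sum_{t=1}^{T} \left( y_t - \hat{\alpha} - \hat{\beta} x_t \right)^2

\hat{\beta} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```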

15
Q

what is y_t and what is y_t_pred

A

y_t is the data point, as collected.

y_t_pred is the predicted data point.

16
Q

Other way of describing the method of OLS

A

Minimizing the sum of squared errors/residuals

17
Q

what is the residual mathematically?

A

The difference between y_t and y_t_pred

18
Q

what is PRF

A

Population regression function, the function that is thought to be producing the data

19
Q

give the PRF

A

y_t = a + bx_t + u_t

20
Q

Why does PRF contain error?

A

Because even though it is the true process, the true process can contain random elements.

21
Q

What is SRF?

A

Sample regression function

22
Q

Give the SRF

A

y_t_pred = a_pred + b_pred x_t
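
A minimal sketch of how the SRF could be computed for one sample, using the standard closed-form OLS formulas for the two-variable case. The data and the helper name `ols_fit` are hypothetical, for illustration only.

```python
def ols_fit(xs, ys):
    """Return (a_pred, b_pred) that minimize the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_pred = sxy / sxx
    a_pred = mean_y - b_pred * mean_x
    return a_pred, b_pred

def rss(a, b, xs, ys):
    """Residual sum of squares for the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a_pred, b_pred = ols_fit(x, y)                      # about 0.05 and 1.99
y_pred = [a_pred + b_pred * x_t for x_t in x]       # the SRF evaluated at each x_t
residuals = [y_t - yp for y_t, yp in zip(y, y_pred)]

print(a_pred, b_pred)
print(sum(residuals))                               # numerically zero
print(rss(a_pred, b_pred, x, y))                    # smaller than for any other line
```

Nudging either coefficient away from the OLS values, for example `rss(a_pred + 0.1, b_pred, x, y)`, gives a larger sum of squares.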

23
Q

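why is there no error term in the SRF?

A

Because the SRF is the fitted line itself, the one we found by minimizing the RSS. What is left over for each observation is the residual, y_t - y_t_pred, which is kept separate from the fitted line.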

24
Q

what is CLRM

A

the model of:

y_t = a + bx_t + u_t

along with the 5 assumptions

25
what are the CLRM assumptions?
1) E[u_t] = 0
2) var(u_t) = sigma^2 < infinity
3) cov(u_i, u_j) = 0 for i != j
4) cov(x_t, u_t) = 0
5) u_t ~ N(0, sigma^2)
26
do we need all the assumptions?
To fit the model, no. We don't need assumption 5 to compute the estimates, but we need it to make inferences later.
27
Elaborate on the properties of the OLS estimator
If assumptions 1-4 hold, the OLS estimator is "BLUE": Best Linear Unbiased Estimator. "Best" refers to the Gauss-Markov theorem, which states that the OLS estimator has the lowest variance among all linear unbiased estimators. With assumptions 1-4 in place, the OLS estimator is unbiased, efficient and consistent.
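
Unbiasedness can be illustrated with a small simulation: hold x fixed, draw many samples from a known PRF, and average the OLS estimates. This is a sketch under assumed values. The true coefficients, the sample size, the seed and the helper name `ols_fit` are all hypothetical, and the disturbances are drawn from a normal distribution only for convenience.

```python
import random

def ols_fit(xs, ys):
    """Closed-form OLS estimates (a_pred, b_pred) for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_pred = sxy / sxx
    return mean_y - b_pred * mean_x, b_pred

# Assumed "true" PRF: y_t = 1 + 2 * x_t + u_t, with u_t ~ N(0, 1).
TRUE_A, TRUE_B, SIGMA = 1.0, 2.0, 1.0
x = [float(t) for t in range(1, 21)]   # x is fixed across repeated samples
rng = random.Random(0)                 # seeded so the run is reproducible

a_estimates, b_estimates = [], []
for _ in range(2000):
    y = [TRUE_A + TRUE_B * x_t + rng.gauss(0.0, SIGMA) for x_t in x]
    a_pred, b_pred = ols_fit(x, y)
    a_estimates.append(a_pred)
    b_estimates.append(b_pred)

mean_a = sum(a_estimates) / len(a_estimates)
mean_b = sum(b_estimates) / len(b_estimates)
print(mean_a, mean_b)   # individual estimates vary, the averages sit near 1 and 2
```
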
28
is it possible to find an estimator with lower variance than OLS estimator?
Yes, but such an estimator would have to be biased, nonlinear, or both.
29
elaborate on the standard error of the OLS estimator
The standard error is a measure of precision. It is an estimate of the standard deviation of the estimator's sampling distribution, that is, of how much we should expect an estimate to vary around its mean from one sample to another. It gives no indication of how good the overall fit is. It only answers the question "if we performed this experiment again, would we expect to see a very different result?".
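
For the two-variable model, the standard errors are usually written as below. The notation is an assumption added here: hats mark estimates, T is the number of observations, and s is the estimated standard deviation of the disturbances, computed from the residuals.

```latex
s = \sqrt{\frac{\sum_{t=1}^{T} \hat{u}_t^{2}}{T - 2}}

SE(\hat{\alpha}) = s \sqrt{\frac{\sum_{t=1}^{T} x_t^{2}}{T \sum_{t=1}^{T} (x_t - \bar{x})^{2}}},
\qquad
SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum_{t=1}^{T} (x_t - \bar{x})^{2}}}
```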
30
what information is provided by standard error
Precision, not accuracy. It gives some indication of whether our estimate is robust across samples, but it cannot say whether it is a good estimate of the true population parameter.
31
elaborate on t-ratio
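The t-ratio is the test statistic for the special case where the null hypothesis is that the parameter equals 0. It is the ratio of the estimate to its standard error. We call it "t" because, under the null hypothesis and the CLRM assumptions, it follows a Student's t-distribution.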
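
A minimal sketch of computing t-ratios for a two-variable regression, using the usual standard error formulas with T - 2 degrees of freedom. The data and the helper name `ols_t_ratios` are hypothetical. The 5% two-sided critical value for 3 degrees of freedom, 3.182, is hard-coded from a standard t-table.

```python
import math

def ols_t_ratios(xs, ys):
    """Return ((a_pred, se_a, t_a), (b_pred, se_b, t_b)) for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_pred = sxy / sxx
    a_pred = mean_y - b_pred * mean_x

    # s: estimated standard deviation of the disturbances, from the residuals.
    rss = sum((y - (a_pred + b_pred * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))

    se_a = s * math.sqrt(sum(x ** 2 for x in xs) / (n * sxx))
    se_b = s * math.sqrt(1.0 / sxx)

    # t-ratio: estimate divided by its standard error (null value is 0).
    return (a_pred, se_a, a_pred / se_a), (b_pred, se_b, b_pred / se_b)

# Hypothetical sample with T = 5 observations, so T - 2 = 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
CRITICAL_T_5PCT_3DF = 3.182

(a_pred, se_a, t_a), (b_pred, se_b, t_b) = ols_t_ratios(x, y)
print(t_a, abs(t_a) > CRITICAL_T_5PCT_3DF)   # intercept: cannot reject a = 0
print(t_b, abs(t_b) > CRITICAL_T_5PCT_3DF)   # slope: reject b = 0
```
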
32