Chapter 3 Brooks Flashcards by Martin Sund

what is regression

Describing and evaluating the relationship between one variable and one or more other variables: how movements in one variable are explained by movements in the others.

What are we trying to do with regression

We are trying to explain movements in a dependent variable, y, by using movements in one or more explanatory variables.

what is correlation

The degree of linear association between two variables

what is the danger with correlation

1) Assuming causality
2) Misunderstanding what linear association is

Correlation is given as:

corr(X,Y) = cov(X,Y)/(sigma_X sigma_Y)

where cov(X,Y) = E[(X - mu_X)(Y - mu_Y)]
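
The formula can be checked by hand on a small data set. The sketch below is only an illustration: the data are made up, and the expectation is replaced by a plain average over the sample (dividing by n), which is an assumption rather than something the cards specify.

```python
def covariance(xs, ys):
    """Average product of deviations from the means (population form, divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """cov(X, Y) / (sigma_X * sigma_Y), using var(X) = cov(X, X)."""
    sigma_x = covariance(xs, xs) ** 0.5
    sigma_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sigma_x * sigma_y)


# Made-up illustration data, not taken from the source.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(xs, ys)    # about 1.2
corr_xy = correlation(xs, ys)  # about 0.77
print(cov_xy, corr_xy)
```

The covariance depends on the units of X and Y, while the correlation always lies between -1 and 1.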

what is the interpretation of covariance

Covariance is the expected value of the product of X's deviation from its mean and Y's deviation from its mean. It measures how X and Y move together: it is positive when X tends to be above its mean at the same time as Y is above its mean, and negative when one tends to be above its mean while the other is below.

elaborate on the role of correlation

The role of correlation is to provide an understanding of the linear relationship between two variables. If the linear relationship is perfect, it means that movement in one variable can perfectly explain the movement in the other.

HOWEVER: The correlation coefficient does not tell us how much one variable changes when the other changes. The coefficient only tells us the strength and direction of the linear relationship.

To describe the linear relationship itself we need a linear model. This is where linear regression comes into play.
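
A small sketch of why the coefficient alone is not enough. The three series are made up for illustration: two exact lines with very different slopes, and one exact but nonlinear relationship.

```python
def correlation(xs, ys):
    """Correlation computed from sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up illustration data.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
gentle = [2.0 * x for x in xs]   # exact line with slope 2
steep = [10.0 * x for x in xs]   # exact line with slope 10
curved = [x ** 2 for x in xs]    # exact relationship, but not a linear one

corr_gentle = correlation(xs, gentle)
corr_steep = correlation(xs, steep)
corr_curved = correlation(xs, curved)
print(corr_gentle, corr_steep, corr_curved)
```

Both lines have a correlation of 1 even though their slopes differ by a factor of five, and the curved series has a correlation of 0 even though y is fully determined by x. The slope has to come from a regression, and a low correlation does not rule out a nonlinear relationship.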

first step to see if there is a relationship between two variables

Plot the data, for example as a scatter plot of y against x.

elaborate on:

y = a + bx

This is an exact line. The problem is that it doesn't account for errors: it implies that every data point lies exactly on the line.

It is a model of a straight-line relationship, but it is not realistic.

What is a better model than

y = a + bx?

y_t = a + bx_t + u_t

This model assumes a linear relationship plus a random disturbance term, u_t, that is always present. The disturbance exists because, for example, it is impossible to capture every influence on y_t in the model.
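
To see what the disturbance term does, one can simulate data from the model. This is a sketch under assumed values: the parameters a, b and sigma, the normal distribution for u_t, and the seed are all arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
a, b, sigma = 1.0, 2.0, 0.5

xs = [float(t) for t in range(1, 11)]
us = [random.gauss(0.0, sigma) for _ in xs]    # random disturbances u_t
exact = [a + b * x for x in xs]                # points on the exact line
ys = [a + b * x + u for x, u in zip(xs, us)]   # what we would actually observe

for x, e, y in zip(xs, exact, ys):
    print(x, e, round(y, 3))
```

The observed y_t scatter around the exact line, and the gap between each observation and the line is that observation's u_t.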

generally speaking, how do we find values for alpha and beta (a and b in the model above)

We need to find the alpha and beta that make the vertical distances between the data points y_t and the fitted line as small as possible in total.

why vertical and not horizontal distances?

We make use of the assumption that the x-values are fixed in repeated samples. This means that x has no random element; all the randomness is in y. The task therefore reduces to getting the fitted values as close as possible to the observed y_t's, and those distances are measured along the vertical axis.

Most common line fitting method

OLS

general procedure of OLS

Square the vertical distances between each y_t and the line. Sum them together. Find the line that makes this sum the smallest.
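
The procedure can be sketched in a few lines. The data are made up, the function names are hypothetical, and the line is found with the standard closed-form expressions for the bivariate case rather than by a numerical search.

```python
def rss(a, b, xs, ys):
    """Sum of squared vertical distances between the data and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def ols_line(xs, ys):
    """Closed-form OLS intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = mean_y - b_hat * mean_x
    return a_hat, b_hat


# Made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a_hat, b_hat = ols_line(xs, ys)
best = rss(a_hat, b_hat, xs, ys)

# Nudging the intercept or slope in any direction should only increase the sum.
nudged = [
    rss(a_hat + da, b_hat + db, xs, ys)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(a_hat, b_hat, best, min(nudged))
```

For this data the fitted line is roughly y = 2.2 + 0.6x, and every nudged line has a larger sum of squares.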

with correct notation how do we describe the method of OLS

Minimize the sum of squared differences between y_t and y_t_pred
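
Written out as a worked equation, the objective and its standard solution for the one-variable case look like this. Hats stand for the "_pred" quantities on the cards, and T for the number of observations; both symbols are notation assumed here, not taken from the cards.

```latex
\min_{\hat{a},\,\hat{b}} \; \sum_{t=1}^{T} \left( y_t - \hat{y}_t \right)^2
  \;=\; \min_{\hat{a},\,\hat{b}} \; \sum_{t=1}^{T} \left( y_t - \hat{a} - \hat{b}\,x_t \right)^2

\hat{b} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```

The second line follows from setting the derivatives of the sum with respect to the intercept and the slope to zero.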

what is y_t and what is y_t_pred

y_t is the data point, as collected.

y_t_pred is the fitted value: the value of y that the estimated line predicts at x_t.

Other way of describing the method of OLS

Minimizing the sum of squared errors/residuals

what is the residual mathematically?

The difference between y_t and y_t_pred
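
A short sketch of the residuals for a fitted line, using made-up data. It also checks two standard properties of OLS residuals when the model includes an intercept: they sum to zero, and they are uncorrelated with x in the sample.

```python
# Made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form OLS estimates for one explanatory variable.
b_hat = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a_hat = mean_y - b_hat * mean_x

fitted = [a_hat + b_hat * x for x in xs]           # y_t_pred
residuals = [y - f for y, f in zip(ys, fitted)]    # y_t - y_t_pred

residual_sum = sum(residuals)
cross_sum = sum(x * u for x, u in zip(xs, residuals))
print([round(u, 3) for u in residuals], residual_sum, cross_sum)
```

Both sums come out as zero up to floating-point rounding.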

what is PRF

Population regression function, the function that is thought to be producing the data

give the PRF

y_t = a + bx_t + u_t

Why does the PRF contain an error term?

Because even though it is the true process, the true process can contain random elements.

What is SRF?

Sample regression function

Give the SRF

y_t_pred = a_pred + b_pred x_t

why is there no error term in the SRF

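Because the SRF is the fitted line itself, the line we have found by minimizing the RSS, so every y_t_pred lies exactly on it. The gap between the data and the line shows up as the residual, y_t - y_t_pred, not as a term in the SRF.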

what is CLRM

the model of:

y_t = a + bx_t + u_t

along with the 5 assumptions

what are the CLRM assumptions?

1) E[u_t] = 0
2) var(u_t) = sigma^2 < infinity
3) cov(u_i, u_j) = 0 for i != j
4) cov(u_t, x_t) = 0
5) u_t ~ N(0, sigma^2)

do we need all the assumptions?

To fit the model, no. We don't need assumption 5 to estimate the parameters, but we do need it to make inferences about them later.

Elaborate on the properties of the OLS estimator

If assumptions 1-4 hold, the OLS estimator is "BLUE": the Best Linear Unbiased Estimator. "Best" refers to the Gauss-Markov theorem, which states that the OLS estimator has the lowest variance among all linear unbiased estimators. With assumptions 1-4 in place, the OLS estimator is unbiased, efficient and consistent.
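
Unbiasedness can be illustrated with a small simulation. This is a sketch only: the true parameter values, the normal disturbances, the sample size, the number of replications and the seed are all assumptions made for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "true" values for the population regression function.
true_a, true_b, sigma = 1.0, 2.0, 1.0

# The x-values are held fixed across repeated samples.
xs = [float(t) for t in range(1, 21)]
mean_x = sum(xs) / len(xs)
sxx = sum((x - mean_x) ** 2 for x in xs)


def ols_slope(ys):
    """Closed-form OLS slope estimate for the fixed xs."""
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx


slopes = []
for _ in range(2000):
    # Draw a fresh sample: same xs, new disturbances.
    ys = [true_a + true_b * x + random.gauss(0.0, sigma) for x in xs]
    slopes.append(ols_slope(ys))

average_slope = sum(slopes) / len(slopes)
print(average_slope, min(slopes), max(slopes))
```

Individual estimates land above and below the true slope, but their average across samples sits very close to it, which is what unbiasedness means.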

is it possible to find an estimator with lower variance than OLS estimator?

Yes, but such an estimator would not be both linear and unbiased.

elaborate on the standard error of the OLS estimator

The standard error is a measure of precision. It is an estimate of the standard deviation of the estimator across repeated samples, so it tells us how much we should expect, on average, an estimate to deviate from its mean.

It is important to understand that the standard error says nothing about whether the estimate is close to the true parameter. It only answers the question: "if we performed this experiment again, would we expect to see a very different result?"
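
A sketch of how the standard errors are computed for the one-variable model, using made-up data. It uses the usual textbook formulas: the disturbance variance is estimated by RSS / (T - 2), where T is the number of observations, and the variable names are hypothetical.

```python
# Made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

T = len(xs)
mean_x = sum(xs) / T
mean_y = sum(ys) / T
sxx = sum((x - mean_x) ** 2 for x in xs)

# Closed-form OLS estimates.
b_hat = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a_hat = mean_y - b_hat * mean_x

# Residual sum of squares and the estimate of sigma^2.
rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
s_squared = rss / (T - 2)  # two parameters were estimated

# Standard errors of the slope and the intercept.
se_b = (s_squared / sxx) ** 0.5
se_a = (s_squared * sum(x ** 2 for x in xs) / (T * sxx)) ** 0.5
print(se_a, se_b)
```

A more spread-out x (larger sxx) or a smaller residual variance makes both standard errors smaller, meaning more precise estimates.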

what information is provided by standard error

Precision, not accuracy. It gives some indication of how stable our estimate would be across samples, but it cannot say whether it is a good estimate of the true population parameter.

elaborate on t-ratio

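A special case of the t-test where the null hypothesis is that the parameter we are testing is 0. The test statistic then reduces to a ratio: the coefficient estimate divided by its standard error. We call it "t" because it follows a Student's t-distribution under the null hypothesis.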
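
A sketch of the t-ratio for the slope, using made-up data. The critical value is the two-sided 5% value for 3 degrees of freedom (T - 2 with T = 5), read from a standard t-table; the data and variable names are assumptions for illustration.

```python
# Made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

T = len(xs)
mean_x = sum(xs) / T
mean_y = sum(ys) / T
sxx = sum((x - mean_x) ** 2 for x in xs)

# OLS estimates and the standard error of the slope.
b_hat = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a_hat = mean_y - b_hat * mean_x
rss = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(xs, ys))
se_b = (rss / (T - 2) / sxx) ** 0.5

# t-ratio: the estimate divided by its standard error (null hypothesis: slope = 0).
t_ratio = b_hat / se_b

# Two-sided 5% critical value for T - 2 = 3 degrees of freedom, from a t-table.
critical_value = 3.182
reject_null = abs(t_ratio) > critical_value
print(t_ratio, reject_null)
```

Here the t-ratio is about 2.12, which is below the critical value, so with only five observations we would not reject the hypothesis that the slope is zero.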

Chapter 3 Brooks Flashcards

(32 cards)