Bivariate Linear Regression Flashcards

1
Q

Regression model for consumption

A

Add an error term ε to the deterministic consumption function to get

Y = β₀ + β₁X + ε

β₀ is the intercept
β₁ is the MPC (marginal propensity to consume)
ε is a random variable representing other factors influencing consumption.

2
Q

When is a model linear

A

If it is linear in the parameters: β₀ and β₁ enter only linearly, i.e. the model contains no non-linear functions of the unknown parameters.

3
Q

Line of best fit formula

A

OLS (ordinary least squares).

Start from the estimated regression equation
Yi = β̂₀ + β̂₁Xi + ε̂i

Rearrange to make the residuals the subject, square, and sum; OLS chooses β̂₀ and β̂₁ to minimise
S = Σε̂i² = Σ(Yi - β̂₀ - β̂₁Xi)²

I.e. square all the deviations/residuals, then add them up (e.g. with n = 28 observations, 28 squared residuals are summed).
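A minimal numpy sketch of this objective, using invented income (X) and consumption (Y) data; the numbers and names are illustrative, not from the source:

import numpy as np

# Illustrative income (X) and consumption (Y) data; values are invented
X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([9.0, 17.0, 24.0, 33.0, 39.0])

def ssr(b0, b1):
    # Sum of squared residuals: S = sum((Yi - b0 - b1*Xi)^2)
    residuals = Y - b0 - b1 * X
    return np.sum(residuals ** 2)

# OLS picks the (b0, b1) pair that minimises S; compare two candidate lines
print(ssr(1.0, 0.75), ssr(0.0, 1.0))  # smaller S = better-fitting line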

4
Q

Then, how to find minimum S

What values do we get for estimated β₀ and β₁

A

Differentiate S = Σ(Yi - β̂₀ - β̂₁Xi)² with respect to β̂₀ and β̂₁ individually, and set each derivative = 0, as we want to find the minimum.

To get…

β̂₀ = Ybar - β̂₁Xbar

β̂₁ = Σ(Xi - Xbar)(Yi - Ybar) / Σ(Xi - Xbar)²
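A small numpy sketch of these closed-form estimates, reusing the illustrative data above:

import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # illustrative income data
Y = np.array([9.0, 17.0, 24.0, 33.0, 39.0])   # illustrative consumption data

# Closed-form OLS estimates from the first-order conditions
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b0, b1)

# Cross-check: np.polyfit returns [slope, intercept] for degree 1
print(np.polyfit(X, Y, 1))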

5
Q

So now we know how to estimate β₁ and β₀,

Why do we need to know properties of OLS estimators

A

To see whether the estimates we get for β₁ and β₀ are accurate.

E.g. we found β̂₁ (the MPC) is 0.7765, so for every £1 increase in income, consumption increases by about 78p. Is this accurate? We have to check the properties.

6
Q

Properties REQUIRED of OLS estimators (2)

A

Unbiasedness - the sampling distribution is centred on the true parameter value
Efficiency - small variance

7
Q

How do we know OLS has these properties?

A

If certain conditions, called the classical linear regression assumptions (CLRA), are satisfied.

So we use the CLRA to ensure OLS is unbiased and efficient, which lets us estimate β₀ and β₁ accurately.

8
Q

Classical linear regression assumptions

A
  1. The model is written as
    Y = β₀ + β₁X + ε (and β₀, β₁ are unknown parameters/constants)
  2. The explanatory variable X is fixed/non-stochastic (we can choose values of X in order to observe the effects on Y)
  3. X is not a constant: it varies, and researchers adjust it to observe values of Y
  4. The error (ε) has an expected value (mean) of 0
  5. No two errors are correlated; this matters mainly in time-series models, where autocorrelation can arise
  6. Each error has the same variance σ² (HOMOSCEDASTIC); this matters mainly in cross-sectional models, where heteroscedasticity can arise
  7. The error is normally distributed (mean 0, variance σ²), which allows us to do hypothesis testing

A simulation sketch of a data-generating process satisfying these assumptions is given below.
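A minimal numpy sketch of a data-generating process that satisfies these assumptions; all parameter values are illustrative assumptions, not from the source:

import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma = 2.0, 0.8, 1.5   # illustrative true parameters
X = np.linspace(10, 50, 28)           # fixed, non-constant regressor (CLRA 2-3)

# Independent, mean-zero, equal-variance, normal errors (CLRA 4-7)
eps = rng.normal(loc=0.0, scale=sigma, size=X.size)
Y = beta0 + beta1 * X + eps           # the model of CLRA 1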
9
Q

Theoretical result 1

A

Under classical linear regression assumptions 1-4 holding,

OLS estimators are unbiased.

10
Q

Theoretical result 2

A

Under CLRA 1-6 holding,

OLS estimator is the best linear unbiased estimator (BLUE).

(MINIMUM VARIANCE, SO MOST EFFICIENT)

11
Q

Theoretical result 3

A

Under CLRA 1-7 holding

OLS estimator is the minimum variance unbiased estimator among both linear AND non-linear estimators. (TR2 covers linear estimators only.)

12
Q

Proof for TR1: the estimator of the slope parameter β̂₁ is unbiased

A

Start with this:
E(β̂₁) = E(β₁ + Σwiεi)

Simplify using the technical appendix to get
= β₁ + Σwi E(εi)

E(β₁) = β₁, since β₁ is a constant (estimator properties).
CLRA 2 takes Σwi outside the expectations operator (since X is fixed, the wi are constants).
CLRA 4 - ε has mean 0, so E(εi) = 0, leaving E(β̂₁) = β₁. Therefore unbiased.
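A Monte Carlo sketch of this result: hold X fixed, redraw the errors many times, and check that β̂₁ averages to the true β₁. All parameter values are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.8, 1.5   # illustrative true values
X = np.linspace(10, 50, 28)           # held fixed across replications (CLRA 2)

estimates = []
for _ in range(10_000):
    eps = rng.normal(0.0, sigma, X.size)   # CLRA 4-7 errors
    Y = beta0 + beta1 * X + eps
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))  # ≈ 0.8: the sampling distribution centres on beta1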

13
Q

Proof for why the estimator of the intercept parameter β̂₀ is unbiased

A

Start with β̂₀ = Ybar - β̂₁Xbar.

Take expectations (X is fixed under CLRA 2):
E(β̂₀) = E(Ybar) - Xbar E(β̂₁)

Since Ybar = β₀ + β₁Xbar + εbar and E(εbar) = 0 (CLRA 4), E(Ybar) = β₀ + β₁Xbar.
By TR1, E(β̂₁) = β₁, so
E(β̂₀) = β₀ + β₁Xbar - β₁Xbar = β₀. Therefore unbiased.
14
Q

Proof for lowest variance (BLUE)

A

CLRA 5 and 6 are needed (no correlation between errors; equal variances).

15
Q

Learn the 6* and 7* lowest-variance formulas; we should end up with that result in the proof, which establishes TR2 and BLUE (lowest variance)

A

For estimated β₁:
Start with var(β̂₁) = var(β₁ + Σwiεi)

This simplifies to var(Σwiεi), since β₁ is a constant and thus has no variance (= 0).

CLRA 5 removes the covariance terms (correlations = 0), leaving
Σwi² var(εi)

CLRA 6 - all εi have the same variance σ², so it becomes
σ² Σwi²

Then finally replace wi with its original function of X to get 7*:
var(β̂₁) = σ² / Σ(Xi - Xbar)²
(the lowest possible variance result)
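A quick simulation check of this variance formula, reusing the illustrative setup from the earlier sketches:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 0.8, 1.5   # illustrative true values
X = np.linspace(10, 50, 28)           # fixed regressor, n = 28

Sxx = np.sum((X - X.mean()) ** 2)
estimates = []
for _ in range(10_000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, X.size)
    estimates.append(np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx)

print(np.var(estimates))   # simulated variance of beta1-hat
print(sigma ** 2 / Sxx)    # theoretical sigma^2 / sum((Xi - Xbar)^2)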

16
Q

Random regressor

A

In practice we rarely have fixed X (CLRA 2), so we have to treat X as a random regressor.

17
Q

Relationship between x and ε

A

We look at the value of ε given the value of X (since X is now random).

We assume ε does not depend on the value of X.

18
Q

How is this expressed?

A

E(εi | Xi) = E(εi)

The left-hand side is the mean of ε conditional on X; since we assume ε does not depend on X, it equals the unconditional mean.

19
Q

What does this expression also show

A

It shows X and ε are mean independent (X is strictly exogenous to ε).

20
Q

What happens when we apply CLRA4 to the X and ε relationship

A

E(εi | Xi) = 0

This is the zero conditional mean assumption.

21
Q

What does the zero conditional mean look like as an example

A

Assume ε represents innate ability and we are looking at wages, with X = schooling years.

Zero conditional mean says the average innate ability is the same regardless of the given X (schooling years): whether it be 1 year or 20 years in education, average ability stays the same.

However, if we think innate ability increases with schooling years, the zero conditional mean does not hold.
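A sketch of what goes wrong when the assumption fails. Below, ability (playing the role of ε) is built to rise with schooling, and the OLS slope overshoots the true return to schooling; all numbers are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 5_000
schooling = rng.uniform(8, 20, n)                    # X: years of schooling

# Violation: average ability rises with schooling, so E(ε | X) ≠ 0
ability = 0.3 * schooling + rng.normal(0.0, 1.0, n)

true_return = 0.5
wage = 1.0 + true_return * schooling + ability       # ability acts as the error

b1 = (np.sum((schooling - schooling.mean()) * (wage - wage.mean()))
      / np.sum((schooling - schooling.mean()) ** 2))
print(b1)  # ≈ 0.8, not 0.5: OLS absorbs the ability effect into the slope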

22
Q

So now we need to adjust the classical assumptions to account for the random X and the zero conditional mean assumption.

A
23
Q

CLRA 1-6 accounting for random X

(No longer 7 assumption)

A

CLRA 1 - the same, except εi is random

CLRA 2 - there IS variation in the X variable (same)

CLRA 3 - the error has the zero conditional mean assumption: E(εi | Xi) = 0

CLRA 4 - the disturbances are conditionally uncorrelated, cov(εi, εj | X) = 0, where i and j are time periods, because this only affects time-series models (same as before, but with |X added)

CLRA 5 - each ε has a finite CONDITIONAL VARIANCE: var(εi | X) = σ²

CLRA 6 - ε, conditional on X, is normally distributed:
εi | X ~ N(0, σ²)

24
Q

Coefficient of determination - goodness of fit

A

R squared.

Shows how much of the total variation of Y is attributable to the regression line.

25
Q

When we consider the coefficient of determination , how does the regression change?

A

Yi = β̂₀ + β̂₁Xi + ε̂i = Ŷi + ε̂i

Actual Y = Ŷ + error (the estimated value of Y plus the deviation above or below the line)

26
Q

Total sum of squares

A

Σ(Yi -Ybar)²

27
Q

Explained sum of squares

A

Σ(Y^i-Ybar)²

28
Q

Residual sum of squares

A

Σε^i²

Same thing that OLS is trying to minimise! (Square all residuals and then sum up)

29
Q

R² formulas (2)

A

ESS/TSS

Or

1 - RSS/TSS
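A short numpy sketch computing R² both ways and checking the decomposition TSS = ESS + RSS (which holds for OLS with an intercept), reusing the earlier illustrative data:

import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([9.0, 17.0, 24.0, 33.0, 39.0])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
RSS = np.sum((Y - Y_hat) ** 2)          # residual sum of squares

print(ESS / TSS, 1 - RSS / TSS)         # both formulas give the same R²
print(np.isclose(TSS, ESS + RSS))       # True: TSS = ESS + RSS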

30
Q

What is the t statistic formula for testing a hypothesis about β₁

A

t = (β̂₁ - δ) / √(σ̂² / Σ(Xi - Xbar)²)

where the denominator is the SE of β̂₁ and δ is the hypothesised value of β₁.

t ~ t(n-2), since 2 unknown parameters (β₀ and β₁) are estimated.

But we also need to know the formula for σ̂²:
σ̂² = sum of residuals squared / (n - 2)

31
Q

Error variance σ² formula

A

RSS / (n - 2)
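A sketch computing σ̂², the standard error, and the t statistic, continuing the earlier illustrative example; the null value δ = 0 is an assumed choice:

import numpy as np

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([9.0, 17.0, 24.0, 33.0, 39.0])
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

sigma2_hat = np.sum(resid ** 2) / (n - 2)                 # RSS / (n - 2)
se_b1 = np.sqrt(sigma2_hat / np.sum((X - X.mean()) ** 2))

delta = 0.0                         # hypothesised value of beta1 (assumed null)
t_stat = (b1 - delta) / se_b1       # compare against the t(n - 2) distribution
print(t_stat)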

32
Q

Confidence bands

A

β̂₁ ± t_cv × SE(β̂₁)

where t_cv is the critical value from the t(n-2) distribution.
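A sketch of a 95% confidence interval for β₁ using scipy's t distribution, continuing the same illustrative example (the 95% level is an assumed choice):

import numpy as np
from scipy.stats import t

X = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Y = np.array([9.0, 17.0, 24.0, 33.0, 39.0])
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)
se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum((X - X.mean()) ** 2))

t_cv = t.ppf(0.975, df=n - 2)       # two-sided 95% critical value, t(n-2)
lower, upper = b1 - t_cv * se_b1, b1 + t_cv * se_b1
print(lower, upper)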