Multiple Linear Regression Flashcards

1
Q

How to do OLS for a multiple regression model

A

Same as the bivariate case - square the residuals, add them up, and minimise that sum.

We don't have to remember the actual estimator formula; in practice we just give Stata the regressors x₁, x₂, etc.
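As a minimal sketch of what this means in practice (hypothetical data, Python/NumPy rather than Stata): stack the regressors into a design matrix X with a column of ones for the intercept, and let a least-squares routine minimise the sum of squared residuals.

    import numpy as np

    # Hypothetical data: n = 5 observations, two regressors
    y  = np.array([3.1, 4.0, 5.2, 6.1, 7.3])
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x2 = np.array([2.0, 1.5, 3.0, 2.5, 4.0])

    # Design matrix: column of ones (intercept) plus the regressors
    X = np.column_stack([np.ones_like(x1), x1, x2])

    # lstsq minimises the sum of squared residuals - exactly what OLS does
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # [beta0_hat, beta1_hat, beta2_hat]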

2
Q

CLRA needs to be modified again for the multiple regression model…

A

CLRA 1 - Written as
Yi = β₀ + β₁X₁i + β₂X₂i + … + βkXki + εi, where β₀…βk are unknown parameters and εi is random

CLRA 2: Error has expected value of 0, E(εi|X₁,…,Xk) = 0, but now conditional on all X variables

CLRA 3: No regressor is constant, and no exact linear relationships between regressors (no PERFECT collinearity, i.e. correlation of -1 or +1; regressors can be correlated, just not perfectly)

CLRA 4: Errors are uncorrelated across observations, cov(εi, εj) = 0 for i ≠ j

CLRA 5: Same finite variance (homoskedasticity), var(εi|X₁,…,Xk) = σ²

CLRA 6: Normally distributed
εi|X₁,…,Xk ~ N(0, σ²)

3
Q

Why may an estimator fail (2)

A

If regressors are perfectly collinear (breaks CLRA 3)

If the sample size n is too small relative to the number of parameters being estimated - fails if n < k+1
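A quick sketch of both failure modes (made-up numbers, NumPy): when a regressor is an exact linear combination of the others, or when n < k+1, the matrix X'X is singular and there is no unique OLS solution.

    import numpy as np

    x1 = np.array([1.0, 2.0, 3.0, 4.0])
    x2 = 2 * x1                            # perfectly collinear with x1
    X = np.column_stack([np.ones(4), x1, x2])

    # rank(X) < number of coefficients => X'X singular, no unique solution
    # (the same happens when n < k+1: rank can be at most n)
    print(np.linalg.matrix_rank(X))        # 2, but we need 3

    try:
        np.linalg.inv(X.T @ X)
    except np.linalg.LinAlgError as err:
        print("OLS fails:", err)           # singular matrix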

4
Q

How do we work out the error variance σ² in multiple regression

A

σhat² = RSS / (n - (k+1))

(In a three-variable model (Y, X₁, X₂) we have k = 2, so the denominator is n - 3 - basically n minus the total number of variables in the model!)

Same as the error variance in the bivariate case, except we divide by n - (k+1) instead of n - 2.
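A worked sketch with made-up numbers (continuing the NumPy example style):

    import numpy as np

    n, k = 5, 2                             # 5 observations, 2 regressors
    y = np.array([3.1, 4.0, 5.2, 6.1, 7.3])
    X = np.column_stack([np.ones(n),
                         [1.0, 2.0, 3.0, 4.0, 5.0],
                         [2.0, 1.5, 3.0, 2.5, 4.0]])

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)   # residual sum of squares

    sigma2_hat = rss / (n - (k + 1))        # divide by n - 3 here
    print(sigma2_hat)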

5
Q

Coefficient of determination (goodness of fit) for multiple regression models

Problem: as we add more variables, R² always (at least weakly) increases, since more of the variation in Y becomes attributable to the regression line

So how do we correct for this?

A

Adjust R² to Rbar² and penalise the inclusion of more explanatory variables

6
Q

How is the adjusted R² expressed

A

Rbar² = 1 - [(n-1) / (n-(k+1))] × (1-R²)

So this accounts for the inclusion of more variables: Rbar² only increases when a new variable adds enough explanatory power to offset the lost degree of freedom!
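As a sketch, the formula in code (the R² values and sample sizes here are made up purely to show the penalty at work):

    def adjusted_r2(r2: float, n: int, k: int) -> float:
        """Rbar^2 = 1 - [(n-1)/(n-(k+1))] * (1 - R^2)."""
        return 1 - ((n - 1) / (n - (k + 1))) * (1 - r2)

    # Adding a near-useless regressor: R² creeps up but Rbar² falls
    print(adjusted_r2(0.800, n=50, k=2))   # ~0.7915
    print(adjusted_r2(0.801, n=50, k=3))   # ~0.7880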

7
Q

Recall lowest variance of βhat₁… (standard error of an OLS estimator - we are estimating β₁ in this instance!)

Hint: includes goodness of fit statistic for x and z…

A

Uses FC6 or pg 3 formula for σ²(β₁) variance

var(βhat₁) = σ² / [(1 - R²zx) Σxᵢ²], where the xᵢ are in deviation-from-mean form

R²zx is the goodness-of-fit statistic from the auxiliary regression of x on z (the squared correlation coefficient between x and z)
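A sketch of the formula in code (hypothetical x and z, with σ² simply assumed; R²zx is computed as the squared correlation, which equals the auxiliary-regression R² in the two-regressor case):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    z = np.array([1.2, 1.9, 3.2, 3.8, 5.1])   # highly correlated with x
    sigma2 = 0.5                               # assumed error variance

    r2_zx = np.corrcoef(x, z)[0, 1] ** 2       # squared correlation of x and z
    ssx = np.sum((x - x.mean()) ** 2)          # variation in x (deviation form)

    var_beta1 = sigma2 / ((1 - r2_zx) * ssx)
    print(r2_zx, var_beta1)                    # high R²zx inflates the variance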

8
Q

Now we can work out what affects the standard error of OLS estimators (5)

Use the formula given on the last card to test it out

A

Variance of the error (σ²) - higher σ² means higher variance of the OLS estimator, so lower accuracy (more dispersed)

Variation in the X variable - if low, the standard error is larger. (It is hard to tell how a variable contributes to the regression if X doesn't change a lot, e.g. if schooling doesn't vary much it is hard to estimate its contribution.) So high variation in X is good

Correlation between the regressors (R²zx) - the more highly correlated they are, the higher the standard error, so it is harder to work out how much each variable contributes independently

Sample size n - the larger the sample size, the lower the standard error

Number of regressors k - more regressors increases the standard error. (Fewer degrees of freedom, since the divisor is n - (k+1), and typically a higher degree of multicollinearity, so less accuracy.)
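Each effect can be read straight off the variance formula; as a sketch (all numbers made up), vary one input at a time and watch var(βhat₁) move:

    def var_beta1(sigma2, r2_zx, ssx):
        # var(beta1_hat) = sigma^2 / ((1 - R^2_zx) * sum of squared x deviations)
        return sigma2 / ((1 - r2_zx) * ssx)

    print(var_beta1(1.0, 0.5, 100))   # baseline: 0.02
    print(var_beta1(2.0, 0.5, 100))   # higher error variance -> 0.04
    print(var_beta1(1.0, 0.9, 100))   # more collinearity     -> 0.10
    print(var_beta1(1.0, 0.5, 400))   # more variation in x (or larger n) -> 0.005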

9
Q

CLRA 3: When are variables perfectly collinear (meaning the estimator fails)? (2)

A

When one variable is a constant multiple of another

When one independent variable can be expressed as an exact linear function of one or more of the other independent variables, e.g. x₁ = x₂, or x₁ = x₂ + x₃
