Econometrics Flashcards

(131 cards)

1
Q

5 Steps in Econometrics

A
  1. Come up with a question
  2. Build a model
  3. Get data
  4. Run your model
  5. Interpret and refine
2
Q

Cross Sectional data

A

data on different units (people, firms, countries) all measured at the same point in time; each observation is a different unit

3
Q

Time series data

A

data on one unit observed over a period of time; used to look at trends.

Exhibits temporal dependence (earlier observations can carry information about later ones)

4
Q

Pooled cross-section data

A

cross-sectional samples taken at multiple points in time, e.g. a survey run in 2020 and again in 2023, but with different people each time

5
Q

Panel data

A

Observations on the same cross-section units at different points in time

e.g. earnings in Victoria (worker 1 in 2022, worker 1 in 2023, worker 2 in 2022, worker 2 in 2023)

6
Q

Ordinary Least Squares (OLS) meaning

A

Finding the line that best fits your data

* We want to draw a line that is as close as possible to all the dots (data points)

* OLS does this by minimizing the sum of squared distances between each point and the line
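A minimal sketch of this idea (made-up data, numpy only, not from the deck): the closed-form OLS line attains a smaller SSR than any other candidate line.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(b0, b1):
    """Sum of squared vertical distances from each point to the line b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Closed-form OLS: slope = sample Cov(x, y) / Var(x), intercept = ybar - slope * xbar
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()

print(ssr(b0_hat, b1_hat))        # SSR at the OLS line (the minimum)
print(ssr(b0_hat + 0.5, b1_hat))  # any other line gives a larger SSR
```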
7
Q

Experimental data

A

Data from randomized controlled trials (RCTs); the gold standard for measuring causal effects

8
Q

Purpose of the linear regression model

A

To estimate β₀ and β₁

9
Q

Observed and Unobserved elements in linear regression formula

A

u = error: a random variable which captures the effect on y of factors other than x (unobserved)

β₀ = y-intercept (unobserved)

β₁ = slope between y and x (unobserved)

The data (x1, y1), (x2, y2), . . . ,(xn, yn) is observed.

10
Q

What does observed and unobserved mean in linear regression formula

A

observed = what we can already see in the data set, such as the explanatory/dependent variables, e.g. person 1 has 12 years of schooling (explanatory) and earns $50,000 (dependent)

unobserved = what we can't necessarily see, unless we work it out / run the regression, e.g. β₁, β₀, u

11
Q

when to add the subscript “i”

A

when we have a model with sample data/observations, when talking about actual data from your sample. Each observation (person, firm, country, etc.) gets its own little “i”.

12
Q

when do we use ^ symbol

A

Before running regression:
We write:

yᵢ = β₀ + β₁xᵢ + uᵢ

- because we don't know the betas yet.

After running regression (using OLS):
We get estimates:

β̂₀, β̂₁

and use those to make predictions:

ŷᵢ = β̂₀ + β̂₁xᵢ

basically our "predicted linear regression"

13
Q

Define intercept

A

When [x variables] are equal to zero, the predicted [y variable] will be equal to [number and unit].

e.g.: if the median age in the state was 0 and a woman was from a western state, the average birth rate for a woman in that region would be equal to 5.572 births per woman.

14
Q

what does ^ mean and when do we use it

A

The hat (^) just means “this is an estimate.” You’re no longer talking about the true value, you’re talking about the value you calculated from your data.

15
Q

Define slope

A

Controlling for [other variables], the predicted [y variable] on average [increases/decreases] by [number and unit] for each [one-unit] increase in the [x variable].

e.g.: Controlling for region, the predicted birth rate in a state on average decreases by 0.128 births per woman for each one-year increase in the median age in that state.

16
Q

Define Error term

A

Controlling for [other variables], the [y variable] that is unexplained by these factors is equivalent to [number and unit]

17
Q

Define Standard error

A

The standard error of the regression is the standard deviation of the residuals. A smaller residual standard error means the regression model is a better fit.

18
Q

Matrix form

A

(m x n), where m = number of rows and n = number of columns

19
Q

row vector

A

(1 x n) → denoted by a’

20
Q

column vector

A

(m x 1) → denoted by a

21
Q

Transpose matrix

A

Interchanging the rows and columns → denoted by A′

if the transpose is the same as the original, then it is a SYMMETRIC matrix

22
Q

Trace in matrix

A

For square matrices only
→ sum of the elements on the principal diagonal
→ denoted by tr(A)

23
Q

Identity matrix

A

Has 1s along the principal diagonal and 0s elsewhere

  • Pre- or post-multiplying by I has no effect
24
Q

Orthogonal vectors

A

Two vectors whose scalar (dot) product is zero: a′b = 0

25
Norm vector
||a|| = (a′a)^(1/2). The norm of the vector a is the square root of the scalar product of a with itself.
26
Inverse matrix
A is nonsingular (has an inverse) if and only if |A| ≠ 0. Then A·A⁻¹ = A⁻¹·A = Iₙ.
27
Determinant
|A| = ad − bc (for a 2×2 matrix). Defined for square matrices only. |AB| = |A||B| = |B||A| = |BA|.
28
Random variable
a variable whose value is an outcome of a random experiment (e.g. flipping a coin); can be discrete or continuous
29
Probabilities corresponding to each value:
p₁, p₂, . . . , pₘ where p₁ = P(X = x₁), p₂ = P(X = x₂), and so on.
30
Expected value
is a weighted average of all possible values of X, with weights determined by the probability density function
31
Graph used for looking at random variable
Probability Density Function (PDF)
32
Write out in probability form: The Probability of that the height of a randomly selected man lies in a certain interval (between 180-190) is the area under the pdf over that interval
P(180 < X < 190) = the area under the pdf between 180 and 190
33
Expected value formula
E(X) = p1x1 + p2x2 + .... + pmxm
34
Expected value constant c rule
For any constant c, E(c) = c
35
Expected value constant a and b rule
For any constants a and b, E(aX + b) = aE(X) + b
36
The expected value of the sum of several variables IS the sum of their expected values formula
E(X + Y + Z) = E(X) + E(Y) + E(Z)
37
E(X + Y + Z) = E(X) + E(Y ) + E(Z) how constants a,b,c,d are integrated into it:
E(a + bX + cY + dZ) = a + bE(X) + cE(Y ) + dE(Z)
38
Does E go through non-linear transformations
No. "Go through" asks whether you can pull constants or functions out of the expectation, i.e. rearrange the equation and get the same result. Non-linear transformations include squares, logs, and products, e.g. E(XY) ≠ E(X)E(Y) in general. But when X and Y are independent (X can't tell you anything about Y), E(XY) = E(X)E(Y) does hold.
39
Example of X and Y being independent (coin toss) in Expected Value
Let's define two random variables: X = 1 if the first coin is heads, X = 0 if tails; Y = 1 if the second coin is heads, Y = 0 if tails. Each coin is fair and the two tosses are independent, so E(XY) = P(both heads) = P(X = 1 and Y = 1) = 0.25 = (0.5)(0.5) = E(X)E(Y).
40
Median of X probability formula
P(X ≤ x_med) = 0.5
41
Q: X is a discrete random variable with possible values {−3, −1, 0, 1, 5}. The expected value of X is?
We are not told that the outcomes are equally likely, so (c): we cannot compute E(X) because we do not know the probability of each outcome.
42
Define Variance
the expected squared distance from X to its mean, measuring how spread out the data is
43
Another way of writing E(X)
μₓ (mu)
44
Variance formula
σ²ₓ = Var(X) = E[(X − μₓ)²]

E.g. for a fair die with six values (1 to 6, each with probability 1/6) and E(X) = 3.5:

Var(X) = [(1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²] / 6 ≈ 2.92

so X tends to deviate from its mean by about 2.92 squared units (SD ≈ 1.71)
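A quick numeric check of the die example (my own verification sketch, not from the deck):

```python
import numpy as np

values = np.arange(1, 7)       # die faces 1..6
probs = np.full(6, 1 / 6)      # fair die: each face has probability 1/6

mu = np.sum(probs * values)                # E(X) = 3.5
var = np.sum(probs * (values - mu) ** 2)   # Var(X) = E[(X - mu)^2] ≈ 2.92
sd = np.sqrt(var)                          # SD(X) ≈ 1.71
print(mu, var, sd)
```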
45
Variance (Var(X) can also be denoted as
σ²ₓ
46
Standard deviation formula
σₓ = SD(X) = √E[(X − μₓ)²], thus σₓ = SD(X) = √Var(X)
47
Q: You have $1 to bet. You can either bet it all at once or play 20 cents for 5 rounds. Compare the risk and return for each strategy (not averaging returns/var)
We need to calculate two things here: the return (expected value) and the risk (variance).

For the $1 bet: E(5X) = 5E(X) → return increases 5×. Var(5X) = 25Var(X) → risk increases 25×. Why 25? Because Var(aX) = a²Var(X).

For the five 20-cent bets: E(X₁ + X₂ + X₃ + X₄ + X₅) = 5E(X) → return increases 5×. Var(X₁ + X₂ + X₃ + X₄ + X₅) = 5Var(X) → risk increases only 5× (independent rounds).

Note: here we are not averaging returns/variances. Conclusion: 5 × 20 cents is the better option, since it gives the same expected return with much lower risk.
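A small simulation of the two strategies, with a made-up payoff distribution (each 20-cent stake pays $0.40 or $0 with equal probability; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.choice([0.0, 0.4], size=(n, 5))  # payoff of a 20-cent bet per round

all_at_once = 5 * X[:, 0]   # $1 bet: one realization scaled by 5
spread_out = X.sum(axis=1)  # five independent 20-cent bets

print(all_at_once.mean(), spread_out.mean())  # same expected return
print(all_at_once.var(), spread_out.var())    # variance ratio ≈ 25 : 5
```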
48
We can take a sample of one observation from the random variable and use that as the estimate of the mean, or we can take a sample of 5 observations and take the average of those 5 observations as the estimate of the mean (are averaging returns/var)
1 observation: E(X) = μ, Var(X) = σ².

5 observations: E(X̄) = E[(1/5)(X₁ + X₂ + X₃ + X₄ + X₅)] = μ. Var(X̄) = Var[(1/5)(X₁ + X₂ + X₃ + X₄ + X₅)] = (1/25)[Var(X₁) + Var(X₂) + ...] = (1/25)(5σ²) = σ²/5.

Note: here we ARE averaging returns/variances.
49
Define Covariance
measures how two variables change together (directly or inversely proportional); its magnitude can be misleading if X and Y differ in units (100s vs 1000s)
50
Covariance formula
Cov(X, Y) = E[(X − μₓ)(Y − μᵧ)] = E(XY) − μₓμᵧ

You're looking at how far X is from its mean and how far Y is from its mean, then taking the average of their product.
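A tiny sketch (made-up data) checking that the two forms of the formula agree:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

mx, my = x.mean(), y.mean()
cov_a = np.mean((x - mx) * (y - my))  # E[(X - mu_x)(Y - mu_y)]
cov_b = np.mean(x * y) - mx * my      # E(XY) - mu_x * mu_y
print(cov_a, cov_b)                   # identical (population / ddof=0 form)
```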
51
What if Covariance is > 0
If Cov(X, Y) > 0, then on average: X > E[X] ⇒ Y > E[Y], and X < E[X] ⇒ Y < E[Y].

When X is above average, Y tends to be above average too; when X is below average, Y tends to be below average → they "move together".
52
What if Covariance is < 0
If Cov(X, Y) < 0, then on average: X > E[X] ⇒ Y < E[Y], and X < E[X] ⇒ Y > E[Y].

When X is above average, Y tends to be below average; when X is below average, Y tends to be above average → they "move oppositely".
53
What if Covariance is = 0
Could mean they're independent, or just that they have no linear relationship; other types of dependence might still exist.
54
X and Y with independence and covariance
If X and Y are independent, then Cov(X, Y) = 0. But the converse fails: Cov(X, Y) = 0 does not imply independence, since dependent variables can still have zero covariance.
55
Covariance and a and b constant
Where a and b are constants → Cov(aX, bY) = abCov(X, Y). Unlike expected value, the scaling constants come out as a product (ab), and covariance handles cross-products.
56
Variance constant rule | c, Var(c)
For any constant c, Var(c) = 0 (a constant does not vary).
57
Q: X and Y are statistically independent random variables. If Var(X) = 4 and Var(Y ) = 9 what is Var(2X − Y)?
Var(2X − Y) = 4Var(X) + Var(Y) = 4 × 4 + 9 = 25. (The covariance term vanishes because X and Y are independent.)
58
Variance constant a and b rule | Var(aX + b)
For any constants a and b, Var(aX + b) = a²Var(X). Adding a constant does not change the variance.
59
Variance constant a and b rule | Var(aX + bY) or Var(aX - bY)
Var(aX + bY) = a²Var(X) + 2abCov(X, Y) + b²Var(Y)

Var(aX − bY) = a²Var(X) − 2abCov(X, Y) + b²Var(Y)
60
Correlation formula
Corr(X, Y) = Cov(X, Y) / [sd(X) · sd(Y)]
61
Define statistical dependence
Two random variables that have non-zero covariance or correlation are statistically dependent, meaning that knowing the outcome of one of the two random variables gives us useful information about the other.
62
Weighted value
You're multiplying each value of Y by how likely it is, and summing it all up: E[Y] = y₁·P(Y=y₁) + y₂·P(Y=y₂) + ...
63
LIE (Law of Iterated Expectations)
The overall expected value of Y is the average of the expected values of Y given X, weighted by the probability of each X: E(Y) = E(Y|X=1)P(X=1) + E(Y|X=2)P(X=2) + E(Y|X=3)P(X=3)
64
Conditional probability density function
P (y|x) = P (x and y)/P (x)
65
Conditional expectation function
E[Y|X] = y₁f(y₁|X) + y₂f(y₂|X). E.g. if, given X, Y = 1 with probability 0.20 and Y = 2 with probability 0.80, then E[Y|X] = 1 × 0.20 + 2 × 0.80 = 1.80.
66
Population parameters vs sample statistics
The difference between u and û. "Population" = the true, universal model (y = β₀ + β₁x₁ + u). "Sample" = the observable data → hats (ŷ = β̂₀ + β̂₁x₁, û). The sample mean is an ESTIMATOR of the population mean.
67
Simple linear regression formula
yᵢ = β₀ + β₁xᵢ + uᵢ, with E(uᵢ|xᵢ) = 0
68
Formula: Variance of OLS estimators
Var(β̂ⱼ) = σ² / [SSTⱼ(1 − Rⱼ²)], where SSTⱼ is the total sample variation in xⱼ and Rⱼ² is the R-squared from regressing xⱼ on the other regressors
69
Formula: Standard error of OLS estimators
se(β̂ⱼ) = √( σ̂² / [SSTⱼ(1 − Rⱼ²)] )
70
Formula: t-calc for hypothesis testing
t = (β̂₁ − β₁⁰) / se(β̂₁), where β₁⁰ is the value of β₁ under H₀ (usually 0)
71
Formula: degrees of freedom
df = n − k − 1, where:
n = sample size
k = number of slope coefficients (regressors, excluding the intercept)
q = number of restrictions (used in F-tests)
72
Formula: F-stat for joint hypothesis testing
F = [(SSRr − SSRur) / q] / [SSRur / (n − k − 1)]
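A small helper implementing this formula; the function name and the numbers in the example call are my own, purely illustrative:

```python
from scipy import stats

def f_stat(ssr_r, ssr_ur, q, n, k):
    """F = [(SSRr - SSRur)/q] / [SSRur/(n - k - 1)], p-value from F(q, n - k - 1)."""
    df = n - k - 1
    F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
    p = 1 - stats.f.cdf(F, q, df)
    return F, p

print(f_stat(ssr_r=120.0, ssr_ur=100.0, q=2, n=100, k=4))
```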
73
Formula: R-squared
R^2 = SSE/SST = 1 − SSR/SST
74
Formula: adjusted R-squared
Adjusted R² = 1 − [SSR / (n − k − 1)] / [SST / (n − 1)]
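The two fit measures as small functions (a sketch; names and example numbers are my own):

```python
def r_squared(ssr, sst):
    return 1 - ssr / sst

def adj_r_squared(ssr, sst, n, k):
    # penalizes extra regressors via the degrees-of-freedom correction
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

print(r_squared(40.0, 100.0), adj_r_squared(40.0, 100.0, n=50, k=3))
```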
75
ui formula
uᵢ = yᵢ − E(yᵢ|xᵢ)
76
OLS estimator purpose
To find the best-fitting line: the OLS estimators of β₀ and β₁ are the values b₀ and b₁ which minimise the sum of squared residuals (SSR).
77
B(hat)0 OLS formula:
β̂₀ = ȳ − β̂₁x̄
78
B(hat)1 OLS formula
β̂₁ = Cov̂(x, y) / Var̂(x) = σ̂ₓᵧ / σ̂²ₓ
79
Formula: Predicted or fitted values y(hat) - OLS
ŷᵢ = β̂₀ + β̂₁xᵢ
80
Formula: Prediction errors or residuals u(hat)i - OLS
ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ = yᵢ − ŷᵢ
81
SST
Total sum of squares: measure of TOTAL sample variation in y. Measures how much y varies overall (without using the model), e.g. if we only look at people's heights without considering any factors.
82
SSE
Explained sum of squares: measure of sample variation in ŷ. Measures how much of the variation in y the model explains (via X), e.g. height differences explained by age and genetics.
83
SSR
Residual sum of squares: measure of sample variation in û. Measures how much of y is not explained by the model (random error), e.g. two people of the same age and genetics having different heights. When you add more variables, SSR always decreases, regardless of the quality of the variable.
84
SST formula
SST = SSE + SSR
85
SSR formula
SSR = σ̂² × (n − k − 1), where σ̂² is the estimated error variance
86
Adjusted R squared
tells us how well a set of predictor variables is able to explain the variation in the response variable, adjusted for the number of predictors in a model.
87
what does a 95% confidence interval mean
We are 95% confident that the population parameter lies between the two values. If 0 is NOT included in the confidence interval → the x factor is STATISTICALLY SIGNIFICANT, meaning there is an effect (reject the null).
88
Formula: Confidence interval for Bj
β̂ⱼ ± t_(α/2) × se(β̂ⱼ)
89
SSR, SSE, SST formula's written in sum (Σ) form
SSR: Σ(yᵢ − ŷᵢ)²
SSE: Σ(ŷᵢ − ȳ)²
SST: Σ(yᵢ − ȳ)²
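A sketch (made-up data) verifying the decomposition SST = SSE + SSR for an OLS fit with an intercept:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
print(sst, sse + ssr)  # equal: the decomposition holds for OLS with an intercept
```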
90
B(hat) matrix formula
β̂ = (X′X)⁻¹X′y
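A minimal numpy sketch of this matrix formula on simulated data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])    # column of 1s for the intercept
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y  # (X'X)^-1 X'y
print(beta_hat)                              # close to (1, 2, -0.5)
# In practice np.linalg.lstsq(X, y, rcond=None) is more numerically stable.
```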
91
t crit formula
t_crit comes from the t distribution with n − k − 1 degrees of freedom, where n = sample size and k = number of slope coefficients
91
t calc formula
t = (β̂₁ − β₁⁰) / se(β̂₁)
92
Hypothesis testing (8 Steps)
1. Report equations
2. Formulate null (H₀) / alternative (H₁)
3. Pick test statistic under H₀
4. Calculate calc value
5. Calculate crit value
6. Decision rule
7. Decision
8. Conclusion
93
Hypothesis testing (8 steps)
Step 1: Report equations → write the estimated regression model (incl. standard errors in brackets underneath each coefficient)
Step 2: Formulate H₀ and H₁ → state the null and alternative hypotheses clearly
Step 3: Test statistic under H₀ → choose the appropriate test statistic formula (usually a t-statistic)
Step 4: Calculate the calc value
Step 5: Calculate the crit value → find the critical value
Step 6: Decision rule → define your rejection rule (e.g. reject H₀ if |t_calc| > t_crit)
Step 7: Decision → compare your calc and crit values → reject or fail to reject H₀
Step 8: Conclusion → write a contextual interpretation in plain English
94
T-test (steps)
1. Write out the estimated equation (with standard errors in parentheses under the coefficients)
2. H₀: β₁ = 0, H₁: β₁ > 0
3. Test statistic under the null distribution: t = (β̂₁ − β₁⁰)/se(β̂₁) ~ t(n − k − 1)
4. Using a significance level of α = 0.05, compute t_calc = (β̂₁ − 0)/se(β̂₁)
5. t_crit = t_(α, n−k−1)
6. Reject the null if t_calc > t_crit
7. Since t_calc > t_crit, we reject the null hypothesis and conclude that the higher x is, the higher y is, holding all other factors constant
95
Key points when reporting regression
1. State the population model (always include i and u)
2. State the estimated model (always include i and hats, but NEVER u)
3. Report the estimated model coefficients (check decimal places, rounding, and sign)
4. Report the standard errors (check decimal places and rounding; always directly under the coefficient)
5. Report R-squared (usually to the right side or the bottom)
6. Optional significance stars (*p < 0.05, **p < 0.01, ***p < 0.001)
96
Population model regression vs Estimated model regression
population model: yᵢ = β₀ + β₁xᵢ + uᵢ

estimated model: ŷᵢ = β̂₀ + β̂₁xᵢ (no u, because it cannot be observed)
97
p-value
p-value < 0.05 → we reject H₀ (statistically significant); p-value > 0.05 → we fail to reject H₀ (statistically insignificant)
98
E(wageᵢ | femaleᵢ, educᵢ) = β₀ + δ₀femaleᵢ + β₁educᵢ, what are the implications of the dummy variable
If femaleᵢ = 0: E(wageᵢ) = β₀ + β₁educᵢ → represents men

If femaleᵢ = 1: E(wageᵢ) = (β₀ + δ₀) + β₁educᵢ → represents women

* δ₀ measures the difference in intercept between women and men (an intercept shift)
* Only one dummy variable is needed when there are only two possibilities (male/female)
99
Intercept and slope dummy
Intercept dummy (δ₀) = difference in base outcome between groups

Slope dummy (δ₁) = difference in the effect of another variable between groups (e.g. the effect of education for men vs women)
100
Slope dummy
E(wageᵢ | femaleᵢ, educᵢ) = β₀ + δ₀femaleᵢ + β₁educᵢ + δ₁(femaleᵢ × educᵢ)

* Implies:
○ For men: E(wageᵢ) = β₀ + β₁educᵢ
○ For women: E(wageᵢ) = (β₀ + δ₀) + (β₁ + δ₁)educᵢ

* δ₁ captures the difference in slope between men and women.
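A hedged sketch of fitting this intercept-plus-slope-dummy model with statsmodels' formula API; the wage data here is simulated, not a real dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),      # dummy: 1 = female, 0 = male
    "educ": rng.integers(8, 21, n),       # years of education
})
# simulated wages with an intercept shift and a slope shift for women
df["wage"] = (5 + 1.5 * df["educ"] - 2 * df["female"]
              - 0.3 * df["female"] * df["educ"] + rng.normal(0, 2, n))

# 'female:educ' adds the interaction term (the slope dummy)
fit = smf.ols("wage ~ female + educ + female:educ", data=df).fit()
print(fit.params)  # delta0 on 'female', delta1 on 'female:educ'
```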
101
Hypothesis testing for Dummy variables
Test whether the dummy variables jointly affect the dependent variable.

* Joint hypothesis:
○ H₀: δ₀ = δ₁ = 0 (gender has no effect)
○ H₁: at least one ≠ 0 (gender does affect wage)

* Two models:
○ Unrestricted: includes dummy and interaction terms
○ Restricted: includes only non-dummy regressors

* Use an F-test for the joint hypothesis.
102
Hypothesis testing for Dummy variables (unrestricted vs restricted)
Unrestricted: wageᵢ = β₀ + δ₀femaleᵢ + β₁educᵢ + δ₁(femaleᵢ × educᵢ) + uᵢ

Restricted: wageᵢ = β₀ + β₁educᵢ + uᵢ
103
If individual t-tests on δ₀ and δ₁ (dummies) fail to reject H₀, do we conclude the investigation?
No. The joint F-test can still reject: t-tests assess each dummy in isolation, while the F-test assesses them jointly, so the results can differ.
104
Dummy variable trap
including a dummy for every category alongside the intercept, i.e. failing to omit one dummy (the base category) or to drop the intercept, which creates perfect collinearity
105
Imagine 4 industry sectors → transport, consumer products, finance, utility. transᵢ = 1 if transport consprodᵢ = 1 if consumer products financeᵢ = 1 if finance utilityᵢ = 1 if utility now testing with the log (salary) log(salaryᵢ) = β₀ + β₁financeᵢ + β₂consprodᵢ + β₃utilityᵢ + β₄log(salesᵢ) + β₅ROEᵢ + uᵢ using transport as the benchmark (omitted) What are they key points with this regression model, in terms of relationships of dummies with the benchmark and with each other? What does each beta represent?
log(salaryᵢ) = β₀ + β₁financeᵢ + β₂consprodᵢ + β₃utilityᵢ + β₄log(salesᵢ) + β₅ROEᵢ + uᵢ

- β₀ measures the average log salary in the transport industry (the base/omitted dummy)
○ β₁ = difference in log salary between finance and transport
○ β₂ = difference in log salary between consumer products and transport
○ β₃ = difference in log salary between utility and transport
- β₂ − β₁ measures the difference between the average log salary in consumer-product firms and finance firms
106
log-level model
when the dependent variable is in logs: log(ŷ) = β̂₀ + β̂₁x₁ + ... + β̂ₖxₖ

β̂₁ is the change in log(ŷ) as x₁ increases by 1 unit, all else constant. So instead of saying "when x increases by 1, y increases by β̂₁ units on average", we say y changes by about 100·β̂₁ % on average (e.g. β̂₁ = 0.04 → roughly a 4% increase).

For dummy variables, the exact % change in y due to the category is 100(e^β̂ − 1)%
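A tiny check of the exact dummy-variable formula (the coefficient value is made up):

```python
import numpy as np

beta_hat = 0.25                     # hypothetical dummy coefficient on log(y)
pct = 100 * (np.exp(beta_hat) - 1)  # exact % change vs the 100*beta ≈ 25% approximation
print(pct)                          # ≈ 28.4%
```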
107
Example of when you should use the log models in real life scenarios
Total debt and education → log level Wage and GDP → level log GDP and unemployment → log log
108
Quadratic models for log
y = β₀ + β₁x + β₂x² + u

Derivative → marginal effect of x on y: β₁ + 2β₂x

e.g. sleep vs age: sleep might go down with age, but after a point it might go up again → quadratic pattern.
108
when to use logs
- if y must be positive
- if % changes make more sense
- if data is skewed

Don't log years (age, education), % variables, or negatives.
109
log-level vs level-log vs log-log
Log-level model: log(y) = β₀ + β₁x + u → 100·β₁ ≈ % change in y when x increases by 1 unit.

Level-log model: y = β₀ + β₁log(x) + u → β₁/100 = change in y when x increases by 1%.

Log-log model: log(y) = β₀ + β₁log(x) + u → β₁ = % change in y when x increases by 1% (elasticity).

Notes:
- The R² of a regression with log(y) is not comparable to one with y in levels.
- Each tested coefficient (e.g. β₁) holds all other factors constant.
110
Interpretation of quadratic models for log
y = β₀ + β₁x + β₂x² + u

The coefficients of x and x² on their own have no meaningful interpretation. Find where the derivative equals 0: the turning point is at x* = −β₁/(2β₂); it is a maximum when β₂ < 0 and a minimum when β₂ > 0.

When in doubt about adding a quadratic term, add it and check its statistical significance, or see whether it improves the adjusted R².
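A one-line check of the turning point (coefficients made up):

```python
b1, b2 = 0.8, -0.02      # e.g. y rises with x then falls (b2 < 0 → maximum)
x_star = -b1 / (2 * b2)  # where the marginal effect b1 + 2*b2*x equals zero
print(x_star)            # 20.0
```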
111
When to use quadratic model?
1. Is the effect of x on y constant, or does it change over x?
2. Is there a peak/optimal x for y? (e.g. wage and age)
112
Model selection criteria
R^2, Adjusted R^2, or Information criteria (AIC, HQ, BIC)
113
R^2 (model selection)
R² = SSE/SST = 1 − SSR/SST. SSR never goes up when regressors are added, so R² is not reliable when models differ in the number of regressors.
114
Adjusted R^2 (model selection)
Adjusted R² = 1 − [SSR / (n − k − 1)] / [SST / (n − 1)]. Penalizes adding too many variables; only goes up if adding a variable really helps.
115
Information Criteria (IC)
All ICs balance two things:
1. How well the model fits (low SSR = good)
2. How simple the model is (fewer variables = good)

General formula: IC = c + ln(SSR) + P(k)/n. Adding regressors lowers SSR but increases the penalty. We prefer the model with the LOWEST IC value and HIGHEST adjusted R².
116
AIC
AIC = c₁ + ln(SSR) + 2k/n
117
HQ
HQ = c₂ + ln(SSR) + 2k·ln(ln(n))/n
118
SIC/BIC
BIC = c₃ + ln(SSR) + k·ln(n)/n
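A sketch comparing the three criteria; the constants c₁, c₂, c₃ are dropped since they don't affect model rankings, and the SSR/k values are made up:

```python
import numpy as np

def aic(ssr, k, n): return np.log(ssr) + 2 * k / n
def hq(ssr, k, n):  return np.log(ssr) + 2 * k * np.log(np.log(n)) / n
def bic(ssr, k, n): return np.log(ssr) + k * np.log(n) / n

n = 100
for k, ssr in [(2, 55.0), (3, 50.0), (4, 49.5)]:            # candidate models
    print(k, aic(ssr, k, n), hq(ssr, k, n), bic(ssr, k, n))  # pick the lowest
```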
119
Multiple linear regression form
Stacking all n observations, y = β₀ + β₁x⁽¹⁾ + ⋯ + βₖx⁽ᵏ⁾ + u (β₀ adds to every row, i.e. multiplies a column of ones):

| y1 |        | x11 |          | x1k |      | u1 |
| y2 | = β₀ + | x21 | β₁ + ⋯ + | x2k | βₖ + | u2 |
| :  |        | :   |          | :   |      | :  |
| yn |        | xn1 |          | xnk |      | un |
120
3 features of the OLS matrix notation
1. R² = 1 − SSR/SST

2. β̂ = (X′X)⁻¹X′y, and in simple regression β̂₁ = Cov̂(x, y)/Var̂(x). Note the difference between the population parameter β and its OLS estimator β̂: β is constant and does not change, while β̂ is a function of the sample and its value changes for different samples.

3. X′û = 0: the vector of residuals is orthogonal to every column of X (i.e. their products are zero).
121
What does it mean when the OLS estimator is BLUE
'Best Linear Unbiased Estimator' of β: among all linear unbiased estimators, OLS has the smallest variance.
122
What does unbiased estimator mean
β̂ is an unbiased estimator of the parameter of interest if its EXPECTED VALUE equals the PARAMETER OF INTEREST:

E(β̂) = β, or in matrix form E(β̂) = E[(X′X)⁻¹X′y] = β

β̂ = estimator, β = parameter of interest
123
5 Assumptions (for unbiasedness and valid inference)
A1: The population model is linear in parameters: y = Xβ + u
A2: The columns of X are linearly independent (no perfect collinearity)
A3: The conditional mean of the errors is zero: E(u|X) = 0
A4: Homoskedasticity and no serial correlation: Var(u|X) = σ²Iₙ
A5: The errors are normally distributed: u|X ~ N(0, σ²Iₙ)

If these assumptions do not hold, the null distribution and t statistics are no longer reliable.
124
Law of iterated expectations (LIE)
E(Y) = E_X[E(Y | X)]
125
The OLS estimator is unbiased under 3 assumptions (shown via the LIE)
E(β̂) = E[(X′X)⁻¹X′y] = β, given:
A1: The population model is linear in parameters: y = Xβ + u
A2: The columns of X are linearly independent
A3: The conditional mean of the errors is zero: E(u|X) = 0
126
Variance of the OLS estimator
Var(β̂) = σ²(X′X)⁻¹. Choose the estimator with the smallest variance.
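Continuing the numpy sketch from the β̂ card: estimating σ² and the variance matrix (simulated data, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
resid = y - X @ beta_hat
k = X.shape[1] - 1                        # number of slope coefficients
sigma2_hat = resid @ resid / (n - k - 1)  # sigma^2 estimate = SSR / (n - k - 1)
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(var_beta)))         # standard errors of beta_hat
```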
127
Covariance of the OLS estimator
The matrix contains the variances of each β̂ⱼ on the diagonal and the covariances between β̂ⱼ and β̂ₖ off the diagonal. For the 2×2 case:

| Var(β̂₀)       Cov(β̂₀, β̂₁) |
| Cov(β̂₁, β̂₀)  Var(β̂₁)      |
128
Sampling distribution of the OLS estimator
β̂ ~ N(β, σ²(X′X)⁻¹)
129