# 07 Linear Regression Flashcards

1
Q

σ

A

Population standard deviation

2
Q

s

A

Sample standard deviation:

An estimator of the population standard deviation

3
Q

s_y

A

s_y is the estimate of the population standard deviation for the random variable Y of the population from which the sample was drawn

4
Q

SE()

A

Standard error of an estimator:
An estimator of the standard deviation of the estimator

SE(avg(Y)) = σ̂_avg(Y) = s_y / √n
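Cards 2 to 4 can be expressed in a few lines of Python. This is a minimal sketch that uses only the standard library. `statistics.stdev` uses the n - 1 divisor (the sample standard deviation s), while `statistics.pstdev` uses n. The data values are illustrative only and do not come from the source.

```python
import math
import statistics


def standard_error_of_mean(sample: list[float]) -> float:
    """SE(avg(Y)) = s_y / sqrt(n), where s_y uses the n - 1 divisor."""
    n = len(sample)
    s_y = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s_y / math.sqrt(n)


y = [2.0, 4.0, 5.0, 4.0, 5.0]  # illustrative data
print(statistics.stdev(y))        # s_y, the estimate of the population sd
print(standard_error_of_mean(y))  # SE(avg(Y))
```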

5
Q

μ

A

Population mean

6
Q

u

A

All factors other than X that affect Y (the error term)

7
Q

synonyms for “dependent variable” and “independent variable”

A

dependent variable vs. independent variable
explained variable vs. explanatory variable
predicted variable vs. predictor variable
response variable vs. control variable
regressand vs. regressor

8
Q

A normally distributed variable (X) can be made standard normal by:

A

Z = (X - μ) / σ

For the sample average of n observations:
Z = (avg(X) - μ) / (σ / root(n))
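Here is a minimal sketch of the two standardizations. A single observation is divided by σ. The sample average of n observations is divided by its own standard deviation, σ / root(n). The function names and numbers are illustrative.

```python
import math


def z_observation(x: float, mu: float, sigma: float) -> float:
    """Standardize a single observation X ~ N(mu, sigma^2)."""
    return (x - mu) / sigma


def z_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Standardize the average of n i.i.d. draws; its sd is sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


print(z_observation(13.0, mu=10.0, sigma=2.0))        # 1.5
print(z_sample_mean(10.5, mu=10.0, sigma=2.0, n=16))  # 1.0
```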

9
Q

The sample average is normally distributed whenever:

A
• Xi is normally distributed (then avg(X) is exactly normal)
• n is large (then avg(X) is approximately normal, by the CLT)
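The CLT case can be illustrated with a small simulation. It assumes Exponential(1) data, which are clearly non-normal and have mean 1 and standard deviation 1. With n = 50, the sample averages should center on 1 with a spread close to 1 / root(50). Roughly 95% of them should fall within 1.96 standard errors of the mean. The seed, n and the number of replications are arbitrary choices.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` samples of size n from Exponential(1) and return their averages."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


n = 50
means = simulate_sample_means(n, reps=5000)
se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
# Share of sample averages within 1.96 standard errors of the true mean 1
share_within = sum(abs(m - 1.0) < 1.96 * se for m in means) / len(means)
print(statistics.fmean(means), statistics.stdev(means), share_within)
```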

10
Q

T variable

A

T = (avg(X) - μ) / (s_x / root(n))
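The same idea applies when σ is replaced by its estimate s_x. This is a minimal sketch; `t_statistic` and the data in the usage line are illustrative.

```python
import math
import statistics


def t_statistic(sample: list[float], mu0: float) -> float:
    """T = (avg(X) - mu0) / (s_x / sqrt(n)), with s_x the sample standard deviation."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


print(t_statistic([2.0, 4.0, 5.0, 4.0, 5.0], mu0=3.0))
```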

11
Q

SLRM

A

Simple Linear Regression Model

12
Q

The sum of squared prediction mistakes over all n observations

A

sum[(Yi - E(β0) - E(β1)Xi)^2]

13
Q

E(β0)

A

E(β0) = avg(Y) - E(β1) * avg(X)

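Obtained by minimizing sum[(Yi - E(β0) - E(β1)Xi)^2]: set its derivatives with respect to E(β0) and E(β1) equal to zero (the OLS first-order conditions) and solve.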

14
Q

E(β1)

A

E(β1) = sum[(Xi - avg(X)) (Yi - avg(Y))] /
sum[(Xi - avg(X))^2]

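Obtained by minimizing sum[(Yi - E(β0) - E(β1)Xi)^2]: set its derivatives with respect to E(β0) and E(β1) equal to zero (the OLS first-order conditions) and solve.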

E(β1) = r_{XY} * s_Y / s_X

15
Q

If û_i is positive, the line ____ Yi

A

If û_i is positive, the line underpredicts Yi (since û_i = Yi - Ŷi > 0)

16
Q

By the definition of û and the first OLS first-order condition, the sum of the prediction errors is …

A

By the definition of û and the first OLS first-order condition, the sum of the prediction errors is zero

Sum(û_i) = 0

17
Q

The sample covariance between the independent variable and the OLS residuals is …

A

The sample covariance between the independent variable and the OLS residuals is zero.

18
Q

The point … is always on the regression line (OLS)

A

The point (avg(X), avg(Y)) is always on the regression line (OLS)

19
Q

different goals of regression

A

Among others:

• Describe data set
• Predictions and forecasts
• Estimate causal effect
20
Q

Causality

A

Causality is the effect measured in an ideal randomized controlled experiment

21
Q

The OLS estimator is unbiased, consistent and has an asymptotically normal sampling distribution if:

A

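The OLS estimator is unbiased, consistent and has an asymptotically normal sampling distribution if:
- Random sampling
- Large outliers are unlikely
- The conditional mean of u_i given X_i is 0: E(u_i | X_i) = 0
  Example: if u contains ability (abil) and X is years of education (educ), the assumption requires E(abil | educ = 8) = E(abil | educ = 16), i.e. average ability is the same at every education level.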

22
Q

The OLS estimator is ___, ____ and has _____ if:

• Random sampling
• Large outliers are unlikely
• The conditional mean of u_i given X_i is 0
A

The OLS estimator is unbiased, consistent and has an asymptotically normal sampling distribution if:

• Random sampling
• Large outliers are unlikely
• The conditional mean of u_i given X_i is 0
23
Q

(OLS) When dealing with outliers one may want ______

A

When dealing with outliers one may want to report the OLS regression both with and without the outliers

24
Q

OLS is the most efficient (the one with the lowest variance) among all linear unbiased estimators whenever:

A

OLS is the most efficient (the one with the lowest variance) among all linear unbiased estimators whenever:

• The three OLS least squares assumptions hold
• The error is homoskedastic

This result is the Gauss-Markov theorem.
25
Q

TSS

A

Total sum of squares:
Sum[(Yi - avg(Y))^2]

TSS = ESS + SSR

26
Q

ESS

A

Explained sum of squares:

Sum[(Ŷi - avg(Y))^2]

27
Q

SSR

A

Sum of squared residuals:

Sum[û_i^2]

28
Q

R^2

A

The regression R^2 is the fraction of the sample variance of Yi explained by Xi:

R^2 = ESS / TSS = 1 - SSR / TSS

R^2 = 0: none of the variation in Yi is explained by Xi
R^2 = 1: all the variation is explained by Xi; all the data points lie on the OLS line.
A high R^2 means that the regressor is good at predicting Yi. This is not necessarily the same as a "good" regression: a high R^2 does not imply that the slope measures a causal effect.

29
Q

SER

A

The standard error of the regression (SER) is an estimator for the standard deviation of the regression error u_i.

SER = √(SSR / (n - 2))

Its square, s_u^2 = SSR / (n - 2), estimates the variance of u_i.

It measures the spread of the observations around the regression line.

30
Q

If the independent variable is multiplied by some nonzero constant c, then the OLS slope coefficient is _____

A

If the independent variable is multiplied by some nonzero constant c, then the OLS slope coefficient is divided by c.

31
Q

Homoskedasticity

A

The error u has the same variance given any value of the explanatory variable; in other words: Var(u|X) = σ^2

Homoskedasticity is not required for unbiased estimates, but it is assumed in the standard (homoskedasticity-only) formula for the variance of the OLS estimators. The assumption is added to keep that variance expression simple.

32
Q

The larger the variance of X, the ____ the variance of E(β1)

A

The larger the variance of X, the smaller the variance of E(β1)

33
Q

Var(E(β1))

A

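Var(E(β1)) = (1/n) * Var[(Xi - μ_X) u_i] / [Var(Xi)]^2 (large-sample formula; derivation in Appendix 4.3)

Under homoskedasticity this simplifies to σ_u^2 / (n * Var(Xi)).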

34
Q

s_xy

A

Sample covariance

1 / (n - 1) * sum{(Xi - avg(X))(Yi - avg(Y))}

35
Q

s^2_X

A

Sample variance of X

1 / (n - 1) * sum{(Xi - avg(X))(Xi - avg(X))}

36
Q

sample correlation coefficient

A

r_{XY} = s_{XY} / (s_X * s_Y)

37
Q

consistency

A

An estimator is consistent if it converges in probability to the true parameter as n increases: the probability that it lies within any fixed distance of the true value approaches 1.
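This simulation sketch illustrates consistency for the sample average. It assumes Uniform(0, 6) data, so the true mean is μ = 3. As n grows, the typical distance between avg(X) and μ shrinks toward zero. The distribution, seed and replication count are arbitrary choices.

```python
import random
import statistics


def mean_abs_error_of_average(n, reps=500, mu=3.0, seed=2):
    """Average |avg(X) - mu| over `reps` samples of size n from Uniform(0, 2 * mu)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.uniform(0.0, 2 * mu) for _ in range(n))
        errors.append(abs(sample_mean - mu))
    return statistics.fmean(errors)


for n in (10, 100, 1000):
    print(n, mean_abs_error_of_average(n))
```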

38
Q

Normality assumption

A

The population error u is independent of explanatory variables and is Normal(0, σ^2)

• Whenever Y takes on only a few values, it cannot have anything close to a normal distribution.
• The exact normality of the OLS estimators depends on the normality of the error.
• If β̂ is not normally distributed, the t-statistic does not have an exact t-distribution.
• Assuming u is normal is the same as assuming that Y given X is normal: Y | X ∼ Normal(β0 + β1X, σ^2).
• In large samples we can invoke the CLT to conclude that the OLS estimators are asymptotically normal.
39
Q

E(β1) ∼

A

Normal[β1, Var(E(β1))]

Thus (E(β1) − β1) / std(E(β1)) ∼ Normal(0, 1)

This comes from:
• A random variable which is a linear function of a normally distributed variable is itself normally distributed.
• If we assume that u ∼ N(0, σ^2), then Yi is normally distributed (conditional on Xi).
• Since the estimators E(β0) and E(β1) are linear functions of the Yi's, they are normally distributed as well.

40
Q

In general the t-statistic has the form:

A

t = (estimator - hypothesised value) / standard error of the estimator

41
Q

A coefficient can be statistically significant either because ____

A

A coefficient can be statistically significant either because the coefficient is large, or because the standard error is small.

42
Q

(OLS) Which standard errors are preferred?

A

Heteroskedasticity-robust standard errors

In econometric applications the errors are rarely homoskedastic and normally distributed. However, as long as n is large and we use heteroskedasticity-robust standard errors, we can compute t-statistics, p-values and confidence intervals as usual, based on the normal approximation.

43
Q

(OLS) Most often the violated assumption is ___

A

(OLS) Most often the violated assumption is the zero conditional mean assumption: X is often correlated with the error term.

44
Q

Sum[ (Xi - avg(X)) (Yi - avg(Y)) ] = Sum[ … ]

A

Sum[ Xi (Yi - avg(Y)) ]
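The identity holds because Sum[(Xi - avg(X))(Yi - avg(Y))] = Sum[Xi(Yi - avg(Y))] - avg(X) * Sum[Yi - avg(Y)], and the last sum is zero. Here is a quick numerical check on illustrative data:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)

lhs = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
rhs = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
# The two sums agree because avg(X) * sum(Yi - avg(Y)) = avg(X) * 0
print(lhs, rhs)
```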