S7 OLS Regression & Assumptions (complete) Flashcards

1
Q

Correlation vs. Regression

What is the difference?

A

Correlation:
tells us how strongly associated two variables are

Regression:
can tell us, on average, by how much a one-unit increase in the IV increases/decreases the predicted value of the DV

-> Regression gives us more precise information on the strength of a relationship

-> bivariate regression finds the best-fitting line through the data; the best fit is the line that minimizes the vertical (Y) distance from each observation to the line -> OLS is the method that finds this line

2
Q

Ordinary Least Squares

A

OLS minimizes the sum of squared prediction errors, Σ(yi − ŷi)²

b = Cov(x,y) / Var(x) (-> see formula sheet)
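
A minimal sketch in Python (simulated data; numpy is an assumed tool here, not part of the card) showing that the covariance formula reproduces a standard least-squares fit:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)  # true a = 2, b = 0.5

    # Slope from the covariance formula: b = Cov(x, y) / Var(x)
    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()              # line passes through (x-bar, y-bar)

    # Cross-check against NumPy's least-squares fit
    b_check, a_check = np.polyfit(x, y, deg=1)
    print(b, a)                              # ~0.5, ~2.0
    print(b_check, a_check)                  # same values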

3
Q

Standard error of the slope

Why does the slope have a standard error, and how is it calculated? -> Hint: RMSE

How can we build a confidence interval around the slope?

A

Coefficients (b) are also sample statistics -> subject to random sampling error

The standard error of the slope b is given by the root mean square error (RMSE) over the variation in x: s.e.(b) = RMSE / sqrt( Σ(xi − x̄)² )

  • RMSE is given by the root of the error sum of squares (ESS) over the adjusted sample size, i.e. RMSE = sqrt( ESS / (n − 2) ) in the bivariate case; the RMSE is a useful measure of goodness of fit

95% CI: beta = b ± 1.96 × s.e.(b)
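
A minimal sketch (simulated data) of the RMSE, the standard error of the slope, and the resulting 95% confidence interval:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x = rng.normal(size=n)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)  # true slope = 0.5

    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)

    ess = np.sum(resid ** 2)                 # error (residual) sum of squares
    rmse = np.sqrt(ess / (n - 2))            # adjusted for 2 parameters (a & b)
    se_b = rmse / np.sqrt(np.sum((x - x.mean()) ** 2))

    ci = (b - 1.96 * se_b, b + 1.96 * se_b)  # built from s.e.(b), not the RMSE
    print(rmse, se_b, ci)                    # the CI should cover 0.5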

4
Q

Hypothesis testing with regression

2 ways to do it

A
  1. Calculate degrees of freedom: df = n − # of estimated parameters (here 2: a & b)
  2. Form a null hypothesis: e.g. no effect -> beta = 0, the regression line is horizontal
  3. Evaluate: to reject the null, the confidence interval around b must exclude zero

Alternatively: calculate a t-ratio

t = (b − beta(H0)) / s.e.(b)

with beta(H0) usually zero

-> If |t| is greater than about 2 (more precisely, 1.96), p is under .05 and we can reject the null hypothesis
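
A minimal sketch (simulated data; statsmodels is an assumed tool) of the t-ratio for the null hypothesis beta = 0:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.normal(size=100)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

    X = sm.add_constant(x)                 # intercept column -> parameters a & b
    res = sm.OLS(y, X).fit()               # df = n - 2 here

    t_slope = res.params[1] / res.bse[1]   # t = (b - 0) / s.e.(b)
    print(t_slope, res.tvalues[1])         # identical by construction
    print(res.pvalues[1])                  # p < .05 whenever |t| > ~1.96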

5
Q

What are the two general ways to measure the performance of an estimator?

A

two general ways to measure the performance of an estimator:

> Bias:
- a systematic tendency to produce estimates that are too high or too low relative to the true value
- minimize the bias

> Efficiency
- an efficient estimator yields standard errors that are as small as possible
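
A minimal simulation sketch contrasting the two criteria: both estimators of the mean below are unbiased, but only the sample mean is efficient:

    import numpy as np

    rng = np.random.default_rng(3)
    true_mu = 5.0
    draws = rng.normal(loc=true_mu, size=(10_000, 50))  # 10,000 samples, n = 50

    est_mean = draws.mean(axis=1)   # estimator 1: the sample mean
    est_first = draws[:, 0]         # estimator 2: just the first observation

    # Bias: both average out to roughly the true value -> both unbiased...
    print(est_mean.mean() - true_mu, est_first.mean() - true_mu)
    # Efficiency: ...but the sample mean has a much smaller standard error
    print(est_mean.std(), est_first.std())  # ~1/sqrt(50) vs. ~1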

6
Q

What are the 5 OLS assumptions?

A
  1. Linearity: The dependent variable y is a linear function of the x’s, plus a population error term.
  2. Mean independence: The mean value of the error does not depend on any of the x’s.
  3. Homoscedasticity (constant variance): The variance of the error does not depend on the x’s; it is constant.
  4. Uncorrelated disturbances: The value of the error for any observation is uncorrelated with the value of the error for any other observation.
  5. Normal disturbance: The disturbances/errors are distributed normally.
7
Q

Which OLS assumptions guarantee what?

A

Assumptions (1) linearity & (2) mean independence -> linear and unbiased estimates

Assumptions (3) homoscedasticity and (4) uncorrelated disturbances -> efficient model -> “best”

Together: BLUE (Best Linear Unbiased Estimator)

Adding assumption (5) normality implies that a t- or z-table can be used to calculate p-values

8
Q

Mean Independence

A

Most important assumption because violations

  • can generate LARGE bias in the estimates and often occur
  • cannot be tested for without additional data -> if your x’s are related to something outside of the model, they might be picking up its effect on y as well as their own!
    -> this is called omitted variable bias
9
Q

Dangers of Violating Mean Independence

A

Omitted variable bias
- can generate LARGE bias in the estimates and often occurs
-> if your x’s are related to something outside of the model, they might be picking up its effect on y as well as their own!

  • cannot be tested for without additional data

Endogeneity bias
-> explanatory variable is correlated with the error term

  • often reverse causation or selection effects
  • If y has a causal effect on any of the x’s, then the error term will indirectly affect the x’s

Measurement Error
- If the x’s are measured with error, that error becomes part of the error term
- Because the measurement error affects the measured value of the x’s, the error term is related to the x’s
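
A minimal simulation sketch of omitted variable bias: a confounder z is related to both x and y, and leaving it out of the model biases the slope on x:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 5_000
    z = rng.normal(size=n)                      # the omitted variable
    x = 0.8 * z + rng.normal(size=n)            # x is related to z
    y = 1.0 + 0.5 * x + z + rng.normal(size=n)  # true effect of x is 0.5

    short = sm.OLS(y, sm.add_constant(x)).fit()                       # z omitted
    full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # z included

    print(short.params[1])  # biased upward: x also picks up z's effect on y
    print(full.params[1])   # ~0.5 once the confounder is controlled for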

10
Q

Assumption (3) Homoscedasticity

A

Wanted: homoscedasticity;
bad brother = heteroscedasticity

-> Non-constant variance (a residual scatterplot that fans out like a “joint”)
-> Biased standard errors (in either direction)
- easily fixed with “robust standard errors”
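
A minimal sketch (simulated data; statsmodels assumed) of heteroskedastic errors and the robust-standard-error fix:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, size=1_000)
    y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)  # error variance grows with x

    X = sm.add_constant(x)
    classic = sm.OLS(y, X).fit()                   # assumes constant variance
    robust = sm.OLS(y, X).fit(cov_type="HC1")      # heteroskedasticity-robust

    print(classic.params[1], robust.params[1])     # identical coefficients
    print(classic.bse[1], robust.bse[1])           # only the s.e.'s differ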

11
Q

(4) Uncorrelated Errors

A

The disturbances (errors) for any two observations must be uncorrelated.

Correlated errors can arise from connected observations (e.g. husbands and wives), causal effects (e.g. peer pressure), or serial correlation (measuring the same unit over time)

  • correlated errors do not bias the coefficient estimates

But they do shrink the standard errors:

  • observations are assumed to be more independent than they actually are
  • DANGER: Type 1 error!! False positive
  • the solution depends on the type of correlation in the errors, e.g. “clustered standard errors”
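
A minimal sketch (simulated data; statsmodels assumed) of cluster-correlated errors: observations share a group-level shock, so naive standard errors overstate the information in the sample:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n_groups, per_group = 50, 20
    groups = np.repeat(np.arange(n_groups), per_group)

    x = np.repeat(rng.normal(size=n_groups), per_group)      # same x per group
    shock = np.repeat(rng.normal(size=n_groups), per_group)  # shared group error
    y = 1.0 + 0.5 * x + shock + rng.normal(size=n_groups * per_group)

    X = sm.add_constant(x)
    naive = sm.OLS(y, X).fit()
    clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})

    # Naive s.e. treats 1,000 rows as independent -> too small -> Type 1 errors
    print(naive.bse[1], clustered.bse[1])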
12
Q

Normality

A

The population disturbance term must be normally distributed

Note that only disturbances, not the variables, must be normally distributed (Big misconception!!)

Normality is the least important assumption because OLS can be BLUE without it (unbiased and efficient)

Normally distributed disturbances simply enable the use of a z- or t-table for the p-values. In large samples the central limit theorem makes the sampling distribution of the coefficients approximately normal anyway, so we don’t even care about normality of the disturbances

13
Q

Which pitfalls can bias estimates, and which can influence the standard errors? (the assumptions + beyond)

A

Pitfalls that can bias estimates:

(1) Non-linearity (misspecification)
(2) Violation of mean independence -> omitted variable bias (misspecification)

  • endogeneity (= explanatory variable is correlated with the error term)
  • measurement error

Pitfalls that can influence the standard errors:
- Outliers (sometimes from skew)
- heteroskedasticity
- correlated errors
- multicollinearity

14
Q

Consider a linear function, y = α + βx. What does the constant α signify? (Select ALL the answers that apply)

a. The value of x when the y-intercept is 0
b. The value of y when x is 0
c. The value of the residuals when x is 0
d. The Y-intercept

A

Correct: b & d

15
Q

Which of these statements does not form part of the OLS assumptions?

Select one:

a. Mean independence. The mean value of ε does not depend on any of the x’s. Assume that E(ε) = 0.

b. Linearity. The dependent variable y is a linear function of the x’s, plus a population error term, ε.
y = α + β1 x1 + β2 x2 + ε

c. Normality. The dependent variable is approximately normally distributed around its mean.

d. Uncorrelated disturbances. The value of ε for any observation is uncorrelated with the value of ε for any other observation.

A

Correct: C

16
Q

Which of the following accurately describes a p-value? Check all that apply.

a. For the same measured relationship, a larger sample size will lead to a smaller p-value.

b. p=0.01 means that there is a 1% chance that our alternative hypothesis is true.

c. p=0.01 means that there is a 1% chance that we would see the measured relationship due to random chance.

d. If our t-value is large, our p-value will also be large.

A

Correct: A & C