Research Skills Part 3 Flashcards

1
Q

Important note on correlation

A

Zero correlation means that there is no linear relation between x and y. But it does not imply independence!!!

Correlation is not causation!!!
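A minimal NumPy sketch of the first pitfall: here y is fully determined by x (so they are clearly dependent), yet the Pearson correlation is exactly zero because the relation is quadratic, not linear.

```python
import numpy as np

# y is a deterministic function of x, so x and y are clearly dependent...
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# ...yet the (linear) correlation is exactly 0 for this symmetric x.
corr = np.corrcoef(x, y)[0, 1]
print(corr)  # → 0.0 (up to floating-point noise)
```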

2
Q

Name 3 structures of correlation

A
  1. x causes y
  2. y causes x
  3. x causes y and y causes x > self-reinforcement
3
Q

In a univariate regression…

A

Correlation determines the sign of the regression coefficient, and CORR(x,y)^2 = R^2

4
Q

What is RSS?

A

Residual Sum of Squares = sum of all squared residuals = sum (y – y-hat)^2

5
Q

Give the formula for the beta coefficient

A

= cov(x,y) / var(x)
= (SD(y) / SD(x)) * CORR(x,y)
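Both forms of the formula give the same slope, and they match a least-squares fit. A minimal NumPy sketch (the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # true slope of 2 plus noise

# Population-style (ddof=0) moments; the two formulas are algebraically identical.
beta_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
beta_sd = (np.std(y) / np.std(x)) * np.corrcoef(x, y)[0, 1]

# Cross-check against NumPy's least-squares fit (highest degree first).
beta_ols = np.polyfit(x, y, 1)[0]
print(beta_cov, beta_sd, beta_ols)  # all three agree
```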

6
Q

What is TSS?

A

Total Sum of Squares = sum (y – y-bar)^2

7
Q

What is ESS?

A

Explained Sum of Squares = sum (y-hat – y-bar)^2

8
Q

Give the formula for R2

A

TSS = ESS + RSS
R2 = 1 – RSS/TSS = ESS/TSS
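The decomposition can be verified numerically. A NumPy sketch with simulated data, also confirming the univariate identity R^2 = CORR^2:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

# Fit a univariate OLS line and form the fitted values y-hat.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

r2 = 1 - rss / tss
print(np.isclose(tss, ess + rss))                       # TSS = ESS + RSS
print(np.isclose(r2, ess / tss))                        # both R^2 forms agree
print(np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2))     # univariate: R^2 = CORR^2
```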

9
Q

What are the drawbacks of R2?

A
  1. It depends on how the dep var is defined (changes versus levels, wages versus log wages, etc.). It is only comparable across models with the same dep var.
  2. It always increases if you add more vars, even if they’re useless > compute Adj-R2
10
Q

Note on (adj) R2

A

(adj) R2 is useful for comparing the relative performance of 2 models with the same dep var. However, it is not useful for evaluating absolute performance.

11
Q

Name 3 factors reducing the accuracy of OLS estimate

A
  1. Large error variance (s^2) > large influence of variables that are not in the model > OMITTED VARIABLE BIAS!!!!!
  2. Small number of observations
  3. Little spread in indep var > without variation in x one cannot explain variation in y, but too much variation is also bad
12
Q

What is the F-test?

A

The F-test of overall significance indicates whether your linear regression model provides a better fit to the data than a model that contains no independent variables.

- Multiple regression: the p-value of the F-test is the p-value of the null hypothesis that all slope coefficients are jointly equal to zero

13
Q

When is omitted variable bias more severe, and what is a solution to this problem?

A

The problem is more severe when the x variable in the regression is highly correlated with the omitted variable z.

Solution: multivariate regression > include z as an additional regressor

14
Q

What are the assumptions of the linear regression model?

A
  1. residuals have a mean of 0 and are independent of the regressors
  2. residuals have a constant variance = homoskedasticity
  3. residuals are uncorrelated = no autocorrelation
  4. there’s no exact linear relation between the independent variables

Under these assumptions, the OLS estimators (betas) are BLUE = best linear unbiased estimator for the true beta.
Only then are the routinely computed S.E.s and t-stats correct.

  5. residuals follow a normal distribution

If the errors are correlated with any of the independent vars, OLS is biased and inconsistent > wrong coefficient estimates!!
15
Q

How do you test for non-linearity and how do you fix non-linearity issues?

A

Test: Ramsey’s RESET test to examine linearity of regression

Solution: use data transformation > take logs or add a squared term

16
Q

Heteroskedasticity causes / consequences / testing / solutions

A

Causes:
- changing variance over time (time series)
- changing variance across firms (cross sectional)

Consequences:
- usual S.E. and t-stats not valid
- BUT, no impact on coefficients!!

Testing:
- visual testing
- statistical tests

Solutions:
- use corrected S.E.s
- use log transform or scale variables by size
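The "corrected S.E.s" fix can be sketched by hand. Below is a minimal NumPy implementation of White (HC0) robust standard errors, with simulated heteroskedastic data; in practice a library routine (e.g. statsmodels' `fit(cov_type='HC1')`) would be used instead. Note the coefficients are the plain OLS ones, matching the "no impact on coefficients" point above.

```python
import numpy as np

def white_se(X, y):
    """OLS coefficients with heteroskedasticity-robust (White/HC0) S.E.s.
    X is an (n, k) design matrix that already includes a constant column."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # ordinary OLS coefficients
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # "Meat": X' diag(e_i^2) X — this is where robustness to changing variance enters.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
x = rng.normal(size=500)
# Error variance grows with |x|: classic heteroskedasticity.
y = 1.0 + 0.5 * x + rng.normal(size=500) * (0.5 + np.abs(x))
X = np.column_stack([np.ones(500), x])
beta, se = white_se(X, y)
print(beta, se)  # coefficients are unchanged vs plain OLS; only the S.E.s are corrected
```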

17
Q

Autocorrelation causes / consequences / testing / solutions

A

Causes:
- seasonality effects
- lead/lag effects > over/underreaction to news
- model misspecification

Consequences:
- usual S.E.s and t-stats not valid
- positive autocorr: S.E. understated and t-stats too big
- negative autocorr: S.E. overstated and t-stats too small

Testing:
- visual testing
- statistical tests

Solutions:
- add lagged dep/indep vars
- include dummy variable
- use corrected S.E.s
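One common statistical test here is the Durbin–Watson statistic, which is simple enough to compute by hand. A minimal NumPy sketch (the residual series are made up for illustration):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: values near 2 suggest no autocorrelation,
    toward 0 positive autocorrelation, toward 4 negative autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Residuals that stay close to their previous value: positive autocorrelation.
pos = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95])
# Residuals that alternate in sign: negative autocorrelation.
neg = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

print(durbin_watson(pos))  # well below 2
print(durbin_watson(neg))  # well above 2
```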

18
Q

Multicollinearity causes / consequences / checking / solutions

A

Causes:
- 2 or more indep vars are highly correlated

Consequences:
- low t-stats and high S.E.s for individual coefficients
- weird signs or magnitudes of coefficient estimates

Checking:
- compute CORR matrix
- compute Variance Inflation Factor. VIF > 10 is a problem

Solutions:
- drop one variable > can lead to omitted variable bias
- collect more data to increase accuracy
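The VIF check can be computed directly: regress each indep var on the others and take 1/(1 − R^2). A NumPy sketch with made-up data, where one regressor is nearly a copy of another:

```python
import numpy as np

def vif(X, j):
    """Variance Inflation Factor of column j: regress X[:, j] on the
    remaining columns (plus a constant) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)               # independent of x1 → VIF near 1
x3 = x1 + 0.05 * rng.normal(size=300)   # nearly a copy of x1 → VIF >> 10
X = np.column_stack([x1, x2, x3])
print(vif(X, 0), vif(X, 1), vif(X, 2))
```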

19
Q

Non-normality causes / consequences / testing / solutions

A

Causes:
- extreme observations
- bounded dep var
- binary dep var
- discrete dep var

Consequences:
- large sample > no problem
- small sample > inference about coefficients wrong and t-stats invalid
- BUT, no impact on coefficients!!

Testing:
- JB statistic to test for normal distribution

Solutions:
- winsorize / truncate
- use log transformation
- use other regression model > tobit, probit/logit
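The JB (Jarque–Bera) statistic mentioned above combines sample skewness and kurtosis; large values reject normality. A minimal NumPy sketch:

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the sample kurtosis (3 for a normal)."""
    n = len(x)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# A symmetric sample: skewness 0, so only the kurtosis term contributes.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jarque_bera(x))  # → 0.3520833... (skew = 0, kurtosis = 1.7)
```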

20
Q

If the error term in a linear regression model is not normally distributed…

A. … the OLS estimator is biased
B. … routinely calculated S.E.s are incorrect
C. … we need to rely on asymptotic theory to perform valid tests
D. … we need to take the log of the dependent variable

A

C

21
Q

In a linear regression model, if the slope coefficient of X has a t-stat of 3.0…

A. we accept the hypothesis that X has an impact
B. we accept that X is significant
C. we reject the null hypothesis that X is insignificant
D. we reject the null hypothesis that X has no impact

A

D

22
Q

What do endogeneity and simultaneity mean?

A

Endogeneity broadly refers to situations in which an explanatory variable is correlated with the error term.

Simultaneity is a common cause of endogeneity. It arises when one or more of the predictors (e.g., the treatment variable) is determined by the response variable (Y). In simple terms, X causes Y and Y causes X.

23
Q

Which problem makes the OLS estimator biased?

A. simultaneity between x and y
B. heteroskedasticity
C. a small sample
D. all of these

A

A

24
Q

Which statement(s) is/are correct?

A. R2 is the most important statistic of a regression
B. R2 tells us how well the model fits the data
C. a larger R2 is always better
D. if R2=0, we have a useless model

A

B & D

25
Q

What increases the precision of the OLS estimator?

A. having more observations
B. having more variation in X
C. having less correlation between X and other regressors
D. having a smaller error variance

A

All of them