Midterm Flashcards

(64 cards)

1
Q

Causal relationship

A

a change in one variable (the action) CAUSES a change in another variable (the result)

2
Q

Correlation

A

X and Y move together, but the relationship between them may be partially explained by other factors; correlation alone does not establish causation

3
Q

Error Term

A
  • Deviation of the observed Y from the true regression line
  • Represented by ε_i in the structural equation
  • A theoretical representation of the unobserved factors that account for the variation in Y the model does not explain (omitted variables are absorbed by the error term)

4
Q

Residual

A
  • Deviation of the observed Y from the estimated regression line
  • Calculated as e_i = Y_i − Ŷ_i
  • Observed − Estimated

5
Q

R^2

A

Goodness of Fit

  • Ranges from 0 – 1
  • Closer to 1 = better fit
  • Adjusted R²: includes a “penalty” for adding additional regressors
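
For reference, a standard statement of the adjusted R-squared, which the card does not spell out (n = number of observations, k = number of regressors):

$$\bar{R}^2 = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)}$$
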
6
Q

null hypothesis

A

The null hypothesis states “no difference” or “no effect”

7
Q

alternative hypothesis

A

The alternative hypothesis states there is a difference/effect

8
Q

T-test

A
  • If the absolute value of the t-stat is bigger than the critical value (e.g. 1.96), we can reject the null hypothesis and accept the alternative that the true coefficient is not zero
  • Our variable is statistically significant at the 5% level of significance
  • This also means the p-value is smaller than 5% (0.05).

9
Q

T-test formula

A

Divide the coefficient by the standard error to get the t-value
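
In symbols, with illustrative numbers not taken from the card:

$$t = \frac{\hat{\beta}}{SE(\hat{\beta})}$$

For example, a coefficient of 2.0 with a standard error of 0.5 gives t = 4.0 > 1.96, so the coefficient is significant at the 5% level.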

10
Q

F-test

A

Test a set of regression coefficients for joint significance

  • H0: β1 = β2 = β3= 0 (ALL coefficients = 0)
  • HA: β1 ≠ 0 OR β2 ≠ 0 OR β3 ≠ 0 (at least 1 coefficient NOT equal to 0)

F-stat > Critical Value = Reject the Null
(p-value of F lower than the level of significance)

11
Q

F-test formula

A

You want the F-stat high & probability low
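
The card gives the decision rule but not the formula itself; one standard form of the overall F-statistic, assuming k regressors and n observations, is

$$F = \frac{ESS/k}{RSS/(n-k-1)}$$

where ESS is the explained sum of squares and RSS is the residual sum of squares.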

12
Q

Interpreting Coefficients:

Level-Level

A

Y = β0 + β1X1

on average, a one-unit increase in X is associated with a β1-unit increase in Y, holding all else constant

13
Q

Interpreting Coefficients:

Log-Level

A

ln(Y) = β0 + β1X1

on average, a one-unit increase in X is associated with a (100·β1)% change in Y, holding all else constant

14
Q

Interpreting Coefficients:

Level-Log

A

Y = β0 + β1·ln(X1)

on average, a 1% increase in X is associated with a (β1/100)-unit change in Y, holding all else constant

15
Q

Interpreting Coefficients:

Log-Log

A

ln(Y) = β0 + β1·ln(X1)

on average, a 1% increase in X is associated with a β1% change in Y, holding all else constant (β1 is an elasticity)
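
A quick worked example of the log-level case (numbers are illustrative, not from the cards): if β1 = 0.05 in ln(Y) = β0 + β1X, a one-unit increase in X is associated with roughly a 100 × 0.05 = 5% increase in Y; the exact change is 100 × (e^0.05 − 1) ≈ 5.1%.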

16
Q

Dummy/binary variable

A

Only has two possible values – e.g. X = 1 if female; X = 0 if male

Y = B0 + B1female

Ex: On average, being female is associated with a B1 difference in Y compared to males, holding all else constant

17
Q

Categorical Variable

A

A variable like “region” has multiple values (south, west, northeast, midwest) that should be transformed into individual dummy (0 or 1) variables, with one category omitted as the reference group

Y = B0 + B1south + B2west + B3 northeast

Ex: On average, living in the South is associated with a B1 change in Y compared to the Midwest, holding all else constant.

18
Q

Interaction term

A

An independent variable in a regression equation that is the product of two or more other independent variables. Each interaction term has its own regression coefficient

Does the effect of work experience on salary differ between males and females?

Y = B0 +B1Experience + B2Female + B3(Experience*Female) + e

Ex: On average, the effect of a one-unit increase in experience on Y differs by B3 for females compared to males, holding all else constant

This allows the effect of experience on income to vary by gender

B3 now measures the effect of an additional year of experience for females relative to males
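
A minimal sketch of estimating this interaction in Python with statsmodels; the variable names and simulated data are illustrative, not from the card:

```python
# Illustrative only: simulated salary data where the experience slope
# differs by gender (true B3 = -0.5).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "experience": rng.uniform(0, 30, n),
    "female": rng.integers(0, 2, n),
})
df["salary"] = (30 + 2.0 * df["experience"] + 5.0 * df["female"]
                - 0.5 * df["experience"] * df["female"]
                + rng.normal(0, 5, n))

# experience:female creates the interaction term; its coefficient (B3)
# is the extra effect of a year of experience for females vs. males.
model = smf.ols("salary ~ experience + female + experience:female",
                data=df).fit()
print(model.params)
```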

19
Q

7 Classical Assumptions

A
  1. Regression model is linear (in B’s), correctly specified, and has an additive error term
  2. The error term has a population mean of zero
  3. The explanatory variables are not correlated with the error term
  4. Observations of the error term are not correlated
  5. The error term has a constant variance
  6. The regressors are uncorrelated with each other
  7. Error term is normally distributed
20
Q

Omitted Variable Bias

A

Y = β0 + β1X1 + e

where the error term absorbs an omitted variable X2; if X2 is correlated with X1, the estimate of β1 is biased
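
One standard statement of the resulting bias, where β2 is the omitted variable's true coefficient and α1 is the slope from regressing X2 on X1:

$$E[\hat{\beta}_1] = \beta_1 + \beta_2 \alpha_1$$

so the bias takes the sign of β2 times the sign of the correlation between X1 and X2.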

21
Q

Variable Inclusion Criteria

A

Theory: is there sound justification for including the variable?

Bias: do the coefficients for other variables change noticeably when the variable is included?

T-Test: is the variable’s estimated coefficient statistically significant?

R-square: has the R-square (adjusted R-square) improved?

22
Q

First-order serial correlation

A

occurs when the value of the error term in one period is a function of its value in the previous period; the current error term is correlated with the previous error term.

23
Q

DW Test

A

compare the DW statistic (d) to the critical values (d_L, d_U)
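
The card omits the statistic itself; the standard formula, with e_t the OLS residuals, is

$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - \hat{\rho})$$

so d near 2 suggests no first-order serial correlation; d < d_L rejects the null of no positive serial correlation, d > d_U does not reject it, and values in between are inconclusive.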

24
Q

Newey-West Standard Errors

A

  • Designed to correct for the consequences of first-order serial correlation; they are technically still biased, but are more accurate than OLS standard errors, so they can be used for t-tests and other hypothesis tests
  • Newey-West SE > OLS SE
  • Larger standard errors produce lower t-scores, so coefficients won't be as statistically significant
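
A minimal sketch in Python with statsmodels showing the comparison; the data are simulated with AR(1) errors and the maxlags value is an illustrative choice:

```python
# Illustrative only: OLS vs. Newey-West (HAC) standard errors under
# first-order serial correlation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):              # AR(1) errors: e_t = 0.7*e_{t-1} + u_t
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                  # classical SEs
nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse)   # understated when errors are serially correlated
print(nw.bse)    # Newey-West SEs, typically larger -> smaller t-stats
```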

25
Heteroskedasticity
occurs when the variance of the error term is not constant across observations. Upon visual inspection of the residuals, the tell-tale sign is that they tend to fan out, for example spreading wider over time or as a regressor grows.
26
Pure Heteroskedasticity
occurs in correctly specified equations
27
Impure Heteroskedasticity
arises due to model misspecification
28
Multicollinearity
a state of very high intercorrelation or inter-association among the independent variables; it is a disturbance in the data, and when present the statistical inferences made from the data may not be reliable
29
Perfect Multicollinearity
virtually always the result of a definitional relationship between the independent variables, and is solved by dropping variables from the regression.
30
Imperfect Multicollinearity
describes the existence of a strong (but not exact) linear relationship between two or more independent variables that can significantly affect the estimates of coefficients. 
31
Multicollinearity
Multicollinearity exists in every equation, and its severity can change from sample to sample. There are no generally accepted statistical tests for multicollinearity; a VIF > 5 is a common rule of thumb.
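
The variance inflation factor behind that rule of thumb, where R_i² comes from regressing X_i on all of the other regressors:

$$VIF_i = \frac{1}{1 - R_i^2}$$
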
32
Outliers
A distinctly unusual observation or extreme value 
33
Unbiased
Parameter estimates are, on average, equal to the parameter's true value in the population model
34
Unbiased Equation
E(β̂) = β | the distribution of β̂ is centered around the true β
35
Efficient
Has the lowest variance among unbiased estimators
36
Multicollinearity
strong (but not exact) linear relationship between two or more regressors
37
Best Linear Unbiased Estimator
If the first six classical assumptions are met, OLS is BLUE (Gauss-Markov Theorem)
38
OLS stands for
Ordinary least squares
39
Most Common remedies for multicollinearity
1. Do nothing
2. Drop a redundant variable
3. Increase the sample size
40
Impure Serial Correlation
Serial correlation that is caused by a specification error such as an omitted variable or an incorrect functional form
41
Pure Serial Correlation
This type of serial correlation occurs when the error in one period is correlated with the errors in other periods. The model is assumed to be correctly specified.
42
Best remedy for impure serial correlation
attempt to find the omitted variable or the correct functional form for the equation
43
Stochastic Error Term
A term added to a regression equation to represent all the variation in the dependent variable that cannot be explained by the included independent variables. Equation: Y = B0 + B1X + e
44
Residual Error Term
The difference between the actual (observed) value of the dependent variable and its estimated value (observed − estimated). Equation: e_i = Y_i − Ŷ_i
45
Durbin-Watson d statistic test
Used to determine if there is first-order serial correlation in the error term of an equation by examining residuals. Includes d_L (lower bound) and d_U (upper bound).
46
Durbin-Watson assumptions
1. The regression model includes an intercept term
2. The serial correlation is first-order in nature
3. The regression model does not include a lagged dependent variable as an independent variable
47
What if the Durbin-Watson d-statistic is above the upper limit (d_U)?
We do not reject the null hypothesis of no autocorrelation, since there is no statistical evidence of first-order positive serial correlation
48
White Test
Used to test for heteroskedasticity
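
A minimal sketch of running the White test in Python via statsmodels.stats.diagnostic.het_white; the data are simulated so the error variance grows with x:

```python
# Illustrative only: White test on simulated heteroskedastic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 300)
y = 2.0 + 3.0 * x + rng.normal(scale=x)   # error spread grows with x
X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)
print(lm_pvalue)   # a small p-value rejects homoskedasticity
```
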
49
t-test formula
coefficient divided by standard error
50
R² formula
ESS (explained sum of squares for the model) divided by TSS (total sum of squares)
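
In this deck's notation, with ESS explained and RSS residual:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
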
51
Sign of bias
sign of bias = sign of the omitted variable's true coefficient × sign of the correlation between the included and omitted variables
52
5% significance level (two-tailed critical value)
1.96
53
1% significance level (critical value)
2.787
54
Omitted Variable (issue)
Bias in the coefficient estimates of the included X's (violates OLS Classical Assumption 3)
55
Omitted Variable (correction)
Include the omitted variable or a proxy for it (OLS Classical Assumption 3)
56
Irrelevant variable
Inclusion of a variable that does not belong in the equation
57
Incorrect Functional Form (issue)
The functional form is inappropriate (violates OLS Classical Assumption 1)
58
Incorrect Functional Form (correction)
Transform the variable or the equation to a different functional form (OLS Classical Assumption 1)
59
Multicollinearity (issue)
Some of the independent variables are highly (though imperfectly) correlated with each other (violates OLS Classical Assumption 6)
60
Multicollinearity (correction)
Drop the redundant variables, but often doing nothing is best (OLS Classical Assumption 6)
61
Serial Correlation (issue)
Observations of the error term are correlated (violates OLS Classical Assumption 4)
62
Serial Correlation (correction)
If impure, fix the specification; otherwise consider Generalized Least Squares or Newey-West standard errors (OLS Classical Assumption 4)
63
Heteroskedasticity (issue)
The variance of the error term is not constant for all observations (violates OLS Classical Assumption 5)
64
Heteroskedasticity (correction)
If impure, fix the specification; otherwise use heteroskedasticity-consistent (HC) standard errors or reformulate the variables (OLS Classical Assumption 5)
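
A minimal sketch of the HC correction in Python with statsmodels; HC1 is one of several variants (HC0-HC3) and the data are simulated:

```python
# Illustrative only: classical vs. heteroskedasticity-consistent (HC1) SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 300)
y = 2.0 + 3.0 * x + rng.normal(scale=x)   # heteroskedastic errors
X = sm.add_constant(x)

classical = sm.OLS(y, X).fit()
robust = sm.OLS(y, X).fit(cov_type="HC1")  # robust (White) standard errors
print(classical.bse)
print(robust.bse)   # valid for t-tests under heteroskedasticity
```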