S9 - Model Specification (complete) Flashcards

1
Q

What types of specification error can occur?

A

Specification errors occur when the regression model does not match the theory:
> Omitting an important variable
> Including an inappropriate variable
> Falsely assuming a constant effect

2
Q

Multicollinearity

A

= when two or more independent variables are highly correlated with each other

  • Regression coefficients are estimated from the variation in a predictor that is not shared with the other predictors
  • When we include a variable (W) that correlates strongly with our main independent variable (X), the estimated effect of X is based on the little variation in X left unexplained by W
  • MC does not bias your coefficient estimates - still BLUE!
  • However, it inflates the standard errors of the highly collinear variables -> can lead to type II error (false negative)
3
Q

How can we diagnose Multicollinearity?

A

Variance Inflation Factor: VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the other predictors

  • The higher the VIF, the more the standard error is inflated and
  • the higher the chance of a false negative
  • Interpretation: the square root of the VIF is the factor by which MC inflates the standard error

Rules of thumb:
* Values higher than 4-5: moderate to high
* Values of 10 or more: very high
** Note that in some contexts values of 2 would already be too high
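As a small worked example of the formula and interpretation above (the R^2 value is made up for illustration):

```python
# VIF = 1 / (1 - R^2); sqrt(VIF) is the factor by which
# multicollinearity inflates the standard error.
import math

def vif(r_squared):
    """Variance inflation factor from the R^2 of regressing one
    predictor on all the other predictors."""
    return 1.0 / (1.0 - r_squared)

# Suppose regressing X on the other covariates yields R^2 = 0.75:
v = vif(0.75)             # VIF = 4 -> "moderate to high" by the rule of thumb
inflation = math.sqrt(v)  # standard error is inflated by a factor of 2
print(v, inflation)       # 4.0 2.0
```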

4
Q

What is an example of an underspecified model?
Explain what it is and whether/how it can be measured
What are possible consequences?

A

Omitted variable bias

happens when a necessary control variable is omitted
Independent variables in your model may pick up covariation with the DV that actually belongs to the omitted variable.

Possible Type I error (false positive)
Violated OLS assumption: mean independence (zero conditional mean)

Cannot be measured - but the sign of the bias can be estimated

5
Q

How can the bias of omitted variable bias be estimated?

A

β2 > 0 & Corr(x1, x2) > 0 -> positive bias
β2 < 0 & Corr(x1, x2) < 0 -> positive bias

β2 > 0 & Corr(x1, x2) < 0 -> negative bias
β2 < 0 & Corr(x1, x2) > 0 -> negative bias

(β2 is the effect of the omitted variable x2 on the DV; Corr(x1, x2) is its correlation with the included variable x1)

Positive/negative bias refers to the sign of the bias
Upward/downward bias refers to the value, not the absolute value -> think of it as positions on a number line

[Check again what exactly upward/downward means]

A caveat: this technique is less valid with more covariates, since one must assume that all of them are uncorrelated with x1, but it is still helpful.
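The sign table above can be illustrated with a small sketch in pure Python (the data and true coefficients are made up; with β2 > 0 and Corr(x1, x2) > 0 the short regression should be biased upward):

```python
# Omitted-variable bias sketch: y depends on x1 and x2, x2 is
# positively correlated with x1, and beta2 > 0, so omitting x2
# biases the estimated slope on x1 upward (positive bias).

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

beta1, beta2 = 1.0, 0.5                  # true effects (hypothetical)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.0]          # strongly positively correlated with x1
y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

b_short = slope(x1, y)                   # "short" regression omitting x2
print(b_short)                           # 1.985 > beta1: upward (positive) bias
```

The short-regression slope equals beta1 + beta2 * cov(x1, x2)/var(x1), so the bias takes the sign of beta2 times the correlation, matching the table.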

6
Q

Collider Bias

A

Collider Bias - a form of overspecified model (too many covariates): controlling for a collider, i.e. a variable that is caused by both the independent and the dependent variable, opens a spurious association between them

7
Q

What forms of overspecified models do we know?

A
  • Collider Bias
  • Post-treatment Bias
  • Multicollinearity (-> inflated standard errors)
8
Q

Post-treatment bias

A

Overspecified model

Including covariates that control for your causal mechanism (i.e. mediators) can result in "post-treatment bias": the coefficient of interest is attenuated
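The mediation story can be sketched in pure Python (the data are constructed by hand; x affects y only through the mediator m, so controlling for m wipes out the estimated effect of x):

```python
# Post-treatment bias sketch: x -> m -> y with y = m exactly.
# Controlling for the mediator m soaks up the causal mechanism
# and attenuates the coefficient on x toward 0.

def two_var_ols(x, m, y):
    """Coefficients (b_x, b_m) from regressing y on x and m
    (with intercept), via the 2x2 normal equations."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]
    sxx = sum(a * a for a in dx)
    smm = sum(a * a for a in dm)
    sxm = sum(a * b for a, b in zip(dx, dm))
    sxy = sum(a * b for a, b in zip(dx, dy))
    smy = sum(a * b for a, b in zip(dm, dy))
    det = sxx * smm - sxm * sxm
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
u = [1.0, -2.0, 2.0, -2.0, 1.0]    # orthogonal to x by construction
m = [a + b for a, b in zip(x, u)]  # mediator: m = x + u
y = list(m)                        # y = m: x's whole effect runs through m

# Total effect of x on y (simple regression, both are mean-zero here):
total = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
b_x, b_m = two_var_ols(x, m, y)
print(total, b_x, b_m)  # total = 1.0, but b_x = 0.0 once m is controlled for
```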

9
Q

Multicollinearity

A

Problem: too many covariates!

The closer the correlation between two predictors is to -1 or 1, the more MC you have

High multicollinearity does not bias your coefficient estimates -> still BLUE
High MC does inflate the standard errors of highly collinear variables (inefficiency) and
induces unstable estimates (inefficiency)

Problem with inflated s.e.? Possible Type II error (false negatives)!
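A small numerical sketch of the inflation, under the standard result (assumed here) that with two standardized predictors correlated at r, each slope's sampling variance grows by the factor 1 / (1 - r^2):

```python
# The closer |r| is to 1, the larger 1 / (1 - r^2) (the VIF),
# and the more the standard errors are inflated.
import math

def se_inflation(r):
    """Factor by which a slope's standard error grows, relative to
    uncorrelated predictors (r = 0), for two standardized predictors."""
    return math.sqrt(1.0 / (1.0 - r ** 2))

print(se_inflation(0.0))   # 1.0 -> no inflation
print(se_inflation(0.9))   # ~2.29 -> the SE more than doubles
```

Wider standard errors mean wider confidence intervals, hence the Type II error risk noted above.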

10
Q

What can you do when you have high MC?

A
  • different specification
  • instrumental variables
  • transformation of skewed variables
  • categorical instead of interval variables (e.g. dummies)
  • different operationalization of variables
11
Q

In the presence of (high, but not perfect) multicollinearity: What happens to the results of your regression? Select all that apply.

Select one or more:

a. The estimates of your regression coefficients are biased

b. We may encounter type II error (false negatives)

c. OLS is no longer BLUE.

d. The standard errors are inflated

A

Correct: B & D

The higher the multicollinearity, the more our standard error is inflated, the larger the confidence interval and the smaller the chance that a coefficient is determined to be statistically significant. This leads to false negatives or type II error.

12
Q

When are the Variance Inflation Factors too high?

A

Rules of thumb:
* Values higher than 4-5: moderate to high
* Values of 10 or more: very high
* Note that in some contexts values of 2 would already be too high
13
Q

Let’s say that your data set includes another variable that you believe is correlated with your dependent variable (i.e. your measure of euroscepticism), but it is not correlated with your independent variable of interest (i.e. unemployment rate). Do you need to include this variable in your model to avoid omitted variable bias?

A

Correct: No.

To avoid omitted variable bias, you only need to include variables that correlate with BOTH your dependent variable AND your independent variable of interest.
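This rule can be checked with a constructed example (hypothetical numbers; z predicts y but is built to be uncorrelated with x, so omitting z leaves the slope on x unbiased):

```python
# A variable correlated with the DV but NOT with the independent
# variable of interest can be omitted without causing bias.

def slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # e.g. a centered unemployment rate
z = [1.0, -2.0, 2.0, -2.0, 1.0]   # predicts y below, orthogonal to x
y = [3.0 * a + 2.0 * c for a, c in zip(x, z)]  # true effect of x is 3

b = slope(x, y)
print(b)  # 3.0 -- omitting z does not bias the slope on x
```

Omitting z only costs some precision (larger residual variance), not unbiasedness.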

14
Q

When specifying a regression model, including too many covariates can have which of the following effects? (multiple answers are possible).

Select one or more:

a. Attenuated (weakened) b coefficient from post-treatment bias

b. Attenuated standard errors from multicollinearity

c. Inflated standard errors from multicollinearity

d. Inflated b coefficient from post-treatment bias

A

Correct: A & C

Overspecifying a model can cause multicollinearity (which inflates standard errors) and/or post-treatment bias (which attenuates coefficient size).

15
Q

In which of the following cases would you expect the assumption of mean independence (also called zero conditional mean) to be violated? (Multiple answers possible)

Select one or more:

a. Your model includes two variables that are highly correlated
b. Your model excludes an important control variable
c. Your model is overspecified
d. One of your independent variables is highly skewed
e. Your model is underspecified
f. You have structural measurement error in one of your independent variables

A

Correct: B, E, and F

As mentioned in the lecture: when we exclude an important control variable (i.e. our model is underspecified) or we have measurement error in one of the independent variables, we violate the assumption of mean independence.

If our model is overspecified/includes variables that are highly correlated, then our standard errors might be too big, but we are not violating the assumption.

If one of our variables is highly skewed, we might want to recode it (for left skew) or log-transform it (for right skew), but no OLS assumption will be violated.
