Quant Methods #10 - Multiple Regression & Issues in Regression Analysis Flashcards

1
Q

variance and standard deviation equations and relationship to each other

A

LOS 10.a

Variance: σX² = Σ(i=1 to n) (Xi − X̄)² / (n − 1)

standard deviation is the square root of variance:

σX = sqrt(σX2)
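
A minimal Python sketch of these two formulas (the data values are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])           # sample observations (made-up data)
n = len(x)

var_x = np.sum((x - x.mean())**2) / (n - 1)  # sample variance, n-1 in denominator
std_x = np.sqrt(var_x)                       # standard deviation = sqrt(variance)

assert np.isclose(var_x, x.var(ddof=1))      # agrees with numpy's sample variance
```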

2
Q

Fill in the terms to this ANOVA table:

Source      df     SS    MS
Regression  ?      ?     ?
Error       ?      ?     ?
Total       ?      ?

A

LOS 10.i

Source      df     SS    MS
Regression  k      RSS   MSR
Error       n-k-1  SSE   MSE
Total       n-1    SST

NOTE: MSR = RSS / k; MSE = SSE / (n-k-1); R² = RSS / SST; SEE = sqrt(MSE) ≈ s_forecast for large n

3
Q

Construct equations for MSE, MSR, R2, F, and SEE to show their relationship with terms in the ANOVA table:

Source      df     SS    MS
Regression  k      RSS   MSR
Error       n-k-1  SSE   MSE
Total       n-1    SST

A

LOS 10.i

mean squared error: MSE = SSE / (n-k-1)

mean regression sum of squares: MSR = RSS / k

coefficient of determination: R² = RSS / SST

F-statistic: F = MSR / MSE

standard error of estimate: SEE = sqrt(MSE)

standard error of forecast (large n): s_forecast ≈ SEE
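
A minimal sketch tying these formulas together, with made-up values for n, k, RSS, and SSE:

```python
import numpy as np

n, k = 50, 3              # observations and independent variables (made up)
RSS, SSE = 120.0, 80.0    # regression and error sums of squares (made up)
SST = RSS + SSE

MSR = RSS / k             # mean regression sum of squares
MSE = SSE / (n - k - 1)   # mean squared error
R2  = RSS / SST           # coefficient of determination
F   = MSR / MSE           # F-statistic, df = (k, n-k-1)
SEE = np.sqrt(MSE)        # standard error of estimate ≈ s_forecast for large n
```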

4
Q

Compute the residual ^e (“e-hat”) for observation “i” from the observed data and the multiple regression estimates

A

LOS 10.a

^ei = Yi − ^Yi = Yi − (^b0 + ^b1X1i + ^b2X2i + … + ^bkXki)
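
A minimal sketch of this residual calculation for one observation, with hypothetical coefficients and data:

```python
import numpy as np

b_hat = np.array([1.5, 0.8, -0.3])   # [^b0, ^b1, ^b2] (hypothetical estimates)
x_i = np.array([1.0, 2.0, 4.0])      # [1, X1i, X2i]; the leading 1 multiplies ^b0
y_i = 2.0                            # observed Yi

y_hat_i = b_hat @ x_i                # fitted value ^Yi
e_hat_i = y_i - y_hat_i              # residual ^ei = Yi - ^Yi
```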

5
Q

T-statistic used for testing regression coefficients for statistical significance

A

LOS 10.c, LOS 10.d

t = (^bj - bj) / s^b,j

df = n - k - 1

where:

  • ^bj = estimated coefficient to be tested
  • bj = hypothesized value under H0 (typically 0)
  • s^b,j = estimated standard error of ^bj
  • n = number of observations
  • k = number of independent variables
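
A minimal sketch of this t-test (the estimate, standard error, n, and k are made up; scipy supplies the t distribution):

```python
from scipy import stats

b_hat, b_null, s_b = 0.40, 0.0, 0.15   # estimate, H0 value, std error (made up)
n, k = 50, 3

t = (b_hat - b_null) / s_b             # t-statistic
df = n - k - 1                         # degrees of freedom
p_value = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value
```
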
6
Q

Interpret estimated regression coefficients

A

LOS 10.b

intercept term - value of dependent variable when all independent variables are zero.

(partial) slope coefficients - estimated change in the dependent variable for a one-unit change in that independent variable, holding all other independent variables constant.

7
Q

Interpret the p-value of an estimated regression coefficient

A

LOS 10.b

The p-value is the smallest level of significance for which the null hypothesis can be rejected.

Comparing p-value to the significance level:

  • If p-value < significance level, H0 can be rejected
  • If p-value > significance level, H0 cannot be rejected

Example: if ^b1 = 0.40 and its p-value = 0.032, then at the 1% significance level:

  • p (0.032) > 0.01, so we cannot reject H0; ^b1 is not statistically different from 0 at the 1% level of significance.
  • However, we can conclude that ^b1 is statistically different from 0 at any significance level greater than 3.2% (see the snippet below).
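
The same comparison in a couple of lines of Python, using the card’s numbers:

```python
p_value = 0.032
for alpha in (0.01, 0.05):
    verdict = "reject H0" if p_value < alpha else "cannot reject H0"
    print(f"alpha={alpha}: {verdict}")   # 1%: cannot reject; 5%: reject
```
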
8
Q

heteroskedasticity

A

LOS 10.k

  • arises when residual variance is non-constant
  • 2 types of heteroskedasticity:
    • Type 1: “unconditional”
      • residuals not related to X’s
      • type 1 causes no major problems
    • Type 2: “conditional”
      • residuals are related to X’s
      • type 2 is a problem!
  • Impact / effect:
    • std errors (sb’s) are unreliable estimates
    • coefficient estimates (b’s) are not affected
    • t-stats are too high (sb’s too small)
    • F-test unreliable
9
Q

detecting heteroskedasticity

A

LOS 10.k

  • scatter diagrams: plot residuals vs each X & time
  • Breusch-Pagan test: regress the squared residuals on the “X” variables and test the significance of Rresid²
    • H0: no heteroskedasticity
    • Chi-square test: BP = n × Rresid² (with df = k)
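
A minimal sketch of the Breusch-Pagan steps on made-up data (statsmodels runs the auxiliary regression; scipy supplies the chi-square tail):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))                         # made-up independent variables
resid = rng.normal(size=n) * (1 + np.abs(X[:, 0]))  # residuals from the original fit

aux = sm.OLS(resid**2, sm.add_constant(X)).fit()    # regress squared residuals on X's
BP = n * aux.rsquared                               # BP = n * Rresid²
p_value = stats.chi2.sf(BP, df=k)                   # small p -> reject H0 (no heterosk.)
```
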
10
Q

correcting heteroskedasticity

A

LOS 10.k

  • 1st Method: White-corrected (“robust”) std errors; makes std errors higher, t-stats lower, and conclusions more accurate
  • 2nd Method: use “generalized least squares” - modifying original equation to eliminate heteroskedasticity
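
A sketch of the first method using statsmodels’ HC0 covariance, which implements White’s estimator (the data here are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=100) * (1 + np.abs(X[:, 1]))

model = sm.OLS(y, X)
plain = model.fit()                  # classic standard errors
robust = model.fit(cov_type="HC0")   # White-corrected ("robust") standard errors

print(plain.tvalues)                 # typically too large under heteroskedasticity
print(robust.tvalues)                # smaller, more reliable t-stats
```
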
11
Q

serial correlation

A

LOS 10.k

  • positive autocorrelation: each residual trends in same direction as previous term; common in financial data
  • impact: t-stats too high
12
Q

detecting serial correlation

A

LOS 10.k

  • scatter plot: visually inspect error terms
  • Durbin-Watson statistic
    • formal test of error term correlation
    • for large samples: DW ≈ 2(1 - r), where r = correlation of residuals from one observation to the next
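
A minimal sketch of the DW statistic and its large-sample approximation, on made-up positively autocorrelated residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
e = np.cumsum(rng.normal(size=100))         # strongly autocorrelated series (made up)

DW = np.sum(np.diff(e)**2) / np.sum(e**2)   # Durbin-Watson statistic
r = np.corrcoef(e[:-1], e[1:])[0, 1]        # lag-1 residual correlation
print(DW, 2 * (1 - r))                      # DW ≈ 2(1 - r) for large samples
```
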
13
Q

interpreting Durbin-Watson values

A

LOS 10.k

  • for DW ≈ 2(1 − r):
    • no autocorrelation (r = 0): DW = 2
    • positive autocorrelation (r = 1): DW = 0 (common)
    • negative autocorrelation (r = −1): DW = 4 (uncommon)
  • How close to “2” does DW have to be to conclude “no autocorrelation”? Look at the ranges in DW tables, which give critical values “dl” and “du” (decision rule sketched below)
    • H0: no positive serial correlation
    • DW < dl: reject H0; positive autocorrelation
    • dl ≤ DW ≤ du: inconclusive
    • DW > du: do not reject H0
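
The table lookup as a small helper function (the critical values below are hypothetical; real dl and du come from a DW table for the given n and k):

```python
def dw_decision(dw, d_l, d_u):
    """Decision rule for H0: no positive serial correlation."""
    if dw < d_l:
        return "reject H0: positive autocorrelation"
    if dw <= d_u:
        return "inconclusive"
    return "fail to reject H0"

print(dw_decision(1.20, 1.46, 1.63))   # hypothetical critical values -> reject H0
```
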
14
Q

correcting serial correlation

A

LOS 10.k

preferred method: “Hansen Method”

  • adjust standard errors upwards and then recalculate t-stats
  • also corrects for conditional heteroskedasticity
  • result: t-stats decline, chance of Type I error (false positive) declines
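
statsmodels does not expose the curriculum’s “Hansen method” by name; the sketch below uses Newey-West HAC standard errors instead, a closely related adjustment that likewise inflates standard errors under serial correlation (data are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()    # AR(1) errors: positive serial correlation
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
naive = sm.OLS(y, X).fit()                                       # classic std errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # serial-corr. robust

print(naive.bse, hac.bse)   # robust std errors are larger -> t-stats decline
```
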
15
Q

multicollinearity

A

LOS 10.l

multicollinearity - two or more “X’s” are correlated with each other

  • effects: inflates std errors; reduces t-stats; increases chance of Type II errors (false negatives)
  • i.e., t-stats look artificially small, so variables look unimportant
16
Q

detecting/correcting multicollinearity

A

LOS 10.l

Tell-tale signs from regression data:

  • significant F-stat (and high R2) but all t-stats insignificant
  • high correlation between “X” variables (for k=2 case only)
  • sign of coefficient is unexpected

correction: omit one or more X variables
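
A minimal sketch of the pairwise-correlation check (k = 2 case) on made-up, nearly collinear regressors:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)   # nearly collinear with x1 (made up)

corr = np.corrcoef(x1, x2)[0, 1]               # high |corr| flags multicollinearity
print(round(corr, 3))
```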

17
Q

summary of regression analysis problems

A

LOS 10.k,l

Conditional Heteroskedasticity

  • define: non-constant residual variance
  • effect: Type I errors (false positive)
  • detect: Breusch-Pagan; chi-square test
  • correct: White-corrected standard errors

Serial Correlation

  • define: residuals are correlated
  • effect: Type I errors (false positive)
  • detect: Durbin-Watson test
  • correct: Hansen method to adjust standard errors

Multicollinearity

  • define: two or more X’s are correlated
  • effect: Type II errors (false negative)
  • detect: conflicting t and F stats; correlation among X’s (for k=2)
  • correct: drop one of the correlated X’s
18
Q

regression model misspecification

A

LOS 10.m

model specification: process of variable selection and transformation; determines/affects quality of regression

  • Effect of misspecification:
    • regression coef’s will be biased and inconsistent
    • lack of confidence in hypothesis tests of coef’s or in model predictions

Types of Model Misspecification:

  • omitting an important variable
  • variables not transformed appropriately
  • incorrect pooling of data
  • using lagged dep. var. as indep. var.
  • forecasting the past
  • inaccurate measurement of indep. var. data
19
Q

multiple regression model flow chart

A

LOS 10