Quant Flashcards
5 Assumptions to use a multiple regression model
1) Linearity
2) Homoskedasticity
3) Independence of Errors
4) Normality
5) Independence of Independent Variables
Linearity Assumption
The relationship between the independent variable(s) and dependent variable needs to be linear
Homoskedasticity Assumption
The variance of the regression residuals should be the same for all observations
Independence of Errors Assumption
The regression residuals are independent of one another and uncorrelated across observations
Normality Assumption
The regression residuals are normally distributed
Independence of Independent Variables Assumption
Independent variables are not random, and there is no exact linear relationship between two or more of them
Adjusted R-Squared
Adjusted version of R-squared that penalizes additional independent variables; it increases only when a newly introduced variable improves the model by more than chance alone would predict
AIC v. BIC
AIC is for prediction
BIC is for goodness of fit
Lower values are better for both
F Statistic
[(SSE of restricted - SSE of unrestricted) / q] / [SSE of unrestricted / (n - k - 1)]
SSE is the sum of squared errors (residuals); q is the number of restrictions
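As a worked example of the formula above (all numbers hypothetical):

```python
# Joint F-test of q restrictions (hypothetical numbers).
# Restricted model drops q = 2 variables; n = 50 observations, k = 4 slopes.
sse_restricted, sse_unrestricted = 120.0, 100.0
n, k, q = 50, 4, 2

f_stat = ((sse_restricted - sse_unrestricted) / q) / (
    sse_unrestricted / (n - k - 1)
)
print(f_stat)  # 4.5 -> compare to critical F with q and n-k-1 df
```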
T Stat when only given coefficient and standard error, and what is null hypothesis
coefficient / standard error; null hypothesis is that the coefficient equals 0 (does not differ significantly from 0)
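A quick numeric sketch of that calculation (coefficient and standard error are hypothetical):

```python
# t-stat from a reported coefficient and its standard error
coef, std_err = 1.8, 0.75
t_stat = coef / std_err
# reject H0 (coefficient = 0) when |t| exceeds the critical value,
# roughly 1.96 for a large sample at the 5% two-tailed level
significant = abs(t_stat) > 1.96
print(t_stat, significant)  # 2.4 True
```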
Breusch Pagan Test (BP)
- What does it test for
- What is the formula
1) Conditional Heteroskedasticity - variance in residuals differs across observations
2) BP = n * R-squared, where R-squared comes from regressing the squared residuals on the independent variables (chi-square distributed with k degrees of freedom)
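A minimal NumPy sketch of the BP statistic, using simulated data whose error variance depends on the first regressor (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))
# simulate conditional heteroskedasticity: error variance grows with X[:, 0]
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n) * np.exp(X[:, 0])

Xc = np.column_stack([np.ones(n), X])          # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # OLS fit
resid = y - Xc @ beta

# auxiliary regression: squared residuals on the same independent variables
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
r2_aux = 1 - np.sum((u2 - Xc @ gamma) ** 2) / np.sum((u2 - u2.mean()) ** 2)

bp_stat = n * r2_aux  # compare to chi-square critical value with k df
```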
2 Types of Heteroskedasticity
1) Conditional - error variance is correlated with independent variables (much bigger problem) - high probability of Type 1 errors
2) Unconditional - less problematic, no correlations
Durbin-Watson Test (DW)
A test for first-order serial correlation in time series model
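The DW statistic itself is the sum of squared changes in successive residuals over the sum of squared residuals; a quick sketch with a made-up residual series:

```python
import numpy as np

# Durbin-Watson statistic on a hypothetical residual series.
# DW near 2: no first-order serial correlation; DW < 2: positive; DW > 2: negative.
resid = np.array([0.5, -0.2, 0.3, -0.4, 0.1, 0.2, -0.3])
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(round(dw, 3))  # 2.559 -> alternating signs suggest negative autocorrelation
```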
Breusch-Godfrey Test (BG)
A test used to determine autocorrelation up to a predesignated order of the lagged residuals in a time series model
Multicollinearity
When two or more independent variables are correlated to each other
Test for multicollinearity
Variance inflation factor (VIF)
1 / (1 - R-squared), where R-squared comes from regressing that independent variable on the remaining independent variables
Any value over 5 warrants investigation
Any value over 10 means multicollinearity is likely
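A NumPy sketch of the VIF calculation on simulated data, where x2 is deliberately built to be nearly collinear with x1 (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                     # unrelated to the others

def vif(target, others):
    """VIF for one regressor: 1 / (1 - R^2) from regressing it on the others."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((target - fitted) ** 2) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

vif_x1 = vif(x1, [x2, x3])  # large: multicollinearity likely (> 10)
vif_x3 = vif(x3, [x1, x2])  # near 1: no collinearity concern
```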
Two types of observations that may influence regression results
1) High Leverage Point
2) Outlier
Difference between high leverage point and outlier
A high leverage point has an extreme x value; an outlier has an extreme y value. A point can be both high leverage and an outlier
How to calculate if a point is high leverage
Leverage
If leverage exceeds 3*(k+1)/n
k - independent variables
n - observations
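The leverage values are the diagonal of the hat matrix; a sketch on simulated data with one deliberately extreme x observation (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 2
X = rng.normal(size=(n, k))
X[0] = [8.0, 8.0]                      # one extreme x observation

Xc = np.column_stack([np.ones(n), X])  # design matrix with intercept
# leverage = diagonal of the hat matrix H = X (X'X)^-1 X'
H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
leverage = np.diag(H)

threshold = 3 * (k + 1) / n            # 3*(k+1)/n rule of thumb
high_leverage = leverage > threshold   # flags observation 0
```

A useful sanity check: the leverage values always sum to k + 1, the number of estimated coefficients.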
When looking at regression, determine if independent variable is significantly different from 0
If |t stat| > critical t value (equivalently, if p-value < significance level), it is significantly different from 0
T stat if not given is coefficient / standard error
Method to identify if an observation is an outlier, and what is the formula
Studentized deleted residuals
t(i) = e(i) / s(e(i)), where e(i) is the residual of observation i from the regression fit with observation i deleted, and s(e(i)) is the standard error of that deleted residual
if greater than 3 or greater than the critical t stat with n-k-2 degrees of freedom, observation is an outlier
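A brute-force sketch of the deleted-residual idea: refit without observation i, predict it, and studentize. Data and the injected outlier are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2
X = rng.normal(size=(n, k))
Xc = np.column_stack([np.ones(n), X])
y = Xc @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
y[3] += 8.0                                 # inject one outlier in y

def studentized_deleted(i):
    """Refit without obs i, predict it, and studentize the deleted residual."""
    mask = np.arange(n) != i
    Xi, yi = Xc[mask], y[mask]
    beta, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    mse_i = np.sum((yi - Xi @ beta) ** 2) / (n - 1 - (k + 1))  # df = n-k-2
    xinv = np.linalg.inv(Xi.T @ Xi)
    var_pred = mse_i * (1 + Xc[i] @ xinv @ Xc[i])
    return (y[i] - Xc[i] @ beta) / np.sqrt(var_pred)

t3 = studentized_deleted(3)  # flagged: |t| > 3 (or critical t, df = n-k-2)
```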
When is an observation considered influential
If its exclusion from the sample causes substantial changes in the regression function
Cook’s D
Metric for identifying influential observations
Interpreting Cook’s D
If value is greater than 0.5, possibly influential
If value is greater than 1, likely influential
If value greater than 2*SqRt(k/n), likely influential
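A NumPy sketch computing Cook's D from the residuals and leverage values, with one observation deliberately made both high leverage and an outlier (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
X = rng.normal(size=(n, k))
X[5] = [3.0, 3.0]                         # extreme x (high leverage)
Xc = np.column_stack([np.ones(n), X])
y = Xc @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
y[5] += 10.0                              # and an extreme y (outlier)

beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
resid = y - Xc @ beta
h = np.diag(Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T)   # leverage values
mse = resid @ resid / (n - k - 1)

# Cook's D: scaled change in all fitted values if observation i is removed
cooks_d = (resid ** 2 / ((k + 1) * mse)) * (h / (1 - h) ** 2)

rule_of_thumb = 2 * np.sqrt(k / n)        # observation 5 far exceeds this
```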