CFA L2 Quant Flashcards
(196 cards)
True or false: With financial instruments, we can typically use a one-factor linear regression model?
False, typically we need a multiple regression model.
Multiple regression model
Regression models that allow us to estimate the effects of multiple independent variables on one dependent variable.
Ex: Can the 10-year growth in the S&P 500 (dependent variable (Y)) be explained by the trailing dividend payout ratio of the index’s stocks (independent variable 1 (X1)) and the yield curve slope (independent variable 2 (X2))?
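The S&P 500 example above can be sketched in Python. All numbers below are made up for illustration, and numpy's least-squares solver is just one way to estimate the coefficients:

```python
import numpy as np

# Hypothetical data (all numbers made up):
# Y = 10-year index growth, X1 = trailing payout ratio, X2 = yield-curve slope
X1 = np.array([0.40, 0.45, 0.50, 0.55, 0.60, 0.35])
X2 = np.array([1.2, 0.8, 1.5, 0.5, 1.0, 1.8])
Y = np.array([0.07, 0.06, 0.09, 0.05, 0.08, 0.07])

# Design matrix with an intercept column: Y = b0 + b1*X1 + b2*X2 + e
X = np.column_stack([np.ones_like(X1), X1, X2])
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0, b1, b2 = coefs
print(b0, b1, b2)  # estimated intercept and slope coefficients
```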
What are the uses of multiple regression models?
- Identify relationships between variables.
- Forecast variables. (ex: forecast CFs or forecast probability of default)
- Test existing theories.
Standard error
A statistical measure of how far a sample statistic is likely to fall from the population parameter it estimates; formally, the standard deviation of the statistic's sampling distribution.
Residual (ε)
The difference between the observed Y value and the predicted Y value (ŷ).
ε = Y - ŷ
OR
Y - (b0 + b1x1 + b2x2 … + bnxn)
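The residual calculation can be shown with one observation. The fitted coefficients and observed values below are assumed purely for illustration:

```python
# Residual for one observation, given hypothetical fitted coefficients
b0, b1, b2 = 0.02, 0.05, 0.01   # assumed estimates, for illustration only
x1, x2, y = 0.50, 1.2, 0.06     # assumed observed values

y_hat = b0 + b1 * x1 + b2 * x2  # predicted value (y-hat)
residual = y - y_hat            # epsilon = Y - y-hat
print(round(residual, 4))       # -> 0.003
```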
P-value
The smallest level of significance for which the null hypothesis can be rejected.
- If the p-value is less than the significance level (α), the null hypothesis can be rejected; if it's greater, we fail to reject it.
If the significance level is 5% and the p-value is .06, do we reject the null hypothesis?
No, we fail to reject the null hypothesis.
Assumptions underlying a multiple regression model:
- A linear relationship exists between the dependent and independent variables.
- The residuals are normally distributed.
- The variance of the error terms is constant.
- The residual of one observation ISN’T correlated w/ another.
- The independent variables ARE NOT random
- There is no exact linear relationship between any two or more independent variables.
Q-Q plot
A plot used to compare a variable's distribution to a normal distribution. The points for the residuals should lie along a diagonal line if they follow a normal distribution.
True or false: For a standard normal distribution, only 5% of the observations should be beyond -2 standard deviations of 0?
False, only 5% of the observations should be beyond -1.65 standard deviations.
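This can be checked with Python's standard-library normal distribution (statistics.NormalDist, available since Python 3.8):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of an observation falling below -1.65 standard deviations:
print(round(z.cdf(-1.65), 4))     # -> 0.0495, i.e. about 5%

# The exact one-tailed 5% critical value:
print(round(z.inv_cdf(0.05), 3))  # -> -1.645
```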
Analysis of variance (ANOVA)
A statistical test used to assess the difference between the means of more than two groups. At its core, ANOVA allows you to simultaneously compare arithmetic means across groups. You can determine whether the differences observed are due to random chance or if they reflect genuine, meaningful differences.
- A one-way ANOVA uses one independent variable.
- A two-way ANOVA uses two independent variables.
Coefficient of determination (R^2)
The percentage of the total variation in the dependent variable explained by the independent variable(s).
R^2 = SSR/SST
OR
(SST - SSE) / SST
Ex: R^2 of 0.63 means that the model explains 63% of the variation in the dependent variable.
SSR = regression sum of squares. It's the sum of the squared differences between each predicted value and the mean of the dependent variable, i.e., the variation explained by the independent variables.
SSE = sum of squared errors. It's the sum of the squared residuals, i.e., the unexplained variation.
SST = total sum of squares. It's the total variation in the dependent variable: SST = SSR + SSE.
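A tiny numeric sketch of the R^2 decomposition, with made-up sums of squares:

```python
# R^2 from the ANOVA decomposition, with hypothetical sums of squares
SST = 100.0      # total variation in the dependent variable
SSE = 37.0       # unexplained (residual) variation
SSR = SST - SSE  # explained variation

r_squared = SSR / SST
print(r_squared)  # -> 0.63, i.e. the model explains 63% of the variation
```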
Adjusted R^2
Since R^2 almost always increases as more independent variables are added to the model, we must adjust it.
- If adding an additional independent variable causes the adjusted R^2 to decrease, it’s not worth adding that variable.
Overfitting
When R^2 is high because there is a large # of independent variables, rather than genuine explanatory power.
Akaike’s information criterion (AIC)
Looks at multiple regression models and determines which has the best forecast.
Calculation: (n * ln(SSE/n)) + 2(k+1)
- Lower values indicate a better model.
- Higher k values result in higher values of the criteria.
Schwarz’s Bayesian information criteria (BIC)
Looks at multiple regression models and determines which has a better goodness of fit.
Calculation: (n * ln(SSE/n)) + (ln(n)*(k+1))
- Lower values indicate a better model.
- Higher k values result in higher values of the criteria.
- BIC imposes a higher penalty for overfitting than AIC.
- AIC and BIC are alternatives to R^2 and adjusted R^2 to determine the quality of the regression model.
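The AIC and BIC formulas above can be compared directly in Python. The sample size and SSE figures below are hypothetical, chosen so that the two criteria disagree about whether a fourth variable is worth adding:

```python
import math

def aic(n, k, sse):
    # AIC = n * ln(SSE/n) + 2(k+1)
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(n, k, sse):
    # BIC = n * ln(SSE/n) + ln(n)*(k+1)
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical: adding a 4th variable lowers SSE from 37.0 to 35.0
n = 60
print(aic(n, 3, 37.0), bic(n, 3, 37.0))  # 3-variable model
print(aic(n, 4, 35.0), bic(n, 4, 35.0))  # 4-variable model
# Here AIC falls (prefers the larger model) while BIC rises (rejects it),
# because BIC's per-variable penalty ln(60) exceeds AIC's penalty of 2.
```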
Nested models
Models that have a full model and a restricted model, where the restricted model uses a subset of the full model's variables.
Full model vs restricted model
Full model= A linear regression model that uses all k independent variables
Restricted model= A linear regression model that only uses some of the k independent variables
Joint F-Test
Measures how well a set of independent variables, as a group, explains the variation in the dependent variable. Put simply, it tests overall model significance.
Calculation: [ (SSErestricted - SSEunrestricted) / Q ] / [ (SSEunrestricted) / (n - k - 1) ]
* Q = # of variables excluded in the restricted model.
* k = # of independent variables in the unrestricted (full) model.
* Decision rule: reject the null hypothesis if F-stat > F critical value.
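The F-stat calculation can be worked through with hypothetical sums of squares (all figures below are made up):

```python
# Joint F-test on Q excluded variables, with hypothetical inputs
n = 60  # observations
k = 4   # independent variables in the unrestricted (full) model
q = 2   # variables excluded in the restricted model

sse_restricted = 52.0
sse_unrestricted = 40.0

f_stat = ((sse_restricted - sse_unrestricted) / q) / (
    sse_unrestricted / (n - k - 1)
)
print(f_stat)  # -> 8.25; compare against the F critical value
```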
True or false: We could also use t-tests to evaluate which individual variables are significant?
True, but the F-test provides a more meaningful evaluation since there is likely some amount of correlation among independent variables.
True or false: The F-test will tell us if at least one of the slope coefficients in a multiple regression model is statistically different from 0?
TRUE
True or false: When testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a two tailed test?
False, when testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a one tailed test.
True or false: We can use the regression equation to make predictions about the dependent variable based on forecasted values of the independent variable?
True, we can make predictions.
Predicting the dependent variable from forecasted values of the independent variable:
ŷ = estimated intercept + (forecasted X1 * estimated slope coefficient for X1) + (forecasted X2 * estimated slope coefficient for X2)…
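This prediction is a direct plug-in calculation. The coefficient estimates and forecasts below are assumed for illustration:

```python
# Predicting y-hat from forecasted X values, using hypothetical estimates
b0, b1, b2 = 0.02, 0.05, 0.01    # assumed intercept and slope estimates
x1_forecast, x2_forecast = 0.48, 1.4  # assumed forecasts of X1 and X2

y_hat = b0 + b1 * x1_forecast + b2 * x2_forecast
print(round(y_hat, 4))  # -> 0.058
```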