What is the correlation coefficient?

Correlation analysis expresses the relationship between two data series in a single number. The correlation coefficient measures how closely two data series are related.

(Wiley v1_136)

Wiley. Wiley Study Guide for 2018 Level II CFA Exam: Complete Set (Mobile Friendly). John Wiley & Sons P&T, 2017-07-01. VitalBook file.

The citation provided is a guideline. Please check each citation for accuracy before use.

Slope Calculation

Rise over Run

What are the properties of covariance?

Properties of Covariance

Covariance is symmetric, that is, Cov(X, Y) = Cov(Y, X).

The covariance of X with itself, Cov(X,X), equals the variance of X, Var(X).

(Wiley v1_138)


Interpreting covariance

Negative: the variables move in opposite directions.

Positive: the variables move in the same direction.

Zero: no linear relationship between the variables.

Sample Covariance

Sample Correlation Coefficient

Sample Variance

Sample Std. Deviation
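The four sample measures named above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook definitions (dividing by n - 1); the function names and the data are hypothetical and are not taken from the Wiley text.

```python
import math

def sample_variance(x):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def sample_std(x):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(x))

def sample_covariance(x, y):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance scaled by the product of the two standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_covariance(x, y)   # 1.5
var_x = sample_variance(x)         # 2.5
r_xy = sample_correlation(x, y)    # roughly 0.77
```

Note that `sample_covariance(x, x)` equals `sample_variance(x)` and that swapping the arguments leaves the covariance unchanged, which matches the two properties of covariance listed earlier.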

What are the limitations of correlation analysis?

###
- Correlation only measures linear relationships.
- It is unreliable when __outliers__ are present.
- Spurious correlation.
- Correlation does not imply causation.

Test-Statistic
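For testing whether a sample correlation is significantly different from zero, the standard t-statistic is t = r × sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below assumes that this is the test statistic the card refers to; the function name and the inputs are hypothetical.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical inputs: sample correlation of 0.6 from 18 observations
t_stat = correlation_t_stat(0.6, 18)  # approximately 3.0
```

The computed value is compared with the critical t-value for n - 2 degrees of freedom at the chosen significance level.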

What is correlation analysis used for?

###
- Investment analysis. For example, correlation analysis can be used to evaluate the accuracy of inflation forecasts in predicting actual inflation. If the correlation between inflation forecasts and actual inflation is higher, these forecasts can be used in asset pricing models.
- Identifying appropriate benchmarks in the evaluation of portfolio manager performance.
- Identifying appropriate avenues for effective diversification of investment portfolios (e.g., by investing in assets denominated in different currencies that are not highly correlated).
- Evaluating the appropriateness of using other measures (e.g., net income) as proxies for cash flow in financial statement analysis.
- Analysis of large data sets (or big data). For example, large amounts of unstructured data were used by an investment firm to determine the measures that would need to be taken (store closings) by two companies merging in order to gain antitrust approval for the merger.

(Wiley v1_142)


What does linear regression aim to minimize?

The sum of the squared regression residuals
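As a rough sketch of what "minimizing the sum of the squared residuals" means for a regression with one independent variable, the closed-form least-squares estimates are b1 = Σ(X-Xbar)(Y-Ybar) / Σ(X-Xbar)^2 and b0 = Ybar - b1 × Xbar. The function names and data below are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

def sum_squared_residuals(x, y, b0, b1):
    """Sum of (Y - Yhat)^2 for the line Yhat = b0 + b1 * X."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_fit(x, y)                        # about 2.2 and 0.6
sse = sum_squared_residuals(x, y, b0, b1)     # about 2.4
```

Any other choice of intercept or slope, for example `b1 + 0.1`, gives a larger sum of squared residuals on the same data.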

What are the six normal linear regression assumptions?

###
- The relationship between the dependent variable (Y) and the independent variable (X) is linear in the parameters, b1 and b0. This means that b1 and b0 are raised to the first power only and neither of them is multiplied or divided by another regression parameter.
- The independent variable, X, is not random.
- The expected value of the error term is zero: E(ε) = 0.
- The variance of the error term is constant for all observations: E(ε_i^2) = σ_ε^2, i = 1, …, n. This is known as the homoskedasticity assumption.
- The error term is uncorrelated across observations.
- The error term is normally distributed.

(Wiley v1_144)


regression residuals

(Wiley v1_143)


What is the SEE (standard error of estimate), or the standard error of the regression?

The SEE measures how well a regression model captures the relationship between the two variables. It indicates how well the regression line “fits” the sample data and is used to determine how certain we can be about a particular prediction of the dependent variable (Yhat) based on a regression equation. A good way to look at the SEE is that it measures the standard deviation of the residual term (ehat) in the regression. The smaller the standard deviation of the residual term (the smaller the SEE), the more accurate the predictions based on the model.

(Wiley v1_147)


SEE Formula

SEE = MSE^(1/2)

Total Variation

Total Variation = Σ(Y-Ybar)^2

Total variation is the sum of the variation explained by the independent variable and the variation that remains unexplained by the independent variable.

What is the Explained Variation

Explained Variation = Σ(Yhat-Ybar)^2

The variation in the dependent variable that can be explained by the independent variables

What is the Unexplained Variation

Unexplained Variation = Σ(Y-Yhat)^2

The variation in the dependent variable that cannot be explained by the independent variables

What is the coefficient of determination ("R-squared"), and what is the formula?

R-squared = (Explained variation/Total variation) = [(Total variation - Unexplained variation)/(Total variation)]

This equation simplifies to R-squared = 1 - (Unexplained variation/Total variation)

R-squared measures the percentage of the total variation in the dependent variable that can be explained by the variation in the independent variable.

Note: do not forget about adjusted R-squared, which is more appropriate in multiple linear regression.
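The decomposition behind R-squared can be checked numerically. The sketch below uses hypothetical observations and the fitted values from the hypothetical least-squares line Yhat = 2.2 + 0.6X; the function name is illustrative only.

```python
def variation_decomposition(y, y_hat):
    """Return (total, explained, unexplained) variation for fitted values y_hat."""
    mean_y = sum(y) / len(y)
    total = sum((yi - mean_y) ** 2 for yi in y)
    explained = sum((fi - mean_y) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return total, explained, unexplained

# Hypothetical observations and fitted values (least-squares line 2.2 + 0.6 * X)
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

total, explained, unexplained = variation_decomposition(y, y_hat)
r_squared = explained / total                 # about 0.6
r_squared_alt = 1 - unexplained / total       # same value, second form of the formula
```

The two forms agree only when the fitted values come from a least-squares regression with an intercept, because that is what guarantees explained plus unexplained variation equals total variation.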

What is the test statistic?

Note that an "=" in the null hypothesis makes it a two-tailed test.
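For a regression coefficient, the usual t-statistic compares the estimated value with the value hypothesized under the null. The notation below is an assumption that follows the b1 notation used elsewhere in these notes.

```latex
% Two-tailed test of a slope coefficient:
%   H_0: b_1 = b_1^{*}   versus   H_a: b_1 \neq b_1^{*}
t = \frac{\hat{b}_1 - b_1^{*}}{s_{\hat{b}_1}},
\qquad \text{degrees of freedom} = n - (k + 1)
% \hat{b}_1      : estimated slope coefficient
% b_1^{*}        : hypothesized value (often 0)
% s_{\hat{b}_1}  : standard error of the estimated slope
% k              : number of independent variables (k = 1 gives n - 2)
```

The null is rejected when the absolute value of t exceeds the critical t-value for the chosen significance level.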

What is a confidence interval?

###
- A confidence interval is a range of values within which we believe the true population parameter (e.g., b1) lies, with a certain degree of confidence.
- Here the interval represents the “fail‐to‐reject‐the‐null region” and is based around, or centered on, the estimated parameter value from sample data.
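A minimal sketch of how such an interval is built: estimate ± (critical t-value × standard error of the estimate). The critical value has to be looked up for the chosen confidence level and degrees of freedom; the numbers and function names below are hypothetical.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Interval centered on the estimate: estimate +/- t_critical * std_error."""
    half_width = t_critical * std_error
    return estimate - half_width, estimate + half_width

def fail_to_reject(hypothesized_value, interval):
    """True if the hypothesized value lies inside the interval."""
    lower, upper = interval
    return lower <= hypothesized_value <= upper

# Hypothetical inputs: estimated slope 0.6, standard error 0.2, critical t of 2.0
interval = confidence_interval(0.6, 0.2, 2.0)   # roughly (0.2, 1.0)
keep_null = fail_to_reject(0.0, interval)       # 0 is outside, so the null b1 = 0 is rejected
```

If the hypothesized value falls inside the interval we fail to reject the null; if it falls outside, we reject it.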

Type I error

Rejecting a true null hypothesis

Type II error

Failing to reject a false null hypothesis

What is the regression sum of squares (RSS)?

RSS is the amount of variation in the dependent variable that is explained by the independent variable. RSS = Σ(Yhat-Ybar)^2, also known as the explained variation.

What is the sum of squared errors or residuals (SSE)?

SSE is the amount of variation in the dependent variable that cannot be explained by the independent variable.

SSE = Σ(Y-Yhat)^2

SSE = Unexplained variation

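Mean Square of Regression (MSR)

MSR = RSS/k, where k is the number of independent variables (k = 1 in a regression with one independent variable).

Mean Square of Errors (MSE)

MSE = SSE/(n-2) for a regression with one independent variable; in general, MSE = SSE/(n-k-1).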

What is the F-test?

Used to test whether all the slope coefficients in the regression are equal to zero.

The F-stat is a one-tailed test. The decision rule is that we reject the null hypothesis if the F-stat > F critical value.
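Written out, the F-statistic is the ratio of the mean square of regression to the mean square of errors. The block below is a sketch in LaTeX using the RSS, SSE, n, and k notation from these notes.

```latex
% H_0: all slope coefficients equal zero
% H_a: at least one slope coefficient is not zero
F = \frac{\mathrm{MSR}}{\mathrm{MSE}}
  = \frac{\mathrm{RSS} / k}{\mathrm{SSE} / \bigl(n - (k + 1)\bigr)}
% numerator degrees of freedom:   k
% denominator degrees of freedom: n - (k + 1), which is n - 2 when k = 1
% Decision rule (one-tailed): reject H_0 if F > F_{critical}
```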

Components of Total Variation

Graphical Representation

What are the two sources of uncertainty when we use a regression model?

• The uncertainty inherent in the error term, ε.

• The uncertainty in the estimated parameters, b0 and b1.

Estimated Variance of the Prediction Error

How to construct a prediction interval
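For a regression with one independent variable, the standard textbook expression for the estimated variance of the prediction error is s_f^2 = SEE^2 × [1 + 1/n + (X - Xbar)^2 / ((n-1) × s_x^2)], and the prediction interval is Yhat ± t critical × s_f. The sketch below assumes that formula; the function names and inputs are hypothetical.

```python
import math

def prediction_error_variance(see, n, x_forecast, x_mean, x_variance):
    """Estimated variance of the prediction error for one independent variable.

    see        : standard error of estimate of the regression
    x_variance : sample variance of the independent variable
    """
    distance_term = (x_forecast - x_mean) ** 2 / ((n - 1) * x_variance)
    return see ** 2 * (1 + 1 / n + distance_term)

def prediction_interval(y_hat, s_f, t_critical):
    """Interval around the predicted value: y_hat +/- t_critical * s_f."""
    return y_hat - t_critical * s_f, y_hat + t_critical * s_f

# Hypothetical inputs, for illustration only
var_f = prediction_error_variance(see=2.0, n=10, x_forecast=5.0, x_mean=3.0, x_variance=4.0)
s_f = math.sqrt(var_f)
low, high = prediction_interval(y_hat=12.0, s_f=s_f, t_critical=2.306)
```

The 1 and 1/n terms reflect the two sources of uncertainty (the error term and the estimated parameters), and the last term widens the interval as the forecast X moves away from the sample mean of X.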

What is a confidence interval and how do I construct it?

Assumptions of Multiple Linear Regression Model

• The relationship between the dependent variable (Y) and the independent variables (X1, X2,..., Xk) is linear.

• The independent variables (X1, X2,..., Xk) are not random and no exact linear relationship exists between two or more independent variables.

• The expected value of the error term, conditioned on the independent variables, is zero: E(ε | X1, X2,..., Xk) = 0.

• The variance of the error term is the same for all observations: E(ε_i^2) = σ_ε^2.

• The error term is uncorrelated across observations: E(ε_i ε_j) = 0, j ≠ i.

• The error term is normally distributed.

What is R-squared and how do you calculate it?

RSS/SST

Adjusted R^2

Introduces a penalty for each new coefficient (independent variable) added to the regression
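The usual formula is adjusted R^2 = 1 - [(n - 1)/(n - k - 1)] × (1 - R^2), where n is the number of observations and k is the number of independent variables. A minimal sketch with hypothetical numbers:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)

# Hypothetical regression: 30 observations, 3 independent variables, R^2 of 0.600
adj_before = adjusted_r_squared(0.600, n=30, k=3)

# Adding a fourth variable that lifts R^2 only slightly, to 0.601
adj_after = adjusted_r_squared(0.601, n=30, k=4)
```

In this example R^2 rises but adjusted R^2 falls, which is the penalty at work: a new variable must add enough explanatory power to offset the lost degree of freedom.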

Anova Table

• Lists the regression sum of squares (RSS), sum of squared errors (SSE), and total sum of squares (SST) along with associated degrees of freedom.

• Also includes calculated values for mean regression sum of squares (MSR) and mean squared error (MSE).

• The F‐stat can be calculated by dividing MSR by MSE. The F‐test is used to test whether at least one of the slope coefficients on the independent variables in the regression is significantly different from 0.

• R2 (and adjusted R2) can be calculated from the data in the ANOVA table by dividing RSS by SST. R2 is used to determine the goodness of fit of the regression equation to the data.

• The standard error of estimate (SEE) can also be computed from the information in the ANOVA table. SEE = (MSE)^(1/2)
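The quantities in an ANOVA table can all be derived from RSS, SSE, the number of observations n, and the number of independent variables k. The sketch below is illustrative only; the function name, dictionary keys, and input values are hypothetical.

```python
import math

def anova_table(rss, sse, n, k):
    """Build ANOVA quantities from RSS, SSE, n observations, k independent variables."""
    sst = rss + sse                 # total sum of squares
    df_regression = k
    df_error = n - (k + 1)
    msr = rss / df_regression       # mean regression sum of squares
    mse = sse / df_error            # mean squared error
    return {
        "SST": sst,
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": n - 1,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": rss / sst,
        "SEE": math.sqrt(mse),
    }

# Hypothetical regression with one independent variable and five observations
table = anova_table(rss=3.6, sse=2.4, n=5, k=1)
```

With these inputs the F-stat is about 4.5, R2 is about 0.6, and SEE is about 0.89.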

Why use a Dummy Variable?

To assess the effect of a qualitative variable on the dependent variable

Heteroskedasticity

Heteroskedasticity occurs when the variance of the error term in the regression is not constant across observations. Figure 3-1 shows the scatter plot and regression line for a model with homoskedastic errors. There seems to be no systematic relationship between the regression residuals (vertical distances between the data points and the regression line) and the independent variable. Figure 3-2 shows the scatter plot and regression line for a model with heteroskedastic errors. Notice that the regression residuals appear to increase in size as the value of the independent variable increases.

Effects of Heteroskedasticity

Heteroskedasticity does not affect the consistency of estimators of regression parameters. However, it can lead to mistakes in inferences made from parameter estimates.

- The F‐test for the overall significance of the regression becomes unreliable as the MSE becomes a biased estimator of the true population variance.
- The t‐tests for the significance of each regression coefficient become unreliable as the estimates of the standard errors of regression coefficients become biased.

■ Typically, in regressions with financial data, standard errors of regression coefficients are underestimated and t‐stats are inflated due to heteroskedasticity. Therefore, ignoring heteroskedasticity results in significant relationships being found when none actually exist. (Null hypotheses are rejected too often.)

■ Sometimes however, heteroskedasticity leads to standard errors that are too large, which makes t‐stats too small.

Two Types of Heteroskedasticity

###
- Unconditional heteroskedasticity occurs when the heteroskedasticity of the variance in the error term is not related to the independent variables in the regression. Unconditional heteroskedasticity does not create major problems for regression analysis.
- Conditional heteroskedasticity occurs when the heteroskedasticity in the error variance is correlated with the independent variables in the regression. While conditional heteroskedasticity does create problems for statistical inference, it can be easily identified and corrected.

The Breusch‐Pagan (BP) test is a test for heteroskedasticity. How is it done?

The BP test requires a regression of the squared residuals from the original estimated regression equation (in which the dependent variable is regressed on the independent variables) on the independent variables in the regression.

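The BP test statistic is a Chi‐squared (χ2) random variable with k degrees of freedom that is calculated as: χ2 = nR2, where n is the number of observations, R2 is the coefficient of determination of the regression of the squared residuals on the independent variables, and k is the number of independent variables.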
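A rough sketch of the BP calculation for a regression with a single independent variable, assuming the usual form of the statistic (number of observations × R^2 from regressing the squared residuals on the independent variable, with k = 1 degree of freedom). The function names, data, and residuals are hypothetical.

```python
def r_squared_simple(x, y):
    """R^2 of a least-squares regression of y on a single variable x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)

def breusch_pagan_stat(x, residuals):
    """n * R^2 from regressing the squared residuals on x."""
    squared_residuals = [e ** 2 for e in residuals]
    return len(x) * r_squared_simple(x, squared_residuals)

# Hypothetical independent variable and residuals from an original regression
x = [1.0, 2.0, 3.0, 4.0, 5.0]
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

bp = breusch_pagan_stat(x, residuals)   # about 1.39

# One-tailed test: compare with the chi-squared critical value for k = 1
# degree of freedom (3.841 at the 5% level).
reject_no_heteroskedasticity = bp > 3.841
```

Here the statistic is below the critical value, so the null hypothesis of no conditional heteroskedasticity is not rejected for this small hypothetical sample.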

Correcting Heteroskedasticity

There are two ways to correct for conditional heteroskedasticity in linear regression models:

1. Use robust standard errors (White‐corrected standard errors or heteroskedasticity‐consistent standard errors) to recalculate the t‐statistics for the original regression coefficients based on corrected‐for‐heteroskedasticity standard errors.

2. Use generalized least squares, where the original regression equation is modified to eliminate heteroskedasticity. See Example 3-2.