CFA L2 Quant Flashcards

1
Q

True or false: With financial instruments, we can typically use a one-factor linear regression model?

A

False, typically we need a multiple regression model.

2
Q

Multiple regression model

A

Regression models that allow us to see the effects of multiple independent variables on one dependent variable.
Ex: Can the 10-year growth in the S&P 500 (dependent variable, Y) be explained by the trailing dividend payout ratio of the index's stocks (independent variable 1, X1) and the yield curve slope (independent variable 2, X2)?

3
Q

What are the uses of multiple regression models?

A

- Identify relationships between variables.
- Forecast variables (ex: forecast CFs or the probability of default).
- Test existing theories.

4
Q

Residual (ε)

A

The difference between the observed Y value and the predicted Y value (ŷ).
ε = Y - ŷ = Y - (b0 + b1X1 + b2X2 + … + bnXn)

5
Q

P-value

A

The smallest level of significance at which the null hypothesis can be rejected. If the p-value is less than the significance level (α), the null hypothesis is rejected; if it is greater, we fail to reject it.

6
Q

If the significance level is 5% and the p-value is .06, do we reject the null hypothesis?

A

No, we fail to reject the null hypothesis.

7
Q

Q-Q plot

A

A plot used to compare a variable's distribution to a normal distribution. The residuals should lie along a diagonal line if they follow a normal distribution.

8
Q

True or false: For a standard normal distribution, only 5% of the observations should be beyond -2 standard deviations of 0?

A

False; only 5% of the observations should be beyond -1.65 standard deviations (roughly 2.5% of observations lie beyond -2 standard deviations).

9
Q

Coefficient of determination (R^2)

A

The percentage of the total variation in the dependent variable explained by the independent variables.
R^2 = SSR / SST OR (SST - SSE) / SST
Ex: An R^2 of 0.63 means that the model explains 63% of the variation in the dependent variable.

10
Q

Akaike’s information criterion (AIC)

A

Looks at multiple regression models and determines which has the best forecast.
Lower values indicate a better model.
Higher k values result in higher values of the criterion.
Calculation: AIC = n * ln(SSE/n) + 2(k+1)

11
Q

Schwarz’s Bayesian information criteria (BIC)

A

Looks at multiple regression models and determines which has the better goodness of fit.
Lower values indicate a better model.
Higher k values result in higher values of the criterion.
BIC imposes a higher penalty for overfitting than AIC.
Calculation: BIC = n * ln(SSE/n) + ln(n)*(k+1)
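
A minimal Python sketch of the AIC and BIC formulas from these two cards; the n, k, and SSE values are hypothetical, chosen only to show the comparison.

```python
import numpy as np

def aic(sse, n, k):
    """Akaike's information criterion: n*ln(SSE/n) + 2(k+1)."""
    return n * np.log(sse / n) + 2 * (k + 1)

def bic(sse, n, k):
    """Schwarz's Bayesian information criterion: n*ln(SSE/n) + ln(n)*(k+1)."""
    return n * np.log(sse / n) + np.log(n) * (k + 1)

# Hypothetical comparison: model A (k=3, SSE=120) vs. model B (k=5, SSE=110), n=60.
for label, sse, k in [("A", 120, 3), ("B", 110, 5)]:
    print(label, round(aic(sse, 60, k), 2), round(bic(sse, 60, k), 2))
```

Note how BIC's ln(n)*(k+1) term penalizes model B's extra variables more heavily than AIC's 2(k+1) term.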

12
Q

Joint F-Test

A

Measures how well a set of independent variables, as a group, explains the variation in the dependent variable. Put simply, it tests overall model significance.
Calculation: F = [ (SSE_restricted - SSE_unrestricted) / q ] / [ SSE_unrestricted / (n - k - 1) ]
q = # of variables excluded in the restricted model
Decision rule: reject the null hypothesis if F-stat > F critical value.
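
A minimal sketch of the joint F-stat calculation on this card; the SSE values, q, n, and k below are hypothetical.

```python
def joint_f_stat(sse_restricted, sse_unrestricted, q, n, k):
    """F = [(SSE_r - SSE_u) / q] / [SSE_u / (n - k - 1)]."""
    return ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))

# Hypothetical: dropping 2 of 4 variables raises SSE from 90 to 105 over n = 50 observations.
f_stat = joint_f_stat(sse_restricted=105, sse_unrestricted=90, q=2, n=50, k=4)
print(f_stat)  # reject H0 (excluded coefficients jointly zero) if this exceeds the F critical value
```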

13
Q

True or false: We could also use individual t-tests to evaluate which variables are significant?

A

True, but the F-test provides a more meaningful evaluation since there is likely some amount of correlation among independent variables.

14
Q

True or false: The F-test will tell us if at least one of the slope coefficients in a multiple regression model is statistically different from 0?

A

TRUE

15
Q

True or false: When testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a two tailed test?

A

False, when testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a one tailed test.

16
Q

True or false: We can use the regression equation to make predictions about the dependent variable based on forecasted values of the independent variable?

A

True, we can make predictions.

17
Q

Predicting the dependent variable from forecasted values of the independent variable:

A

ŷ = estimated intercept + (estimated slope coefficient for X1 × forecasted X1) + (estimated slope coefficient for X2 × forecasted X2) + …

18
Q

Functional form misspecifications (A regression suffers from misspecification of the functional form when the functional form of the estimated regression model differs from the functional form of the population regression function):

A

- Omission of important independent variables: may lead to biased and inconsistent regression parameters AND serial correlation or heteroskedasticity in the residuals.
- Inappropriate variable form (ex: you may need to take the natural log of a variable): may lead to heteroskedasticity in the residuals. This can happen if there is no linear relationship between the independent & dependent variables.
- Inappropriate variable scaling (ex: common-size financial statements): may lead to heteroskedasticity in the residuals or multicollinearity.
- Data improperly pooled: may lead to heteroskedasticity or serial correlation in the residuals.

19
Q

Heteroskedasticity

A

When the variance of the residuals is not constant across all observations in the sample. This happens when there are subsamples that are more spread out than the rest of the sample.

20
Q

Unconditional heteroskedasticity

A

When the heteroskedasticity is not related to the level of the independent variables, meaning the error variance does not change systematically with the values of the independent variables.
Although it's a violation of our assumptions, it is usually not a big problem.

21
Q

Conditional heteroskedasticity

A

Heteroskedasticity that is related to the level of the independent variables. Creates significant problems for statistical inference if not corrected properly.

22
Q

Effects of conditional heteroskedasticity

A

When conditional heteroskedasticity causes the error variance to be underestimated (the typical case):
- The standard errors of the coefficients become unreliable estimates by being underestimated. This leads to t-stats that are too large too often, and thus the null is rejected too often, a.k.a. type I error.
- For the F-test (MSR/MSE), MSE is underestimated, and therefore the F-stat is often too large, again leading to the null being rejected too often (type I error).
When the error variance is instead overestimated, the same errors occur in the opposite direction.

23
Q

How to detect conditional heteroskedasticity

A

There are two methods of detection: examining scatter plots of the residuals and using the Breusch-Pagan chi-square test.
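
A sketch of the second method using statsmodels' Breusch-Pagan implementation; the data are simulated (the error spread is made to grow with x, i.e., conditional heteroskedasticity is planted deliberately).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data: residual spread grows with x -> conditional heteroskedasticity.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 0.5 * x + rng.normal(0, x)   # error standard deviation increases with x
X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, X)
print(bp_stat, bp_pvalue)            # small p-value -> reject H0 of no conditional heteroskedasticity
```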

24
Q

Serial correlation/autocorrelation

A

When residuals are correlated with each other.
Poses serious problems when using time-series data.

25
Q

Positive serial correlation

A

When a positive residual in one time period increases the probability of observing a positive residual in the next time period. This type of correlation typically results in coefficient standard errors that are too small, causing t-stats or F-stats to be too large, which leads to type I errors.

26
Q

Effect of serial correlation on model parameters

A

If the regression model includes a lagged value of the dependent variable as an independent variable, serial correlation causes the estimates of the slope coefficients to be inconsistent. If there is no such lag, the estimates of the slope coefficients remain consistent.

27
Q

How to detect serial correlation?

A

First, we can use a scatter plot, which will reveal very dramatic cases. We can also use the Durbin-Watson (DW) statistic or a Breusch-Godfrey (BG) test. The DW statistic detects serial correlation at a single lag, whereas the BG test detects serial correlation at multiple lags.

28
Q

Breusch-Godfrey (BG) Test

A

The BG test regresses the residuals against the original set of independent variables, plus one or more additional variables representing lagged residuals.
Calculation: ε_t = a0 + a1X1t + a2X2t + … + p1ε_(t-1) + … + pnε_(t-n) + u_t
The null under the BG test is that there is no serial correlation (i.e., p1 = … = pn = 0).
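
A sketch of both detection tests using statsmodels on simulated data; the AR(1) errors are planted so that both tests should flag serial correlation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Simulated regression whose errors follow an AR(1) process (positive serial correlation).
rng = np.random.default_rng(1)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.7 * e[t - 1] + rng.normal()
x = rng.normal(size=200)
res = sm.OLS(1 + 0.5 * x + e, sm.add_constant(x)).fit()

print(durbin_watson(res.resid))                   # far below 2 -> positive serial correlation
bg_stat, bg_pvalue, _, _ = acorr_breusch_godfrey(res, nlags=4)
print(bg_pvalue)                                  # small p-value -> correlation at up to 4 lags
```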

29
Q

How to detect multicollinearity?

A

The most easily observable sign is when t-tests indicate none of the individual coefficients is significantly different from zero, but the F-test indicates that at least one coefficient is statistically significant and the R^2 is high. The independent variables are so highly correlated with each other that their individual effects are washed out, even though as a group they explain the variation in the dependent variable.
More formally, we compute a variance inflation factor (VIF) for each of the independent variables.

30
Q

Variance inflation factor (VIF)

A

Estimates how much multicollinearity has inflated the variance of an estimated regression coefficient. We start by regressing one of the independent variables (making it the dependent variable) against the remaining independent variables; Rj^2 is the R^2 of that regression.
VIF = 1 / (1 - Rj^2)
A VIF of 1 indicates the variable is not correlated with the other independent variables.
VIF values > 5 indicate the need for further investigation.
VIF values > 10 indicate high correlation.
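
A sketch using statsmodels' VIF helper, which implements the 1/(1 - Rj^2) calculation on this card; the design matrix is simulated with one deliberately redundant variable.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated design matrix where x2 is nearly a copy of x1 (multicollinearity).
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)    # highly correlated with x1
x3 = rng.normal(size=100)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_j = 1 / (1 - Rj^2), where Rj^2 comes from regressing variable j on the others.
for j in range(1, X.shape[1]):               # skip the constant column
    print(f"x{j}: VIF = {variance_inflation_factor(X, j):.1f}")
```

Expect x1 and x2 to show very large VIFs (well above 10) and x3 to sit near 1.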

31
Q

How to correct multicollinearity?

A

The most common method to correct for multicollinearity is to omit one or more of the highly correlated independent variables. You can also use a proxy for one of the variables or increase the sample size.

32
Q

True or false: The coefficient on a variable in a multiple regression is the amount of return attributable to the variable?

A

TRUE

33
Q

True or false: Using actual instead of expected inflation will improve model specification?

A

False, using actual instead of expected inflation is likely to result in model misspecification.

34
Q

Leverage (in statistics)

A

This is a way of identifying extreme observations in the independent variables: a measure of the distance between the ith observation of an independent variable and the variable's sample mean. Leverage values lie between 0 and 1; the closer to 1, the greater the distance. If an observation's leverage is higher than three times the average leverage, 3*((k+1)/n), it is considered potentially influential.

35
Q

Studentized residuals

A

A way of identifying outliers. The studentized residual is the # of standard deviations a data point lies from the regression line. There are four main steps:
1. Estimate the regression model using the original sample, then delete one observation and re-estimate the regression. Repeat sequentially, deleting a different observation each time.
2. Compare the actual Y value of each deleted observation to its predicted value: ei = Y - ŷ.
3. Divide the residual from step 2 by its standard deviation to get the studentized residual: t = ei / s.
4. Compare the studentized residuals to critical values in a t-table with n - k - 2 df. Points that fall in the rejection region are termed outliers and are potentially influential.

36
Q

True or false: All outliers and high-leverage points are influential on the regression?

A

FALSE

37
Q

Cook’s Distance

A

A composite metric for evaluating whether a high-leverage point and/or outlier is influential. Cook's distance measures how much the estimated values of the regression change if a given observation is deleted from the sample.
Calculation: Di = [ ei^2 / ((k+1) * MSE) ] * [ hi / (1 - hi)^2 ]
hi = leverage value for the ith observation
ei = residual for the ith observation
Values > √(k/n) indicate the observation is highly likely to be an influential data point.
Generally, values > 1 indicate a highly influential point, whereas values > 0.5 indicate the need for further investigation.
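
A sketch using statsmodels' built-in Cook's distance (which implements the Di formula above) on simulated data with one planted high-leverage outlier.

```python
import numpy as np
import statsmodels.api as sm

# Simulated regression with one planted high-leverage outlier at index 0.
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1 + 2 * x + rng.normal(scale=0.5, size=50)
x[0], y[0] = 4.0, -6.0
res = sm.OLS(y, sm.add_constant(x)).fit()

d, _ = res.get_influence().cooks_distance
k, n = 1, len(x)
print(np.where(d > np.sqrt(k / n))[0])   # observations above the sqrt(k/n) threshold
```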

38
Q

Dummy variables

A

Binary variables with only two options. When assigning numerical values, they can only be 0 and 1.
Always use (n - 1) dummy variables to avoid multicollinearity (i.e., 3 dummy variables for 4 quarters in a year).
Ex: True/false
Ex 2: On/off

39
Q

Dummy variables example:

A

EPS for four quarters: EPS = 1.25 + 0.75Q1 - 0.20Q2 + 0.10Q3
Question 1: What is the predicted EPS for Q4?
Answer 1: EPS = 1.25 + 0.75(0) - 0.20(0) + 0.10(0) = 1.25 (the omitted quarter shows up as the intercept)
Question 2: What is the predicted EPS for Q1?
Answer 2: EPS = 1.25 + 0.75(1) - 0.20(0) + 0.10(0) = 2.00
Question 3: What is the predicted EPS for Q1 of next year?
Answer 3: EPS = 1.25 + 0.75(1) - 0.20(0) + 0.10(0) = 2.00
This simple model uses average EPS for a given quarter over the past ten years as the forecast of EPS for that quarter of the following year.

40
Q

Logistic regression (logit) model

A

Estimates the probability of a DISCRETE binary outcome occurring.
Logit models assume the residuals have a logistic distribution, similar to a normal distribution but with fatter tails.
Logit models are nonlinear.
Calculation: ln(p / (1 - p)) = b0 + b1X1 + b2X2 + … + ε
The intercept is an estimate of the log odds when all independent variables equal zero.
The change in log odds when one of the independent variables changes depends on the curvature of the function.
Odds = e^ŷ
Probability = odds / (1 + odds) = 1 / (1 + e^(-ŷ))
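
A minimal sketch of converting a fitted logit model's log odds into odds and a probability; the intercept and slope below are hypothetical.

```python
import numpy as np

def logit_probability(yhat):
    """Convert predicted log odds (yhat = b0 + b1*X1 + ...) into a probability."""
    return 1 / (1 + np.exp(-yhat))

# Hypothetical fitted coefficients: intercept -2.0, one feature with slope 0.8, X1 = 3.
yhat = -2.0 + 0.8 * 3.0
print(np.exp(yhat))             # odds = e^yhat ≈ 1.49
print(logit_probability(yhat))  # probability ≈ 0.60
```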

41
Q

Likelihood ratio (LR) test

A

Similar to the joint F-test but for logit models: measures the goodness of fit of a logit model.
Calculation: LR = -2 * (log likelihood of the restricted model - log likelihood of the unrestricted model)
Recall, the restricted model has fewer independent variables.
Log-likelihood values are negative; values closer to 0 indicate a better-fitting model.
The LR test statistic follows a chi-square distribution.

42
Q

Time-series data

A

A set of observations taken periodically (most often at equal intervals) at different points in time.
A key feature of a time series is that new data can be added w/o affecting the existing data.
Trends can be found by plotting the observations on a graph.

43
Q

Linear trend

A

1/2 broad types of trend models. A time-series trend that can be graphed using a straight line, with time as the independent variable. A downward-sloping line indicates a negative trend, and vice versa for a positive trend.
Simplest form: Yt = b0 + b1(t) + ε

44
Q

Log-linear trend model

A

1/2 broad types of trend models. This is used to model positive or negative exponential growth. Recall, exponential growth is growth at some constant rate (positive or negative) and plots as a convex curve.
Simplest form: Yt = e^(b0 + b1(t)), where b1 is the constant rate of growth.
Rather than trying to fit the nonlinear data with a linear (straight line) regression, we take the natural log of both sides and transform it into a linear trend line called the log-linear model. This increases the predictive ability of the model.
Form: ln(Yt) = b0 + b1(t) + ε

45
Q

How to determine if a linear or log-linear trend model should be used?

A

Plot the data. A linear trend model may be used if the data points are equally distributed above and below the regression line (ex: inflation data is usually modeled with a linear trend model). If the data plots with a curved shape, use a log-linear trend model (ex: financial data such as stock indices and stock prices are often modeled with log-linear trend models).
If there is serial correlation, use an autoregressive model instead.

46
Q

True or false: For a time series model without serial correlation, the DW statistic should be approximately equal to 0?

A

False, for a time series model without serial correlation, the DW statistic should be approximately equal to 2. A DW that significantly differs from 2 suggests that the residuals are correlated.

47
Q

Autoregressive (AR) model

A

A time-series model that regresses the dependent variable against one or more lagged values of itself. Ex: a regression of a firm's sales against its sales in the previous month. Past values are used to predict the current value of the variable.
The DW test stat cannot be used to test for serial correlation in an AR model.
Simplest form: Xt = b0 + b1*X_(t-1) + … + bp*X_(t-p) + ε
Xt = value of the time series at time t
X_(t-1) = value of the time series at time t-1

48
Q

Covariance stationary

A

An AR model is covariance stationary if:
- Constant and finite expected value: the expected value is constant over time.
- Constant and finite variance: the volatility around the time series' mean is constant over time.
- The covariance between any two observations an equal distance apart is constant.

49
Q

True or false: A nonstationary time series can still produce meaningful results sometimes?

A

False, we need the time series to be covariance stationary.

50
Q

T-stat for residual autocorrelations in AR model:

A

t = (correlation of the residuals with the kth lagged residual) ÷ (1 ÷ √n)
Standard error = 1 ÷ √n
df = n - 2
n = # of observations

51
Q

Mean reversion

A

When a time-series has a tendency to move towards its mean. In other words, the dependent variable has a tendency to decline when the current value is above the mean and rise when the current value is below the mean. If a time series is at its mean reverting level, the model predicts the next value of the time series will be the same as its current value.

52
Q

Mean reverting level calculation

A

Xt = b0 ÷ (1 - b1)
The model will not be covariance stationary if b1 = 1.
If Xt > the mean-reverting level, the model predicts that X_(t+1) will be lower than Xt, and vice versa.
All covariance stationary time series have a finite mean-reverting level.
As forecasts become more distant, the forecast value moves closer to the mean-reverting level.
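
A minimal sketch of the b0 / (1 - b1) calculation; the AR(1) coefficients are hypothetical.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of an AR(1) model Xt = b0 + b1*X_(t-1); requires b1 != 1."""
    return b0 / (1 - b1)

# Hypothetical AR(1): Xt = 1.2 + 0.6*X_(t-1) -> level = 1.2 / 0.4 = 3.0.
# If Xt is above 3.0, the model predicts a decline toward 3.0, and vice versa.
print(mean_reverting_level(1.2, 0.6))
```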

53
Q

In-sample

A

Data that was used to develop a regression model.

54
Q

True or false: Financial and economic time series inherently exhibit some form of instability or nonstationarity.

A

True. Since financial/economic conditions are dynamic, the coefficients in one period may differ from those in another period. Models estimated over shorter time periods are usually more stable for this reason. When selecting a time-series sample, analysts should understand regulatory changes, changes to the economic environment, etc.; if there have been large changes, the model may not be accurate.

55
Q

True or false: There is a trade-off between statistical reliability in the long run and statistical stability in the short run?

A

True. A longer time period gives more statistical reliability, while a shorter time period gives more stable coefficient estimates.

56
Q

Random walk

A

When, in an AR model, the value of the dependent variable in one period is equal to the value of the series in the previous period plus a random error term.
Form: Xt = X_(t-1) + ε
(b0 = 0, b1 = 1)

57
Q

Random walk with a drift

A

The same concept as a random walk, but the intercept term is not equal to zero. Thus, the time series is expected to increase/decrease by the intercept (drift) term plus the error term.
Form: Xt = b0 + X_(t-1) + ε
(b1 = 1)

58
Q

True or false: A random walk with or w/o a drift is NOT covariance stationary?

A

True, random walks will always have a unit root which makes them not covariance stationary.

59
Q

Dickey-Fuller Test

A

A test we use on an AR model to determine if there's a unit root.
Start with Xt = b0 + b1*X_(t-1) + ε and subtract X_(t-1) from both sides:
Xt - X_(t-1) = b0 + (b1 - 1)*X_(t-1) + ε
Then test whether the new coefficient g = (b1 - 1) equals 0 using a modified t-test.
The null hypothesis is g = 0 (i.e., b1 = 1). If we fail to reject the null, the time series has a unit root and is nonstationary.
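
A sketch using statsmodels' augmented Dickey-Fuller test on simulated data: one planted random walk (unit root) and one stationary AR(1) for contrast.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Simulated series: a random walk (unit root) vs. a stationary AR(1).
rng = np.random.default_rng(4)
random_walk = np.cumsum(rng.normal(size=300))
stationary = np.zeros(300)
for t in range(1, 300):
    stationary[t] = 0.5 * stationary[t - 1] + rng.normal()

for name, series in [("random walk", random_walk), ("AR(1)", stationary)]:
    adf_stat, pvalue, *_ = adfuller(series)
    # H0: unit root. Expect to fail to reject for the random walk only.
    print(name, round(adf_stat, 2), round(pvalue, 3))
```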

60
Q

True or false: The Dickey-Fuller test uses the standard T distribution to find the critical values?

A

False, it has its own distribution to calculate the critical values.

61
Q

First differencing calculation

A

If the original time series has a unit root, then Xt - X_(t-1) = ε.
We create a new dependent variable by first differencing: Yt = Xt - X_(t-1), i.e., Yt = ε.
Stated in the form of an AR model: Yt = b0 + b1*Y_(t-1) + ε, with b0 = b1 = 0.
The differenced series is covariance stationary and can be modeled with an AR model.

62
Q

Seasonality

A

A characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year.
If seasonality is present, we MUST adjust the AR model in order for it to be correctly specified.

63
Q

How to correct for seasonality?

A

We add an additional lag of the dependent variable to the original model as another independent variable. The lag will be X_(t-4) in a quarterly model or X_(t-12) in a monthly model.
Calculation: ln(Xt) = b0 + b1 * ln(X_(t-1)) + b2 * ln(X_(t-4)) + ε

64
Q

True or false: for a T-test with seasonality, the null hypothesis is that there is seasonality?

A

False. H0: the seasonal lag coefficient = 0 (no seasonality present); Ha: the coefficient ≠ 0 (seasonality present).

65
Q

Autoregressive conditional heteroskedasticity (ARCH)

A

When the variance of the residuals in one period is dependent on the variance of the residuals in a previous period in an AR model. When ARCH exists, the standard errors of the coefficients and the hypothesis tests are invalid.

66
Q

How to predict future variance of errors in a time series model?

A

After we run an ARCH model, if we determine that a1 is significant, the future variance of the errors can be predicted using:
σ̂²_(t+1) = â0 + â1 * ε̂t²
We cannot predict future variance if a1 is not significant.

67
Q

Multiple time series

A

When more than one time series is run at the same time.
Ex: Yt = b0 + b1 * Xt + εt ↠ Yt and Xt are two different time series.
Either or both of these time series could be subject to nonstationarity.

68
Q

How to test for nonstationarity in a multiple time series model?

A

Run separate DF tests on each time series. If either time series is nonstationary, the coefficients will be unreliable.

69
Q

How to test whether two times series are cointegrated?

A

Regress one variable on the other: Yt = b0 + b1 * Xt + ε
Yt = value of time series 'Y' at time t; Xt = value of time series 'X' at time t.
Then the residuals are tested for a unit root using the Dickey-Fuller test with critical values calculated by Engle and Granger (the DF-EG test). If the test rejects the null hypothesis (H0: no cointegration), we conclude the error terms are covariance stationary and there is cointegration.

70
Q

Structural change

A

A significant shift in the plotted data at a point in time that essentially divides the data into two or more distinct patterns. If structural change is present, you must run two different models: one incorporating data before the date of the change and one using data after it.

71
Q

Machine learning (ML)

A

Filters useful info from substantial amounts of data by learning from known examples to find a pattern in the data. Machine learning acts without human intervention.

72
Q

Supervised learning

A

1/3 types of ML. We teach the model on known examples, then have it predict new instances. Supervised learning uses labeled data (data where the target variable is defined); it is used when the training data contains the ground truth, i.e., the target variable. Multiple regression is an example of supervised learning. Regression and classification are the two most common tasks: if the target variable is continuous, use a regression model; if it is categorical or ordinal, use a classification model, whose output groups observations.

73
Q

Deep learning networks (DLNs)

A

1/3 types of ML, used for complex tasks such as image recognition, natural language processing, etc. Deep learning is based on neural networks: a DLN is a type of NN with many hidden layers (at least two, but often more than 20). Deep learning is a self-teaching system.

74
Q

Reinforced learning algorithms

A

Algorithms in which an agent seeks to maximize a reward subject to constraints. RL does not rely on labeled data; these programs learn from their own prediction errors.

75
Q

Generalization

A

The extent to which a ML program is able to make out-of-sample predictions.

76
Q

Overfitting for ML

A

When a model fits the training data too closely, often because the data set has a large number of features (independent variables). Overfitting decreases the accuracy of out-of-sample forecasts: the training sample will have a high R^2, while the test sample will have a low R^2.

77
Q

True or false: Under supervised learning, a training sample is used to train an ML algorithm and a separate test sample is used to evaluate the model's ability to accurately predict new data?

A

TRUE

78
Q

How to measure the ability an ML program generalizes?

A

Create three nonoverlapping data sets:
- Training sample: in-sample data used to train the ML algorithm.
- Validation sample: out-of-sample data used to tune the training model.
- Test sample: out-of-sample data used to evaluate the final model.
A model that generalizes well should have a high R^2 on both in-sample and out-of-sample data.

79
Q

Bias errors

A

This is the in-sample error resulting from models with a poor fit.
Occurs when there is underfitting.

80
Q

Variance error

A

This is the out-of-sample error resulting from overfitted models that do not generalize well: the extent to which the ML model's results change in response to validation and test sample data.
Associated with overfitting.
Increases with model complexity.
Nonlinear models tend to have high variance error.

81
Q

Base error

A

This is the out-of-sample error resulting from residual errors due to random noise- just randomness in the data. Because it reflects randomness, base error cannot be reduced by making the model more complex.

82
Q

Learning curve

A

Plots the accuracy rate in the test sample versus the size of the training sample. A ML model that generalizes well will show an improving accuracy rate as the sample size increases. The in-sample and out-of-sample error rates should converge toward the desired level as the sample size increases.

83
Q

In-sample accuracy rate calculation vs out-of-sample accuracy rate calculation vs base accuracy rate calculation

A

In-sample accuracy rate = 1 - bias error rate
Out-of-sample accuracy rate = 1 - variance error rate
Base accuracy rate = 1 - base error rate

84
Q

True or false: ML models with high bias error will not see the accuracy rates converge?

A

False, the accuracy rates will converge just far below the desired level.

85
Q

True or false: Models with high variance errors will see the accuracy rates of the in-sample data and out-of-sample data converge below the desired level?

A

False, only the in-sample accuracy rate will converge towards the desired level.

86
Q

How to minimize the effects of overfitting with an ML program?

A

Reduce model complexity and use cross-validation.

87
Q

Cross validation

A

An estimate of the out-of-sample error rate taken directly from the validation sample.

88
Q

Complexity reduction

A

A penalty imposed to exclude features that do not meaningfully contribute to out-of-sample prediction accuracy.

89
Q

Underfitting

A

When the ML algorithm fails to identify an actual relationship; occurs when the model is oversimplified.
R^2 will be low for both in-sample and out-of-sample data.
High bias error.
Linear functions are susceptible to underfitting.

90
Q

K-fold cross validation

A

A method for alleviating the holdout sample problem (the training set being reduced too much); the process also eliminates sampling bias. There are four steps:
1. Shuffle the data randomly.
2. Divide the data into k equal sub-samples.
3. Use k - 1 sub-samples as training samples, with the remaining sub-sample as the validation sample.
4. Repeat the process k times, rotating the validation sub-sample. The average of the k validation errors is taken as a reasonable estimate of the model's out-of-sample error.
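
A sketch of the four steps using scikit-learn's KFold on simulated data; the data set, model, and k = 5 are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical data set with 3 features.
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # steps 1-2: shuffle, split into k folds
errors = []
for train_idx, val_idx in kf.split(X):                 # step 3: k-1 folds train, 1 fold validates
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

print(np.mean(errors))   # step 4: average validation error estimates out-of-sample error
```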

91
Q

Least absolute shrinkage and selection operator (LASSO)

A

This is a popular penalized regression model. LASSO attempts to minimize the SSE plus a penalty equal to the sum of the absolute values of the slope coefficients. The penalty increases with the number of features, so there is a tradeoff between reducing SSE (by adding independent variables) and the penalty imposed. Investment analysts use LASSO to build parsimonious (few predictor variables) models.

92
Q

Regularization

A

A type of penalized regression. Forces the beta coefficients of nonperforming features towards zero. Regularization can be applied to non-linear models.

93
Q

Support Vector Machine (SVM)

A

A common supervised ML algorithm often used for text data. The model assumes the data is linearly separable; An SVM is a linear classification algorithm. An SVM attempts to find the optimal hyperplane that separates two sets of data (classes) by the max amount using n features.

94
Q

Soft margin classification

A

Handles misclassified observations in the training data in an SVM.

95
Q

K-nearest neighbor (KNN)

A

A common supervised ML algorithm. A new observation is classified by finding the k nearest (most similar) observations in the existing data set. If k = 5, the new observation is assigned to the category of the majority of its 5 nearest neighbors.

96
Q

Ensemble learning

A

A common supervised ML algorithm that combines the predictions from multiple models rather than a single model. The different models cancel out noise, resulting in a lower average error rate. There are two types of ensemble methods: aggregation of heterogeneous learners and aggregation of homogeneous learners. Ensemble learning typically produces more stable and accurate results than single models. It aims to decrease variance (bagging), decrease bias (boosting), and improve predictions (stacking).

97
Q

Aggregation of heterogeneous learners

A

Different algorithms are combined together through a voting classifier and each algorithm gets a vote. The answer with the most votes is the model we go with.

98
Q

Aggregations of homogeneous learners.

A

The same algorithm is used but on different training data. The different training data used by the same model can be derived through bootstrap resampling (a.k.a bagging).

99
Q

Random Forest

A

A common supervised ML algorithm: a variation of a classification tree in which a large # of classification trees are trained using bagged data from the same data set. A random subset of features is used in creating each tree, so every tree is different. This process mitigates overfitting and reduces noise from errors. A drawback of random forests is that the transparency of CART is lost.
Random forests can INCREASE the signal-to-noise ratio.

100
Q

Principal component analysis (PCA)

A

A common unsupervised ML algorithm. Problems w/ too much noise arise when there are excessive amts of features (high dimensionality). PCA seeks to reduce this excess noise by discarding excess features: it transforms the features' covariance matrix to reduce highly correlated features into a smaller # of uncorrelated composite features, called eigenvectors, which are linear combinations of the original features. Each eigenvector has an eigenvalue: the proportion of total variance in the data set explained by that eigenvector. The end product is an algorithm with lower dimensionality, which makes the model easier to train and interpret.

101
Q

Scree plot

A

A plot that shows the proportion of total variance explained by each of the principal components.

102
Q

Clustering

A

A common unsupervised ML algorithm. Clustering is the process of grouping observations into categories based on similar attributes (a.k.a cohesion). The two most common types of clustering are: K-means clustering and hierarchical clustering.

103
Q

Cohesion

A

Grouping observations into categories based on the observations’ similarities.

104
Q

Hierarchical clustering

A

1/2 main types of clustering that builds a hierarchy of clusters without any predefined # of clusters.

105
Q

Agglomerative clustering/ Bottom-up clustering

A

1/2 types of hierarchical clusters. This starts with one observation as its own cluster and then adds other similar observations to that group, thus forming another nonoverlapping cluster. In the end, all observations are merged into a single cluster.

106
Q

Neural networks (NNs)

A

Made up of layers of neurons. The first layer is the input layer (node layer), which receives the input. The final layer is the output layer. In between exists hidden layers. Neurons of each layer are connected to neurons of the next layer through channels. There may be multiple hidden layers. The multiple layers allow the NN to model complex nonlinear functions. NNs are an adaptive system that computers use to learn from their mistakes and improve continuously. A group of ML algorithms applied to problems with significant nonlinearity.

107
Q

Divisive clustering/ top-down clustering

A

1/2 types of hierarchical clusters. The algorithm starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters. In the end, each cluster contains only one observation.

108
Q

Summation operator

A

Each neuron contains a summation operator, which multiplies the inputs it receives by weights and sums them into a weighted average, then passes the result to the activation function. The activation function then generates an output value from that input.

109
Q

Backwards propagation

A

This is how the machine learns from its errors: the weights applied by the summation operators are adjusted as the algorithm learns from its prediction errors.

110
Q

Steps in a supervised/ traditional ML model:

A

1. Conceptualization of the problem
2. Data collection
3. Data preparation and wrangling: cleaning the data set and preparing it for the model.
4. Data exploration: feature selection and data analysis; evaluating the data set and determining the most appropriate way to configure it for model training.
5. Model training: determining which ML algorithm to use, using a training data set, and tuning the model.

111
Q

Steps in a unsupervised/ textual ML model:

A

1. Text problem formulation
2. Text curation: ensuring the quality of the data, for example by adjusting for bad or missing data.
3. Text preparation and wrangling
4. Text exploration
5. Model training

112
Q

Data cleansing

A

Reducing errors in raw data. Common errors include:
- Missing values
- Invalid values
- Inaccurate values
- Non-uniform values
- Duplicate observations

113
Q

Data wrangling

A

Prepping data for model use, including transforming and scaling. Data transformations include:
- Extraction
- Aggregation: consolidating two variables into one (using appropriate weighting)
- Filtration: removing irrelevant observations
- Selection: removing features not needed for processing
- Conversion of data of diverse types

114
Q

README Files

A

Contain info about how, what, and where the data is stored. Helps ensure validity.

115
Q

Metadata

A

Data that describes other data by providing info about one or more aspects of the data. Essentially a summary.

116
Q

Winsorization

A

A way researchers handle outliers. Instead of entirely excluding outliers, they substitute reasonable values in for them.

117
Q

Trimming

A

One way researchers exclude outliers: a trimmed mean excludes a certain portion of the highest and lowest values. For example, excluding the lowest 1% and highest 1% of all values.

118
Q

Normalization

A

1/2 common types of scaling. Scales variable values to between 0 and 1.
Sensitive to outliers.
Use this when trying to understand where a value lies within the data set.
Calculation: (Xi - Xminimum) ÷ (Xmaximum - Xminimum)
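
A minimal sketch of the min-max calculation above; the feature values are hypothetical, with one outlier planted to show the sensitivity this card mentions.

```python
import numpy as np

def normalize(x):
    """Min-max scaling: (Xi - Xmin) / (Xmax - Xmin), mapping every value into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical feature with one outlier: the outlier maps to 1.0 and squeezes
# the remaining values near 0, which is why normalization is outlier-sensitive.
x = np.array([10.0, 12.0, 11.0, 13.0, 50.0])
print(normalize(x))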

119
Q

Cleansed text is normalized using these steps:

A

1. Lowercasing. Ex: Dog ↠ dog
2. Removal of stop words: very common, unimportant words. Ex: the, is, and, etc.
3. Stemming: combining similar words into a single word. Ex: integrate, integration, integrating ↠ a common stem
4. Lemmatization: returning the base form of a word. Ex: saw ↠ see
5. Bag-of-words (BOW): the result of steps 1-4; all the collected words (tokens) are gathered w/o regard to order. If order doesn't matter, we can stop here.
6. N-grams: if ordering is important, we can create a two-gram that looks for two specific words that go together, a three-gram for three words, and so on.
7. Organize the BOW and n-grams into a document term matrix (DTM).

120
Q

Token (in text wrangling)

A

A word

121
Q

Black box approach to ML

A

ML models that give you a result without explaining how they get to their decision.

122
Q

True or false: In feature selection, we try to ONLY include the features that contribute to the model’s out-of-sample predictive power.

A

TRUE

123
Q

Feature extraction

A

When a feature is created from the data set.
Ex: creating a value for age using date-of-birth data.

124
Q

One-hot encoding (OHE)

A

A type of feature engineering: the process used to convert a categorical feature into dummy variables.

125
Q

Techniques of feature selection:

A

- Term frequency: the # of times a token appears in the dataset.
- Document frequency: the # of documents a token appears in ÷ the total # of documents.
- Chi-square test: ranks tokens by their usefulness to a certain class of info. Tokens with higher chi-square test stats occur more frequently in that class.
- Mutual information: a numerical value indicating the contribution of a token to a specific class. A token that appears mostly in one class has a value close to 1, whereas a token that appears a lot in all classes has a value close to 0.

126
Q

Techniques of feature engineering:

A

- Numbers: tokens of a standard length are converted into new tokens. Ex: 4-digit numbers converted into '#4'.
- N-grams
- Named entity recognition (NER): assign tokens a NER tag based on their context. Ex: Europe ↠ place; Google ↠ website.
- Parts of speech (POS): assign tokens a POS tag based on their language structure. Ex: Google ↠ NNP (proper noun); 2000 ↠ CD (cardinal #).

127
Q

Procedures before model training:

A

The researcher must define the objective(s) of the data analysis, identify useful data points, and conceptualize the model. Once an ML algorithm/method is selected, the researcher should specify the hyperparameters.

128
Q

Common model fitting errors:

A

- Small training samples.
- A low # of features in the model, which can lead to underfitting because the model doesn't have enough info to find patterns.
Feature selection is important to mitigate both underfitting and overfitting; feature engineering can reduce underfitting.

129
Q

Three tasks of model training:

A

1. Method selection: choosing the right ML algorithm considering supervised/unsupervised learning, the type of data, and the size of the data.
2. Performance evaluation
3. Tuning

130
Q

What type of ML algorithm do we use for text, numerical, and image data:

A

Text: SVMs and generalized linear models (GLMs)
Numerical: regression trees, CART methods, and classification methods
Image: neural networks and deep learning networks

131
Q

Techniques to measure model performance:

A

- Error analysis: errors in classification problems can be false positives (type I errors) or false negatives (type II errors). We build confusion matrices to tally them.
- Receiver operating characteristic (ROC)
- Root mean squared error (RMSE)

132
Q

Precision metric

A

A way to evaluate the fit of an ML algorithm: the ratio of true positives to all predicted positives. Use the precision metric when the cost of a type I error (false positive) is large.
Calculation: true positives ÷ (true positives + false positives)

133
Q

Recall metric/ true positive rate

A

A way to evaluate the fit of an ML algorithm: the ratio of true positives to all actual positives. Use when the cost of a type II error (false negative) is large.
Calculation: true positives ÷ (true positives + false negatives)

134
Q

F1 score

A

A way to evaluate the fit of an ML algorithm: the harmonic mean of precision and recall.
The higher, the better.
More appropriate than the model accuracy metric when there are class imbalances.
Calculation: (2 × precision × recall) ÷ (precision + recall)
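
A minimal sketch tying together the precision, recall, and F1 formulas from the last three cards; the confusion-matrix counts are hypothetical.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
p, r = precision(40, 10), recall(40, 20)
print(p, r, f1_score(p, r))   # 0.8, 0.667, ~0.727
```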

135
Q

Receiver operating characteristic (ROC)

A

A curve that plots the tradeoff between false positives and true positives. The true positive rate (recall) is plotted on the y-axis; the false positive rate is plotted on the x-axis. The area under the curve (AUC) ranges from 0 to 1: the closer to 1, the higher the model's predictive accuracy. An AUC of 0 means the model is never right; an AUC of 0.5 means it is right 50% of the time, i.e., just guessing. The more convex the curve, the higher its AUC.

136
Q

True or false: There is a tradeoff between bias error (associated with underfitting) and variance error (associated with overfitting)?

A

TRUE

137
Q

Fitting curve

A

A graph that plots error on the y-axis (in-sample/training sample error and out-of-sample/cross-validation sample error) against model complexity on the x-axis. The graph shows two curves: one for training error and one for cross-validation prediction error.

138
Q

Ceiling analysis

A

An evaluation and tuning of each component of the model.
Applied to complex models.

139
Q

What is the primary limitation of trend models?

A

The primary limitation of trend models is that they are not useful if the residuals exhibit serial correlation.

140
Q

True or false: The KNN is a parametric test?

A

False, it’s non-parametric: it makes no assumptions regrading the distribution of the data.

141
Q

What are LASSO models and regularization used for?

A

LASSO models are used to build parsimonious models and regularization is used for nonlinear models.

142
Q

What are SVMs used for?

A

Generates binary classifications, such as: classifying debt issuers into likely-to-default versus not-likely-to-default issuers, stocks-to-short versus not-to-short, and even classifying text (from news articles or company press releases) as positive or negative.

143
Q

What are KNNs used for?

A

Predicting bankruptcy, assigning a bond to a ratings class, predicting stock prices, and creating customized indices.

144
Q

What are CARTs used for?

A

Fraud detection in financial statements and selecting stocks/bonds.

145
Q

What are random forests used for?

A

Factor-based asset allocation and prediction models for the success of an IPO.

146
Q

True or false: NNs have an input layer node that consists of a summation operator and an activation function?

A

False, the hidden layer nodes (not the input layer nodes) each consist of a summation operator and an activation function; these nodes are where learning takes place.

147
Q

True or false: The coefficient on each dummy tells us about the difference in earnings per share between the respective quarter and the omitted quarter?

A

TRUE

148
Q

True or false: The F-statistic enables us to make conclusions about how several independent variables affect a dependent variable?

A

False, it only allows us to reject the hypothesis that all regression coefficients are zero and accept the hypothesis that at least one isn't.

149
Q

True or false: Serial correlation affects the consistency of regression coefficients?

A

FALSE (unless the model uses a lagged value of the dependent variable as an independent variable; see the earlier card on serial correlation and model parameters).