Quant Flashcards

1
Q

5 Assumptions to use a multiple regression model

A

1) Linearity
2) Homoskedasticity
3) Independence of Errors
4) Normality
5) Independence of Independent Variables

2
Q

Linearity Assumption

A

The relationship between the independent variable(s) and dependent variable needs to be linear

3
Q

Homoskedasticity Assumption

A

The variance of the regression residuals should be the same for all observations

4
Q

Independence of Errors Assumption

A

The observations are independent of one another, which implies the regression residuals are uncorrelated across observations

5
Q

Normality Assumption

A

The regression residuals are normally distributed

6
Q

Independence of Independent Variables Assumption

A

Independent variables are not random, and there is no exact linear relationship between two or more of them

7
Q

Adjusted R-Squared

A

A version of R-squared that adjusts for the number of independent variables; it increases only when a newly added variable improves the model by more than chance would predict (and can decrease when it does not)

8
Q

AIC v. BIC

A

AIC is preferred when the model is used for prediction
BIC is preferred when evaluating goodness of fit
Lower values are better for both

9
Q

F Statistic

A

F = [(SSE of restricted - SSE of unrestricted) / q] / [SSE of unrestricted / (n - k - 1)]

where q is the number of restrictions and SSE is the sum of squared errors
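For concreteness, a minimal Python sketch of this joint F-test (the function name and example numbers are illustrative):

```python
import scipy.stats as st

def joint_f_test(sse_restricted, sse_unrestricted, q, n, k, alpha=0.05):
    """F-test of q restrictions: compares restricted vs. unrestricted SSE."""
    f_stat = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))
    f_crit = st.f.ppf(1 - alpha, q, n - k - 1)  # one-tailed test
    return f_stat, f_stat > f_crit              # True -> reject H0 that the restrictions hold

# Example: 60 observations, 4 slopes in the unrestricted model, 2 restrictions
print(joint_f_test(sse_restricted=120.0, sse_unrestricted=100.0, q=2, n=60, k=4))
```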

10
Q

T Stat when only given coefficient and standard error, and what is null hypothesis

A

t = coefficient / standard error; the null hypothesis is that the coefficient does not differ significantly from 0

11
Q

Breusch Pagan Test (BP)
- What does it test for
- What is the formula

A

1) Conditional heteroskedasticity - the variance of the residuals differs across observations and is correlated with the independent variables

2) BP = n * R-squared, where R-squared comes from regressing the squared residuals on the independent variables (chi-square distributed with k degrees of freedom)
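A minimal numpy/scipy sketch of the BP procedure, assuming `residuals` and a matrix `X` of independent variables are already available (names are illustrative):

```python
import numpy as np
from scipy import stats

def breusch_pagan(residuals, X):
    """BP = n * R^2 from regressing squared residuals on the independent variables."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])           # add an intercept
    u2 = residuals ** 2
    beta, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
    fitted = Xc @ beta
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    bp = n * r2                                     # chi-square with k df under H0
    p_value = 1 - stats.chi2.cdf(bp, df=k)
    return bp, p_value
```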

12
Q

2 Types of Heteroskedasticity

A

1) Conditional - error variance is correlated with the independent variables (much bigger problem) - high probability of Type 1 errors

2) Unconditional - error variance is not correlated with the independent variables; less problematic

13
Q

Durbin-Watson Test (DW)

A

A test for first-order serial correlation of the residuals in a time-series model

14
Q

Breusch-Godfrey Test (BG)

A

A test used to detect serial correlation up to a predesignated order of the lagged residuals in a time-series model

15
Q

Multicollinearity

A

When two or more independent variables are highly correlated with each other

16
Q

Test for multicollinearity

A

Variance inflation factor (VIF)

VIF = 1 / (1 - R-squared), where R-squared comes from regressing that independent variable on the remaining independent variables

Any value over 5 warrants investigation
Any value over 10 means multicollinearity is likely
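A small numpy sketch that computes each variable's VIF by regressing it on the others (illustrative, not a library API):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing X_j on the other X's."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out  # >5 warrants investigation, >10 suggests multicollinearity
```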

17
Q

Two types of observations that may influence regression results

A

1) High Leverage Point
2) Outlier

18
Q

Difference between high leverage point and outlier

A

A high leverage point has an extreme x value; an outlier has an extreme y value. A single point can be both high leverage and an outlier.

18
Q

How to calculate if a point is high leverage

A

Leverage (h)

An observation is a high leverage point if its leverage exceeds 3 * (k + 1) / n

k - number of independent variables
n - number of observations
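A short numpy sketch using the hat-matrix diagonal, one standard way to obtain leverage values (names are illustrative):

```python
import numpy as np

def high_leverage_points(X):
    """Flag observations whose leverage h_ii exceeds 3 * (k + 1) / n."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])     # design matrix with intercept
    H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T  # hat matrix
    leverage = np.diag(H)                      # h_ii values, which sum to k + 1
    return np.where(leverage > 3 * (k + 1) / n)[0]
```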

18
Q

When looking at regression, determine if independent variable is significantly different from 0

A

If |t-stat| > critical t-value (equivalently, if the p-value is below the significance level), the coefficient is significantly different from 0

If the t-stat is not given, compute it as coefficient / standard error

19
Q

Method to identify whether an observation is an outlier, and what is the formula

A

Studentized deleted residuals

t(i) = e(i) / s(e(i)), where e(i) is the residual of observation i from the model fitted with the ith observation deleted, and s(e(i)) is the standard error of that residual

If |t(i)| is greater than 3, or greater than the critical t-stat with n - k - 2 degrees of freedom, the observation is an outlier
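A numpy sketch using the common leverage-based shortcut for externally studentized residuals (my assumption about the intended computation; names are illustrative):

```python
import numpy as np

def studentized_deleted_residuals(X, y):
    """Externally studentized residuals: each e_i scaled by the residual std dev
    estimated with observation i deleted (leverage-based shortcut)."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
    h = np.diag(H)
    e = y - H @ y                                     # regression residuals
    s2_del = (e @ e - e**2 / (1 - h)) / (n - k - 2)   # MSE with observation i deleted
    return e / np.sqrt(s2_del * (1 - h))              # flag if |t| > 3 or critical t (n-k-2 df)
```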

20
Q

When is an observation considered influential

A

If its exclusion from the sample causes substantial changes in the regression function

21
Q

Cook’s D

A

Metric for identifying influential observations

22
Q

Interpreting Cook’s D

A

If the value is greater than 0.5, the observation is possibly influential

If the value is greater than 1, the observation is likely influential

If the value is greater than 2 * sqrt(k/n), the observation is likely influential
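A numpy sketch of one standard formula for Cook's D, D_i = [e_i^2 / ((k+1) * MSE)] * [h_ii / (1 - h_ii)^2] (illustrative):

```python
import numpy as np

def cooks_d(X, y):
    """Cook's distance for each observation, from residuals and leverage."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
    h = np.diag(H)                       # leverage values
    e = y - H @ y                        # residuals
    mse = e @ e / (n - k - 1)
    return (e**2 / ((k + 1) * mse)) * (h / (1 - h)**2)
```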

23
Dummy Variable
Independent variable that takes on a value of either 0 or 1; also called an indicator variable
24
Types of dummy Variables
1) Intercept Dummy 2) Slope Dummy 3) Interaction Term
25
Go from log odds to probability
1) Raise e to the power of the log odds; this gives the odds 2) Take odds / (1 + odds); this gives the probability
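The two steps in code (a trivial Python sketch):

```python
import math

def log_odds_to_probability(log_odds):
    odds = math.exp(log_odds)   # step 1: exponentiate the log odds
    return odds / (1 + odds)    # step 2: odds / (1 + odds)

print(log_odds_to_probability(0.0))  # 0.5 -> log odds of 0 means even odds
```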
26
Likelihood Ratio (LR) Test
A method to assess the fit of logistic regression models, based on the log-likelihood metric that describes the model's fit to the data. LR = -2 * (log-likelihood of restricted model - log-likelihood of unrestricted model)
27
Calculate Standard Error of autocorrelations in time series
1 / sqrt(T), where T is the number of observations; the standard error is the same for every autocorrelation
28
Covariance Stationary
A key assumption for making valid statistical inferences in time-series models: 1) The expected value must be constant and finite in all periods 2) The variance must be constant and finite in all periods 3) The covariance of the series with its own past values (at a given lag) must be constant and finite in all periods
29
Autocorrelation
Correlations of a time series with its own past values
30
Mean reverting level of a time series
b(0) / (1-b(1))
31
Root Mean Squared Error (RMSE)
The square root of the average squared forecast error, used to compare the out-of-sample forecast performance of forecasting models. The model with the smallest RMSE is the most accurate.
32
How to handle simple random walk without drift
First-difference the time series, which makes it covariance stationary
33
Expected Value of simple random walk without drift
0
34
How to test for unit root
Dickey-Fuller test. The null hypothesis is that a unit root is present, so rejecting the null says the time series has no unit root and is covariance stationary
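A short sketch using the `adfuller` function from the statsmodels package (assuming statsmodels is installed; the simulated random walk is illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller  # augmented Dickey-Fuller test

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))  # simulated random walk (has a unit root)

adf_stat, p_value, *_ = adfuller(series)
# H0: unit root present. A large p-value -> fail to reject -> treat as a random walk.
print(f"ADF stat = {adf_stat:.2f}, p-value = {p_value:.2f}")
```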
34
Unit Root
A time series that is not covariance stationary has a unit root and is therefore a random walk. A unit root is present when the lag coefficient (b1) equals 1; if |b1| > 1, the series is explosive and likewise not covariance stationary
35
Co-integration
If we are mapping two series that both have a unit root, they are co-integrated when a linear combination of them is covariance stationary, meaning they move together over the long run; in that case a valid relationship can be established between the two
36
Mean Reverting Level
b(0) / (1-b(1)), where b0 and b1 are the coefficients in the model you're referencing
37
How to interpret Durbin Watson
A value of 2 means there is no serial correlation; values between 2 and 4 indicate negative serial correlation; values between 0 and 2 indicate positive serial correlation. Roughly 1.5-2.5 is the safe zone where you can use the results.
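A minimal numpy sketch of the DW statistic itself (illustrative):

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); approximately 2 * (1 - r)."""
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# DW near 2 -> no first-order serial correlation; < 2 positive, > 2 negative
```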
38
When can you not use the Durbin Watson Test in a time series
When one of the independent variables is a lagged value of the dependent variable
39
RMSE Calculation
1) Take the differences between the actual values and the forecasts 2) Square the differences 3) Sum the squares 4) Divide by the number of observations to get the mean 5) Take the square root of the mean. The lower the RMSE, the more accurate the model.
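The five steps collapse to two lines of numpy (illustrative sketch):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error of out-of-sample forecasts."""
    errors = np.asarray(actual) - np.asarray(forecast)  # step 1: forecast errors
    return np.sqrt(np.mean(errors ** 2))                # steps 2-5 in one line

# The model with the smallest out-of-sample RMSE is the most accurate.
```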
40
How to tell if model is covariance stationary based off regression results
Compute coefficient / standard error for each b term (or use its reported t-stat) and compare with the critical t-stat; if not greater, the coefficient is not significantly different from 0, so the series is not covariance stationary and has a unit root
41
Null hypothesis in Dickey Fuller Test
The null is that a unit root is present; if the test statistic is not more negative than the critical value, we fail to reject the null and conclude there is a unit root
42
In AR1 Model, how do you know if there is a unit root (random walk)
If B0 is 0 and B1 is 1
43
A bag of words
Representation of text that describes the occurrence of words within a document
44
Winsorization
The process of replacing extreme values and outliers: values above an upper percentile are set to that percentile's value, and values below a lower percentile are set to that percentile's value
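A numpy sketch using percentile cutoffs; the 5th/95th percentiles here are an illustrative choice, not a prescribed one:

```python
import numpy as np

def winsorize(x, lower_pct=5, upper_pct=95):
    """Replace values beyond the chosen percentiles with the percentile values."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)
```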
45
Recall
TP / (TP + FN) -> uses the first column of the confusion matrix only
46
Precision
TP / (TP + FP) -> uses the first row of the confusion matrix only
47
When would CART and random forests be used
Classification of labeled data and regression; not used for unlabeled data
48
Low bias error but high variance are indicative of what
Overfitting
49
Tokenization
Splitting a given text into separate tokens (words or characters)
50
Which supervised learning technique requires no hyperparameter
SVM
51
Hyperparameter in LASSO
lambda
52
Hyperparameter in KNN
k
53
K means clustering
Unsupervised technique that partitions observations into a fixed number, k, of non-overlapping clusters. Each cluster is characterized by its center (centroid), and each observation is assigned to the cluster whose centroid it is closest to.
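A minimal numpy sketch of the assign-then-recompute loop (illustrative; ignores the empty-cluster edge case):

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, repeating until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)   # nearest-centroid assignment
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```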
54
What does the r stand for in DW equation 2(1-r)
The sample correlation between the regression residuals from one period and those from the previous period
55
What types of variables are logistic regression most suited for
Discrete (qualitative) dependent variables, whereas traditional regression is suited to continuous dependent variables
56
Target vs. Features
In supervised learning, target is the y (dependent variable) and features are the x (independent variable)
57
Complexity
The number of features in a model
58
Bias Error
The degree to which a model fits the training data
59
Base Error
Due to randomness in the data
59
Variance Error
How much the model's results change in response to new data from validation and test samples
60
Learning Curve
Curve that plots the accuracy rate (1 - error rate) in the validation or test samples against the amount of training data
61
Soft Margin Misclassification
Adds a penalty to the objective function for observations that are misclassified in an SVM model
62
K Nearest Neighbor
A supervised learning technique that classifies a new observation by finding similarities between this observation and the existing data
63
Classification and Regression Tree (CART)
A supervised learning technique that can be used to predict either a categorical or continuous target variable; typically used for binary classification or regression
64
Pruning
a regularization technique used in CART models to reduce the dimensions of the model
65
Ensemble Learning
Combining the predictions from a collection of models
66
Bagging
Bootstrap aggregating: the original training data set is used to generate n new training data sets by random sampling with replacement
67
Random forest classifier
A collection of a large number of decision trees via bagging
68
F1 Score
Harmonic mean of recall and precision (2*P*R) / (P+R)
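A tiny Python sketch computing precision, recall, and F1 from confusion-matrix counts, tying together the Recall and Precision cards above (the example counts are illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, and their harmonic mean (F1); tn is not needed here."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(classification_metrics(tp=45, fp=5, fn=15, tn=35))
```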
69
Principal Components Analysis (PCA)
An unsupervised technique used to reduce the number of dimensions in a data set
70
Composite variable
a variable that combines two or more variables that are statistically strongly related to each other
71
Eigenvector
in the context of PCA, a vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features
72
Eigenvalue
A measure that gives the proportion of total variance in the initial dataset that is explained by each eigenvector
73
Scree plot
a Plot that shows the proportion of total variance in the data explained by each principal component
74
Hierarchical Clustering
Iterative procedure used to build a hierarchy of clusters
75
Agglomerative clustering
a bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster
76
Divisive clustering
A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster
77
Dendrogram
a type of tree diagram used for visualizing a hierarchical cluster analysis
78
Summation operator
A functional part of a neural network's node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function
79
Activation Function
A functional part of a neural network's node that transforms the total net input received into the final output of the node
80
Backward propagation
The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network's layers
81
Learning Rate
a Parameter that affects the magnitude of adjustments in the weights in a neural network
82
Forward Propagation
The process of adjusting weights in a neural network, to reduce total error of the network, by moving forward through the network's layers
83
Deep Neural Networks
Neural networks with many hidden layers, at least 2, but often more than 20
84
Reinforcement Learning
Machine learning in which a computer learns from interacting with itself or data generated by the same algorithm
85
3 Characteristics of Big Data
1) Volume 2) Variety 3) Velocity
86
Stemming
Process of converting inflected forms of a word into its base word (analyzing -> analyz)
87
Lemmatization
Process of converting inflected forms of a word into its morphological root (analyzing -> analyze)
88
Bag-of-words
A collection of the distinct tokens from all the texts in a sample dataset that does not capture the position or sequence of those words; it is the next step after cleansing the data
89
Document Term Matrix (DTM)
The last step of text processing; it uses the BOW, where each row represents a document and each column represents a token
90
N-grams
A representation of word sequences: unigram, bigram, trigram, etc.
91
False positive rate
FP / (TN+FP)
92
True positive rate
TP / (TP + FN)
93
When is precision useful
When the cost of FP/Type 1 error is high
94
When is Recall useful
When cost of FN/Type 2 error is high
95
What type of data is best used with SVM models
linear data
96
Veracity
The accuracy of data
97
Inconsistency Error
The data conflicts with what it should be (e.g., "male" in a name column); a data point that "doesn't make sense"
98
Non-Uniformity Error
Data not presented in same format
99
Extraction
A new variable is created using existing data
100
Difference in purpose between feature selection and feature engineering
Feature selection minimizes overfitting and feature engineering minimizes underfitting
101
Normalization Formula
(value - min) / (max - min)
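In code (trivial numpy sketch):

```python
import numpy as np

def normalize(x):
    """Min-max normalization: rescales each value to the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(normalize([10, 20, 40]))  # [0.0, 0.333..., 1.0]
```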
102
How much should be allocated to the training set when there is an absence of ground truth
0%; this is an unsupervised data set
103
Invalidity Error
When the result is outside the meaningful range
104
SEE formula
SEE = sqrt(MSE), where MSE = SSE / (n - k - 1). If the relationship between the dependent and independent variables is strong, the SEE will be low.
105
Formula for T-statistic for correlation coefficient
t = (r * sq(n-2)) / (sq(1-r^2))
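A small scipy sketch of this test (illustrative):

```python
import numpy as np
from scipy import stats

def corr_t_test(r, n, alpha=0.05):
    """t = r * sqrt(n-2) / sqrt(1 - r^2), tested against t with n-2 df."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-tailed critical value
    return t, abs(t) > t_crit                       # True -> correlation is significant

print(corr_t_test(r=0.4, n=30))  # t ~ 2.31 > ~2.05 -> significant
```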
106
MSE Formula
SSE / (n - k - 1)
107
Degrees of freedom for error term
n - k - 1
108
MSR formula
RSS / k
109
F stat formula
F = MSR / MSE, where MSR = RSS / k and MSE = SSE / (n - k - 1)
110
how many tails is f test
1
111
What does rejection of the null hypothesis of F test mean
At least one of the coefficients is significantly different from 0, which is good for explanatory purposes
112
What is the effect of serial correlation
Type 1 errors
113
what is the effect of multicollinearity
type 2 errors
114
Two categories of supervised learning
1) Regression 2) Classification
115
What type of learning is regression and when would it be used
If the target variable is continuous (supervised learning)
116
What type of learning is classification and when would it be used
If the target variable is categorical or ordinal, such as company rating (Supervised learning)
117
Two categories of unsupervised learning
1) Dimension reduction 2) Clustering
118
What type of learning technique is CART
supervised learning
119
What type of variables is CART used to predict
EITHER continuous or categorical
120
What type of learning technique is K-means and is it top down or bottom up
unsupervised / clustering / bottom up
121
What type of learning technique is principal component analysis and what is it good for
unsupervised / provides insight into the volatility contained in a data set
122
What type of learning technique is KNN
supervised
123
What type of learning technique is LASSO
supervised / regression
124
What is k-fold-cross-validation
A technique for mitigating the excess reduction of the training set size: the data are shuffled and divided into k equal parts, the model is trained on k - 1 parts and validated on the remaining part, and the process is repeated k times
125
Advantage of using CART over KNN
1) CART provides a visual 2) CART does not require hyperparameters to be set initially 3) CART does not require a similarity measure to be specified
126
when is model generalization maximized
When prediction error on the test data is minimized
127
What is high bias error and high variance error indicative of
underfitting
128
which error are linear functions more prone to
bias error
129
are linear functions more prone to underfitting or overfitting
underfitting
130
which ML technique makes use of root nodes, decision nodes, and terminal nodes
CART
131
Durbin Watson for AR(1) models
Invalid/indeterminable, because AR models include a lagged dependent variable as a regressor
132
What modeling technique can you use on random walk patterns
first-differenced regression
133
what is the most common problem with trend models
serial correlation
134
when can you not calculate the mean reverting level
When b1 equals 1 (the denominator is zero); the level is also not meaningful when b1 is greater than 1
135
Stop word
A word that is so common in a text that it carries no meaning
136
Standardization in text processing
lowercasing, removing stop words, stemming and lemmatization
137
What problem do stemming and lemmatization address
data sparseness and low frequency tokens