Quant 2.6 Flashcards

1
Q

What is Machine Learning?

A
  • Machine learning (ML) is a set of computer-driven approaches that generate structure or predictions from data by finding a pattern and then applying it, without any human intervention.
2
Q

What is the objective of ML and how does it work?

A
  • The objective is to extract meaning from large amounts of data.
    The computer is given a large amount of data, usually known examples, in which to find patterns or establish relationships. The algorithm then runs repeatedly to find a pattern, attach meaning to it, and apply it to new data where required, all without human intervention.
3
Q

What are the advantages associated to ML? What are the classes of ML techniques?

A
  • Unlike regression, ML does not rely on restrictive assumptions about the data (such as a linear relationship or a particular distribution). ML also works well with data that has highly non-linear relationships and can deal with a very large number of variables (high dimensionality).
    The three classes are:
    Supervised learning
    Unsupervised learning
    Deep learning and reinforcement learning
4
Q

What is Supervised ML?

A
  • In supervised ML, the machine develops a prediction rule by studying labeled training data: for every observation we provide both the inputs (X) and the known output (Y). In the credit card (CC) example, the X variables are the date, time of payment and amount, and Y is whether the transaction is fraudulent. The algorithm compares the inputs with the outputs and establishes a relationship between them.
    Once it has learned from the training data set, we give it a similar new data set, let it apply the learned prediction rule, and compare its predictions with the actual outcomes. In short, it predicts outputs (Y: fraudulent or not) from new inputs.
    The X variables are called features (the independent variables in multiple linear regression) and the dependent variable is called the target (Y).
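A minimal sketch of the idea of learning a prediction rule from labeled data, assuming a single feature (payment amount) and a binary target (fraud or not). The data values and the names `fit_threshold` and `predict` are hypothetical and chosen only for illustration; real fraud models use many features.

```python
# Hypothetical labeled training data: (payment amount, is_fraud) pairs.
train = [(20.0, 0), (35.0, 0), (50.0, 0), (900.0, 1), (1200.0, 1), (1500.0, 1)]

def fit_threshold(samples):
    """Learn the amount cutoff that misclassifies the fewest training rows."""
    amounts = sorted(x for x, _ in samples)
    # Candidate cutoffs: midpoints between consecutive sorted amounts.
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]

    def errors(cut):
        # A row is misclassified when "amount > cut" disagrees with its label.
        return sum((x > cut) != bool(y) for x, y in samples)

    return min(candidates, key=errors)

def predict(cutoff, amount):
    """Apply the learned rule to a new, unlabeled observation."""
    return int(amount > cutoff)

cutoff = fit_threshold(train)          # the learned prediction rule
new_prediction = predict(cutoff, 1000.0)
```

Here the features are the X values, the 0/1 label is the target Y, and `cutoff` plays the role of the prediction rule that is later applied to new inputs.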
5
Q

What are the categories of data sets that can be used in Supervised learning?

A
  • Two broad categories of data:
    Regression - the target variable is continuous and is modeled as some function of the features.
    Classification - the target is categorical: either binary (yes or no, as in our CC example) or one of several classes (multi-class).
6
Q

What is unsupervised learning?

A
  • It is machine learning in which the machine doesn’t use labeled data (this is the key distinction from supervised learning).
    Several features are still used, but no target is provided. The algorithm tries to discover structure in the data all by itself. It can be used for large, complex data sets that are hard to visualize.
7
Q

What type of problems are well suited for the unsupervised ML?

A
  • Two types:
    Dimension reduction - where we need to reduce the number of features, and
    Clustering - where we need to sort the observations into groups.
8
Q

What is Deep learning?

A
  • Deep learning refers to highly sophisticated algorithms used for highly complex tasks such as image classification, face recognition, speech recognition and natural language processing.
9
Q

What is reinforcement learning?

A
  • A process in which the computer learns by interacting with itself, through trial and error.
10
Q

What are deep learning and reinforcement learning based upon?

A
  • Neural networks
    These algorithms work well when the data contain non-linearities or when the features interact with one another. They can be supervised or unsupervised.
11
Q

When creating a model, how do you divide the data into samples?

A
  • It’s typically divided into 3 non-overlapping samples.
    A. Training sample - used in supervised learning to train the algorithm (in-sample data).
    B. Validation sample - used to validate and tune the prediction rule created from the training sample.
    C. Test sample - used for the final test of how well the model predicts outcomes on new data (out-of-sample data).
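A minimal sketch of a three-way split, assuming a 60/20/20 division and a random shuffle. The proportions, the seed and the function name `split_samples` are illustrative assumptions, not prescribed values.

```python
import random

def split_samples(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and cut them into non-overlapping train/validation/test samples."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # seeded so the split is reproducible
    n_train = round(len(rows) * train_frac)
    n_valid = round(len(rows) * valid_frac)
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]            # whatever remains is the test sample
    return train, valid, test

# 100 hypothetical observations identified by their row number.
train, valid, test = split_samples(range(100))
```

Because the three slices are taken from one shuffled list, no observation can appear in more than one sample.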
12
Q

What is generalization?

A
  • It is the degree to which our prediction rule/model retains its explanatory power when predicting on out-of-sample data.
13
Q

What is Overfitting?

A
  • A situation where our model performs well on the training sample, but doesn’t generalise well to other samples/data.
14
Q

How do you explain the type of fit of the Model?

A
  • Use the suit example.
    If you go to a tailor for a suit and they have one that fits only one person perfectly and no one else, it’s an overfit.
    If the suit is so baggy that it doesn’t properly fit anyone, it’s an underfit.
    If the suit fits anyone who is about 5 ft 10 in tall, it’s a good fit.
15
Q

What is the complexity of the model based upon?

A
  • No. of features, terms or branches in the model & whether the model is linear or non-linear
    The higher the complexity of the model, the higher is the risk of it being Overfit.
16
Q

How are out-of-sample errors categorised?

A
  • There are three types into which out-of-sample errors can be categorised.
    A. Base error - present in the test sample due to the randomness of the data.
    B. Bias error - the degree to which a model fits the training data. A high bias error means the model fits poorly (underfitting).
    C. Variance error - how much the model’s results change in response to new data from the validation and test samples. A high variance error signals overfitting.
17
Q

What are learning curves and what is a robust model?

A
  • Learning curves plot the accuracy rate against the training sample size. (The plot helps show which type of error dominates in our model.)
    The desired level of accuracy is 1 minus the base error (base error is due to the randomness of the data and there’s nothing we can do about it).
    A robust model is one whose out-of-sample accuracy increases towards the desired level of accuracy as the training sample size increases.
18
Q

What are some methods to reduce overfitting of the data in Supervised Machine learning?

A
  • By overfitting, we mean the model doesn’t perform well out of sample.
    There are two methods:
    A. Reduce complexity - as discussed earlier, the risk of overfitting rises with the complexity of the model.
    Thus reducing complexity directly reduces the overfitting problem. (the simplest solution tends to be the correct one)
    B. Cross-validation - based on the principle of avoiding sampling bias.
    K-fold cross-validation technique ->
    The data set is broken down into two sections: the 1st is the training plus validation (t+v) section and the other is the out-of-sample (OOS) section.
    The t+v data is randomly shuffled and divided into k equal sub-samples; k-1 of them are used for training and the remaining one is used for validation.
    This is repeated k times so that each sub-sample is used for validation exactly once, and the k validation errors are averaged.
    The average gives a more reliable estimate of how well the model will fit OOS data than a single training/validation split would.
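A minimal sketch of how k-fold cross-validation partitions the training-plus-validation data, assuming observations are referred to by row index. The function name `k_fold_indices` and the seed are hypothetical; model fitting and error averaging are left out.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, valid_idx) pairs, one pair per fold."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)              # random shuffle of the t+v data
    folds = [idx[i::k] for i in range(k)]         # k roughly equal sub-samples
    for i in range(k):
        valid_idx = folds[i]                      # one sub-sample held out for validation
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx                # the other k-1 sub-samples train the model

splits = list(k_fold_indices(n_rows=10, k=5))
```

Each of the 5 rounds trains on 8 rows and validates on the remaining 2, and every row is used for validation exactly once.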
19
Q

What are the different types of algorithms under Supervised Learning?

A
  • There are 5:
    A. Penalised Regression
    B. Support Vector Machine
    C. K-Nearest Neighbor
    D. Classification & Regression Tree
    E. Ensemble learning and Random Forest
20
Q

Explain Penalised Regression.

A
  • It is a computationally efficient technique used in prediction problems where the target variable is continuous.
    Regression coefficients are chosen to minimise sum of squared residuals plus a penalty term that increases with the number of included variables.
    Classic example - LASSO (least absolute shrinkage and selection operator), a penalised regression algorithm.
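A minimal sketch of the quantity a LASSO-style penalised regression minimises: the sum of squared residuals plus a penalty proportional to the sum of the absolute values of the coefficients. The function name `lasso_objective`, the data and the penalty weight `lam` are hypothetical, and no optimiser is included; the sketch only evaluates the objective for given coefficients.

```python
def lasso_objective(X, y, intercept, coefs, lam):
    """Sum of squared residuals plus lam times the sum of |coefficient|."""
    residuals = [
        yi - (intercept + sum(b * x for b, x in zip(coefs, row)))
        for row, yi in zip(X, y)
    ]
    ssr = sum(r * r for r in residuals)
    penalty = lam * sum(abs(b) for b in coefs)   # grows as more features get non-zero weight
    return ssr + penalty

# Hypothetical data in which only the first feature drives the target (y = 2 * x1).
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [2.0, 4.0, 6.0]

sparse = lasso_objective(X, y, intercept=0.0, coefs=[2.0, 0.0], lam=0.5)
dense = lasso_objective(X, y, intercept=0.0, coefs=[2.0, 0.1], lam=0.5)
```

Giving the second, uninformative feature a non-zero coefficient raises the objective, which is why the penalty pushes such coefficients towards zero.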
21
Q

What is Regularization?

A
  • Similar to generalization but not the same.
    Regularization:
    Includes methods that reduce statistical variability
    Helps avoid overly complex models and the risk of overfitting
    Can be applied to non-linear models
22
Q

What is Penalised Regression useful for?

A
  • Prediction problems where the target variable is continuous
    Large data sets
    Features that are correlated; reducing a large number of features to a manageable set
23
Q

What is the Support Vector Machine algorithm?

A
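  • SVM is a linear classifier that seeks the optimal hyperplane: the one that separates the two sets of data points by the maximum margin.
    Remember support vectors: the observations closest to the boundary, which define the margin (drawing lines to separate triangles and x’s)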
24
Q

What to use when data is not perfectly linearly separable?

A
  • Soft margin classification or Non-linear SVM algorithm
25
Q

Where can SVM algorithm be useful?

A
  • Can be used for classification, regression and outlier detection, but typically used for classification problems
  • Well suited for small-size to medium-size complex high-dimensionality data sets
    Example: predicting company failures, classify text from documents into useful categories.
26
Q

What is the KNN algorithm?

A
  • The k-nearest neighbor (KNN) algorithm classifies a new observation by finding similarities (“nearness”) between it and its k nearest neighbors in the existing data set.
    (remember triangles and x’s and a square shows up)
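A minimal sketch of KNN classification, assuming two numeric features, Euclidean distance as the measure of nearness, and a simple majority vote. The points, labels and the name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Existing labeled observations: triangles and crosses.
train = [
    ((1.0, 1.0), "triangle"), ((1.5, 2.0), "triangle"), ((2.0, 1.0), "triangle"),
    ((6.0, 6.0), "cross"), ((7.0, 5.5), "cross"), ((6.5, 7.0), "cross"),
]

# A new observation (the "square") shows up and is assigned a class.
label = knn_predict(train, (2.0, 2.0), k=3)
```

Changing `k` or the distance measure can change the result, which is why both have to be chosen with care.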
27
Q

What is a challenge in KNN algorithm?

A
  • Defining what ‘similar’ or ‘near’ means (the distance measure) is a challenge. Also k, the hyperparameter of the model, must be chosen carefully.
    Different k values can give different results.
28
Q

What are the benefits of KNN and where is it useful?

A
  • Benefits: Intuitive, non-parametric, can be used directly for multi-class classification. KNN is most often used for classification & sometimes for regression.
    Applications: Corporate bond credit rating assignment, bankruptcy prediction, stock price prediction, customized equity and bond index creation.
29
Q

What is the CART algorithm?

A
  • Can be applied to predict a categorical target variable or a continuous target variable.
    (Remember the example of classifying companies by whether or not they increase dividend payments)
    Also, a binary tree is a combination of a root node, decision nodes and terminal nodes.
30
Q

What are the regularization techniques in CART?

A
  • Use parameters such as the maximum depth of the tree, the minimum population at a node, or the maximum number of decision nodes.
    Pruning: remove sections of the tree that provide little classifying power.
31
Q

Benefits and applications of CART?

A
  • it can uncover complex non-linear dependencies between features.
    Tree provides visual explanation for prediction (unlike a black-box algorithm)
    Some applications: Fraud detection in financial statements, generating consistent decision processes in equity and fixed-income selection, simplifying communication of investment strategies to clients.
32
Q

What is Ensemble learning & Random Forest?

A
  • Here, we can combine predictions from a collection of models. Typically produces more accurate and more stable predictions than the best single model.
33
Q

Types of Ensemble learning?

A

Majority-vote classifier (heterogeneous learning): use different algorithms and select the result with the most votes.
Diversity is good; the approach assumes that the model predictions are independent.

Bootstrap aggregating (bagging, homogeneous learning): generate n new training data sets by sampling with replacement from the original training data set, then train a single algorithm on the n data sets to generate n models (prediction rules). This protects against overfitting.

Random forest classifier (black-box algorithm): a collection of many different decision trees generated by a bagging method or by randomly reducing the number of features available during training.
Advantages: reduces the ratio of noise to signal and protects against overfitting on the training data.
Applications: predicting whether an IPO will be successful & factor-based investment strategies.
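A minimal sketch of the two building blocks mentioned above: drawing bootstrap samples for bagging and combining predictions by majority vote. The function names, the toy rows and the model predictions are hypothetical; no actual models are trained here.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one new training set for bagging."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = [1, 2, 3, 4, 5]                                      # stand-ins for training observations
bagged_sets = [bootstrap_sample(rows, rng) for _ in range(3)]

# Hypothetical predictions from three different models for one new observation.
final_label = majority_vote(["success", "fail", "success"])
```

In a random forest, each tree would be trained on one such bootstrap sample and the forest's prediction would be the majority vote across trees.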

34
Q

What are the different types of algorithms under Unsupervised Learning?

A
  • There are algorithms based on the two major problem types that we use for Unsupervised Machine learning: Dimension Reduction and Clustering.
    So, there are three algorithms:
    1. PCA (Dimension Reduction)
    2. K-Means (Clustering)
    3. Hierarchical clustering (Clustering)
35
Q

What is Dimension Reduction?

A
  • Reducing the set of features to a manageable size while retaining as much of the variation in data as possible.
36
Q

What is PCA algorithm?

A
  • Is a dimension reduction method where we reduce highly correlated features of data into a few uncorrelated composite variables.
    A ‘composite’ variable is a variable that combines two or more variables that are statistically strongly related to each other.
37
Q

What are Eigenvectors and Eigenvalues?

A
  • Eigenvectors define the principal components (PC1, PC2, ...): composite variables that are linear combinations of the original features and are mutually uncorrelated.
  • Eigenvalues give the proportion of total variance in the initial data that is explained by each eigenvector.
38
Q

How does the algorithm form PC1?

A
  • The algorithm finds PC1 such that the sum of projection errors for all data points is minimised and the sum of the spread between all data points is maximised.
39
Q

How do we determine how many PCs we need?

A
  • Scree plots can help us decide. Scree plots show the proportion of total variance in the data explained by each principal component.
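A minimal sketch of the numbers a scree plot displays, assuming only two features so that the eigenvalues of the 2x2 sample covariance matrix can be written in closed form. The data and the function name `explained_variance_2d` are hypothetical; with more features a linear algebra library would normally be used.

```python
import math

def explained_variance_2d(points):
    """Share of total variance explained by PC1 and PC2 for two-feature data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample variances and covariance of the two features.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [mean + half_gap, mean - half_gap]   # PC1 first, then PC2
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Two highly correlated hypothetical features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
shares = explained_variance_2d(points)
```

Because the two features move almost together, PC1 captures nearly all of the variance, so a single composite variable would be enough here.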
40
Q

Points to remember about PCA?

A
  • PCs are difficult to interpret (Black-box algorithm).
    Dimension reduction facilitates visual representation of data in 2 or 3 dimensions.
    Dimension reduction is often performed before training another supervised or unsupervised learning model.
41
Q

Explain Clustering?

A
  • A ‘cluster’ contains a subset of observations that are “similar”.
    Observations in a cluster should be close to each other (cohesion)
    Observation in two different clusters should be far from each other (separation)
42
Q

What is K-means clustering?

A

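Determine the hyperparameter, k (the number of clusters), before training begins.
Repeatedly partitions observations into k non-overlapping clusters (the initial centroids are chosen randomly).
Each cluster is characterized by its centroid.
Each observation is assigned to the cluster with the centroid to which that observation is closest.
The centroids are then recalculated and the assignment is repeated until no observation changes cluster.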
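A minimal sketch of the k-means loop, assuming Euclidean distance, initial centroids drawn at random from the observations, and a fixed number of iterations instead of a convergence check. The data, the seed and the function name `k_means` are hypothetical.

```python
import math
import random

def k_means(points, k, n_iter=10, seed=0):
    """Alternate between assigning points to centroids and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # random initial centroids
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each observation to the cluster with the closest centroid.
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:                               # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, clusters

# Two visibly separate groups of hypothetical observations.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, k=2)
```

On this toy data the loop settles on one centroid per group; on real data the result can depend on the random starting centroids and on the choice of k.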

43
Q

Applications of K-Means clustering?

A

Deriving alternatives to static industry classifications.
Data exploration for discovering patterns in high dimensional data.

44
Q

What is hierarchical clustering?

A

Hierarchical clustering algorithms create intermediate rounds of clusters of increasing (‘agglomerative’ - bottom-up) or decreasing (‘divisive’ - top-down) size until a final clustering is reached.

45
Q

Why use hierarchical clustering?

A

It is more computationally intensive than k-means clustering (a drawback), but
it allows the analyst to examine alternative segmentations of the data at different levels of granularity before deciding which one to use.
It doesn’t require the number of clusters to be fixed in advance as a hyperparameter.

46
Q

What is a Dendrogram?

A

A dendrogram is a tree diagram that highlights the hierarchical relationships among clusters.

47
Q

What are some general applications of clustering?

A

Portfolio diversification
Uncovering important underlying structure in complex data sets
Discovering patterns in high dimensional data
Deriving alternatives to static industry classifications

48
Q

What are Neural Networks?

A

Neural networks have layers of nodes connected by links
Input layer nodes correspond to features
Hidden layer(s) feed output node
Output node generates predicted value

49
Q

Where does most of the action happen in neural networks?

A

It happens in the hidden layers.
Each hidden node has 2 functional parts:
Summation operator - Multiplies each value by a weight and sums the weighted values to form the total net input
Activation function - transforms the total net input into the node’s output; it acts like a light dimmer switch that decreases or increases the strength of the input
Learning takes place in the hidden layer through improvements in weights applied to nodes with the aim of reducing total error.
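A minimal sketch of a single hidden node, assuming a sigmoid as the activation function and an optional bias term. Both are illustrative assumptions (other activation functions are common), and the name `hidden_node` and the weights are hypothetical.

```python
import math

def hidden_node(inputs, weights, bias=0.0):
    """Summation operator followed by a sigmoid activation function."""
    # Summation operator: weight each input and add them up (total net input).
    total_net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function: squash the total net input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total_net_input))

neutral = hidden_node([1.0, 2.0], [0.5, -0.25])   # weighted inputs cancel out
strong = hidden_node([1.0, 2.0], [3.0, 2.0])      # large positive net input
```

Learning would consist of adjusting the weights so that the network's total error falls; that step is not shown here.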

50
Q

What are deep learning nets?

A

DLNs are sophisticated neural networks.
Neural networks with many hidden layers (at least 3 but often more than 20) are known as DLNs.
DLNs are the backbone of the AI revolution and are used in complex activities such as image, pattern and speech recognition.

51
Q

What is Reinforcement learning?

A

RL algorithm involves an agent that should perform actions that will maximize its rewards over time, taking into consideration the constraints of its environment.
The algo observes its environment, learns by testing new actions, and reuses its previous experiences.
Learning occurs through millions of trials and errors.
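A minimal sketch of the trial-and-error idea, using an epsilon-greedy agent on a toy problem with two actions and fixed rewards. Epsilon-greedy is only one simple strategy chosen here for illustration; the rewards, the parameters and the name `run_agent` are hypothetical, and real RL problems involve states and far more trials.

```python
import random

def run_agent(reward_by_action, n_trials=1000, epsilon=0.1, seed=0):
    """Learn the average reward of each action by repeated trial and error."""
    rng = random.Random(seed)
    n_actions = len(reward_by_action)
    value = [0.0] * n_actions          # running average reward per action
    count = [0] * n_actions            # how often each action was tried
    for _ in range(n_trials):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                        # test a new action
        else:
            action = max(range(n_actions), key=lambda a: value[a])   # reuse past experience
        reward = reward_by_action[action]                            # feedback from the environment
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = run_agent([0.2, 1.0])
best_action = max(range(len(value)), key=lambda a: value[a])
```

The agent starts with no knowledge, occasionally explores, and ends up preferring the action with the higher reward.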