What is the goal of machine learning?
Find the pattern, apply the pattern.
The goal: filter useful information from great quantities of data by learning from known examples to find a pattern in the data, then determine structure or generate forecasts without human intervention.
What is supervised machine learning?
In supervised machine learning, the model learns from labelled examples: each training observation pairs input features with a known target, and the model learns to predict the target for new inputs.
In supervised machine learning, what characteristics of the target variable determine whether a problem is classified as regression or classification?
If the target variable is continuous (e.g., a price), the problem is regression; if it is categorical (e.g., default vs no default), it is classification.
What is unsupervised machine learning and what kinds of problems are appropriate here?
Clustering: Groups data points into clusters based on similarity
Dimensionality Reduction: Reduces the number of features while preserving important information
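A minimal clustering sketch, assuming scikit-learn is available; the data is synthetic, built as two well-separated groups purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (synthetic data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.2, size=(20, 2))
group_b = rng.normal(loc=[5, 5], scale=0.2, size=(20, 2))
X = np.vstack([group_a, group_b])

# Ask for 2 clusters; KMeans groups points by proximity to cluster centres,
# with no labels supplied -- this is what makes it unsupervised.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Points from the same group end up sharing a cluster label.
print(set(labels[:20]), set(labels[20:]))
```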
What is deep learning and when is it used?
Deep learning is a subset of machine learning that uses artificial neural networks with many layers (hence “deep”) to learn patterns from data.
Inspired by how the human brain works.
Learns directly from raw data (like images, text, or sound).
Automatically extracts features—no need for manual feature engineering.
What is generalisation?
Generalisation refers to a model’s ability to perform well on new, unseen data—not just the data it was trained on.
A well-generalised model captures the underlying patterns in the data.
It avoids being too specific to the training data.
The goal of machine learning is to build models that generalise well.
Example:
If you train a model to predict house prices using data from London, and it performs well on new data from Manchester, it has good generalisation.
What is overfitting?
Overfitting happens when a model learns the noise and random fluctuations in the training data instead of the true patterns.
It performs very well on training data, but poorly on new data.
It’s like memorising answers for a test rather than understanding the material.
Signs of Overfitting:
High accuracy on training data
Low accuracy on validation/test data
Complex models with too many parameters
Example:
A model that perfectly predicts house prices in your training set but fails to predict prices for new listings is overfitting.
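The memorisation idea can be sketched numerically: a very flexible model (here a degree-9 polynomial, chosen only for illustration) fits 10 noisy training points almost exactly, yet does far worse on fresh points drawn from the same linear process.

```python
import numpy as np

rng = np.random.default_rng(42)
true_fn = lambda x: 2 * x + 1                    # the true pattern is linear

x_train = np.linspace(0, 1, 10)
y_train = true_fn(x_train) + rng.normal(0, 0.3, 10)   # noisy observations

# A degree-9 polynomial through 10 points can interpolate them (memorise).
coeffs = np.polyfit(x_train, y_train, deg=9)

x_test = np.linspace(0.05, 0.95, 50)                  # unseen points
y_test = true_fn(x_test) + rng.normal(0, 0.3, 50)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(train_mse, test_mse)   # tiny training error, much larger test error
```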
What are the 3 sources that out-of-sample error can originate from?
Out-of-sample error = bias error + variance error + base error
What does high bias look like on a graph?
High Bias (Underfitting)
In-sample accuracy is low and flat.
Out-of-sample accuracy is also low and doesn’t improve with more data.
The model is too simple to learn the pattern.
What does high variance error look like on a graph?
High Variance (Overfitting)
In-sample accuracy is very high.
Out-of-sample accuracy starts low and improves slowly.
The model memorises training data but struggles to generalise.
What does a robust model look like?
Good Generalisation
Both in-sample and out-of-sample accuracy improve steadily.
The gap between the two narrows as training size increases.
This is the ideal learning behaviour.
What is the holdout sample problem and what is used to solve it?
A holdout sample is a portion of your dataset that you set aside to test your model after training it. The holdout sample problem refers to the risk and limitations of relying on just one split of the data.
K-Fold Cross Validation rotates the holdout set across different parts of the data:
Every data point gets to be in the test set once.
Every data point gets to be in the training set k−1 times.
You get k performance scores, which you average for a more reliable estimate.
How does k-fold cross validation work?
Step 1: Randomly shuffle the dataset
Before splitting the data, you shuffle it randomly.
This ensures that the data is mixed well, so each fold is likely to be representative of the overall dataset (not grouped by time, location, or category).
✅ Why this matters: Without shuffling, you might accidentally split the data in a biased way — for example, all London assets in one fold, all Amsterdam assets in another.
Step 2: Split the data into k folds
Divide the shuffled data into k equal-sized parts (called folds).
If you have 100 data points and choose k = 5, each fold will have 20 points.
Step 3: Train and validate k times
For each of the k iterations:
Hold out one fold as the validation set.
Train the model on the remaining k−1 folds.
Evaluate the model on the validation fold and record the performance.
Each fold gets to be the validation set once.
Step 4: Average the results
After all k iterations, you average the performance scores (e.g., accuracy, RMSE).
This gives you a more reliable and stable estimate of how your model performs on unseen data.
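The four steps above can be sketched with scikit-learn's KFold (assumed available); the regression data is synthetic and the model choice (linear regression) is just for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 100 points, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

# Steps 1-2: shuffle, then split into k = 5 folds of 20 points each.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Step 3: each fold is held out once; train on the other k-1 folds.
scores = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))  # R^2 on held-out fold

# Step 4: average the k scores for a more stable estimate.
print(np.mean(scores))
```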
What are the benefits of k-fold cross validation?
Every observation is used for both training and validation, so no data is wasted.
The averaged score is a more reliable, lower-variance estimate of out-of-sample performance than a single holdout split.
It reduces the risk that one unlucky, unrepresentative split distorts the performance estimate.
What are the 6 examples of supervised machine learning algos?
Penalised regression (e.g., LASSO)
Support vector machines (SVM)
K-nearest neighbours (KNN)
Classification and regression trees (CART)
Ensemble learning (e.g., random forests)
Neural networks
What is LASSO?
It’s a type of penalised regression that:
Adds a penalty based on the absolute values of the coefficients. The penalty increases with the number of features included, much like adjusted R².
Can shrink some coefficients to exactly zero, effectively removing those variables from the model.
Imagine you’re building a model to predict asset value using 20 features (e.g., location, size, tenant type, lease length, etc.).
Some features might be irrelevant or redundant.
LASSO helps by automatically selecting the most important ones.
It does this by penalising large coefficients, and zeroing out the ones that don’t help much.
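The scenario above can be sketched with scikit-learn's Lasso (assumed available): of 10 synthetic features, only the first two actually drive the target, and LASSO zeroes out most of the rest.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))          # 10 candidate features (synthetic)
# Only the first two features actually matter.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)

# scikit-learn calls the penalty strength "alpha" (the lambda of the notes).
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)   # coefficients on the irrelevant features shrink to ~0
```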
Why is LASSO useful?
Feature selection: It simplifies the model by removing unimportant variables. LASSO automatically performs feature selection since it eliminates the least essential features from the model.
Prevents overfitting: Especially helpful when you have more features than observations.
Improves interpretability: You end up with a cleaner, more parsimonious model (fewer predictor variables), since every retained feature must make an adequate contribution.
What is lambda and what effect does it have on LASSO?
Lambda (λ) is the tuning parameter that decides how much we want to penalise the flexibility of our model. As λ rises, the coefficients shrink, reducing variance and consequently avoiding overfitting.
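A small sweep illustrates lambda's effect, again on synthetic data with scikit-learn's Lasso (which names lambda "alpha"): as the penalty grows, the total size of the coefficients shrinks and more of them are driven to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0, 0.5, 200)

# Fit the same data at three penalty strengths.
results = {}
for alpha in [0.01, 0.5, 5.0]:
    results[alpha] = Lasso(alpha=alpha).fit(X, y).coef_
    # Larger alpha -> smaller coefficients, fewer of them nonzero.
    print(alpha, np.count_nonzero(results[alpha]), np.abs(results[alpha]).sum())
```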
What is regularisation?
Regularisation is a technique used to prevent overfitting in machine learning models. Overfitting happens when a model learns the training data too well — including its noise — and performs poorly on new, unseen data.
Regularisation helps by penalising complexity, encouraging the model to be simpler and more generalisable.
What is a support vector machine (SVM) and how does it work?
A Support Vector Machine (SVM) is a tool that learns from examples to classify data into categories. You start by giving it examples from two groups. It then finds the best way to separate them using a straight line (or a flat surface called a hyperplane). This line is placed to create the widest possible gap between the groups, making it easier to decide where new examples belong. This method works best when the data can be clearly separated with a straight line — known as linear separability.
What is a hyperplane?
Hyperplane: A decision boundary that separates different classes in the data.
What might a SVM be used for? Provide an investment related example.
SVMs can be used for:
Credit scoring: Classifying borrowers as likely to default or not.
Fraud detection: Identifying unusual patterns in transaction data.
Market prediction: Predicting stock price movements or asset returns.
Portfolio optimisation: Classifying assets based on risk-return profiles.
E.g. Blue Dots: Represent Bull Market conditions — high momentum, low volatility.
Red Dots: Represent Bear Market conditions — low momentum, high volatility.
Asset A: High growth, low dividend → Growth portfolio
Asset B: Stable returns, high dividend → Income portfolio
Transaction 1: Normal amount, usual location → Not fraud
Transaction 2: Large amount, unusual location → Fraud
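The bull/bear example can be sketched with scikit-learn's SVC (assumed available) on synthetic momentum/volatility data; the regime parameters are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Bull regime: high momentum, low volatility.  Bear regime: the reverse.
bull = np.column_stack([rng.normal(2.0, 0.3, 30), rng.normal(0.5, 0.1, 30)])
bear = np.column_stack([rng.normal(-1.0, 0.3, 30), rng.normal(2.0, 0.2, 30)])
X = np.vstack([bull, bear])
y = np.array([1] * 30 + [0] * 30)       # 1 = bull, 0 = bear

# A linear kernel finds the widest-margin separating hyperplane.
clf = SVC(kernel="linear").fit(X, y)

# New observation: strong momentum, calm market -> should fall on the bull side.
print(clf.predict([[1.8, 0.6]]))
```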
What is KNN and how does it work?
KNN classifies a new data point based on how its neighbors are classified. It looks at the ‘K’ closest points (neighbors) in the training data and assigns the most common label among them.
You have a dataset with labeled examples (e.g., stocks labeled as “rising” or “falling”).
You choose a value for K (e.g., 3 or 5).
For a new data point, KNN:
Measures the distance (usually Euclidean) between the new point and all existing points.
Finds the K closest points.
Assigns the label that is most common among those K neighbors.
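The three steps above can be sketched by hand with NumPy and Euclidean distance; the tiny "rising"/"falling" dataset is made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: Euclidean distance from the new point to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the k closest points.
    nearest = np.argsort(dists)[:k]
    # Step 3: majority label among those k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: "rising" stocks near (1, 1), "falling" stocks near (-1, -1).
X_train = np.array([[1.0, 1.1], [0.9, 1.0], [-1.0, -0.9], [-1.1, -1.0]])
y_train = np.array(["rising", "rising", "falling", "falling"])

print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # "rising"
```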
What are the main challenges of KNN?
The main challenge facing KNN is defining “near”: the choice of distance metric used to model nearness matters, because an inappropriate measure produces poorly performing models, and selecting the right metric can be quite subjective.