L7: Supervised machine learning Flashcards

1
Q

What is machine learning?

A

A branch of AI

2
Q

What does a machine learn from?

A

* Learning from data
* Discovering hidden patterns
* Essential for data-driven decisions

3
Q

What questions can we ask?

A

Predictive

4
Q

Examples of applied ML

A

Examples:
* Credit card fraud detection in financial institutions
* Recommendation systems on websites for personalization
* Customer segmentation for marketing strategies
* Customer churn prediction to foresee service cancellations
* Predictive maintenance in manufacturing companies
* Sentiment analysis of social media data
* Health diagnosis to aid doctors

5
Q

ML Pipeline

A

Acquire → Prepare → Analyze → Report → Act

6
Q

ML Pipeline step 1: Acquire data

A
  • Identify data sources: Check the question that needs to be addressed
  • Collect data: Record the necessary data
  • Integrate data (data wrangling): Merge/join data, if needed (see the sketch below)
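
For illustration, a minimal pandas sketch of the integrate step; the two frames and the "id" key are hypothetical:

```python
import pandas as pd

# Hypothetical toy sources; in practice these come from the collect step
customers = pd.DataFrame({"id": [1, 2], "age": [34, 51]})
purchases = pd.DataFrame({"id": [1, 2], "amount": [120.0, 80.5]})

# Merge/join the two sources on a shared key
df = customers.merge(purchases, on="id", how="inner")
print(df)
```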
7
Q

ML Pipeline step 2: Prepare Data

A
  • Explore: Understand your data, e.g.,
    • Check the structure and variable types
    • Check for outliers, missing values etc.
  • Pre-process: Prepare your data for analysis (see the sketch below), e.g.,
    • Clean (missing values, mistakes etc.)
    • Feature selection (e.g., combine, remove, add)
    • Feature transformation (e.g., scaling, dimensionality reduction, filtering)
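
For illustration, a minimal sketch of exploring and pre-processing with pandas and scikit-learn; the file name and its columns are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")  # hypothetical data set

# Explore: structure, variable types, missing values
print(df.info())
print(df.describe())
print(df.isna().sum())

# Pre-process: clean missing values, then scale the numeric features
df = df.dropna()  # or impute, e.g., df.fillna(...)
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```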
8
Q

ML Pipeline step 3: Analyze data

A
  • Select analytical techniques
  • Build models
  • Assess results
9
Q

ML Pipeline step 4: Report results

A

Communicate results

Recommend actions

10
Q

ML Pipeline step 5: Act

A

Apply the results

Implement, maintain, and assess the impact

11
Q

The goal is to estimate a model from a selection of input variables to give
the best estimate of the target (i.e., outcome variable). It predicts something
we have seen before (i.e., data labels guide the learning process).

Requires:
* A range of input variables
* An outcome variable

A

Supervised ML

12
Q

The process of adding informative labels or tags to our data
* Think of it as the “ground truth” for the target variable/outcome variable
* Necessary for a supervised ML algorithm

A

DATA LABELS

13
Q

DATA LABELS

A

The process of adding informative labels or tags to our data
* Think of it as the “ground truth” for the target variable/outcome variable
* Necessary for a supervised ML algorithm

14
Q

Types of supervised ML

A

Regression and classification

15
Q

Regression

A

Given input variables, predict a numeric (continuous) value.
Examples:
* Estimate average house price for a region
* Determine demand for a new product
* Predict power usage

16
Q

Given input variables, predict a numeric (continuous) value.
Examples:
* Estimate average house price for a region
* Determine demand for a new product
* Predict power usage

A

Regression

17
Q

CLASSIFICATION

A

Given input variables, predict a categorical variable.
Examples:
* Predict if it will rain tomorrow
* Determine if a loan application is high-, medium-, or low-risk
* Identify sentiment as positive, negative, or neutral

18
Q

Given input variables, predict a categorical variable.
Examples:
* Predict if it will rain tomorrow
* Determine if a loan application is high-, medium-, or low-risk
* Identify sentiment as positive, negative, or neutral

A

CLASSIFICATION

19
Q

MACHINE LEARNING ALGORITHMS

A

Some examples:
* Linear regression
* Logistic regression
* K-nearest neighbour
* Decision trees
* Support vector machines

20
Q

PARAMETRIC/NON-PARAMETRIC
ALGORITHMS

A

PARAMETRIC:

Pre-known functional form f()
* Restrictive assumptions
* Non-flexible
* Pre-determined number of parameters

NON-PARAMETRIC:

Any functional form f()
* No restrictive assumptions
* Flexible
* Parameters learned from the data

21
Q

LINEAR REGRESSION

A

Linear regression models the linear relationship between a dependent
variable and one or more independent variables based on a fixed
functional form f(x). The simplest type, simple linear regression, has a single
independent variable; with more than one independent variable, it is called
multiple linear regression.
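
For illustration, a minimal scikit-learn sketch of multiple linear regression on made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))  # two independent variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # close to [3.0, -2.0] and 0.0
print(model.predict([[1.0, 0.5]]))    # prediction for a new observation
```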

22
Q

Models the linear relationship between a dependent variable and one or
more independent variables based on a fixed functional form f(x). The
simplest type has a single independent variable; with more than one, it is
called the multiple variant.

A

LINEAR REGRESSION

23
Q

LOGISTIC REGRESSION

A

Logistic regression predicts the probability of an event occurring (binary
outcome variable) based on a number of input variables. It has a fixed
functional form for f(x), and can accommodate a range of input variables.

Classification
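
For illustration, a minimal scikit-learn sketch on made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # input variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # binary outcome variable

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:2]))  # probability of the event per observation
print(clf.predict(X[:2]))        # predicted class labels 0/1
```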

24
Q

Predicts the probability of an event occurring (binary outcome variable)
based on a number of input variables. It has a fixed functional form for f(x),
and can accommodate a range of input variables.

Classification

A

LOGISTIC REGRESSION

25
Q

K-NEAREST NEIGHBOUR (KNN)

A

KNN is an algorithm that works locally because it uses a pre-specified number of observations (k = the number of nearest neighbours) to make the prediction. For regression, the average of the neighbours is used, whereas in classification the majority class wins.
✓ Regression ✓ Classification
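
For illustration, a minimal scikit-learn sketch using the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k = 5 nearest neighbours; the majority class among them wins
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:3]))
# For regression, KNeighborsRegressor averages the neighbours' values instead
```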
26
Q

Is an algorithm that works locally because it uses a pre-specified number of observations (k = the number of nearest neighbours) to make the prediction. For regression, the average of the neighbours is used, whereas in classification the majority class wins.
✓ Regression ✓ Classification

A

K-NEAREST NEIGHBOUR (KNN)

27
Q

DECISION TREES

A

Decision trees are a global approach that uses all observations to make a prediction. The tree-like structure shows that the functional form f(x) is approximated in a step-wise manner by means of recursive binary splitting.
✓ Regression ✓ Classification
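
For illustration, a minimal scikit-learn sketch; export_text prints the learned step-wise splits:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth limits how many recursive binary splits are allowed
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))  # the rule-like, step-wise approximation of f(x)
```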
28
Q

Are a global approach that uses all observations to make a prediction. The tree-like structure shows that the functional form f(x) is approximated in a step-wise manner by means of recursive binary splitting.
✓ Regression ✓ Classification

A

DECISION TREES

29
Q

SUPPORT VECTOR MACHINES

A

Support vector machines estimate the optimal decision boundary (i.e., the line/plane/hyperplane that separates our data) by applying the kernel trick (i.e., placing the data in higher dimensions). The data points nearest the decision boundary are referred to as support vectors, and they form important margins.
✓ Regression ✓ Classification
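
For illustration, a minimal scikit-learn sketch with a radial (RBF) kernel:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The RBF kernel implicitly places the data in a higher-dimensional space
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.support_vectors_.shape)  # the points that define the margins
print(svm.predict(X[:3]))
```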
30
Q

Estimate the optimal decision boundary (i.e., the line/plane/hyperplane that separates our data) by applying the kernel trick (i.e., placing the data in higher dimensions). The data points nearest the decision boundary are referred to as support vectors, and they form important margins.
✓ Regression ✓ Classification

A

SUPPORT VECTOR MACHINES

31
Q

Unsupervised ML

A

The goal is to derive associations and patterns based on a selection of input variables without knowing the target (outcome variable), i.e., we have no ground truth.

Requires:
* A range of input variables
* No outcome variable

32
Q

The goal is to derive associations and patterns based on a selection of input variables without knowing the target (outcome variable), i.e., we have no ground truth.

Requires:
* A range of input variables
* No outcome variable

A

UNSUPERVISED ML

33
Q

What is a model?

A

A simplified representation of reality created for a specific purpose based on some assumptions.

Example: Customer churn
* Create a “formula” for predicting the probability of customer attrition at contract expiration

34
Q

How to build a model

A

1. Consider the domain and your problem statement
2. Consider the requirement for explainability
3. Choose the type of algorithm
4. Establish success criteria, i.e., a definition of success
5. Train models
6. Model selection

35
Q

Curse of dimensionality

A

The curse of dimensionality refers to the situation where we keep adding more input variables to our data, which creates high-dimensional data.

High-dimensional data: # of input variables ≥ # of observations

The amount of training data needs to grow exponentially to maintain the same coverage!
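
A quick back-of-the-envelope sketch of that exponential growth: keeping a fixed resolution of 10 bins per input variable, the number of cells to cover multiplies by 10 with every added variable:

```python
# Grid cells needed at 10 bins per input variable
for d in (1, 2, 3, 10):
    print(f"{d:>2} input variables -> {10 ** d:,} cells")
```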
36
Q

Black box ML model

A

“Black box” ML models are too complex for humans to understand or interpret. A limitation some ML algorithms suffer from, but not all!
* A complex decision process made by the algorithm
* Difficult to trace back from the predictions to the origin
* Hard to determine why an action was taken
* Model parameters that are non-interpretable

Think carefully about explainability (can your stakeholders understand the results of the chosen model?). In general, it is good practice to use simpler and more interpretable models when there is no significant benefit gained from choosing a more complex alternative, an idea also known as Occam’s Razor.

37
Q

Overfitting

A

When you learn patterns in the training data that are only there by chance, i.e., not present in new, unseen data. Non-parametric and non-linear models are prone to overfitting because they have more flexibility when they approximate the functional form of f(x).
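
For illustration, a minimal sketch on made-up data: a very flexible polynomial fits the training set almost perfectly but does worse on unseen data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=40)  # noisy signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):  # degree 1 underfits, degree 15 overfits
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)
    mse_tr = mean_squared_error(y_tr, model.predict(poly.transform(X_tr)))
    mse_te = mean_squared_error(y_te, model.predict(poly.transform(X_te)))
    print(f"degree {degree:>2}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```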
38
Q

When you learn patterns in the training data that are only there by chance, i.e., not present in new, unseen data. Non-parametric and non-linear models are prone to this because they have more flexibility when they approximate the functional form of f(x).

A

Overfitting

39
Q

Underfitting

A

When you fail to learn the important patterns in the training data, and therefore also the generalizable patterns in new, unseen data. It will be obvious from the chosen performance metric (on the training data), and the remedy is to move on and try to estimate alternative models.

40
Q

When you fail to learn the important patterns in the training data, and therefore also the generalizable patterns in new, unseen data. It will be obvious from the chosen performance metric (on the training data), and the remedy is to move on and try to estimate alternative models.

A

Underfitting

41
Q

BIAS-VARIANCE TRADE-OFF

A

Prediction error plotted against model complexity: as complexity increases, bias decreases while variance increases, so the expected prediction error on new, unseen data is U-shaped.

42
Q

Data splitting

A

The goal: Split the data into a training data set and a test data set.

Why?
* Training a model and predicting with it are two separate things.
* Avoid prediction bias when assessing the accuracy of the model.

Requirements:
* Independent (observations are independent of each other)
* Mutually exclusive (an observation appears in only one of the two sets)
* Completely exhaustive (all observations are allocated)
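
For illustration, a minimal scikit-learn sketch of a random 80/20 split:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Mutually exclusive and completely exhaustive: every observation
# lands in exactly one of the two sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 120 / 30
```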
43
Q

Data splitting strategies

A

* Random split with 80% train (in-sample) and 20% test data (out-of-sample)
* Stratified random splitting
* Train data set / Validation (tuning) data set / Test data set
* Cross-validation: Leave-One-Out or K-fold

Not enough data? Use a resampling technique, e.g., bootstrapping.
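
For illustration, a minimal k-fold cross-validation sketch with scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold serves once as the held-out set
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean(), scores.std())
```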
44
Q

Objective functions

A

How successful is the chosen algorithm? To measure this, you need to choose an objective (loss) function that represents your goal.

Examples:
* Mean Squared Error (MSE): The average of the squared differences between your predictions and your actual observations.
* Mean Absolute Error (MAE): The average of the absolute differences between your predictions and your actual observations.
* Misclassification rate: The number of incorrect predictions out of the total number of predictions.
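
For illustration, the three losses computed on tiny made-up vectors:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 4.0])
print(mean_squared_error(y_true, y_pred))   # MSE
print(mean_absolute_error(y_true, y_pred))  # MAE

# Misclassification rate = incorrect predictions / total predictions
labels_true = np.array([1, 0, 1, 1])
labels_pred = np.array([1, 1, 1, 0])
print(np.mean(labels_true != labels_pred))  # 0.5
```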
45
Q

Model tuning

A

The goal is to establish different versions (candidate models) of the basic model by tuning the hyperparameters. Hyperparameters are parameters that are not part of the model itself but impact the training of the model (e.g., the k in KNN, the depth of a decision tree, or C and γ in a radial kernel for SVMs).

How to fine-tune the hyperparameters?
* Run a grid search and do k-fold cross-validation, or use the validation set
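
For illustration, a minimal grid search over k for KNN, scored with 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},  # candidate hyperparameter values
    cv=5,                                         # 5-fold cross-validation
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```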
46
Q

FINAL MODEL SELECTION

A

When selecting the final model (model selection), we look at the fitted candidate models and choose the best one based on the out-of-sample error, i.e., the error calculated on data points that were not used in the training process.