L7: Supervised machine learning Flashcards

1
Q

What is machine learning?

A

A branch of AI

2
Q

What does machine learning learn from?

A

* Learning from data
* Discovering hidden patterns
* Essential for data-driven decisions

3
Q

What questions can we ask?

A

Predictive

4
Q

Examples of applied ML

A

Examples:
* Credit card fraud detection in financial institutions
* Recommendation systems on websites for personalization
* Customer segmentation for marketing strategies
* Customer churn prediction to foresee service cancellations
* Predictive maintenance in manufacturing companies
* Sentiment analysis of social media data
* Health diagnosis to aid doctors

5
Q

ML Pipeline

A

Acquire → Prepare → Analyze → Report → Act

6
Q

ML Pipeline step 1: Acquire data

A
  • Identify data sources: Check the question that needs to be addressed
  • Collect data: Record the necessary data
  • Integrate data (data wrangling): Merge/join data, if needed
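
A minimal data-wrangling sketch in Python with pandas, assuming two hypothetical CSV files (customers.csv and transactions.csv) that share a customer_id key:

import pandas as pd

# Collect: load data from two (hypothetical) sources
customers = pd.read_csv("customers.csv")
transactions = pd.read_csv("transactions.csv")

# Integrate (data wrangling): merge/join on the shared key
data = customers.merge(transactions, on="customer_id", how="inner")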
7
Q

ML Pipeline step 2: Prepare Data

A
  • Explore: Understand your data, e.g.,
      • Check the structure and variable types
      • Check for outliers, missing values, etc.
  • Pre-process: Prepare your data for analysis, e.g.,
      • Clean (missing values, mistakes, etc.)
      • Feature selection (e.g., combine, remove, add)
      • Feature transformation (e.g., scaling, dimensionality reduction, filtering)
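
A minimal pre-processing sketch in Python with pandas and scikit-learn; the DataFrame and its income column are hypothetical:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"income": [42000.0, 55000.0, None, 61000.0]})  # hypothetical data

# Explore: structure, variable types, missing values
df.info()
print(df.isna().sum())

# Clean: impute the missing value with the median
df["income"] = df["income"].fillna(df["income"].median())

# Feature transformation: scale to zero mean and unit variance
df["income_scaled"] = StandardScaler().fit_transform(df[["income"]]).ravel()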
8
Q

ML Pipeline step 3: Analyze data

A
  • Select analytical techniques
  • Build models
  • Assess results
9
Q

STEP 4: REPORT RESULTS

A

Communicate results

Recommend actions

10
Q

STEP 5: ACT

A

Apply the results

Implement, maintain, and assess the impact

11
Q

The goal is to estimate a model from a selection of input variables to give
the best estimate of the target (i.e., outcome variable). It predicts something
we have seen before (i.e., data labels guide the learning process).

Requires:
* A range of input variables
* An outcome variable

A

Supervised ML

12
Q

The process of adding informative labels or tags to our data
* Think of it as the “ground truth” for the target variable/outcome variable
* Necessary for a supervised ML algorithm

A

DATA LABELS

13
Q

DATA LABELS

A

The process of adding informative labels or tags to our data
* Think of it as the “ground truth” for the target variable/outcome variable
* Necessary for a supervised ML algorithm

14
Q

Types of supervised ML

A

Regression and classification

15
Q

Regression

A

Given input variables, predict a numeric (continuous) value.
Examples:
* Estimate average house price for a region
* Determine demand for a new product
* Predict power usage

16
Q

Given input variables, predict a numeric (continuous) value.
Examples:
* Estimate average house price for a region
* Determine demand for a new product
* Predict power usage

A

Regression

17
Q

CLASSIFICATION

A

Given input variables, predict a categorical variable.
Examples:
* Predict if it will rain tomorrow
* Determine if a loan application is high-, medium-, or low-risk
* Identify sentiment as positive, negative, or neutral

18
Q

Given input variables, predict a categorical variable.
Examples:
* Predict if it will rain tomorrow
* Determine if a loan application is high-, medium-, or low-risk
* Identify sentiment as positive, negative, or neutral

A

CLASSIFICATION

19
Q

MACHINE LEARNING ALGORITHMS

A

Some examples:
* Linear regression
* Logistic regression
* K-nearest neighbor
* Decision trees
* Support vector machines

20
Q

PARAMETRIC/NON-PARAMETRIC
ALGORITHMS

A

PARAMETRIC:

Pre-known functional form f(x)
* Restrictive assumptions
* Non-flexible
* Pre-determined number of parameters

NON-PARAMETRIC:

Any functional form f(x)
* No assumptions about the form of f(x)
* Flexible
* Number of parameters not fixed in advance; learned from the data

21
Q

LINEAR REGRESSION

A

Linear regression models the linear relationship between a dependent
variable and one or more independent variables based on a fixed
functional form f(x). The simplest type is simple linear regression, with a
single independent variable. With more than one independent variable, it is
called multiple linear regression.
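
A minimal sketch in Python with scikit-learn, fitting a simple linear regression on hypothetical data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one input variable X, continuous target y
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # the estimated parameters of f(x)
print(model.predict([[6.0]]))         # predict a numeric (continuous) value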

22
Q

models the linear relationship between a dependent variable and one or
more independent variables based on a fixed functional form f(x). The
simplest type is simple linear regression, with a single independent variable.
With more than one independent variable, it is called multiple linear
regression.

A

LINEAR REGRESSION

23
Q

LOGISTIC REGRESSION

A

Logistic regression predicts the probability of an event occurring (binary
outcome variable) based on a number of input variables. It has a fixed
functional form for f(x) and can accommodate a range of input variables.

Classification
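
A minimal sketch in Python with scikit-learn, fitting a logistic regression on hypothetical data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one input variable, binary outcome (0/1)
X = np.array([[20], [35], [45], [52], [60], [70]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[50]]))  # probability of the event occurring
print(clf.predict([[50]]))        # predicted class label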

24
Q

predicts the probability of an event occurring (binary
outcome variable) based on a number of input variables. It has a fixed
functional form for f(x) and can accommodate a range of input variables.

Classification

A

LOGISTIC REGRESSION

25
Q

K-NEAREST NEIGHBOUR (KNN)

A

KNN is an algorithm that works locally because it uses a pre-specified
number of observations (k = the number of nearest neighbours) to make
the prediction. For regression, the average score is used, whereas in
classification the majority class wins.

✓ Regression
✓ Classification
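
A minimal sketch in Python with scikit-learn, classifying with k = 3 nearest neighbours on hypothetical data (KNeighborsRegressor would average the neighbours instead of voting):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: two input variables, binary class labels
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))  # majority vote among the 3 nearest neighbours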

26
Q

is an algorithm that works locally because it uses a pre-specified
number of observations (k = the number of nearest neighbours) to make
the prediction. For regression, the average score is used, whereas in
classification the majority class wins.

✓ Regression
✓ Classification

A

K-NEAREST NEIGHBOUR (KNN)

27
Q

DECISION TREES

A

Decision trees are a global approach that uses all observations to make a
prediction. The tree-like structure shows that the functional form f(x) is
approximated in a step-wise manner by means of recursive binary splitting.

✓ Regression
✓ Classification
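
A minimal sketch in Python with scikit-learn, fitting a shallow decision tree classifier on hypothetical data and printing its recursive binary splits:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: two input variables, binary class labels
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree))       # the step-wise splits that approximate f(x)
print(tree.predict([[2, 2]]))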

28
Q

are a global approach that uses all observations to make a
prediction. The tree-like structure shows that the functional form f(x) is
approximated in a step-wise manner by means of recursive binary splitting.

✓ Regression
✓ Classification

A

DECISION TREES

29
Q

SUPPORT VECTOR MACHINES

A

Support vector machines estimate the optimal decision boundary (i.e., the
line/plane/hyperplane that separates our data) by applying the kernel trick
(i.e., placing the data in higher dimensions). The data points nearest the
decision boundary are referred to as support vectors, and they form
important margins.

✓ Regression
✓ Classification
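
A minimal sketch in Python with scikit-learn, fitting an SVM classifier with a radial (RBF) kernel on hypothetical data:

import numpy as np
from sklearn.svm import SVC

# Hypothetical data: two input variables, binary class labels
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# C and gamma are the hyperparameters of the radial kernel
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.support_vectors_)   # the points nearest the decision boundary
print(svm.predict([[2, 2]]))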

30
Q

estimate the optimal decision boundary (i.e., the line/plane/hyperplane
that separates our data) by applying the kernel trick (i.e., placing the data
in higher dimensions). The data points nearest the decision boundary are
referred to as support vectors, and they form important margins.

✓ Regression
✓ Classification

A

SUPPORT VECTOR MACHINES

31
Q

Unsupervised ML

A

The goal is to derive associations and patterns based on a selection of input
variables without knowing the target (outcome variable) i.e., we have no
ground truth.

Requires:
* A range of input variables
* No outcome variable

32
Q

The goal is to derive associations and patterns based on a selection of input
variables without knowing the target (outcome variable) i.e., we have no
ground truth.
Requires:
* A range of input variables
* No outcome variable

A

UNSUPERVISED ML

33
Q

What is a model

A

A simplified representation of reality created for a specific purpose based
on some assumptions.

Example: Customer churn
* Create a “formula” for predicting the probability of customer attrition at
contract expiration

34
Q

How to build a model

A
  1. Consider the domain and your problem statement
  2. Consider the requirement for explainability
  3. Choose the type of algorithm
  4. Establish success criteria i.e., definition of success
  5. Train models
  6. Model selection
35
Q

Curse of dimensionality

A

Curse of dimensionality refers to the situation where we keep on adding
more input variables to our data, which creates high-dimensional data.

High-dimensional data = # of input variables ≥ # of observations

The amount of training data needs to grow exponentially to maintain the
same coverage!

36
Q

Black box ML model

A

“Black box” ML models are too complex for humans to understand or
interpret. A limitation some ML algorithms suffer from, but not all!

  • A complex decision process made by the algorithm
  • Difficult to trace back from the predictions to the origin
  • Hard to determine why an action was taken
  • Model parameters that are non-interpretable

Think carefully about explainability (Can your stakeholders understand the
results of the chosen model?).

In general, it is good practice to use simpler and more interpretable models
when there is no significant benefit gained from choosing a more complex
alternative, an idea also known as Occam’s Razor.

37
Q

Overfitting

A

When you learn patterns in the training data that are only there by chance,
i.e., not present in new, unseen data. Non-parametric and non-linear
models are prone to overfitting because they have more flexibility when
they approximate the functional form of f(x).

38
Q

When you learn patterns in the training data that are only there by chance,
i.e., not present in new, unseen data. Non-parametric and non-linear
models are prone to overfitting because they have more flexibility when
they approximate the functional form of f(x).

A

Overfitting

39
Q

Underfitting

A

When you fail to learn important patterns in the training data, and thus
miss generalizable patterns in new, unseen data. It will be obvious from
the chosen performance metric (on the training data), and the remedy is
to move on and try to estimate alternative models.

40
Q

When you fail to learn important patterns in the training data, and thus
miss generalizable patterns in new, unseen data. It will be obvious from
the chosen performance metric (on the training data), and the remedy is
to move on and try to estimate alternative models.

A

Underfitting

41
Q

BIAS-VARIANCE TRADE-OFF

A

Prediction error as a function of model complexity: increasing complexity
reduces bias but increases variance, so the best model balances the two.

42
Q

Data splitting

A

The goal: Split the data into a training data set and a test data set.

Why?
- Training a model and predicting with it are two separate things.
- Avoid prediction bias when assessing the accuracy of the model.

Requirements:
* Independent (observations are independent of each other)
* Mutually exclusive (an observation appears in only one of the two sets)
* Completely exhaustive (all observations are allocated)
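
A minimal sketch in Python with scikit-learn, making a stratified random 80/20 split on hypothetical X and y:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)  # hypothetical input variables
y = np.array([0, 1] * 10)         # hypothetical outcome variable

# Mutually exclusive and completely exhaustive by construction
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)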

43
Q

Data splitting strategies

A

* Random split with 80% train (in-sample) and 20% test data (out-of-sample)
* Stratified random splitting
* Train data set / Validation (tuning) data set / Test data set
* Cross-validation
  * Leave-One-Out
  * K-fold

Not enough data?
- Use a resampling technique, e.g., bootstrapping
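
A minimal sketch in Python with scikit-learn, scoring a model with 5-fold cross-validation on hypothetical X and y:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = np.arange(20).reshape(-1, 1)   # hypothetical input variable
y = (X.ravel() >= 10).astype(int)  # hypothetical binary outcome

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean())  # average out-of-sample accuracy across the 5 folds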

44
Q

Objective functions

A

How successful is the chosen algorithm? To measure this, you need to
choose an objective (loss) function that represents your goal.

Examples:
* Mean Squared Error (MSE): The average of the squared differences
between your predictions and the actual observations.
* Mean Absolute Error (MAE): The average of the absolute differences
between your predictions and the actual observations.
* Misclassification rate: The number of incorrect predictions out of the
total number of predictions.
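
In formulas, with y_i the observed values, ŷ_i the predictions, and n the number of observations:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
\qquad
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert
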
45
Q

Model tuning

A

The goal is to establish different versions (candidate models) of the basic
model by tuning the hyperparameters. Hyperparameters are parameters
that are not part of the model itself but impact the training of the model
(e.g., the k in KNN, the depth of a decision tree, or C and γ in a radial
kernel for SVMs).

How to fine-tune the hyperparameters?
* Run a grid search and do k-fold cross-validation, or use the validation set
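
A minimal sketch in Python with scikit-learn, grid-searching C and gamma for an RBF-kernel SVM with 5-fold cross-validation on hypothetical data:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X = np.random.RandomState(0).randn(40, 2)  # hypothetical inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # hypothetical labels

grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
print(search.best_params_)  # the winning hyperparameter combination
print(search.best_score_)   # its mean cross-validated score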

46
Q

FINAL MODEL SELECTION

A

When selecting the final model (model selection), we look at the fitted
candidate models and choose the best one based on the out-of-sample
error, i.e., the error calculated on data points not used in the training
process.