Lecture 5 Flashcards

1
Q

What is predictive analytics?

A

Predictive analytics is the process of extracting information from large data sets in order to determine trends and patterns that can be used to generate models and predict behaviors of interest.

2
Q

Prescriptive analytics

A

Aims at suggesting (prescribing) the best decision options in order to take advantage of the predicted future utilizing large amounts of data (Šikšnys & Pedersen, 2016).

Incorporates the predictive analytics output and utilizes artificial intelligence, optimization algorithms and expert systems in a probabilistic context in order to provide adaptive, automated, constrained, time-dependent and optimal decisions.

3
Q

Relation between Predictive and prescriptive (predictive-prescriptive split)

A

There is considerable overlap between the two areas.

Difference:
Prescriptive depends on predictive. In this course they are treated as two separate steps.

The Venn diagram in the slides shows that machine learning / data mining is mainly predictive analytics, but also falls partly into the prescriptive part.
Probabilistic models sit halfway in both.

Predictive analytics
statistical analysis

Prescriptive analytics
mathematical programming
simulation
logic-based models
evolutionary computation

4
Q

What is AI?

A

No consensus on a single definition

Thinking Humanly:
Cognitive science/Cognitive modelling

Acting Humanly: Turing test

Thinking Rationally: Logic-based/Deductive Intelligence

Acting Rationally: Rational (trying to achieve the best solution) agents

Is it more about actual intelligence or perceived intelligence?

slide 11

5
Q

Chinese room argument

A

Is it more about actual intelligence or perceived intelligence?
Does an AI actually understand, or does it simply execute an algorithm/set of rules with (super)human capacities?

6
Q

Levels of AI

A
  • narrow AI
  • general AI
  • super AI
7
Q

What is narrow AI?

A

Dedicated to assist with or take over specific tasks

8
Q

General AI

A

takes knowledge from one domain, transfers to other domains

9
Q

Super AI

A

machines that are an order of magnitude smarter than humans

10
Q

differences between AI, machine learning, and deep learning

A

AI: computing systems which are capable of performing tasks that humans are very good at, for example recognizing objects.

ML: the field of AI that applies statistical methods to enable computer systems to learn from the data towards an end goal.

Deep learning: neural networks with several hidden layers.

11
Q

Machine learning definition

A

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

12
Q

When to use:
* classical ML
* Reinforcement learning
* ensembles
* neural networks and deep learning

A

classical ML
* simple data and clear features
Reinforcement learning
* no data, but we have an environment to interact with
ensembles
* when quality is a real problem
neural networks and deep learning
* complicated data, unclear features, belief in a miracle

13
Q

Data requirements for Machine learning (taxonomy of machine learning)

A
  • Supervised
  • unsupervised
  • semisupervised
  • reinforcement
14
Q

Supervised learning

A

With supervised learning, you feed the desired output into the system together with the input (for instance, pictures of cats and dogs, each labeled with the answer that a picture of a dog is a dog and a picture of a cat is a cat) in order to train the model. This means that in supervised learning, the machine already knows the correct output for every training example before it starts learning. A basic example of this concept would be a student learning a course from an instructor: the student knows what he/she is supposed to learn from the course.

With the output of the algorithm known, all that a system needs to do is to work out the steps or process needed to reach from the input to the output. The algorithm is being taught through a training data set that guides the machine.

The type of target variable is either:
* continuous, which results in regression analysis
* categorical, which results in classification.
Examples of categories used in classification include demographic data such as marital status, sex, or age group

Even more information if needed

Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time. The algorithm measures its accuracy through the loss function, adjusting until the error has been sufficiently minimized.
Uses labeled data.
examples:
* Image- and object-recognition: Supervised learning algorithms can be used to locate, isolate, and categorize objects out of videos or images, making them useful when applied to various computer vision techniques and imagery analysis.
* Predictive analytics
* Spam detection: Spam detection is another example of a supervised learning model. Using supervised classification algorithms, organizations can train databases to recognize patterns or anomalies in new data to organize spam and non-spam-related correspondences effectively

challenges of supervised learning
* Supervised learning models can require certain levels of expertise to structure accurately.
* Training supervised learning models can be very time intensive.
* Datasets can have a higher likelihood of human error, resulting in algorithms learning incorrectly.
* Unlike unsupervised learning models, supervised learning cannot cluster or classify data on its own.

IBM

15
Q

Difference between supervised vs. unsupervised learning vs. semi-supervised learning

A

Unlike supervised learning, unsupervised learning uses unlabeled data. From that data, it discovers patterns that help solve for clustering or association problems. This is particularly useful when subject matter experts are unsure of common properties within a data set. Common clustering algorithms are hierarchical, k-means, and Gaussian mixture models.

Semi-supervised learning occurs when only part of the given input data has been labeled. Unsupervised and semi-supervised learning can be more appealing alternatives as it can be time-consuming and costly to rely on domain expertise to label data appropriately for supervised learning

16
Q

Unsupervised learning

A
  • Does not use labels
  • output is unknown
  • far less used than supervised learning
  • forms the future behind ML and its possibilities
  • machines and computers developing the ability to “teach themselves” alludes to the process of unsupervised learning.
  • no access to concrete datasets
  • outcomes of problems are largely unknown
  • no reference data at all
17
Q

Is skippable

example to show difference between supervised and unsupervised learning

A

consider that we have a digital image that has a variety of colored geometric shapes on it. These geometric shapes needed to be matched into groups according to color and other classification features. For a system that follows supervised learning, this whole process is a bit too simple.

The procedure is extremely straightforward, as you just have to teach the computer all the details pertaining to the figures. You can let the system know that all shapes with four sides are known as squares, and others with eight sides are known as octagons, etc. We can also teach the system to interpret the colors and see how the light being given out is classified.

However, in unsupervised learning, the whole process becomes a little trickier. The algorithm for an unsupervised learning system has the same input data as the one for its supervised counterpart (in our case, digital images showing shapes in different colors).

Once it has the input data, the system learns all it can from the information at hand. In fact, the system works by itself to recognize the problem of classification and also the difference in shapes and colors. With information related to the problem at hand, the unsupervised learning system will then recognize all similar objects, and group them together. The labels that it will give to these objects will be designed by the machine itself. Technically, there are bound to be wrong answers, since there is a certain degree of probability. However, just like how we humans work, the strength of machine learning lies in its ability to recognize mistakes, learn from them, and to eventually make better estimations next time around.

18
Q

Reinforcement learning

A

Reinforcement Learning branches off from the concept of Unsupervised Learning and gives software agents and machines a high degree of control to determine what the ideal behavior within a context is. The aim is to maximize the performance of the machine. Simple feedback that informs the machine about its progress (a reward signal) is required to help the machine learn its behavior.

An agent decides the best action based on the current state of the results

19
Q

Reinforcement learning vs. supervised learning and unsupervised learning

A

Reinforcement vs supervised learning
In Supervised Learning we have an external supervisor who has sufficient knowledge of the environment and shares that knowledge with the agent so that it can form a better understanding and complete the task. However, in problems where the agent can perform many different kinds of subtasks by itself to achieve the overall objective, the presence of a supervisor is unnecessary and impractical. Reinforcement Learning instead relies on a reward function, which Supervised Learning does not have, that lets the system know whether it is progressing down the right path.

Reinforcement vs unsupervised learning
Reinforcement Learning basically has a mapping structure that guides the machine from input to output. Unsupervised Learning has no such structure: the machine focuses on the underlying task of locating patterns rather than on a mapping for progressing towards the end goal.

For example, if the task for the machine is to suggest a good news update to a user, a Reinforcement Learning algorithm will look to get regular feedback from the user in question, and would then through the feedback build a reputable knowledge graph of all news related articles that the person may like. On the contrary, an Unsupervised Learning algorithm will try looking at many other articles that the person has read, similar to this one, and suggest something that matches the user's preferences.

https://crayondata.ai/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/

20
Q

Math representation (Taxonomy of Machine Learning)

A

divided in model-based and instance based

Instance-based: machine learning technique simply compares new instances to the ones they were trained on.
So comparing new data to the training data and based on the training data classifying it.

Model-based: tries to find a general representation of the relationships in the dataset.
The algorithm chooses a hypothesis, i.e. a mathematical representation. Then it determines the parameters of this hypothesis based on the available data. The fitted hypothesis is then used to make estimations on new data.

https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
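
The contrast between the two representations can be sketched on a toy problem. This is only an illustration under stated assumptions: the data are made up, and the names `predict_instance_based`, `fit_line` and `predict_model_based` are hypothetical, not from the lecture.

```python
# Toy data (hypothetical): y is exactly 2 * x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

def predict_instance_based(x):
    """Instance-based: keep the training data and copy the target of the nearest stored example."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_line(xs, ys):
    """Model-based: choose the hypothesis y = a * x + b and estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line(train_x, train_y)

def predict_model_based(x):
    """Use only the fitted parameters; the training data is no longer needed."""
    return a * x + b
```

For an input far outside the training range, such as x = 10, the instance-based predictor can only return a stored target (8.0), while the fitted line extrapolates to 20.0.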

21
Q

Classification by Training behaviour (Taxonomy of Machine Learning)

A

Most ML techniques do not keep a memory of the entire dataset they were trained on; they only make iterative adjustments based on the data they are provided with. As a result, many learning techniques cannot adjust an already trained representation to new data while keeping it consistent with its previous training (because there is no memory of the previous data).

batch learning: Learning techniques that require the entire data set for their training.
All the examples must be provided during the training phase. The “predictor” resulting from the training is then used in production and no more learning occurs. In this setting, if we obtain new examples, we need to train a new model from scratch on the complete enriched data set.

online learning: This kind of learning algorithm can adjust an already trained representation to new data. Unlike batch learning, an online learning technique can be provided with new training examples progressively and changes its representation accordingly, even while being used in production. For many underlying representations, true online learning is not possible. However, depending on the formulation, we can often find a pseudo-online algorithm based on recursive algorithms. In this case, the new predictor depends on the current best predictor and all the previous examples (already learnt).

https://hermit-notebook.site/en/notebook/computer-sciences/artificial-intelligence/machine-learning/taxonomy-of-machine-learning/
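
A minimal sketch of the batch/online distinction, assuming a deliberately trivial "predictor" (the mean of the examples seen so far). The names `batch_fit` and `OnlineMean` are hypothetical and chosen only for illustration.

```python
def batch_fit(values):
    """Batch learning: needs the complete data set and recomputes the predictor from scratch."""
    return sum(values) / len(values)

class OnlineMean:
    """Online learning: adjusts the current predictor with one new example at a time."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, value):
        self.count += 1
        # Recursive update: the new estimate depends only on the
        # previous estimate, the example count, and the new example.
        self.estimate += (value - self.estimate) / self.count
        return self.estimate

data = [4.0, 8.0, 6.0]
online = OnlineMean()
for value in data:
    online.update(value)
```

When a new example arrives, the batch version has to be called again on the enlarged list, whereas `online.update(new_value)` reaches the same result without revisiting the old examples.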

22
Q

Classification by Task Type (Machine learning taxonomy by usage or goal)

A
  • Regression
  • Classification
  • Clustering
  • Association Rule learning
  • Decision making
  • Blind source separation
  • Dimensionality reduction
23
Q

Regression

A

Y = f(X)

The values of Y are determined by a human
Y ∈ ℝ is a continuous variable

f is learned from the data through ML

Regression tries to find the value of a property of a phenomenon depending on the values of other properties or instances of the same kind.

Regression typically falls under supervised learning.

For example, suppose an ice cream seller wants to predict their income based on temperature forecasts. If we were to build a software package for this requirement, we would be learning the (model and) parameters of a regression.

24
Q

Classification

A

Y = f(X)
The values of Y are determined by a human

Y ∈ {C_1, …, C_n} is a discrete variable
C_1 = Triangle
C_2 = Circle

f is learned from the data through ML

Classification tries to find boundaries in the dataset so as to separate the elements into a number of classes known (or defined) before the training.

Classification typically falls under supervised learning. However, unsupervised classification also exists, such as anomaly detection or outlier detection.

25
Q

clustering

A

The real values of Y are unknown

The ML algorithm tries to identify existing patterns in the data (without prior supervision)

Clustering tries to group observations such that elements belonging to the same group (or cluster) are more similar, according to some similarity measure, and those belonging to different groups are more dissimilar. Clustering is typically an unsupervised learning task.

26
Q

Baseline vs. State-of-the-art-model

A

Baseline/Benchmark
* Simple model
* Easy/quick to fit
* Reference point for performance analysis

State-of-the-art model
* Usually very complex model
* Costly/optimized fit
* Best possible performances

27
Q

Supervised machine learning for regression

A

Linear Regression

Artificial Neural Networks

Deep Artificial Neural Networks

Support Vector Regression (SVR)

K-Nearest Neighbours (k-NN)

28
Q

Linear Regression

A

Linear Regression

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation:
Model-based: Y = βX + ε

Task: Regression
For Classification, the equivalent model is
Logistic Regression

29
Q

Artificial Neural Networks

A

Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)

Data provisioning: Batch/Online

Model representation: Model-based
Task: Regression/Classification Ensemble model

30
Q

Deep Artificial Neural Networks

A

Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)

Data provisioning: Batch/Online

Model representation: Model-based

Task: Regression/Classification

31
Q

Support Vector Regression (SVR)

A

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation: Model-based: Y = K(βX) + ε

Task: Regression
For Classification, the equivalent model is Support Vector Machines (SVM)

32
Q

K-Nearest Neighbours (k-NN)

A

Dataset requirement:
Supervised

Data provisioning:
Batch/Online

Model representation:
Instance-based

Task:
Classification/Regression
Regression -> Mean
Classification -> Majority vote
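
The two prediction rules (mean for regression, majority vote for classification) can be sketched as follows. This is a simplified illustration with one-dimensional inputs and made-up data; the function names are hypothetical.

```python
from collections import Counter

def k_nearest(train, x, k):
    """Return the targets of the k training points closest to x (1-D inputs for brevity)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]

def knn_regress(train, x, k=3):
    neighbours = k_nearest(train, x, k)
    return sum(neighbours) / len(neighbours)  # regression -> mean of the neighbours

def knn_classify(train, x, k=3):
    neighbours = k_nearest(train, x, k)
    return Counter(neighbours).most_common(1)[0][0]  # classification -> majority vote

# Hypothetical training sets as (input, target) pairs.
regression_data = [(1, 10.0), (2, 20.0), (3, 30.0), (10, 100.0)]
classification_data = [(1, "cat"), (2, "cat"), (3, "dog"), (10, "dog")]

predicted_value = knn_regress(regression_data, 2.1)
predicted_label = knn_classify(classification_data, 2.1)
```

Note that nothing is fitted in advance: every prediction looks up the stored training examples directly.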

33
Q

Supervised Machine Learning for classification

A
  • Naïve Bayes
  • Logistic Regression
  • Support Vector Machines (SVM)
  • Decision Tree
  • Random forest
  • Artificial Neural Networks
34
Q

Naïve Bayes

A

Dataset requirement :
Supervised

Data provisioning:
Batch

Model representation:
Model-based

Task: Classification

35
Q

Logistic Regression

A

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation:
Model-based: Y = βX + ε

Task: Classification
For Regression, the equivalent model is Linear Regression

36
Q

Support Vector Machines (SVM)

A

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation: Model-based: Y = K(βX) + ε

Task: Classification

For Regression, the equivalent model is Support Vector Regression

37
Q

Decision Tree

A

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation: Model-based

Task: Regression/Classification
Regression VS Classification
Decision Tree

38
Q

Random forest

A

Dataset requirement :
Supervised

Data provisioning: Batch

Model representation: Model-based

Task: Regression/Classification
Ensemble model

39
Q

Artificial Neural Networks

A

Dataset requirement :
Supervised (ANN, RNN, CNN, GAN)
Unsupervised (Autoencoders)

Data provisioning: Batch/Online

Model representation: Model-based

Task: Regression/Classification
Ensemble model

40
Q

Unsupervised Machine Learning

A
  • K-Means Clustering
  • Hierarchical clustering
  • And many more…

many more:

Dimensionality Reduction
* PCA
* t-SNE
* Autoencoders

Clustering
* DBSCAN
* Self-organizing maps

Reinforcement Learning
* Q-Learning
* Deep Q-Learning
…

41
Q

K-Means Clustering

A

Dataset requirement: Unsupervised

Data provisioning: Batch

Model representation: Instance-based

Task: Clustering/pattern recognition

N.B. : As clustering is unsupervised, multiple solutions can be found!
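
A bare-bones sketch of the k-means loop on one-dimensional data. The points and the starting centroids are assumptions made for this example, and `k_means_1d` is a hypothetical name; a different choice of starting centroids can lead to a different final clustering.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: alternate the assignment step and the centroid update."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Assignment step: attach each point to its closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = k_means_1d([1.0, 2.0, 9.0, 10.0], centroids=[1.0, 2.0])
```

No labels are used anywhere: the groups emerge only from the distances between the points.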

42
Q

Hierarchical clustering

A

Dataset requirement: Unsupervised

Data provisioning: Batch

Model representation: Instance-based

Task: Clustering/pattern recognition

43
Q

Machine learning in practice - pipeline

A

raw data
* collection
* download
* scraping

Data preprocessing
* Data quality (cf. diagnostic)
* missing data
* categorical variables

Train-test split
* single validation
* cross validation

model fit
* fit on training data
* test on testing data

performance evaluation
* performance metric choice
* evaluation on validation data

44
Q

Splitting data

A

Data is split for three different uses:
* trees of different depths are fit to the training data
* their performance is evaluated on the validation set (the lower the validation error the better)
* and a final estimate of model performance is computed on the test set
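
A minimal sketch of a three-way split. The 60/20/20 proportions, the fixed seed and the name `train_val_test_split` are assumptions for illustration, not values from the lecture.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the data into three disjoint parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(10))
```

Candidate models (for example trees of different depths) would be fit on `train` and compared on `val`, and only the chosen model would be scored once on `test`.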

45
Q

Splitting data and vocabulary

A

Feature: With respect to a dataset, a feature represents an attribute and value combination. Color is an attribute. “Color is blue” is a feature (blue is one of the values color can have).
Target: the target variable, also known as the dependent variable, is the outcome we aim to predict or explain using our model. It is the variable that we want to estimate or classify based on the available data.
Sample: a row, i.e. one instance in a dataset, with a value for each feature (and thus each variable).
Training Set: A set of observations used to generate machine learning models.
Test Set: A set of observations used at the end of model training and validation to assess the predictive power of your model. How generalizable is your model to unseen data?

46
Q

Categorical data preprocessing

A

ordinal
one-hot-encoding

Use One-Hot Encoding: When dealing with nominal categorical variables that lack any inherent order.

Use Ordinal Encoding: When you have categorical variables with a clear ordinal relationship and the order between categories holds valuable information.

47
Q

One-hot-encoding

A

Transforms categorical variables into a binary matrix where each category is represented as a column, and each instance is marked with a ‘1’ in the corresponding column and ‘0’ in all other columns. (For instance, with three values red, green and yellow there are 3 columns; if the value is red, there is a 1 in the red column and a 0 in the others.)

advantages:
1. Preservation of Information: One-hot encoding preserves the uniqueness of each category. It ensures that the algorithm does not assume any ordinal relationship among the categories.
2. Lack of Bias: Since each category is represented independently, one-hot encoding prevents introducing unintended biases based on the order of categories.
3. Suitable for Most Algorithms: One-hot encoded data is widely accepted by various machine learning algorithms, such as decision trees, random forests, and neural networks

limitations:
1. Dimensionality: One-hot encoding can significantly increase the dimensionality of the dataset, especially when dealing with categorical variables with many unique categories. This can lead to the curse of dimensionality and negatively impact model performance.
2. Loss of Order Information: One-hot encoding discards any inherent order that might exist among categories, which can be crucial in some scenarios.
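
The red/green/yellow example can be written out as a small sketch. The function name `one_hot_encode` is hypothetical, and sorting the categories alphabetically is an arbitrary choice made here to get a stable column order.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows

categories, rows = one_hot_encode(["red", "green", "yellow", "red"])
# categories gives the column order; each row has exactly one 1.
```

With many distinct categories the number of columns grows accordingly, which is the dimensionality limitation listed above.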

48
Q

Ordinal Encoding

A

Ordinal encoding is a technique that assigns a unique integer value to each category based on their order or rank. It is suitable for categorical variables that exhibit a clear ordinal relationship, where one category is greater or lesser than another. (for instance: flight ticket, first, second or a third class)

advantages:
1. Efficiency in Dimensionality: Ordinal encoding does not inflate the dataset's dimensionality like one-hot encoding does. It replaces categorical values with integers, saving space and computation time.
2. Retains Order Information: This technique preserves the ordinal information that exists among categories, allowing the algorithm to leverage this information if it is relevant to the problem.

limitations:
1. Assumption of Equal Steps: Ordinal encoding assumes equal intervals between categories, which might not always be the case in real-world scenarios.
2. Potential Misrepresentation: If the assigned integer values do not accurately reflect the ordinal relationships, the encoded data might mislead the algorithm.
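
A short sketch using the ticket-class example. The order list is supplied by hand, which is an assumption of this sketch: the ranking cannot be inferred from the strings themselves. `CLASS_ORDER` and `ordinal_encode` are hypothetical names.

```python
# Lowest to highest rank; this ordering is chosen by the analyst.
CLASS_ORDER = ["third", "second", "first"]

def ordinal_encode(values, order):
    """Map each category to its position in the given order."""
    rank = {category: index for index, category in enumerate(order)}
    return [rank[value] for value in values]

encoded = ordinal_encode(["first", "third", "second"], CLASS_ORDER)
```

The integers 0, 1, 2 imply equally sized steps between the classes, which is exactly the "equal steps" assumption listed under the limitations.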

49
Q

Missing data preprocessing

A

Case deletion
Missing data imputation
Approaches that take into account data distribution

50
Q

Missing data imputation

A

Generally, missing quantitative values are replaced using the mean or median, and missing categorical (qualitative) values are replaced using the mode.
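
A minimal sketch of this rule, assuming missing entries are represented as `None`. The helper name `impute` and the sample values are made up for illustration.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [value for value in values if value is not None]
    fill = strategy(observed)
    return [fill if value is None else value for value in values]

heights = impute([1.0, None, 3.0], mean)             # quantitative -> mean
ages = impute([20, None, 40, 30], median)            # quantitative -> median
colours = impute(["red", None, "red", "blue"], mode) # categorical -> mode
```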

51
Q

Case deletion

A

List Wise Deletion: If a row has missing values, delete the entire row. This causes some data loss, which pairwise deletion can reduce.

Pair Wise Deletion: Instead of dropping whole rows, each statistic (for example each entry of the correlation matrix) is computed from all rows that have values for the variables involved, so a row is excluded only from the calculations that need its missing value. A related heuristic: if a feature with missing values is highly correlated with the target variable, use an imputation method to deal with the missing values; if it is not highly correlated with the target variable, delete the entire column.
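
A small sketch contrasting the two deletion strategies. It takes pairwise deletion in its usual statistical sense (each statistic uses every row where the needed column is present), which is an assumption of this example; the rows and function names are hypothetical.

```python
rows = [
    {"age": 25, "income": 3000},
    {"age": None, "income": 4000},
    {"age": 40, "income": None},
    {"age": 31, "income": 5000},
]

def listwise_delete(rows):
    """Drop every row that has at least one missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]

def pairwise_mean(rows, column):
    """Use every row in which this particular column is present."""
    observed = [row[column] for row in rows if row[column] is not None]
    return sum(observed) / len(observed)

complete_rows = listwise_delete(rows)
mean_age = pairwise_mean(rows, "age")
```

Listwise deletion keeps only two of the four rows, while the pairwise mean of `age` still uses three of them.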

52
Q

Precision

A

exactness of model

True positive / (true positive + false positive)

53
Q

Accuracy

A

percentage correct predictions

(TP + TN) / (TP + FN + FP + TN)

54
Q

Recall

A

Completeness of model

TP / (TP+FN)

55
Q

F1 Score

A

Combines precision and recall

F1 = 2 * (precision * recall) / (precision + recall)

(so the fraction, and then times 2)
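
The four metrics can be computed together from the confusion-matrix counts. The counts below are invented for illustration, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives, 6 true negatives.
precision, recall, accuracy, f1 = classification_metrics(tp=8, fp=2, fn=4, tn=6)
```

The sketch does not guard against division by zero, so it assumes at least one predicted positive and one actual positive.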