Machine Learning Landscape Flashcards

1
Q

Supervised/Unsupervised Learning

Question 1

Define supervised learning.

A

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.

2
Q

Question 02

Supervised learning

What are the two main tasks of supervised learning?

A

Classification (e.g., spam or ham)

Regression (predicting a numeric target value from features called predictors)

3
Q

Question 03

Define unsupervised learning.

A

In unsupervised learning, as you might guess, the training data is unlabeled.

The system tries to learn without a teacher.

4
Q

Question 04

Give an example of an unsupervised learning algorithm.

A

For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.

At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help.

For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends, and so on.
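
The card does not name a specific clustering algorithm. As one possible illustration, here is a minimal k-means sketch in plain Python. The visitor data (hour of visit, pages read) and the choice of two initial centroids are assumptions made up for this example.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical visitors: (hour of visit, pages read). No labels are given.
visitors = [(20, 2), (21, 3), (22, 2), (9, 8), (10, 9), (11, 7)]
groups, centers = kmeans(visitors, centroids=[visitors[0], visitors[3]])
print(groups)  # cluster index assigned to each visitor
```

The algorithm is never told which group a visitor belongs to; the evening readers and the morning readers end up in separate clusters purely because of their distances to the centroids.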

5
Q

Question 05

Unsupervised learning

What is the second application of unsupervised learning?

A

Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.

6
Q

Question 06

Unsupervised learning

What is the fourth application of unsupervised learning?

A

Dimensionality reduction, in which the goal is to simplify the data without losing too much information.

One way to do this is to merge several correlated features into one.

For example, a car’s mileage may be very correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.
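
A rough sketch of merging two correlated features into one, assuming made-up car data. Both features are standardized and then combined along the direction (1, 1)/√2, which for two standardized, positively correlated features coincides with the first principal component. A real project would more likely use a library implementation of PCA; this is only an illustration.

```python
import math

def standardize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Hypothetical cars: age in years and mileage in thousands of km.
age = [1, 3, 5, 8, 10]
mileage = [15, 40, 80, 120, 160]

z_age, z_mileage = standardize(age), standardize(mileage)
# Two correlated features merged into a single "wear and tear" score per car.
wear = [(a + m) / math.sqrt(2) for a, m in zip(z_age, z_mileage)]
print([round(w, 2) for w in wear])
```

Each car is now described by one number instead of two, at the cost of discarding the (small) part of the variation in which age and mileage disagree.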

7
Q

Question 07

Unsupervised learning

What is the first application of unsupervised learning?

A

For example, say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors.

If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.

8
Q

Question 08

Define semi-supervised learning.

A

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.

9
Q

Question 09

Semi-supervised learning

Give an application of semi-supervised learning.

A

Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
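
The two stages can be sketched as follows, assuming the clustering step has already produced a cluster id for every face. The names "Alice" and "Bob" and the dictionary layout are hypothetical; nothing here reflects how any real photo service is implemented.

```python
# Step 1 (unsupervised): assume a clustering step already grouped the faces.
# photo_faces maps photo id -> cluster ids of the faces found in that photo.
photo_faces = {1: ["A"], 2: ["B"], 5: ["A", "B"], 7: ["B"], 11: ["A"]}

# Step 2 (supervised): the user supplies one label per cluster.
cluster_names = {"A": "Alice", "B": "Bob"}  # hypothetical names

# Propagate each label to every photo containing a face from that cluster.
photo_people = {
    photo: [cluster_names[c] for c in clusters]
    for photo, clusters in photo_faces.items()
}

def search(name):
    """Return the ids of all photos in which the named person appears."""
    return sorted(p for p, people in photo_people.items() if name in people)

print(search("Alice"))
```

Two labels are enough to name every face in all five photos, which is the point of combining a little labeled data with a lot of unlabeled data.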

10
Q

Question 10

What is a batch learning system?

A

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.

This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

11
Q

Question 11

What is an online learning system?

A

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
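
A minimal sketch of incremental training, assuming a one-feature linear model updated by stochastic gradient descent. The class name, the learning rate, and the simulated stream (which follows y = 2x + 1) are all invented for this example.

```python
class OnlineLinearRegressor:
    """Linear model y = w * x + b, updated one instance at a time with SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # One cheap gradient step on the squared error of this single instance.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error

model = OnlineLinearRegressor()
# Simulated stream: instances arrive one by one and are discarded after use.
for step in range(5000):
    x = (step % 10) / 10
    model.learn_one(x, 2 * x + 1)

print(round(model.w, 3), round(model.b, 3))
```

The model never sees the whole dataset at once, and it can keep calling `learn_one` for as long as new data arrives.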

12
Q

Question 12

What is out-of-core learning?

A

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
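
The load-train-repeat loop might look like the sketch below. The generator `read_chunks` is a hypothetical stand-in for reading a large file piece by piece; it produces synthetic data following y = 3x - 1. The chunk size and learning rate are arbitrary choices.

```python
def read_chunks(n_instances, chunk_size):
    """Stand-in for reading a huge file piece by piece (synthetic data)."""
    chunk = []
    for i in range(n_instances):
        x = (i % 10) / 10
        chunk.append((x, 3 * x - 1))  # synthetic target: y = 3x - 1
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def train_step(w, b, chunk, learning_rate=0.5):
    """One mini-batch gradient step on the chunk's mean squared error.

    The constant factor 2 of the gradient is folded into the learning rate.
    """
    grad_w = sum((w * x + b - y) * x for x, y in chunk) / len(chunk)
    grad_b = sum((w * x + b - y) for x, y in chunk) / len(chunk)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

w, b = 0.0, 0.0
for chunk in read_chunks(n_instances=10_000, chunk_size=10):
    w, b = train_step(w, b, chunk)  # only one chunk is in memory at a time

print(round(w, 3), round(b, 3))
```

Memory use depends on the chunk size, not on the total number of instances, which is what allows training on data larger than main memory.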

13
Q

Question 13

What type of learning algorithm relies on a similarity measure to make predictions?

A

Instance-based learning system

The system learns the examples by heart, then generalizes to new cases using a similarity measure (for example, counting the number of words a new email has in common with known spam emails).
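
A small sketch of the word-count similarity idea, assuming a handful of invented emails. The new message simply receives the label of the most similar stored example (a 1-nearest-neighbor rule, chosen here for brevity).

```python
def common_words(a, b):
    """Similarity measure: number of distinct words two messages share."""
    return len(set(a.lower().split()) & set(b.lower().split()))

# The examples "learned by heart": (message, label). Hypothetical data.
known_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def classify(message):
    """Return the label of the most similar stored example."""
    _, label = max(known_emails, key=lambda ex: common_words(message, ex[0]))
    return label

print(classify("claim your free prize"))
print(classify("agenda for the monday meeting"))
```

There is no training step beyond storing the examples; all the work happens at prediction time, when the new case is compared with the stored ones.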

14
Q

Question 14

What do model-based learning algorithms search for?

A

They generalize from a set of examples by building a model of these examples, then use that model to make predictions.

15
Q

Question 15

What is the most common strategy that model-based learning algorithms use to succeed? How do they make predictions?

A

You studied the data.

You selected a model.

You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).

Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
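
The four steps above can be sketched with a straight-line model fitted by ordinary least squares. The data values are invented, and the mean squared error is used as the cost function; both are assumptions made for this illustration.

```python
def fit_linear(xs, ys):
    """Find the slope and intercept that minimize the mean squared error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def mse(params, xs, ys):
    """Cost function: mean squared error of the line on the given data."""
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 1. Study the data (hypothetical measurements, roughly linear).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
# 2. Select a model: a straight line y = slope * x + intercept.
# 3. Train it: search for the parameter values that minimize the cost.
params = fit_linear(xs, ys)
# 4. Inference: apply the trained model to a new case.
prediction = params[0] * 6.0 + params[1]
print(params, prediction)
```

Any other line, for instance slope 2 and intercept 0, gives a cost at least as high on this training data, which is what "minimize a cost function" means in practice.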

16
Q

Question 16

Can you name four of the main challenges in Machine Learning?

A

Insufficient Quantity of Training Data

Nonrepresentative Training Data

Poor-Quality Data

Irrelevant Features

17
Q

Question 17

Define overfitting.

A

Overfitting means that the model performs well on the training data, but it does not generalize well to new instances.

18
Q

Question 18

What are the solutions to overfitting?

A
  • To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model
  • To gather more training data
  • To reduce the noise in the training data (e.g., fix data errors and remove outliers)
19
Q

Question 19

Constraining a model to make it simpler and reduce the risk of overfitting is called …

A

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.

20
Q

Question 20

The amount of regularization to apply during learning can be controlled by a … (a parameter of the learning algorithm, not of the model).

A

The amount of regularization to apply during learning can be controlled by a hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the model).
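
As a hedged illustration, consider a one-parameter model y = w * x trained with an L2 (ridge) penalty. The penalty strength, called `alpha` here, is a hyperparameter: it is set before training and is not learned from the data. The data and the specific alpha values are assumptions for this sketch.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Slope w minimizing sum((w * x - y) ** 2) + alpha * w ** 2 (no intercept).

    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + alpha).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]

slopes = [fit_ridge_slope(xs, ys, alpha) for alpha in (0.0, 1.0, 10.0, 100.0)]
for alpha, slope in zip((0.0, 1.0, 10.0, 100.0), slopes):
    print(alpha, round(slope, 3))
```

With `alpha = 0` the model is unconstrained; as `alpha` grows, the slope is pulled toward zero, giving a simpler model that fits the training data less closely.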

21
Q

Question 21

What is a test set and why would you want to use it?

A

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
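
A minimal sketch of holding out a test set, assuming synthetic noisy data around y = 2x + 1 and an 80/20 split (a common but arbitrary ratio). The model is trained on the training part only, and the error on the held-out part serves as the estimate of the generalization error.

```python
import random

def fit_linear(pairs):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x, _ in pairs
    )
    return slope, mean_y - slope * mean_x

def mse(model, pairs):
    """Mean squared error of the line on the given (x, y) pairs."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in pairs) / len(pairs)

rng = random.Random(42)  # fixed seed so the split is reproducible
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(50)]
rng.shuffle(data)

split = int(0.8 * len(data))
train, test = data[:split], data[split:]

model = fit_linear(train)          # the test set plays no part in training
train_error = mse(model, train)
test_error = mse(model, test)      # estimate of the generalization error
print(train_error, test_error)
```

If the training error is low but the test error is much higher, the model is overfitting the training data.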

22
Q

Question 22

What is the purpose of a validation set?

A

A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.
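
One way this might look in code, assuming a k-nearest-neighbors regressor whose hyperparameter is the number of neighbors `k`. The data, the candidate values, and the 60/20/20 split are invented for the sketch.

```python
import random

def knn_predict(train, x, k):
    """Average target of the k training instances closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, evaluation, k):
    """Mean squared error of k-NN predictions on an evaluation set."""
    return sum(
        (knn_predict(train, x, k) - y) ** 2 for x, y in evaluation
    ) / len(evaluation)

rng = random.Random(0)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.3)) for x in range(100)]
rng.shuffle(data)
train, valid, test = data[:60], data[60:80], data[80:]

# Model selection: compare candidate hyperparameter values on the validation set.
candidates = [1, 3, 5, 9, 15]
best_k = min(candidates, key=lambda k: mse(train, valid, k))

# The test set is used once, at the very end, to estimate the generalization error.
test_error = mse(train, test, best_k)
print(best_k, test_error)
```

Because `best_k` was chosen without looking at the test set, the final test error is not biased by the selection process.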

23
Q

Question 23

What can go wrong if you tune hyperparameters using the test set?

A

If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).

24
Q

Question 24

What is cross-validation and why would you prefer it to a validation set?

A

Cross-validation is a technique that makes it possible to compare models (for model selection and hyperparameter tuning) without the need for a separate validation set. This saves precious training data.
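
A sketch of k-fold cross-validation in plain Python, assuming a one-parameter ridge model y = w * x and made-up data. Every instance is used for validation exactly once and for training in all the other folds, so no data is permanently set aside.

```python
def k_fold_splits(n_instances, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    fold_size = n_instances // n_folds
    for fold in range(n_folds):
        start = fold * fold_size
        stop = start + fold_size if fold < n_folds - 1 else n_instances
        valid_idx = list(range(start, stop))
        train_idx = [i for i in range(n_instances) if not start <= i < stop]
        yield train_idx, valid_idx

def cross_val_mse(xs, ys, alpha, n_folds=5):
    """Average validation MSE of a ridge slope model across the folds."""
    errors = []
    for train_idx, valid_idx in k_fold_splits(len(xs), n_folds):
        # Train on the training folds only (closed-form ridge slope).
        w = sum(xs[i] * ys[i] for i in train_idx) / (
            sum(xs[i] ** 2 for i in train_idx) + alpha
        )
        # Evaluate on the held-out fold.
        errors.append(
            sum((w * xs[i] - ys[i]) ** 2 for i in valid_idx) / len(valid_idx)
        )
    return sum(errors) / len(errors)

# Hypothetical data: y = 2x plus a small alternating perturbation.
xs = [float(i) for i in range(1, 21)]
ys = [2 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

scores = {alpha: cross_val_mse(xs, ys, alpha) for alpha in (0.0, 10.0, 1000.0)}
best_alpha = min(scores, key=scores.get)
print(scores, best_alpha)
```

The folds here are contiguous slices for simplicity; shuffling the data before splitting is usually advisable when the instances are ordered.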
