Lesson 4: Machine Learning Flashcards

1
Q

Accuracy (classification)

A

A measure which is defined as the number of correct predictions divided by the total number of predictions.
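
As a rough illustration of the definition above, here is a minimal sketch in plain Python. The function name `accuracy` and the sample labels are made up for this example and are not part of Azure Machine Learning.

```python
def accuracy(actual, predicted):
    """Number of correct predictions divided by the total number of predictions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical sample data: 3 of the 5 predictions match the actual labels.
actual = ["Yes", "No", "Yes", "Yes", "No"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
print(accuracy(actual, predicted))
```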

2
Q

Area Under the Curve

A

The fraction of the total area that lies underneath the ROC curve. It measures how well a two-class model separates the two classes: values closer to 1 are better, and 0.5 is what random guessing would score.
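
One way to build intuition for this number: the area under the ROC curve equals the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The function name `auc`, the 1/0 label encoding, and the sample scores are assumptions for illustration only.

```python
def auc(actual, scores):
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = Yes, 0 = No) and model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc(actual, scores))  # 8 of the 9 positive/negative pairs are ordered correctly
```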

3
Q

Azure Machine Learning Studio

A

The integrated development environment (IDE) for Azure Machine Learning.

4
Q

Compute (Azure ML)

A

Virtual machine resources which are dedicated to performing tasks in Azure Machine Learning. Compute may include individual virtual machines (VMs), typically configured as data science VMs, or it may include a cluster of VMs intended for training and inference pipeline executions.

5
Q

Confusion matrix

A

A table representing predicted versus actual values for a classification problem. A classic two-class confusion matrix has four boxes. Using “Yes” and “No” as the two classes, these four boxes are True Positive, False Positive, False Negative, and True Negative.

6
Q

How many types of values are there in a confusion matrix?

A

Four (in a two-class confusion matrix).

7
Q

Name the four types of values in a confusion matrix

A

True Positive: we predicted Yes correctly
False Positive: we predicted Yes but it was really No
False Negative: we predicted No but it was really Yes
True Negative: we predicted No correctly
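
The four cases above can be counted with a short loop. This is only a sketch in plain Python. The function name `confusion_counts`, its `positive` parameter, and the sample data are hypothetical.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count the four two-class confusion matrix values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # We predicted Yes: correct if it really was Yes, otherwise a false positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # We predicted No: a false negative if it really was Yes, otherwise correct.
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = ["Yes", "No", "Yes", "Yes", "No", "No"]
predicted = ["Yes", "Yes", "No", "Yes", "No", "No"]
print(confusion_counts(actual, predicted))
```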

8
Q

True Positive

A

From the confusion matrix: we predicted Yes and it was really Yes

9
Q

False Positive

A

From the confusion matrix: we predicted Yes but it was really No

10
Q

False Negative

A

From the confusion matrix: we predicted No but it was really Yes

11
Q

True Negative

A

From the confusion matrix: we predicted No and it was really No

12
Q

Data Labeling

A

This functionality allows you to label images as part of an image classification project.

13
Q

Experiment (Azure ML)

A

A collection of trials used to validate a user’s hypothesis. An experiment may contain multiple runs of pipelines.

14
Q

Feature

A

An input which helps us understand what affects the label.

15
Q

Feature engineering

A

Creating new features from existing data. This might include calculating new features, translating a street address into latitude and longitude, or parsing passages of text for meaning.
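
As a small illustration of calculating new features from existing data, the sketch below derives two columns from a single record. The field names (`quantity`, `unit_price`, `order_date`) and the function `add_features` are invented for this example.

```python
from datetime import date


def add_features(row):
    """Return a copy of the row with two calculated features added."""
    enriched = dict(row)
    # Calculated feature: combine two existing numeric columns.
    enriched["total_price"] = row["quantity"] * row["unit_price"]
    # Calculated feature: weekday() returns 5 for Saturday and 6 for Sunday.
    enriched["is_weekend"] = row["order_date"].weekday() >= 5
    return enriched


row = {"quantity": 3, "unit_price": 2.5, "order_date": date(2024, 1, 6)}
print(add_features(row))
```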

16
Q

Feature selection

A

Choosing which columns to use when training a model, typically by removing columns that add little predictive value.

17
Q

Label

A

The thing we want to predict.

18
Q

Linked Services

A

This functionality allows you to integrate Azure Machine Learning with other Azure services. At present, the only linked service offering is to connect to Azure Synapse Analytics, which is a modern data warehousing offering on Azure.

19
Q

Mean Absolute Error (MAE)

A

An evaluation measure for any regression model. It is the average absolute difference between predicted and actual values, so errors in opposite directions do not cancel out. This works well when dealing with small ranges of numbers.
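
A minimal sketch of the calculation in plain Python, taking the absolute value of each error before averaging. The function name and sample values are assumptions for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all rows."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Errors are 1, 2, and 0, so the average is 1.0.
print(mean_absolute_error([3, 5, 8], [2, 7, 8]))
```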

20
Q

Mean Absolute Percent Error (MAPE)

A

An evaluation measure for any regression model. It is the average absolute difference between the predicted and actual values, expressed as a percentage of the actual value. If the actual value is 0, MAPE will fail with a divide by 0 error, so it is not a good measure if the actual value can be 0. MAPE works best when you have large ranges of numbers.
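
The divide by 0 problem is easy to see in code. The sketch below is one plain-Python way to compute MAPE. It raises an explicit error when an actual value is 0 rather than letting the division fail. The function name and sample values are hypothetical.

```python
def mean_absolute_percent_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Errors of 10%, 10%, and 0% average out to roughly 6.67%.
print(mean_absolute_percent_error([100, 200, 400], [110, 180, 400]))
```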

21
Q

Microservice

A

A lightweight, independent service. Typically, microservices have one job and communicate with each other using well-defined operations.

22
Q

Node (input, output)

A

An input or output connection point on a component. Each component will have 0 to 3 input nodes and 0 to 3 output nodes. Each input or output node has a specific type, such as DataFrameDirectory, TransformationDirectory, or UntrainedModelDirectory. An input of DataFrameDirectory can only attach to an output of the same type.

23
Q

Overfitting

A

A situation in which a trained model latches onto relationships that are particular to its training data set and do not hold in the broader world. An overfit model scores well on the training data but poorly on new data.

24
Q

Pipeline (Azure ML)

A

A collection of components connected together in a defined order. The metaphor represents how data moves from a source (an initial dataset) and flows through components until it reaches a destination. There are two types of pipeline: training pipelines and inference pipelines.

25
Q

Pipeline Asset

A

A component available within Azure Machine Learning. This includes datasets you have imported, sample datasets which come with the service, and different components to transform, train, evaluate, and deploy models.

26
Q

Precision

A

A measure which calculates how frequently a positive prediction is correct. It is defined as True Positives / (True Positives + False Positives).
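
A minimal sketch of the formula in plain Python. The function name and the counts are made up for illustration.

```python
def precision(true_positives, false_positives):
    """True Positives / (True Positives + False Positives)."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("precision is undefined when nothing was predicted positive")
    return true_positives / predicted_positive


# Of 10 "Yes" predictions, 8 were really "Yes".
print(precision(true_positives=8, false_positives=2))
```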

27
Q

R^2 (R-squared)

A

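An evaluation measure for linear regression models which typically ranges from 0 to 1, where 1 is the highest possible score. It can be negative when a model fits worse than always predicting the mean of the actual values.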
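
One common way to compute this score is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below assumes that definition. The function name and sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - (sum of squared residuals) / (total sum of squares around the mean)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Predictions close to the actual values give a score near 1.
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```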

28
Q

Recall

A

A measure which calculates how frequently we correctly identify the actual positive cases. It is defined as True Positives / (True Positives + False Negatives).
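
A minimal sketch of the formula in plain Python. The function name and the counts are made up for illustration.

```python
def recall(true_positives, false_negatives):
    """True Positives / (True Positives + False Negatives)."""
    actual_positive = true_positives + false_negatives
    if actual_positive == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positive


# Of 12 rows that were really "Yes", the model found 8.
print(recall(true_positives=8, false_negatives=4))
```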

29
Q

Receiver Operating Characteristic (ROC) curve

A

A plot of true positive rate against false positive rate for a two-class model, drawn across the range of classification thresholds.
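
The points on such a plot can be produced by sweeping a threshold over the model's scores and recording both rates at each step. The sketch below does this in plain Python. The function name `roc_points`, the 1/0 label encoding, and the sample scores are assumptions for illustration.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for a in actual if a == 1)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]  # Threshold above every score: nothing is predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
for fpr, tpr in roc_points(actual, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```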

30
Q

Reinforcement learning

A

A machine learning technique in which we train an agent to observe its environment and use those environmental clues to make a decision.

31
Q

Root Mean Square Error (RMSE)

A

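An evaluation measure for any regression model. It is the square root of the average squared difference between predicted and actual values. Because the errors are squared, RMSE works best when you are concerned with large differences between the predicted and actual values.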
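
A minimal sketch in plain Python: square each error, average the squares, then take the square root. The function name and sample values are hypothetical. The second call shows how one large miss raises the score far more than several small ones.

```python
import math


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)


# Both sets of predictions have the same total absolute error (3),
# but concentrating it in a single row gives a larger RMSE.
print(root_mean_square_error([10, 10, 10], [11, 11, 11]))
print(root_mean_square_error([10, 10, 10], [10, 10, 13]))
```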

32
Q

Root Mean Square Log Error (RMSLE)

A

An evaluation measure for any regression model. It is RMSE computed on the logarithms of the predicted and actual values, so RMSLE works best when you are concerned with large percentage differences between the predicted and actual values.
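
The sketch below assumes the common log(1 + x) form of RMSLE, which keeps a value of 0 valid. That choice, the function name, and the sample values are assumptions for illustration. The two calls show that doubling a small number and doubling a large number produce similar scores.

```python
import math


def root_mean_square_log_error(actual, predicted):
    """RMSE computed on log(1 + value) rather than on the raw values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(v < 0 for v in list(actual) + list(predicted)):
        raise ValueError("this form of RMSLE expects non-negative values")
    squared_logs = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_logs) / len(squared_logs))


# Each prediction is double the actual value, at very different scales.
print(root_mean_square_log_error([10], [20]))
print(root_mean_square_log_error([1000], [2000]))
```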

33
Q

Run (Azure ML)

A

An attempt to train a model in Azure Machine Learning. This can be done through a pipeline in the Azure ML designer or through Automated ML.

34
Q

Semi-supervised learning

A

A machine learning technique in which we have a small percentage of data with labels and a large percentage of unlabeled data.

35
Q

Sink node

A

A node with no outputs. An example of a sink node is Web Service Output.

36
Q

Source node

A

A node with no inputs. An example of a source node is any dataset you bring onto the canvas.

37
Q

Supervised learning

A

A machine learning technique in which we have a known good answer for our label and attempt to learn from this label for inference purposes. The most common examples of this include classification and regression.

38
Q

Unsupervised learning

A

A machine learning technique in which we do not have labels for our data. We use unsupervised learning techniques to try to discover what those labels should be. Clustering is the most common example of this.