Google ML Vocabulary Flashcards

Vocabulary Terms from the Google Machine Learning Glossary

1
Q

Google ML Vocabulary

Agent

A

In reinforcement learning, the entity that uses a policy to determine which action will maximize the expected return gained from transitioning between states of the environment.

2
Q

Google ML Vocabulary

Action

A

In reinforcement learning, the mechanism by which the agent transitions between states of the environment. The agent chooses the action by using a policy.

3
Q

Google ML Vocabulary

Bias (math) or bias term

A

An intercept or offset from an origin. Bias is a parameter in machine learning models, which is symbolised by either of the following:

  • b
  • w₀

For example, bias is the b in the following formula:

y' = b + w₁x₁ + w₂x₂ + ... + wₙxₙ

In a simple two-dimensional line, bias just means “y-intercept”.

Bias exists because not all models start from the origin (0,0). For example, suppose an amusement park costs 2 Euros to enter and an additional 0.5 Euro for every hour a customer stays. Therefore, a model mapping the total cost has a bias of 2 because the lowest cost is 2 Euros.
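
The amusement park example can be written as a one-feature linear model. This is a minimal sketch; the function name total_cost and its parameter names are illustrative assumptions, not part of the glossary.

```python
def total_cost(hours, bias=2.0, weight=0.5):
    """Linear model y' = b + w1*x1 with a single feature (hours stayed)."""
    return bias + weight * hours

# With zero hours, the prediction equals the bias (the entry fee).
print(total_cost(0))  # 2.0
print(total_cost(3))  # 3.5
```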

Bias is not to be confused with bias in ethics and fairness or prediction bias.

4
Q

Google ML Vocabulary

Class

A

A category that a label can belong to. For example:

  • In a binary classification model that detects spam, the two classes might be spam (positive) and not spam (negative)
  • In a multi-class classification model that identifies dog breeds, the classes might be poodle, beagle, pug and so on.

A classification model predicts a class. In contrast, a regression model predicts a number.

5
Q

Google ML Vocabulary

Classification Model

A

A model whose prediction is a class. For example, the following are all classification models:

  • A model that predicts an input sentence’s language (French? Spanish? Italian?).
  • A model that predicts tree species (Maple? Oak? Ash?).
  • A model that predicts the positive or negative class for a particular medical condition
6
Q

Google ML Vocabulary

Classification Threshold

A

In binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class. Note that the classification threshold is a value that a human chooses, not a value chosen by model training.

A logistic regression model outputs a raw value between 0 and 1. Then:

  • If this raw value is greater than the classification threshold, then the positive class is predicted.
  • If this raw value is less than the classification threshold, then the negative class is predicted.

For example, suppose the classification threshold is 0.8. If the raw value is 0.9, then the model predicts the positive class. If the raw value is 0.7, then the model predicts the negative class.

The choice of classification threshold strongly influences the number of false positives and false negatives.
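
The thresholding rule described above can be sketched in a few lines of Python. The function name classify is hypothetical. The card does not say what happens when the raw value equals the threshold, so treating that case as negative is an assumption.

```python
def classify(raw_value, threshold=0.8):
    """Map a raw logistic regression output in (0, 1) to a class."""
    # Assumption: a raw value exactly equal to the threshold is treated
    # as negative; the card only covers "greater than" and "less than".
    return "positive" if raw_value > threshold else "negative"

print(classify(0.9))  # positive
print(classify(0.7))  # negative
```

Lowering the threshold turns more raw values into positive predictions, which is why the choice affects the number of false positives and false negatives.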

7
Q

Google ML Vocabulary

Clustering

A

Grouping related examples, particularly during unsupervised learning. Once all the examples are grouped, a human can optionally supply meaning to each cluster.

8
Q

Google ML Vocabulary

Convergence

A

A state reached when loss values change very little or not at all with each iteration.

A model converges when additional training won’t improve the model.

In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending. During a long period of constant loss values, you may temporarily get a false sense of convergence.

See also early stopping
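
One simple way to test for convergence is to check whether the loss has changed very little over the last few iterations. The sketch below assumes a hypothetical helper has_converged with illustrative tolerance and window values; it is not a method prescribed by the glossary.

```python
def has_converged(loss_history, tolerance=1e-4, window=5):
    """True if loss changed by less than tolerance in each of the last `window` iterations."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tolerance for i in range(window))

losses = [1.0, 0.6, 0.4, 0.31, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30]
print(has_converged(losses))      # True
print(has_converged(losses[:6]))  # False
```

A check like this can also fire during a long plateau that precedes a further drop in loss, which is the false sense of convergence mentioned above. A larger window reduces that risk but does not remove it.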

9
Q

Google ML Vocabulary

Empirical risk minimization (ERM)

A

Choosing the function that minimizes loss on the training set.

Contrast with structural risk minimization.

10
Q

Google ML Vocabulary

Example

A

The values of one row of features and possibly a label. Examples in supervised learning fall into two general categories:

  • A labeled example consists of one or more features and a label. Labeled examples are used during training.
  • An unlabeled example consists of one or more features but no label. Unlabeled examples are used during inference.

For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. This first data set contains three examples, each with three features (Temperature, Humidity and Pressure) and one label (Test Score):

Temperature, Humidity, Pressure, Test Score
15, 47, 998, 92
19, 34, 1020, 84
18, 92, 1012, 87

Here are the same three examples without the Test Score label, that is, as unlabeled examples:

Temperature, Humidity, Pressure
15, 47, 998
19, 34, 1020
18, 92, 1012

The row of a dataset is typically the raw source for an example. That is, an example typically consists of a subset of the columns in the dataset. Furthermore, the features in an example can also include synthetic features, such as feature crosses.
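
The relationship between labeled and unlabeled examples can be shown by splitting each row into its features and its label. This is a sketch using the table values above; the helper name split_example is an illustrative assumption.

```python
rows = [
    {"Temperature": 15, "Humidity": 47, "Pressure": 998, "Test Score": 92},
    {"Temperature": 19, "Humidity": 34, "Pressure": 1020, "Test Score": 84},
    {"Temperature": 18, "Humidity": 92, "Pressure": 1012, "Test Score": 87},
]

def split_example(row, label_name="Test Score"):
    """Return (features, label); label is None for an unlabeled example."""
    features = {name: value for name, value in row.items() if name != label_name}
    return features, row.get(label_name)

features, label = split_example(rows[0])
print(features)  # {'Temperature': 15, 'Humidity': 47, 'Pressure': 998}
print(label)     # 92
```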

11
Q

Google ML Vocabulary

False Negative

Also referred to as FN

A

When a binary classification model mistakenly predicts the negative class. For example, the model predicts that a particular email message is not spam (the negative class), but that email message actually is spam.

12
Q

Google ML Vocabulary

False Negative Rate

A

The proportion of actual positive examples for which the model mistakenly predicted the negative class. The following formula calculates the false negative rate:

false negative rate = (false negatives / (false negatives + true positives))
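
The formula translates directly into code. The counts in this sketch are hypothetical, chosen only to illustrate the arithmetic.

```python
def false_negative_rate(false_negatives, true_positives):
    """FN / (FN + TP): the share of actual positives predicted as negative."""
    actual_positives = false_negatives + true_positives
    if actual_positives == 0:
        raise ValueError("no actual positive examples")
    return false_negatives / actual_positives

# Hypothetical counts: 5 spam messages missed, 45 caught.
print(false_negative_rate(5, 45))  # 0.1
```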

13
Q

Google ML Vocabulary

False Positive

A

An example in which the model mistakenly predicts the positive class. For example, the model predicts that a particular email message is spam (the positive class), but that email message is actually not spam.

14
Q

Google ML Vocabulary

False Positive Rate

A

The proportion of actual negative examples for which the model mistakenly predicted the positive class. The following formula calculates the false positive rate:

false positive rate = (false positives / (false positives + true negatives))

The false positive rate is the x-axis in an ROC curve.
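
As a sketch, the false positive rate can be computed from parallel lists of true labels and predictions. The encoding (1 for the positive class, 0 for the negative class) and the sample data are assumptions made for illustration.

```python
def false_positive_rate(labels, predictions):
    """FP / (FP + TN), computed from parallel lists of 0/1 values."""
    pairs = list(zip(labels, predictions))
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no actual negative examples")
    return fp / (fp + tn)

labels      = [0, 0, 0, 0, 1, 1]
predictions = [1, 0, 0, 0, 1, 0]
print(false_positive_rate(labels, predictions))  # 0.25
```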

15
Q

Google ML Vocabulary

Feature

A

An input variable to a machine learning model. An example consists of one or more features. For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. The following table shows three examples, each of which contains three features (Temperature, Humidity and Pressure) and one label (Test Score).

Temperature, Humidity, Pressure, Test Score
15, 47, 998, 92
19, 34, 1020, 84
18, 92, 1012, 87

Contrast with label

16
Q

Google ML Vocabulary

Feature Cross

A

A synthetic feature formed by “crossing” categorical or bucketed features.

For example, consider a “mood forecasting” model that represents temperature in one of the following four buckets: freezing, chilly, temperate, warm.

And represents wind speed in one of the following three buckets: still, light, windy.

Without feature crosses, the linear model trains independently on each of the preceding seven buckets. So, the model trains on, for instance, freezing independently of the training on, for instance, windy.

Alternatively, you could create a feature cross of temperature and wind speed. This synthetic feature would have the following 12 possible values:

  • freezing-still
  • freezing-light
  • freezing-windy
  • chilly-still
  • chilly-light
  • chilly-windy
  • temperate-still
  • temperate-light
  • temperate-windy
  • warm-still
  • warm-light
  • warm-windy

Thanks to feature crosses, the model can learn mood differences between a freezing-windy day and a freezing-still day.

Formally, a feature cross is a Cartesian product.

Feature crosses are mostly used with linear models and are rarely used with neural networks.
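
The 12 crossed values can be generated as a Cartesian product of the two bucket lists. This is a minimal sketch; the variable names and the helper cross are illustrative.

```python
from itertools import product

temperature_buckets = ["freezing", "chilly", "temperate", "warm"]
wind_buckets = ["still", "light", "windy"]

# The feature cross is the Cartesian product of the two bucket sets.
crossed_values = [f"{t}-{w}" for t, w in product(temperature_buckets, wind_buckets)]

def cross(temperature, wind):
    """Return the crossed feature value for a single example."""
    return f"{temperature}-{wind}"

print(len(crossed_values))          # 12
print(cross("freezing", "windy"))   # freezing-windy
```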

17
Q

Google ML Vocabulary

Gradient Descent

A

A mathematical technique to minimize loss. Gradient descent iteratively adjusts weights and biases, gradually finding the best combination to minimize loss.

Gradient descent is older - much, much older - than machine learning.
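
To make the iterative adjustment concrete, here is a sketch of gradient descent fitting a one-feature linear model y' = b + w*x. The choice of mean squared error as the loss, the data, and the learning rate and iteration count are all assumptions made for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y' = b + w*x by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors for the current weight and bias.
        errors = [(b + w * x) - y for x, y in zip(xs, ys)]
        # Gradients of mean squared error with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2 + 0.5x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.0, 3.5, 4.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # approximately 0.5 and 2.0
```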

18
Q

Google ML Vocabulary

Hyperparameter

A

The variables that you or a hyperparameter tuning service adjust during successive runs of training a model. For example, learning rate is a hyperparameter. You could set the learning rate to 0.01 before one training session. If you determine that 0.01 is too high, you could perhaps set the learning rate to 0.003 for the next training session.

In contrast, parameters are the various weights and biases that the model learns during training.

19
Q

Google ML Vocabulary

Inference

A

In machine learning, the process of making predictions by applying a trained model to unlabeled examples.

Inference has a somewhat different meaning in statistics.

20
Q

Google ML Vocabulary

Iteration

A

A single update of a model’s parameters - the model’s weights and biases - during training. The batch size determines how many examples the model processes in a single iteration. For instance, if the batch size is 20, then the model processes 20 examples before adjusting the parameters.

When training a neural network, a single iteration involves the following two passes:

  1. A forward pass to evaluate loss on a single batch.
  2. A backward pass (backpropagation) to adjust the model’s parameters based on the loss and the learning rate.
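
The link between batch size and iterations can be sketched by splitting a dataset into batches, where each batch corresponds to one iteration. The dataset of 100 examples and the helper name batches are hypothetical.

```python
def batches(examples, batch_size):
    """Yield successive batches; each batch corresponds to one iteration."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

examples = list(range(100))  # 100 hypothetical examples
iteration_count = sum(1 for _ in batches(examples, batch_size=20))
print(iteration_count)  # 5
```
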
21
Q

Google ML Vocabulary

Negative Class

A

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing or event that the model is testing for and the negative class is the other possibility. For example:

  • The negative class in a medical test might be “not tumor”.
  • The negative class in an email classifier might be “not spam”.

Contrast with positive class.

22
Q

Google ML Vocabulary

Positive Class

A

The class you are testing for.

For example, the positive class in a cancer model might be “tumor”. The positive class in an email classifier might be “spam”.

Contrast with negative class

23
Q

Google ML Vocabulary

Label

A

In supervised machine learning, the “answer” or “result” portion of an example.

Each labeled example consists of one or more features and a label. For instance, in a spam detection dataset, the label would probably be either “spam” or “not spam”. In a rainfall dataset, the label might be the amount of rain that fell during a certain period.

24
Q

Google ML Vocabulary

Labeled Example

A

An example that contains one or more features and a label. For example, the following table shows three labeled examples from a house valuation model, each with three features (Bedrooms, Bathrooms, Age) and one label (Price):

Bedrooms, Bathrooms, Age, Price
3, 2, 15, 345000
2, 1, 72, 179000
4, 2, 34, 392000

25
Q

Google ML Vocabulary

Learning Rate

A

A floating-point number that tells the gradient descent algorithm how strongly to adjust weights and biases on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1.

Learning rate is a key hyperparameter. If you set the learning rate too low, training will take too long. If you set the learning rate too high, gradient descent often has trouble reaching convergence.

During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.
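
The gradient step and its effect on a weight can be sketched as follows. The function names and the gradient value of 4.0 are illustrative assumptions. Subtracting the step reflects the usual descent convention of moving against the gradient.

```python
def gradient_step(learning_rate, gradient):
    """The product of the learning rate and the gradient."""
    return learning_rate * gradient

def update(weight, learning_rate, gradient):
    """Move the weight against the gradient by one gradient step."""
    return weight - gradient_step(learning_rate, gradient)

gradient = 4.0  # hypothetical gradient for one weight
small = gradient_step(0.1, gradient)
large = gradient_step(0.3, gradient)
print(small, large)
print(update(1.0, 0.1, gradient))
```

For the same gradient, the step at a learning rate of 0.3 is three times the step at 0.1, up to floating-point rounding.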

26
Q

Google ML Vocabulary

Linear

A

A relationship between two or more variables that can be represented solely through addition and multiplication.

The plot of a linear relationship is a line.

Contrast with nonlinear.

27
Q

Google ML Vocabulary

Linear Model

A

A model that assigns one weight per feature to make predictions. (Linear models also incorporate a bias.) In contrast, the relationship of features to predictions in deep models is generally nonlinear.

Linear models are usually easier to train and more interpretable than deep models. However, deep models can learn complex relationships between features.

Linear regression and logistic regression are two types of linear models.

28
Q

Google ML Vocabulary

Logistic Regression

A

A type of regression model that predicts a probability. Logistic regression models have the following characteristics:

  • The label is categorical. The term logistic regression usually refers to binary logistic regression, that is, to a model that calculates probabilities for labels with two possible values. A less common variant, multinomial logistic regression, calculates probabilities for labels with more than two possible values.
  • The loss function during training is Log Loss. (Multiple Log Loss units can be placed in parallel for labels with more than two possible values.)
  • The model has a linear architecture, not a deep neural network. However, the remainder of this definition also applies to deep models that predict probabilities for categorical labels.

For example, consider a logistic regression model that calculates the probability of an input email being either spam or not spam. During inference, suppose the model predicts 0.72. Therefore, the model is estimating:

  • A 72% chance of the email being spam.
  • A 28% chance of the email not being spam.

A logistic regression model uses the following two-step architecture:

  1. The model generates a raw prediction (y') by applying a linear function of input features.
  2. The model uses that raw prediction as input to a sigmoid function, which converts the raw prediction to a value between 0 and 1, exclusive.

Like any regression model, a logistic regression model predicts a number. However, this number typically becomes part of a binary classification model as follows:

  • If the predicted number is greater than the classification threshold, the binary classification model predicts the positive class.
  • If the predicted number is less than the classification threshold, the binary classification model predicts the negative class.
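
The two-step architecture and the thresholding that follows it can be sketched together. The weights, bias, feature values and function names below are hypothetical; a trained model would learn its own parameters.

```python
import math

def sigmoid(z):
    """Convert a raw prediction to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    # Step 1: raw prediction from a linear function of the input features.
    raw = bias + sum(w * x for w, x in zip(weights, features))
    # Step 2: sigmoid turns the raw prediction into a probability.
    return sigmoid(raw)

def predict_class(probability, threshold=0.5):
    """Apply the classification threshold to the predicted probability."""
    return "positive" if probability > threshold else "negative"

p = predict_probability(features=[1.0, 2.0], weights=[0.5, 0.25], bias=0.0)
print(round(p, 4))                      # 0.7311
print(predict_class(p, threshold=0.5))  # positive
print(predict_class(p, threshold=0.8))  # negative
```
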
29
Q

Google ML Vocabulary

Linear Regression

A

A type of machine learning model in which both of the following are true:

  • The model is a linear model
  • The prediction is a floating-point value. (This is the regression part of linear regression.)

Contrast linear regression with logistic regression. Also, contrast regression with classification.

30
Q

Google ML Vocabulary

Loss

A

During the training of a supervised model, a measure of how far a model’s prediction is from its label.

A loss function calculates the loss.
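
As one illustration, squared error is a common loss function for regression. The card does not name a specific loss function, so this choice and the sample values are assumptions.

```python
def squared_loss(prediction, label):
    """Squared distance between one prediction and its label."""
    return (prediction - label) ** 2

def mean_squared_error(predictions, labels):
    """Average squared loss over a set of examples."""
    total = sum(squared_loss(p, y) for p, y in zip(predictions, labels))
    return total / len(labels)

# Hypothetical predictions compared against labels 92 and 84.
print(mean_squared_error([90, 80], [92, 84]))  # 10.0
```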

31
Q

Google ML Vocabulary

Structural risk minimization (SRM)

A

An algorithm that balances two goals:

  • The desire to build the most predictive model (for example, lowest loss).
  • The desire to keep the model as simple as possible (for example, strong regularization).

For example, a function that minimizes loss + regularization on the training set is a structural risk minimization algorithm.
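
The "loss + regularization" objective can be sketched as a single function. Using an L2 penalty (sum of squared weights) and a parameter named regularization_rate are assumptions; the card does not specify the form of regularization.

```python
def l2_penalty(weights):
    """Sum of squared weights; larger weights mean a more complex model."""
    return sum(w * w for w in weights)

def structural_risk(training_loss, weights, regularization_rate=0.5):
    """Training loss plus a complexity penalty scaled by the regularization rate."""
    return training_loss + regularization_rate * l2_penalty(weights)

# Hypothetical training loss of 1.0 and weights [3.0, 4.0].
print(structural_risk(1.0, [3.0, 4.0]))  # 13.5
```

Setting regularization_rate to 0 reduces this objective to the training loss alone, which is what empirical risk minimization optimizes.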

Contrast with empirical risk minimization.

32
Q

Google ML Vocabulary

Model

A

In general, any mathematical construct that processes input data and returns output.

Phrased differently, a model is the set of parameters and structure needed for a system to make predictions.

33
Q

Google ML Vocabulary

Synthetic Feature

A

A feature not present among the input features, but assembled from one or more of them. Methods for creating synthetic features include the following:

  • Bucketing a continuous feature into range bins.
  • Creating a feature cross
  • Multiplying (or dividing) one feature value by other feature value(s) or by itself. For example, if a and b are input features, then ab and a² are examples of synthetic features.
  • Applying a transcendental function to a feature value. For example, if c is an input feature, then sin(c) and ln(c) are examples of synthetic features.

Features created by normalizing or scaling alone are not considered synthetic features.
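
Several of the methods listed above can be sketched in a few lines. The input values, the bucket boundaries and the helper name bucketize are hypothetical.

```python
import math

def bucketize(value, boundaries):
    """Return the index of the range bin that contains value."""
    return sum(1 for edge in boundaries if value >= edge)

a, b, c = 3.0, 4.0, 1.0  # hypothetical input feature values
synthetic = {
    "a*b": a * b,             # one feature multiplied by another
    "a*a": a * a,             # a feature multiplied by itself
    "sin(c)": math.sin(c),    # transcendental function of a feature
    "ln(c)": math.log(c),
    "a_bucket": bucketize(a, boundaries=[0.0, 2.5, 5.0]),
}
print(synthetic["a*b"], synthetic["a*a"], synthetic["a_bucket"])  # 12.0 9.0 2
```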

34
Q

Google ML Vocabulary

Nonlinear

A

A relationship between two or more variables that can’t be represented solely through addition and multiplication. A linear relationship can be represented as a line; a nonlinear relationship can’t be represented as a line.

35
Q

Google ML Vocabulary

Parameter

A

The weights and biases that a model learns during training. For example, in a linear regression model, the parameters consist of the bias (b) and all the weights (w₁, w₂, …) in the following formula:

y' = b + w₁x₁ + w₂x₂ + ... + wₙxₙ

In contrast, hyperparameters are the values that you (or a hyperparameter tuning service) supply to the model. For example, a learning rate is a hyperparameter.

36
Q

Google ML Vocabulary

Policy

A

In reinforcement learning, an agent’s probabilistic mapping from states to actions.
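
A probabilistic mapping from states to actions can be represented as a table of action probabilities per state. The states, actions and probabilities in this sketch are invented for illustration.

```python
import random

# Hypothetical policy: for each state, a probability for each action.
policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "high_battery": {"recharge": 0.2, "explore": 0.8},
}

def choose_action(state, rng=random):
    """Sample one action according to the policy's probabilities for this state."""
    actions = list(policy[state].keys())
    probabilities = list(policy[state].values())
    return rng.choices(actions, weights=probabilities, k=1)[0]

print(choose_action("low_battery"))  # usually "recharge"
```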

37
Q

Google ML Vocabulary

Prediction

A

A model’s output. For example:

  • The prediction of a binary classification model is either the positive class or the negative class.
  • The prediction of a multi-class classification model is one class.
  • The prediction of a linear regression model is a number.
38
Q

Google ML Vocabulary

Reinforcement Learning

Also referred to as RL

A

A family of algorithms that learn an optimal policy, whose goal is to maximize return when interacting with an environment.

For example, the ultimate reward of most games is victory. Reinforcement learning systems can become expert at playing complex games by evaluating sequences of previous game moves that ultimately led to wins and sequences that ultimately led to losses.

39
Q

Google ML Vocabulary

Regression Model

A

Informally, a model that generates a numerical prediction. Examples:

  • A model that predicts a certain house’s value, such as 423,000 Euros.
  • A model that predicts a certain tree’s life expectancy, such as 23.2 years.
  • A model that predicts the amount of rain that will fall in a certain city over the next six hours, such as 0.18 inches.

Two common types of regression models are:

  • Linear regression, which finds the line that best fits label values to features.
  • Logistic regression, which generates a probability between 0.0 and 1.0 that a system typically then maps to a class prediction.

Not every model that outputs numerical predictions is a regression model. For example, a model that predicts postal codes is a classification model, not a regression model.

40
Q

Google ML Vocabulary

Return

A

In reinforcement learning, given a certain policy and a certain state, the return is the sum of all rewards that the agent expects to receive when following the policy from the state to the end of the episode.

The agent accounts for the delayed nature of expected rewards by discounting rewards according to the state transitions required to obtain the reward.
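
A common way to express this discounting is to multiply each reward by a discount factor raised to the number of steps needed to reach it. The sketch below assumes that formulation; the parameter name discount and the reward values are illustrative.

```python
def discounted_return(rewards, discount=0.9):
    """Sum of rewards, each scaled by discount**step (step 0 is the next reward)."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total

# Three hypothetical rewards of 1.0 with a discount factor of 0.5.
print(discounted_return([1.0, 1.0, 1.0], discount=0.5))  # 1.75
```

With a discount of 1.0 the return is the plain sum of rewards; smaller values make distant rewards count for less.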

41
Q

Google ML Vocabulary

Reward

A

In reinforcement learning, the numerical result of taking an action in a state, as defined by the environment.

42
Q

Google ML Vocabulary

Sigmoid Function

A

A mathematical function that “squishes” an input value into a constrained range, typically 0 to 1 or -1 to +1. That is, you can pass any number (two, a million, negative billion, whatever) to a sigmoid and the output will still be in the constrained range.

Plots of sigmoid functions are “S” shaped.
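
The logistic function is the most familiar sigmoid with a 0 to 1 range. This sketch uses a numerically stable form so that very large or very small inputs do not overflow. In floating-point arithmetic, extreme inputs round to exactly 0.0 or 1.0.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) is at most 1
    return z / (1.0 + z)

for value in (2, 1_000_000, -1_000_000_000):
    print(value, sigmoid(value))
```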

43
Q

Google ML Vocabulary

State

A

In reinforcement learning, the parameter values that describe the current configuration of the environment, which the agent uses to choose an action.

44
Q

Google ML Vocabulary

Supervised Machine Learning

A

Training a model from features and their corresponding labels. Supervised machine learning is analogous to learning a subject by studying a set of questions and their corresponding correct answers. After mastering the mapping between questions and answers, a student can then provide answers to new (never-before-seen) questions on the same topic.

Compare with unsupervised machine learning

45
Q

Google ML Vocabulary

Training

A

The process of determining the ideal parameters (weights and biases) comprising a model. During training, a system reads in examples and gradually adjusts parameters. Training uses each example anywhere from a few times to billions of times.

46
Q

Google ML Vocabulary

Unlabeled Example

A

An example that contains features but no label. For example, the following table shows three unlabeled examples from a house valuation model, each with three features (Bedrooms, Bathrooms, and Age) but no house value:

Bedrooms, Bathrooms, Age
3, 2, 15
2, 1, 72
4, 2, 34

In supervised machine learning, models train on labeled examples and make predictions on unlabeled examples.

In semi-supervised and unsupervised learning, unlabeled examples are used during training.

Contrast unlabeled example with labeled example.

47
Q

Google ML Vocabulary

Weight

A

A value that a model multiplies by another value. Training is the process of determining a model’s ideal weights. Inference is the process of using those learned weights to make predictions.

48
Q

Google ML Vocabulary

Unsupervised Machine Learning

A

Training a model to find patterns in a dataset, typically an unlabeled dataset.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can help when useful labels are scarce or absent. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Contrast with supervised machine learning.