Machine Learning Glossary Flashcards
(171 cards)
ablation
A technique for evaluating the importance of a feature or component by temporarily removing it from a model. You then retrain the model without that feature or component, and if the retrained model performs significantly worse, then the removed feature or component was likely important.
For example, suppose you train a classification model on 10 features and achieve 88% precision on the test set. To check the importance of the first feature, you can retrain the model using only the nine other features. If the retrained model performs significantly worse (for instance, 55% precision), then the removed feature was probably important. Conversely, if the retrained model performs equally well, then that feature was probably not that important.
Ablation can also help determine the importance of:
Larger components, such as an entire subsystem of a larger ML system
Processes or techniques, such as a data preprocessing step
In both cases, you would observe how the system’s performance changes (or doesn’t change) after you’ve removed the component.
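The feature-ablation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the synthetic data, the nearest-centroid classifier, and the names `make_data`, `train_centroids`, `accuracy`, and `evaluate` are all hypothetical and chosen only to keep the example self-contained.

```python
import random

def make_data(n, seed):
    """Synthetic data (an assumption): feature 0 carries the label signal, feature 1 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = label * 4.0 + rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        rows.append(([informative, noise], label))
    return rows

def train_centroids(rows, keep):
    """Fit a nearest-centroid classifier using only the feature indices in `keep`."""
    sums = {0: [0.0] * len(keep), 1: [0.0] * len(keep)}
    counts = {0: 0, 1: 0}
    for x, y in rows:
        counts[y] += 1
        for j, idx in enumerate(keep):
            sums[y][j] += x[idx]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def accuracy(centroids, rows, keep):
    """Fraction of rows whose nearest centroid matches the true label."""
    correct = 0
    for x, y in rows:
        dists = {
            c: sum((x[idx] - centroids[c][j]) ** 2 for j, idx in enumerate(keep))
            for c in centroids
        }
        if min(dists, key=dists.get) == y:
            correct += 1
    return correct / len(rows)

train, test = make_data(500, seed=0), make_data(500, seed=1)

def evaluate(keep):
    """Retrain on the kept features only, then score on the test set."""
    return accuracy(train_centroids(train, keep), test, keep)

full = evaluate([0, 1])        # baseline: all features
without_f0 = evaluate([1])     # ablate the informative feature
without_f1 = evaluate([0])     # ablate the noise feature
print(f"all features: {full:.2f}")
print(f"without feature 0: {without_f0:.2f}")
print(f"without feature 1: {without_f1:.2f}")
```

In this setup, removing feature 0 should hurt the model badly, while removing feature 1 should leave performance roughly unchanged, which mirrors the reasoning in the example above.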
A/B testing
A statistical way of comparing two (or more) techniques—the A and the B. Typically, the A is an existing technique, and the B is a new technique. A/B testing not only determines which technique performs better but also whether the difference is statistically significant.
A/B testing usually compares a single metric on two techniques; for example, how does model accuracy compare for two techniques? However, A/B testing can also compare any finite number of metrics.
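One common way to check whether a difference in accuracy is statistically significant is a two-proportion z-test. The sketch below is only one possible choice of test, and the function name `two_proportion_z_test`, the sample counts, and the 0.05 significance level are illustrative assumptions.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that A and B perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical counts: technique A is correct on 800 of 1000 examples, B on 850 of 1000.
z, p_value = two_proportion_z_test(800, 1000, 850, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```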
accelerator chip
A category of specialized hardware components designed to perform key computations needed for deep learning algorithms.
Accelerator chips (or just accelerators, for short) can significantly increase the speed and efficiency of training and inference tasks compared to a general-purpose CPU. They are ideal for training neural networks and similar computationally intensive tasks.
Examples of accelerator chips include:
Google’s Tensor Processing Units (TPUs) with dedicated hardware for deep learning.
NVIDIA’s GPUs, which were initially designed for graphics processing but whose parallel architecture can significantly increase processing speed.
accuracy
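The number of correct classification predictions divided by the total number of predictions. That is:
$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{correct predictions} + \text{incorrect predictions}}$$
For example, a model that made 40 correct predictions and 10 incorrect predictions would have an accuracy of:
$$\text{Accuracy} = \frac{40}{40 + 10} = 80\%$$
Binary classification provides specific names for the different categories of correct predictions and incorrect predictions. So, the accuracy formula for binary classification is as follows:
$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$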
where:
TP is the number of true positives (correct predictions).
TN is the number of true negatives (correct predictions).
FP is the number of false positives (incorrect predictions).
FN is the number of false negatives (incorrect predictions).
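The two accuracy definitions above can be written as short functions. This is a minimal sketch; the function names and the confusion-matrix counts in the second call are hypothetical.

```python
def accuracy(correct, incorrect):
    """Correct predictions divided by all predictions."""
    return correct / (correct + incorrect)

def binary_accuracy(tp, tn, fp, fn):
    """Accuracy for binary classification from confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn)

# The example from the text: 40 correct and 10 incorrect predictions.
print(accuracy(40, 10))

# Hypothetical counts that describe the same 40 correct / 10 incorrect split.
print(binary_accuracy(tp=25, tn=15, fp=4, fn=6))
```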
Compare and contrast accuracy with precision and recall.
See Classification: Accuracy, recall, precision and related metrics in Machine Learning Crash Course for more information.
action
In reinforcement learning, the mechanism by which the agent transitions between states of the environment. The agent chooses the action by using a policy.
activation function
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label.
Popular activation functions include:
ReLU
Sigmoid
The plots of activation functions are never single straight lines. For example, the plot of the ReLU activation function consists of two straight lines: a flat line at 0 for negative inputs and a line with a slope of 1 for non-negative inputs.
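As a small illustration, the two activation functions named above can be written directly from their standard definitions. The sample inputs are arbitrary.

```python
import math

def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  sigmoid={sigmoid(x):.3f}")
```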
active learning
A training approach in which the algorithm chooses some of the data it learns from. Active learning is particularly valuable when labeled examples are scarce or expensive to obtain. Instead of blindly seeking a diverse range of labeled examples, an active learning algorithm selectively seeks the particular range of examples it needs for learning.
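One widely used selection strategy is uncertainty sampling, in which the algorithm asks for labels on the examples it is least sure about. The sketch below assumes a binary classifier that outputs a probability per unlabeled example; the function name `select_most_uncertain` and the probabilities are hypothetical, and other selection strategies exist.

```python
def select_most_uncertain(probabilities, k):
    """Return the indices of the k examples whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]

# Hypothetical model outputs for five unlabeled examples.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.80]
to_label = select_most_uncertain(pool_probabilities, k=2)
print(to_label)  # indices of the examples to send for labeling
```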
AdaGrad
A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.
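The per-parameter rescaling can be seen in a minimal sketch of the basic AdaGrad update, which divides each gradient by the square root of that parameter's accumulated squared gradients. The objective function, starting point, learning rate, and the names `adagrad` and `grad` are illustrative assumptions.

```python
import math

def adagrad(grad_fn, params, learning_rate=1.0, steps=100, eps=1e-8):
    """Run AdaGrad for a fixed number of steps and return the final parameters."""
    params = list(params)
    accum = [0.0] * len(params)  # running sum of squared gradients, one per parameter
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            accum[i] += g * g
            params[i] -= learning_rate * g / (math.sqrt(accum[i]) + eps)
    return params

def grad(p):
    """Gradient of the toy objective f(x, y) = x^2 + 100 * y^2."""
    return [2.0 * p[0], 200.0 * p[1]]

after_one = adagrad(grad, [5.0, 5.0], steps=1)
after_many = adagrad(grad, [5.0, 5.0], steps=500)
print(after_one)
print(after_many)
```

Although the gradient for `y` is 100 times larger than the gradient for `x`, the first step moves both parameters by about the same amount, because each gradient is scaled by its own history.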
agent
In reinforcement learning, the entity that uses a policy to maximize the expected return gained from transitioning between states of the environment.
More generally, an agent is software that autonomously plans and executes a series of actions in pursuit of a goal, with the ability to adapt to changes in its environment. For example, an LLM-based agent might use an LLM to generate a plan, rather than applying a reinforcement learning policy.
agglomerative clustering
A type of hierarchical clustering. Hierarchical clustering is a category of clustering algorithms that create a tree of clusters, and it is well-suited to hierarchical data, such as botanical taxonomies. There are two types of hierarchical clustering algorithms:
Agglomerative clustering first assigns every example to its own cluster, and iteratively merges the closest clusters to create a hierarchical tree.
Divisive clustering first groups all examples into one cluster and then iteratively divides the cluster into a hierarchical tree.
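The agglomerative procedure above can be sketched for one-dimensional points. This illustration assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters instead of building the full tree; the function name and the data are hypothetical.

```python
def agglomerative(points, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain."""
    clusters = [[p] for p in points]  # every example starts in its own cluster
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], num_clusters=3))
```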
Contrast with centroid-based clustering.
See Clustering algorithms in the Clustering course for more information.
anomaly detection
The process of identifying outliers. For example, if the mean for a certain feature is 100 with a standard deviation of 10, then anomaly detection should flag a value of 200 as suspicious.
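A simple way to express the example above is a z-score check, which measures how many standard deviations a value lies from the mean. The threshold of 3 standard deviations is a common convention and an assumption here, as is the function name `is_anomaly`.

```python
def is_anomaly(value, mean, std_dev, threshold=3.0):
    """Flag a value whose distance from the mean exceeds `threshold` standard deviations."""
    z_score = abs(value - mean) / std_dev
    return z_score > threshold

# The example from the text: mean 100, standard deviation 10.
print(is_anomaly(200, mean=100, std_dev=10))  # 10 standard deviations from the mean
print(is_anomaly(105, mean=100, std_dev=10))  # half a standard deviation from the mean
```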
AR
Abbreviation for augmented reality.
area under the PR curve
Area under the interpolated precision-recall curve, obtained by plotting (recall, precision) points for different values of the classification threshold.
area under the ROC curve
A number between 0.0 and 1.0 representing a binary classification model’s ability to separate positive classes from negative classes. The closer the AUC is to 1.0, the better the model’s ability to separate classes from each other.
For example, the following illustration shows a classification model that separates positive classes (green ovals) from negative classes (purple rectangles) perfectly. This unrealistically perfect model has an AUC of 1.0:
[Figure: A number line with 8 positive examples on one side and 9 negative examples on the other side.]
Conversely, the following illustration shows the results for a classification model that generated random results. This model has an AUC of 0.5:
[Figure: A number line with 6 positive examples and 6 negative examples. The sequence of examples is positive, negative, positive, negative, positive, negative, positive, negative, positive, negative, positive, negative.]
Yes, the preceding model has an AUC of 0.5, not 0.0.
Most models are somewhere between the two extremes. For instance, the following model separates positives from negatives somewhat, and therefore has an AUC somewhere between 0.5 and 1.0:
[Figure: A number line with 6 positive examples and 6 negative examples. The sequence of examples is negative, negative, negative, negative, positive, negative, positive, positive, negative, positive, positive, positive.]
AUC ignores any value you set for classification threshold. Instead, AUC considers all possible classification thresholds.
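AUC can be computed without choosing a threshold by using an equivalent formulation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below uses that pairwise formulation and assumes that a position further right on the number line means a higher score; the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Perfect separation: 9 negatives followed by 8 positives.
perfect_labels = [0] * 9 + [1] * 8
print(auc(perfect_labels, list(range(17))))

# Partial separation: negative x4, positive, negative, positive x2, negative, positive x3.
partial_labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print(auc(partial_labels, list(range(12))))
```

The second call should return a value between 0.5 and 1.0, consistent with the partially separated model described above.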
artificial general intelligence
A non-human mechanism that demonstrates a broad range of problem solving, creativity, and adaptability. For example, a program demonstrating artificial general intelligence could translate text, compose symphonies, and excel at games that have not yet been invented.
artificial intelligence
A non-human program or model that can solve sophisticated tasks. For example, a program or model that translates text or a program or model that identifies diseases from radiologic images both exhibit artificial intelligence.
Formally, machine learning is a sub-field of artificial intelligence. However, in recent years, some organizations have begun using the terms artificial intelligence and machine learning interchangeably.
attention
A mechanism used in a neural network that indicates the importance of a particular word or part of a word. Attention compresses the amount of information a model needs to predict the next token/word. A typical attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network.
Refer also to self-attention and multi-head self-attention, which are the building blocks of Transformers.
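The weighted sum described above can be sketched in a few lines. This illustration assumes dot-product scoring between a query and a set of keys, followed by a softmax, which is one common way to compute the weights; in a real network the query, keys, and values are produced by learned layers. All names and numbers here are hypothetical.

```python
import math

def softmax(xs):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output), where output is the weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# The query aligns with the first key, so the first value dominates the output.
weights, output = attend(
    query=[10.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
print(weights)
print(output)
```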
attribute
Synonym for feature.
In machine learning fairness, attributes often refer to characteristics pertaining to individuals.
attribute sampling
A tactic for training a decision forest in which each decision tree considers only a random subset of possible features when learning the condition. Generally, a different subset of features is sampled for each node. In contrast, when training a decision tree without attribute sampling, all possible features are considered for each node.
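As a rough sketch, attribute sampling amounts to drawing a fresh random subset of features each time a node's condition is learned. The feature names, the subset size of 2, and the function name `sample_attributes` are hypothetical, and the step that actually learns the condition is omitted.

```python
import random

def sample_attributes(features, num_to_sample, rng):
    """Draw a random subset of features (without replacement) for one node."""
    return rng.sample(features, num_to_sample)

features = ["age", "height", "weight", "income", "zip_code", "tenure"]
rng = random.Random(0)  # seeded only so the sketch is repeatable

# Each node draws its own subset; only those features are considered for its condition.
for node in ("root", "left child", "right child"):
    candidates = sample_attributes(features, 2, rng)
    print(f"{node}: consider {candidates}")
```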
AUC (Area under the ROC curve)
See area under the ROC curve.
augmented reality
A technology that superimposes a computer-generated image on a user’s view of the real world, thus providing a composite view.
autoencoder
A system that learns to extract the most important information from the input. Autoencoders are a combination of an encoder and decoder. Autoencoders rely on the following two-step process:
The encoder maps the input to a (typically) lossy lower-dimensional (intermediate) format.
The decoder builds a lossy version of the original input by mapping the lower-dimensional format to the original higher-dimensional input format.
Autoencoders are trained end-to-end by having the decoder attempt to reconstruct the original input from the encoder’s intermediate format as closely as possible. Because the intermediate format is smaller (lower-dimensional) than the original format, the autoencoder is forced to learn what information in the input is essential, and the output won’t be perfectly identical to the input.
For example:
If the input data is a graphic, the non-exact copy would be similar to the original graphic, but somewhat modified. Perhaps the non-exact copy removes noise from the original graphic or fills in some missing pixels.
If the input data is text, an autoencoder would generate new text that mimics (but is not identical to) the original text.
automatic evaluation
Using software to judge the quality of a model’s output.
When model output is relatively straightforward, a script or program can compare the model’s output to a golden response. This type of automatic evaluation is sometimes called programmatic evaluation. Metrics such as ROUGE or BLEU are often useful for programmatic evaluation.
When model output is complex or has no one right answer, a separate ML program called an autorater sometimes performs the automatic evaluation.
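For the simple case, programmatic evaluation can be as small as an exact-match check against golden responses. The sketch below is one possible scoring rule, not a standard metric implementation; the normalization (lowercasing and collapsing whitespace), the function names, and the sample data are assumptions.

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def exact_match_rate(outputs, golden_responses):
    """Fraction of model outputs that match their golden response after normalization."""
    if len(outputs) != len(golden_responses):
        raise ValueError("outputs and golden_responses must have the same length")
    matches = sum(
        normalize(out) == normalize(gold)
        for out, gold in zip(outputs, golden_responses)
    )
    return matches / len(golden_responses)

# Hypothetical model outputs and golden responses.
model_outputs = ["Paris", "  4", "a cat"]
golden = ["paris", "4", "a dog"]
print(exact_match_rate(model_outputs, golden))
```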
Contrast with human evaluation.
automation bias
When a human decision maker favors recommendations made by an automated decision-making system over information produced without automation, even when the automated decision-making system makes errors.
See Fairness: Types of bias in Machine Learning Crash Course for more information.