Data Science Terminology - assorted topics Flashcards

1
Q

Machine Learning Operations (MLOps)

A

A practice for collaboration and communication between data scientists and operations professionals to help manage production machine learning (ML) lifecycles. It seeks to provide a disciplined approach to manage and scale ML models, drawing on principles and practices from DevOps.

2
Q

Machine Learning Model

A

A Machine Learning (ML) model is a mathematical or computational representation of real-world processes or patterns based on data. It’s built by training an algorithm on a set of data. There are various types of machine learning models, each suited to different tasks. Once a machine learning model is trained, it can be used to make predictions or decisions without being explicitly programmed to do so. For example, an ML model trained on email data might be able to predict whether a new email is spam or not based on its content. ML models are not perfect and their accuracy heavily depends on the quality and quantity of the data they are trained on, as well as the suitability of the algorithm used for the task at hand.

3
Q

Data Science

A

Data science involves a blend of various tools, algorithms, and machine learning principles to extract patterns from raw data. It operates on the idea of using scientific methods, processes, and systems to gain insights from both structured and unstructured data.

4
Q

Goal of ML Ops

A

The goal of MLOps is to create a streamlined process for managing and deploying ML models at scale, improving the efficiency, reproducibility, and reliability of ML systems. It provides a conceptual framework to bridge the gap between development and operations in the ML lifecycle.

5
Q

Types of ML Models

A

Supervised, unsupervised, and reinforcement learning models.

6
Q

Supervised Learning Models

A

These are trained on labeled data, i.e., data that includes both the input and the desired output. They are used for tasks like regression (predicting a continuous output) and classification (predicting a categorical output). Examples include linear regression, decision trees, support vector machines, and neural networks.
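
A minimal sketch in Python (scikit-learn, with toy made-up data) of the supervised pattern: fit a model on labeled examples, then predict the label of a new input.

from sklearn.tree import DecisionTreeClassifier

# Toy labeled data: inputs are [hours_studied, hours_slept], labels are pass (1) / fail (0).
X = [[2, 9], [1, 5], [5, 8], [6, 4], [7, 7], [0, 6]]
y = [0, 0, 1, 1, 1, 0]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)  # learn the input-to-label mapping
print(model.predict([[4, 6]]))                         # predict the label of a new, unseen example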

7
Q

Unsupervised Learning Models

A

These models learn from unlabeled data, finding structure and relationships within the data itself. They are used for tasks like clustering (grouping similar inputs) and dimensionality reduction (simplifying input by removing redundant features). Examples include k-means clustering and principal component analysis (PCA).

8
Q

Reinforcement Learning Models

A

These models learn by interacting with their environment, receiving rewards or penalties based on the actions they take. They are used for tasks where the model needs to make a series of decisions that lead to a final goal, like game playing or robot navigation.

9
Q

Unsupervised Learning

A

Unsupervised learning is a type of machine learning where an algorithm learns from unlabeled data. This means the algorithm is not given the correct output during training. Instead, it must discover patterns, relationships, or structure in the input data on its own.

10
Q

Clustering

A

Clustering is used to group similar data points together based on their characteristics. The algorithm determines the similarities between data points and clusters them accordingly. K-means and hierarchical clustering are popular examples of clustering algorithms.
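
A minimal k-means sketch with scikit-learn on made-up 2-D points (values chosen only to form two obvious groups):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],   # one dense group
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])  # another dense group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # coordinates of the two cluster centroids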

11
Q

Dimensionality Reduction

A

Dimensionality reduction is used to reduce the number of input features while retaining the essential information. This is often used to make the data more manageable, to remove redundant or irrelevant features, or for visualization purposes. Principal Component Analysis (PCA) is a popular dimensionality reduction technique.
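
A minimal PCA sketch with scikit-learn, projecting made-up 4-dimensional data down to 2 components:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # 100 samples with 4 features (toy data)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)     # same 100 samples, now described by 2 components
print(X_reduced.shape)               # (100, 2)
print(pca.explained_variance_ratio_) # share of variance captured by each component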

12
Q

Supervised Learning

A

Supervised learning is a type of machine learning where an algorithm learns a model from labeled training data. This means the algorithm is given input data along with the corresponding correct output. It uses this information to learn the relationship between the input and the output, which can then be used to predict the output for new, unseen input data.

13
Q

Linear Regression

A

Linear regression is used to predict a continuous target variable based on one or more input features. The model assumes a linear relationship between the input and the output.
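
A minimal scikit-learn sketch fitting the assumed linear relationship y ≈ w·x + b on toy data:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # single input feature
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])   # roughly y = 2x (toy data)

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)          # learned slope and intercept
print(reg.predict([[6]]))                 # prediction for a new input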

14
Q

Decision Trees

A

Decision trees are used for both classification (predicting a categorical output) and regression (predicting a continuous output). They split the data into different branches based on feature values, allowing for more complex relationships between the input and the output.

15
Q

Neural Networks

A

Neural networks are complex models inspired by the human brain, capable of learning nonlinear relationships between the input and the output. They consist of layers of interconnected nodes or “neurons”, each of which applies a simple computation to the data. Deep learning, a subfield of machine learning, involves neural networks with many layers.
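
A minimal sketch using scikit-learn's MLPClassifier (one small hidden layer, sizes chosen arbitrarily) to learn XOR, a nonlinear relationship a single linear model cannot represent:

from sklearn.neural_network import MLPClassifier

# XOR: the output is 1 only when exactly one input is 1; not linearly separable.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

net = MLPClassifier(hidden_layer_sizes=(8,), activation="relu",
                    solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)
print(net.predict(X))   # ideally recovers [0, 1, 1, 0]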

16
Q

Descriptive Statistics

A

These are basic metrics that summarize and describe the main features of a dataset. They include measures such as mean, median, mode, range, variance, standard deviation, and percentiles.
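
A minimal pandas/NumPy sketch computing these statistics on made-up values:

import numpy as np
import pandas as pd

values = pd.Series([4, 8, 15, 15, 23, 42])                       # toy data

print(values.mean(), values.median(), values.mode().tolist())    # central tendency
print(values.var(), values.std())                                # sample variance and standard deviation
print(values.max() - values.min())                               # range
print(np.percentile(values, [25, 50, 75]))                       # quartiles (25th/50th/75th percentiles)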

17
Q

Data Visualization

A

This involves using graphical representations of data to understand trends, patterns, and outliers in the data. Common tools include bar graphs, histograms, scatter plots, box plots, and heat maps.

18
Q

Data Cleaning

A

This involves dealing with missing values, removing duplicates, correcting errors, and handling outliers in the data. It’s a crucial step to ensure reliable results.
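
A minimal pandas sketch of typical cleaning steps on a made-up DataFrame (the column names and the 0-120 age rule are purely illustrative):

import pandas as pd

df = pd.DataFrame({"age": [25, None, 31, 31, 240],
                   "city": ["NY", "LA", "LA", "LA", "NY"]})

df = df.drop_duplicates()                          # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())   # impute missing values
df = df[df["age"].between(0, 120)]                 # drop implausible outliers
print(df)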

19
Q

Data Transformation

A

This involves converting data from one format or structure into another, such as normalizing numerical data, binning continuous variables, or encoding categorical variables.
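
A minimal sketch of two common transformations, standardizing a numeric column and one-hot encoding a categorical one (toy data):

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"income": [30000, 50000, 90000],
                   "segment": ["a", "b", "a"]})

df["income_z"] = StandardScaler().fit_transform(df[["income"]]).ravel()  # zero mean, unit variance
df = pd.get_dummies(df, columns=["segment"])                             # one-hot encode the category
print(df)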

20
Q

Feature Engineering

A

This involves creating new features from existing ones to improve the performance of machine learning models. Techniques might include polynomial features, interaction terms, or creating domain-specific features.
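
A minimal sketch of generating polynomial and interaction features with scikit-learn (toy data):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2, 3],
              [4, 5]])                      # two original features per row

poly = PolynomialFeatures(degree=2, include_bias=False)
X_new = poly.fit_transform(X)               # adds squared terms and the interaction x1*x2
print(poly.get_feature_names_out())         # names of the generated features
print(X_new)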

21
Q

Hypothesis Testing

A

This is a statistical method for making decisions from experimental data. It involves formulating a null hypothesis and an alternative hypothesis, then using a test statistic (and its p-value) to decide whether to reject or fail to reject the null hypothesis.
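
A minimal SciPy sketch of a two-sample t-test on made-up measurements, where the null hypothesis is that the two group means are equal:

from scipy import stats

group_a = [5.1, 4.9, 5.6, 5.2, 5.0]   # toy measurements
group_b = [5.9, 6.1, 5.8, 6.3, 6.0]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
# If p_value is below the chosen significance level (e.g. 0.05), reject the null hypothesis.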

22
Q

Regression Analysis

A

This technique is used for predicting a continuous outcome variable based on one or more input variables.

23
Q

Classification

A

This technique is used to predict a categorical outcome variable based on one or more input variables.

24
Q

Clustering

A

This unsupervised learning method groups data points together based on their similarities in features.

25
Q

Dimensionality Reduction

A

This involves reducing the number of input variables for a dataset, either by selecting only the most relevant features (feature selection) or by creating new features that capture the essential information in a more condensed form (feature extraction).

26
Q

Machine Learning

A

This involves using algorithms that can learn patterns from data and make predictions or decisions. It encompasses various techniques like regression, classification, clustering, and reinforcement learning.

27
Q

Deep Learning

A

This is a subset of machine learning that focuses on artificial neural networks with many layers (“deep” networks). It’s particularly effective for complex tasks like image and speech recognition.

28
Q

Natural Language Processing (NLP)

A

This involves techniques for dealing with human language. It’s used in applications like sentiment analysis, machine translation, and speech recognition.

29
Q

Time Series Analysis

A

This involves analyzing data that is recorded over time to identify trends, seasonal patterns, and other temporal structures.

30
Q

Anomaly Detection

A

This technique is used to identify outliers in the data, which can indicate errors, unusual behavior, or interesting patterns.
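
A minimal z-score sketch with NumPy that flags points far from the mean (the 3-standard-deviation threshold is a common rule of thumb, not a universal one):

import numpy as np

rng = np.random.default_rng(0)
x = np.append(rng.normal(loc=10, scale=1, size=50), 95.0)  # toy data plus one injected outlier
z = (x - x.mean()) / x.std()                               # standard score of each point
print(x[np.abs(z) > 3])                                    # flags the injected outlier (95.0)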

31
Q

Recommendation Systems

A

These systems suggest products or services to users based on their past behavior, preferences, and characteristics. Techniques include collaborative filtering and content-based filtering.

32
Q

A/B Testing

A

This is used to compare two versions of a webpage, marketing strategy, or other entity to determine which one performs better.
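
A minimal sketch of comparing two page variants with a chi-square test on made-up conversion counts; the null hypothesis is that conversion rate does not depend on the variant:

from scipy.stats import chi2_contingency

#            converted  not converted
table = [[120, 880],    # variant A: 1000 visitors (toy numbers)
         [155, 845]]    # variant B: 1000 visitors

chi2, p_value, dof, expected = chi2_contingency(table)
print(p_value)   # a small p-value is evidence the variants convert at different rates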

33
Q

Ensemble Methods

A

These combine the predictions of multiple machine learning models to improve the overall performance. Techniques include bagging, boosting, and stacking.

34
Q

Ensemble methods

A

Ensemble methods are techniques in machine learning that combine the decisions from multiple models to improve the overall performance. They’re based on the idea that a group of weak learners can come together to form a strong learner. Ensemble methods can be computationally expensive and complex to set up and understand, but they often achieve higher accuracy than individual models. By effectively combining multiple perspectives, they can lead to better generalization and more robust predictions.

35
Q

Bagging

A

Bagging, or Bootstrap Aggregating, involves creating multiple subsets of the original dataset, training a model on each subset, and then combining the outputs. The subsets are created with replacement, meaning that the same data point can appear in multiple subsets. The final prediction is typically the mode (for classification) or mean (for regression) of the predictions from each model. Random Forests is a popular example of bagging, where many decision tree models are combined.
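
A minimal bagging sketch with scikit-learn's BaggingClassifier (its default base learner is a decision tree) on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic labeled data

# Each of the 50 trees is trained on its own bootstrap sample of the data.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X, y)
print(bag.predict(X[:5]))   # combined (majority-vote) predictions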

36
Q

Boosting

A

Boosting involves training multiple models in a sequential manner, where each new model attempts to correct the errors of the previous models. Each model in the sequence puts more emphasis on the instances that the previous models got wrong, aiming to improve upon them. After training, the models vote for the final prediction, with some votes carrying more weight than others based on the individual model’s performance. Examples of boosting algorithms include AdaBoost and Gradient Boosting.
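
A minimal boosting sketch with scikit-learn's GradientBoostingClassifier on synthetic data, where small trees are added one after another to correct the remaining errors:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

boost = GradientBoostingClassifier(n_estimators=100,   # 100 small trees added sequentially
                                   learning_rate=0.1,  # how strongly each new tree corrects errors
                                   max_depth=3,
                                   random_state=0)
boost.fit(X, y)
print(boost.score(X, y))   # accuracy on the training data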

37
Q

Stacking

A

Stacking, or Stacked Generalization, involves training multiple different models and then combining their outputs with another model, called a meta-learner or a second-level learner. The meta-learner is trained to make a final prediction based on the outputs of the individual models. This method leverages the strengths of each individual model and can often lead to improved performance.
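
A minimal stacking sketch with scikit-learn: two different base models whose predictions feed a logistic-regression meta-learner (synthetic data):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("svm", SVC())],               # base (level-0) models
    final_estimator=LogisticRegression())      # meta-learner trained on their outputs
stack.fit(X, y)
print(stack.score(X, y))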

38
Q

Voting

A

Voting can be used to combine predictions from multiple models. It can be hard (majority or plurality voting, where each model votes for one class and the class with the most votes is chosen) or soft (where each model outputs probabilities for each class, and the class with the highest average probability across models is chosen).
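
A minimal sketch of hard and soft voting with scikit-learn on synthetic data; soft voting requires base models that can output class probabilities:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
models = [("lr", LogisticRegression(max_iter=1000)), ("nb", GaussianNB()),
          ("tree", DecisionTreeClassifier(max_depth=3))]

hard = VotingClassifier(models, voting="hard").fit(X, y)  # majority vote on predicted classes
soft = VotingClassifier(models, voting="soft").fit(X, y)  # average the predicted probabilities
print(hard.score(X, y), soft.score(X, y))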

39
Q

Bag of Models

A

This technique involves training multiple models using different subsets of the training data and different machine learning algorithms. The predictions from all models are then combined through a simple mechanism like averaging or majority voting.

40
Q

Transformers

A

A type of model architecture used in deep learning, particularly for handling sequential data such as natural language. Transformers were introduced in the paper “Attention Is All You Need” by Vaswani et al. (2017).

41
Q

Input Embedding

A

In the case of natural language processing (NLP), the input to a Transformer is a sequence of tokens, which are typically words or subwords. These tokens are first converted into vectors using an embedding layer.

42
Q

Positional Encoding

A

Since Transformers don’t inherently understand the order of the input data (unlike RNNs and LSTMs), they need a way to incorporate the position of each token in the input sequence. This is done through positional encodings, which are added to the input embeddings.
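
A minimal NumPy sketch of the sinusoidal positional encodings from the original Transformer paper; each position gets a fixed vector of sines and cosines at different frequencies, added to that position's token embedding:

import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                      # positions 0..seq_len-1
    i = np.arange(d_model)[None, :]                        # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=4, d_model=8).shape)     # (4, 8); added to the embeddings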

43
Q

Multi-Head Attention

A

This is the core component of the Transformer architecture. It allows the model to focus on different parts of the input sequence for each token. It calculates a weighted sum of the input vectors, where the weights are determined by the “attention scores”. This process is done multiple times in parallel (hence “multi-head”) to allow the model to focus on different features.
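
A minimal single-head scaled dot-product attention sketch in NumPy; multi-head attention runs several of these in parallel on different learned projections and concatenates the results:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # attention scores between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                         # weighted sum of the value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 8))                 # 5 tokens, 8-dim vectors (self-attention)
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)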

44
Q

Feed-Forward Neural Networks

A

Each position in the Transformer passes through the same feed-forward neural network, applied independently at each position. It consists of two linear transformations with a ReLU activation in between.

45
Q

Normalization and Residual Connections

A

After each multi-head attention block and feed-forward neural network, the Transformer uses layer normalization and residual connections to help stabilize training.

46
Q

Encoder and Decoder Blocks

A

A Transformer model typically consists of an encoder and a decoder, each of which is composed of multiple identical layers. The encoder takes in the input sequence and produces a sequence of vectors (the “encoder output”). The decoder takes the encoder output and the target sequence so far and produces the next token in the target sequence.

47
Q

Output Linear Layer and Softmax

A

The output of the final decoder block is passed through a linear layer followed by a softmax function to produce probabilities for each potential output token.

48
Q

Masking

A

Transformers use masking to prevent certain positions from contributing to the attention scores. In the decoder, a look-ahead (causal) mask prevents the model from attending to future tokens in the sequence, preserving the left-to-right, autoregressive nature of generation; padding masks are also used so the model ignores padding tokens.
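
A minimal NumPy sketch of a causal (look-ahead) mask: positions above the diagonal get a large negative score before the softmax, so each token can only attend to itself and earlier tokens:

import numpy as np

seq_len = 4
scores = np.zeros((seq_len, seq_len))                    # pretend attention scores
mask = np.triu(np.ones((seq_len, seq_len)), k=1) == 1    # True above the diagonal (future tokens)
scores[mask] = -1e9                                      # effectively zero weight after softmax
print(scores)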

49
Q

Random Forests

A

This technique is a type of bagging ensemble method that builds a set of decision trees, each from a random bootstrap sample of the training set (and a random subset of features at each split). It then aggregates the votes from the different decision trees to decide the final class of the test object. This not only helps to improve model accuracy but also helps to reduce overfitting.

50
Q

Bagging (Bootstrap Aggregating)

A

This technique involves creating multiple subsets of the original dataset, with replacement, training a model (for instance, a decision tree) on each, and then combining their predictions. The model’s final prediction is typically the mode (for classification) or mean (for regression) of the predictions from each model. Bagging helps to decrease the model’s variance.

51
Q

Extra-Trees (Extremely Randomized Trees)

A

Similar to random forests, in Extra Trees, a random subset of features is selected to split each node in the tree. However, unlike random forests, the best split is not chosen. Instead, a random split is selected, making Extra Trees “extremely randomized”. This can help reduce the variance even further than a random forest, at the cost of a slight increase in bias.

52
Q

AdaBoost (Adaptive Boosting)

A

Unlike bagging methods, boosting methods train models in sequence. Each new model is trained to correct the errors made by the previous models. AdaBoost achieves this by assigning higher weights to the instances that the previous model got wrong, making the new model focus more on these instances. Finally, the predictions from all models are combined through a weighted majority vote (or sum for regression) to produce the final prediction.

53
Q

Gradient Boosting

A

Gradient Boosting is another boosting method that trains models in sequence to correct the errors of the previous models. However, instead of modifying the instance weights, this method fits the new model to the residual errors made by the previous model. Then, it combines the predictions of all models through a sum. Examples of gradient boosting algorithms include Gradient Boosting Machine (GBM), XGBoost, LightGBM, and CatBoost.

54
Q

Stacking (Stacked Generalization)

A

This method involves training several different models and combining their predictions using another model, called a meta-learner or second-level learner. The base-level models are trained on the complete training set; the meta-model is then fitted on their outputs (typically out-of-fold predictions) to make the final prediction.

55
Q

Random Forests - S&W

A

Strengths:
Good performance: They often provide a very good predictive accuracy out-of-box.
Feature importance: They can provide a measure of feature importance.
Minimal data preprocessing: They require little data preprocessing, e.g., no need for feature scaling.
Handling missing data: They have methods for dealing with missing data.
Low risk of overfitting: Due to averaging of decision trees, overfitting risk is low.
Weaknesses:
Complexity: They create a large number of trees (though this can be controlled by the user) and hence are more complex and computationally demanding.
Interpretability: They are not easily interpretable like a decision tree as they involve an ensemble of trees.

56
Q

Bagging (Bootstrap Aggregating) S&W

A

Strengths:
Reduces overfitting: By averaging the results from multiple models, it reduces the chance of overfitting.
Good with high dimensional data: Bagging can be effective on high-dimensional datasets where individual models may overfit.
Weaknesses:
Weak learners should be diverse: Bagging relies on the diversity of the weak learners, so the base learner should be an unstable, high-variance model (e.g., a deep decision tree).
Computationally expensive: It might be computationally expensive if the base learner is complex.

57
Q

Extra-Trees (Extremely Randomized Trees) S&W

A

Strengths:
Reduces variance further: By introducing additional randomness in the selection of features and splits, Extra Trees can reduce the variance of the model further than a random forest.
Fast to train: Since it uses random thresholds for each feature rather than searching for the best possible thresholds (like Random Forests), it’s typically faster to train.
Weaknesses:
Increased bias: Introducing additional randomness can increase the bias of the model.
Not interpretable: Like Random Forests, Extra Trees are not easily interpretable.

58
Q

AdaBoost (Adaptive Boosting) S&W

A

Strengths:
Good performance: Can achieve good classification results with much less tweaking of parameters or settings.
No need for prior knowledge about weak learner: It automatically determines the weight of the weak classifiers based on their accuracy.
Weaknesses:
Sensitive to noisy data and outliers: AdaBoost can be sensitive to noisy data and outliers.
Computationally expensive: AdaBoost can be slower to train than bagging models as the model trains sequentially.

59
Q

Gradient Boosting S&W

A

Strengths:
High performance: Often provides excellent predictive accuracy, frequently among the strongest methods for structured/tabular data.
Flexibility: Can optimize on different loss functions and provides several hyperparameter tuning options.
Weaknesses:
Prone to overfitting: Without careful tuning, gradient boosting models can overfit the training data.
Computationally expensive: Gradient boosting can be computationally expensive and requires careful tuning.

60
Q

Stacking (Stacked Generalization) S&W

A

Strengths:
High performance: Can outperform any individual model due to its ability to optimize over multiple models.
Flexibility: Can use different types of models at base level.
Weaknesses:
Complexity: Stacking multiple models increases complexity of the model.
Computationally expensive: Training multiple layers of models can be time-consuming.
Risk of overfitting: If not careful, stacking can lead to overfitting especially when involving many different models or complex models.