Machine Learning Flashcards
(32 cards)
What is Machine learning?
The fundamental idea of machine learning is to use data from past observations to predict unknown outcomes or values. Machine learning has its origins in statistics and mathematical modeling of data.
What are the processes in Machine Learning?
Fundamentally, a machine learning model is a software application that encapsulates a function to calculate an output value based on one or more input values. The process of defining that function is known as training. After the function has been defined, you can use it to predict new values in a process called inferencing.
What is training data?
The training data consists of past observations. In most cases, the observations include the observed attributes or features of the thing being observed, and the known value of the thing you want to train a model to predict (known as the label).
In mathematical terms, you’ll often see the features referred to using the shorthand variable name x, and the label referred to as y. Usually, an observation consists of multiple feature values, so x is actually a vector (an array with multiple values), like this: [x1,x2,x3,…].
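A minimal sketch of how features and labels might be represented in code (the numeric values are hypothetical, and numpy is just one common way to hold them):

```python
import numpy as np

# Each row is one observation: a vector of feature values [x1, x2, x3]
# (hypothetical numbers, for illustration only).
X = np.array([
    [0.5, 1.2, 3.0],
    [0.7, 0.9, 2.5],
    [0.6, 1.1, 2.8],
])

# The known label (y) for each observation.
y = np.array([10.0, 8.5, 9.2])

print(X.shape)  # (3, 3): three observations, three features each
print(y.shape)  # (3,): one label per observation
```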
Types of Machine Learning
Major Types
1. Supervised Machine Learning
a. Regression
b. Classification
i. Binary Classification
ii. Multiclass classification
2. Unsupervised machine learning
a. Clustering
What is supervised machine learning?
Supervised machine learning is a general term for machine learning algorithms in which the training data includes both feature values and known label values.
What is regression?
Regression is a form of supervised machine learning in which the label predicted by the model is a numeric value.
What is classification?
Classification is a form of supervised machine learning in which the label predicted by the model represents a categorization, or class, rather than a numeric value.
Types:
1. In binary classification, the label determines whether the observed item is (or isn’t) an instance of a specific class. Or put another way, binary classification models predict one of two mutually exclusive outcomes.
2. Multiclass classification extends binary classification to predict a label that represents one of multiple possible classes
What is unsupervised machine learning?
Unsupervised machine learning involves training models using data that consists only of feature values without any known labels. Unsupervised machine learning algorithms determine relationships between the features of the observations in the training data.
What is clustering?
A clustering algorithm identifies similarities between observations based on their features, and groups them into discrete clusters.
In some cases, clustering is used to determine the set of classes that exist before training a classification model.
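As a minimal sketch, clustering might look like this with scikit-learn's KMeans (the feature values and the choice of three clusters are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Feature-only data (no labels): hypothetical 2-D observations.
X = np.array([
    [1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
    [8.0, 8.5], [8.2, 7.9], [7.8, 8.1],
    [4.0, 0.5], [4.2, 0.7], [3.9, 0.4],
])

# Group the observations into three discrete clusters based on feature similarity.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = model.fit_predict(X)

print(clusters)  # cluster assignment (0, 1, or 2) for each observation
```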
How are regression models trained?
Regression models are trained to predict numeric label values based on training data that includes both features and known labels. The process for training a regression model (or indeed, any supervised machine learning model) involves multiple iterations in which you use an appropriate algorithm (usually with some parameterized settings) to train a model, evaluate the model’s predictive performance, and refine the model by repeating the training process with different algorithms and parameters until you achieve an acceptable level of predictive accuracy.
What is linear regression?
Linear regression works by deriving a function that produces a straight line through the intersections of the x and y values while minimizing the average distance between the line and the plotted points.
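A minimal sketch of fitting a linear regression model with scikit-learn (the temperature and ice cream sales values are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature: daily temperature; label: ice creams sold (hypothetical values).
X = np.array([[51], [52], [67], [65], [70], [69], [72], [75], [73], [81]])
y = np.array([1, 0, 14, 14, 23, 20, 23, 26, 22, 30])

# Derive the straight-line function y = w*x + b that minimizes the error.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)

# Inferencing: predict the label for a new feature value.
print(model.predict(np.array([[77]])))
```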
What are some Regression Evaluation Metrics?
- Mean Absolute Error (MAE) - The difference between each predicted value and the actual value is the absolute error for that prediction; the mean of the absolute errors across the whole validation set is the MAE.
- Mean Squared Error (MSE) - A metric that "amplifies" larger errors by squaring the individual errors and calculating the mean of the squared values.
- Root Mean Squared Error (RMSE) - The square root of the MSE, which expresses the error in the same units as the label.
- Coefficient of determination (R²) - More commonly referred to as R² or R-Squared, this metric measures the proportion of variance in the validation results that can be explained by the model, as opposed to some anomalous aspect of the validation data (for example, a day with a highly unusual number of ice cream sales because of a local festival).
The calculation for R² is more complex than for the previous metrics. It compares the sum of squared differences between predicted and actual labels with the sum of squared differences between the actual label values and the mean of actual label values, like this:
R² = 1 − ∑(y − ŷ)² ÷ ∑(y − ȳ)²
The result is a value between 0 and 1; the closer this value is to 1, the better the model fits the validation data.
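A minimal sketch of computing these metrics with scikit-learn (the actual and predicted values are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual labels (y) and the model's predictions (ŷ) for a validation set.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)              # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)    # proportion of variance explained by the model

print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}  R2: {r2:.3f}")
```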
What is iterative Training?
In most real-world scenarios, a data scientist will use an iterative process to repeatedly train and evaluate a model, varying:
a. Feature selection and preparation
b. Algorithm selection
c. Algorithm parameters (numeric settings to control algorithm behavior, more accurately called hyperparameters to differentiate them from the x and y parameters).
After multiple iterations, the model that results in the best evaluation metric that’s acceptable for the specific scenario is selected.
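A minimal sketch of this kind of iteration using scikit-learn's GridSearchCV, which tries several hyperparameter values and keeps the best-scoring candidate (the algorithm choice and parameter grid are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic training data, for illustration only.
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

# Try several values of the algorithm's hyperparameter and evaluate
# each candidate model with cross-validation.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring="r2",
    cv=5,
)
search.fit(X, y)

print(search.best_params_, search.best_score_)  # best hyperparameter and its R2
```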
What is Binary classification?
Classification, like regression, is a supervised machine learning technique; and therefore follows the same iterative process of training, validating, and evaluating models. Instead of calculating numeric values like a regression model, the algorithms used to train classification models calculate probability values for class assignment and the evaluation metrics used to assess model performance compare the predicted classes to the actual classes.
There are many algorithms that can be used for binary classification, such as logistic regression, which derives a sigmoid (S-shaped) function with values between 0.0 and 1.0.
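A minimal sketch of training a logistic regression classifier with scikit-learn (the glucose-level features and diabetic/non-diabetic labels are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature: e.g. a blood glucose measurement; label: 1 = diabetic, 0 = not diabetic
# (hypothetical values, for illustration only).
X = np.array([[67], [70], [75], [82], [90], [102], [110], [115], [120], [130]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The sigmoid function yields a probability between 0.0 and 1.0;
# a threshold (0.5 by default) turns it into a class prediction.
print(model.predict_proba(np.array([[105]])))  # [P(y=0), P(y=1)]
print(model.predict(np.array([[105]])))        # predicted class label
```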
What are binary classification evaluation metrics?
The first step in calculating evaluation metrics for a binary classification model is usually to create a matrix of the number of correct and incorrect predictions for each possible class label. This matrix is called a confusion matrix, and it shows the prediction totals where:
ŷ=0 and y=0: True negatives (TN)
ŷ=1 and y=0: False positives (FP)
ŷ=0 and y=1: False negatives (FN)
ŷ=1 and y=1: True positives (TP)
Here, ŷ denotes the predicted class label and y the actual class label.
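A minimal sketch of building a confusion matrix with scikit-learn (the actual and predicted labels are hypothetical):

```python
from sklearn.metrics import confusion_matrix

# Actual labels (y) and predicted labels (ŷ) for a validation set.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```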
What is accuracy (in binary classification)?
The simplest metric you can calculate from the confusion matrix is accuracy - the proportion of predictions that the model got right. Accuracy is calculated as:
(TN+TP) ÷ (TN+FN+FP+TP)
However, accuracy can be misleading on its own: when most observations belong to one class, a model can score high accuracy simply by always predicting that majority class.
What is recall (in binary classification)?
Recall is a metric that measures the proportion of actual positive cases that the model identified correctly. In other words, in a diabetes screening example, of all the patients who actually have diabetes, how many did the model predict to have diabetes?
The formula for recall is:
TP ÷ (TP+FN)
Another name for recall is the true positive rate (TPR), and there’s an equivalent metric called the false positive rate (FPR) that is calculated as FP÷(FP+TN).
What is precision (in binary classification)?
Precision is a similar metric to recall, but measures the proportion of predicted positive cases where the true label is actually positive. In other words, what proportion of the patients predicted by the model to have diabetes actually have diabetes?
The formula for precision is:
TP ÷ (TP+FP)
What is F1-score (in binary classification)?
F1-score is an overall metric that combines recall and precision. The formula for F1-score is:
(2 x Precision x Recall) ÷ (Precision + Recall)
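A minimal sketch that computes accuracy, recall, precision, and F1-score for a set of hypothetical actual and predicted labels:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical actual and predicted labels for a validation set.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # (TN+TP) / (TN+FN+FP+TP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP+FN)
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP+FP)
print("F1-score: ", f1_score(y_true, y_pred))         # 2PR / (P+R)
```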
What is Area Under the Curve (AUC) (in binary classification)?
TPR and FPR are often used to evaluate a model by plotting a receiver operating characteristic (ROC) curve that compares the TPR and FPR for every possible threshold value between 0.0 and 1.0.
The ROC curve for a perfect model would go straight up the TPR axis on the left and then across the FPR axis at the top. The area under the curve (AUC) summarizes the ROC plot as a single value: a perfect model has an AUC of 1.0, while a model that does no better than random guessing has an AUC of around 0.5.
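A minimal sketch of calculating the ROC curve points and the AUC with scikit-learn (the labels and predicted probabilities are hypothetical):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Actual labels and the model's predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)  # 1.0 for a perfect model, ~0.5 for random guessing
```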
What is multiclass classification?
Multiclass classification is used to predict to which of multiple possible classes an observation belongs. As a supervised machine learning technique, it follows the same iterative train, validate, and evaluate process as regression and binary classification in which a subset of the training data is held back to validate the trained model.
What are some of the multiclass classification model algorithms?
To train a multiclass classification model, we need to use an algorithm to fit the training data to a function that calculates a probability value for each possible class. There are two kinds of algorithm you can use to do this:
One-vs-Rest (OvR) algorithms
Multinomial algorithms
What are One-vs-Rest (OvR) algorithms?
One-vs-Rest algorithms train a separate binary classification function for each class; each function calculates the probability that the observation is an example of its target class as opposed to any other class.
f0(x) = P(y=0 | x)
f1(x) = P(y=1 | x)
f2(x) = P(y=2 | x)
Each algorithm produces a sigmoid function that calculates a probability value between 0.0 and 1.0. A model trained using this kind of algorithm predicts the class for the function that produces the highest probability output.
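A minimal sketch of the One-vs-Rest approach using scikit-learn's OneVsRestClassifier wrapped around logistic regression (the synthetic three-class data is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic data with three classes (0, 1, 2), for illustration only.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)

# Trains one binary classifier per class: f0, f1, f2.
model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Each per-class function scores the observation; the class whose function
# produces the highest probability is predicted.
print(model.predict_proba(X[:1]))  # one probability per class
print(model.predict(X[:1]))        # predicted class
```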
What are multinomial algorithms?
A multinomial algorithm creates a single function that returns a multi-valued output. The output is a vector (an array of values) that contains the probability distribution for all possible classes, with a probability score for each class; these probabilities total 1.0:
f(x) =[P(y=0|x), P(y=1|x), P(y=2|x)]
An example of this kind of function is a softmax function, which could produce an output like the following example:
[0.2, 0.3, 0.5]
The elements in the vector represent the probabilities for classes 0, 1, and 2 respectively; so in this case, the class with the highest probability is 2.
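A minimal sketch of a softmax function and of picking the most probable class from its output (the raw scores are hypothetical, chosen so the result is close to the example above):

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into a probability distribution that sums to 1.0."""
    exp = np.exp(scores - np.max(scores))  # shift scores for numerical stability
    return exp / exp.sum()

# Hypothetical raw scores for classes 0, 1, and 2.
probs = softmax(np.array([1.0, 1.4, 1.9]))

print(np.round(probs, 2))  # roughly [0.2, 0.3, 0.5]
print(np.argmax(probs))    # 2: the class with the highest probability
```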