Machine Learning Flashcards
Task
What we want to obtain given a set of data.
Features
The properties of the data that are used as input by the model
Model
Gets the data as input and returns an output.
Learning algorithm
The algorithm that generates the model, given a specific set of data.
Classification
Task: assign a label Ci, from a set of labels C, to a given input.
Learning task: find a function c*, called a classifier, that best approximates the true classification function c.
Evaluation
We need to evaluate a model to know how well it works for the task. There are several measures we can use to evaluate a model, for example:
- accuracy
- recall
Decision tree
It's a model for CLASSIFICATION TASKS that uses a tree in which the internal nodes are feature-wise questions, the answers label the arcs to follow, and the leaves are the class labels.
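A minimal Python sketch of such a tree, with nodes as nested dicts; the weather features and labels are made up for illustration, not from the source:

```python
# Hypothetical decision tree: internal nodes hold a feature-wise question,
# the "yes"/"no" answers label the arcs, leaves hold the class label.
tree = {
    "question": "raining",
    "yes": {"label": "stay home"},
    "no": {
        "question": "windy",
        "yes": {"label": "stay home"},
        "no": {"label": "go out"},
    },
}

def classify(node, example):
    """Follow the arcs labeled by the answers until a leaf is reached."""
    while "label" not in node:
        answer = "yes" if example[node["question"]] else "no"
        node = node[answer]
    return node["label"]

print(classify(tree, {"raining": False, "windy": False}))  # go out
```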
Accuracy
It’s an evaluation measure that calculates:
Number of correctly labeled examples / Number of labeled examples
= (TRUE POSITIVES + TRUE NEGATIVES) / (ALL POSITIVES + ALL NEGATIVES)
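The formula as a Python sketch; the counts in the example call are made up:

```python
def accuracy(tp, tn, fp, fn):
    # (TRUE POSITIVES + TRUE NEGATIVES) / (ALL POSITIVES + ALL NEGATIVES)
    # where ALL POSITIVES = tp + fn and ALL NEGATIVES = tn + fp
    return (tp + tn) / ((tp + fn) + (tn + fp))

print(accuracy(tp=40, tn=30, fp=10, fn=20))  # 70 correct out of 100 -> 0.7
```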
Recall
In a BINARY CLASSIFICATION we define:
- POSITIVE RECALL (SENSITIVITY) = TRUE POSITIVES / ALL POSITIVES
- NEGATIVE RECALL (SPECIFICITY) = TRUE NEGATIVES / ALL NEGATIVES
- AVERAGE RECALL = (POSITIVE RECALL + NEGATIVE RECALL) / 2
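The three recalls as a Python sketch; the counts are hypothetical:

```python
def recalls(tp, tn, fp, fn):
    pos_recall = tp / (tp + fn)  # TRUE POSITIVES / ALL POSITIVES
    neg_recall = tn / (tn + fp)  # TRUE NEGATIVES / ALL NEGATIVES (specificity)
    avg_recall = (pos_recall + neg_recall) / 2
    return pos_recall, neg_recall, avg_recall

pos, neg, avg = recalls(tp=40, tn=30, fp=10, fn=20)
print(pos, neg, avg)  # 40/60, 30/40, and their mean
```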
Contingency table
It’s a tool used for BINARY CLASSIFICATION: the columns hold the counts for the predicted classes, and the rows the counts for the true classes.
Actual \ Predicted |  Positive  |  Negative
Positive           |     TP     |     FN
Negative           |     FP     |     TN
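A sketch of how the four cells can be counted from label lists, assuming the classes are encoded as +1 and -1:

```python
def contingency(actual, predicted):
    """Count (TP, FN, FP, TN) for a binary task with classes +1 and -1."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == +1 and p == +1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == +1 and p == -1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == -1 and p == +1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == -1 and p == -1)
    return tp, fn, fp, tn

print(contingency([+1, +1, -1, -1], [+1, -1, +1, -1]))  # (1, 1, 1, 1)
```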
Coverage plot
It’s a graph of width N and height M:
N: total number of NEGATIVE examples (the x-axis counts FALSE POSITIVES)
M: total number of POSITIVE examples (the y-axis counts TRUE POSITIVES)
Can be used to COMPARE MODELS' PERFORMANCES by plotting each model at the coordinates given by its numbers of FALSE and TRUE POSITIVES.
Classifiers with the same accuracy are connected by lines of slope 1 (DEMONSTRATION)
Classifiers with the same average recall are connected by lines parallel to the main diagonal (slope Pos/Neg) (DEMONSTRATION)
To compare models evaluated on datasets with different class proportions, the plot NEEDS NORMALIZATION!
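A sketch of the slope-1 claim for accuracy on a coverage plot, with made-up counts on a balanced dataset (Pos = Neg = 50):

```python
POS, NEG = 50, 50  # hypothetical class sizes

def acc(tp, fp):
    tn = NEG - fp
    return (tp + tn) / (POS + NEG)

# Two classifiers with the same accuracy: TP - FP is constant between them.
a = {"tp": 40, "fp": 10}
b = {"tp": 45, "fp": 15}
assert acc(**a) == acc(**b)

# The line connecting their coverage-plot points (FP, TP) has slope 1.
slope = (b["tp"] - a["tp"]) / (b["fp"] - a["fp"])
print(slope)  # 1.0
```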
ROC plot
It’s the coverage plot, normalized: both axes run from 0 to 1 (x-axis: FALSE POSITIVE RATE, y-axis: TRUE POSITIVE RATE).
It lets us compare models whose performances are calculated on different datasets with different class counts.
Classifiers with the same accuracy are connected by lines of slope Neg/Pos (DEMONSTRATION)
Classifiers with the same average recall are connected by lines of slope 1 (DEMONSTRATION)
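A sketch of the ROC normalization and of the slope-1 claim for average recall, with made-up counts (Pos = 50, Neg = 100):

```python
POS, NEG = 50, 100  # hypothetical class sizes

def roc_point(tp, fp):
    return fp / NEG, tp / POS  # (false positive rate, true positive rate)

def avg_recall(tp, fp):
    return (tp / POS + (NEG - fp) / NEG) / 2

# Two classifiers with the same average recall.
a = {"tp": 40, "fp": 20}
b = {"tp": 45, "fp": 30}
assert abs(avg_recall(**a) - avg_recall(**b)) < 1e-12

# The line connecting their ROC points has slope 1.
(x1, y1), (x2, y2) = roc_point(**a), roc_point(**b)
print((y2 - y1) / (x2 - x1))  # ~1
```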
Scoring classification
This task is similar to classification, but the model is a SCORING CLASSIFIER s* that takes an input and returns an N-vector of scores, where N is the number of classes.
So it does not tell the class associated with the input, but the SCORE for each class associated with the input.
The TRAINING SET is the same as the one for the CLASSIFICATION.
Margin
For a SCORING BINARY CLASSIFIER, the margin is a function that takes an input and returns a positive value if the input is classified correctly, a negative value otherwise.
Can be written as:
margin(x) = c(x)*s(x) =
1. +|s(x)| if the classification is correct
2. -|s(x)| if the classification is not correct
where
margin(x): is the margin function
c(x): is the true class of x (+1 for positive, -1 for negative class)
s(x): is the score for x given by the classifier
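The margin definition as a one-line Python sketch (the example scores are made up):

```python
def margin(c_x, s_x):
    # c_x: true class of x (+1 or -1); s_x: the classifier's score for x.
    # The product is positive iff the sign of the score matches the true class.
    return c_x * s_x

print(margin(+1, 2.5))   # 2.5  -> correct, confident
print(margin(-1, 2.5))   # -2.5 -> wrong, confident (large negative margin)
print(margin(-1, -0.3))  # 0.3  -> correct, barely
```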
Loss function
For the purpose of rewarding large positive margins and penalizing large negative margins, we define a LOSS FUNCTION.
A loss function is a function L that:
L: R –> [0, +inf)
and that maps each example’s margin z(x) to an associated LOSS L(z(x)).
We bound, or assume, L to satisfy:
L(z) > 1 for z < 0
L(z) = 1 for z = 0
0 < L(z) < 1 for z > 0
So the value of L is less than 1 for each correctly classified example, and at least 1 otherwise.
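One common loss satisfying these bounds is the exponential loss L(z) = e^(-z); the source does not name a specific loss, so this is only an illustration:

```python
import math

def exp_loss(z):
    # Exponential loss: > 1 for z < 0, exactly 1 at z = 0, in (0, 1) for z > 0.
    return math.exp(-z)

for z in (-2.0, 0.0, 2.0):
    print(z, exp_loss(z))
```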