Path1.Mod1.h - Explore ML Workspace - Model Metrics and Evaluation Flashcards

1
Q

MAE RMSE RSE RAE R2

The five Metrics for evaluating Regression model Performance for a completed job

Two can be applied to models with different units

A
  1. Mean Absolute Error (MAE) - Average difference between predicted and true values. Lower scores == better model accuracy/performance.
  2. Root Mean Squared Error (RMSE) - Square root of the mean squared difference between predicted and true values. A larger gap between RMSE and MAE indicates greater variance in the individual errors.
  3. Relative Squared Error (RSE) - Based on the square of the differences between predicted and true values, relative to the squared differences from the mean. This value lies between 0 and 1, with values closer to 0 indicating better performance. Because it is relative, it can be used to compare models with different units.
  4. Relative Absolute Error (RAE) - Based on the absolute differences between predicted and true values, relative to the absolute differences from the mean. This value lies between 0 and 1, with values closer to 0 indicating better performance. Because it is relative, it can be used to compare models with different units.
  5. Coefficient of Determination (R2) - This is R-Squared. Commonly used; it summarizes how much of the variance between predicted and true values is explained by the model. Values closer to 1 indicate better model performance (the opposite direction from RSE and RAE). Caution here: very high values can be suspect, and lower ones can be entirely normal (ELI5: the degree to which the data fits the model).

Relative Squared Error and Relative Absolute Error are the two that can be compared across models with different units, since they are relative.
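
A minimal NumPy sketch of how these five metrics could be computed by hand; the arrays y_true and y_pred and their values are purely illustrative, not from any real job.

    import numpy as np

    # Illustrative arrays; in practice these come from your test split.
    y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
    y_pred = np.array([2.8, 5.4, 2.9, 6.1, 4.7])

    errors = y_pred - y_true

    mae  = np.mean(np.abs(errors))        # Mean Absolute Error
    rmse = np.sqrt(np.mean(errors ** 2))  # Root Mean Squared Error

    # Relative metrics divide by the error of always predicting the mean,
    # which is what makes them comparable across models with different units.
    baseline = y_true - np.mean(y_true)
    rse = np.sum(errors ** 2) / np.sum(baseline ** 2)         # Relative Squared Error
    rae = np.sum(np.abs(errors)) / np.sum(np.abs(baseline))   # Relative Absolute Error

    r2 = 1 - rse                          # Coefficient of Determination (R-Squared)

    print(f"MAE={mae:.3f} RMSE={rmse:.3f} RSE={rse:.3f} RAE={rae:.3f} R2={r2:.3f}")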

2
Q

TP TN FP FN

The Confusion Matrix found in Evaluation Results for a completed Classification job

A

The Confusion Matrix shows (for binary results):
- True Positives - The predicted and actual values are both 1. Top left
- True Negatives - The predicted and actual values are both 0. Bottom right
- False Positives - The prediction is 1, but the actual is 0. Top right
- False Negatives - The prediction is 0, but the actual is 1. Bottom left

For multi-class results with N possible classes, there is an NxN matrix counting the results for each combination of predicted and actual class.
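
A quick sketch of counting the four cells for a binary result, assuming NumPy and illustrative arrays named actual and predicted. The top-left/bottom-right placement follows the layout described above; other tools may arrange the matrix differently.

    import numpy as np

    # Illustrative binary labels; 1 = positive, 0 = negative.
    actual    = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    predicted = np.array([1, 0, 0, 1, 1, 0, 1, 0])

    tp = np.sum((predicted == 1) & (actual == 1))  # top left
    tn = np.sum((predicted == 0) & (actual == 0))  # bottom right
    fp = np.sum((predicted == 1) & (actual == 0))  # top right
    fn = np.sum((predicted == 0) & (actual == 1))  # bottom left

    print(f"TP={tp} TN={tn} FP={fp} FN={fn}")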

3
Q

A P R F1

The four metrics derived from the results of a Confusion Matrix for Classification Models

A
  1. Accuracy - The ratio of correct predictions (true positives + true negatives) to total predictions. What proportion of predictions did the model get right?
  2. Precision - The fraction of cases classified as positive that are true positives (true positives / (true positives + false positives)). Out of all the data points the model predicted as positive, the percentage it got correct (ELI5: its ability to guess True correctly)
  3. Recall - The fraction of actual positive cases correctly identified (true positives / (true positives + false negatives)). Out of all the data points that actually are true, how many did the model classify correctly (ELI5: its ability to find the true cases rather than miss them)
  4. F1 Score - The harmonic mean of Precision and Recall: 2 x (Precision x Recall) / (Precision + Recall). Ideal score value is 1
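
A small sketch deriving the four metrics from illustrative TP/TN/FP/FN counts (the numbers are made up); note the F1 line uses the harmonic mean, not a simple sum.

    # Illustrative counts from a confusion matrix.
    tp, tn, fp, fn = 3, 3, 1, 1

    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

    print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")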
4
Q

For metrics related to Classification Models, the one that is the most intuitive, but is potentially misleading

A

The most intuitive is obviously Accuracy, though care must be taken when using it to judge how well the model actually works. Example: 3% of the population has cold sores. Your model could ALWAYS predict 0 and it would be 97% accurate, yet it is completely unhelpful for predicting cold sores in anyone.
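
A toy illustration of the cold-sore example, assuming NumPy; the always-negative "model" scores 97% accuracy while its recall is zero.

    import numpy as np

    # 3% prevalence: 3 positives out of 100 people (illustrative numbers).
    actual = np.array([1] * 3 + [0] * 97)
    always_negative = np.zeros_like(actual)  # a "model" that ALWAYS predicts 0

    accuracy = np.mean(always_negative == actual)            # 0.97
    tp = np.sum((always_negative == 1) & (actual == 1))      # 0
    fn = np.sum((always_negative == 0) & (actual == 1))      # 3
    recall = tp / (tp + fn)                                  # 0.0

    print(f"Accuracy={accuracy:.2f}, Recall={recall:.2f}")   # 97% accurate, 0% useful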

5
Q

The ROC Curve and AUC Metric for evaluating Classification Model results.

A

The ROC Curve - Receiver Operating Characteristic Curve. This is the plot of the False Positive Rate (x-axis) vs Recall/the True Positive Rate (y-axis) for every possible threshold value between 0 and 1. Ideally it rises all the way up the left side and then curves across the top.

The AUC Metric - Area Under the ROC Curve. Measures the quality of the model's predictions irrespective of what classification threshold is chosen. The larger this area is, the better the model is performing. Imagine a pure coin flip: 50% right, 50% wrong. The graph for this would be the straight diagonal line y = x from the origin, and the area UNDER it is 0.5. The greater this area gets (the further the curve bows up toward the top left), the better.

ELI5: Its ability to distinguish between the Positive and Negative classes.
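
A minimal sketch, assuming scikit-learn is available, of getting the ROC points and AUC from illustrative labels and predicted probabilities.

    from sklearn.metrics import roc_curve, roc_auc_score

    # Illustrative labels and predicted probabilities of the positive class.
    y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
    auc = roc_auc_score(y_true, y_score)

    print(f"AUC = {auc:.2f}")  # ~0.5 is a coin flip; closer to 1.0 is better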

6
Q

For evaluating Classification Model results:
The True Positive Rate formula
The False Positive Rate formula

A

True Positive Rate - TPR = TP/(TP + FN)
FN meaning predicted false but actually true

False Positive Rate - FPR = FP/(FP + TN)
TN meaning predicted false and actually false
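
The same two formulas as a short sketch, reusing illustrative confusion-matrix counts.

    # Illustrative counts from a confusion matrix.
    tp, tn, fp, fn = 3, 3, 1, 1

    tpr = tp / (tp + fn)  # True Positive Rate (same as Recall)
    fpr = fp / (fp + tn)  # False Positive Rate

    print(f"TPR={tpr:.2f} FPR={fpr:.2f}")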

7
Q

The Residual Histogram for evaluating Regression/Forecasting

A

Remember, residuals are the differences between predicted and true values; the histogram plots how frequently each error value occurs. A “good” chart shows that most errors happen at or near zero, with larger error values (negative or positive, at either end of the chart) occurring with low to no frequency. If you see the opposite, your model is in bad shape.
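
A rough sketch, assuming NumPy and matplotlib, of plotting a residual histogram; the residuals here are randomly generated stand-ins for (predicted - true) errors.

    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative residuals; a good model clusters them near zero.
    rng = np.random.default_rng(0)
    residuals = rng.normal(loc=0.0, scale=1.0, size=500)

    plt.hist(residuals, bins=30)
    plt.xlabel("Residual (predicted - true)")
    plt.ylabel("Frequency")
    plt.title("Residual Histogram")
    plt.show()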

8
Q

Predicted vs True Chart for evaluating Regression/Forecasting

A

Chart with a dotted line representing the “ideal” predictions (predicted value equals true value), compared against the line of your model’s average predicted values. A “good” chart shows these two lines as close together as possible.
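
A rough sketch of such a chart, assuming matplotlib; the data is synthetic and the dashed diagonal is the “ideal” line where predicted equals true.

    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative true values and model predictions.
    rng = np.random.default_rng(1)
    y_true = rng.uniform(0, 10, size=200)
    y_pred = y_true + rng.normal(0, 1, size=200)  # predictions with some noise

    plt.scatter(y_true, y_pred, s=10, alpha=0.5, label="predictions")
    lims = [y_true.min(), y_true.max()]
    plt.plot(lims, lims, linestyle="--", label="ideal (predicted == true)")
    plt.xlabel("True value")
    plt.ylabel("Predicted value")
    plt.legend()
    plt.title("Predicted vs True")
    plt.show()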

9
Q

The Forecast Horizon Chart for evaluating Time Series Forecasting.

The x and y axis represent…
Each part of the chart left of and right of the horizon line represents…

A

Plots the relationship between predicted values and the actual values mapped over time, per cross validation fold (up to 5 folds).

  • The x-axis maps time, based on the frequency (time period) provided during training setup
  • The y-axis is the forecast value (predicted and actual)
  • The vertical Horizon Line marks the forecast horizon point, the time period at which you want to start generating predictions
  • Left of the Horizon Line are the historic training datapoints
  • Right of the Horizon Line are the visualized predictions (purple line) against the actuals (blue line), with the shaded purple area indicating the confidence intervals or variance of the predictions around the mean
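
A rough matplotlib sketch of the general shape of this chart, with synthetic data: history to the left of a dashed horizon line, predictions with a shaded confidence band and actuals to the right.

    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative series: 24 historic points, then a 12-step forecast.
    t_hist = np.arange(24)
    t_fcst = np.arange(24, 36)
    history = 10 + 0.3 * t_hist + np.sin(t_hist / 2)
    forecast = 10 + 0.3 * t_fcst
    actuals = forecast + np.random.default_rng(2).normal(0, 0.5, size=12)
    ci = 1.0  # illustrative +/- confidence band around the forecast mean

    plt.plot(t_hist, history, color="gray", label="historic data")
    plt.plot(t_fcst, actuals, color="blue", label="actuals")
    plt.plot(t_fcst, forecast, color="purple", label="predictions")
    plt.fill_between(t_fcst, forecast - ci, forecast + ci, color="purple",
                     alpha=0.2, label="confidence interval")
    plt.axvline(x=24, linestyle="--", color="black")  # the horizon line
    plt.xlabel("Time (training frequency)")
    plt.ylabel("Value")
    plt.legend()
    plt.title("Forecast Horizon")
    plt.show()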