Evaluating Model Performance Flashcards Preview

Flashcards in Evaluating Model Performance Deck (20):

Describe Accuracy

Accuracy = number of correctly classified instances / total number of instances. It is the fraction of all predictions, across every class, that the model gets right.
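As a minimal sketch, taking accuracy to be the fraction of all predictions that match the true labels, the calculation could look like this. The function name and the labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, made up for illustration.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]

acc = accuracy(y_true, y_pred)  # 4 of the 5 predictions are correct
```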


Describe a Confusion Matrix

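A table of counts that compares predicted classes with actual classes. Here each row is a predicted class and each column is an actual class, so the diagonal cells count correct predictions and the off-diagonal cells count incorrect ones. Some libraries use the transposed layout (actual classes on the rows).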
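A small sketch of how such a matrix could be built, following this card's layout (predicted class on the rows, actual class on the columns). The helper name and the labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with matrix[predicted][actual], indexed in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        # Row = predicted class, column = actual class (this card's layout).
        matrix[index[predicted]][index[actual]] += 1
    return matrix


# Hypothetical labels, made up for illustration.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

cm = confusion_matrix(y_true, y_pred, labels)

# The diagonal holds the correctly classified examples.
n_correct = sum(cm[i][i] for i in range(len(labels)))
```

Swapping the two indices in the update line gives the transposed layout, with actual classes on the rows.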


What is Recall?

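Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. It is the fraction of actual positives that the model finds. Recall in this context is also referred to as the true positive rate or sensitivity.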


What is Precision?

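Precision = TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives. It is the fraction of predicted positives that are actually positive. Precision is also referred to as the positive predictive value (PPV).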


What is the F1 score?

F1 = 2 * (precision * recall) / (precision + recall). The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. It reaches its best value at 1 and its worst at 0.
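The three formulas can be sketched together from raw counts. The counts below are invented, and returning 0.0 when a denominator is zero is an assumption of this sketch, not a universal convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    # Assumption: treat an undefined ratio (zero denominator) as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)

# For comparison: the plain arithmetic mean of the two scores.
arithmetic_mean = (precision + recall) / 2
```

With these counts precision is 0.8 and recall is 0.5. F1 comes out below their arithmetic mean because it is pulled toward the lower of the two scores.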


What is the Mean Absolute Error?

The mean absolute error (MAE) sums the absolute error of every example and divides the total by the number of data points.
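A minimal sketch of the calculation, with made-up values:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


# Hypothetical values, made up for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# Absolute errors are 1, 0, 2 and 1, so the total is 4 over 4 data points.
mae = mean_absolute_error(y_true, y_pred)
```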


What is the Mean Squared Error?

The mean squared error (MSE) squares each residual error (the difference between the predicted and the true value) and averages the squares over all data points.

Squaring the residual errors has several benefits: it makes every error positive, it emphasizes larger errors over smaller ones, and the result is differentiable, which lets us use calculus to find the minimum.
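To illustrate how squaring emphasizes larger errors, the sketch below compares two invented sets of predictions that have the same total absolute error but distribute it differently.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# Hypothetical values, made up for illustration.
y_true = [0.0, 0.0, 0.0, 0.0]
even_errors = [2.0, 2.0, 2.0, 2.0]    # four errors of size 2
one_big_error = [0.0, 0.0, 0.0, 8.0]  # one error of size 8, same total absolute error

mse_even = mean_squared_error(y_true, even_errors)
mse_big = mean_squared_error(y_true, one_big_error)
```

Both prediction sets have a total absolute error of 8, yet the single large error gives an MSE four times as high (16.0 versus 4.0).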


In model prediction what are the two main sources of errors that a model can suffer from?

Bias, due to a model being unable to represent the complexity of the underlying data, and variance, due to a model being overly sensitive to the limited data it has been trained on.


When does bias occur?

Bias occurs when a model has enough data but is not complex enough to capture the underlying relationships. As a result, the model consistently and systematically misrepresents the data, leading to low accuracy in prediction. This is known as underfitting: the model is overly simplified and shows high error even on the training set. In regression, that would mean a low R^2 and a large sum of squared errors (SSE).


When does variance occur?

When we train a model, we typically use a limited number of samples (the training set) drawn from a larger population. If we repeatedly trained a model on randomly selected subsets of the data, we would expect its predictions to differ depending on the specific examples it was given. Here, variance is a measure of how much the predictions vary for any given test sample.

Some variance is normal, but too much variance indicates that the model is unable to generalize its predictions to the larger population from which the training samples were drawn. High sensitivity to the training set is also known as overfitting, and it generally occurs when the model is too complex, when we do not have enough data to support it, or both.

We can typically reduce the variability of a model's predictions, and so make them more consistent, by training on more data. High variance shows up as a much larger error on the test set than on the training set.


What is more likely to happen when you use too few features?

High bias. Several features may be needed to fully describe what is going on in the data, but the model may be using only a subset of them. The result is an overly simplified model with high bias. Think of it as the model being biased toward the few features it has.


What are patterns to identify high variance?

Using many features, or tuning the model very closely to the training set.


What is K-Fold Cross Validation?

Split the data into k bins of equal size. Run k separate training/test runs, each time holding out a different bin as the test set and training on the other k-1 bins. Then average the test set performance over the k experiments.
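The procedure can be sketched as an index splitter plus an averaging loop. This is a simplified illustration: the function names are hypothetical, `evaluate` stands in for whatever trains a model on the training indices and scores it on the test indices, and the data is assumed to be shuffled already.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(n_samples=10, k=5))

# Placeholder scorer (an assumption): it just reports the held-out fraction.
mean_score = cross_validate(10, 5, lambda train, test: len(test) / 10)
```

Every sample lands in the test set exactly once, so each data point is used for both training and testing across the k runs.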


What is the curse of dimensionality?

As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.
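One rough way to see the exponential growth, under the simplifying assumption that each feature is split into 10 bins and we want at least one example in every cell of the resulting grid:

```python
def cells_to_cover(bins_per_feature, n_features):
    """Number of grid cells when every feature is split into the same number of bins."""
    return bins_per_feature ** n_features


# Assumption for illustration: 10 bins per feature.
cells_by_dimension = {d: cells_to_cover(10, d) for d in (1, 2, 3, 10)}

for n_features, cells in cells_by_dimension.items():
    print(f"{n_features} feature(s): {cells} cells to cover")
```

Each added feature multiplies the number of cells by 10, so the data needed for the same coverage grows from 10 examples with one feature to 10 billion with ten.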


What is a learning curve in machine learning?

A graph that compares a model's performance metric on the training and testing data as the number of training instances increases.


What would indicate bias in a learning curve?

When the training and testing errors converge and both are quite high, the model suffers from high bias. No matter how much data we feed it, the model cannot represent the underlying relationship and therefore has systematically high errors.


What would indicate variance in a learning curve?

When there is a large gap between the training and testing errors, the model suffers from high variance. Unlike a high-bias model, a model that suffers from high variance can generally improve if we give it more data to learn from, or if we simplify the model so that it represents only the most important features of the data.


Describe an ideal learning curve.

The testing and training curves converge at an extremely low error. That is, the model is very accurate on unseen data.


What is a model complexity graph?

A model complexity graph shows how the training and testing curves change as the complexity of the model changes, rather than as the number of training data points changes. The general trend is that as a model's complexity increases, its predictions vary more for a fixed set of data.


What is Simpson's paradox?

Simpson's paradox, or the Yule–Simpson effect, is a paradox in probability and statistics in which a trend appears in different groups of data but disappears or reverses when these groups are combined. A classic example is the UC Berkeley graduate admissions gender-bias case.
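The reversal can be reproduced with a few invented numbers. The counts below are hypothetical and are not the actual Berkeley figures; they only show how a trend within each group can flip once the groups are pooled.

```python
# Hypothetical (admitted, applied) counts, made up for illustration only.
data = {
    "dept_A": {"men": (80, 100), "women": (18, 20)},
    "dept_B": {"men": (4, 20), "women": (30, 100)},
}


def rate(admitted, applied):
    """Admission rate for one group in one department."""
    return admitted / applied


def combined_rate(group):
    """Admission rate for a group after pooling all departments."""
    admitted = sum(dept[group][0] for dept in data.values())
    applied = sum(dept[group][1] for dept in data.values())
    return admitted / applied


# Within every department, women are admitted at a higher rate than men...
women_higher_in_each_dept = all(
    rate(*dept["women"]) > rate(*dept["men"]) for dept in data.values()
)

# ...but after pooling the departments, men have the higher overall rate.
men_higher_overall = combined_rate("men") > combined_rate("women")
```

The reversal arises because, in these made-up numbers, most women apply to the department with the low admission rate while most men apply to the one with the high rate.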