Chapter 4 Flashcards
(110 cards)
In classification, a Type I error, also known as a ______, occurs when a patient who does not have a disease receives a positive diagnosis.
false positive
A Type II error, or ______, occurs when a patient has a disease but it is not detected.
false negative
The four fundamental outcomes in a binary classification task are True Positive (TP), True Negative (TN), ______, and ______.
False Positive (FP), False Negative (FN)
In a confusion matrix, the sum of True Positives and False Positives (TP + FP) represents the ______.
Total Predicted Positives (RP)
The formula for Accuracy in classification is (TP + TN) / ______, where P is total actual positives and N is total actual negatives.
P + N
Precision, defined as TP / (TP + FP), measures the proportion of ______ among the instances predicted as positive.
correctly predicted positive instances
Recall, also known as sensitivity or True Positive Rate, is calculated as TP / (TP + FN) and measures the proportion of ______ that were correctly identified.
actual positive instances
The F1-score is the harmonic mean of ______ and ______, providing a balance between them.
Precision, Recall
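The formulas on the cards above can be checked with a short Python sketch. The function name `classification_metrics` and the counts passed to it are hypothetical, chosen only for illustration, and the sketch does not guard against zero denominators.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    p = tp + fn  # total actual positives (P)
    n = tn + fp  # total actual negatives (N)
    accuracy = (tp + tn) / (p + n)
    precision = tp / (tp + fp)   # correct among predicted positives
    recall = tp / (tp + fn)      # correct among actual positives
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is described as a balance between the two.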
Sensitivity, calculated as TP / (TP + FN), can be thought of as the likelihood of spotting a ______ when presented with one.
positive case
Specificity, calculated as TN / (TN + FP), is the likelihood of spotting a ______ when presented with one.
negative case
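As a rough illustration of sensitivity and specificity, the sketch below counts the four outcomes from two label lists and applies the formulas from the cards. The helper `confusion_counts` and the labels are made up for this example, and it assumes labels are encoded as 1 (positive) and 0 (negative).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 1 and 0."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # Type I error
        else:
            fn += 1  # Type II error
    return tp, tn, fp, fn


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # chance of spotting a positive case
specificity = tn / (tn + fp)  # chance of spotting a negative case
print(sensitivity, specificity)
```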
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the ______ at various threshold settings.
False Positive Rate (FPR)
The True Positive Rate (TPR) used in ROC analysis is also known as ______ or ______.
recall, sensitivity
The False Positive Rate (FPR) used in ROC analysis, calculated as FP(t)/N, is also referred to as ______ or ______.
fallout, false alarm rate
To compare different classification algorithms using ROC curves, one typically compares their ______, often abbreviated as AUC.
Area Under the Curve
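The ROC cards above can be made concrete with a minimal sketch that sweeps a threshold t over the classifier scores, records (FPR, TPR) at each setting, and estimates AUC with the trapezoidal rule. The function names and the scores are hypothetical; the sketch assumes an instance is predicted positive when its score is at least t.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, from strict to lenient."""
    p = sum(y_true)           # total actual positives
    n = len(y_true) - p       # total actual negatives
    points = [(0.0, 0.0)]     # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n, tp / p))  # (FP(t)/N, TP(t)/P)
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and classifier scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)
print(curve)
print(area)
```

An AUC of 1.0 would mean every positive is scored above every negative, while 0.5 corresponds to random ranking.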
Categorical data values that are simply names or labels with no ordering defined, such as gender or color, are known as ______ values.
Nominal
Categorical data values where order does matter, such as t-shirt size or rank, are called ______ values.
Ordinal
The process of replacing each category in categorical data with a unique number is known as ______.
String Indexing
______ is a technique used to break the inherent ordering within a categorical column by creating new binary columns for each unique category.
One Hot Encoding (OHE)
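The difference between string indexing and one hot encoding can be sketched in a few lines of plain Python. The helper names are hypothetical, and assigning indices in order of first appearance is an assumption of this sketch; libraries may order categories differently.

```python
def string_index(values):
    """Replace each category with a unique integer (order of first appearance)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return [mapping[value] for value in values], mapping


def one_hot(indices, num_categories):
    """Turn each integer index into a binary row with a single 1."""
    return [
        [1 if position == index else 0 for position in range(num_categories)]
        for index in indices
    ]


# Hypothetical nominal column for illustration only.
colors = ["red", "green", "blue", "green"]

indices, mapping = string_index(colors)
encoded = one_hot(indices, len(mapping))
print(indices)
print(encoded)
```

The integer indices imply an ordering (0 < 1 < 2) that nominal values do not have, while the binary columns treat every category as equally distinct.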
The process that aims to optimize a model’s configuration to achieve the best possible performance for a specific problem is called ______.
Hyperparameter Tuning
Common techniques for hyperparameter optimization include Manual Search, Grid Search, Random Search, and ______.
Bayesian Optimization
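Two of the techniques named above, Grid Search and Random Search, can be sketched as follows. The function `validation_error` is a hypothetical stand-in for training a model and measuring its error on validation data, and the hyperparameter names and candidate values are assumptions made for this example.

```python
import itertools
import random


def validation_error(learning_rate, depth):
    """Hypothetical stand-in for 'train a model, return its validation error'."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


# Hypothetical candidate values for two hyperparameters.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}


def grid_search(grid):
    """Evaluate every combination of candidate values."""
    names = list(grid)
    best_params, best_err = None, float("inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        err = validation_error(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def random_search(grid, n_trials, seed=0):
    """Evaluate a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in grid.items()}
        err = validation_error(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


best_params, best_err = grid_search(grid)
random_params, random_err = random_search(grid, n_trials=4)
print(best_params, best_err)
print(random_params, random_err)
```

Grid Search tries all 9 combinations here, while Random Search tries only 4, so it is cheaper but may miss the best setting.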
In regression tasks, the ______ is a common metric calculated as the square root of the Mean Squared Error (MSE).
root mean squared error (RMSE)
The Mean Squared Error (MSE) indicates how close a regression line is to a set of data points by taking the distances or ‘errors’ from the points to the regression line and ______ them.
squaring
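The relationship between MSE and RMSE can be illustrated with a small sketch. The function names and the data points are hypothetical, chosen only for this example.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared errors between actual and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical actual values and regression predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

mse_value = mse(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(mse_value, rmse_value)
```

Squaring makes every error positive and penalizes large errors more heavily than small ones.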
A common practice when handling numerical attributes is to assume ______ for these attributes.
normal distributions
Error on the ______ is not a good indicator of a model’s performance on future data.
training data