Chapter 4 Flashcards

(110 cards)

1
Q

In classification, a Type I error, also known as a ______, occurs when a patient who does not have a disease receives a positive diagnosis.

A

false positive

2
Q

A Type II error, or ______, happens when a patient has a disease but it was not detected.

A

false negative

3
Q

The four fundamental outcomes in a binary classification task are True Positive (TP), True Negative (TN), ______, and ______.

A

False Positive (FP), False Negative (FN)

4
Q

In a confusion matrix, the sum of True Positives and False Positives (TP + FP) represents the ______.

A

Total Predicted Positives (RP)

5
Q

The formula for Accuracy in classification is (TP + TN) / ______, where P is total actual positives and N is total actual negatives.

A

P + N
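
To tie cards 1-5 together, here is a minimal sketch, assuming scikit-learn, that recovers TP, TN, FP, and FN from a confusion matrix and computes accuracy as (TP + TN) / (P + N); the labels and predictions are invented toy data.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

# For binary labels {0, 1}, scikit-learn orders the matrix [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

p = tp + fn  # total actual positives
n = tn + fp  # total actual negatives
accuracy = (tp + tn) / (p + n)

print(accuracy, accuracy_score(y_true, y_pred))  # both print 0.75 here
```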

6
Q

Precision, defined as TP / (TP + FP), measures the proportion of ______ among the instances predicted as positive.

A

correctly predicted positive instances

7
Q

Recall, also known as sensitivity or True Positive Rate, is calculated as TP / (TP + FN) and measures the proportion of ______ that were correctly identified.

A

actual positive instances

8
Q

The F1-score is the harmonic mean of ______ and ______, providing a balance between them.

A

Precision, Recall

9
Q

Sensitivity, calculated as TP / (TP + FN), can be thought of as the likelihood of spotting a ______ when presented with one.

A

positive case

10
Q

Specificity, calculated as TN / (TN + FP), is the likelihood of spotting a ______ when presented with one.

A

negative case
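
A small sketch, again assuming scikit-learn and toy labels, mapping the formulas on cards 6-10 to code. scikit-learn has no dedicated specificity scorer, so it is computed by hand here (recall_score with pos_label=0 is equivalent).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)    # proportion of predicted positives that are correct
recall = tp / (tp + fn)       # sensitivity / True Positive Rate
specificity = tn / (tn + fp)  # likelihood of spotting a negative case
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# The manual values agree with the library scorers:
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```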

11
Q

The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the ______ at various threshold settings.

A

False Positive Rate (FPR)

12
Q

The True Positive Rate (TPR) used in ROC analysis is also known as ______ or ______.

A

recall, sensitivity

13
Q

The False Positive Rate (FPR) used in ROC analysis, calculated as FP(t)/N, is also referred to as ______ or ______.

A

fallout, false alarm rate

14
Q

To compare different classification algorithms using ROC curves, one typically compares their ______, often abbreviated as AUC.

A

Area Under the Curve
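
A sketch, assuming scikit-learn, of computing the points of an ROC curve and comparing classifiers by AUC; the scores are invented stand-ins for a model's predicted probabilities.

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.55]  # e.g., predict_proba output

# One (FPR, TPR) point per threshold; plotting tpr against fpr gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

auc = roc_auc_score(y_true, y_score)  # 1.0 for a perfect classifier, ~0.5 for random
print(auc)
```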

15
Q

Categorical data values that are simply names or labels with no ordering defined, such as gender or color, are known as ______ values.

A

Nominal

16
Q

Categorical data values where order does matter, such as t-shirt size or rank, are called ______ values.

A

Ordinal

17
Q

The process of replacing each category in categorical data with a unique number is known as ______.

A

String Indexing

18
Q

______ is a technique used to break the inherent ordering within a categorical column by creating new binary columns for each unique category.

A

One Hot Encoding (OHE)
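
A sketch, assuming pandas, that contrasts string indexing (card 17) with one-hot encoding (card 18) on an invented nominal column.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# String indexing: each category is replaced with a unique integer,
# which implicitly imposes an ordering
df["color_idx"] = df["color"].astype("category").cat.codes

# One-hot encoding: one new binary column per unique category (3 here),
# which breaks that artificial ordering
ohe = pd.get_dummies(df["color"], prefix="color")
print(pd.concat([df, ohe], axis=1))
```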

19
Q

The process that aims to optimize a model’s configuration to achieve the best possible performance for a specific problem is called ______.

A

Hyperparameter Tuning

20
Q

Common techniques for hyperparameter optimization include Manual Search, Grid Search, Random Search, and ______.

A

Bayesian Optimization
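
A minimal grid-search tuning sketch with scikit-learn; the estimator and the parameter values are arbitrary examples, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Every combination in the grid is evaluated with 5-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```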

21
Q

In regression tasks, the ______ is a common metric calculated as the square root of the Mean Squared Error (MSE).

A

root mean squared error (RMSE)

22
Q

The Mean Squared Error (MSE) indicates how close a regression line is to a set of data points by taking the distances or ‘errors’ from the points to the regression line, ______ them, and averaging the result.

A

squaring
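
A short sketch, assuming scikit-learn and invented numbers, of MSE, RMSE, and the related Mean Absolute Error (card 52).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # average of the squared errors
rmse = np.sqrt(mse)                        # square root of the MSE
mae = mean_absolute_error(y_true, y_pred)  # average of the absolute errors

print(mse, rmse, mae)
```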

23
Q

A common practice when handling numerical attributes is to assume ______ for these attributes.

A

normal distributions

24
Q

Error on the ______ is not a good indicator of a model’s performance on future data.

A

training data

25
To properly tune parameters and evaluate a model, one should ideally use three distinct datasets: training data, ______, and test data.
validation data
26
The ______ procedure involves splitting the original data into a training set and a test set, for example, reserving 2/3 for training and 1/3 for testing.
Holdout
27
______ is a method where the data is partitioned into k disjoint subsets, and the model is trained on k-1 partitions and tested on the remaining one, repeating this k times.
K-fold Cross-validation
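
A sketch of both evaluation procedures, assuming scikit-learn; the dataset and model are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Holdout: reserve 1/3 of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# 5-fold CV: train on 4 partitions, test on the remaining one, 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```
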
28
A disadvantage of the Holdout method is that its evaluation can have a ______, heavily depending on which data points end up in the training and test sets.
high variance
29
In k-fold cross-validation, every data point gets to be in a test set exactly once and gets to be in a training set ______ times.
k-1
30
A computationally expensive form of cross-validation where the number of folds is equal to the number of training instances is called ______ cross-validation.
Leave-One-Out
31
The error a machine learning algorithm makes on unseen data is referred to as its ______.
generalization error
32
______ is a resampling technique used to estimate the distribution of a dataset by sampling with replacement from the original dataset.
Bootstrap
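
A minimal bootstrap sketch with NumPy; the data values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.1, 3.4, 1.9, 4.2, 3.3, 2.8])

# Each bootstrap sample has the original size and is drawn with replacement,
# so individual points can appear more than once
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]

# The spread of the resampled means estimates the sampling distribution
print(np.mean(boot_means), np.std(boot_means))
```
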
33
The difference between a model's predicted values and the actual or expected values is known as ______, or errors due to bias.
bias errors
34
A model with ______ makes more assumptions and is unable to capture important features of the dataset, performing poorly on new data.
high bias
35
______ errors indicate how much a model's predictions differ from their expected value across different training sets, with high values suggesting the model doesn't generalize well beyond its training data.
Variance
36
A model that shows ______ learns the training dataset very well but performs poorly on unseen datasets, indicating overfitting.
high variance
37
To reduce ______, one might increase the number of input features, decrease regularization, or use more complex models.
high bias
38
To reduce ______, one might reduce the number of input features, use a less complex model, increase training data, or increase regularization.
high variance
39
In Scikit-learn, the ______ argument in estimators such as RandomForestClassifier allows you to specify the number of CPU cores to use for computationally expensive tasks.
n_jobs
40
Setting `n_jobs=-1` in Scikit-learn tasks will typically instruct the process to use ______ available on the system.
all of the cores
41
CPU technology such as ______ can allow a single physical CPU core to function as two logical cores, effectively doubling the number of cores available for parallel processing.
hyper-threading
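
A short sketch tying cards 39-41 together; scikit-learn is assumed, and os.cpu_count() reports logical cores, which hyper-threading doubles relative to physical cores.

```python
import os
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

print(os.cpu_count())  # logical cores visible to the operating system

X, y = make_classification(n_samples=500, random_state=0)

# n_jobs=-1 asks scikit-learn to parallelize across all available cores
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
```
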
42
The classification result FP stands for ______, representing a Type I error.
false positive
43
The classification result FN stands for ______, representing a Type II error.
false negative
44
The sum $TN+TP+FN+FP$ in a classification context represents the ______.
total number of instances
45
In a confusion matrix, $P = TP+FN$ represents the total number of ______ instances.
actual positive
46
In a confusion matrix, $N = TN+FP$ represents the total number of ______ instances.
actual negative
47
The F1-score is specifically the ______ of Precision and Recall.
Harmonic mean
48
An ROC curve for a ______ would go straight up the Y-axis to (0,1) and then straight across to (1,1).
perfect classifier
49
An ROC curve for a ______ would typically be a diagonal line from (0,0) to (1,1).
random classifier
50
When using One-Hot Encoding, if a categorical feature has 'n' unique categories, it will be transformed into ______ new binary columns.
n
51
Hyperparameters are set ______ training a model, unlike parameters which are learned ______ training.
before, during
52
The Mean Absolute Error (MAE) is calculated by taking the average of the ______ differences between predicted and actual values.
absolute
53
Error rate obtained by evaluating a model on the ______ can be misleadingly optimistic.
training data
54
The main purpose of a ______ is to tune hyperparameters without touching the final test set.
validation set
55
Stratified k-fold cross-validation ensures that each fold is ______ of the overall dataset in terms of class distribution.
representative
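
A stratified k-fold sketch with scikit-learn; the 80/20 class imbalance is invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # each test fold preserves roughly the same minority-class share (~20%)
    print(y[test_idx].mean())
```
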
56
Bootstrap resampling is done ______ from the original dataset.
with replacement
57
Underfitting is often characterized by ______ and ______, while overfitting is characterized by ______ and ______.
high bias, low variance, low bias, high variance
58
The three main computationally expensive tasks in machine learning are training models, evaluating models, and ______.
hyperparameter tuning
59
If a CPU has 4 physical cores and supports hyper-threading, it can present as ______ logical cores to the operating system.
eight
60
A message that is not spam but is assigned to the spam folder is an example of a ______ error in spam detection.
Type I
61
A message that is spam but appears in the regular folder is an example of a ______ error in spam detection.
Type II
62
The term 'fallout' in the context of ROC analysis refers to the ______.
False Positive Rate (FPR)
63
If a model has low bias, it makes ______ about the form of the target function.
fewer assumptions
64
If a model has high variance, it shows a ______ in the prediction of the target function with changes in the training dataset.
large variation
65
Using a ______ complex model can help reduce high bias but may increase variance.
more
66
Increasing the ______ term is a common strategy to reduce high variance (overfitting).
regularization
67
When evaluating models, if labeled data is scarce, techniques like ______ are preferred over a simple holdout split.
cross-validation
68
The primary goal of hyperparameter tuning is to find a set of hyperparameters that results in the best ______ on unseen data.
generalization performance
69
In the context of performance metrics, $RP = TP+FP$ represents the ______ by the classifier.
total instances predicted as positive
70
In the context of performance metrics, $RN = TN+FN$ represents the ______ by the classifier.
total instances predicted as negative
71
The formula for Error rate is ______ / (P+N).
FP+FN
72
A key assumption for both training and test data is that they are ______ of the underlying problem.
representative samples
73
If a learning scheme operates in two stages, the first stage typically involves building the ______ of the model.
basic structure
74
Generally, the larger the training set, the better the classifier, although this effect often shows ______.
diminishing returns
75
The dilemma in splitting data is that ideally, both the training set and the test set should be ______.
large
76
Cross-validation avoids ______ test sets, which can be an issue with repeated holdout methods.
overlapping
77
Leave-One-Out cross-validation makes the best use of the data and involves no ______.
random subsampling
78
A ______ is a subset of available data that is not used for training the model but is used to estimate the model's performance on unseen data during development.
validation set
79
High bias can occur if the model is too ______ for the complexity of the data.
simple
80
A model that performs well on the training data but poorly on the test data is said to be ______.
overfitting
81
The `n_jobs` parameter in scikit-learn is used to enable ______ for certain operations.
parallel computation
82
The process of breaking a categorical column into n different columns, where n is the number of unique categories, is the first step of ______.
One Hot Encoding
83
Sensitivity is the proportion of ______ we find.
positive cases (e.g., edges in an edge-detection task)
84
Specificity is the proportion of ______ that we find.
negative cases (e.g., non-edges in an edge-detection task)
85
A perfect classifier on an ROC curve would have an AUC value of ______.
1
86
A random classifier on an ROC curve would have an AUC value of approximately ______.
0.5
87
The choice of performance measure in evaluation can include the number of correct classifications, accuracy of probability estimates, or ______.
error in numeric predictions
88
When tuning hyperparameters, the ______ is used to optimize these settings, while the ______ provides the final, unbiased performance estimate.
validation data, test data
89
The term ______ refers to the model's inability to capture the underlying trend of the data, often due to an overly simplistic model.
underfitting
90
The term ______ refers to a model that learns the training data too well, including its noise and outliers, leading to poor performance on new data.
overfitting
91
Reducing the number of ______ or parameters can be a way to combat high variance.
input features
92
Increasing the amount of ______ can sometimes help reduce high variance.
training data
93
The scikit-learn library provides the `n_jobs` argument to control the use of ______ for tasks like model training and hyperparameter tuning.
multiple cores
94
Hyper-threading is a technology that allows each physical CPU core to handle ______ simultaneously.
multiple threads (typically two)
95
The total number of instances in a dataset can be represented by the sum ______.
TP + TN + FP + FN
96
The marginal P in a confusion matrix, calculated as TP + FN, represents the total number of ______ instances in the dataset.
actual positive
97
The marginal N in a confusion matrix, calculated as TN + FP, represents the total number of ______ instances in the dataset.
actual negative
98
The marginal RP in a confusion matrix, calculated as TP + FP, represents the total number of instances ______ by the model.
predicted as positive
99
The marginal RN in a confusion matrix, calculated as TN + FN, represents the total number of instances ______ by the model.
predicted as negative
100
If a model consistently makes large errors for a specific subgroup of data, it might be suffering from high ______ for that subgroup.
bias
101
If a model's performance changes drastically with small changes in the training data, it likely has high ______.
variance
102
______ search for hyperparameter tuning explores pre-defined points in the hyperparameter space.
Grid
103
______ search for hyperparameter tuning samples hyperparameter combinations randomly from specified distributions.
Random
104
One advantage of random search over grid search is that it can be more efficient in finding good hyperparameters when some hyperparameters are ______ than others.
more important
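
A random-search sketch, assuming scikit-learn and scipy; the distributions and ranges are arbitrary examples.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# n_iter combinations are sampled from the distributions, rather than
# enumerating a full grid
param_dist = {"n_estimators": randint(10, 200), "max_depth": randint(2, 10)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=10, cv=3, random_state=0)
search.fit(X, y)

print(search.best_params_)
```
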
105
The evaluation of a model on the ______ is used to estimate its performance on unseen data.
test set
106
When using k-fold cross-validation, the model is trained ______ times.
k
107
Bootstrap involves resampling ______ from the original dataset to create multiple bootstrap samples.
with replacement
108
A model that is underfitting has ______ bias and ______ variance.
high, low
109
A model that is overfitting has ______ bias and ______ variance.
low, high
110
The `n_jobs` parameter in scikit-learn, when set to -1, typically means to use all ______ CPU cores.
available