Chapter 4 Flashcards

(110 cards)

1
Q

In classification, a Type I error, also known as a ______, occurs when a patient who does not have a disease receives a positive diagnosis.

A

false positive

2
Q

A Type II error, or ______, happens when a patient has a disease but it was not detected.

A

false negative

3
Q

The four fundamental outcomes in a binary classification task are True Positive (TP), True Negative (TN), ______, and ______.

A

False Positive (FP), False Negative (FN)

4
Q

In a confusion matrix, the sum of True Positives and False Positives (TP + FP) represents the ______.

A

Total Predicted Positives (RP)

5
Q

The formula for Accuracy in classification is (TP + TN) / ______, where P is total actual positives and N is total actual negatives.

A

P + N
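
To tie cards 1-5 together, here is a minimal sketch, assuming scikit-learn, that recovers TP, TN, FP, and FN from a confusion matrix and computes accuracy as (TP + TN) / (P + N); the labels and predictions are invented toy data.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

# For binary labels {0, 1}, scikit-learn orders the matrix [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

p = tp + fn  # total actual positives
n = tn + fp  # total actual negatives
accuracy = (tp + tn) / (p + n)

print(accuracy, accuracy_score(y_true, y_pred))  # both print 0.75 here
```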

6
Q

Precision, defined as TP / (TP + FP), measures the proportion of ______ among the instances predicted as positive.

A

correctly predicted positive instances

7
Q

Recall, also known as sensitivity or True Positive Rate, is calculated as TP / (TP + FN) and measures the proportion of ______ that were correctly identified.

A

actual positive instances

8
Q

The F1-score is the harmonic mean of ______ and ______, providing a balance between them.

A

Precision, Recall

9
Q

Sensitivity, calculated as TP / (TP + FN), can be thought of as the likelihood of spotting a ______ when presented with one.

A

positive case

10
Q

Specificity, calculated as TN / (TN + FP), is the likelihood of spotting a ______ when presented with one.

A

negative case
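
A small sketch, again assuming scikit-learn and toy labels, mapping the formulas on cards 6-10 to code. scikit-learn has no dedicated specificity scorer, so it is computed by hand here (recall_score with pos_label=0 is equivalent).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)    # proportion of predicted positives that are correct
recall = tp / (tp + fn)       # sensitivity / True Positive Rate
specificity = tn / (tn + fp)  # likelihood of spotting a negative case
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# The manual values agree with the library scorers:
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```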

11
Q

The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the ______ at various threshold settings.

A

False Positive Rate (FPR)

12
Q

The True Positive Rate (TPR) used in ROC analysis is also known as ______ or ______.

A

recall, sensitivity

13
Q

The False Positive Rate (FPR) used in ROC analysis, calculated as FP(t)/N, is also referred to as ______ or ______.

A

fallout, false alarm rate

14
Q

To compare different classification algorithms using ROC curves, one typically compares their ______, often abbreviated as AUC.

A

Area Under the Curve
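
A sketch, assuming scikit-learn, of computing the points of an ROC curve and comparing classifiers by AUC; the scores are invented stand-ins for a model's predicted probabilities.

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.55]  # e.g., predict_proba output

# One (FPR, TPR) point per threshold; plotting tpr against fpr gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

auc = roc_auc_score(y_true, y_score)  # 1.0 for a perfect classifier, ~0.5 for random
print(auc)
```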

15
Q

Categorical data values that are simply names or labels with no ordering defined, such as gender or color, are known as ______ values.

A

Nominal

16
Q

Categorical data values where order does matter, such as t-shirt size or rank, are called ______ values.

A

Ordinal

17
Q

The process of replacing each category in categorical data with a unique number is known as ______.

A

String Indexing

18
Q

______ is a technique used to break the inherent ordering within a categorical column by creating new binary columns for each unique category.

A

One Hot Encoding (OHE)
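
A sketch, assuming pandas, that contrasts string indexing (card 17) with one-hot encoding (card 18) on an invented nominal column.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# String indexing: each category is replaced with a unique integer,
# which implicitly imposes an ordering
df["color_idx"] = df["color"].astype("category").cat.codes

# One-hot encoding: one new binary column per unique category (3 here),
# which breaks that artificial ordering
ohe = pd.get_dummies(df["color"], prefix="color")
print(pd.concat([df, ohe], axis=1))
```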

19
Q

The process that aims to optimize a model’s configuration to achieve the best possible performance for a specific problem is called ______.

A

Hyperparameter Tuning

20
Q

Common techniques for hyperparameter optimization include Manual Search, Grid Search, Random Search, and ______.

A

Bayesian Optimization
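
A minimal grid-search tuning sketch with scikit-learn; the estimator and the parameter values are arbitrary examples, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Every combination in the grid is evaluated with 5-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```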

21
Q

In regression tasks, the ______ is a common metric calculated as the square root of the Mean Squared Error (MSE).

A

root mean squared error (RMSE)

22
Q

The Mean Squared Error (MSE) indicates how close a regression line is to a set of data points by taking the distances or ‘errors’ from the points to the regression line, ______ them, and averaging the result.

A

squaring
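
A short sketch, assuming scikit-learn and invented numbers, of MSE, RMSE, and the related Mean Absolute Error (card 52).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # average of the squared errors
rmse = np.sqrt(mse)                        # square root of the MSE
mae = mean_absolute_error(y_true, y_pred)  # average of the absolute errors

print(mse, rmse, mae)
```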

23
Q

A common practice when handling numerical attributes is to assume ______ for these attributes.

A

normal distributions

24
Q

Error on the ______ is not a good indicator of a model’s performance on future data.

A

training data

25
To properly tune parameters and evaluate a model, one should ideally use three distinct datasets: training data, ______, and test data.
validation data
26
The ______ procedure involves splitting the original data into a training set and a test set, for example, reserving 2/3 for training and 1/3 for testing.
Holdout
27
______ is a method where the data is partitioned into k disjoint subsets, and the model is trained on k-1 partitions and tested on the remaining one, repeating this k times.
K-fold Cross-validation
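
A sketch of both evaluation procedures, assuming scikit-learn; the dataset and model are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Holdout: reserve 1/3 of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# 5-fold CV: train on 4 partitions, test on the remaining one, 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```
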
28
A disadvantage of the Holdout method is that its evaluation can have a ______, heavily depending on which data points end up in the training and test sets.
high variance
29
In k-fold cross-validation, every data point gets to be in a test set exactly once and gets to be in a training set ______ times.
k-1
30
A computationally expensive form of cross-validation where the number of folds is equal to the number of training instances is called ______ cross-validation.
Leave-One-Out
31
The error a machine learning algorithm makes on unseen data is referred to as its ______.
generalization error
32
______ is a resampling technique used to estimate the distribution of a dataset by sampling with replacement from the original dataset.
Bootstrap
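
A minimal bootstrap sketch with NumPy; the data values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.1, 3.4, 1.9, 4.2, 3.3, 2.8])

# Each bootstrap sample has the original size and is drawn with replacement,
# so individual points can appear more than once
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]

# The spread of the resampled means estimates the sampling distribution
print(np.mean(boot_means), np.std(boot_means))
```
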
33
The difference between a model's predicted values and the actual or expected values is known as ______, or errors due to bias.
bias errors
34
A model with ______ makes more assumptions and is unable to capture important features of the dataset, performing poorly on new data.
high bias
35
______ errors indicate how much a model's predictions differ from their expected value across different training sets, with high values suggesting the model doesn't generalize well beyond its training data.
Variance
36
A model that shows ______ learns the training dataset very well but performs poorly on unseen datasets, indicating overfitting.
high variance
37
To reduce ______, one might increase the number of input features, decrease regularization, or use more complex models.
high bias
38
To reduce ______, one might reduce the number of input features, use a less complex model, increase training data, or increase regularization.
high variance
39
In Scikit-learn, the ______ argument in estimators such as RandomForestClassifier allows you to specify the number of CPU cores to use for computationally expensive tasks.
n_jobs
40
Setting `n_jobs=-1` in Scikit-learn tasks will typically instruct the process to use ______ available on the system.
all of the cores
41
CPU technology such as ______ can allow a single physical CPU core to function as two logical cores, effectively doubling the number of cores available for parallel processing.
hyper-threading
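
A short sketch tying cards 39-41 together; scikit-learn is assumed, and os.cpu_count() reports logical cores, which hyper-threading doubles relative to physical cores.

```python
import os
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

print(os.cpu_count())  # logical cores visible to the operating system

X, y = make_classification(n_samples=500, random_state=0)

# n_jobs=-1 asks scikit-learn to parallelize across all available cores
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
```
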
42
The classification result FP stands for ______, representing a Type I error.
false positive
43
The classification result FN stands for ______, representing a Type II error.
false negative
44
The sum $TN+TP+FN+FP$ in a classification context represents the ______.
total number of instances
45
In a confusion matrix, $P = TP+FN$ represents the total number of ______ instances.
actual positive
46
In a confusion matrix, $N = TN+FP$ represents the total number of ______ instances.
actual negative
47
The F1-score is specifically the ______ of Precision and Recall.
Harmonic mean
48
An ROC curve for a ______ would go straight up the Y-axis to (0,1) and then straight across to (1,1).
perfect classifier
49
An ROC curve for a ______ would typically be a diagonal line from (0,0) to (1,1).
random classifier
50
When using One-Hot Encoding, if a categorical feature has 'n' unique categories, it will be transformed into ______ new binary columns.
n
51
Hyperparameters are set ______ training a model, unlike parameters which are learned ______ training.
before, during
52
The Mean Absolute Error (MAE) is calculated by taking the average of the ______ differences between predicted and actual values.
absolute
53
Error rate obtained by evaluating a model on the ______ can be misleadingly optimistic.
training data
54
The main purpose of a ______ is to tune hyperparameters without touching the final test set.
validation set
55
Stratified k-fold cross-validation ensures that each fold is ______ of the overall dataset in terms of class distribution.
representative
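
A stratified k-fold sketch with scikit-learn; the 80/20 class imbalance is invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # each test fold preserves roughly the same minority-class share (~20%)
    print(y[test_idx].mean())
```
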
56
Bootstrap resampling is done ______ from the original dataset.
with replacement
57
Underfitting is often characterized by ______ and ______, while overfitting is characterized by ______ and ______.
high bias, low variance, low bias, high variance
58
The three main computationally expensive tasks in machine learning are training models, evaluating models, and ______.
hyperparameter tuning
59
If a CPU has 4 physical cores and supports hyper-threading, it can present as ______ logical cores to the operating system.
eight
60
A message that is not spam but is assigned to the spam folder is an example of a ______ error in spam detection.
Type I
61
A message that is spam but appears in the regular folder is an example of a ______ error in spam detection.
Type II
62
The term 'fallout' in the context of ROC analysis refers to the ______.
False Positive Rate (FPR)
63
If a model has low bias, it makes ______ about the form of the target function.
fewer assumptions
64
If a model has high variance, it shows a ______ in the prediction of the target function with changes in the training dataset.
large variation
65
Using a ______ complex model can help reduce high bias but may increase variance.
more
66
Increasing the ______ term is a common strategy to reduce high variance (overfitting).
regularization
67
When evaluating models, if labeled data is scarce, techniques like ______ are preferred over a simple holdout split.
cross-validation
68
The primary goal of hyperparameter tuning is to find a set of hyperparameters that results in the best ______ on unseen data.
generalization performance
69
In the context of performance metrics, $RP = TP+FP$ represents the ______ by the classifier.
total instances predicted as positive
70
In the context of performance metrics, $RN = TN+FN$ represents the ______ by the classifier.
total instances predicted as negative
71
The formula for Error rate is ______ / (P+N).
FP+FN
72
A key assumption for both training and test data is that they are ______ of the underlying problem.
representative samples
73
If a learning scheme operates in two stages, the first stage typically involves building the ______ of the model.
basic structure
74
Generally, the larger the training set, the better the classifier, although this effect often shows ______.
diminishing returns
75
The dilemma in splitting data is that ideally, both the training set and the test set should be ______.
large
76
Cross-validation avoids ______ test sets, which can be an issue with repeated holdout methods.
overlapping
77
Leave-One-Out cross-validation makes the best use of the data and involves no ______.
random subsampling
78
A ______ is a subset of available data that is not used for training the model but is used to estimate the model's performance on unseen data during development.
validation set
79
High bias can occur if the model is too ______ for the complexity of the data.
simple
80
A model that performs well on the training data but poorly on the test data is said to be ______.
overfitting
81
The `n_jobs` parameter in scikit-learn is used to enable ______ for certain operations.
parallel computation
82
The process of breaking a categorical column into n different columns, where n is the number of unique categories, is the first step of ______.
One Hot Encoding
83
Sensitivity is the proportion of ______ we find.
positive cases (e.g., edges in an edge-detection task)
84
Specificity is the proportion of ______ that we find.
negative cases (e.g., non-edges in an edge-detection task)
85
A perfect classifier on an ROC curve would have an AUC value of ______.
1
86
A random classifier on an ROC curve would have an AUC value of approximately ______.
0.5
87
The choice of performance measure in evaluation can include the number of correct classifications, accuracy of probability estimates, or ______.
error in numeric predictions
88
When tuning hyperparameters, the ______ is used to optimize these settings, while the ______ provides the final, unbiased performance estimate.
validation data, test data
89
The term ______ refers to the model's inability to capture the underlying trend of the data, often due to an overly simplistic model.
underfitting
90
The term ______ refers to a model that learns the training data too well, including its noise and outliers, leading to poor performance on new data.
overfitting
91
Reducing the number of ______ or parameters can be a way to combat high variance.
input features
92
Increasing the amount of ______ can sometimes help reduce high variance.
training data
93
The scikit-learn library provides the `n_jobs` argument to control the use of ______ for tasks like model training and hyperparameter tuning.
multiple cores
94
Hyper-threading is a technology that allows each physical CPU core to handle ______ simultaneously.
multiple threads (typically two)
95
The total number of instances in a dataset can be represented by the sum ______.
TP + TN + FP + FN
96
The marginal P in a confusion matrix, calculated as TP + FN, represents the total number of ______ instances in the dataset.
actual positive
97
The marginal N in a confusion matrix, calculated as TN + FP, represents the total number of ______ instances in the dataset.
actual negative
98
The marginal RP in a confusion matrix, calculated as TP + FP, represents the total number of instances ______ by the model.
predicted as positive
99
The marginal RN in a confusion matrix, calculated as TN + FN, represents the total number of instances ______ by the model.
predicted as negative
100
If a model consistently makes large errors for a specific subgroup of data, it might be suffering from high ______ for that subgroup.
bias
101
If a model's performance changes drastically with small changes in the training data, it likely has high ______.
variance
102
______ search for hyperparameter tuning explores pre-defined points in the hyperparameter space.
Grid
103
______ search for hyperparameter tuning samples hyperparameter combinations randomly from specified distributions.
Random
104
One advantage of random search over grid search is that it can be more efficient in finding good hyperparameters when some hyperparameters are ______ than others.
more important
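
A random-search sketch, assuming scikit-learn and scipy; the distributions and ranges are arbitrary examples.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# n_iter combinations are sampled from the distributions, rather than
# enumerating a full grid
param_dist = {"n_estimators": randint(10, 200), "max_depth": randint(2, 10)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=10, cv=3, random_state=0)
search.fit(X, y)

print(search.best_params_)
```
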
105
The evaluation of a model on the ______ is used to estimate its performance on unseen data.
test set
106
When using k-fold cross-validation, the model is trained ______ times.
k
107
Bootstrap involves resampling ______ from the original dataset to create multiple bootstrap samples.
with replacement
108
A model that is underfitting has ______ bias and ______ variance.
high, low
109
A model that is overfitting has ______ bias and ______ variance.
low, high
110
The `n_jobs` parameter in scikit-learn, when set to -1, typically means to use all ______ CPU cores.
available