Lecture 3 Flashcards

(30 cards)

1
Q

What are the basic steps in offline machine learning?

A
1. Abstract the problem to a standard task (Classification, Regression, etc.).
2. Choose instances and features.
3. Choose a model class.
4. Search for a good model.

2
Q

What is binary classification?

A

A classification task with two classes: positive and negative.

3
Q

What is classification error?

A

The proportion of misclassified examples.

4
Q

What is classification accuracy?

A

The proportion of correctly classified examples.

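A minimal sketch of both definitions in plain Python; the label arrays are made-up examples, not data from the lecture:

```python
# Classification error = misclassified / total; accuracy = correct / total.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical gold labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

n_wrong = sum(t != p for t, p in zip(y_true, y_pred))
error = n_wrong / len(y_true)        # proportion misclassified
accuracy = 1 - error                 # proportion correct
print(f"error={error:.3f}, accuracy={accuracy:.3f}")  # error=0.250, accuracy=0.750
```
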
5
Q

Why do we compare models?

A

To determine the best model for production use.

6
Q

What is an example of hyperparameter tuning in kNN?

A

Choosing the number of neighbors (k).

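A sketch of this kind of tuning with scikit-learn, assuming toy data and a held-out validation split (the dataset and the candidate list for k are illustrative, not the lecture's):

```python
# Try several values of k and keep the one with the best validation accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # toy data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9, 15, 25]:
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, best_acc)
```
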
7
Q

What is the simplest way to compare two classifiers?

A

Train both, compute their errors, and pick the one with the lowest error.

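One way this might look with scikit-learn; the two model classes and the held-out split are assumptions for illustration:

```python
# Train two classifiers, compute the error of each, pick the lower one.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("decision tree", DecisionTreeClassifier(random_state=0))]:
    error = 1 - clf.fit(X_train, y_train).score(X_test, y_test)
    print(name, round(error, 3))  # keep the model with the lowest error
```
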
8
Q

Why is evaluating on training data misleading?

A

Because the model may overfit, performing well on training data but poorly on unseen data.

9
Q

What is the purpose of a test set?

A

To evaluate model performance on unseen data.

10
Q

What is the recommended minimum size for a test set?

A

At least 500 examples; ideally 10,000 or more.

11
Q

What is the danger of testing many models on the same test set?

A

Overfitting to the test set due to multiple testing.

12
Q

What is overfitting in model selection?

A

Choosing a model that performs well on a specific test set but generalizes poorly.

13
Q

What is the modern approach to model evaluation?

A
1. Split data into train and test sets.
2. Choose model and hyperparameters using training data.
3. Test the model only once on test data.

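One possible rendering of these three steps with scikit-learn; the dataset, model class, and hyperparameter grid are invented for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# 1. Split once into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. Choose hyperparameters using training data only (cross-validation inside the train set).
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 9, 15]}, cv=5)
search.fit(X_train, y_train)

# 3. Touch the test set exactly once, at the very end.
print(search.best_params_, search.score(X_test, y_test))
```
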
14
Q

Why shouldn’t test data be reused?

A

Reusing test data leads to selecting the wrong model and inflating performance estimates.

15
Q

What is the purpose of a validation set?

A

To tune model hyperparameters without using the test set.

16
Q

What is cross-validation?

A

A technique where training data is split into multiple subsets (folds) to validate model performance.
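
For instance, with scikit-learn (toy data; the choice of estimator is an assumption):

```python
# 5-fold cross-validation: the training data is split into 5 folds,
# and each fold takes one turn as the validation set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X_train, y_train = make_classification(n_samples=1000, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print(scores.mean(), scores.std())  # average validation accuracy across the 5 folds
```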

17
Q

What is walk-forward validation used for?

A

For time-series data, ensuring training data precedes test data chronologically.
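
One common implementation of this idea is scikit-learn's TimeSeriesSplit, sketched here on ten time-ordered points:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 observations in chronological order

# Each split trains on an initial segment and validates on the chunk after it,
# so training data always precedes validation data in time.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)
```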

18
Q

What is the difference between validation and evaluation?

A

Evaluation simulates production; validation simulates evaluation.

19
Q

What are common hyperparameter tuning methods?

A

Trial-and-error (intuition), grid search, and random search.

20
Q

Why is random search often better than grid search?

A

Because it samples a fresh value for every parameter on each trial, random search covers many more distinct values per parameter than a grid with the same budget, which matters in high-dimensional search spaces.
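
A sketch of both searches over the same kNN hyperparameter, assuming scikit-learn and SciPy (the budget and value ranges are invented):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Grid search tries exactly the listed values; random search samples fresh ones.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 5, 9, 15]}, cv=5)
rand = RandomizedSearchCV(KNeighborsClassifier(), {"n_neighbors": randint(1, 30)},
                          n_iter=10, cv=5, random_state=0)

for search in (grid, rand):
    search.fit(X, y)
    print(type(search).__name__, search.best_params_, round(search.best_score_, 3))
```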

21
Q

Why is statistical testing controversial in ML?

A

Large datasets often make statistical tests unnecessary, and replication is the best validation.

22
Q

What is the difference between true accuracy and sample accuracy?

A

True accuracy is the actual probability of correct classification, while sample accuracy is the proportion of correctly classified test samples.

23
Q

What does a confidence interval represent?

A

A range constructed so that, across repeated experiments, it would contain the true value of the metric a specified fraction of the time (e.g., 95%).

24
Q

What is the impact of test set size on confidence intervals?

A

Larger test sets produce narrower confidence intervals, increasing reliability.
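
A quick worked example, assuming the normal-approximation interval for a proportion, 1.96 × √(p(1 − p)/n), and a hypothetical sample accuracy of 0.90:

```python
from math import sqrt

p = 0.90  # hypothetical sample accuracy
for n in (100, 1000, 10000):
    half_width = 1.96 * sqrt(p * (1 - p) / n)  # 95% CI half-width
    print(f"n={n:>5}: {p:.2f} +/- {half_width:.3f}")
# n=  100: 0.90 +/- 0.059
# n= 1000: 0.90 +/- 0.019
# n=10000: 0.90 +/- 0.006
```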

25

Q

What is Alpaydin's 5x2 F test used for?

A

To test whether the difference between two classifiers is statistically significant when test sets are small.

26

Q

What is the standard error of the mean (SEM)?

A

A measure of how much sample means vary around the true mean: SEM = s / √n, where s is the sample standard deviation and n is the sample size.

27

Q

What is the 95% confidence interval formula for the mean?

A

Mean ± 1.96 × SEM.
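
A minimal sketch of this formula in plain Python, using made-up fold accuracies:

```python
from statistics import mean, stdev

accs = [0.81, 0.84, 0.79, 0.86, 0.82]   # hypothetical accuracies, e.g. from 5 folds
m = mean(accs)
sem = stdev(accs) / len(accs) ** 0.5    # SEM = sample standard deviation / sqrt(n)
print(f"{m:.3f} ± {1.96 * sem:.3f}")    # approximate 95% confidence interval
```
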
28

Q

What are common meanings of error bars?

A

They can represent standard deviation, standard error, or confidence intervals.

29

Q

What does overlap in error bars indicate?

A

If error bars overlap, the difference between models is likely not statistically significant.

30

Q

What should you avoid when interpreting confidence intervals?

A

Saying there is a 95% probability that the true mean lies in the one interval you computed. Instead, say that in 95% of repeated experiments, the computed interval would contain the true mean.