Lecture 2 - End-to-End ML Project Flashcards

(31 cards)

1
Q

What are the 8 steps of a complete ML project?

A

1. Big picture, 2. Get data, 3. Discover/visualize, 4. Prepare data, 5. Select/train model, 6. Fine-tune, 7. Present solution, 8. Maintain system.
2
Q

If one had to summarize an ML project into 3 steps, what would they be?

A

Preparation, Training, Deployment.

3
Q

What is the Big Picture step in ML project preparation?

A

It involves understanding the real-world problem, defining the mechanism, and identifying the learning problem.

4
Q

What is emphasized in the Big Picture step of ML?

A

The goal is not to build a model, but to understand and define the real-world problem.

5
Q

What does problem classification mean in the Big Picture step?

A

Identifying the type of learning problem: supervised or unsupervised, classification or regression, etc.

6
Q

Why is measurement important in the Big Picture step?

A

Because the chosen measure determines how outcomes are quantified, and therefore what the model actually learns to optimize.

7
Q

What are the 4 typical issues in the Get Data step?

A

1. Nonrepresentative data, 2. Poor-quality data, 3. Irrelevant features, 4. Overfitting/Underfitting.
8
Q

What can be done to address poor-quality data?

A

Remove outliers, fill in missing values (imputation), or remove features/instances.

9
Q

Which 3 steps make up the ML pipeline?

A

Prepare data, Train model, Fine-tune model.

10
Q

Which 3 steps make up the preparation part of an ML project?

A

Big picture, Get data, Discover/visualize.

11
Q

What does deployment mean in ML?

A

Making the model available for use, and monitoring its performance.

12
Q

Which 2 steps make up the deployment phase?

A

Present solution and Maintain system.

13
Q

What is a pipeline in ML?

A

A sequence of automated steps that includes preprocessing, training, and evaluation.
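
Example (a minimal sketch, not from the lecture): a pipeline can be pictured as a list of functions applied in order. The step names `impute`, `train`, `evaluate` and the toy data are hypothetical. For brevity the evaluation step scores the training rows; a real pipeline would score held-out data.

```python
from statistics import mean

def impute(rows):
    """Preprocessing step: replace missing x values (None) with the mean x."""
    fill = mean(x for x, _ in rows if x is not None)
    return [(fill if x is None else x, y) for x, y in rows]

def train(rows):
    """Training step: fit y = a * x by least squares through the origin."""
    a = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return rows, a

def evaluate(state):
    """Evaluation step: mean absolute error of the fitted line (on training rows)."""
    rows, a = state
    return a, mean(abs(y - a * x) for x, y in rows)

def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data

raw = [(1.0, 2.0), (2.0, 4.0), (None, 4.0), (3.0, 6.0)]  # hypothetical (x, y) rows
slope, error = run_pipeline(raw, [impute, train, evaluate])
print(slope, error)
```

Libraries such as scikit-learn offer the same idea through `sklearn.pipeline.Pipeline`.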

14
Q

What is standardization?

A

Rescaling each feature so that it has zero mean and unit variance: z = (x - mean) / standard deviation.
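
Example (a minimal sketch in plain Python; the `ages` column is made up): subtract the mean and divide by the standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

ages = [20.0, 30.0, 40.0, 50.0, 60.0]  # hypothetical feature column
z = standardize(ages)
print(z)
```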

15
Q

What is normalization?

A

Rescaling data to fit within a specific range, usually [0, 1] (min-max scaling).
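
Example (a minimal sketch; the `incomes` column and the `low`/`high` parameters are illustrative assumptions): map the smallest value to `low` and the largest to `high`.

```python
def min_max_normalize(values, low=0.0, high=1.0):
    """Linearly rescale values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot normalize a constant feature")
    return [low + (v - v_min) / (v_max - v_min) * (high - low) for v in values]

incomes = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical feature column
scaled = min_max_normalize(incomes)
print(scaled)
```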

16
Q

What is a problem with normalization?

A

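It is sensitive to outliers: a single extreme value sets the minimum or maximum and squeezes all other values into a narrow band.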
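
Example (a minimal sketch with made-up numbers): adding one extreme value pushes every ordinary value close to 0 after min-max scaling.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [1000.0]  # one extreme value

spread_clean = min_max_normalize(clean)
spread_outlier = min_max_normalize(with_outlier)

print(spread_clean)    # values spread evenly over [0, 1]
print(spread_outlier)  # the five ordinary values are all squeezed near 0
```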

17
Q

What is an alternative to normalization that is robust to outliers?

A

Robust scaling.

18
Q

What is robust scaling?

A

A method that centers each feature on its median and scales it by the interquartile range (IQR), reducing sensitivity to outliers.
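
Example (a minimal sketch; the data are made up, and the quartiles use linear interpolation via `method="inclusive"`, which is one of several common conventions): the median and IQR barely move when an outlier is present, so ordinary values keep a usable spread.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero")
    med = median(values)
    return [(v - med) / iqr for v in values]

with_outlier = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # one extreme value
scaled = robust_scale(with_outlier)
print(scaled)  # ordinary values stay spread out; only the outlier is far away
```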

19
Q

What is categorical feature encoding?

A

Transforming categorical variables into numerical format for ML models.

20
Q

What is one-hot encoding?

A

Encoding each category as a binary vector in which exactly one element is 1 (the position of that category) and all others are 0.
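
Example (a minimal sketch; the `colors` column is made up, and sorting the categories alphabetically is just one possible column order):

```python
def one_hot_encode(values):
    """Return the category order and one binary vector per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # the single "hot" position
        encoded.append(row)
    return categories, encoded

colors = ["red", "green", "blue", "green"]  # hypothetical categorical column
categories, encoded = one_hot_encode(colors)
print(categories)
print(encoded)
```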

21
Q

What is imputation?

A

The process of filling in missing values based on known data.

22
Q

What are types of imputation algorithms?

A

Mean/median imputation and kNN imputation.
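
Example (a minimal sketch with made-up data; missing values are represented as `None`, and the kNN variant is simplified to use only complete rows as neighbours):

```python
from statistics import mean, median

def impute_column(values, strategy="median"):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]

def knn_impute(rows, target, k=2):
    """Fill the None entries of rows[target] with the mean of its k nearest complete rows."""
    row = rows[target]
    known = [j for j, v in enumerate(row) if v is not None]
    complete = [r for r in rows if None not in r]

    def distance(other):
        # Euclidean distance over the features that are observed in the target row
        return sum((row[j] - other[j]) ** 2 for j in known) ** 0.5

    neighbours = sorted(complete, key=distance)[:k]
    return [mean(n[j] for n in neighbours) if v is None else v
            for j, v in enumerate(row)]

ages = [20, None, 30, 100]
by_median = impute_column(ages, "median")
by_mean = impute_column(ages, "mean")

table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.1, None]]
filled_row = knn_impute(table, target=3, k=2)
print(by_median, by_mean, filled_row)
```

Note how the extreme value 100 pulls the mean fill upward while the median fill is unaffected.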

23
Q

What is oversampling/undersampling?

A

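Techniques to handle imbalanced data by adjusting the class distribution: oversampling adds samples to the minority class, undersampling removes samples from the majority class.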
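
Example (a minimal sketch of the random variants; the sample names, labels, and `seed` parameter are made up for illustration):

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels

samples = ["a1", "a2", "a3", "a4", "a5", "a6", "b1", "b2"]
labels = ["A"] * 6 + ["B"] * 2  # imbalanced: 6 vs 2

over_s, over_l = random_oversample(samples, labels)
under_s, under_l = random_undersample(samples, labels)
print(Counter(over_l), Counter(under_l))
```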

24
Q

What are SMOTE and ADASYN?

A

Oversampling techniques that generate synthetic data points between existing minority samples.
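
Example (a simplified SMOTE-style sketch, not the reference algorithm; the points, `k`, and `seed` are made up): each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. ADASYN follows the same interpolation idea but generates more points around minority samples that have many majority-class neighbours; that weighting is not shown here.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]  # hypothetical minority class
new_points = smote_like(minority, n_new=5)
print(new_points)
```

In practice, the imbalanced-learn package provides `SMOTE` and `ADASYN` classes.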

25
Q

What is evaluation of a model?

A

Assessing model performance using unseen data.

26
Q

What is a good alternative to a hold-out test?

A

Cross-validation.

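
Example (a minimal k-fold sketch; the target values and the "predict the training mean" baseline are made up, and real code would usually shuffle the data first): every sample is used for testing exactly once, and the per-fold errors are averaged.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(ys, k):
    """Mean absolute error of a 'predict the training mean' baseline, one value per fold."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        prediction = mean(ys[i] for i in train_idx)
        errors.append(mean(abs(ys[i] - prediction) for i in test_idx))
    return errors

targets = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]  # hypothetical target values
fold_errors = cross_validate(targets, k=3)
print(fold_errors, mean(fold_errors))
```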
27
Q

When would you use nested k-fold cross-validation?

A

When you need to tune hyperparameters and still obtain a reliable estimate of model performance: the inner loop tunes, the outer loop evaluates.

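
Example (a minimal sketch; the 1-D kNN regressor, the interleaved fold scheme, and the toy data are illustrative assumptions): the inner loop picks the hyperparameter using only the outer training data, and the outer loop scores the tuned model on data never seen during tuning.

```python
from statistics import mean

def folds(n, k):
    """Split range(n) into k interleaved (train, test) index pairs."""
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        yield train, test

def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)

def mae(train, test, n_neighbors):
    """Mean absolute error on the test points."""
    return mean(abs(y - knn_predict(train, x, n_neighbors)) for x, y in test)

def nested_cv(data, candidates, outer_k=3, inner_k=2):
    outer_scores = []
    for tr_idx, te_idx in folds(len(data), outer_k):
        outer_train = [data[i] for i in tr_idx]
        outer_test = [data[i] for i in te_idx]

        # Inner loop: choose the hyperparameter using the outer training data only.
        def inner_error(n_neighbors):
            return mean(
                mae([outer_train[i] for i in tr], [outer_train[i] for i in te], n_neighbors)
                for tr, te in folds(len(outer_train), inner_k)
            )

        best = min(candidates, key=inner_error)
        # Outer loop: score the tuned model on data it never saw during tuning.
        outer_scores.append(mae(outer_train, outer_test, best))
    return outer_scores

data = [(float(x), 2.0 * x + 1.0) for x in range(12)]  # hypothetical (x, y) pairs
scores = nested_cv(data, candidates=[1, 2, 3])
print(scores, mean(scores))
```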
28
Q

What is grid search?

A

A method to exhaustively search through a specified set of hyperparameters.

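
Example (a minimal sketch; `toy_score` is a hypothetical stand-in for a cross-validated score, and the parameter names are made up): every combination in the grid is scored and the best one is kept.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every combination of the listed values and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(max_depth, learning_rate):
    # Hypothetical score surface that peaks at max_depth=4, learning_rate=0.1
    return -((max_depth - 4) ** 2) - 100 * (learning_rate - 0.1) ** 2

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows multiplicatively with each added hyperparameter: this grid already needs 3 x 3 = 9 evaluations.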
29
Q

What is randomized search?

A

A method to randomly sample hyperparameter combinations for evaluation.

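
Example (a minimal sketch; `toy_score`, the parameter names, and the sampling ranges are hypothetical): instead of trying every combination, a fixed number of combinations is drawn at random, so the budget is set by `n_iter` rather than by the size of the search space.

```python
import random

def randomized_search(param_distributions, score_fn, n_iter=20, seed=0):
    """Sample n_iter random combinations and return the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: sampler(rng) for name, sampler in param_distributions.items()}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(max_depth, learning_rate):
    # Hypothetical score surface that peaks at max_depth=4, learning_rate=0.1
    return -((max_depth - 4) ** 2) - 100 * (learning_rate - 0.1) ** 2

distributions = {
    "max_depth": lambda rng: rng.randint(1, 10),             # integer in [1, 10]
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),   # log-uniform in [0.001, 1]
}
best_params, best_score = randomized_search(distributions, toy_score, n_iter=30)
print(best_params, best_score)
```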
30
Q

What are ensemble models?

A

Models that combine multiple learners to improve performance.

31
Q

What are two kinds of ensemble models?

A

Bagging and Boosting.
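
Example (a minimal bagging sketch; the threshold "stump" learner and the toy data are illustrative assumptions): each model is trained on a bootstrap sample drawn with replacement, and predictions are combined by majority vote. Boosting, not shown here, instead trains models one after another, with each new model focusing on the examples the previous ones got wrong.

```python
import random
from collections import Counter

def fit_stump(points):
    """Pick the threshold on x with the fewest training errors
    under the rule: predict 1 if x > threshold, else 0."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in points}):
        err = sum((1 if x > t else 0) != y for x, y in points)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(points, n_models=15, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(points) for _ in points]
        models.append(fit_stump(bootstrap))
    return models

def bagging_predict(models, x):
    """Majority vote over all stumps."""
    votes = Counter(1 if x > t else 0 for t in models)
    return votes.most_common(1)[0][0]

data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]  # hypothetical (x, label) pairs
models = bagging_fit(data)
print(bagging_predict(models, 0.5), bagging_predict(models, 9.5))
```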