Lecture 2 - End-to-End ML Project Flashcards by Daniel Cortild

What are the 8 steps of a complete ML project?

1. Big picture, 2. Get data, 3. Discover/visualize, 4. Prepare data, 5. Select/train model, 6. Fine-tune, 7. Present solution, 8. Maintain system.

If one had to summarize an ML project into 3 steps, what would they be?

Preparation, Training, Deployment.

What is the Big Picture step in ML project preparation?

It involves understanding the real-world problem, defining the mechanism, and identifying the learning problem.

What is emphasized in the Big Picture step of ML?

The goal is not to build a model, but to understand and define the real-world problem.

What does problem classification mean in the Big Picture step?

Identifying the type of learning problem: supervised or unsupervised, classification or regression, and so on.

Why is measurement important in the Big Picture step?

Because it determines how outcomes are quantified and learned.

What are the 4 typical issues in the Get Data step?

1. Nonrepresentative data, 2. Poor-quality data, 3. Irrelevant features, 4. Overfitting/Underfitting.

What can be done to address poor-quality data?

Remove outliers, fill in missing values (imputation), or remove features/instances.

Which 3 steps make up the ML pipeline?

Prepare data, Train model, Fine-tune model.

Which 3 steps make up the preparation part of an ML project?

Big picture, Get data, Discover/visualize.

What does deployment mean in ML?

Making the model available for use, and monitoring its performance.

Which 2 steps make up the deployment phase?

Present solution and Maintain system.

What is a pipeline in ML?

A sequence of automated steps that includes preprocessing, training, and evaluation.

What is standardization?

Rescaling each feature so that it has zero mean and unit variance: z = (x - mean) / standard deviation.
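
As an illustration, here is a minimal sketch of standardization using only the Python standard library. The function name `standardize` and the sample values are made up for this example and are not from the lecture.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up feature values
z_scores = standardize(heights_cm)
```

After the call, `z_scores` has mean 0 and standard deviation 1 up to floating-point rounding.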

What is normalization?

Rescaling data to fit within a fixed range, usually [0, 1] (min-max scaling).
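
A minimal sketch of min-max normalization, assuming a single numeric feature held in a list. The name `min_max_scale` and the sample values are illustrative only.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values so the minimum becomes `low` and the maximum `high`."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize a constant feature")
    return [low + (x - lo) * (high - low) / (hi - lo) for x in values]

incomes = [20.0, 30.0, 40.0, 60.0]  # made-up feature values
scaled = min_max_scale(incomes)     # [0.0, 0.25, 0.5, 1.0]
```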

What is a problem with normalization?

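It is sensitive to outliers: the minimum and maximum define the range, so a single extreme value compresses all other values into a narrow band.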

What is an alternative to normalization that is robust to outliers?

Robust scaling.

What is robust scaling?

A method that centers each feature on its median and scales it by the interquartile range (IQR), reducing sensitivity to outliers.
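
The following sketch contrasts robust scaling with min-max scaling on data containing one outlier. It assumes the common definition (subtract the median, divide by the IQR) and uses the "inclusive" quantile method; the function name and data are made up for illustration.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    med = median(values)
    return [(x - med) / iqr for x in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up values; 100.0 is the outlier
robust = robust_scale(data)

# Min-max scaling of the same data, for comparison.
min_max = [(x - min(data)) / (max(data) - min(data)) for x in data]
```

Here the median is 3 and the IQR is 2, so the four typical values map to -1.0, -0.5, 0.0 and 0.5 while the outlier lands at 48.5. Under min-max scaling the same four values are squeezed below 0.04.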

What is categorical feature encoding?

Transforming categorical variables into numerical format for ML models.

What is one-hot encoding?

Encoding each category as a binary vector with one position per category, where exactly one entry is 1 and all others are 0.
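
A small sketch of one-hot encoding in plain Python. Sorting the categories to fix the column order is an assumption of this example, as are the names used.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row has a single 1."""
    categories = sorted(set(values))                  # fixed column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                         # the single "hot" entry
        rows.append(row)
    return categories, rows

colors = ["red", "green", "blue", "green"]            # made-up feature
categories, encoded = one_hot_encode(colors)
# categories == ["blue", "green", "red"]
# encoded    == [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```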

What is imputation?

The process of filling in missing values based on known data.

What are types of imputation algorithms?

Mean/median imputation and kNN imputation.
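
A minimal sketch of mean and median imputation, assuming missing values are represented as `None`. The helper name `impute` and the data are hypothetical; kNN imputation is not shown.

```python
from statistics import fmean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = fmean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

ages = [22.0, None, 30.0, 26.0, None, 90.0]   # made-up data with gaps
mean_filled = impute(ages, "mean")            # gaps become 42.0
median_filled = impute(ages, "median")        # gaps become 28.0
```

The median fill (28.0) is less affected by the extreme value 90.0 than the mean fill (42.0).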

What is oversampling/undersampling?

Techniques to handle imbalanced data by adjusting the class distributions.

What are SMOTE and ADASYN?

Oversampling techniques that generate synthetic data points between existing minority samples.
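
The interpolation idea can be sketched as below. This is a simplified, SMOTE-style illustration only, not the reference algorithm or any library's API: it picks a minority point, picks one of its k nearest minority neighbours, and places a new point at a random position on the segment between them. All names and data are made up.

```python
import random
from math import dist

def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]  # made-up samples
new_points = smote_like(minority, n_new=6)
```

Every synthetic point lies on a line segment between two existing minority samples, so it stays inside the region the minority class already occupies.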

What is evaluation of a model?

Assessing model performance using unseen data.

What is a good alternative to a hold-out test?

Cross-validation.
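
As a rough sketch, k-fold cross-validation splits the data into k folds and uses each fold once as the test set. The helper below only produces the index splits; the name `k_fold_indices` is illustrative, and in practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
# Test folds: [0, 1, 2, 3], [4, 5, 6], [7, 8, 9]
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once and for training k - 1 times.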

When would you use nested k-fold cross validation?

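When you need to tune hyperparameters and still get a reliable performance estimate: the inner loop selects the hyperparameters, and the outer loop evaluates the tuned model on data not used for tuning.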
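
The two loops can be sketched as follows, under simplifying assumptions: a toy 1-D k-nearest-neighbours classifier stands in for the model, the number of neighbours is the only hyperparameter, and folds are formed by striding. All names and data are hypothetical.

```python
from statistics import fmean

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, test, k):
    return fmean([1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test])

def make_folds(data, n_folds):
    """Yield (train, test) pairs; fold i takes every n_folds-th sample starting at i."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def nested_cv(data, candidate_ks, outer=3, inner=2):
    outer_scores = []
    for outer_train, outer_test in make_folds(data, outer):
        # Inner loop: choose k using only the outer training data.
        def inner_score(k):
            return fmean([accuracy(tr, va, k) for tr, va in make_folds(outer_train, inner)])
        best_k = max(candidate_ks, key=inner_score)
        # Outer loop: score the tuned model on data never used for tuning.
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return fmean(outer_scores)

data = [(x / 10, 1 if x >= 6 else 0) for x in range(12)]  # made-up labelled data
estimate = nested_cv(data, candidate_ks=[1, 3])
```

The outer test folds never influence the choice of k, which is what keeps the final estimate from being optimistically biased.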

What is grid search?

A method that exhaustively evaluates every combination in a specified grid of hyperparameter values.
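
A minimal sketch of grid search over a small grid. The scoring function `toy_score` is a hypothetical stand-in for a cross-validated model score, and the parameter names `depth` and `rate` are invented for the example.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(depth, rate):
    """Hypothetical score surface that peaks at depth=4, rate=0.1."""
    return -((depth - 4) ** 2) - 100 * (rate - 0.1) ** 2

grid = {"depth": [2, 4, 6], "rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(toy_score, grid)
# best_params == {"depth": 4, "rate": 0.1}
```

With 3 values per parameter the search evaluates 3 x 3 = 9 combinations; the cost grows multiplicatively with each added hyperparameter.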

What is randomized search?

A method to randomly sample hyperparameter combinations for evaluation.
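
A sketch of randomized search under the same kind of assumptions: `toy_score` is a made-up stand-in for a validation score, and each hyperparameter is described by a hypothetical sampling function rather than a fixed list.

```python
import random

def randomized_search(score_fn, param_samplers, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in param_samplers.items()}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(depth, rate):
    """Hypothetical score surface that peaks at depth=4, rate=0.1."""
    return -((depth - 4) ** 2) - 100 * (rate - 0.1) ** 2

samplers = {
    "depth": lambda rng: rng.randint(1, 10),        # integer in [1, 10]
    "rate": lambda rng: 10 ** rng.uniform(-3, 0),   # log-uniform in [0.001, 1]
}
best_params, best_score = randomized_search(toy_score, samplers, n_iter=20)
```

Unlike grid search, the budget (`n_iter`) is fixed in advance and does not grow with the number of hyperparameters.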

What are ensemble models?

Models that combine multiple learners to improve performance.

What are two kinds of ensemble models?

Bagging and Boosting.
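
Bagging can be sketched as below: each model is trained on a bootstrap sample (drawn with replacement) and predictions are combined by majority vote. The weak learner `fit_stump` is a hypothetical 1-D threshold rule that assumes class 1 has the larger feature values; boosting is not shown.

```python
import random
from statistics import fmean

def fit_stump(sample):
    """Hypothetical weak learner: threshold halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:               # bootstrap sample with a single class
        constant = 1 if ones else 0
        return lambda x: constant
    threshold = (fmean(zeros) + fmean(ones)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_fit(data, n_models, seed=0):
    rng = random.Random(seed)
    # Each model sees a bootstrap sample: len(data) draws with replacement.
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]

def bagging_predict(models, x):
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0   # majority vote

data = [(x / 10, 1 if x >= 5 else 0) for x in range(10)]  # made-up labelled data
models = bagging_fit(data, n_models=11)
low_prediction = bagging_predict(models, 0.0)
high_prediction = bagging_predict(models, 0.9)
```

An odd number of models avoids ties in the vote. Boosting differs in that models are trained sequentially, each focusing on the errors of the previous ones.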

(31 cards)