Exam Questions Flashcards

(40 cards)

1
Q
What should you prioritize first when building a machine learning system?
A) Feature optimization

B) Deployment strategy

C) Simple end-to-end pipeline

D) Ensemble of complex models

A

C)

2
Q
Which of the following reduces overfitting?
A) Increasing model size

B) Regularization

C) Using more features

D) Reducing training data

A

B)
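
A minimal sketch of the idea behind the answer, assuming scikit-learn and NumPy are available: L2 regularization (Ridge) shrinks the coefficients of a deliberately overfit high-degree polynomial model, which typically closes much of the gap between training and test error. The synthetic data, degree, and alpha are illustrative choices, not part of the deck.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, estimator in [("unregularized", LinearRegression()),
                        ("ridge (alpha=0.1)", Ridge(alpha=0.1))]:
    # Degree-10 polynomial on 15 training points: easy to overfit without a penalty.
    model = make_pipeline(PolynomialFeatures(degree=10), estimator)
    model.fit(X_tr, y_tr)
    print(f"{name}: "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```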

3
Q
What is the biggest risk when using production data without version control?
A) High latency

B) Data leakage

C) Inability to reproduce models

D) Bias amplification

A

C)

4
Q
Why should you monitor models after deployment?
A) To improve UX

B) To detect model drift

C) To boost server efficiency

D) To make models faster

A

B)

5
Q
What does “data drift” refer to?
A) Changes in model architecture

B) Changes in input data distribution

C) Decrease in training time

D) Increase in model complexity

A

B)
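
A hypothetical illustration of the answer, assuming NumPy and SciPy: data drift can be flagged by comparing a feature's training-time distribution with what production is currently sending, for example with a two-sample Kolmogorov-Smirnov test. The synthetic distributions and the 0.01 threshold are made-up examples.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # distribution the model was trained on
prod_feature = rng.normal(loc=0.4, scale=1.2, size=5000)   # shifted distribution arriving in production

result = ks_2samp(train_feature, prod_feature)
drift_suspected = result.pvalue < 0.01  # example threshold, not a universal rule
print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.3g}, "
      f"drift suspected: {drift_suspected}")
```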

6
Q
In Google’s ML rules, why avoid optimizing one part too early?
A) Because it’s computationally expensive

B) Because it introduces latency

C) Because it may not help overall system performance

D) Because it’s harder to document

A

C)

7
Q
What is a primary reason to keep raw data?
A) To reduce storage cost

B) To train faster

C) To debug and retrain better models later

D) To anonymize user data

A

C)

8
Q
What is concept drift?
A) Feature values change

B) Target meaning changes over time

C) Loss function changes

D) Model structure evolves

A

B)

9
Q
Why are ensembles not recommended at first?
A) They are slower

B) They complicate debugging

C) They cost more money

D) They are not accurate

A

B)

10
Q
When evaluating a classification model, which metric shows the balance between precision and recall?
A) AUC

B) Accuracy

C) F1 Score

D) Mean Absolute Error

A

C)
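
A small worked example of the answer, assuming scikit-learn: the F1 score is the harmonic mean of precision and recall, so it only stays high when both are reasonably high.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 true positives, 1 false positive, 2 false negatives

p = precision_score(y_true, y_pred)   # 2 / (2 + 1) = 0.667
r = recall_score(y_true, y_pred)      # 2 / (2 + 2) = 0.5
f1 = f1_score(y_true, y_pred)         # harmonic mean: 2*p*r / (p + r) ≈ 0.571
print(p, r, f1)
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
```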

11
Q
Which of these improves model interpretability?
A) Using deep ensembles

B) Adding more layers

C) Using simple models like linear regression

D) Removing regularization

A

C)

12
Q
Why version your datasets?
A) For faster model deployment

B) For reproducibility

C) To save storage space

D) To anonymize user data

A

B)

13
Q
What is the risk of frequent retraining without validation?
A) Over-regularization

B) Catastrophic forgetting

C) Faster convergence

D) Improved precision

A

B)

14
Q
What happens if you don’t monitor input data?
A) Decreased model speed

B) Silent model decay

C) Decrease in training data

D) Increased learning rate

A

B)

15
Q
What’s a cheap way to boost model performance without changing model code?
A) Feature engineering

B) Increase batch size

C) Reduce regularization

D) Add more dropout

A

A)

16
Q
What does a high-variance model suffer from?
A) Underfitting

B) Overfitting

C) Bias error

D) Slow inference

A

B)

17
Q
Why is it bad to optimize offline metrics only?
A) Training takes longer

B) It may not match real-world performance

C) Higher latency

D) Poor AUC

A

B)

18
Q
What helps in detecting concept drift?
A) Watching confusion matrix changes over time

B) Increasing learning rate

C) Adding regularization

D) Using larger datasets

A

A)

19
Q
When using ensembling, what should you be careful about?
A) Training time

B) Feature selection

C) Inference latency

D) Data splitting

A

C)

20
Q
What is a good first action if model performance drops suddenly?
A) Retrain immediately

B) Inspect data for drift

C) Increase regularization

D) Add features

A

B)

21
Q
Which metric penalizes large errors more in regression?
A) MAE

B) RMSE

C) R²

D) F1 Score

A

B)
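
A quick worked example with NumPy showing why RMSE is the answer: because errors are squared before averaging, a single large error moves RMSE far more than MAE.

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # four small errors, one outlier
mae = np.mean(np.abs(errors))                  # (1+1+1+1+10)/5 = 2.8
rmse = np.sqrt(np.mean(errors ** 2))           # sqrt((1+1+1+1+100)/5) ≈ 4.56
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")
```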

22
Q
Why are simpler models preferred early in development?
A) Because they perform better

B) Because they are faster to debug and iterate on

C) Because they scale better

D) Because they always converge

A

B)

23
Q
What problem can occur with non-stationary data streams?
A) Concept drift

B) High variance

C) Missing labels

D) Low precision

A

A)

24
Q
What kind of latency is most dangerous for ML production systems?
A) Training latency

B) Inference latency

C) Hyperparameter tuning latency

D) Data validation latency

A

B)

25
Q
Which process helps detect whether your model generalizes well?
A) Training on more features

B) Cross-validation

C) Bagging

D) Boosting

A

B)

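A short sketch of the answer, assuming scikit-learn (the dataset and model are illustrative): k-fold cross-validation scores the model on several held-out folds, giving a more reliable picture of generalization than a single split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 held-out folds
print(scores, scores.mean().round(3), scores.std().round(3))
```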
26
Q
What’s a typical cause of data leakage?
A) Poor model selection

B) Including future information in training

C) Hyperparameter search

D) Random noise

A

B)

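A hypothetical sketch of the "future information" failure mode, using pandas with invented column names: with time-ordered data, a shuffled split happily trains on rows from the future, whereas a time-based cutoff keeps training strictly in the past.

```python
import pandas as pd

df = pd.DataFrame({
    "event_time": pd.date_range("2024-01-01", periods=100, freq="D"),
    "feature": range(100),
    "label": [i % 2 for i in range(100)],
})

# Leaky: a random 80% sample mixes past and future rows into the training set.
leaky_train = df.sample(frac=0.8, random_state=0)

# Safer: train only on rows that happened before a cutoff date.
cutoff = pd.Timestamp("2024-03-15")
train = df[df["event_time"] < cutoff]
test = df[df["event_time"] >= cutoff]

print("random split trains on rows up to:", leaky_train["event_time"].max().date())
print("time-based split trains on rows up to:", train["event_time"].max().date())
```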
27
Q
What is the biggest danger of overfitting?
A) Slow training

B) Poor performance on unseen data

C) Slow inference

D) High variance loss

A

B)

28
Q
What can help reduce concept drift’s impact?
A) Never retrain the model

B) Frequent retraining with new data

C) Model compression

D) Reducing feature space

A

B)

29
Q
What is the benefit of raw data storage?
A) Better model inference

B) Ability to re-engineer features

C) Faster cloud deployment

D) Improved bias correction

A

B)

30
Q
When should you alert for model failure?
A) Only at retraining time

B) Only when user complaints occur

C) Immediately upon quality drop

D) Once a month

A

C)

31
Q
In Burkov’s ML system design, when should you optimize model hyperparameters?
A) After building and validating a working pipeline

B) Before collecting data

C) Before data validation

D) Right after defining the problem

A

A)

32
Q
Which is a valid reason to split your dataset into Train, Validation, and Test sets?
A) To maximize training accuracy

B) To avoid data leakage and have honest evaluation

C) To reduce model size

D) To ensure faster training

A

B)

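A minimal sketch of the answer, assuming scikit-learn (the 60/20/20 ratio is an example choice): hold out a test set first, then split the remainder into training and validation; the validation set steers model choices, and the test set is used only once for the final, honest evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Carve out the test set first, then split what remains into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20% of the data
```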
33
Q
According to Google's Rules, what should you monitor after model deployment?
A) Only model outputs

B) Only system latency

C) Both input data and model outputs

D) Only server CPU usage

A

C)

34
Q
According to Google's Rules, when building your first ML model, you should:
A) Use a simple, end-to-end pipeline even if it's bad

B) Train a complex ensemble immediately

C) Focus on hyperparameter tuning first

D) Skip evaluation to save time

A

A)

35
Q
In Burkov’s view, what is a key danger of feature engineering?
A) Increased model training time

B) Hidden data leakage from target variables

C) Decreased inference speed

D) Overfitting due to smaller models

A

B)

36
Q
What is a primary signal of overfitting in a model?
A) Low training error but high test error

B) High training error and high test error

C) Low bias

D) High precision and high recall

A

A)

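A quick check of that signal, assuming scikit-learn (the unconstrained decision tree is just a convenient way to overfit on purpose): a near-perfect training score next to a clearly lower test score is the classic overfitting signature.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # fully grown, no depth limit
print("train accuracy:", tree.score(X_tr, y_tr))  # typically 1.0
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower on unseen data
```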
37
Q
According to Google, what should you version during ML development?
A) Only final models

B) Both data and models

C) Only raw features

D) Only system architecture diagrams

A

B)

38
Q
What is the goal of cross-validation in machine learning?
A) Reduce model training time

B) Make deployment faster

C) Estimate model generalization performance reliably

D) Increase model complexity

A

C)

39
Q
Which feature encoding method is safest from a data leakage perspective?
A) One-hot encoding

B) Mean encoding

C) Hashing trick

D) Normalization

A

A)

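A toy pandas illustration of why one-hot encoding is the safest answer here: it is built from the category values alone, while naive mean (target) encoding folds the target into a feature and leaks unless the category means are fitted on the training fold only. The column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"city": ["a", "a", "b", "b", "c"],
                   "target": [1, 0, 1, 1, 0]})

# One-hot encoding never looks at the target, so it cannot leak it.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Naive mean encoding computed on the full dataset: each row's feature already
# encodes information about its own target value.
df["city_mean_enc"] = df.groupby("city")["target"].transform("mean")

print(one_hot)
print(df)
```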
40
Q
In ML system design (Burkov), after feature engineering and data preparation, your next step is usually:
A) Model monitoring

B) System scaling

C) Model training

D) Model rollback

A

C)