Exam Questions Flashcards

(40 cards)

1
Q
What should you prioritize first when building a machine learning system?
A) Feature optimization

B) Deployment strategy

C) Simple end-to-end pipeline

D) Ensemble of complex models

A

C)

2
Q
Which of the following reduces overfitting?
A) Increasing model size

B) Regularization

C) Using more features

D) Reducing training data

A

B)
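
A minimal sketch of the idea behind the answer, assuming scikit-learn and NumPy are available: L2 regularization (Ridge) shrinks the coefficients of a deliberately overfit high-degree polynomial model, which typically closes much of the gap between training and test error. The synthetic data, degree, and alpha are illustrative choices, not part of the deck.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, estimator in [("unregularized", LinearRegression()),
                        ("ridge (alpha=0.1)", Ridge(alpha=0.1))]:
    # Degree-10 polynomial on 15 training points: easy to overfit without a penalty.
    model = make_pipeline(PolynomialFeatures(degree=10), estimator)
    model.fit(X_tr, y_tr)
    print(f"{name}: "
          f"train MSE={mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
          f"test MSE={mean_squared_error(y_te, model.predict(X_te)):.3f}")
```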

3
Q
What is the biggest risk when using production data without version control?
A) High latency

B) Data leakage

C) Inability to reproduce models

D) Bias amplification

A

C)

4
Q
Why should you monitor models after deployment?
A) To improve UX

B) To detect model drift

C) To boost server efficiency

D) To make models faster

A

B)

5
Q
What does “data drift” refer to?
A) Changes in model architecture

B) Changes in input data distribution

C) Decrease in training time

D) Increase in model complexity

A

B)
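
A hypothetical illustration of the answer, assuming NumPy and SciPy: data drift can be flagged by comparing a feature's training-time distribution with what production is currently sending, for example with a two-sample Kolmogorov-Smirnov test. The synthetic distributions and the 0.01 threshold are made-up examples.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # distribution the model was trained on
prod_feature = rng.normal(loc=0.4, scale=1.2, size=5000)   # shifted distribution arriving in production

result = ks_2samp(train_feature, prod_feature)
drift_suspected = result.pvalue < 0.01  # example threshold, not a universal rule
print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.3g}, "
      f"drift suspected: {drift_suspected}")
```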

6
Q
In Google’s ML rules, why avoid optimizing one part too early?
A) Because it’s computationally expensive

B) Because it introduces latency

C) Because it may not help overall system performance

D) Because it’s harder to document

A

C)

7
Q
What is a primary reason to keep raw data?
A) To reduce storage cost

B) To train faster

C) To debug and retrain better models later

D) To anonymize user data

A

C)

8
Q
What is concept drift?
A) Feature values change

B) Target meaning changes over time

C) Loss function changes

D) Model structure evolves

A

B)

9
Q
Why are ensembles not recommended at first?
A) They are slower

B) They complicate debugging

C) They cost more money

D) They are not accurate

A

B)

10
Q
When evaluating a classification model, which metric shows the balance between precision and recall?
A) AUC

B) Accuracy

C) F1 Score

D) Mean Absolute Error

A

C)
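
A small worked example of the answer, assuming scikit-learn: the F1 score is the harmonic mean of precision and recall, so it only stays high when both are reasonably high.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # 2 true positives, 1 false positive, 2 false negatives

p = precision_score(y_true, y_pred)   # 2 / (2 + 1) = 0.667
r = recall_score(y_true, y_pred)      # 2 / (2 + 2) = 0.5
f1 = f1_score(y_true, y_pred)         # harmonic mean: 2*p*r / (p + r) ≈ 0.571
print(p, r, f1)
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
```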

11
Q
Which of these improves model interpretability?
A) Using deep ensembles

B) Adding more layers

C) Using simple models like linear regression

D) Removing regularization

A

C)

12
Q
Why version your datasets?
A) For faster model deployment

B) For reproducibility

C) To save storage space

D) To anonymize user data

A

B)

13
Q
What is the risk of frequent retraining without validation?
A) Over-regularization

B) Catastrophic forgetting

C) Faster convergence

D) Improved precision

A

B)

14
Q
What happens if you don’t monitor input data?
A) Decreased model speed

B) Silent model decay

C) Decrease in training data

D) Increased learning rate

A

B)

15
Q
What’s a cheap way to boost model performance without changing model code?
A) Feature engineering

B) Increase batch size

C) Reduce regularization

D) Add more dropout

A

A)

16
Q
What does a high-variance model suffer from?
A) Underfitting

B) Overfitting

C) Bias error

D) Slow inference

A

B)

17
Q
Why is it bad to optimize offline metrics only?
A) Training takes longer

B) It may not match real-world performance

C) Higher latency

D) Poor AUC

A

B)

18
Q
What helps in detecting concept drift?
A) Watching confusion matrix changes over time

B) Increasing learning rate

C) Adding regularization

D) Using larger datasets

A

A)

19
Q
When using ensembling, what should you be careful about?
A) Training time

B) Feature selection

C) Inference latency

D) Data splitting

A

C)

20
Q
What is a good first action if model performance drops suddenly?
A) Retrain immediately

B) Inspect data for drift

C) Increase regularization

D) Add features

A

B)

21
Q
Which metric penalizes large errors more in regression?
A) MAE

B) RMSE

C) R²

D) F1 Score

A

B)
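
A quick worked example with NumPy showing why RMSE is the answer: because errors are squared before averaging, a single large error moves RMSE far more than MAE.

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # four small errors, one outlier
mae = np.mean(np.abs(errors))                  # (1+1+1+1+10)/5 = 2.8
rmse = np.sqrt(np.mean(errors ** 2))           # sqrt((1+1+1+1+100)/5) ≈ 4.56
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")
```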

22
Q
Why are simpler models preferred early in development?
A) Because they perform better

B) Because they are faster to debug and iterate on

C) Because they scale better

D) Because they always converge

A

B)

23
Q
What problem can occur with non-stationary data streams?
A) Concept drift

B) High variance

C) Missing labels

D) Low precision

A

A)

24
Q
What kind of latency is most dangerous for ML production systems?
A) Training latency

B) Inference latency

C) Hyperparameter tuning latency

D) Data validation latency

A

B)

25
Q
Which process helps detect whether your model generalizes well?
A) Training on more features

B) Cross-validation

C) Bagging

D) Boosting

A

B)

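A short sketch of the answer, assuming scikit-learn (the dataset and model are illustrative): k-fold cross-validation scores the model on several held-out folds, giving a more reliable picture of generalization than a single split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 held-out folds
print(scores, scores.mean().round(3), scores.std().round(3))
```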
26
Q
What’s a typical cause of data leakage?
A) Poor model selection

B) Including future information in training

C) Hyperparameter search

D) Random noise

A

B)

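A hypothetical sketch of the "future information" failure mode, using pandas with invented column names: with time-ordered data, a shuffled split happily trains on rows from the future, whereas a time-based cutoff keeps training strictly in the past.

```python
import pandas as pd

df = pd.DataFrame({
    "event_time": pd.date_range("2024-01-01", periods=100, freq="D"),
    "feature": range(100),
    "label": [i % 2 for i in range(100)],
})

# Leaky: a random 80% sample mixes past and future rows into the training set.
leaky_train = df.sample(frac=0.8, random_state=0)

# Safer: train only on rows that happened before a cutoff date.
cutoff = pd.Timestamp("2024-03-15")
train = df[df["event_time"] < cutoff]
test = df[df["event_time"] >= cutoff]

print("random split trains on rows up to:", leaky_train["event_time"].max().date())
print("time-based split trains on rows up to:", train["event_time"].max().date())
```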
27
Q
What is the biggest danger of overfitting?
A) Slow training

B) Poor performance on unseen data

C) Slow inference

D) High variance loss

A

B)

28
Q
What can help reduce concept drift’s impact?
A) Never retrain the model

B) Frequent retraining with new data

C) Model compression

D) Reducing feature space

A

B)

29
Q
What is the benefit of raw data storage?
A) Better model inference

B) Ability to re-engineer features

C) Faster cloud deployment

D) Improved bias correction

A

B)

30
Q
When should you alert for model failure?
A) Only at retraining time

B) Only when user complaints occur

C) Immediately upon quality drop

D) Once a month

A

C)

31
Q
In Burkov’s ML system design, when should you optimize model hyperparameters?
A) After building and validating a working pipeline

B) Before collecting data

C) Before data validation

D) Right after defining the problem

A

A)

32
Q
Which is a valid reason to split your dataset into Train, Validation, and Test sets?
A) To maximize training accuracy

B) To avoid data leakage and have honest evaluation

C) To reduce model size

D) To ensure faster training

A

B)

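A minimal sketch of the answer, assuming scikit-learn (the 60/20/20 ratio is an example choice): hold out a test set first, then split the remainder into training and validation; the validation set steers model choices, and the test set is used only once for the final, honest evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Carve out the test set first, then split what remains into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20% of the data
```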
33
Q
According to Google's Rules, what should you monitor after model deployment?
A) Only model outputs

B) Only system latency

C) Both input data and model outputs

D) Only server CPU usage

A

C)

34
Q
According to Google's Rules, when building your first ML model, you should:
A) Use a simple, end-to-end pipeline even if it's bad

B) Train a complex ensemble immediately

C) Focus on hyperparameter tuning first

D) Skip evaluation to save time

A

A)

35
Q
In Burkov’s view, what is a key danger of feature engineering?
A) Increased model training time

B) Hidden data leakage from target variables

C) Decreased inference speed

D) Overfitting due to smaller models

A

B)

36
Q
What is a primary signal of overfitting in a model?
A) Low training error but high test error

B) High training error and high test error

C) Low bias

D) High precision and high recall

A

A)

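A quick check of that signal, assuming scikit-learn (the unconstrained decision tree is just a convenient way to overfit on purpose): a near-perfect training score next to a clearly lower test score is the classic overfitting signature.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # fully grown, no depth limit
print("train accuracy:", tree.score(X_tr, y_tr))  # typically 1.0
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower on unseen data
```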
37
Q
According to Google, what should you version during ML development?
A) Only final models

B) Both data and models

C) Only raw features

D) Only system architecture diagrams

A

B)

38
Q
What is the goal of cross-validation in machine learning?
A) Reduce model training time

B) Make deployment faster

C) Estimate model generalization performance reliably

D) Increase model complexity

A

C)

39
Q
Which feature encoding method is safest from a data leakage perspective?
A) One-hot encoding

B) Mean encoding

C) Hashing trick

D) Normalization

A

A)

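A toy pandas illustration of why one-hot encoding is the safest answer here: it is built from the category values alone, while naive mean (target) encoding folds the target into a feature and leaks unless the category means are fitted on the training fold only. The column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"city": ["a", "a", "b", "b", "c"],
                   "target": [1, 0, 1, 1, 0]})

# One-hot encoding never looks at the target, so it cannot leak it.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Naive mean encoding computed on the full dataset: each row's feature already
# encodes information about its own target value.
df["city_mean_enc"] = df.groupby("city")["target"].transform("mean")

print(one_hot)
print(df)
```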
40
Q
In ML system design (Burkov), after feature engineering and data preparation, your next step is usually:
A) Model monitoring

B) System scaling

C) Model training

D) Model rollback

A

C)