Exam Questions Flashcards
(40 cards)
- What should you prioritize first when building a machine learning system?
A) Feature optimization
B) Deployment strategy
C) Simple end-to-end pipeline
D) Ensemble of complex models
C)
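A minimal sketch of what a simple end-to-end pipeline can look like, using scikit-learn (the synthetic data is a stand-in for real data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One simple, testable path from raw features to predictions
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
print("held-out accuracy:", pipeline.score(X_test, y_test))
```

Once this baseline runs end to end, each stage can be improved in isolation.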
- Which of the following reduces overfitting?
A) Increasing model size
B) Regularization
C) Using more features
D) Reducing training data
B)
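One way to see regularization at work: scikit-learn's Ridge adds an L2 penalty that shrinks coefficients, which curbs overfitting on small, wide data (a sketch on synthetic data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))          # few samples, many features -> overfit-prone
y = X[:, 0] + 0.1 * rng.normal(size=30)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)    # alpha sets the L2 penalty strength

# The regularized model's coefficients are shrunk toward zero
print("plain coef norm:", np.linalg.norm(plain.coef_))
print("ridge coef norm:", np.linalg.norm(ridge.coef_))
```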
- What is the biggest risk when using production data without version control?
A) High latency
B) Data leakage
C) Inability to reproduce models
D) Bias amplification
C)
- Why should you monitor models after deployment?
A) To improve UX
B) To detect model drift
C) To boost server efficiency
D) To make models faster
B)
- What does “data drift” refer to?
A) Changes in model architecture
B) Changes in input data distribution
C) Decrease in training time
D) Increase in model complexity
B)
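A sketch of detecting input-distribution drift with a two-sample Kolmogorov–Smirnov test (using scipy; the arrays below stand in for a training-time feature and its live counterpart):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # distribution seen in training
live_feature = rng.normal(loc=0.5, scale=1.0, size=5000)   # shifted distribution in production

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"possible data drift (KS statistic={stat:.3f}, p={p_value:.1e})")
```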
- In Google’s Rules of Machine Learning, why avoid optimizing one component too early?
A) Because it’s computationally expensive
B) Because it introduces latency
C) Because it may not help overall system performance
D) Because it’s harder to document
C)
- What is a primary reason to keep raw data?
A) To reduce storage cost
B) To train faster
C) To debug and retrain better models later
D) To anonymize user data
C)
- What is concept drift?
A) Feature values change
B) Target meaning changes over time
C) Loss function changes
D) Model structure evolves
B)
- Why are ensembles not recommended at first?
A) They are slower
B) They complicate debugging
C) They cost more money
D) They are not accurate
B)
- When evaluating a classification model, which metric shows balance between precision and recall?
A) AUC
B) Accuracy
C) F1 Score
D) Mean Absolute Error
C)
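F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small sketch computing it from hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)          # 0.80
recall = tp / (tp + fn)             # ~0.67
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```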
- Which of these improves model interpretability?
A) Using deep ensembles
B) Adding more layers
C) Using simple models like linear regression
D) Removing regularization
C)
- Why version your datasets?
A) For faster model deployment
B) For reproducibility
C) To save storage space
D) To anonymize user data
B)
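Dedicated tools (e.g. DVC) handle dataset versioning properly, but even a content hash gives a minimal reproducibility check. A sketch, assuming a local data file (the path is hypothetical):

```python
import hashlib

def dataset_fingerprint(path: str) -> str:
    """Return a SHA-256 hash of the file so a training run can record
    exactly which dataset version it used."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: log this alongside the trained model
# print(dataset_fingerprint("data/train.csv"))
```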
- What is the risk of frequent retraining without validation?
A) Over-regularization
B) Catastrophic forgetting
C) Faster convergence
D) Improved precision
B)
- What happens if you don’t monitor input data?
A) Decreased model speed
B) Silent model decay
C) Decrease in training data
D) Increased learning rate
B)
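A minimal monitoring sketch: compare a live feature's summary statistics against the training baseline and alert on large shifts (the threshold and values here are illustrative):

```python
import numpy as np

def check_feature(live_values, train_mean, train_std, z_threshold=3.0):
    """Flag a feature whose live mean has drifted far from the
    training mean, measured in training standard deviations."""
    live_mean = float(np.mean(live_values))
    z = abs(live_mean - train_mean) / train_std
    if z > z_threshold:
        print(f"ALERT: feature mean shifted by {z:.1f} training std devs")
    return z

# Illustrative: training mean 0.0, std 1.0; live batch centered at 4.0
check_feature(np.random.default_rng(0).normal(4.0, 1.0, 1000), 0.0, 1.0)
```

Without a check like this, the model keeps serving predictions while its inputs quietly change, i.e. silent decay.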
- What’s a cheap way to boost model performance without changing model code?
A) Feature engineering
B) Increase batch size
C) Reduce regularization
D) Add more dropout
A)
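A sketch of cheap feature engineering in pandas (the columns are hypothetical): derived features often lift performance with no change to the model code itself:

```python
import numpy as np
import pandas as pd

# Hypothetical raw transaction data
df = pd.DataFrame({
    "amount": [120.0, 5.0, 300.0],
    "timestamp": pd.to_datetime([
        "2024-01-05 09:00", "2024-01-06 23:30", "2024-01-07 14:00",
    ]),
})

# Derived features: richer inputs, same model code
df["log_amount"] = np.log1p(df["amount"])
df["hour"] = df["timestamp"].dt.hour
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5
print(df)
```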
- What does a high-variance model suffer from?
A) Underfitting
B) Overfitting
C) Bias error
D) Slow inference
B)
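A quick diagnostic sketch: a large gap between training and validation scores signals high variance (synthetic data, scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train:", tree.score(X_train, y_train))  # near 1.0
print("val:  ", tree.score(X_val, y_val))      # noticeably lower -> high variance
```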
- Why is it bad to optimize offline metrics only?
A) Training takes longer
B) It may not match real-world performance
C) Higher latency
D) Poor AUC
B)
- What helps in detecting concept drift?
A) Watching confusion matrix changes over time
B) Increasing learning rate
C) Adding regularization
D) Using larger datasets
A)
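A sketch of the idea: compute the confusion matrix per time window and watch for changes (the labels and predictions below are made up):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels/predictions from two consecutive weeks
y_true_week1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_week1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_true_week2 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_week2 = np.array([0, 0, 0, 1, 1, 0, 1, 0])  # errors creeping in

print("week 1:\n", confusion_matrix(y_true_week1, y_pred_week1))
print("week 2:\n", confusion_matrix(y_true_week2, y_pred_week2))
# A shifting pattern of errors across windows hints at concept drift
```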
- When using ensembling, what should you be careful about?
A) Training time
B) Feature selection
C) Inference latency
D) Data splitting
C)
- What is a good first action if model performance drops suddenly?
A) Retrain immediately
B) Inspect data for drift
C) Increase regularization
D) Add features
B)
- Which metric penalizes large errors more in regression?
A) MAE
B) RMSE
C) R²
D) F1 Score
B)
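A tiny numeric check: with one large error in the batch, RMSE rises much more than MAE, because each error is squared before averaging:

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 10.0])  # one large error

mae = np.mean(np.abs(errors))              # (1+1+1+10)/4 = 3.25
rmse = np.sqrt(np.mean(errors ** 2))       # sqrt((1+1+1+100)/4) ≈ 5.07
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
```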
- Why are simpler models preferred early in development?
A) Because they perform better
B) Because they are faster to debug and iterate on
C) Because they scale better
D) Because they always converge
B)
- What problem can occur with non-stationary data streams?
A) Concept drift
B) High variance
C) Missing labels
D) Low precision
A)
- What kind of latency is most dangerous for ML production systems?
A) Training latency
B) Inference latency
C) Hyperparameter tuning latency
D) Data validation latency
B)
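A minimal sketch of measuring per-request inference latency, the path a production request actually takes (the model here is a trivial stand-in):

```python
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

# Trivial stand-in model
X = np.random.default_rng(0).normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Time single-row predictions, as a serving endpoint would see them
row = X[:1]
times = []
for _ in range(100):
    start = time.perf_counter()
    model.predict(row)
    times.append(time.perf_counter() - start)
print(f"p50={np.percentile(times, 50)*1e3:.3f} ms  "
      f"p99={np.percentile(times, 99)*1e3:.3f} ms")
```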