MLE Flashcards
What are the two types of Quantization in TFX?
Post Training Quantization and Quantization Aware Training.
What is Post Training Quantization?
Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy. You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter. It reduces model size through integer quantization, reduced float precision (such as float16), or dynamic range quantization.
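A minimal sketch of post-training dynamic range quantization with the TensorFlow Lite Converter; the SavedModel path is a placeholder:

    import tensorflow as tf

    # Convert an already-trained SavedModel to TensorFlow Lite (path is illustrative).
    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")

    # Optimize.DEFAULT enables dynamic range quantization of the weights.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)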
What is Quantization Aware Training?
Training that simulates reduced bit-precision weights during training, so the model can later be quantized to a smaller size with minimal accuracy loss. It is more accurate than post-training quantization, but it is harder to use and requires full retraining.
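A minimal quantization-aware training sketch using the TensorFlow Model Optimization Toolkit; the tiny model and random data are stand-ins just to make it runnable:

    import numpy as np
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    # Stand-in Keras model and data for illustration only.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    x_train = np.random.rand(64, 8).astype("float32")
    y_train = np.random.randint(0, 3, size=(64,))

    # quantize_model rewrites the model so training simulates low-precision
    # weights and activations (quantization-aware training).
    q_aware_model = tfmot.quantization.keras.quantize_model(model)
    q_aware_model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
    q_aware_model.fit(x_train, y_train, epochs=1)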
What are the four available options for model training on Vertex AI?
1.) AutoML
2.) Custom Training
3.) Model Garden
4.) Generative AI
What is AutoML? When is it recommended?
AutoML is a no-code solution for training models on tabular, image, text, or video data without preparing data splits. It is best for:
- Automatically tuning a model with some input data.
- Teams that have little to no coding experience.
- Teams that want to quickly get a model running.
- Teams that do not want control of hyperparameter tuning aside from early stopping.
- Teams that are solving a problem within the defined problem types offered.
- Models served on an edge device or on Google Cloud.
- Models that can tolerate serving latency greater than 100 ms.
What is BigQueryML? When is it recommended?
BigQuery ML is Google's built-in set of SQL statements for creating and running ML models directly in BigQuery. It is recommended for:
- Those comfortable in SQL.
- Those with data already in BigQuery.
- Those whose problems are covered by BigQuery ML's supported model types.
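A minimal sketch of training a BigQuery ML model from Python; the dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    # CREATE MODEL trains a logistic regression model directly in BigQuery.
    client.query("""
        CREATE OR REPLACE MODEL `my_dataset.churn_model`
        OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
        SELECT * FROM `my_dataset.customer_training_data`
    """).result()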
What is custom training? When is it recommended?
Custom training is complete freedom to optimize all aspects of an ML pipeline. It is recommended for:
- Problems outside the scope of BQML and AutoML.
- Problems with existing training code written on-premises or on another platform.
What are the three custom training methods on Vertex AI?
1.) Custom Jobs
2.) Hyper-parameter tuning jobs
3.) Training pipelines
What are custom training custom jobs?
The basic way to run custom training code on Vertex AI. It needs a pre-built or custom container to run in.
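A minimal custom job sketch with the Vertex AI SDK for Python; the project, bucket, and container image URI are placeholders:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-staging-bucket")

    # Package a local task.py to run in a pre-built training container.
    job = aiplatform.CustomJob.from_local_script(
        display_name="example-custom-job",
        script_path="task.py",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
        machine_type="n1-standard-4",
        replica_count=1,
    )
    job.run()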
What are custom training hyperparameter tuning jobs?
This runs multiple trials of custom jobs to tune hyperparameters. It requires: a metric to evaluate performance against, a maximum number of trials to perform, a maximum number of parallel trials, the maximum number of trials that can fail, the machine type and any accelerators (GPUs/TPUs) it uses, and the pre-built or custom container to run in.
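A hedged sketch of a hyperparameter tuning job with the Vertex AI SDK for Python; the metric name, parameter range, and container URI are illustrative, and the training code is expected to report the metric for each trial:

    from google.cloud import aiplatform
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-staging-bucket")

    # The custom job that each trial will run (same placeholders as above).
    trial_job = aiplatform.CustomJob.from_local_script(
        display_name="trial-job",
        script_path="task.py",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
        machine_type="n1-standard-4",
    )

    # Metric, search space, trial counts, parallelism, and failure budget.
    hp_job = aiplatform.HyperparameterTuningJob(
        display_name="example-hp-tuning-job",
        custom_job=trial_job,
        metric_spec={"accuracy": "maximize"},
        parameter_spec={
            "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        },
        max_trial_count=20,
        parallel_trial_count=4,
        max_failed_trial_count=2,
    )
    hp_job.run()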
What is a custom training training pipeline?
A training pipeline can run a custom job or hyperparameter tuning job and outputs your model artifacts to a Cloud Storage bucket.
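A minimal training pipeline sketch using aiplatform.CustomTrainingJob, which runs the training script and registers the resulting artifacts as a model; the URIs and names are placeholders:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-staging-bucket")

    # The training pipeline runs task.py, then uploads the model artifacts
    # written to Cloud Storage as a Vertex AI Model resource.
    pipeline = aiplatform.CustomTrainingJob(
        display_name="example-training-pipeline",
        script_path="task.py",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
        model_serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"),
    )
    model = pipeline.run(
        model_display_name="example-model",
        machine_type="n1-standard-4",
    )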
What frameworks have pre-built containers for training?
TensorFlow, XGBoost, scikit-learn, PyTorch
What is model garden?
Model Garden in the Google Cloud console is an ML model library that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. Many of these are pretrained and allow fine-tuning/transfer learning to customize them to your own use case.
What is fine tuning and transfer learning? When should one be used over the other?
Transfer learning is the process of retraining only the final layers of a pre-trained model. Fine-tuning is an extension of transfer learning that also retrains the weights of earlier layers. Fine-tuning is recommended for larger datasets, while transfer learning suits smaller ones, since fine-tuning on a small dataset is more likely to overfit.
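A Keras sketch contrasting the two approaches, using a pretrained MobileNetV2 base; the head size and dataset are assumptions:

    import tensorflow as tf

    # Transfer learning: freeze the pretrained base and train only a new head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(5, activation="softmax"),  # 5 classes is illustrative
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(train_ds, epochs=5)   # train_ds is an assumed tf.data.Dataset

    # Fine-tuning: also unfreeze the base and retrain with a low learning rate
    # (best with a larger dataset, since this is more likely to overfit).
    base.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy")
    # model.fit(train_ds, epochs=5)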
What is the AutoML workflow?
1.) Prepare your training data.
2.) Create a dataset.
3.) Train a model.
4.) Evaluate and iterate on your model.
5.) Get predictions from your model.
6.) Interpret prediction results.
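A hedged sketch of those steps with the Vertex AI SDK for Python, using AutoML tabular classification; the bucket, CSV, and column names are placeholders:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # Steps 1-2: prepare training data and create a managed dataset.
    dataset = aiplatform.TabularDataset.create(
        display_name="example-dataset",
        gcs_source="gs://my-bucket/training_data.csv",
    )

    # Step 3: train an AutoML tabular classification model.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="example-automl-job",
        optimization_prediction_type="classification",
    )
    model = job.run(
        dataset=dataset,
        target_column="label",
        budget_milli_node_hours=1000,
    )

    # Steps 4-6: evaluate the model, then deploy it and get predictions.
    endpoint = model.deploy(machine_type="n1-standard-4")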
What are best practices for ML Environment Setup in custom training?
1.) Use Vertex AI Workbench notebooks for experimentation and development.
2.) Create notebook instances for each team member.
3.) Store ML resources and artifacts the same way as datasets, managing access with IAM permissions.
4.) Use Vertex AI SDK for Python.
What are best practices for ML Development in custom training?
1.) Store structured and semi-structured data in BigQuery.
2.) Store image, video, audio and unstructured data on Cloud Storage.
3.) Use Vertex AI Data Labeling for unstructured data.
4.) Use Vertex AI Feature Store with structured data.
5.) Avoid storing data in block storage.
6.) Use Vertex AI TensorBoard and Vertex AI Experiments for analyzing experiments (see the sketch after this list).
7.) Train a model within a notebook instance for small datasets.
8.) Maximize your model’s predictive accuracy with hyperparameter tuning.
9.) Use feature attributions (importances) to gain insights into model predictions.
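A hedged sketch of tracking runs with Vertex AI Experiments via the Vertex AI SDK for Python; the experiment name, parameters, and metric values are placeholders:

    from google.cloud import aiplatform

    # Initialize against an experiment; a Vertex AI TensorBoard instance can
    # be attached to the experiment for richer visualizations.
    aiplatform.init(project="my-project", location="us-central1",
                    experiment="example-experiment")

    aiplatform.start_run("run-1")
    aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64})
    aiplatform.log_metrics({"accuracy": 0.93, "loss": 0.21})
    aiplatform.end_run()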
What are best practices for Data Processing in custom training?
1.) Use BigQuery to process structured and semi-structured data or if data is in BQ already.
2.) Use Dataflow to process data.
3.) Use Dataproc for serverless Spark data processing.
What are best practices for operationalized training in custom training?
1.) Run code in a managed service like Vertex AI training (container-based solutions with a task.py file) or Vertex AI Pipelines.
2.) Operationalize job execution with training pipelines.
3.) Use training checkpoints to save the current state of your experiment (see the sketch after this list).
4.) Prepare model artifacts for serving in Cloud Storage.
5.) Regularly compute new feature values and push them to feature store.
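A minimal Keras checkpointing sketch for the checkpoint best practice above; the bucket path and toy model are placeholders:

    import tensorflow as tf

    # Toy model just to make the sketch self-contained.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")

    # Write checkpoints to Cloud Storage so an interrupted training job can
    # resume from the latest saved state.
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
        filepath="gs://my-bucket/checkpoints/ckpt-{epoch:02d}",
        save_weights_only=True,
    )
    # model.fit(x_train, y_train, epochs=10, callbacks=[checkpoint_cb])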
What is operationalized training?
Operationalized training refers to the process of making model training repeatable, tracking repetitions, and managing performance.
What is Dataproc?
A managed Apache Spark/Hadoop service that allows batch processing, querying, streaming and ML.
What is Dataflow?
Dataflow is a serverless service built on Apache Beam for setting up automated data processing pipelines. It can be used with TFX and Kubeflow Pipelines, both of which have integrated Dataflow runners; since Vertex AI Pipelines supports both, it can be used there as well.
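A minimal Apache Beam pipeline sketch; with the DataflowRunner option it runs as a managed Dataflow job (project, bucket, and paths are placeholders):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/temp",
    )

    # Read text files, apply a simple transform, and write the results.
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/*.csv")
         | "ToUpper" >> beam.Map(str.upper)
         | "Write" >> beam.io.WriteToText("gs://my-bucket/processed/out"))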
What is Vertex AI TensorBoard?
A tool for measuring and visualizing aspects of a TF ML workflow.
What is a Vertex AI Managed Dataset?
Vertex AI offers a central repository for datasets that can be used for both AutoML and custom models on Vertex AI. It accepts image, tabular, text, and video data.