8. Model Training and Hyperparameter Tuning Flashcards

1
Q

What does the Google Cloud analytics portfolio include?

A

Collect: Pub/Sub, Datastream, Data Transfer Service
Process: Dataflow, Dataproc, Data Fusion, Composer, Dataprep
Store: Cloud SQL, Spanner, Bigtable, Firestore, Memorystore
Analyze: BigQuery, BI Engine, BigQuery ML, Data QnA, Cloud Storage, multicloud
Activate: Vertex AI, Looker, third-party BI tools

2
Q

What is Pub/Sub?

A

Pub/Sub is a serverless, scalable service for messaging and real-time analytics. You can stream data from third-party sources directly into BigQuery using Pub/Sub.

3
Q

What is Datastream?

A

Datastream is a serverless and easy‐to‐use change data capture (CDC) and replication service.
It allows you to synchronize data across heterogeneous databases and applications with minimal latency and downtime.
Datastream supports streaming from Oracle and MySQL databases into Cloud Storage.
Datastream is integrated with Dataflow, and it leverages Dataflow templates to load data into BigQuery, Cloud Spanner, and Cloud SQL.

4
Q

What is BigQuery Data Transfer Service?

A

You can load data into BigQuery from the following sources:
Data warehouses such as Teradata and Amazon Redshift
External cloud storage providers such as Amazon S3
Google software-as-a-service (SaaS) apps such as Google Ads
Cloud Storage

5
Q

What is Cloud Dataflow?

A

Cloud Dataflow is a serverless, fully managed data processing (ETL) service for both streaming and batch data. Dataflow pipelines are built with Apache Beam.
It allows you to build pipelines, monitor their execution, and transform and analyze data.
It reads data from sources, processes it, and writes the results to sinks (Google Cloud data services).

6
Q

What is Cloud Data Fusion?

A

It is a UI-based, no-code ETL tool.

7
Q

What is Cloud Dataproc?

A

Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open‐source tools and frameworks.
Dataproc lets you do batch processing, querying, streaming, and machine learning.
Dataproc automation helps you create clusters quickly, manage them easily, and turn them off when not in use.

8
Q

What integrations does Dataproc have with Google Cloud Platform?

A

BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging, and Cloud Monitoring.
They provide a complete data platform. You can use Dataproc to do ETL.
Dataproc uses the Hadoop Distributed File System (HDFS) for storage.

9
Q

What is Cloud Composer?

A

It is a managed data workflow orchestration service that lets you author, schedule, and monitor pipelines.
It is built on Apache Airflow and pipelines are configured as directed acyclic graphs.
It supports hybrid and multicloud architecture.
It provides end-to-end integration with Google Cloud products.

9
Q

What are the Dataproc connectors?

A

Cloud Storage connector: Run Apache Hadoop or Apache Spark jobs directly on data in Cloud Storage.
BigQuery connector: Enable Spark and Hadoop applications to process data from BigQuery and write data to BigQuery.
BigQuery Spark connector: Support reading from and writing to BigQuery with Spark DataFrames.
Cloud Bigtable with Dataproc: Use Bigtable as a data store for Dataproc jobs.
Pub/Sub Lite Spark connector: Support Pub/Sub Lite as an input source for Apache Spark Structured Streaming.

10
Q

What is Cloud Dataprep?

A

It is a UI-based ETL tool for preparing structured and unstructured data for analysis, reporting, and machine learning.

11
Q

What are different types of data processing tools best used for?

A

Dataflow: Unified streaming and batch workloads that need customization
Data Fusion: Managed batch and real-time pipelines from hybrid sources
Dataproc: Lift-and-shift Hadoop workloads from on-premises
Dataprep: Ad hoc analytics

12
Q

What is data storage guidance on GCP for machine learning?

A

Tabular data: BigQuery, BigQuery ML
Image, video, audio, unstructured data: Cloud Storage
Unstructured data: Vertex Data Labeling
Structured data: Vertex AI Feature Store
For AutoML image, video, text: Vertex AI Managed Datasets

13
Q

You should not store data in …

A

Block storage (e.g., Network File System shares) or VM disks. Also avoid reading data directly from databases such as Cloud SQL.

14
Q

When should you store data as sharded TFRecord files and Avro files?

A

Use sharded TFRecord files for TensorFlow and Avro files for other frameworks.

14
Q

How can you improve read and write throughput to Cloud Storage for image, video, audio, and other unstructured data?

A

Combine individual files into larger files of at least 100 MB, using between 100 and 10,000 shards.
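As a rough illustration, the shard count for a set of small files could be planned like this (a minimal sketch; plan_shards is a hypothetical helper, not a Google API):

```python
def plan_shards(file_sizes_bytes, target_shard_mb=100, max_shards=10_000):
    # Combine small files so each output shard is roughly
    # target_shard_mb in size, capping the shard count at
    # max_shards per the 100-10,000 shard guidance.
    total_bytes = sum(file_sizes_bytes)
    target_bytes = target_shard_mb * 1024 * 1024
    n_shards = max(1, total_bytes // target_bytes)
    return int(min(n_shards, max_shards))

# 50,000 files of 1 MB each -> 50,000 MB total -> 500 shards of ~100 MB.
n = plan_shards([1024 * 1024] * 50_000)
```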

15
Q

What is TensorFlow I/O?

A

TensorFlow I/O is an extension library that lets TensorFlow read and manage data in formats such as Parquet during training.

16
Q

What is Vertex AI Workbench?

A

You can create Jupyter notebooks to train, tune, and deploy models using Vertex AI Workbench.

17
Q

What is a user-managed notebook?

A

You have more control but fewer features:
Custom containers
One framework at a time (from all supported frameworks)
VPC plus other networking and security features

18
Q

What is a managed notebook?

A

It comes with more features:
Automatic shutdown
UI integration with Cloud Storage and BigQuery
Automated run
Custom container
Dataproc or Serverless Spark integration
All frameworks preinstalled
VPC support

19
Q

Why don’t you need powerful hardware to develop code in JupyterLab?

A

You perform training and prediction using the Vertex AI Training and Prediction SDKs. The APIs and SDKs create a training container outside the JupyterLab environment; they also create a prediction container and host it as an endpoint.

20
Q

What are the two types of trainings supported by Vertex AI?

A

AutoML: Minimal technical effort
Custom training: complete control

21
Q

What are the options to create training jobs in Vertex AI?

A

Training pipelines: The primary training workflow, used to create an AutoML model or a custom model.
Custom jobs: Specify how Vertex AI runs your training code (worker pools, machine types, settings, containers).
Hyperparameter tuning jobs: Search for the best combination of hyperparameter values for your model.

22
Q

What frameworks does Vertex AI support for prebuilt container training?

A

PyTorch, TensorFlow, scikit-learn, and XGBoost

23
Q

What are the two types of Vertex AI training datasets?

A

Non-managed datasets: data read directly from Cloud Storage or BigQuery
Managed datasets

24
Q

What are the advantages of managed datasets?

A

Central location
Easily create labels and multiple annotation sets
Create tasks for human labeling
Track lineage
Compare model performance (AutoML vs custom models)
Generate statistics and visualizations
Automatically split data into different sets

25
Q

What are the steps to set up a training with pre-built container?

A

Put setup.py (which specifies your dependencies) in the root directory and task.py in the training folder.
Upload your training code as a Python source distribution to a Cloud Storage bucket before training.
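A minimal source layout and setup.py for pre-built container training might look like the following (a packaging sketch; the package and folder names are placeholders):

```
your-project/
├── setup.py
└── trainer/
    ├── __init__.py
    └── task.py   # training entry point
```

```python
# setup.py -- packaging config; packages listed in install_requires are
# installed into the pre-built training container before task.py runs.
from setuptools import find_packages, setup

setup(
    name="trainer",               # placeholder package name
    version="0.1",
    packages=find_packages(),
    install_requires=["pandas"],  # example extra dependency
)
```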

26
Q

What are custom containers in Vertex AI?

A

It is a Docker image that you create to run your training application.

26
Q

What are the advantages of custom containers in Vertex AI?

A

Faster start-up time (dependencies pre-installed)
Use the ML framework and version of your choice
Extended support for distributed training

27
Q

How do you create a custom container?

A

Create a custom container and training file
Build and run your Docker container
Push the container image to Artifact Registry
Start training by creating a custom job
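The steps above could start from a minimal Dockerfile like this (a sketch; the base image, paths, and module name are placeholders):

```dockerfile
# Base image with the ML framework preinstalled (placeholder tag).
FROM pytorch/pytorch:latest

WORKDIR /app
COPY trainer/ /app/trainer/

# Vertex AI runs this entry point when the custom job starts.
ENTRYPOINT ["python", "-m", "trainer.task"]
```

You would then build the image (docker build), push it to Artifact Registry (docker push), and reference the image URI when creating the custom job.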

28
Q

What is the structure of distributed training?

A

Cluster: Vertex AI allocates a cluster of machines based on your specifications.
Replica: Each running job on a node is called a replica.
Worker Pool: A group of replicas with the same configuration is called a worker pool.
Roles: Each replica has a role, like Primary (manages others) or Worker (performs training).

29
Q

What are worker pool tasks in distributed training?

A

workerPoolSpecs[0]: Primary replica; manages the others and reports status for the job
workerPoolSpecs[1]: Workers; each does its portion of the work
workerPoolSpecs[2]: Parameter servers store model parameters to coordinate shared model state between workers; Reduction Server increases throughput and reduces latency
workerPoolSpecs[3]: Evaluators evaluate your model
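A custom job's worker pool configuration can be sketched as a Python structure mirroring the workerPoolSpecs list (field names follow the Vertex AI API; the machine types and image URI are placeholder values):

```python
# Index 0 = primary replica, 1 = workers, 2 = parameter servers /
# Reduction Server, 3 = evaluators. Values are illustrative placeholders.
worker_pool_specs = [
    {   # workerPoolSpecs[0]: primary replica
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
    {   # workerPoolSpecs[1]: workers
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 3,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    },
]
```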

30
Q

Why are hyperparameters important?

A

Hyperparameters heavily influence the behaviour of the learned model.

30
Q

What is a hyperparameter?

A

Hyperparameters are parameters of the training algorithm that are not learned directly from the training process.

31
Q

What are the three commonly used search algorithm options?

A

Grid search (slow): Exhaustively searches a manually specified set of hyperparameter values
Random search (doesn’t use prior experiments): Randomly samples from the set of combinations
Bayesian search (uses past evaluations): Uses Gaussian Process Bandits
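The difference between grid and random search can be sketched in plain Python (a toy sketch; objective is a stand-in for a real validation metric, lower is better):

```python
import itertools
import random

def objective(lr, batch_size):
    # Toy stand-in for a validation loss; minimized at lr=0.01, batch=64.
    return (lr - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}

# Grid search: evaluate every combination (3 x 3 = 9 trials here).
grid_trials = [dict(zip(space, combo))
               for combo in itertools.product(*space.values())]
best_grid = min(grid_trials, key=lambda t: objective(**t))

# Random search: sample a fixed budget of combinations independently.
random.seed(0)
rand_trials = [{k: random.choice(v) for k, v in space.items()}
               for _ in range(5)]
best_rand = min(rand_trials, key=lambda t: objective(**t))
```

Bayesian optimization differs by fitting a surrogate model to past trials and using it to pick the next point to evaluate, which is what the Gaussian Process Bandits approach does.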

32
Q

How can you speed up hyperparameter optimization?

A

Use a simple validation set instead of cross-validation if you have a large dataset.
Use distributed training
Pre-compute or cache the results of computations
Decrease the number of hyperparameters for grid search

33
Q

How does hyperparameter tuning work?

A

It works by running multiple trials of your training application with different values for the hyperparameters you specify.

34
Q

What are the steps to configure a hyperparameter tuning job using CLI with custom jobs?

A

For a custom container, install the cloudml-hypertune Python package in your Dockerfile.
Add hyperparameter tuning code to the task.py file
Build and push the container to Artifact Registry
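Vertex AI passes each trial's hyperparameter values to your training code as command-line flags, so task.py typically parses them like this (a sketch; the flag names are placeholders that must match the parameterId values in your tuning config):

```python
# task.py sketch: parse the per-trial hyperparameter flags.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--momentum", type=float, default=0.9)
    return parser.parse_args(argv)

# Simulate the flags a single trial might receive.
args = parse_args(["--learning_rate", "0.001", "--momentum", "0.95"])

# After training, you would report the trial's metric back to the
# tuning service, e.g. via the cloudml-hypertune package's
# report_hyperparameter_tuning_metric(...) call (not shown here).
```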

35
Q

What are the steps to create a hyperparameter tuning job?

A

Create a YAML file specifying a set of hyperparameters
Run a shell command to create a custom job to start tuning
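A minimal tuning config might look like this (a sketch; the metric, parameter names, and image URI are placeholders):

```yaml
# config.yaml -- hypothetical study spec for a hyperparameter tuning job
studySpec:
  metrics:
  - metricId: accuracy
    goal: MAXIMIZE
  parameters:
  - parameterId: learning_rate
    doubleValueSpec:
      minValue: 0.0001
      maxValue: 0.1
    scaleType: UNIT_LOG_SCALE
trialJobSpec:
  workerPoolSpecs:
  - machineSpec:
      machineType: n1-standard-4
    replicaCount: 1
    containerSpec:
      imageUri: gcr.io/my-project/trainer:latest
```

It would then be launched with something like gcloud ai hp-tuning-jobs create --region=us-central1 --display-name=my-tuning-job --config=config.yaml --max-trial-count=20 --parallel-trial-count=5.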

36
Q

What are the use cases of Vizier?

A

Optimize hyperparameters for a neural network
Optimize the usability of an application
Minimize computing resources for a job
Optimize the amounts of ingredients in a recipe

37
Q

What do you use to debug problems with your training code or Vertex AI configuration?

A

Use an interactive shell to run tracking and profiling tools, analyze GPU usage, and check GCP permissions available in the container.

37
Q

What is Vertex AI Vizier?

A

It is a black-box optimization service that helps you tune hyperparameters.

37
Q

What are the criteria to use Vizier to train ML models?

A

Use it when you meet one of the following criteria:
You don’t have a known objective function to evaluate.
It is too costly to evaluate the objective function.
(Vizier can also perform other optimization tasks, e.g., tuning model parameters.)

38
Q

Can Vizier be used for ML and non-ML cases?

A

Yes

39
Q

What does custom training use to tune its hyperparameters?

A

Vertex AI Vizier is a built-in feature for hyperparameter tuning for custom training.

40
Q

What tools are available for tracking or profiling training metrics?

A

py-spy: visualize where your Python program spends its time
nvidia-smi and nvprof: monitor GPU utilization and collect GPU profiling information
perf: analyze the performance of your training node (Linux profiling)

41
Q

What is Vertex AI TensorBoard?

A

It helps you monitor and optimize model training performance by reporting metrics and the resource consumption of training operations.

42
Q

What is What-If Tool?

A

Inspect AI Platform prediction models through an interactive dashboard.

43
Q

What is data drift?

A

It is a change in the statistical distribution of production data from the baseline data used to train or build the model.
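A crude way to flag drift is to compare a summary statistic of production data against the training baseline (a toy sketch; production monitoring typically uses distribution-distance measures rather than a simple mean shift):

```python
import statistics

def mean_shift_drift(baseline, production, threshold=0.5):
    # Flag drift when the production mean moves more than `threshold`
    # baseline standard deviations away from the baseline mean.
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    shift = abs(statistics.mean(production) - mu) / sigma
    return shift > threshold

baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
stable = [10.2, 9.8, 10.1, 10.4, 9.9]    # similar distribution
shifted = [14.0, 15.0, 13.5, 14.5, 15.5]  # clearly drifted
```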

44
Q

What is concept drift?

A

The statistical properties of the target variable change over time.

44
Q

What tool do you use to monitor drift?

A

Vertex AI Model Monitoring

45
Q

What are the retraining strategies?

A

Periodic training
Performance-based trigger
Data changes trigger
Retraining on demand

46
Q

What are the ways of testing for model training and serving?

A

Unit tests: Check model output shape and output ranges, verify that loss decreases after a gradient step, make assertions about the data, and check for label leakage.
Test updates in API calls: Test the retraining API call.
Tests for algorithm correctness: Train for a few iterations and verify that loss decreases; train without regularization (loss should be close to 0); test specific subcomputations of your algorithm.
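The "loss decreases after a gradient step" check can be illustrated with a tiny hand-rolled example (a sketch using 1-D linear regression; no ML framework assumed):

```python
# Unit-test style check: one gradient-descent step on a toy 1-D linear
# model should reduce the mean-squared-error loss.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true relationship: y = 2x

def loss(w):
    # Mean squared error of the model y_hat = w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    # Derivative of the MSE loss with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w0 = 0.0                    # deliberately bad starting weight
w1 = w0 - 0.01 * grad(w0)   # one gradient step

assert loss(w1) < loss(w0), "loss should decrease after a gradient step"
```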