SageMaker Flashcards

Question 1

Q

What is the image_uris module in the SageMaker Python SDK?

Answer

A

A set of functions for generating Amazon ECR image URIs for pre-built SageMaker Docker images.
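
A minimal sketch of how the module is typically called, assuming the SageMaker Python SDK v2 is installed (`pip install sagemaker`). The helper names are hypothetical, and the framework, version, and Python version values are illustrative assumptions rather than values from these cards.

```python
def pytorch_training_image_args(region: str, instance_type: str) -> dict:
    """Build keyword arguments for sagemaker.image_uris.retrieve.

    The framework, version, and py_version values are illustrative assumptions.
    """
    return {
        "framework": "pytorch",
        "region": region,
        "version": "1.13.1",       # assumed framework version
        "py_version": "py39",      # assumed Python version
        "instance_type": instance_type,
        "image_scope": "training",
    }


def lookup_image_uri(region: str, instance_type: str) -> str:
    """Return the ECR image URI for the arguments above."""
    # Imported lazily so the argument helper works without the SDK installed.
    from sagemaker import image_uris

    return image_uris.retrieve(**pytorch_training_image_args(region, instance_type))


# Usage (requires the sagemaker package):
# print(lookup_image_uri("us-west-2", "ml.m5.xlarge"))
```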

Question 2

Q

Why should one use the image_uris module in SageMaker?

Answer

A

It returns the Amazon ECR URI of a pre-built, optimized Docker image for an ML framework (e.g., PyTorch, TensorFlow) or built-in algorithm, so you don't have to look up registry accounts and image tags by hand. This keeps setup simple and environments consistent across jobs.

Question 3

Q

What happens when one changes the instance_type input for image_uris?

Answer

A

The retrieved container image may change (for example, from a CPU build to a GPU build) so that it carries the correct dependencies and performance optimizations for the chosen hardware.
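
One way to see this effect is to inspect the tag of the returned URI. The sketch below assumes the common convention that framework image tags contain a `cpu` or `gpu` token. The helper `processor_from_uri` is hypothetical, and the two URIs are made-up examples with a placeholder account ID, not real SDK output.

```python
def processor_from_uri(image_uri: str) -> str:
    """Return 'gpu' or 'cpu' if that token appears in the image tag, else 'unknown'."""
    tag = image_uri.rsplit(":", 1)[-1]   # text after the last colon is the tag
    tokens = tag.split("-")
    for processor in ("gpu", "cpu"):
        if processor in tokens:
            return processor
    return "unknown"


# Made-up URIs in the shape a CPU and a GPU instance type might yield (assumption).
CPU_URI = "123456789012.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.13.1-cpu-py39"
GPU_URI = "123456789012.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.13.1-gpu-py39"

print(processor_from_uri(CPU_URI))  # cpu
print(processor_from_uri(GPU_URI))  # gpu
```

With the SDK installed, the same helper could be applied to the URIs retrieved for a CPU instance type such as ml.m5.xlarge and a GPU instance type such as ml.p3.2xlarge.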

Question 4

Q

What is one benefit of using SageMaker over traditional VM setup?

Answer

A

SageMaker automates VM provisioning, so you don’t need to manually launch and configure instances.

Question 5

Q

How does SageMaker handle ML environments?

Answer

A

It provides pre-configured Docker images with the correct versions of Python, TensorFlow, PyTorch, and other ML frameworks.

Question 6

Q

How does SageMaker optimize for different hardware?

Answer

A

The SDK's image_uris module selects the appropriate image based on instance_type, so CPU and GPU instances each get a build optimized for that hardware.

Question 7

Q

What makes SageMaker easier for distributed training?

Answer

A

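It provisions and networks the training cluster for you (you set the instance count and a distribution option), eliminating the need for Kubernetes or manual cluster setup. The training script itself must still support distributed execution.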
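
A hedged sketch of what requesting a multi-instance job can look like with the SageMaker Python SDK v2 `PyTorch` estimator. The script name `train.py`, the instance type, the versions, and the helper names are assumptions for illustration, and the `distribution` options that are accepted depend on the SDK and framework version.

```python
def distributed_estimator_args(role_arn: str, instance_count: int) -> dict:
    """Build keyword arguments for sagemaker.pytorch.PyTorch (illustrative values)."""
    return {
        "entry_point": "train.py",            # hypothetical training script
        "role": role_arn,                      # IAM role the job runs under
        "instance_count": instance_count,      # more than 1 requests a cluster
        "instance_type": "ml.p3.16xlarge",     # assumed GPU instance type
        "framework_version": "1.13.1",         # assumed framework version
        "py_version": "py39",                  # assumed Python version
        # One documented option: SageMaker's data-parallel library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


def launch_training(role_arn: str, instance_count: int, train_s3_uri: str) -> None:
    """Start the training job; SageMaker provisions the instances."""
    # Imported lazily so the argument helper works without the SDK installed.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**distributed_estimator_args(role_arn, instance_count))
    estimator.fit({"training": train_s3_uri})


# Usage (requires the sagemaker package, AWS credentials, and a real role ARN):
# launch_training("arn:aws:iam::123456789012:role/ExampleRole", 2, "s3://example-bucket/train")
```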

Question 8

Q

How does SageMaker integrate with AWS services?

Answer

A

It seamlessly connects with S3 (storage), CloudWatch (logs), IAM (security), and Step Functions (pipelines).

Question 9

Q

Why is SageMaker useful for data scientists?

Answer

A

It abstracts infrastructure management, allowing data scientists to focus on ML instead of DevOps.

Question 10

Q

What factors should be considered when distributing ML computation?

Answer

A

You need to consider the ML task, software libraries supporting distributed computation, and available compute resources.

Question 11

Q

Why doesn’t distributed computation always lead to a linear efficiency gain?

Answer

A

Bottlenecks in data I/O and inter-GPU communication overhead limit the speedup, and numerical changes in model training (e.g., a larger effective batch size that requires retuning the learning rate) can affect accuracy.
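
The sub-linear speedup can be illustrated with Amdahl's law, a standard model that is not named in these cards. It treats I/O and communication as a fixed serial fraction of the work. The second function shows the linear learning-rate scaling heuristic, a common rule of thumb, not a guarantee. All numbers are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def linearly_scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Scale the learning rate in proportion to the global batch size (heuristic)."""
    return base_lr * global_batch / base_batch


# With 90% of the work parallelizable, 8 workers give about 4.7x, not 8x.
for workers in (1, 2, 4, 8):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```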

(11 cards)