SageMaker Flashcards

(11 cards)

1
Q

What is the image_uris module in the SageMaker Python SDK?

A

A set of functions for generating Amazon ECR image URIs for pre-built SageMaker Docker images.
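
As a rough illustration of the answer above, the sketch below wraps a call to image_uris.retrieve and splits the resulting ECR URI into its parts. The helper names (pytorch_training_image, split_ecr_uri), the framework version, the Python version, and the account ID in the usage note are all assumptions chosen for the example; the lookup itself needs the sagemaker package installed.

```python
def pytorch_training_image(region: str, instance_type: str) -> str:
    """Look up a pre-built PyTorch training image URI (hypothetical helper)."""
    # Imported lazily so this file still loads where the SDK is not installed.
    from sagemaker import image_uris

    return image_uris.retrieve(
        framework="pytorch",
        region=region,
        version="1.13.1",       # assumed version, pick one your SDK supports
        py_version="py39",      # assumed Python version for that release
        instance_type=instance_type,
        image_scope="training",
    )


def split_ecr_uri(uri: str) -> dict:
    """Split '<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>' into parts."""
    registry, _, rest = uri.partition("/")
    repository, _, tag = rest.partition(":")
    fields = registry.split(".")
    return {
        "account": fields[0],
        "region": fields[3],
        "repository": repository,
        "tag": tag,
    }
```

For example, split_ecr_uri("123456789012.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.13.1-gpu-py39") separates the registry account, region, repository, and tag; the account ID here is a placeholder, not a real SageMaker registry.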

2
Q

Why should one use the image_uris module in SageMaker?

A

It returns the URIs of pre-built, optimized Docker images for ML frameworks (e.g., PyTorch, TensorFlow) and built-in algorithms, so you get a consistent environment that integrates with AWS without building or configuring containers yourself.

3
Q

What happens when one changes the instance_type argument passed to image_uris?

A

The retrieved container image may change to ensure the correct dependencies and performance optimizations for the chosen hardware.
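
One way to picture why the image can change: the instance family hints at whether a CPU or GPU build of the container is needed. The sketch below is a deliberately simplified stand-in, not the SDK's actual selection logic. The function name processor_for and the rule "families starting with p or g are GPU" are assumptions for illustration, and the rule ignores other accelerators such as Inferentia or Trainium.

```python
GPU_FAMILY_PREFIXES = ("p", "g")  # assumption: ml.p* and ml.g* are GPU families


def processor_for(instance_type: str) -> str:
    """Guess 'cpu' or 'gpu' from a name like 'ml.p3.2xlarge' (illustrative only)."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"unexpected instance type: {instance_type!r}")
    family = parts[1]  # e.g. 'p3', 'g4dn', 'm5'
    return "gpu" if family.startswith(GPU_FAMILY_PREFIXES) else "cpu"
```

Under this simplification, "ml.m5.xlarge" maps to a CPU image and "ml.g4dn.xlarge" to a GPU image, which is why the same framework and version can resolve to different image tags.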

4
Q

What is one benefit of using SageMaker over traditional VM setup?

A

SageMaker automates VM provisioning, so you don’t need to manually launch and configure instances.

5
Q

How does SageMaker handle ML environments?

A

It provides pre-configured Docker images with the correct versions of Python, TensorFlow, PyTorch, and other ML frameworks.

6
Q

How does SageMaker optimize for different hardware?

A

SageMaker selects the appropriate image based on instance_type, ensuring optimized performance for CPU or GPU instances.

7
Q

What makes SageMaker easier for distributed training?

A

It provisions and tears down the training cluster for you, so you can run multi-instance training without Kubernetes or manual cluster setup.

8
Q

How does SageMaker integrate with AWS services?

A

It seamlessly connects with S3 (storage), CloudWatch (logs), IAM (security), and Step Functions (pipelines).

9
Q

Why is SageMaker useful for data scientists?

A

It abstracts infrastructure management, allowing data scientists to focus on ML instead of DevOps.

10
Q

What factors should be considered when distributing ML computation?

A

You need to consider the ML task, software libraries supporting distributed computation, and available compute resources.

11
Q

Why doesn’t distributed computation always lead to a linear efficiency gain?

A

Bottlenecks in data I/O, inter-GPU communication overhead, and numerical changes in model training (e.g., a larger global batch size calling for a different learning rate) can limit the speedup and affect accuracy.
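
Two textbook approximations make this answer concrete. Neither comes from the card itself, so both are brought in as assumptions: Amdahl's law for the speedup ceiling when part of the work stays serial (I/O, communication), and the linear scaling heuristic for adjusting the learning rate when the global batch size grows. The function names are hypothetical, and the linear rule is only a common starting point, not a guarantee of equal accuracy.

```python
def amdahl_speedup(workers: int, parallel_fraction: float) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with workers."""
    if workers < 1 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("workers must be >= 1 and parallel_fraction in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the global batch size (heuristic)."""
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(n, 0.9), 2))
```

With 90% of the work parallelizable, 8 workers give a speedup of about 4.7 rather than 8, and the curve flattens further as workers are added.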
