SageMaker Flashcards
(11 cards)
What is the image_uris module in the SageMaker Python SDK?
Functions for generating ECR image URIs for pre-built SageMaker Docker images.
Why should one use the image_uris module in SageMaker?
It returns the URIs of pre-built, optimized Docker images for ML frameworks (e.g., PyTorch, TensorFlow) and built-in algorithms, giving you consistent environments that integrate with AWS without building or locating containers by hand.
What happens when one changes the instance_type input for image_uris?
The retrieved image URI may change (for example, a GPU instance type resolves to a GPU variant of the image), so that the container has the correct dependencies and performance optimizations for the chosen hardware.
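As a rough illustration of the card above, the sketch below looks up a training image through image_uris.retrieve for a given instance type. The framework, framework version, Python version, region, and the helper names retrieve_kwargs and pytorch_training_image are all assumptions chosen for the example, and the lookup assumes the sagemaker package is installed.

```python
def retrieve_kwargs(instance_type, region="us-east-1"):
    """Build keyword arguments for sagemaker.image_uris.retrieve.

    Every concrete value here (framework, version, py_version, region)
    is an assumption for illustration, not something the flashcards specify.
    """
    return {
        "framework": "pytorch",
        "region": region,
        "version": "2.1",
        "py_version": "py310",
        "image_scope": "training",
        "instance_type": instance_type,
    }


def pytorch_training_image(instance_type):
    """Return the pre-built image URI for the given instance type."""
    # Imported lazily so retrieve_kwargs can be used without the SDK installed.
    from sagemaker import image_uris

    return image_uris.retrieve(**retrieve_kwargs(instance_type))
```

Calling pytorch_training_image("ml.m5.xlarge") and pytorch_training_image("ml.g5.xlarge") should typically yield URIs whose tags differ in a cpu or gpu component, which is the change the card describes.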
What is one benefit of using SageMaker over traditional VM setup?
SageMaker automates VM provisioning, so you don’t need to manually launch and configure instances.
How does SageMaker handle ML environments?
It provides pre-configured Docker images with the correct versions of Python, TensorFlow, PyTorch, and other ML frameworks.
How does SageMaker optimize for different hardware?
SageMaker selects the appropriate image based on instance_type, ensuring optimized performance for CPU or GPU instances.
What makes SageMaker easier for distributed training?
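It provisions the training cluster and launches the distributed job for you, eliminating the need for Kubernetes or manual cluster setup; the training script still has to use a library that supports distributed computation.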
How does SageMaker integrate with AWS services?
It seamlessly connects with S3 (storage), CloudWatch (logs), IAM (security), and Step Functions (pipelines).
Why is SageMaker useful for data scientists?
It abstracts infrastructure management, allowing data scientists to focus on ML instead of DevOps.
What factors should be considered when distributing ML computation?
You need to consider the ML task, software libraries supporting distributed computation, and available compute resources.
Why doesn’t distributed computation always lead to a linear efficiency gain?
Bottlenecks in data I/O, inter-GPU communication overhead, and numerical changes in model training (e.g., a larger effective batch size that requires retuning the learning rate) can limit the speedup and affect accuracy.
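A small worked example can make the card above concrete. The sketch below uses two standard simplifications that are not part of the flashcards: Amdahl's law, which treats I/O and communication overhead as a fixed non-parallel fraction of the work, and the linear learning-rate scaling heuristic for larger global batches. The function names and the sample numbers are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only part of the work can be parallelized.

    The serial fraction stands in for bottlenecks such as data loading
    and gradient synchronization (a simplifying assumption).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def linearly_scaled_lr(base_lr, base_batch_size, global_batch_size):
    """Scale the learning rate in proportion to the global batch size.

    This is a common heuristic, not a guarantee; it usually needs
    warmup and retuning in practice.
    """
    return base_lr * global_batch_size / base_batch_size


# Speedup grows more slowly than the worker count when 10% of the work is serial.
for workers in (1, 2, 4, 8):
    print(workers, "workers ->", round(amdahl_speedup(0.9, workers), 2), "x")

# Eight workers with 256 samples each give a global batch of 2048.
print("scaled lr:", linearly_scaled_lr(0.1, 256, 8 * 256))
```

With a parallel fraction of 0.9, eight workers give roughly a 4.7x speedup rather than 8x, and the eightfold larger global batch changes the optimization problem enough that the learning rate usually has to be adjusted.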