Amazon SageMaker - Deep Dive Flashcards
What is Amazon SageMaker?
A fully managed machine learning service by AWS that enables developers and data scientists to build, train, tune, and deploy ML models at scale.
What are the three main steps in a SageMaker ML workflow?
1) Collect and prepare data, 2) Build and train models, 3) Deploy and monitor models.
What types of algorithms are built-in with SageMaker?
Supervised (e.g., linear regression, k-NN), unsupervised (e.g., PCA, k-means), anomaly detection, NLP, and image processing.
What is AMT in SageMaker?
Automatic Model Tuning, which automatically optimizes hyperparameters to improve model performance.
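To make "automatically optimizes hyperparameters" concrete, here is a minimal sketch of random search, one of the simpler tuning strategies. It is plain Python, not the SageMaker API: `random_search`, `toy_objective`, and the ranges in `space` are hypothetical names chosen for illustration, and the objective stands in for "train a model and report a validation score".

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Try n_trials random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    # Hypothetical stand-in for a training job; peaks at lr=0.1, dropout=0.3.
    return -(params["learning_rate"] - 0.1) ** 2 - (params["dropout"] - 0.3) ** 2

space = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_objective, space, n_trials=50)
```

A managed tuner adds what this sketch leaves out: running trials in parallel, smarter search strategies, and stopping rules.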
What are the four deployment types in SageMaker?
Real-time, Serverless, Asynchronous, and Batch Transform.
What is Real-time inference in SageMaker?
Low-latency, synchronous predictions for small payloads (up to 6 MB), served from a persistent deployed endpoint.
What is Serverless inference in SageMaker?
A deployment option with no infrastructure to manage: you choose a memory size, and capacity scales automatically with traffic; may have cold-start latency.
What is Asynchronous inference in SageMaker?
Used for large payloads (up to 1 GB) and longer processing times (up to 1 hour); input/output handled via Amazon S3.
What is Batch Transform in SageMaker?
Used for offline inference over entire datasets (many records at once); latency is high, but predictions can run concurrently at large scale.
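The limits on the deployment cards can be turned into a rough decision helper. This is a sketch under stated assumptions: `suggest_inference_option` is a hypothetical function, the 60-second cutoff for Real-time is an assumption rather than a figure from the cards, and Serverless is left out because the cards give no size limit for it.

```python
MB = 1024 * 1024
GB = 1024 * MB

REALTIME_MAX_PAYLOAD = 6 * MB        # limit quoted on the Real-time card
ASYNC_MAX_PAYLOAD = 1 * GB           # limit quoted on the Asynchronous card
ASYNC_MAX_SECONDS = 60 * 60          # 1 hour, from the Asynchronous card
REALTIME_ASSUMED_MAX_SECONDS = 60    # assumption: cutoff for "low latency"

def suggest_inference_option(payload_bytes, processing_seconds, whole_dataset=False):
    """Pick a deployment type from payload size and expected processing time."""
    if whole_dataset:
        # Scoring an entire dataset offline is the Batch Transform use case.
        return "Batch Transform"
    if (payload_bytes <= REALTIME_MAX_PAYLOAD
            and processing_seconds <= REALTIME_ASSUMED_MAX_SECONDS):
        return "Real-time"
    if payload_bytes <= ASYNC_MAX_PAYLOAD and processing_seconds <= ASYNC_MAX_SECONDS:
        return "Asynchronous"
    raise ValueError("request exceeds the limits listed on the cards")
```

For example, `suggest_inference_option(500 * MB, 600)` returns `"Asynchronous"`, because the payload is too large for a real-time endpoint but within the 1 GB and 1 hour limits.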
What is SageMaker Studio?
A web-based IDE for ML development that supports model building, training, tuning, deployment, and collaboration.
What is the benefit of AMT’s early stopping?
It saves time and cost by automatically halting training jobs that are underperforming during a tuning job.
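One way to decide that a job is "underperforming" is a median rule: stop the job if its latest metric is worse than the median of the running averages that earlier jobs had reached by the same step. The sketch below is a simplified illustration of that heuristic, assuming a metric where higher is better; `should_stop_early` is a hypothetical helper, not a SageMaker function, and the service's actual logic may differ.

```python
from statistics import median

def should_stop_early(current_history, completed_histories):
    """Return True if the current job looks worse than the median earlier job.

    current_history: metric values of the running job, one per epoch.
    completed_histories: per-epoch metric lists of earlier jobs.
    """
    step = len(current_history)
    if step == 0:
        return False
    # Running average of each earlier job up to the same epoch.
    peer_averages = [
        sum(history[:step]) / step
        for history in completed_histories
        if len(history) >= step
    ]
    if not peer_averages:
        return False  # nothing to compare against yet
    return current_history[-1] < median(peer_averages)

earlier_jobs = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
stop_weak = should_stop_early([0.2, 0.3], earlier_jobs)
stop_strong = should_stop_early([0.7, 0.8], earlier_jobs)
```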
What is SageMaker Data Wrangler used for?
Preparing, transforming, and engineering features from tabular and image data for machine learning.
What types of data can you prepare with Data Wrangler?
Tabular and image data.
What are some key features of Data Wrangler?
Data selection, cleansing, exploration, visualization, transformation, and feature engineering.
Does Data Wrangler support SQL?
Yes, it supports SQL for transformations and queries.
What tool in Data Wrangler helps analyze data completeness and formatting?
The Data Quality and Insights Report.
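A completeness check of the kind such a tool performs can be sketched in a few lines. This is an illustration only, not Data Wrangler's implementation: `completeness` is a hypothetical helper, and treating `None` and the empty string as missing is an assumption.

```python
def completeness(rows):
    """Return the fraction of non-missing values for each column.

    rows: list of dicts, one per record; absent keys count as missing.
    """
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    report = {}
    for col in columns:
        present = sum(1 for row in rows if row.get(col) not in (None, ""))
        report[col] = present / total if total else 0.0
    return report

rows = [
    {"age": 34, "city": "Oslo"},
    {"age": None, "city": "Lima"},
    {"age": 28, "city": ""},
    {"age": 41},
]
report = completeness(rows)
```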
Why is feature engineering important?
Because high-quality features directly impact the performance of machine learning models.
What is a common transformation in feature engineering?
Converting a birth date into age (a numerical value).
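The birth-date-to-age transformation can be sketched in plain Python. The function name and the `as_of` parameter are illustrative assumptions; passing the reference date explicitly keeps the feature reproducible.

```python
from datetime import date

def age_in_years(birth_date, as_of):
    """Return completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

age_before_birthday = age_in_years(date(1990, 6, 15), date(2024, 6, 14))
age_on_birthday = age_in_years(date(1990, 6, 15), date(2024, 6, 15))
```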
What is the SageMaker Feature Store used for?
Storing, sharing, and discovering machine learning features across datasets and teams.
Where are features from the Feature Store discoverable?
Within SageMaker Studio.
Why is having a centralized Feature Store beneficial?
It improves collaboration and enables reuse of high-quality features across datasets and projects.
Can a quick model be created in Data Wrangler?
Yes, to analyze how well the model might perform on the transformed data.
Is Data Wrangler part of SageMaker Studio?
Yes, it’s fully integrated within SageMaker Studio.
What is SageMaker Clarify used for?
Evaluating foundation models, detecting bias, and explaining model predictions.
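Bias detection comes down to computing metrics over groups (facets) in the data. The sketch below shows two pre-training metrics, class imbalance and difference in proportions of labels, assuming the usual definitions: facet `a` is the favored group, facet `d` the disfavored one. The function names are hypothetical and the counts are made up for illustration.

```python
def class_imbalance(n_a, n_d):
    """(n_a - n_d) / (n_a + n_d): 0 means balanced facets, +/-1 means one facet only."""
    return (n_a - n_d) / (n_a + n_d)

def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """Share of positive labels in facet a minus the share in facet d."""
    return pos_a / n_a - pos_d / n_d

# Made-up example: 800 records in facet a (400 positive),
# 200 records in facet d (50 positive).
ci = class_imbalance(800, 200)
dpl = difference_in_proportions_of_labels(400, 800, 50, 200)
```

Values near zero suggest the facets are similarly represented and labeled; larger magnitudes flag an imbalance worth investigating before training.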