ML Exam Revision Flashcards
What are some ways of using Spark on Sagemaker?
-Through the Sagemaker AI Spark Library
-The Spark Kernel on Sagemaker Notebooks
-Connect to EMR Serverless
What is the common protocol for training models on Sagemaker AI using Spark?
-Preprocess the data with Spark as you normally would
-Use an estimator from the Sagemaker AI Spark Library to train the model
-Use the SageMakerModel returned by the estimator to host the model and run inference
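The three steps above can be sketched with the sagemaker_pyspark library. This is a minimal sketch, not a definitive recipe: the function name, the choice of K-Means, the instance types, and the hyperparameter values are illustrative assumptions, and the caller is assumed to supply already-preprocessed Spark DataFrames plus an IAM role ARN. Running it requires a Spark session configured with the SageMaker Spark jars and valid AWS credentials.

```python
def train_and_host(training_df, test_df, role_arn):
    """Train a SageMaker K-Means model from Spark and score a DataFrame.

    training_df / test_df: Spark DataFrames already preprocessed with Spark.
    role_arn: IAM role ARN that SageMaker assumes (supplied by the caller).
    """
    # Imported lazily so this definition loads without Spark installed.
    from sagemaker_pyspark import IAMRole
    from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

    # Step 2: an estimator from the SageMaker Spark library.
    estimator = KMeansSageMakerEstimator(
        sagemakerRole=IAMRole(role_arn),
        trainingInstanceType="ml.m4.xlarge",   # illustrative choice
        trainingInstanceCount=1,
        endpointInstanceType="ml.m4.xlarge",   # illustrative choice
        endpointInitialInstanceCount=1,
    )
    estimator.setK(10)            # illustrative hyperparameter values
    estimator.setFeatureDim(784)

    # fit() runs the training job and returns a SageMakerModel.
    model = estimator.fit(training_df)

    # Step 3: the SageMakerModel serves predictions from a hosted endpoint.
    return model.transform(test_df)
```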
True or False:
-Sagemaker AI supports Identity based policies
-Sagemaker AI supports Resource based policies
-Sagemaker AI supports Tag based access control
-Sagemaker AI fully supports Service-linked roles
-Sagemaker AI supports Service roles
-True
-False
-True
-False, only partially
-True
When your Sagemaker Studio is running in VPC-only mode, what steps should be taken to make sure it functions properly?
Its VPC must provide internet access (for example an Internet Gateway reached through a NAT gateway) or the appropriate interface endpoints to reach the AWS services Studio needs
What is network isolation with regard to training jobs?
It is a setting that cuts a training job's containers off from the network entirely, except for traffic between the instances taking part in the same job. An isolated container also receives no AWS credentials
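As a rough illustration, network isolation is switched on per training job. The fragment below shows only the relevant field of a boto3 create_training_job request. The variable name is a placeholder, and every other required field (job name, role, algorithm, data channels, resources) is deliberately left out.

```python
# Partial keyword arguments for the boto3 SageMaker client's
# create_training_job call. The variable name is hypothetical and the
# remaining required fields are omitted.
training_job_overrides = {
    # No inbound or outbound network calls from the training container,
    # apart from traffic between peers of a distributed training cluster.
    "EnableNetworkIsolation": True,
}
```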
What is AWS DeepLens?
AWS DeepLens is a deep learning-enabled video camera developed by Amazon Web Services (AWS). It is designed to make it easy for developers to create and deploy deep learning models on edge devices, such as cameras and robots.
What are the requirements for Sagemaker AI inference containers?
-Containers must be listening on port 8080 for /ping and /invocations
-Containers must accept socket connection requests within 250 ms
-Containers must respond to requests within 60 seconds
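A minimal sketch of a server that meets the routing requirements above, using only the Python standard library. The handler name, the make_server helper, the JSON payload shape ("instances"), and the placeholder "model" that just sums its inputs are all assumptions made for illustration. A real container would load a trained model and would usually sit behind a production web server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Answers GET /ping (health check) and POST /invocations (inference)."""

    def _send(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: an empty 200 response means the container is ready.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": sums the inputs (hypothetical payload format).
        result = {"prediction": sum(payload.get("instances", []))}
        self._send(200, json.dumps(result).encode("utf-8"))

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; real containers should log requests


def make_server(host="0.0.0.0", port=8080):
    """Build the HTTP server; port 8080 is the port the hosting service calls."""
    return HTTPServer((host, port), InferenceHandler)
```

Calling make_server().serve_forever() starts listening on port 8080. Keeping model loading out of the request path helps stay inside the 60-second response limit.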
What are the consequences of excessively large batch sizes when training ML Models?
-Training gets stuck in local minima and the model overfits more often
What is the difference between transfer learning and incremental learning?
Transfer learning means taking an already-trained model and fine-tuning it on a smaller training set for a specific use case. Incremental learning means a model can be trained further on new data without needing its full original dataset, allowing continuous training over time.
What time-series model is good for cold start problems?
DeepAR
What are Sagemaker HPO Strategies?
-Random
-Grid Search
-Bayesian Search
-Hyperband
Which is the fastest Sagemaker HPO Strategy?
Hyperband
Which optimization metric should you use when you have a classification problem where the training dataset has few positive examples?
Area under the precision-recall (PR) curve
True or False: Sagemaker Inference Pipeline supports both sklearn and Spark containers
True
True or False: ROC AUC is particularly good for heavily imbalanced data
False, it is not good for heavily imbalanced data
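A small worked example may make the imbalance point concrete. The data below is synthetic and chosen purely for illustration: 10 positives among 1000 examples, with 50 negatives ranked above every positive. Both metrics are computed by hand so the sketch needs no libraries, and the function names are the sketch's own.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@rank taken at each positive (a PR-curve summary)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic ranking: 50 negatives on top, then the 10 positives, then 940 negatives.
labels = [0] * 50 + [1] * 10 + [0] * 940
scores = [1.0 - i / 1000 for i in range(1000)]  # strictly decreasing scores

auc = roc_auc(labels, scores)           # 940 / 990, roughly 0.95
ap = average_precision(labels, scores)  # roughly 0.10
```

ROC AUC looks excellent because the 940 easy negatives dominate the pairwise comparison, while average precision exposes that the first 50 predictions are all false positives.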
What are possible input sources for a Glue ETL Job?
-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-DocumentDB
-3rd party data stores
What are possible output targets for a Glue ETL Job?
Same as input with exception of DocumentDB:
-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-3rd party data stores
What are the possible output formats of a Glue ETL Job?
-CSV
-JSON
-Parquet
-Avro
-ORC
With which Sagemaker built-in algorithms can you perform incremental training?
-Semantic Segmentation
-Object Detection (MXNet)
-Image Classification (MXNet)
True or False: Factorization Machines are good for click rate data
True
What are the required params for Factorization Machines?
-feature_dim
-num_factors
-predictor_type
What are the 3 steps for building a KNN model on Sagemaker?
-Sampling: reduces the data so it fits in memory
-Dimensionality reduction: Good mainly for inputs with over 1000 features
-Index creation: Builds an index so that nearest neighbors can be looked up quickly
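The three steps can be sketched as a toy pipeline in plain Python. This is an illustration under stated assumptions, not the built-in algorithm's implementation: the function names are hypothetical, the dimensionality reduction is a simple random sign projection, and the "index" is a flat list searched by brute force rather than an optimized nearest-neighbor structure.

```python
import math
import random


def sample(rows, sample_size, rng):
    """Step 1: keep at most sample_size (features, label) rows."""
    return rng.sample(rows, min(sample_size, len(rows)))


def make_projection(in_dim, out_dim, rng):
    """Step 2 setup: random +/-1 projection matrix scaled by 1/sqrt(out_dim)."""
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(features, projection):
    """Step 2: map a vector from in_dim down to out_dim dimensions."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]


def build_index(rows, projection):
    """Step 3: store the reduced vectors with their labels (flat list index)."""
    return [(project(features, projection), label) for features, label in rows]


def predict(index, projection, features, k):
    """Classify by majority vote among the k nearest indexed points."""
    query = project(features, projection)
    nearest = sorted(index, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

A typical call sequence is sample, then make_projection, then build_index, and finally predict for each query, reusing the same projection at training and query time.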
When not using labels in Sagemaker training, how is the label_size parameter configured?
label_size=0, set in the content type (for example text/csv;label_size=0)
What are the required KNN hyperparams?
-sample_size
-k
-predictor_type
-feature_dim
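As a quick self-check, the required hyperparameters from this card and from the Factorization Machines card can be written as plain dictionaries. The helper function, the dictionary keys used to name the algorithms, and all numeric values are assumptions chosen for illustration. Only the hyperparameter names and the predictor_type values come from the algorithms themselves.

```python
# Required hyperparameter names per algorithm (keys are this sketch's own labels).
REQUIRED = {
    "knn": {"feature_dim", "sample_size", "k", "predictor_type"},
    "factorization-machines": {"feature_dim", "num_factors", "predictor_type"},
}


def missing_required(algorithm, hyperparameters):
    """Hypothetical helper: sorted list of required names not yet set."""
    return sorted(REQUIRED[algorithm] - set(hyperparameters))


knn_hyperparameters = {
    "feature_dim": 50,                # number of input features (example value)
    "sample_size": 100000,            # rows kept by the sampling step (example value)
    "k": 10,                          # neighbors consulted per prediction (example value)
    "predictor_type": "classifier",   # or "regressor"
}

fm_hyperparameters = {
    "feature_dim": 50,                        # example value
    "num_factors": 64,                        # example value
    "predictor_type": "binary_classifier",    # or "regressor"
}
```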