ML Exam Revision Flashcards by Luiz Martins

What are some ways of using Spark on Sagemaker?

-Through the Sagemaker AI Spark Library
-The Spark Kernel on Sagemaker Notebooks
-Connect to EMR Serverless

What is the common protocol for training models on Sagemaker AI using Spark?

-Preprocess the data with Spark as you normally would
-Use an estimator from the Sagemaker AI Spark Library to train the model
-Use the SageMakerModel returned by the estimator to host the model

True or False:
-Sagemaker AI supports Identity based policies
-Sagemaker AI supports Resource based policies
-Sagemaker AI supports Tag based access control
-Sagemaker AI fully supports Service-linked roles
-Sagemaker AI supports Service roles

-True
-False
-True
-False, only partially
-True

When your Sagemaker Studio is running on VPC-only mode, what steps should be taken to make sure it functions properly?

Its VPC should have either internet access (for example through a NAT gateway) or the appropriate interface endpoints to reach the AWS services it needs

What is network isolation with regard to training jobs?

It is a feature that closes a training job's containers off from the rest of the network: they cannot make outbound calls, and in a multi-instance job traffic is limited to the other instances involved in the job. The containers also do not receive any AWS credentials

What is AWS DeepLens?

AWS DeepLens is a deep learning-enabled video camera developed by Amazon Web Services (AWS). It is designed to make it easy for developers to create and deploy deep learning models on edge devices, such as cameras and robots.

What are the requirements for Sagemaker AI inference containers?

-Containers must be listening on port 8080 for /ping and /invocations
-Containers must accept socket connection requests within 250 ms
-Containers must respond to requests within 60 seconds
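
The requirements above can be sketched with Python's standard library alone. This is a minimal illustration, not a production serving stack: the `predict` function is a hypothetical placeholder for a real model, and the JSON payload shape (`{"features": [...]}`) is an assumption made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    # Hypothetical placeholder model: sums the "features" list.
    return sum(payload.get("features", []))


class InferenceHandler(BaseHTTPRequestHandler):
    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: answer /ping quickly with HTTP 200.
        if self.path == "/ping":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        # Inference requests arrive on /invocations.
        if self.path != "/invocations":
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            self._send(400, {"error": "invalid JSON"})
            return
        self._send(200, {"prediction": predict(payload)})

    def log_message(self, *args):
        # Silence per-request logging for the example.
        pass


def make_server(port=8080):
    # Port 8080 is the port named in the card above.
    return HTTPServer(("0.0.0.0", port), InferenceHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A real container would load the model once at startup so that /ping and /invocations stay within the time limits listed above.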

What are the consequences of excessively large batch sizes when training ML Models?

-They get stuck on local minima and overfit more often

What is the difference between transfer learning and incremental learning?

Transfer learning takes a model that is already trained and fine-tunes it on a smaller training set for a specific use case. Incremental learning means a model can keep training on new data without needing its full original dataset, allowing continuous training over time

What time-series model is good for cold start problems?

DeepAR

What are Sagemaker HPO Strategies?

-Random
-Grid Search
-Bayesian Search
-Hyperband

Which is the fastest Sagemaker HPO Strategy?

Hyperband

Which optimization metric should you use when you have a classification problem where the training dataset has few positive examples?

Area under the precision-recall (PR) curve

True or False: Sagemaker Inference Pipeline supports both sklearn and Spark containers

True

True or False: ROC AUC is particularly good for heavily imbalanced data

False, it is not good for heavily imbalanced data
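
A small pure-Python sketch can show why. The toy dataset below is an assumption made for illustration (1000 negatives, 10 positives, hand-picked scores with no ties); `roc_auc` and `average_precision` are simple reference implementations, not library calls.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (a step-wise PR AUC)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy imbalanced data: each positive scores higher than at least 90% of the
# negatives, yet several negatives still outrank every positive.
neg_scores = [i / 1000 for i in range(1000)]
pos_scores = [0.9005 + 0.01 * j for j in range(10)]
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

print("ROC AUC:", round(roc_auc(labels, scores), 3))
print("Average precision:", round(average_precision(labels, scores), 3))
```

On this data ROC AUC comes out near 0.95 while average precision stays below 0.1: the large pool of negatives hides the fact that most of the top-ranked predictions are wrong.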

What are possible input sources for a Glue ETL Job?

-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-DocumentDB
-3rd party data stores

What are possible output targets for a Glue ETL Job?

Same as input with exception of DocumentDB:
-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-3rd party data stores

What are the possible output formats of a Glue ETL Job?

-CSV
-JSON
-Parquet
-Avro
-ORC

With which Sagemaker models can you perform incremental training?

-Semantic Segmentation
-Object Detection (MXNet)
-Image Classification (MXNet)

True or False: Factorization Machines are good for click rate data

True

What are the required params for Factorization Machines?

-feature_dim
-num_factors
-predictor_type

What are the 3 steps for building a KNN model on Sagemaker?

-Sampling: reduces the data so it fits in memory
-Dimensionality reduction: Good mainly for inputs with over 1000 features
-Index creation: Creates index to be used for finding the individual points fast
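
The three steps can be sketched in plain Python. This is an illustration under stated assumptions, not SageMaker's implementation: the random sign projection and the flat list standing in for an index are simplifications chosen for the example, and every function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def sample_training_set(rows, sample_size, rng):
    """Step 1 (sampling): keep at most sample_size (features, label) rows."""
    return rng.sample(rows, min(sample_size, len(rows)))


def make_sign_projection(feature_dim, target_dim, rng):
    """Step 2 (dimensionality reduction): a random +/-1 projection matrix."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(feature_dim)]
            for _ in range(target_dim)]


def project(features, matrix):
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def build_index(rows, matrix):
    """Step 3 (index creation): store projected points for later lookup."""
    return [(project(features, matrix), label) for features, label in rows]


def classify(index, matrix, features, k):
    """Majority vote over the k nearest stored points (brute-force search)."""
    query = project(features, matrix)

    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))

    nearest = sorted(index, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: class "b" is class "a" shifted by 100 along the first feature.
rng = random.Random(0)
feature_dim = 8
rows = []
for _ in range(20):
    rows.append(([rng.uniform(-1, 1) for _ in range(feature_dim)], "a"))
    shifted = [rng.uniform(-1, 1) for _ in range(feature_dim)]
    shifted[0] += 100.0
    rows.append((shifted, "b"))

sampled = sample_training_set(rows, sample_size=30, rng=rng)
matrix = make_sign_projection(feature_dim, target_dim=3, rng=rng)
index = build_index(sampled, matrix)
print(classify(index, matrix, [100.0] + [0.0] * 7, k=3))  # prints "b"
```

A real index would replace the sorted brute-force scan with a structure built for fast nearest-neighbor lookup; the scan is kept here only to keep the sketch short.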

When training on Sagemaker without labels, how is the label_size parameter configured?

label_size = 0 (set in the content type, for example text/csv;label_size=0)

What are the required KNN hyperparams?

-sample_size
-k
-predictor_type
-feature_dim

True or False: Normalization must be applied before you start Linear Learner training

False, Linear Learner can normalize your data for you

What are the required Linear Learner hyperparams?

-predictor_type
-num_classes (if predictor_type is multiclass_classifier)
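
As a revision aid, the required hyperparameters from the Factorization Machines, KNN, and Linear Learner cards can be collected into a small checker. This is only a sketch: the dictionary keys and the `missing_hyperparameters` helper are hypothetical names for this example, not part of any SageMaker SDK.

```python
# Required hyperparameters as listed on the cards in this deck.
REQUIRED = {
    "factorization-machines": {"feature_dim", "num_factors", "predictor_type"},
    "knn": {"feature_dim", "k", "predictor_type", "sample_size"},
    "linear-learner": {"predictor_type"},
}


def missing_hyperparameters(algorithm, hyperparameters):
    """Return the required hyperparameters that are absent, sorted by name."""
    required = set(REQUIRED[algorithm])
    # Linear Learner needs num_classes only for multiclass classification.
    if (algorithm == "linear-learner"
            and hyperparameters.get("predictor_type") == "multiclass_classifier"):
        required.add("num_classes")
    return sorted(required - set(hyperparameters))


print(missing_hyperparameters("knn", {"k": 10, "feature_dim": 50}))
print(missing_hyperparameters(
    "linear-learner", {"predictor_type": "multiclass_classifier"}))
```

The first call reports `predictor_type` and `sample_size` as missing; the second reports `num_classes`.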

True or False: Sagemaker XGBoost can be used both as an algorithm and a framework

True

True or False: On Object2Vec, the hyperparameters for both encoders are required

False, only the enc0 hyperparameters (enc0_max_seq_len and enc0_vocab_size) are required; the enc1 values default to the enc0 values if not set

True or False: LDA is based on bag of words, so the order of words does not matter

True, LDA is a bag-of-words model, so word order does not matter

ML Exam Revision Flashcards

(29 cards)