ML Exam Revision Flashcards

1
Q

What are some ways of using Spark on Sagemaker?

A

-Through the Sagemaker AI Spark Library
-The Spark Kernel on Sagemaker Notebooks
-Connect to EMR Serverless

2
Q

What is the common protocol for training models on Sagemaker AI using Spark?

A

-Preprocess the data with Spark as you normally would
-Use an estimator from the SageMaker AI Spark Library to train the model
-Use the SageMakerModel returned by the estimator to host the model

3
Q

True or False:
-Sagemaker AI supports Identity based policies
-Sagemaker AI supports Resource based policies
-Sagemaker AI supports Tag based access control
-Sagemaker AI fully supports Service-linked roles
-Sagemaker AI supports Service roles

A

-True
-False
-True
-False, only partially
-True

4
Q

When your Sagemaker Studio is running on VPC-only mode, what steps should be taken to make sure it functions properly?

A

Its VPC needs either internet access through a NAT gateway or the appropriate interface endpoints to reach the AWS services it depends on

5
Q

What is network isolation with regard to training jobs?

A

It is an option that closes a training container off from the rest of the network, so it cannot make outbound calls (not even to S3). The only exception is traffic to the other instances taking part in the same job. With it enabled, the container does not receive any AWS credentials

6
Q

What is AWS DeepLens?

A

AWS DeepLens is a deep learning-enabled video camera developed by Amazon Web Services (AWS). It is designed to make it easy for developers to create and deploy deep learning models on edge devices, such as cameras and robots.

7
Q

What are the requirements for Sagemaker AI inference containers?

A

-Containers must be listening on port 8080 for /ping and /invocations
-Containers must accept socket connection requests within 250 ms
-Containers must respond to requests within 60 seconds
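
As a rough illustration of the port and path requirements above, here is a minimal sketch of a container entry point using only the Python standard library. The "model" (counting the entries of a hypothetical `features` field) and the JSON payload shape are assumptions made for the example, not part of any SageMaker contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: answer /ping with 200 and an empty body.
        if self.path == "/ping":
            self._send(200, b"")
        else:
            self._send(404, b"")

    def do_POST(self):
        # Inference requests arrive as POSTs to /invocations.
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Hypothetical "model": report how many features were sent.
            result = {"n_features": len(payload.get("features", []))}
            self._send(200, json.dumps(result).encode(), "application/json")
        else:
            self._send(404, b"")

    def _send(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the example quiet; a real container would log requests.
        pass


def serve(port=8080):
    """Block forever, serving /ping and /invocations on the given port."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` as the container's start command would listen on port 8080. A real model server would also load the model artifacts once at startup so that requests stay within the time limits.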

8
Q

What are the consequences of excessively large batch sizes when training ML Models?

A

-Training tends to get stuck in local minima and overfit more often

9
Q

What is the difference between transfer learning and incremental learning?

A

Transfer learning takes a model that was already trained on one task and fine-tunes it on a smaller training set for a specific use case. Incremental learning means a model can keep training on new data without being retrained on its full dataset, allowing continuous training over time
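
The incremental idea can be sketched with a toy one-parameter linear model: training resumes from the existing weight when a new batch arrives, and the earlier batch is never revisited. The function name `sgd_update`, the learning rate, and the data are all hypothetical choices for illustration.

```python
def sgd_update(weight, batch, lr=0.01, epochs=50):
    """Continue training the model y = weight * x on a single batch."""
    for _ in range(epochs):
        # Mean gradient of the squared error (weight * x - y) ** 2.
        grad = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
        weight -= lr * grad
    return weight


# Both batches follow y = 2x, but they arrive at different times.
first_batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
later_batch = [(4.0, 8.0), (5.0, 10.0), (6.0, 12.0)]

weight = sgd_update(0.0, first_batch)     # initial training
weight = sgd_update(weight, later_batch)  # resume from the saved weight
print(round(weight, 4))
```

The second call starts from the weight learned on the first batch instead of from zero, which is the part that makes the training incremental.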

10
Q

What time-series model is good for cold start problems?

A

DeepAR

11
Q

What are Sagemaker HPO Strategies?

A

-Random
-Grid Search
-Bayesian Search
-Hyperband

12
Q

Which is the fastest Sagemaker HPO Strategy?

A

Hyperband
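
Hyperband's speed comes largely from stopping weak configurations early. The sketch below shows successive halving, the building block that Hyperband repeats with different starting budgets. It is a simplified illustration under stated assumptions: `partial_loss` is a made-up objective, and this is not SageMaker's implementation.

```python
import random


def partial_loss(lr, budget):
    # Hypothetical objective: best near lr = 0.1, improves with more budget.
    return abs(lr - 0.1) + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=2):
    """Train all configs briefly, keep the best 1/eta, and repeat."""
    budget = min_budget
    survivors = list(configs)
    spent = 0
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda lr: partial_loss(lr, budget))
        spent += budget * len(survivors)
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta  # survivors get a larger budget in the next round
    return survivors[0], spent


rng = random.Random(0)
configs = [rng.uniform(0.001, 1.0) for _ in range(16)]
best, spent = successive_halving(configs)
print(best, spent)
```

With 16 candidates, this uses 64 budget units in total, compared with 256 if every candidate were trained at a budget of 16.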

13
Q

Which optimization metric should you use when you have a classification problem where the training dataset has few positive examples?

A

Area under the precision-recall curve (PR AUC)

14
Q

True or False: Sagemaker Inference Pipeline supports both sklearn and Spark containers

A

True

15
Q

True or False: ROC AUC is particularly good for heavily imbalanced data

A

False, it is not good for heavily imbalanced data
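
To see why, here is a small hand-built example with 2 positives among 100 samples. The scores are invented for illustration. ROC AUC stays high because the many easy negatives dominate it, while average precision (a common summary of the PR curve) exposes the false positives ranked above the second positive.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# 2 positives, 5 negatives the model wrongly ranks high, 93 easy negatives.
labels = [1, 1] + [0] * 5 + [0] * 93
scores = [0.9, 0.6] + [0.8] * 5 + [0.1] * 93

print(round(roc_auc(labels, scores), 3))            # 0.974
print(round(average_precision(labels, scores), 3))  # 0.643
```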

16
Q

What are possible input sources for a Glue ETL Job?

A

-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-DocumentDB
-3rd party data stores

17
Q

What are possible output targets for a Glue ETL Job?

A

Same as input with exception of DocumentDB:
-S3
-RDS
-Aurora
-DynamoDB
-Redshift
-Opensearch
-Kinesis data streams
-MSK
-3rd party data stores

18
Q

What are the possible output formats of a Glue ETL Job?

A

-CSV
-JSON
-Parquet
-Avro
-ORC

19
Q

With which Sagemaker models can you perform incremental training?

A

-Semantic Segmentation
-Object Detection (MXNet)
-Image Classification (MXNet)

20
Q

True or False: Factorization Machines are good for click rate data

A

True

21
Q

What are the required params for Factorization Machines?

A

-feature_dim
-num_factors
-predictor_type

22
Q

What are the 3 steps for building a KNN model on Sagemaker?

A

-Sampling: reduces the data so it fits in memory
-Dimensionality reduction: Good mainly for inputs with over 1000 features
-Index creation: Creates index to be used for finding the individual points fast
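
The three steps can be sketched in plain Python. This is a toy illustration, not SageMaker's implementation: the random sign projection is one assumed choice of dimensionality reduction, the "index" is a plain list scanned by brute force (a stand-in for a real nearest-neighbor index), and `build_knn` and `predict` are hypothetical names.

```python
import math
import random


def build_knn(points, labels, sample_size, target_dim, seed=0):
    rng = random.Random(seed)
    # 1) Sampling: keep only as many points as fit the memory budget.
    chosen = rng.sample(range(len(points)), min(sample_size, len(points)))
    # 2) Dimensionality reduction: project onto random +/-1 directions.
    dim = len(points[0])
    proj = [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
            for _ in range(target_dim)]

    def reduce(point):
        return [sum(w * x for w, x in zip(row, point)) for row in proj]

    # 3) Index creation: store the reduced points with their labels.
    index = [(reduce(points[i]), labels[i]) for i in chosen]
    return index, reduce


def predict(index, reduce, query, k):
    q = reduce(query)
    nearest = sorted(index, key=lambda item: math.dist(item[0], q))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)  # majority vote


# Two clusters separated along the first coordinate.
points = [[0.0, 0.1 * i, 0.0, 0.0] for i in range(5)] + \
         [[10.0, 0.1 * i, 0.0, 0.0] for i in range(5)]
labels = ["a"] * 5 + ["b"] * 5

index, reduce = build_knn(points, labels, sample_size=8, target_dim=2)
print(predict(index, reduce, [10.0, 0.0, 0.0, 0.0], k=3))
```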

23
Q

When training a Sagemaker algorithm on CSV data without labels, how is the label_size parameter configured?

A

label_size=0, set in the content type (for example text/csv;label_size=0)

24
Q

What are the required KNN hyperparams?

A

-sample_size
-k
-predictor_type
-feature_dim

25
Q

True or False: Normalization must be applied before you start Linear Learner training

A

False, Linear Learner can normalize your data for you

26
Q

What are the required Linear Learner hyperparams?

A

-predictor_type
-num_classes (if predictor_type is multiclass_classifier)

27
Q

True or False: Sagemaker XGBoost can be used both as an algorithm and a framework

A

True

28
Q

True or False: On Object2Vec, the hyperparameters for both encoders are required

A

False, only those for enc0 are required (enc0_max_seq_len and enc0_vocab_size), with enc1 reusing the enc0 settings if necessary

29
Q

True or False: LDA is based on bag of words, so the order of words does not matter

A

True, LDA treats each document as a bag of words, so only word counts matter and not word order
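
A bag-of-words representation keeps only word counts, which a few lines of Python can show. The helper `bag_of_words` and the two sample sentences are made up for this illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Count each word, discarding the order in which words appear."""
    return Counter(text.lower().split())


doc_a = "the model trains on the data"
doc_b = "data the on trains model the"  # same words, shuffled

print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True
```

Both documents produce identical counts, so a model that only sees these counts cannot tell them apart.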