[MLS] Modelling - SageMaker Flashcards

(36 cards)

1
Q

What is Linear Learner used for?

A

Regression and binary or multi-class classification; good for problems with linear relationships between features

2
Q

What type of algorithm is XGBoost and what is it used for?

A

Traditional ML using gradient-boosted decision trees; used for classification and regression with tabular/structured data

3
Q

What is Seq2Seq and its primary use?

A

Deep learning neural network for text-to-text transformations like translation and summarization; processes sequences of data

4
Q

What type of algorithm is DeepAR and what is its main purpose?

A

Deep learning algorithm for time series forecasting with multiple time series

5
Q

What is BlazingText and how does it compare to word2vec?

A

Neural network for word embeddings and text classification; a highly optimized implementation of word2vec that trains faster on large datasets

6
Q

What is Object2Vec used for?

A

Deep learning neural network for learning embeddings of pairs of objects like customer-item pairs and document similarities

7
Q

What does Object Detection do in SageMaker?

A

Deep learning CNN that locates and classifies objects in images with bounding boxes

8
Q

What is the purpose of Image Classification in SageMaker?

A

Deep learning CNN that categorizes images into predefined classes

9
Q

What does Semantic Segmentation do?

A

Deep learning CNN for pixel-level image classification where each pixel is labeled

10
Q

What is Random Cut Forest used for?

A

Traditional unsupervised ML for anomaly detection to identify unusual data points

11
Q

What is Neural Topic Model’s function?

A

Deep learning neural network for topic modeling to find topics in document collections

12
Q

What is K-Nearest-Neighbors (KNN) used for?

A

Traditional ML for classification and regression based on similarity measures
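
A minimal sketch of the idea behind the card, assuming Euclidean distance and a majority vote. The function name `knn_predict` and the toy data are illustrative only and are not SageMaker's built-in k-NN API.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
```

For regression, the same neighbour search applies, but the prediction would be the mean of the neighbours' values rather than a vote.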

13
Q

What is K-Means Clustering?

A

Unsupervised traditional ML for clustering similar data points into k clusters
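
A rough sketch of the clustering loop (Lloyd's algorithm), assuming Euclidean distance and a naive initialisation from the first k points. The function name `kmeans` and the data are illustrative; SageMaker's built-in K-Means is a different, web-scale implementation.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    # Assumption: seed with the first k points (real implementations seed more carefully).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical toy data: two groups interleaved in the list.
data = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
centroids, assignments = kmeans(data, k=2)
```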

14
Q

What is PCA?

A

Traditional unsupervised ML algorithm for dimensionality reduction while preserving variance

15
Q

What are Factorization Machines used for?

A

Traditional ML for recommendation systems or classification with sparse data and feature interactions

16
Q

What is IP Insights used for?

A

Unsupervised neural network for detecting suspicious IP addresses by learning patterns in how entities (e.g. user accounts) use IPv4 addresses

17
Q

What is SageMaker Canvas? What type of ML analysis can you do with it?

A

A no-code ML solution for business analysts. You can use it for regression or classification.

18
Q

What data format does SageMaker Canvas accept?

19
Q

What is class imbalance?

A

When one facet of your training data has far fewer samples than the others, e.g. a specific demographic is under-represented
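
A small sketch of how this imbalance can be quantified, assuming the normalised form CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sample counts of the larger and smaller facet. The function name and the counts are illustrative.

```python
def class_imbalance(n_advantaged, n_disadvantaged):
    """Normalised difference in facet sizes, ranging from -1 to 1 (0 means balanced)."""
    return (n_advantaged - n_disadvantaged) / (n_advantaged + n_disadvantaged)


# Hypothetical dataset: 900 rows for one demographic, 100 for another.
ci = class_imbalance(900, 100)
balanced = class_imbalance(500, 500)
```

Values near 1 or -1 indicate that one facet dominates the training data.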

20
Q

What is Difference in Proportions of Labels (DPL)?

A

A more specific form of imbalance: the proportion of positive outcomes differs between facet values, e.g. older people in the training data are always approved for loans. This can be a problem, or it can simply be something an ML engineer should be aware of. It is a problem if it reflects institutional discrimination, for example, but not necessarily if the data genuinely shows that young people default on their loans 100% of the time.
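
As a sketch, the metric can be computed as DPL = q_a - q_d, where q_a and q_d are the fractions of positive labels in each facet. The helper names and the loan data below are made up for illustration.

```python
def positive_rate(labels):
    """Fraction of positive (1) labels in a list of 0/1 outcomes."""
    return sum(labels) / len(labels)


def dpl(labels_a, labels_d):
    """Difference in proportions of positive labels between two facets."""
    return positive_rate(labels_a) - positive_rate(labels_d)


# Hypothetical loan approvals (1 = approved) for two age groups.
older = [1, 1, 1, 0]
younger = [1, 0, 0, 0]
gap = dpl(older, younger)
```

A value of 0 means both facets receive positive labels at the same rate; the sign shows which facet is favoured.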

21
Q

What is divergence?

A

Measures how the distribution of outcomes changes between subgroups, i.e. how much outcomes diverge depending on the group being assessed. A divergence of 0 between two groups means their outcome distributions are identical, so they are treated exactly the same.
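
The card does not name a specific measure; one common choice is Kullback-Leibler divergence, which is what this sketch assumes. The function name and the distributions are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for two discrete outcome distributions given as aligned lists.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical outcome distributions (approved, rejected) for two groups.
group_a = [0.7, 0.3]
group_b = [0.4, 0.6]
same = kl_divergence(group_a, group_a)
different = kl_divergence(group_a, group_b)
```

Identical distributions give 0, and the value grows as the two groups' outcomes drift apart. KL divergence is not symmetric, so swapping the arguments generally changes the result.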

22
Q

What is Conditional Demographic Disparity?

A

A method to check whether bias in outcomes still exists after holding certain variables constant. E.g. if legitimate differentiators such as net worth and credit score are held constant but 50-year-olds are still favoured for loans over 45-year-olds, there may be a problem.
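
One way to sketch this, under stated assumptions: within each stratum (e.g. a credit-score band), demographic disparity is the facet's share of rejections minus its share of acceptances, and the conditional metric is the size-weighted average across strata. Function names, the handling of one-sided strata, and the data are all illustrative.

```python
from collections import defaultdict


def demographic_disparity(rows):
    """rows: (in_facet, accepted) pairs. Facet's share of rejections minus share of acceptances."""
    rejected_flags = [in_facet for in_facet, accepted in rows if not accepted]
    accepted_flags = [in_facet for in_facet, accepted in rows if accepted]
    if not rejected_flags or not accepted_flags:
        return 0.0  # assumption: a stratum with only one outcome shows no disparity
    return sum(rejected_flags) / len(rejected_flags) - sum(accepted_flags) / len(accepted_flags)


def conditional_demographic_disparity(rows):
    """rows: (stratum, in_facet, accepted). Size-weighted average disparity over strata."""
    strata = defaultdict(list)
    for stratum, in_facet, accepted in rows:
        strata[stratum].append((in_facet, accepted))
    return sum(len(g) * demographic_disparity(g) for g in strata.values()) / len(rows)


# Hypothetical loan data: stratum = credit band, in_facet = applicant is 45 years old.
loans = [
    ("high", True, True), ("high", True, False), ("high", False, True), ("high", False, True),
    ("low", True, False), ("low", False, False), ("low", True, True), ("low", False, True),
]
cdd = conditional_demographic_disparity(loans)
```

A positive value suggests the facet is over-represented among rejections even after conditioning on the stratum variable.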

23
Q

What is SageMaker Training Compiler?

A

A compiler that optimises deep learning models for training on GPU instances, reducing training time

24
Q

Is SageMaker Training Compiler compatible with SageMaker distributed training libraries?

25
Q

What is SageMaker ML Lineage Tracking?

A

Keeps a running history of the steps in your ML workflow, from data preparation through to model deployment. Tracking can be done automatically by SageMaker or set up manually.

26
Q

Can you query SageMaker ML Lineage?

A

Yes, using the LineageQuery API from Python

27
Q

Does SageMaker Data Wrangler do the transformations itself?

A

No, it generates the code that does the transformations, and its visual interface makes working out that code much easier.

28
Q

What is Data Wrangler Quick Model?

A

Quickly trains a model on your data and measures the results, so you can see whether you are on the right path

29
Q

What are SageMaker Deployment Guardrails?

A

A way to control the shifting of traffic to new models, so that you can stage the shift over time (blue/green deployment etc.)

30
Q

What types of endpoints work with SageMaker Deployment Guardrails?

A

Asynchronous and real-time endpoints

31
Q

What is a Q value in reinforcement learning?

A

The value of taking a given action in a given state (a state-action pair), defined by the resulting outcome. If the resulting outcome is bad, the Q value is likely to be negative or low.

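As an illustration, the standard tabular Q-learning update nudges each Q value toward the observed reward plus the discounted best value of the next state. This is a generic sketch, not SageMaker RL code; the function name, states, rewards, and the `alpha`/`gamma` defaults are assumptions.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical environment with two actions; unseen state-action pairs start at 0.
q = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "left", reward=-1.0, next_state="s1", actions=actions)   # bad outcome
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)   # good outcome
```

After these two steps the pair with the bad outcome holds a negative Q value and the pair with the good outcome a positive one, which is what steers the agent's later choices.
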
32
Q

Can SageMaker's reinforcement learning be distributed across cores? What about instances?

A

Yes and yes

33
Q

What is the SageMaker-spark library?

A

A library that you use with SageMakerEstimator to train on a Spark-generated DataFrame in SageMaker. Typically you connect a SageMaker notebook to a remote EMR cluster running Spark.

34
Q

What is the purpose of SageMaker Debugger?

A

Saves the internal state of your model as it trains over time, and can send alerts if defined rules are triggered.

35
Q

What are the 3 categories of built-in rules in SageMaker Debugger?

A

* Monitor system bottlenecks
* Profile model framework operations (framework-specific metrics, such as the execution time of different elements)
* Debug model parameters

36
Q

What is the difference between SageMaker Model Monitor and SageMaker Clarify?

A

Model Monitor is about detecting anomalies, outliers and drift. Clarify is about interpretability and bias.