[MLS] Modelling - SageMaker Flashcards
(36 cards)
What is Linear Learner used for?
Regression and binary or multi-class classification; good for problems with linear relationships between the features and the target
What type of algorithm is XGBoost and what is it used for?
Traditional ML using gradient-boosted decision trees; used for classification and regression with tabular/structured data
What is Seq2Seq and its primary use?
Deep learning neural network for text-to-text transformations like translation and summarization; processes sequences of data
What is DeepAR’s main purpose?
Deep learning algorithm for time series forecasting with multiple time series
What is BlazingText and how does it compare to word2vec?
Neural network algorithm for word embeddings and text classification; a highly optimized implementation of word2vec that trains much faster on large datasets
What is Object2Vec used for?
Deep learning neural network for learning embeddings of pairs of objects like customer-item pairs and document similarities
How does Object Detection work in SageMaker?
Deep learning CNN that locates and classifies objects in images with bounding boxes
What is the purpose of Image Classification in SageMaker?
Deep learning CNN that categorizes images into predefined classes
What does Semantic Segmentation do?
Deep learning CNN for pixel-level image classification where each pixel is labeled
What is Random Cut Forest used for?
Traditional unsupervised ML for anomaly detection to identify unusual data points
What is Neural Topic Model’s function?
Deep learning neural network for topic modeling to find topics in document collections
What is K-Nearest-Neighbors (KNN) used for?
Traditional ML for classification and regression based on similarity measures
What is K-Means Clustering’s primary purpose?
Unsupervised traditional ML for clustering similar data points into k clusters
What does PCA do?
Traditional unsupervised ML for dimensionality reduction while preserving variance
What are Factorization Machines used for?
Traditional ML for recommendation systems or classification with sparse data and feature interactions
What is IP Insights used for?
Neural network for detecting suspicious IP addresses by learning patterns in IP address usage
What is SageMaker Canvas? What type of ML analysis can you do with it?
No-code ML solution for business analysts. Mainly regression (numeric prediction) or classification (categorical prediction); it also supports time series forecasting.
What data format does SageMaker Canvas accept?
CSV is the format to remember; newer Canvas releases also import other tabular formats such as Parquet
What is class imbalance?
When one facet value of your training data (e.g. a specific demographic group) has far fewer samples than the others
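A minimal sketch of how class imbalance can be quantified, assuming the normalized difference in sample counts between two facet values. The names `n_a` (larger facet) and `n_d` (smaller facet) and the sample counts are illustrative assumptions, not taken from the cards.

```python
def class_imbalance(n_a: int, n_d: int) -> float:
    """Return (n_a - n_d) / (n_a + n_d), a value between -1 and +1.

    n_a and n_d are the sample counts of the two facet values
    (hypothetical names used for illustration).
    0 means the facets are equally represented.
    """
    total = n_a + n_d
    if total == 0:
        raise ValueError("need at least one sample")
    return (n_a - n_d) / total


# Hypothetical dataset: 900 samples from one group, 100 from the other.
ci = class_imbalance(n_a=900, n_d=100)
balanced = class_imbalance(n_a=500, n_d=500)
print(f"imbalanced: {ci:.2f}, balanced: {balanced:.2f}")
```

Values close to +1 or -1 indicate that one facet dominates the training data.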
What is difference in proportion of labels?
An imbalance of positive outcomes between facet values, e.g. older people in the training data are approved for loans far more often than younger people. Where class imbalance compares how many samples each facet has, this compares the proportion of positive labels in each facet. It is not automatically a problem, but an ML engineer should be aware of it: it is a problem if it reflects institutional discrimination, for example, but not necessarily if the data genuinely shows that younger people default on their loans far more often.
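A minimal sketch of the difference in proportion of labels (DPL), assuming binary labels where 1 is the positive outcome (loan approved). The function name and the tiny label lists are hypothetical, chosen only to mirror the loan example.

```python
def difference_in_proportions_of_labels(labels_a, labels_d):
    """Return q_a - q_d, where q is the share of positive labels in a facet.

    labels_a / labels_d: lists of 0/1 labels for the two facet values
    (illustrative names). The result lies between -1 and +1.
    """
    if not labels_a or not labels_d:
        raise ValueError("each facet needs at least one label")
    q_a = sum(labels_a) / len(labels_a)
    q_d = sum(labels_d) / len(labels_d)
    return q_a - q_d


# Hypothetical training labels: every older applicant approved,
# only one in four younger applicants approved.
older = [1, 1, 1, 1]
younger = [1, 0, 0, 0]
dpl = difference_in_proportions_of_labels(older, younger)
print(f"DPL = {dpl:.2f}")
```

A DPL near 0 means both facets receive positive labels at a similar rate. A large value only flags the gap; whether it is a problem is the judgment call described in the card.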
What is divergence?
A measure of how much the distribution of outcomes differs between subgroups (e.g. Kullback-Leibler or Jensen-Shannon divergence). A divergence of 0 between 2 groups means that their outcome distributions are identical, i.e. the groups are treated exactly the same.
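One way to make this concrete is the Kullback-Leibler divergence between the outcome distributions of two groups. This is a sketch under the assumption that each distribution is given as a list of probabilities over the same outcomes (here: approved, rejected); the numbers are invented.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) = sum of p_i * log(p_i / q_i), in nats.

    p and q are probability lists over the same outcomes.
    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0
    the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical outcome distributions: [P(approved), P(rejected)].
same = kl_divergence([0.7, 0.3], [0.7, 0.3])       # identical treatment
different = kl_divergence([0.8, 0.2], [0.5, 0.5])  # diverging treatment
print(same, different)
```

Note that KL divergence is not symmetric, so swapping the two groups generally gives a different value.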
What is Conditional Demographic Disparity?
A method to see if bias in outcomes still exists even after holding certain variables constant. E.g. if net worth, credit score etc. (legitimate differentiators) are held constant but 50-year-olds are still being favoured for loans over 45-year-olds, then there may be a problem.
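A rough sketch of the idea, assuming demographic disparity for a facet is its share of rejected outcomes minus its share of accepted outcomes, and that the conditional version is the size-weighted average of that disparity across subgroups (strata) of the held-constant variable. The record layout, function names, and the choice to score a stratum with only one outcome type as 0 are assumptions made for illustration.

```python
from collections import defaultdict


def demographic_disparity(rows):
    """rows: list of (is_facet_d, accepted) boolean pairs for one stratum.

    Returns (share of rejections that are facet d)
          - (share of acceptances that are facet d).
    """
    rejected = [is_d for is_d, accepted in rows if not accepted]
    accepted = [is_d for is_d, accepted in rows if accepted]
    if not rejected or not accepted:
        return 0.0  # assumption: no disparity measurable in this stratum
    return sum(rejected) / len(rejected) - sum(accepted) / len(accepted)


def conditional_demographic_disparity(records):
    """records: list of (stratum, is_facet_d, accepted) tuples.

    Averages the per-stratum disparity, weighted by stratum size.
    """
    strata = defaultdict(list)
    for stratum, is_d, accepted in records:
        strata[stratum].append((is_d, accepted))
    n = len(records)
    return sum(len(rows) * demographic_disparity(rows)
               for rows in strata.values()) / n


# Hypothetical data: within the same credit band, facet d is always
# rejected and the other facet is always accepted.
biased = [("good_credit", True, False), ("good_credit", True, False),
          ("good_credit", False, True), ("good_credit", False, True)]
# Hypothetical data: within each band both facets fare identically.
fair = [("good_credit", True, True), ("good_credit", False, True),
        ("good_credit", True, False), ("good_credit", False, False)]
cdd_biased = conditional_demographic_disparity(biased)
cdd_fair = conditional_demographic_disparity(fair)
print(cdd_biased, cdd_fair)
```

A positive value suggests the facet is over-represented among rejections even after conditioning; a value near 0 suggests the held-constant variables explain the difference in outcomes.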
What is SageMaker Training Compiler?
A SageMaker capability that compiles deep learning models so they use GPU training instances more efficiently, reducing training time
Is SageMaker Training Compiler compatible with SageMaker distributed training libraries?
No