SageMaker Built-in Algorithms Flashcards

1
Q

Linear Learner

A

linear regression
can handle both regression and classification
for classification, a linear threshold is used

2
Q

Linear Learner Input Format

A

recordIO/protobuf, csv

file or pipe mode supported

3
Q

Linear Learner Usage

A
preprocessing:
- data must be normalized and shuffled
training:
- choose an optimization algorithm
- multiple models are optimized in parallel
- tune L1 and L2 regularization
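
For reference, a minimal sketch of launching the built-in Linear Learner with the SageMaker Python SDK; the role ARN and S3 paths are placeholders and the hyperparameter values are illustrative only:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
# look up the built-in Linear Learner container for the current region
container = image_uris.retrieve("linear-learner", region=session.boto_region_name)

linear = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/linear-learner/output",   # placeholder bucket
    sagemaker_session=session,
)
# l1 controls L1 regularization, wd (weight decay) controls L2
linear.set_hyperparameters(predictor_type="binary_classifier", l1=0.0, wd=0.01)
linear.fit({"train": "s3://my-bucket/linear-learner/train/"})
```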
4
Q

XGBoost

A

eXtreme Gradient Boosting
boosted group of decision trees
gradient descent to minimize loss
can be used for classification and regression

5
Q

XGBoost Input

A

CSV, libsvm

more recently, also recordIO/protobuf and Parquet

6
Q

XGBoost Usage

A

Models are serialized/deserialized with Pickle
can be used within a notebook or as a built-in SageMaker algorithm

HPs: subsample, eta, gamma, alpha, lambda

uses CPUs only; it is memory-bound rather than compute-bound
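
A minimal sketch of the "within a notebook" route, using the open-source xgboost library and pickling the trained booster; the toy data, hyperparameter values, and file name are illustrative only:

```python
import pickle
import numpy as np
import xgboost as xgb

# toy training data; in practice load your real features and labels
X = np.random.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "eta": 0.2,        # learning rate / step-size shrinkage
    "gamma": 1.0,      # minimum loss reduction required to make a split
    "alpha": 0.0,      # L1 regularization on weights
    "lambda": 1.0,     # L2 regularization on weights
    "subsample": 0.8,  # fraction of rows sampled per tree, helps prevent overfitting
}
booster = xgb.train(params, dtrain, num_boost_round=50)

# pickle the booster, as the card notes SageMaker does for its model artifacts
with open("xgboost-model", "wb") as f:
    pickle.dump(booster, f)
```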

7
Q

Seq2Seq

A

Input is a sequence of tokens, output is a sequence of tokens
good for machine translation, text summarization, speech to text

8
Q

Seq2Seq Input

A

recordIO/protobuf - tokens must be integers
start with tokenized text files
NEED TO PROVIDE TRAINING DATA, VALIDATION DATA, AND VOCAB FILES

9
Q

Seq2Seq Usage

A

Training can take days
Pretrained models available
Public training datasets available for specific translation tasks

HPs: batch size, optimizer type, # of layers
can optimize on accuracy, BLEU score, or perplexity

training can only use a single machine (GPU)

10
Q

DeepAR

A

forecasting one-dimensional time-series data
uses RNNs
allows you to train the same model on several related time series
finds frequency and seasonality

11
Q

DeepAR Input

A

JSON Lines format (can be gzipped or Parquet)
each record must contain: start (the starting timestamp) and target (the time-series values)
records can also contain dynamic_feat (dynamic features) and cat (categorical features)
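
A minimal sketch of what those JSON Lines records can look like (field names per the DeepAR docs; the timestamps and values are made up):

```python
import json

records = [
    # "start" is the first timestamp, "target" is the series itself;
    # "cat" (categorical features) and "dynamic_feat" (dynamic features) are optional
    {"start": "2024-01-01 00:00:00", "target": [5.0, 7.0, 9.0, 6.0],
     "cat": [0], "dynamic_feat": [[1.0, 0.0, 1.0, 1.0]]},
    {"start": "2024-01-01 00:00:00", "target": [2.0, 3.0, 4.0, 3.0],
     "cat": [1]},
]

with open("train.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```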

12
Q

DeepAR Usage

A
  • always include the entire time series for training, testing, and inference
  • use the entire dataset as the test set; remove the last time points for training and evaluate on those withheld values
  • don’t use very large values for prediction length
  • train on many time series when possible, not just one

HPs: epochs, batch size, learning rate, # cells, context length

GPU or CPU for training, CPU only for inference

13
Q

BlazingText

A
  1. Text Classification
    - predict labels for a sentence (NOT whole documents)
    - supervised
    - ex. web search, information retrieval
  2. Word2Vec
    - creates a vector representation of words
    - semantically similar words are represented by vectors close to each other (this is a word embedding)
    - useful for NLP, but is not an NLP algorithm itself
    - only works on INDIVIDUAL words
14
Q

BlazingText Input

A
  1. Text Classification (supervised mode)
    - one sentence per line
    - the first word in the sentence is the label, prefixed with the string "__label__"
    - augmented manifest text format is also accepted
  2. Word2Vec
    - a text file with one sentence per line
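
A minimal sketch of writing a supervised-mode training file in that format (the labels and sentences are made up):

```python
# one sentence per line, lowercased and space-tokenized,
# with the label as the first token using the __label__ prefix
samples = [
    ("positive", "the movie was great and the acting was superb"),
    ("negative", "the plot made no sense at all"),
]

with open("blazingtext.train", "w") as f:
    for label, sentence in samples:
        f.write(f"__label__{label} {sentence}\n")
```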
15
Q

BlazingText Usage

A

Word2Vec has multiple modes:

- cbow > continuous bag of words (order doesn't matter)
- skip-gram (order matters)
- batch skip-gram (distributed over CPU nodes)

HPs:

  • Word2Vec: mode, learning rate, window size, vector dim, negative samples
  • Text Classification: epochs, learning rate, word n-grams (how many words we look at together), vector dim

cbow and skip-gram can use GPUs (CPU also works)
batch skip-gram can use single or multiple CPU instances
text classification: CPU for smaller datasets, GPU for larger ones

16
Q

Object2Vec

A
  • like Word2Vec but with arbitrary objects
  • boils the data down to a lower-dimensional embedding
    • compute nearest neighbors, visualize clusters, genre prediction, recommendations
  • UNSUPERVISED
17
Q

Object2Vec Input

A
  • tokenized into integers
  • pairs or sequences of tokens
    • sentence-sentence, labels-sequence, customer-customer, product-product, user-item
18
Q

Object2Vec Usage

A
  • process data into JSON Lines and shuffle it
  • train with 2 input channels, 2 encoders, 1 comparator
  • encoder choices:
    • average pooled embeddings, CNN, bidirectional LSTM
  • comparator is followed by a feed-forward neural network

HPs: usual deep learning ones:

- dropout, early stopping, epochs, learning rate, batch size, layers, activation function, optimizer, weight decay
- also: encoder1 network and encoder2 network (the encoder type used for each input channel)

trains on a single machine (multi-GPU is OK)
use the INFERENCE_PREFERRED_MODE environment variable to optimize for encoder embeddings rather than classification or regression

19
Q

Object Detection

A
  • identify all objects in an image with bounding boxes
  • detect and classify with a single deep neural network
    • provides confidence scores
  • can train from scratch, or use pre-trained models based on ImageNet
20
Q

Object Detection Input

A
  • recordIO or image format (jpg/png); image format requires a JSON file per image for annotation data
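
A minimal sketch of generating one of those per-image annotation JSON files (field names follow the SageMaker object detection image-format docs; the file name, sizes, and boxes are made up):

```python
import json

annotation = {
    "file": "dog_001.jpg",                                     # hypothetical image name
    "image_size": [{"width": 640, "height": 480, "depth": 3}],
    "annotations": [
        # bounding box for one object, in pixels
        {"class_id": 0, "left": 100, "top": 120, "width": 200, "height": 180}
    ],
    "categories": [{"class_id": 0, "name": "dog"}],
}

with open("dog_001.json", "w") as f:
    json.dump(annotation, f)
```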
21
Q

Object Detection Usage

A
  • takes an image as input, outputs all instances of objects in the image with categories and confidence scores
  • CNN with the SSD (Single Shot MultiBox Detector) algorithm
    • base network: VGG-16 or ResNet-50
  • transfer learning mode / incremental training
    • use a pre-trained model for the base network weights instead of random initial weights
  • uses flip, rescale, and jitter internally to avoid overfitting

HPs: batch size, learning rate, optimizer

GPU for training, CPU for inference

22
Q

Image Classification

A
  • assign one or more labels to an image
  • doesn’t tell you where the objects are (no bounding boxes)

23
Q

Image Classification Input

A
  • Apache MXNet RecordIO (not protobuf!)
  • raw images (jpg or png)
    • requires a .lst file to associate the image index, class label, and path to each image
  • augmented manifest image format enables pipe mode
24
Q

Image Classification Usage

A
  • ResNet CNN
  • full training > initialized with random weights
  • transfer learning mode:
    • initialized with pretrained weights
    • top layer is initialized with random weights
    • network is fine-tuned with new training data
  • default image size is 3-channel 224x224 (the ImageNet standard)

HPs: batch size, learning rate, optimizer-specific parameters (weight decay, beta1, beta2, eps, gamma)

GPU for training, GPU or CPU for inference

25
Q

Semantic Segmentation

A
- pixel-level object classification
- useful for self-driving cars
- produces a segmentation mask
26
Q

Semantic Segmentation Input

A
- JPG images and PNG annotations
- label maps for describing the annotations
- augmented manifest image format supported for pipe mode
- JPG images accepted for inference
27
Q

Semantic Segmentation Usage

A
- built on MXNet Gluon and GluonCV
- choice of 3 algorithms:
  - fully-convolutional network (FCN)
  - pyramid scene parsing (PSP)
  - DeepLabV3
- backbone: ResNet-50 or ResNet-101, both trained on ImageNet

HPs: epochs, learning rate, batch size, optimizer, algorithm used, backbone used

single-machine GPU only for training; CPU or GPU for inference
28
Q

Random Cut Forest

A
- unsupervised anomaly detection
- detects:
  - spikes in time-series data
  - breaks in periodicity
  - unclassifiable data points
- assigns an anomaly score to each data point
- Amazon is very proud of this one!
29
Q

Random Cut Forest Inputs

A
- CSV or recordIO/protobuf
- file or pipe mode
- optional test channel for computing AUC, recall, precision, and F1 score
30
Q

Random Cut Forest Usage

A
- creates a forest of trees where each tree is a partition of the training data
- looks at the expected change in the complexity of a tree as a result of adding a new point
- data is sampled randomly, then trained
- can be used on time series

HPs: number of trees (increasing it reduces noise), number of samples per tree

no GPU support
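
A sketch using the RandomCutForest estimator class from the SageMaker Python SDK and its record_set helper; the role ARN is a placeholder and the sine wave stands in for real data:

```python
import numpy as np
from sagemaker import RandomCutForest

rcf = RandomCutForest(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_trees=100,              # more trees tends to reduce noise in the anomaly score
    num_samples_per_tree=256,   # random sample drawn for each tree
)

# toy 1-D series reshaped to (num_records, num_features)
series = np.sin(np.linspace(0, 50, 1000)).astype("float32").reshape(-1, 1)
rcf.fit(rcf.record_set(series))
```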
31
Q

Neural Topic Model

A
- organize documents into topics
- classify or summarize documents based on those topics
- not just TF/IDF; NTM groups things at a higher conceptual level
- unsupervised
- uses a neural variational inference algorithm
32
Q

Neural Topic Model Input

A
- four data channels: the train channel is required; validation, test, and auxiliary are optional
- recordIO/protobuf or CSV
- words must be tokenized into integers using a vocabulary file
- file or pipe mode
33
Q

Neural Topic Model Usage

A
- you define how many topics to generate
- topics are a latent representation based on their top-ranking words
- one of two topic modeling algorithms in SageMaker (the other being LDA)

HPs: batch size and learning rate (smaller values can reduce validation loss but increase training time), # of topics

CPU or GPU
34
Q

Latent Dirichlet Allocation (LDA)

A
- topic modeling, not deep-learning based
- unsupervised
- groups documents that share a subset of words
- can be used for things other than words: customer clustering, harmonic analysis
35
Q

LDA Input

A
- train channel, optional test channel
- recordIO/protobuf or CSV
- data must be tokenized
- with CSV, each document has counts for every word in the vocabulary
- pipe mode only supported with recordIO
36
Q

LDA Usage

A
- unsupervised: you pick the number of topics
- the test channel can be used to score results
- functionally similar to the Neural Topic Model, but CPU-based

HPs: # of topics, alpha0 (initial guess for the concentration values)

single-instance CPU only
37
Q

KNN (k-Nearest Neighbors)

A
- supervised
- simple classification or regression algorithm
- classification: find the K closest points to a sample and return the most frequent label
- regression: find the K closest points to a sample and return the average value
38
Q

KNN Input

A
- train channel, optional test channel
- recordIO/protobuf or CSV
- file or pipe mode
39
Q

KNN Usage

A
- data is sampled
- dimensionality reduction is performed
  - avoids sparse data, at the cost of noise/accuracy
  - "sign" or "fjlt" methods
- an index is built for looking up neighbors, the model is serialized, then queried for a given K

HPs: K, sample size

CPU or GPU for inference: CPU for lower latency, GPU for higher throughput
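
To make the classification behavior concrete, a tiny scikit-learn illustration (not the SageMaker built-in; the data is random toy data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# toy data: label is 1 when the first two features sum above 1.0
X = np.random.rand(200, 3)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5)  # K = 5
knn.fit(X, y)

# prediction = most frequent label among the 5 closest training points
print(knn.predict(X[:3]))
```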
40
Q

K-Means

A
- unsupervised clustering
- divides data into K groups where members of a group are as similar as possible to each other
- you define "similar"; the algorithm measures it with Euclidean distance
- SageMaker offers web-scale k-means clustering
41
Q

K-Means Inputs

A
- train channel (use the ShardedByS3Key distribution), optional test channel (use FullyReplicated)
- recordIO/protobuf or CSV
- file or pipe mode
42
Q

K-Means Usage

A
- every observation is mapped to n-dimensional space
- works to optimize the centers of K clusters
- extra cluster centers may be specified to improve accuracy: K = k*x, where k = the clusters we want and x = the extra cluster factor
- algorithm:
  - determine initial cluster centers, using random or k-means++ initialization (k-means++ tries to make the initial clusters far apart)
  - iterate over the data and calculate cluster centers
  - reduce the clusters from K down to k (using Lloyd's method with k-means++)

HPs: batch size, extra center factor (x), init method (random or k-means++), K
- choosing K is tricky: use the elbow method, i.e. optimize for tightness of clusters (sketched below)

CPU or GPU (CPU recommended)
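
A small sketch of the elbow method, using scikit-learn's KMeans on toy data (not the SageMaker built-in): compute the within-cluster sum of squares for a range of k and pick the value where the curve bends.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 5)  # toy data

inertias = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares ("tightness")

# look for the "elbow": the k after which the inertia stops dropping sharply
for k, inertia in zip(range(1, 11), inertias):
    print(k, round(inertia, 2))
```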
43
Q

Principal Component Analysis (PCA)

A
- dimensionality reduction
- projects higher-dimensional data into a lower-dimensional space (like a 2D plot) while minimizing loss of information
- the reduced dimensions are called components
  - the first component has the largest possible variability, the second component has the next largest, and so on
- unsupervised
44
Q

PCA Inputs

A
- recordIO/protobuf
- file or pipe mode
45
Q

PCA Usage

A
- a covariance matrix is created, then singular value decomposition (SVD) is applied
- 2 modes:
  - regular: for sparse data and a moderate number of features
  - randomized: for a large number of features; uses an approximation algorithm

HPs: algorithm mode, subtract mean (unbiases the data)

CPU or GPU; it depends on the specifics of the input data
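
A small numpy sketch of the idea behind regular mode: build the covariance matrix, run SVD, and project onto the top components (toy data, illustrative only):

```python
import numpy as np

X = np.random.rand(200, 10)                 # toy data: 200 observations, 10 features
X_centered = X - X.mean(axis=0)             # "subtract mean" = unbias the data

cov = np.cov(X_centered, rowvar=False)      # covariance matrix of the features
U, S, Vt = np.linalg.svd(cov)               # singular value decomposition

n_components = 2
components = Vt[:n_components]              # directions of largest variance first
X_reduced = X_centered @ components.T       # project onto the top components
print(X_reduced.shape)                      # (200, 2)
```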
46
Q

Factorization Machines

A
- classification or regression with SPARSE DATA
- good for recommendations:
  - click prediction
  - item recommendations
  - since a user doesn't interact with most pages/products, the data is sparse
- supervised (classification or regression)
- limited to pair-wise interactions, e.g. user-item
47
Q

Factorization Machines Inputs

A
- recordIO/protobuf with Float32
- sparse data means CSV isn't practical
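
A sketch of converting a sparse interaction matrix into that format, assuming the write_spmatrix_to_sparse_tensor helper from the SageMaker Python SDK (sagemaker.amazon.common); the matrix sizes, entries, and labels are made up:

```python
import io
import numpy as np
import scipy.sparse as sp
from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor

# hypothetical user x item click matrix: mostly zeros, hence sparse
interactions = sp.lil_matrix((1000, 500), dtype=np.float32)
interactions[0, 42] = 1.0
interactions[3, 7] = 1.0

# one label per row (e.g. clicked or not)
labels = np.zeros(1000, dtype=np.float32)
labels[[0, 3]] = 1.0

buf = io.BytesIO()
write_spmatrix_to_sparse_tensor(buf, interactions.tocsr(), labels)
buf.seek(0)  # upload buf to S3 as the training channel
```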
48
Q

Factorization Machines Usage

A
- essentially builds a big (sparse) matrix
- finds the factors we can use to predict a classification (click or not?) or value (predicted rating) given a matrix representing some pair of things (users and items)

HPs: initialization methods for the bias, factor, and linear terms (uniform, normal, or constant)

CPU or GPU; CPU is recommended, GPU only works with dense data
49
Q

IP Insights

A
- unsupervised learning of IP address usage patterns
- identifies suspicious activity
- a security tool
50
Q

IP Insights Inputs

A
- user names and account IDs can be fed in directly
- training channel, optional validation channel (computes AUC)
- CSV only: entity, IP address
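
A minimal sketch of that CSV layout, entity first and then IP (the names and addresses are made up):

```python
import csv

rows = [
    ("user_alice", "192.0.2.10"),   # entity, IP address
    ("user_bob", "198.51.100.7"),
]

with open("ipinsights_train.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```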
51
Q

IP Insights Usage

A
- uses a neural network to learn latent vector representations of entities and IP addresses
- entities are hashed and embedded (the hash size needs to be big enough)
- automatically generates negative samples by randomly pairing entities and IPs

HPs: # of entity vectors (hash size), vector dimension, epochs, learning rate, batch size

CPU or GPU (GPU recommended)