Models Flashcards
Linear Learner - Instance types
Single or multi-machine CPU or GPU
Multi-GPU does not help
Linear Learner - Hyperparams
Balance_multiclass_weights → gives each class equal importance in the loss function
Learning rate
Mini_batch_size
L1 regularisation
Wd = weight decay = L2 regularisation
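A minimal sketch of setting these hyperparameters on the built-in Linear Learner container via the SageMaker Python SDK; the role ARN, S3 bucket, instance type, and all values are placeholders.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
# Built-in Linear Learner container for the current region
image = sagemaker.image_uris.retrieve("linear-learner", session.boto_region_name)

linear = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.c5.xlarge",
    output_path="s3://my-bucket/linear-learner/output",   # placeholder bucket
    sagemaker_session=session,
)

linear.set_hyperparameters(
    predictor_type="multiclass_classifier",
    num_classes=3,
    balance_multiclass_weights=True,  # equal class importance in the loss function
    learning_rate=0.01,
    mini_batch_size=1000,
    l1=0.0,                           # L1 regularisation
    wd=0.0001,                        # weight decay = L2 regularisation
)
```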
Linear Learner - Model types
Can handle both regression (numeric) predictions and classification problems
For classification, a linear threshold function is used
Can do binary or multi-class problems
Linear Learner - Input format
RecordIO-wrapped protobuf → Float32 data
CSV → first column is the label
File or Pipe mode both supported
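A minimal sketch of producing the recordIO-wrapped protobuf input from Float32 NumPy arrays with the SDK's write_numpy_to_dense_tensor helper; the bucket and key are placeholders and the data is random.

```python
import io
import boto3
import numpy as np
from sagemaker.amazon.common import write_numpy_to_dense_tensor

# Linear Learner wants Float32 features; the label vector is passed separately
features = np.random.rand(1000, 20).astype("float32")
labels = np.random.randint(0, 2, size=1000).astype("float32")

buf = io.BytesIO()
write_numpy_to_dense_tensor(buf, features, labels)  # recordIO-wrapped protobuf
buf.seek(0)

# Upload so it can be used as a training channel (placeholder bucket/key)
boto3.resource("s3").Object("my-bucket", "linear-learner/train/data.rec").upload_fileobj(buf)
```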
Linear Learner - Pre-processing
Training data should be normalised (so all features are weighted the same)
Linear learner can do this for you
Input data should be shuffled
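A minimal sketch of this pre-processing step with scikit-learn and NumPy, using toy data in place of a real dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for your real feature matrix and labels
X_raw = np.random.rand(1000, 20)
y_raw = np.random.randint(0, 2, size=1000)

# Normalise so all features are weighted the same
X = StandardScaler().fit_transform(X_raw).astype("float32")

# Shuffle so mini-batches are not ordered by class or time
perm = np.random.permutation(len(X))
X, y = X[perm], y_raw[perm].astype("float32")
```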
Linear Learner - Training
Uses SGD
Choose an optimisation algo (Adam, Adagrad, SGD, etc)
Multiple models are optimised in parallel and the most optimal one is chosen in the validation step
Tune L1, L2 regularisation
Linear Learner - Validation
Most optimal model is selected
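A minimal sketch of launching training with both a train and a validation channel (the channel the parallel-trained models are compared on); the S3 URIs are placeholders and `linear` is the estimator from the hyperparameter sketch above.

```python
from sagemaker.inputs import TrainingInput

train = TrainingInput(
    "s3://my-bucket/linear-learner/train/",        # placeholder
    content_type="application/x-recordio-protobuf",
)
validation = TrainingInput(
    "s3://my-bucket/linear-learner/validation/",   # placeholder
    content_type="application/x-recordio-protobuf",
)

# The best of the models optimised in parallel is selected on the validation channel
linear.fit({"train": train, "validation": validation})
```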
XGBoost - Model Type
eXtreme Gradient Boosting
Boosted group of decision trees
New trees made to correct the errors of previous trees
Uses gradient descent to minimise loss as new trees are added
Can be used for:
Classification
Regression (uses regression trees)
Can use it:
Within notebook as sagemaker.xgboost
Or use sagemaker container
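A minimal sketch of the two routes; the version string, entry-point script, and role are placeholders.

```python
import sagemaker

# Route 1: the managed XGBoost container as a built-in algorithm
image = sagemaker.image_uris.retrieve(
    "xgboost", sagemaker.Session().boto_region_name, version="1.5-1"
)

# Route 2: framework ("script") mode, running your own training script
from sagemaker.xgboost.estimator import XGBoost

xgb_script = XGBoost(
    entry_point="train.py",   # placeholder training script
    framework_version="1.5-1",
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
```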
XGBoost - Input
CSV
Libsvm
recordIO-protobuf
Parquet format
XGBoost - Hyperparameters
Subsample → prevents overfitting
Eta → step size shrinkage → prevents overfitting
Gamma → minimum loss reduction to create a partition, larger = more conservative
Alpha = L1 regularisation term; larger = more conservative model
Lambda = L2 regularisation term; larger = more conservative model
Eval_metric = the metric to optimise on (AUC, error, rmse, ...) e.g. use error or rmse if you’re optimising for accuracy, but if false positives are your main concern, you might set this to AUC
Scale_pos_weight:
-Adjusts balance of positive and negative weights
-Helps for unbalanced classes
-Might set to sum(negative cases)/sum(positive cases)
Max_depth = max depth of tree → too high and you might overfit
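A minimal sketch of setting these hyperparameters on the built-in XGBoost container; all values are illustrative ("lambda" is passed via a dict because it is a Python keyword).

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path="s3://my-bucket/xgboost/output",          # placeholder bucket
    sagemaker_session=session,
)

xgb.set_hyperparameters(**{
    "objective": "binary:logistic",
    "num_round": 200,
    "subsample": 0.8,        # row subsampling helps prevent overfitting
    "eta": 0.1,              # step size shrinkage
    "gamma": 1.0,            # min loss reduction to make a split; larger = more conservative
    "alpha": 0.0,            # L1 regularisation term
    "lambda": 1.0,           # L2 regularisation term
    "max_depth": 6,          # too deep and you may overfit
    "scale_pos_weight": 10,  # ≈ sum(negative cases) / sum(positive cases)
    "eval_metric": "auc",    # e.g. AUC when false positives are the main concern
})
```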
XGBoost - instances
Uses CPUs for multi-instance training
Memory-bound → not compute-bound
→ So M5 is a good choice for multiple instances
If using a single instance:
As of XGBoost 1.2, single-instance GPU training is available
E.g. P2 or P3 instance types
→ Must set the tree_method hyperparameter to gpu_hist
→ Trains more quickly → can be more cost-effective
As of XGBoost 1.2.2: P2, P3, G4dn, and G5 instance types are supported
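Continuing the XGBoost sketch above, a single-GPU-instance setup only differs in the instance type and the tree_method hyperparameter (values illustrative):

```python
# Same container image and role as above, but one GPU instance (XGBoost >= 1.2)
xgb_gpu = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
xgb_gpu.set_hyperparameters(
    objective="binary:logistic",
    num_round=200,
    tree_method="gpu_hist",  # required for GPU training
)
```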
Seq2Seq - Model type
Input is a sequence of tokens, output is a sequence of tokens
Uses:
Machine translation
Text summarisation
Speech to text
Implemented with RNNs and CNNs with attention
Seq2Seq - Inputs
recordIO-protobuf → tokens must be integers (this is unusual, since most algorithms want floating point data)
Start with tokenised text files
Convert to protobuf using sample code
- Packs into integer tensor with vocab files
- A lot like TF/IDF
Must provide training data, validation data, and vocabulary files
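The real conversion to recordIO-protobuf is done with the AWS sample code; the sketch below only illustrates the tokenise-and-integer-encode step against a toy vocabulary (all names here are hypothetical):

```python
# Toy vocabulary: in practice this comes from the vocabulary files you must provide
vocab = {"<unk>": 0, "hello": 1, "world": 2, "how": 3, "are": 4, "you": 5}

def encode(sentence, vocab):
    """Map a tokenised sentence to the integer token IDs Seq2Seq expects."""
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.lower().split()]

print(encode("Hello world how are you", vocab))  # [1, 2, 3, 4, 5]
```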
Seq2Seq - training
Can take days to train
Pre-trained models are available → see example notebook
Public training datasets are available for specific translation tasks
Seq2Seq - Hyperparameters
Batch_size
Optimizer_type (adam, sgd, rmsprop)
Learning_rate
Num_layers_encoder, num_layers_decoder
Can optimise on:
- Accuracy → vs. the provided validation dataset
- BLEU score → good for machine translation; compares against multiple reference translations
- Perplexity → a cross-entropy metric; good for machine translation
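An illustrative hyperparameter dictionary for the built-in seq2seq container; the values are placeholders.

```python
seq2seq_hyperparameters = {
    "batch_size": 64,
    "optimizer_type": "adam",     # adam, sgd or rmsprop
    "learning_rate": 0.0003,
    "num_layers_encoder": 2,
    "num_layers_decoder": 2,
    "optimized_metric": "bleu",   # e.g. bleu or perplexity for machine translation
}
```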
Seq2Seq - Instances
Only GPU e.g. P3
Can only use a single machine for training → but can use multiple GPUs on a single machine → but can’t be parallelized across multiple machines
DeepAR - Model Type
Forecasting one-dimensional time-series data
- Allows you to train the same model over several related time series
- Finds frequencies and seasonality
Uses RNNs
DeepAR - Input
JSON lines format → in GZIP or Parquet for better performance
Each record must contain:
Start: the starting timestamp
Target: the time series values
Each record can contain:
Dynamic features, e.g. was a promotion applied to the product during the time series of product purchases
Categorical features
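A minimal sketch of writing DeepAR's JSON Lines input (a gzipped or Parquet version of the same records performs better); the values are toy data.

```python
import json

series = [
    {
        "start": "2024-01-01 00:00:00",
        "target": [12.0, 15.0, 14.0, 19.0],  # the time-series values
        "dynamic_feat": [[0, 1, 1, 0]],      # e.g. whether a promotion was applied at each point
        "cat": [0],                          # categorical features, e.g. product category
    },
]

with open("train.json", "w") as f:
    for record in series:
        f.write(json.dumps(record) + "\n")
```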
DeepAR - How is it used?
Always include the entire series for training, testing, and inference:
Use the entire dataset as the test set; remove the last time points for training → evaluate on the withheld values
Don’t use large values for prediction_length (> 400) → the model can’t predict too far into the future
Train on many time series and not just one when possible
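A minimal sketch of that split: the test channel keeps the full series, while the training channel drops the last prediction_length points so forecasts can be evaluated on the withheld values (toy data):

```python
prediction_length = 2

# Test channel: the entire series
full_record = {"start": "2024-01-01 00:00:00", "target": [12.0, 15.0, 14.0, 19.0]}

# Training channel: the same series minus the last prediction_length points
train_record = dict(full_record, target=full_record["target"][:-prediction_length])
```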
DeepAR - Hyperparameters
Context_length = number of time points the model sees before making a prediction
Can be smaller than seasonalities → the model will lag one year anyhow
Epochs
Mini_batch_size
Learning_rate
Num_cells = number of neurons
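An illustrative hyperparameter dictionary for the built-in DeepAR container (time_freq and prediction_length are also set here; the values are placeholders):

```python
deepar_hyperparameters = {
    "time_freq": "D",        # daily data
    "context_length": 28,    # time points the model sees before making a prediction
    "prediction_length": 28,
    "epochs": 100,
    "mini_batch_size": 64,
    "learning_rate": 0.001,
    "num_cells": 40,         # number of neurons per RNN layer
}
```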
DeepAR - Instances
CPU or GPU
Single or multi machine
Recommendation: start with CPU (ml.c4.2xlarge, then ml.c4.4xlarge)
Move up to GPU if necessary → only helps with larger models or a large mini_batch_size
May need larger instances for tuning
BlazingText - Model Type
Only for sentences → not entire documents
Text classification:
Predicts labels for a sentence
Useful in web searches, information retrieval
Supervised
Word2Vec:
Creates a vector representation of words
Semantically similar words are represented by vectors close to each other
This is called word embedding
It is useful in NLP, but is not an NLP algo itself
Used in machine translation, sentiment analysis
BlazingText - Input
For supervised mode (text classification)
One sentence per line
First “word” in the sentence is the string __label__ followed by the label e.g. “__label__4 hello there this is a sentence”
Also “augmented manifest text format” → JSON lines, each with source and label fields
Word2Vec just wants a text file with one training sentence per line
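A minimal sketch of writing both input files from a toy labelled dataset (file names and data are placeholders):

```python
labelled = [
    (4, "hello there this is a sentence"),
    (1, "this product was a disappointment"),
]

# Supervised (text classification) format: __label__<label> then the sentence
with open("classification.train", "w") as f:
    for label, sentence in labelled:
        f.write(f"__label__{label} {sentence}\n")

# Word2Vec format: just one training sentence per line
with open("word2vec.train", "w") as f:
    for _, sentence in labelled:
        f.write(sentence + "\n")
```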
BlazingText - modes of Word2Vec?
Word2vec has multiple modes:
CBOW (Continuous Bag of Words → the order of words is thrown out, just the words themselves matter)
Skip-gram
Batch skip-gram → distributed computation over many CPU nodes
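A minimal sketch of selecting the Word2Vec mode on the built-in BlazingText container ("supervised" would select text classification instead); the role, bucket, and values are placeholders.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
image = sagemaker.image_uris.retrieve("blazingtext", session.boto_region_name)

bt = Estimator(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.c5.4xlarge",
    output_path="s3://my-bucket/blazingtext/output",      # placeholder bucket
    sagemaker_session=session,
)

# mode can be "cbow", "skipgram" or "batch_skipgram" (or "supervised" for classification)
bt.set_hyperparameters(mode="batch_skipgram", vector_dim=100, epochs=10)
```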