ML Fundamentals Flashcards

1
Q

allows people to store objects (files) in “buckets”
(directories)

A

Amazon S3

2
Q

What pathway is this called: <my_bucket>/my_folder1/another_folder/my_file.txt

A

S3 Bucket Key
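To make the bucket/key split concrete, a minimal boto3 sketch (assumes configured AWS credentials; bucket, key, and payload are illustrative):

```python
import boto3

s3 = boto3.client("s3")
# The Key is the full path of the object inside the bucket
s3.put_object(
    Bucket="my_bucket",
    Key="my_folder1/another_folder/my_file.txt",
    Body=b"hello world",
)
```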

3
Q
  • Pattern for speeding up range queries (ex: AWS Athena)
  • By Date: s3://bucket/my-dataset/year/month/day/hour/data_00.csv
  • By Product: s3://bucket/my-data-set/product-id/data_32.csv
A

Amazon S3 Data Partitioning
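A small sketch of building a date-partitioned key like the pattern above (dataset name is illustrative):

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
# s3://bucket/my-dataset/year/month/day/hour/data_00.csv
key = f"my-dataset/{now:%Y}/{now:%m}/{now:%d}/{now:%H}/data_00.csv"
```

Queries that filter on year/month/day/hour can then skip irrelevant prefixes entirely.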

4
Q

Durability or availability:
* If you store 10,000,000 objects with Amazon S3, you can on average
expect to incur a loss of a single object once every 10,000 years
* Same for all storage classes

A

Durability

5
Q

Durability or availability:
* Measures how readily available a service is
* Varies depending on storage class

A

Availability

6
Q

What S3 storage class is the below:
* 99.99% Availability
* Used for frequently accessed data
* Low latency and high throughput
* Sustain 2 concurrent facility failures
* Use Cases: Big Data analytics, mobile & gaming applications,
content distribution…

A

S3 Standard – General Purpose

7
Q

What S3 Storage class:
* For data that is less frequently accessed, but requires rapid access when needed
* Lower cost than S3 Standard
* 99.9% Availability
* Use cases: Disaster Recovery, backups

A
  • Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
8
Q

What S3 Storage class:
* For data that is less frequently accessed, but requires rapid access when needed
* Lower cost than S3 Standard
* High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
* 99.5% Availability
* Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate

A
  • Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
9
Q

What S3 Storage class:
* Small monthly monitoring and auto-tiering fee
* Moves objects automatically between Access Tiers based on usage
* There are no retrieval charges in S3 Intelligent-Tiering

A

S3 Intelligent-Tiering

10
Q

Describe the S3 storage Intelligent Tiering classes below:
*__________: default tier
* Infrequent Access tier (automatic): objects not accessed for 30 days
* ______: objects not accessed for 90 days
* _________: configurable from 90 days to 700+ days
* ________: config. from 180 days to 700+ days

A

Frequent Access tier (automatic): default tier
* Infrequent Access tier (automatic): objects not accessed for 30 days
* Archive Instant Access tier (automatic): objects not accessed for 90 days
* Archive Access tier (optional): configurable from 90 days to 700+ days
* Deep Archive Access tier (optional): config. from 180 days to 700+ days

11
Q
  • Help you decide when to transition objects
    to the right storage class
  • Recommendations for Standard and
    Standard IA
  • Does NOT work for One-Zone IA or Glacier
  • Report is updated daily
  • 24 to 48 hours to start seeing data analysis
  • Good first step to put together Lifecycle
    Rules (or improve them)!
A

Amazon S3 Analytics

12
Q

Bucket-wide rules from the S3 console; allows cross-account access

A

S3 Bucket policies

13
Q

_____ is a managed alternative to Apache Kafka
* Great for application logs, metrics, IoT, clickstreams
* Great for “real-time” big data
* Great for streaming processing frameworks (Spark, NiFi, etc…)
* Data is automatically replicated synchronously to 3 AZs

A

Amazon Kinesis

14
Q

__________ low latency streaming ingest at scale

A

Kinesis Streams
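A minimal boto3 sketch of streaming ingest (stream name and payload are illustrative); the partition key determines which shard receives the record:

```python
import json
import boto3

kinesis = boto3.client("kinesis")
kinesis.put_record(
    StreamName="my-clickstream",
    Data=json.dumps({"user": "u1", "event": "click"}).encode("utf-8"),
    PartitionKey="u1",  # same key -> same shard, preserving per-key order
)
```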

15
Q

________ perform real-time analytics on streams using SQL

A

Kinesis Analytics

16
Q

_________ load streams into S3, Redshift, ElasticSearch & Splunk

A

Kinesis Firehose

17
Q

______ meant for streaming video in real-time

A

Kinesis Video Streams

18
Q

Kinesis Streams are divided into ordered ______

A

Shards

19
Q

What are the two capacity modes for Kinesis Data streams?

A

Provisioned and On-Demand modes

20
Q

What Kinesis data stream capacity mode is below:
* You choose the number of shards provisioned, scale manually or using API
* Each shard gets 1MB/s in (or 1000 records per second)
* Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
* You pay per shard provisioned per hour

A

Provisioned

21
Q

What Kinesis data stream capacity mode is below:
* No need to provision or manage the capacity
* Default capacity provisioned (4 MB/s in or 4000 records per second)
* Scales automatically based on observed throughput peak during the last 30
days
* Pay per stream per hour & data in/out per GB

A

On-demand mode

22
Q

What Kinesis service is this:
* Fully Managed Service, no administration
* Near Real Time (60-second minimum latency for non-full batches)
* Data Ingestion into Redshift / Amazon S3 / ElasticSearch / Splunk
* Automatic scaling
* Supports many data formats
* Data Conversions from CSV / JSON to Parquet / ORC (only for S3)
* Data Transformation through AWS Lambda (ex: CSV => JSON)
* Supports compression when target is Amazon S3 (GZIP, ZIP, and SNAPPY)

A

Kinesis Data Firehose

23
Q

What's the difference between Kinesis Data Streams and Firehose?

A

* Streams
  * Going to write custom code (producer / consumer)
  * Real time (~200 ms latency for classic, ~70 ms latency for enhanced fan-out)
  * Automatic scaling with On-demand Mode
  * Data Storage for 1 to 365 days, replay capability, multi consumers
* Firehose
  * Fully managed, send to S3, Splunk, Redshift, ElasticSearch
  * Serverless data transformations with Lambda
  * Near real time (lowest buffer time is 1 minute)
  * Automated Scaling
  * No data storage

24
Q

What Kinesis tool is this:

Use cases:
* Streaming ETL: select columns, make simple transformations, on streaming data
* Continuous metric generation: live leaderboard for a mobile game
* Responsive analytics: look for certain criteria and build alerting (filtering)

Features:
* Pay only for resources consumed (but it's not cheap)
* Serverless; scales automatically
* Use IAM permissions to access streaming source and destination(s)
* SQL or Flink to write the computation
* Schema discovery
* Lambda can be used for pre-processing

A

Kinesis Data Analytics

25
For Kinesis Analytics, you pay only for ______ (but it's not cheap)
resources consumed
26
Is Amazon Kinesis serverless?
Yes
27
What Amazon data product has the below characteristics:
* Producers: security camera, body-worn camera, AWS DeepLens, smartphone camera, audio feeds, images, RADAR data, RTSP camera
* One producer per video stream
* Video playback capability
* Consumers:
  * Build your own (MXNet, TensorFlow)
  * AWS SageMaker
  * Amazon Rekognition Video
* Keep data for 1 hour to 10 years
Kinesis Video Streams
28
__________ create real-time machine learning applications
Kinesis Data Stream
29
_____ ingest massive data near-real time
Kinesis Data Firehose
30
___________ real-time ETL / ML algorithms on streams
Kinesis Data Analytics
31
___________ real-time video stream to create ML applications
Kinesis Video Stream
32
* Metadata repository for all your tables
* Automated Schema Inference
* Schemas are versioned
* Integrates with Athena or Redshift Spectrum (schema & data discovery)
Glue Data Catalog
33
____ go through your data to infer schemas and partitions
* Works with JSON, Parquet, CSV, relational stores
Glue crawlers
34
Transform data, Clean Data, Enrich Data (before doing analysis)
* Generate ETL code in Python or Scala, you can modify the code
* Can provide your own Spark or PySpark scripts
* Target can be S3, JDBC (RDS, Redshift), or in Glue Data Catalog
* Fully managed, cost effective, pay only for the resources consumed
* Jobs are run on a serverless Spark platform
Glue ETL
35
What type of data store is this: Data Warehousing, SQL analytics (OLAP - Online analytical processing)
Redshift
36
What type of data store is this: Relational Store, SQL (OLTP - Online Transaction Processing) * Must provision servers in advance
RDS, Aurora
37
What type of data store is this: NoSQL data store, serverless, provision read/write capacity * Useful to store a machine learning model served by your application
DynamoDB
38
What type of data store is this: Object storage * Serverless, infinite storage * Integration with most AWS Services
S3
39
What type of data store is this: * Indexing of data * Search amongst data points * Clickstream Analytics
OpenSearch (previously ElasticSearch)
40
What type of data store is this: * Caching mechanism * Not really used for Machine Learning
ElastiCache
41
What AWS data service do the below features identify?
* Destinations include S3, RDS, DynamoDB, Redshift and EMR
* Manages task dependencies
* Retries and notifies on failures
* Data sources may be on-premises
* Highly available
AWS Data Pipeline
42
What are the differences between AWS Data Pipeline and AWS Glue?
Glue:
* Glue ETL - Run Apache Spark code, Scala or Python based, focus on the ETL
* Glue ETL - Do not worry about configuring or managing the resources
* Data Catalog to make the data available to Athena or Redshift Spectrum

Data Pipeline:
* Orchestration service
* More control over the environment, compute resources that run code, & code
* Allows access to EC2 or EMR instances (creates resources in your own account)
43
What AWS data service is below:
* Run batch jobs as Docker images
* Dynamic provisioning of the instances (EC2 & Spot Instances)
* Optimal quantity and type based on volume and requirements
* No need to manage clusters, fully serverless
* You just pay for the underlying EC2 instances
AWS Batch
44
What is the difference between AWS Batch and Glue?
Glue:
* Glue ETL - Run Apache Spark code, Scala or Python based, focus on the ETL
* Glue ETL - Do not worry about configuring or managing the resources
* Data Catalog to make the data available to Athena or Redshift Spectrum

Batch:
* For any computing job regardless of the job (must provide Docker image)
* Resources are created in your account, managed by Batch
* For any non-ETL related work, Batch is probably better
45
What AWS data service has the below features:
* Quickly and securely migrate databases to AWS, resilient, self healing
* The source database remains available during the migration
* Supports:
  * Homogeneous migrations: ex Oracle to Oracle
  * Heterogeneous migrations: ex Microsoft SQL Server to Aurora
* Continuous Data Replication using CDC
* You must create an EC2 instance to perform the replication tasks
AWS Database Migration Service - DMS
46
What is the difference between AWS DMS and Glue?
Glue:
* Glue ETL - Run Apache Spark code, Scala or Python based, focus on the ETL
* Glue ETL - Do not worry about configuring or managing the resources
* Data Catalog to make the data available to Athena or Redshift Spectrum

AWS DMS:
* Continuous Data Replication
* No data transformation
* Once the data is in AWS, you can use Glue to transform it
47
What AWS Data service has the below features:
* For data migrations from on-premises to AWS storage services
* A DataSync Agent is deployed as a VM and connects to your internal storage (NFS, SMB, HDFS)
* Encryption and data validation
AWS DataSync
48
* An Internet of Things (IoT) thing
* Standard messaging protocol
* Think of it as how lots of sensor data might get transferred to your machine learning model
* The AWS IoT Device SDK can connect via ____
MQTT
49
What are the three major types of data?
* Numerical * Categorical * Ordinal
50
______ Represents some sort of quantitative measurement * Heights of people, page load times, stock prices, etc.
Numerical
51
_______ is Integer based; often counts of some event. * How many purchases did a customer make in a year? * How many times did I flip “heads”?
Discrete data
52
__________ * Has an infinite number of possible values * How much time did it take for a user to check out? * How much rain fell on a given day?
Continuous Data
53
___________ is Qualitative data that has no inherent mathematical meaning * Gender, Yes/no (binary data), Race, State of Residence, Product Category, Political Party, etc.
Categorical data
54
A mixture of numerical and categorical
* Categorical data that has mathematical meaning
* Example: movie ratings on a 1-5 scale
  * Ratings must be 1, 2, 3, 4, or 5
  * But these values have mathematical meaning; 1 means it's a worse movie than a 2
Ordinal data
55
What AWS service has the below characteristics:
* Interactive query service for S3 (SQL)
* No need to load data, it stays in S3
* Presto under the hood
* Serverless!
* Supports many data formats:
  * CSV (human readable)
  * JSON (human readable)
  * ORC (columnar, splittable)
  * Parquet (columnar, splittable)
  * Avro (splittable)
* Unstructured, semi-structured, or structured
Amazon Athena
56
What AWS service uses the below scenarios?
* Ad-hoc queries of web logs
* Querying staging data before loading to Redshift
* Analyze CloudTrail / CloudFront / VPC / ELB etc logs in S3
* Integration with Jupyter, Zeppelin, RStudio notebooks
* Integration with QuickSight
* Integration via ODBC / JDBC with other visualization tools
Amazon Athena
57
What AWS service has the below cost model? Pay-as-you-go
* $5 per TB scanned
* Successful or cancelled queries count, failed queries do not
* No charge for DDL (CREATE/ALTER/DROP etc.)
* Save LOTS of money by using columnar formats (ORC, Parquet)
  * Save 30-90%, and get better performance
Athena
58
What AWS Service has the below characteristics:
* Fast, easy, cloud-powered business analytics service
* Allows all employees in an organization to:
  * Build visualizations
  * Perform ad-hoc analysis
  * Quickly get business insights from data
  * Anytime, on any device (browsers, mobile)
* Serverless
QuickSight
59
What is the in memory database that is used by quicksight?
SPICE
60
What QuickSight service is below: Machine learning-powered
* Answers business questions with Natural Language Processing
  * "What are the top-selling items in Florida?"
* Offered as an add-on for given regions
* Personal training on how to use it is required
* Must set up topics associated with datasets
  * Datasets and their fields must be NLP-friendly
  * How to handle dates must be defined
QuickSight Q
61
What QuickSight service is below: Reports designed to be printed
* May span many pages
* Can be based on existing QuickSight dashboards
* New in Nov 2022
Paginated Reports
62
What AWS Service is this:
* Managed Hadoop framework on EC2 instances
* Includes Spark, HBase, Presto, Flink, Hive & more
* EMR Notebooks
* Several integration points with AWS
Amazon EMR (Elastic Map Reduce)
63
What is this called: Applying your knowledge of the data – and the model you're using – to create better features to train your model with.
* Which features should I use?
* Do I need to transform these features in some way?
* How do I handle missing data?
* Should I create new features from the existing ones?
Feature engineering
64
What is The Curse of Dimensionality ?
Too many features can be a problem – leads to sparse data
* Every feature is a new dimension
* Much of feature engineering is selecting the features most relevant to the problem at hand
* This often is where domain knowledge comes into play
65
What AI data cleansing concept is below: Replace missing values with the mean value from the rest of the column
* Columns, not rows! A column represents a single feature; it only makes sense to take the mean from other samples of the same feature
* Fast & easy, won't affect mean or sample size of overall data set
* Median may be a better choice than mean when outliers are present
Mean replacement
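A minimal scikit-learn sketch of mean replacement (the array is illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
# Each NaN is replaced by the mean of its own column (feature)
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
```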
66
What are the cons of mean replacement?
Only works on column level, misses correlations between features * Can’t use on categorical features (imputing with most frequent value can work in this case, though) * Not very accurate
67
What solution to missing data is this:
* If not many rows contain missing data…
* …and dropping those rows doesn't bias your data…
* …and you don't have a lot of time…
* …maybe it's a reasonable thing to do
* But, it's never going to be the right answer for the "best" approach
Dropping data
68
What are the three ways to solve missing data with machine learning techniques?
* KNN: Find K "nearest" (most similar) rows and average their values
  * Assumes numerical data, not categorical
  * There are ways to handle categorical data (Hamming distance), but categorical data is probably better served by…
* Deep Learning
  * Build a machine learning model to impute data for your machine learning model!
  * Works well for categorical data. Really well. But it's complicated.
* Regression
  * Find linear or non-linear relationships between the missing feature and other features
  * Most advanced technique: MICE (Multiple Imputation by Chained Equations)
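For the KNN option, a minimal scikit-learn sketch (illustrative data):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
# Each NaN is filled with the average of the k most similar rows
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)
```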
69
What kind of data is this: Large discrepancy between "positive" and "negative" cases
* i.e., fraud detection. Fraud is rare, and most rows will be not-fraud
* Don't let the terminology confuse you; "positive" doesn't mean "good"
  * It means the thing you're testing for is what happened
  * If your machine learning model is made to detect fraud, then fraud is the positive case
* Mainly a problem with neural networks
unbalanced data
70
To improve AI data quality, what is the term below: Artificially generate new samples of the minority class using nearest neighbors
* Run K-nearest-neighbors of each sample of the minority class
* Create a new sample from the KNN result (mean of the neighbors)
* Both generates new samples and undersamples majority class
* Generally better than just oversampling
SMOTE (Synthetic Minority Over-sampling TEchnique)
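A minimal sketch using the imbalanced-learn library (a separate package, not an AWS service; the synthetic dataset is illustrative):

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# 95% / 5% class split to mimic an unbalanced problem
X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
# y_res now has roughly balanced classes
```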
71
If you have too many false positives, one way to fix that is to simply increase that _________
threshold
72
_____ is simply the average of the squared differences from the mean
Variance
73
_____ is just the square root of the variance.
Standard Deviation (σ)
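Both definitions in a short NumPy sketch (values are illustrative):

```python
import numpy as np

x = np.array([1.0, 4.0, 5.0, 4.0, 8.0])
variance = np.mean((x - x.mean()) ** 2)  # same as np.var(x)
sigma = np.sqrt(variance)                # same as np.std(x)
```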
74
Bucket observations together based on ranges of values. * Example: estimated ages of people * Put all 20-somethings in one classification, 30-somethings in another, etc
Binning
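A minimal pandas sketch of the ages example (bin edges are illustrative):

```python
import pandas as pd

ages = pd.Series([23, 35, 29, 41, 38])
decades = pd.cut(ages, bins=[20, 30, 40, 50], labels=["20s", "30s", "40s"])
```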
75
Applying some function to a feature to make it better suited for training
Transforming
76
Transforming data into some new representation required by the model
encoding
77
Some models prefer feature data to be normally distributed around 0 (most neural nets)
* Most models require feature data to at least be scaled to comparable values
* Otherwise features with larger magnitudes will have more weight than they should
* Example: modeling age and income as features – incomes will be much higher values than ages
Scaling/normalization
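A minimal scikit-learn sketch of the age/income example (values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25, 40_000], [35, 85_000], [45, 120_000]], dtype=float)
# Each feature is rescaled to zero mean / unit variance, so income's
# large magnitude no longer dominates age
X_scaled = StandardScaler().fit_transform(X)
```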
78
Many algorithms benefit from _____ their training data * Otherwise they may learn from residual signals in the training data resulting from the order in which they were collected
shuffling
79
What is Ground Truth?
* Ground Truth manages humans who will label your data for training purposes
* Ground Truth creates its own model as images are labeled by people
* As this model learns, only images the model isn't sure about are sent to human labelers
80
Turnkey solution
* "Our team of AWS Experts" manages the workflow and team of labelers
* You fill out an intake form
* They contact you and discuss pricing
Ground Truth Plus
81
* AWS service for image recognition * Automatically classify images
Rekognition
82
* AWS service for text analysis and topic modeling * Automatically classify text by topics, sentiment
Comprehend
83
* Important data for search – figures out what terms are most relevant for a document
TF-IDF (stands for Term Frequency and Inverse Document Frequency)
84
* just measures how often a word occurs in a document * A word that occurs frequently is probably important to that document’s meaning
Term Frequency
85
_____ is how often a word occurs in an entire set of documents, i.e., all of Wikipedia or every web page * This tells us about common words that just appear everywhere no matter what the topic, like "a", "the", "and", etc.
Document Frequency
86
Can you explain bi grams and tri grams?
An extension of TF-IDF is to not only compute relevancy for individual words (terms) but also for bi-grams or, more generally, n-grams.
* "I love certification exams"
* Unigrams: "I", "love", "certification", "exams"
* Bi-grams: "I love", "love certification", "certification exams"
* Tri-grams: "I love certification", "love certification exams"
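A minimal scikit-learn sketch of TF-IDF with unigrams and bi-grams (documents are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I love certification exams", "I love machine learning"]
vec = TfidfVectorizer(
    ngram_range=(1, 2),            # unigrams + bi-grams
    token_pattern=r"(?u)\b\w+\b",  # keep one-letter tokens like "I"
)
tfidf = vec.fit_transform(docs)    # sparse document-term matrix
print(vec.get_feature_names_out())
```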
87
What are the three types of neural networks?
* Feedforward Neural Network * Convolutional Neural Networks (CNN) * Recurrent Neural Networks (RNNs)
88
What kind of activation function is this: It doesn’t really *do* anything * Can’t do backpropagation
Linear
89
What kind of activation function is this: * It’s on or off * Can’t handle multiple classification – it’s binary after all * Vertical slopes don’t work well with calculus!
Binary step function
90
What kind of activation function is this: * These can create complex mappings between inputs and outputs * Allow backpropagation (because they have a useful derivative) * Allow for multiple layers (linear functions degenerate to a single layer)
Non linear activation function
91
What kind of activation function is this:
* Nice & smooth
* Scales everything from 0-1 (Sigmoid / Logistic) or -1 to 1 (tanh / hyperbolic tangent)
* But: changes slowly for high or low values
  * The "Vanishing Gradient" problem
* Computationally expensive
* Tanh generally preferred over sigmoid
Sigmoid / Logistic / TanH
92
What kind of activation function is this: Now we're talking
* Very popular choice
* Easy & fast to compute
* But, when inputs are zero or negative, we have a linear function and all of its problems
Rectified Linear Unit (ReLU)
93
What kind of activation function is this: Solves “dying ReLU” by introducing a negative slope below 0 (usually not as steep as this)
Leaky ReLU
94
What kind of activation function is this: * ReLU, but the slope in the negative part is learned via backpropagation * Complicated and YMMV
Parametric ReLU (PReLU)
95
What kind of activation function is this: * From Google, performs really well * But it’s from Google, not Amazon… * Mostly a benefit with very deep networks (40+ layers)
Swish
96
What kind of activation function is this: * Outputs the max of the inputs * Technically ReLU is a special case of maxout * But doubles parameters that need to be trained, not often practical.
Maxout
97
* Used on the final output layer of a multi-class classification problem * Basically converts outputs to probabilities of each classification * Can’t produce more than one label for something (sigmoid can)
Softmax
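Minimal NumPy sketches of several of the activations above, for reference:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small negative slope below 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))             # subtract max for numerical stability
    return e / e.sum()                    # outputs sum to 1 (probabilities)
```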
98
What are convolutional neural networks used for?
When you have data that doesn’t neatly align into columns * Images that you want to find features within * Machine translation * Sentence classification * Sentiment analysis * They can find features that aren’t in a specific spot * Like a stop sign in a picture * Or words within a sentence * They are “feature-location invariant”
99
_________ They can find features that aren’t in a specific spot * Like a stop sign in a picture * Or words within a sentence
convolutional neural network
100
True or false: CNNs are very resource-intensive (CPU, GPU, and RAM)
true
101
What are recurrent neural networks used for?
Time-series data:
* When you want to predict future behavior based on past behavior
* Web logs, sensor logs, stock trades
* Where to drive your self-driving car based on past trajectories
Data that consists of sequences of arbitrary length:
* Machine translation
* Image captions
* Machine-generated music
102
What neural network should you use:
* Time-series data
* When you want to predict future behavior based on past behavior
* Web logs, sensor logs, stock trades
* Where to drive your self-driving car based on past trajectories
recurrent neural network
103
________ deep learning architectures are what's hot
* Adopts mechanism of "self-attention"
* Weighs significance of each part of the input data
* Processes sequential data (like words, like an RNN), but processes entire input all at once
* The attention mechanism provides context, so no need to process one word at a time
* BERT, RoBERTa, T5, GPT-2, DistilBERT, etc.
* DistilBERT: uses knowledge distillation to reduce model size by 40%
Transformer
104
What is it called when the below things are used in AI?
* NLP models (and others) are too big and complex to build from scratch and re-train every time
* The latest may have hundreds of billions of parameters!
* Model zoos such as Hugging Face offer pre-trained models to start from
* Integrated with SageMaker via Hugging Face Deep Learning Containers
* You can fine-tune these models for your own use cases
transfer learning
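The simplest form of this, as a hedged sketch with the Hugging Face transformers library (the default model downloaded by pipeline() is illustrative):

```python
from transformers import pipeline

# Reuse a pre-trained model instead of training from scratch
clf = pipeline("sentiment-analysis")
print(clf("This certification course is great!"))
```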
105
Neural networks are trained by ________ (or similar means)
gradient descent
106
* Too high a learning rate means you might _________
overshoot the optimal solution!
107
* Too small a learning rate will _____
take too long to find the optimal solution
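A toy sketch tying these cards together: gradient descent on f(w) = (w - 3)^2, whose gradient is 2(w - 3):

```python
w, lr = 0.0, 0.1           # lr is the learning rate (a hyperparameter)
for _ in range(100):
    w -= lr * 2 * (w - 3)  # step against the gradient
# w converges to ~3.0; too high an lr overshoots, too low converges slowly
```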
108
Learning rate is an example of a ___________
hyperparameter
109
Smaller batch sizes can work their way out of _________
“local minima” more easily
110
* Batch sizes that are too large can ________
end up getting stuck in the wrong solution
111
* Regularization techniques are intended to prevent ________.
overfitting
112
true or false: Overfitted models have learned patterns in the training data that don’t generalize to the real world
true
113
* Models that are good at making predictions on the data they were trained on, but not on new data it hasn’t seen before
overfitting
114
What is the vanishing gradient problem?
When the slope of the learning curve approaches zero, things can get stuck
115
_ regularization: sum of weights
* Performs feature selection – entire features go to 0
* Computationally inefficient
* Sparse output
L1 regularization
116
__ regularization: sum of square of weights
* All features remain considered, just weighted
* Computationally efficient
* Dense output
L2 regularization
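A scikit-learn sketch of the contrast (synthetic data; alpha values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, random_state=0)
l1 = Lasso(alpha=1.0).fit(X, y)  # L1: some coefficients become exactly 0
l2 = Ridge(alpha=1.0).fit(X, y)  # L2: all coefficients shrink but stay nonzero
```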
117
What matrix does the below show? * A test for a rare disease can be 99.9% accurate by just guessing “no” all the time * We need to understand true positives and true negative, as well as false positives and false negatives.
the confusion matrix
118
____ = AKA Sensitivity, True Positive rate, Completeness * Percent of positives rightly predicted * Good choice of metric when you care a lot about false negatives
recall
119
What is the formula for recall?
True Positives / (True Positives + False Negatives)
120
____ = AKA Correct Positives * Percent of relevant results * Good choice of metric when you care a lot about false positives * i.e., medical screening, drug testing
precision
121
What is the formula for precision?
True Positives / (True Positives + False Positives)
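Both formulas computed from confusion-matrix counts (numbers are illustrative):

```python
tp, fp, fn = 80, 10, 20
recall = tp / (tp + fn)     # 0.80  -- penalizes false negatives
precision = tp / (tp + fp)  # ~0.89 -- penalizes false positives
```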
122
* Plot of true positive rate (recall) vs. false positive rate at various threshold settings
* Points above the diagonal represent good classification (better than random)
* Ideal curve would just be a point in the upper-left corner
* The more it's "bent" toward the upper-left, the better
ROC Curve (Receiver Operating Characteristic Curve)
123
Equal to probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one
* ROC AUC of 0.5 is a useless classifier, 1.0 is perfect
* Commonly used metric for comparing classifiers
Area Under the Curve (AUC)
124
Good = higher area under curve
* Similar to ROC curve
* But better suited for information retrieval problems
* ROC can result in very small values if you are searching a large number of documents for a tiny number that are relevant
Precision / Recall curve
125
__________ = Generate N new training sets by random sampling with replacement * Each resampled model can be trained in parallel
bagging
126
_____ = * Observations are weighted * Some will take part in new training sets more often * Training is sequential; each classifier takes into account the previous one’s success.
boosting
127
What type of SageMaker built-in algorithm is this: Linear regression
* Fit a line to your training data
* Predictions based on that line
* Can handle both regression (numeric) predictions and classification predictions
* For classification, a linear threshold function is used
* Can do binary or multi-class
Linear learner
128
For linear learner, it can handle both regression (numeric) predictions and _______ predictions
classification predictions
129
Linear Learner: What training input does it expect?
* RecordIO-wrapped protobuf * CSV * File or Pipe mode both supported
130
Linear learner: Preprocessing
* Training data must be ______ (so all features are weighted the same)
* Linear Learner can do this for you automatically
normalized
131
What does sagemaker linear learner use in training?
Uses stochastic gradient descent
132
What type of SageMaker built-in algorithm is this: Boosted group of decision trees
* New trees made to correct the errors of previous trees
* Uses gradient descent to minimize loss as new trees are added
XGBoost
133
What type of training input does xgboost expect?
It takes CSV or libsvm input.
134
With xgboost, Models are serialized/deserialized with ___
Pickle
135
What type of SageMaker built-in algorithm is this:
* Input is a sequence of tokens, output is a sequence of tokens
* Machine Translation
* Text summarization
* Speech to text
* Implemented with RNNs and CNNs with attention
Seq2Seq
136
What SageMaker built-in algorithm maps to the below training inputs:
* RecordIO-Protobuf
  * Tokens must be integers (this is unusual, since most algorithms want floating point data)
* Start with tokenized text files
* Convert to protobuf using sample code
  * Packs into integer tensors with vocabulary files
  * A lot like the TF/IDF lab we did earlier
* Must provide training data, validation data, and vocabulary files
Seq2Seq
137
Seq2Seq can optimize on:
* Accuracy (vs. provided validation dataset)
* ____ score (compares against multiple reference translations)
* Perplexity (cross-entropy)
BLEU score
138
Seq2Seq: Instance Types
* Can only use ____ instance types (P3 for example)
* Can only use a single machine for training
* But can use multiple GPUs on one machine
GPU instance types
139
What SageMaker algorithm has the below characteristics?
* Forecasting one-dimensional time series data
* Uses RNNs
* Allows you to train the same model over several related time series
* Finds frequencies and seasonality
DeepAR
140
What SageMaker algorithm has the below training input needs?
* JSON Lines format (Gzip or Parquet)
* Each record must contain:
  * Start: the starting time stamp
  * Target: the time series values
* Each record can contain:
  * Dynamic_feat: dynamic features (such as, was a promotion applied to a product in a time series of product purchases)
  * Cat: categorical features
DeepAR
141
For DeepAR, always include the entire _____ for training, testing, and inference
time series
142
For DeepAR, start with ___, move up to __ if necessary.
CPU, GPU
143
What SageMaker algorithm has the below characteristics:
* Text classification
  * Predict labels for a sentence
  * Useful in web searches, information retrieval
  * Supervised
* Word2vec
  * Creates a vector representation of words
  * Semantically similar words are represented by vectors close to each other
  * This is called a word embedding
  * It is useful for NLP, but is not an NLP algorithm in itself!
  * Used in machine translation, sentiment analysis
  * Remember it only works on individual words, not sentences or documents
BlazingText
144
BlazingText: What training input does it expect?
* For supervised mode (text classification):
  * One sentence per line
  * First "word" in the sentence is the string __label__ followed by the label
  * Also, "augmented manifest text format"
* Word2vec just wants a text file with one training sentence per line
145
What type of sagemaker algorithm is below: * It creates low-dimensional dense embeddings of high-dimensional objects * It is basically word2vec, generalized to handle things other than words. * Compute nearest neighbors of objects * Visualize clusters * Genre prediction * Recommendations (similar items or users)
Object2Vec
146
What type of algorithm has the below training requirements: * Data must be tokenized into integers * Training data consists of pairs of tokens and/or sequences of tokens * Sentence – sentence * Labels-sequence (genre to description?) * Customer-customer * Product-product * User-item
Object2Vec
147
For Object2Vec, you process data into ___ and shuffle it
JSON Lines
148
What are important hyperparameters for Object2Vec?
* The usual deep learning ones…
  * Dropout, early stopping, epochs, learning rate, batch size, layers, activation function, optimizer, weight decay
* Enc1_network, enc2_network
  * Choose hcnn, bilstm, pooled_embedding
149
What sagemaker algorithm is below: * Identify all objects in an image with bounding boxes * Detects and classifies objects with a single deep neural network * Classes are accompanied by confidence scores * Can train from scratch, or use pretrained models based on ImageNet
object detection
150
What are the two variants of sagemaker object detection?
MXNet and TensorFlow
* Takes an image as input, outputs all instances of objects in the image with categories and confidence scores
* MXNet:
  * Uses a CNN with the Single Shot multibox Detector (SSD) algorithm
  * The base CNN can be VGG-16 or ResNet-50
  * Transfer learning mode / incremental training
  * Use a pre-trained model for the base network weights, instead of random initial weights
  * Uses flip, rescale, and jitter internally to avoid overfitting
* TensorFlow:
  * Uses ResNet, EfficientNet, MobileNet models from the TensorFlow Model Garden
151
What training input does object detection expect?
* MXNet: RecordIO or image format (jpg or png) * With image format, supply a JSON file for annotation data for each image
152
What's the difference between object detection and image classification?
Object detection will show the specific point in the image where the object is. Image classification will classify the image and tell you what it is, not where it is
153
Image Classification: What’s it for?
* Assign one or more labels to an image * Doesn’t tell you where objects are, just what objects are in the image
154
For image classification, there are separate algorithms for ________ and _____
MXNet and Tensorflow
155
Semantic Segmentation: What’s it for?
* Pixel-level object classification
* Different from image classification – that assigns labels to whole images
* Different from object detection – that assigns labels to bounding boxes
* Useful for self-driving vehicles, medical imaging diagnostics, robot sensing
156
* Useful for self-driving vehicles, medical imaging diagnostics, robot sensing
semantic segmentation
157
Semantic Segmentation: What training input does it expect?
* JPG Images and PNG annotations
* For both training and validation
* Label maps to describe annotations
* Augmented manifest image format supported for Pipe mode
* JPG images accepted for inference
158
What form of SageMaker algorithm tool has the below choices? Choice of 3 algorithms:
* Fully-Convolutional Network (FCN)
* Pyramid Scene Parsing (PSP)
* DeepLabV3
semantic segmentation
159
Random Cut Forest is used for ________
anomaly detection
160
Neural Topic Model: What’s it for?
* Organize documents into topics
* Classify or summarize documents based on topics
* It's not just TF/IDF
* "bike", "car", "train", "mileage", and "speed" might classify a document as "transportation" for example (although it wouldn't know to call it that)
161
What are the four data channels for neural topic model?
* "train" is required
* "validation", "test", and "auxiliary" are optional
162
Neural Topic Model: How is it used?
* You define how many topics you want * These topics are a latent representation based on top ranking words * One of two topic modeling algorithms in SageMaker – you can try them both!
163
Another topic modeling algorithm
* Not deep learning
* Unsupervised
* The topics themselves are unlabeled; they are just groupings of documents with a shared subset of words
* Can be used for things other than words
  * Cluster customers based on purchases
  * Harmonic analysis in music
Latent Dirichlet Allocation (LDA)
164
What SageMaker algorithm: Unsupervised; generates however many topics you specify
* Optional test channel can be used for scoring results (per-word log likelihood)
* Functionally similar to NTM, but CPU-based
  * Therefore maybe cheaper / more efficient
Latent Dirichlet Allocation (LDA)
165
Simple classification or regression algorithm
* Classification: find the K closest points to a sample point and return the most frequent label
* Regression: find the K closest points to a sample point and return the average value
K-Nearest-Neighbors (KNN)
166
For KNN: SageMaker includes a ___________ stage
* Avoid sparse data ("curse of dimensionality")
* At cost of noise / accuracy
* "sign" or "fjlt" methods
dimensionality reduction
167
These are important hyperparameters for what algorithm: * K! * Sample_size
KNN
168
What SageMaker algorithm:
* Unsupervised clustering
* Divide data into K groups, where members of a group are as similar as possible to each other
  * You define what "similar" means
  * Measured by Euclidean distance
* Web-scale K-Means clustering
K-Means
169
These are important hyperparameters for what algorithm:
* K!
  * Choosing K is tricky
  * Plot within-cluster sum of squares as function of K
  * Use "elbow method"
  * Basically optimize for tightness of clusters
* Mini_batch_size
* Extra_center_factor
* Init_method
K-Means
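A scikit-learn sketch of the elbow method mentioned above (synthetic blobs; the range of K is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
# inertia_ is the within-cluster sum of squares for a fitted model
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in range(1, 9)]
# Pick the K where wcss stops dropping sharply (the "elbow")
```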
170
What is the below SageMaker algorithm:
* Dimensionality reduction
  * Project higher-dimensional data (lots of features) into lower-dimensional (like a 2D plot) while minimizing loss of information
* The reduced dimensions are called components
  * First component has largest possible variability
  * Second component has the next largest…
* Unsupervised
Principal Component Analysis (PCA)
171
PCA: What training input does it expect?
* recordIO-protobuf or CSV * File or Pipe on either
172
What SageMaker algorithm:
* Covariance matrix is created, then singular value decomposition (SVD)
* Two modes:
  * Regular: for sparse data and moderate number of observations and features
  * Randomized: for large number of observations and features; uses approximation algorithm
Principal Component Analysis (PCA)
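A scikit-learn sketch of PCA itself (synthetic data; 10 features reduced to 2 components):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=200, n_features=10, random_state=0)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
# The first component explains the most variance, the second the next most
print(pca.explained_variance_ratio_)
```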
173
What SageMaker algorithm: Dealing with sparse data
* Click prediction
* Item recommendations
* Since an individual user doesn't interact with most pages / products, the data is sparse
* Supervised (classification or regression)
* Limited to pair-wise interactions (user -> item, for example)
factorization machines
174
What SageMaker algorithm: Finds factors we can use to predict a classification (click or not? Purchase or not?) or value (predicted rating?) given a matrix representing some pair of things (users & items?)
* Usually used in the context of recommender systems
factorization machines
175
What SageMaker algorithm:
* Unsupervised learning of IP address usage patterns
* Identifies suspicious behavior from IP addresses
  * Identify logins from anomalous IPs
  * Identify accounts creating resources from anomalous IPs
IP Insights
176
What SageMaker algorithm:
* Uses a neural network to learn latent vector representations of entities and IP addresses
* Entities are hashed and embedded
  * Need sufficiently large hash size
* Automatically generates negative samples during training by randomly pairing entities and IPs
IP Insights
177
What SageMaker algorithm:
* You have some sort of agent that "explores" some space
* As it goes, it learns the value of different state changes in different conditions
* Those values inform subsequent behavior of the agent
* Examples: Pac-Man, Cat & Mouse game (game AI), supply chain management, HVAC systems, industrial robotics, dialog systems, autonomous vehicles
* Yields fast on-line performance once the space has been explored
reinforcement learning
178
What SageMaker algorithm: A specific implementation of reinforcement learning
* You have:
  * A set of environmental states s
  * A set of possible actions in those states a
  * A value of each state/action Q
* Start off with Q values of 0
* Explore the space
* As bad things happen after a given state/action, reduce its Q
* As rewards happen after a given state/action, increase its Q
Q-learning
179
Reinforcement Learning in SageMaker * Uses a deep learning framework with ____ and ________
Tensorflow and MXNet
180
What is this called:
* SageMaker spins up a "HyperParameter Tuning Job" that trains as many combinations as you'll allow
  * Training instances are spun up as needed, potentially a lot of them
* The set of hyperparameters producing the best results can then be deployed as a model
* It learns as it goes, so it doesn't have to try every possible combination
Automatic Model Tuning
181
* Visual IDE for machine learning!
SageMaker Studio
182
Create and share Jupyter notebooks with SageMaker Studio * Switch between hardware configurations (no infrastructure to manage)
SageMaker Notebooks
183
* Organize, capture, compare, and search your ML jobs
SageMaker Experiments
184
* Saves internal model state at periodical intervals
  * Gradients / tensors over time as a model is trained
* Define rules for detecting unwanted conditions while training
* A debug job is run for each rule you configure
* Logs & fires a CloudWatch event when the rule is hit
SageMaker Debugger
185
* Automates:
  * Algorithm selection
  * Data preprocessing
  * Model tuning
  * All infrastructure
* It does all the trial & error for you
* More broadly this is called AutoML
SageMaker Autopilot
186
* Integrates with SageMaker Clarify * Transparency on how models arrive at predictions * Feature attribution
autopilot explainability
187
* Get alerts on quality deviations on your deployed models (via CloudWatch)
* Visualize data drift
  * Example: loan model starts giving people more credit due to drifting or missing input features
* Detect anomalies & outliers
* Detect new features
* No code needed
SageMaker Model Monitor
188
* _________ detects potential bias
  * i.e., imbalances across different groups / ages / income brackets
* With Model Monitor, you can monitor for bias and be alerted to new potential bias via CloudWatch
* SageMaker Clarify also helps explain model behavior
  * Understand which features contribute the most to your predictions
SageMaker Clarify
189
* A "feature" is just a property used to train a machine learning model
* Like, you might predict someone's political party based on "features" such as their address, income, age, etc.
* Machine learning models require fast, secure access to feature data for training
* It's also a challenge to keep it organized and share features across different models
SageMaker Feature Store
190
* Creates & stores your ML workflow (MLOps)
* Keep a running history of your models
* Tracking for auditing and compliance
* Automatically or manually-created tracking entities
* Integrates with AWS Resource Access Manager for cross-account lineage
SageMaker ML Lineage Tracking
191
* Visual interface (in SageMaker Studio) to prepare data for machine learning
* Import data
* Visualize data
* Transform data (300+ transformations to choose from)
  * Or integrate your own custom transforms with pandas, PySpark, PySpark SQL
* "Quick Model" to train your model with your data and measure its results
SageMaker Data Wrangler
192
* No-code machine learning for business analysts
* Upload csv data (csv only for now), select a column to predict, build it, and make predictions
* Can also join datasets
* Classification or regression
SageMaker Canvas
193
________
* For asynchronous or real-time inference endpoints
* Controls shifting traffic to new models ("Blue/Green Deployments")
  * All at once: shift everything, monitor, terminate blue fleet
  * Canary: shift a small portion of traffic and monitor
  * Linear: shift traffic in linearly spaced steps
* Auto-rollbacks
Deployment Guardrails
194
________ * Compare performance of shadow variant to production * You monitor in SageMaker console and decide when to promote it
Shadow Tests
195
One facet (demographic group) has fewer training values than another
* Class Imbalance (CI)
196
* Imbalance of positive outcomes between facet values
* Difference in Proportions of Labels (DPL)
197
* How much outcome distributions of facets diverge
* Kullback-Leibler Divergence (KL), Jensen-Shannon Divergence(JS)
198
* P-norm difference between distributions of outcomes from facets
* Lp-norm (LP)
199
* L1-norm difference between distributions of outcomes from facets
* Total Variation Distance (TVD)
200
* Maximum divergence between outcomes in distributions from facets
* Kolmogorov-Smirnov (KS)
201
* Disparity of outcomes between facets as a whole, and by subgroups
* Conditional Demographic Disparity (CDD)
202
* Integrated into AWS Deep Learning Containers (DLCs)
  * Can't bring your own container
* Compile & optimize training jobs on GPU instances
* Can accelerate training up to 50%
* Converts models into hardware-optimized instructions
* Tested with Hugging Face transformers library, or bring your own model
SageMaker Training Compiler
203
What AI Service:
* Natural Language Processing and Text Analytics
* Input social media, emails, web pages, documents, transcripts, medical records (Comprehend Medical)
* Extract key phrases, entities, sentiment, language, syntax, topics, and document classifications
* Events detection
* PII Identification & Redaction
* Targeted sentiment (for specific entities)
* Can train on your own data
Amazon Comprehend
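A minimal boto3 sketch of one Comprehend call (assumes configured AWS credentials; the text is illustrative):

```python
import boto3

comprehend = boto3.client("comprehend")
resp = comprehend.detect_sentiment(Text="I love this product!", LanguageCode="en")
print(resp["Sentiment"])  # e.g. "POSITIVE"
```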
204
What AI Service:
* Uses deep learning for translation
* Supports custom terminology
  * In CSV or TMX format
  * Appropriate for proper names, brand names, etc.
Amazon Translate
205
What AI service:
* Speech to text
* Input in FLAC, MP3, MP4, or WAV, in a specified language
* Streaming audio supported (HTTP/2 or WebSocket)
  * French, English, Spanish only
* Speaker Identification
  * Specify number of speakers
* Channel Identification
  * i.e., two callers could be transcribed separately
  * Merging based on timing of "utterances"
* Automatic Language Identification
  * You don't have to specify a language; it can detect the dominant one spoken
* Custom Vocabularies
  * Vocabulary Lists (just a list of special words – names, acronyms)
  * Vocabulary Tables (can include "SoundsLike", "IPA", and "DisplayAs")
Amazon Transcribe
206
What AI Service:
* Neural Text-To-Speech, many voices & languages
* Lexicons
  * Customize pronunciation of specific words & phrases
  * Example: "World Wide Web Consortium" instead of "W3C"
* SSML (Speech Synthesis Markup Language)
  * Alternative to plain text
  * Gives control over emphasis, pronunciation, breathing, whispering, speech rate, pitch, pauses
* Speech Marks
  * Can encode when sentence / word starts and ends in the audio stream
  * Useful for lip-synching animation
Amazon Polly
207
What AI Service:
* Computer vision
* Object and scene detection
  * Can use your own face collection
* Image moderation
* Facial analysis
* Celebrity recognition
* Face comparison
* Text in image
* Video analysis
  * Objects / people / celebrities marked on timeline
  * People Pathing
* Image and video libraries
Rekognition
208
What AI Service:
* Fully-managed service to deliver highly accurate forecasts with ML
* "AutoML" chooses best model for your time series data
  * ARIMA, DeepAR, ETS, NPTS, CNN-QR, Prophet
* Works with any time series
  * Price, promotions, economic performance, etc.
* Can combine with associated data to find relationships
* Inventory planning, financial planning, resource planning
* Based on "dataset groups," "predictors," and "forecasts"
Amazon Forecast
209
What AI Tool:
* Billed as the inner workings of Alexa
* Natural-language chatbot engine
* A Bot is built around Intents
  * Utterances invoke intents ("I want to order a pizza")
  * Lambda functions are invoked to fulfill the intent
  * Slots specify extra information needed by the intent (pizza size, toppings, crust type, when to deliver, etc.)
* Can deploy to AWS Mobile SDK, Facebook Messenger, Slack, and Twilio
Amazon Lex
210
What AI Service:
* Fully-managed recommender engine (same one Amazon uses)
* API access
  * Feed in data (purchases, ratings, impressions, cart adds, catalog, user demographics, etc.) via S3 or API integration
  * You provide an explicit schema in Avro format
  * JavaScript or SDK
* GetRecommendations
  * Recommended products, content, etc.
  * Similar items
* GetPersonalizedRanking
  * Rank a list of items provided
  * Allows editorial control / curation
Amazon Personalize
211
What AI Service: Equipment, metrics, vision
* Detects abnormalities from sensor data automatically to detect equipment issues
* Monitors metrics from S3, RDS, Redshift, 3rd party SaaS apps
* Vision uses computer vision to detect defects in silicon wafers, circuit boards, etc.
Amazon Lookout
212
What AI Service: * End to end system for monitoring industrial equipment & predictive maintenance
Amazon Monitron
213
What AI Service: * Computer Vision at the edge * Brings computer vision to your existing IP cameras
AWS Panorama
214
What AI Tool: * Upload your own historical fraud data * Builds custom models from a template you choose * Exposes an API for your online application
Amazon Fraud Detector
215
What AI Service:
* Automated code reviews!
* Finds lines of code that hurt performance
* Resource leaks, race conditions
* Fix security vulnerabilities
CodeGuru
216
What AI Service:
* For customer support call centers
* Ingests audio data from recorded calls
* Allows search on calls / chats
* Sentiment analysis
* Find "utterances" that correlate with successful calls
* Categorize calls automatically
* Measure talk speed and interruptions
* Theme detection: discovers emerging issues
Contact Lens for Amazon Connect
217
What AI Service:
* Enterprise search with natural language
  * For example, "Where is the IT support desk?" "How do I connect to my VPN?"
* Combines data from file systems, SharePoint, intranet, sharing services (JDBC, S3) into one searchable repository
* ML-powered (of course) – uses thumbs up / down feedback
* Relevance tuning – boost strength of document freshness, view counts, etc.
Amazon Kendra
218
What AI Service:
* Human review of ML predictions
* Builds workflows for reviewing low-confidence predictions
* Access the Mechanical Turk workforce or vendors
* Integrated into Amazon Textract and Rekognition
* Integrates with SageMaker
* Very similar to Ground Truth
Amazon Augmented AI (A2I)
219
* All models in SageMaker are hosted in ________
Docker containers
220
* Docker containers are created from ______
images
221
* Images are built from a _______
Dockerfile
222
* Images are saved in a ________
repository
223
* Train once, run anywhere
  * Edge devices
  * ARM, Intel, Nvidia processors
  * Embedded in whatever – your car?
* Optimizes code for specific devices
  * Tensorflow, MXNet, PyTorch, ONNX, XGBoost, DarkNet, Keras
* Consists of a compiler and a runtime
SageMaker Neo