Flashcards

AI

1
Q

_________ is a field of computer science dedicated to solving
problems that we commonly associate with human intelligence

A

Artificial Intelligence

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Used to generate new data that is similar to the data it was trained on
* Text
* Image
* Audio
* Code
* Video…

A

Generative AI

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

To generate data, we must rely on a __________
* ___________ are trained on a wide variety of input data
* The models may cost tens of millions of dollars to train

A

Foundation Model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Type of AI designed to generate coherent human-like text
* One notable example: GPT-4 (ChatGPT / Open AI)
* Trained on large corpus of text data
* Usually very big models
* Billions of parameters
* Trained on books, articles, websites, other textual data
* Can perform language-related tasks
* Translation, Summarization
* Question answering * Content creatio

A

Large Language Models (LLM)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

We usually interact with the LLM by giving a ____

A

prompt

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the term for below: the generated text may be different for every user that uses
the same prompt

A

Non-deterministic:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What’s Amazon Titan?

A
  • High-performing Foundation Models from AWS
  • Image, text, multimodal model choices via a fully-managed APIs
  • Can be customized with your own data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What term goes with this:
-Adapt a copy of a foundation model with your own data

A

Fine Tuning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

*Improves the performance of
a pre-trained FM on domain-specific tasks
* = further trained on a
particular field or area of
knowledge

A

Instruction based fine tuning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

-make a model expert in a specific domain
* For example: feeding the entire AWS
documentation to a model to make it an expert on AWS
* Good to feed industry-specific terminology
into a model (acronyms, etc…)
* Can continue to train the model as more
data becomes available

A

domain-adaptation fine-tuning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q
  • Part of instruction-based
    fine-tuning
  • system (optional) : context
    for the conversation.
  • messages : An array of
    message objects, each
    containing:
  • role :
    Either user or assistant
  • content : The text content
    of the message
A

single turn messaging

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q
  • To provide instructionbased fine tuning for a
    conversation (vs SingleTurn Messaging)
  • Chatbots = multi-turn
    environment
  • You must alternate
    between “user” and
    “assistant” roles
A

multi turn messaging

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

True or false: Instruction-based fine-tuning is usually cheaper than re training an FM as computations are
less intense and the amount of data required usually less

A

true

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

_________ the broader concept of re-using a pre-trained model to adapt it to a new related task
* Widely used for image classification
* And for NLP (models like BERT and GPT)

A

transfer learning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

This is a good use case of _____

  • A chatbot designed with a particular persona or tone, or geared
    towards a specific purpose (e.g., assisting customers, crafting
    advertisements)
  • Training using more up-to-date information than what the language
    model previously accessed
  • Training with exclusive data (e.g., your historical emails or messages,
    records from customer service interactions)
  • Targeted use cases (categorization, assessing accuracy)
A

fine tuning

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What does it mean to automatically evaluate a model?

A

Evaluate a model for quality control.
Scores are calculated automatically

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What does it mean to have human evaluation of a model?

A
  • Choose a work team to evaluate
  • Employees of your company
  • Subject-Matter Experts (SMEs)
  • Define metrics and how to evaluate
  • Thumbs up/down, ranking
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q
  • Curated collections of data designed specifically
    at evaluating the performance of language
    models
  • Wide range of topics, complexities, linguistic
    phenomena
  • Helpful to measure: accuracy, speed and
    efficiency, scalability
A

benchmark datasets

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

_________
* Semantic similarity between generated text
* Uses pre-trained ___ models (Bidirectional Encoder Representations from Transformers) to compare the
contextualized embeddings of both texts and computes the cosine similarity between them.
* Capable of capturing more nuance between the texts

A
  • BERTScore
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q
  • Evaluate the quality of generated text, especially for translations
  • Considers both precision and penalizes too much brevity
  • Looks at a combination of n-grams (1, 2, 3, 4)
A

BLEU: Bilingual Evaluation Understudy

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Evaluating automatic summarization and machine translation systems
* ____-N – measure the number of matching n-grams between reference and generated text
* _____–L – longest common subsequence between reference and generated text

A
  • ROUGE: Recall-Oriented Understudy for Gisting Evaluation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q
  • Allows a Foundation Model to reference a data source outside of its training data
  • Bedrock takes care of creating Vector Embeddings in the database of your choice based on your data
  • Use where real-time data is needed to be fed into the Foundation Model
A
  • RAG = Retrieval-Augmented Generation
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

search & analytics database
real time similarity queries, store millions of vector embeddings
scalable index management, and fast nearest-neighbor (kNN) search capability

A

Amazon OpenSearch Service

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

[with MongoDB compatibility] – NoSQL database
real time similarity queries, store millions of vector embeddings

A

Amazon DocumentDB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
These are examples of what use cases? * Customer Service Chatbot * Knowledge Base – products, features, specifications, troubleshooting guides, and FAQs * ___application – chatbot that can answer customer queries * Legal Research and Analysis * Knowledge Base – laws, regulations, case precedents, legal opinions, and expert analysis * ____Application – chatbot that can provide relevant information for specific legal queries * Healthcare Question-Answering * Knowledge base – diseases, treatments, clinical guidelines, research papers, patients… * ___application – chatbot that can answer complex medical queries
RAG Use case
26
__________: converting raw text into a sequence of tokens
Tokenization
26
* The number of tokens an LLM can consider when generating text
Context Window
26
What is the first factor to look at when considering a model?
the context window. The larger the context window, the more information and coherence
27
* Control the interaction between users and Foundation Models (FMs) * Filter undesirable and harmful content * Remove Personally Identifiable Information (PII) * Enhanced privacy * Reduce hallucinations
guardrails
27
* Create vectors (array of numerical values) out of text, images or audio * Vectors have a high dimensionality to capture many features for one input token, such as semantic meaning, syntactic role, sentiment * _____models can power search applications
Embedding
28
Manage and carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities * Task coordination: perform tasks in the correct order and ensure information is passed correctly between tasks * _____are configured to perform specific pre-defined action groups
agents
29
Send logs of all invocations to Amazon CloudWatch and S3 * Can include text, images and embeddings * Analyze further and build alerting thanks to CloudWatch Logs Insights
* Model Invocation Logging
30
* Published metrics from Bedrock to _________
* CloudWatch Metrics
31
_________ – give access to Amazon Bedrock to your team so they can easily create AI -powered applications
* Bedrock Studio`
32
* _________ – check if an image was generated by Amazon Titan Generator
Watermark detection
33
What is the bedrock pricing model for image models?
charged for every image generated
34
What is the bedrock pricing model for embedding models?
charged for every input token processed
35
What is the bedrock pricing model for text models?
charged for every input/output token processed
36
Model Improvement Techniques Cost Order: Put the cheapest at the top and the expensive at the bottom. Prompt Engineering, Domain Adaptation Fine-tuning, Instruction-based Fine-tuning, Retrieval Augmented Generation (RAG)
1. Prompt Engineering * No model training needed (no additional computation or fine-tuning) 2. Retrieval Augmented Generation (RAG) * Uses external knowledge (FM doesn’t need to ”know everything”, less complex) * No FM changes (no additional computation or fine-tuning) 3. Instruction-based Fine-tuning * FM is fine-tuned with specific instructions (requires additional computation) 4. Domain Adaptation Fine-tuning * Model is trained on a domain-specific dataset (requires intensive computation)
37
usually a smaller model will be cheaper (T/F)
True
38
developing, designing, and optimizing prompts to enhance the output of FMs for your needs
Prompt engineering
39
* Prompt gives a lot of guidance and leaves little into the model’s interpretation True or false
* false, Prompt gives little guidance and leaves a lot to the model’s interpretation
40
what are four improved prompting techniques?
* Instructions – a task for the model to do (description, how the model should perform) * Context – external information to guide the model * Input data – the input for which you want a response * Output Indicator – the output type or format
41
A technique where you explicitly instruct the model on what not to include or do in its response
negative prompting
42
True or false. Negative prompting aims to avoid Unwanted Content – explicitly states what not to include, reducing the chances of irrelevant or inappropriate content
true
43
What is temperature in prompt engineering?
creativity of the model’s output * Low (ex: 0.2) – outputs are more conservative, repetitive, focused on most likely response * High (ex: 1.0) – outputs are more diverse, creative, and unpredictable, maybe less coherent
44
_______ is how fast the model responds
prompt latency
45
What type of prompt engineering technique is this: Present a task to the model without providing examples or explicit training for that specific task
zero shot prompting
46
What type of prompt engineering technique is this: What type of prompt engineering technique is this:
few shots prompting
47
What type of prompt engineering technique is this: * Divide the task into a sequence of reasoning steps, leading to more structure and coherence * Using a sentence like “Think step by step” helps * Helpful when solving a problem as a human usually requires several steps
Chain of Thought Prompting
48
What type of prompt engineering technique is this: * Combine the model’s capability with external data sources to generate a more informed and contextually rich response
Retrieval-Augmented Generation (RAG)
49
* Simplify and standardize the process of generating Prompts * Helps with: * Processes user input text and output prompts from foundation models (FMs) * Orchestrates between the FM, action groups, and knowledge bases * Formats and returns responses to the user * You can also provide examples with few-shots prompting to improve the model performance
prompt templates
50
* Amazon QuickSight is used to visualize your data and create dashboards about them * Amazon Q understands natural language that you use to ask questions about your data * Create executive summaries of your data * Ask and answer questions of data * Generate and edit visuals for your dashboards
Amazon Q for quicksight
51
* EC2 instances are the virtual servers you can start in AWS * Amazon Q for EC2 provides guidance and suggestions for EC2 instance types that are best suited to your new workload * Can provide requirements using natural language to get even more suggestions or ask for advice by providing other workload requirements
Amazon Q for EC2
52
* AWS Chatbot is a way for you to deploy an AWS Chatbot in a Slack or Microsoft Teams channel that knows about your AWS account * Troubleshoot issues, receive notifications for alarms, security findings, billing alerts, create support request * You can access Amazon Q directly in AWS Chatbot to accelerate understanding of the AWS services, troubleshoot issues, and identify remediation paths
Amazon Q for AWS Chatbot
53
* Fully managed Gen-AI assistant for your employees * Based on your company’s knowledge and data * Answer questions, provide summaries, generate content, automate tasks * Perform routine actions (e.g., submit time-off requests, send meeting invites) * Built on Amazon Bedrock (but you can’t choose the underlying FM)
Amazon Q for business
54
* Create Gen AI-powered apps without coding by using natural language * Leverages your company’s internal data * Possibility to leverage plugins (Jira, etc…)
Amazon Q apps
55
* Answer questions about the AWS documentation and AWS service selection * Answer questions about resources in your AWS account * Suggest CLI (Command Line Interface) to run to make changes to your account * Helps you do bill analysis, resolve errors, troubleshooting
amazon q developer
56
is a broad field for the development of intelligent systems capable of performing tasks that typically require human intelligence:
Artificial Intelligence
57
What AI Component is this: collect vast amount of data
data layer
58
What AI Component is this: data scientists and engineer work together to understand use cases, requirements, and frameworks that can solve them
* ML Framework and Algorithm Layer
59
What AI Component is this: implement a model and train it, we have the structure, the parameters and functions, optimizer function
model layer
60
What AI Component is this: how to serve the model, and its capabilities for your users
application layer
61
* _______ is a type of AI for building methods that allow machines to learn * Data is leveraged to improve computer performance on a set of task * Make predictions based on data used to train the model * No explicit programming of rules
Machine Learning
62
* Uses neurons and synapses (like our brain) to train a model * Process more complex patterns in the data than traditional ML
Deep Learning
63
True or False: Natural Language Processing is NOT an example of deep learning.
False, it is
64
Is generative AI a subset of deep learning?
Yes
65
* Powerful models that can understand and generate human-like text * Trained on vast amounts of text data from the internet, books, and other sources, and learn patterns and relationships between words and phrases * Example: Google BERT, OpenAI ChatGPT
Transformer based LLMs
66
* Able to process a sentence as a whole instead of word by word * Faster and more efficient text processing (less training time) * It gives relative importance to specific words in a sentence (more coherent sentences
Transformer based LLMs
67
* Does NOT rely on a single type of input (text, or images, or audio only) * Does NOT create a single type of output * Example: a ______ can take a mix of audio, image and text and output a mix of video, text for example
Multi-modal Models
68
– generate human text or computer code based on input prompt
GPT (Generative Pre-trained Transformer)
69
– similar intent to GPT, but reads the text in two directions
BERT (Bidirectional Encoder Representations from Transformers)
70
– meant for sequential data such as time-series or text, useful in speech recognition, time-series prediction
RNN (Recurrent Neural Network)
71
used for image recognition tasks, object detection, facial recognition
ResNet (Residual Network) – Deep Convolutional Neural Network (CNN)
72
– ML algorithm for classification and regression
SVM (Support Vector Machine)
73
– model to generate raw audio waveform, used in Speech Synthesis
WaveNet
74
– models used to generate synthetic data such as images, videos or sounds that resemble the training data. Helpful for data augmentatio
GAN (Generative Adversarial Network)
75
– an implementation of gradient boosting
XGBoost (Extreme Gradient Boosting)
76
Data includes both input features and corresponding output labels
labeled data
77
Data includes only input features without any output labels
unlabeled data
78
* Data is organized in a structured format, often in rows and columns (like Excel)
structured data
79
Data is arranged in a table with rows representing records and columns representing features
tabular data
80
Data points collected or recorded at successive points in time
time series data
81
* Data that doesn't follow a specific structure and is often text-heavy or multimedia content
unstructured data
82
* Learn a mapping function that can predict the output for new unseen input data * Needs labeled data: very powerful, but difficult to perform on millions of datapoints
supervised learning
83
* Used to predict a numeric value based on input data * The output variable is continuous, meaning it can take any value within a range
Supervised Learning – Regression
84
What type of supervised learning do these scenarios represent? * Predicting House Prices – based on features like size, location, and number of bedrooms * Stock Price Prediction – predicting the future price of a stock based on historical data and other features * Weather Forecasting – predicting temperatures based on historical weather data
regression
85
Used to predict the categorical label of input data * The output variable is discrete, which means it falls into a specific category or class * Use cases: scenarios where decisions or predictions need to be made between distinct categories (fraud, image classification, customer retention, diagnostics)
Supervised Learning – Classification
86
* Used to train the model * Percentage: typically, 60-80% of the dataset * Example: 800 labeled images from a dataset of 1000 images
training set
87
* Used to tune model parameters and validate performance * Percentage: typically, 10-20% of the dataset * Example: 100 labeled images for hyperparameter tuning (tune the settings of the algorithm to make it more efficient)
validation set
88
* Used to evaluate the final model performance * Percentage: typically, 10-20% of the dataset * Example: 100 labeled images to test the model's accuracy
test set
89
* The process of using domain knowledge to select and transform raw data into meaningful features * Helps enhancing the performance of machine learning models
feature engineering
90
– extracting useful information from raw data, such as deriving age from date of birth
feature extraction
91
– selecting a subset of relevant features, like choosing important predictors in a regression model
Feature Selection
92
– transforming data for better model performance, such as normalizing numerical data
Feature Transformation
93
* _______ – deriving new features like “price per square foot” * _________ – identifying and retaining important features such as location or number of bedrooms * _________ – normalizing features to ensure they are on a similar scale, which helps algorithms like gradient descent converge faster
Feature Creation Feature Selection Feature Transformation
94
* The goal is to discover inherent patterns, structures, or relationships within the input data * The machine must uncover and create the groups itself, but humans still put labels on the output groups
unsupervised learning
95
* Used to group similar data points together into clusters based on their features
unsupervised learning - clustering
96
* Use a small amount of labeled data and a large amount of unlabeled data to train systems * After that, the partially trained algorithm itself labels the unlabeled data * This is called pseudo-labeling * The model is then re-trained on the resulting data mix without being explicitly programmed
Semi-supervised Learning
97
* A type of Machine Learning where an agent learns to make decisions by performing actions in an environment to maximize cumulative rewards
reinforcement learning
98
What are the associated reinforcement learning concepts: * Key Concepts * __– the learner or decision-maker * _____– the external system the agent interacts with * ____– the choices made by the agent * ___– the feedback from the environment based on the agent’s actions * __– the current situation of the environment * __– the strategy the agent uses to determine actions based on the state
* Agent – the learner or decision-maker * Environment – the external system the agent interacts with * Action – the choices made by the agent * Reward – the feedback from the environment based on the agent’s actions * State – the current situation of the environment * Policy – the strategy the agent uses to determine actions based on the state
99
The goal of __________ is to maximize cumulative reward over time.
reinforcement learning
100
What does RLHF stand for?
* RLHF = Reinforcement Learning from Human Feedback
101
* Use human feedback to help ML models to self-learn more efficiently * In Reinforcement Learning there’s a reward function * RLHF incorporates human feedback in the reward function, to be more aligned with human goals, wants and needs * First, the model’s responses are compared to human’s responses * Then, a human assess the quality of the model’s responses
* RLHF = Reinforcement Learning from Human Feedback
102
In case your model has poor performance, you need to look at its ___
model fit
103
What kind of model fit is this: * Performs well on the training data * Doesn’t perform well on evaluation data
Overfitting
104
What kind of model fit is this: * Model performs poorly on training data * Could be a problem of having a model too simple or poor data features
overfitting
105
What kind of model fit is this: Model performs poorly on training data * Could be a problem of having a model too simple or poor data features
Underfitting
106
What kind of model fit is this: -Neither overfitting or underfitting
Balanced fit
107
* Difference or error between predicted and actual value * Occurs due to the wrong choice in the ML process
Bias
108
* The model doesn’t closely match the training data * Example: linear regression function on a non-linear dataset * Considered as underfitting
High bias
109
How much the performance of a model changes if trained on a different dataset which has a similar distribution
Variance
110
* Model is very sensitive to changes in the training data * This is the case when overfitting: performs well on training data, but poorly on unseen test data
high variance
111
how can you reduce variance?
Feature selection (less, more important features) * Split into training and test data sets multiple times
112
Precision or Recall? True Positives / (True Positives + False Positives)
Precision
113
Precision or Recall? True Positives / (True Positives + False Negatives)
Recall
114
* _____– Best when false positives are costly * ____– Best when false negatives are costly * ______ – Best when you want a balance between precision and recall, especially in imbalanced datasets * ______ – Best for balanced datasets
Precision Recall F1 Score Accuracy
115
* AUC-ROC shows what the curve for true positive compared to false positive looks like at various thresholds, with multiple confusion matrixes * You compare them to one another to find out the threshold you need for your business use case.
AUC-ROC Area under the curve-receiver operator curve
116
* ________is when a model is making prediction on new data
Inferencing
117
* Settings that define the model structure and learning algorithm and process * Set before training begins * Examples: learning rate, batch size, number of epochs, and regularization
Hyperparameter
118
* Finding the best ______ values to optimize the model performance
hyperparameters
119
What hyperparameter is this: How large or small the steps are when updating the model's weights during training * High ________ can lead to faster convergence but risks overshooting the optimal solution, while a low learning rate may result in more precise but slower convergence.
learning rate
120
What hyperparamater is this: * Number of training examples used to update the model weights in one iteration * Smaller batches can lead to more stable learning but require more time to compute, while larger batches are faster but may lead to less stable updates.
batch size
121
what hyperparameter is this: * Refers to how many times the model will iterate over the entire training dataset. * Too few epochs can lead to underfitting, while too many may cause overfitting
Number of epochs
122
* _______ is when the model gives good predictions for training data but not for the new data
Overfitting
123
_______ are pre-trained ML services for your use case
AWS AI Services
124
* For Natural Language Processing – NLP * Fully managed and serverless service * Uses machine learning to find insights and relationships in text * Language of the text * Extracts key phrases, places, people, brands, or events * Understands how positive or negative the text is * Analyzes text using tokenization and parts of speech * Automatically organizes a collection of text files by topic * Sample use cases: * analyze customer interactions (emails) to find what leads to a positive or negative experience * Create and groups articles by topics that Comprehend will uncover
Amazon Comprehend
125
Extracts predefined, general-purpose entities like people, places, organizations, dates, and other standard categories, from text
Named Entity Recognition (NER)
126
* Natural and accurate language translation * ________ allows you to localize content - such as websites and applications - for international users, and to easily translate large volumes of text efficiently.
Amazon Translate
127
* Automatically convert speech to text * Uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately * Automatically remove Personally Identifiable Information (PII) using Redaction * Supports Automatic Language Identification for multi-lingual audio * Use cases: * transcribe customer service calls * automate closed captioning and subtitling * generate metadata for media assets to create a fully searchable archive
amazon transcribe
128
What managed service: Turn text into lifelike speech using deep learning * Allowing you to create applications that talk
amazon polly
129
What AWS Managed service: * Find objects, people, text, scenes in images and videos using ML * Facial analysis and facial search to do user verification, people counting * Create a database of “familiar faces” or compare against celebrities * Use cases: * Labeling * Content Moderation * Text Detection * Face Detection and Analysis (gender, age range, emotions…) * Face Search and Verification * Celebrity Recognition * Pathing (ex: for sports game analysis)
Amazon Rekognition
130
* Fully managed service that uses ML to deliver highly accurate forecasts * Example: predict the future sales of a raincoat * 50% more accurate than looking at the data itself * Reduce forecasting time from months to hours * Use cases: Product Demand Planning, Financial Planning, Resource Planning, …
amazon foorecast
131
What managed service: * Build chatbots quickly for your applications using voice and text * Example: a chatbot that allows your customers to order pizzas or book a hotel * Supports multiple languages * Integration with AWS Lambda, Connect, Comprehend, Kendra * The bot automatically understands the user intent to invoke the correct Lambda function to “fulfill the intent” * The bot will ask for ”Slots” (input parameters) if necessary
amazon Lex
132
* Fully managed ML-service to build apps with real-time personalized recommendations * Example: personalized product recommendations/re-ranking, customized direct marketing * Example: User bought gardening tools, provide recommendations on the next one to buy * Same technology used by Amazon.com * Integrates into existing websites, applications, SMS, email marketing systems, … * Implement in days, not months (you don’t need to build, train, and deploy ML solutions) * Use cases: retail stores, media and entertainment…
Amazon Personalize
133
* Automatically extracts text, handwriting, and data from any scanned documents using AI and ML
amazon textract
134
* Fully managed document search service powered by Machine Learning * Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs…) * Natural language search capabilities * Learn from user interactions/feedback to promote preferred results (Incremental Learning) * Ability to manually fine-tune search results (importance of data, freshness, custom, …)
Amazon Kendra
135
* Crowdsourcing marketplace to perform simple human tasks * Distributed virtual workforce * Example: * You have a dataset of 10,000,000 images and you want to labels these images * You distribute the task on Mechanical Turk and humans will tag those images * You set the reward per image (for example $0.10 per image) * Use cases: image classification, data collection, business processing
Amazon Mechanical Turk
136
* Human oversight of Machine Learning predictions in production * Can be your own employees, over 500,000 contractors from AWS, or AWS Mechanical Turk * Some vendors are pre-screened for confidentiality requirements * The ML model can be built on AWS or elsewhere (SageMaker, Rekognition…)
Amazon Augmented AI (A2I)
137
* Fully autonomous 1/18th scale car race driven by Reinforcement Learning (RL)
AWS DeepRacer
138
* Fully managed service for developers / data scientists to build ML models * Typically, difficult to do all the processes in one place + provision servers * Example: predicting your AWS exam score
SageMaker
139
These are examples of a SageMaker service: * Supervised Algorithms * Unsupervised Algorithms * Textual Algorithms * Image Processing
sagemaker built in algorithms
140
* Define the Objective Metric * _____ automatically chooses hyperparameter ranges, search strategy, maximum runtime of a tuning job, and early stop condition * Saves you time and money * Helps you not wasting money on suboptimal configurations
SageMaker – Automatic Model Tuning (AMT)
141
This is an example of batch or asynchronous sagemaker model deployment: * For large payload sizes up to 1GB * Long processing times * Near-real time latency requirements * Request and responses are in Amazon S3
Asynchronous
142
This is an example of batch or asynchronous sagemaker model deployment: * Prediction for an entire dataset (multiple predictions) * Request and responses are in Amazon S3
batch
143
* End-to-end ML development from a unified interface * Team collaboration * Tune and debug ML models * Deploy ML models * Automated workflow
sagemaker studio
144
* Prepare tabular and image data for machine learning * Data preparation, transformation and feature engineering * Single interface for data selection, cleansing, exploration, visualization, and processing * SQL support * Data Quality tool
SageMaker – Data Wrangler
145
_____are inputs to ML models used during training and used for inference * Example - music dataset: song ratings, listening duration, and listener demographics
Features
146
* Ingests features from a variety of sources * Ability to define the transformation of data into feature from within Feature Store * Can publish directly from SageMaker Data Wrangler into SageMaker Feature Store * Features are discoverable within SageMaker Studio
SageMaker – Feature Store
147
* Evaluate Foundation Models * Evaluating human-factors such as friendliness or humor * Leverage an AWS -managed team or bring your own employees * Use built -in datasets or bring your own dataset * Built-in metrics and algorithms
SageMaker Clarify
148
* A set of tools to help explain how machine learning (ML) models make predictions * Understand model characteristics as a whole prior to deployment * Debug predictions provided by the model after it's deployed * Helps increase the trust and understanding of the model * Example: * “Why did the model predict a negative outcome such as a loan rejection for a given applicant?” * “Why did the model make an incorrect prediction?”
SageMaker Clarify - Model Explainability
149
* Ability to detect and explain biases in your datasets and models * Measure bias using statistical metrics * Specify input features and bias will be automatically detected
SageMaker Clarify – Detect Bias (human)
150
_______ occurs when the training data does not represent the full population fairly, leading to a model that over-represents or disproportionately affects certain group
Sampling bias
151
____ occurs when the tools or measurements used in data collection are flawed or skewed
: Measurement bias
152
_________ happens when the person collecting or interpreting the data has personal biases that affect the result
Observer bias
153
_________is when individuals interpret or favor information that confirms their preconceptions. This is more applicable to human decision-making rather than automated model outputs.
Confirmation bias
154
* Model review, customization and evaluation * Align model to human preferences * Reinforcement learning where human feedback is included in the “reward” function
* RLHF – Reinforcement Learning from Human Feedback
155
* Define roles for personas * Example: data scientists, MLOps engineers
* SageMaker Role Manager
156
* * Centralized portal where you can view, search, and explore all of your models * Information and insights for all models
* SageMaker Model Dashboard
157
* Monitor the quality of your model in production: continuous or on-schedule * Alerts for deviations in the model quality: fix data & retrain model * Example: loan model starts giving loans to people who don’t have the correct credit score (drift)
SageMaker – Model Monitor
158
* Centralized repository allows you to track, manage, and version ML models * Catalog models, manage model versions, associate metadata with a model * Manage approval status of a model, automate model deployment, share models…
SageMaker – Model Registry
159
* a workflow that automates the process of building, training, and deploying a ML mode * Continuous Integration and Continuous Delivery (CI/CD) service for Machine Learning * Helps you easily build, train, test, and deploy 100s of models automatically * Iterate faster, reduce errors (no manual steps), repeatable mechanisms
SageMaker Pipeline –
160
* ML Hub to find pre-trained Foundation Model (FM), computer vision models, or natural language processing models * Large collection of models from Hugging Face, Databricks, Meta, Stability AI… * Models can be fully customized for your data and use -case * Models are deployed on SageMaker directly (full control of deployment options) * Pre -built ML solutions for demand forecasting, credit rate prediction, fraud detection and computer vision
SageMaker JumpStart * ML Hub to find pre-trained Foundation
161
* Build ML models using a visual interface (no coding required) * Access to ready-to-use models from Bedrock or JumpStart * Build your own custom model using AutoML powered by SageMaker Autopilot * Part of SageMaker Studio * Leverage Data Wrangler for data preparation
SageMaker Canvas * Build ML models using a visual interface
162
an open-source tool which helps ML teams manage the entire ML lifecycle
MLFlow
163
What is sagemaker ground truth used for?
RLHF, humans for model grading and data labeling
164
Sagemaker role manager is used for ______
access control
165
* Making sure AI systems are transparent and trustworthy * Mitigating potential risk and negative outcomes * Throughout the AI lifecycle: design, development, deployment, monitoring, evaluation
* Responsible AI
166
* Ensure to add value and manage risk in the operation of business * Clear policies, guidelines, and oversight mechanisms to ensure AI systems align with legal and regulatory requirements * Improve trust
* Governance
167
* Ensure adherence to regulations and guidelines * Sensitive domains such as healthcare, finance, and legal applications
Compliance
168
* Form of responsible AI documentation * Help understand the service and its features * Find intended use cases and limitations * Responsible AI design choices * Deployment and performance optimization best practices
AWS AI Service Cards
169
* The degree to which a human can understand the cause of a decision * Access into the system so that a human can interpret the model’s output * Answer “why and how”
* Interpretability
170
* Understand the nature and behavior of the model * Being able to look at inputs and outputs and explain without understanding exactly how the model came to the conclusion
explainability
171
* Show how a single feature can influence the predicted outcome, while holding other features constant * Particularly helpful when the model is “black box” (i.e., Neural Networks) * Helps with interpretability and explainability
Partial Dependence Plots (PDP) * Show how a single feature can
172
* Approach to design AI systems with priorities for humans’ needs
Human-Centered Design (HCD) for Explainable AI
173
Generating content that is offensive, disturbing, or inappropriate
toxicity
174
* Assertions or claims that sound true, but are incorrect * This is due to the next -word probability sampling employed by LLM
Hallucinations
175
* Intentional introduction of malicious or biased data into the training dataset of a model * Leads to the model producing biased, offensive, or harmful outputs (intentionally or unintentionally)
poisoning
176
* Influencing the outputs by embedding specific instructions within the prompts themselves * Hijack the model's behavior and make it produce outputs that align with the attacker's intentions (e.g., generating misinformation or running malicious code)
Hijaking and Prompt Injection
177
* The risk of exposing sensitive or confidential information to a model during training or inference * The model can then reveal this sensitive data from their training corpus, leading to potential data leaks or privacy violations
exposure
178
* The unintentional disclosure or leakage of the prompts or inputs used within a model * It can expose protected data or other data used by the model, such as how the model works
prompt leaking
179
* AI models are typically trained with certain ethical and safety constraints in place to prevent misuse or harmful outputs (e.g., filtering out offensive content, restricting access to sensitive information…) * Circumvent the constraints and safety measures implemented in a generative model to gain unauthorized access or functionality
jailbreaking
180
– principles, guidelines, and responsible AI considerations * Data management, model training, output validation, safety, and human oversight * Intellectual property, bias mitigation, and privacy protection
policies
181
– combination of technical, legal, and responsible AI review * Clear timeline: monthly, quarterly, annually… * Include Subject Matter Experts (SMEs), legal and compliance teams and end-users
review cadence
182
* Technical reviews on model performance, data quality, algorithm robustness * Non-technical reviews on policies, responsible AI principles, regulatory requirements * Testing and validation procedure for outputs before deploying a new model * Clear decision-making frameworks to make decisions based on review results
review strategies
183
* Publishing information about the AI models, training data, key decisions made * Documentation on limitations, capabilities and use cases of AI solutions * Channels for end-users and stakeholders to provide feedback and raise concerns
transparency standards
184
* Train on relevant policies, guidelines, and best practices * Training on bias mitigation and responsible AI practices * Encourage cross-functional collaboration and knowledge-sharing * Implement a training and certification program
team training requirements
185
* Responsible framework and guidelines (bias, fairness, transparency, accountability) * Monitor AI and Generative AI for potential bias, fairness issue, and unintended consequences * Educate and train teams on responsible AI practices
responsible AI
186
* Attributing and acknowledging the sources of the data * Datasets, databases, other sources * Relevant licenses, terms of use, or permissions
source citation
187
* Example: generating fake content, manipulated data, automated attacks * Deploy AI-based threat detection systems * Analyze network traffic, user behavior, and other relevant data sources
threat detection
188
* Identify vulnerabilities in AI systems: software bugs, model weaknesses... * Conduct security assessment, penetration testing and code reviews * Patch management and update processes
vulnerability management
189
* Secure the cloud computing platform, edge devices, data stores * Access control, network segmentation, encryption * Ensure you can withstand systems failures
infrastructure protection
190
* Manipulated input prompts to generate malicious or undesirable content * Implement guardrails: prompt filtering, sanitization, validation
prompt injection
191
______ – ratio of true positive predictions (correct vs. incorrect positive prediction) * ______– ratio of true positive predictions compare to actual positive
precision Recall
192
True or false: * AWS responsibility - Security of the Cloud
true
193
True or false. * Customer responsibility - not Security in the Cloud
false * For Bedrock, customer is responsible for data management, access controls, setting up guardrails, etc… * Encrypting application data
194
* Make sure models aren’t just developed but also deployed, monitored, retrained systematically and repeatedly * Extension of DevOps to deploy code regularly
MLOps
195
* Users or Groups can be assigned JSON documents called _____ * These _____define the ______of the users
policies permissions
196
* ____are people within your organization, and can be grouped
Users
197
* EC2 =
Elastic Compute Cloud
198
_____ is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. * ___helps identify and alert you to sensitive data, such as personal info.
Amazon Macie
199
* Helps with auditing and recording compliance of your AWS resources * Helps record configurations and changes over time
AWS Config
200
* Automated Security Assessments
Amazon inspector
201
* Provides governance, compliance and audit for your AWS Account
AWS CloudTrail
202
* Portal that provides customers with on-demand access to AWS compliance documentation and AWS agreements
AWS Artifact
203
* On-demand access to security compliance reports of Independent Software Vendors (ISVs)
AWS Artifact - third party reports
204
* Assess risk and compliance of your AWS workloads * Continuously audit AWS services usage and prepare audits
AWS Audit Manager
205
* No need to install anything – high level AWS account assessment * Analyze your AWS accounts and provides recommendation on 6 categories: * Cost optimization * Performance * Security * Fault tolerance * Service limits * Operational Excellence
AWS Trusted advisor
206
private network to deploy your resources (regional resource)
* VPC - Virtual Private Cloud
207
_______allow you to partition your network inside your VPC (Availability Zone resource)
Subnets
208
* A __________ is a subnet that is accessible from the internet
public subnet
209
* A _________ is a subnet that is not accessible from the internet
private subnet
210
* _____ helps our VPC instances connect with the internet
Internet Gateway
211
* ______ (AWS-managed) allow your instances in your Private Subnets to access the internet while remaining private
NAT Gateways
212
We want to use ________ * Access an AWS service privately without going over the public internet * Usually powered by AWS PrivateLink * Keep your network traffic internal to AWS * Example: your application deployed in a VPC can access a Bedrock model privately
VPC endpoints
213