AWS cert2 Flashcards

(93 cards)

2
Q

Amazon SageMaker

A

A fully managed service that data scientists and developers use to prepare, build, train, and deploy ML models.

3
Q

Amazon Bedrock

A

A fully managed service that makes FMs from Amazon and leading AI companies available through an API. Amazon Bedrock has a broad set of capabilities to quickly build and scale generative AI applications with security, privacy, and responsible AI. You can also privately customize FMs with your own data and seamlessly integrate and deploy them into your apps using AWS tools and capabilities.
Bedrock's RAG implementation is called Knowledge Bases.

4
Q

SageMaker Data Wrangler

A

SageMaker Data Wrangler is a data preparation, transformation, and feature engineering tool. Use it to aggregate and prepare data for ML: data selection, cleaning, exploration, visualization, and processing. It has SQL support for data queries, and a Data Quality tool to analyze the quality of the data. You can also use Data Wrangler to balance your data when classes are imbalanced; it offers three balancing operators: random undersampling, random oversampling, and Synthetic Minority Oversampling Technique (SMOTE).
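As a rough plain-Python sketch of what the random oversampling operator does conceptually (this is not Data Wrangler's implementation; the function name and toy data are invented for illustration):

```python
import random
from collections import Counter

def random_oversample(rows, label_key="label", seed=0):
    """Duplicate random minority-class rows (sampling with replacement)
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# 2 "fraud" rows vs. 8 "ok" rows -> oversample "fraud" up to 8.
data = [{"label": "fraud"}] * 2 + [{"label": "ok"}] * 8
counts = Counter(r["label"] for r in random_oversample(data))
# counts now has 8 of each class
```

SMOTE differs in that it synthesizes new minority-class points by interpolating between neighbors rather than duplicating rows.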

5
Q

Amazon SageMaker Model Cards

A

SageMaker Model Cards are a feature of SageMaker that you can use to record information about ML models, such as training details, risk rating, evaluation metrics, model performance, considerations, and recommendations. Part of the SageMaker Model Registry, Model Cards document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting.

6
Q

SageMaker Canvas

A

You can use SageMaker Canvas to build ML models without needing to write any code. SageMaker Canvas does not have any models that can perform content moderation of creative content types.

7
Q

Amazon SageMaker Ground Truth

A

SageMaker Ground Truth is a data labeling service that uses a human workforce to create accurate labels for data that you can use to train models. It can leverage Amazon Mechanical Turk, a crowdsourcing service that provides access to a large pool of affordable labor spread across the globe. SageMaker Ground Truth does not store information about model training and performance for audit purposes.

8
Q

Amazon SageMaker Model Monitor

A

SageMaker Model Monitor establishes an automated alert system that alerts when there are variations in the model’s quality, such as data drift and anomalies. You can use SageMaker Model Monitor to monitor deployed models for performance issues, data drift, and operational inconsistencies. You would primarily use SageMaker Model Monitor to ensure that the model’s performance remains stable over time.

9
Q

SageMaker Studio

A

SageMaker Studio offers a suite of integrated development environments (IDEs), including JupyterLab, RStudio, and Visual Studio Code - Open Source (Code-OSS).

10
Q

Guardrails for Amazon Bedrock

A

Amazon Bedrock Guardrails evaluates user inputs and FM responses against use-case-specific policies, and provides an additional layer of safeguards (e.g., block undesirable content, detect and prevent hallucinations, redact sensitive information and PII).

11
Q

Amazon Rekognition

A

Amazon Rekognition is a fully managed AI service for image and video analysis. You can use Amazon Rekognition to identify inappropriate content in images, including drawings, paintings, and animations, so it can help with performing content moderation of creative content types. It can also detect custom objects, such as brand logos, using automated machine learning (AutoML) to train your models with as few as 10 images.

12
Q

Vector Database

A

A vector database is a collection of data stored as mathematical representations. Vector databases store structured and unstructured data, such as text or images, together with their vector embeddings. Vector embeddings are a way to convert words, sentences, and other data into numbers.
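The core idea can be sketched as a brute-force similarity search, assuming cosine similarity as the distance measure (real vector databases use approximate indexes such as HNSW to scale; the toy three-dimensional vectors here are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index, query, k=2):
    """Brute-force: score every stored vector, return the k best names."""
    ranked = sorted(index, key=lambda name: cosine_similarity(index[name], query), reverse=True)
    return ranked[:k]

index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}
print(search(index, [0.85, 0.15, 0.0], k=2))  # → ['cat', 'dog']
```

Note that "cat" and "dog" score close together while "car" does not, which is exactly the "similar meaning, close vectors" property that RAG and semantic search rely on.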

13
Q

Amazon DocumentDB

A

Amazon DocumentDB is a fully managed, native JSON document database. Amazon DocumentDB supports vector search. You can use vector search to store, index, and search millions of vectors with millisecond response times.

14
Q

Amazon OpenSearch Service

A

OpenSearch Service is a fully managed service that you can use to deploy, scale, and operate OpenSearch on AWS. You can use OpenSearch Service vector database capabilities for many purposes. For example, you can implement semantic search, retrieval augmented generation (RAG) with large language models (LLMs), recommendation engines, and multimedia searches. OpenSearch Service can also scale to store millions of embeddings and can support high query throughput.

15
Q

Amazon SageMaker Clarify

A

SageMaker Clarify helps identify potential bias in machine learning models and datasets without the need for extensive coding. SageMaker Clarify is a feature of SageMaker that helps you explain how a model makes predictions and whether datasets or models reflect bias. SageMaker Clarify also includes a library to evaluate FM performance. The foundation model evaluation (FMEval) library includes tools to compare FM quality and responsibility metrics, including bias and toxicity scores. FMEval can use built-in test datasets, or you can provide a test dataset that is specific to your use case.

16
Q

SageMaker Clarify (continued)

A

It can detect biases in training data and model predictions. You can use SageMaker Clarify to provide insights into model decisions. Therefore, SageMaker Clarify is a suitable solution for developing responsible and fair AI systems.

17
Q

SageMaker JumpStart

A

Amazon SageMaker JumpStart is a machine learning hub with open-source and proprietary foundation models, built-in algorithms, and prebuilt ML solutions that you can deploy with a few clicks.

18
Q

SageMaker HyperPod

A

SageMaker HyperPod reduces the time to train foundation models by up to 40% and scales efficiently across more than a thousand AI accelerators by distributing and parallelizing your training workload across many accelerators.

19
Q

SageMaker Model Registry

A

SageMaker Model Registry is a fully managed catalog for ML models. You can use SageMaker Model Registry to manage model versions, associate metadata with models, and manage model approval status.

20
Q

SageMaker Model Dashboard

A

Amazon SageMaker Model Dashboard is a centralized portal, accessible from the SageMaker console, where you can view, search, and explore all of the models in your account.

21
Q

Vector

A

A vector is an ordered list of numbers that represent features or attributes of some entity or concept.
In the context of generative AI, vectors might represent words, phrases, sentences, or other units.

22
Q

Embeddings

A

Embeddings are vector representations of content that capture semantic relationships. Content with similar meaning has vector representations that are close together.

23
Q

Amazon Comprehend

A

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document.

24
Q

Amazon Textract

A

You can use Amazon Textract to extract text from documents, including handwritten text.

25
Amazon Kendra
Amazon Kendra is an intelligent search service that provides answers to questions based on the data that is provided. Amazon Kendra uses semantic and contextual understanding to provide specific answers.
26
Amazon Q Business
Amazon Q Business is a generative AI virtual assistant that can answer questions, summarize content, generate content, and complete tasks based on the data that is provided. Amazon Q Business does not provide access to FMs.
27
Retrieval Augmented Generation(RAG)
Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model so that it references an authoritative knowledge base outside of its training data (external sources) before generating a response.
28
Fine-Tuning
Fine-tuning refers to the process of taking a pre-trained language model and further training it on a specific task or domain-specific dataset. It requires a labeled dataset. Benefits of fine-tuning: increase specificity, improve accuracy, reduce bias, boost efficiency.
29
Continued Pre-Training
Continue pre-training the FM using unlabeled data, such as industry-specific and domain-specific unlabeled data. It is costly and requires expertise.
30
Instruction based fine-tuning
Instruction-based fine-tuning improves the performance of a pre-trained foundation model (FM) on domain-specific tasks. Instruction-based fine-tuning uses labeled examples that are formatted as prompt-response pairs and that are phrased as instructions.
31
Domain adaptation fine-tuning
This approach uses a pre-trained model to improve its performance only in a specific domain. This method is more about making the model knowledgeable in a specific domain rather than improving its ability to manage complex conversational tasks or adapt to individual user preferences.
32
Transfer learning
A method where a model developed for one task is reused as the starting point for a model on a second task. Suitable for solving natural language processing problems.
33
PEFT
PEFT (Parameter-Efficient Fine-Tuning) refers to techniques that aim to fine-tune large language models efficiently, without having to update all of the model's parameters. This helps reduce the amount of task-specific data and computational resources required.
34
LoRA
LoRA (Low-Rank Adaptation), a PEFT method, adds a low-rank adaptation module to the model, which can be fine-tuned while keeping the original model parameters frozen.
35
Prefix Tuning
Learns task-specific prefix vectors that are prepended to the input, without modifying the original model.
36
Chain of Thought
Chain of thought is a prompt engineering technique that breaks down a complex question into smaller parts. A recommended technique when you have arithmetic and logical tasks that require reasoning.
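As a minimal sketch, one way to apply the technique is to build the step-by-step instruction directly into the prompt (the wording below is illustrative, not an AWS-prescribed template):

```python
def chain_of_thought_prompt(question):
    """Wrap a question with an instruction that elicits step-by-step
    reasoning before the final answer."""
    return (
        "Answer the question below. Think step by step: "
        "break the problem into smaller parts, solve each part, "
        "then state the final answer.\n\n"
        f"Question: {question}\nReasoning:"
    )

prompt = chain_of_thought_prompt(
    "If a train travels 60 km in 45 minutes, what is its speed in km/h?"
)
```

The model then fills in the reasoning steps before the answer, which tends to improve accuracy on arithmetic and logical tasks.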
37
RLHF
Reinforcement learning from human feedback (RLHF) is a technique for fine-tuning large language models to align them with human values and preferences: human feedback is collected on model outputs, and reinforcement learning is then used to reward model behaviors that match those preferences. The goal of RLHF is to train models to behave in ways that are more consistent with human values, beyond just optimizing for task performance.
38
Common prompt injection attacks
Ignoring the prompt template (asked in the sample exam). This general attack consists of a request to ignore the model's given instructions. For example, if a prompt template specifies that an LLM should answer questions only about the weather, a user might ask the model to ignore that instruction and to provide information on a harmful topic. Other attacks include prompted persona switches, fake completion (guiding the LLM to disobedience), and rephrasing or obfuscating common attacks. For the full list, see https://docs.aws.amazon.com/prescriptive-guidance/latest/llm-prompt-engineering-best-practices/common-attacks.html
39
Bias
Unfair prejudice or preference that favors or disfavors a person or group.
40
Fairness
Impartial and just treatment without discrimination
41
Overfitting
When a model performs well on training data but fails to generalize to new data; Low bias and high variance: Low bias indicates that the model is not making erroneous assumptions about the training data. High variance indicates that the model is paying attention to noise in the training data and is overfitting.
42
Underfitting
When a model is too simple to capture the underlying patterns in the data. High bias and low variance: High bias indicates that the model is making erroneous assumptions about the training data. Low variance indicates that the model is not paying attention to noise in the training data, which will lead to underfitting.
43
Low bias and low variance (Ideal)
Low bias indicates that the model is not making erroneous assumptions about the training data. Prevents underfitting. Low variance indicates that the model is not paying attention to noise in the training data. Prevents overfitting. Low bias and low variance is an ideal outcome for model training as it does not result in model overfitting / underfitting.
44
Explainability
The ability to understand how a model arrives at a prediction and to explain its behavior in human terms. SageMaker Clarify uses methods such as SHAP and partial dependence plots (PDP). Answers WHY a model made a specific decision.
45
Interpretability
The ability to explain and understand the internal decision-making process of a machine learning model = "The What". Helps users understand how a model combines features to make predictions
46
Generative AI model
The key benefit of generative AI models is their ability to produce novel, human-like outputs based on the data they are trained on. This makes them highly versatile and applicable across a wide range of domains and use cases.
47
ROUGE
Quality of generated summaries or translations compared to reference texts
48
BLEU (Model Evaluation)
Similarity between generated text translations and reference translations
49
BERTScore (Model Evaluation)
Evaluates the semantic similarity between generated text and reference text, comparing the "meaning" of the texts rather than exact word overlap.
50
Top K
Top-K is a parameter used in language models to limit the selection of tokens to the K most probable options during text generation, controlling the balance between diversity and predictability in the output.
51
Top P
Top P is a setting that controls the diversity of generated text by limiting the set of words the model can choose from based on their probabilities. Top P is set on a scale of 0-1. With a low Top P (such as 0.25), the model considers only the words that make up the top 25% of the total probability distribution; this helps keep the output focused and coherent because the model is limited to the most probable words given the context. With a high Top P (such as 0.99), the model considers a broad range of possible next words, because it includes words that make up the top 99% of the total probability distribution; this can lead to more diverse and creative outputs because the model has a wider pool of words to choose from.
52
Temperature
Controls the randomness or diversity of the generated outputs. A higher temperature value increases the probability of sampling from less likely or lower-probability output tokens, resulting in a more diverse and unpredictable response. A lower temperature value favors the most probable outputs, leading to more deterministic and repetitive responses.
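The three sampling knobs above (Top K, Top P, temperature) can be sketched over a toy token distribution; this is a simplified illustration of the sampling machinery, not any specific model's implementation:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalized."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.5, -1.0]           # one logit per candidate token
probs = softmax(logits)                  # ≈ [0.61, 0.22, 0.14, 0.03]
flat = softmax(logits, temperature=5.0)  # flatter: top token is less dominant
```

After filtering, the model samples the next token from the renormalized distribution, which is where the diversity/predictability trade-off comes from.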
53
F1
The F1 score balances precision and recall by combining them in a single metric; you can use it to evaluate classification models. F1 = 2 * P * R / (P + R), where P = precision and R = recall.
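A quick sketch of the formula, computed from true/false positive and negative counts:

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * P * R / (P + R): the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 8/10 = 0.8, so F1 = 0.8
f1 = f1_score(tp=8, fp=2, fn=2)
```

Because it is a harmonic mean, F1 is dragged down by whichever of precision or recall is worse, which is why it is preferred over accuracy on imbalanced data.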
54
Accuracy
Correct predictions / all predictions: the percentage of correct predictions on a 0-1 scale. Accuracy measures the ratio of correct predictions (true positives (TP) plus true negatives (TN)) to the total number of predictions. Accuracy is not a good measure when the data has class imbalance.
55
Precision
Precision measures how well an algorithm predicts true positives out of all the positives that it identifies. This is a good quality metric to use when your goal is to minimize the number of false positives. True positives/(true positives + false positives)
56
Mean Squared Error (MSE)
Mean squared error, or the average of the squared differences between the predicted and actual values. MSE values are always positive. The better a model is at predicting the actual values, the smaller the MSE value is. MSE is used to evaluate the performance of regression models.
57
Mean Absolute Percentage Error (MAPE)
MAPE is the mean of the absolute differences between the actual values and the predicted values, divided by the actual values. You can use MAPE in numeric predictions to understand model prediction errors.
58
Mean Absolute Error (MAE)
MAE measures how different the predicted and actual values are when the values are averaged over all values. You can use MAE in numeric predictions to understand model prediction errors.
59
R squared or R2
R-squared measures how much of the variation in your data is explained by your model, ranging from 0 (no explanation) to 1 (perfect explanation). For example, if R2=0.8, it means 80% of the variation in your data is accounted for by the model, while 20% is still unexplained. It helps evaluate model fit, but a high R2 doesn’t always mean the model is good—it could overfit or miss other important factors.
60
Root Mean Squared Error (RMSE)
Root Mean Squared Error, or the standard deviation of the errors. Measures the square root of the squared difference between predicted and actual values, and is averaged over all values. It is used to understand model prediction error, and it's an important metric to indicate the presence of large model errors and outliers. Values range from zero (0) to infinity, with smaller numbers indicating a better model fit to the data. RMSE is dependent on scale, and should not be used to compare datasets of different types.
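The regression metrics above (MSE, MAPE, MAE, R2, RMSE) can be sketched in a few lines of plain Python; MAPE is expressed here as a fraction rather than a percentage:

```python
import math

def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n                       # mean squared error
    mae = sum(abs(e) for e in errors) / n                      # mean absolute error
    mape = sum(abs(e / a) for a, e in zip(actual, errors)) / n # mean abs. % error
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)            # total variation
    r2 = 1 - (mse * n) / ss_tot                                # fraction explained
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

m = regression_metrics(actual=[100, 200, 300], predicted=[110, 190, 300])
# MAE = 20/3, MAPE = 0.05 (5%), R2 = 0.99
```

Note how RMSE is just the square root of MSE, which puts the error back in the same units as the target variable.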
61
Hallucination
AI hallucinations are incorrect or misleading results that AI Models generate. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model.
62
GAN
A generative adversarial network (GAN) is a deep learning architecture. It trains two neural networks to compete against each other to generate more authentic new data from a given training dataset.
63
VAE
A variational autoencoder (VAE) provides a probabilistic manner for describing an observation in latent space.
64
Transformers
Transformers are a type of neural network architecture that transforms or changes an input sequence into an output sequence. They do this by learning context and tracking relationships between sequence components.
65
SageMaker inference
Real-time inference deploys your model to SageMaker hosting services and gives you a fully managed, autoscaling endpoint for real-time predictions. Serverless inference lets you deploy and scale without managing any underlying infrastructure. Asynchronous inference queues large incoming requests (up to 1 GB) and processes them asynchronously. Batch transform is for batch inference, also known as offline inference.
66
AWS AI Service Cards
A resource to help customers better understand AWS AI services, to enhance transparency and advance responsible AI.
67
Amazon SageMaker Debugger
Amazon SageMaker Debugger helps debug and optimize machine learning models by monitoring and profiling training jobs in real time.
68
Amazon Augmented AI (Amazon A2I)
Amazon A2I is a service to build human review systems for ML solutions. You can use Amazon A2I to create a workflow for human reviewers to audit individual predictions. Amazon A2I is not a reporting tool designed to support system-level compliance audits.
69
Amazon SageMaker Autopilot
Amazon SageMaker Autopilot is an automated ML (AutoML) tool that simplifies and automates the process of building and deploying ML models for application owners.
70
Epoch
"Epoch" refers to a single complete pass through the entire training dataset during the process of training a machine learning model.
71
Learning Rate
The Learning rate hyperparameter controls the step size at which a model's parameters are updated during training. It determines how quickly or slowly the model's parameters are updated during training.
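Both terms (epoch and learning rate) can be seen in a toy gradient-descent loop; this is an illustrative sketch, not SageMaker code:

```python
def train(xs, ys, learning_rate=0.05, epochs=100):
    """Fit y = w * x by gradient descent on mean squared error.
    One epoch = one complete pass over the training dataset."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of MSE with respect to w, averaged over the data.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # the learning rate sets the step size
    return w

w = train(xs=[1, 2, 3], ys=[2, 4, 6])  # true relationship: y = 2x
# w converges toward 2.0
```

A learning rate that is too small makes convergence slow; one that is too large can make the updates overshoot and diverge.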
72
Supervised Learning
Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset, meaning the input data is paired with the correct output. The goal is for the algorithm to learn the mapping between input and output so it can accurately predict outcomes for new, unseen data.
73
Unsupervised Learning
Unsupervised learning involves training algorithms on unlabeled data, without predefined outputs or correct answers. The goal is for the algorithm to discover hidden patterns, structures, or relationships within the data on its own, often used for clustering, dimensionality reduction, or anomaly detection.
74
Semi-Supervised Learning
Semi-supervised learning is a hybrid approach that combines elements of both supervised and unsupervised learning, using a small amount of labeled data along with a larger amount of unlabeled data. This method aims to leverage the benefits of both approaches, improving model performance when fully labeled datasets are scarce or expensive to obtain.
75
Stop Sequences
Stop sequences are specific tokens or phrases that instruct an AI model to cease generating text at a designated point, such as the end of a sentence or list.
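A minimal sketch of how a client or serving layer might apply stop sequences to generated text (illustrative only, not any particular API's behavior):

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut generated text at the first occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

out = apply_stop_sequences("Answer: 42\nUser:", stop_sequences=["\nUser:", "###"])
print(out)  # → Answer: 42
```

In practice you pass stop sequences as an inference parameter so the model stops generating at that point, rather than trimming afterwards.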
76
Feature Extraction
Feature extraction is the technique of creating new features by transforming or combining the original input features.
77
Feature Selection
Feature selection is the process of choosing a subset of the most relevant features from a dataset. It aims to reduce dimensionality.
78
Amazon Bedrock Knowledge Bases
With Amazon Bedrock Knowledge Bases, you can give FMs and agents contextual information from your company's private data sources for RAG, to deliver more relevant, accurate, and up-to-date responses.
79
Accuracy
An evaluation metric used for evaluating classification models. Accuracy is most effective with balanced datasets.
80
Token
A sequence of characters that a model can interpret or predict as a single unit of meaning. For example, with text models, a token could correspond not just to a word, but also to a part of a word with grammatical meaning (such as "-ed"), a punctuation mark (such as "?"), or a common phrase (such as "a lot").
81
Context Window
The context window is a model property that describes the number of tokens that the model can accept in the context.
82
Latent Space
Latent space refers to the encoded knowledge within an LLM, representing complex relationships and patterns learned from the massive datasets used during training.
83
Confusion matrix
A confusion matrix is a table that compares the predictions of a classification model to the actual values of a dataset. A confusion matrix is used to summarize the performance of a classification model when it's evaluated against test data.
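A small sketch that tabulates one, using hypothetical binary spam/ham labels:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual label, columns = predicted label."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham",  "ham", "ham", "spam"]
cm = confusion_matrix(actual, predicted, labels=["spam", "ham"])
# [[1, 1],    1 spam caught (TP), 1 spam missed (FN)
#  [1, 2]]    1 ham wrongly flagged (FP), 2 ham correct (TN)
```

The TP/FP/FN/TN cells of this table are exactly the counts that accuracy, precision, recall, and F1 are computed from.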
84
Poisoning
Poisoning refers to the intentional introduction of malicious or biased data into the training dataset of a model. This can lead to the model producing biased, offensive, or harmful outputs, either intentionally or unintentionally.
85
Hijacking and prompt injection
Hijacking and prompt injection refer to the technique of influencing the outputs of generative models by embedding specific instructions within the prompts themselves. The goal is to hijack the model's behaviour and make it produce outputs that align with the attacker's intentions, such as generating misinformation or running malicious code.
86
Prompt leaking
Prompt leaking refers to the unintentional disclosure or leakage of the prompts or inputs (regardless of whether these are protected data or not) used within a model. Prompt leaking does not necessarily expose protected data. But it can expose other data used by the model, which can reveal information of how the model works and this can be used against it.
87
Jailbreaking
Jailbreaking refers to the practice of modifying or circumventing the constraints and safety measures implemented in a generative model or AI assistant to gain unauthorized access or functionality. Jailbreaking attempts involve crafting carefully constructed prompts or input sequences that aim to bypass or exploit vulnerabilities in the AI system's filtering mechanism or constraints. The goal is to "break out" of the intended model limitations.
88
Prompt stereotyping
An evaluation from Amazon SageMaker Studio that measures the probability that your model encodes biases in its response. These biases include those for race, gender, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status.
89
Shapley value
SageMaker Clarify provides feature attributions based on the concept of Shapley value, used to determine the contribution that each feature made to model predictions. These attributions can be provided for specific predictions and at a global level for the model as a whole.
90
Negative prompting
Negative prompting refers to guiding a generative AI model to avoid certain outputs or behaviors when generating content.
91
Prompt Template
Prompt templates are predefined formats that can be used to standardize inputs and outputs for AI models.
92
Amazon Q Developer
Amazon Q Developer is a generative artificial intelligence (AI) powered code assistant.
93
Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference.