AI Practice Test #3 Flashcards
Model customization methods
Model customization involves further training and changing the weights of the model to enhance its performance. You can use continued pre-training or fine-tuning for model customization in Amazon Bedrock.
Continued Pre-training
Fine-tuning
Continued Pre-training
In the continued pre-training process, you provide unlabeled data to pre-train a foundation model by familiarizing it with certain types of inputs. You can provide data from specific topics to expose a model to those areas. The Continued Pre-training process will tweak the model parameters to accommodate the input data and improve its domain knowledge.
For example, you can train a model with private data, such as business documents, that are not publicly available for training large language models. Additionally, you can continue to improve the model by retraining the model with more unlabeled data as it becomes available.
Fine-tuning
When fine-tuning a model, you provide labeled data to train the model to improve performance on specific tasks. By providing a training dataset of labeled examples, the model learns to associate what types of outputs should be generated for certain types of inputs. The model parameters are adjusted in the process and the model’s performance is improved for the tasks represented by the training dataset.
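Both customization types are launched through the same Amazon Bedrock control-plane API; only the customization type and the shape of the training data change. Below is a minimal, hypothetical boto3 sketch, where the job name, role ARN, S3 paths, base model ID, and hyperparameter values are all placeholders (the hyperparameters accepted vary by base model).

```python
import boto3

bedrock = boto3.client("bedrock")

# Fine-tuning uses labeled prompt/completion pairs; continued pre-training
# uses unlabeled text. All names, ARNs, and paths below are placeholders.
bedrock.create_model_customization_job(
    jobName="team-finetune-job-1",
    customModelName="my-custom-model",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",  # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://my-bucket/train/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/custom-model-output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},
)
```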
A company is using an Amazon Bedrock-based foundation model in a Retrieval Augmented Generation (RAG) configuration to provide tailored insights and responses based on client data stored in Amazon S3. Each team within the company is assigned to different clients and uses the foundation model to generate insights specific to their clients’ data. To maintain data privacy and security, the company needs to ensure that each team can only access the model responses generated from the data of their respective clients, preventing any unauthorized access to other teams’ client data.
What is the most effective approach to implement this access control and maintain data security?
The company should create a service role for Amazon Bedrock for each team, granting access only to that team’s client data in Amazon S3
This is the correct approach because creating a service role for each team that has specific access to their data in Amazon S3 ensures fine-grained control over who can access which data. By assigning specific service roles to Amazon Bedrock, the company can enforce data security and privacy rules at the team level, ensuring that each team only has access to the data they are authorized to use. This method also aligns with AWS best practices for secure and controlled access management.
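A minimal sketch of what one team’s scoped service role might look like with boto3, assuming a single bucket with per-team prefixes. The bucket, prefix, role, and policy names are all hypothetical.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy letting Amazon Bedrock assume the role on the team's behalf.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions scoped to ONE team's client-data prefix only.
team_a_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read objects only under the team's prefix.
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::client-data-bucket/team-a/*",
        },
        {
            # List only the team's prefix within the bucket.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::client-data-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["team-a/*"]}},
        },
    ],
}

iam.create_role(
    RoleName="BedrockTeamARole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="BedrockTeamARole",
    PolicyName="TeamAClientDataAccess",
    PolicyDocument=json.dumps(team_a_policy),
)
```

Repeating this per team (team-b, team-c, and so on) keeps each team’s Bedrock invocations confined to its own clients’ data.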
Amazon SageMaker Automatic Model Tuning (AMT)
A healthcare analytics company is using Amazon SageMaker Automatic Model Tuning (AMT) to optimize its machine learning models for predicting patient outcomes. To ensure the models are performing at their best, the data science team is configuring the autotune settings but needs to understand which parameters are mandatory for successful tuning. Properly setting these configurations will allow the team to enhance model accuracy and performance efficiently.
Which of the following options is mandatory for the given use case?
None
Choosing the correct hyperparameters requires experience with machine learning techniques and can drastically affect your model performance. Even with hyperparameter tuning, you still need to specify multiple tuning configurations, such as hyperparameter ranges, search strategy, and the number of training jobs to launch. Getting these settings right is intricate and typically requires multiple experiments, which may incur additional training costs.
Amazon SageMaker Automatic Model Tuning can automatically choose hyperparameter ranges, search strategy, maximum runtime of a tuning job, early stopping type for training jobs, number of times to retry a training job, and model convergence flag to stop a tuning job, based on the objective metric you provide. This minimizes the time required for you to kickstart your tuning process and increases the chances of finding more accurate models with a lower budget.
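A hedged sketch of what enabling Autotune might look like through the low-level boto3 API. The job name, image URI, role ARN, and S3 paths are placeholders, and the training job definition is abbreviated to the fields relevant here; with Autotune enabled, the ranges, strategy, and resource limits that would otherwise be mandatory can be left for SageMaker to choose.

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="my-autotune-job",
    # With Autotune enabled, SageMaker chooses hyperparameter ranges,
    # search strategy, resource limits, and early-stopping settings.
    Autotune={"Mode": "Enabled"},
    HyperParameterTuningJobConfig={
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:accuracy",  # the objective you provide
        },
    },
    TrainingJobDefinition={
        "AlgorithmSpecification": {
            "TrainingImage": "<training-image-uri>",
            "TrainingInputMode": "File",
        },
        "RoleArn": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
        "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    },
)
```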
Incorrect options:
Hyperparameter ranges
Tuning strategy
Number of jobs
Serverless Inference
Serverless Inference
On-demand Serverless Inference is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.
Amazon SageMaker Serverless Inference is a purpose-built inference option that enables you to deploy and scale ML models without configuring or managing any of the underlying infrastructure.
Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers. Serverless Inference integrates with AWS Lambda to offer you high availability, built-in fault tolerance, and automatic scaling. With a pay-per-use model, Serverless Inference is a cost-effective option if you have an infrequent or unpredictable traffic pattern. During times when there are no requests, Serverless Inference scales your endpoint down to 0, helping you to minimize your costs.
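For illustration, a serverless endpoint is configured by supplying a ServerlessConfig in place of instance settings; the model, config, and endpoint names below are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Endpoint config with a ServerlessConfig instead of instance type/count.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,  # 1024-6144, in 1 GB increments
            "MaxConcurrency": 5,     # concurrent invocations before throttling
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```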
Unsupervised learning
Unsupervised learning algorithms train on unlabeled data. They scan through new data and establish meaningful patterns and groupings on their own, without predetermined outputs. For instance, unsupervised learning algorithms could group news articles from different news sites into common categories like sports and crime.
Clustering
Dimensionality reduction
Clustering
Clustering is an unsupervised learning technique that groups certain data inputs, so they may be categorized as a whole. There are various types of clustering algorithms depending on the input data. An example of clustering is identifying different types of network traffic to predict potential security incidents.
Dimensionality reduction
Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application.
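A short scikit-learn sketch of both unsupervised techniques on a toy dataset (scikit-learn is assumed here purely for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels ignored: unsupervised setting

# Dimensionality reduction: compress 4 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the reduced data into 3 clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])  # cluster assignment per sample, no labels required
```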
Decision tree
The decision tree is a supervised machine learning technique that takes some given inputs and applies an if-else structure to predict an outcome. An example of a decision tree problem is predicting customer churn.
Neural network
A neural network solution is a more complex supervised learning technique. To produce a given outcome, it takes some given inputs and performs one or more layers of mathematical transformation based on adjusting data weightings. An example of a neural network technique is predicting a digit from a handwritten image.
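A compact scikit-learn illustration of both supervised techniques, using the handwritten-digit example mentioned above:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Handwritten-digit images (8x8 pixels) with their digit labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Decision tree: learned if-else splits over pixel intensities.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Neural network: layers of weighted mathematical transformations.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                    random_state=0).fit(X_train, y_train)

print("tree accuracy:", tree.score(X_test, y_test))
print("mlp accuracy:", mlp.score(X_test, y_test))
```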
Sentiment analysis
This is an example of semi-supervised learning. Semi-supervised learning is when you apply both supervised and unsupervised learning techniques to a common problem. This technique relies on using a small amount of labeled data and a large amount of unlabeled data to train systems. When considering the breadth of an organization’s text-based customer interactions, it may not be cost-effective to categorize or label sentiment across all channels. An organization could train a model on the larger unlabeled portion of the data first, and then on the smaller sample that has been labeled.
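A minimal scikit-learn sketch of the semi-supervised idea, using synthetic data as a stand-in for customer interactions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Stand-in for customer interactions: mostly unlabeled examples.
X, y = make_classification(n_samples=1000, random_state=0)

# Mark 90% of the labels as unknown (-1 means "unlabeled" here).
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.9] = -1

# Self-training: fit on the labeled 10%, then iteratively pseudo-label
# the model's confident predictions on the unlabeled 90%.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
print(model.score(X, y))
```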
Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data that you want from various data sources and import it quickly. Next, you can use the data quality and insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage.
SageMaker Data Wrangler also offers a selection of over 300 prebuilt, PySpark-based data transformations, so you can transform your data and scale your data preparation workflow without writing a single line of code. Preconfigured transformations cover common use cases such as flattening JSON files, deleting duplicate rows, imputing missing data with the mean or median, one-hot encoding, and time-series-specific transformers to accelerate the preparation of time-series data for ML.
Amazon SageMaker Clarify
SageMaker Clarify helps identify potential bias during data preparation without writing code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features.
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth offers the most comprehensive set of human-in-the-loop capabilities, allowing you to harness the power of human feedback across the ML lifecycle to improve the accuracy and relevancy of models. You can complete a variety of human-in-the-loop tasks with SageMaker Ground Truth, from data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics.
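A hedged sketch of creating and populating a feature group with the SageMaker Python SDK, mirroring the playlist example above; the feature group name, bucket, column names, and role ARN are hypothetical.

```python
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Hypothetical playlist features.
df = pd.DataFrame({
    "song_id": ["s1", "s2"],
    "rating": [4.5, 3.0],
    "listen_seconds": [210.0, 95.0],
    "event_time": [1700000000.0, 1700000000.0],  # required timestamp feature
})
# Cast object columns to pandas "string" so feature types can be inferred.
df["song_id"] = df["song_id"].astype("string")

fg = FeatureGroup(name="playlist-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)  # infer feature types from the frame
fg.create(
    s3_uri="s3://my-bucket/feature-store/",  # offline store location
    record_identifier_name="song_id",
    event_time_feature_name="event_time",
    role_arn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    enable_online_store=True,                # low-latency reads at inference
)
fg.ingest(data_frame=df, max_workers=1, wait=True)
```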
Which of the following performance metrics would you recommend to the team for evaluating the effectiveness of its classification system?
Precision, Recall and F1-Score
Precision, Recall, and F1-Score are standard performance metrics used to evaluate the effectiveness of a classification system (a worked example follows the definitions below):
Precision
Measures the accuracy of the positive predictions, calculated as the ratio of true positives to the sum of true positives and false positives.
Recall (Sensitivity)
Measures the ability of the classifier to identify all positive instances, calculated as the ratio of true positives to the sum of true positives and false negatives.
F1-Score
The harmonic mean of Precision and Recall, providing a single metric that balances both concerns.
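For a concrete check of the formulas, a small scikit-learn example:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# TP = 3, FP = 1, FN = 1 for this toy data, so:
# Precision = TP / (TP + FP); Recall = TP / (TP + FN)
# F1 = 2 * Precision * Recall / (Precision + Recall)
print("precision:", precision_score(y_true, y_pred))  # 3/4 = 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 3/4 = 0.75
print("f1:       ", f1_score(y_true, y_pred))         # 0.75
```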
Amazon SageMaker Clarify
You can use SageMaker Clarify to identify potential bias during data preparation, allowing you to detect and measure bias in datasets and models to ensure fairness and transparency in machine learning applications.
SageMaker Clarify is specifically designed to help identify and mitigate bias in machine learning models and datasets. It provides tools to analyze both data and model predictions to detect potential bias, generate reports, and help ensure that models are fair and transparent. It can help identify and measure bias within the data preparation stage and throughout the model’s lifecycle. This capability is essential for building trustworthy AI systems that do not inadvertently discriminate against specific groups.
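A hedged sketch of a pre-training bias check with the SageMaker Python SDK’s clarify module; the dataset columns, facet, S3 paths, role ARN, and metric selection are hypothetical.

```python
from sagemaker import clarify

# Placeholder role, instance settings, S3 paths, and column names.
processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",
    s3_output_path="s3://my-bucket/clarify-output/",
    label="approved",  # target column
    headers=["age", "gender", "income", "approved"],
    dataset_type="text/csv",
)

# Check whether outcomes differ across the "gender" facet.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # favorable outcome
    facet_name="gender",
)

# Runs a pre-training bias analysis job and writes a report to S3.
processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    methods=["CI", "DPL"],  # class imbalance, difference in label proportions
)
```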
Batch inference
Batch inference is the most suitable choice for processing a large payload of several gigabytes with Amazon SageMaker when there is no need for immediate responses. This method allows the company to run predictions on large volumes of data in a single batch job, which is more cost-effective and efficient than processing individual requests in real-time. Batch inference can handle large datasets and is ideal for scenarios where waiting for the responses is acceptable, making it the best fit for this use case.
SageMaker Batch Transform will automatically split your input file of several gigabytes (GBs) into whatever payload size is specified if you use "SplitType": "Line" and "BatchStrategy": "MultiRecord".
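A minimal sketch of such a batch transform job with the SageMaker Python SDK; the model name, instance type, and S3 paths are placeholders.

```python
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-model",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="MultiRecord",  # batch multiple records into each request
    max_payload=6,           # max request payload size in MB
    output_path="s3://my-bucket/batch-output/",
)

# Split the multi-GB input on newlines into payload-sized mini-batches.
transformer.transform(
    data="s3://my-bucket/batch-input/records.jsonl",
    content_type="application/jsonlines",
    split_type="Line",
)
transformer.wait()
```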
Data security and compliance aspects of Amazon Bedrock
The company’s data is not used to improve the base Foundation Models (FMs) and it is not shared with any model providers
Amazon Bedrock is a fully managed service that makes high-performing foundation models (FMs) from leading AI startups and Amazon available for your use through a unified API. Using Amazon Bedrock, you can easily experiment with and evaluate top foundation models for your use cases, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources.
With Amazon Bedrock, your content is not used to improve the base models and is not shared with any model providers.
Your data in Amazon Bedrock is always encrypted in transit and at rest, and you can optionally encrypt the data using your own keys. You can use AWS PrivateLink with Amazon Bedrock to establish private connectivity between your FMs and your Amazon Virtual Private Cloud (Amazon VPC) without exposing your traffic to the Internet.
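As an illustration, an interface VPC endpoint for the Bedrock runtime might be created as below; the VPC, subnet, and security group IDs are placeholders, and the exact service name should be confirmed for your Region.

```python
import boto3

ec2 = boto3.client("ec2")

# Interface endpoint keeps Bedrock traffic inside the VPC via PrivateLink.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # resolve the public Bedrock DNS name privately
)
```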
AWS Artifact
The company should use AWS Artifact to facilitate on-demand access to AWS compliance reports and agreements, as well as allow users to receive notifications when new compliance documents or reports, including ISV compliance reports, are available
This is the correct option because AWS Artifact is specifically designed to provide access to a wide range of AWS compliance reports, including those from Independent Software Vendors (ISVs). AWS Artifact allows users to configure settings to receive notifications when new compliance documents or reports are available. This capability makes it an ideal choice for a company that needs timely email alerts regarding the availability of ISV compliance reports.
The new third-party reports tab on the AWS Artifact Reports page provides on-demand access to security compliance reports of Independent Software Vendors (ISVs) who sell their products through AWS Marketplace.
You can subscribe to notifications and create configurations to get notified when a new report or agreement, or a new version of an existing report or agreement becomes available on AWS Artifact.