Practice Questions - Amazon AWS Certified AI Practitioner AIF-C01 Flashcards
(154 cards)
A company has built an image classification model to predict plant diseases from photos of plant leaves. The company wants to evaluate how many images the model classified correctly. Which evaluation metric should the company use to measure the model’s performance?
A. R-squared score
B. Accuracy
C. Root mean squared error (RMSE)
D. Learning rate
B. Accuracy
Accuracy is the correct answer because it directly measures the proportion of correctly classified images out of the total number of images. R-squared score and RMSE are used for regression problems, not classification problems. Learning rate is a hyperparameter used during model training, not an evaluation metric.
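The computation the card describes can be sketched in a few lines. This is a minimal illustration; the plant-disease labels and predictions below are made up.

```python
# Accuracy = correctly classified images / total images.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical ground-truth labels vs. model predictions for 5 leaf photos.
y_true = ["rust", "healthy", "blight", "healthy", "rust"]
y_pred = ["rust", "healthy", "blight", "rust", "rust"]
print(accuracy(y_true, y_pred))  # 4 of 5 correct -> 0.8
```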
A company uses Amazon SageMaker for its ML pipeline in a production environment. The company has large input data sizes up to 1 GB and processing times up to 1 hour. The company needs near real-time latency. Which SageMaker inference option meets these requirements?
A. Real-time inference
B. Serverless inference
C. Asynchronous inference
D. Batch transform
C. Asynchronous inference

Asynchronous inference is the correct answer because it is designed for large payloads and longer processing times, aligning with the company’s needs of up to 1 GB input data and processing times up to 1 hour. While it doesn’t provide truly immediate responses like real-time inference, it offers near real-time latency, which is specified as a requirement. Real-time inference is unsuitable due to its typically smaller payload sizes and shorter processing times. Serverless inference is optimized for low latency but may not handle the 1-hour processing time efficiently. Batch transform is entirely unsuitable as it is for offline processing and does not meet the near real-time requirement.
A company wants to use language models to create an application for inference on edge devices. The inference must have the lowest latency possible. Which solution will meet these requirements?
A. Deploy optimized small language models (SLMs) on edge devices.
B. Deploy optimized large language models (LLMs) on edge devices.
C. Incorporate a centralized small language model (SLM) API for asynchronous communication with edge devices.
D. Incorporate a centralized large language model (LLM) API for asynchronous communication with edge devices.
A. Deploy optimized small language models (SLMs) on edge devices.
The correct answer is A because deploying optimized small language models (SLMs) directly onto edge devices minimizes latency. SLMs have a smaller size and computational footprint compared to LLMs, leading to faster inference times. Options B, C, and D all introduce latency: B due to the inherent computational demands of LLMs on resource-constrained edge devices, and C and D because of the communication overhead involved in using a centralized API. Asynchronous communication, while offering other benefits, inherently adds delay compared to on-device processing.
A company wants to build an ML model using Amazon SageMaker and needs to share and manage variables for model development across multiple teams. Which SageMaker feature best meets these requirements?
A. Amazon SageMaker Feature Store
B. Amazon SageMaker Data Wrangler
C. Amazon SageMaker Clarify
D. Amazon SageMaker Model Cards
A. Amazon SageMaker Feature Store
The correct answer is A because Amazon SageMaker Feature Store provides a centralized repository for storing, managing, and sharing features (variables) used in machine learning models. This allows multiple teams to collaborate effectively and ensures consistency in feature usage across different models. Options B, C, and D are incorrect because they do not directly address the need for centralized sharing and management of variables across multiple teams for model development. Data Wrangler focuses on data preparation, Clarify on model bias detection, and Model Cards on model documentation.
A company possesses petabytes of unlabeled customer data intended for use in an advertisement campaign. The company aims to classify its customers into tiers for targeted advertising and product promotion. Which methodology is most appropriate for this task?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. Reinforcement learning from human feedback (RLHF)
B. Unsupervised learning
Unsupervised learning is the correct answer because the company has unlabeled data and needs to identify patterns and groupings within that data to classify customers into tiers. Supervised learning requires labeled data, which is not available. Reinforcement learning and RLHF focus on learning through trial and error and feedback, which are not directly applicable to the problem of initial customer classification. Clustering techniques, a core component of unsupervised learning, are perfectly suited to this task.
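The clustering idea behind this answer can be sketched with a tiny one-dimensional k-means, written here in plain Python under the assumption that customers are tiered by a single made-up spend value; real pipelines would use multiple features and a library implementation.

```python
# Minimal 1-D k-means sketch: group unlabeled customer spend values
# into k tiers. Spend figures are invented for illustration.
def kmeans_1d(values, k=2, iters=20):
    centers = [min(values), max(values)][:k]   # simple init for k=2
    clusters = []
    for _ in range(iters):
        # Assign each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Recompute each center as the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

spend = [12, 15, 14, 90, 95, 100, 13, 97]
centers, tiers = kmeans_1d(spend)
print(sorted(centers))  # one low-spend tier center, one high-spend tier center
```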
An AI practitioner wants to use a foundation model (FM) to design a search application. The search application must handle queries that have text and images. Which type of FM should the AI practitioner use to power the search application?
A. Multi-modal embedding model
B. Text embedding model
C. Multi-modal generation model
D. Image generation model
A. Multi-modal embedding model
The correct answer is A because multi-modal embedding models are designed to process and understand multiple data types, including text and images. This is precisely what is needed for a search application that accepts queries containing both text and images.
Option B is incorrect because text embedding models only handle text data and would not be able to process image queries. Option C is incorrect because multi-modal generation models are focused on creating new content (text and/or images), not on searching existing data. Option D is incorrect because image generation models are solely focused on generating images and cannot handle text-based queries.
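How an embedding model powers search can be sketched as follows: the model maps a text-plus-image query and every catalog item into one vector space, and retrieval ranks items by cosine similarity. The vectors below are made-up stand-ins for real model outputs.

```python
import math

# Cosine similarity between two embedding vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend these came from embedding a text+image query and two catalog images.
query_vec = [0.9, 0.1, 0.3]
catalog = {"red_shoe.jpg": [0.8, 0.2, 0.4], "blue_hat.jpg": [0.1, 0.9, 0.2]}
best = max(catalog, key=lambda name: cosine(query_vec, catalog[name]))
print(best)  # the item whose embedding is closest to the query
```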
A company wants to use AI to protect its application from threats. The AI solution needs to check if an IP address is from a suspicious source. Which solution meets these requirements?
A. Build a speech recognition system.
B. Create a natural language processing (NLP) named entity recognition system.
C. Develop an anomaly detection system.
D. Create a fraud forecasting system.
C. Develop an anomaly detection system.
Anomaly detection is the correct answer because it focuses on identifying unusual patterns in data. In this context, an anomaly detection system can analyze IP address access patterns and flag deviations from normal behavior, indicating potentially suspicious activity. Options A and B are incorrect because they are not relevant to identifying suspicious IP addresses. Speech recognition deals with audio, and NLP named entity recognition deals with text. Option D, fraud forecasting, is focused on predicting future fraud rather than detecting it in real-time based on an immediate event like an IP address access.
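A minimal version of the idea: flag IP addresses whose request counts deviate strongly from the mean. The counts and the 1.5 z-score threshold are illustrative; a production system would use richer features or a managed detector.

```python
import statistics

# Hypothetical request counts per source IP over some window.
counts = {"10.0.0.1": 42, "10.0.0.2": 38, "10.0.0.3": 45,
          "10.0.0.4": 40, "203.0.113.9": 950}

mean = statistics.mean(counts.values())
stdev = statistics.stdev(counts.values())

# Flag IPs whose count sits well above the mean (z-score > 1.5).
suspicious = [ip for ip, c in counts.items() if (c - mean) / stdev > 1.5]
print(suspicious)  # ['203.0.113.9']
```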
A company uses machine learning (ML) models to forecast demand each quarter, informing operational optimization decisions. An AI practitioner is creating a report to explain these models to company stakeholders. Which of the following should the AI practitioner include in the report to ensure transparency and explainability?
A. Code for model training
B. Partial dependence plots (PDPs)
C. Sample data for training
D. Model convergence tables
B. Partial dependence plots (PDPs)
The correct answer is B, Partial dependence plots (PDPs). PDPs visualize the relationship between model features and predictions, making it easy to understand how changes in input variables affect forecasts. This is crucial for stakeholder understanding without requiring in-depth knowledge of the model’s internal workings.
Option A, code for model training, is incorrect because it is too technical for most stakeholders. Option C, sample data for training, might raise privacy concerns and is unnecessary for explaining model predictions. Option D, model convergence tables, is relevant for model developers but less so for stakeholders concerned with understanding the model’s outputs and their impact.
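How a PDP is computed can be sketched directly: vary one feature over a grid, hold the other features of each observation fixed, and average the model's predictions. The linear "forecaster" and the data rows below are toy stand-ins.

```python
# Hypothetical demand forecaster: demand falls with price, rises with ads.
def model(price, ads):
    return 100 - 2 * price + 5 * ads

rows = [(10, 1), (12, 3), (8, 2)]    # observed (price, ads) pairs

def partial_dependence(price_grid):
    pd = []
    for price in price_grid:
        # Replace each row's price with the grid value, keep its ads value.
        preds = [model(price, ads) for _, ads in rows]
        pd.append(sum(preds) / len(preds))
    return pd

print(partial_dependence([5, 10, 15]))  # [100.0, 90.0, 80.0]
```

The falling averages make the model's price sensitivity visible to stakeholders without exposing any internals.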
A law firm wants to build an AI application using large language models (LLMs) to read legal documents and extract key points. Which solution best meets these requirements?
A. Build an automatic named entity recognition system.
B. Create a recommendation engine.
C. Develop a summarization chatbot.
D. Develop a multi-language translation system.
C. Develop a summarization chatbot.
The correct answer is C because the core requirement is to extract key points from legal documents, which is exactly what a summarization chatbot does. Options A, B, and D are incorrect. A named entity recognition system (A) identifies predefined entities (names, places, etc.), not the key points of an argument. A recommendation engine (B) suggests related items rather than summarizing information. A multi-language translation system (D) translates between languages rather than extracting key points. A summarization chatbot uses LLMs to condense information, directly addressing the law firm’s needs.
A company wants to create a chatbot using a foundation model (FM) on Amazon Bedrock. This FM needs to access encrypted data stored in an Amazon S3 bucket encrypted with Amazon S3 managed keys (SSE-S3). The FM fails to access the S3 bucket data. Which solution will resolve this issue?
A. Ensure that the role that Amazon Bedrock assumes has permission to decrypt data with the correct encryption key.
B. Set the access permissions for the S3 buckets to allow public access to enable access over the internet.
C. Use prompt engineering techniques to tell the model to look for information in Amazon S3.
D. Ensure that the S3 data does not contain sensitive information.
A. Ensure that the role that Amazon Bedrock assumes has permission to decrypt data with the correct encryption key.
The correct answer is A because access failures to encrypted S3 data are resolved by granting the service role that Amazon Bedrock assumes the permissions needed to read and decrypt the objects. Option B is incorrect because allowing public access is a serious security risk and violates the principle of least privilege. Option C is incorrect because prompt engineering cannot grant data-access permissions. Option D is incorrect because whether the data contains sensitive information is unrelated to the access failure.
A company wants to use generative AI to increase developer productivity and software development. The company wants to use Amazon Q Developer. What can Amazon Q Developer do to help the company meet these requirements?
A. Create software snippets, reference tracking, and open source license tracking.
B. Run an application without provisioning or managing servers.
C. Enable voice commands for coding and providing natural language search.
D. Convert audio files to text documents by using ML models.
A. Create software snippets, reference tracking, and open source license tracking.
The correct answer is A because Amazon Q Developer generates code suggestions and snippets and provides reference tracking that flags suggestions resembling open source code along with the associated license information. Option B describes serverless compute services such as AWS Lambda. Option C describes voice-command features that Amazon Q Developer does not provide. Option D describes speech-to-text conversion, which is the role of Amazon Transcribe.
A financial institution is using Amazon Bedrock to develop an AI application hosted within a VPC. Due to regulatory compliance, this VPC is not permitted to access the internet. Which AWS service or feature best addresses this requirement?
A. AWS PrivateLink
B. Amazon Macie
C. Amazon CloudFront
D. Internet gateway
A. AWS PrivateLink
AWS PrivateLink enables private connectivity between a VPC and AWS services, eliminating the need for internet access. This directly addresses the requirement of the financial institution’s isolated VPC needing to access Amazon Bedrock.
Option B, Amazon Macie, is a data security and privacy service; it does not provide private connectivity. Option C, Amazon CloudFront, is a content delivery network that relies on the internet. Option D, an internet gateway, is explicitly designed to connect a VPC to the internet, which is against the stated requirement.
A company wants to develop an educational game where users answer questions such as the following: “A jar contains six red, four green, and three yellow marbles. What is the probability of choosing a green marble from the jar?” Which solution meets these requirements with the LEAST operational overhead?
A. Use supervised learning to create a regression model that will predict probability.
B. Use reinforcement learning to train a model to return the probability.
C. Use code that will calculate probability by using simple rules and computations.
D. Use unsupervised learning to create a model that will estimate probability density.
C. Use code that will calculate probability by using simple rules and computations.
The correct answer is C because the probability is a deterministic calculation (4 green marbles out of 13 total, or 4/13) that simple code can compute exactly. Training and operating an ML model (options A, B, and D) would add significant operational overhead without improving on an exact rule-based answer.
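The marble question reduces to a fixed-rule computation, which is why plain code carries the least operational overhead; a short sketch:

```python
from fractions import Fraction

# Probability of drawing a given color = favorable outcomes / total outcomes.
def probability(favorable, total):
    return Fraction(favorable, total)

marbles = {"red": 6, "green": 4, "yellow": 3}
total = sum(marbles.values())                  # 13 marbles in the jar
p_green = probability(marbles["green"], total)
print(p_green)  # 4/13
```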
A company is using a pre-trained large language model (LLM) to build a chatbot for product recommendations. The company needs the LLM outputs to be short and written in a specific language. Which solution will align the LLM response quality with the company’s expectations?
A. Adjust the prompt.
B. Choose an LLM of a different size.
C. Increase the temperature.
D. Increase the Top K value.
A. Adjust the prompt.
The correct answer is A because instructing the model in the prompt to respond briefly and in the required language is the most direct way to control output length and language. Choosing a different model size (B) guarantees neither property, and temperature (C) and Top K (D) control output randomness, not length or language.
A company is using domain-specific models and wants to adapt pre-trained models to create models for new, related tasks, instead of creating new models from scratch. Which machine learning (ML) strategy best meets these requirements?
A. Increase the number of epochs.
B. Use transfer learning.
C. Decrease the number of epochs.
D. Use unsupervised learning.
B. Use transfer learning.
The correct answer is B, Use transfer learning. Transfer learning leverages pre-trained models and adapts them for new, related tasks, directly addressing the company’s requirement to avoid building models from the ground up. Options A and C relate to the training process of a single model and don’t address the core issue of adapting existing models. Option D, unsupervised learning, involves training a model on unlabeled data, which is not relevant to adapting a pre-trained model for a new task.
A company is building a solution to generate images for protective eyewear. The solution must have high accuracy and must minimize the risk of incorrect annotations. Which solution will meet these requirements?
A. Human-in-the-loop validation by using Amazon SageMaker Ground Truth Plus
B. Data augmentation by using an Amazon Bedrock knowledge base
C. Image recognition by using Amazon Rekognition
D. Data summarization by using Amazon QuickSight Q
A. Human-in-the-loop validation by using Amazon SageMaker Ground Truth Plus
The correct answer is A because SageMaker Ground Truth Plus provides managed human-in-the-loop labeling and validation, which improves annotation accuracy and minimizes the risk of incorrect annotations. Option B is incorrect because a Bedrock knowledge base supports retrieval for generative applications, not annotation validation. Option C is incorrect because Amazon Rekognition classifies existing images rather than validating annotations. Option D is incorrect because Amazon QuickSight Q is a business intelligence tool for data summarization.
Which metric measures the runtime efficiency of operating AI models?
A. Customer satisfaction score (CSAT)
B. Training time for each epoch
C. Average response time
D. Number of training instances
C. Average response time
The correct answer is C, Average response time. Average response time directly measures how long an AI model takes to process a request and return a result. This is crucial for evaluating runtime efficiency, especially in applications requiring quick responses.
Option A, Customer satisfaction score (CSAT), measures user satisfaction, not the model’s technical performance. Option B, Training time for each epoch, measures the time it takes to train the model, not its runtime efficiency during operation. Option D, Number of training instances, refers to the amount of data used for training, not the model’s operational speed.
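Measuring average response time amounts to timing each inference call and averaging. The "model" below is a hypothetical stand-in for a real inference endpoint.

```python
import time

# Placeholder for a real model inference call.
def model_predict(x):
    time.sleep(0.001)   # stand-in for inference work
    return x * 2

# Time each request and compute the average latency.
timings = []
for x in range(10):
    start = time.perf_counter()
    model_predict(x)
    timings.append(time.perf_counter() - start)

avg_ms = 1000 * sum(timings) / len(timings)
print(f"average response time: {avg_ms:.2f} ms")
```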
A company is building a contact center application and wants to gain insights from customer conversations. The company wants to analyze and extract key information from the audio of the customer calls. Which solution meets these requirements?
A. Build a conversational chatbot by using Amazon Lex.
B. Transcribe call recordings by using Amazon Transcribe.
C. Extract information from call recordings by using Amazon SageMaker Model Monitor.
D. Create classification labels by using Amazon Comprehend.
B. Transcribe call recordings by using Amazon Transcribe.
The correct answer is B because Amazon Transcribe converts the audio of customer calls into text, which can then be analyzed to extract key information. Amazon Lex (A) builds conversational interfaces rather than analyzing recordings, SageMaker Model Monitor (C) monitors the quality of deployed models, and Amazon Comprehend (D) analyzes text but cannot process audio directly.
A company wants to classify human genes into 20 categories based on gene characteristics. The company needs an ML algorithm to document how the inner mechanism of the model affects the output. Which ML algorithm meets these requirements?
A. Decision trees
B. Linear regression
C. Logistic regression
D. Neural networks
A. Decision trees
Decision trees are the most suitable algorithm because they offer high transparency and interpretability. The decision-making process is easily visualized by following the branches of the tree, directly showing how input features (gene characteristics) influence the output (gene category). This fulfills the requirement to document the model’s inner workings. Linear regression and logistic regression are unsuitable; linear regression is for predicting continuous values, not classifications, and logistic regression is primarily designed for binary classification (two categories), not the 20 categories required here. Neural networks, while capable of multi-class classification, are often considered “black boxes,” making it difficult to document how the inner mechanisms affect the output.
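The interpretability argument can be made concrete: every decision-tree prediction is a readable path of threshold tests. The features, thresholds, and category names below are invented stand-ins for real gene characteristics.

```python
# Tiny hand-written decision tree that records the path it takes,
# illustrating why tree predictions are easy to document.
def classify(gene, trace):
    if gene["expression"] > 0.7:
        trace.append("expression > 0.7")
        if gene["gc_content"] > 0.5:
            trace.append("gc_content > 0.5")
            return "category_3"
        trace.append("gc_content <= 0.5")
        return "category_7"
    trace.append("expression <= 0.7")
    return "category_12"

trace = []
label = classify({"expression": 0.9, "gc_content": 0.4}, trace)
print(label, "via", " -> ".join(trace))
# category_7 via expression > 0.7 -> gc_content <= 0.5
```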
A company uses a foundation model (FM) from Amazon Bedrock for an AI search tool. The company wants to fine-tune the model to be more accurate by using the company’s data. Which strategy will successfully fine-tune the model?
A. Provide labeled data with the prompt field and the completion field.
B. Prepare the training dataset by creating a .txt file that contains multiple lines in .csv format.
C. Purchase Provisioned Throughput for Amazon Bedrock.
D. Train the model on journals and textbooks.
A. Provide labeled data with the prompt field and the completion field.
The correct answer is A because fine-tuning a foundation model requires providing labeled data where each example consists of a prompt (input) and a completion (desired output). This allows the model to learn specific patterns and behaviors relevant to the company’s data and use case. Option B is incorrect because while a .txt file might be used, it must contain appropriately labeled data, not just raw .csv data. Option C is incorrect because while provisioned throughput might improve performance, it is not directly involved in the fine-tuning process itself. Option D is incorrect because training the model on general data (journals and textbooks) won’t tailor the model to the company’s specific needs; fine-tuning uses the company’s own data for this purpose.
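The labeled-data layout the answer describes is JSON Lines with prompt/completion fields, following the Amazon Bedrock fine-tuning format; the example pairs below are made up.

```python
import json

# Each training example pairs an input (prompt) with the desired output
# (completion); one JSON object per line.
examples = [
    {"prompt": "What is our return window?",
     "completion": "30 days from delivery."},
    {"prompt": "Do you ship internationally?",
     "completion": "Yes, to 40 countries."},
]
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```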
Which feature of Amazon OpenSearch Service gives companies the ability to build vector database applications?
A. Integration with Amazon S3 for object storage
B. Support for geospatial indexing and queries
C. Scalable index management and nearest neighbor search capability
D. Ability to perform real-time analysis on streaming data
C. Scalable index management and nearest neighbor search capability
The correct answer is C because vector database applications depend on storing embeddings at scale and retrieving them with k-nearest neighbor (k-NN) search, both of which Amazon OpenSearch Service provides. S3 integration (A), geospatial queries (B), and streaming analytics (D) do not enable vector similarity search.
Which option is a use case for generative AI models?
A. Improving network security by using intrusion detection systems
B. Creating photorealistic images from text descriptions for digital marketing
C. Enhancing database performance by using optimized indexing
D. Analyzing financial data to forecast stock market trends
B. Creating photorealistic images from text descriptions for digital marketing
The correct answer is B because generative AI models create new content, such as images generated from text descriptions. Intrusion detection (A), database indexing (C), and financial forecasting (D) are analytical or operational tasks, not content generation.
A company wants to build a generative AI application using Amazon Bedrock and needs to choose a foundation model (FM). The company wants to know how much information can fit into one prompt. Which consideration will inform the company’s decision?
A. Temperature
B. Context window
C. Batch size
D. Model size
B. Context window
The correct answer is B, Context window. The context window refers to the maximum amount of text a foundation model can process in a single input (prompt). This directly addresses the company’s need to understand how much information can be included in a single prompt.
Option A, Temperature, is incorrect because it controls the randomness of the model’s output, not the input size. Option C, Batch size, refers to the number of inputs processed simultaneously, not the size of a single input. Option D, Model size, refers to the overall size of the model’s parameters, which indirectly relates to its capabilities but does not directly specify the maximum input length.
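A rough sketch of checking a prompt against a context window follows. Real token counts come from the model's tokenizer; a whitespace split is only a crude estimate, and the 8192-token limit is hypothetical (actual limits vary by model).

```python
CONTEXT_WINDOW = 8192  # hypothetical limit; check the chosen FM's spec

def fits_in_context(prompt, max_tokens=CONTEXT_WINDOW):
    # Crude estimate: one token per whitespace-separated word.
    # Real tokenizers produce different (usually higher) counts.
    est_tokens = len(prompt.split())
    return est_tokens <= max_tokens

print(fits_in_context("Summarize this quarterly report: " + "word " * 100))
```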
A company wants to create a chatbot using a foundation model (FM) to help customers solve technical problems without human intervention. The chatbot’s responses must adhere to the company’s tone. Which solution best meets these requirements?
A. Set a low limit on the number of tokens the FM can produce.
B. Use batch inferencing to process detailed responses.
C. Experiment and refine the prompt until the FM produces the desired responses.
D. Define a higher number for the temperature parameter.
C. Experiment and refine the prompt until the FM produces the desired responses.
The correct answer is C because prompt engineering is the most effective way to control the tone and style of a foundation model’s output. By carefully crafting and iteratively refining the prompts given to the FM, the company can guide the chatbot to generate responses that align with their desired tone.
Option A is incorrect because limiting the number of tokens primarily affects the length of the response, not necessarily its tone. Option B is incorrect because batch inferencing is about efficiency in processing multiple requests, not about controlling the tone of individual responses. Option D is incorrect because increasing the temperature parameter generally leads to more creative, but potentially less coherent and less on-brand, responses.
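Prompt refinement in practice often means keeping the tone and format instructions in a template that can be adjusted between iterations until the FM's replies match the company's voice. The template wording below is purely illustrative.

```python
# Sketch of a refinable prompt template: the tone and format constraints
# live in the prompt, so they can be tuned without changing the model.
def build_prompt(question, tone="friendly and professional"):
    return (
        f"You are a technical support assistant. Respond in a {tone} tone, "
        f"in under 100 words, and end with an offer of further help.\n\n"
        f"Customer question: {question}"
    )

prompt = build_prompt("My router keeps rebooting.")
print(prompt)
```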