Practice Questions Flashcards
(57 cards)
A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.
Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)
A. Change embedding models and compare performance.
B. Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
D. Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
E. Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.
The correct answers are C and E.
- C is correct because it advocates for a systematic, metric-driven approach to chunking. By selecting an appropriate evaluation metric (like recall or NDCG) and experimenting with different chunking strategies (paragraphs vs. chapters), the engineer can quantitatively determine the best performing strategy.
- E is correct because using an LLM as a judge provides a direct measure of how well chunks answer questions. This directly assesses the suitability of question and answer pairs, allowing for targeted optimization of chunking parameters.
- A is incorrect because changing embedding models addresses the quality of the vector representations, not the chunking strategy itself. While important, it’s a separate optimization concern.
- B is incorrect because adding a classifier for books focuses on improving retrieval by filtering content, not on optimizing the chunking strategy.
- D is incorrect because relying on an LLM to suggest token counts may be limiting and not as comprehensive of an optimization strategy as actually evaluating different chunk division strategies.
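To make option C concrete, here is a minimal, self-contained sketch of metric-driven chunking evaluation. The corpus, questions, splitters, and bag-of-words scoring are illustrative stand-ins (a real pipeline would use the actual embedding model and vector store); a retrieved chunk counts as a hit if it contains the known gold answer.

```python
# Minimal sketch: compare chunking strategies with a recall-style metric.
# The corpus, questions, splitters, and scoring below are illustrative stand-ins only.
from collections import Counter
import math

def split_by_paragraph(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def split_by_fixed_size(text, size=80):
    return [text[i:i + size] for i in range(0, len(text), size)]

def bow_cosine(a, b):
    """Bag-of-words cosine similarity as a toy stand-in for an embedding model."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def recall_at_k(chunks, eval_set, k=2):
    """Fraction of questions whose known gold answer appears in the top-k retrieved chunks."""
    hits = 0
    for question, gold_answer in eval_set:
        top_k = sorted(chunks, key=lambda c: bow_cosine(question, c), reverse=True)[:k]
        hits += any(gold_answer.lower() in c.lower() for c in top_k)
    return hits / len(eval_set)

corpus = "The dragon Alaric guards the northern pass.\n\nQueen Maretta rules the city of Veldt."
eval_set = [("Who guards the northern pass?", "Alaric"), ("Who rules Veldt?", "Maretta")]

for name, splitter in [("paragraphs", split_by_paragraph), ("fixed-80-chars", split_by_fixed_size)]:
    print(name, recall_at_k(splitter(corpus), eval_set, k=1))
```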
A company has a typical RAG-enabled, customer-facing chatbot on its website.
Select the correct sequence of components a user’s questions will go through before the final output is returned. Use the diagram above for reference.
A. 1. embedding model, 2. vector search, 3. context-augmented prompt, 4. response-generating LLM
B. 1. context-augmented prompt, 2. vector search, 3. embedding model, 4. response-generating LLM
C. 1. response-generating LLM, 2. vector search, 3. context-augmented prompt, 4. embedding model
D. 1. response-generating LLM, 2. context-augmented prompt, 3. vector search, 4. embedding model
A. The correct sequence is embedding model, vector search, context-augmented prompt, and finally response-generating LLM. First, the user’s question needs to be converted into a vector embedding. This embedding is then used to search a vector database for relevant context. The relevant context is combined with the original query to create a context-augmented prompt, which is then fed into the LLM to generate the response.
- Why A is correct: This follows the standard RAG pipeline. The question is embedded, relevant context is retrieved, a prompt is constructed, and a response is generated.
- Why B is incorrect: The embedding model needs to be the first step.
- Why C is incorrect: The LLM needs the context-augmented prompt to generate a relevant response, which is why it comes last in the sequence.
- Why D is incorrect: The LLM needs the context-augmented prompt to generate a relevant response, so it must come last in the sequence, and the embedding model must be the first step.
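As a rough illustration of the order in option A, the sketch below walks a question through the four stages; every function is a hypothetical stub standing in for the real embedding model, vector index, prompt template, and response-generating LLM.

```python
# Schematic RAG request flow (option A). All functions are illustrative stubs,
# not a real embedding model, vector index, or LLM.

def embed(text):                       # 1. embedding model
    return [float(len(w)) for w in text.split()]

def vector_search(query_vector, k=2):  # 2. vector search over a toy "index"
    # A real implementation would rank stored vectors by similarity to query_vector.
    index = {"Chapter 3 covers dragon lore.": [7.0, 1.0, 6.0, 6.0, 5.0]}
    return list(index.keys())[:k]

def build_prompt(question, context_docs):  # 3. context-augmented prompt
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):                  # 4. response-generating LLM (stubbed)
    return f"[LLM response to a {len(prompt)}-character prompt]"

question = "Where is dragon lore covered?"
print(generate(build_prompt(question, vector_search(embed(question)))))
```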
A Generative AI Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output “In Stock” if the product is available or only the term “Out of Stock” if not.
Which prompt will work to allow the engineer to respond to call classification labels correctly?
A.
Respond with “In Stock” if the customer asks for a product.
B.
You will be given a customer call transcript where the customer asks about product availability. The outputs are either “In Stock” or “Out of Stock”. Format the output in JSON, for example: {“call_id”: “123”, “label”: “In Stock”}.
C.
Respond with “Out of Stock” if the customer asks for a product.
D.
You will be given a customer call transcript where the customer inquires about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if not.
D. Option D provides the LLM with the necessary context (customer call transcript about product availability) and clear instructions on how to respond based on product availability, which aligns with the LLM’s design.
- Why D is correct: It directly instructs the LLM to respond with “In Stock” if available and “Out of Stock” if not, covering both possible outcomes.
- Why A is incorrect: It only specifies the response for one scenario (“In Stock”) and doesn’t cover the “Out of Stock” case.
- Why B is incorrect: While it provides a format, it does not specify when to use each output (“In Stock” or “Out of Stock”) and introduces a JSON output format that was not requested.
- Why C is incorrect: It only specifies the response for one scenario (“Out of Stock”) and doesn’t cover the “In Stock” case.
A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error.
Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?
A.
B.
C.
D.
D is the correct answer. The OpenAI() constructor needs to be passed to the LLMChain() constructor as llm=OpenAI(), as sketched below. Options A, B, and C do not pass the OpenAI() constructor to the LLMChain() constructor, leading to an error.
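Since the original code snippet is not reproduced on this card, the following is only a sketch of the working pattern the explanation describes, using the classic LangChain API; import paths vary by LangChain version, and the prompt template here is made up.

```python
# Sketch of the fix described above (classic LangChain API; exact import paths
# depend on the installed LangChain version, and the original snippet is not shown here).
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI  # assumes the langchain-openai package is installed

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write one sentence about {topic}.",
)

# The fix from option D: the OpenAI() LLM must be passed as llm= to LLMChain().
chain = LLMChain(llm=OpenAI(), prompt=prompt)
print(chain.run(topic="vector databases"))
```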
A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles.
Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?
A. DatabricksIQ
B. Foundation Model APIs
C. Feature Serving
D. AutoML
C. Feature Serving. Feature Serving is designed to provide features to machine learning models in real-time. This allows the platform to ingest live sports data (scores, statistics) and feed it into the LLM for generating up-to-the-minute game analyses.
- Why C is correct: Feature serving is specifically built for providing real-time data to machine learning models.
- Why A is wrong: DatabricksIQ is the AI-powered knowledge engine behind assistant and documentation features in the platform; it does not serve real-time feature data to applications.
- Why B is wrong: Foundation Model APIs provide access to pre-trained LLMs but do not handle live data integration.
- Why D is wrong: AutoML is focused on automating machine learning model development, not real-time data delivery.
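As a rough sketch of how the platform might pull real-time features at generation time, the snippet below calls a hypothetical Databricks feature serving endpoint over REST; the workspace URL, endpoint name, lookup key, and environment variable are all assumptions for illustration.

```python
# Hedged sketch: fetch the latest game scores from a hypothetical Databricks
# feature serving endpoint, then hand them to the LLM as prompt context.
import os
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"   # hypothetical
ENDPOINT_NAME = "live-game-scores"                                # hypothetical

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={"dataframe_records": [{"game_id": "2024-finals-g5"}]},  # hypothetical lookup key
    timeout=10,
)
latest_scores = response.json()

prompt = f"Write a live commentary update based on these scores: {latest_scores}"
```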
When developing an LLM application, it’s crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks.
Which action is NOT appropriate to avoid legal risks?
A. Reach out to the data curators directly before you have started using the trained model to let them know.
B. Use any available data you personally created which is completely original and you can decide what license to use.
C. Only use data explicitly labeled with an open license and ensure the license terms are followed.
D. Reach out to the data curators directly after you have started using the trained model to let them know.
D. Reaching out to data curators after you’ve already started using the trained model is not appropriate. Licensing should be verified before using the data to avoid potential legal issues.
- Why D is correct: Using data without confirming its license beforehand puts you at risk of violating licensing terms. Contacting curators after usage doesn’t mitigate the initial violation.
- Why A is incorrect: Contacting data curators before using the model is a proactive and appropriate step to ensure compliance.
- Why B is incorrect: Using your own, original data is acceptable as you control the licensing.
- Why C is incorrect: Using openly licensed data, provided you adhere to the license terms, is a valid and appropriate practice.
A Generative AI Engineer is developing a chatbot designed to assist users with insurance-related queries. The chatbot is built on a large language model (LLM) and is conversational. However, to maintain the chatbot’s focus and to comply with company policy, it must not provide responses to questions about politics. Instead, when presented with political inquiries, the chatbot should respond with a standard message:
“Sorry, I cannot answer that. I am a chatbot that can only answer questions around insurance.”
Which framework type should be implemented to solve this?
A. Safety Guardrail
B. Security Guardrail
C. Contextual Guardrail
D. Compliance Guardrail
The correct answer is A. Safety Guardrail.
- Why A is correct: Safety Guardrails are designed to ensure that a conversational AI system stays within intended boundaries, preventing it from generating unsafe or irrelevant responses, including explicitly disallowed topics like politics.
- Why B is incorrect: Security Guardrails focus on protecting the system from vulnerabilities and unauthorized access, not on content filtering.
- Why C is incorrect: Contextual Guardrails are more about keeping the conversation relevant to the current topic (insurance) but do not necessarily block specific topics entirely.
- Why D is incorrect: While Compliance Guardrails might seem relevant, they are more broadly concerned with adhering to legal and regulatory requirements, rather than specific content restrictions dictated by company policy.
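A minimal sketch of such a guardrail is shown below; the keyword-based is_political() check and the stubbed answer function are illustrative placeholders (a production guardrail would typically use a dedicated topic classifier or an LLM-based check).

```python
# Minimal sketch of a topic guardrail. is_political() and answer_insurance_question()
# are hypothetical placeholders for a real classifier and the chatbot's normal answer path.
REFUSAL = ("Sorry, I cannot answer that. I am a chatbot that can only answer "
           "questions around insurance.")

POLITICAL_TERMS = {"election", "senator", "political party", "vote", "president"}

def is_political(user_message: str) -> bool:
    text = user_message.lower()
    return any(term in text for term in POLITICAL_TERMS)

def answer_insurance_question(user_message: str) -> str:
    return f"[LLM-generated insurance answer to: {user_message}]"

def guarded_chatbot(user_message: str) -> str:
    if is_political(user_message):
        return REFUSAL
    return answer_insurance_question(user_message)

print(guarded_chatbot("Who should I vote for?"))
print(guarded_chatbot("Does my policy cover hail damage?"))
```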
A Generative AI Engineer is responsible for developing a chatbot to enable their company’s internal HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating the GenAI application work breakdown tasks for this project, they realize they need to start planning which data sources (either Unity Catalog volume or Delta table) they could choose for this application. They have collected several candidate data sources for consideration:
- call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives’ call resolution from fields call_duration and call_start_time.
- transcript Volume: a Unity Catalog Volume of all recordings as *.wav files, but also a text transcript as *.txt files.
- call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk to make sure that the chargeback model is consistent with actual service use.
- call_detail: a Delta table that includes a snapshot of all call details updated hourly. It includes root_cause and resolution fields, but those fields may be empty for calls that are still active.
- maintenance_schedule: a Delta table that includes a listing of both HelpDesk application outages as well as planned upcoming maintenance downtimes.
They need sources that could add context to best identify ticket root cause and resolution.
Which TWO sources do that? (Choose two.)
A. call_cust_history
B. maintenance_schedule
C. call_rep_history
D. call_detail
E. transcript Volume
The correct answers are D. call_detail and E. transcript Volume.
- Why D is correct: The call_detail Delta table directly includes root_cause and resolution fields, providing immediate insights into ticket resolution, even if some entries are incomplete.
- Why E is correct: The transcript Volume contains text transcripts of conversations, offering detailed information about the customer’s issue, which is invaluable for determining the root cause.
Why other options are incorrect:
- A. call_cust_history: This table focuses on customer usage of the HelpDesk, which is relevant for chargeback models but not for identifying the root cause and resolution of specific tickets.
- B. maintenance_schedule: This table is useful for understanding outages and downtime, but it doesn’t provide specific context for identifying the root cause and resolution of individual tickets.
- C. call_rep_history: This table focuses on representative performance metrics, which is not directly related to identifying ticket root causes and resolutions.
A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?
A. context length 514; smallest model is 0.44GB and embedding dimension 768
B. context length 2048: smallest model is 11GB and embedding dimension 2560
C. context length 32768: smallest model is 14GB and embedding dimension 4096
D. context length 512: smallest model is 0.13GB and embedding dimension 384
The correct answer is D.
- Why D is correct: Because cost and latency are more important than quality, the smallest model with a context length that accommodates the 512 token chunks is the best choice. Option D (context length 512, 0.13GB model) fulfills this requirement with the least resources.
- Why other options are wrong:
- A: While this model is relatively small, it is still larger (0.44GB vs. 0.13GB) and has a higher embedding dimension (768 vs. 384) than option D, adding cost and latency without a quality gain that matters for this application.
- B & C: These options offer much larger context lengths and significantly larger model sizes. This leads to increased cost and latency, violating the stated priorities.
A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport. What are the steps needed to build this RAG application and deploy it?
A. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> Evaluate model –> LLM generates a response –> Deploy it using Model Serving
B. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving
C. Ingest documents from a source –> Index the documents and save to Vector Search –> Evaluate model –> Deploy it using Model Serving
D. User submits queries against an LLM –> Ingest documents from a source –> Index the documents and save to Vector Search –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving
B. The correct sequence of steps is: Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving
Explanation:
- Why B is correct: This option outlines the logical flow of building and deploying a RAG application. First, documents are ingested and indexed to create a searchable vector database. Then, a user query initiates the retrieval process, the LLM generates a response, the model is evaluated, and finally, the application is deployed.
- Why A is incorrect: Option A places “Evaluate model” before “LLM generates a response”, which is out of order. The application can only be evaluated on the responses it generates, so evaluation must come after response generation.
- Why C is incorrect: Option C misses several crucial steps in the RAG pipeline, specifically the user query, document retrieval, and LLM response generation. It jumps directly from indexing to evaluation and deployment, which is not a complete RAG implementation.
- Why D is incorrect: Option D starts with the user submitting a query before the documents are ingested and indexed. This is impossible, as the LLM needs a knowledge base to retrieve relevant documents from. Document ingestion and indexing must occur before any queries can be processed.
A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries.
Which metric should they monitor for their customer service LLM application in production?
A. Number of customer inquiries processed per unit of time
B. Energy usage per query
C. Final perplexity scores for the training of the model
D. HuggingFace Leaderboard values for the base LLM
The correct answer is A. Number of customer inquiries processed per unit of time.
- Why A is correct: This metric directly reflects the application’s performance in a production environment for customer service. It measures the efficiency and throughput of the LLM application in handling customer inquiries.
- Why other options are wrong:
- B. Energy usage per query: While energy efficiency is important, it’s not the primary metric for evaluating the customer service application’s performance in production.
- C. Final perplexity scores for the training of the model: Perplexity is a training metric and not relevant once the model is deployed.
- D. HuggingFace Leaderboard values for the base LLM: HuggingFace Leaderboard values are used during the development phase, not during production monitoring.
A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text.
How should the Generative AI Engineer architect their system?
A. Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.
B. Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.
C. Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and the project scope. Iterate through the team members and rank by best score to select a team member.
D. Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the available best matched team members.
**
**
D is the correct answer.
- Why D is correct: Option D is the most scalable and efficient approach. By embedding team profiles into a vector store, the system can quickly retrieve the best-matched team members for a given project scope using vector similarity search. This approach is particularly well-suited for very large teams because it avoids iterating through all team member profiles.
- Why A is incorrect: Embedding project scopes and retrieving with team member profiles inverts the lookup: matching a new project would require querying with every team member’s profile, which scales poorly for a very large team and does not combine availability filtering with the match.
- Why B is incorrect: Keyword matching is less effective than embedding similarity and not as useful for unstructured text. Additionally, iterating through team members does not scale to “very large teams”.
- Why C is incorrect: Calculating similarity scores by iterating through all team members is inefficient for a large team. Furthermore, the “similarity score” is vague and does not take advantage of an established and effective method like vector embeddings and similarity search.
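A minimal sketch of the architecture in option D is shown below, using an in-memory stand-in for the vector store and hypothetical profiles and availability data; a real system would use a managed vector index and a proper embedding model.

```python
# Sketch of option D: filter by availability, then rank by embedding similarity.
# embed() is a toy stand-in for a real embedding model; profiles and availability are hypothetical.
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' for illustration only."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

team_profiles = {
    "alice": "senior data engineer spark delta pipelines",
    "bob": "frontend developer react typescript design systems",
}
available = {"alice", "bob"}  # pretend output of the date-availability tool

project_scope = "build streaming data pipelines on spark"
scope_vec = embed(project_scope)

ranked = sorted(
    ((cosine(scope_vec, embed(profile)), name)
     for name, profile in team_profiles.items() if name in available),
    reverse=True,
)
print(ranked[0][1])  # best available match for the project scope
```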
A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint’s incoming requests and outgoing responses. The current approach is to include a micro-service in between the endpoint and the user interface to write logs to a remote server.
Which Databricks feature should they use instead which will perform the same task?
A. Vector Search
B. Lakeview
C. DBSQL
D. Inference Tables
D. Inference Tables
- Why D is correct: Inference Tables automatically capture a serving endpoint’s incoming requests and outgoing responses and log them to a Delta table, which is exactly the monitoring task the micro-service currently performs (see the sketch below).
- Why other options are incorrect:
- A. Vector Search is used for similarity search of embeddings, not for logging requests and responses.
- B. Lakeview is for creating data-driven dashboards for data analysis and not for this task.
- C. DBSQL is used for querying data, not for logging requests and responses from a serving endpoint.
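For illustration, a hedged sketch of monitoring from a Databricks notebook once inference tables are enabled; the table name is hypothetical, and the column names (timestamp_ms, request, response) assume the standard inference table schema.

```python
# Hedged sketch (Databricks notebook): inspect and aggregate an inference table.
# Table name is hypothetical; columns assume the standard inference table schema.
from pyspark.sql import functions as F

logs = spark.table("ml.rag_app.chat_endpoint_payload")  # hypothetical inference table

# Requests handled per hour (throughput of the endpoint).
hourly = (
    logs
    .withColumn("ts", F.from_unixtime(F.col("timestamp_ms") / 1000).cast("timestamp"))
    .groupBy(F.date_trunc("hour", "ts").alias("hour"))
    .count()
    .orderBy("hour")
)
display(hourly)

# Inspect raw request/response payloads for a sample of calls.
display(logs.select("timestamp_ms", "request", "response").limit(10))
```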
A Generative AI Engineer is building a system which will answer questions on the latest stock news articles. Which will NOT help with ensuring the outputs are relevant to financial news?
A. Implement a comprehensive guardrail framework that includes policies for content filters tailored to the finance sector.
B. Increase the compute to improve processing speed of questions to allow greater relevancy analysis
C. Implement a profanity filter to screen out offensive language.
D. Incorporate manual reviews to correct any problematic outputs prior to sending to the users
B. Increasing compute power primarily improves processing speed but does not inherently improve the relevancy of the answers to financial news. Relevancy is determined by the data sources, retrieval methods, and filtering mechanisms, not processing speed. Options A, C, and D directly contribute to ensuring relevance: A uses tailored content filters, C filters offensive language to keep the responses professional, and D uses manual reviews to correct any irrelevant outputs.
A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least cost and least development effort and have it operate at the lowest cost possible.
Which combination of chaining components and configuration meets these requirements?
A. For the application a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt which is given to the LLM to generate answers.
B. The LLM needs to be frequently retrained with the new documents in order to provide the most up-to-date answers.
C. For the question-answering application, prompt engineering and an LLM are required to generate answers.
D. For the application a prompt, an agent and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt which is given to the LLM to generate answers.
A. This option is the most suitable because it uses a retriever to fetch information from new documents and insert it into the prompt for the LLM. This approach effectively provides up-to-date information, reflecting frequently updated documentation, while minimizing cost and development effort.
- Why A is correct: It provides a cost-effective and efficient way to incorporate new documents into the question-answering application by using a retriever to find relevant information and inserting it into the prompt.
- Why B is wrong: Frequently retraining the LLM on new documents is costly and slow, which conflicts with the least-cost requirement, and the option does not describe the application architecture.
- Why C is wrong: It mentions prompt engineering and LLM but does not explain how to handle updates for new documents.
- Why D is wrong: It describes an agent-based approach with a fine-tuned LLM, which adds cost and development effort beyond the simple retriever-plus-prompt chain in A without a clear benefit for this use case.
A Generative AI Engineer is using the code below to test setting up a vector store:
Assuming they intend to use Databricks managed embeddings with the default embedding model, what should be the next logical function call?
A. vsc.get_index()
B. vsc.create_delta_sync_index()
C. vsc.create_direct_access_index()
D. vsc.similarity_search()
The correct answer is C. vsc.create_direct_access_index().
Explanation: create_direct_access_index() is the appropriate next step when testing the setup of a vector store, especially when using Databricks managed embeddings with the default embedding model without a pre-existing Delta table. This method allows for manually adding documents and embeddings, which is ideal for initial testing and minimal setup.
Why other options are wrong:
- A. vsc.get_index(): This function is used to retrieve an existing index, not create a new one. Since the engineer is setting up the vector store, an index likely doesn’t exist yet.
- B. vsc.create_delta_sync_index(): This option is suitable for production workflows where a Delta table is already in use and the index needs to automatically synchronize with it. It’s not appropriate for a minimal test setup.
- D. vsc.similarity_search(): This function is used to perform a similarity search on an existing index. An index must be created and populated first.
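For reference, a hedged sketch of the call shape described above using the databricks-vectorsearch client; the endpoint name, index name, schema, and embedding dimension are placeholder values, and documents with precomputed embedding vectors are then added manually.

```python
# Hedged sketch of creating a direct access index and manually upserting documents.
# All names and values are placeholders; check the databricks-vectorsearch docs for your version.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

index = vsc.create_direct_access_index(
    endpoint_name="vector_search_demo_endpoint",       # placeholder endpoint
    index_name="main.default.test_direct_index",       # placeholder UC index name
    primary_key="id",
    embedding_dimension=1024,                           # depends on the embedding model used
    embedding_vector_column="text_vector",
    schema={"id": "int", "text": "string", "text_vector": "array<float>"},
)

# Direct access indexes take manually supplied embeddings:
index.upsert([
    {"id": 1, "text": "hello world", "text_vector": [0.1] * 1024},
])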
A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform.
Which input/output pair will support their goal?
A. Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user’s interactions
B. Input: Online chat logs; Output: Buttons that represent choices for booking details
C. Input: Customer reviews; Output: Classify review sentiment
D. Input: Online chat logs; Output: Cancellation options
B. Input: Online chat logs; Output: Buttons that represent choices for booking details
This is the best answer because it allows for the automatic handling of customer inquiries in a structured way, providing immediate responses to questions about reservations.
* Why A is wrong: Summarizing user interactions, while potentially useful, doesn’t directly address the task of handling customer inquiries and minimizing human intervention during the booking process.
* Why C is wrong: Classifying review sentiment is more focused on understanding customer feedback than directly assisting with booking inquiries.
* Why D is wrong: Providing cancellation options is a limited functionality and doesn’t address the broader goal of handling various customer inquiries related to bookings.
What is an effective method to preprocess prompts using custom code before sending them to an LLM?
A. Directly modify the LLM’s internal architecture to include preprocessing steps
B. It is better not to introduce custom code to preprocess prompts as the LLM has not been trained with examples of the preprocessed prompts
C. Rather than preprocessing prompts, it’s more effective to postprocess the LLM outputs to align the outputs to desired outcomes
D. Write a MLflow PyFunc model that has a separate function to process the prompts
D. Writing an MLflow PyFunc model with a separate function to process prompts allows for systematic and flexible preprocessing, potentially improving LLM performance through optimized prompts.
- Why D is correct: This approach enables organized and adaptable prompt manipulation before they are fed into the LLM.
- Why A is wrong: Modifying the LLM’s internal architecture is generally not feasible or practical.
- Why B is wrong: Preprocessing prompts with custom code can be beneficial for optimizing LLM performance.
- Why C is wrong: While post-processing is valuable, pre-processing can proactively shape the input for better initial results.
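A minimal sketch of option D is shown below, assuming mlflow and pandas are installed; the preprocessing rule and the stubbed LLM call are illustrative placeholders for whatever custom logic and downstream model the application actually uses.

```python
# Sketch of option D: an MLflow PyFunc model with a dedicated prompt-preprocessing step.
# The preprocessing rule and the stubbed LLM call are illustrative only.
import mlflow
import pandas as pd

class PromptPreprocessingModel(mlflow.pyfunc.PythonModel):
    def _preprocess(self, prompt: str) -> str:
        # Custom preprocessing: trim whitespace and prepend a fixed instruction.
        return "Answer concisely.\n" + prompt.strip()

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        prompts = model_input["prompt"].map(self._preprocess)
        # Placeholder for the real LLM call on each preprocessed prompt.
        return prompts.map(lambda p: f"[LLM output for: {p!r}]")

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="prompt_model",
        python_model=PromptPreprocessingModel(),
    )
```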
A Generative AI Engineer is developing an LLM application that users can use to generate personalized birthday poems based on their names.
Which technique would be most effective in safeguarding the application, given the potential for malicious user inputs?
A. Implement a safety filter that detects any harmful inputs and ask the LLM to respond that it is unable to assist
B. Reduce the time that the users can interact with the LLM
C. Ask the LLM to remind the user that the input is malicious but continue the conversation with the user
D. Increase the amount of compute that powers the LLM to process input faster
A. Implementing a safety filter is the most effective technique. It directly addresses the potential for malicious input by detecting and blocking harmful content, thus safeguarding the application.
- Why A is correct: Safety filters prevent the LLM from processing and potentially generating harmful or inappropriate responses based on malicious input.
- Why B is incorrect: Reducing interaction time doesn’t prevent malicious input, it only limits the duration of potential harm.
- Why C is incorrect: Reminding the user that the input is malicious while continuing the conversation does not prevent the generation of harmful outputs and could potentially encourage further malicious behavior.
- Why D is incorrect: Increasing compute power doesn’t address the safety concerns related to malicious input. It only affects processing speed.
Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?
A. The ability to generate responses in code
B. The similarity to the previous language
C. The latency of the response and the length of text generated
D. The accuracy and relevance of the responses
D. Accuracy and relevance are key to ensuring the LLM output is safe and appropriate in a translation use case.
- Why D is correct: Accuracy and relevance directly relate to whether the translated output conveys the intended meaning without introducing harmful or misleading information.
- Why the other options are incorrect:
- A: Code generation is irrelevant to translation safety.
- B: Similarity to the original language doesn’t guarantee safety; a harmful statement could be translated faithfully.
- C: Latency and text length are performance metrics, not safety indicators.
A Generative AI Engineer is developing a patient-facing healthcare-focused chatbot. If the patient’s question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor’s office and suggest a few relevant pre-approved medical articles for reading. If the patient’s question is urgent, direct the patient to calling their local emergency services.
Given the following user input:
“I have been experiencing severe headaches and dizziness for the past two days.”
Which response is most appropriate for the chatbot to generate?
A. Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.
B. Please call your local emergency services.
C. Headaches can be tough. Hope you feel better soon!
D. Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.
B. Please call your local emergency services.
- Why this is right: Severe headaches and dizziness persisting for two days can indicate a serious medical condition requiring immediate attention. The chatbot should prioritize patient safety and direct the user to emergency services.
- Why other options are wrong:
- A: Providing articles is inappropriate for potentially urgent symptoms.
- C: This is a dismissive and unhelpful response.
- D: Delaying immediate help by asking for more information is dangerous when the symptoms suggest a possible emergency.
After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:
Image Text: ValueError: This model's maximum context length is 2048 tokens. However, you requested 2049 tokens (1025 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.
What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)
A. Use a smaller embedding model to generate embeddings
B. Reduce the maximum output tokens of the new model
C. Decrease the chunk size of embedded documents
D. Reduce the number of records retrieved from the vector database
E. Retrain the response generating model using ALiBi
The correct answers are C and D.
C is correct because decreasing the chunk size of embedded documents directly reduces the number of tokens included in the prompt, addressing the context length issue.
D is correct because reducing the number of records retrieved from the vector database limits the amount of information passed to the LLM, thereby reducing the total number of tokens in the prompt.
A is incorrect because using a smaller embedding model affects the quality of embeddings, but it doesn’t directly address the token limit issue in the prompt.
B is incorrect because reducing the maximum output tokens limits the model’s ability to generate complete responses, which is undesirable.
E is incorrect because retraining the response generating model using ALiBi is complex and unnecessary. The issue can be resolved by reducing the input tokens without retraining.
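A small back-of-the-envelope token budget shows why C and D resolve the error; the model limit and completion length come from the error message, while the query/template overhead and chunk configurations are illustrative.

```python
# Back-of-the-envelope token budget for the 2048-token model in the error message.
MAX_CONTEXT = 2048          # model limit from the error
COMPLETION_TOKENS = 1024    # requested completion length (from the error)
PROMPT_BUDGET = MAX_CONTEXT - COMPLETION_TOKENS   # 1024 tokens left for the prompt

QUERY_AND_TEMPLATE_TOKENS = 125   # illustrative overhead for the question + instructions

def prompt_tokens(chunk_size: int, num_chunks: int) -> int:
    return QUERY_AND_TEMPLATE_TOKENS + chunk_size * num_chunks

# Current configuration overflows the budget...
print(prompt_tokens(chunk_size=512, num_chunks=2), ">", PROMPT_BUDGET)
# ...but smaller chunks (C) or fewer retrieved records (D) bring it back under.
print(prompt_tokens(chunk_size=256, num_chunks=2), "<=", PROMPT_BUDGET)
print(prompt_tokens(chunk_size=512, num_chunks=1), "<=", PROMPT_BUDGET)
```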
A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies. Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?
A. Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.
B. Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.
C. Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.
D. Consolidate all SnoPen AI related documents into a single chunk in the vector database.
The correct answer is C.
- Why C is correct: By specifying in the system prompt that the application should not answer questions unrelated to SnoPen AI, you directly instruct the model to filter out irrelevant information. This allows the application to focus solely on questions relevant to the company, improving accuracy and efficiency.
- Why other options are wrong:
- A: Keeping all articles, including irrelevant content, would dilute the context and potentially lead the model to answer questions outside the scope of SnoPen AI, defeating the purpose of filtering.
- B: Including in the prompt that all information is about SnoPen AI, without any data filtering, would be misleading to the model, especially if the ingested documents contain information about other topics. The model would likely generate inaccurate responses.
- D: Consolidating all SnoPen AI related documents into a single chunk might seem beneficial, but it could create an overly large chunk that exceeds the model’s context window, leading to truncation or incomplete information retrieval. Also, it doesn’t inherently filter out irrelevant information within those documents.
A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document sections. They would like to store the chunks in a Vector Search index. The current format of the dataframe has two columns: (i) original document file name (ii) an array of text chunks for each document.
What is the most performant way to store this dataframe?
A. Split the data into train and test set, create a unique identifier for each document, then save to a Delta table
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table
C. First create a unique identifier for each document, then save to a Delta table
D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the key is the document section name and the value is the array of text chunks for that section
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table
This is the most performant because flattening the dataframe ensures each chunk is a distinct row, optimized for vector search indexing. The unique identifier enables efficient retrieval. Options A and C do not address the need to index individual chunks. Option D is less performant due to the overhead of managing numerous JSON files.
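A hedged PySpark sketch of option B is shown below; the column and table names are hypothetical, and it assumes a Databricks notebook where spark is available.

```python
# Hedged sketch of option B: one chunk per row, a unique id per row, saved to Delta.
# Column and table names are hypothetical.
from pyspark.sql import functions as F

chunked_df = spark.table("ml.rag_app.chunked_documents")  # columns: file_name, chunks (array<string>)

flattened = (
    chunked_df
    .select("file_name", F.posexplode("chunks").alias("chunk_position", "chunk_text"))
    .withColumn(
        "chunk_id",
        F.concat_ws("::", "file_name", F.col("chunk_position").cast("string")),
    )
)

(flattened
 .write
 .format("delta")
 .mode("overwrite")
 .saveAsTable("ml.rag_app.document_chunks"))  # ready to index with Vector Search
```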