Practice Questions Flashcards

(57 cards)

1
Q

ExamTopics URL

A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.

Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

A. Change embedding models and compare performance.
B. Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
D. Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
E. Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.

A

CE

  • C is correct because it advocates for a systematic, metric-driven approach to chunking. By selecting an appropriate evaluation metric (like recall or NDCG) and experimenting with different chunking strategies (paragraphs vs. chapters), the engineer can quantitatively determine the best performing strategy.
  • E is correct because using an LLM as a judge provides a direct measure of how well chunks answer questions. This directly assesses the suitability of question and answer pairs, allowing for targeted optimization of chunking parameters.
  • A is incorrect because changing embedding models addresses the quality of the vector representations, not the chunking strategy itself. While important, it’s a separate optimization concern.
  • B is incorrect because adding a classifier for books focuses on improving retrieval by filtering content, not on optimizing the chunking strategy.
  • D is incorrect because relying on an LLM to suggest token counts may be limiting and not as comprehensive of an optimization strategy as actually evaluating different chunk division strategies.
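The metric-driven approach in answer C can be sketched as follows. This is a toy illustration, not the exam's pipeline: the evaluation set, chunk ids, and chapter mapping are all made up, and a real experiment would run an actual retriever over each chunking of the corpus.

```python
# Toy sketch: compare two chunking strategies by average recall@k.
# All data below is illustrative; "by_paragraph"/"by_chapter" hold the
# chunk ids a retriever returned (best-first) under each strategy.

def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant chunks that appear in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

eval_set = [
    {"relevant": ["p3"], "by_paragraph": ["p3", "p7", "p1"], "by_chapter": ["c2", "c1", "c4"]},
    {"relevant": ["p7"], "by_paragraph": ["p2", "p5", "p7"], "by_chapter": ["c3", "c2", "c1"]},
]
# Chapter-level relevance: suppose paragraph p3 lives in chapter c2, p7 in c3.
chapter_of = {"p3": "c2", "p7": "c3"}

for strategy in ("by_paragraph", "by_chapter"):
    scores = []
    for ex in eval_set:
        relevant = ex["relevant"]
        if strategy == "by_chapter":
            relevant = [chapter_of[r] for r in relevant]
        scores.append(recall_at_k(ex[strategy], relevant, k=2))
    print(strategy, sum(scores) / len(scores))
```

The strategy with the higher average metric wins; swapping in NDCG or another rank-aware metric only changes the scoring function.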
2
Q

https://www.examtopics.com/discussions/databricks/view/148716-exam-certified-generative-ai-engineer-associate-topic-1/

A company has a typical RAG-enabled, customer-facing chatbot on its website.

Select the correct sequence of components a user’s question will go through before the final output is returned. Use the diagram above for reference.

A. 1. embedding model, 2. vector search, 3. context-augmented prompt, 4. response-generating LLM
B. 1. context-augmented prompt, 2. vector search, 3. embedding model, 4. response-generating LLM
C. 1. response-generating LLM, 2. vector search, 3. context-augmented prompt, 4. embedding model
D. 1. response-generating LLM, 2. context-augmented prompt, 3. vector search, 4. embedding model

A

A. The correct sequence is embedding model, vector search, context-augmented prompt, and finally response-generating LLM. First, the user’s question needs to be converted into a vector embedding. This embedding is then used to search a vector database for relevant context. The relevant context is combined with the original query to create a context-augmented prompt, which is then fed into the LLM to generate the response.

  • Why A is correct: This follows the standard RAG pipeline. The question is embedded, relevant context is retrieved, a prompt is constructed, and a response is generated.
  • Why B is incorrect: The embedding model needs to be the first step.
  • Why C is incorrect: The LLM needs the context-augmented prompt to generate a relevant response, which is why it comes last in the sequence.
  • Why D is incorrect: The LLM needs the context-augmented prompt to generate a relevant response, which is why it comes last in the sequence and embedding model needs to be the first step.
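The four-stage order in answer A can be made concrete with stubbed components. Every function below is a hypothetical stand-in (a real system would call an embedding model, a vector database, and an LLM); only the ordering is the point.

```python
# Minimal sketch of the RAG request path in answer A, with each stage stubbed.

def embed(text):                      # 1. embedding model
    return [float(len(w)) for w in text.split()]

def vector_search(query_vec, store):  # 2. vector search over stored chunks
    # toy scoring: prefer the chunk whose vector length matches the query's
    return min(store, key=lambda c: abs(len(c["vec"]) - len(query_vec)))["text"]

def build_prompt(question, context):  # 3. context-augmented prompt
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

def generate(prompt):                 # 4. response-generating LLM (stubbed)
    return "stub answer for: " + prompt.splitlines()[1]

store = [{"vec": [1.0, 2.0], "text": "Returns are accepted within 30 days."}]
question = "What is the return policy?"
answer = generate(build_prompt(question, vector_search(embed(question), store)))
```

Reordering any two stages breaks the chain: the search needs the embedding, the prompt needs the retrieved context, and the LLM needs the prompt.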
3
Q

A Generative AI Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output “In Stock” if the product is available or only the term “Out of Stock” if not.
Which prompt will work to allow the engineer to respond to call classification labels correctly?
A.
Respond with “In Stock” if the customer asks for a product.
B.
You will be given a customer call transcript where the customer asks about product availability. The outputs are either “In Stock” or “Out of Stock”. Format the output in JSON, for example: {"call_id": "123", "label": "In Stock"}.
C.
Respond with “Out of Stock” if the customer asks for a product.
D.
You will be given a customer call transcript where the customer inquires about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if not.

https://www.examtopics.com/discussions/databricks/view/148717-exam-certified-generative-ai-engineer-associate-topic-1/

A

D. Option D provides the LLM with the necessary context (customer call transcript about product availability) and clear instructions on how to respond based on product availability, which aligns with the LLM’s design.

  • Why D is correct: It directly instructs the LLM to respond with “In Stock” if available and “Out of Stock” if not, covering both possible outcomes.
  • Why A is incorrect: It only specifies the response for one scenario (“In Stock”) and doesn’t cover the “Out of Stock” case.
  • Why B is incorrect: While it provides a format, it does not specify when to use each output (“In Stock” or “Out of Stock”) and introduces an unrequested JSON format.
  • Why C is incorrect: It only specifies the response for one scenario (“Out of Stock”) and doesn’t cover the “In Stock” case.
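In practice, prompt D would be rendered as a template around each transcript. A minimal sketch (the variable names and the sample transcript are illustrative):

```python
# Sketch: rendering prompt D as a template around each call transcript.
PROMPT_D = (
    "You will be given a customer call transcript where the customer "
    "inquires about product availability. Respond with \"In Stock\" if "
    "the product is available or \"Out of Stock\" if not.\n\n"
    "Transcript:\n{transcript}"
)

def build_prompt(transcript: str) -> str:
    return PROMPT_D.format(transcript=transcript)

prompt = build_prompt("Caller: Do you have the X200 in stock?")
```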
4
Q

https://www.examtopics.com/discussions/databricks/view/148718-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error.

Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?

A.

B.

C.

D.

A

D is the correct answer. An OpenAI() instance must be passed to the LLMChain() constructor as llm=OpenAI(). Options A, B, and C do not pass the OpenAI() instance to the LLMChain() constructor, leading to an error.
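The code and answer options for this card were images that did not survive the export, but the fix described above can be shown with stand-in classes. These are local stubs, not the real langchain package, so the snippet runs without an API key; only the `llm=OpenAI()` construction pattern is the point.

```python
# Stand-in classes mirroring the shape of the LangChain fix:
# LLMChain must receive the model instance via llm=OpenAI().

class OpenAI:
    def __call__(self, prompt):
        return "stub completion"

class PromptTemplate:
    def __init__(self, template, input_variables):
        self.template, self.input_variables = template, input_variables
    def format(self, **kwargs):
        return self.template.format(**kwargs)

class LLMChain:
    def __init__(self, llm, prompt):   # omitting llm=... is the bug in the other options
        self.llm, self.prompt = llm, prompt
    def run(self, **kwargs):
        return self.llm(self.prompt.format(**kwargs))

prompt = PromptTemplate(template="Tell me about {topic}.", input_variables=["topic"])
chain = LLMChain(llm=OpenAI(), prompt=prompt)   # the corrected construction
result = chain.run(topic="vector databases")
```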

5
Q

A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles.

Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?

A. DatabricksIQ
B. Foundation Model APIs
C. Feature Serving
D. AutoML

https://www.examtopics.com/discussions/databricks/view/148834-exam-certified-generative-ai-engineer-associate-topic-1/

A

C. Feature Serving. Feature Serving is designed to provide features to machine learning models in real-time. This allows the platform to ingest live sports data (scores, statistics) and feed it into the LLM for generating up-to-the-minute game analyses.

  • Why C is correct: Feature serving is specifically built for providing real-time data to machine learning models.
  • Why A is wrong: DatabricksIQ is for model development and optimization, not real-time data provisioning.
  • Why B is wrong: Foundation Model APIs provide access to pre-trained LLMs but do not handle live data integration.
  • Why D is wrong: AutoML is focused on automating machine learning model development, not real-time data delivery.
6
Q

https://www.examtopics.com/discussions/databricks/view/148866-exam-certified-generative-ai-engineer-associate-topic-1/

When developing an LLM application, it’s crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks.
Which action is NOT appropriate to avoid legal risks?

A. Reach out to the data curators directly before you have started using the trained model to let them know.
B. Use any available data you personally created which is completely original and you can decide what license to use.
C. Only use data explicitly labeled with an open license and ensure the license terms are followed.
D. Reach out to the data curators directly after you have started using the trained model to let them know.

A

D. Reaching out to data curators after you’ve already started using the trained model is not appropriate. Licensing should be verified before using the data to avoid potential legal issues.

  • Why D is correct: Using data without confirming its license beforehand puts you at risk of violating licensing terms. Contacting curators after usage doesn’t mitigate the initial violation.
  • Why A is incorrect: Contacting data curators before using the model is a proactive and appropriate step to ensure compliance.
  • Why B is incorrect: Using your own, original data is acceptable as you control the licensing.
  • Why C is incorrect: Using openly licensed data, provided you adhere to the license terms, is a valid and appropriate practice.
7
Q

A Generative AI Engineer is developing a chatbot designed to assist users with insurance-related queries. The chatbot is built on a large language model (LLM) and is conversational. However, to maintain the chatbot’s focus and to comply with company policy, it must not provide responses to questions about politics. Instead, when presented with political inquiries, the chatbot should respond with a standard message:
“Sorry, I cannot answer that. I am a chatbot that can only answer questions around insurance.”
Which framework type should be implemented to solve this?
A. Safety Guardrail
B. Security Guardrail
C. Contextual Guardrail
D. Compliance Guardrail

https://www.examtopics.com/discussions/databricks/view/148874-exam-certified-generative-ai-engineer-associate-topic-1/

A

The correct answer is A. Safety Guardrail.

  • Why A is correct: Safety Guardrails are designed to ensure that a conversational AI system stays within intended boundaries, preventing it from generating unsafe or irrelevant responses, including explicitly disallowed topics like politics.
  • Why B is incorrect: Security Guardrails focus on protecting the system from vulnerabilities and unauthorized access, not on content filtering.
  • Why C is incorrect: Contextual Guardrails are more about keeping the conversation relevant to the current topic (insurance) but don’t necessarily block specific topics entirely.
  • Why D is incorrect: While Compliance Guardrails might seem relevant, they are more broadly concerned with adhering to legal and regulatory requirements, rather than specific content restrictions dictated by company policy.
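A minimal version of such a guardrail can be sketched as a pre-check that short-circuits political questions with the standard refusal. The keyword list here is illustrative; a production guardrail would typically use a topic classifier or moderation model instead of substring matching.

```python
# Sketch of a safety guardrail that intercepts political questions
# before they reach the LLM. Keyword matching is a toy stand-in.

REFUSAL = ("Sorry, I cannot answer that. I am a chatbot that can only "
           "answer questions around insurance.")
POLITICAL_TERMS = {"election", "senator", "political party", "vote for"}

def guarded_reply(user_message: str, llm=lambda m: "LLM answer") -> str:
    text = user_message.lower()
    if any(term in text for term in POLITICAL_TERMS):
        return REFUSAL
    return llm(user_message)
```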
8
Q

A Generative AI Engineer is responsible for developing a chatbot to enable their company’s internal HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating the GenAI application work breakdown tasks for this project, they realize they need to start planning which data sources (either Unity Catalog volume or Delta table) they could choose for this application. They have collected several candidate data sources for consideration:

  • call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives’ call resolution from the fields call_duration and call_start_time.
  • transcript Volume: a Unity Catalog Volume of all call recordings as *.wav files, along with text transcripts as *.txt files.
  • call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk to make sure that the chargeback model is consistent with actual service use.
  • call_detail: a Delta table that includes a snapshot of all call details updated hourly. It includes root_cause and resolution fields, but those fields may be empty for calls that are still active.
  • maintenance_schedule – a Delta table that includes a listing of both HelpDesk application outages as well as planned upcoming maintenance downtimes.

They need sources that could add context to best identify ticket root cause and resolution.

Which TWO sources do that? (Choose two.)

A. call_cust_history
B. maintenance_schedule
C. call_rep_history
D. call_detail
E. transcript Volume

https://www.examtopics.com/discussions/databricks/view/149588-exam-certified-generative-ai-engineer-associate-topic-1/

A

The correct answers are D. call_detail and E. transcript Volume.

  • Why D is correct: The call_detail Delta table directly includes root_cause and resolution fields, providing immediate insights into ticket resolution, even if some entries are incomplete.
  • Why E is correct: The transcript Volume contains text transcripts of conversations, offering detailed information about the customer’s issue, which is invaluable for determining the root cause.

Why other options are incorrect:

  • A. call_cust_history: This table focuses on customer usage of the HelpDesk, which is relevant for chargeback models but not for identifying the root cause and resolution of specific tickets.
  • B. maintenance_schedule: This table is useful for understanding outages and downtime, but it doesn’t provide specific context for identifying the root cause and resolution of individual tickets.
  • C. call_rep_history: This table focuses on representative performance metrics, which is not directly related to identifying ticket root causes and resolutions.
9
Q

https://www.examtopics.com/discussions/databricks/view/149795-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?

A. context length 514; smallest model is 0.44GB and embedding dimension 768
B. context length 2048: smallest model is 11GB and embedding dimension 2560
C. context length 32768: smallest model is 14GB and embedding dimension 4096
D. context length 512: smallest model is 0.13GB and embedding dimension 384

A

The correct answer is D.

  • Why D is correct: Because cost and latency are more important than quality, the smallest model with a context length that accommodates the 512 token chunks is the best choice. Option D (context length 512, 0.13GB model) fulfills this requirement with the least resources.
  • Why other options are wrong:
    • A: While the model size is relatively small, it is still over three times larger than option D (0.44GB vs. 0.13GB) with a larger embedding dimension, adding cost and latency without a needed quality gain; a context length of 512 already fits the chunks.
    • B & C: These options offer much larger context lengths and significantly larger model sizes. This leads to increased cost and latency, violating the stated priorities.
10
Q

https://www.examtopics.com/discussions/databricks/view/149988-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport. What are the steps needed to build this RAG application and deploy it?

A. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> Evaluate model –> LLM generates a response –> Deploy it using Model Serving

B. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving

C. Ingest documents from a source –> Index the documents and save to Vector Search –> Evaluate model –> Deploy it using Model Serving

D. User submits queries against an LLM –> Ingest documents from a source –> Index the documents and save to Vector Search –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving

A

B. The correct sequence of steps is: Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving

Explanation:

  • Why B is correct: This option outlines the logical flow of building and deploying a RAG application. First, documents are ingested and indexed to create a searchable vector database. Then, a user query initiates the retrieval process, the LLM generates a response, the model is evaluated, and finally, the application is deployed.
  • Why A is incorrect: Option A places “Evaluate model” before “LLM generates a response”. The model can only be evaluated after it has produced responses, so the evaluation step must come after generation.
  • Why C is incorrect: Option C misses several crucial steps in the RAG pipeline, specifically the user query, document retrieval, and LLM response generation. It jumps directly from indexing to evaluation and deployment, which is not a complete RAG implementation.
  • Why D is incorrect: Option D starts with the user submitting a query before the documents are ingested and indexed. This is impossible, as the LLM needs a knowledge base to retrieve relevant documents from. Document ingestion and indexing must occur before any queries can be processed.
11
Q

https://www.examtopics.com/discussions/databricks/view/150014-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries.

Which metric should they monitor for their customer service LLM application in production?

A. Number of customer inquiries processed per unit of time
B. Energy usage per query
C. Final perplexity scores for the training of the model
D. HuggingFace Leaderboard values for the base LLM

A

The correct answer is A. Number of customer inquiries processed per unit of time.

  • Why A is correct: This metric directly reflects the application’s performance in a production environment for customer service. It measures the efficiency and throughput of the LLM application in handling customer inquiries.
  • Why other options are wrong:
    • B. Energy usage per query: While energy efficiency is important, it’s not the primary metric for evaluating the customer service application’s performance in production.
    • C. Final perplexity scores for the training of the model: Perplexity is a training metric and not relevant once the model is deployed.
    • D. HuggingFace Leaderboard values for the base LLM: HuggingFace Leaderboard values are used during the development phase, not during production monitoring.
12
Q

A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text.

How should the Generative AI Engineer architect their system?

A. Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.
B. Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.
C. Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and the project scope. Iterate through the team members and rank by best score to select a team member.
D. Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the available best matched team members.

https://www.examtopics.com/discussions/databricks/view/150015-exam-certified-generative-ai-engineer-associate-topic-1/

A

D is the correct answer.

  • Why D is correct: Option D is the most scalable and efficient approach. By embedding team profiles into a vector store, the system can quickly retrieve the best-matched team members for a given project scope using vector similarity search. This approach is particularly well-suited for very large teams because it avoids iterating through all team member profiles.
  • Why A is incorrect: Embedding project scopes instead of team member profiles is less efficient. The number of projects is likely to grow, so it is more effective to have a fixed number of team embeddings.
  • Why B is incorrect: Keyword matching is less effective than embedding similarity and not as useful for unstructured text. Additionally, iterating through team members does not scale to “very large teams”.
  • Why C is incorrect: Calculating similarity scores by iterating through all team members is inefficient for a large team. Furthermore, the “similarity score” is vague and does not take advantage of an established and effective method like vector embeddings and similarity search.
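Answer D's embed-once, retrieve-per-project pattern can be sketched with toy data. The bag-of-words "embedding" below is an illustrative stand-in for a real embedding model, and the team, profiles, and availability pre-filter are all made up; only the structure (a fixed profile index queried by the project scope) mirrors the answer.

```python
# Toy sketch of answer D: embed team profiles once, then retrieve the
# best available match for a project scope by cosine similarity.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())  # stand-in for a real embedding model

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

team = {
    "alice": {"profile": "python spark data pipelines", "available": True},
    "bob":   {"profile": "frontend react design",       "available": True},
    "carol": {"profile": "python spark data pipelines", "available": False},
}
index = {name: embed(m["profile"]) for name, m in team.items()}  # built once

def best_match(scope):
    q = embed(scope)
    available = [n for n, m in team.items() if m["available"]]  # date-availability pre-filter
    return max(available, key=lambda n: cosine(q, index[n]))
```

A real vector store replaces the `index` dict and the linear `max` scan with an approximate-nearest-neighbor search, which is what makes the approach scale to very large teams.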
13
Q

https://www.examtopics.com/discussions/databricks/view/150066-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint’s incoming requests and outgoing responses. The current approach is to include a micro-service in between the endpoint and the user interface to write logs to a remote server.

Which Databricks feature should they use instead which will perform the same task?

A. Vector Search
B. Lakeview
C. DBSQL
D. Inference Tables

A

D. Inference Tables

  • Why D is correct: Inference Tables are designed to store and manage prediction results from machine learning models, which includes recording request and response data. This allows for monitoring incoming requests and outgoing responses effectively.
  • Why other options are incorrect:
    • A. Vector Search is used for similarity search of embeddings, not for logging requests and responses.
    • B. Lakeview is for creating data-driven dashboards for data analysis and not for this task.
    • C. DBSQL is used for querying data, not for logging requests and responses from a serving endpoint.
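As a rough sketch, inference tables are switched on through the serving endpoint's configuration rather than through any interposed micro-service. The fragment below follows the shape of the Databricks serving API's `auto_capture_config` block; the endpoint, catalog, schema, and prefix names are placeholders.

```json
{
  "name": "rag-endpoint",
  "config": {
    "auto_capture_config": {
      "catalog_name": "ml",
      "schema_name": "serving_logs",
      "table_name_prefix": "rag_endpoint"
    }
  }
}
```

Once enabled, requests and responses land in a Unity Catalog Delta table that can be queried directly, replacing the custom logging micro-service.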
14
Q

A Generative AI Engineer is building a system which will answer questions on the latest stock news articles. Which will NOT help with ensuring the outputs are relevant to financial news?

A. Implement a comprehensive guardrail framework that includes policies for content filters tailored to the finance sector.
B. Increase the compute to improve processing speed of questions to allow greater relevancy analysis
C. Implement a profanity filter to screen out offensive language.
D. Incorporate manual reviews to correct any problematic outputs prior to sending to the users

https://www.examtopics.com/discussions/databricks/view/150114-exam-certified-generative-ai-engineer-associate-topic-1/

A

B. Increasing compute power primarily improves processing speed but does not inherently improve the relevancy of the answers to financial news. Relevancy is determined by the data sources, retrieval methods, and filtering mechanisms, not processing speed. Options A, C, and D directly contribute to ensuring relevance: A uses tailored content filters, C filters offensive language to keep the responses professional, and D uses manual reviews to correct any irrelevant outputs.

15
Q

https://www.examtopics.com/discussions/databricks/view/150144-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least cost and least development effort and have it operate at the lowest cost possible.

Which combination of chaining components and configuration meets these requirements?

A. For the application a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt which is given to the LLM to generate answers.
B. The LLM needs to be frequently retrained with the new documents in order to provide the most up-to-date answers.
C. For the question-answering application, prompt engineering and an LLM are required to generate answers.
D. For the application a prompt, an agent and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt which is given to the LLM to generate answers.

A
A. This option is the most suitable because it uses a retriever to fetch information from new documents and insert it into the prompt for the LLM. This approach effectively provides up-to-date information, reflecting frequently updated documentation, while minimizing cost and development effort.

  • Why A is correct: It provides a cost-effective and efficient way to incorporate new documents into the question-answering application by using a retriever to find relevant information and inserting it into the prompt.
  • Why B is wrong: It only mentions retraining the LLM without describing the application architecture, and frequent retraining conflicts with the lowest-cost requirement.
  • Why C is wrong: It mentions prompt engineering and LLM but does not explain how to handle updates for new documents.
  • Why D is wrong: It describes an agent-based approach but lacks specifics on how that structure would achieve the same effective information retrieval and insertion as A. Also, a fine-tuned LLM is more costly than a retriever with a base model.
16
Q

A Generative AI Engineer is using the code below to test setting up a vector store:

Assuming they intend to use Databricks managed embeddings with the default embedding model, what should be the next logical function call?

https://www.examtopics.com/discussions/databricks/view/150263-exam-certified-generative-ai-engineer-associate-topic-1/

A. vsc.get_index()
B. vsc.create_delta_sync_index()
C. vsc.create_direct_access_index()
D. vsc.similarity_search()

A

The correct answer is C. vsc.create_direct_access_index().

Explanation:

create_direct_access_index() is the appropriate next step when testing the setup of a vector store, especially when using Databricks managed embeddings with the default embedding model without a pre-existing Delta table. This method allows for manually adding documents and embeddings, which is ideal for initial testing and minimal setup.

Why other options are wrong:

  • A. vsc.get_index(): This function is used to retrieve an existing index, not create a new one. Since the engineer is setting up the vector store, an index likely doesn’t exist yet.
  • B. vsc.create_delta_sync_index(): This option is suitable for production workflows where a Delta table is already in use and the index needs to automatically synchronize with it. It’s not appropriate for a minimal test setup.
  • D. vsc.similarity_search(): This function is used to perform a similarity search on an existing index. An index must be created and populated first.
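For reference, such a call might look like the sketch below. This is illustrative only: it requires a Databricks workspace with a Vector Search endpoint already created, and every parameter value here is a placeholder rather than anything from the card's (missing) code.

```python
# Illustrative only -- requires a Databricks workspace and an existing
# Vector Search endpoint; all parameter values are placeholders.
from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()
index = vsc.create_direct_access_index(
    endpoint_name="vs_endpoint",
    index_name="main.default.test_index",
    primary_key="id",
    embedding_dimension=1024,
    embedding_vector_column="embedding",
    schema={"id": "string", "text": "string", "embedding": "array<float>"},
)
```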
17
Q

https://www.examtopics.com/discussions/databricks/view/150264-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform.

Which input/output pair will support their goal?

A. Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user’s interactions
B. Input: Online chat logs; Output: Buttons that represent choices for booking details
C. Input: Customer reviews; Output: Classify review sentiment
D. Input: Online chat logs; Output: Cancellation options

A

B. Input: Online chat logs; Output: Buttons that represent choices for booking details

This is the best answer because it allows for the automatic handling of customer inquiries in a structured way, providing immediate responses to questions about reservations.
  • Why A is wrong: Summarizing user interactions, while potentially useful, doesn’t directly address the task of handling customer inquiries and minimizing human intervention during the booking process.
  • Why C is wrong: Classifying review sentiment is more focused on understanding customer feedback than directly assisting with booking inquiries.
  • Why D is wrong: Providing cancellation options is a limited functionality and doesn’t address the broader goal of handling various customer inquiries related to bookings.

18
Q

https://www.examtopics.com/discussions/databricks/view/150265-exam-certified-generative-ai-engineer-associate-topic-1/

What is an effective method to preprocess prompts using custom code before sending them to an LLM?

A. Directly modify the LLM’s internal architecture to include preprocessing steps
B. It is better not to introduce custom code to preprocess prompts as the LLM has not been trained with examples of the preprocessed prompts
C. Rather than preprocessing prompts, it’s more effective to postprocess the LLM outputs to align the outputs to desired outcomes
D. Write a MLflow PyFunc model that has a separate function to process the prompts

A

D. Writing an MLflow PyFunc model with a separate function to process prompts allows for systematic and flexible preprocessing, potentially improving LLM performance through optimized prompts.

  • Why D is correct: This approach enables organized and adaptable prompt manipulation before they are fed into the LLM.
  • Why A is wrong: Modifying the LLM’s internal architecture is generally not feasible or practical.
  • Why B is wrong: Preprocessing prompts with custom code can be beneficial for optimizing LLM performance.
  • Why C is wrong: While post-processing is valuable, pre-processing can proactively shape the input for better initial results.
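As a rough sketch of the pattern option D describes, the preprocessing logic can live in its own method so it can be tested independently of the LLM call. The class below mirrors the `mlflow.pyfunc.PythonModel` interface (a `predict(context, model_input)` method) but omits the MLflow import so the preprocessing step stands on its own; the template and `_call_llm` are hypothetical stand-ins, not part of any real API.

```python
class PromptPreprocessingModel:
    """Sketch of an MLflow-PyFunc-style wrapper where preprocessing is a
    separate, independently testable function (names are illustrative)."""

    SYSTEM_TEMPLATE = "Answer concisely.\n\nQuestion: {prompt}"

    def preprocess_prompt(self, prompt: str) -> str:
        # Normalize whitespace and wrap the user text in a fixed template.
        cleaned = " ".join(prompt.strip().split())
        return self.SYSTEM_TEMPLATE.format(prompt=cleaned)

    def predict(self, context, model_input):
        # Mirrors mlflow.pyfunc.PythonModel.predict(context, model_input);
        # here `model_input` is a list of raw prompt strings.
        prompts = [self.preprocess_prompt(p) for p in model_input]
        return [self._call_llm(p) for p in prompts]

    def _call_llm(self, prompt: str) -> str:
        # Placeholder for the real LLM call behind a serving endpoint.
        return f"[LLM response to: {prompt}]"
```

In a real deployment, the class would subclass `mlflow.pyfunc.PythonModel`, be logged with MLflow, and be served from an endpoint; the key point is that `preprocess_prompt` is a distinct function that can be versioned and unit tested.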
19
Q

https://www.examtopics.com/discussions/databricks/view/150266-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is developing an LLM application that users can use to generate personalized birthday poems based on their names.

Which technique would be most effective in safeguarding the application, given the potential for malicious user inputs?

A. Implement a safety filter that detects any harmful inputs and ask the LLM to respond that it is unable to assist
B. Reduce the time that the users can interact with the LLM
C. Ask the LLM to remind the user that the input is malicious but continue the conversation with the user
D. Increase the amount of compute that powers the LLM to process input faster

A

A. Implementing a safety filter is the most effective technique. It directly addresses the potential for malicious input by detecting and blocking harmful content, thus safeguarding the application.

  • Why A is correct: Safety filters prevent the LLM from processing and potentially generating harmful or inappropriate responses based on malicious input.
  • Why B is incorrect: Reducing interaction time doesn’t prevent malicious input, it only limits the duration of potential harm.
  • Why C is incorrect: Reminding the user that the input is malicious while continuing the conversation does not prevent the generation of harmful outputs and could potentially encourage further malicious behavior.
  • Why D is incorrect: Increasing compute power doesn’t address the safety concerns related to malicious input. It only affects processing speed.
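A minimal sketch of the safety-filter idea in option A: check the input before it ever reaches the LLM, and return a fixed refusal when it matches. The denylist here is purely illustrative; a production filter would typically use a moderation model or managed guardrails rather than keyword matching.

```python
# Illustrative patterns only -- real filters use moderation models.
BLOCKED_PATTERNS = ["ignore previous instructions", "system prompt", "jailbreak"]

REFUSAL = "Sorry, I am unable to assist with that request."

def safe_generate(user_input: str, generate) -> str:
    """Run a simple input filter before calling the LLM.

    `generate` is any callable that takes the prompt and returns the
    model's text; the pattern list is illustrative, not exhaustive.
    """
    lowered = user_input.lower()
    if any(pattern in lowered for pattern in BLOCKED_PATTERNS):
        return REFUSAL  # block before the LLM ever sees the input
    return generate(user_input)
```

Benign requests pass through unchanged, while flagged inputs get the refusal without consuming an LLM call.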
20
Q

https://www.examtopics.com/discussions/databricks/view/150267-exam-certified-generative-ai-engineer-associate-topic-1/

Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?

A. The ability to generate responses in code
B. The similarity to the previous language
C. The latency of the response and the length of text generated
D. The accuracy and relevance of the responses

A

D. Accuracy and relevance are key to ensuring the LLM output is safe and appropriate in a translation use case.

  • Why D is correct: Accuracy and relevance directly relate to whether the translated output conveys the intended meaning without introducing harmful or misleading information.
  • Why the other options are incorrect:
    • A: Code generation is irrelevant to translation safety.
    • B: Similarity to the original language doesn’t guarantee safety; a harmful statement could be translated faithfully.
    • C: Latency and text length are performance metrics, not safety indicators.
21
Q

https://www.examtopics.com/discussions/databricks/view/150268-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is developing a patient-facing healthcare-focused chatbot. If the patient’s question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor’s office and suggest a few relevant pre-approved medical articles for reading. If the patient’s question is urgent, direct the patient to call their local emergency services.

Given the following user input:
“I have been experiencing severe headaches and dizziness for the past two days.”

Which response is most appropriate for the chatbot to generate?

A. Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.
B. Please call your local emergency services.
C. Headaches can be tough. Hope you feel better soon!
D. Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.

A

B. Please call your local emergency services.

  • Why this is right: Severe headaches and dizziness persisting for two days can indicate a serious medical condition requiring immediate attention. The chatbot should prioritize patient safety and direct the user to emergency services.
  • Why other options are wrong:
    • A: Providing articles is inappropriate for potentially urgent symptoms.
    • C: This is a dismissive and unhelpful response.
    • D: Delaying immediate help by asking for more information is dangerous when the symptoms suggest a possible emergency.
22
Q

ExamTopics URL

After changing the response generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

Image Text: ValueError: This model's maximum context length is 2048 tokens. However, you requested 2049 tokens (1025 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.

What TWO solutions should the Generative AI Engineer implement without changing the response generating model? (Choose two.)

A. Use a smaller embedding model to generate embeddings
B. Reduce the maximum output tokens of the new model
C. Decrease the chunk size of embedded documents
D. Reduce the number of records retrieved from the vector database
E. Retrain the response generating model using ALiBi

A

CD

C is correct because decreasing the chunk size of embedded documents directly reduces the number of tokens included in the prompt, addressing the context length issue.

D is correct because reducing the number of records retrieved from the vector database limits the amount of information passed to the LLM, thereby reducing the total number of tokens in the prompt.

A is incorrect because using a smaller embedding model affects the quality of embeddings, but it doesn’t directly address the token limit issue in the prompt.

B is incorrect because reducing the maximum output tokens limits the model’s ability to generate complete responses, which is undesirable.

E is incorrect because retraining the response generating model using ALiBi is complex and unnecessary. The issue can be resolved by reducing the input tokens without retraining.
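The arithmetic behind options C and D can be made explicit: prompt tokens grow with chunk size times the number of retrieved chunks, plus some fixed overhead for the instruction and question. The helper below checks whether a configuration fits; all token counts are illustrative, chosen so the overflow case matches the 1025 + 1024 > 2048 error above.

```python
def fits_context(num_chunks: int, chunk_tokens: int,
                 overhead_tokens: int = 100,
                 completion_tokens: int = 1024,
                 context_limit: int = 2048) -> bool:
    """Check whether prompt + completion fits the model's context window.

    Prompt tokens = instruction/question overhead + retrieved chunks.
    All sizes are illustrative token counts, not measured values.
    """
    prompt_tokens = overhead_tokens + num_chunks * chunk_tokens
    return prompt_tokens + completion_tokens <= context_limit
```

With 5 chunks of 185 tokens the request overflows (100 + 5×185 = 1025 prompt tokens, plus 1024 completion tokens, exceeds 2048); halving the chunk size (option C) or retrieving fewer chunks (option D) brings it back under the limit.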

23
Q

https://www.examtopics.com/discussions/databricks/view/150270-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies. Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

A. Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.
B. Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.
C. Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.
D. Consolidate all SnoPen AI related documents into a single chunk in the vector database.

A

The correct answer is C.

  • Why C is correct: By specifying in the system prompt that the application should not answer questions unrelated to SnoPen AI, you directly instruct the model to filter out irrelevant information. This allows the application to focus solely on questions relevant to the company, improving accuracy and efficiency.
  • Why other options are wrong:
    • A: Keeping all articles, including irrelevant content, would dilute the context and potentially lead the model to answer questions outside the scope of SnoPen AI, defeating the purpose of filtering.
    • B: Including in the prompt that all information is about SnoPen AI, without any data filtering, would be misleading to the model, especially if the ingested documents contain information about other topics. The model would likely generate inaccurate responses.
    • D: Consolidating all SnoPen AI related documents into a single chunk might seem beneficial, but it could create an overly large chunk that exceeds the model’s context window, leading to truncation or incomplete information retrieval. Also, it doesn’t inherently filter out irrelevant information within those documents.
24
Q

ExamTopics URL

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document sections. They would like to store the chunks in a Vector Search index. The current format of the dataframe has two columns: (i) original document file name (ii) an array of text chunks for each document.

What is the most performant way to store this dataframe?

A. Split the data into train and test set, create a unique identifier for each document, then save to a Delta table
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table
C. First create a unique identifier for each document, then save to a Delta table
D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the key is the document section name and the value is the array of text chunks for that section

A

B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table

This is the most performant because flattening the dataframe ensures each chunk is a distinct row, optimized for vector search indexing. The unique identifier enables efficient retrieval. Options A and C do not address the need to index individual chunks. Option D is less performant due to the overhead of managing numerous JSON files.
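On Databricks this flattening would typically be a Spark `explode` plus an id column, written to a Delta table. The pure-Python sketch below illustrates the same transformation on a plain list of rows so the shape of the result is clear; column names and the id scheme are hypothetical.

```python
from typing import Dict, List

def flatten_chunks(docs: List[Dict]) -> List[Dict]:
    """Turn (file_name, [chunks]) rows into one row per chunk with a
    unique identifier -- the shape a Vector Search index expects.
    In Spark this corresponds to F.explode("chunks") plus an id column
    before saving to a Delta table."""
    rows = []
    for doc in docs:
        for i, chunk in enumerate(doc["chunks"]):
            rows.append({
                "id": f'{doc["file_name"]}#{i}',  # unique per chunk
                "file_name": doc["file_name"],
                "text": chunk,
            })
    return rows
```

Each chunk becomes an independently retrievable row, which is what makes this layout performant for vector search.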

25
Q

https://www.examtopics.com/discussions/databricks/view/150272-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer has created a RAG application which can help employees retrieve answers from an internal knowledge base, such as Confluence pages or Google Drive. The prototype application is now working with some positive feedback from internal company testers. Now the Generative AI Engineer wants to formally evaluate the system’s performance and understand where to focus their efforts to further improve the system.

How should the Generative AI Engineer evaluate the system?

A. Use cosine similarity score to comprehensively evaluate the quality of the final generated answers.
B. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow’s built-in evaluation metrics to perform the evaluation on the retrieval and generation components.
C. Benchmark multiple LLMs with the same data and pick the best LLM for the job.
D. Use an LLM-as-a-judge to evaluate the quality of the final answers generated.

A

B. Curating a dataset to test retrieval and generation separately and using MLflow’s evaluation metrics is the correct approach because it allows for a focused understanding of each component’s performance, enabling targeted improvements.

  • Why A is wrong: Cosine similarity alone is insufficient to comprehensively evaluate the generated answers’ quality, as it doesn’t capture semantic accuracy or relevance.
  • Why C is wrong: Benchmarking LLMs is useful for selection but doesn’t directly evaluate the RAG system’s performance, including retrieval effectiveness.
  • Why D is wrong: While LLM-as-a-judge can be valuable, it is more effective when used in conjunction with a separate evaluation of the retrieval component, as suggested by option B.
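Evaluating the retrieval component separately can be as simple as scoring recall@k over a curated set of questions with known relevant document ids. MLflow's built-in retriever metrics cover this; the hand-rolled helper below just illustrates the idea, with a hypothetical eval-set shape of `{"retrieved": [...], "relevant": [...]}` records.

```python
from typing import Dict, List

def recall_at_k(retrieved: List[str], relevant: List[str], k: int) -> float:
    """Fraction of the known-relevant documents that appear in the
    top-k retrieved results for one query."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return sum(1 for doc_id in relevant if doc_id in top_k) / len(relevant)

def mean_recall_at_k(dataset: List[Dict], k: int = 3) -> float:
    """Average recall@k over a curated retrieval eval set."""
    scores = [recall_at_k(r["retrieved"], r["relevant"], k) for r in dataset]
    return sum(scores) / len(scores)
```

A low mean recall points the engineer at the retriever (chunking, embeddings, index); a high recall with poor answers points at the generation side.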
26
Q

https://www.examtopics.com/discussions/databricks/view/150273-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer has already trained an LLM on Databricks and it is now ready to be deployed.

Which of the following steps correctly outlines the easiest process for deploying a model on Databricks?

A. Log the model as a pickle object, upload the object to Unity Catalog Volume, register it to Unity Catalog using MLflow, and start a serving endpoint
B. Log the model using MLflow during training, directly register the model to Unity Catalog using the MLflow API, and start a serving endpoint
C. Save the model along with its dependencies in a local directory, build the Docker image, and run the Docker container
D. Wrap the LLM’s prediction function into a Flask application and serve using Gunicorn

A

B. Logging the model using MLflow during training, directly registering the model to Unity Catalog using the MLflow API, and starting a serving endpoint is the easiest process. MLflow streamlines model management and deployment within Databricks.

  • Why B is correct: MLflow provides a convenient and integrated way to log, register, and deploy models directly within the Databricks environment. It simplifies the process compared to manual methods.
  • Why A is incorrect: While Unity Catalog Volumes can store model artifacts, pickling and manually uploading is less streamlined than using MLflow’s integrated capabilities.
  • Why C is incorrect: Building Docker images and managing containers adds unnecessary complexity compared to using Databricks’ built-in serving capabilities with MLflow.
  • Why D is incorrect: Wrapping the model in a Flask application and serving with Gunicorn is a more manual approach and doesn’t leverage Databricks’ managed serving infrastructure.
27
Q

https://www.examtopics.com/discussions/databricks/view/150274-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not sufficiently high to justify their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application.

What strategy should the Generative AI Engineer use?

A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with a fewer number of parameters in order to reduce hardware constraint issues
D. Throttle the incoming batch of requests manually to avoid rate limiting issues

A

B. Deploy the model using pay-per-token throughput as it comes with cost guarantees.

  • Why B is correct: Pay-per-token throughput is designed for scenarios with varying or low request volumes. It avoids the fixed costs associated with provisioned throughput, making it more cost-effective when usage is low.
  • Why A is wrong: Switching to External Models is not directly related to cost-effectiveness in terms of throughput management.
  • Why C is wrong: Changing the model based on parameter size targets hardware constraints, not cost-effectiveness for low request volumes.
  • Why D is wrong: Manually throttling requests might avoid rate limiting, but it doesn’t address the fundamental issue of cost-effectiveness at the current volume of use.
28
Q

https://www.examtopics.com/discussions/databricks/view/150275-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style.

Which approach will NOT improve the LLM’s response to achieve the desired response?

A. Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style
B. Use a neutralizer to normalize the tone and style of the underlying documents
C. Include few-shot examples in the prompt to the LLM
D. Fine-tune the LLM on a dataset of desired tone and style

A

B. Using a neutralizer to normalize the tone and style of the underlying documents will not help and would likely hinder the LLM from generating text with the desired tone and style, because a neutralizer removes the stylistic elements needed for the desired output. Options A, C, and D are all valid methods for shaping the LLM’s output to match the desired tone and style.
29
Q

https://www.examtopics.com/discussions/databricks/view/150276-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is creating an LLM-powered application that will need access to up-to-date news articles and stock prices. The design requires the use of stock prices which are stored in Delta tables and finding the latest relevant news articles by searching the internet.

How should the Generative AI Engineer architect their LLM system?

A. Use an LLM to summarize the latest news articles and look up stock tickers from the summaries to find stock prices.
B. Query the Delta table for volatile stock prices and use an LLM to generate a search query to investigate potential causes of the stock volatility.
C. Download and store news articles and stock price information in a vector store. Use a RAG architecture to retrieve and generate at runtime.
D. Create an agent with tools for SQL querying of Delta tables and web searching, provide retrieved values to an LLM for generation of response.

A

D. Create an agent with tools for SQL querying of Delta tables and web searching, provide retrieved values to an LLM for generation of response.

  • Why D is correct: This approach allows the LLM to access real-time data from both the Delta table (stock prices) and the internet (news articles) through dedicated tools managed by an agent. The agent’s ability to query and search ensures the LLM receives the most up-to-date and relevant information for generating responses.
  • Why A is wrong: Using an LLM to summarize news and then look up stock tickers is an indirect and potentially less accurate way to retrieve stock prices.
  • Why B is wrong: This focuses only on volatile stocks and doesn’t address the need for comprehensive news and stock price data.
  • Why C is wrong: A vector store might not hold the most up-to-date information, especially for rapidly changing data like stock prices and news. Although RAG is a solid architecture, it does not address the need for real-time SQL queries and web searches.
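The agent pattern in option D can be sketched as a dispatcher that calls a SQL tool and a web-search tool, then hands the retrieved values to the LLM for generation. Everything here is a hypothetical stand-in (`tools`, `llm`, the hard-coded queries); a real implementation would use an agent framework such as LangChain and let the LLM choose which tools to invoke.

```python
def run_agent(user_question: str, tools: dict, llm) -> str:
    """Toy two-tool agent: fetch fresh values, then let the LLM generate.

    `tools` maps tool names to callables; `llm` is any prompt -> text
    callable. Tool selection is hard-coded for clarity -- a real agent
    would let the LLM pick which tool to call and with what arguments."""
    price = tools["sql"]("SELECT price FROM prices WHERE ticker = 'ACME'")
    news = tools["web_search"]("latest ACME news")
    prompt = (f"Question: {user_question}\n"
              f"Stock price: {price}\nNews: {news}\nAnswer:")
    return llm(prompt)
```

The key property is that both data sources are queried at request time, so the LLM always sees current values rather than a possibly stale vector-store snapshot.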
30
Q

https://www.examtopics.com/discussions/databricks/view/150277-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is designing a chatbot for a gaming company that aims to engage users on its platform while its users play online video games.

Which metric would help them increase user engagement and retention for their platform?

A. Randomness
B. Diversity of responses
C. Lack of relevance
D. Repetition of responses

A

B. Diversity of responses. A variety of responses keeps users interested and engaged with the chatbot, leading to longer platform retention. Randomness (A) and repetition of responses (D) would make the bot unpredictable or boring, decreasing engagement. Lack of relevance (C) would make the bot unhelpful and frustrating for users.
31
Q

https://www.examtopics.com/discussions/databricks/view/150278-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in PDF format. These PDFs can contain both text and images. They want to develop a solution using the least amount of lines of code.

Which Python package should be used to extract the text from the source documents?

A. flask
B. beautifulsoup
C. unstructured
D. numpy

A

C. unstructured. This package can extract text from PDFs, including documents that mix text and images, while minimizing the amount of code. flask is a web framework, beautifulsoup parses HTML/XML, and numpy is a numerical computing library; none of them extract text from PDFs.
32
Q

https://www.examtopics.com/discussions/databricks/view/150279-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer received the following business requirements for an external chatbot. The chatbot needs to know what types of questions the user asks and route them to appropriate models to answer the questions. For example, the user might ask about upcoming event details. Another user might ask about purchasing tickets for a particular event.

What is an ideal workflow for such a chatbot?

A. The chatbot should only look at previous event information
B. There should be two different chatbots handling different types of user queries.
C. The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it’s an upcoming event question, send the query to a text-to-SQL model. If it’s about ticket purchasing, the customer should be redirected to a payment platform.
D. The chatbot should only process payments

A

C. The chatbot should be implemented as a multi-step LLM workflow that first identifies the type of question asked, then routes it to the appropriate model. The requirement specifies that the chatbot needs to understand the type of question being asked and route it accordingly, and option C addresses this directly.

  • Why A is wrong: Only looking at previous event information is too restrictive and doesn’t address the requirement of handling different types of queries, such as ticket purchasing.
  • Why B is wrong: While technically possible, two separate chatbots are less efficient and harder to manage than a single chatbot that can intelligently route queries.
  • Why D is wrong: Only processing payments is far too limited and does not address the broader business requirement of answering various types of questions.
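The multi-step workflow in option C can be sketched as classify-then-dispatch. The keyword classifier below is a trivial stand-in for what would really be an LLM classification call, and the return strings are hypothetical placeholders for the text-to-SQL model and payment redirect.

```python
def classify_intent(question: str) -> str:
    """Step 1: stand-in for an LLM-based intent classifier."""
    q = question.lower()
    if any(word in q for word in ("buy", "purchase", "ticket")):
        return "ticket_purchase"
    return "event_info"

def route(question: str) -> str:
    """Step 2: dispatch on the classified intent."""
    intent = classify_intent(question)
    if intent == "ticket_purchase":
        return "REDIRECT: payment platform"
    # Event questions go to a (hypothetical) text-to-SQL model.
    return f"TEXT_TO_SQL: {question}"
```

Adding a new query type means adding an intent label and a branch, rather than standing up a second chatbot as option B suggests.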
33
Q

https://www.examtopics.com/discussions/databricks/view/150280-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is tasked with developing an application that is based on an open source large language model (LLM). They need a foundation LLM with a large context window.

Which model fits this need?

A. DistilBERT
B. MPT-30B
C. Llama2-70B
D. DBRX

A

D. DBRX is the correct answer because it is an open-source LLM developed by Databricks with a large context window (up to 32K tokens), making it suitable for applications that handle long documents or extended conversations.

  • Why A is wrong: DistilBERT is a smaller, faster transformer model primarily used for tasks like text classification and question answering, and does not have a large context window.
  • Why B is wrong: MPT-30B has a context window of 8K tokens, which is less than DBRX.
  • Why C is wrong: Llama2-70B has a context window of 4,096 tokens, which is also less than DBRX.
34
Q

https://www.examtopics.com/discussions/databricks/view/152048-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs.

Which action would be most effective in mitigating the problem of offensive text outputs?

A. Increase the frequency of upstream data updates
B. Inform the user of the expected RAG behavior
C. Restrict access to the data sources to a limited number of users
D. Curate upstream data properly that includes manual review before it is fed into the RAG system

A

D. Curating upstream data properly, including manual review, is the most effective way to mitigate offensive text outputs in a RAG system because it directly addresses the source of the problem: potentially harmful or biased content in the data.

  • Why D is correct: Manual review allows offensive content to be identified and removed or modified before it is ingested into the RAG system.
  • Why A is wrong: Increasing data updates might keep the data fresher but doesn’t guarantee the removal of offensive content.
  • Why B is wrong: Informing the user manages expectations but doesn’t solve the problem of offensive outputs.
  • Why C is wrong: Restricting access doesn’t address the offensive content within the existing data.
35
Q

https://www.examtopics.com/discussions/databricks/view/152074-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer has developed an LLM application to answer questions about internal company policies. The Generative AI Engineer must ensure that the application doesn’t hallucinate or leak confidential data.

Which approach should NOT be used to mitigate hallucination or confidential data leakage?

A. Add guardrails to filter outputs from the LLM before it is shown to the user
B. Fine-tune the model on your data, hoping it will learn what is appropriate and not
C. Limit the data available based on the user’s access level
D. Use a strong system prompt to ensure the model aligns with your needs.

A

B. Fine-tuning the model and hoping it will learn what is appropriate is not a reliable approach to preventing hallucination or data leakage. Fine-tuning can help, but it requires careful data preparation and validation; relying solely on the hope that the model learns what’s appropriate is insufficient.

  • Why A is wrong: Guardrails are an effective way to filter outputs and prevent unwanted content from being displayed.
  • Why C is wrong: Limiting data access based on user level is an important security measure to prevent unauthorized disclosure.
  • Why D is wrong: Strong system prompts guide the model’s behavior, helping keep outputs on track and reducing undesirable behavior.
36
Q

https://www.examtopics.com/discussions/databricks/view/153568-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 related to a user’s query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable.

Which change could the Generative AI Engineer perform to mitigate this issue?

A. Split the LLM output by newline characters to truncate away the summarization explanation.
B. Tune the chunk size of news articles or experiment with different embedding models.
C. Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.
D. Provide few-shot examples of desired output format to the system and/or user prompt.

A

D. Provide few-shot examples of desired output format to the system and/or user prompt.

  • Why D is correct: Few-shot learning involves providing the LLM with examples of the desired input and output format. By showing the model examples of summaries without explanations, it can learn to generate summaries that conform to the desired format.
  • Why A is incorrect: Splitting the LLM output by newline characters is a crude method that may truncate the summary itself, not just the explanation. It’s not a reliable or targeted solution.
  • Why B is incorrect: Tuning chunk size and embedding models addresses the quality of information retrieval and might affect the summary’s content, but it doesn’t directly address the unwanted explanations in the output.
  • Why C is incorrect: Re-examining the document ingestion logic ensures data integrity but doesn’t control the format of the output the LLM generates.
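Option D's few-shot approach amounts to prepending input/output example pairs in which every example ends at the summary, with no trailing explanation. The example texts and prompt format below are hypothetical, chosen only to show the structure.

```python
# Hypothetical few-shot pairs: each output stops at the summary,
# demonstrating the desired format (no "how I summarized" explanation).
FEW_SHOT_EXAMPLES = [
    ("Article: Armistice signed, ending the war.",
     "Summary: The armistice was signed, bringing the war to an end."),
    ("Article: Influenza cases rise in major cities.",
     "Summary: Influenza cases increased across major cities."),
]

def build_prompt(article: str) -> str:
    """Assemble a few-shot prompt: because every example ends at the
    summary, the model learns not to append an explanation."""
    parts = [f"{a}\n{s}" for a, s in FEW_SHOT_EXAMPLES]
    parts.append(f"Article: {article}\nSummary:")
    return "\n\n".join(parts)
```

The same pairs could also live in the system prompt; what matters is that the demonstrated outputs stop exactly where the desired output should.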
37
Q

https://www.examtopics.com/discussions/databricks/view/155675-exam-certified-generative-ai-engineer-associate-topic-1/

A Generative AI Engineer at an electronics company deployed a RAG application. The RAG response often returns information about an irrelevant product.

What can the engineer do to improve the relevance of the RAG’s response?

A. Assess the quality of the retrieved context
B. Implement caching for frequently asked questions
C. Use a different LLM to improve the generated response
D. Use a different semantic similarity search algorithm

A

D. Using a different semantic similarity search algorithm can improve the relevance of the RAG response. The problem is that the application is retrieving irrelevant context, and changing the semantic similarity search algorithm directly improves the retrieval of relevant information.

  • Why D is correct: The core issue is poor retrieval of relevant information. A different semantic similarity search algorithm can more accurately identify and retrieve contextually relevant data.
  • Why A is wrong: Assessing the quality of the retrieved context is useful for diagnosis but doesn’t directly solve the problem of retrieving irrelevant information.
  • Why B is wrong: Caching is a performance optimization and doesn’t address the relevance of the retrieved context.
  • Why C is wrong: Using a different LLM addresses the quality of the generated response, not the accuracy of the retrieved context.
38
Q

https://www.examtopics.com/discussions/databricks/view/157158-exam-certified-generative-ai-engineer-associate-topic-1/

A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs.

Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?

A. Limit the number of relevant documents available for the RAG application to retrieve from
B. Pick a smaller LLM that is domain-specific
C. Limit the number of queries a customer can send per day
D. Use the largest LLM possible because that gives the best performance for any general queries

A

B. Picking a smaller, domain-specific LLM is the right choice because it balances cost-effectiveness with performance tailored to the cancer research field. This allows the startup to cater to customer needs without incurring the high costs associated with larger, general-purpose LLMs.

  • Why B is right: A smaller, domain-specific LLM is more efficient and cost-effective for a specific task like cancer research.
  • Why A is wrong: Limiting documents may hurt the quality of the RAG application, as it restricts the information available for retrieval.
  • Why C is wrong: Limiting queries hurts customer experience and doesn’t address the core issue of cost-effective model selection.
  • Why D is wrong: Using the largest LLM is not cost-conscious and may be overkill for a specific domain, leading to unnecessary expenses.
39
[https://www.examtopics.com/discussions/databricks/view/272728-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272728-exam-certified-generative-ai-engineer-associate-topic-1/) What is the most suitable library for building a multi-step LLM-based workflow? A. Pandas B. TensorFlow C. PySpark D. LangChain
D. LangChain is the most suitable library. LangChain is specifically designed for building applications using LLMs, including multi-step workflows. Pandas is for data manipulation, TensorFlow is a deep learning framework, and PySpark is for large-scale data processing. These are not designed for building LLM workflows.
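A minimal sketch of the multi-step pattern LangChain packages up (prompt template → model call → output parser), written in plain Python so it stays self-contained — the `llm()` stub below is a hypothetical stand-in for a real model call, not LangChain API:

```python
def build_prompt(question: str) -> str:
    # Step 1: format the user input into a prompt (LangChain: PromptTemplate)
    return f"Answer concisely: {question}"

def llm(prompt: str) -> str:
    # Step 2: stand-in for a real LLM call (e.g. a hosted model endpoint)
    return "LangChain" if "workflow" in prompt else "unknown"

def parse(raw: str) -> str:
    # Step 3: clean up the raw model output (LangChain: output parser)
    return raw.strip()

def run_chain(question: str) -> str:
    # Each step's output feeds the next -- the pattern LangChain
    # expresses as chained runnables
    return parse(llm(build_prompt(question)))

print(run_chain("Which library suits a multi-step LLM workflow?"))
```

LangChain's value is that it standardizes exactly this composition (plus tools, memory, and retrievers), which Pandas, TensorFlow, and PySpark do not.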
40
[https://www.examtopics.com/discussions/databricks/view/272731-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272731-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they're willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential and so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer's needs in this situation? A. Dolly 1.5B B. OpenAI GPT-4 C. BGE-large D. Llama2-70B
The correct answer is **D. Llama2-70B**. * **Why it's correct:** Llama2-70B is a large language model that can be self-hosted, addressing both the need for high-quality answers and the requirement that no data be transmitted to third parties. The question states that latency is not a concern, therefore a bigger, slower model such as Llama2-70B is acceptable. * **Why other options are wrong:** * **A. Dolly 1.5B:** While it can be self-hosted, it's a smaller model and unlikely to provide the "best possible quality" answers. * **B. OpenAI GPT-4:** While it offers high-quality answers, it involves transmitting data to a third party (OpenAI), violating the confidentiality requirements. * **C. BGE-large:** BGE-large is primarily an embedding model, not a generative model. It's suitable for the retrieval component of a RAG system, but cannot generate answers on its own.
41
[https://www.examtopics.com/discussions/databricks/view/272735-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272735-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative Al Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email. Here’s a sample email: ![Image](https://img.examtopics.com/certified-generative-ai-engineer-associate/image6.png) The image contains the text: ```text From: "Customer Service" <[email protected]> Date: April 16, 2024 at 1:23:45 PM PDT To: "[email protected]" <[email protected]> Subject: Your Order Confirmation - RE987D Dear Valued Customer, Thank you for your recent order! This email confirms that your order RE987D has been successfully processed and is currently being prepared for shipment. Order Details: * Order ID: RE987D * Order Date: April 16, 2024 * Shipping Address: [Customer Shipping Address] * Billing Address: [Customer Billing Address] If you have any questions or require further assistance, please do not hesitate to contact us. Thank you, Customer Service Team ``` They will need to write a prompt that will extract the relevant information in JSON format with the highest level of output accuracy. Which prompt will do that? A. You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format. B. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format. Here’s an example: {“date”: “April 16, 2024”, “sender_email”: “[email protected]”, “order_id”: “RE987D”} C. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format. D. 
You will receive customer emails and need to extract date, sender email, and order IReturn the extracted information in JSON format.
B. **Correct**. Option B is correct because it provides an example of the desired JSON output format. Including an example in the prompt helps the LLM understand the exact structure and format required, leading to higher accuracy. A. **Incorrect**. Option A only instructs the LLM to return JSON format without providing an example, which can lead to variability in the output. C. **Incorrect**. Option C asks for human-readable format, which contradicts the requirement for JSON format. D. **Incorrect**. Option D has a typo ("order IReturn") and lacks an example, making it less effective than Option B.
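The option-B prompt can be assembled programmatically, and the model's reply checked against the same example schema; the email address below is a placeholder (not the one from the sample message), and `validate` is a hypothetical downstream helper:

```python
import json

# One worked example pins down the exact JSON schema the model should emit
EXAMPLE = {"date": "April 16, 2024",
           "sender_email": "customer.service@example.com",
           "order_id": "RE987D"}

prompt = (
    "You will receive customer emails and need to extract date, sender "
    "email, and order ID. Return the extracted information in JSON "
    "format. Here's an example:\n" + json.dumps(EXAMPLE)
)

def validate(reply: str) -> dict:
    # Hypothetical check on the model's reply: must be valid JSON and
    # contain every key shown in the example
    parsed = json.loads(reply)              # raises on malformed JSON
    missing = set(EXAMPLE) - set(parsed)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return parsed

print(validate(json.dumps(EXAMPLE))["order_id"])
```

Pairing a one-shot example with a strict parser like this is a common way to turn the prompt-format advantage of option B into a measurable accuracy gain.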
42
[https://www.examtopics.com/discussions/databricks/view/272741-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272741-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is creating an agent-based LLM system for their favorite monster truck team. The system can answer text-based questions about the monster truck team, look up event dates via an API call, or query tables on the team's latest standings. How could the Generative AI Engineer best design these capabilities into their system? A. Ingest PDF documents about the monster truck team into a vector store and query it in a RAG architecture. B. Write a system prompt for the agent listing available tools and bundle it into an agent system that runs a number of calls to solve a query. C. Instruct the LLM to respond with "RAG", "API", or "TABLE" depending on the query, then use text parsing and conditional statements to resolve the query. D. Build a system prompt with all possible event dates and table information in the system prompt. Use a RAG architecture to lookup generic text questions and otherwise leverage the information in the system prompt.
B is the correct answer. * **Why B is correct:** This approach directly leverages the capabilities of an agent-based LLM system. By providing the agent with a system prompt that defines available tools (API call for event dates, table query for standings, etc.), the agent can intelligently decide which tool to use based on the user's query and execute the appropriate actions. * **Why A is incorrect:** While RAG is useful, it primarily addresses answering questions from unstructured text data. It doesn't directly address the need to make API calls or query tables. * **Why C is incorrect:** Relying on the LLM to output specific keywords ("RAG", "API", "TABLE") and then using conditional statements is less robust and more prone to errors. It tightly couples the LLM's output format to the code, making it less flexible. * **Why D is incorrect:** Including all possible event dates and table information in the system prompt is not scalable and can quickly exceed the LLM's context window. It's also inefficient since the information needs to be updated manually in the prompt. RAG is useful, but does not fully cover the functionality needed.
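The option-B pattern can be sketched in plain Python: the "system prompt" lists the tools, and a stubbed routing function stands in for the LLM's tool-selection step. Tool names and canned responses are illustrative only:

```python
# Each tool is a callable; the lambdas stand in for a real API call,
# a SQL/table query, and a RAG lookup respectively
TOOLS = {
    "event_lookup": lambda q: "Next event: July 4",
    "standings_query": lambda q: "Team rank: 2nd",
    "docs_search": lambda q: "The team was founded in 2010.",
}

SYSTEM_PROMPT = "You can call these tools: " + ", ".join(TOOLS)

def choose_tool(query: str) -> str:
    # Stand-in for the LLM deciding which tool fits the query
    if "when" in query.lower():
        return "event_lookup"
    if "standings" in query.lower():
        return "standings_query"
    return "docs_search"

def agent(query: str) -> str:
    # One tool call per query; a real agent may loop over several calls
    return TOOLS[choose_tool(query)](query)

print(agent("What are the latest standings?"))
```

In a real agent framework the LLM itself performs the `choose_tool` step from the tool descriptions in the system prompt, which is exactly why option B scales better than the hand-written conditionals of option C.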
43
[https://www.examtopics.com/discussions/databricks/view/272745-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272745-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer has been asked to design an LLM-based application that accomplishes the following business objective: answer employee HR questions using HR PDF documentation. Which set of high level tasks should the Generative AI Engineer's system perform? A. Calculate averaged embeddings for each HR document, compare embeddings to user query to find the best document. Pass the best document with the user query into an LLM with a large context window to generate a response to the employee. B. Use an LLM to summarize HR documentation. Provide summaries of documentation and user query into an LLM with a large context window to generate a response to the user. C. Create an interaction matrix of historical employee questions and HR documentation. Use ALS to factorize the matrix and create embeddings. Calculate the embeddings of new queries and use them to find the best HR documentation. Use an LLM to generate a response to the employee question based upon the documentation retrieved. D. Split HR documentation into chunks and embed into a vector store. Use the employee question to retrieve best matched chunks of documentation, and use the LLM to generate a response to the employee based upon the documentation retrieved.
D. Splitting the HR documentation into chunks, embedding them into a vector store, using the employee question to retrieve relevant chunks, and then using an LLM to generate a response is the most straightforward and efficient approach. This method leverages vector embeddings for semantic search, enabling the LLM to access the most relevant information. * **Why D is correct:** This approach uses retrieval-augmented generation (RAG), which is standard for this type of task. It breaks down the HR documentation into manageable chunks, creates vector embeddings for each chunk, stores the embeddings in a vector store, retrieves the most relevant chunks based on the user's question, and then feeds these chunks along with the user's question into the LLM to generate an answer. * **Why A is incorrect:** Averaging embeddings across entire documents loses granular context. This is generally less effective than chunking. * **Why B is incorrect:** Summarizing the entire HR documentation loses detailed information and context that might be necessary to answer specific employee questions accurately. Feeding the summaries might be too general for the LLM to construct responses. * **Why C is incorrect:** Creating an interaction matrix using ALS is an overcomplicated approach for this problem. ALS is typically used for recommender systems and collaborative filtering, which is not the primary goal here. It adds unnecessary complexity to the system design, particularly in the initial stages.
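The option-D pipeline (chunk → embed → retrieve → generate) can be sketched end to end with toy bag-of-words vectors standing in for a real embedding model and vector store; the chunks are made-up HR snippets:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts (a real system would call
    # an embedding model here)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Employees accrue vacation days monthly.",
    "Expense reports are due by the fifth business day.",
]
index = [(c, embed(c)) for c in chunks]   # stand-in for a vector store

def retrieve(question: str) -> str:
    # Return the chunk most similar to the question
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

best = retrieve("When are expense reports due?")
# `best` plus the question would then be passed to the LLM for generation
print(best)
```

Swapping the toy `embed` for a real embedding model and the list for Databricks Vector Search gives the production shape of option D without changing the control flow.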
44
[https://www.examtopics.com/discussions/databricks/view/272749-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272749-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is tasked with deploying an application that takes advantage of a custom MLflow Pyfunc model to return some interim results. How should they configure the endpoint to pass the secrets and credentials? A. Use spark.conf.set () B. Pass variables using the Databricks Feature Store API C. Add credentials using environment variables D. Pass the secrets in plain text
C. Adding credentials using environment variables is the correct approach. Environment variables allow you to securely pass credentials (like API keys, database passwords, etc.) to your application without hardcoding them into your code. This is the most secure way to manage secrets. Options A and B are not designed for secure credential management. Option D is incorrect as it is never recommended to pass secrets in plain text due to security risks.
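A minimal sketch of reading a credential from an environment variable instead of hardcoding it — the variable name is illustrative, and on Databricks such variables can be configured on the serving endpoint (optionally backed by secret scopes) rather than set in code:

```python
import os

# For local testing only; on the endpoint the variable would be injected
# by the platform, never written into source code
os.environ.setdefault("MY_APP_API_TOKEN", "dummy-value-for-local-testing")

def get_token() -> str:
    # Fail fast with a clear error if the credential was not configured
    token = os.environ.get("MY_APP_API_TOKEN")
    if not token:
        raise RuntimeError("MY_APP_API_TOKEN is not set on the endpoint")
    return token

print(get_token())
```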
45
[https://www.examtopics.com/discussions/databricks/view/272752-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/272752-exam-certified-generative-ai-engineer-associate-topic-1/) A team wants to serve a code generation model as an assistant for their software developers. It should support multiple programming languages. Quality is the primary objective. Which of the Databricks Foundation Model APIs, or models available in the Marketplace, would be the best fit? A. Llama2-70b B. BGE-large C. MPT-7b D. CodeLlama-34B
D. CodeLlama-34B is the best fit because it is specifically designed for code generation and prioritizes quality across multiple programming languages, aligning with the team's primary objective. The other options are not optimized for code generation. Llama2-70b is a general-purpose language model. BGE-large is for embeddings. MPT-7b might be suitable for code, but CodeLlama is more specialized.
46
[https://www.examtopics.com/discussions/databricks/view/302723-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/302723-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Currently, the two columns of their dataframe include the original filename as a string and an array of text chunks from that document. What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search? A. Use PySpark’s autoloader to apply a UDF across all chunks, formatting them in a JSON structure for Vector Search ingestion. B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and enable change feed on the output Delta table. C. Utilize the original filename as the unique identifier and save the dataframe as is. D. Create a unique identifier for each document, flatten the dataframe to one chunk per row and save to an output Delta table.
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and enable change feed on the output Delta table. **Explanation:** Enabling Change Data Feed (CDF) is a critical requirement for Databricks Vector Search ingestion. Without it, the Vector Search index won’t automatically pick up inserts or updates from the Delta table. **Why other options are wrong:** * **A:** While UDFs and JSON structures might be useful for other data processing tasks, they are not directly related to the core requirements for preparing data for Databricks Vector Search ingestion. * **C:** Using the original filename as the unique identifier is not sufficient, as each chunk needs a unique identifier for proper indexing and retrieval in Vector Search. Also, this option does not enable change feed. * **D:** This option is close but misses the key requirement to enable change feed, which is necessary for ingestion into Vector Search. The Delta table needs to have change feed enabled to allow for automatic updates to the Vector Search index.
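A pure-Python analogue of option B's reshaping — explode the chunk array to one row per chunk and give each row its own unique id. (In PySpark this would be `explode()` plus a generated id column; change feed is then enabled on the output Delta table so the Vector Search index can sync incrementally.) The filenames and chunk text are made up:

```python
import uuid

# Input shape: (filename, [chunks]) -- mirrors the two-column dataframe
docs = [
    ("policy.pdf", ["chunk a", "chunk b"]),
    ("handbook.pdf", ["chunk c"]),
]

# Flatten: one dict per chunk, each with its own unique identifier
rows = [
    {"id": str(uuid.uuid4()), "filename": fname, "text": chunk}
    for fname, chunks in docs
    for chunk in chunks
]

print(len(rows))  # one row per chunk
```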
47
[https://www.examtopics.com/discussions/databricks/view/303123-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303123-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is ready to deploy an LLM application written using Foundation Model APIs. They want to follow security best practices for production scenarios. Which authentication method should they choose? A. Use OAuth machine-to-machine authentication B. Use an access token belonging to service principals C. Use an access token belonging to any workspace user D. Use a frequently rotated access token belonging to either a workspace user or a service principal
A. Use OAuth machine-to-machine authentication * **Why A is correct:** OAuth machine-to-machine (M2M) authentication is a widely recognized best practice for cloud platforms due to its enhanced security. It avoids reliance on individual user credentials, which can be compromised, and it's suitable for automated processes. * **Why B is incorrect:** While service principal tokens are better than user tokens, OAuth M2M is a superior method for secure, non-interactive authentication. * **Why C is incorrect:** Using a workspace user's access token is not a security best practice, as the application's access is tied to a specific user and their permissions. If the user leaves or their permissions change, the application could be affected. * **Why D is incorrect:** Rotating tokens is a good practice, but using a frequently rotated token belonging to a workspace user or service principal is still less secure than OAuth M2M. User tokens should be avoided.
48
[https://www.examtopics.com/discussions/databricks/view/303238-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303238-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer has just deployed an LLM application at a manufacturing company that assists with answering customer service inquiries. They need to identify the key enterprise metrics to monitor the application in production. Which is NOT a metric they will implement for their customer service LLM application in production? A. Massive Multi-task Language Understanding (MMLU) score B. Number of customer inquiries processed per unit of time C. Factual accuracy of the response D. Time taken for LLM to generate a response
A. MMLU is NOT a metric to implement. MMLU is a benchmarking score used during LLM pre-training and evaluation, not something you’d monitor in a deployed, real-world production setting. The other options (B, C, and D) are relevant metrics for monitoring a production LLM application as they directly relate to its performance and utility in a real-world customer service scenario.
49
[https://www.examtopics.com/discussions/databricks/view/303249-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303249-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer has created a RAG application which can help employees interpret HR documentation. The prototype application is now working with some positive feedback from internal company testers. Now the Generative AI Engineer wants to formally evaluate the system's performance and understand where to focus their efforts to further improve the system. How should the Generative AI Engineer evaluate the system? A. Use ROUGE score to comprehensively evaluate the quality of the final generated answers. B. Use an LLM-as-a-judge to evaluate the quality of the final answers generated. C. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow's built-in evaluation metrics to perform the evaluation on the retrieval and generation components. D. Benchmark multiple LLMs with the same data and pick the best LLM for the job.
C. Curating a dataset to test the retrieval and generation components separately, and using MLflow's built-in evaluation metrics, is the most effective method. This modular approach allows for targeted debugging and optimization of each component in the RAG system. By isolating variables, the engineer can methodically evaluate each part, leading to a more scientific and effective approach to improving the system's overall performance. The other options are not as comprehensive because they do not allow for the separation of the generation and retrieval components.
50
[https://www.examtopics.com/discussions/databricks/view/303261-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303261-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is developing a RAG system for their company to perform internal document Q&A for structured HR policies, but the answers returned are frequently incomplete and unstructured. It seems that the retriever is not returning all relevant context. The Generative AI Engineer has experimented with different embedding and response generating LLMs but that did not improve results. Which TWO options could be used to improve the response quality? (Choose two.) A. Add the section header as a prefix to chunks B. Split the document by sentence C. Use a larger embedding model D. Increase the document chunk size E. Fine tune the response generation model
AD. * **A is correct:** Adding the section header as a prefix to each chunk gives the retriever additional context, allowing it to better match queries to the correct policy area, boosting retrieval relevance. * **D is correct:** Increasing the document chunk size allows each chunk to carry more contiguous context. This prevents key details from being split across multiple chunks and allows the retriever to return more complete sections. * **B is incorrect:** Splitting the document by sentence might lead to an even more fragmented context and hurt retrieval performance. * **C is incorrect:** The engineer already tried different embedding LLMs, so the size of the embedding model isn't the core issue. * **E is incorrect:** Fine-tuning the response generation model addresses the generation of responses, not the retrieval of relevant context.
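Option A's header-prefixing is a one-line transformation at chunking time; a small sketch (headers and chunk text are made up) shows the idea:

```python
# Prepend each chunk's section header so the embedded text carries its
# policy context -- queries like "vacation accrual" now match on the
# header words as well as the body
sections = {
    "Vacation Policy": ["Employees accrue 1.5 days per month."],
    "Expense Policy": ["Reports are due within 30 days of purchase."],
}

prefixed = [
    f"{header}: {chunk}"
    for header, chunks in sections.items()
    for chunk in chunks
]

for p in prefixed:
    print(p)
```

Combined with larger chunks (option D), this keeps related policy text together so the retriever can return complete sections instead of fragments.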
51
[https://www.examtopics.com/discussions/databricks/view/303262-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303262-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is building a production-ready LLM system which replies directly to customers. The solution makes use of the Foundation Model API via provisioned throughput. They are concerned that the LLM could potentially respond in a toxic or otherwise unsafe way. They also wish to perform this with the least amount of effort. Which approach will do this? A. Ask users to report unsafe responses B. Host Llama Guard on Foundation Model API and use it to detect unsafe responses. C. Add some LLM calls to their chain to detect unsafe content before returning text D. Add a regex expression on inputs and outputs to detect unsafe responses.
B. Host Llama Guard on Foundation Model API and use it to detect unsafe responses. **Explanation:** Hosting Llama Guard on the Foundation Model API provides out-of-the-box toxicity and safety checks without application code changes. It proactively blocks or redacts unsafe content, requiring the least effort compared to custom detection calls or regex rules. Relying on user reports (Option A) is reactive, not proactive. Adding LLM calls (Option C) or regex expressions (Option D) requires more effort and custom implementation.
52
[https://www.examtopics.com/discussions/databricks/view/303264-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303264-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is using an LLM to classify species of edible mushrooms based on text descriptions of certain features. The model is returning accurate responses in testing and the Generative AI Engineer is confident they have the correct list of possible labels, but the output frequently contains additional reasoning in the answer when the Generative AI Engineer only wants to return the label with no additional text. Which action should they take to elicit the desired behavior from this LLM? A. Use few shot prompting to instruct the model on expected output format B. Use zero shot prompting to instruct the model on expected output format C. Use zero shot chain-of-thought prompting to prevent a verbose output format D. Use a system prompt to instruct the model to be succinct in its answer
A. Use few-shot prompting to instruct the model on the expected output format. Few-shot prompting provides the LLM with several examples of inputs paired with the desired, concise output format (just the mushroom label). This teaches the model the pattern you want it to follow, making it more reliable than single zero-shot instructions or generic system messages. The model can see concrete input-output pairs, learning the exact format required. Options B, C, and D are less effective because they don't provide the model with specific examples of the desired output format.
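A few-shot prompt for label-only output can be assembled like the sketch below; the descriptions and labels are illustrative, not real mycology:

```python
# Each shot pairs a description with ONLY the label -- the pattern the
# model is expected to imitate
examples = [
    ("White cap, pink gills, grows in meadows.", "field mushroom"),
    ("Yellow funnel shape, fruity smell, forked ridges.", "chanterelle"),
]

def build_prompt(description: str) -> str:
    shots = "\n".join(f"Description: {d}\nLabel: {l}" for d, l in examples)
    # Ending on a bare "Label:" cues the model to emit just the label
    return f"{shots}\nDescription: {description}\nLabel:"

print(build_prompt("Honeycomb-pitted cap on a hollow stem."))
```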
53
[https://www.examtopics.com/discussions/databricks/view/303266-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303266-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is building a RAG application that answers questions about technology-related news articles. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news. Which approach is NOT advisable for building a RAG application focused on answering technology-only questions? A. Include in the system prompt that the application is not supposed to answer any questions unrelated to technology. B. Filter out irrelevant news articles in the retrieval process. C. Keep all news articles because the RAG application needs to understand non-technological content to avoid answering questions about them. D. Filter out irrelevant news articles in the upstream document database.
C. Keeping all news articles is not advisable because irrelevant content can lead to poor retrieval and potentially derail the model, even if the prompt emphasizes a focus on technology news. This increases noise and decreases the relevancy of the context provided to the model. The other options will filter out the noise, improving the accuracy of the RAG application.
54
[https://www.examtopics.com/discussions/databricks/view/303269-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303269-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is working with a retail company that wants to enhance its customer experience by automatically handling common customer inquiries. They are working on an LLM-powered AI solution that should improve response times while maintaining a personalized interaction. They want to define the appropriate input and LLM task to do this. Which input/output pair will do this? A. Input: Customer service chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions, then respond B. Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary C. Input: Customer reviews; Output: Classify review sentiment D. Input: Customer reviews; Output: Group the reviews by users and aggregate per-user average rating, then respond
A is correct. * **Explanation:** Option A directly addresses the company's goals by enabling both faster and more personalized customer support. Grouping chat logs by users and summarizing their interactions allows for a personalized response. * **Why other options are wrong:** * B lacks the personalization aspect, focusing on similar questions rather than individual user history. * C and D focus on customer reviews instead of customer support interactions. * C and D do not group by user, which is critical for personalized service.
55
[https://www.examtopics.com/discussions/databricks/view/303270-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/303270-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLM’s on this task and pick the best one. They have an evaluation set with known high quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric. Which metric should they choose for this evaluation? A. BLEU metric B. NDCG metric C. ROUGE metric D. RECALL metric
A. BLEU (Bilingual Evaluation Understudy) is the correct metric. It is specifically designed to evaluate the quality of machine-translated text by comparing it to one or more reference translations. * **Why A is correct:** BLEU is explicitly designed for translation tasks, measuring the similarity between the generated translation and reference translations. * **Why B is wrong:** NDCG (Normalized Discounted Cumulative Gain) is used for ranking tasks, not translation. * **Why C is wrong:** ROUGE is primarily used for text summarization, not translation. * **Why D is wrong:** Recall, while a general evaluation metric, doesn't provide a specific measure tailored to translation quality like BLEU.
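A simplified, illustrative BLEU variant shows the two ingredients the metric combines — clipped n-gram precision and a brevity penalty. Real BLEU averages 1- through 4-gram precisions; this sketch uses unigrams only:

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str) -> float:
    cand = candidate.lower().split()
    ref = reference.lower().split()
    ref_counts = Counter(ref)
    # Clip each candidate token's count by its count in the reference,
    # so repeating a correct word cannot inflate the score
    clipped = sum(min(c, ref_counts[t]) for t, c in Counter(cand).items())
    precision = clipped / len(cand) if cand else 0.0
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(round(simple_bleu("the cat sits on the mat",
                        "the cat sat on the mat"), 2))  # 5 of 6 unigrams match
```

Libraries such as NLTK and sacrebleu provide the full multi-n-gram BLEU used in practice for benchmarking translation models.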
56
[ExamTopics URL](https://www.examtopics.com/discussions/databricks/view/303306-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is building a RAG application for answering employee questions on company policies. What are the steps needed to build this RAG application and deploy it? A. Ingest documents from a source -> Index the documents and saves to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> Evaluate model -> LLM generates a response -> Deploy it using Model Serving B. User submits queries against an LLM -> Ingest documents from a source -> Index the documents and save to Vector Search -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving C. Ingest documents from a source -> Index the documents and save to Vector Search -> Evaluate model -> Deploy it using Model Serving -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response D. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving
D. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving * **Why D is correct:** The correct sequence for building and deploying a RAG application involves ingesting and indexing documents, allowing users to submit queries, the LLM retrieving relevant documents and generating a response, evaluating the response, and finally deploying the model. Evaluation must happen after response generation. * **Why other options are wrong:** * **A:** Evaluation should occur after the LLM generates a response, not before. * **B:** Document ingestion and indexing must precede any user queries. * **C:** Evaluation and deployment cannot happen before user queries and response generation; these steps rely on having data and a model to evaluate.
57
[https://www.examtopics.com/discussions/databricks/view/304026-exam-certified-generative-ai-engineer-associate-topic-1/](https://www.examtopics.com/discussions/databricks/view/304026-exam-certified-generative-ai-engineer-associate-topic-1/) A Generative AI Engineer is building an LLM to generate article headlines given the article content. However, the initial output from the LLM does not match the desired tone or style. Which approach would be most effective for adjusting the LLM’s response to achieve the desired response? A. Exclude any article headlines that do not match the desired output B. Fine-tune the LLM on a dataset of desired tone and style C. Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style D. All of the above
D. All of the above * **Why D is correct:** All three options can help adjust the LLM's output. Excluding unwanted headlines filters results, fine-tuning directly trains the model on the desired style, and prompt engineering guides the generation process. * **Why A alone is insufficient:** Filtering is reactive, not proactive; it discards bad outputs but does not change what the LLM generates. * **Why B alone is insufficient:** Fine-tuning can be effective, but it is resource-intensive and time-consuming, and may be overkill if prompt engineering can achieve the desired results. * **Why C alone is insufficient:** Prompt engineering is a quick, efficient way to steer tone and style; while effective on its own, it is strengthened when combined with filtering and fine-tuning.