RAG Systems, LLM Pipelines, and LangChain Flashcards
(7 cards)
What is the RAG pipeline architecture?
Ingestion → Embedding → Vector DB → Retrieval → Prompt Injection → LLM Response.
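A minimal end-to-end sketch of these stages using LangChain; the package names, model names, and FAISS backend are illustrative assumptions (requires faiss-cpu and langchain-openai installed and OPENAI_API_KEY set):

```python
# End-to-end RAG sketch (package/model choices are assumptions, not the only option).
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

docs = ["RAG injects retrieved context into the prompt.",
        "Chunks are embedded and stored in a vector DB."]
vector_db = FAISS.from_texts(docs, OpenAIEmbeddings())            # ingest + embed + store
hits = vector_db.similarity_search("What does RAG inject?", k=1)  # retrieve
context = "\n".join(d.page_content for d in hits)
answer = ChatOpenAI(model="gpt-4o").invoke(                       # prompt injection + response
    f"Answer using this context:\n{context}\n\nQuestion: What does RAG inject?")
print(answer.content)
```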
What is the ingestion stage of the RAG pipeline?
- Building the knowledge base.
- Raw documents (PDFs, websites, databases, files) are processed.
- Documents are cleaned and chunked (e.g. into 500-1000 token chunks) to optimize retrieval granularity (see the chunking sketch below).
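A minimal chunking sketch using LangChain's recursive text splitter; the chunk sizes and file name are assumptions, and note that chunk_size here counts characters by default, not tokens:

```python
# Chunking sketch (chunk sizes and file name are assumptions).
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,     # measured in characters by default, not tokens
    chunk_overlap=100,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_text(open("document.txt").read())
print(f"{len(chunks)} chunks")
```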
What is the Embedding Stage of the RAG pipeline?
Each chunk is passed through an embedding model to generate embeddings,
e.g. an OpenAI embedding model.
- Embedding converts textual information into high-dimensional vectors that capture semantic meaning (a minimal sketch follows below).
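A minimal embedding sketch, assuming the langchain-openai package; the model name is an example, and the 1536-dimension figure applies to that particular model:

```python
# Embedding sketch (model name is an example).
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-3-small")
vectors = embedder.embed_documents(["chunk one", "chunk two"])  # one vector per chunk
print(len(vectors), len(vectors[0]))  # e.g. 2 chunks x 1536 dimensions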
What is a vector database? What does it do in our RAG pipeline?
- The embeddings are stored in a vector database.
- The database allows fast similarity search using metrics like cosine similarity or dot product.
- It may also store metadata for filtering during retrieval (a toy example follows below).
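A toy in-memory "vector DB" illustrating what the database does under the hood; production systems (e.g. FAISS, Pinecone, Chroma) add indexing structures for speed at scale:

```python
# Toy in-memory vector store with cosine-similarity search (illustrative only).
import numpy as np

class ToyVectorDB:
    def __init__(self):
        self.vectors, self.chunks, self.metadata = [], [], []

    def add(self, vector, chunk, meta=None):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.chunks.append(chunk)
        self.metadata.append(meta or {})

    def search(self, query_vec, k=3):
        q = np.asarray(query_vec, dtype=float)
        # Cosine similarity = dot product of the vectors over the product of their norms.
        sims = [float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q))) for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]  # indices of the k most similar chunks
        return [(self.chunks[i], sims[i], self.metadata[i]) for i in top]
```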
What is the retrieval stage of our RAG pipeline?
When the user submits a query:
- The query is embedded using the same embedding model.
- The embedded query vector is used to retrieve the top-k most semantically similar document chunks from the vector DB (see the retrieval sketch below).
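A retrieval sketch, assuming a `vector_db` built with LangChain's FAISS integration as in the end-to-end card above; the k value and query text are illustrative:

```python
# Retrieval sketch: the query is embedded with the same model used for the chunks,
# then the top-k most similar chunks are returned.
retriever = vector_db.as_retriever(search_kwargs={"k": 4})  # top-4 chunks
top_chunks = retriever.invoke("How does retrieval work in RAG?")
for doc in top_chunks:
    print(doc.page_content[:80])
```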
What is the prompt injection phase of our RAG pipeline?
The retrieved documents are formatted and injected into the prompt for the LLM.
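A prompt-injection sketch; the template wording is an assumption, not a fixed convention, and `top_chunks` comes from the retrieval card above:

```python
# Prompt-injection sketch (template wording is an assumption).
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
context = "\n\n".join(doc.page_content for doc in top_chunks)
messages = prompt.invoke({"context": context, "question": "How does retrieval work in RAG?"})
```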
What is the final response of our RAG pipeline?
The final prompt is passed to the LLM (e.g. GPT-4o, Claude, or Gemini).
The LLM generates a response conditioned on both:
- The original user query.
- The retrieved external knowledge injected into the prompt.
Output may be post-processed for citations, filtering, or user feedback collection.
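A response sketch continuing from the prompt-injection card; the model name and temperature are illustrative:

```python
# Final response sketch (model name is illustrative).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
response = llm.invoke(messages)  # `messages` from the prompt-injection card
print(response.content)          # post-process for citations, filtering, or feedback
```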