RAG Systems, LLM Pipelines, and LangChain Flashcards
(7 cards)
What is the RAG pipeline architecture?
Ingestion → Embedding → Vector DB → Retrieval → Prompt Injection → LLM Response.
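A minimal end-to-end sketch of these stages using LangChain; the package names, model names, and FAISS backend are illustrative assumptions (requires faiss-cpu and langchain-openai installed and OPENAI_API_KEY set):

```python
# End-to-end RAG sketch (package/model choices are assumptions, not the only option).
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

docs = ["RAG injects retrieved context into the prompt.",
        "Chunks are embedded and stored in a vector DB."]
vector_db = FAISS.from_texts(docs, OpenAIEmbeddings())            # ingest + embed + store
hits = vector_db.similarity_search("What does RAG inject?", k=1)  # retrieve
context = "\n".join(d.page_content for d in hits)
answer = ChatOpenAI(model="gpt-4o").invoke(                       # prompt injection + response
    f"Answer using this context:\n{context}\n\nQuestion: What does RAG inject?")
print(answer.content)
```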
What is the ingestion stage of the RAG pipeline?
- Building the knowledge base.
- Raw documents (PDFs, websites, databases, files) are processed.
- Documents are cleaned and chunked (e.g. into 500-1000 token chunks) to optimize retrieval granularity (see the chunking sketch below).
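A minimal chunking sketch using LangChain's recursive text splitter; the chunk sizes and file name are assumptions, and note that chunk_size here counts characters by default, not tokens:

```python
# Chunking sketch (chunk sizes and file name are assumptions).
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,     # measured in characters by default, not tokens
    chunk_overlap=100,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_text(open("document.txt").read())
print(f"{len(chunks)} chunks")
```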
What is the Embedding Stage of the RAG pipeline?
Each chunk is passed through an embedding model to generate embeddings,
e.g. an OpenAI embedding model.
- Embedding converts textual information into high-dimensional vectors that capture semantic meaning (a minimal sketch follows below).
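A minimal embedding sketch, assuming the langchain-openai package; the model name is an example, and the 1536-dimension figure applies to that particular model:

```python
# Embedding sketch (model name is an example).
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-3-small")
vectors = embedder.embed_documents(["chunk one", "chunk two"])  # one vector per chunk
print(len(vectors), len(vectors[0]))  # e.g. 2 chunks x 1536 dimensions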
What is a vector database? What does it do in our RAG pipeline?
- The embeddings are stored in a vector database.
- The database allows fast similarity search using metrics like cosine similarity or dot product.
- It may also store metadata for filtering during retrieval (a toy example follows below).
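A toy in-memory "vector DB" illustrating what the database does under the hood; production systems (e.g. FAISS, Pinecone, Chroma) add indexing structures for speed at scale:

```python
# Toy in-memory vector store with cosine-similarity search (illustrative only).
import numpy as np

class ToyVectorDB:
    def __init__(self):
        self.vectors, self.chunks, self.metadata = [], [], []

    def add(self, vector, chunk, meta=None):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.chunks.append(chunk)
        self.metadata.append(meta or {})

    def search(self, query_vec, k=3):
        q = np.asarray(query_vec, dtype=float)
        # Cosine similarity = dot product of the vectors over the product of their norms.
        sims = [float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q))) for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]  # indices of the k most similar chunks
        return [(self.chunks[i], sims[i], self.metadata[i]) for i in top]
```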
What is the retrieval stage of our RAG pipeline?
When the user submits a query:
- The query is embedded using the same embedding model.
- The embedded query vector is used to retrieve the top-k most semantically similar document chunks from the vector DB (see the retrieval sketch below).
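A retrieval sketch, assuming a `vector_db` built with LangChain's FAISS integration as in the end-to-end card above; the k value and query text are illustrative:

```python
# Retrieval sketch: the query is embedded with the same model used for the chunks,
# then the top-k most similar chunks are returned.
retriever = vector_db.as_retriever(search_kwargs={"k": 4})  # top-4 chunks
top_chunks = retriever.invoke("How does retrieval work in RAG?")
for doc in top_chunks:
    print(doc.page_content[:80])
```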
What is the prompt injection phase of our RAG pipeline?
The retrieved documents are formatted and injected into the prompt for the LLM.
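A prompt-injection sketch; the template wording is an assumption, not a fixed convention, and `top_chunks` comes from the retrieval card above:

```python
# Prompt-injection sketch (template wording is an assumption).
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
context = "\n\n".join(doc.page_content for doc in top_chunks)
messages = prompt.invoke({"context": context, "question": "How does retrieval work in RAG?"})
```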
What is the final response of our RAG pipeline?
The final prompt is passed to the LLM (e.g. GPT-4o, Claude, or Gemini).
The LLM generates a response conditioned on both:
- The original user query.
- The retrieved external knowledge injected into the prompt.
Output may be post-processed for citations, filtering, or user feedback collection.
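A response sketch continuing from the prompt-injection card; the model name and temperature are illustrative:

```python
# Final response sketch (model name is illustrative).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
response = llm.invoke(messages)  # `messages` from the prompt-injection card
print(response.content)          # post-process for citations, filtering, or feedback
```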