ML and Gen AI Refresh Flashcards
(97 cards)
What kind of database does RAG use?
Vector database
What is a vector database
It is a database that is designed to index, store, and query data in a vector format (e.g. an n-dimensional vector embedding).
How does a vector database work?
Works by querying for the k vectors closest to a query vector under a distance metric like cosine similarity, dot product, etc. Instead of exact k-nearest neighbors (kNN), we typically use approximate nearest neighbors (ANN). ANN diminishes recall (it may drop some documents that are in fact similar, i.e. false negatives) but is much more performant from a computational perspective.
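The exact (brute-force) version of this search can be sketched in a few lines; real vector databases replace the full scan below with an ANN index (e.g. HNSW) to trade a little recall for speed. The toy 2-D vectors are made up for illustration.

```python
import numpy as np

def cosine_top_k(query, vectors, k=2):
    """Exact k-NN search by cosine similarity (brute force over all rows).
    A vector database would answer the same query with an ANN index instead."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                   # cosine similarity of each stored vector to the query
    return np.argsort(-sims)[:k]  # indices of the k most similar vectors

docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(cosine_top_k(np.array([1.0, 0.0]), docs, k=2))  # → [0 1]
```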
Why would I want to optimize recall?
We optimize recall if we do not want many false negatives, e.g. telling someone they don’t have an STD when they do.
Why would I want to optimize precision?
We optimize precision if we want to ensure that we don’t get a lot of false positives, e.g. telling someone they have cancer when they don’t.
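The two metrics fall out of the same confusion-matrix counts. A minimal sketch with made-up counts for a toy screening test:

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of everything we flagged, how much was actually positive?
    recall = tp / (tp + fn)     # of every real positive, how many did we catch?
    return precision, recall

# Toy screening test: 80 true positives, 20 false alarms, 5 missed cases.
p, r = precision_recall(tp=80, fp=20, fn=5)
print(round(p, 2), round(r, 2))  # → 0.8 0.94
```

High recall means few missed positives (the STD example); high precision means few false alarms (the cancer example).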
What are some issues with vector queries
There aren’t many great algorithms that can efficiently guarantee finding the exact k-nearest neighbors of a given vector. Hence we typically opt for ANN, which trades some accuracy for efficiency.
What is zero-shot prompting?
Zero-shot prompting can be thought of like you’re asking someone to solve a problem with no context and hoping they get the right answer.
What is few-shot prompting?
Few-shot prompting can be thought of like you’re adding some context (examples) to help the person solve the problem.
What are some limitations of few-shot prompting?
They’re not great at dealing with complex reasoning tasks. For those we need the model to work through intermediate reasoning, as in Chain-of-Thought (CoT) prompting.
What is CoT Prompting?
CoT prompting is a Q-A-Q-A answering technique that aims to get to the right answer by breaking the reasoning out into steps. For example, you could ask a simple math problem, get the answer back, and then ask a follow-up question that builds on the first; the model answers it more reliably than if you had asked the harder question upfront.
How does CoT come into play with zero-shot?
You can combine zero-shot prompting with CoT by simply adding “Let’s think step by step” to the prompt.
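Side by side, the three prompting styles are just different prompt strings. The questions and wording below are made-up examples, not a real API call:

```python
# Plain zero-shot: question only, no examples, no reasoning cue.
zero_shot = "Q: A train travels 60 km in 1.5 hours. What is its speed?\nA:"

# Few-shot: prepend one or more worked examples as context.
few_shot = (
    "Q: A car travels 100 km in 2 hours. What is its speed?\nA: 50 km/h\n"
    "Q: A train travels 60 km in 1.5 hours. What is its speed?\nA:"
)

# Zero-shot CoT: the same zero-shot prompt plus a step-by-step reasoning cue.
zero_shot_cot = zero_shot + " Let's think step by step."

print(zero_shot_cot)
```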
What are chains in LangChain
They are a sequence of calls, whether to an LLM, tool, or a data preprocessing step.
Why do we use chains?
Chains allow you to go beyond just a single API call to a language model and instead chain together multiple calls in a logical sequence.
Give me an example of the input for chains.
A prompt and a model (LLM).
Give me an example of the input for chain.run
query and text, where query is the base prompt and text is the input that will be passed through the chain.
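A minimal sketch of what a chain like this does, assuming a legacy LangChain-style `LLMChain` interface. The LLM here is a stub so the example runs standalone without API keys; the class and variable names are illustrative, not the real library:

```python
class StubLLM:
    """Stand-in for a real LLM client; just echoes part of the prompt."""
    def __call__(self, prompt: str) -> str:
        return f"[model answer for: {prompt[:40]}...]"

class Chain:
    """One 'link': fill a prompt template, then call the model."""
    def __init__(self, llm, template: str):
        self.llm, self.template = llm, template

    def run(self, **inputs) -> str:
        return self.llm(self.template.format(**inputs))

# `query` is the base prompt; `text` is what flows into the chain.
chain = Chain(StubLLM(), template="{query}\n\nText: {text}")
print(chain.run(query="Summarize the text below.",
                text="RAG pairs retrieval with an LLM."))
```

Chaining several such links means each step’s output becomes the next step’s input.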
What is the main architecture powering Foundational Models
The Transformer architecture. Essentially, it enables parallel training of gigantic neural networks with billions of parameters.
What is an encoder-only architecture?
BERT is an example of this. Essentially it only contains the encoder piece and transforms the text into its vector representation.
Explain to me the transformer architecture
Reference: https://blue-season.github.io/transformer-in-5-minutes/
What is a decoder-only architecture?
GPT-3 is an example. It contains only the decoder and extends the input text sequence by generating continuations. Used for text completion and generation.
What is an encoder-decoder architecture?
Contains both an encoder and a decoder. The decoder consumes the encoded embeddings to generate output text. This can be used for text-to-text tasks, e.g. translation.
What makes Foundational Models different?
Scale, architecture, pretraining, customization, versatility, infrastructure.
What are some different types of FMs
Language, Computer Vision, Generative model, Multimodal
Walk me through a basic RAG architecture
Let’s break out what is in RAG. At an extremely high level, RAG consists of four important things:
1. The question (user)
2. The external knowledge database (the library of current and or relevant knowledge)
3. The retriever (the librarian tasked to get documents related to that task and return it to better help the user answer their question)
4. A really smart, but also out-of-date or out-of-touch, model (the LLM)
So here’s what happens:
1. The librarian takes the user’s question and finds documents in the library that are relevant to that question.
2. Those relevant documents are then added to the user’s question to help the LLM answer it far more accurately.
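The two steps above can be sketched as a toy retrieve-then-augment loop. The hand-made 2-D “embeddings,” document titles, and prompt wording are all assumptions for illustration; a real system would use a learned embedding model and a vector database:

```python
import numpy as np

# The "library": documents with pretend 2-D embeddings.
library = {
    "How to make hot chocolate": np.array([1.0, 0.1]),
    "How to bake a cake":        np.array([0.2, 1.0]),
}

def retrieve(question_vec, k=1):
    """The 'librarian': return the k documents most similar to the question."""
    def cosine(v):
        return float(question_vec @ v) / (np.linalg.norm(question_vec) * np.linalg.norm(v))
    scored = sorted(library.items(), key=lambda kv: -cosine(kv[1]))
    return [title for title, _ in scored[:k]]

question_vec = np.array([0.9, 0.2])  # pretend-embedding of the user's question
context = retrieve(question_vec)
# Stuff the retrieved documents into the prompt alongside the question.
prompt = f"Context: {context}\nQuestion: How do I make hot chocolate?"
print(context)  # → ['How to make hot chocolate']
```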
What should all RAGs integrate? NNIC
Noise Robustness
* Ability to handle noisy or irrelevant data contained in the retrieved documents.
- Ability to be like, hey, these documents I pulled are totally irrelevant to the question the user is asking. E.g. I want to know how to make hot chocolate, here are some documents about how to bake a cake :(.
Negative Rejection
* Reject (decline to answer) if the system lacks sufficient knowledge, i.e. the LLM gave a poor answer and/or our database returned poor documents because it has nothing related to, say, a very nuanced question on how to formalize a university-grade class on underwater basket weaving.
Information Integration
* Ability to integrate information from multiple sources to answer more complex questions. E.g. think of our library not limited to just English, but science, music, and dare I say information even about yourself!
Counterfactual Robustness
* Ability to identify and flag known factual errors in the retrieved documents instead of repeating them.
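Negative rejection, for instance, can be approximated with a simple guard on retrieval quality. A minimal sketch; the threshold value and function names are assumptions for illustration:

```python
# If even the best retrieved document scores below a similarity threshold,
# decline to answer rather than letting the LLM guess from irrelevant context.

def answer_or_reject(best_score: float, best_doc: str, threshold: float = 0.5) -> str:
    if best_score < threshold:
        return "I don't have enough information to answer that."
    return f"Answering using: {best_doc}"

print(answer_or_reject(0.9, "hot-chocolate recipe"))  # → Answering using: hot-chocolate recipe
print(answer_or_reject(0.1, "cake recipe"))           # → I don't have enough information to answer that.
```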