Vector Database Concepts Flashcards

(49 cards)

1
Q

What is a vector database?

A

A specialized database designed to store and query high-dimensional vector embeddings.

2
Q

What is a vector embedding?

A

A numeric representation of data (text, images, etc.) in a high-dimensional space.

3
Q

Why use vector databases?

A

To perform efficient similarity search across large sets of embeddings.

4
Q

What is a similarity search?

A

Finding items that are most similar to a given query vector.

5
Q

What is cosine similarity?

A

A metric that measures the cosine of the angle between two vectors. It ranges from -1 (opposite directions) to 1 (same direction) and ignores vector magnitude.
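
As a worked illustration of this card, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the sample vectors are assumptions made for the example, not part of any particular database's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative inputs: one pair pointing the same way, one pair at right angles.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Here `same_direction` comes out at approximately 1 even though the second vector is twice as long, which is the sense in which the metric ignores magnitude.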

6
Q

What is Euclidean distance?

A

A metric that measures the straight-line distance between two points in space.
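
A minimal sketch of the formula behind this card, assuming plain Python lists as vectors. The function name is illustrative, and the standard library's `math.dist` computes the same value.

```python
import math

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative 3-4-5 right triangle.
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```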

7
Q

What is dot product similarity?

A

A similarity metric based on the dot product of two vectors.
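
The sketch below illustrates the card with made-up vectors. It also shows one property worth remembering: unlike cosine similarity, the dot product grows with vector length. The names used are assumptions for the example.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 1.0]
short_vector = [1.0, 1.0]  # same direction as the query
long_vector = [3.0, 3.0]   # same direction, three times the length

score_short = dot_product(query, short_vector)
score_long = dot_product(query, long_vector)
```

Both candidates point in the same direction as the query, yet the longer one scores three times higher.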

8
Q

What is an ANN index?

A

Approximate Nearest Neighbor index used to speed up similarity search.

9
Q

What is brute-force search?

A

A similarity search method that compares the query against every stored vector. It returns exact results but becomes slow as the dataset grows.
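
A minimal sketch of exhaustive search, assuming Euclidean distance and a small in-memory list. The function name `brute_force_search` and the data are hypothetical.

```python
import math

def brute_force_search(query, vectors, k):
    """Return the indices of the k stored vectors closest to the query."""
    # One distance computation per stored vector: exact, but O(n) per query.
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    distances.sort()
    return [i for _, i in distances[:k]]

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = brute_force_search([1.0, 1.0], vectors, k=2)
```

The cost scales linearly with the number of stored vectors, which is the motivation for approximate indexes.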

10
Q

What is HNSW?

A

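Hierarchical Navigable Small World: a popular graph-based ANN algorithm that searches a multi-layer proximity graph, moving from sparse upper layers to dense lower ones.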

11
Q

What is IVF in vector search?

A

Inverted File index: partitions vectors into clusters so that a query only searches the clusters closest to it, rather than the whole dataset.
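
The partitioning idea can be sketched as a toy example. It assumes the centroids are already known (real systems typically learn them, for instance with k-means), and the names `build_ivf`, `ivf_search`, and `nprobe` are used here for illustration only.

```python
import math

def build_ivf(vectors, centroids):
    """Assign each vector index to the partition of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]

# Assumption: two hand-picked centroids stand in for trained ones.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.5, 0.2], [9.5, 10.1], [1.0, 1.0], [10.0, 9.0]]
lists = build_ivf(vectors, centroids)
result = ivf_search([0.8, 0.9], vectors, centroids, lists, k=2, nprobe=1)
```

With `nprobe=1` only half of the vectors are compared against the query. Raising `nprobe` scans more partitions, trading speed for accuracy.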

12
Q

What is PQ (Product Quantization)?

A

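A compression technique that splits each vector into subvectors and replaces each subvector with the ID of its nearest codebook centroid, reducing memory use and speeding up similarity search.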
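
A toy sketch of the encode and decode steps, assuming tiny hand-written codebooks (in practice they are learned from data). All names and values are hypothetical.

```python
import math

def pq_encode(vector, codebooks):
    """Encode a vector as one centroid index per subvector."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        # Keep only the index of the closest centroid in this sub-codebook.
        codes.append(min(range(len(codebook)),
                         key=lambda c: math.dist(sub, codebook[c])))
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx

# Assumption: two sub-codebooks with two 2-d centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = pq_encode([0.9, 1.1, 0.8, 0.1], codebooks)
approx = pq_decode(codes, codebooks)
```

The four floats are stored as two small integers, and decoding recovers only an approximation of the original vector, which is the source of both the memory savings and the accuracy loss.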

13
Q

What is Faiss?

A

An open-source library developed by Facebook AI Research (now part of Meta) for efficient similarity search and clustering of dense vectors.

14
Q

What is Milvus?

A

An open-source vector database for scalable similarity search.

15
Q

What is Weaviate?

A

An open-source vector search engine with integrated machine learning capabilities, such as modules that generate embeddings from raw data.

16
Q

What is Pinecone?

A

A fully managed vector database service for production-grade applications.

17
Q

What is Qdrant?

A

An open-source vector search engine focused on performance and reliability.

18
Q

What is Vespa?

A

A platform for serving vector search and recommendation systems.

19
Q

What is vector dimensionality?

A

The number of features or coordinates in each embedding vector.

20
Q

What is vector normalization?

A

Scaling vectors to unit length so that similarity scores depend on direction rather than magnitude.
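
A minimal sketch of L2 normalization in plain Python. The function name `normalize` is an assumption for the example.

```python
import math

def normalize(v):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

unit = normalize([3.0, 4.0])
length = math.sqrt(sum(x * x for x in unit))
```

Once all vectors are unit length, the dot product of two vectors equals their cosine similarity.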

21
Q

What is index building?

A

The process of preparing vectors for fast similarity search.

22
Q

What is index probing?

A

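Querying an ANN index to find the nearest vectors. In partitioned indexes such as IVF, the number of partitions probed per query controls the balance between accuracy and speed.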

23
Q

What is filtering in vector search?

A

Restricting search to vectors that meet certain metadata criteria.
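
One way to picture this is pre-filtering: drop records that fail the metadata criteria, then rank the rest. The record layout and the `filtered_search` helper below are hypothetical, and real systems may apply the filter during or after the index search instead.

```python
import math

# Assumption: each record carries an id, a vector, and a metadata dict.
records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"lang": "en"}},
]

def filtered_search(query, records, k, **criteria):
    """Rank only the records whose metadata matches every criterion."""
    allowed = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in criteria.items())
    ]
    allowed.sort(key=lambda r: math.dist(query, r["vector"]))
    return [r["id"] for r in allowed[:k]]

hits = filtered_search([1.0, 0.1], records, k=2, lang="en")
```

Record `b` is closer to the query than `c`, but it is excluded because its `lang` value does not match.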

24
Q

What is metadata in vector databases?

A

Structured attributes associated with each vector for filtering.

25
What is hybrid search?
Combining keyword-based search with vector similarity search.
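
The card does not say how the two result lists are combined. One commonly used option is reciprocal rank fusion (RRF), sketched below with made-up document IDs. The constant `k=60` is a conventional default, assumed here rather than taken from the deck.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists by summing 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a keyword engine and a vector index.
keyword_ranking = ["doc2", "doc1", "doc3"]
vector_ranking = ["doc1", "doc3", "doc2"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

RRF uses only rank positions, so the keyword and vector scores do not need to be on the same scale.
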
26
What is semantic search?
Search based on meaning and context rather than exact keywords.
27
What is multimodal search?
Search that supports multiple input types, like text and images.
28
What is a collection in a vector database?
A logical grouping of vectors that serve a similar purpose.
29
What is a vector store?
A system or component that holds and manages vector embeddings.
30
What is vector recall?
The proportion of relevant vectors successfully retrieved.
31
What is vector precision?
The proportion of retrieved vectors that are relevant.
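
Recall and precision can be computed side by side from a retrieved set and a relevant set. The sketch below uses made-up IDs, and the helper name is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

Two of the four retrieved items are relevant (precision 0.5), and two of the three relevant items were found (recall of about 0.67).
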
32
What is embedding drift?
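When the meaning of embeddings changes because the model was retrained or replaced, so that vectors produced by the old and new versions are no longer directly comparable.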
33
What is index rebalancing?
Redistributing data in an index for optimal performance.
34
What is sharding in vector databases?
Splitting data across multiple machines for scalability.
35
What is replication in vector databases?
Copying data across nodes for fault tolerance.
36
What is vector ingestion?
The process of adding new embeddings to the vector database.
37
What is vector deletion?
Removing vectors from a vector database.
38
What is vector update?
Replacing or modifying vectors in the database.
39
What is cold start in vector search?
Performance issues, such as slow initial queries, that occur when the index is not yet warmed up.
40
What is the ANN accuracy vs. speed tradeoff?
Faster searches may be less accurate due to approximation.
41
What is latency in vector search?
The time it takes to return search results.
42
What is throughput in vector search?
The number of queries a system can handle per second.
43
What is batch querying?
Submitting multiple queries at once for efficiency.
44
What is vector quantization?
Compressing vectors for faster processing and reduced memory usage.
45
What is a vector similarity threshold?
A cutoff value to decide which vectors are similar enough.
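
A minimal sketch of applying such a cutoff to scored results, assuming higher scores mean more similar. The scores and the 0.7 cutoff are invented for illustration.

```python
def apply_threshold(scored, min_score):
    """Keep only (id, score) pairs whose score meets the cutoff."""
    return [(item_id, score) for item_id, score in scored if score >= min_score]

# Hypothetical similarity scores returned by a search.
scored = [("a", 0.92), ("b", 0.75), ("c", 0.40)]
kept = apply_threshold(scored, min_score=0.7)
```

For distance metrics, where lower means more similar, the comparison would be reversed.
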
46
What is top-k retrieval?
Returning the top k most similar vectors for a given query.
47
What is dynamic indexing?
Updating the index as new data is added or removed.
48
What is recall@k?
The fraction of relevant items found in the top-k results.
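
In ANN evaluation, the relevant items are often taken to be the true k nearest neighbors found by exact search. The sketch below follows that convention as an assumption, with made-up IDs.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors present in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("k must be positive and exact_ids must be non-empty")
    found = sum(1 for item_id in approx_ids[:k] if item_id in truth)
    return found / len(truth)

# Hypothetical results: the ANN index returned 9 instead of the true neighbor 2.
value = recall_at_k([4, 7, 1, 9], [4, 1, 2, 7], k=4)
```

Order within the top k does not matter here. Only membership is counted.
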
49
What is ANN benchmarking?
Evaluating accuracy and performance of approximate nearest neighbor algorithms.