What does a Database do?
Primarily 2 things: stores and retrieves data
What are the 2 main philosophies for primary data storage?
1. Log-structured (append-only)
2. Update in place
Log
An append-only sequence of records
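A minimal sketch of the idea, assuming a hypothetical one-record-per-line `key,value` text format (the helper names and file layout are illustrative, not taken from any particular database). Writes only ever append; a lookup scans the file and the last matching record wins.

```python
import os
import tempfile

def append_record(path, key, value):
    """Append one 'key,value' record to the end of the file; return its byte offset."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(f"{key},{value}\n".encode("utf-8"))
    return offset

def scan_for_key(path, key):
    """Scan the whole log; the most recently appended record for the key wins."""
    result = None
    with open(path, "rb") as f:
        for line in f:
            k, _, v = line.decode("utf-8").rstrip("\n").partition(",")
            if k == key:
                result = v
    return result

log_path = os.path.join(tempfile.mkdtemp(), "data.log")
append_record(log_path, "user:1", "alice")
append_record(log_path, "user:2", "bob")
append_record(log_path, "user:1", "alicia")  # newer record supersedes the older one
latest = scan_for_key(log_path, "user:1")
```

Appending is cheap, but every lookup here reads the whole file, which is the cost an index is meant to remove.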
Index
Additional metadata to help locate specific data
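One simple form this metadata can take is an in-memory hash map from each key to the byte offset of its latest record in the log. The sketch below assumes that design; `IndexedLog` is a hypothetical name, and an `io.BytesIO` buffer stands in for the append-only file.

```python
import io

class IndexedLog:
    def __init__(self):
        self.log = io.BytesIO()  # stands in for an append-only file on disk
        self.index = {}          # key -> byte offset of the latest record for that key

    def put(self, key, value):
        offset = self.log.seek(0, io.SEEK_END)  # always write at the end
        self.log.write(f"{key},{value}\n".encode("utf-8"))
        self.index[key] = offset                # the index is updated on every write

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)                   # jump straight to the record, no scan
        line = self.log.readline().decode("utf-8").rstrip("\n")
        _, _, value = line.partition(",")
        return value

db = IndexedLog()
db.put("user:1", "alice")
db.put("user:2", "bob")
db.put("user:1", "alicia")
```

The index speeds up reads but adds work to every write, which is the usual trade-off when adding indexes.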
SSTable
Sorted String Table
- A file in which data is sorted by key
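Because the data is sorted by key, a lookup can use binary search and a range query can read one contiguous run. A small sketch, assuming the SSTable is modelled as an in-memory sorted list of `(key, value)` pairs (a real SSTable is a file, usually with a sparse index on top; the function names are illustrative):

```python
import bisect

# An SSTable modelled as (key, value) pairs sorted by key, with unique keys.
sstable = [("apple", 1), ("banana", 2), ("cherry", 3), ("mango", 4)]

def sstable_get(table, key):
    """Binary-search for the key; (key,) sorts just before any (key, value)."""
    i = bisect.bisect_left(table, (key,))
    if i < len(table) and table[i][0] == key:
        return table[i][1]
    return None

def sstable_range(table, start, end):
    """Return all pairs with start <= key < end as one contiguous slice."""
    lo = bisect.bisect_left(table, (start,))
    hi = bisect.bisect_left(table, (end,))
    return table[lo:hi]
```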
LSM-tree
Log-structured Merge Tree
Fundamental idea
What is the DB write path when using SSTables / LSM-trees?
What is the DB read path for a DB implemented with SSTables / LSM-trees?
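A rough sketch of both paths as they are commonly described: writes go to an in-memory table (the memtable), which is flushed to a new sorted segment once it grows past a threshold; reads check the memtable first, then the segments from newest to oldest. `TinyLSM` and `memtable_limit` are hypothetical names, segments are kept in memory rather than on disk, and the write-ahead log and Bloom filters that real engines typically add are left out.

```python
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # recent writes, held in memory
        self.sstables = []      # flushed segments, oldest first; each sorted by key
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Write path: update the memtable, flush when it reaches the limit.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted segment (an SSTable).
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Read path: memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # reaches the limit, so the memtable is flushed to a segment
db.put("a", 3)   # a newer value for "a" now lives in the memtable
```

A lookup for a missing key has to check every segment, which is one reason reads tend to cost more than writes in this design.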
LSM-tree Compaction & Merge Strategies
Size-tiered - newer, smaller SSTables merged into older, larger SSTables
(Used by HBase, supported by Cassandra, Scylla)
Leveled - key range split into smaller SSTables, older data is moved into separate levels
(Used by LevelDB, RocksDB, supported by Cassandra, Scylla)
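Both strategies rely on the same basic merge step and differ mainly in which SSTables they pick to merge. The sketch below shows only that shared step, assuming each SSTable is a sorted list of `(key, value)` pairs with unique keys and the input list is ordered oldest first; `compact` is a hypothetical helper, and deletion markers (tombstones) are not handled.

```python
import heapq

def compact(sstables):
    """Merge SSTables (oldest first) into one, keeping the newest value per key."""
    # Tag each entry with negated recency so that, for equal keys,
    # the entry from the newest table sorts first in the merged stream.
    tagged = (
        [(key, -recency, value) for key, value in table]
        for recency, table in enumerate(sstables)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        # Keep only the first (newest) entry seen for each key.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged

older = [("a", 1), ("b", 2)]
newer = [("a", 3), ("c", 4)]
result = compact([older, newer])
```

Since every input is already sorted, the merge streams through the tables once and its output is again a valid SSTable.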
B-tree
Standard index implementation in almost all relational DBs (and many non-relational)
Branching factor
The number of references to child pages in one B-tree page
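The branching factor determines how shallow the tree stays. A back-of-the-envelope sketch, using a simplified model in which a tree with L levels of pages can reach branching_factor ** L entries; the figures of 500 references per page and 4 KiB pages are illustrative assumptions, not values from any specific database.

```python
def btree_levels(num_keys, branching_factor):
    """Smallest number of levels whose fan-out can reach num_keys entries."""
    levels, reachable = 1, branching_factor
    while reachable < num_keys:
        reachable *= branching_factor
        levels += 1
    return levels

def btree_capacity_bytes(levels, branching_factor, page_size):
    """Upper bound on the data reachable through `levels` levels of references."""
    return branching_factor ** levels * page_size

levels_for_million = btree_levels(1_000_000, 500)
four_level_capacity = btree_capacity_bytes(4, 500, 4096)
```

Under these assumptions a million keys fit in three levels, and four levels can address roughly 256 TB, so a lookup only touches a handful of pages.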
WAL
Write-ahead Log
- Append-only file to which every B-tree modification is written before it is applied to the tree itself, so the tree can be restored to a consistent state after a crash (a failure partway through a write could otherwise leave an orphan page with no parent)
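A simplified sketch of the log-then-apply ordering, assuming a hypothetical `WalStore` class in which a plain dict stands in for the B-tree pages and each log entry is one JSON line. Real engines log page-level changes and truncate the log at checkpoints; neither is modelled here.

```python
import json
import os
import tempfile

class WalStore:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.pages = {}  # stands in for the B-tree pages
        self._replay()   # recover any logged changes after a restart

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # a torn final record from a crash mid-append is ignored
                self.pages[entry["key"]] = entry["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply it to the main structure.
        self.pages[key] = value

wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(wal_path)
store.put("k1", "v1")
store.put("k2", "v2")
recovered = WalStore(wal_path)  # simulates a restart: state is rebuilt from the log
```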
Writes/Reads for LSM-tree vs B-tree
LSM-tree typically faster for writes
B-tree typically faster for reads