Storage and Retrieval Flashcards
What are the two types of database workload?
Transactional workloads (OLTP, typically write-intensive) and analytical workloads (OLAP, typically read-intensive)
What are logs?
A log is an append-only data file: the database records each write by appending a new record to the end, so records are stored in the order they were written
What is the disadvantage of log files?
Looking up a key means scanning the whole file, which is O(n) and slow for a large database
How might log file access be sped up?
Using a binary record format rather than plain text, which is more compact and faster to parse
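How are records deleted from log files?
Instead of rewriting the log to remove a record, a special deletion record (a tombstone) is appended to the end. When segments are later merged, the tombstone tells the merge to discard earlier values for that key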
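
The cards above on logs and tombstones can be sketched as a toy key-value store. This is a minimal illustration, not a real storage engine: the class name `AppendOnlyLog`, the `TOMBSTONE` sentinel, and the in-memory list standing in for the on-disk file are all assumptions made for the example.

```python
# Hypothetical sentinel marking a deleted key (an assumption for this sketch).
TOMBSTONE = object()


class AppendOnlyLog:
    def __init__(self):
        # A Python list stands in for the append-only file on disk.
        self.records = []

    def set(self, key, value):
        # Writes never modify old records; they only append.
        self.records.append((key, value))

    def delete(self, key):
        # Deleting appends a tombstone instead of removing old records.
        self.records.append((key, TOMBSTONE))

    def get(self, key):
        # O(n) scan from newest to oldest: the latest record for a key wins.
        for k, v in reversed(self.records):
            if k == key:
                return None if v is TOMBSTONE else v
        return None


log = AppendOnlyLog()
log.set("a", "1")
log.set("a", "2")
log.delete("a")
log.set("b", "3")
```

After these calls the log holds four records, `log.get("a")` finds the tombstone first and reports the key as missing, and `log.get("b")` still returns its value.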
What is a hash index?
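An in-memory hash map (like a dictionary) for key-value data, in which each key maps to the byte offset in the log file where its most recent value is stored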
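
A hash index over a log might look like the sketch below. The class name `HashIndexedLog` and the `key,value` line format are assumptions for illustration, and `io.BytesIO` stands in for a real file; keys are assumed not to contain commas or newlines.

```python
import io


class HashIndexedLog:
    def __init__(self):
        self.file = io.BytesIO()  # stands in for an append-only file on disk
        self.index = {}           # key -> byte offset of its latest record

    def set(self, key, value):
        # Seek to the end; seek() returns the offset where the record starts.
        offset = self.file.seek(0, io.SEEK_END)
        self.file.write(f"{key},{value}\n".encode("utf-8"))
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        # One seek plus one read, instead of scanning the whole log.
        self.file.seek(offset)
        line = self.file.readline().decode("utf-8").rstrip("\n")
        _, value = line.split(",", 1)
        return value


db = HashIndexedLog()
db.set("k", "v1")
db.set("k", "v2")
```

The old record for `k` is still in the file, but the index now points past it, so `db.get("k")` reads only the newest value.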
How is disk space managed for log files?
The log is broken into segments: when the current file reaches a given size it is closed, and subsequent writes go to a new segment file.
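
Segment rollover can be sketched as follows. The class name `SegmentedLog` and the `max_bytes` parameter are hypothetical, and each `bytearray` stands in for one segment file.

```python
class SegmentedLog:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        # Each bytearray stands in for one segment file; the last one is open.
        self.segments = [bytearray()]

    def append(self, record: bytes):
        current = self.segments[-1]
        current.extend(record)
        # Once the open segment reaches the size limit, close it and
        # direct all later writes to a fresh segment.
        if len(current) >= self.max_bytes:
            self.segments.append(bytearray())


seglog = SegmentedLog(max_bytes=10)
seglog.append(b"aaaaaa")
seglog.append(b"bbbbbb")  # first segment reaches 12 bytes and is closed
seglog.append(b"cc")      # lands in the second segment
```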
What is compaction?
The process of discarding duplicate keys in a segment, keeping only the most recent value for each key, to reduce segment size.
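
A minimal sketch of compaction, assuming a segment is represented as a list of `(key, value)` pairs ordered oldest first (the function name `compact` is illustrative):

```python
def compact(segment):
    """Return a new segment with only the most recent value per key.

    `segment` is a list of (key, value) pairs, oldest record first.
    """
    latest = {}
    for key, value in segment:
        # Later records overwrite earlier ones for the same key.
        latest[key] = value
    return list(latest.items())


old_segment = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
new_segment = compact(old_segment)
```

The input segment is left untouched and a new, smaller one is produced, which matches the idea that closed segments are never modified in place.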
What are the advantages of an append-only log with a hash index?
- Appending is a sequential write, which is much faster than random writes to disk
- Crash recovery is simpler because an update never overwrites the old value in place, so a crash mid-write cannot leave a record half old and half new
What are the disadvantages of a hash index?
- The hash table must fit into memory
- Range queries are not efficient: each key in the range has to be looked up individually
What are SSTables?
Sorted String Tables: segment files in which the key-value pairs are sorted by key and each key appears only once per merged segment.
How do SSTables locate keys?
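A sparse selection of keys is held in memory, each pointing to a byte offset within the segment file. To find a specific key, locate the nearest indexed key that sorts at or before it, then scan the file from that offset until the key is found or a larger key is reached.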
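
A sparse-index lookup can be sketched with the standard `bisect` module. Here a sorted Python list stands in for the on-disk segment and list positions stand in for byte offsets; the names `segment`, `sparse_keys`, `sparse_pos`, and `lookup`, and the choice to index every third key, are assumptions for the example.

```python
import bisect

# A sorted segment; list positions stand in for byte offsets in the file.
segment = [("apple", 1), ("banana", 2), ("cherry", 3),
           ("date", 4), ("fig", 5), ("grape", 6)]

# Sparse index: only every third key is kept in memory.
sparse_pos = list(range(0, len(segment), 3))
sparse_keys = [segment[i][0] for i in sparse_pos]


def lookup(key):
    # Find the last indexed key that sorts at or before the target.
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # target sorts before the first key in the segment
    start = sparse_pos[i]
    end = sparse_pos[i + 1] if i + 1 < len(sparse_pos) else len(segment)
    # Scan only the short run between two indexed positions.
    for k, v in segment[start:end]:
        if k == key:
            return v
        if k > key:
            break  # sorted order means the key cannot appear later
    return None
```

Because the segment is sorted, a key that is absent from the index can still be found by scanning a small range, which is why the index does not need to hold every key.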
How is the SSTables log kept sorted by key?
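Writes first go into an in-memory balanced tree such as an AVL tree or red-black tree (a memtable), which keeps keys in sorted order. When it grows past a size threshold, it is written out to disk as a new sorted segment.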
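
The write path can be sketched as below. Python's standard library has no balanced tree, so this example swaps in a sorted list maintained with `bisect.insort` as a stand-in for the AVL tree; insertion is O(n) rather than O(log n), but the sorted-order behaviour is the same. The class name `Memtable` and the `max_entries` threshold are hypothetical.

```python
import bisect


class Memtable:
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.keys = []      # kept sorted; stand-in for a balanced tree
        self.values = {}    # key -> latest value
        self.sstables = []  # flushed sorted segments, oldest first

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # insert while keeping sort order
        self.values[key] = value
        if len(self.keys) >= self.max_entries:
            self.flush()

    def flush(self):
        # Write out the contents in key order as a new sorted segment.
        self.sstables.append([(k, self.values[k]) for k in self.keys])
        self.keys, self.values = [], {}


mt = Memtable(max_entries=3)
for k, v in [("c", 1), ("a", 2), ("c", 3), ("b", 4)]:
    mt.put(k, v)
```

Keys arrive in arbitrary order, yet the flushed segment is sorted and holds only the latest value for `c`.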