10. Database Files and Storage Flashcards
(7 cards)
Storage Hierarchy:
Multiple levels of storage with trade-offs in speed, cost, and capacity: Primary (CPU access - cache, RAM), Secondary (Disk - HDDs, SSDs), Tertiary (Optical, Tape). Databases rely heavily on secondary storage.
Blocks:
Data is transferred between disk and memory in fixed-size units called blocks (or pages). It is more efficient to transfer data in blocks than individual records due to disk access time components (seek time, rotational latency).
Buffering:
Reserving areas in main memory (buffers) to hold disk blocks, allowing parallelisation of I/O and CPU processing.
Records (Collections of related data values):
◦ Fixed Length: All records have the same size.
◦ Variable Length: Records have different sizes.
Spanned vs Unspanned Records:
◦ Unspanned: A record cannot cross block boundaries. Simple to implement, but can waste space if records don’t fit neatly into blocks.
◦ Spanned: A record can be split across multiple blocks. More efficient use of block space, especially for large or variable-length records, but more complex to implement and access.
Record Blocking:
The blocking factor (bfr) is the number of records stored per block. For fixed-length, bfr = floor(Block Size / Record Size).
File Organization (How records are arranged in a file on disk):
◦ Unordered Records (Heap Files): Records are inserted at the end. Efficient for insertion. Slow for search (linear scan). Deletion leaves fragmented space. Sorting requires creating a new copy.
◦ Ordered Records (Sorted Files): Records are sorted based on a key field. Efficient for sequential reading and range searches on the sort key. Binary search on the sort key is O(log2 b) disk accesses. Expensive for insertion/deletion/update of sort key values. No help for searching on non-sort key fields.
◦ Hashed Records (Hash Files): Records are placed based on a hash function applied to a key field. Extremely fast access (direct address) for exact matches on the hash key. Easier insertion/deletion than sorted files. Collision resolution is needed.