11. Indexing Flashcards
(10 cards)
Indexing Purpose:
To provide alternative access paths to records in a file, significantly improving search performance compared to linear scans or binary search on large data files. Helps overcome limitations of physical sorting (only one sort key possible).
Index Structure:
An index is typically a separate file containing index key values and pointers to data records or blocks. Index entries are sorted by the index key.
Dense vs Sparse Index:
◦ Dense Index: Has one index entry for every data record. Secondary indexes are typically dense.
◦ Sparse Index: Has fewer index entries than data records, typically one entry per data block. Primary and Clustered indexes are sparse.
Single-Level Ordered Indexes:
◦ Primary Index: On a file physically sorted by a unique key. Determines physical placement. Sparse. Index entries point to the start block of records with that key (or the first record if key is unique). One per file.
◦ Clustered Index: On a file physically sorted by a non-unique key. Determines physical placement. Sparse. One per file. Can lead to wasted space in blocks if key values span blocks.
◦ Secondary Index: On fields other than the physical sort key or on unordered files. Does not determine physical placement. Can be unique (on alternate keys) or non-unique. Can have many per file. Unique secondary indexes are dense. Non-unique secondary indexes can be dense or sparse (using indirection/pointer lists). Dense secondary indexes have one entry per record and point to the record location.
Multi-Level Indexes:
Indexing the index file itself to further improve performance, especially for very large indexes. The bottom level is the single-level index; higher levels are primary indexes on the level below. Larger blocking factor (more entries per block) reduces the number of levels needed.
B-Tree/B+-Tree:
Dynamic multi-level index structures that are always balanced. Nodes are typically stored in disk blocks. They provide efficient search, insertion, and deletion while keeping the tree balanced.
◦ B-Tree: Data pointers are in both leaf and internal nodes. Nodes are at least half full (except root). All leaf nodes are at the same depth.
◦ B+-Tree: Data pointers only appear in the leaf nodes. Leaf nodes are typically linked together for efficient range traversal. Internal nodes have more entries than B-trees of the same order because they don’t store data pointers.
Indexing Benefits:
Significantly reduce disk accesses for searches compared to linear scans. Binary search on an index is faster than on the data file itself because the index is smaller.
Indexing Downsides:
Add overhead for data modification (insert, delete, update) as both data and index(es) need to be updated. Index rebuild is needed after bulk loads.
Hashing vs B-Trees for Queries:
Hashing is very fast for exact matches on the hash key.
However, hashing is terrible for range queries because records with similar key values are not necessarily stored physically close together.
B-trees are good for exact matches and efficient for range queries due to the ordered structure and linked leaves.
Optimizer Use:
The query optimizer decides whether to use an index based on the query type, restrictions, table size, and index characteristics. Index-only queries (where all needed data are in the index) are very fast. Indexes are useful for exact key matches, partial key matches (on leading parts), and range queries. Sequential scans are used if no suitable index exists or if the query retrieves a large percentage of the data.