Hash tables Flashcards
(22 cards)
What is a hash table used for?
Efficiently storing and retrieving key-value pairs.
What is the basic idea of hashing?
Map a key to an array index using a hash function.
Why can’t we use direct addressing for all keys?
Because the key space may be sparse and memory usage would be inefficient.
What is a collision in a hash table?
When two different keys map to the same index.
How does chaining handle collisions?
Each slot holds a linked list of entries with the same hash index.
What is open addressing?
A method of resolving collisions by probing for the next free slot in the array.
What is linear probing?
A probing strategy where we search sequentially for the next free slot.
Why is deletion hard in open addressing?
Removing a key breaks the probe chain and can make some entries unreachable.
What is a tombstone in open addressing?
A special marker used in place of a deleted entry to preserve the probe chain.
Why are tombstones necessary?
To ensure that searches can continue past deleted slots and find displaced entries.
What problem do tombstones create over time?
They accumulate and degrade performance unless periodically cleaned up.
What is the load factor (α) in a hash table?
The ratio of the number of items (n) to the table size (M): α = n/M.
What is the average complexity of put/get in open addressing?
Approximately Θ(1 + 1/(1 - α)) under uniform hashing.
What is primary clustering?
The formation of large contiguous blocks of occupied slots in linear probing.
How can primary clustering be reduced?
By using quadratic probing or double hashing instead of linear probing.
What makes a good hash function?
One that distributes keys uniformly and depends on all characters and their positions.
What is an example of a good hash function for strings in Java?
hash = hash * 31 + char; 31 is chosen for performance and distribution.
What happens if your table size M interacts poorly with your hash function?
The hash function’s effectiveness is reduced — may ignore most input characters.
What is the complexity of rehashing?
O(n) — but amortized over many operations, the average cost remains low.
How does chaining compare to open addressing in terms of worst-case performance?
Both degrade to O(n) in the worst case, but chaining handles high load factors better.
Why is resizing a hash table necessary?
To maintain low load factor and efficient performance.
What is the impact of a high load factor on open addressing?
It increases probe lengths and degrades performance.