Dictionaries and Hash Tables Flashcards
(16 cards)
What is a hash table?
A data structure that stores a key/value pair based on an index calculated from a hashing algorithm
What is a collision in hash tables?
When two key values compute to the same hash value
What are properties of good hash functions?
- Make use of all info provided by the key
- Uniformly distribute output across table to reduce clustering
- Maps similar keys to very different hash values
- Always give same hash value for a key
- Make use of only computationally fast operations
- Handle keys of any data type
How can collisions be dealt with?
- (Rehashing)?
- Linear probing
- Seperate Chaining
What is linear probing and what are its disadvantages?
- Look for the next available index in hash and insert item there
- However increases potential for further collisions and can promote clustering
- no known association between key and its location so linear search needs to be used to find item
What is clustering in hash tables?
Nearby indexes are not randomly or evenly distributed
What is separate chaining?
Storing a list at each location in the hash table, rather than single key value pairs.
When item is inserted into hash table, it is appended to the list stored at the location indicated by the hash value of the key.
Allows multiple key-value pairs to be stored at the same hash table index
What is rehashing?
Process of increasing the size of a hash table and redistributing the elements to new indexes based on their new hash values
reduces likelihood of collisions as there are now more buckets/locations for key value pairs to be in
What are the uses of hash tables?
- Databases: quick storage and retrieval of data based on primary keys
- Cache memory: determining the memory address for a given item of data
- Operating Systems: store locations of its programs and utilities for quick access
(Encryption)
How do hash tables work?
- When an item is added, its key is processed by a hashing algorithm that returns a hash value
- Items are placed in indexes specified by the hash value
- To search for an item, key is processed again and the resulting index is checked for the item, if it does not exist we can confidently say it is not in the table
What are the advantages of using hash tables to store data?
- Faster to lookup a record
- Time complexity of O(1) for retrieval
What are the advantages of storing data in a dictionary?
- More compact file as there are no empty records
- Produce a sorted list
- Do not have to design a hash algorithm so easier to design
- No need to deal with collisions
How are records added to a hash table?
- Hashing algorithm is applied to a key
- Result is the location in the table where the record should be stored
- If location is not empty then use next free location
Benefits of separate chaining over linear probing
You can keep adding items without causing clustering or increasing likelihood of collisions
Key value pairs can still be found in known locations - no need search whole table to find item
Drawbacks of separate chaining
The appropriate list still needs to be linear searched when accessing an item
When is rehashing performed
When a hash table load factor exceeds a predetermined load threshold.
The load factor is the ratio of stored key-value pairs to locations/buckets within the hash table.