Hash Tables Flashcards
Hashing
- A technique used for performing insertions, deletions, and searches on constant average time
- Tree operations that require any ordering information among the elements are not supported efficiently
Hashing Data Structure
Hash Table (generalization of ordinary arrays) is used to implement hashing.
Hash Table Components
Key and entry
How to find an entry?
- To find an entry in the table, you only need to know the content of one of the fields (not all of them). This field is called key
- Ideally, a key uniquely identifies an entry
Direct Adress table
-Direct-address tables are ordinary arrays
- Facilitate direct addressing
- Element whose key is k is obtained by indexing into the kth position of the array
- It is applicable when we can afford to allocate an array with one position for every possible key
- Insert/Delete/Search can be implemented to take O(1) time
Disadvantage of direct-address table
The disadvantage is that we could end up with a big array that has a lot of empty space. This wastes lots of memory. The bigger we scale, the worse this problem gets.
Cons direct-address table
- If U is large, then a direct-address table T of size |U| is impractical or impossible (considering memory)
- ->The range of keys determines the size of T
-The set K could be so small relative to U that most of the allocated space is wasted
Direct-Address table vs. Hash Table
-Direct-address table
o Element with key k is stored in slot k
-Hash table
o Use hash function h(k) to compute the slot where k will be inserted
Hash table big idea
- We can make the array smaller, and use the hash function to select which slot
we put our values into. For example the hash function = modulus function - Basically, we are compressing the domain of our keys so that they fit into the array
Hash table size
-The size of the hash table is proportional to |K|
o We lose the direct-addressing ability
o We need to define functions that map keys to slots of the hash table
Hash function definition
-A hash function h transforms a key k into a number that is used as an index in an array to locate the desired location (“bucket” or “slot”)
o An element with key k hashes to slot h(k) 11.2 Hash tables o h(k) is the hash value of key k
Hash function pros
oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells
Hash table components
- The ideal hash table is an array of some fixed size m, containing items (key and additional entry)
- Each key k is mapped (using a hash function) into some number in the range 0 to m – 1 and places in the appropriate cell
Hash table big O for operations
-Insertion of a new entry is O(1)
-Lookup for data is O(1) on average
oStatistically unlikely that O(n) will occur
Issues with hashing
- Decide what to do when two keys hash to the same value (collision)
- Choose a good hash function
- Choose the size of the table
Hashing Division method
-Maps key k into one of m slots by taking the remainder of k divided by m o h(k) = k % m
Collision
-A hash function maps most of the keys onto unique integers, but maps some
other keys onto the same integer
- Multiple keys can hash to the same slot: collisions are unavoidable
- Design a good hash function will minimize the number of collisions that occur, but they will still happen, so we need to be able to deal with them
Basic collision resolution policies
- –Chaining
- > Store all elements that hash to the same slot in a linked list
- > Store a pointer to the head of the linked list in the hash table slot
- –Open Addressing
- > All elements stored in hash table itself
- > When collision occur, use a systematic procedure to store elements in free slots of the table
Basic collision resolution policies
- –Chaining
- > Store all elements that hash to the same slot in a linked list
- > Store a pointer to the head of the linked list in the hash table slot
- –Open Addressing (linear, quadratic, double hashing)
- > All elements stored in hash table itself
- > When collision occur, use a systematic procedure to store elements in free slots of the table (linear, quadratic, double hashing)
Chaining Insert Pseudocode and running time
CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x.key)]
Worst case: O(1)
Chaining search pseudocode and running time
CHAINED-HASHSEARCH(T,k)
search for an element with key k in list T[h(k)]
Worst case: proportional to the length of the list
Chaining delete pseudocode and running time
CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x.key)]
Worst-case running time for deletion is O(1) if list is doubly linked
When is chaining used?
- Chaining is used when we cannot predict the number of records in advance, thus the choice of the size of the table depends on available memory
- Typically it’s chosen relatively small, so no large area of contiguous memory is used; but large enough so that the lists are short for more efficient search
Open Addressing
- Idea: Store all elements in the hash table itself. That is, each slot contains either an element or NIL
- Advantages: Avoids pointers