Hash Tables Flashcards

1
Q

Hashing

A
  • A technique used for performing insertions, deletions, and searches on constant average time
  • Tree operations that require any ordering information among the elements are not supported efficiently
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Hashing Data Structure

A

Hash Table (generalization of ordinary arrays) is used to implement hashing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Hash Table Components

A

Key and entry

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How to find an entry?

A
  • To find an entry in the table, you only need to know the content of one of the fields (not all of them). This field is called key
  • Ideally, a key uniquely identifies an entry
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Direct Adress table

A

-Direct-address tables are ordinary arrays

  • Facilitate direct addressing
  • Element whose key is k is obtained by indexing into the kth position of the array
  • It is applicable when we can afford to allocate an array with one position for every possible key
  • Insert/Delete/Search can be implemented to take O(1) time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Disadvantage of direct-address table

A

The disadvantage is that we could end up with a big array that has a lot of empty space. This wastes lots of memory. The bigger we scale, the worse this problem gets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Cons direct-address table

A
  • If U is large, then a direct-address table T of size |U| is impractical or impossible (considering memory)
  • ->The range of keys determines the size of T

-The set K could be so small relative to U that most of the allocated space is wasted

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Direct-Address table vs. Hash Table

A

-Direct-address table
o Element with key k is stored in slot k

-Hash table
o Use hash function h(k) to compute the slot where k will be inserted

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Hash table big idea

A
  • We can make the array smaller, and use the hash function to select which slot
    we put our values into. For example the hash function = modulus function
  • Basically, we are compressing the domain of our keys so that they fit into the array
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Hash table size

A

-The size of the hash table is proportional to |K|
o We lose the direct-addressing ability
o We need to define functions that map keys to slots of the hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Hash function definition

A

-A hash function h transforms a key k into a number that is used as an index in an array to locate the desired location (“bucket” or “slot”)

o An element with key k hashes to slot h(k) 11.2 Hash tables
o h(k) is the hash value of key k
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Hash function pros

A

oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Hash table components

A
  • The ideal hash table is an array of some fixed size m, containing items (key and additional entry)
  • Each key k is mapped (using a hash function) into some number in the range 0 to m – 1 and places in the appropriate cell
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Hash table big O for operations

A

-Insertion of a new entry is O(1)

-Lookup for data is O(1) on average
oStatistically unlikely that O(n) will occur

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Issues with hashing

A
  • Decide what to do when two keys hash to the same value (collision)
  • Choose a good hash function
  • Choose the size of the table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Hashing Division method

A
-Maps key k into one of m slots by taking the remainder of k divided by m 
o h(k) = k % m
17
Q

Collision

A

-A hash function maps most of the keys onto unique integers, but maps some
other keys onto the same integer

  • Multiple keys can hash to the same slot: collisions are unavoidable
  • Design a good hash function will minimize the number of collisions that occur, but they will still happen, so we need to be able to deal with them
18
Q

Basic collision resolution policies

A
  • –Chaining
  • > Store all elements that hash to the same slot in a linked list
  • > Store a pointer to the head of the linked list in the hash table slot
  • –Open Addressing
  • > All elements stored in hash table itself
  • > When collision occur, use a systematic procedure to store elements in free slots of the table
19
Q

Basic collision resolution policies

A
  • –Chaining
  • > Store all elements that hash to the same slot in a linked list
  • > Store a pointer to the head of the linked list in the hash table slot
  • –Open Addressing (linear, quadratic, double hashing)
  • > All elements stored in hash table itself
  • > When collision occur, use a systematic procedure to store elements in free slots of the table (linear, quadratic, double hashing)
20
Q

Chaining Insert Pseudocode and running time

A

CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x.key)]

Worst case: O(1)

21
Q

Chaining search pseudocode and running time

A

CHAINED-HASHSEARCH(T,k)
search for an element with key k in list T[h(k)]

Worst case: proportional to the length of the list

22
Q

Chaining delete pseudocode and running time

A

CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x.key)]

Worst-case running time for deletion is O(1) if list is doubly linked

23
Q

When is chaining used?

A
  • Chaining is used when we cannot predict the number of records in advance, thus the choice of the size of the table depends on available memory
  • Typically it’s chosen relatively small, so no large area of contiguous memory is used; but large enough so that the lists are short for more efficient search
24
Q

Open Addressing

A
  • Idea: Store all elements in the hash table itself. That is, each slot contains either an element or NIL
  • Advantages: Avoids pointers
25
Open Addressing methods
- linear probing - quadratic probing - double hashing
26
Open Addressing linear probing hash function
If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula: -> h(k)=(hash(k)+ f (i)) mod m ``` o h(k) is the index in the table indicating where to insert k o hash(k) is the auxiliary hash function o f(i) is the collision resolution function with i = 0 ... m – 1 o m the size of the hash table ```
27
Open Addressing linear probing hash function
If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula: -> h(k)=(hash(k)+ f (i)) mod m ``` o h(k) is the index in the table indicating where to insert k o hash(k) is the auxiliary hash function o f(i) is the collision resolution function with i = 0 ... m – 1 o m the size of the hash table ```
28
Open addressing insert pseudocode
§ Insert: Examine slot h(k) o If slot is empty, insert element o If slot is full, try another slot, ..., until an open slot is found (probing) ü Wrapping around to the beginning if we hit the end of the table ``` Insert(T,k) I←0 repeat j ← h(k,i) if T[j] = NIL then T[j] ← k return j else i←i+1 until i = m ``` error “hash table overflow”
29
Open addressing search pseudocode
§ Search: Examine slot h(k) o If slot h(k) contains key k, search successful o If the slot h(k) contains NIL, search unsuccessful o If slot h(k) contains a key that is not k, keep probing (by following the same sequence of probes when inserting the element) until we either find key k or we find a slot holding NIL ``` Search(T,k) i←0 repeat j ← h(k,i) if T[j] = k then return j i←i+1 until T[j] = NIL or i = m return NIL ```
30
Open addressing deletion
-If we delete a key from slot h(k) we cannot mark the slot NIL. Why? o We need to: -> Mark the slot with a special value ->Modify SEARCH so it keeps on looking when it sees the special value -> INSERT would treat the slot with a special value as empty slot, so a new value can be added
31
Quadratic probing
§ Similar to linear probing § ... but instead of using a linear function to advance in the table in search of an empty slot, we use a quadratic function to compute the next index in the table to be probed h(k)=(hash(k)+i^2) mod m i = 0...m−1 § The idea is to skip regions in the table with possible clusters
32
Quadratic probing disadvantage
Secondary Clustering
33
Linear probing disadvantage
Primary Clustering § We can see clusters form (blocks of adjacent buckets all occupied) § Thus increasing the time it takes to do insert and search for any key that hashes to a buckets in that cluster
34
Double Hashing
- Purpose same as in quadratic probing: to overcome the disadvantage of clustering - Instead of examining each successive entry following a collided position, we use a second hash function to get a fixed increment for the “probe” sequence § hash1 gives the initial probe. hash2 gives the remaining probes § There are a couple of requirements for the second function: o It must never evaluate to 0 o Must make sure that all buckets can be probed h(k)=(hash1(k)+i * hash2 (k)) mod m i = 0...m−1
35
Popular second hash function
hash2 (k) = R − (k mod R) , where R is a prime number that is smaller than the size of the table