10-12 Flashcards

1
Q

How does a hash table work? What is its complexity?

A

Data is stored at a position in the hash array which correlates to its value. This can then be accessed in O(1) time.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is direct hashing and extraction? What are the problems?

A

direct hashing places key i into position A[i] of an array. If the keys are large they would require a really large array, much larger than the amount of items. Extraction takes only certain parts of the key e.g(4th and 9th digit), but this could increase collisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is division hashing?

A

store the key in position k % m, where m is the size of the array. Choose m to be a prime not too close to a power of 2 for best results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do we convert characters to integers?

A

Could do A = 0, B = 1. But a better solution is to take account of position e.g multiply ASCII values by a number raised to power of position.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the two simple collision resolution strategies for hash tables?

A

Linear probing: If a position is already occupied store the key in h(k) + 1, etc, wrapping around the array until a spot is found or the whole array has been examined. This can lead to primary clusters, which tend to increase in size.
Quadratic probing: Same as linear but step size is quadratic, can lead to secondary clustering and wasted space.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is double hashing?

A

h(k, i) = (h(k) + i . g(k))%m
Where i is the number of collisions so far, h does division hashing by tablesize m, and g is a secondary hash function (typically something like 1 + k%(m-1)). It is helpful if m is prime.
Double hashing helps to prevent primary and secondary clustering.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What happens as hash tables get full? What can we do?

A

Gaps become few and more collisions occur, losing the O(1) property. If this occurs it may be necessary to re hash everything to a new table. Alternatively each location could maintain chains.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is open addressing? How do we remove items in hash tables?

A

hashing without chains. To remove items we must use lazy deletion in case collisions have moved other items further up.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is universal hashing?

A

hash functions are chosen randomly before creation of hash table. If two random keys are chosen the chance for them to end up in the same location should be no more than 1/m where m is the size of the htable.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How do we get universal hash functions?

A

Choose a prime number p big enough that every possible key k is less than it and choose a tablesize less than p.
Now use the equation h(k) = ((ak+b) % p) % m.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is perfect hashing? How do we do it?

A

If the set of keys is static(no insertions or deletions) we can design the hash function to get a worst case search = O(1).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How can we get a perfect hash function?

A

If the table is size n^2 the chances of even one collision is less than half. We can try a few at random, remember the parameters and use to place values in the table. This is bad as n^2 storage for n keys wastes a lot of space. So instead we can build a hash table from small secondary hash tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How does the primary/secondary hash function creation work?

A

choose size m = n, choose p > biggest key value. Choose a and b randomly to get the primary hash function: h(k) = ((ak+b)%p)%m.
Keep a count c of how many keys hash to location i. If the sum of all the c^2 is greater than > 2n, find a new primary hash function. If the primary hash function is good enough, for each slot get the secondary hash function by leaving p as p, setting m to c^2 and choosing a and b randomly. Check to make sure no collisions occur in secondary hash tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is Cuckoo hashing?

A

use two hash functions, so that every key k gets two addresses h(k) and g(k), start by inserting at h(k), if something is already there, move the current occupant to its second address. Repeat this process if necessary. If this loops(which is theoretically rare) an expensive rehash will be necessary. Otherwise it will stop when an empty cell is found.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly