Hashing Flashcards

1
Q

Hash Table

A
  • Data structure used to implement sets and dictionaries
  • Implemented with an array (often an array of linked lists) and a hash function
  • Good performance time
  • But ineffective for ordering/sorting
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Collisions

A
  • Collisions occur when multiple elements map to the same index
  • 2 ways to resolve collissions
    1. Open addressing
    2. Separate Chaining
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Clustering

A
  • Clustering is a situation in which collisions cause elements to group around the same index
  • Clustering is the result of open addressing
  • Clustering slows down performance speed
  • The larger the cluster, the more likely more collisions will occur
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hash Functions

A
  • Function used to compute index of item in hash table

Often consists of 2 things

  1. Hash function generates a new #
  2. Then you modulo that # by array_size to compute index

5 things make a good hash function

  1. Use all data being hashed
  2. Use only data being hashed
  3. Should be deterministic
  4. Map similar keys to different values
  5. Hash values should be distributed evenly
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Load Factor

A
  • Ratio of elements in hash table to number of buckets
  • N/M
  • Where M is the size of the array
  • As load factor grows, hash table becomes slower
  • With open addressing, it’s the percentage of occupied cells
    • Good load factor is between 1/8 and 1/2
  • With separate chaining, it’s the average length of the linked lists
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Hashing

Folding Method

A
  • A type of hash function for integers

Steps

  1. Break #’s into equal chunks (groups of 2 or 3, etc.)
  2. Sum chunks to produce hash value
  3. Mod the result
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Hashing

Multiplication Method

A
  • A type of hash function for numbers
  • Uses the following formula

h(k) = floor[m(kA mod 1)]

  • m ⇒ size of the array
  • k ⇒ key
  • A ⇒ number between 0 and 1
    • A good choice is the golden ratio
    • (root(5) - 1)/2
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Hashing

Strings

A
  • A common approach to hashing strings is to convert each char to a # and sum

Steps

  1. Loop through char, adding up ascii values
  2. Mod the total
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Open Addressing

A
  • An implementation of hash functions that involves “probing” for free slots when collisions occur
  • If slot is occupies, insert into next free slot
  • If you reach end of array, loop through to beginning and search from top
  • This approach can cause clustering
  • Also, hash table is of fixed size, so must resize (and rehash) periodically
  • 2 versions
    1. Linear probing
    2. Quadratic probing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Linear Probing

A
  • A form of open addressing
  • Inervals between probes is a fixed constant (usually 1)
  • Uses formula

h(x) + c

  • Where size of interval c >= 1
  • Using larger intervals reduces clustering
  • Ex: if index 3 is filled, and interval == 2, then check 5, 7, 9, etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Quadratic Probing

A
  • A form of open addressing
  • Intervals between probes increase quadratically
  • This approach decreases clustering
  • Uses formula:

i(k) + c1p+ c2p2

  • Where k is the key, and i(k) is the hash index, and p the number of probes
  • Note you can include a linear component, but c1 is often 0
  • Ex: if index 1 is filled, and c1 = 0 and c2 = 1, then check slots 2, 5, 10, 17, etc.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Separate Chaining

A
  • An implementation of hash tables that uses an array of linked lists
  • Each index is the head of its own list
  • Prevents clustering
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Hash Table

Complexity

A
  • Applies to both linear probing and separate chaining
  • Applies to all operations (insertion, deletion, and search (get))

Complexity

Time

Average/Amortized: O(1) ♦ Worst: O(n)

Space

O(1)

*Worst case delete/search from looping through elements that hash to same value, worst case insert from rehashing operation (reallocating array)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly