Hash Maps Flashcards

1
Q

What is a map data type? What's another name for it? What does a map data type consist of?

A

Map data types are data types found in programming languages. In Python, maps are known as dictionaries. Maps are also known as associative arrays. Maps consist of key-value pairs.
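For instance, a Python dictionary is a built-in map; the variable names below are just for illustration:

ages = {"alice": 30, "bob": 25}   # two key-value pairs
ages["carol"] = 41                # insert a new key-value pair
print(ages["alice"])              # look up the value for a key -> 30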

2
Q

Is a hash map the same as a hash table?

A

Yes, it’s the same thing.

3
Q

What is the time complexity of lookup/search in a hash table?

A

O(1) on average. To get the index of a key, we run it through the hash function, which takes a fixed number of steps; once we know the index, accessing that known index in the array is constant time.

4
Q

What's the workflow for getting an index in a hash map based on a given key (two-step process)?

A

1) Pass the key through the hash function.
2) Take the hash modulo the capacity of the hash map; this gives you an index that fits within the array size.

hash = hash_function(key)
index = hash % array_size
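A minimal sketch of those two steps in Python, using the built-in hash() in place of a custom hash function:

def index_for(key, capacity):
    h = hash(key)        # step 1: hash the key to an integer
    return h % capacity  # step 2: modulo the capacity to get a valid array index

print(index_for("apple", 16))  # always an index in the range 0..15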
5
Q

What is a hash function?

A

A hash function takes a key value of some type and returns an integer that corresponds to an index in an array
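As a toy illustration (not a production-quality hash function), a string key can be reduced to an integer like this:

def simple_hash(key):
    # Mix each character's code into a running total; real hash functions
    # mix bits much more thoroughly to get a uniform distribution.
    total = 0
    for ch in key:
        total = total * 31 + ord(ch)
    return total

print(simple_hash("cat") % 8)  # the integer is then mapped to an index in an 8-slot array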

6
Q

When choosing/designing a hash function, what are the 3 properties an optimal function should have?

A

1) Determinism - The hash function should be consistent: a given input should always map to the same hash value.
2) Uniformity - Inputs should be mapped as evenly as possible over the output range. A non-uniform function can result in many collisions, where multiple elements are hashed to the same array index. In other words, the hash function should distribute keys as uniformly as possible across all of the buckets in the array.
3) Speed - The hash function should have a low computational burden; in other words, it should be fast.

7
Q

What is a perfect hash function?

A

A perfect hash function is one that results in no collisions: every key gets a unique output.

8
Q

What is a minimally perfect hash function?

A

A minimally perfect hash function is one that results in no collisions for a table size that exactly equals the number of elements. In other words, it is a perfect hash function that maps n keys to n consecutive integers, usually the numbers from 0 to n − 1 or from 1 to n.

9
Q

What is a collision?

A

A collision is when two different keys are hashed to the same index, i.e., the same output for two different inputs.
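For example, with integer keys and a simple modulo-based index, two different keys can land on the same slot:

capacity = 8
print(3 % capacity)   # index 3
print(11 % capacity)  # index 3 as well -> a collision between keys 3 and 11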

10
Q

What is chaining and what is it used to address?

A

Chaining is a way to resolve collisions in a hash table. It allows multiple keys that hash to the same index to coexist by creating a linked list (bucket) at every index. When two different keys are hashed to the same index, the key/value pairs are simply chained as nodes in the bucket where they belong.

To accommodate multiple keys, linked lists can be used to store the individual keys that map to the same entry. The linked lists are commonly referred to as buckets or chains, and this technique of collision resolution is known as chaining.
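A minimal chained hash map sketch in Python, using plain lists as the buckets instead of linked lists (the class and method names are illustrative):

class ChainedHashMap:
    def __init__(self, capacity=8):
        # one bucket (a list of (key, value) pairs) per array slot
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # otherwise chain the pair onto the bucket

    def get(self, key):
        bucket = self.buckets[self._index(key)]
        for k, v in bucket:               # walk the chain looking for the key
            if k == key:
                return v
        raise KeyError(key)

m = ChainedHashMap()
m.put("a", 1)
m.put("b", 2)
print(m.get("a"))  # 1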

11
Q

Are linked lists the only data structure you can use to implement chaining?

A

No, you can also use balanced binary trees or dynamic arrays

12
Q

In a chained hash table, accessing the value for a particular key would follow this procedure:

A

1) Hash the key to get the index of the bucket.
2) Search the linked list in that bucket for the key. There may be multiple keys hashed into the same bucket, so we need to search the linked list for the specific key that was passed in.

13
Q

What is the “load factor” of a hash table? What's its equation?

A

Load factor is the average number of elements in each bucket

The equation is 𝝺 = (total number of elements in the table) / (number of buckets)
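For example, 12 elements stored across 8 buckets gives:

num_elements = 12
num_buckets = 8
load_factor = num_elements / num_buckets
print(load_factor)  # 1.5 -> on average 1.5 elements per bucket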

14
Q

In a chained hash table, can the load factor be greater than 1?

A

Yes, in a chained hash table the load factor can be greater than 1.

15
Q

As load factor increases, what happens to the speed of operations on the table?

A

Speed decreases.

16
Q

For a linked list-based chained table, the average number of links traversed for successful searches is…(hint: load factor plays a part)

A

𝝺 / 2

17
Q

For unsuccessful searches, the average number of links traversed is equal to…

A

the load factor, 𝝺

18
Q

What can you do to maintain strong performance speeds on a hash table?

A

To maintain strong performance with a hash table, we often double the number of buckets when the load factor reaches a certain limit (e.g., 8)

19
Q

When resizing a hash table, what do you have to do to all elements?

A

You have to rehash them based on the new capacity/new number of buckets
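A sketch of the resize step for the chained layout sketched earlier, doubling the bucket count and rehashing every element relative to the new capacity:

def resize(buckets):
    new_buckets = [[] for _ in range(len(buckets) * 2)]  # double the number of buckets
    for bucket in buckets:
        for key, value in bucket:
            # rehash against the NEW capacity; the old index is no longer valid
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets

A table would typically trigger this when its load factor crosses a chosen threshold, e.g. if num_elements / len(buckets) > 8: buckets = resize(buckets).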

20
Q

What is the average-case complexity of a linked list-based chained hash table?

A

Assuming the hash function has a good distribution, the AVERAGE case for all operations is O(𝝺).

If the number of buckets is adjusted according to the load factor, then the number of elements stays within a constant factor of the number of buckets, so 𝝺 = n/m = O(m)/m = O(1) and operations are amortized O(1). This is because the hash table gets resized every time it reaches a certain number of elements, so, assuming the hash function distributes keys well, there are few collisions as the table grows. Thus lookup is O(1), because you don't have to iterate through long buckets to find the value for a given key.
21
Q

What is the worst case time complexity of a hash table? When would this happen?

A

O(n). This would happen when all elements end up in the same bucket. This could be a result of a bad hash function and/or a large load factor with a lack of resizing.

22
Q

Aside from chaining, what is another form of collision resolution in hash tables? Briefly explain how it resolves collisions.

A

Open addressing. It works by probing for the next empty spot in the array when a collision occurs.

23
Q

How are hashed elements stored in the array in open addressing?

A

They are stored directly in the array. No linked lists or other data structures involving chaining.

24
Q

What's the 3-step general process for inserting into a hash table that uses open addressing for collision resolution?

A

1) Hash the key to get the index at which to insert into the array
2) If that index is empty, insert the element there and stop
3) If the index is not empty, iterate down the array looking for the next open spot using a probing method
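A sketch of that process with linear probing (it assumes the table is not full and ignores tombstones for simplicity):

EMPTY = None

def insert(table, key, value):
    capacity = len(table)
    start = hash(key) % capacity           # 1) hash the key to a starting index
    for j in range(capacity):
        probe = (start + j) % capacity     # 3) linear probing: start, start+1, ...
        if table[probe] is EMPTY or table[probe][0] == key:
            table[probe] = (key, value)    # 2) insert at the first open (or matching) slot
            return
    raise RuntimeError("table is full")

table = [EMPTY] * 8
insert(table, "a", 1)
insert(table, "b", 2)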

25
Q

What is the process called for searching for the next open position in a hash map that uses open addressing for collision resolution? What are three different kinds of it?

A

Probing.
Linear probing: index = index_initial + j (where j = 1, 2, 3, …)
Quadratic probing: index = index_initial + j^2 (where j = 1, 2, 3, …)
Double hashing: index = index_initial + j*(h_2(key)) (where j = 1, 2, 3, …)
h_2 is a second, independent hash function. In each case the result is taken modulo the table size so the index stays within the array.

Probing is only done when the initial index from the hash function results in a collision, meaning an element already exists at that index so we have to find the next open spot
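The same three probe sequences expressed as small Python helpers, wrapped modulo the table size so the index stays in bounds (h2 is a caller-supplied second hash function):

def linear_probe(index_initial, j, capacity):
    return (index_initial + j) % capacity

def quadratic_probe(index_initial, j, capacity):
    return (index_initial + j * j) % capacity

def double_hash_probe(index_initial, j, key, capacity, h2):
    return (index_initial + j * h2(key)) % capacity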

26
Q

How do you know when to stop probing when searching a hash table that uses open-addressing collision resolution?

A

You probe until you find the element you’re looking for or you encounter an empty spot

27
Q

What do you need to do when removing an element from a hash table that's using open addressing? Why do you need to do this?

A

You have to insert a special value called a tombstone when you remove an element. This is needed because, without it, removal would disrupt probing for an element you're searching for: probing relies on stopping at the first empty index or at the element you're looking for, so a slot left truly empty could hide elements that were placed further along the probe sequence.
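A sketch of removal with a tombstone sentinel, matching the linear-probing insert sketched earlier:

EMPTY = None
TOMBSTONE = object()   # sentinel marking a slot whose element was removed

def remove(table, key):
    capacity = len(table)
    start = hash(key) % capacity
    for j in range(capacity):
        probe = (start + j) % capacity
        slot = table[probe]
        if slot is EMPTY:                        # truly empty: the key is not in the table
            return
        if slot is not TOMBSTONE and slot[0] == key:
            table[probe] = TOMBSTONE             # mark as removed, NOT as empty
            return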

28
Q

Can tombstones be replaced?

A

Yes, a tombstone just denotes a “no longer occupied” space, so when you add a new element it can be hashed into the slot the tombstone is occupying.

29
Q

Does open addressing involve linked lists or buckets?

A

No, it’s just an array.

30
Q

What is clustering in open addressing? Which probing method is more likely to create clusters? Why?

A

Clustering is when elements are placed directly next to each other in adjacent indices, causing clusters of elements.

Linear probing is more likely to create clusters. This is because the probability of a new entry being added to an existing cluster increases as the size of the cluster increases, mainly because linear probing simply searches for the next adjacent available spot.

31
Q

What probing methods can be used to limit clustering?

A

Quadratic probing and double hashing

32
Q

Using open addressing, a table’s load factor absolutely cannot exceed what number? Why?

A

The load factor cannot exceed 1, because a load factor of 1 means the array is full and you can therefore no longer insert new elements into it.
33
Q

What is a common solution to a full hash table? At what load factor is it common to apply this solution?

A

Resize it into a bigger table. Resizing could occur at some threshold, such as a load factor of 0.75.

34
Q

Why is it bad to have a very low load factor, especially in open-addressing hash tables?

A

This means that we have a lot of unused space allocated, so this is space inefficient.