7. Hashing Flashcards

1
Q

Why do we need hashing in addition to sorting in databases?

A

Sometimes, sorting is a bit overkill for the problem. In a lot of cases, all we want is to group the same value together, but we do not actually care about the order the values appear in (think GROUP BY or de-duplication).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Why do we need special hashing algorithms for databases?

A

same as for sorting: we cannot fit all of our data in memory

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

External hashing algorithm - describe

A

In-memory buffer has B pages

  1. “divide” - use some hash function to put all records into B-1 partitions. We do this by using B-1 output buffers. When an output buffer fills up we flush the page to disk.
  2. If some partitions do not fit in memory (have >B-1 pages), partition them once more with another hash function. Repeat until all pages fit in memory.
  3. “conquer” - build a hash table for each partition with a common in-memory algorithm, concatenate all the tables together.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly