- A technique used for performing insertions, deletions, and searches on constant average time - Tree operations that require any ordering information among the elements are not supported efficiently

-Direct-address tables are ordinary arrays - Facilitate direct addressing - Element whose key is k is obtained by indexing into the kth position of the array - It is applicable when we can afford to allocate an array with one position for every possible key - Insert/Delete/Search can be implemented to take O(1) time

- We can make the array smaller, and use the hash function to select which slot we put our values into. For example the hash function = modulus function - Basically, we are compressing the domain of our keys so that they fit into the array

- Decide what to do when two keys hash to the same value (collision) - Choose a good hash function - Choose the size of the table

Hash Tables Flashcards by Ivonne Pena

Hashing

A technique used for performing insertions, deletions, and searches on constant average time
Tree operations that require any ordering information among the elements are not supported efficiently

How well did you know this?

Not at all

Perfectly

Hashing Data Structure

Hash Table (generalization of ordinary arrays) is used to implement hashing.

How well did you know this?

Not at all

Perfectly

Hash Table Components

Key and entry

How well did you know this?

Not at all

Perfectly

How to find an entry?

To find an entry in the table, you only need to know the content of one of the fields (not all of them). This field is called key
Ideally, a key uniquely identifies an entry

How well did you know this?

Not at all

Perfectly

Direct Adress table

-Direct-address tables are ordinary arrays

Facilitate direct addressing
Element whose key is k is obtained by indexing into the kth position of the array
It is applicable when we can afford to allocate an array with one position for every possible key
Insert/Delete/Search can be implemented to take O(1) time

How well did you know this?

Not at all

Perfectly

Disadvantage of direct-address table

The disadvantage is that we could end up with a big array that has a lot of empty space. This wastes lots of memory. The bigger we scale, the worse this problem gets.

How well did you know this?

Not at all

Perfectly

Cons direct-address table

If U is large, then a direct-address table T of size |U| is impractical or impossible (considering memory)
->The range of keys determines the size of T

-The set K could be so small relative to U that most of the allocated space is wasted

How well did you know this?

Not at all

Perfectly

Direct-Address table vs. Hash Table

-Direct-address table
o Element with key k is stored in slot k

-Hash table
o Use hash function h(k) to compute the slot where k will be inserted

How well did you know this?

Not at all

Perfectly

Hash table big idea

We can make the array smaller, and use the hash function to select which slot
we put our values into. For example the hash function = modulus function
Basically, we are compressing the domain of our keys so that they fit into the array

How well did you know this?

Not at all

Perfectly

Hash table size

-The size of the hash table is proportional to |K|
o We lose the direct-addressing ability
o We need to define functions that map keys to slots of the hash table

How well did you know this?

Not at all

Perfectly

Hash function definition

-A hash function h transforms a key k into a number that is used as an index in an array to locate the desired location (“bucket” or “slot”)

o An element with key k hashes to slot h(k) 11.2 Hash tables
o h(k) is the hash value of key k

How well did you know this?

Not at all

Perfectly

Hash function pros

oSimple to compute
oAny two distinct keys get different cells
oThis is impossible, thus we seek a hash function that distributes the keys evenly among the
cells

How well did you know this?

Not at all

Perfectly

Hash table components

The ideal hash table is an array of some fixed size m, containing items (key and additional entry)
Each key k is mapped (using a hash function) into some number in the range 0 to m – 1 and places in the appropriate cell

How well did you know this?

Not at all

Perfectly

Hash table big O for operations

-Insertion of a new entry is O(1)

-Lookup for data is O(1) on average
oStatistically unlikely that O(n) will occur

How well did you know this?

Not at all

Perfectly

Issues with hashing

Decide what to do when two keys hash to the same value (collision)
Choose a good hash function
Choose the size of the table

How well did you know this?

Not at all

Perfectly

Hashing Division method

Study These Flashcards

-Maps key k into one of m slots by taking the remainder of k divided by m 
o h(k) = k % m

Collision

Study These Flashcards

-A hash function maps most of the keys onto unique integers, but maps some
other keys onto the same integer

Multiple keys can hash to the same slot: collisions are unavoidable
Design a good hash function will minimize the number of collisions that occur, but they will still happen, so we need to be able to deal with them

Basic collision resolution policies

Study These Flashcards

–Chaining
> Store all elements that hash to the same slot in a linked list
> Store a pointer to the head of the linked list in the hash table slot
–Open Addressing
> All elements stored in hash table itself
> When collision occur, use a systematic procedure to store elements in free slots of the table

Basic collision resolution policies

Study These Flashcards

–Chaining
> Store all elements that hash to the same slot in a linked list
> Store a pointer to the head of the linked list in the hash table slot
–Open Addressing (linear, quadratic, double hashing)
> All elements stored in hash table itself
> When collision occur, use a systematic procedure to store elements in free slots of the table (linear, quadratic, double hashing)

Chaining Insert Pseudocode and running time

Study These Flashcards

CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x.key)]

Worst case: O(1)

Chaining search pseudocode and running time

Study These Flashcards

CHAINED-HASHSEARCH(T,k)
search for an element with key k in list T[h(k)]

Worst case: proportional to the length of the list

Chaining delete pseudocode and running time

Study These Flashcards

CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x.key)]

Worst-case running time for deletion is O(1) if list is doubly linked

When is chaining used?

Study These Flashcards

Chaining is used when we cannot predict the number of records in advance, thus the choice of the size of the table depends on available memory
Typically it’s chosen relatively small, so no large area of contiguous memory is used; but large enough so that the lists are short for more efficient search

Open Addressing

Study These Flashcards

Idea: Store all elements in the hash table itself. That is, each slot contains either an element or NIL
Advantages: Avoids pointers

Open Addressing methods

- linear probing - quadratic probing - double hashing

Open Addressing linear probing hash function

If collision occurs, next probes (slots examined during a key insertion or searching) are performed following the formula: -> h(k)=(hash(k)+ f (i)) mod m ``` o h(k) is the index in the table indicating where to insert k o hash(k) is the auxiliary hash function o f(i) is the collision resolution function with i = 0 ... m – 1 o m the size of the hash table ```

Open Addressing linear probing hash function

Open addressing insert pseudocode

§ Insert: Examine slot h(k) o If slot is empty, insert element o If slot is full, try another slot, ..., until an open slot is found (probing) ü Wrapping around to the beginning if we hit the end of the table ``` Insert(T,k) I←0 repeat j ← h(k,i) if T[j] = NIL then T[j] ← k return j else i←i+1 until i = m ``` error “hash table overflow”

Open addressing search pseudocode

§ Search: Examine slot h(k) o If slot h(k) contains key k, search successful o If the slot h(k) contains NIL, search unsuccessful o If slot h(k) contains a key that is not k, keep probing (by following the same sequence of probes when inserting the element) until we either find key k or we find a slot holding NIL ``` Search(T,k) i←0 repeat j ← h(k,i) if T[j] = k then return j i←i+1 until T[j] = NIL or i = m return NIL ```

Open addressing deletion

-If we delete a key from slot h(k) we cannot mark the slot NIL. Why? o We need to: -> Mark the slot with a special value ->Modify SEARCH so it keeps on looking when it sees the special value -> INSERT would treat the slot with a special value as empty slot, so a new value can be added

Quadratic probing

§ Similar to linear probing § ... but instead of using a linear function to advance in the table in search of an empty slot, we use a quadratic function to compute the next index in the table to be probed h(k)=(hash(k)+i^2) mod m i = 0...m−1 § The idea is to skip regions in the table with possible clusters

Quadratic probing disadvantage

Secondary Clustering

Linear probing disadvantage

Primary Clustering § We can see clusters form (blocks of adjacent buckets all occupied) § Thus increasing the time it takes to do insert and search for any key that hashes to a buckets in that cluster

Double Hashing

- Purpose same as in quadratic probing: to overcome the disadvantage of clustering - Instead of examining each successive entry following a collided position, we use a second hash function to get a fixed increment for the “probe” sequence § hash1 gives the initial probe. hash2 gives the remaining probes § There are a couple of requirements for the second function: o It must never evaluate to 0 o Must make sure that all buckets can be probed h(k)=(hash1(k)+i * hash2 (k)) mod m i = 0...m−1

Popular second hash function

hash2 (k) = R − (k mod R) , where R is a prime number that is smaller than the size of the table

Hash Tables Flashcards

(35 cards)