Hash Table Flashcards

1
Q

what are the properties of a hash function?

A

it’s typically one-way and u lose some data when u hash something , it’s repeatable , and deterministic which means that every time i put the same thing in the hash function i expect the outcome to be the same each time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

what are hashing rules?

A

1 - hashing must be deterministic under the same context
2- Two objects that are equal should return the same hash
3- the same hash might be produced by two different objects

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

what could go wrong in while using hashing?

A

hash collision

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what are the advantages of having a hash table?

A

it’s extremely fast in retrieving data also in inserting and deleting

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what is the name of an entry in a hash table?

A

bucket

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

do u use whatever number the hash function produces to be the index?

A

No, u run it through another equation might be like % the size of the array to make it fit in the array allocated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

how to manage collisions?

A

1- separate chaining:

where each bucket in the collision is actually another array or a linked list of values

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Hashing custom class

A

the default equality behavior of two classes that u check the address pointing to these objects if they are the same but if u want to override the equality method to check for the internal state then u must redefine the hashing mechanism for that object as to hash two of the same objects must return the same hash value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

what is the difference between hash table and hash map in java?

A

hash table is used in multi-threading but with performance cost while hash-map is used in a single threaded application

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

what is a common problem to solve with hash tables?

A

1 -phone contact problem where u want to map each name to a phone number and each phone number to a name
2- finding two words that are the same in a document
3- spelling correction to search in a dictionary
4- compilers and interpreters
5- substring search
6- string commanalities
7- file & directory syncronization
9- cryptography

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

how does hash tables get linked to dropbox or google drive?

A

as they are distributed storage systems they use hash tables if 3 users upload the same video they store it as one physical entity and just have the 3 users point to that

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

how does distributed storage system work with the naive way?

A

u first upload the files then go through each file in the system and compare it bye by byte to see if it matches any other files
the draw backs of this of course that u already uploaded the file and it will be O(N^2)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

how does distributed storage system work with the rabkin-karp’s algorithm?

A

if there is a hash already in the system then the files are the same upload and compare but if the files are different then it doesn’t exist then just upload
and the draw back of this that u will have to upload to compare it to the files already in the system and collisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

how does distributed storage system work with several hashes?

A

each file before upload is hashed like 5 times then u will actually be sure that the file already exists in the system or not and we only now search with the hash not with the file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

what is the solution to instant upload and storage optimization?

A

hash table with multiple hash functions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

what is still the problem with online storage even though we solved it with a hash table?

A

that we r using only one hash table as millions of user request on the same hash table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

what is the problem with hash tables and big data?

A

very large amount of memory is needed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

what are the main operations of a hash tables?

A

insert , delete , search

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

what are the obstacles in hash tables?

A

1 - they keys are not integers

2 - gigantic memory hog

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

what is the solution to the keys that are not integers?

A

prehashing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

what is prehashing?

A

mapping keys to non negative integers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

what is the solution to the memory hog of the keys

A

hashing

23
Q

what is hashing?

A

it’s to reduce the prehashing output to be of a reasonable size to be put into a small array

24
Q

why does a collision happen?

A

because the number of elements in an array is much smaller than the number of keys available which is knows as the pigeon hog problem

25
Q

what is chaining?

A

it’s when two elements map to the same position u make that index a linked list of all the elements

26
Q

what is simple uniform hashing?

A

it’s an assumption that each key is equally likely to be hashed to the any slot of the table independent of where other keys hashing

27
Q

what are the methods of hashing function?

A

1-Division : which uses mod
2-Multiplication: which uses bit shifting
3-Universal Hashing:which uses random numbers and prime numbers

28
Q

what is the analysis of a hash table with chaining?

A

constant time in insertion , deletion , and lookup if we just want to say if a key exist return true other wise return false

29
Q

what is the biggest question with hash tables?

A

how to choose m? the length of the table

30
Q

how to solve the biggest question of the hash table?

A

start small then shrink or expand accordingly

31
Q

what to do when u want to grow or shrink a table?

A

u allocate a new place in memory and u copy all the elements and hashing them while u go

32
Q

why do u need a new hash function when u shrink or grow a table?

A

cuz a hash function depends on m the length of the table other wise u will just put ur elements in the same size as the last table

33
Q

how much should u increase your table?

A

2m and not m + 1 cuz we must think about the cost of insertion and deletion for m+1 it’s n^2 but for 2m it’s n

34
Q

what is table doubling?

A

it’s increasing the size of the table double when u need to increase it and it’s not too costly cuz it happens only on a couple of operations not each time like m+ 1

35
Q

what is amortization?

A

we calculate the running time not on each specific operation but we say that for a whole bunch of operations it takes T(N) amortized

36
Q

what are is the amount to decrease table for deletion?

A

1- if m = n/2 then decrease to m/2 but that will make insertion operations take n time
2- if m = n/4 decrease to m/2

37
Q

what is the benefit of table doubling?

A

u achieve amortized O(1)

38
Q

how to really achieve O(1) for each operation not amortized

A

by knowing beforehand they u are actually getting full and u prepare the extra space so once u actually get full u just switch over

39
Q

what is a problem that a hash table solves?

A

string matching:
two strings s & t u want to know does s occur as a substring of t
like google search for something in the entire web

40
Q

how to solve matching strings with a hash table?

A

by comparing hashes and not strings but we still have a problem of hashing each length minus a character of the big string

41
Q

how to solve the problem of hashing contiguous strings like s - 1 then s + 1 etc

A

by using rolling hashes

42
Q

what is karp-rabin algorithm?

A

it uses rolling hash table meaning this DS has three main functionalities append which adds a character to the end of the string and popLeft which removes a character from the beginning of a string and when we call a method on it it returns the hash value of the current string in the DS

43
Q

what is the problem and the solution of karp-rabin algorithm?

A

that two substring might have the same hash but still different strings because of collisions so we solve it by comparing them char by char if they have the same hash value

44
Q

what is the complexity analysis of karp-rabin?

A

O(1) OR O(length(s) + length(t) + #matching s)

45
Q

what is open addressing?

A

it’s another way to deal with collisions other than chaining and it uses the notion of probing

46
Q

what is probing?

A

it’s hashing the already hashed key to find an empty slot for it… it’s a hash function specifies the order of slots to probe for a key for (insert/delete/search)

47
Q

how to search using probing?

A

u compute the hash of the key if it equals what is in the slot then return it other wise continue probing until u find an empty slot or the number of trials exceeds the length of the array

48
Q

what is the problem with probing?

A

deleting like delete the second probe of an elements so when we search for it it’s actually found in the third probe but when we encounter the empty slot we just assume that it’s not found

49
Q

how to solve the delete problem with probing?

A

insert a flag in place of the deleted slot and when u encounter it u just continue searching

50
Q

what are some probing strategies?

A

1- linear probing

51
Q

what is the problem of linear probing?

A

clustering

52
Q

how to solve clustering of linear probing?

A

by double hashing

53
Q

what are somethings u don’t want to do while using a hashtable?

A

1- don’t rely on order
cuz it changes over time as the table grows
2- don’t insert while iterating
cuz when u insert the table might get bigger resulting in hashing the values in the table again which makes a different order for iteration
3- can’t have mutable keys
cuz it will cost u collisions

54
Q

what is universal hashing?

A

1- choose a hash function h from a set of hash functions randomly H
2 - Require H to be universal hash family
3 - the probability of two hashes with two different keys to be equal is minimal