HashSets Flashcards

(14 cards)

1
Q

What is a HashSet?

A

A HashSet is a data structure in which elements are placed in their respective array locations based on their hash code. This optimizes the efficiency of the add(), remove(), and contains(). That is, they are O(1) operations.

In Java, the HashSet class implements the Set interface. Therefore, as with all Sets, it does not admit duplicates. It also does not store elements in sorted order, which is why the above operations have constant complexity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What limitations of a LinkedList does a HashSet overcome?

A

The contains( ) method for a LinkedList is O(n) whereas this is O(1) for a HashSet.

The add, remove, and contains operations of a HashSet are all O(1).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How is a HashSet constructed?

A

HashSets are constructed with hash tables in which elements are grouped into smaller collections of elements that share similar characteristics. The groupings are computed with hash codes and stored in LinkedLists, referred to as “buckets” in this context. Therefore, the implementation is an array of LinkedLists or an array of buckets into which elements are grouped.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Hash function / hash code

A
A function that computes a hash code, a unique integer value, for an object. Becausing hashing is so important, the Object class has a hashCode method:
int hash = x.hashCode();
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is “compression”?

A

Given that it’s not possible to create an underlying an array for a HashSet with all possible integer values of the objects’ hash codes, we must pick an array of some reasonable size and then “compress” each hash code to become a valid array index. Compresion can easily be achieved by using the remainder operation:
int h = x.hashCode( );
position = h % arrayLength;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a “collison”?

A

A collision occurs when two objects have the same unique integer identifier or hash code. A good hash function minimizes collisions so no two objects have the same hash code. However, after compression, it is more likely that several objects will collide.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is “open addressing”?

A

Open addressing is a solution approach to hash collisions in which a new element is placed in the next available index (including wraparound indices) in the array if its own index is already occupied.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is “separate chaining”?

A

Separate chaining is a solution approach to hash collisions in which more than one element can be associated with the same index / bucket of the underlying array. This is typically achieved by placing the elements in a LinkedList at that index, with new nodes being chained / added to the existing LinkedList in an occupied slot.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is “simple uniform hashing”?

A

The assumption that each element is equally as likely to belong to any given bucket. That is, the distribution of elements is equal across buckets. This allows us to assume that the runtime of the line of code in a HashSet function like bucket.contains(value) in the contains method is O(1) as opposed to O(n).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is one risk / potential inefficiency of HashSets?

A

When a HashSet gets overly full, the O(1) operation complexity can be compromised. HashSets can can devolve into LinkedLists if all the elements end up in the same bucket and searching now becomes O(n). If the number of elements far exceeds the number of buckets then searching is no longer O(1).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a “load factor”?

A

An attribute of the HashSet object that represents the acceptable average load, or number of elements, for each bucket. This is typically a value between 0 and 1. The hash table in the standard Java library reallocates the table when the load factor exceeds .75.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How can a HashSet automatically adjust itself if it is becoming inefficient?

A

By adding more buckets when it gets too full. That is, when the average number of elements per buckets exceeds the load factor it dynamically resizes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How are HashSets implemented in the Java API?

A

It is a class that uses Java generics in the Java.util package and provides the following contructors and methods:

Constructors:

  • HashSet(): Constructs a new, empty, set with default initial capacity (16) and load factor (0.75)
  • HashSet(Collection Extends E> c): Constructs a new set containing all of the elements in the input collection c
  • HashSet(int initialCapacity): Constructs a new, empty, set with the specified initial capacity and default load factor (0.75)
  • HashSet(int initialCapacity, float loadFactor): Constructs a new, empty, set with the specified initial capacity and load factor

Methods:

  • add(E e): Adds the specified element to the set if it is not already present.
  • remove(Object o): Removes the specified element from the set if it is present. Returns true if an element is removed, and false otherwise.
  • contains(Object o): Returns true if the set contains the specified element, and false otherwise.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Why doesn’t the HashSet class provide a method to add an object to a specific index?

A

Because it makes no sense to add an element at a particular position in a Set, since a Set can order the elements any way it likes to make the add, remove, and contains methods most efficient. Therefore, you always add elements directly to the Set, not at a specific position.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly