HashSets Flashcards
(14 cards)
What is a HashSet?
A HashSet is a data structure in which elements are placed in their respective array locations based on their hash code. This optimizes the efficiency of the add(), remove(), and contains(). That is, they are O(1) operations.
In Java, the HashSet class implements the Set interface. Therefore, as with all Sets, it does not admit duplicates. It also does not store elements in sorted order, which is why the above operations have constant complexity.
What limitations of a LinkedList does a HashSet overcome?
The contains( ) method for a LinkedList is O(n) whereas this is O(1) for a HashSet.
The add, remove, and contains operations of a HashSet are all O(1).
How is a HashSet constructed?
HashSets are constructed with hash tables in which elements are grouped into smaller collections of elements that share similar characteristics. The groupings are computed with hash codes and stored in LinkedLists, referred to as “buckets” in this context. Therefore, the implementation is an array of LinkedLists or an array of buckets into which elements are grouped.
Hash function / hash code
A function that computes a hash code, a unique integer value, for an object. Becausing hashing is so important, the Object class has a hashCode method: int hash = x.hashCode();
What is “compression”?
Given that it’s not possible to create an underlying an array for a HashSet with all possible integer values of the objects’ hash codes, we must pick an array of some reasonable size and then “compress” each hash code to become a valid array index. Compresion can easily be achieved by using the remainder operation:
int h = x.hashCode( );
position = h % arrayLength;
What is a “collison”?
A collision occurs when two objects have the same unique integer identifier or hash code. A good hash function minimizes collisions so no two objects have the same hash code. However, after compression, it is more likely that several objects will collide.
What is “open addressing”?
Open addressing is a solution approach to hash collisions in which a new element is placed in the next available index (including wraparound indices) in the array if its own index is already occupied.
What is “separate chaining”?
Separate chaining is a solution approach to hash collisions in which more than one element can be associated with the same index / bucket of the underlying array. This is typically achieved by placing the elements in a LinkedList at that index, with new nodes being chained / added to the existing LinkedList in an occupied slot.
What is “simple uniform hashing”?
The assumption that each element is equally as likely to belong to any given bucket. That is, the distribution of elements is equal across buckets. This allows us to assume that the runtime of the line of code in a HashSet function like bucket.contains(value) in the contains method is O(1) as opposed to O(n).
What is one risk / potential inefficiency of HashSets?
When a HashSet gets overly full, the O(1) operation complexity can be compromised. HashSets can can devolve into LinkedLists if all the elements end up in the same bucket and searching now becomes O(n). If the number of elements far exceeds the number of buckets then searching is no longer O(1).
What is a “load factor”?
An attribute of the HashSet object that represents the acceptable average load, or number of elements, for each bucket. This is typically a value between 0 and 1. The hash table in the standard Java library reallocates the table when the load factor exceeds .75.
How can a HashSet automatically adjust itself if it is becoming inefficient?
By adding more buckets when it gets too full. That is, when the average number of elements per buckets exceeds the load factor it dynamically resizes.
How are HashSets implemented in the Java API?
It is a class that uses Java generics in the Java.util package and provides the following contructors and methods:
Constructors:
- HashSet(): Constructs a new, empty, set with default initial capacity (16) and load factor (0.75)
- HashSet(Collection Extends E> c): Constructs a new set containing all of the elements in the input collection c
- HashSet(int initialCapacity): Constructs a new, empty, set with the specified initial capacity and default load factor (0.75)
- HashSet(int initialCapacity, float loadFactor): Constructs a new, empty, set with the specified initial capacity and load factor
Methods:
- add(E e): Adds the specified element to the set if it is not already present.
- remove(Object o): Removes the specified element from the set if it is present. Returns true if an element is removed, and false otherwise.
- contains(Object o): Returns true if the set contains the specified element, and false otherwise.
Why doesn’t the HashSet class provide a method to add an object to a specific index?
Because it makes no sense to add an element at a particular position in a Set, since a Set can order the elements any way it likes to make the add, remove, and contains methods most efficient. Therefore, you always add elements directly to the Set, not at a specific position.