4. Sorting Algorithms Flashcards
What notations are used to refer to collections of comparable elements by Algorithms in a Nutshell?
A collection of comparable elements A is presented to be sorted in place; the notations A[i] and ai are used to refer to the ith element of the collection. By convention, the first element in the collection is A[0]. A[low, low + n) is used to refer to the subcollection A[low] … A[low + n - 1] of n elements, whereas A[low, low + n] contains n + 1 elements.
What are the requirements of a sorted collection?
To sort a collection, the elements of A must be reorganized such that if A[i] < A[j], then i < j. If there are duplicate elements, these elements must be contiguous in the resulting ordered collection—that is, if A[i] = A[j] in a sorted collection, then there can be no k such that i < k < j and A[i] ≠ A[k]. Finally, the sorted collection A must be a permutation of the elements that originally formed A.
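The requirements above can be sketched as a small checking helper. This is an illustration, not from the text; the name `is_valid_sort` is an assumption. Note that the "duplicates are contiguous" requirement follows automatically once the ordering requirement holds.

```python
# Hypothetical helper (not from the text): checks that `result` is a
# valid in-place sort of `original` per the two key requirements.
from collections import Counter

def is_valid_sort(original, result):
    # Ordering: if A[i] < A[j] then i < j, i.e. no element exceeds its successor.
    ordered = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    # Permutation: same elements with the same multiplicities as the original.
    permutation = Counter(original) == Counter(result)
    return ordered and permutation

print(is_valid_sort([3, 1, 2, 1], [1, 1, 2, 3]))  # True
print(is_valid_sort([3, 1, 2, 1], [1, 2, 1, 3]))  # False: not ordered
```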
Where might the collection of elements be stored in a computer?
The collection may already be stored in the computer’s random access memory (RAM), but it might simply exist in a file on the filesystem, known as secondary storage. The collection may be archived in part on tertiary storage (such as tape libraries and optical jukeboxes), which may require extra processing time just to locate the information; in addition, the information may need to be copied to secondary storage (such as hard disk drives) before it can be processed.
What are the common forms that information stored in RAM typically takes?
Pointer-based or value-based.
What are the benefits of pointer-based and value-based storage?
Using pointer-based storage, an array of information contains pointers to the actual information rather than storing the information itself. Such an approach enables arbitrarily complex records to be stored and sorted. By contrast, value-based storage packs a collection of n elements into record blocks of a fixed size, s, which is better suited for secondary or tertiary storage.
How can value-based storage be accessed?
The information is contiguous and can be viewed as a one-dimensional array B[0, n*s), where n is the number of elements and s is the size of each element. Note that B[r*s + c] accesses the cth character of the rth element (where c ≥ 0 and r ≥ 0); also, the ith element of the collection (for i ≥ 0) is the subarray B[i*s, (i + 1)*s).
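The indexing arithmetic can be demonstrated with a small sketch; the sample data and helper names here are illustrative assumptions, using n = 4 records of s = 4 characters each.

```python
# Value-based storage: n fixed-size records packed into one contiguous block B.
s = 4                                  # size of each record
B = bytearray(b"alfabetadeltecho")     # four 4-character records, n*s = 16 bytes

def char_at(r, c):
    """The c-th character of the r-th record, via B[r*s + c]."""
    return B[r * s + c]

def record(i):
    """The i-th record as the subarray B[i*s, (i+1)*s)."""
    return bytes(B[i * s:(i + 1) * s])

print(record(1))           # b'beta'
print(chr(char_at(2, 0)))  # 'd' (first character of the record 'delt')
```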
How do sort algorithms work with secondary storage?
Information is usually written to secondary storage as a value-based contiguous collection of bytes. Sorting algorithms can be written to work with disk-based information by implementing swap functions that transpose bytes within the files on disk; however, the resulting performance will differ because of the increased input/output costs in accessing secondary storage. Merge Sort is particularly well-suited for sorting data in secondary storage.
How are elements in a collection compared?
The elements in the collection being compared must admit a total ordering. That is, for any two elements p and q in a collection, exactly one of the following three predicates is true: p = q, p < q, or p > q. Commonly sorted primitive types include integers, floating-point values, and characters. When composite elements are sorted (such as strings of characters), lexicographical ordering is imposed on each individual element of the composite, thus reducing a complex sort into individual sorts on primitive types. Strings, for example, are compared by comparing each individual letter, from left to right, until a string runs out of characters or an individual character in one string is different from its partner in the other string.
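The left-to-right string comparison described above can be sketched as follows (a minimal illustration; the function name is an assumption):

```python
# Lexicographic comparison: compare character by character, left to right,
# until the characters differ or one string runs out of characters.
def lex_cmp(p, q):
    for cp, cq in zip(p, q):
        if cp != cq:
            return -1 if cp < cq else 1
    # One string is a prefix of the other: the shorter one sorts first.
    if len(p) == len(q):
        return 0
    return -1 if len(p) < len(q) else 1

print(lex_cmp("apple", "apply"))  # -1: 'e' < 'y' at the first difference
print(lex_cmp("ab", "abc"))       # -1: "ab" ran out of characters first
```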
How are more complex comparison scenarios handled with strings?
The question of ordering is far from simple when considering capitalization (is “A” greater than “a”?), diacritical marks (is “è” less than “ê”?), and diphthongs (is “æ” less than “a”?). The Unicode standard uses encodings, such as UTF-16, to represent each individual character using up to four bytes. The Unicode Consortium has developed a sorting standard (known as “the collation algorithm”) that handles the wide variety of ordering rules found in different languages and cultures.
What is the requirement of sorting algorithms to operate on a collection of elements?
A comparator function, cmp, which compares element p to q and returns 0 if p = q, a negative number if p < q, and a positive number if p > q. If the elements are complex records, the cmp function might only compare a "key" value of the elements.
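A comparator with this contract can be sketched as follows; the record layout and the name `cmp_by_key` are assumptions for illustration.

```python
# A comparator: 0 if p = q, negative if p < q, positive if p > q.
def cmp(p, q):
    return (p > q) - (p < q)

# For complex records, compare only a designated "key" value.
def cmp_by_key(p, q):
    return cmp(p["key"], q["key"])

print(cmp(3, 7))                                  # -1
print(cmp_by_key({"key": 5, "name": "a"},
                 {"key": 2, "name": "b"}))        # 1
```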
When is a sorting algorithm considered to be stable?
When the comparator function cmp determines that two elements, ai and aj, in the original unordered collection are equal, it may be important to maintain their relative ordering in the sorted set—that is, if i < j, then the final location for ai must be to the left of the final location for aj. Sorting algorithms that guarantee this property are considered to be stable. An unstable algorithm pays no attention to the relationships between element locations in the original collection (it might maintain relative ordering, but it also might not).
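Stability can be observed directly: sorting records on one key with a stable sort preserves the input order of records whose keys compare equal. Python's built-in `sorted()` is documented to be stable, which makes it a convenient demonstration.

```python
# Stability demo: ("a", 1) and ("c", 1) compare equal on the second field,
# so a stable sort must keep "a" before "c", as in the input.
records = [("b", 2), ("a", 1), ("c", 1)]
by_second = sorted(records, key=lambda r: r[1])
print(by_second)  # [('a', 1), ('c', 1), ('b', 2)]
```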
What are some qualitative criteria for choosing a sorting algorithm?
- Are there only a few items? Use Insertion Sort.
- Are the items mostly sorted already? Use Insertion Sort.
- Is there concern about worst-case scenarios? Use Heap Sort.
- Is there interest in a good average-case behavior? Use Quicksort.
- Are items drawn from a uniform dense universe? Use Bucket Sort.
- Is there desire to write as little code as possible? Use Insertion Sort.
- Is stable sort required? Use Merge Sort.
What is transposition sorting?
Early sorting algorithms found elements in the collection A that were out of place and moved them into their proper position by transposing (or swapping) elements in A. Selection Sort and (the infamous) Bubble Sort belong to this sorting family, as well as Insertion Sort and Heap Sort.
How does Insertion Sort work?
Insertion Sort repeatedly invokes a helper function to ensure A[0, i] is properly sorted; eventually, i reaches the rightmost element, sorting A entirely. A is sorted in place by incrementing pos = 1 up to n - 1 and inserting the element A[pos] into its rightful position in the growing sorted region A[0, pos].
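A minimal sketch of this process (not the book's exact code): for each pos from 1 to n - 1, larger elements are shifted right until the slot for A[pos] opens up.

```python
# Insertion Sort: grow the sorted region A[0, pos) by inserting A[pos].
def insertion_sort(A):
    for pos in range(1, len(A)):
        value = A[pos]
        i = pos - 1
        # Shift elements greater than value one slot to the right.
        while i >= 0 and A[i] > value:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = value   # insert value into its rightful position
    return A

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```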
When should Insertion Sort be used?
Use Insertion Sort when you have a small number of elements to sort or the elements in the initial collection are already “nearly sorted”. Determining when the array is “small enough” varies from one machine to another and by programming language. Indeed, even the type of element being compared may be significant.
What are the best and worst case scenarios for Insertion Sort?
The optimal performance occurs when the array is already sorted, and arrays sorted in reverse order produce the worst performance for Insertion Sort. If the array is already mostly sorted, Insertion Sort does well because there is less need to transpose elements.
What are the space requirements of Insertion Sort?
Insertion Sort requires very little extra space to function; it only needs to reserve space for a single element. For value-based representations, most language libraries offer a block memory move function to make transpositions more efficient.
What is the best case performance of Insertion Sort?
In the best case, each of the n items is in its proper place and thus Insertion Sort takes linear time, or O(n). Though it may seem an uncommon scenario, it is important because Insertion Sort is the only comparison-based sorting algorithm that has this best-case behavior.
Should Insertion Sort be used in production code?
Much real-world data is already partially sorted, so Insertion Sort could be an effective algorithm to use. The efficiency of Insertion Sort increases when duplicate items are present, since there are fewer swaps to perform.
What is the average performance of Insertion Sort?
Insertion Sort does not perform as well when all n items are distinct and the array is randomly organized (i.e., all permutations of the data are equally likely), because each item is, on average, n/3 positions away from its final position in the array. In the average and worst case, each of the n items must be transposed a linear number of positions, so Insertion Sort requires quadratic time, O(n²).
How does Insertion Sort perform on value-based data?
Insertion Sort operates inefficiently for value-based data because of the amount of memory that must be shifted to make room for a new value. Implementations improve by using block memory moves rather than individual memory swapping. Still, as the array size doubles, the performance time approximately quadruples. Even with the bulk move improvement, Insertion Sort still remains quadratic.
How does Insertion Sort perform on pointer-based data?
When Insertion Sort operates over pointer-based input, swapping elements is more efficient; the compiler can even generate optimized code to minimize costly memory accesses.
How does Selection Sort work?
Selection Sort selects the largest value from the range A[0, n) and swaps it with the rightmost element A[n-1]. The process then repeats on each successively smaller range A[0, n-1) until A is sorted. It is an example of a Greedy approach.
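This description can be sketched directly (an illustration, not the book's exact code): scan A[0, i+1) for the largest value and swap it into position i, shrinking the range each pass.

```python
# Selection Sort: repeatedly select the largest value in the unsorted
# range A[0, i+1) and swap it with the rightmost element of that range.
def selection_sort(A):
    for i in range(len(A) - 1, 0, -1):
        max_pos = 0
        for j in range(1, i + 1):
            if A[j] > A[max_pos]:
                max_pos = j
        A[i], A[max_pos] = A[max_pos], A[i]
    return A

print(selection_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```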
How does Selection Sort perform?
Selection Sort is one of the slowest of all the sorting algorithms; it requires quadratic time even in the best case (i.e., when the array is already sorted). It repeatedly performs almost the same task without learning anything from one iteration to the next. Selecting the largest element, max, in A takes n - 1 comparisons, and selecting the second largest element, second, takes n - 2 comparisons—not much progress! Many of these comparisons are wasted, because if an element is smaller than second, it can’t possibly be the largest element and therefore has no impact on the computation for max.
What is a heap?
A heap is a binary tree whose structure ensures two properties:
Shape Property
A leaf node at depth k > 0 can exist only if all 2^(k-1) nodes at depth k - 1 exist. Additionally, nodes at a partially filled level must be added "from left to right." The root node has a depth of 0.
Heap Property
Each node in the tree contains a value greater than or equal to either of its two children, if it has any.
How can a heap be represented by an array?
Given the rigid structure imposed by the shape property, a heap can be stored in an array A without losing any of its structural information. The root is labeled 0. For a node with label i, its left child (should it exist) is labeled 2*i + 1; its right child (should it exist) is labeled 2*i + 2. Similarly, for a non-root node labeled i, its parent node is labeled ⌊ (i-1)/2 ⌋. Using this labeling scheme, a heap can be stored in an array by storing the element value for a node in the array position identified by the node’s label.
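The labeling arithmetic can be written out as three one-line helpers (names here are conventional choices, not from the text):

```python
# Array labeling for a heap rooted at index 0.
def left(i):   return 2 * i + 1        # left child of node i
def right(i):  return 2 * i + 2        # right child of node i
def parent(i): return (i - 1) // 2     # parent of non-root node i (floor)

print(left(0), right(0))     # 1 2
print(parent(5), parent(6))  # 2 2 (both children of node 2)
```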
How can Heap Sort be implemented?
Heap Sort sorts an array A by first converting that array in place into a heap using buildHeap, which makes repeated calls to heapify. heapify(A, i, n) updates A to ensure that the tree structure rooted at A[i] is a valid heap. Large numbers are eventually "lifted up" in the resulting heap (which means they are swapped in A with smaller elements to the left). Generally, the number of element pairs swapped is far fewer than the total number of elements swapped in Insertion Sort.
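A sketch of heapify and buildHeap under the array labeling above (an illustration, not the book's exact code): heapify sinks A[i] down the tree until the subtree rooted at i satisfies the heap property, and buildHeap applies it to every interior node from the bottom up.

```python
# heapify(A, i, n): make the subtree rooted at index i a valid max-heap,
# assuming both of its child subtrees already are.
def heapify(A, i, n):
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and A[left] > A[largest]:
            largest = left
        if right < n and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]   # sink A[i] one level down
        i = largest

def build_heap(A):
    n = len(A)
    # Leaves are trivially valid heaps, so start at the last interior node.
    for i in range(n // 2 - 1, -1, -1):
        heapify(A, i, n)
    return A

print(build_heap([2, 9, 7, 6, 5, 8]))  # [9, 6, 8, 2, 5, 7]
```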
How does Heap Sort work?
Heap Sort processes an array A of size n by treating it as two distinct subarrays, A[0, m) and A[m, n), which represent a heap of size m and a sorted subarray of n - m elements, respectively. As i iterates from n - 1 down to 1, Heap Sort grows the sorted subarray A[i, n) downward by swapping the largest element in the heap (at position A[0]) with A[i]; it then reconstructs A[0, i) to be a valid heap by executing heapify. The resulting nonempty subarray A[i, n) will be sorted because the largest element in the heap represented in A[0, i) is guaranteed to be smaller than or equal to any element in the sorted subarray A[i, n).
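The full procedure can be sketched as follows (self-contained, so it repeats a standard heapify; an illustration, not the book's exact code):

```python
# Sink A[i] until the subtree rooted at i is a valid max-heap over A[0, n).
def heapify(A, i, n):
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and A[l] > A[largest]:
            largest = l
        if r < n and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heap_sort(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # buildHeap over all of A
        heapify(A, i, n)
    for i in range(n - 1, 0, -1):
        A[0], A[i] = A[i], A[0]           # move heap maximum into A[i, n)
        heapify(A, 0, i)                  # restore the heap over A[0, i)
    return A

print(heap_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```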
How does Heap Sort compare to Quicksort?
Heap Sort is not a stable sort. Heap Sort avoids many of the nasty (almost embarrassing!) cases that cause Quicksort to perform badly. Nonetheless, in the average case, Quicksort outperforms Heap Sort.