DC2: Linear-time Median Flashcards

1
Q

Explain how we get from the median problem to the more general “find the kth smallest item problem”?

A

The median of a list A is the ceiling(n/2)th smallest item. If for odd n = 2L + 1, the median is therefore the L + 1 smallest.

So solving for the kth smallest item is a generalization of the median problem.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the easy algorithm for finding the kth smallest item?

A

If the list is sorted, we can just iterate over it until we get to the kth item. So the easy algorithm would be sort, then iterate. Many sorting algorithms (like MergeSort) are O(nlogn), so we need to do better than that to beat the easy algorithm.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Describe the basics of the QuickSelect algorithm

A

QuickSelect finds the kth smallest item in list A, so it’s input are A and k.

Select(A, k):
1) Select a pivot p (this is the hard part)
2) Partition A in A<p, A=p, and A>p
3) If k < |A<p|, return Select(A<p, k)
If |A<p| < k <= |A<p| + |A=p|, return (p) (because the kth item must be in A=p)
If k > |A<p| + |A=p|, return Select(A>p, k - |A<p| - |A=p|)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Give the pseudocode for FastSelect(), the O(n) algorithm for finding the kth smallest item of A.

A

FastSelect(A, k):
1) Divide A into n/5 chunks, G_1…G_n/5
2) For i = 1 to n/5, set m_i = median(G_i)
3) Let S = {m_1, …, m_n/5}
4) Let p = FastSelect(S, n/5 * 1/2)
5) Partition A into A<p, A=p, and A>p
6) If k < |A<p|, return FastSelect(A<p, k)
else if k > |A<p| + |A=p|, return FastSelect(A>p, k - |A<p| - |A=p|)
else return (p) (because the kth item must be in A=p)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do we find a good pivot p in O(n) time?

A

Instead of median, we look for “approximate median”, i.e., the value within some band around the median. Like the n/4th smallest to the 3n/4th smallest elements.

This gives a recurrence like T(n) = T(3n/4) + O(n) = O(n). This is true for any fraction q between 0 and 1: T(n) = T(qn) + O(n).

So a good pivot is any pivot p for which |A<p| <= 3n/4 and |A>p| <= 3n/4. Then we can find it in O(n) time.

Note T(n) = T(3n/4) + T(n/5) + O(n) is still O(n), because 3/4 + 1/5 < 1. So we have T(n/5) + O(n) time to find a good pivot (T(3n/4) is the time to do the rest of the algorithm).

That means we need to choose a subset S of size n/5, then set p = median(S).

S needs to be a representative sample. For each x in S, a few elements of A should be smaller and a few should be larger.

To accomplish this we take each n/5 subsets of A, and sort each of them (because they are fixed size 5, this takes O(1) time). then we take the median from each group of 5 and add it to S.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly