Two-Heap Problems Flashcards

1
Q

Find the Median of a Number Stream

Design a class to calculate the median of a number stream. The class should have the following two methods:

  1. insertNum(int num): stores the number in the class
  2. findMedian(): returns the median of all numbers inserted in the class

If the count of numbers inserted in the class is even, the median will be the average of the middle two numbers.

Example 1

  1. insertNum(3)
  2. insertNum(1)
  3. findMedian() -> output: 2
  4. insertNum(5)
  5. findMedian() -> output: 3
  6. insertNum(4)
  7. findMedian() -> output: 3.5
A

The median is the middle value in an ordered list of numbers. A brute-force solution is to maintain a sorted list of all numbers inserted in the class, so that the median can be read from the middle of the list in O(1) whenever required. Inserting a number into a sorted list takes O(N) time if there are ‘N’ numbers in the list, similar to one step of insertion sort. Can we do better than this? Can we utilize the fact that we don’t need the fully sorted list, since we are only interested in the middle element?
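As a baseline, the brute-force idea can be sketched in Python as follows. The class name `SortedListMedian` and the snake_case method names are assumptions made for this illustration; the sketch also assumes `find_median` is only called after at least one insertion.

```python
import bisect


class SortedListMedian:
    """Brute-force baseline: keep every number in one sorted list."""

    def __init__(self):
        self.nums = []  # always kept in ascending order

    def insert_num(self, num):
        # Binary search finds the slot in O(log N), but shifting the
        # tail of the list makes the insertion O(N) overall.
        bisect.insort(self.nums, num)

    def find_median(self):
        # Assumes at least one number has been inserted.
        n = len(self.nums)
        mid = n // 2
        if n % 2 == 1:
            return float(self.nums[mid])
        return (self.nums[mid - 1] + self.nums[mid]) / 2
```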

Assume ‘x’ is the median of a list. This means that half of the numbers in the list will be smaller than (or equal to) ‘x’ and half will be greater than (or equal to) ‘x’. This leads us to an approach where we divide the list into two halves: one half to store all the smaller numbers (let’s call it smallNumList) and one half to store the larger numbers (let’s call it largeNumList). If the total number of elements is odd, the median will be either the largest number in smallNumList or the smallest number in largeNumList, whichever half holds the extra element. If the total number of elements is even, the median will be the average of these two numbers.

The best data structure that comes to mind to find the smallest or largest number among a list of numbers is a Heap. Let’s see how we can use a heap to find a better algorithm.

We can store the first half of numbers (i.e., smallNumList) in a Max Heap. We should use a Max Heap as we are interested in knowing the largest number in the first half.

We can store the second half of numbers (i.e., largeNumList) in a Min Heap, as we are interested in knowing the smallest number in the second half.

Inserting a number in a heap will take O(logN), which is better than the brute force approach.

At any time, the median of the current list of numbers can be calculated from the top element of the two heaps.

The time complexity of the insertNum() will be O(logN) due to the insertion in the heap. The time complexity of the findMedian() will be O(1) as we can find the median from the top elements of the heaps.

The space complexity will be O(N) because, at any time, we will be storing all the numbers.
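The two-heap approach described above might look like the following Python sketch. The class name and method names are assumptions for illustration. Since Python's `heapq` module only provides a min-heap, the max-heap is simulated here by storing negated values; the sketch also assumes the smaller half is allowed to hold at most one extra element.

```python
import heapq


class MedianOfAStream:
    def __init__(self):
        self.max_heap = []  # smaller half, stored as negated values
        self.min_heap = []  # larger half

    def insert_num(self, num):
        # Route the number to the half it belongs to.
        if not self.max_heap or num <= -self.max_heap[0]:
            heapq.heappush(self.max_heap, -num)
        else:
            heapq.heappush(self.min_heap, num)

        # Rebalance so that max_heap has either the same number of
        # elements as min_heap or exactly one more.
        if len(self.max_heap) > len(self.min_heap) + 1:
            heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        elif len(self.max_heap) < len(self.min_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def find_median(self):
        # Assumes at least one number has been inserted.
        if len(self.max_heap) == len(self.min_heap):
            return (-self.max_heap[0] + self.min_heap[0]) / 2
        return float(-self.max_heap[0])
```

Replaying Example 1 (insert 3, 1, then 5, then 4) with this sketch yields the medians 2.0, 3.0 and 3.5.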

2
Q

Sliding Window Median

Given an array of numbers and a number ‘k’, find the median of all the ‘k’ sized sub-arrays (or windows) of the array.

Example 1

Input: nums=[1, 2, -1, 3, 5], k = 2
Output: [1.5, 0.5, 1.0, 4.0]
Explanation: Let’s consider all windows of size ‘2’:

Window [1, 2] -> median is 1.5

Window [2, -1] -> median is 0.5

Window [-1, 3] -> median is 1.0

Window [3, 5] -> median is 4.0

Example 2

Input: nums=[1, 2, -1, 3, 5], k = 3
Output: [1.0, 2.0, 3.0]
Explanation: Let’s consider all windows of size ‘3’:

Window [1, 2, -1] -> median is 1.0

Window [2, -1, 3] -> median is 2.0

Window [-1, 3, 5] -> median is 3.0

A

This problem follows the Two Heaps pattern and shares similarities with Find the Median of a Number Stream. We can follow a similar approach of maintaining a max-heap and a min-heap for the list of numbers to find their median.

The only difference is that we need to keep track of a sliding window of ‘k’ numbers. This means that, in each iteration, when we insert a new number into the heaps, we also need to remove the number that is leaving the sliding window. After the removal, we need to rebalance the heaps in the same way that we did while inserting.

The time complexity of our algorithm is O(N*K), where ‘N’ is the total number of elements in the input array and ‘K’ is the size of the sliding window. This is because we go through all ‘N’ numbers and, for each one, do two things:

  1. Insert the new number into heaps of size ‘K’ and rebalance them. This takes O(logK).
  2. Remove the element going out of the sliding window. This takes O(K), as we have to search for this element in an array of size ‘K’ (i.e., a heap).

The space complexity will be O(K) because, at any time, we will be storing all the numbers within the sliding window.
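A possible Python sketch of this approach is shown below. The function names `sliding_window_median` and `_remove` are assumptions for illustration, the max-heap is simulated with negated values on top of `heapq`, and the removal helper deliberately uses the simple O(K) linear search described above.

```python
import heapq


def _remove(heap, value):
    """Remove one occurrence of value from a heapq list in O(K)."""
    idx = heap.index(value)  # linear search
    heap[idx] = heap[-1]     # overwrite with the last element
    heap.pop()
    heapq.heapify(heap)      # restore the heap property, O(K)


def sliding_window_median(nums, k):
    max_heap, min_heap = [], []  # max_heap stores negated smaller half
    result = []

    def rebalance():
        # Keep max_heap the same size as min_heap, or one larger.
        if len(max_heap) > len(min_heap) + 1:
            heapq.heappush(min_heap, -heapq.heappop(max_heap))
        elif len(max_heap) < len(min_heap):
            heapq.heappush(max_heap, -heapq.heappop(min_heap))

    for i, num in enumerate(nums):
        if not max_heap or num <= -max_heap[0]:
            heapq.heappush(max_heap, -num)
        else:
            heapq.heappush(min_heap, num)
        rebalance()

        if i >= k - 1:  # the window now holds exactly k numbers
            if len(max_heap) == len(min_heap):
                result.append((-max_heap[0] + min_heap[0]) / 2)
            else:
                result.append(float(-max_heap[0]))

            # Drop the number that leaves the window, then rebalance.
            outgoing = nums[i - k + 1]
            if outgoing <= -max_heap[0]:
                _remove(max_heap, -outgoing)
            else:
                _remove(min_heap, outgoing)
            rebalance()

    return result
```

Called with the inputs of Example 1 and Example 2, this sketch returns `[1.5, 0.5, 1.0, 4.0]` and `[1.0, 2.0, 3.0]` respectively.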

3
Q

Maximize Capital

Given a set of investment projects with their respective profits, we need to find the most profitable projects. We are given an initial capital and are allowed to invest only in a fixed number of projects. Our goal is to choose projects that give us the maximum profit.

We can start an investment project only when we have the required capital. Once a project is selected, we can assume that its profit is added to our capital.

Example 1

Input: Project Capitals=[0,1,2], Project Profits=[1,2,3], Initial Capital=1, Number of Projects=2
Output: 6
Explanation:

With an initial capital of ‘1’, we will start the second project, which will give us a profit of ‘2’. Once we have selected our first project, our total capital will become 3 (profit + initial capital).

With ‘3’ capital, we will select the third project, which will give us ‘3’ profit.

After the completion of the two projects, our total capital will be 6 (1+2+3).

Example 2

Input: Project Capitals=[0,1,2,3], Project Profits=[1,2,3,5], Initial Capital=0, Number of Projects=3
Output: 8
Explanation:

With ‘0’ capital, we can only select the first project, bringing our capital to 1.

Next, we will select the second project, which will bring our capital to 3.

Next, we will select the fourth project, giving us a profit of 5.

After selecting the three projects, our total capital will be 8 (1+2+5).

A

While selecting projects we have two constraints:

  1. We can select a project only when we have the required capital.
  2. There is a maximum limit on how many projects we can select.

Since we don’t have any constraint on time, we should always choose, among the projects for which we have enough capital, the one that gives us the maximum profit. Following this greedy approach will give us the best solution.

While selecting a project, we will do two things:

  1. Find all the projects that we can choose with the available capital.
  2. From the list of projects in the 1st step, choose the project that gives us a maximum profit.

We can follow the Two Heaps approach similar to Find the Median of a Number Stream. Here are the steps of our algorithm:

  1. Add all project capitals to a min-heap, so that we can select a project with the smallest capital requirement.
  2. Go through the top projects of the min-heap and filter the projects that can be completed within our available capital. Insert the profits of all these projects into a max-heap, so that we can choose a project with the maximum profit.
  3. Finally, select the top project of the max-heap for investment.
  4. Repeat the 2nd and 3rd steps for the required number of projects.

Since, at the most, all the projects will be pushed to both the heaps once, the time complexity of our algorithm is O(NlogN + KlogN), where ‘N’ is the total number of projects and ‘K’ is the number of projects we are selecting.

The space complexity will be O(N) because we will be storing all the projects in the heaps.
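The four steps above can be sketched in Python roughly as follows. The function name, parameter names and parameter order are assumptions for illustration; the max-heap of profits is simulated with negated values, and the sketch stops early if no project is affordable.

```python
import heapq


def find_maximum_capital(capitals, profits, number_of_projects, initial_capital):
    # Step 1: min-heap of (required capital, project index).
    min_capital_heap = [(c, i) for i, c in enumerate(capitals)]
    heapq.heapify(min_capital_heap)

    max_profit_heap = []  # negated profits of affordable projects
    available = initial_capital

    for _ in range(number_of_projects):
        # Step 2: move every project we can now afford to the profit heap.
        while min_capital_heap and min_capital_heap[0][0] <= available:
            _, i = heapq.heappop(min_capital_heap)
            heapq.heappush(max_profit_heap, -profits[i])

        # No affordable project is left, so we cannot invest further.
        if not max_profit_heap:
            break

        # Step 3: invest in the most profitable affordable project.
        available += -heapq.heappop(max_profit_heap)

    return available
```

With the inputs of Example 1 and Example 2, this sketch returns 6 and 8.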

4
Q

Next Interval

Given an array of intervals, find the next interval of each interval. In a list of intervals, for an interval ‘i’ its next interval ‘j’ will have the smallest ‘start’ greater than or equal to the ‘end’ of ‘i’.

Write a function to return an array containing indices of the next interval of each input interval. If there is no next interval of a given interval, return -1. It is given that none of the intervals have the same start point.

Example 1

  • Input: Intervals [[2,3], [3,4], [5,6]]
  • Output: [1, 2, -1]
  • Explanation: The next interval of [2,3] is [3,4] having index ‘1’. Similarly, the next interval of [3,4] is [5,6] having index ‘2’. There is no next interval for [5,6] hence we have ‘-1’.

Example 2

  • Input: Intervals [[3,4], [1,5], [4,6]]
  • Output: [2, -1, -1]
  • Explanation: The next interval of [3,4] is [4,6] which has index ‘2’. There is no next interval for [1,5] and [4,6].
A

A brute force solution could be to take one interval at a time and go through all the other intervals to find the next interval. This algorithm will take O(N^2), where ‘N’ is the total number of intervals. Can we do better than that?

We can utilize the Two Heaps approach. We can push all intervals into two heaps: one heap to sort the intervals on maximum start time (let’s call it maxStartHeap) and the other on maximum end time (let’s call it maxEndHeap). We can then iterate through all intervals of the maxEndHeap to find their next interval. Our algorithm will have the following steps:

  1. Take out the top (having highest end) interval from the maxEndHeap to find its next interval. Let’s call this interval topEnd.
  2. Find an interval in the maxStartHeap with the closest start greater than or equal to the end of topEnd. Since maxStartHeap is sorted by the ‘start’ of intervals, we can pop intervals from it, highest ‘start’ first, for as long as the top still qualifies; the last interval popped is the closest one. Let’s call this interval topStart.
  3. Add the index of topStart in the result array as the next interval of topEnd. If we can’t find the next interval, add ‘-1’ in the result array.
  4. Put the topStart back in the maxStartHeap, as it could be the next interval of other intervals.
  5. Repeat the steps 1-4 until we have no intervals left in maxEndHeap.

The time complexity of our algorithm will be O(NlogN), where ‘N’ is the total number of intervals.

The space complexity will be O(N) because we will be storing all the intervals in the heaps.
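The steps above could be sketched in Python as follows. The function name `find_next_interval` is an assumption for illustration, intervals are assumed to be `[start, end]` pairs, and both max-heaps are simulated by storing negated keys in `heapq` min-heaps. Intervals popped from the start heap and not pushed back are safe to discard because every interval processed later has a smaller or equal end, so the closer candidate that was pushed back remains a better match for it.

```python
import heapq


def find_next_interval(intervals):
    n = len(intervals)
    # Heaps of (negated key, original index).
    max_start_heap = [(-start, i) for i, (start, _) in enumerate(intervals)]
    max_end_heap = [(-end, i) for i, (_, end) in enumerate(intervals)]
    heapq.heapify(max_start_heap)
    heapq.heapify(max_end_heap)

    result = [-1] * n
    for _ in range(n):
        # Step 1: take the interval with the highest end.
        neg_end, end_index = heapq.heappop(max_end_heap)
        top_end = -neg_end

        # Step 2: pop starts that are >= top_end; the last one popped
        # is the smallest qualifying start.
        candidate = None
        while max_start_heap and -max_start_heap[0][0] >= top_end:
            candidate = heapq.heappop(max_start_heap)

        if candidate is not None:
            # Step 3: record the index of the next interval.
            result[end_index] = candidate[1]
            # Step 4: it may also be the next interval of others.
            heapq.heappush(max_start_heap, candidate)

    return result
```

For the inputs of Example 1 and Example 2, this sketch returns `[1, 2, -1]` and `[2, -1, -1]`.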

5
Q

K-Messed Array Sort

Given an array of integers arr where each element is at most k places away from its sorted position, code an efficient function sortKMessedArray that sorts arr. For instance, for an input array of size 10 and k = 2, an element belonging to index 6 in the sorted array will be located at either index 4, 5, 6, 7 or 8 in the input array.

Analyze the time and space complexities of your solution.

Example

input: arr = [1, 4, 5, 2, 3, 7, 8, 6, 10, 9], k = 2
output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

A

Using a min-heap, we can solve this problem in O(N⋅log(K)), which is asymptotically better than the O(N⋅log(N)) of a general-purpose comparison sort when K is smaller than N. The idea is to construct a min-heap of size k+1 from the first k+1 elements of the array. Then we repeatedly remove the minimum from the heap, write it to the next position of the output, and insert the next element from the array into the heap, continuing until both the array and the heap are exhausted. This works because the element that belongs at index i of the sorted array must lie within the first i+k+1 elements of the input, so it is always the minimum of the heap at that point.

Time Complexity: building a heap takes O(K) time for K+1 elements. Insertion into and extraction from the min-heap take O(log(K)) each. Across the whole algorithm, we perform each of these operations at most N times, so the total time complexity is O(N⋅log(K)). If K is substantially smaller than N, we can treat log(K) as a small constant and argue that the complexity is practically linear.

Space Complexity: we need to maintain a min-heap of size K+1 throughout the algorithm, so the auxiliary space complexity is O(K).
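A minimal Python sketch of this idea follows. The snake_case name `sort_k_messed_array` is an assumed spelling of the `sortKMessedArray` function requested in the question, and the sketch sorts `arr` in place while also returning it. It relies on `heapq.heapreplace`, which pops the smallest element and then pushes the new one in a single call.

```python
import heapq


def sort_k_messed_array(arr, k):
    n = len(arr)

    # Build a min-heap from a copy of the first k+1 elements.
    heap = arr[:k + 1]
    heapq.heapify(heap)

    write = 0  # next position of the sorted output
    for read in range(k + 1, n):
        # Pop the smallest element into place, then push arr[read].
        arr[write] = heapq.heapreplace(heap, arr[read])
        write += 1

    # The input is exhausted; drain what is left in the heap.
    while heap:
        arr[write] = heapq.heappop(heap)
        write += 1

    return arr
```

Overwriting `arr[write]` is safe because `write` always trails `read` by k+1 positions, so every overwritten slot has already been copied into the heap.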
