Loading problems...
TL;DR: The optimal solution uses two priority queues (heaps) to maintain the lower and upper halves of the data stream, allowing access to the median in time and insertion in time.
The Find Median from Data Stream problem asks us to design a data structure that supports two operations: adding an integer to a running stream of numbers and retrieving the median of all numbers added so far. The median is defined as the middle value in an ordered list. If the list has an even number of elements, the median is the average of the two middle values.
This is a classic LeetCode 295 solution scenario often encountered in technical interviews at top-tier tech companies. The challenge lies not in finding the median once, but in efficiently maintaining the median as data flows in dynamically.
A naive approach to solving Find Median from Data Stream involves maintaining a simple list (or dynamic array) of the numbers.
ArrayList in Java or vector in C++).textclass MedianFinder: list = [] func addNum(num): list.append(num) func findMedian(): sort(list) n = list.size if n is odd: return list[n/2] else: return (list[n/2 - 1] + list[n/2]) / 2.0
addNum: (amortized).findMedian: due to sorting, where is the number of elements added so far.While correct, this approach is inefficient for high-frequency queries. The problem constraints allow up to calls. If we alternate between addNum and findMedian, the total time complexity approaches or depending on the sorting behavior. This will result in a Time Limit Exceeded (TLE) error. Even keeping the array sorted using insertion sort ( per add) results in an overall complexity, which is too slow.
The intuition behind using two heaps stems from the definition of the median. The median effectively partitions a dataset into two halves: a "lower half" containing the smaller elements and an "upper half" containing the larger elements.
To calculate the median, we do not need to know the relative order of elements within the lower half or the upper half. We only need to know:
If we have these two values, we can compute the median instantly. Heaps are designed precisely for this: a Max-Heap gives us the maximum element in , and a Min-Heap gives us the minimum element in .
The core constraint enforced by the Two Heaps pattern is:
Imagine the data stream sorted on a horizontal line. We place a cut right in the middle. The left side (smaller numbers) is managed by a Max-Heap, so the "peak" of this heap is the largest value to the left of the cut. The right side (larger numbers) is managed by a Min-Heap, so the "peak" of this heap is the smallest value to the right of the cut. These two peaks are adjacent to the median cut. As new numbers arrive, we adjust the heaps to keep the cut centered.

Let's trace the algorithm with inputs: [1, 2, 3].
Initialization:
maxHeap (Lower Half): []minHeap (Upper Half): []1. addNum(1):
1 to maxHeap. State: maxHeap: [1], minHeap: [].maxHeap to minHeap. State: maxHeap: [], minHeap: [1].minHeap size (1) > maxHeap size (0). Move top of minHeap to maxHeap.maxHeap: [1], minHeap: [].2. addNum(2):
2 to maxHeap. State: maxHeap: [2, 1] (heap ordered), minHeap: [].maxHeap (2) to minHeap. State: maxHeap: [1], minHeap: [2].maxHeap: [1], minHeap: [2].findMedian(): Sizes equal. Return (1 + 2) / 2.0 = 1.5.3. addNum(3):
3 to maxHeap. State: maxHeap: [3, 1], minHeap: [2].maxHeap (3) to minHeap. State: maxHeap: [1], minHeap: [2, 3].minHeap size (2) > maxHeap size (1). Move top of minHeap (2) to maxHeap.maxHeap: [2, 1], minHeap: [3].findMedian(): maxHeap is larger. Return 2.0.The correctness relies on the invariant that maxHeap contains the smallest elements and minHeap contains the largest elements.
maxHeap and immediately moving the max to minHeap, we ensure that no element in maxHeap is larger than the smallest element in minHeap. The subsequent rebalancing moves the smallest of the large half back to the small half only if necessary to maintain size counts, preserving the relative order property.maxHeap.size() is either equal to minHeap.size() (even ) or minHeap.size() + 1 (odd ).maxHeap).maxHeap and top of minHeap respectively.cpp#include <queue> #include <vector> class MedianFinder { private: // Max-heap to store the smaller half of the numbers std::priority_queue<int> maxHeap; // Min-heap to store the larger half of the numbers std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap; public: MedianFinder() { // Constructor strictly initializes empty heaps } void addNum(int num) { // Always add to maxHeap first to let it filter the largest maxHeap.push(num); // Move the largest element of the small half to the large half minHeap.push(maxHeap.top()); maxHeap.pop(); // Rebalance: maxHeap size must be equal to or 1 greater than minHeap size if (minHeap.size() > maxHeap.size()) { maxHeap.push(minHeap.top()); minHeap.pop(); } } double findMedian() { if (maxHeap.size() > minHeap.size()) { // Odd number of elements: median is the top of maxHeap return (double)maxHeap.top(); } else { // Even number of elements: median is average of both tops return (maxHeap.top() + minHeap.top()) / 2.0; } } };
javaimport java.util.PriorityQueue; import java.util.Collections; class MedianFinder { // Max-heap for the lower half private PriorityQueue<Integer> maxHeap; // Min-heap for the upper half private PriorityQueue<Integer> minHeap; public MedianFinder() { // Max-heap requires reverseOrder comparator maxHeap = new PriorityQueue<>(Collections.reverseOrder()); minHeap = new PriorityQueue<>(); } public void addNum(int num) { maxHeap.offer(num); minHeap.offer(maxHeap.poll()); // Maintain size property: maxHeap can have at most 1 more element than minHeap if (minHeap.size() > maxHeap.size()) { maxHeap.offer(minHeap.poll()); } } public double findMedian() { if (maxHeap.size() > minHeap.size()) { return maxHeap.peek(); } else { return (maxHeap.peek() + minHeap.peek()) / 2.0; } } }
pythonimport heapq class MedianFinder: def __init__(self): # Python's heapq is a min-heap by default. # We simulate a max-heap by storing negated numbers. self.small = [] # Max-heap (stores -num) self.large = [] # Min-heap (stores num) def addNum(self, num: int) -> None: # Push to max-heap (negate value) heapq.heappush(self.small, -num) # Ensure every element in small is <= every element in large # We pop the largest from small (which is -small[0]) and push to large val = -heapq.heappop(self.small) heapq.heappush(self.large, val) # Rebalance sizes: small can have 1 more element than large if len(self.large) > len(self.small): val = heapq.heappop(self.large) heapq.heappush(self.small, -val) def findMedian(self) -> float: if len(self.small) > len(self.large): return -self.small[0] else: return (-self.small[0] + self.large[0]) / 2.0
addNum: . We perform at most 3 heap push/pop operations. Each heap operation takes logarithmic time relative to the number of elements.findMedian: . We simply access the top elements of the heaps.The Two Heaps pattern is a specialized technique primarily used for dynamic median or percentile tracking.
multiset or Fenwick trees, the intuition of maintaining partitions of sorted data remains similar.Q: Why does the sorting approach result in Time Limit Exceeded (TLE)?
A: Sorting takes . If you sort inside findMedian and call it frequently, the total runtime becomes quadratic relative to the number of operations, which fails for .
Mistake 1: Incorrect Heap Initialization
Mistake 2: Forgetting to Rebalance
Mistake 3: Integer Division in Median Calculation
(a + b) / 2 instead of (a + b) / 2.0 in strictly typed languages.a and b are integers, (a + b) / 2 performs integer division, truncating the decimal.double return type.