-
Notifications
You must be signed in to change notification settings - Fork 0
Binary Heap
A priority queue is an abstract data type similar to a regular queue or stack, but where each element has a "priority". Unlike a FIFO queue or a LIFO stack, elements are retrieved from a priority queue based on their priority. The element with the highest priority is served first.
Think of it like boarding priority at an airport (higher status passengers board first) or a hospital emergency room (patients with more critical conditions are seen first).
A priority queue typically supports at least the following operations:
- Insert: Add a new element with its priority.
- Extract (Max/Min): Remove and return the element with the highest (or lowest) priority.
- Peek (Max/Min): Return the element with the highest (or lowest) priority without removing it.
The binary heap is one of the simplest and most common data structures used to implement a priority queue. It offers a good balance of performance for core operations and relative ease of implementation.
Binary heaps are widely used in computer science:
- Sorting: Heap sort is an efficient comparison-based sorting algorithm that uses a binary heap.
-
Graph Algorithms:
- Prim's algorithm for finding the minimum spanning tree.
- Dijkstra's algorithm for finding the shortest path in a graph (especially with large graphs where simpler priority queue implementations are too slow).
- Event Simulation: Managing scheduled events in simulations, where events need to be processed in chronological order.
- Operating Systems: Task scheduling (prioritizing processes).
- Data Compression: Huffman coding uses a priority queue to build the Huffman tree.
A binary heap is a specific type of binary tree that satisfies two main properties:
- Shape Property: It is a complete binary tree. This means that all levels of the tree are fully filled, except possibly the last level, which is filled from left to right. This property ensures that the tree is as balanced as possible, minimizing its height.
-
Heap Property: There is a specific relationship between parent nodes and their children based on priority.
-
Max-Heap: In a max-heap, for every node
iother than the root, the priority ofiis less than or equal to the priority of its parent (priority(i) <= priority(parent(i))). The element with the highest priority is always at the root. -
Min-Heap: In a min-heap, for every node
iother than the root, the priority ofiis greater than or equal to the priority of its parent (priority(i) >= priority(parent(i))). The element with the lowest priority is always at the root.
-
Max-Heap: In a max-heap, for every node
We will focus on max-heaps for the operation descriptions below, as delMax is a common operation name, but the concepts apply symmetrically to min-heaps.
A complete binary tree has a structure that allows for a very efficient space-saving representation using a simple array (or list). Instead of using pointers, we can store the nodes level by level, from left to right, in the array.
Let's assume a 1-based indexing for simplicity (the first element is at index 1, not 0). If a node is at index
- Its parent is at index
$floor(i / 2)$ . - Its left child is at index
$2 i$ . - Its right child is at index
$2 i + 1$ .
If using 0-based indexing (the first element is at index 0):
- Its parent is at index
$floor((i - 1) / 2)$ . - Its left child is at index
$2 i + 1$ . - Its right child is at index
$2 i + 2$ .
The floor function is important for parent calculation with integer division. For example, with 1-based indexing, the parent of node 3 (index 3) is at index
This array representation is extremely compact, using only the necessary space for the elements, and allows for fast calculation of parent/child indices. The size of the heap (number of elements) corresponds to the number of elements currently in the array. The root is always at index 1 (or 0).
All operations on a binary heap are designed to maintain the heap property while preserving the complete tree shape. The two fundamental helper operations are swim and sink, which restore the heap property after an element's priority changes or a structural change occurs.
The max operation (or peekMax) returns the element with the highest priority without removing it.
Explanation: In a max-heap, the element with the highest priority is always located at the root of the tree. In the array representation, this is the element at index 1 (if using 1-based indexing) or index 0 (if using 0-based indexing).
Detailed Steps:
- Check if the heap is empty. If so, return an indicator of empty (e.g., null or throw an exception).
- Return the element at the root index (index 1 or 0).
Time Complexity:
Accessing an element at a fixed index in an array takes constant time.
Therefore, the time complexity of the max operation is O(1).
The swim operation is used to restore the heap property when a node's priority becomes higher than its parent's priority. This typically happens after inserting a new element at the end of the heap.
Explanation: If a node's priority is higher than its parent's, they violate the heap property. To fix this, the node is swapped with its parent. After the swap, the node (now higher up in the tree) might still have a higher priority than its new parent, so the process is repeated. The node "swims" up the tree until it reaches the root or its priority is no longer greater than its parent's.
Detailed Steps (for a node at index k):
- While
k > 1(ork > 0for 0-based indexing, meaningkis not the root) AND the element at indexkhas higher priority than its parent (at indexfloor(k/2)orfloor((k-1)/2)): - Swap the element at index
kwith the element at its parent's index. - Update
kto the parent's index (k = floor(k/2)ork = floor((k-1)/2)) and repeat the check.
Time Complexity:
In the worst case, an element might "swim" all the way up from the bottom level to the root. The number of levels in a complete binary tree with n nodes is swim operation involves a comparison and potentially a swap, both constant time operations.
Therefore, the time complexity of the swim operation is
The sink operation is used to restore the heap property when a node's priority becomes lower than one or both of its children's priorities. This typically happens after replacing the root element (as in delMax) or decreasing a node's priority.
Explanation: If a node's priority is lower than that of its children, it violates the heap property. To fix this, the node is swapped with the child that has the highest priority (in a max-heap). Swapping with the highest-priority child ensures that the node moves below a higher-priority element and also keeps the higher-priority child higher up in the tree. After the swap, the node (now lower down) might still have a lower priority than its new children, so the process is repeated. The node "sinks" down the tree until it reaches a leaf node or its priority is no longer lower than either of its children's.
Detailed Steps (for a node at index k):
- Let
nbe the current number of elements in the heap. - While the node at index
khas at least one child within the heap bounds (i.e.,2*k <= nfor 1-based indexing, or2*k + 1 < nfor 0-based indexing): - Determine the index
jof the child with the higher priority. Start with the left child (j = 2*korj = 2*k + 1). - If the right child exists (
2*k + 1 <= nfor 1-based, or2*k + 2 < nfor 0-based) AND has a higher priority than the left child, setjto the right child's index. - If the element at index
khas priority greater than or equal to the element at indexj(the higher-priority child), the heap property is satisfied for this node and its children. Stop the sinking process. - Otherwise (element at
khas lower priority than element atj), swap the element at indexkwith the element at indexj. - Update
kto the child's index (k = j) and repeat the check.
Time Complexity:
In the worst case, an element might "sink" all the way down from the root to a leaf node. The number of levels in a complete binary tree with n nodes is sink operation involves comparisons (up to 2 children) and potentially a swap, both constant time operations.
Therefore, the time complexity of the sink operation is
The insert operation adds a new element with its priority into the heap while maintaining the heap properties.
Explanation:
To maintain the complete tree shape, a new element is always added at the next available position in the last level of the tree. In the array representation, this corresponds to adding the new element at the end of the array. After adding the element, it might violate the heap property because its priority could be higher than its parent's. The swim operation is then used to move the new element up the tree until the heap property is restored.
Detailed Steps:
- Add the new element to the end of the array/list used to store the heap elements.
- Increment the size/count of elements in the heap.
- Call the
swimoperation on the index of the newly added element to move it up to its correct position.
Time Complexity:
Adding an element to the end of an array/list (if using dynamic resizing) is typically an amortized swim operation takes n is the current number of elements.
Therefore, the dominant factor is the swim operation, making the time complexity of insert
The delMax operation removes and returns the element with the highest priority from the heap.
Explanation:
In a max-heap, the highest priority element is always at the root (index 1 or 0). Removing the root would create a hole at the top and potentially break the complete tree shape. To maintain the complete tree shape, the element from the very last position in the heap (the last element in the array) is moved to the root position. After this replacement, the element now at the root might have a lower priority than its children, violating the heap property. The sink operation is then used on the new root element to move it down the tree until the heap property is restored. The size of the heap is then decremented.
Detailed Steps:
- Check if the heap is empty. If so, return an indicator of empty or throw an exception.
- Store the maximum element (which is at the root, index 1 or 0) to be returned later.
- Swap the element at the root with the element at the very last position in the heap (index
norn-1, wherenis the current number of elements). - Decrement the size/count of elements in the heap (
n--). The swapped element is now outside the considered heap range. - Call the
sinkoperation on the new root element (index 1 or 0) to move it down to its correct position and restore the heap property. - Return the stored maximum element.
Time Complexity:
Swapping two elements and decrementing the size are sink operation takes n is the current number of elements.
Therefore, the dominant factor is the sink operation, making the time complexity of delMax
Here is a summary of the time complexities for the main operations on a binary heap with n elements:
| Operation | Time Complexity | Notes |
|---|---|---|
max |
Accessing the root element | |
insert |
Dominated by the swim operation |
|
delMax |
Dominated by the sink operation |
|
swim |
Proportional to the height of the tree | |
sink |
Proportional to the height of the tree |
The binary heap is an efficient and fundamental data structure for implementing priority queues. Its key advantages are:
-
Efficient Operations:
insertanddelMax(orextractMin) operations have a logarithmic time complexity$O(lg n)$ , which is quite fast, especially for large heaps. Themax(ormin) operation is$O(1)$ . -
Space Efficiency: When implemented with an array, it requires only
$O(n)$ space fornelements, which is very space-efficient compared to pointer-based tree implementations. - Simplicity: The array-based implementation is relatively straightforward.
For many applications, the binary heap provides excellent performance. However, other priority queue implementations exist which offer better worst-case guarantees for specific operations, though often at the cost of increased complexity or average-case performance.
-
Binomial Heaps: Can perform merging of two heaps more efficiently
$O(lg n)$ than binary heaps ($O(n)$ in naive implementations,$O(m log(n+m)$ ) if inserting one by one, where m and n are sizes). Useful in algorithms where heaps are frequently merged. -
Fibonacci Heaps: Offer even better amortized time complexities for operations like
decreaseKey($O(1)$ amortized ), which are crucial in certain graph algorithms like Dijkstra's and Prim's on sparse graphs. However, they have higher constant factors and are significantly more complex to implement than binary heaps.
Despite the existence of these alternatives, the binary heap remains a cornerstone data structure due to its simplicity, good performance characteristics, and wide applicability. It is often the default choice for priority queue implementations unless specific requirements (like very fast merging or decrease-key operations) necessitate a more complex structure.