June 2014 – Basicsbehind.com

What is Heap Data Structure?

Heap data structure is nowhere related to heap memory area referred in Programming language like C++ and Java.
You can visualize it as a nearly complete binary tree. It is also referred as Binary heap.
Each node in a heap should satisfy the below heap condition.
- Max Heap: Parent node is always greater than its child. (parent dominates)
- Min Heap: Parent node is always smaller than its child. (child dominates)
It is a weakly or partially ordered (as compared to Binary Search Tree) data structure that is especially suitable for implementing priority queue.
Never think of Heap as being sorted. It is not sorted at all. What makes a heap interesting is that the largest element is always on the top of a max heap.
It can be represented through simple Array. Thank GOD there is no links.

Going forward Heap refers max heap only. MinHeap will be trivial if you understand MaxHeap clearly, you will just have to change some less than(<) and greater than(>) sign. I have implemented a min heap in question section.

What is a complete binary tree?
A complete binary tree is completely filled from left to right across each row with one possible exception that the last row need not be full; it is filled from the left up to a point.

Important properties of a complete binary tree:

A complete binary tree of height h has 2^h+1−1 nodes.
A complete binary tree of height h has 2^h−1 non-leaf nodes.
A complete binary tree of height h has 2^h leaf nodes.
A complete binary tree of height h has between 2^h and 2^h+1 – 1 node.
A Heap tree height with n nodes is equal to ⌊log2ⁿ⌋.

It’s very easy to understand if you know the basics of Logrithms.
Let’s try to understand this:

In above complete binary tree we have 7 nodes which we can write as 2³-1.
2^h+1-1 => 2²⁺¹-1 = 7 (approximately we raised 2 to the height of the tree to get number of nodes)
When you write log2ⁿ means that how many times you will raise 2 to get n. For ex. Log2⁸ =3 => 2³=8.

Representation of a heap
A heap can be implemented as an array by recording its elements in the top-down, left-to-right fashion.

Important observations:

Here you can see that a heap is a complete binary tree therefore there are no holes in the array used to represent it.
The parental node keys will be in the first ⌈n/2⌉ positions of the array, while the leaf keys will occupy the last ⌊n/2⌋ positions.
If you know the index (i) of any node you can find the parent, left child and right child index by following formula (assuming array index starts from 0).
- Parent index = (i-1)/2;
- Left child index = 2*i + 1;
- Right child index = 2*i + 2;

Now it’s time to some real implementations of Heap data structure:

How can we construct a heap for a given array of keys? There are two waysfor doing this.

Bottom-up heap construction algorithm also known as MaxHeapify and trickleDown.
TopDown or trickleUp heap construction algorithm. This will be discussed in next section with priority queue.

MaxHeapify: You can think of MaxHeapify() as a simple method which convert an array to a heap. This algorithm works in bottom-up manner to convert an array A [0..n],where n is the size of Array, into a max-heap. It initializes the array as complete binary tree with n elements by placing elements in the order given and then it “Heapify” the tree as follows:

- Starting with the last parental node (n/2th position), the algorithm checks whether the parent satisfy the heap condition or not i.e. parent node should be greater than its children.
- If it does not, the algorithm exchanges the node’s key K with the larger key of its children and checks whether the heap condition holds for K in its new position.
- This process continues until the heap condition for K is satisfied.
- After completing the “heapification” of the subtree rooted at the current parental node, the algorithm proceeds to do the same for the node’s immediate predecessor. The algorithm stops after this is done for the root of the tree.

- - - The last parental node is 10, we start from here. Compare this node with its largest children, in this case it is 12 therefore swap it with its largest child, now the 10 is at its position and there is no left and right child so we stop here.
    - Repeat same process for the second last parental node 8. This node satisfies the heap condition so do nothing here.
    - Repeat same process for the third last parental node 2. Compare this node with its largest children, in this case it is 12 therefore swap it with its largest child. Next check whether the node 2 satisfies the heap property at its new position, it’s not because left child 10 is greater than 2 so swap 2 with 10.
      Now the tree is a heap.

Java Code for Heap
Lets start with simple maxHeapify() method and subsequently we will add another method like heapSort() and PriorityQueue’s methods extractMax(), and insert().
To easily understand the code visualizes the Heap as an array.

package com.test.heap;

import java.util.Arrays;

public class Heap {

	private void maxHeapify(int[] heapArray, int index, int heapSize) {
		int largest = index;
		while (largest < heapSize / 2) { // check before leaf node only
			int left = (2 * index) + 1;//left child
			int right = left + 1;      //right child
			if (left < heapSize && heapArray[left] > heapArray[index]) {
				largest = left; 			}
			if (right < heapSize && heapArray[right] > heapArray[largest]) {
				largest = right;
			}
			if (largest != index) { // swap parent with largest child
				int temp = heapArray[index];
				heapArray[index] = heapArray[largest];
				heapArray[largest] = temp;
				index = largest;
			} else {
				break; // heap condition satisfied for index
			}

		}
	}

	public void builMaxHeap(int[] heapArray, int heapSize) {
		// we are not starting from 0 because we are swapping largest child to
		// the parent(see algorithm)
		for (int i = (heapSize - 1) / 2; i >= 0; i--) {
			maxHeapify(heapArray, i, heapSize);
		}
	}
	
	public static void main(String[] args) {
		int[] heapArray = { 1, 8, 9, 2, 10, 14, 3, 4, 7, 16 };
		
		System.out.println("Before heapify: "+Arrays.toString(heapArray));		
		new Heap().builMaxHeap(heapArray, heapArray.length);
		System.out.println("After heapify: "+Arrays.toString(heapArray));
		
	}
}

Output
--------
Before heapify: [1, 8, 9, 2, 10, 14, 3, 4, 7, 16]
After heapify : [16, 10, 14, 7, 8, 9, 3, 4, 2, 1]

Efficiency of maxHeapify()
As we have seen that Heap is a binary tree with some special property. A Heap tree height with N nodes is equal to ⌊log2^N⌋ or ⌈log2^{(N + 1)}⌉ − 1.
Let’s understand this once again-
Consider below Heap, here you can see the no of nodes N is 7 and height is 2. Let’s derive height through N. Remember log2^N means how many times you will raise 2 to get N. For ex. log2⁸=3 => 2³=8.
Height = ⌊log2^N⌋ => ⌊log2⁷⌋=2
Or Height = ⌈log2^N+1⌉-1 => ⌈log2⁷⁺¹⌉-1 =>⌈log2⁸⌉-1=> 3-1 = 2

We can observe that in worst case of the buildMaxHeap() algorithm each node value on level i of the tree will travel to the leaf level h. We requires two comparisons before moving node value to the next level down —one to find the larger child and the other to determine whether the exchange is required—the total number of key comparisons involving a key on level i will be 2(h − i). Confused! let’s clear the confusion with below example:

Here you can see that node 10 is at the level 1 (i=1) is violating the heap property and to determine whether this node is violating the heap property we need two comparisons- first to larger child which is 12 – second to compare whether the larger child is greater that its parent or not and hence there are 2 comparison at level i=1. We can see that 2(h-i) => 2(2-1)=2;
Therefore, the total number of key comparisons in the worst case will be directly proportional to the tree height(h). We can say that the running time of maxHeapify() on a node of height h as O(h) or O(log2^N).

What about the running time of buildMaxHeap()? It’s easy to compute the upper bound of buildMaxHeap()- we know that we are calling maxHeapify() in loop from 0 to N/2. Each call to maxHeapify() takes O(logn) time, and buildMaxHeap() makes O(n) such calls. Thus, the running time is O(nlogn).
Although above upper bound is correct but there is a tighter bound by which you can derive the running time cost to O(n). Let’s go deeper into this:

When mxHeapify() is called, the running time depends on how far an element might move down in tree before the loop terminates or we can say that it depends on the height of node. In the worst case the node value will move up to the leaf level.
Let’s see the work done at each level.
At the last row of the tree there are 2^h nodes – no call to maxHeapify .
At the second last row there are 2^{(h − 1)} node – and each might move down by 1 level.
At the third last row there are 2^{(h − 2)} nodes, and each might move down by 2 levels.
…
…
At the top row there are 2^{(h − h)} = 2⁰=1 nodes, and the node might move down up to the leaf i.e. equal to the height logn.
Here we can see that the nodes at the top may have to move maximum, but there are far fewer nodes near the top of the tree than the bottom. We can say that not all mxHeapify() operations are O(logn), this is why you are getting O(n).
↠↠Next ↠Heap Sort

Go to the next page – Click on the below red circle with page number.

1 2