Definition.
A tree is heap-ordered if the key in each node is larger than or equal to the keys in all of that
node's children (if any). Equivalently, the key in each node of a heap-ordered tree is smaller than or equal
to the key in that node's children (if any).
The most important property of a heap-ordered tree is that the root always contains the largest (smallest)
element.
A heap-ordered tree.
The tree on the picture above is a complete binary tree, and in this case we can use an array to store
all information about the nodes. To keep the complete information about a tree in an array we will enumerate
all the vertices starting from the root, then sequentially nodes of each level from left to right as shown
below.
A heap-ordered tree mapped to an array.
Definition.
A heap is a set of nodes with keys arranged in a complete heap-ordered binary tree, represented as an
array.
We need to distinguish max-heaps (the root of the tree contains the largest element) and
min-heaps (the root of the tree contain the smallest element).
It is not difficult to see the rules for enumeration of a parent node and its children in heaps. If a node
has index n in the array, then its left child has index 2n+1 and its right child
has index 2n+2. Accordingly, its parent has index [(n-1)/2].
We can define the following functions that will be useful in the future:
int parent(int n) { return (n-1)/2; }
int left_child(int n) { return 2*n+1; }
int right_child(int n) { return 2*n+2; }
Building a heap
The next question we are about to discuss is how to build a heap. Let's assume that we have the following
tree:
Not a heap at all.
Obviously, this is not a heap-ordered tree because the red numbers violate the restriction. To construct
a heap-ordered tree out of this one we will use a simple idea. We will design a simple procedure
heapify that looks at a node and its children, finds the biggest element out of these three
and puts it in this small tree. For example, after looking at picture A below this procedure should find the
value 11 and swaps it with the value 8 (as shown on picture B):
A. Before.
B. After.
If the procedure works under the assumption that the both children are roots of heap-ordered subtrees, then
this procedure should recursively call itself for the child that had the largest value (the one moved up).
For example, after using this procedure against the node containing the value 8 we will have
Thus, our heapify function applied to an array of the size heap_size can look like
void max_heapify(int data[], int heap_size, int node)
{
int l = left_child(node),
r = right_child(node);
if( l >= heap_size ) return; // this node does not have children
int max_ind = data[node]<data[l] ? l : node;
if( r<heap_size && data[max_ind]<data[r] )
max_ind = r;
// if the largest value belongs to one of the children then ...
if( max_ind != node ){
exch(data[node], data[max_ind]); // ... exchange it with the node ...max_heapify(data, heap_size, max_ind); // ... and make sure this child's subtree is still OK
}
}
where data is an array we are trying to convert to the heap, heap_size is the size of the heap, and
node is the index of the node we apply this function to.
This function will successfully work only if both children of a tested node are the roots of "good"
subtrees (that is, those subtrees are heap-ordered). To satisfy this condition while building a heap out of
an array, we need to start from the nodes of lowest level. More exactly, we can start with the nodes whose
children are terminal nodes (that is, do not have any children by themselves). For example, on the pictures
above we can start with the nodes with indexes 5, 4, 3 and go up to the root. Thus, the implementation of
the build_max_heap function is very simple
void build_max_heap(int data[], int N)
{
for(int node=N/2-1; node>=0; node--)
max_heapify(data, N, node);
}
where data is an array to be converted, and N is th size of the array. Let us note that for
future use we distinguish the value N (the size of the array data) and value heap_size
(the number of elements in the heap stored within array data).
Heap sort
The heap is a very important data structure and widely used in programming. It also can be used for sorting
arrays. The idea of sorting using the heaps is based on the main property of the
heaps. Assume we have an array data of the size N, that was converted to a heap. Thus, the
size of the heap (heap_size) is also equal to N. If we would like to sort this array, then
we can use the property that the root (or the element with index zero) always contain the largest element.
Since the right place for the largest element is at the very end of the array, we need to swap the last
element and the first one. By doing this we, of course, ruined the heap, but we can apply the
max_hepify procedure to the root node and restore the heap-ordered structure. Although, this restoration
will bring the biggest element (the one we put at the end) to its original place at the beginning of the
array, to avoid this, we need to decrease the value of the heap_size variable.
After restoring the heap structure on a sub-array data [0, N-2], we know that the second largest
element now takes the first position in the array. So, we swap it with the element #N-2 and again restore
the heap-order by using max_heapify(data, --heapsize, 0).
Keep repeating such steps until we put all elements in their right positions.
Glossary
Definition.
A tree is a nonempty collection of edges and nodes that has exactly one path connecting any two nodes.
Definition.
A node (or vertex) is simple object that can have a name and can carry other associated information.
Definition.
An edge is a connection between two nodes.
Definition.
A path in a tree is a list of distinct nodes in which successive nodes are connected by edges in the
tree.
Definition.
A binary tree is an ordered (means we distinguish the children of a node) tree consisting of two types
of nodes:
external nodes with no children and
internal nodes with exactly two children.
Definition.
A complete binary tree is one with all levels filled, except possibly the final one, which is filled
from left to right.
Definition.
Nodes with no children are called leaves, or terminal nodes.