Huffman Code
The input is an array of unique characters along with their frequencies of occurrence, and the output is a Huffman tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue: the frequency field is used to compare two nodes, so initially the least frequent character is at the root.)
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
4. Repeat steps #2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
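The four steps above can be sketched in Python, using the standard heapq module as the min heap. This is a minimal sketch under stated assumptions, not the post's own implementation: the function name build_huffman_tree, the tie-breaking counter, and the choice to represent a leaf as the character itself and an internal node as a (left, right) tuple are all illustrative, and the frequency table is assumed to be non-empty.

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Build a Huffman tree from a non-empty {symbol: frequency} mapping.

    A leaf is the symbol itself; an internal node is a (left, right) tuple.
    Returns (root, total_frequency).
    """
    tiebreak = count()  # unique numbers so heap entries never compare two nodes
    # Step 1: one leaf per symbol, arranged as a min heap keyed on frequency
    heap = [(freq, next(tiebreak), sym) for sym, freq in freqs.items()]
    heapq.heapify(heap)
    # Step 4: repeat steps 2 and 3 until only one node is left
    while len(heap) > 1:
        # Step 2: extract the two nodes with the minimum frequency
        f1, _, first = heapq.heappop(heap)
        f2, _, second = heapq.heappop(heap)
        # Step 3: new internal node; the first extracted node is the left child
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (first, second)))
    total, _, root = heap[0]
    return root, total

root, total = build_huffman_tree({"m": 1, "i": 5, "s": 4, "p": 2, "r": 2, "v": 1, "e": 1, "_": 1})
print(total)  # 17
```

Because several nodes share the same frequency, the tie-breaking order decides which tree comes out. A different tie-break can give a tree of a different shape than the one drawn in the example below, but every Huffman tree for the same frequencies encodes the message in the same total number of bits.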
Example: MISSISSIPPI RIVER
Suppose we have to send a string of data, say "mississippi river". If we send it the usual way, each letter takes 8 bits. There are 17 characters here (counting the space between the two words), so it takes a total of 136 bits (17 * 8).
Now, coming to Huffman's method, we first find the frequency of each letter, i.e. how many times it occurs. Note that in “mississippi river” the space between the two words is also counted as a letter, so here I am using “_” to represent the space:
m ----> 1
i ----> 5
s ----> 4
p ----> 2
r ----> 2
v ----> 1
e ----> 1
_ ----> 1
Now we have to assign codes. As a first step, we sort the letters by their frequency of occurrence. In this example “i” comes first with 5 occurrences, followed by s (4), and so on. So the sorted order will be:
i5 s4 p2 r2 m1 v1 e1 _1
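The counting and sorting above, together with the 136-bit cost of plain 8-bit encoding, can be reproduced with a few lines of Python. This is a sketch using collections.Counter and a stable sort; the variable names are illustrative. Counter keeps characters in the order they are first seen and the sort keeps equal counts in that order, so letters that tie may come out in a different order than in the list above (here the space lands between m and v); ties can be ordered either way.

```python
from collections import Counter

message = "mississippi river"

# Plain 8-bit encoding: every character, including the space, costs 8 bits
print(len(message), "characters ->", len(message) * 8, "bits")  # 17 characters -> 136 bits

# Count how often each character occurs
freqs = Counter(message)

# Sort by frequency, most frequent first; print the space as "_"
ordered = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
print(" ".join(f"{'_' if ch == ' ' else ch}{n}" for ch, n in ordered))  # i5 s4 p2 r2 m1 _1 v1 e1
```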
Now our next step towards assigning codes is to add up frequencies. We start from the letters with the lowest frequencies and build the tree with the above data as leaf nodes. So here we first add e and the space “_” and their frequencies, and we get e_2, where 2 is their added frequency:
e1 + _1 ----> e_2
Similarly, we add (v1, m1) and (p2, r2) to get vm2 and pr4. We keep combining the two nodes with the lowest frequencies (when several nodes tie, any of them may be picked): e_2 and vm2 give mve_4, pr4 and mve_4 give prmve_8, s4 and i5 give is9, and finally prmve_8 and is9 give the root isprmve_17, as shown in the figure below.
After constructing the tree, we label each left branch with a “0” and each right branch with a “1”:
isprmve_17
 0: is9
     0: i5
     1: s4
 1: prmve_8
     0: pr4
         0: p2
         1: r2
     1: mve_4
         0: vm2
             0: m1
             1: v1
         1: e_2
             0: e1
             1: _1
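The merging procedure can also be traced in Python. In this hypothetical helper, trace_merges, each merged node is named by joining its children's names (as the example does with e_2 and vm2), and ties between equal frequencies are broken by creation order. Those ties are resolved differently from the walkthrough, so the intermediate names differ from the figure (for instance spr8 instead of prmve_8), yet the root weight is still 17. The weights of all merged nodes add up to 46: each letter's frequency is counted once in every merged node on its path to the root, so this sum equals the total length of the encoded message.

```python
import heapq
from itertools import count

def trace_merges(freqs):
    """Repeatedly merge the two lightest nodes, printing each step.

    A merged node is named by joining its children's names. Returns the
    weights of the merged nodes in the order they were created.
    """
    seq = count()  # tie-breaker: among equal weights, the earlier-created node pops first
    heap = [(weight, next(seq), name) for name, weight in freqs.items()]
    heapq.heapify(heap)
    merged_weights = []
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # lightest node
        w2, _, b = heapq.heappop(heap)  # second lightest node
        print(f"{a}{w1} + {b}{w2} -> {a + b}{w1 + w2}")
        merged_weights.append(w1 + w2)
        heapq.heappush(heap, (w1 + w2, next(seq), a + b))
    return merged_weights

weights = trace_merges({"e": 1, "_": 1, "v": 1, "m": 1, "p": 2, "r": 2, "s": 4, "i": 5})
print("root:", weights[-1], "sum of merged weights:", sum(weights))  # root: 17 sum of merged weights: 46
```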
Now the tree structure is complete, and from this tree we get the code for each letter. To do that, we follow the path from the root down to the leaf of the respective letter. For example, take the letter i: starting from the root “isprmve_17”, it goes to “is9” through branch 0 and then reaches the leaf i5 through another 0, so the code assigned to the letter “i” is 00. Similarly, we get the code for each of the other letters by traversing the tree.
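The same root-to-leaf traversal can be written as a small recursive function. This sketch hard-codes the tree from the example as nested (left, right) tuples; that representation and the name assign_codes are assumptions for illustration. Appending "0" for a left branch and "1" for a right branch reproduces the codes used in the example, e.g. 00 for i and 1111 for the space.

```python
def assign_codes(node, prefix=""):
    """Walk the tree from the root: a left branch appends "0", a right branch "1"."""
    if isinstance(node, str):          # a leaf: the path taken so far is its code
        return {node: prefix or "0"}   # a one-letter tree would still need a 1-bit code
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

# The example tree as (left, right) pairs:
# root = (is9, prmve_8), prmve_8 = (pr4, mve_4), mve_4 = (vm2, e_2)
tree = (("i", "s"), (("p", "r"), (("m", "v"), ("e", "_"))))
codes = assign_codes(tree)
for letter, code in codes.items():
    print(letter, code)
```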
m    i  s  s  i  s  s  i  p   p   i  _    r   i  v    e    r
1100 00 01 01 00 01 01 00 100 100 00 1111 101 00 1101 1110 101
Total = 46 bits
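To check the count, here is a sketch that encodes the message with this code table, counts the bits, and decodes them again. The code table is copied from the tree above; the function names encode and decode are illustrative. Decoding bit by bit works because no code is a prefix of another, which reading codes off the leaves of a tree guarantees.

```python
# Code table read off the example tree; "_" stands for the space
codes = {"i": "00", "s": "01", "p": "100", "r": "101",
         "m": "1100", "v": "1101", "e": "1110", "_": "1111"}

def encode(text):
    """Concatenate the code of every character (a space uses the "_" entry)."""
    return "".join(codes["_" if ch == " " else ch] for ch in text)

def decode(bits):
    """Read bits left to right, emitting a letter as soon as the buffer matches a code.

    This greedy reading is safe only because no code is a prefix of another.
    """
    letter_for = {code: letter for letter, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in letter_for:
            letter = letter_for[buf]
            out.append(" " if letter == "_" else letter)
            buf = ""
    return "".join(out)

message = "mississippi river"
bits = encode(message)
print(len(bits), "bits instead of", 8 * len(message))  # 46 bits instead of 136
print(decode(bits) == message)  # True
```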
In conclusion, the Huffman algorithm reduces the number of bits needed to represent text data by giving the most frequent letters the shortest codes: in this example, 46 bits instead of 136.
The following post will explain the steps to implement Huffman coding in Python and JavaScript based on this algorithm.
Practical usage:
Huffman coding is widely used in the mainstream compression formats you might encounter, from GZIP, PKZIP (WinZip etc.) and BZIP2 to image formats such as JPEG and PNG (GZIP, PKZIP and PNG use it as part of the DEFLATE algorithm).