Huffman Code

Huffman coding is a lossless data compression algorithm that assigns variable-length codes to input characters based on their frequencies. The most frequent characters receive the shortest codes, while the least frequent characters receive the longest codes. It builds a Huffman tree from the character frequencies and assigns codes by traversing the tree from root to leaf for each character. Huffman coding is widely used in compression formats like GZIP, PKZIP, BZIP2, JPEG and PNG to efficiently compress data by representing more common symbols with fewer bits.

Uploaded by

Nitin

HUFFMAN CODING

In computer science and information theory, a Huffman code is a particular type of
optimal prefix code that is commonly used for lossless data compression. The process of
finding and/or using such a code proceeds by means of Huffman coding, an algorithm
developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the
1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
The output from Huffman's algorithm can be viewed as a variable-length code table for
encoding a source symbol (such as a character in a file). The algorithm derives this table
from the estimated probability or frequency of occurrence (weight) for each possible value
of the source symbol. As in other entropy encoding methods, more common symbols are
generally represented using fewer bits than less common symbols. Huffman's method can
be implemented efficiently, finding a code in time linear in the number of input weights if
these weights are already sorted.
Huffman coding is a lossless data compression algorithm. The idea is to assign
variable-length codes to input characters, where the length of each assigned code depends
on the frequency of the corresponding character. The most frequent character gets the
shortest code and the least frequent character gets the longest code.
The variable-length codes assigned to input characters are prefix codes, which means the
codes (bit sequences) are assigned in such a way that the code assigned to one character
is never a prefix of the code assigned to any other character. This is how Huffman coding
makes sure that there is no ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b,
c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding
leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a
and b. If the compressed bit stream is 0001, the decompressed output may be “cccd”,
“ccb”, “acd”, “cad” or “ab”.
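
The ambiguity in this counterexample can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function names `is_prefix_free` and `all_decodings` are made up for illustration. It tests the prefix property and enumerates every way the bit stream 0001 can be split into code words.

```python
from typing import Dict, List


def is_prefix_free(codes: Dict[str, str]) -> bool:
    """Return True if no symbol's code is a prefix of another symbol's code."""
    return not any(
        s1 != s2 and w2.startswith(w1)
        for s1, w1 in codes.items()
        for s2, w2 in codes.items()
    )


def all_decodings(bits: str, codes: Dict[str, str]) -> List[str]:
    """Enumerate every way to split `bits` into a sequence of code words."""
    if not bits:
        return [""]
    results: List[str] = []
    for symbol, word in codes.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], codes):
                results.append(symbol + rest)
    return results


# The ambiguous code from the counterexample above.
ambiguous = {"a": "00", "b": "01", "c": "0", "d": "1"}

print(is_prefix_free(ambiguous))                  # False
print(sorted(all_decodings("0001", ambiguous)))   # ['ab', 'acd', 'cad', 'ccb', 'cccd']
```

With a prefix-free code the same enumeration returns at most one decoding for any bit stream, which is exactly the property a decoder relies on.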


Huffman coding has two major parts:
1) Build a Huffman tree from the input characters.
2) Traverse the Huffman tree and assign codes to the characters.

Steps to build Huffman Tree

The input is an array of unique characters along with their frequencies of occurrence, and
the output is the Huffman tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The
min heap is used as a priority queue, and the value of the frequency field is used to compare
two nodes in the heap. Initially, the least frequent character is at the root.)
2. Extract the two nodes with the minimum frequency from the min heap.
3. Create a new internal node with frequency equal to the sum of the two nodes'
frequencies. Make the first extracted node its left child and the other extracted node
its right child. Add this node to the min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is
the root node, and the tree is complete.
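
The four steps above can be sketched in Python with the standard `heapq` module as the min heap. This is a minimal illustration under stated assumptions, not a reference implementation: the `Node` class, the function name `build_huffman_tree` and the counter used to break ties between equal frequencies are choices made for this sketch.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Node:
    freq: int
    symbol: Optional[str] = None      # set for leaf nodes only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_huffman_tree(freqs: Dict[str, int]) -> Node:
    """Build a Huffman tree from a symbol -> frequency mapping."""
    if not freqs:
        raise ValueError("at least one symbol is required")
    tie = count()  # unique tie-breaker so the heap never compares Node objects
    # Step 1: one leaf per symbol, all placed in a min heap keyed on frequency.
    heap = [(f, next(tie), Node(f, s)) for s, f in freqs.items()]
    heapq.heapify(heap)
    # Step 4: repeat until a single node (the root) remains.
    while len(heap) > 1:
        # Step 2: extract the two nodes with the lowest frequency.
        f1, _, first = heapq.heappop(heap)
        f2, _, second = heapq.heappop(heap)
        # Step 3: new internal node; first extracted is the left child.
        parent = Node(f1 + f2, left=first, right=second)
        heapq.heappush(heap, (parent.freq, next(tie), parent))
    return heap[0][2]


root = build_huffman_tree({"a": 5, "b": 2, "c": 1, "d": 1})
print(root.freq)  # 9, the sum of all frequencies
```

How ties between equal frequencies are broken changes the shape of the tree and the individual codes, but not the total number of bits in the encoded output.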
Example: MISSISSIPPI RIVER

Suppose we have to send a string of data, say "mississippi river". If we send it
in the usual way, each character takes 8 bits. There are 17 characters in this
string, so it takes a total of 136 bits (17 * 8).

Now, coming to Huffman's method, we first find the frequency of each letter.
Note that in “mississippi river” the space between the two words also counts
as a character. Here “_” is used to represent the white space:

m ----> 1
i ----> 5
s ----> 4
p ----> 2
r ----> 2
v ----> 1
e ----> 1
_ ----> 1
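
The frequency table can be reproduced with a few lines of Python. This is a small sketch using `collections.Counter` from the standard library; replacing the space with “_” mirrors the notation used in this example.

```python
from collections import Counter

text = "mississippi river"

# Count every character, writing the space as "_" as in the table above.
freqs = Counter(text.replace(" ", "_"))

for symbol, n in freqs.most_common():
    print(symbol, "---->", n)

print(sum(freqs.values()))  # 17 characters in total
```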

Now we have to assign codes. As a first step, we sort the letters by their
frequency of occurrence. That is, “i” comes first in this example with 5 occurrences,
followed by s (4) and so on. The sorted order is:

i5 s4 p2 r2 m1 v1 e1 _1

The next step towards assigning codes is to add the frequencies together. We start
from the letters with the lowest frequencies and build the tree with the above data as
leaf nodes. First we add e and the space “_” together with their frequencies, which gives
the node e_2, where 2 is the combined frequency:

    e_2
    e1  _1

Similarly, we add (m1, v1), (p2, r2) and (i5, s4) to get mv2, pr4 and is9:

    mv2       pr4       is9
    m1  v1    p2  r2    i5  s4

We keep adding nodes towards the top in the same way, as shown in the figure below, to
construct the tree. After constructing the tree, we label each left branch with “0” and
each right branch with “1”:

isprmve_17
    0: is9
        0: i5
        1: s4
    1: prmve_8
        0: pr4
            0: p2
            1: r2
        1: mve_4
            0: mv2
                0: m1
                1: v1
            1: e_2
                0: e1
                1: _1

Now the tree structure is complete, and from this tree we get the code for each letter.
To do so, we follow the path from the root down to the leaf for the respective letter. Take
the letter i, for example. Starting from the root “isprmve_17”, the path goes to “is9” and then
reaches the leaf i5 through the branches 0 and 0. Thus the code assigned to the letter “i” is 00.

In the same way, we get the code for each letter by traversing the tree.

m ----> 1 ----> 1100
i ----> 5 ----> 00
s ----> 4 ----> 01
p ----> 2 ----> 100
r ----> 2 ----> 101
v ----> 1 ----> 1101
e ----> 1 ----> 1110
_ ----> 1 ----> 1111
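
The root-to-leaf traversal that produces this table can be written as a short recursive function. The sketch below is illustrative only: it represents the example tree as nested `(left, right)` tuples with single-character strings as leaves, which is an assumption of this sketch rather than a required data structure, and `assign_codes` is a made-up name.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

# The example tree: each internal node is a (left, right) pair, each leaf a letter.
tree: Tree = (("i", "s"), (("p", "r"), (("m", "v"), ("e", "_"))))


def assign_codes(node: Tree, prefix: str = "") -> Dict[str, str]:
    """Walk from the root to every leaf, appending 0 for left and 1 for right."""
    if isinstance(node, str):
        # A leaf: the path taken so far is its code ("0" if the tree is one leaf).
        return {node: prefix or "0"}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


codes = assign_codes(tree)
for symbol, code in codes.items():
    print(symbol, "---->", code)
```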
Checking the result, we can see that letters with the lowest frequencies, like “v” and “e”
(4 bits each), need more bits than letters with high frequencies, like “i” and “s” (2 bits
each).
So when we send the string “mississippi river”, it now needs only 46 bits instead of 136
bits:

m    i  s  s  i  s  s  i  p   p   i  _    r   i  v    e    r
1100 00 01 01 00 01 01 00 100 100 00 1111 101 00 1101 1110 101   (46 bits in total)
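
The bit count can be verified by encoding the string with the code table and decoding it again. This is a minimal Python sketch; `encode` and `decode` are illustrative names, and the decoder relies on the prefix property, so it can emit a symbol as soon as the bits read so far match a code word.

```python
from typing import Dict

# Code table from the example above ("_" stands for the space).
codes = {
    "m": "1100", "i": "00", "s": "01", "p": "100",
    "r": "101", "v": "1101", "e": "1110", "_": "1111",
}


def encode(text: str, codes: Dict[str, str]) -> str:
    """Concatenate the code word of every character."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: Dict[str, str]) -> str:
    """Read bits left to right, emitting a symbol whenever a code word matches."""
    reverse = {word: symbol for symbol, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)


message = "mississippi_river"
bits = encode(message, codes)
print(len(bits))             # 46
print(len(message) * 8)      # 136
print(decode(bits, codes))   # mississippi_river
```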

In conclusion, the Huffman algorithm efficiently reduces the number of bits required to
represent text data.

The following post will explain the steps to implement Huffman coding in Python
and JavaScript, based on this algorithm.

Practical usage:
Huffman coding is widely used in the mainstream compression formats that you might
encounter, from GZIP, PKZIP (WinZip etc.) and BZIP2 to image formats such as JPEG
and PNG.

All compression schemes have pathological data sets that cannot be meaningfully
compressed; the archive formats I listed above simply 'store' such files uncompressed
when they are encountered.

Newer arithmetic and range coding schemes have often been avoided because of patent
issues, meaning Huffman coding remains the workhorse of the compression industry.
