International Journal of Information and Computation Technology.

ISSN 0974-2239 Volume 3, Number 3 (2013), pp. 139-146

Analysis and Comparison of Algorithms for Lossless Data Compression

Anmol Jyot Maan

Hyderabad, INDIA.

Abstract

Data compression is the art of reducing the size of a file. Its goal is to
eliminate the redundancy in a file's encoding in order to reduce its size, which
saves storage space and shortens the time needed to transmit the data. Data
compression can be either lossless or lossy. Lossless data compression
recreates the exact original data from the compressed data, while lossy
data compression cannot regenerate the original data exactly from the
compressed data. Lossy methods are mainly used for compressing
sound, images or video. Many data compression algorithms are
available for files of different formats. This paper discusses and
compares a selected set of lossless data compression algorithms.

Keywords: Data Compression, Lossless Compression, Lossy Compression, Huffman Coding,
Arithmetic Coding, Run Length Encoding.

1. Introduction
Data compression is the art of representing information in compact form. It reduces the
file size which in turn reduces the required storage space and makes the transmission
of data quicker. Compression techniques try to find redundant data and remove these
redundancies. Data compression can be divided into two broad classes: lossless data
compression and lossy data compression. In lossless compression, the exact original
data can be recovered from compressed data. It is used when the difference between
original data and decompressed data cannot be tolerated. Medical images, text needed
for legal purposes and computer executable files are compressed using lossless
compression techniques. Lossy compression, as the name suggests, involves loss of
information. It is used in applications where an inexact reconstruction is
acceptable. Video and audio are typically compressed using lossy compression.
The extremely fast growth of data that needs to be stored and transferred has
increased the demand for better transmission and storage techniques. Various lossless
data compression algorithms have been proposed and used, among them Huffman Coding,
Arithmetic Coding, the Shannon-Fano Algorithm and Run Length Encoding. This paper
examines Huffman Coding, Arithmetic Coding and Run Length Encoding.

2. Run Length Encoding

Run Length Encoding (RLE) is the simplest of the data compression algorithms. It
replaces each run of two or more of the same character with a number which represents
the length of the run, followed by the original character. Single characters are coded
as runs of 1. The major task of this algorithm is to identify the runs in the source
file and to record the symbol and the length of each run. The algorithm compresses the
source file using these runs, while characters that do not belong to a run are kept
but do not contribute to the compression.

Example of RLE:
Input: AAABBCCCCD
Output: 3A2B4C1D
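
The example above can be reproduced with a few lines of Python. This is a minimal sketch, not a production encoder: the function name rle_encode is chosen here for illustration, and the sketch assumes the input contains no digit characters, since a digit in the input would make the count-plus-character output ambiguous.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as <count><character>; single characters become runs of 1."""
    # groupby yields one (character, run) pair per maximal run of equal characters.
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle_encode("AAABBCCCCD"))  # 3A2B4C1D
```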

3. Huffman Coding
The Huffman coding algorithm was developed by David Huffman in 1951. Huffman
coding is an entropy encoding algorithm used for lossless data compression. In this
algorithm fixed-length codes are replaced by variable-length codes. When using
variable-length code words, it is desirable to create a prefix code, in which no code
word is a prefix of another, avoiding the need for a separator to determine code word
boundaries. Huffman Coding uses such a prefix code.
The Huffman procedure rests on two observations:
1. Symbols with a high frequency are expressed using shorter encodings than
symbols which occur less frequently.
2. The two symbols that occur least frequently have code words of the same length.
The Huffman algorithm uses a greedy approach, i.e. at each step the algorithm
chooses the best available option. A binary tree is built from the bottom up. To see
how Huffman Coding works, consider an example. Assume that the characters in a
file to be compressed have the following frequencies:
A: 25 B: 10 C: 99 D: 87 E: 9 F: 66
The process of building this tree is:
1. Create a leaf node for each symbol and arrange the nodes in order from
highest to lowest frequency.

C:99 D:87 F:66 A:25 B:10 E:9

2. Select the two nodes with the lowest frequencies. Create a parent node for
these two nodes and assign it a frequency equal to the sum of the frequencies of
the two child nodes. Add the parent node to the list and remove the two child
nodes from the list. Repeat this step until only one node is left. For the
frequencies above, the successive merges are E+B = 19, 19+A = 44, 44+F = 110,
D+C = 186 and 110+186 = 296.

3. Now label each edge. The left child of each parent is labeled with the digit 0
and the right child with 1. The code word for each source letter is the sequence of
labels along the path from the root to the leaf node representing the letter.

The resulting Huffman codes are shown in Table 1.

Table 1: Huffman Codes.

Symbol  Code
C       00
D       01
F       10
A       110
B       1110
E       1111
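
The tree-building steps can be sketched in Python with a priority queue. This is an illustrative sketch under stated assumptions, not the implementation behind Table 1: the function name huffman_codes and the tie-breaking counter are introduced here, and which child becomes the left one is an arbitrary choice. The exact bit patterns may therefore differ from Table 1, but the code word lengths for the example frequencies come out the same.

```python
import heapq
from itertools import count


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix code by repeatedly merging the two least frequent nodes."""
    if not freqs:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (parent node).
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))

    codes: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # left edge labeled 0
            walk(tree[1], prefix + "1")  # right edge labeled 1
        else:
            codes[tree] = prefix or "0"  # a one-symbol alphabet still needs a bit

    walk(heap[0][2], "")
    return codes


freqs = {"A": 25, "B": 10, "C": 99, "D": 87, "E": 9, "F": 66}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print({s: len(c) for s, c in codes.items()}, total_bits)
```

For these frequencies the encoded file needs 655 bits in total, compared with 888 bits for a fixed-length code of 3 bits per symbol over the 296 characters.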

4. Arithmetic Coding
Arithmetic Coding is useful for small alphabets with highly skewed probabilities. In
this method, a code word is not used to represent each symbol of the text. Instead, it
produces a single code for the entire message. Arithmetic Coding assigns an interval to
each symbol, and the message as a whole is represented by a half-open interval [x, y),
where x and y are real numbers between 0 and 1. A decimal number inside that interval
then serves as the code. Initially, the interval is [0, 1). The interval is divided
into sub-intervals. The number of sub-intervals equals the number of symbols in the
alphabet, and the size of each is proportional to the probability of its symbol. For
each symbol of the message, the sub-interval of that symbol is selected and divided
again in the same way.
Consider an example illustrating encoding in Arithmetic Coding.
Table 2: Encoding in Arithmetic Coding.

Symbol  Probability  Range
X       0.5          [0.0, 0.5)
Y       0.3          [0.5, 0.8)
Z       0.2          [0.8, 1.0)

Table 3: Encoding symbol “YXX”.

Symbol   Range  Low Value  High Value
(start)         0          1
Y        1      0.5        0.8
X        0.3    0.5        0.65
X        0.15   0.5        0.575

In Table 3, the range, high value and low value of each row are calculated from the
low and high values of the previous row as:
Range = High value - Low value
High Value = Low value + Range * high end of the range of the symbol being encoded
Low Value = Low value + Range * low end of the range of the symbol being encoded

The string “YXX” is represented by any number within the interval [0.5,
0.575).
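
The interval narrowing in Table 3 can be checked with a short Python sketch. It covers only the interval computation, not the final step of choosing and emitting a number inside the interval. The names RANGES and encode_interval are introduced here for illustration, and exact fractions are used instead of floating-point numbers as an assumption of this sketch, so that the bounds match the table exactly.

```python
from fractions import Fraction

# Symbol ranges from Table 2, written as exact fractions.
RANGES = {
    "X": (Fraction(0), Fraction(1, 2)),
    "Y": (Fraction(1, 2), Fraction(4, 5)),
    "Z": (Fraction(4, 5), Fraction(1)),
}


def encode_interval(message: str, ranges=RANGES):
    """Return the half-open interval [low, high) that represents the message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        span = high - low  # "Range" column of Table 3
        sym_low, sym_high = ranges[symbol]
        # Both new bounds are computed from the old low value, so they are
        # assigned together rather than one after the other.
        low, high = low + span * sym_low, low + span * sym_high
    return low, high


low, high = encode_interval("YXX")
print(float(low), float(high))  # 0.5 0.575
```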

Figure 1: Graphical display of shrinking ranges.


5. Measuring Compression Performance

There are various criteria to measure the performance of a compression algorithm.
However, the main concerns have always been space efficiency and time efficiency.
The following measurements are used to evaluate the performance of lossless
algorithms.

1. Compression ratio: the ratio between the size of the compressed file and
the size of the source file.

2. Compression factor: the inverse of the compression ratio, i.e. the ratio between
the size of the source file and the size of the compressed file.

3. Saving percentage: the shrinkage of the source file, expressed as a percentage
of its original size.
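
The three measurements can be written as small Python functions. This is a sketch under one assumption: "shrinkage" is read as the size reduction relative to the source file, expressed as a percentage. The function names and the 1000-byte and 400-byte file sizes are hypothetical values chosen for illustration.

```python
def compression_ratio(source_size: int, compressed_size: int) -> float:
    """Size of the compressed file divided by the size of the source file."""
    return compressed_size / source_size


def compression_factor(source_size: int, compressed_size: int) -> float:
    """Inverse of the compression ratio."""
    return source_size / compressed_size


def saving_percentage(source_size: int, compressed_size: int) -> float:
    """Shrinkage of the source file as a percentage (assumed interpretation)."""
    return 100 * (source_size - compressed_size) / source_size


# Hypothetical sizes in bytes: a 1000-byte source compressed to 400 bytes.
print(compression_ratio(1000, 400))   # 0.4
print(compression_factor(1000, 400))  # 2.5
print(saving_percentage(1000, 400))   # 60.0
```

With these definitions a smaller compression ratio means better compression, while a larger compression factor or saving percentage means better compression.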

6. Comparing the Algorithms

1. Run Length Encoding: In the worst case RLE generates output that is
twice the size of the input data. This happens when the source file contains few
runs. Even for the files that are compressed, the compression ratio remains high,
i.e. the compressed file is not much smaller than the source. This algorithm
therefore does not provide significant improvement over the original file.
2. Huffman Coding vs. Arithmetic Coding: The Huffman Coding algorithm uses a
static table for the whole coding process, so it is faster. However, it does not
compress as effectively.
Arithmetic Coding, in contrast, compresses more effectively, but
its compression speed is slow.
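
The worst case noted for Run Length Encoding in item 1 can be reproduced with a short sketch. It assumes the count-plus-character scheme of Section 2 with single-digit counts; the function name rle_encode and the sample input are chosen here for illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as <count><character>; single characters become runs of 1."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


no_runs = "ABCDEFGH"  # no character repeats, so every run has length 1
encoded = rle_encode(no_runs)
print(encoded, len(encoded) / len(no_runs))  # 1A1B1C1D1E1F1G1H 2.0
```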
Table 4 presents a simple comparison between these two compression methods.

Table 4: Huffman Coding vs. Arithmetic Coding.

Compression method           Arithmetic  Huffman
Compression ratio            Very good   Poor
Compression speed            Slow        Fast
Decompression speed          Slow        Fast
Memory space                 Very low    Low
Compressed pattern matching  No          Yes
Permits random access        No          Yes
Conclusion
Of the algorithms examined above, Arithmetic Coding achieves the best compression,
outperforming both Huffman Coding and Run Length Encoding, at the cost of slower
compression and decompression. This paper therefore finds Arithmetic Coding to be
the most efficient of the selected algorithms.
