Basic Arithmetic Coding Based Approach to Compress a Character String
Abstract Data compression plays an important role in storing and transmitting
text and multimedia information. This paper presents a lossless data compression
algorithm, developed in C, that compresses a character string using basic
arithmetic coding. At this preliminary stage, the algorithm was tested on character
arrays comprising vowels only, with an arbitrarily assumed probability distribution.
The results are encouraging, with compression ratios well above unity. Although
the algorithm was tested for vowels only, the work can be extended to any
character array using a probability distribution obtained from a survey of a
few randomly selected articles.
1 Introduction
I. Mondal (✉)
Department of CSE, Techno India Batanagar, Kolkata, India
e-mail: ipsita.mondal@yahoo.com
S.J. Sarkar
Department of EE, Techno India Batanagar, Kolkata, India
e-mail: subhro89@gmail.com
[Figure: block diagram. Input character string → Encoding algorithm → Primary or
secondary memory for storing encrypted string → Decoding algorithm → Decoded
character string]
There are numerous methods of data compression. Broadly, compression can
be classified as lossy or lossless. In lossy compression, some less important
data values present in the file are removed by the algorithm. Examples include
transform coding, Karhunen–Loeve Transform (KLT) coding, and wavelet-based
coding. These algorithms are applied in practice to multimedia files such as
audio, video, and images [1]. In contrast, no information is lost in
lossless data compression techniques such as the Shannon–Fano algorithm, the
Huffman algorithm, and arithmetic coding [1, 4]. Lossless data compression is
preferred for text documents and for images of high importance, such as images
of cancerous tissue [4].
The work done in [1] was confined to the compression of data strings for power
system applications. If the algorithm can be extended to compress any character
string, it can be used for applications such as compressing office files or the
contents of books in a library. Because the output of the compression algorithm
is an encoded form of the data, an external party that lacks the decoding
algorithm and the probability table cannot readily recover the original data. So,
this method can restrict access to and use of the data to authenticated users [5, 6].
The proposed algorithm is developed in C and then tested offline to obtain
the results.
2 Arithmetic Coding
The algorithm for basic arithmetic coding, used to compress a string comprising
the characters given in Table 2, is given below [1, 9].
STEP I: Obtain the string s and calculate its length l.
STEP II: Initialize variables min = 0, max = 1, and r = 1.
STEP III: Set a counter i = 1.
STEP IV: Repeat steps V–IX while i != (l + 1).
STEP V: x = s(i).
STEP VI: Obtain r_low and r_hi corresponding to x.
STEP VII: Update max = min + r * r_hi, then min = min + r * r_low (both use the
value of min from before this step).
STEP VIII: r = max − min.
STEP IX: i = i + 1.
STEP X: End of the loop.
STEP XI: Obtain a number num with minimum binary string length such that
min < num < max.
STEP XII: End.
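The steps above can be sketched in C as follows. This is a minimal illustration, not the authors' implementation: Table 2 is not reproduced in this text, so the symbol set and the ranges in `SYMS`, `R_LOW`, and `R_HI` are hypothetical, and the function names `ac_interval` and `ac_codeword` are introduced here only for the example. The search in `ac_codeword` is one possible way to carry out STEP XI; it tries code lengths n = 1, 2, ... and accepts the first n for which a fraction m / 2^n lies strictly between min and max.

```c
#include <stddef.h>

/* Hypothetical probability table (assumption): the paper's Table 2 is not
   available here, so these symbols and ranges are for illustration only. */
static const char   SYMS[]  = { 'a', 'e', 'i', 'o', 'u' };
static const double R_LOW[] = { 0.0, 0.2, 0.5, 0.6, 0.8 };
static const double R_HI[]  = { 0.2, 0.5, 0.6, 0.8, 1.0 };
#define NSYMS 5

/* Return the table index of character c, or -1 if c is not in the table. */
static int sym_index(char c) {
    for (int k = 0; k < NSYMS; k++)
        if (SYMS[k] == c) return k;
    return -1;
}

/* STEPS II-X: narrow the interval once per character of s.
   Returns 0 on success, -1 if s contains a character outside the table. */
int ac_interval(const char *s, double *min_out, double *max_out) {
    double min = 0.0, max = 1.0, r = 1.0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int k = sym_index(s[i]);
        if (k < 0) return -1;
        max = min + r * R_HI[k];   /* uses min from before this step */
        min = min + r * R_LOW[k];
        r = max - min;
    }
    *min_out = min;
    *max_out = max;
    return 0;
}

/* STEP XI: write the shortest binary fraction 0.b1b2...bn that lies strictly
   inside (min, max) into bits as a '0'/'1' string. Returns n, or -1 if no
   code of at most 52 bits fits in a buffer of cap bytes. */
int ac_codeword(double min, double max, char *bits, size_t cap) {
    double scale = 1.0;
    for (int n = 1; n <= 52 && (size_t)n < cap; n++) {
        scale *= 2.0;
        /* smallest integer m with m / 2^n > min */
        unsigned long long m = (unsigned long long)(min * scale) + 1ULL;
        if ((double)m / scale < max) {
            for (int b = 0; b < n; b++)
                bits[b] = ((m >> (n - 1 - b)) & 1ULL) ? '1' : '0';
            bits[n] = '\0';
            return n;
        }
    }
    return -1;
}
```

With the hypothetical table, the string "ae" narrows the interval to roughly (0.04, 0.1), and the shortest code inside it is 0.0001 in binary (0.0625). Because the sketch uses double precision, it is only suitable for short strings; for longer inputs the interval shrinks below what a double can represent.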
Consider a string {‘1’, ‘1’, ‘2’, ‘5’, ‘6’} of 5 characters to which the
arithmetic coding algorithm is applied. The process of execution
is illustrated below [1].
Initialization: min = 0, max = 1, and r = 1.
l = 5 → No. of iterations = 5.
3 Proposed Algorithm
Encoding algorithm
STEP 1: Start
STEP 2: Input the string, i.e., str.
STEP 3: Count the length of the string.
STEP 4: Initialize pini = 0.0, pfin = 1.0 and r = pfin − pini.
STEP 5: i = 0
STEP 6: Repeat steps 7 and 8 while i < length do
STEP 7: Fetch the r_min[i] and r_max[i] values from Table 4, i.e., determine the
range that corresponds to the character str[i].
STEP 8: i = i + 1
[End of loop]
STEP 9: i1 = 0
STEP 10: Repeat steps 11 and 12 while i1 < length do
STEP 11: pfin = pini + r * r_max[i1], then pini = pini + r * r_min[i1] (both use the
value of pini from before this step), and r = pfin − pini
STEP 12: i1 = i1 + 1
[End of loop]
STEP 13: Select a value val that lies between pini and pfin, i.e., pini < val < pfin
STEP 14: Convert val to binary and store it in str1
STEP 15: Check if length(str1) % 7 != 0 do
t = length(str1) % 7
t1 = 7 − t
add t1 zeros at the start of the string
[End of if]
STEP 16: Count the length of str1, i.e., l
STEP 17: l1 = l / 7
STEP 18: i2 = 0
STEP 19: Repeat steps 20 to 25 while i2 < l1
STEP 20: j = 0
STEP 21: Repeat steps 22 to 24 while j < 7
STEP 22: Store str1[7 * i2 + j] in a new array
STEP 23: Update the decimal equivalent value of the 7-bit group with this bit
STEP 24: j = j + 1
[End of inner loop]
STEP 25: i2 = i2 + 1
[End of outer loop]
STEP 26: Print the decimal string, which is the compressed string.
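Steps 15 to 26 of the encoding algorithm (padding the binary string to a multiple of 7 bits and converting each 7-bit group to a decimal value) can be sketched as below. This is an illustrative reading of those steps rather than the authors' code; the function name `pack7` and its parameters are assumptions introduced for the example.

```c
#include <string.h>

/* Pad the '0'/'1' string bits with leading zeros up to a multiple of 7 and
   convert each 7-bit group to its decimal value (0..127).
   codes     : output array for the decimal values
   max_codes : capacity of codes
   pad_out   : receives t1, the number of zeros added at the start
   Returns the number of groups l1, or -1 if codes is too small. */
int pack7(const char *bits, int *codes, int max_codes, int *pad_out) {
    int len = (int)strlen(bits);
    int t = len % 7;
    int pad = (t == 0) ? 0 : 7 - t;          /* STEP 15: t1 = 7 - t */
    int groups = (len + pad) / 7;            /* STEP 17: l1 = l / 7 */
    if (groups > max_codes) return -1;

    for (int g = 0; g < groups; g++) {       /* outer loop, STEPS 19-25 */
        int value = 0;
        for (int j = 0; j < 7; j++) {        /* inner loop, STEPS 21-24 */
            int pos = g * 7 + j - pad;       /* index into the unpadded string */
            int bit = (pos >= 0 && bits[pos] == '1') ? 1 : 0;
            value = value * 2 + bit;         /* running decimal equivalent */
        }
        codes[g] = value;
    }
    *pad_out = pad;
    return groups;
}
```

For example, the 4-bit string "0001" is padded with t1 = 3 zeros to "0000001" and stored as the single decimal value 1. The decoder needs t1 in order to remove the padding again, since leading zeros change the value of a binary fraction.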
Decoding algorithm
See Table 4.
STEP 1: Read the number of zeros added, i.e., t1
STEP 2: i = 0
STEP 3: Repeat steps 4 and 5 while i < l1 do
STEP 4: Read the decimal number and convert it to its equivalent 7-bit binary string
STEP 5: i = i + 1
[End of loop]
STEP 6: Concatenate all the strings and delete the t1 zeros from the start of the string
STEP 7: i1 = t1
STEP 8: Repeat steps 9 and 10 while i1 < length do
STEP 9: Store the elements in an array
STEP 10: i1 = i1 + 1
[End of loop]
STEP 11: Determine the decimal equivalent of the string
STEP 12: Check the range of pini and pfin in which the decimal value lies, i.e.,
pini < deci < pfin
STEP 13: i2 = 0
STEP 14: Repeat steps 15 and 16 while i2 < length do
STEP 15: Check the r_min[i2] and r_max[i2] values from Table 4 and print the
corresponding character.
STEP 16: i2 = i2 + 1
[End of loop]
STEP 17: The printed string is the required output, i.e., the original input string
STEP 18: End
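The decoding steps can be sketched in C as below, under several assumptions. Table 4 is not reproduced in this text, so the ranges in `R_MIN` and `R_MAX` are hypothetical. The names `unpack7` and `ac_decode` are introduced only for this example. The sketch also assumes that the decoder knows the original string length, and it uses the standard arithmetic-decoding rescaling value = (value − r_min) / (r_max − r_min) after each symbol, which the steps above do not spell out.

```c
/* Hypothetical probability table (assumption): stands in for Table 4. */
static const char   SYMS[]  = { 'a', 'e', 'i', 'o', 'u' };
static const double R_MIN[] = { 0.0, 0.2, 0.5, 0.6, 0.8 };
static const double R_MAX[] = { 0.2, 0.5, 0.6, 0.8, 1.0 };
#define NSYMS 5

/* STEPS 1-11: expand each decimal code to 7 bits, skip the first pad (t1)
   bits, and read the remaining bits as a binary fraction 0.b1b2... */
double unpack7(const int *codes, int ncodes, int pad) {
    double value = 0.0, weight = 0.5;
    for (int g = 0; g < ncodes; g++) {
        for (int j = 0; j < 7; j++) {
            if (g * 7 + j < pad) continue;        /* padding zero, ignore */
            if ((codes[g] >> (6 - j)) & 1) value += weight;
            weight /= 2.0;
        }
    }
    return value;
}

/* STEPS 12-17: recover length characters from value.
   out must hold at least length + 1 bytes. Returns 0 on success, -1 if
   value falls outside every range in the table. */
int ac_decode(double value, int length, char *out) {
    for (int i = 0; i < length; i++) {
        int k;
        for (k = 0; k < NSYMS; k++)
            if (value >= R_MIN[k] && value < R_MAX[k]) break;
        if (k == NSYMS) return -1;
        out[i] = SYMS[k];
        /* rescale so the next symbol can be looked up in the same table */
        value = (value - R_MIN[k]) / (R_MAX[k] - R_MIN[k]);
    }
    out[length] = '\0';
    return 0;
}
```

With the hypothetical table, the single code 1 with t1 = 3 unpacks to the fraction 0.0625, which decodes to "ae" for a string length of 2. As with any double-precision sketch, this is reliable only for short strings.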
The proposed algorithm is tested with input strings of various lengths, and the
corresponding output size is obtained. Compression ratio is an important parameter for
any data compression algorithm, as it indicates the effectiveness of the compression.
Table 5 Variation of output string size with string length for best, intermediate, and worst cases

Sl. no.  String length  Compression ratio
                        Best case  Intermediate case  Worst case
1        5              2.5        2.5                2.5
2        15             3.75       3.75               3.75
3        25             5          4.167              3.571
4        35             7          5                  4.375
The compression ratio obtained by the proposed algorithm is well above unity.
The compression ratio obtained for different string lengths is
given in Table 5. The input string can take three possible forms: a string
containing only the characters with the highest probability (best case), a string
containing only the characters with the lowest probability (worst case), and any
random combination of characters (intermediate case). As expected, the compression
ratio for the best case is at least as high as that for the intermediate or worst
case. From Table 5, it is also clear that the compression ratio increases with the
input string length for all three cases, so the algorithm can be used to compress
large strings quite effectively.
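The paper does not give a formula for the compression ratio. A minimal sketch, assuming the ratio is the number of input characters divided by the number of 7-bit decimal codes in the output, is shown below; the function names `codes_for_bits` and `compression_ratio` are hypothetical. Under this assumed definition, the ratio of 2.5 reported for a string length of 5 would correspond to 2 output codes, but that output size is an inference and is not stated in Table 5.

```c
/* Number of 7-bit groups needed to hold code_bits bits (rounded up). */
int codes_for_bits(int code_bits) {
    return (code_bits + 6) / 7;
}

/* Assumed definition: input characters divided by output decimal codes. */
double compression_ratio(int input_chars, int output_codes) {
    return (double)input_chars / (double)output_codes;
}
```

For instance, a codeword of 8 to 14 bits occupies 2 groups, so a 5-character input would give a ratio of 5 / 2 = 2.5 under this definition.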
5 Conclusions
From Table 5, it is clear that the compression ratio improves for longer strings,
rising from 2.5 at a string length of 5 to between 4.375 (worst case) and 7 (best
case) at a string length of 35. In this paper, only the vowel characters, i.e.,
a, e, i, o, u, are considered, with arbitrary probabilities, to test the algorithm.
The algorithm can be extended to compress character strings containing all
characters, including special characters. However, the compression ratio is then
not expected to be as high as that obtained in this case. Accurate determination
of the probability of occurrence of the characters is required to improve the
compression ratio. This is possible either by following the character probability
pattern of previously available data or by employing an adaptive algorithm.
However, the adaptive algorithm has its own limitations due to the requirement of
a probability distribution table for decoding. The variation of actual string and
encrypted data size for the three possible cases with the length of the input
string is provided in Fig. 2. From the graph in Fig. 2, it is clear that although
the encrypted data size is the same for all three cases at lower string lengths,
for larger strings there is a significant variation in encrypted data size between
the best and worst cases.
[Fig. 2: Data size versus length of input array (5, 15, 25, 35) for the best,
intermediate, and worst cases]
References
1. Sarkar, S. J., Das, B., Dutta, T., Dey, P., Mukherjee, A.: An Alternative Voltage and Frequency
Monitoring Scheme for SCADA based Communication in Power System using Data
Compression. In: International Conference and Workshop on Computing and Communication
(IEMCON), pp. 1–7 (2015)
2. Takahashi, Y., Matsui, S., Nakata, Y., Kondo, T.: Communication Method with Data
Compression & Encryption for Mobile Computing Environment, https://www.isoc.org/inet96/
proceedings/a6/a6_2.html
3. Liu, H.-S., Chuang, C.-C., Lin, C.-C., Chang, R.-I, Wang, C.-H., Hsieh, C.-C.: Data
Compression for Energy Efficient Communication on Ubiquitous Sensor Network. In:
Tamkang Journal of Science and Engineering, Vol. 14, No. 3, pp. 345–354 (2011)
4. Kodituwakku, S. R., Amarasinghe, U. S.: Comparisons of Lossless Data Compression
Algorithms for Text Data. In: Indian Journal of Computer Science and Engineering, Vol. 1,
No. 4, pp. 406–425
5. Brar, R. S., Singh, B.: A survey on different compression techniques and bit reduction
algorithm for compression of text data. In: International Journal of Advanced Research in
Computer Science and Software Engineering (IJARCSSE), Volume 3, Issue 3 (March 2013)
6. Theory of Data Compression, http://www.data-compression.com/theory.shtml
7. Porwal, S., Chaudhary, Y., Joshi, J., Jain, M.: Data Compression Methodologies for
Lossless Data and Comparison between Algorithms. In: International Journal of Engineering
Science and Innovative Technology (IJESIT), Volume 2, Issue 2 (March 2013)
8. Shanmugasundaram, S., Lourdusamy, R.: A Comparative Study of Text Compression
Algorithms. In: International Journal of Wisdom Based Computing, Vol. 1 (3) (December
2011)
9. Li, Z.-N., Drew, Mark S., Liu, J.: Fundamentals of Multimedia, 2nd Edition, Springer (2014)