Word-Based Fixed and Flexible List Compression

Celikel, Ebru; Dalkilic, Mehmet E.; Dalkilic, Gokhan

doi:10.1007/11569596_80

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3733)

Included in the following conference series:

International Symposium on Computer and Information Sciences

Abstract

We present a dictionary-based lossless text compression scheme in which frequent words are kept in separate lists (list_n contains words of length n). We pursued two alternatives for the sizes of the lists. In the "fixed" approach all lists hold an equal number of words, whereas in the "flexible" approach no such constraint is imposed. Results clearly show that the "flexible" scheme is much better in all test cases, possibly because it can accommodate short, medium or long word lists that reflect the word length distribution of a particular language. Our approach encodes a word as a prefix (the length of the word) followed by the body of the word (an index into the corresponding list). For prefix encoding we employed both a static encoding and a dynamic (Huffman) encoding that uses the word length statistics of the source language. Dynamic prefix encoding clearly outperformed its static counterpart in all cases. A language with a higher average word length can, theoretically, benefit more from a word-list-based compression approach than one with a lower average word length. We tested this hypothesis using Turkish and English, with average word lengths of 6.1 and 4.4, respectively. Our results strongly support the hypothesis.
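
The prefix-plus-index encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed-width index, the equal-width "static" prefix, and the toy sentence are all assumptions made for illustration. Only the "flexible" variant is modelled, with every word placed in a list, so handling of unlisted words is out of scope.

```python
import heapq
from collections import Counter


def build_lists(words):
    """Group distinct words by length, most frequent first (flexible lists)."""
    lists = {}
    for word, _ in Counter(words).most_common():
        lists.setdefault(len(word), []).append(word)
    return lists


def encode(words, lists):
    """Encode each word as (length prefix, index into list_length)."""
    return [(len(w), lists[len(w)].index(w)) for w in words]


def decode(pairs, lists):
    """Invert encode(): look the index up in the list for that length."""
    return [lists[n][i] for n, i in pairs]


def huffman_code_lengths(freqs):
    """Return {symbol: Huffman code length in bits} for the given counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries are (weight, tie-breaker, {symbol: depth so far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def cost_bits(pairs, lists, prefix_bits):
    """Total bits: per-word prefix cost plus a fixed-width list index."""
    total = 0
    for n, _ in pairs:
        # Assumed index width: ceil(log2(list size)), at least 1 bit.
        index_bits = max(1, (len(lists[n]) - 1).bit_length())
        total += prefix_bits[n] + index_bits
    return total


text = "the cat sat on the mat the end".split()
lists = build_lists(text)
pairs = encode(text, lists)

# Static prefix (assumed): every word length gets the same number of bits.
width = max(1, (len(lists) - 1).bit_length())
static_prefix = {n: width for n in lists}
# Dynamic prefix: Huffman code over the observed word-length counts.
dynamic_prefix = huffman_code_lengths(Counter(len(w) for w in text))

print(decode(pairs, lists) == text)
print(cost_bits(pairs, lists, static_prefix),
      cost_bits(pairs, lists, dynamic_prefix))
```

Because a Huffman code is optimal among prefix codes for the given counts, the dynamic prefix cost in this sketch can never exceed the equal-width static cost; how large the gain is depends on how skewed the word-length distribution is.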

Author information

Authors and Affiliations

International Computer Institute, Ege University, 35100, Bornova, Izmir, Turkey
Ebru Celikel & Mehmet E. Dalkilic
Computer Engineering Department, Dokuz Eylul University, 35100, Bornova, Izmir, Turkey
Gokhan Dalkilic

Editor information

Editors and Affiliations

Department of Computer Engineering, Boğaziçi University, 34342, Bebek, Istanbul, Turkey
Pınar Yolum & Can Özturan
Computer Engineering Department, Boğaziçi University, 34342, Bebek, İstanbul, Turkey
Tunga Güngör
Computer Engineering Department, Bogazici University, 80815, Bebek, Istanbul, Turkey
Fikret Gürgen

About this paper

Cite this paper

Celikel, E., Dalkilic, M.E., Dalkilic, G. (2005). Word-Based Fixed and Flexible List Compression. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_80

DOI: https://doi.org/10.1007/11569596_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29414-6
Online ISBN: 978-3-540-32085-2
eBook Packages: Computer Science, Computer Science (R0)
