research-article

Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures

Author:

Gonzalo Navarro

ACM Computing Surveys (CSUR), Volume 54, Issue 2

Article No.: 29, Pages 1 - 31

https://doi.org/10.1145/3434399

Published: 05 March 2021

Abstract

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated applications like bioinformatics, the string collections experienced a growth that outpaces Moore’s Law and challenges our ability to handle them even in compressed form. Fortunately, many of these rapidly growing string collections are highly repetitive, so their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to exploit it properly. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this two-part survey, we cover the algorithmic developments that have led to these data structures.

In this first part, we describe the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings. In the quest for an ideal measure of repetitiveness, we uncover a fascinating web of relations between the candidate measures, as well as the limits up to which the data can be recovered and up to which direct access to the compressed data can be provided. Direct access is the most basic aspect of indexability, which is covered in the second part of this survey.
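As a rough illustration of why statistical compression is blind to repetitiveness, the following sketch compares the zero-order empirical entropy of a string with the number of phrases in a greedy Lempel-Ziv-style parse. It is not taken from the survey: the function names are hypothetical, the parse is a naive quadratic-time variant in which each phrase is either a fresh symbol or the longest prefix of the remaining text that also occurs starting at an earlier position (overlaps allowed), and the sample string is arbitrary.

```python
import math
from collections import Counter


def zero_order_entropy(text: str) -> float:
    """Empirical zero-order entropy of `text`, in bits per symbol."""
    n = len(text)
    counts = Counter(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def lz_phrase_count(text: str) -> int:
    """Number of phrases in a naive greedy Lempel-Ziv-style parse.

    Assumed variant: a phrase is the longest prefix of text[i:] that has
    an occurrence starting before position i (it may overlap position i),
    or a single symbol if no such occurrence exists.
    """
    n = len(text)
    i = 0
    phrases = 0
    while i < n:
        length = 0
        # Try to extend the phrase by one symbol. Limiting the search to
        # text[0 : i + length] forces any match of length + 1 symbols to
        # start at a position strictly smaller than i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(length, 1)  # a fresh symbol forms a phrase of length 1
        phrases += 1
    return phrases


if __name__ == "__main__":
    base = "abracadabra"
    repeated = base * 4
    # The symbol frequencies are identical, so the entropy per symbol is too.
    print(zero_order_entropy(base), zero_order_entropy(repeated))
    # The parse grows by a single phrase that copies the three extra repeats.
    print(lz_phrase_count(base), lz_phrase_count(repeated))  # 8 9
```

Under these assumptions, a zero-order statistical coder would spend about four times as many bits on the repeated string, whereas the phrase count grows from 8 to 9.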

Published In

ACM Computing Surveys Volume 54, Issue 2

March 2022

800 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3450359

Editor:
Albert Zomaya
University of Sydney, Australia

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2021

Accepted: 01 November 2020

Revised: 01 November 2020

Received: 01 April 2020

Published in CSUR Volume 54, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fondecyt
ANID Basal Funds FB0001, Millennium Science Initiative Program
