research-article

Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures

Published: 05 March 2021
    Abstract

    Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated applications like bioinformatics, the string collections grew at a rate that outpaces Moore’s Law and challenges our ability to handle them even in compressed form. Fortunately, many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this two-part survey, we cover the algorithmic developments that have led to these data structures.
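    The claim that statistical compression is blind to repetitiveness can be illustrated with a small sketch. It is not taken from the survey: the function names `h0_bits` and `lz_phrases` are hypothetical, the zeroth-order empirical entropy stands in for "statistical" methods, and a naive greedy Lempel-Ziv parse (in the spirit of the Lempel and Ziv 1976 factorization) stands in for a repetitiveness-aware measure. Concatenating k copies of a string multiplies its entropy-based size by k, while the number of Lempel-Ziv phrases grows by only one.

```python
import math
from collections import Counter


def h0_bits(s):
    """Total zeroth-order empirical entropy of s in bits, i.e. n * H0(s)."""
    n = len(s)
    return sum(c * math.log2(n / c) for c in Counter(s).values())


def lz_phrases(s):
    """Number of phrases in a greedy Lempel-Ziv parse of s.

    Each phrase is the longest prefix of the remaining text that also
    occurs starting at an earlier position (source and phrase may overlap),
    or a single fresh character if no such prefix exists.
    Naive implementation for illustration only; not linear time.
    """
    i, z, n = 0, 0, len(s)
    while i < n:
        length = 0
        # Extend while s[i : i+length+1] has an occurrence starting before i.
        # Limiting the search window to s[0 : i+length] excludes position i itself.
        while i + length < n and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(length, 1)
        z += 1
    return z


if __name__ == "__main__":
    base = "abcd"
    repeated = base * 10
    print(h0_bits(base), h0_bits(repeated))        # entropy size grows tenfold
    print(lz_phrases(base), lz_phrases(repeated))  # 4 phrases versus 5
```

    In this toy setting the entropy-based size scales with the plain length of the collection, whereas the phrase count tracks its actual information content.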
    In this first part, we describe the distinct compression paradigms that have been used to exploit repetitiveness, and the algorithmic techniques that provide direct access to the compressed strings. In the quest for an ideal measure of repetitiveness, we uncover a fascinating web of relations between those measures, as well as the limits up to which the data can be recovered, and up to which direct access to the compressed data can be provided. This is the basic aspect of indexability, which is covered in the second part of this survey.
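    As a hedged illustration of what direct access to a compressed string can look like, the sketch below uses a toy grammar (a straight-line program) in which every nonterminal expands to the concatenation of two symbols. The representation, the names `build_lengths` and `access`, and the example rules are assumptions made for this sketch, not the survey's data structures. Storing the expansion length of each nonterminal lets us extract a single character by walking down from the start symbol, without decompressing the string.

```python
def build_lengths(rules):
    """Map each nonterminal to the length of the string it expands to.

    rules: dict nonterminal -> (left, right); any symbol that is not a key
    of rules is treated as a terminal of length 1.
    """
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1
        if sym not in lengths:
            left, right = rules[sym]
            lengths[sym] = length(left) + length(right)
        return lengths[sym]

    for nonterminal in rules:
        length(nonterminal)
    return lengths


def access(rules, lengths, start, i):
    """Return the i-th character (0-based) of the expansion of start."""
    if not 0 <= i < lengths.get(start, 1):
        raise IndexError("position out of range")
    sym = start
    while sym in rules:
        left, right = rules[sym]
        left_len = lengths.get(left, 1)
        if i < left_len:
            sym = left          # target lies in the left child
        else:
            i -= left_len       # skip the left child entirely
            sym = right
    return sym


if __name__ == "__main__":
    # Four rules generate "ab" repeated 8 times (16 characters).
    rules = {
        "X1": ("a", "b"),
        "X2": ("X1", "X1"),
        "X3": ("X2", "X2"),
        "X4": ("X3", "X3"),
    }
    lengths = build_lengths(rules)
    print(access(rules, lengths, "X4", 5))  # prints "b"
```

    The time per access in this sketch is proportional to the height of the grammar, so it is fast only when the grammar is balanced.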



    Information

    Published In

    ACM Computing Surveys  Volume 54, Issue 2
    March 2022
    800 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/3450359
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 March 2021
    Accepted: 01 November 2020
    Revised: 01 November 2020
    Received: 01 April 2020
    Published in CSUR Volume 54, Issue 2


    Author Tags

    1. Text indexing
    2. compressed data structures
    3. repetitive string collections
    4. string searching

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Fondecyt
    • ANID Basal Funds FB0001, Millennium Science Initiative Program
