research-article

Text Indexing for Long Patterns: Anchors are All you Need

Published: 01 May 2023

Abstract

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form while also enabling fast pattern matching queries; this is the classic text indexing problem. Four measures matter when designing or implementing a text index: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, most (if not all) widely used indexes (e.g., the suffix tree, the suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step towards this goal by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures when a lower bound ℓ on the length of the queried patterns is known in advance, which is arguably a reasonable assumption in practical applications. Specifically, we improve the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors. We do this by (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation that constructs the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets. The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: the (compressed) suffix tree, the (compressed) suffix array, and the FM-index.
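The sketch below illustrates why a lower bound ℓ on the pattern length makes anchor-based indexing work. It is a minimal Python sketch, not the paper's method.

It assumes the bd-anchor definition of Loukides and Pissis as we understand it: for every length-ℓ window of the text, the anchor is the leftmost starting offset of the window's lexicographically minimal rotation.

Several simplifications are deliberate:

- The names `min_rotation_offset`, `bd_anchors` and `AnchorIndex` are hypothetical.
- Minimal rotations are found naively, not with the average-case linear-time algorithm described above.
- The toy index only sorts the suffixes starting at anchors and then verifies the left part of each candidate. It does not use the paper's data structures.

```python
from bisect import bisect_left


def min_rotation_offset(window: str) -> int:
    """Leftmost offset j such that window[j:] + window[:j] is the minimal rotation."""
    # Naive O(ell^2) comparison; min() keeps the first minimum it meets,
    # so ties resolve to the smallest offset.
    return min(range(len(window)), key=lambda j: window[j:] + window[:j])


def bd_anchors(text: str, ell: int) -> list[int]:
    """Order-ell bd-anchors of text as 0-based positions (assumed definition)."""
    anchors = {i + min_rotation_offset(text[i:i + ell])
               for i in range(len(text) - ell + 1)}
    return sorted(anchors)


class AnchorIndex:
    """Hypothetical toy index answering queries for patterns of length >= ell."""

    def __init__(self, text: str, ell: int):
        self.text = text
        self.ell = ell
        # Anchors ordered by the suffix of text that starts at each anchor.
        # Materialising the suffixes is for clarity only.
        self.anchors = sorted(bd_anchors(text, ell), key=lambda a: text[a:])
        self.suffixes = [text[a:] for a in self.anchors]

    def find(self, pattern: str) -> list[int]:
        """Return all 0-based starting positions of pattern in text."""
        if len(pattern) < self.ell:
            raise ValueError("pattern is shorter than the lower bound ell")
        # The pattern's own anchor depends only on its first ell characters.
        j = min_rotation_offset(pattern[:self.ell])
        right, left = pattern[j:], pattern[:j]
        hits = []
        # Suffixes starting with `right` form one contiguous block in sorted order.
        k = bisect_left(self.suffixes, right)
        while k < len(self.suffixes) and self.suffixes[k].startswith(right):
            a = self.anchors[k]
            # Verify the part of the pattern that lies left of the anchor.
            if a >= j and self.text[a - j:a] == left:
                hits.append(a - j)
            k += 1
        return sorted(hits)


if __name__ == "__main__":
    idx = AnchorIndex("abracadabra", 3)
    print(bd_anchors("abracadabra", 3))  # [0, 3, 5, 7, 10]
    print(idx.find("abra"))              # [0, 7]
```

The sketch relies on local consistency. Suppose the pattern occurs at position i. Its first window is identical to the text window starting at i, so that window's anchor i + j is in the anchor set, and the occurrence is found through it. A pattern shorter than ℓ has no complete window, so this guarantee no longer holds.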

References

[1]
James Abello, Adam L. Buchsbaum, and Jeffery R. Westbrook. 2002. A Functional Approach to External Graph Algorithms. Algorithmica 32, 3 (2002), 437--458.
[2]
Alberto Apostolico, Maxime Crochemore, Martin Farach-Colton, Zvi Galil, and S. Muthukrishnan. 2016. 40 years of suffix trees. Commun. ACM 59, 4 (2016), 66--73.
[3]
Mozhdeh Ariannezhad, Ali Montazeralghaem, Hamed Zamani, and Azadeh Shakery. 2017. Improving Retrieval Performance for Verbose Queries via Axiomatic Analysis of Term Discrimination Heuristic. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7--11, 2017, Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, and Ryen W. White (Eds.). ACM, 1201--1204.
[4]
Jérémy Barbay, Francisco Claude, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich. 2014. Efficient Fully-Compressed Sequence Representations. Algorithmica 69, 1 (2014), 232--268.
[5]
Djamal Belazzougui. 2014. Linear time construction of compressed text indices in compact space. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, David B. Shmoys (Ed.). ACM, 148--193.
[6]
Djamal Belazzougui, Fabio Cunial, Juha Kärkkäinen, and Veli Mäkinen. 2020. Linear-time String Indexing and Analysis in Small Space. ACM Trans. Algorithms 16, 2 (2020), 17:1--17:54.
[7]
Djamal Belazzougui and Gonzalo Navarro. 2015. Optimal Lower and Upper Bounds for Representing Sequences. ACM Trans. Algorithms 11, 4 (2015), 31:1--31:21.
[8]
Djamal Belazzougui and Simon J. Puglisi. 2016. Range Predecessor and Lempel-Ziv Parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10--12, 2016, Robert Krauthgamer (Ed.). SIAM, 2053--2071.
[9]
Stav Ben-Nun, Shay Golan, Tomasz Kociumaka, and Matan Kraus. 2020. Time-Space Tradeoffs for Finding a Long Common Substring. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, June 17--19, 2020, Copenhagen, Denmark (LIPIcs), Inge Li Gørtz and Oren Weimann (Eds.), Vol. 161. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 5:1--5:14.
[10]
Michael Bendersky and W. Bruce Croft. 2008. Discovering key concepts in verbose queries. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2008, Singapore, July 20--24, 2008, Sung-Hyon Myaeng, Douglas W. Oard, Fabrizio Sebastiani, Tat-Seng Chua, and Mun-Kew Leong (Eds.). ACM, 491--498.
[11]
Nico Bertram, Jonas Ellert, and Johannes Fischer. 2021. Lyndon Words Accelerate Suffix Sorting, See [77], 15:1--15:13.
[12]
Timo Bingmann, Johannes Fischer, and Vitaly Osipov. 2016. Inducing Suffix and LCP Arrays in External Memory. ACM J. Exp. Algorithmics 21, 1 (2016), 2.3:1--2.3:27.
[13]
Or Birenzwige, Shay Golan, and Ely Porat. 2020. Locally Consistent Parsing for Text Indexing in Small Space. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5--8, 2020, Shuchi Chawla (Ed.). SIAM, 607--626.
[14]
Peter A. Boncz, Thomas Neumann, and Viktor Leis. 2020. FSST: Fast Random Access String Compression. Proc. VLDB Endow. 13, 11 (2020), 2649--2661.
[15]
Stefan Burkhardt and Juha Kärkkäinen. 2003. Fast Lightweight Suffix Array Construction and Checking. In Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michoacán, Mexico, June 25--27, 2003, Proceedings (Lecture Notes in Computer Science), Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore (Eds.), Vol. 2676. Springer, 55--69.
[16]
Timothy M. Chan, Kasper Green Larsen, and Mihai Patrascu. 2011. Orthogonal range searching on the RAM, revisited. In Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13--15, 2011, Ferran Hurtado and Marc J. van Kreveld (Eds.). ACM, 1--10.
[17]
Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. 2018. Linear-Time Algorithm for Long LCF with k Mismatches. In Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2--4, 2018 - Qingdao, China (LIPIcs), Gonzalo Navarro, David Sankoff, and Binhai Zhu (Eds.), Vol. 105. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 23:1--23:16.
[18]
Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. 2021. Faster Algorithms for Longest Common Substring, See [77], 30:1--30:17.
[19]
Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. 2022. Longest Palindromic Substring in Sublinear Time. In 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27--29, 2022, Prague, Czech Republic (LIPIcs), Hideo Bannai and Jan Holub (Eds.), Vol. 223. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 20:1--20:9.
[20]
Ferdinando Cicalese, Ely Porat, and Ugo Vaccaro (Eds.). 2015. Combinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Ischia Island, Italy, June 29 - July 1, 2015, Proceedings. Lecture Notes in Computer Science, Vol. 9133. Springer.
[21]
Francisco Claude, Gonzalo Navarro, Hannu Peltola, Leena Salmela, and Jorma Tarhio. 2012. String matching with alphabet sampling. J. Discrete Algorithms 11 (2012), 37--50.
[22]
Richard Cole, Tsvi Kopelowitz, and Moshe Lewenstein. 2015. Suffix Trays and Suffix Trists: Structures for Faster Text Indexing. Algorithmica 72, 2 (2015), 450--466.
[23]
Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. 2007. Algorithms on strings. Cambridge University Press.
[24]
Patrick Dinklage, Johannes Fischer, and Alexander Herlez. 2021. Engineering Predecessor Data Structures for Dynamic Integer Sets. In 19th International Symposium on Experimental Algorithms, SEA 2021, June 7--9, 2021, Nice, France (LIPIcs), David Coudert and Emanuele Natale (Eds.), Vol. 190. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 7:1--7:19.
[25]
Patrick Dinklage, Johannes Fischer, Alexander Herlez, Tomasz Kociumaka, and Florian Kurpicz. 2020. Practical Performance of Space Efficient Data Structures for Longest Common Extensions, See [43], 39:1--39:20.
[26]
Martin Farach. 1997. Optimal Suffix Tree Construction with Large Alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS '97, Miami Beach, Florida, USA, October 19--22, 1997. IEEE Computer Society, 137--143.
[27]
Paolo Ferragina, Rodrigo González, Gonzalo Navarro, and Rossano Venturini. 2008. Compressed text indexes: From theory to practice. ACM J. Exp. Algorithmics 13 (2008).
[28]
Paolo Ferragina and Giovanni Manzini. 2005. Indexing compressed text. J. ACM 52, 4 (2005), 552--581.
[29]
Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. 2004. An Alphabet-Friendly FM-Index. In String Processing and Information Retrieval, 11th International Conference, SPIRE 2004, Padova, Italy, October 5--8, 2004, Proceedings (Lecture Notes in Computer Science), Alberto Apostolico and Massimo Melucci (Eds.), Vol. 3246. Springer, 150--160.
[30]
Johannes Fischer and Pawel Gawrychowski. 2015. Alphabet-Dependent String Searching with Wexponential Search Trees, See [20], 160--171.
[31]
Johannes Fischer and Volker Heun. 2011. Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM J. Comput. 40, 2 (2011), 465--492.
[32]
Gianni Franceschini and S. Muthukrishnan. 2007. In-Place Suffix Sorting. In Automata, Languages and Programming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland, July 9--13, 2007, Proceedings (Lecture Notes in Computer Science), Lars Arge, Christian Cachin, Tomasz Jurdzinski, and Andrzej Tarlecki (Eds.), Vol. 4596. Springer, 533--545.
[33]
Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (1984), 538--544.
[34]
Michael L. Fredman and Dan E. Willard. 1990. BLASTING through the Information Theoretic Barrier with FUSION TREES. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13--17, 1990, Baltimore, Maryland, USA, Harriet Ortiz (Ed.). ACM, 1--7.
[35]
Travis Gagie, Gonzalo Navarro, and Nicola Prezza. 2020. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM 67, 1 (2020), 2:1--2:54.
[36]
Younan Gao, Meng He, and Yakov Nekrich. 2020. Fast Preprocessing for Optimal Orthogonal Range Reporting and Range Successor with Applications to Text Indexing, See [43], 54:1--54:18.
[37]
Pawel Gawrychowski and Tomasz Kociumaka. 2017. Sparse Suffix Tree Construction in Optimal Time and Space. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16--19, Philip N. Klein (Ed.). SIAM, 425--439.
[38]
Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. 2014. From Theory to Practice: Plug and Play with Succinct Data Structures. In Experimental Algorithms - 13th International Symposium, SEA 2014, Copenhagen, Denmark, June 29 - July 1, 2014. Proceedings (Lecture Notes in Computer Science), Joachim Gudmundsson and Jyrki Katajainen (Eds.), Vol. 8504. Springer, 326--337.
[39]
Simon Gog, Juha Kärkkäinen, Dominik Kempa, Matthias Petri, and Simon J. Puglisi. 2019. Fixed Block Compression Boosting in FM-Indexes: Theory and Practice. Algorithmica 81, 4 (2019), 1370--1391.
[40]
Simon Gog, Alistair Moffat, and Matthias Petri. 2017. CSA++: Fast Pattern Search for Large Alphabets. In Proceedings of the Nineteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2017, Barcelona, Spain, Hotel Porta Fira, January 17--18, 2017, Sándor P. Fekete and Vijaya Ramachandran (Eds.). SIAM, 73--82.
[41]
Keisuke Goto. 2019. Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets. In Prague Stringology Conference 2019, Prague, Czech Republic, August 26--28, 2019, Jan Holub and Jan Zdárek (Eds.). Czech Technical University in Prague, Faculty of Information Technology, Department of Theoretical Computer Science, 111--125. http://www.stringology.org/event/2019/p11.html
[42]
Szymon Grabowski and Marcin Raniszewski. 2017. Sampled suffix array with minimizers. Softw. Pract. Exp. 47, 11 (2017), 1755--1771.
[43]
Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders (Eds.). 2020. 28th Annual European Symposium on Algorithms, ESA 2020, September 7--9, 2020, Pisa, Italy (Virtual Conference). LIPIcs, Vol. 173. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://www.dagstuhl.de/dagpub/978-3-95977-162-7
[44]
Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. 35, 2 (2005), 378--407.
[45]
Manish Gupta and Michael Bendersky. 2015. Information Retrieval with Verbose Queries. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 9--13, 2015, Ricardo Baeza-Yates, Mounia Lalmas, Alistair Moffat, and Berthier A. Ribeiro-Neto (Eds.). ACM, 1121--1124.
[46]
Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press.
[47]
Monika Rauch Henzinger. 2006. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6--11, 2006, Efthimis N. Efthimiadis, Susan T. Dumais, David Hawking, and Kalervo Järvelin (Eds.). ACM, 284--291.
[48]
Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. 2009. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices. SIAM J. Comput. 38, 6 (2009), 2162--2178.
[49]
Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. 2014. Faster Sparse Suffix Sorting. In 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), STACS 2014, March 5--8, 2014, Lyon, France (LIPIcs), Ernst W. Mayr and Natacha Portier (Eds.), Vol. 25. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 386--396.
[50]
Chirag Jain, Arang Rhie, Nancy Hansen, Sergey Koren, and Adam M. Phillippy. 2022. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19 (2022), 705--710.
[51]
Jiaojiao Jiang, Steve Versteeg, Jun Han, Md. Arafat Hossain, Jean-Guy Schneider, Christopher Leckie, and Zeinab Farahmandpour. 2019. P-Gram: Positional N-Gram for the Clustering of Machine-Generated Messages. IEEE Access 7 (2019), 88504--88516.
[52]
Juha Kärkkäinen and Dominik Kempa. 2016. LCP Array Construction in External Memory. ACM J. Exp. Algorithmics 21, 1 (2016), 1.7:1--1.7:22.
[53]
Juha Kärkkäinen and Dominik Kempa. 2016. LCP Array Construction Using O(sort(n)) (or Less) I/Os. In String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Beppu, Japan, October 18--20, 2016, Proceedings (Lecture Notes in Computer Science), Shunsuke Inenaga, Kunihiko Sadakane, and Tetsuya Sakai (Eds.), Vol. 9954. 204--217.
[54]
Juha Kärkkäinen and Dominik Kempa. 2019. Better External Memory LCP Array Construction. ACM J. Exp. Algorithmics 24, 1 (2019), 1.3:1--1.3:27.
[55]
Juha Kärkkäinen, Dominik Kempa, and Simon J. Puglisi. 2015. Parallel External Memory Suffix Sorting, See [20], 329--342.
[56]
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, and Bella Zhukova. 2017. Engineering External Memory Induced Suffix Sorting. In Proceedings of the Nineteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2017, Barcelona, Spain, Hotel Porta Fira, January 17--18, 2017, Sándor P. Fekete and Vijaya Ramachandran (Eds.). SIAM, 98--108.
[57]
Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. 2006. Linear work suffix array construction. J. ACM 53, 6 (2006), 918--936.
[58]
Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. In Combinatorial Pattern Matching, 12th Annual Symposium, CPM 2001 Jerusalem, Israel, July 1--4, 2001 Proceedings (Lecture Notes in Computer Science), Amihood Amir and Gad M. Landau (Eds.), Vol. 2089. Springer, 181--192.
[59]
Dominik Kempa and Tomasz Kociumaka. 2019. String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23--26, 2019, Moses Charikar and Edith Cohen (Eds.). ACM, 756--767.
[60]
Dominik Kempa and Tomasz Kociumaka. 2023. Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22--25, 2023, Nikhil Bansal and Viswanath Nagarajan (Eds.). SIAM, 5122--5202.
[61]
Tomasz Kociumaka. 2016. Minimal Suffix and Rotation of a Substring in Optimal Time. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27--29, 2016, Tel Aviv, Israel (LIPIcs), Roberto Grossi and Moshe Lewenstein (Eds.), Vol. 54. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 28:1--28:12.
[62]
Stefan Kurtz. 1999. Reducing the space requirement of suffix trees. Softw. Pract. Exp. 29, 13 (1999), 1149--1171.
[63]
Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, 3 (2009), R25.
[64]
Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinform. 25, 14 (2009), 1754--1760.
[65]
Ruiqiang Li, Chang Yu, Yingrui Li, Tak Wah Lam, Siu-Ming Yiu, Karsten Kristiansen, and Jun Wang. 2009. SOAP2: an improved ultrafast tool for short read alignment. Bioinform. 25, 15 (2009), 1966--1967.
[66]
Zhize Li, Jian Li, and Hongwei Huo. 2022. Optimal in-place suffix sorting. Inf. Comput. 285, Part (2022), 104818.
[67]
Glennis A. Logsdon, Mitchell R. Vollger, and Evan E. Eichler. 2020. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 10 (2020), 597--614.
[68]
Grigorios Loukides and Solon P. Pissis. 2021. Bidirectional String Anchors: A New String Sampling Mechanism, See [77], 64:1--64:21.
[69]
Grigorios Loukides, Solon P. Pissis, and Michelle Sweering. 2023. Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search. IEEE Trans. Knowl. Data Eng. (2023).
[70]
Mamoru Maekawa. 1985. A Square Root N Algorithm for Mutual Exclusion in Decentralized Systems. ACM Trans. Comput. Syst. 3, 2 (1985), 145--159.
[71]
Veli Mäkinen and Gonzalo Navarro. 2006. Position-Restricted Substring Searching. In LATIN 2006: Theoretical Informatics, 7th Latin American Symposium, Valdivia, Chile, March 20--24, 2006, Proceedings (Lecture Notes in Computer Science), José R. Correa, Alejandro Hevia, and Marcos A. Kiwi (Eds.), Vol. 3887. Springer, 703--714.
[72]
Udi Manber and Eugene W. Myers. 1993. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. 22, 5 (1993), 935--948.
[73]
Olena Medelyan and Ian H. Witten. 2006. Thesaurus based automatic keyphrase indexing. In ACM/IEEE Joint Conference on Digital Libraries, JCDL 2006, Chapel Hill, NC, USA, June 11--15, 2006, Proceedings, Gary Marchionini, Michael L. Nelson, and Catherine C. Marshall (Eds.). ACM, 296--297.
[74]
Donald R. Morrison. 1968. PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM 15, 4 (1968), 514--534.
[75]
Ingo Müller, Cornelius Ratsch, and Franz Färber. 2014. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24--28, 2014, Sihem Amer-Yahia, Vassilis Christophides, Anastasios Kementsietsidis, Minos N. Garofalakis, Stratos Idreos, and Vincent Leroy (Eds.). OpenProceedings.org, 283--294.
[76]
J. Ian Munro, Gonzalo Navarro, and Yakov Nekrich. 2017. Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16--19. 408--424.
[77]
Petra Mutzel, Rasmus Pagh, and Grzegorz Herman (Eds.). 2021. 29th Annual European Symposium on Algorithms, ESA 2021, September 6--8, 2021, Lisbon, Portugal (Virtual Conference). LIPIcs, Vol. 204. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://www.dagstuhl.de/dagpub/978-3-95977-204-4
[78]
Gonzalo Navarro. 2016. Compact Data Structures - A Practical Approach. Cambridge University Press. http://www.cambridge.org/de/academic/subjects/computer-science/algorithmics-complexity-computer-algebra-and-computational-g/compact-data-structures-practical-approach?format=HB
[79]
Gonzalo Navarro and Yakov Nekrich. 2017. Time-Optimal Top-k Document Retrieval. SIAM J. Comput. 46, 1 (2017), 80--113.
[80]
Enno Ohlebusch, Johannes Fischer, and Simon Gog. 2010. CST++. In String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11--13, 2010. Proceedings (Lecture Notes in Computer Science), Edgar Chávez and Stefano Lonardi (Eds.), Vol. 6393. Springer, 322--333.
[81]
Nicola Prezza. 2021. Optimal Substring Equality Queries with Applications to Sparse Text Indexing. ACM Trans. Algorithms 17, 1 (2021), 7:1--7:23.
[82]
Michael Roberts, Wayne Hayes, Brian R. Hunt, Stephen M. Mount, and James A. Yorke. 2004. Reducing storage requirements for biological sequence comparison. Bioinform. 20, 18 (2004), 3363--3369.
[83]
Patricia Rodriguez-Tomé, Peter Stoehr, Graham Cameron, and Tomas P. Flores. 1996. The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 24, 1 (1996), 6--12.
[84]
Kunihiko Sadakane. 2007. Compressed Suffix Trees with Full Functionality. Theory Comput. Syst. 41, 4 (2007), 589--607.
[85]
Saul Schleimer, Daniel Shawcross Wilkerson, and Alexander Aiken. 2003. Winnowing: Local Algorithms for Document Fingerprinting. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, Alon Y. Halevy, Zachary G. Ives, and AnHai Doan (Eds.). ACM, 76--85.
[86]
Kazutoshi Umemoto, Ruihua Song, Jian-Yun Nie, Xing Xie, Katsumi Tanaka, and Yong Rui. 2017. Search by Screenshots for Universal Article Clipping in Mobile Apps. ACM Trans. Inf. Syst. 35, 4 (2017), 34:1--34:29.
[87]
Jeffrey Scott Vitter. 2006. Algorithms and Data Structures for External Memory. Found. Trends Theor. Comput. Sci. 2, 4 (2006), 305--474.
[88]
Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In Proceedings of the 7th International Workshop on Testing Database Systems, DBTest@SIGMOD 2018, Houston, TX, USA, June 15, 2018, Alexander Böhm and Tilmann Rabl (Eds.). ACM, 1:1--1:6.
[89]
Peter Weiner. 1973. Linear Pattern Matching Algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15--17, 1973. IEEE Computer Society, 1--11.
[90]
Aaron M. Wenger et al. 2019. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37 (2019), 1155--1162.
[91]
Hongyu Zheng, Carl Kingsford, and Guillaume Marçais. 2020. Improved design and analysis of practical minimizers. Bioinform. 36, Supplement-1 (2020), i119--i127.

Cited By

  • (2024) Space-Efficient Indexes for Uncertain Strings. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4828--4842. DOI: 10.1109/ICDE60146.2024.00367. Online publication date: 13 May 2024.
  • (2024) Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets. Bioinformatics 40, Supplement_1, i48--i57. DOI: 10.1093/bioinformatics/btae217. Online publication date: 28 June 2024.

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 9
May 2023
330 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 May 2023
Published in PVLDB Volume 16, Issue 9


Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 1
Reflects downloads up to 20 Jan 2025
