Compressed Cache-Oblivious String B-Tree

Published: 03 August 2016

Abstract

In this article, we study three variants of the well-known prefix-search problem for strings, and we design solutions for the cache-oblivious model that improve the best known results. Among these contributions, we close (asymptotically) the classic problem, which asks for the set of strings that share the longest common prefix with a queried pattern, by providing an I/O-optimal solution that matches the space lower bound for tries up to a constant multiplicative factor of the form (1 + ϵ), for ϵ > 0. Our solutions hinge upon a novel compressed storage scheme that adds to the elegant locality-preserving front coding (Bender et al. 2006) the ability to decompress prefixes of the stored strings I/O-optimally, while preserving its space bounds.
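
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of plain front coding and of the classic prefix-search query. It is not the article's compressed scheme, nor the locality-preserving variant of Bender et al.; the function names (`lcp_len`, `front_encode`, `front_decode`, `longest_prefix_matches`) are hypothetical and chosen only for illustration, and the query is answered by a naive in-memory scan with no regard for I/O cost.

```python
from typing import List, Tuple


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings: List[str]) -> List[Tuple[int, str]]:
    """Plain front coding: each string becomes
    (length of prefix shared with its predecessor, remaining suffix)."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = lcp_len(prev, s)
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Invert front_encode by rebuilding each string from its predecessor."""
    decoded = []
    prev = ""
    for k, suffix in encoded:
        prev = prev[:k] + suffix
        decoded.append(prev)
    return decoded


def longest_prefix_matches(strings: List[str], pattern: str) -> List[str]:
    """Naive linear scan (illustrative only): return the strings that share
    the longest common prefix with pattern."""
    best = max((lcp_len(s, pattern) for s in strings), default=0)
    return [s for s in strings if lcp_len(s, pattern) == best]


# Usage on a small, lexicographically sorted dictionary.
words = sorted(["care", "car", "dog", "cat", "card"])
encoded = front_encode(words)
print(encoded)   # [(0, 'car'), (3, 'd'), (3, 'e'), (2, 't'), (0, 'dog')]
print(front_decode(encoded) == words)          # True
print(longest_prefix_matches(words, "carp"))   # ['car', 'card', 'care']
```

In this plain form, rebuilding a single string may require decoding every string that precedes it, which is the kind of cost that locality-preserving variants are designed to bound.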

References

[1] Alberto Apostolico. 1985. The myriad virtues of subword trees. Combinatorial Algorithms on Words (1985), 85--96.
[2] Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. 2010. Fast prefix search in little space, with applications. In Proceedings of the 18th Annual European Symposium on Algorithms (ESA). 427--438.
[3] Michael A. Bender, Martin Farach-Colton, and Bradley C. Kuszmaul. 2006. Cache-oblivious string B-trees. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 233--242.
[4] Gerth Stølting Brodal and Rolf Fagerberg. 2006. Cache-oblivious string dictionaries. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 581--590. DOI:10.1145/1109557.1109621
[5] Andrej Brodnik and J. Ian Munro. 1999. Membership in constant time and almost-minimum space. SIAM Journal on Computing 28, 5 (1999), 1627--1640.
[6] Erik D. Demaine, John Iacono, and Stefan Langerman. 2004. Worst-case optimal tree layout in a memory hierarchy. CoRR cs.DS/0410048 (2004).
[7] Peter Elias. 1974. Efficient storage and retrieval by content and address of static files. Journal of the ACM 21, 2 (1974), 246--260.
[8] Robert M. Fano. 1971. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, Project MAC. MIT, Cambridge, MA.
[9] Paolo Ferragina. 2013. On the weak prefix-search problem. Theoretical Computer Science 483 (2013), 75--84.
[10] Paolo Ferragina and Roberto Grossi. 1999. The string B-tree: A new data structure for string search in external memory and its applications. Journal of the ACM 46, 2 (1999), 236--280. DOI:10.1145/301970.301973
[11] Paolo Ferragina, Roberto Grossi, Ankur Gupta, Rahul Shah, and Jeffrey Scott Vitter. 2008. On searching compressed string collections cache-obliviously. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 181--190. DOI:10.1145/1376916.1376943
[12] Paolo Ferragina and Rossano Venturini. 2010. The compressed permuterm index. ACM Transactions on Algorithms (TALG) 7, 1, Article 10 (2010), 10:1--10:21.
[13] Paolo Ferragina and Rossano Venturini. 2013. Compressed cache-oblivious string B-tree. In Proceedings of the 21st Annual European Symposium on Algorithms (ESA). 469--480. DOI:10.1007/978-3-642-40450-4_40
[14] William Frakes and Ricardo Baeza-Yates. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall.
[15] Edward Fredkin. 1960. Trie memory. Communications of the ACM 3, 9 (Sept. 1960), 490--499.
[16] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 2012. Cache-oblivious algorithms. ACM Transactions on Algorithms (TALG) 8, 1 (2012), 4.
[17] Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.
[18] Meng He, J. Ian Munro, and Srinivasa Rao Satti. 2012. Succinct ordinal trees based on tree covering. ACM Transactions on Algorithms (TALG) 8, 4 (2012), 42.
[19] Wing-Kai Hon, Rahul Shah, and Jeffrey Scott Vitter. 2010. Compression, indexing, and retrieval for massive string data. In Proceedings of the 21st Annual Symposium on Combinatorial Pattern Matching (CPM). 260--274.
[20] Richard M. Karp and Michael O. Rabin. 1987. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 31, 2 (1987), 249--260.
[21] Donald R. Morrison. 1968. PATRICIA - Practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM 15, 4 (1968), 514--534.
[22] J. Ian Munro. 1996. Tables. In Proceedings of the 16th Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). 37--42.
[23] Gonzalo Navarro and Veli Mäkinen. 2007. Compressed full-text indexes. ACM Computing Surveys 39, 1, Article 2 (2007).
[24] Ian H. Witten, Alistair Moffat, and Timothy C. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images (2nd ed.). Morgan Kaufmann.

Published In

ACM Transactions on Algorithms  Volume 12, Issue 4
September 2016
310 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/2983296
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 August 2016
Accepted: 01 March 2016
Revised: 01 February 2016
Received: 01 April 2015
Published in TALG Volume 12, Issue 4

Author Tags

  1. Pattern matching
  2. compressed index
  3. data compression
  4. indexing data structure
  5. string dictionary

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • SoBigData EU Project
  • MIUR of Italy under project PRIN ARS Technomedia 2012
