DOI: 10.1145/3183713.3196931

SuRF: Practical Range Query Filtering with Fast Succinct Tries

Published: 27 May 2018

Abstract

We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters to reduce I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper bound) and closed-seek (with upper bound) queries by up to 1.5× and 5×, respectively, with a modest cost to worst-case (all-missing) point query throughput due to a slightly higher false positive rate.
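
The following is a minimal sketch of the range-filter idea described in the abstract, not SuRF's Fast Succinct Trie. It assumes a sorted list of fixed-length key prefixes as a stand-in for the trie, and the names `PrefixRangeFilter`, `may_contain`, and `may_contain_range` are hypothetical. It only illustrates the contract such a filter offers: answers of "no" are always correct, answers of "maybe" can be false positives, and storing longer prefixes trades space for a lower false positive rate.

```python
from bisect import bisect_left


class PrefixRangeFilter:
    """Toy approximate range filter (illustrative assumption, not SuRF's FST).

    Only the first `prefix_len` characters of each key are stored, so the
    filter can report false positives but never false negatives.
    """

    def __init__(self, keys, prefix_len):
        self.prefix_len = prefix_len
        # Deduplicate and sort the truncated keys so binary search works.
        self.prefixes = sorted({key[:prefix_len] for key in keys})

    def may_contain(self, key):
        """Point lookup: False means the key is definitely absent."""
        prefix = key[: self.prefix_len]
        i = bisect_left(self.prefixes, prefix)
        return i < len(self.prefixes) and self.prefixes[i] == prefix

    def may_contain_range(self, low, high=None):
        """Range lookup over [low, high]; high=None means no upper bound.

        Truncation preserves lexicographic order (non-strictly), so any key
        in [low, high] has a stored prefix in [low[:n], high[:n]].
        """
        i = bisect_left(self.prefixes, low[: self.prefix_len])
        if i == len(self.prefixes):
            return False  # every stored prefix sorts before `low`
        if high is None:
            return True   # open range: some prefix is >= low
        return self.prefixes[i] <= high[: self.prefix_len]


flt = PrefixRangeFilter(["apple", "banana", "cherry"], prefix_len=3)

print(flt.may_contain("banana"))             # True (stored key)
print(flt.may_contain("bandit"))             # True (false positive: shares "ban")
print(flt.may_contain("blue"))               # False (definitely absent)
print(flt.may_contain_range("b", "c"))       # True ("banana" is in range)
print(flt.may_contain_range("bb", "cc"))     # False (no key in range)
print(flt.may_contain_range("band", "bank")) # True (false positive)
print(flt.may_contain_range("d"))            # False (open range past all keys)
```

In a storage engine, a caller would skip the on-disk read whenever the filter returns False. Raising `prefix_len` in this sketch reduces false positives at the cost of more stored characters, which mirrors the tunable trade-off mentioned in the abstract.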

    Published In

    SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
    May 2018
    1874 pages
    ISBN:9781450347037
    DOI:10.1145/3183713
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Best Paper

    Author Tags

    1. fast succinct tries
    2. lsm-trees
    3. range filter
    4. succinct data structures
    5. surf

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '18
    Acceptance Rates

    SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 692
    • Downloads (last 6 weeks): 90
    Reflects downloads up to 20 Feb 2025.

    Cited By

    • (2025) Disco: A Compact Index for LSM-trees. Proceedings of the ACM on Management of Data 3:1 (1-27). DOI: 10.1145/3709683. Online publication date: 11-Feb-2025
    • (2024) LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud Services. Proceedings of the VLDB Endowment 17:12 (3799-3812). DOI: 10.14778/3685800.3685807. Online publication date: 8-Nov-2024
    • (2024) Aleph Filter: To Infinity in Constant Time. Proceedings of the VLDB Endowment 17:11 (3644-3656). DOI: 10.14778/3681954.3682027. Online publication date: 1-Jul-2024
    • (2024) Towards Systematic Index Dynamization. Proceedings of the VLDB Endowment 17:11 (2867-2879). DOI: 10.14778/3681954.3681969. Online publication date: 1-Jul-2024
    • (2024) Oasis: An Optimal Disjoint Segmented Learned Range Filter. Proceedings of the VLDB Endowment 17:8 (1911-1924). DOI: 10.14778/3659437.3659447. Online publication date: 1-Apr-2024
    • (2024) Memento Filter: A Fast, Dynamic, and Robust Range Filter. Proceedings of the ACM on Management of Data 2:6 (1-27). DOI: 10.1145/3698820. Online publication date: 20-Dec-2024
    • (2024) SuccinctKV: a CPU-efficient LSM-tree Based KV Store with Scan-based Compaction. ACM Transactions on Architecture and Code Optimization 21:4 (1-26). DOI: 10.1145/3695873. Online publication date: 20-Nov-2024
    • (2024) CAMAL: Optimizing LSM-trees via Active Learning. Proceedings of the ACM on Management of Data 2:4 (1-26). DOI: 10.1145/3677138. Online publication date: 30-Sep-2024
    • (2024) Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality. Proceedings of the ACM on Management of Data 2:4 (1-31). DOI: 10.1145/3677134. Online publication date: 30-Sep-2024
    • (2024) Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654978. Online publication date: 30-May-2024
