DOI: 10.1145/3183713.3196931
research-article
Public Access

SuRF: Practical Range Query Filtering with Fast Succinct Tries

Published: 27 May 2018

Abstract

    We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art order-preserving indexes while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters, using it to reduce I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper bound) and closed-seek (with upper bound) queries by up to 1.5× and 5×, respectively, at a modest cost in worst-case (all-missing) point query throughput due to a slightly higher false positive rate.
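    To make the filter semantics concrete, here is a minimal Python sketch, assuming string keys compared lexicographically. It models only what a SuRF-Base-style filter answers, not how the FST stores it: each key is truncated to the shortest prefix that distinguishes it from its sorted neighbours, and the prefixes are kept in a plain list scanned linearly instead of a succinct trie. The class name `TruncatedPrefixFilter` and the `hash_bits` option (CRC32 used as a stand-in for hashed suffix bits) are hypothetical illustrations, not the paper's API.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List


def _lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def _suffix_hash(key: str, bits: int) -> int:
    """Low `bits` bits of CRC32(key); an assumed stand-in for hashed suffix bits."""
    return zlib.crc32(key.encode("utf-8")) & ((1 << bits) - 1)


@dataclass(frozen=True)
class _Entry:
    prefix: str    # shortest prefix separating the key from its sorted neighbours
    is_full: bool  # prefix is the whole key, so only an exact match counts
    tag: int       # hashed suffix bits (always 0 when hash_bits == 0)


class TruncatedPrefixFilter:
    """Hypothetical model of SuRF-Base-style answers; not the Fast Succinct Trie."""

    def __init__(self, keys: Iterable[str], hash_bits: int = 0) -> None:
        ks = sorted(set(keys))
        self.hash_bits = hash_bits
        self.entries: List[_Entry] = []
        for i, k in enumerate(ks):
            left = _lcp(ks[i - 1], k) if i > 0 else 0
            right = _lcp(k, ks[i + 1]) if i + 1 < len(ks) else 0
            # One character past the longest shared prefix, capped at the key length.
            n = min(max(left, right) + 1, len(k))
            self.entries.append(_Entry(k[:n], n == len(k), _suffix_hash(k, hash_bits)))

    def may_contain(self, q: str) -> bool:
        """Point query: False means definitely absent, True means possibly present."""
        for e in self.entries:
            if e.is_full:
                if q == e.prefix:
                    return True
            elif q.startswith(e.prefix) and _suffix_hash(q, self.hash_bits) == e.tag:
                return True
        return False

    def may_contain_range(self, lo: str, hi: str) -> bool:
        """Closed-range query [lo, hi]: False means no stored key lies in the range."""
        if lo > hi:
            return False
        for e in self.entries:
            # The stored prefix itself falls inside the range.
            if lo <= e.prefix <= hi:
                return True
            # lo extends a truncated prefix, so the elided key may lie at or after lo.
            if not e.is_full and lo.startswith(e.prefix):
                return True
        return False


if __name__ == "__main__":
    f = TruncatedPrefixFilter(["SIGMOD", "SIGOPS", "SIGAI"])
    print([e.prefix for e in f.entries])          # ['SIGA', 'SIGM', 'SIGO']
    print(f.may_contain("SIGMOD"))                # True (stored key)
    print(f.may_contain("SIGMETRICS"))            # True (false positive via 'SIGM')
    print(f.may_contain("SIGCOMM"))               # False
    print(f.may_contain_range("SIGB", "SIGL"))    # False
    print(f.may_contain_range("SIGMA", "SIGMZ"))  # True
```

    Both queries err in only one direction: a stored key is never reported missing, but a query that shares a truncated prefix (for example `SIGMETRICS` against the stored prefix `SIGM`) can be a false positive. In this sketch, storing more hashed suffix bits makes such point-query false positives less likely, roughly by a factor of 2^hash_bits for an unrelated key, but does nothing for range queries. This illustrates one way space can be traded for a lower false positive rate.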




      Published In

      SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
      May 2018
      1874 pages
      ISBN:9781450347037
      DOI:10.1145/3183713
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Badges

      • Best Paper

      Author Tags

      1. fast succinct tries
      2. lsm-trees
      3. range filter
      4. succinct data structures
      5. surf

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '18

      Acceptance Rates

      SIGMOD '18 Paper Acceptance Rate: 90 of 461 submissions (20%)
      Overall Acceptance Rate: 785 of 4,003 submissions (20%)

      Article Metrics

      • Downloads (Last 12 months)654
      • Downloads (Last 6 weeks)60
      Reflects downloads up to 09 Aug 2024

      Cited By

      • (2024) Oasis: An Optimal Disjoint Segmented Learned Range Filter. Proceedings of the VLDB Endowment 17:8, pp. 1911-1924. DOI: 10.14778/3659437.3659447. Online publication date: 1-Apr-2024.
      • (2024) Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2:3, pp. 1-26. DOI: 10.1145/3654978. Online publication date: 30-May-2024.
      • (2024) Optimizing Time Series Queries with Versions. Proceedings of the ACM on Management of Data 2:3, pp. 1-27. DOI: 10.1145/3654962. Online publication date: 30-May-2024.
      • (2024) GRF: A Global Range Filter for LSM-Trees with Shape Encoding. Proceedings of the ACM on Management of Data 2:3, pp. 1-27. DOI: 10.1145/3654944. Online publication date: 30-May-2024.
      • (2024) CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure. Proceedings of the ACM on Management of Data 2:3, pp. 1-28. DOI: 10.1145/3654927. Online publication date: 30-May-2024.
      • (2024) LeCo: Lightweight Compression via Learning Serial Correlations. Proceedings of the ACM on Management of Data 2:1, pp. 1-28. DOI: 10.1145/3639320. Online publication date: 26-Mar-2024.
      • (2024) Grafite: Taming Adversarial Queries with Optimal Range Filters. Proceedings of the ACM on Management of Data 2:1, pp. 1-23. DOI: 10.1145/3639258. Online publication date: 26-Mar-2024.
      • (2024) Wormhole Filters: Caching Your Hash on Persistent Memory. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 456-471. DOI: 10.1145/3627703.3629590. Online publication date: 22-Apr-2024.
      • (2024) Beyond Bloom: A Tutorial on Future Feature-Rich Filters. Companion of the 2024 International Conference on Management of Data, pp. 636-644. DOI: 10.1145/3626246.3654681. Online publication date: 9-Jun-2024.
      • (2024) Optimizing Read Performance of HBase through Dynamic Control of Data Block Sizes and KVCache. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 1495-1503. DOI: 10.1145/3605098.3635898. Online publication date: 8-Apr-2024.