research-article
Open access

Grafite: Taming Adversarial Queries with Optimal Range Filters

Published: 26 March 2024

Abstract

Range filters allow checking whether a query range intersects a given set of keys with a chance of returning a false positive answer, thus generalising the functionality of Bloom filters from point to range queries. Existing practical range filters have addressed this problem heuristically, resulting in high false positive rates and query times when dealing with adversarial inputs, such as in the common scenario where queries are correlated with the keys.
We introduce Grafite, a novel range filter that solves these issues with a simple design and clear theoretical guarantees that hold regardless of the input data and query distribution: given a fixed space budget of $B$ bits per key, the query time is $O(1)$, and the false positive probability is upper bounded by $\ell / 2^{B-2}$, where $\ell$ is the query range size. Our experimental evaluation shows that Grafite is the only range filter to date to achieve robust and predictable false positive rates across all combinations of datasets, query workloads, and range sizes, while providing faster queries and construction times, and dominating all competitors in the case of correlated queries.
As a further contribution, we introduce a very simple heuristic range filter whose performance on uncorrelated queries is very close to or better than the one achieved by the best heuristic range filters proposed in the literature so far.
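To make the guarantee stated in the abstract concrete, the following minimal sketch evaluates the bound (range size divided by 2 to the power B minus 2) for a given space budget and inverts it to estimate the budget needed for a target false positive probability. The function names are hypothetical and not taken from the paper or its code; capping the bound at 1 is an assumption added here only because the quantity is a probability.

```python
import math


def fpp_upper_bound(bits_per_key: float, range_size: int) -> float:
    """Bound from the abstract: range_size / 2^(bits_per_key - 2).

    Capped at 1.0 (an assumption of this sketch, since it bounds a probability).
    """
    if range_size < 1:
        raise ValueError("range_size must be at least 1")
    return min(1.0, range_size / 2 ** (bits_per_key - 2))


def bits_per_key_for(target_fpp: float, range_size: int) -> float:
    """Invert the bound: smallest B with range_size / 2^(B - 2) <= target_fpp."""
    if not 0.0 < target_fpp <= 1.0:
        raise ValueError("target_fpp must be in (0, 1]")
    if range_size < 1:
        raise ValueError("range_size must be at least 1")
    return math.log2(range_size / target_fpp) + 2


if __name__ == "__main__":
    # 12 bits per key, ranges of size 16: 16 / 2^10
    print(fpp_upper_bound(12, 16))         # 0.015625
    # Budget needed to keep the bound at 0.015625 for ranges of size 16
    print(bits_per_key_for(0.015625, 16))  # 12.0
```

Under this bound, doubling the range size costs one extra bit per key to keep the same guarantee, and each additional bit per key halves the bound.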

Cited By

  • (2024) Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2:3 (1-26). https://doi.org/10.1145/3654978. Online publication date: 30-May-2024

Published In

Proceedings of the ACM on Management of Data  Volume 2, Issue 1
SIGMOD
February 2024
1874 pages
EISSN:2836-6573
DOI:10.1145/3654807
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024
Published in PACMMOD Volume 2, Issue 1

Author Tags

  1. bloom filter
  2. data structure
  3. range filter
  4. range search

Qualifiers

  • Research-article

Funding Sources

  • Ministero dell'Università e della Ricerca
  • European Union
  • European Union - NextGenerationEU - PNRR

Article Metrics

  • Downloads (Last 12 months): 242
  • Downloads (Last 6 weeks): 68
Reflects downloads up to 30 Aug 2024
