DOI: 10.1145/3618260.3649649
research-article
Open access

Space Lower Bounds for Dynamic Filters and Value-Dynamic Retrieval

Published: 11 June 2024

Abstract

A filter is a data structure that answers approximate-membership queries on a set S of n elements, with a false-positive rate of ε. A filter is said to be dynamic if it supports insertions and deletions to the set S, subject to a capacity constraint of n. This paper considers the space requirement of filters, regardless of running time. It has been known for decades that static filters have optimal space n log ε⁻¹ + O(1) expected bits, and that dynamic filters can be implemented in space n log ε⁻¹ + Θ(n) bits. We prove that this Θ(n)-bit gap is fundamental: any dynamic filter must use n log ε⁻¹ + Ω(n) bits, no matter the choice of ε. Extending our techniques, we are also able to obtain a lower bound for the value-dynamic retrieval problem. Here again, we show that there is a Θ(n)-bit gap between the optimal static and (value-)dynamic solutions.
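To make the filter definition and the n log ε⁻¹ + Θ(n) upper bound concrete, here is a minimal Python sketch of the folklore fingerprinting approach to dynamic filters. It is not a construction from this paper, which proves lower bounds. Assumptions, none of which come from the source: elements are strings, keyed blake2b stands in for a random hash function, and the names `FingerprintFilter` and `information_bits` are hypothetical. Each element is hashed to a fingerprint in a universe of size u = ⌈n/ε⌉, and a query answers yes iff that fingerprint is currently stored. Members are therefore never rejected, and by a union bound over at most n stored fingerprints a non-member is accepted with probability at most n/u ≤ ε. Fingerprints are kept as a multiset so that deleting a true member never removes another member's fingerprint. The sketch holds them in a `Counter`, which is far from space-optimal; `information_bits` instead computes log₂ C(u + n, n), the number of bits needed to encode any multiset of at most n fingerprints, to show the roughly n log₂ e ≈ 1.44n extra bits this particular approach needs beyond n log ε⁻¹.

```python
import hashlib
import math
from collections import Counter


class FingerprintFilter:
    """Dynamic filter via hashed fingerprints (illustrative sketch).

    Fingerprints live in a Counter, so real memory use is far above the
    information-theoretic size reported by information_bits().
    """

    def __init__(self, capacity: int, eps: float, key: bytes = b"hypothetical-key"):
        self.capacity = capacity                    # n: maximum number of stored elements
        self.universe = math.ceil(capacity / eps)   # u = ceil(n / eps) possible fingerprints
        self.key = key                              # fixed key standing in for a random hash seed
        self.counts = Counter()                     # multiset of stored fingerprints
        self.size = 0

    def _fingerprint(self, x: str) -> int:
        digest = hashlib.blake2b(x.encode(), digest_size=16, key=self.key).digest()
        return int.from_bytes(digest, "big") % self.universe

    def insert(self, x: str) -> None:
        if self.size >= self.capacity:
            raise OverflowError("capacity n exceeded")
        self.counts[self._fingerprint(x)] += 1
        self.size += 1

    def delete(self, x: str) -> None:
        # Caller must only delete current members: deleting a non-member whose
        # fingerprint collides would erase a member's fingerprint (false negative).
        fp = self._fingerprint(x)
        if self.counts[fp] == 0:
            raise KeyError(x)
        self.counts[fp] -= 1
        if self.counts[fp] == 0:
            del self.counts[fp]
        self.size -= 1

    def query(self, x: str) -> bool:
        # Members always answer True; a non-member answers True only if its
        # fingerprint collides with a stored one (probability <= n/u <= eps
        # under the idealized random-hash assumption).
        return self.counts[self._fingerprint(x)] > 0


def information_bits(n: int, eps: float) -> float:
    """log2 C(u + n, n): bits to encode any multiset of <= n fingerprints from [u]."""
    u = math.ceil(n / eps)
    return (math.lgamma(u + n + 1) - math.lgamma(n + 1) - math.lgamma(u + 1)) / math.log(2)


if __name__ == "__main__":
    n, eps = 1 << 16, 1 / 256
    filt = FingerprintFilter(n, eps)
    members = [f"member-{i}" for i in range(n)]
    for m in members:
        filt.insert(m)
    assert all(filt.query(m) for m in members)  # no false negatives

    trials = 100_000
    false_pos = sum(filt.query(f"outsider-{i}") for i in range(trials))
    print(f"empirical false-positive rate: {false_pos / trials:.5f} (eps = {eps:.5f})")

    extra = information_bits(n, eps) - n * math.log2(1 / eps)
    print(f"extra bits per element beyond n*log2(1/eps): {extra / n:.3f}")
```

With n = 2¹⁶ and ε = 1/256, the demo should report an empirical false-positive rate of about ε or less and an extra cost close to log₂ e ≈ 1.44 bits per element. The abstract's lower bound says that no dynamic filter can shrink that extra term to o(n) bits, whereas a static filter can.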

Information

Published In

STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
June 2024
2049 pages
ISBN:9798400703836
DOI:10.1145/3618260
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Approximate membership data structure
  2. Bloom filter
  3. Retrieval data structure
  4. Space lower bound

Qualifiers

  • Research-article

Conference

STOC '24: 56th Annual ACM Symposium on Theory of Computing
June 24 - 28, 2024
Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 230
  • Downloads (Last 12 months): 230
  • Downloads (Last 6 weeks): 57
Reflects downloads up to 25 Dec 2024
