Research article. DOI: 10.1145/3485447.3511996

Seesaw Counting Filter: An Efficient Guardian for Vulnerable Negative Keys During Dynamic Filtering

Published: 25 April 2022

Abstract

The Bloom filter is an efficient data structure for filtering negative keys (keys not in a given set) using very little space. However, real-world applications commonly contain vulnerable negative keys, which incur high costs if they are not properly filtered, especially when positive keys are added or deleted dynamically. To address this problem, we propose the SeeSaw Counting Filter (SSCF), which encapsulates the vulnerable negative keys in a unified counter array, named the seesaw counter array, and dynamically modulates (i.e., varies) the applied hash functions to guard the encapsulated keys from being misidentified. Moreover, we propose ada-SSCF to handle scenarios in which the vulnerable negative keys cannot be obtained in advance. Our extensive evaluation shows that SSCF outperforms state-of-the-art filters by 3× on average in accuracy while maintaining low operation latency. All source code is available at [2].
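The abstract assumes familiarity with dynamic (counting) Bloom filters. As a hedged illustration of that baseline setting, and not of SSCF itself, the following Python sketch shows a plain counting Bloom filter in the style introduced by Fan et al. [12]: positive keys can be inserted and deleted, a query never misses an inserted key, and a negative key is misidentified only when all k of its counters happen to be nonzero. The class name, the SHA-256 double-hashing scheme, the parameters m=1024 and k=4, and the example key sets are all assumptions chosen for illustration.

```python
import hashlib
import math


class CountingBloomFilter:
    """Plain counting Bloom filter: m small counters instead of m bits.

    Illustrative sketch only; it does not implement SSCF's seesaw counter
    array or hash modulation.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m                 # number of counters
        self.k = k                 # number of hash functions
        self.counters = [0] * m

    def _positions(self, key: str) -> list[int]:
        # Assumed hashing scheme: derive k indexes from one SHA-256 digest
        # by double hashing, h_i(key) = (h1 + i * h2) mod m.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, key: str) -> None:
        for p in self._positions(key):
            self.counters[p] += 1

    def delete(self, key: str) -> None:
        # Only correct for keys that were previously inserted; deleting a
        # key that was never inserted can create false negatives.
        for p in self._positions(key):
            self.counters[p] -= 1

    def query(self, key: str) -> bool:
        # False: the key is definitely not in the set.
        # True: the key is possibly in the set (may be a false positive).
        return all(self.counters[p] > 0 for p in self._positions(key))


def expected_fpr(m: int, k: int, n: int) -> float:
    # Standard approximation for a Bloom filter holding n keys:
    # (1 - e^(-k*n/m))^k.
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=1024, k=4)
    positives = [f"user{i}" for i in range(100)]
    for key in positives:
        cbf.insert(key)

    # Hypothetical "vulnerable" negative keys: misidentifying any of them
    # as positive is assumed to be costly.
    vulnerable = [f"blocked{i}" for i in range(1000)]
    hits = sum(cbf.query(key) for key in vulnerable)
    print(f"misidentified: {hits} of {len(vulnerable)}, "
          f"expected rate ~{expected_fpr(1024, 4, len(positives)):.3f}")

    # Deleting half of the positive keys keeps the rest queryable.
    for key in positives[:50]:
        cbf.delete(key)
    assert all(cbf.query(key) for key in positives[50:])
```

Because this filter applies the same fixed hash functions to every key, a particular vulnerable negative key that collides with inserted positive keys stays misidentified until enough of those positive keys are deleted. SSCF, per the abstract, instead encapsulates such keys in its seesaw counter array and varies the applied hash functions to keep them from being misidentified; that mechanism is not modeled here.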

References

[1] [n.d.]. https://bitcoinops.org/en/newsletters/2019/08/21/#bitcoin-core-16248.
[2] 2021. SSCF Source Code. https://anonymous.4open.science/r/SSCF-7505.
[3] Parvez Anandam. 2019. Network Access Control using Bloom filters. https://courses.cs.washington.edu/courses/csep521/07wi/prj/parvez.pdf.
[4] Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 13, 7 (1970), 422–426.
[5] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, and George Varghese. 2006. An improved construction for counting Bloom filters. In European Symposium on Algorithms. Springer, 684–695.
[6] Jehoshua Bruck, Jie Gao, and Anxiao Jiang. 2006. Weighted Bloom filter. In International Symposium on Information Theory. IEEE, 2304–2308.
[7] Denis Charles and Kumar Chellapilla. 2008. Bloomier filters: A second look. In European Symposium on Algorithms. Springer, 259–270.
[8] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the Symposium on Cloud Computing. ACM, 143–154.
[9] Zhenwei Dai and Anshumali Shrivastava. 2019. Adaptive learned Bloom filter (Ada-BF): Efficient utilization of the classifier. arXiv preprint (2019).
[10] Kyle Deeds, Brian Hentschel, and Stratos Idreos. 2020. Stacked filters: learning to filter by structure. In Proceedings of the International Conference on Very Large Data Bases, Vol. 14. VLDB Endowment, 600–612.
[11] Facebook. 2013. A Facebook fork of LevelDB which is optimized for flash and big memory machines. https://rocksdb.org/.
[12] Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder. 2000. Summary cache: a scalable wide-area web cache sharing protocol. Transactions on Networking 8, 3 (2000), 281–293.
[13] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. ACM, 29–43.
[14] Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He. 2017. BitFunnel: Revisiting signatures for search. In Proceedings of the International Conference on Research and Development in Information Retrieval. ACM, 605–614.
[15] Google. 2011. LevelDB: A fast and lightweight key/value database library. http://code.google.com/p/leveldb/.
[16] Shalla Secure Services KG. 2021. Shalla’s Blacklists. http://www.shallalist.de/index.html.
[17] Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. In Proceedings of the International Conference on Management of Data. ACM, 489–504.
[18] Yi Lu, Balaji Prabhakar, and Flavio Bonomi. 2006. Perfect hashing for network applications. In International Symposium on Information Theory. IEEE, 2774–2778.
[19] Siqiang Luo, Subarna Chatterjee, Rafael Ketsetsidis, Niv Dayan, Wilson Qin, and Stratos Idreos. 2020. Rosetta: A Robust Space-Time Optimized Range Filter for Key-Value Stores. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. ACM, Portland, OR, USA, 2071–2086. https://doi.org/10/gpcb5b
[20] Michael Mitzenmacher. 2018. A model for learned Bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 464–473.
[21] Palo Alto Networks. 2019. Palo Alto Networks Malicious IP Address Feeds. https://docs.paloaltonetworks.com/pan-os/8-1/pan-os-admin/policy/use-an-external-dynamic-list-in-policy/palo-alto-networks-malicious-ip-address-feeds.
[22] Salvatore Pontarelli, Pedro Reviriego, and Juan Antonio Maestro. 2016. Improving counting Bloom filter performance with fingerprints. Inform. Process. Lett. 116, 4 (2016), 304–309.
[23] David M. W. Powers. 1998. Applications and explanations of Zipf’s law. In New Methods in Language Processing and Computational Natural Language Learning. Association for Computational Linguistics.
[24] Do Le Quoc, Istemi Ekin Akkus, Pramod Bhatotia, Spyros Blanas, Ruichuan Chen, Christof Fetzer, and Thorsten Strufe. 2018. ApproxJoin: Approximate distributed joins. In Proceedings of the ACM Symposium on Cloud Computing. ACM, 426–438.
[25] Karan Singhal and Philip Weiss. 2020. DeepBloom. https://github.com/karan1149/DeepBloom/tree/master/data.
[26] Rongbiao Xie, Meng Li, Zheyu Miao, Rong Gu, Huang He, Haipeng Dai, and Guihai Chen. 2021. Hash Adaptive Bloom filter. In Proceedings of the International Conference on Data Engineering. IEEE.
[27] Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the 2018 International Conference on Management of Data. ACM, Houston, TX, USA, 323–336. https://doi.org/10/gg224v
[28] Ming Zhong, Pin Lu, Kai Shen, and Joel Seiferas. 2008. Optimizing data popularity conscious Bloom filters. In Proceedings of the Symposium on Principles of Distributed Computing. ACM, 355–364.

Published In

WWW '22: Proceedings of the ACM Web Conference 2022
April 2022
3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

1. Bloom filter
2. Hash Modulation
3. Probabilistic Data Structure

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '22: The ACM Web Conference 2022
April 25–29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

• Downloads (last 12 months): 49
• Downloads (last 6 weeks): 6
Reflects downloads up to 30 Aug 2024

Cited By

• (2024) Oasis: An Optimal Disjoint Segmented Learned Range Filter. Proceedings of the VLDB Endowment 17:8 (1911–1924). DOI: 10.14778/3659437.3659447. Online publication date: 31-May-2024.
• (2024) Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2:3 (1–26). DOI: 10.1145/3654978. Online publication date: 30-May-2024.
• (2024) GRF: A Global Range Filter for LSM-Trees with Shape Encoding. Proceedings of the ACM on Management of Data 2:3 (1–27). DOI: 10.1145/3654944. Online publication date: 30-May-2024.
• (2024) Beyond Bloom: A Tutorial on Future Feature-Rich Filters. Companion of the 2024 International Conference on Management of Data (636–644). DOI: 10.1145/3626246.3654681. Online publication date: 9-Jun-2024.
• (2024) A Generic Framework for Finding Special Quadratic Elements in Data Streams. IEEE/ACM Transactions on Networking 32:4 (3269–3284). DOI: 10.1109/TNET.2024.3392029. Online publication date: Aug-2024.
• (2024) The Reinforcement Cuckoo Filter. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (1940–1949). DOI: 10.1109/INFOCOM52122.2024.10621370. Online publication date: 20-May-2024.
• (2023) Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads. Proceedings of the ACM on Management of Data 1:3 (1–25). DOI: 10.1145/3617333. Online publication date: 13-Nov-2023.
• (2023) EDIndex: Enabling Fast Data Queries in Edge Storage Systems. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (675–685). DOI: 10.1145/3539618.3591676. Online publication date: 19-Jul-2023.