research-article

BhBF: A Bloom Filter Using Bh Sequences for Multi-set Membership Query

Published: 09 March 2022

Abstract

Multi-set membership query is a fundamental problem for network functions such as packet processing and state machine monitoring. Given the strict query-speed and memory requirements of these functions, it would be promising to design a multi-set query algorithm based on the Bloom filter (BF), a space-efficient probabilistic data structure. However, existing BF-based multi-set query schemes suffer from at least one of the following drawbacks: low query speed, low query accuracy, support for only insertion and query operations, or a limitation on the set size. To address these issues, we design a novel Bh sequence-based Bloom filter (BhBF) for multi-set query that supports four operations: insertion, query, deletion, and update. In BhBF, each set ID is encoded as a code drawn from a Bh sequence. Exploiting the properties of Bh sequences, we can correctly decode the BF cells to obtain the set IDs even when the number of hash collisions is high, which yields high query accuracy. We further propose two strategies that increase BhBF's query speed and query accuracy. On the theoretical side, we analyze the false positive rate and the classification failure rate of BhBF. Results from extensive experiments on two real datasets demonstrate that BhBF significantly improves on state-of-the-art multi-set query algorithms.
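A set of integers is a Bh sequence when all sums of h of its elements, repetition allowed, are distinct; B2 sequences are the Sidon sets. That uniqueness is what lets a cell shared by several elements be decoded. The Python sketch below illustrates the idea under assumptions of our own, not details stated in the abstract: each cell is assumed to hold a counter plus the sum of the codes of the elements hashed to it, 0 is included in the sequence so that sums of at most h codes stay unique, the codes come from a simple greedy search, and cells are indexed with salted SHA-256. The names `build_bh_sequence`, `build_decoder`, and `ToyBhBF` are hypothetical. The paper's two speed-up strategies and its false positive analysis are not modeled.

```python
from __future__ import annotations

import hashlib
from itertools import combinations_with_replacement


def build_bh_sequence(n: int, h: int) -> list[int]:
    """Greedily pick n codes, starting at 0, whose h-element sums (with repetition) are distinct.

    Including 0 means sums of *at most* h nonzero codes are distinct as well.
    """
    seq: list[int] = []
    seen_sums: set[int] = set()
    candidate = 0
    while len(seq) < n:
        # Sums of every h-multiset that uses the new candidate at least once.
        new_sums = [sum(c) for c in combinations_with_replacement(seq + [candidate], h)
                    if candidate in c]
        if len(set(new_sums)) == len(new_sums) and seen_sums.isdisjoint(new_sums):
            seq.append(candidate)
            seen_sums.update(new_sums)
        candidate += 1
    return seq


def build_decoder(codes: list[int], h: int) -> dict[int, tuple[int, ...]]:
    """Map every sum of at most h nonzero codes back to the codes that produced it."""
    table: dict[int, tuple[int, ...]] = {}
    for combo in combinations_with_replacement(codes, h):
        table[sum(combo)] = tuple(c for c in combo if c != 0)  # drop the 0 padding
    return table


class ToyBhBF:
    """Hypothetical simplified filter: each cell keeps (count, sum of set codes)."""

    def __init__(self, m: int, k: int, num_sets: int, h: int):
        self.m, self.k, self.h = m, k, h
        self.codes = build_bh_sequence(num_sets + 1, h)  # codes[0] == 0 is padding
        self.set_of_code = {c: i for i, c in enumerate(self.codes) if i > 0}
        self.decoder = build_decoder(self.codes, h)
        self.count = [0] * m
        self.total = [0] * m

    def _cells(self, key: str) -> list[int]:
        # k indices from salted SHA-256 (an assumption; the paper's hash functions may differ).
        return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.m
                for i in range(self.k)]

    def insert(self, key: str, set_id: int) -> None:
        for j in self._cells(key):
            self.count[j] += 1
            self.total[j] += self.codes[set_id]

    def delete(self, key: str, set_id: int) -> None:
        for j in self._cells(key):
            self.count[j] -= 1
            self.total[j] -= self.codes[set_id]

    def update(self, key: str, old_id: int, new_id: int) -> None:
        # Modeled here as delete followed by insert.
        self.delete(key, old_id)
        self.insert(key, new_id)

    def query(self, key: str) -> set[int]:
        """Return candidate set IDs; an empty set means the key is not stored."""
        candidates = None
        for j in self._cells(key):
            if self.count[j] == 0:
                return set()  # an empty cell rules the key out
            if self.count[j] > self.h:
                continue  # more than h codes collided here, so the sum is not decodable
            ids = {self.set_of_code[c] for c in self.decoder[self.total[j]]}
            candidates = ids if candidates is None else candidates & ids
        # No decodable cell: every set ID remains possible.
        return candidates if candidates is not None else set(self.set_of_code.values())
```

In this toy model a stored key's own set ID survives every intersection, so there are no false negatives; a cell holding more than h codes is simply skipped, so accuracy depends on how often at least one of the k cells stays at or below h entries.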

Cited By

  • (2023) An HBase-Based Optimization Model for Distributed Medical Data Storage and Retrieval. Electronics 12, 4 (987). DOI: 10.3390/electronics12040987. Online publication date: 16 Feb 2023.
  • (2023) Multidimensional query processing algorithm by dimension transformation. Scientific Reports 13, 1. DOI: 10.1038/s41598-023-31758-7. Online publication date: 11 Apr 2023.

    Information

    Published In

    ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 5
    October 2022
    532 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3514187

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 March 2022
    Accepted: 01 November 2021
    Revised: 01 September 2021
    Received: 01 April 2021
    Published in TKDD Volume 16, Issue 5

    Author Tags

    1. Multi-set membership query
    2. Bloom filter

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • NSF Electrical, Communications and Cyber Systems (ECCS)
    • NSF Communication and Information Foundations (CIF)
    • Hunan Provincial Innovation Foundation for Postgraduate Studies

    Article Metrics

    • Downloads (last 12 months): 115
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 10 Nov 2024
