DOI: 10.1145/3626246.3654681
tutorial
Open access

Beyond Bloom: A Tutorial on Future Feature-Rich Filters

Published: 09 June 2024

Abstract

Filters, such as Bloom, quotient, and cuckoo filters, save space by maintaining an approximate representation of a set, at the cost of occasionally returning false positives. Filters play a critical role in building modern data-intensive applications and are used across domains such as databases, storage engines, computational biology, cybersecurity, and networks. Extensive research over the past few decades has produced filters with much improved performance and features. Yet modern data-intensive applications are still designed around the limitations of traditional filters, resulting in complex designs and sub-optimal performance.
This tutorial aims to bring together researchers at the forefront of filter data structure research to help the database community learn about the recent advancements in the theory and practice of filters. The tutorial will cover real-world case studies of redesigning applications using the modern filter APIs to achieve simplicity and improved application performance. The tutorial will further help uncover the open research problems, both in theory and systems, and increase interaction among researchers to tackle those problems.
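
As an illustration of the trade-off the abstract describes (an approximate set representation that may return false positives but never false negatives), here is a minimal Bloom filter sketch in Python. It is not taken from the tutorial: the class name `BloomFilter`, the method names, the double-hashing scheme built on SHA-256, and the sizing (10 bits per key, 7 hash functions) are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: num_bits bits probed by num_hashes hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zeros

    def _positions(self, item: str):
        # Assumption: derive all probe positions from one SHA-256 digest
        # using double hashing (h1 + i * h2), a common practical shortcut.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every probed bit for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Usage: insert 1,000 keys, then probe 10,000 keys that were never inserted.
bf = BloomFilter(num_bits=10_000, num_hashes=7)
members = [f"key-{i}" for i in range(1_000)]
for key in members:
    bf.add(key)

non_members = [f"absent-{i}" for i in range(10_000)]
false_positives = sum(bf.might_contain(key) for key in non_members)
false_positive_rate = false_positives / len(non_members)
print(f"observed false-positive rate: {false_positive_rate:.4f}")
```

Every inserted key is reported as possibly present, while a small fraction of never-inserted keys are also reported as present. Note that this sketch supports only insertion and membership queries; it has no deletion, counting, resizing, or range queries, which are among the limitations of traditional filters that the feature-rich designs covered in the tutorial aim to lift.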


Published In

SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
June 2024
694 pages
ISBN:9798400704222
DOI:10.1145/3626246
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dictionary data structure
  2. filters
  3. membership query

Qualifiers

  • Tutorial

Conference

SIGMOD/PODS '24
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
