DOI: 10.1145/3448016.3457273

Chucky: A Succinct Cuckoo Filter for LSM-Tree

Published: 18 June 2021


Modern key-value stores typically rely on an LSM-tree in storage (SSD) to handle writes and Bloom filters in memory (DRAM) to optimize reads. With ongoing advances in SSD technology shrinking the performance gap between storage and memory devices, the Bloom filters are now emerging as a performance bottleneck.
We propose Chucky, a new design that replaces the multiple Bloom filters by a single Cuckoo filter that maps each data entry to an auxiliary address of its location within the LSM-tree. We show that while such a design entails fewer memory accesses than with Bloom filters, its false positive rate off the bat is higher. The reason is that the auxiliary addresses occupy bits that would otherwise be used as parts of the Cuckoo filter's fingerprints. To address this, we harness techniques from information theory to succinctly encode the auxiliary addresses so that the fingerprints can stay large. As a result, Chucky achieves the best of both worlds: a modest access cost and a low false positive rate at the same time.
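
The trade-off between address bits and fingerprint bits can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes the commonly cited Cuckoo filter bound of roughly 2b / 2^f false positives for bucket size b and fingerprint length f, and a hypothetical fixed-width address of ceil(log2(L)) bits for an LSM-tree with L levels. The slot size of 12 bits and the level counts are illustrative values only.

```python
def cuckoo_fpr(fingerprint_bits: int, bucket_size: int = 4) -> float:
    """Approximate upper bound on a Cuckoo filter's FPR: 2b / 2^f."""
    return 2 * bucket_size / 2 ** fingerprint_bits


def fixed_address_bits(num_levels: int) -> int:
    """Bits needed for a fixed-width level ID, i.e. ceil(log2(num_levels))."""
    return (num_levels - 1).bit_length()


def fpr_with_address(slot_bits: int, num_levels: int, bucket_size: int = 4) -> float:
    """FPR when each slot must hold a fixed-width level ID plus a fingerprint."""
    fingerprint_bits = slot_bits - fixed_address_bits(num_levels)
    return cuckoo_fpr(fingerprint_bits, bucket_size)


if __name__ == "__main__":
    slot_bits = 12  # hypothetical memory budget per entry
    print(f"no address: FPR ~ {cuckoo_fpr(slot_bits):.5f}")
    for levels in (2, 4, 8, 16):
        fpr = fpr_with_address(slot_bits, levels)
        print(f"{levels:2d} levels: {fixed_address_bits(levels)} address bits, FPR ~ {fpr:.5f}")
```

Under these assumptions, every bit given to the address doubles the false positive rate, which is why a more compact encoding of the addresses matters.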

Supplementary Material

MP4 File (3448016.3457273.mp4)
Modern persistent key-value stores rely on an LSM-tree in storage (SSD) to optimize writes and Bloom filters in memory (DRAM) to optimize reads. In this work, we show that the Bloom filters are emerging as a performance bottleneck. First, the costs of probing and constructing them grow with data size. Second, their costs are becoming more pronounced as storage access on modern SSDs is becoming less expensive relative to memory access. Recently, a new slew of data structures has emerged as an alternative to Bloom filters. They work by storing a fingerprint for every data entry within a compact hash table. We call them Fingerprint Filters (FFs). In this paper, we show how to scale an LSM-tree's memory bandwidth by replacing its Bloom filters with an FF that is augmented with every entry's location within the LSM-tree. However, we show that this new design does not scale out of the box in terms of its false positive rate (FPR). The culprit is the auxiliary location information, which grows superlinearly with the data size, thus taking away bits from the fingerprints. By harnessing information theory and compression techniques, we show how to scale the size of this location information to keep the FPR small as the data grows. In this way, we show how to achieve the best of both worlds: scalable memory and storage bandwidth at the same time.
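
To see why compressing the location information can help, consider the following minimal sketch. It is an illustration rather than the paper's actual encoding: it assumes that level i of the LSM-tree holds a share of the entries proportional to T^i for a hypothetical size ratio T, and it builds a plain Huffman code over the level IDs. The function names and the choice T = 10 are assumptions made for the example.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return the Huffman code length (in bits) for each symbol weight."""
    if len(weights) == 1:
        return [1]
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(w, next(tiebreak), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, symbols1 = heapq.heappop(heap)
        w2, _, symbols2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        for i in symbols1 + symbols2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), symbols1 + symbols2))
    return lengths


def average_level_id_bits(num_levels, size_ratio=10):
    """Average Huffman-coded level ID length, assuming level i weighs size_ratio**i."""
    weights = [size_ratio ** i for i in range(num_levels)]
    lengths = huffman_code_lengths(weights)
    return sum(l * w for l, w in zip(lengths, weights)) / sum(weights)


if __name__ == "__main__":
    for levels in (2, 4, 8, 16):
        fixed = (levels - 1).bit_length()  # fixed-width ID: ceil(log2(levels)) bits
        print(f"{levels:2d} levels: fixed {fixed} bits, Huffman avg {average_level_id_bits(levels):.3f} bits")
```

Because most entries sit in the largest level, that level receives a one-bit code. Under the stated assumptions, the average code length stays a little above one bit as levels are added, whereas a fixed-width ID grows with the number of levels.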


Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Huffman coding
  2. LSM-tree
  3. bloom filter
  4. cuckoo filter
  5. entropy encoding
  6. succinct data structures


  • Research-article



Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%


Article Metrics

  • Downloads (Last 12 months): 113
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 17 Feb 2025


Cited By

  • (2025) Disco: A Compact Index for LSM-trees. Proceedings of the ACM on Management of Data 3:1 (1-27). DOI: 10.1145/3709683. Online publication date: 11-Feb-2025
  • (2025) Aster: Enhancing LSM-structures for Scalable Graph Database. Proceedings of the ACM on Management of Data 3:1 (1-26). DOI: 10.1145/3709662. Online publication date: 11-Feb-2025
  • (2024) LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud Services. Proceedings of the VLDB Endowment 17:12 (3799-3812). DOI: 10.14778/3685800.3685807. Online publication date: 8-Nov-2024
  • (2024) Optimizing Collections of Bloom Filters within a Space Budget. Proceedings of the VLDB Endowment 17:11 (3551-3564). DOI: 10.14778/3681954.3682020. Online publication date: 1-Jul-2024
  • (2024) Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range Index. Proceedings of the VLDB Endowment 17:11 (3442-3455). DOI: 10.14778/3681954.3682012. Online publication date: 1-Jul-2024
  • (2024) SepHash: A Write-Optimized Hash Index On Disaggregated Memory via Separate Segment Structure. Proceedings of the VLDB Endowment 17:5 (1091-1104). DOI: 10.14778/3641204.3641218. Online publication date: 2-May-2024
  • (2024) Memento Filter: A Fast, Dynamic, and Robust Range Filter. Proceedings of the ACM on Management of Data 2:6 (1-27). DOI: 10.1145/3698820. Online publication date: 20-Dec-2024
  • (2024) CAMAL: Optimizing LSM-trees via Active Learning. Proceedings of the ACM on Management of Data 2:4 (1-26). DOI: 10.1145/3677138. Online publication date: 30-Sep-2024
  • (2024) Structural Designs Meet Optimality: Exploring Optimized LSM-tree Structures in a Colossal Configuration Space. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654978. Online publication date: 30-May-2024
  • (2024) Optimizing Time Series Queries with Versions. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654962. Online publication date: 30-May-2024
