research-article
Open access

Revisiting B-tree Compression: An Experimental Study

Published: 30 May 2024

Abstract

    B-trees are widely recognized as one of the most important index structures in database systems, providing efficient query processing capabilities. Over the past few decades, many techniques have been developed to enhance the efficiency of B-trees from various perspectives. Among them, B-tree compression is an important technique, introduced as early as the 1970s, that improves both space efficiency and query performance. Since then, several B-tree compression techniques have been developed. However, to our surprise, we have found that these techniques were never compared against each other in prior work. Consequently, many important questions remain unanswered: is B-tree compression truly effective, and if so, under which scenarios and with which compression methods should it be employed? In this paper, we conduct the first experimental evaluation of seven widely used B-tree compression techniques using both synthetic and real datasets. Based on our evaluation, we present lessons and insights that can guide system design decisions in modern databases regarding the use of B-tree compression.


    Published In

    Proceedings of the ACM on Management of Data  Volume 2, Issue 3
    SIGMOD
    June 2024
    1953 pages
    EISSN:2836-6573
    DOI:10.1145/3670010
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. b-tree compression
    2. b-trees
    3. database indexes

    Qualifiers

    • Research-article
