DOI: 10.1145/3626246.3653393
research-article
Open access

Native Cloud Object Storage in Db2 Warehouse: Implementing a Fast and Cost-Efficient Cloud Storage Architecture

Published: 09 June 2024

Abstract

Database systems built on traditional storage subsystems typically store their data in small blocks referred to as data pages (for historical reasons, commonly sized as a multiple of 4 KB). These traditional storage subsystems, for example network-attached block storage, were designed for efficient random-access I/O at the block level, and the block size is usually configurable by the application based on its needs. For large-scale analytic databases in cloud environments, these subsystems are not cost-effective compared to cloud object storage, and database systems that rely on them risk becoming uncompetitive. This paper describes the modernization of the storage architecture of Db2 Warehouse, a traditional, full-featured, high-performance database system with three decades of development, to exploit the new paradigm of cost-effective cloud storage. We discuss a solution based on integrating LSM trees into the storage subsystem, which enables Db2 Warehouse to store data pages efficiently within object storage. By applying special techniques to minimize read and write latencies as well as all of the amplification factors (write, read, and storage), the solution achieves not only storage cost savings but also higher performance. Further, by retaining the traditional data page format, we avoid significantly re-architecting the database kernel and thereby retain the substantial capabilities and optimizations of the existing system.
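To make the storage idea in the abstract more concrete, here is a minimal, hypothetical Python sketch; it is not Db2 Warehouse's implementation and not the RocksDB API. It assumes fixed 4 KB pages keyed by a page ID, an in-memory dictionary (`ObjectStore`) standing in for a cloud object store, and a toy LSM tree (`PageLSM`) that buffers page writes in a memtable, flushes each batch as one key-sorted, immutable object, and can compact runs. It also measures write and space amplification as simple byte ratios and read amplification as runs probed per lookup. All class, method, and parameter names (including the `memtable_pages` flush threshold) are illustrative assumptions, and the write-ahead log, Bloom filters, and leveled compaction that production LSM engines such as RocksDB use are deliberately omitted.

```python
from bisect import bisect_left

PAGE_SIZE = 4096  # assumption: fixed 4 KB data pages


class ObjectStore:
    """Hypothetical in-memory stand-in for a cloud object store (not a real S3/COS client)."""

    def __init__(self):
        self.objects = {}        # object name -> sorted list of (page_id, page bytes)
        self.bytes_written = 0   # physical bytes uploaded, used for write amplification

    def put(self, name, run):
        self.objects[name] = run
        self.bytes_written += sum(len(page) for _, page in run)

    def get(self, name):
        return self.objects[name]

    def delete(self, name):
        del self.objects[name]

    def stored_bytes(self):
        return sum(len(page) for run in self.objects.values() for _, page in run)


class PageLSM:
    """Toy LSM tree keyed by page ID; every flushed or compacted run is one object."""

    def __init__(self, store, memtable_pages=4):
        self.store = store
        self.memtable_pages = memtable_pages  # hypothetical flush threshold, in pages
        self.memtable = {}                    # page_id -> latest page image in memory
        self.runs = []                        # object names, oldest first
        self.next_id = 0
        self.logical_bytes = 0                # bytes the database asked to write
        self.live_pages = set()

    def write_page(self, page_id, page):
        if len(page) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
        self.memtable[page_id] = page
        self.logical_bytes += len(page)
        self.live_pages.add(page_id)
        if len(self.memtable) >= self.memtable_pages:
            self.flush()

    def _put_run(self, pages):
        # Many small pages are uploaded together as one large, key-sorted object.
        name = f"run-{self.next_id:06d}"
        self.next_id += 1
        self.store.put(name, sorted(pages.items()))
        self.runs.append(name)

    def flush(self):
        if self.memtable:
            self._put_run(self.memtable)
            self.memtable = {}

    def read_page(self, page_id):
        """Return (page, runs_probed); runs_probed is a simple read-amplification measure."""
        if page_id in self.memtable:
            return self.memtable[page_id], 0
        probed = 0
        for name in reversed(self.runs):      # the newest run holds the newest version
            probed += 1
            run = self.store.get(name)
            i = bisect_left(run, (page_id,))  # binary search on the sorted page IDs
            if i < len(run) and run[i][0] == page_id:
                return run[i][1], probed
        return None, probed

    def compact(self):
        """Merge all runs into one object, keeping only the newest version of each page."""
        self.flush()
        merged = {}
        for name in self.runs:                # oldest first, so newer versions overwrite
            merged.update(self.store.get(name))
        old_runs, self.runs = self.runs, []
        if merged:
            self._put_run(merged)
        for name in old_runs:
            self.store.delete(name)

    def amplification(self):
        """Return (write_amp, space_amp) after flushing the memtable."""
        self.flush()
        write_amp = self.store.bytes_written / self.logical_bytes
        space_amp = self.store.stored_bytes() / (len(self.live_pages) * PAGE_SIZE)
        return write_amp, space_amp
```

For example, writing each of eight pages twice with `memtable_pages=4` produces four runs: `amplification()` then reports a write amplification of 1.0 and a space amplification of 2.0, and `read_page(1)` probes two runs. After `compact()`, write amplification rises to 1.5 while space amplification falls to 1.0 and every lookup probes a single run, a small instance of the trade-off among the write, read, and storage amplification factors that the abstract says must be minimized together.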

Published In

ACM Conferences
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
June 2024
694 pages
ISBN:9798400704222
DOI:10.1145/3626246
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LSM tree
  2. OLAP
  3. analytics
  4. cloud data warehouse
  5. cloud object storage
  6. data lake
  7. RocksDB

Conference

SIGMOD/PODS '24

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • 0 total citations
  • 487 total downloads
  • 487 downloads (last 12 months)
  • 124 downloads (last 6 weeks)
Reflects downloads up to 23 Dec 2024
