DOI: 10.1145/3448016.3457560
research-article
Open access

PolarDB Serverless: A Cloud Native Database for Disaggregated Data Centers

Published: 18 June 2021

Abstract

The trend in the DBMS market is to migrate to the cloud for elasticity, high availability, and lower costs. The traditional, monolithic database architecture struggles to meet these requirements. With the development of high-speed networks and new memory technologies, the disaggregated data center has become a reality: it decouples various components from monolithic servers into separate resource pools (e.g., compute, memory, and storage) and connects them through a high-speed network. The next generation of cloud native databases should be designed for disaggregated data centers. In this paper, we describe the novel architecture of PolarDB Serverless, which follows the disaggregation design paradigm: the CPU resource on compute nodes is decoupled from the remote memory pool and storage pool. Each resource pool grows or shrinks independently, providing on-demand provisioning in multiple dimensions while improving reliability. We also design our system to mitigate the inherent penalty brought by resource disaggregation, and introduce optimizations such as optimistic locking and index-aware prefetching. Compared to the architecture that uses local resources, PolarDB Serverless achieves better dynamic resource provisioning capabilities and 5.3 times faster failure recovery speed, while achieving comparable performance.

Supplementary Material

MP4 File (3448016.3457560.mp4)
The trend in the DBMS market is to migrate to the cloud for elasticity, high availability, and lower costs. The traditional, monolithic database architecture struggles to meet these requirements. With the development of high-speed networks and new memory technologies, the disaggregated data center has become a reality: it decouples various components from monolithic servers into separate resource pools (e.g., compute, memory, and storage) and connects them through a high-speed network. The next generation of cloud native databases is designed for disaggregated data centers, disassembling the monolithic database architecture into independently extensible and composable components. This facilitates the construction of stateful services (e.g., memory pool and storage pool) as multi-tenant systems to achieve better resource utilization, elasticity, availability, and scalability. In this paper, we describe the novel architecture of PolarDB Serverless, which follows the disaggregation design paradigm: the CPU resource on compute nodes is decoupled from the remote memory pool and storage pool, and each resource pool grows or shrinks independently, providing elasticity at multiple levels. Database processes on multiple compute nodes share cached pages in memory and persistent data in storage to amortize the cost. With this new architecture, different resources no longer share the same failure domain. The single point of failure in each component is handled independently, improving system reliability significantly. We design our system to mitigate the inherent penalty brought by resource disaggregation, and introduce optimizations such as one-sided RDMA verbs, optimistic locking, page materialization offloading, prefetching, and table scan pushdown. Compared to the architecture that uses local resources, PolarDB Serverless achieves better dynamic resource provisioning capabilities and 5.3 times faster failure recovery speed, while achieving comparable performance.
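
The description above names optimistic locking among the optimizations but does not spell out the protocol. As a rough illustration of the general idea only, and not of the mechanism PolarDB Serverless actually uses, the following Python sketch shows a version-validated read: a reader takes no lock, records a version counter before reading, and retries if the counter changed. The names VersionedPage and optimistic_read, the even/odd version convention, and the retry limit are all assumptions made for this example.

```python
import threading


class VersionedPage:
    """Hypothetical stand-in for a shared cached page guarded by a version counter."""

    def __init__(self):
        self.data = {}
        self.version = 0  # assumed convention: even = stable, odd = write in progress
        self._write_lock = threading.Lock()  # writers still serialize among themselves

    def write(self, key, value):
        with self._write_lock:
            self.version += 1  # becomes odd: readers treat the page as unstable
            self.data[key] = value
            self.version += 1  # becomes even again: the update is published


def optimistic_read(page, key, max_retries=100):
    """Read without locking; retry when a concurrent write is detected."""
    for _ in range(max_retries):
        before = page.version
        if before % 2 == 1:
            continue  # a write is in progress, try again
        value = page.data.get(key)
        if page.version == before:
            return value  # no writer interfered between the two version checks
    raise RuntimeError("optimistic read gave up after repeated conflicts")


page = VersionedPage()
page.write("row-1", "hello")
print(optimistic_read(page, "row-1"))
```

The appeal of this style in a disaggregated setting is that the common, conflict-free read path avoids acquiring a lock at all; the cost is that readers must be able to discard and repeat work when validation fails.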




      Published In

      SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
      June 2021
      2969 pages
      ISBN:9781450383431
      DOI:10.1145/3448016
      This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cloud database
      2. disaggregated data center
      3. shared remote memory
      4. shared storage

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '21
      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Article Metrics

      • Downloads (Last 12 months): 1,537
      • Downloads (Last 6 weeks): 131
      Reflects downloads up to 12 Sep 2024

      Cited By

      • SepHash: A Write-Optimized Hash Index On Disaggregated Memory via Separate Segment Structure. Proceedings of the VLDB Endowment 17:5 (1091-1104), 2024. DOI: 10.14778/3641204.3641218. Online publication date: 2-May-2024
      • Database Native Model Selection: Harnessing Deep Neural Networks in Database Systems. Proceedings of the VLDB Endowment 17:5 (1020-1033), 2024. DOI: 10.14778/3641204.3641212. Online publication date: 2-May-2024
      • Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated Memory. ACM Transactions on Architecture and Code Optimization 21:3 (1-26), 2024. DOI: 10.1145/3666004. Online publication date: 27-May-2024
      • A Memory-Disaggregated Radix Tree. ACM Transactions on Storage 20:3 (1-41), 2024. DOI: 10.1145/3664289. Online publication date: 6-Jun-2024
      • Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases. Proceedings of the ACM on Management of Data 2:3 (1-26), 2024. DOI: 10.1145/3654983. Online publication date: 30-May-2024
      • Towards Buffer Management with Tiered Main Memory. Proceedings of the ACM on Management of Data 2:1 (1-26), 2024. DOI: 10.1145/3639286. Online publication date: 26-Mar-2024
      • Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL. ACM Transactions on Architecture and Code Optimization 21:1 (1-26), 2024. DOI: 10.1145/3634916. Online publication date: 19-Jan-2024
      • TimeCloth: Fast Point-in-Time Database Recovery in The Cloud. Companion of the 2024 International Conference on Management of Data (214-226), 2024. DOI: 10.1145/3626246.3653382. Online publication date: 9-Jun-2024
      • PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory. Companion of the 2024 International Conference on Management of Data (295-308), 2024. DOI: 10.1145/3626246.3653377. Online publication date: 9-Jun-2024
      • FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (94-108), 2024. DOI: 10.1145/3625549.3658661. Online publication date: 3-Jun-2024
