DOI: 10.1145/3555041.3589403
tutorial
Open access

Disaggregated Database Systems

Published: 05 June 2023

Abstract

Disaggregated database systems offer high elasticity and resource utilization at cloud scale and have recently gained momentum in both industry and academia. Such systems are developed in response to the emerging trend of disaggregated data centers, where resources are physically separated and connected through fast data center networks. Database management systems have traditionally been built on monolithic architectures, so disaggregation fundamentally challenges their designs. At the same time, disaggregation offers benefits such as independent scaling of compute, memory, and storage. Nonetheless, there has been little systematic investigation into the new research challenges and opportunities in recent disaggregated database systems.
To provide database researchers and practitioners with insights into different forms of resource disaggregation, we take a snapshot of state-of-the-art disaggregated database systems and related techniques and present an in-depth tutorial. The primary goal is to better understand the enabling techniques and characteristics of resource disaggregation and its implications for next-generation database systems. To that end, we survey recent work on storage disaggregation, which separates secondary storage devices (e.g., SSDs) from compute servers and is widely deployed in current cloud data centers, and memory disaggregation, which further splits compute and memory with Remote Direct Memory Access (RDMA) and is driving the transformation of clouds. In addition, we mention two techniques that bring novel perspectives to the above two paradigms: persistent memory and Compute Express Link (CXL). Finally, we identify several directions that shed light on the future development of disaggregated database systems.

Published In

SIGMOD '23: Companion of the 2023 International Conference on Management of Data
June 2023
330 pages
ISBN: 9781450395076
DOI: 10.1145/3555041
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. databases
  2. memory disaggregation
  3. storage disaggregation

Qualifiers

  • Tutorial

Conference

SIGMOD/PODS '23
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
