research-article

Towards cost-effective and elastic cloud database deployment via memory disaggregation

Published: 01 June 2021

    Abstract

    It is challenging for cloud-native relational databases to meet the ever-increasing need to scale compute and memory resources independently and elastically. The recent emergence of memory disaggregation architectures, which rely on high-speed RDMA networks, offers opportunities to build cost-effective and elastic cloud-native databases. Existing proposals let unmodified applications run transparently on disaggregated systems. However, running a relational database kernel atop such proposals incurs notable performance degradation and time-consuming failure recovery, offsetting the benefits of disaggregation.
    To address these challenges, we propose LegoBase, a novel database architecture that explores the co-design of the database kernel and memory disaggregation. It pushes memory management back to the database layer in order to bypass the Linux I/O stack and to re-use or design (remote) memory access optimizations that exploit knowledge of data access patterns. LegoBase further splits the conventional ARIES fault tolerance protocol to handle local and remote memory failures independently, enabling fast recovery of compute instances. We implemented LegoBase atop MySQL and compare it against MySQL running on a standalone machine and against the state-of-the-art disaggregation proposal Infiniswap. Our evaluation shows that, even with a large fraction of data placed in remote memory, LegoBase's throughput (up to 9.41% drop) and P99 latency (up to 11.58% increase) are comparable to the monolithic MySQL setup and significantly better (1.99x-2.33x, respectively) than those of MySQL deployed over Infiniswap. Meanwhile, when handling failures or planned re-configurations, LegoBase achieves up to 3.87x and 5.48x speedups of the recovery and warm-up time, respectively, over the monolithic MySQL and MySQL over Infiniswap.
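    The idea of managing local and remote memory inside the database layer can be illustrated with a toy two-tier page cache. This is only a sketch under stated assumptions, not LegoBase's actual design: the class name `TwoTierBufferPool`, the plain LRU policy, and the use of in-process dictionaries to stand in for RDMA-attached remote memory and shared storage are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class TwoTierBufferPool:
    """Toy page cache: a small local LRU tier backed by a 'remote' tier.

    Hypothetical illustration only; dictionaries stand in for
    remote memory and shared storage.
    """

    def __init__(self, local_capacity, storage):
        self.local_capacity = local_capacity
        self.local = OrderedDict()  # page_id -> page, oldest entry first
        self.remote = {}            # stand-in for remote memory
        self.storage = storage      # stand-in for shared storage
        self.stats = {"local": 0, "remote": 0, "storage": 0}

    def get(self, page_id):
        # Fast path: page is cached in local memory.
        if page_id in self.local:
            self.local.move_to_end(page_id)
            self.stats["local"] += 1
            return self.local[page_id]
        # Next, try the remote tier before falling back to storage.
        if page_id in self.remote:
            page = self.remote.pop(page_id)
            self.stats["remote"] += 1
        else:
            page = self.storage[page_id]
            self.stats["storage"] += 1
        self._admit(page_id, page)
        return page

    def _admit(self, page_id, page):
        # Demote the least recently used local page to the remote tier.
        if len(self.local) >= self.local_capacity:
            victim_id, victim = self.local.popitem(last=False)
            self.remote[victim_id] = victim
        self.local[page_id] = page


def demo():
    storage = {i: f"page-{i}" for i in range(1, 4)}
    pool = TwoTierBufferPool(local_capacity=2, storage=storage)
    for pid in (1, 2, 3, 1, 3):
        pool.get(pid)
    return pool.stats
```

    In this sketch, a page evicted from the local tier is demoted to the remote tier rather than dropped, so a later access costs a remote fetch instead of a storage read. A real system would also have to handle dirty pages, concurrency, and failures of either tier, all of which are omitted here.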

    Published In

    Proceedings of the VLDB Endowment, Volume 14, Issue 10
    June 2021
    219 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment
