Memory disaggregation: why now and what are the challenges

Published: 28 June 2023

Abstract

Hardware disaggregation has emerged as one of the most fundamental shifts in how we build computer systems over the past decades. While disaggregation has been successful for several types of resources (storage, power, and others), memory disaggregation has yet to happen. We make the case that the time for memory disaggregation has arrived. We look at past successful disaggregation stories and learn that their success depended on two requirements: addressing a burning issue and being technically feasible. We examine memory disaggregation through this lens and find that both requirements are finally met. Once available, memory disaggregation will require software support to be used effectively. We discuss some of the challenges of designing an operating system that can utilize disaggregated memory for itself and its applications.

Published In

ACM SIGOPS Operating Systems Review, Volume 57, Issue 1
June 2023
53 pages
ISSN: 0163-5980
DOI: 10.1145/3606557
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
