DOI: 10.1145/3613424.3614256
research-article

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Published: 08 December 2023

Abstract

The ever-growing demand for memory with larger capacity and higher bandwidth has driven recent innovations in memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). In particular, CXL-based memory expansion has recently gained notable attention for its ability not only to expand memory capacity and bandwidth economically but also to decouple memory technologies from the specific memory interface of the CPU. However, because CXL memory devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, we present the first comprehensive evaluation of a true CXL-ready system based on the latest 4th-generation Intel Xeon CPU with three CXL memory devices from different manufacturers. Specifically, we run a set of microbenchmarks not only to compare the performance of true CXL memory with that of emulated CXL memory but also to analyze the complex interplay between the CPU and CXL memory in depth. This reveals important differences between emulated and true CXL memory, some of which will compel researchers to revisit the analyses and proposals of recent work. Next, we identify opportunities for memory-bandwidth-intensive applications to benefit from CXL memory. Lastly, we propose Caption, a CXL-memory-aware dynamic page allocation policy, to use CXL memory more efficiently as a bandwidth expander. We demonstrate that Caption can automatically converge to an empirically favorable percentage of pages allocated to CXL memory, which improves the performance of memory-bandwidth-intensive applications by up to 24% compared to the default page allocation policy designed for traditional NUMA systems.
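
The abstract describes Caption only by its goal: steering the share of pages allocated to CXL memory toward a value that maximizes throughput. The Python sketch below illustrates that general idea under stated assumptions and is not Caption's actual algorithm, which this page does not describe. Both helpers are hypothetical: `place_pages` spreads a target percentage of pages evenly across DDR and CXL memory (a weighted interleave), and `tune_cxl_percent` is a simple step-halving hill climb. The `measure` callback is an assumed stand-in for whatever throughput signal a real system would sample.

```python
from typing import Callable, List


def place_pages(num_pages: int, cxl_percent: float) -> List[str]:
    """Hypothetical placement: assign each page index to "DDR" or "CXL" so that
    the running share of CXL pages never exceeds cxl_percent, spreading CXL
    pages evenly (a weighted interleave) instead of clustering them."""
    placement = []
    cxl_count = 0
    for i in range(num_pages):
        # Put page i on CXL only if CXL pages stay <= cxl_percent of pages 0..i.
        if (cxl_count + 1) * 100 <= cxl_percent * (i + 1):
            placement.append("CXL")
            cxl_count += 1
        else:
            placement.append("DDR")
    return placement


def tune_cxl_percent(
    measure: Callable[[float], float],
    start: float = 0.0,
    step: float = 16.0,
    min_step: float = 1.0,
) -> float:
    """Hypothetical tuner: hill-climb the CXL page percentage within [0, 100].

    measure(p) returns the throughput observed with p% of pages on CXL memory
    (higher is better). Move to a neighbor that improves throughput; when
    neither neighbor does, halve the step until it falls below min_step.
    """
    percent = start
    best = measure(percent)
    while step >= min_step:
        moved = False
        for candidate in (percent + step, percent - step):
            if 0.0 <= candidate <= 100.0:
                score = measure(candidate)
                if score > best:
                    percent, best, moved = candidate, score, True
                    break
        if not moved:
            step /= 2
    return percent


def toy_throughput(p: float) -> float:
    # Made-up throughput curve for illustration only; it peaks at 30% CXL pages.
    return -((p - 30.0) ** 2)


if __name__ == "__main__":
    target = tune_cxl_percent(toy_throughput)
    pages = place_pages(20, target)
    print(target, pages.count("CXL"))
```

Separating the placement rule from the tuner keeps the search independent of how pages are actually mapped. Real throughput samples are noisy and workloads change phase, so a practical policy would likely need smoothing and periodic re-tuning, which this sketch omits.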

Information

Published In

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Compute eXpress Link
  2. measurement
  3. tiered-memory management

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '23
Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2,863
  • Downloads (last 6 weeks): 254
Reflects downloads up to 22 Sep 2024.

Cited By

  • (2024) Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process Scheduling. Proceedings of the ACM Turing Award Celebration Conference - China 2024, pp. 6-11. DOI: 10.1145/3674399.3674411. Online publication date: 5-Jul-2024.
  • (2024) Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory. Proceedings of the 53rd International Conference on Parallel Processing, pp. 597-606. DOI: 10.1145/3673038.3673138. Online publication date: 12-Aug-2024.
  • (2024) An Introduction to the Compute Express Link (CXL) Interconnect. ACM Computing Surveys, vol. 56, no. 11, pp. 1-37. DOI: 10.1145/3669900. Online publication date: 8-Jul-2024.
  • (2024) So Far and yet so Near - Accelerating Distributed Joins with CXL. Proceedings of the 20th International Workshop on Data Management on New Hardware, pp. 1-9. DOI: 10.1145/3662010.3663449. Online publication date: 10-Jun-2024.
  • (2024) Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems. ACM Transactions on Storage, vol. 20, no. 3, pp. 1-30. DOI: 10.1145/3656477. Online publication date: 6-Jun-2024.
  • (2024) Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, pp. 108-115. DOI: 10.1145/3655038.3665953. Online publication date: 8-Jul-2024.
  • (2024) OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, pp. 1-8. DOI: 10.1145/3626203.3670533. Online publication date: 17-Jul-2024.
  • (2024) The Breakthrough Memory Solutions for Improved Performance on LLM Inference. IEEE Micro, vol. 44, no. 3, pp. 40-48. DOI: 10.1109/MM.2024.3375352. Online publication date: May-2024.
  • (2024) Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 382-395. DOI: 10.1109/ISCA59077.2024.00036. Online publication date: 29-Jun-2024.
  • (2024) Synergizing CXL with Unified Memory for Scalable GPU Memory Expansion. 2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4. DOI: 10.1109/ICEIC61013.2024.10457110. Online publication date: 28-Jan-2024.