Exploring Performance and Cost Optimization with ASIC-Based CXL Memory

Authors:

Jianjun Chen

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems

Pages 818 - 833

https://doi.org/10.1145/3627703.3650061

Published: 22 April 2024

Abstract

As memory-intensive applications continue to drive the need for advanced architectural solutions, Compute Express Link (CXL) has emerged as a promising interconnect technology that enables seamless high-speed, low-latency communication between host processors and various peripheral devices. In this study, we explore the application performance of ASIC CXL memory in various data-center scenarios. We then explore multiple potential impacts (e.g., throughput, latency, and cost reduction) of employing CXL memory via carefully designed policies and strategies. Our empirical results show the high potential of CXL memory, reveal multiple intriguing observations about CXL memory, and contribute to the wide adoption of CXL memory in real-world deployment environments. Based on our benchmarks, we also develop an Abstract Cost Model that can estimate the cost benefit of using CXL memory.

Published In

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems

April 2024

1245 pages

ISBN:9798400704376

DOI:10.1145/3627703

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

EuroSys '24

Sponsor:

SIGOPS

EuroSys '24: Nineteenth European Conference on Computer Systems

April 22 - 25, 2024

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Citations

Cited By

Yin J, Wu F, Su H (2024). An Image-Retrieval Method Based on Cross-Hardware Platform Features. Applied System Innovation 7:4 (64). Online publication date: 23-Jul-2024. https://doi.org/10.3390/asi7040064

Wu J, Liu J, Kestor G, Gioiosa R, Li D, Marquez A (2024). Performance Study of CXL Memory Topology. Proceedings of the International Symposium on Memory Systems (172-177). Online publication date: 30-Sep-2024. https://dl.acm.org/doi/10.1145/3695794.3695809

Wang H, Dai H, Chen S, Chen G (2024). Rethinking Hash Tables: Challenges and Opportunities with Compute Express Link (CXL). Proceedings of the ACM Turing Award Celebration Conference - China 2024 (23-27). Online publication date: 5-Jul-2024. https://dl.acm.org/doi/10.1145/3674399.3674418

Tang W, Ai T, Wu J (2024). Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process Scheduling. Proceedings of the ACM Turing Award Celebration Conference - China 2024 (6-11). Online publication date: 5-Jul-2024. https://dl.acm.org/doi/10.1145/3674399.3674411
