Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3627703.3650061acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article
Open access

Exploring Performance and Cost Optimization with ASIC-Based CXL Memory

Published: 22 April 2024 Publication History

Abstract

As memory-intensive applications continue to drive the need for advanced architectural solutions, Compute Express Link (CXL) has risen as a promising interconnect technology that enables seamless high-speed, low-latency communication between host processors and various peripheral devices. In this study, we explore the application performance of ASIC CXL memory in various data-center scenarios. We then further explore multiple potential impacts (e.g., throughput, latency, and cost reduction) of employing CXL memory via carefully designed policies and strategies. Our empirical results show the high potential of CXL memory, reveal multiple intriguing observations of CXL memory and contribute to the wide adoption of CXL memory in real-world deployment environments. Based on our benchmarks, we also develop an Abstract Cost Model that can estimate the cost benefit from using CXL memory.

References

[1]
Dez Blanchfield. The cloud native convergence: A new era of data-intensive applications. https://elnion.com/2023/06/05/the-cloud-native-convergence-a-new-era-of-data-intensive- applications/.
[2]
Ahmed Abulila, Vikram Sharma Mailthody, Zaid Qureshi, Jian Huang, Nam Sung Kim, Jinjun Xiong, and Wen-mei Hwu. Flatflash: Exploiting the byte-accessibility of ssds within a unified memory-storage hierarchy. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '19, page 971--985, New York, NY, USA, 2019. Association for Computing Machinery.
[3]
Shao-Peng Yang, Minjae Kim, Sanghyun Nam, Juhyung Park, Jin yong Choi, Eyee Hyun Nam, Eunji Lee, Sungjin Lee, and Bryan S. Kim. Overcoming the memory wall with CXL-Enabled SSDs. In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 601--617, Boston, MA, July 2023. USENIX Association.
[4]
Compute Express Link (CXL). https://www.computeexpresslink.org/.
[5]
Huaicheng Li, Daniel S Berger, Stanko Novakovic, Lisa Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Ishwar Agarwal, Mark Hill, Marcus Fontoura, et al. First-generation memory disaggregation for cloud platforms. arXiv preprint arXiv:2203.00241, 2022.
[6]
Albert Cho, Anish Saxena, Moinuddin Qureshi, and Alexandros Daglis. A case for cxl-centric server processors, 2023.
[7]
Leo Memory Connectivity Platform for CXL 1.1 and 2.0. https://www.asteralabs.com/wp-content/uploads/2022/08/Astera_Labs_Leo_Aurora_Product_FINAL.pdf.
[8]
Huaicheng Li, Daniel S. Berger, Stanko Novakovic, Lisa R. Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura, and Ricardo Bianchini. Pond: Cxl-based memory pooling systems for cloud platforms. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2022.
[9]
Hasan Al Maruf, Hao Wang, Abhishek Dhanotia, Johannes Weiner, Niket Agarwal, Pallab Bhattacharya, Chris Petersen, Mosharaf Chowdhury, Shobhit Kanaujia, and Prakash Chauhan. Tpp: Transparent page placement for cxl-enabled tiered-memory. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, ASPLOS 2023, page 742--755, New York, NY, USA, 2023. Association for Computing Machinery.
[10]
Daniel S. Berger, Daniel Ernst, Huaicheng Li, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Lisa Hsu, Ishwar Agarwal, Mark D. Hill, and Ricardo Bianchini. Design tradeoffs in cxl-based memory pools for public cloud platforms. IEEE Micro, 43(2):30--38, 2023.
[11]
Yan Sun, Yifan Yuan, Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong, Ren Wang, Jung Ho Ahn, Tianyin Xu, and Nam Sung Kim. Demystifying cxl memory with genuine cxl-ready systems and devices. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '23, page 105--121, New York, NY, USA, 2023. Association for Computing Machinery.
[12]
Donghyun Gouk, Sangwon Lee, Miryeong Kwon, and Myoungsoo Jung. Direct access, High-Performance memory disaggregation with DirectCXL. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), pages 287--294, Carlsbad, CA, July 2022. USENIX Association.
[13]
Kyungsan Kim, Hyunseok Kim, Jinin So, Wonjae Lee, Junhyuk Im, Sungjoo Park, Jeonghyeon Cho, and Hoyoung Song. Smt: Software-defined memory tiering for heterogeneous computing systems with cxl memory expander. IEEE Micro, 43(2):20--29, 2023.
[14]
Daniel S. Berger, Daniel Ernst, Huaicheng Li, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Lisa Hsu, Ishwar Agarwal, Mark D. Hill, and Ricardo Bianchini. Design tradeoffs in cxl-based memory pools for public cloud platforms. IEEE Micro, 43(2):30--38, 2023.
[15]
Debendra Das Sharma. Compute express link (cxl): Enabling heterogeneous data-centric computing with heterogeneous memory hierarchy. IEEE Micro, 43(2):99--109, 2022.
[16]
What Are PCIe 4.0 and 5.0? https://www.intel.com/content/www/us/en/gaming/resources/what-is-pcie-4-and-why-does-it-matter.html.
[17]
Debendra Das Sharma, Robert Blankenship, and Daniel S. Berger. An introduction to the compute express link (cxl) interconnect, 2023.
[18]
Intel Corporation. Intel launches 4th gen xeon scalable processors, max series cpus. https://www.intel. com/content/www/us/en/newsroom/news/.
[19]
AMD Unveils Zen 4 CPU Roadmap: 96-Core 5nm Genoa in 2022, 128-Core Bergamo in 2023. https://wccftech.com/intel-clearwater-forest-e-core-xeon-cpus-up-to-288-cores-higher-ipc-more-cache/.
[20]
Montage Technology. Cxl memory expander controller (mxc). https://www.montage-tech.com/MXC, accessedin2023.
[21]
CZ120 memory expansion module. https://www.micron.com/products/memory/cxl-memory.
[22]
Minseon Ahn, Andrew Chang, Donghun Lee, Jongmin Gim, Jungmin Kim, Jaemin Jung, Oliver Rebholz, Vincent Pham, Krishna Malladi, and Yang Seok Ki. Enabling cxl memory expansion for in-memory database management systems. In Proceedings of the 18th International Workshop on Data Management on New Hardware, DaMoN '22, New York, NY, USA, 2022. Association for Computing Machinery.
[23]
Intel Corporation. Intel Agilex® 7 FPGA and SoC FPGA I-Series. https://www.intel.com/content/www/us/en/products/details/fpga/agilex/7/i-series.html.
[24]
Ian Kuon and Jonathan Rose. Measuring the gap between fpgas and asics. In Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, FPGA '06, page 21--30, New York, NY, USA, 2006. Association for Computing Machinery.
[25]
J. Weiner. [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes. https://lore.kernel.org/linux-mm/YqD0%[email protected]/T/.
[26]
NUMA balancing: optimize memory placement for memory tiering system. https://lore.kernel.org/linux-mm/[email protected]/.
[27]
Tiered memory: Hot page selection. https://lore.kernel.org/lkml/[email protected]/T/.
[28]
Transparent Page Placement for Tiered-Memory. https://lore.kernel.org/all/[email protected]/.
[29]
David L Mulnix. Intel® Xeon® Processor Scalable Family Technical Overview. https://www.intel.com/content/www/us/en/developer/articles/technical/xeon-processor-scalable-family-technical-overview.html.
[30]
A. Cho and et al. A Case for CXL-Centric Server Processors. https://arxiv.org/abs/2305.05033.
[31]
Jifei Yi, Benchao Dong, Mingkai Dong, Ruizhe Tong, and Haibo Chen. MT2: Memory bandwidth regulation on hybrid NVM/DRAM platforms. In 20th USENIX Conference on File and Storage Technologies (FAST 22), pages 199--216, Santa Clara, CA, February 2022. USENIX Association.
[32]
Intel Corporation. Intel® Performance Counter Monitor (Intel® PCM). https://github.com/intel/pcm.
[33]
Intel Corporation. Intel Unveils Future-Generation Xeon with Robust Performance and Efficiency Architectures. https://www.intel.com/content/www/us/en/newsroom/news/intel-unveils-future-generation-xeon.html.
[34]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, page 143--154, New York, NY, USA, 2010. Association for Computing Machinery.
[35]
Redis. https://redis.io/.
[36]
Tecton.ai. Managing your Redis Cluster. https://docs.tecton.ai/docs/0.5/setting-up-tecton/setting-up-other-components/managing- your-redis-cluster.
[37]
Google Cloud. Memory management best practices. https://cloud.google.com/memorystore/docs/redis/memory-management-best-practices.
[38]
Redis enterprise. https://redis.io/docs/about/redis-enterprise/, 2023.
[39]
Auto Tiering Extend Redis Enterprise databases beyond DRAM limits. https://redis.com/redis-enterprise/technology/auto-tiering/#:~:text=Redis%20Enterprise's%20auto%20tiering%20lets,compared%20to%20only%20DRAM%20deployments.
[40]
Xuechen Zhang, Ujjwal Khanal, Xinghui Zhao, and Stephen Ficklin. Making sense of performance in in-memory computing frameworks for scientific data analysis: A case study of the spark system. J. Parallel Distrib. Comput., 120(C):369--382, oct 2018.
[41]
Apache Spark. Unified engine for large-scale data analytics. https://spark.apache.org/.
[42]
Chen Zou, Hui Zhang, Andrew A. Chien, and Yang Seok Ki. Psacs: Highly-parallel shuffle accelerator on computational storage. In 2021 IEEE 39th International Conference on Computer Design (ICCD), pages 480--487, 2021.
[43]
TPC-H is a Decision Support Benchmark. https://www.tpc.org/tpch/.
[44]
Ice Lake SP: Overview and technical documentation. (n.d.). Intel. ht tps://www.intel.com/content/www/us/en/products/platforms/details/ice-lake-sp.html.
[45]
4th Gen Intel Xeon Processor Scalable Family, sapphire rapids. (n.d.). Intel. https: //www.intel.com/content/www/us/en/developer/articles/technical/fourth-generation-xeon-scalable-family-overview.html#gs.3m5uv2.
[46]
McDowell, S. (2023, December 18). Intel launches 5th generation "Emerald Rapids" Xeon processors. Forbes. https://www.forbes.com/sites/stevemcdowell/2023/12/17/intel-launches-5th-generation-emerald-rapids-xeon-processors/.
[47]
Kennedy, Patrick. "Intel Shows Granite Rapids and Sierra Forest Motherboards at OCP Summit 2023." ServeTheHome, 26 Oct. 2023,. www.servethehome.com/intel-shows-granite-rapids-and-sierra-forest-motherboards-at-ocp-summit-2023-qct-wistron.
[48]
Mujtaba, H. (2023, December 1). Intel Clearwater Forest E-Core Only Xeon CPUs to offer up to 288 cores. https://wccftech.com/intel-clearwater-forest-e-core-xeon-cpus-up-to-288-cores-higher-ipc-more-cache/.
[49]
Sangho Yi, Derrick Kondo, and Artur Andrzejak. Reducing costs of spot instances via checkpointing in the amazon elastic compute cloud. In 2010 IEEE 3rd International Conference on Cloud Computing, pages 236--243, 2010.
[50]
Amazon EC2 M7a Instances. https://aws.amazon.com/ec2/instance-types/m7a/, 2023.
[51]
Amazon EC2 M7i Instances. https://aws.amazon.com/ec2/instance-types/m7i/, 2023.
[52]
Intel Shows Granite Rapids and Sierra Forest Motherboards at OCP Summit 2023. https://www.servethehome.com/intel-shows-granite-rapids-and-sierra-forest-motherboards-at-ocp-summit-2023-qct-wistron/.
[53]
Elastic Compute Service, Volcano Engine, Bytedance. https://www.volcengine.com/product/ecs.
[54]
G. Wong D. Patel. GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE. https://www.semianalysis.com/p/gpt-4-architecture-infrastructure, 2023.
[55]
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention, 2023.
[56]
Lightllm A Light and Fast Inference Service for LLM. https://github.com/ModelTC/lightllm.
[57]
Julien Simon.Smaller is Better: Q8-Chat LLM is an Efficient Generative AI Experience on Intel® Xeon® Processors. https://www.intel.com/content/www/us/en/developer/articles/case-study/q8-chat-efficient-generative-ai-experience-xeon.html.
[58]
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. Efficiently scaling transformer inference, 2022.
[59]
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP '23, page 611--626, New York, NY, USA, 2023. Association for Computing Machinery.
[60]
Alpaca: A Strong, Replicable Instruction-Following Model. https://crfm.stanford.edu/2023/03/13/alpaca.html.
[61]
Nvidia GH200 Datasheet. https://resources.nvidia.com/en-us-dgx-gh200/nvidia-grace-hopper-superchip-datasheet.
[62]
Apple Introduces M2 Ultra. https://www.apple.com/newsroom/2023/06/apple-introduces-m2-ultra/.
[63]
N. Jouppi and et al. Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings. In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023.
[64]
PCI-SIG explores an optical interconnect for higher PCIe performance. https://www.eenewseurope.com/en/pci-sig-explores-an-optical-connections-for-higher-pcie-performance/.
[65]
Ultra Ethernet Consortium. https://ultraethernet.org/.
[66]
Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang. LegoOS: A disseminated, distributed OS for hardware resource disaggregation. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 69--87, Carlsbad, CA, October 2018. USENIX Association.
[67]
Seung-seob Lee, Yanpeng Yu, Yupeng Tang, Anurag Khandelwal, Lin Zhong, and Abhishek Bhattacharjee. Mind: In-network memory management for disaggregated data centers. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, SOSP '21, page 488--504, New York, NY, USA, 2021. Association for Computing Machinery.
[68]
Qirui Yang, Runyu Jin, Bridget Davis, Devasena Inupakutika, and Ming Zhao. Performance evaluation on cxl-enabled hybrid memory pool. In 2022 IEEE International Conference on Networking, Architecture and Storage (NAS), pages 1--5, 2022.

Cited By

View all
  • (2024)An Image-Retrieval Method Based on Cross-Hardware Platform FeaturesApplied System Innovation10.3390/asi70400647:4(64)Online publication date: 23-Jul-2024
  • (2024)Rethinking Hash Tables: Challenges and Opportunities with Compute Express Link (CXL)Proceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674418(23-27)Online publication date: 5-Jul-2024
  • (2024)Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process SchedulingProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674411(6-11)Online publication date: 5-Jul-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
April 2024
1245 pages
ISBN:9798400704376
DOI:10.1145/3627703
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 April 2024

Check for updates

Author Tags

  1. CXL-Memory
  2. Datacenters
  3. Memory Management
  4. Operating Systems
  5. measurement

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3,313
  • Downloads (Last 6 weeks)717
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)An Image-Retrieval Method Based on Cross-Hardware Platform FeaturesApplied System Innovation10.3390/asi70400647:4(64)Online publication date: 23-Jul-2024
  • (2024)Rethinking Hash Tables: Challenges and Opportunities with Compute Express Link (CXL)Proceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674418(23-27)Online publication date: 5-Jul-2024
  • (2024)Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process SchedulingProceedings of the ACM Turing Award Celebration Conference - China 202410.1145/3674399.3674411(6-11)Online publication date: 5-Jul-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media