DOI: 10.1145/3663408.3663418
poster

LEFT: LightwEight and FasT packet Reordering for RDMA

Published: 03 August 2024

Abstract

RDMA has been widely adopted in large-scale data centers because it offers low and stable latency, high throughput, and low CPU utilization. However, because the on-chip memory of RDMA Network Interface Cards (RNICs) is limited, commercial RDMA usually supports only single-path transmission and cannot fully exploit the many parallel paths in the data center network (DCN), which leaves bandwidth underutilized. Multipath transmission can improve bandwidth utilization, but the out-of-order (OoO) packets it introduces degrade RNIC performance. Recent works have attempted to address this issue by using bitmaps to record OoO packets, so that multipath transmission is better supported. However, these approaches either consume excessive memory for maintaining bitmaps, leading to poor connection scalability, or introduce high latency in bitmap sharing. Consequently, implementing efficient packet reordering in RDMA remains a challenge.
To this end, we propose LEFT, a fast and lightweight packet reordering scheme for RDMA. It is fast because it processes packet reordering at speeds exceeding 200 Gbps on the RNIC, and it is scalable because it introduces enhanced dual-state shared bitmap schemes that consume minimal memory even under high concurrency. Specifically, LEFT splits packet reordering into a fast path and a slow path to reduce latency when the RTTs of different paths diverge. Simulation shows that LEFT maintains a throughput of over 94% even when the RTTs of different paths differ by a factor of 32, which is 180% higher than the throughput achieved with an ordinary shared bitmap pool. By adding an average of only 7 B of extra on-chip state per connection, LEFT achieves up to 1.54x higher throughput and reduces the 99th-percentile tail flow completion time (FCT) by 29% compared to single-path transmission.
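The general idea of using a bitmap to record OoO packets, which the abstract refers to, can be illustrated with a minimal sketch. This is a generic per-connection sliding-window bitmap, not LEFT's dual-state shared bitmap scheme; the class name, the status strings, and the window size are assumptions made for illustration only.

```python
class ReorderBitmap:
    """Generic per-connection bitmap for tracking out-of-order packets.

    Illustrative only: this is NOT the scheme proposed in the paper.
    The names and the default window size are hypothetical.
    """

    def __init__(self, window: int = 64) -> None:
        self.window = window   # how far ahead of `expected` a packet may land
        self.expected = 0      # next in-order packet sequence number (PSN)
        self.bits = 0          # bit i set => PSN (expected + i) has arrived

    def receive(self, psn: int) -> str:
        """Record one arriving packet and report how it was handled."""
        offset = psn - self.expected
        if offset < 0:
            return "duplicate"          # already delivered in order
        if offset >= self.window:
            return "out_of_window"      # bitmap too small to record it
        if (self.bits >> offset) & 1:
            return "duplicate"          # already recorded as out of order
        self.bits |= 1 << offset
        if offset != 0:
            return "buffered"           # a gap remains before this packet
        # The expected packet arrived: slide past the contiguous run.
        while self.bits & 1:
            self.bits >>= 1
            self.expected += 1
        return "in_order"


if __name__ == "__main__":
    rx = ReorderBitmap(window=8)
    for psn in (1, 2, 0, 1):
        print(psn, rx.receive(psn), "expected =", rx.expected)
```

In this sketch every connection owns `window` bits of state, so the memory cost grows linearly with the number of connections. That is the scalability pressure that motivates sharing bitmaps across connections.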

    Published In

    APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
    August 2024
    230 pages
    ISBN:9798400717581
    DOI:10.1145/3663408
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data centers
    2. network architecture
    3. network hardware
    4. transport layer

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Funding Sources

    • kh2103016
    • 62172148
    • 2023YFB3002203
    • kh2401005
    • 62222204

    Conference

    APNet 2024

    Acceptance Rates

    APNet '24 Paper Acceptance Rate 50 of 118 submissions, 42%;
    Overall Acceptance Rate 50 of 118 submissions, 42%

    Article Metrics

    • Downloads (Last 12 months): 139
    • Downloads (Last 6 weeks): 39
    Reflects downloads up to 24 Jan 2025

    Cited By

    • (2025) SDLoRe: A loss recovery algorithm based on segment detection in lossy RDMA networks. Computer Networks 258 (111019). DOI: 10.1016/j.comnet.2024.111019. Online publication date: Feb-2025
    • (2024) SSPRD: A Shared-Storage-Based Hardware Packet Reordering and Deduplication System for Multipath Transmission in Wide Area Networks. Micromachines 15:11 (1323). DOI: 10.3390/mi15111323. Online publication date: 30-Oct-2024
    • (2024) ORNIC: A High-Performance RDMA NIC with Out-of-Order Packet Direct Write Method for Multipath Transmission. Electronics 14:1 (88). DOI: 10.3390/electronics14010088. Online publication date: 28-Dec-2024
    • (2024) Design of a Fast and Scalable FPGA-Based Bitmap for RDMA Networks. Electronics 13:24 (4900). DOI: 10.3390/electronics13244900. Online publication date: 12-Dec-2024
