DOI: 10.1145/3603269.3604837
research-article

Understanding the Micro-Behaviors of Hardware Offloaded Network Stacks with Lumina

Published: 01 September 2023

Abstract

Hardware offloaded network stacks are widely adopted in modern datacenters to meet the demand for high throughput, ultra-low latency, and low CPU overhead. To fully leverage their performance, users need a deep understanding of their behaviors. Despite many efforts on testing software network stacks, hardware network stacks pose unique challenges for testing tools due to their kernel-bypass nature and high performance.
In this paper, we present Lumina, a tool to test the correctness and performance of hardware network stacks. Lumina leverages network programmability to emulate various network scenarios at line rate. With user-friendly interfaces, Lumina enables developers to inject deterministic events, thus facilitating the development of precise and reproducible tests. Given the limited resources and flexibility of programmable network devices, we mirror all the packets to dedicated servers and dump them for offline analysis. We use Lumina to test four RDMA NICs from NVIDIA and Intel, and identify bugs that can significantly degrade performance or mislead network operations. Lumina also enables us to capture unexpected micro-behaviors that are missing or not clearly described in public documents and specifications. Vendors have confirmed the critical bugs we discovered and will include bug fixes in future releases.
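
The idea of deterministic event injection combined with packet mirroring and offline analysis can be illustrated with a small software model. This is only a sketch under stated assumptions, not Lumina's implementation or interface: the names `Packet`, `DropEvent`, `inject`, and `retransmissions` are hypothetical, packets are reduced to a flow label and a packet sequence number (PSN), and the "switch" is a plain Python loop rather than a programmable data plane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow: str  # hypothetical flow identifier (e.g. a queue pair label)
    psn: int   # packet sequence number


@dataclass(frozen=True)
class DropEvent:
    flow: str
    psn: int   # drop the first packet of this flow carrying this PSN


def inject(packets, events):
    """Forward packets while applying drop events; mirror every packet seen."""
    pending = {(e.flow, e.psn) for e in events}
    forwarded, mirror = [], []
    for pkt in packets:
        key = (pkt.flow, pkt.psn)
        dropped = key in pending
        if dropped:
            pending.discard(key)  # fire once, so a retransmission passes
        else:
            forwarded.append(pkt)
        mirror.append((pkt, dropped))  # the mirror records drops too
    return forwarded, mirror


def retransmissions(mirror):
    """Offline analysis: (flow, psn) pairs that appear more than once."""
    seen, repeats = set(), []
    for pkt, _ in mirror:
        key = (pkt.flow, pkt.psn)
        if key in seen:
            repeats.append(key)
        seen.add(key)
    return repeats


# Assumed trace: the sender goes back to PSN 1 after the loss.
trace = [Packet("qp1", n) for n in (0, 1, 2, 1, 2)]
forwarded, mirror = inject(trace, [DropEvent("qp1", 1)])
print([p.psn for p in forwarded])
print(retransmissions(mirror))
```

Because the event is keyed on a specific flow and PSN rather than on a random loss rate, rerunning the same trace produces the same drop, which is what makes a test built this way reproducible.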

          Published In

          ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
          September 2023
          1217 pages
          ISBN:9798400702365
          DOI:10.1145/3603269
          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. network testing
          2. hardware offloaded network stack
          3. programmable networking
          4. RDMA
          5. event injection

          Conference

          ACM SIGCOMM '23
          ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference
          September 10, 2023
          NY, New York, USA

          Acceptance Rates

          Overall Acceptance Rate 462 of 3,389 submissions, 14%

          Cited By

          • (2024) Hostmesh: Monitor and Diagnose Networks in Rail-optimized RoCE Clusters. Proceedings of the 8th Asia-Pacific Workshop on Networking, 122-128. DOI: 10.1145/3663408.3663426. Online publication date: 3-Aug-2024
          • (2024) R-Pingmesh: A Service-Aware RoCE Network Monitoring and Diagnostic System. Proceedings of the ACM SIGCOMM 2024 Conference, 554-567. DOI: 10.1145/3651890.3672264. Online publication date: 4-Aug-2024
          • (2024) μMon: Empowering Microsecond-level Network Monitoring with Wavelets. Proceedings of the ACM SIGCOMM 2024 Conference, 274-290. DOI: 10.1145/3651890.3672236. Online publication date: 4-Aug-2024
          • (2024) Diagnosing End-Host Network Bottlenecks in RDMA Servers. IEEE/ACM Transactions on Networking 32:5, 4302-4316. DOI: 10.1109/TNET.2024.3416419. Online publication date: Oct-2024
