DOI: 10.1145/3563766.3564110
research-article
Open access

Understanding host interconnect congestion

Published: 14 November 2022

Abstract

We present evidence and a characterization of host congestion in production clusters: the adoption of high-bandwidth access links is leading to the emergence of bottlenecks within the host interconnect (the NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delay and eventual packet drops at hosts (even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion). We also discuss the implications of host interconnect congestion for the design of future host architectures, network stacks, and network protocols.

Information & Contributors

Information

Published In

HotNets '22: Proceedings of the 21st ACM Workshop on Hot Topics in Networks
November 2022
252 pages
ISBN: 9781450398992
DOI: 10.1145/3563766
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. congestion control
  2. datacenter transport
  3. network hardware

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

HotNets '22

Acceptance Rates

Overall Acceptance Rate 110 of 460 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 783
  • Downloads (Last 6 weeks): 87
Reflects downloads up to 17 Oct 2024

Citations

Cited By

  • (2024) vSwitchLB: Stratified Load Balancing for vSwitch Efficiency in Data Centers. Proceedings of the 8th Asia-Pacific Workshop on Networking, 95-101. DOI: 10.1145/3663408.3663422. Online publication date: 3-Aug-2024
  • (2024) Rethinking Intra-host Congestion Control in RDMA Networks. Proceedings of the 8th Asia-Pacific Workshop on Networking, 31-37. DOI: 10.1145/3663408.3663413. Online publication date: 3-Aug-2024
  • (2024) Mitigating Intra-host Network Congestion with SmartNIC. 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS61813.2024.10682939. Online publication date: 19-Jun-2024
  • (2024) Userspace Networking in gem5. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 179-191. DOI: 10.1109/ISPASS61541.2024.00026. Online publication date: 5-May-2024
  • (2023) Throughput Optimization with a NUMA-Aware Runtime System for Efficient Scientific Data Streaming. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 795-805. DOI: 10.1145/3624062.3624593. Online publication date: 12-Nov-2023
  • (2023) Recent Advances in Data Intensive Applications: Survey. 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), 1-6. DOI: 10.1109/WINCOM59760.2023.10322920. Online publication date: 26-Oct-2023
  • (2023) Early Marking for Controllable Maximum Queue Length in Data Center Networks. 2023 32nd International Conference on Computer Communications and Networks (ICCCN), 1-10. DOI: 10.1109/ICCCN58024.2023.10230181. Online publication date: Jul-2023
  • (2023) Addressing Endpoint-Induced Congestion for Accelerator Scale-Out in a Medium-Scale Domain. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 1-8. DOI: 10.1109/HPEC58863.2023.10363611. Online publication date: 25-Sep-2023
