DOI: 10.1145/3544216.3544230

Towards μs tail latency and terabit ethernet: disaggregating the host network stack

Published: 22 August 2022

Abstract

Dedicated, tightly integrated, and static packet processing pipelines in today's most widely deployed network stacks preclude them from fully exploiting the capabilities of modern hardware.
We present NetChannel, a disaggregated network stack architecture for μs-scale applications running atop Terabit Ethernet. NetChannel's disaggregated architecture enables independent scaling and scheduling of the resources allocated to each layer in the packet processing pipeline. Using an end-to-end NetChannel realization within the Linux network stack, we demonstrate that NetChannel enables new operating points: (1) a single application thread can saturate multi-hundred-gigabit access link bandwidth; (2) small-message processing scales near-linearly with the number of cores, independent of the number of application threads; and (3) latency-sensitive applications are isolated, maintaining μs-scale tail latency even when competing with throughput-bound applications operating at near-line rate.
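
As a rough illustration of what "independent scaling of each layer" can mean, the following toy sketch connects pipeline layers only through queues, so the number of workers serving one layer can be changed without touching the others. It is not NetChannel's implementation: the names `run_pipeline` and `_worker`, the thread-per-worker model, and the two example layers are all assumptions made for illustration.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit (hypothetical helper)


def _worker(fn, inbox, outbox):
    """Pull items from inbox, apply fn, and push the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            return
        outbox.put(fn(item))


def run_pipeline(items, layers):
    """Run items through layers, where each layer is (fn, worker_count).

    Layers share nothing but the queues between them, so each layer's
    worker count is an independent knob.
    """
    queues = [queue.Queue() for _ in range(len(layers) + 1)]

    # Start every layer's worker pool up front so layers run concurrently.
    pools = []
    for i, (fn, count) in enumerate(layers):
        pool = [
            threading.Thread(target=_worker, args=(fn, queues[i], queues[i + 1]))
            for _ in range(count)
        ]
        for thread in pool:
            thread.start()
        pools.append(pool)

    for item in items:
        queues[0].put(item)

    # Shut down layer by layer: once a layer's workers have exited, all of
    # its output is already queued for the next layer.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(_STOP)
        for thread in pool:
            thread.join()

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return sorted(results)  # sorted because workers may finish out of order
```

For example, `run_pipeline(range(5), [(lambda x: x + 1, 1), (lambda x: x * 2, 4)])` gives the second layer four workers while the first keeps one; the results are the same as with any other worker split, only the parallelism per layer differs.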

Supplementary Material

PDF File (p767-cai-supp.pdf)
Supplemental material.

Published In

SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference
August 2022
858 pages
ISBN: 9781450394208
DOI: 10.1145/3544216
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. network stack
  2. operating system
  3. terabit ethernet

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • Google faculty research award
  • National Research Foundation of Korea (NRF)
  • Sloan fellowship

Conference

SIGCOMM '22: ACM SIGCOMM 2022 Conference
August 22 - 26, 2022
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Cited By

  • (2025) Rapid Data Ingestion through DB-OS Co-design. Proceedings of the ACM on Management of Data 3:1 (1-28). DOI: 10.1145/3709718. Online publication date: 11-Feb-2025
  • (2024) vFPIO. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference (1167-1184). DOI: 10.5555/3691992.3692063. Online publication date: 10-Jul-2024
  • (2024) OSMOSIS. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference (247-263). DOI: 10.5555/3691992.3692007. Online publication date: 10-Jul-2024
  • (2024) High-throughput and flexible host networking for accelerated computing. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (405-423). DOI: 10.5555/3691938.3691960. Online publication date: 10-Jul-2024
  • (2024) Lightweight Automated Reasoning for Network Architectures. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks (237-245). DOI: 10.1145/3696348.3696865. Online publication date: 18-Nov-2024
  • (2024) LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. Proceedings of the 32nd ACM International Conference on Multimedia (7918-7927). DOI: 10.1145/3664647.3681670. Online publication date: 28-Oct-2024
  • (2024) Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. Proceedings of the ACM SIGCOMM 2024 Conference (750-763). DOI: 10.1145/3651890.3672224. Online publication date: 4-Aug-2024
  • (2024) The Open-source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-32). DOI: 10.1145/3624482. Online publication date: 13-Mar-2024
  • (2024) Userspace Networking in gem5. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (179-191). DOI: 10.1109/ISPASS61541.2024.00026. Online publication date: 5-May-2024
  • (2024) Poster: Enhancing the Performance of a Single Connection Using Multipath Quic. 2024 IEEE 32nd International Conference on Network Protocols (ICNP) (1-3). DOI: 10.1109/ICNP61940.2024.10858527. Online publication date: 28-Oct-2024
