research-article

Towards μs tail latency and terabit ethernet: disaggregating the host network stack

Authors:

Qizhe Cai,

Midhul Vuppalapati,

Jaehyun Hwang,

Christos Kozyrakis,

Rachit Agarwal

SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference

Pages 767 - 779

https://doi.org/10.1145/3544216.3544230

Published: 22 August 2022

Abstract

Dedicated, tightly integrated, and static packet processing pipelines in today's most widely deployed network stacks preclude them from fully exploiting the capabilities of modern hardware.

We present NetChannel, a disaggregated network stack architecture for μs-scale applications running atop Terabit Ethernet. NetChannel's disaggregated architecture enables independent scaling and scheduling of resources allocated to each layer in the packet processing pipeline. Using an end-to-end NetChannel realization within the Linux network stack, we demonstrate that NetChannel enables new operating points: (1) enabling a single application thread to saturate multi-hundred-gigabit access link bandwidth; (2) enabling near-linear scalability for small message processing with the number of cores, independent of the number of application threads; and (3) enabling isolation of latency-sensitive applications, allowing them to maintain μs-scale tail latency even when competing with throughput-bound applications operating at near-line rate.

Supplementary Material

p767-cai-supp.pdf (PDF, 83.68 KB): supplemental material.




Published In

SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference

August 2022

858 pages

ISBN: 9781450394208

DOI: 10.1145/3544216

General Chairs:
Fernando Kuipers, Delft University of Technology
Ariel Orda, Technion Israel Institute of Technology

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

NSF
Google faculty research award
National Research Foundation of Korea (NRF)
Sloan fellowship

Conference

SIGCOMM '22

Sponsor:

SIGCOMM

SIGCOMM '22: ACM SIGCOMM 2022 Conference

August 22 - 26, 2022

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Bibliometrics

Total citations: 10

Total downloads: 1,979 (479 in the last 12 months; 33 in the last 6 weeks)

Reflects downloads up to 11 Feb 2025.

Cited By

Lim K, Yoon M, Kim K, Fekete A, Jung H (2025). Rapid Data Ingestion through DB-OS Co-design. Proceedings of the ACM on Management of Data 3:1, 1-28. Online publication date: 11-Feb-2025. https://dl.acm.org/doi/10.1145/3709718
Chen J, Unnibhavi H, Koshiba A, Bhatotia P, Bagchi S, Zhang Y (2024). vFPIO. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, 1167-1184. Online publication date: 10-Jul-2024. https://dl.acm.org/doi/10.5555/3691992.3692063
Khalilov M, Chrapek M, Shen S, Vezzu A, Benz T, Di Girolamo S, Schneider T, De Sensi D, Benini L, Hoefler T, Bagchi S, Zhang Y (2024). OSMOSIS. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, 247-263. Online publication date: 10-Jul-2024. https://dl.acm.org/doi/10.5555/3691992.3692007
Skiadopoulos A, Xie Z, Zhao M, Cai Q, Agarwal S, Adelmann J, Ahern D, Contavalli C, Goldflam M, Mayatskikh V, Raja R, Walton D, Agarwal R, Mukherjee S, Kozyrakis C, Gavrilovska A, Terry D (2024). High-throughput and flexible host networking for accelerated computing. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, 405-423. Online publication date: 10-Jul-2024. https://dl.acm.org/doi/10.5555/3691938.3691960
Bothra R, Arun V, Godfrey B, Narayan A, Saeed A (2024). Lightweight Automated Reasoning for Network Architectures. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, 237-245. Online publication date: 18-Nov-2024. https://dl.acm.org/doi/10.1145/3696348.3696865
Bi P, Zou Y, Xiao M, Yu D, Li Y, Liu Z, Xie Q, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. Proceedings of the 32nd ACM International Conference on Multimedia, 7918-7927. Online publication date: 28-Oct-2024. https://dl.acm.org/doi/10.1145/3664647.3681670
Li X, Jiang X, Yang Y, Chen L, Wang Y, Wang C, Xu C, Lv Y, Yang B, Wu T, Gao H, Chen Z, Qiao Y, Ding H, Dong Y, Yang H, Song J, Lu J, Zhang P, Wei C, Zhang Z, Chen W, He Q, Zhu S, Sekar V, Yu M, Seneviratne A, Veitch D (2024). Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara vSwitch in Alibaba Cloud. Proceedings of the ACM SIGCOMM 2024 Conference, 750-763. Online publication date: 4-Aug-2024. https://dl.acm.org/doi/10.1145/3651890.3672224
Khan B, Heinz C, Koch A (2024). The Open-source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators. ACM Transactions on Reconfigurable Technology and Systems 17:2, 1-32. Online publication date: 13-Mar-2024. https://dl.acm.org/doi/10.1145/3624482
Umeike J, Agarwal S, Lazarev N, Alian M (2024). Userspace Networking in gem5. 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 179-191. Online publication date: 5-May-2024. https://doi.org/10.1109/ISPASS61541.2024.00026
Ingenzi V, Barbette T, Bonaventure O (2024). Poster: Enhancing the Performance of a Single Connection Using Multipath Quic. 2024 IEEE 32nd International Conference on Network Protocols (ICNP), 1-3. Online publication date: 28-Oct-2024. https://doi.org/10.1109/ICNP61940.2024.10858527
