Research article
DOI: 10.1145/2872362.2872391

Scalable Kernel TCP Design and Implementation for Short-Lived Connections

Published: 25 March 2016

Abstract

With the rapid growth of network bandwidth, increases in CPU cores on a single machine, and application API models demanding more short-lived connections, a scalable TCP stack is performance-critical. Although many clean-slate designs have been proposed, production environments still call for a bottom-up parallel TCP stack design that is backward-compatible with existing applications.
We present Fastsocket, a BSD Socket-compatible and scalable kernel socket design, which achieves table-level connection partitioning in the TCP stack and guarantees connection locality for both passive and active connections. The Fastsocket architecture is a ground-up partitioned design, from NIC interrupts all the way up to applications, which naturally eliminates lock contention throughout the stack. Moreover, Fastsocket maintains the full functionality of the kernel TCP stack and a BSD-socket-compatible API, so applications need no modifications.
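The idea of table-level connection partitioning can be illustrated with a toy model. The sketch below is not Fastsocket's kernel code: the names `Conn` and `PartitionedTable` are hypothetical, and the hash-based steering rule is an assumption, since the abstract does not say how connections are mapped to cores. It only shows that when each core owns a private table, inserting or looking up a connection touches one table and needs no lock shared with other cores.

```python
import zlib
from collections import namedtuple

# Hypothetical 4-tuple identifying a TCP connection (illustration only).
Conn = namedtuple("Conn", "src_ip src_port dst_ip dst_port")


class PartitionedTable:
    """Toy model: one private connection table per core, no global table."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.tables = [dict() for _ in range(num_cores)]

    def core_for(self, conn):
        # Assumed steering rule: a deterministic hash of the 4-tuple.
        key = "{}:{}-{}:{}".format(*conn).encode()
        return zlib.crc32(key) % self.num_cores

    def insert(self, conn, state):
        # The connection is stored only in its owning core's table.
        core = self.core_for(conn)
        self.tables[core][conn] = state
        return core

    def lookup(self, conn):
        # Only the owning core's table is consulted.
        return self.tables[self.core_for(conn)].get(conn)


if __name__ == "__main__":
    table = PartitionedTable(num_cores=24)
    conn = Conn("10.0.0.1", 40000, "10.0.0.2", 80)
    core = table.insert(conn, "ESTABLISHED")
    print("owning core:", core, "state:", table.lookup(conn))
```

Because the same function picks the core for both insertion and lookup, every later packet of a connection resolves to the table that already holds its state, which is the locality property the abstract describes.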
Our evaluations show that Fastsocket achieves a speedup of 20.4x on a 24-core machine under a workload of short-lived connections, outperforming the state-of-the-art Linux kernel TCP implementations. When scaling up to 24 CPU cores, Fastsocket increases the throughput of Nginx and HAProxy by 267% and 621% respectively compared with the base Linux kernel. We also demonstrate that Fastsocket can achieve scalability and preserve the BSD socket API at the same time. Fastsocket is already deployed in the production environment of Sina Weibo, serving 50 million daily active users and billions of requests per day.
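The percentage figures above are easy to misread as multipliers. The short calculation below converts them, assuming that "increases the throughput by N%" means the new throughput is (1 + N/100) times the baseline and that the 20.4x speedup is measured against a single core. Both readings are assumptions; the abstract does not spell them out. The helper names are hypothetical.

```python
def increase_to_factor(percent_increase):
    """Convert 'increased by N%' into a throughput multiplier."""
    return 1.0 + percent_increase / 100.0


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved on the given core count."""
    return speedup / cores


if __name__ == "__main__":
    print("Nginx factor:  ", round(increase_to_factor(267), 2))
    print("HAProxy factor:", round(increase_to_factor(621), 2))
    print("Efficiency:    ", round(parallel_efficiency(20.4, 24), 2))
```

Under these assumptions the Nginx and HAProxy results correspond to roughly 3.67 and 7.21 times the base kernel's throughput, and a 20.4x speedup on 24 cores is about 85% of ideal linear scaling.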

Published In

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN:9781450340915
DOI:10.1145/2872362
General Chair: Tom Conte
Program Chair: Yuanyuan Zhou
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. TCP/IP
  2. multicore system
  3. operating system

Qualifiers

  • Research-article

Funding Sources

  • National High Technology Research and Development Program of China
  • Natural Science Foundation of China
  • National Science and Technology Major Project of China

Conference

ASPLOS '16

Acceptance Rates

ASPLOS '16 Paper Acceptance Rate 53 of 232 submissions, 23%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 52
  • Downloads (last 6 weeks): 6
Reflects downloads up to 07 Nov 2024.

Cited By

  • (2024) AppSteer: Framework for Improving Multicore Scalability of Network Functions via Application-aware Packet Steering. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pages 18-27. DOI: 10.1109/CCGrid59990.2024.00012. Online publication date: 6-May-2024
  • (2023) Light: A Compatible, high-performance and scalable user-level network stack. Computer Networks, 229 (109756). DOI: 10.1016/j.comnet.2023.109756. Online publication date: Jun-2023
  • (2022) Queueing-Theoretic Performance Analysis of a Low-Entropy Labeled Network Stack. Intelligent Computing, 2022. DOI: 10.34133/2022/9863054. Online publication date: 5-Sep-2022
  • (2022) PipeDevice. Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies, pages 126-139. DOI: 10.1145/3555050.3569118. Online publication date: 30-Nov-2022
  • (2022) NetKernel: Making Network Stack Part of the Virtualized Infrastructure. IEEE/ACM Transactions on Networking, 30:3, pages 999-1013. DOI: 10.1109/TNET.2021.3129806. Online publication date: Jun-2022
  • (2021) Evaluating Network Stacks for the Virtualized Mobile Packet Core. Proceedings of the 5th Asia-Pacific Workshop on Networking, pages 72-79. DOI: 10.1145/3469393.3469402. Online publication date: 24-Jun-2021
  • (2020) NetKernel. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference, pages 143-157. DOI: 10.5555/3489146.3489156. Online publication date: 15-Jul-2020
  • (2020) TCP ≈ RDMA. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, pages 127-140. DOI: 10.5555/3388242.3388252. Online publication date: 25-Feb-2020
  • (2020) FastUDP: a highly scalable user-level UDP framework in multi-core systems for fast packet I/O. The Journal of Supercomputing, 77:5, pages 5148-5175. DOI: 10.1007/s11227-020-03486-6. Online publication date: 3-Nov-2020
  • (2019) Socksdirect. Proceedings of the ACM Special Interest Group on Data Communication, pages 90-103. DOI: 10.1145/3341302.3342071. Online publication date: 19-Aug-2019
