research-article

Cache-aware affinitization on commodity multicores for high-speed network flows

Authors: Matthew Farrens, Dipak Ghosal

ANCS '12: Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems

Pages 39 - 48

https://doi.org/10.1145/2396556.2396564

Published: 29 October 2012

Abstract

For a given TCP or UDP flow, protocol processing of incoming packets is performed on the core that receives the interrupt, while the user-space application that consumes the data may run on the same core or a different one. If the cores differ, additional costs may be incurred from context switches, cache misses, and the movement of data between the cores' caches. The magnitude of this cost depends on the processor affinity of the user-space process relative to the network stack. In this paper we present a prototype implementation of a tool that enables application processing and protocol processing to occur on cores that share the lowest cache level. The Cache-Aware Affinity Daemon (CAAD) analyzes the topology of the die and the NIC characteristics and conveys information to the sender that allows the entire end-to-end path of each new flow to be managed and controlled. This is done in a lightweight manner for both unidirectional and bidirectional flows. Measurements show that for bulk data transfers on commodity multicore machines, CAAD improves overall TCP throughput by as much as 31% and reduces the cache miss rate by as much as 37.5%. GridFTP combined with CAAD improves the download time for large file transfers by up to 18%.
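
The placement idea in the abstract (run the consuming process on a core that shares a cache with the core handling the interrupt) can be illustrated with a minimal Python sketch. This is not CAAD's implementation. It assumes a Linux host that exposes cache topology under /sys/devices/system/cpu, and it assumes the interrupt-handling core is already known and passed in as `core`. The helper names `parse_cpu_list`, `cache_siblings`, and `pin_near` are hypothetical, and "smallest sharing group with more than one core" is used here as a stand-in for "lowest shared cache level".

```python
import os
from pathlib import Path

# Standard Linux sysfs location for per-CPU topology (assumed to be present).
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0-3,8" into a set of core ids."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def cache_siblings(core, sysfs_cpu=SYSFS_CPU):
    """Return the cores that share a cache with `core`.

    Among all cache indexes of `core`, pick the smallest sharing group that
    contains at least one other core. The result includes `core` itself.
    If no shared cache is found, fall back to {core}.
    """
    best = None
    cache_dir = sysfs_cpu / f"cpu{core}" / "cache"
    for index_dir in sorted(cache_dir.glob("index*")):
        shared = parse_cpu_list((index_dir / "shared_cpu_list").read_text())
        if len(shared) > 1 and (best is None or len(shared) < len(best)):
            best = shared
    return best if best is not None else {core}


def pin_near(core, pid=0):
    """Pin process `pid` (0 means the caller) to the cache siblings of `core`."""
    targets = cache_siblings(core)
    os.sched_setaffinity(pid, targets)  # Linux-only call
    return targets
```

For example, calling `pin_near(2)` from the receiving application would restrict it to the cores that share the nearest cache with core 2, assuming core 2 is the one servicing the NIC interrupt for that flow. How the interrupt core is discovered, and how that choice is communicated to the sender, is outside this sketch.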

Index Terms

Cache-aware affinitization on commodity multicores for high-speed network flows
1. Networks
  1. Network properties
    1. Network range
      1. Local area networks
      2. Wide area networks
  2. Network protocols

Information & Contributors

Information

Published In

ANCS '12: Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems

October 2012

270 pages

ISBN: 9781450316859

DOI: 10.1145/2396556

General Chair: Tilman Wolf, University of Massachusetts Amherst, USA

Program Chairs: Andrew W. Moore, University of Cambridge, UK; Viktor Prasanna, University of Southern California, USA

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ANCS '12

ANCS '12: Symposium on Architecture for Networking and Communications Systems

October 29 - 30, 2012

Austin, Texas, USA

Acceptance Rates

Overall Acceptance Rate 88 of 314 submissions, 28%

Bibliometrics

Total Citations: 18

Total Downloads: 281

Downloads (Last 12 months): 3

Downloads (Last 6 weeks): 2

Reflects downloads up to 10 Aug 2024

Citations

Cited By

Lee C, Jin H (2021). NUMA-aware I/O System Call Steering. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 805-806. Online publication date: Sep-2021. https://doi.org/10.1109/Cluster48925.2021.00077

Yun D, Wu C, Rao N, Kettimuthu R (2019). Advising Big Data Transfer Over Dedicated Connections Based on Profiling Optimization. IEEE/ACM Transactions on Networking, 27(6):2280-2293. Online publication date: Dec-2019. https://doi.org/10.1109/TNET.2019.2943884

Yun D, Wu C (2018). Stochastic Approximation-Based Transport Profiling for Big Data Movement Over Dedicated Connections. Stochastic Methods for Estimation and Problem Solving in Engineering, pages 113-138. Online publication date: 2018. https://doi.org/10.4018/978-1-5225-5045-7.ch005

Hanford N, Ahuja V, Farrens M, Tierney B, Ghosal D (2018). A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys, 51(3):1-36. Online publication date: 16-Jul-2018. https://dl.acm.org/doi/10.1145/3184899

Zhang R, Wang J, Sheng Y, Chen X, Ye X (2017). Protocol-Aware Packet Scheduling Algorithm for Multi-Protocol Processing in Multi-Core MPL Architecture. IEICE Transactions on Information and Systems, E100.D(12):2837-2846. Online publication date: 2017. https://doi.org/10.1587/transinf.2017PAP0016

Yun D, Wu C, Rao N, Liu Q, Kettimuthu R, Jung E (2017). Data Transfer Advisor with Transport Profiling Optimization. 2017 IEEE 42nd Conference on Local Computer Networks (LCN), pages 269-277. Online publication date: Oct-2017. https://doi.org/10.1109/LCN.2017.23

Mittal S, Vetter J (2016). A Technique for Improving Lifetime of Non-Volatile Caches Using Write-Minimization. Journal of Low Power Electronics and Applications, 6(1):1. Online publication date: 18-Jan-2016. https://doi.org/10.3390/jlpea6010001

Lee H, Choi K (2016). Event-Driven Approach for Flow-to-Core Mapping by NICs in Multicore Systems. IEEE Communications Letters, 20(5):882-885. Online publication date: May-2016. https://doi.org/10.1109/LCOMM.2016.2538763

Yun D, Wu C, Rao N, Liu Q, Kettimuthu R, Jung E (2016). Profiling Optimization for Big Data Transfer over Dedicated Channels. 2016 25th International Conference on Computer Communication and Networks (ICCCN), pages 1-9. Online publication date: Aug-2016. https://doi.org/10.1109/ICCCN.2016.7568562

Hanford N, Ahuja V, Farrens M, Ghosal D, Balman M, Pouyoul E, Tierney B (2016). Improving network performance on multicore systems. Future Generation Computer Systems, 56(C):277-283. Online publication date: 1-Mar-2016. https://dl.acm.org/doi/10.1016/j.future.2015.09.012
