DOI: 10.1145/2396556.2396564

Cache-aware affinitization on commodity multicores for high-speed network flows

Published: 29 October 2012

    Abstract

    For a given TCP or UDP flow, protocol processing of incoming packets is performed on the core that receives the interrupt, while the user-space application that consumes the data may run on the same or a different core. If the cores differ, additional costs may arise from context switches, cache misses, and the movement of data between the cores' caches. The magnitude of this cost depends on the processor affinity of the user-space process relative to the network stack. In this paper we present a prototype implementation of a tool that enables application processing and protocol processing to occur on cores that share the lowest cache level. The Cache-Aware Affinity Daemon (CAAD) analyzes the topology of the die and the NIC characteristics and conveys information to the sender, which allows the entire end-to-end path for each new flow to be managed and controlled. This is done in a lightweight manner for both uni- and bi-directional flows. Measurements show that, for bulk data transfers on commodity multicore machines, CAAD improves overall TCP throughput by as much as 31% and reduces the cache miss rate by as much as 37.5%. GridFTP combined with CAAD improves the download time for big file transfers by up to 18%.
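
    The core-selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not CAAD's implementation: the function names, the input format (Linux-style CPU list strings such as "0-1,4", one per cache domain), and the choice to exclude the interrupt core itself are all assumptions made for illustration. The sketch only picks candidate cores that share a cache with the core receiving the interrupt and, on Linux, pins the calling process to them.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux-style CPU list string such as "0-1,4" into a set of core ids."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def candidate_cores(irq_core, shared_cpu_lists):
    """Return cores that share a cache with irq_core, excluding irq_core itself.

    shared_cpu_lists is a hypothetical input: one CPU list string per cache
    domain. If irq_core shares a cache with no other core, fall back to
    {irq_core} so the caller always gets a non-empty set.
    """
    for text in shared_cpu_lists:
        group = parse_cpu_list(text)
        if irq_core in group:
            siblings = group - {irq_core}
            return siblings or {irq_core}
    return {irq_core}


def pin_current_process(cores):
    """Pin the calling process to the given cores (Linux only)."""
    os.sched_setaffinity(0, cores)
```

    For example, with cache domains "0-1" and "2-3" and the interrupt delivered to core 2, candidate_cores(2, ["0-1", "2-3"]) returns {3}; passing that set to pin_current_process would keep the consuming application on a core that shares a cache with the protocol-processing core.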

          Published In

          ANCS '12: Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
          October 2012
          270 pages
          ISBN: 9781450316859
          DOI: 10.1145/2396556
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. cache affinity
          2. high-speed networks
          3. processor affinity
          4. receive livelock

          Qualifiers

          • Research-article

          Conference

          ANCS '12

          Acceptance Rates

          Overall Acceptance Rate 88 of 314 submissions, 28%

          Bibliometrics

          • Downloads (Last 12 months): 3
          • Downloads (Last 6 weeks): 2
          Reflects downloads up to 10 Aug 2024

          Cited By

          • (2021) NUMA-aware I/O System Call Steering. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 805-806. DOI: 10.1109/Cluster48925.2021.00077. Online publication date: Sep-2021.
          • (2019) Advising Big Data Transfer Over Dedicated Connections Based on Profiling Optimization. IEEE/ACM Transactions on Networking, 27:6, pages 2280-2293. DOI: 10.1109/TNET.2019.2943884. Online publication date: Dec-2019.
          • (2018) Stochastic Approximation-Based Transport Profiling for Big Data Movement Over Dedicated Connections. Stochastic Methods for Estimation and Problem Solving in Engineering, pages 113-138. DOI: 10.4018/978-1-5225-5045-7.ch005. Online publication date: 2018.
          • (2018) A Survey of End-System Optimizations for High-Speed Networks. ACM Computing Surveys, 51:3, pages 1-36. DOI: 10.1145/3184899. Online publication date: 16-Jul-2018.
          • (2017) Protocol-Aware Packet Scheduling Algorithm for Multi-Protocol Processing in Multi-Core MPL Architecture. IEICE Transactions on Information and Systems, E100.D:12, pages 2837-2846. DOI: 10.1587/transinf.2017PAP0016. Online publication date: 2017.
          • (2017) Data Transfer Advisor with Transport Profiling Optimization. 2017 IEEE 42nd Conference on Local Computer Networks (LCN), pages 269-277. DOI: 10.1109/LCN.2017.23. Online publication date: Oct-2017.
          • (2016) A Technique for Improving Lifetime of Non-Volatile Caches Using Write-Minimization. Journal of Low Power Electronics and Applications, 6:1, page 1. DOI: 10.3390/jlpea6010001. Online publication date: 18-Jan-2016.
          • (2016) Event-Driven Approach for Flow-to-Core Mapping by NICs in Multicore Systems. IEEE Communications Letters, 20:5, pages 882-885. DOI: 10.1109/LCOMM.2016.2538763. Online publication date: May-2016.
          • (2016) Profiling Optimization for Big Data Transfer over Dedicated Channels. 2016 25th International Conference on Computer Communication and Networks (ICCCN), pages 1-9. DOI: 10.1109/ICCCN.2016.7568562. Online publication date: Aug-2016.
          • (2016) Improving network performance on multicore systems. Future Generation Computer Systems, 56:C, pages 277-283. DOI: 10.1016/j.future.2015.09.012. Online publication date: 1-Mar-2016.
