research-article
Open access

Aggregate Flow-Based Performance Fairness in CMPs

Published: 28 December 2016 Publication History
  • Abstract

    In chip multiprocessors (CMPs), co-executing applications interfere with one another when they share the underlying network-on-chip (NoC), and this interference slows different applications down by different amounts. To mitigate this unfairness, we treat all traffic initiated by the same thread as an aggregate flow, so that causally related request/reply packet sequences are allocated resources consistently and fairly according to traffic injection rates profiled online. Our solution comprises three cooperating mechanisms (rate profiling, rate inheritance, and rate-proportional channel scheduling) that together enable unbiased, workload-adaptive resource allocation. Full-system evaluations in gem5 demonstrate that, compared to classic packet-centric and recent application-prioritization approaches, our approach significantly improves weighted speed-up for all multi-application mixtures and achieves nearly ideal performance fairness.
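
    As a rough illustration of what rate-proportional channel scheduling could look like, the sketch below profiles per-thread injection rates over a window and then grants channel slots in proportion to those rates. It is not the paper's implementation: the stride-style arbiter, the function names `profile_rates` and `rate_proportional_schedule`, and the example flit counts are all assumptions made for illustration. Rate inheritance is modeled only implicitly, by keying every packet (request or reply) on the thread that initiated it.

```python
from collections import Counter
from fractions import Fraction


def profile_rates(injected_flits, window_cycles):
    """Estimate each thread's injection rate (flits/cycle) over one window.

    Hypothetical helper: exact fractions keep the arbitration deterministic.
    """
    return {tid: Fraction(count, window_cycles)
            for tid, count in injected_flits.items()}


def rate_proportional_schedule(rates, slots):
    """Grant `slots` channel slots so each flow's share tracks its rate.

    Stride-style arbitration (an assumed stand-in for the paper's scheduler):
    every aggregate flow carries a virtual time that advances by 1/rate per
    grant, and the flow with the smallest virtual time wins the next slot.
    Ties are broken by thread id.
    """
    virtual_time = {tid: 1 / rate for tid, rate in rates.items() if rate > 0}
    grants = []
    for _ in range(slots):
        winner = min(virtual_time, key=lambda tid: (virtual_time[tid], tid))
        grants.append(winner)
        virtual_time[winner] += 1 / rates[winner]
    return grants


# Example: thread t0 injected three times as many flits as t1 in the window.
rates = profile_rates({"t0": 300, "t1": 100}, window_cycles=1000)
grants = rate_proportional_schedule(rates, slots=8)
share = Counter(grants)  # t0 receives 6 of the 8 slots, t1 receives 2
```

    With these example numbers the channel is shared 3:1, matching the ratio of the profiled injection rates, so neither flow is favored beyond its measured demand.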

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 4
    December 2016
    648 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3012405
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 December 2016
    Accepted: 01 October 2016
    Revised: 01 October 2016
    Received: 01 May 2016
    Published in TACO Volume 13, Issue 4

    Author Tags

    1. Computer architecture
    2. performance fairness
    3. quality of service

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Swedish Research Council (Vetenskapsrådet)
