MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems

Published: 17 September 2021

Abstract

Heterogeneous computing systems (HCSs), which consist of processing elements (PEs) with differing processing abilities, typically rely on a network-on-chip (NoC) to interconnect their components. Emerging point-to-point NoCs, which support single-cycle multi-hop transmission, reduce or eliminate the dependence of latency on distance. This addresses the scalability concern raised by the high latency of long-distance transmission and enlarges the routing design space to include non-shortest paths. For such point-to-point NoC-based HCSs, resource management strategies such as mapping and routing, which are handled by compilers, schedulers, or controllers, are complicated for the following reasons: (i) Because of the heterogeneity, mapping and routing must optimize computation and communication concurrently (for homogeneous computing systems, only communication). (ii) Conducting mapping and routing consecutively cannot minimize the schedule length in most cases, since PEs with high processing ability may be located in crowded areas and suffer from high resource contention overhead. (iii) Since changing the mapping of a single task reconstructs the whole routing design space, exploring the joint mapping and routing design space is challenging. Therefore, in this work, we propose MARCO, a mapping and routing co-optimization framework, to decrease the schedule length of applications on point-to-point NoC-based HCSs. Specifically, we revise tabu search to explore the design space and evaluate the quality of mapping and routing. An advanced reinforcement learning (RL) algorithm, advantage actor-critic, is adopted to compute paths efficiently.
We perform extensive experiments on various real applications, which demonstrate that MARCO achieves a remarkable improvement in schedule length (+44.94% ∼ +50.18%) compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art mapping and routing approaches.
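
To make the idea of exploring a mapping design space with tabu search more concrete, the following is a minimal sketch of plain tabu search over task-to-PE assignments. It is not the revised tabu search used in MARCO: the cost model (`schedule_length`), the parameter names (`comm_cost`, `tenure`, `iters`), and the example data are all assumptions made for illustration, and routing, contention, and task dependencies are deliberately ignored.

```python
import itertools


def schedule_length(mapping, exec_time, edges, comm_cost):
    """Toy cost (an assumption, not the paper's model): compute time of the
    busiest PE plus a fixed penalty for every edge that crosses PEs."""
    load = {}
    for task, pe in enumerate(mapping):
        load[pe] = load.get(pe, 0) + exec_time[task][pe]
    cross = sum(comm_cost for u, v in edges if mapping[u] != mapping[v])
    return max(load.values()) + cross


def tabu_search(exec_time, edges, comm_cost=1, iters=50, tenure=3):
    """Plain tabu search: each move reassigns one task to another PE."""
    n_tasks, n_pes = len(exec_time), len(exec_time[0])
    current = [0] * n_tasks  # start with every task on PE 0
    best = list(current)
    best_cost = schedule_length(current, exec_time, edges, comm_cost)
    tabu = {}  # (task, pe) -> last iteration at which this move is forbidden
    for it in range(iters):
        candidates = []
        for task, pe in itertools.product(range(n_tasks), range(n_pes)):
            if pe == current[task]:
                continue
            neighbor = list(current)
            neighbor[task] = pe
            cost = schedule_length(neighbor, exec_time, edges, comm_cost)
            is_tabu = tabu.get((task, pe), -1) >= it
            # Aspiration: a tabu move is allowed only if it beats the best so far.
            if is_tabu and cost >= best_cost:
                continue
            candidates.append((cost, task, pe))
        if not candidates:
            continue
        cost, task, pe = min(candidates)
        # Forbid moving this task back to its previous PE for `tenure` iterations.
        tabu[(task, current[task])] = it + tenure
        current[task] = pe
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


# Hypothetical example: 3 tasks, 2 heterogeneous PEs; exec_time[task][pe].
exec_time = [[4, 2], [3, 6], [5, 1]]
edges = [(0, 1), (1, 2)]
mapping, length = tabu_search(exec_time, edges)
print(mapping, length)
```

The tabu list stops the search from immediately undoing a move, which lets it accept a worse neighbor and escape a local minimum. In a co-optimization setting, the cost function would also have to evaluate the routing that results from each candidate mapping, which is the step the abstract assigns to the RL-based path computation.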


    Published In

    ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
    Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
    October 2021
    1367 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3481713
    Editor: Tulika Mitra
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 September 2021
    Accepted: 01 July 2021
    Revised: 01 June 2021
    Received: 01 April 2021
    Published in TECS Volume 20, Issue 5s

    Author Tags

    1. mapping
    2. routing
    3. NoC
    4. heterogeneous computing systems

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Ministry of Education, Singapore
    • Nanyang Technological University, Singapore

    Article Metrics

    • Downloads (Last 12 months): 61
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2024) Toward Energy-efficient STT-MRAM-based Near Memory Computing Architecture for Embedded Systems. ACM Transactions on Embedded Computing Systems 23:3 (1-24). https://doi.org/10.1145/3650729. Online publication date: 25-Apr-2024
    • (2023) Experimental Demonstration of STT-MRAM-based Nonvolatile Instantly On/Off System for IoT Applications: Case Studies. ACM Transactions on Embedded Computing Systems 22:2 (1-24). https://doi.org/10.1145/3546193. Online publication date: 27-Jan-2023
    • (2022) Non-Volatile Phase Change Material based Nanophotonic Interconnect. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1053-1058). https://doi.org/10.23919/DATE54114.2022.9774648. Online publication date: 14-Mar-2022
    • (2022) Intermittent-Aware Distributed Concurrency Control. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11 (3721-3732). https://doi.org/10.1109/TCAD.2022.3197502. Online publication date: 1-Nov-2022
    • REC: REtime Convolutional layers to fully exploit harvested energy for ReRAM-based CNN accelerators. ACM Transactions on Embedded Computing Systems. https://doi.org/10.1145/3652593
