DOI: 10.1145/3195970.3196101
research-article
Public Access

RAMP: resource-aware mapping for CGRAs

Published: 24 June 2018

    Abstract

    Coarse-grained reconfigurable arrays (CGRAs) are a promising solution that can accelerate even non-parallel loops. The acceleration achieved through a CGRA depends critically on the quality of the mapping of loop operations onto the CGRA's processing elements (PEs), and in particular on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms for routing data dependencies, including routing through other PEs, registers, and memory, and even re-computation. All of these routing options change the graph to be mapped onto the PEs (often by adding new operations), and without re-scheduling it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, and thereby improves both mappability and mapping quality. Evaluating the top performance-critical loops of the MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
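
    The claim that routing changes the graph and can make an existing schedule unmappable can be illustrated with a toy sketch. This is not RAMP's algorithm: the graph, the schedule, the initiation interval `ii`, the PE count, and the helper names `route_through_pes` and `fits` are all hypothetical, and the only routing option modeled is forwarding a value through other PEs, one extra operation per cycle of delay.

```python
from collections import Counter

def route_through_pes(edges, schedule):
    """Insert one hypothetical 'route' op per extra cycle on every edge
    whose consumer is scheduled more than one cycle after its producer."""
    new_edges = []
    new_schedule = dict(schedule)
    for src, dst in edges:
        gap = schedule[dst] - schedule[src]
        prev = src
        # Each routing op occupies a PE for one cycle and forwards the value.
        for k in range(1, gap):
            r = f"route_{src}_{dst}_{k}"
            new_schedule[r] = schedule[src] + k
            new_edges.append((prev, r))
            prev = r
        new_edges.append((prev, dst))
    return new_edges, new_schedule

def fits(schedule, ii, num_pes):
    """True if no modulo time slot needs more PEs than the array has."""
    usage = Counter(t % ii for t in schedule.values())
    return all(n <= num_pes for n in usage.values())

# Toy loop body: a feeds b and d; b feeds c; c feeds d.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
schedule = {"a": 0, "b": 1, "c": 2, "d": 3}

# Without routing ops, the schedule fits on 2 PEs at ii = 2.
before = fits(schedule, ii=2, num_pes=2)

# Routing the long edge a -> d adds two ops to the graph.
routed_edges, routed_schedule = route_through_pes(edges, schedule)

# The same schedule no longer fits: one modulo slot now needs 3 PEs.
after = fits(routed_schedule, ii=2, num_pes=2)
```

    In this toy case the original four operations fit, but the two added routing operations overflow one modulo slot, so the graph would have to be re-scheduled (for example, at a larger `ii`) before it could be placed. Deciding on routing before scheduling avoids discovering this only during P&R.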

    Published In

    DAC '18: Proceedings of the 55th Annual Design Automation Conference
    June 2018
    1089 pages
    ISBN:9781450357005
    DOI:10.1145/3195970
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Conference

    DAC '18: The 55th Annual Design Automation Conference 2018
    June 24 - 29, 2018
    San Francisco, California

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

    Article Metrics

    • Downloads (last 12 months): 134
    • Downloads (last 6 weeks): 12
    Reflects downloads up to 27 Jul 2024.

    Cited By

    • (2024) Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications. Design and Architectures for Signal and Image Processing, pp. 83-95. DOI: 10.1007/978-3-031-62874-0_7. Online publication date: 17-Jan-2024
    • (2023) SAT-MapIt: A SAT-based Modulo Scheduling Mapper for Coarse Grain Reconfigurable Architectures. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6. DOI: 10.23919/DATE56975.2023.10137123. Online publication date: Apr-2023
    • (2023) Explainable-DSE: An Agile and Explainable Exploration of Efficient HW/SW Codesigns of Deep Learning Accelerators Using Bottleneck Analysis. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pp. 87-107. DOI: 10.1145/3623278.3624772. Online publication date: 25-Mar-2023
    • (2023) SAT-MapIt. Proceedings of the 20th ACM International Conference on Computing Frontiers, pp. 383-384. DOI: 10.1145/3587135.3591433. Online publication date: 9-May-2023
    • (2023) BusMap: Application Mapping With Bus Routing for Coarse-Grained Reconfigurable Array. IEEE Transactions on Circuits and Systems II: Express Briefs, 70:8, pp. 3054-3058. DOI: 10.1109/TCSII.2023.3253686. Online publication date: Aug-2023
    • (2023) TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:8, pp. 2552-2565. DOI: 10.1109/TCAD.2022.3226152. Online publication date: Aug-2023
    • (2023) DFGC: DFG-aware NoC Control based on Time Stamp Prediction for Dataflow Architecture. 2023 IEEE 41st International Conference on Computer Design (ICCD), pp. 432-439. DOI: 10.1109/ICCD58817.2023.00071. Online publication date: 6-Nov-2023
    • (2022) PathSeeker. Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, pp. 268-273. DOI: 10.5555/3539845.3539913. Online publication date: 14-Mar-2022
    • (2022) RF-CGRA: A Routing-Friendly CGRA with Hierarchical Register Chains. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 262-267. DOI: 10.23919/DATE54114.2022.9774601. Online publication date: 14-Mar-2022
    • (2022) PathSeeker: A Fast Mapping Algorithm for CGRAs. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 268-273. DOI: 10.23919/DATE54114.2022.9774520. Online publication date: 14-Mar-2022
