DOI: 10.1109/ISCA.2005.9

An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors

Published: 01 May 2005
    Abstract

    Instruction set customization is an effective way to improve processor performance. Critical portions of application data-flow graphs are collapsed for accelerated execution on specialized hardware. Collapsing data-flow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application are immense. To overcome this roadblock, this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general-purpose processor. Hardware accelerators are added to the processor to execute the collapsed subgraphs. A simple microarchitectural interface is provided to support a plug-and-play model for integrating a wide range of accelerators into a pre-designed and verified processor core. The accelerators are exploited using an approach of static identification and dynamic realization. The compiler is responsible for identifying profitable subgraphs, while the hardware handles discovery, mapping, and execution of compatible subgraphs. This paper presents the design of a plug-and-play transparent accelerator system and evaluates the cost/performance implications of the design.
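
    The effect of collapsing a subgraph can be illustrated with a minimal Python sketch. It is not the paper's method: the instruction names, the per-operation latencies, the single-cycle fused latency, and the helper names (`critical_path`, `collapse`, `internal_results`) are all assumptions chosen for illustration, and the sketch does not check that the chosen subgraph is legal to collapse.

```python
from functools import lru_cache


def critical_path(latency, edges):
    """Longest latency-weighted path through a DAG of operations."""
    succs = {n: [] for n in latency}
    for src, dst in edges:
        succs[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Latency of this node plus the longest chain of its consumers.
        return latency[node] + max(
            (longest_from(s) for s in succs[node]), default=0
        )

    return max(longest_from(n) for n in latency)


def collapse(latency, edges, subgraph, name, fused_cycles):
    """Replace the nodes in `subgraph` with one node `name`."""
    new_latency = {n: c for n, c in latency.items() if n not in subgraph}
    new_latency[name] = fused_cycles

    def rename(n):
        return name if n in subgraph else n

    # Edges internal to the subgraph disappear; boundary edges are rewired.
    new_edges = {
        (rename(s), rename(d)) for s, d in edges if rename(s) != rename(d)
    }
    return new_latency, sorted(new_edges)


def internal_results(edges, subgraph):
    """Count values produced and consumed entirely inside the subgraph."""
    producers = {s for s, _ in edges if s in subgraph}
    return sum(
        1
        for p in producers
        if all(d in subgraph for s, d in edges if s == p)
    )


# Hypothetical data-flow graph: node -> latency in cycles (assumed values).
latency = {"ld": 2, "and": 1, "shl": 1, "add": 1, "mov": 1, "st": 1}
edges = [("ld", "and"), ("and", "shl"), ("shl", "add"),
         ("mov", "add"), ("add", "st")]
chain = {"and", "shl", "add"}

before = critical_path(latency, edges)
# Assumption: the accelerator executes the whole chain in one cycle.
fused_latency, fused_edges = collapse(latency, edges, chain, "cfu", 1)
after = critical_path(fused_latency, fused_edges)
saved = internal_results(edges, chain)

print(before, after, saved)
```

    Under these assumed latencies the critical path shrinks from 6 to 4 cycles, and the two values passed between `and`, `shl`, and `add` no longer need to be written to the register file.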

    Published In

    ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
    June 2005
    541 pages
    ISBN: 076952270X
    • ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
      ISCA 2005
      May 2005
      531 pages
      ISSN: 0163-5964
      DOI: 10.1145/1080695

    Publisher

    IEEE Computer Society

    United States

    Qualifiers

    • Article

    Conference

    ISCA05
    Acceptance Rates

    ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Cited By

    • (2021) NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 507-521. DOI: 10.1145/3466752.3480094. Online publication date: 18-Oct-2021
    • (2020) Decentralized Offload-based Execution on Memory-centric Compute Cores. Proceedings of the International Symposium on Memory Systems, pages 61-76. DOI: 10.1145/3422575.3422778. Online publication date: 28-Sep-2020
    • (2020) Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys, 53:1, pages 1-36. DOI: 10.1145/3369764. Online publication date: 6-Feb-2020
    • (2017) Lightweight SIMT core designs for intelligent 3D stacked DRAM. Proceedings of the International Symposium on Memory Systems, pages 49-59. DOI: 10.1145/3132402.3132426. Online publication date: 2-Oct-2017
    • (2017) Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:1, pages 21-34. DOI: 10.1109/TVLSI.2016.2573640. Online publication date: 1-Jan-2017
    • (2016) LOCUS. Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pages 1-10. DOI: 10.1145/2968455.2968506. Online publication date: 1-Oct-2016
    • (2015) Transparent acceleration of program execution using reconfigurable hardware. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, pages 1066-1071. DOI: 10.5555/2755753.2757061. Online publication date: 9-Mar-2015
    • (2015) DynaSpAM. ACM SIGARCH Computer Architecture News, 43:3S, pages 541-553. DOI: 10.1145/2872887.2750414. Online publication date: 13-Jun-2015
    • (2015) DynaSpAM. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pages 541-553. DOI: 10.1145/2749469.2750414. Online publication date: 13-Jun-2015
    • (2014) A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses. ACM Transactions on Reconfigurable Technology and Systems, 7:4, pages 1-20. DOI: 10.1145/2629468. Online publication date: 29-Dec-2014
