Article

An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors

Authors:

Krisztian Flautner

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture

Pages 272 - 283

https://doi.org/10.1109/ISCA.2005.9

Published: 01 May 2005

Abstract

Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application are immense. To overcome this roadblock, this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general-purpose processor. Hardware accelerators are added to the processor to execute the collapsed subgraphs. A simple microarchitectural interface is provided to support a plug-and-play model for integrating a wide range of accelerators into a pre-designed and verified processor core. The accelerators are exploited using an approach of static identification and dynamic realization. The compiler is responsible for identifying profitable subgraphs, while the hardware handles discovery, mapping, and execution of compatible subgraphs. This paper presents the design of a plug-and-play transparent accelerator system and evaluates the cost/performance implications of the design.
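The latency claim in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: the instruction chain, the per-node latencies, the one-cycle cost of the fused node, and the helper names `critical_path` and `collapse` are all assumptions chosen for illustration. It computes the critical-path length of a small dataflow graph before and after a three-operation subgraph is collapsed into a single node.

```python
from functools import lru_cache


def critical_path(latency, edges):
    """Return the longest latency-weighted path through a DAG."""
    preds = {node: [] for node in latency}
    for src, dst in edges:
        preds[dst].append(src)

    @lru_cache(maxsize=None)
    def finish(node):
        # A node finishes after its slowest predecessor plus its own latency.
        return latency[node] + max((finish(p) for p in preds[node]), default=0)

    return max(finish(node) for node in latency)


def collapse(latency, edges, subgraph, name, fused_latency):
    """Replace the nodes in `subgraph` with one node called `name`.

    Assumes the subgraph is convex (no path leaves it and re-enters),
    so the collapsed graph is still acyclic.
    """
    new_latency = {n: c for n, c in latency.items() if n not in subgraph}
    new_latency[name] = fused_latency
    new_edges = set()
    for src, dst in edges:
        s = name if src in subgraph else src
        d = name if dst in subgraph else dst
        if s != d:  # edges internal to the subgraph disappear
            new_edges.add((s, d))
    return new_latency, sorted(new_edges)


# Hypothetical chain: ld -> and -> shl -> add -> st (latencies are assumed).
latency = {"ld": 2, "and": 1, "shl": 1, "add": 1, "st": 1}
edges = [("ld", "and"), ("and", "shl"), ("shl", "add"), ("add", "st")]

before = critical_path(latency, edges)
lat2, edges2 = collapse(latency, edges, {"and", "shl", "add"}, "custom", 1)
after = critical_path(lat2, edges2)
print(before, after)  # prints "6 4"
```

In this toy graph the two edges internal to the collapsed subgraph also vanish, which corresponds to intermediate values that no longer need to be written to the register file.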

Published In

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture

June 2005

541 pages

ISBN:076952270X

ACM SIGARCH Computer Architecture News Volume 33, Issue 2
ISCA 2005
May 2005
531 pages
ISSN:0163-5964
DOI:10.1145/1080695

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

ISCA05

Sponsor:

SIGARCH

ISCA05: The 32nd Annual International Symposium on Computer Architecture 2005

June 4 - 8, 2005

Acceptance Rates

ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

Trilla D, Wellman J, Buyuktosunoglu A, Bose P (2021). NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 507-521. Online publication date: 18-Oct-2021
https://dl.acm.org/doi/10.1145/3466752.3480094
Baskaran S, Sampson J (2020). Decentralized Offload-based Execution on Memory-centric Compute Cores. Proceedings of the International Symposium on Memory Systems, pages 61-76. Online publication date: 28-Sep-2020
https://dl.acm.org/doi/10.1145/3422575.3422778
Paulino N, Ferreira J, Cardoso J (2020). Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys, 53:1, pages 1-36. Online publication date: 6-Feb-2020
https://dl.acm.org/doi/10.1145/3369764
Kersey C, Kim H, Yalamanchili S, Jacob B (2017). Lightweight SIMT core designs for intelligent 3D stacked DRAM. Proceedings of the International Symposium on Memory Systems, pages 49-59. Online publication date: 2-Oct-2017
https://dl.acm.org/doi/10.1145/3132402.3132426
Paulino N, Ferreira J, Cardoso J (2017). Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:1, pages 21-34. Online publication date: 1-Jan-2017
https://dl.acm.org/doi/10.1109/TVLSI.2016.2573640
Tan C, Kulkarni A, Venkataramani V, Karunaratne M, Mitra T, Peh L (2016). LOCUS. Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pages 1-10. Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1145/2968455.2968506
Paulino N, Ferreira J, Bispo J, Cardoso J, Nebel W, Atienza D (2015). Transparent acceleration of program execution using reconfigurable hardware. Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, pages 1066-1071. Online publication date: 9-Mar-2015
https://dl.acm.org/doi/10.5555/2755753.2757061
Liu F, Ahn H, Beard S, Oh T, August D (2015). DynaSpAM. ACM SIGARCH Computer Architecture News, 43:3S, pages 541-553. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2872887.2750414
Liu F, Ahn H, Beard S, Oh T, August D, Marr D, Albonesi D (2015). DynaSpAM. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pages 541-553. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2749469.2750414
Paulino N, Ferreira J, Cardoso J (2014). A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses. ACM Transactions on Reconfigurable Technology and Systems, 7:4, pages 1-20. Online publication date: 29-Dec-2014
https://dl.acm.org/doi/10.1145/2629468