research-article

Triggered instructions: a control paradigm for spatially-programmed architectures

Authors:

Angshuman Parashar,

Michael Pellauer,

Vladimir Pavlov,

Stephen Maresh,

Joel Emer

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3

Pages 142–153

https://doi.org/10.1145/2508148.2485935

Published: 23 June 2013

Abstract

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, which each require distinct hardware mechanisms in a traditional sequential architecture.
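To make the control model concrete, here is a minimal Python sketch of a single PE in which every instruction is a guarded action: a trigger (a Boolean condition over 1-bit predicate registers and the tags at the heads of input channels) paired with an action. There is no program counter. On each step a priority scheduler fires the first instruction whose trigger holds, and actions move the PE between states by writing predicates rather than by branching; instructions that consume channel data simply stay untriggered until that data arrives. The two-input merge, the `(tag, value)` channel encoding, the predicate names (`compared`, `a_le_b`, `done`), and the priority-order scheduler are all illustrative assumptions, not the paper's actual ISA or microarchitecture.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple

# A channel token is a (tag, value) pair; "eos" marks end of stream (assumed encoding).
Token = Tuple[str, Optional[int]]


@dataclass
class PE:
    """Hypothetical PE state: input channels, an output channel, 1-bit predicates."""
    inputs: Dict[str, Deque[Token]]
    output: List[Token] = field(default_factory=list)
    preds: Dict[str, bool] = field(default_factory=dict)

    def head(self, ch: str) -> Optional[str]:
        """Tag at the head of an input channel, or None if the channel is empty."""
        q = self.inputs[ch]
        return q[0][0] if q else None


@dataclass
class TriggeredInstruction:
    name: str
    trigger: Callable[[PE], bool]   # guard over predicates and channel status
    action: Callable[[PE], None]    # datapath work plus predicate updates


def cmp_action(pe: PE) -> None:
    # Record the comparison in predicates instead of branching on it.
    pe.preds["a_le_b"] = pe.inputs["A"][0][1] <= pe.inputs["B"][0][1]
    pe.preds["compared"] = True


def send(ch: str) -> Callable[[PE], None]:
    def action(pe: PE) -> None:
        # Dequeue from `ch`, forward to the output, and clear the comparison state.
        pe.output.append(pe.inputs[ch].popleft())
        pe.preds["compared"] = False
    return action


def finish(pe: PE) -> None:
    # Consume both end-of-stream tokens and emit a single one downstream.
    pe.inputs["A"].popleft()
    pe.inputs["B"].popleft()
    pe.output.append(("eos", None))
    pe.preds["done"] = True


# Merge two sorted streams A and B. No program counter, no branch instructions:
# each instruction states only the condition under which it may fire.
MERGE = [
    TriggeredInstruction("cmp", lambda pe: not pe.preds["compared"]
                         and pe.head("A") == "data" and pe.head("B") == "data", cmp_action),
    TriggeredInstruction("send_a", lambda pe: pe.preds["compared"] and pe.preds["a_le_b"], send("A")),
    TriggeredInstruction("send_b", lambda pe: pe.preds["compared"] and not pe.preds["a_le_b"], send("B")),
    TriggeredInstruction("drain_a", lambda pe: pe.head("A") == "data" and pe.head("B") == "eos", send("A")),
    TriggeredInstruction("drain_b", lambda pe: pe.head("B") == "data" and pe.head("A") == "eos", send("B")),
    TriggeredInstruction("finish", lambda pe: not pe.preds["done"]
                         and pe.head("A") == "eos" and pe.head("B") == "eos", finish),
]


def run(pe: PE, program: List[TriggeredInstruction], max_steps: int = 1000) -> List[str]:
    trace = []
    for _ in range(max_steps):
        # Priority scheduler: fire the first instruction whose trigger holds.
        ready = next((i for i in program if i.trigger(pe)), None)
        if ready is None:
            break  # nothing is triggered; a real PE would idle until inputs arrive
        ready.action(pe)
        trace.append(ready.name)
    return trace


def stream(values: List[int]) -> Deque[Token]:
    return deque([("data", v) for v in values] + [("eos", None)])


pe = PE(inputs={"A": stream([1, 4, 7]), "B": stream([2, 3, 9])},
        preds={"compared": False, "a_le_b": False, "done": False})
trace = run(pe, MERGE)
print([v for tag, v in pe.output if tag == "data"])  # [1, 2, 3, 4, 7, 9]
```

In this sketch the triggers are mutually exclusive, so the priority order never changes the result; it only matters when several instructions are ready at once. Because the scheduler considers every instruction on every step, unrelated groups of triggered instructions placed in the same PE would interleave as their inputs became available, which is one way to read the abstract's claim that a single mechanism achieves the effect of both dynamic reordering and multithreading.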

Our analysis shows that a triggered-instruction-based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that, relative to a program-counter-style spatial baseline, triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, resulting in a speedup of 2.0X.



Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures


Published In

ACM SIGARCH Computer Architecture News Volume 41, Issue 3

ISCA '13

June 2013

666 pages

ISSN:0163-5964

DOI:10.1145/2508148


ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
General Chair: Avi Mendelson, Technion

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Bibliometrics

Total Citations: 104
Total Downloads: 2,057
Downloads (last 12 months): 148
Downloads (last 6 weeks): 21

Reflects downloads up to 09 Nov 2024.

