research-article

Triggered instructions: a control paradigm for spatially-programmed architectures

Published: 23 June 2013

Abstract

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, which each require distinct hardware mechanisms in a traditional sequential architecture.
Our analysis shows that a triggered-instruction-based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a program-counter-style spatial baseline, resulting in a speedup of 2.0X.
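
To make the control idea in the abstract concrete, the following is a minimal software sketch of a single PE that has no program counter: each instruction carries a trigger (a condition over predicate bits and channel status), and on every step the PE fires one instruction whose trigger holds. This is not the paper's ISA or scheduler. The names `PE`, `Triggered`, `MERGE`, and `END`, the end-of-stream marker, and the first-match priority rule are all assumptions made for illustration. The example merges two sorted input channels, with state transitions expressed only through predicate updates rather than branches.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional

END = None  # hypothetical end-of-stream marker carried on a channel


@dataclass
class PE:
    """Illustrative PE state: predicate bits plus named channels."""
    preds: Dict[str, bool]
    chans: Dict[str, Deque]


@dataclass
class Triggered:
    """An instruction guarded by a trigger instead of being reached by a PC."""
    name: str
    trigger: Callable[[PE], bool]
    action: Callable[[PE], None]


def is_data(pe: PE, ch: str) -> bool:
    return len(pe.chans[ch]) > 0 and pe.chans[ch][0] is not END


def is_end(pe: PE, ch: str) -> bool:
    return len(pe.chans[ch]) > 0 and pe.chans[ch][0] is END


def do_cmp(pe: PE) -> None:
    # Compare channel heads without dequeuing; record the result in predicates.
    pe.preds["pick_a"] = pe.chans["a"][0] <= pe.chans["b"][0]
    pe.preds["valid"] = True


def forward(ch: str, clear_valid: bool) -> Callable[[PE], None]:
    def action(pe: PE) -> None:
        pe.chans["out"].append(pe.chans[ch].popleft())
        if clear_valid:
            pe.preds["valid"] = False
    return action


def do_finish(pe: PE) -> None:
    pe.chans["a"].popleft()
    pe.chans["b"].popleft()
    pe.chans["out"].append(END)


MERGE: List[Triggered] = [
    Triggered("cmp", lambda pe: not pe.preds["valid"]
              and is_data(pe, "a") and is_data(pe, "b"), do_cmp),
    Triggered("send_a", lambda pe: pe.preds["valid"] and pe.preds["pick_a"],
              forward("a", True)),
    Triggered("send_b", lambda pe: pe.preds["valid"] and not pe.preds["pick_a"],
              forward("b", True)),
    Triggered("drain_a", lambda pe: is_data(pe, "a") and is_end(pe, "b"),
              forward("a", False)),
    Triggered("drain_b", lambda pe: is_end(pe, "a") and is_data(pe, "b"),
              forward("b", False)),
    Triggered("finish", lambda pe: is_end(pe, "a") and is_end(pe, "b"), do_finish),
]


def step(pe: PE, program: List[Triggered]) -> Optional[str]:
    """Fire the first instruction whose trigger holds (assumed priority rule)."""
    for ins in program:
        if ins.trigger(pe):
            ins.action(pe)
            return ins.name
    return None  # nothing triggered: the PE simply waits


def merge(xs, ys, max_steps: int = 1000):
    pe = PE(preds={"valid": False, "pick_a": False},
            chans={"a": deque(list(xs) + [END]),
                   "b": deque(list(ys) + [END]),
                   "out": deque()})
    for _ in range(max_steps):
        if step(pe, MERGE) is None:
            break
    return [v for v in pe.chans["out"] if v is not END]


print(merge([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

In this sketch, a trigger that tests channel status stands in for the reactivity to inter-PE traffic mentioned above: an instruction becomes eligible as soon as its inputs arrive, and an idle PE does nothing until some trigger holds.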

    Published In

    ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
    ISCA '13
    June 2013
    666 pages
    ISSN: 0163-5964
    DOI: 10.1145/2508148

    ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
    June 2013
    686 pages
    ISBN: 9781450320795
    DOI: 10.1145/2485922
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. reconfigurable accelerators
    2. spatial programming

    Qualifiers

    • Research-article

