Type-directed scheduling of streaming accelerators

Authors: Matthew Feldman, Gilbert Louis Bernstein, Marco Patrignani, Kayvon Fatahalian, Pat Hanrahan

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 408 - 422

https://doi.org/10.1145/3385412.3385983

Published: 11 June 2020

Abstract

Designing efficient, application-specialized hardware accelerators requires assessing trade-offs between a hardware module’s performance and resource requirements. To facilitate hardware design space exploration, we describe Aetherling, a system for automatically compiling data-parallel programs into statically scheduled, streaming hardware circuits. Aetherling contributes a space- and time-aware intermediate language featuring data-parallel operators that represent parallel or sequential hardware modules, and sequence data types that encode a module’s throughput by specifying when sequence elements are produced or consumed. As a result, well-typed operator composition in the space-time language corresponds to connecting hardware modules via statically scheduled, streaming interfaces.
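The idea that sequence types encode throughput, so that only rate-compatible modules compose, can be illustrated with a small sketch. This is not Aetherling's actual intermediate language: the names `SeqType`, `Module`, and `compose`, and the simplification of a sequence type to "a number of elements delivered over a number of clock cycles", are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SeqType:
    """Hypothetical sequence type: `length` elements delivered over `cycles` clocks."""
    length: int
    cycles: int

    @property
    def throughput(self) -> Fraction:
        # Elements per clock cycle, read directly off the type.
        return Fraction(self.length, self.cycles)


@dataclass(frozen=True)
class Module:
    """Hypothetical hardware module with a typed streaming input and output."""
    name: str
    in_type: SeqType
    out_type: SeqType


def compose(first: Module, second: Module) -> Module:
    """Connect two modules; reject the connection if the interface types differ."""
    if first.out_type != second.in_type:
        raise TypeError(
            f"{first.name} produces {first.out_type}, "
            f"but {second.name} consumes {second.in_type}"
        )
    return Module(f"{first.name} >>> {second.name}", first.in_type, second.out_type)


# A fully parallel module: 4 elements in 1 cycle (assumed example values).
map_parallel = Module("map_par", SeqType(4, 1), SeqType(4, 1))
# Two fully sequential modules: 4 elements over 4 cycles.
map_seq_a = Module("map_seq_a", SeqType(4, 4), SeqType(4, 4))
map_seq_b = Module("map_seq_b", SeqType(4, 4), SeqType(4, 4))

pipeline = compose(map_seq_a, map_seq_b)
```

In this sketch, `compose(map_seq_a, map_seq_b)` succeeds because both sides agree on when elements cross the interface, while `compose(map_parallel, map_seq_a)` raises a `TypeError` because the producer and consumer run at different rates.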

We provide rules for transforming programs written in a standard data-parallel language (that carries no information about hardware implementation) into equivalent space-time language programs. We then provide a scheduling algorithm that searches over the space of transformations to quickly generate area-efficient hardware designs that achieve a programmer-specified throughput. Using benchmarks from the image processing domain, we demonstrate that Aetherling enables rapid exploration of hardware designs with different throughput and area characteristics, and yields results that require 1.8-7.9× fewer FPGA slices than those of prior hardware generation systems.
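The scheduling step described above, choosing among transformed designs the one with the least area that still meets a requested throughput, can be sketched as follows. The cost model is a toy assumption rather than the paper's: a map over `n` elements is given one candidate design per divisor `p` of `n`, with throughput `p` elements per cycle and area `p * unit_area`. The names `Design`, `candidate_designs`, and `schedule` are hypothetical.

```python
from fractions import Fraction
from typing import List, NamedTuple, Optional


class Design(NamedTuple):
    """Hypothetical design point for a map over n elements."""
    parallelism: int      # number of operator copies instantiated in space
    throughput: Fraction  # elements processed per clock cycle
    area: int             # toy area estimate, in arbitrary units


def candidate_designs(n: int, unit_area: int) -> List[Design]:
    """Enumerate one design per divisor of n (toy model: area grows linearly)."""
    return [
        Design(p, Fraction(p), p * unit_area)
        for p in range(1, n + 1)
        if n % p == 0
    ]


def schedule(n: int, unit_area: int, target: Fraction) -> Optional[Design]:
    """Return the smallest-area design whose throughput meets the target, if any."""
    feasible = [d for d in candidate_designs(n, unit_area) if d.throughput >= target]
    return min(feasible, key=lambda d: d.area, default=None)


# Assumed example: 8 elements, 10 area units per operator, target 3 elements/cycle.
best = schedule(8, 10, Fraction(3))
```

With these example values the divisors of 8 are 1, 2, 4, and 8, so the cheapest feasible design uses a parallelism of 4. A real scheduler would search a far larger space of rewrites across nested operators; this sketch only shows the shape of the objective.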

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2020

1174 pages

ISBN:9781450376136

DOI:10.1145/3385412

General Chair: Alastair F. Donaldson (Imperial College London, UK)
Program Chair: Emina Torlak (University of Washington, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Air Force Research Laboratory
National Science Foundation
Bundesministerium für Bildung und Forschung
Defense Advanced Research Projects Agency

Conference

PLDI '20

Sponsor: SIGPLAN

PLDI '20: 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 15 - 20, 2020

London, UK

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
