Research article, DOI: 10.1145/3385412.3385983

Type-directed scheduling of streaming accelerators

Published: 11 June 2020

Abstract

Designing efficient, application-specialized hardware accelerators requires assessing trade-offs between a hardware module’s performance and resource requirements. To facilitate hardware design space exploration, we describe Aetherling, a system for automatically compiling data-parallel programs into statically scheduled, streaming hardware circuits. Aetherling contributes a space- and time-aware intermediate language featuring data-parallel operators that represent parallel or sequential hardware modules, and sequence data types that encode a module’s throughput by specifying when sequence elements are produced or consumed. As a result, well-typed operator composition in the space-time language corresponds to connecting hardware modules via statically scheduled, streaming interfaces.
We provide rules for transforming programs written in a standard data-parallel language (that carries no information about hardware implementation) into equivalent space-time language programs. We then provide a scheduling algorithm that searches over the space of transformations to quickly generate area-efficient hardware designs that achieve a programmer-specified throughput. Using benchmarks from the image processing domain, we demonstrate that Aetherling enables rapid exploration of hardware designs with different throughput and area characteristics, and yields results that require 1.8-7.9× fewer FPGA slices than those of prior hardware generation systems.
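The two ideas in the abstract, types that encode throughput and a search for the smallest design that meets a throughput target, can be illustrated with a toy model. The sketch below is not Aetherling's language or its scheduling algorithm: `SeqType`, `Module`, `compose`, `map_variant`, `schedule`, and the area numbers are all hypothetical names and values invented for illustration. It assumes a pipeline of element-wise stages over `n` elements, each built with the same number of parallel lanes.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SeqType:
    """Toy sequence type: `valid` elements delivered over `clocks` cycles."""
    valid: int
    clocks: int

    @property
    def throughput(self) -> Fraction:
        return Fraction(self.valid, self.clocks)  # elements per clock


@dataclass(frozen=True)
class Module:
    name: str
    in_type: SeqType
    out_type: SeqType
    area: int  # made-up cost units, not FPGA slices


def compose(f: Module, g: Module) -> Module:
    """Connect f's output to g's input; mismatched interfaces are a type error."""
    if f.out_type != g.in_type:
        raise TypeError(f"{f.name} produces {f.out_type}, {g.name} expects {g.in_type}")
    return Module(f"{f.name} >>> {g.name}", f.in_type, g.out_type, f.area + g.area)


def map_variant(name: str, n: int, lanes: int, unit_area: int) -> Module:
    """Element-wise stage over n elements using `lanes` parallel copies."""
    assert n % lanes == 0, "lanes must divide n"
    t = SeqType(n, n // lanes)
    return Module(f"{name}[x{lanes}]", t, t, unit_area * lanes)


def schedule(n: int, target: Fraction, stages: list[tuple[str, int]]):
    """Exhaustively try every lane count; keep the smallest design meeting target."""
    best = None
    for lanes in (d for d in range(1, n + 1) if n % d == 0):
        mods = [map_variant(name, n, lanes, area) for name, area in stages]
        pipeline = mods[0]
        for m in mods[1:]:
            pipeline = compose(pipeline, m)
        meets = pipeline.out_type.throughput >= target
        if meets and (best is None or pipeline.area < best.area):
            best = pipeline
    return best


design = schedule(8, Fraction(2), [("add", 3), ("mul", 10)])
```

Under these assumptions, asking for 2 elements per clock over 8 elements selects the two-lane variant of each stage, since the one-lane design is too slow and the wider designs cost more area. Composing stages built with different lane counts raises `TypeError`, which mirrors the abstract's point that only well-typed compositions correspond to validly connected streaming interfaces.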

Information & Contributors

Information

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2020
1174 pages
ISBN: 9781450376136
DOI: 10.1145/3385412
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGAs
  2. hardware description languages
  3. image processing
  4. scheduling
  5. space-time types

Qualifiers

  • Research-article

Conference

PLDI '20

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Cited By

  • (2024) Allo: A Programming Model for Composable Accelerator Design. Proceedings of the ACM on Programming Languages 8:PLDI (593-620). DOI: 10.1145/3656401. Online publication date: 20-Jun-2024
  • (2024) Modular Hardware Design of Pipelined Circuits with Hazards. Proceedings of the ACM on Programming Languages 8:PLDI (28-51). DOI: 10.1145/3656378. Online publication date: 20-Jun-2024
  • (2024) Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and Synthesis. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (211-222). DOI: 10.1145/3626202.3637561. Online publication date: 1-Apr-2024
  • (2024) Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis Workflow. ACM Transactions on Reconfigurable Technology and Systems 17:1 (1-26). DOI: 10.1145/3626103. Online publication date: 27-Jan-2024
  • (2024) HIDA: A Hierarchical Dataflow Compiler for High-Level Synthesis. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (215-230). DOI: 10.1145/3617232.3624850. Online publication date: 27-Apr-2024
  • (2024) Rubick: A Unified Infrastructure for Analyzing, Exploring, and Implementing Spatial Architectures via Dataflow Decomposition. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:4 (1177-1190). DOI: 10.1109/TCAD.2023.3337208. Online publication date: Apr-2024
  • (2024) SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide. IEEE Access 12 (7563-7583). DOI: 10.1109/ACCESS.2023.3345660. Online publication date: 2024
  • (2023) FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler. ACM Transactions on Architecture and Code Optimization 20:4 (1-25). DOI: 10.1145/3629523. Online publication date: 25-Oct-2023
  • (2023) HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (189-201). DOI: 10.1145/3623278.3624767. Online publication date: 25-Mar-2023
  • (2023) Let Coarse-Grained Resources Be Shared: Mapping Entire Neural Networks on FPGAs. ACM Transactions on Embedded Computing Systems 22:5s (1-23). DOI: 10.1145/3609109. Online publication date: 31-Oct-2023
