Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3192366.3192379acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article
Public Access

Spatial: a language and compiler for application accelerators

Published: 11 June 2018 Publication History

Abstract

Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult.
In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.

Supplementary Material

WEBM File (p296-koeplinger.webm)

References

[1]
2015. MyHDL. http://www.myhdl.org/.
[2]
2015. Vivado design suite 2015.1 user guide.
[3]
2016. Vivado High-Level Synthesis. http://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html.
[4]
2017. EC2 F1 Instances with FPGAs Now Generally Available. aws.amazon.com/blogs/aws/ec2-f1-instances-with-fpgas-now-generally-available/.
[5]
2017. Intel FPGA SDK for OpenCL. https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html.
[6]
2017. Neon 2.0: Optimized for Intel Architectures. https://www.intelnervana.com/neon-2-0-optimized-for-intel-architectures/.
[7]
2017. Wave Computing Launches Machine Learning Appliance. https://www.top500.org/news/wave-computing-launches-machine-learning-appliance/.
[8]
Arvind. 2003. Bluespec: A Language for Hardware Design, Simulation, Synthesis and Verification. Invited Talk. In Proceedings of the First ACM and IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE '03). IEEE Computer Society, Washington, DC, USA, 249-. http://dl.acm.org/citation.cfm?id=823453.823860
[9]
J. Bachrach, Huy Vo, B. Richards, Yunsup Lee, A. Waterman, R. Avizienis, J. Wawrzynek, and K. Asanovic. 2012. Chisel: Constructing hardware in a Scala embedded language. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE. 1212-1221.
[10]
David Bacon, Rodric Rabbah, and Sunil Shukla. 2013. FPGA Programming for the Masses. Queue 11, 2, Article 40 (Feb. 2013), 13 pages.
[11]
Bruno Bodin, Luigi Nardi, M. Zeeshan Zia, Harry Wagstaff, Govind Sreekar Shenoy, Murali Emani, John Mawer, Christos Kotselidis, Andy Nisbet, Mikel Lujan, Björn Franke, Paul H.J. Kelly, and Michael O'Boyle. 2016. Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding. In PACT.
[12]
Andrew Canis, Jongsok Choi, Mark Aldham, Victor Zhang, Ahmed Kammoona, Jason H. Anderson, Stephen Brown, and Tomasz Czajkowski. 2011. LegUp: High-level Synthesis for FPGA-based Processor/ Accelerator Systems. In Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '11). ACM, New York, NY, USA, 33-36.
[13]
C. Cascaval, S. Chatterjee, H. Franke, K. J. Gildea, and P. Pattnaik. 2010. A taxonomy of accelerator architectures and their programming models. IBM Journal of Research and Development 54, 5 (Sept 2010), 5:1-5:10.
[14]
Nitin Chugh, Vinay Vasista, Suresh Purini, and Uday Bondhugula. 2016. A DSL compiler for accelerating image processing pipelines on FPGAs. In Parallel Architecture and Compilation Techniques (PACT), 2016 International Conference on. IEEE, 327-338.
[15]
Bjorn De Sutter, Praveen Raghavan, and Andy Lambrechts. 2013. Coarse-Grained Reconfigurable Array Architectures. Springer New York, New York, NY, 553-592.
[16]
Kayvon Fatahalian, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken, William J. Dally, and Pat Hanrahan. 2006. Sequoia: Programming the Memory Hierarchy. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06). ACM, New York, NY, USA, Article 83.
[17]
V. Govindaraju, C. H. Ho, T. Nowatzki, J. Chhugani, N. Satish, K. Sankaralingam, and C. Kim. 2012. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing. IEEE Micro 32, 5 (Sept 2012), 38-51.
[18]
Prabhat K. Gupta. 2015. Xeon+FPGA Platform for the Data Center. http://www.ece.cmu.edu/~calcm/carl/lib/exe/fetch.php?media=carl15-gupta.pdf.
[19]
James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. 2014. Darkroom: compiling high-level image processing code into hardware pipelines. ACM Trans. Graph. 33, 4 (2014), 144-1.
[20]
James Hegarty, Ross Daly, Zachary DeVito, Jonathan Ragan-Kelley, Mark Horowitz, and Pat Hanrahan. 2016. Rigel: Flexible multi-rate image processing hardware. ACM Transactions on Graphics (TOG) 35, 4 (2016), 85.
[21]
Intel. 2015. Advanced NAND Flash Memory Single-Chip Storage Solution. www.altera.com/b/nand-flash-memory-controller.html?_ga=2.108749825.2041564619.1502344247-21903935.1501673108.
[22]
David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, and Kunle Olukotun. 2016. Automatic Generation of Efficient Accelerators for Reconfigurable Hardware. In International Symposium in Computer Architecture (ISCA).
[23]
Yanqiang Liu, Yao Li, Weilun Xiong, Meng Lai, Cheng Chen, Zhengwei Qi, and Haibing Guan. 2017. Scala Based FPGA Design Flow. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 286-286.
[24]
Maxeler Technologies. 2011. MaxCompiler white paper.
[25]
Richard Membarth, Oliver Reiche, Frank Hannig, Jürgen Teich, Mario Körner, and Wieland Eckert. 2016. Hipa cc: A domain-specific language and compiler for image processing. IEEE Transactions on Parallel and Distributed Systems 27, 1 (2016), 210-224.
[26]
Razvan Nane, Vlad-Mihai Sima, Christian Pilato, Jongsok Choi, Blair Fort, Andrew Canis, Yu Ting Chen, Hsuan Hsiao, Stephen Brown, Fabrizio Ferrandi, et al. 2016. A survey and evaluation of fpga high-level synthesis tools. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35, 10 (2016), 1591-1604.
[27]
Luigi Nardi, Bruno Bodin, Sajad Saeedi, Emanuele Vespa, Andrew J. Davison, and Paul H. J. Kelly. 2017. Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications Using HyperMapper. In iWAPTIPDPS. http://arxiv.org/abs/1702.00505
[28]
Luigi Nardi, Bruno Bodin, M Zeeshan Zia, John Mawer, Andy Nisbet, Paul HJ Kelly, Andrew J Davison, Mikel Luján, Michael FP O'Boyle, Graham Riley, et al. 2015. Introducing SLAMBench, a Performance and Accuracy Benchmarking Methodology for SLAM. In ICRA.
[29]
Jian Ouyang, Shiding Lin, Wei Qi, Yong Wang, Bo Yu, and Song Jiang. 2014. SDA: Software-Defined Accelerator for LargeScale DNN Systems (Hot Chips 26).
[30]
Angshuman Parashar, Michael Pellauer, Michael Adler, Bushra Ahsan, Neal Crago, Daniel Lustig, Vladimir Pavlov, Antonia Zhai, Mohit Gambhir, Aamer Jaleel, Randy Allmon, Rachid Rayess, Stephen Maresh, and Joel Emer. 2013. Triggered Instructions: A Control Paradigm for Spatially-programmed Architectures. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 142-153.
[31]
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W Keckler, and William J Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 27-40.
[32]
Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matthew Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2017. Plasticine: A Reconfigurable Architecture For Parallel Paterns. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24-28, 2017. 389-402.
[33]
Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, and Mark Horowitz. 2016. Programming Heterogeneous Systems from an Image Processing DSL. CoRR abs/1610.09405 (2016). arXiv:1610.09405 http://arxiv.org/abs/1610.09405
[34]
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger. 2014. A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 13-24. http://dl.acm.org/citation.cfm?id=2665671.2665678
[35]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13). ACM, New York, NY, USA, 519-530.
[36]
Sajad Saeedi, Luigi Nardi, Edward Johns, Bruno Bodin, Paul Kelly, and Andrew Davison. 2017. Application-oriented Design Space Exploration for SLAM Algorithms. In ICRA.
[37]
Ofer Shacham. 2011. Chip multiprocessor generator: automatic generation of custom and heterogeneous compute platforms. Stanford University.
[38]
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks. 2014. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on. IEEE, 97-108.
[39]
Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2014. Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages. In TECS'14: ACM Transactions on Embedded Computing Systems.
[40]
Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 13-26.
[41]
Yuxin Wang, Peng Li, and Jason Cong. 2014. Theory and Algorithm for Generalized Memory Partitioning in High-level Synthesis. In Proceedings of the 2014 ACM/SIGDA International Symposium on Fieldprogrammable Gate Arrays (FPGA '14). ACM, New York, NY, USA, 199-208.
[42]
Xilinx. 2014. The Xilinx SDAccel Development Environment. https://www.xilinx.com/publications/prod_mktg/sdx/sdaccel-backgrounder.pdf.
[43]
Xilinx. 2017. HLS Pragmas. https://www.xilinx.com/html_docs/xilinx2017_2/sdaccel_doc/topics/pragmas/concept-Intro_to_HLS_pragmas.html.
[44]
Xilinx. 2017. SDAccel DATAFLOW pragma. https://www.xilinx.com/html_docs/xilinx2017_2/sdaccel_doc/topics/pragmas/ref-pragma_HLS_dataflow.html.
[45]
Xilinx. 2017. SDAccel Example Repository. https://github.com/Xilinx/SDAccel_Examples.

Cited By

View all
  • (2024)Unifying Static and Dynamic Intermediate Languages for Accelerator GeneratorsProceedings of the ACM on Programming Languages10.1145/36897908:OOPSLA2(2242-2267)Online publication date: 8-Oct-2024
  • (2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2018
825 pages
ISBN:9781450356985
DOI:10.1145/3192366
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. CGRAs
  2. FPGAs
  3. compilers
  4. domain-specific languages
  5. hardware accelerators
  6. high-level synthesis
  7. reconfigurable architectures

Qualifiers

  • Research-article

Funding Sources

Conference

PLDI '18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)777
  • Downloads (Last 6 weeks)97
Reflects downloads up to 16 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Unifying Static and Dynamic Intermediate Languages for Accelerator GeneratorsProceedings of the ACM on Programming Languages10.1145/36897908:OOPSLA2(2242-2267)Online publication date: 8-Oct-2024
  • (2024)Wavefront Threading Enables Effective High-Level SynthesisProceedings of the ACM on Programming Languages10.1145/36564208:PLDI(1066-1090)Online publication date: 20-Jun-2024
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • (2024)Using Source-to-Source to Target RISC-V Custom Extensions: UVE Case-StudyProceedings of the 16th Workshop on Rapid Simulation and Performance Evaluation for Design10.1145/3642921.3642930(42-50)Online publication date: 18-Jan-2024
  • (2024)Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUsCompanion of the 15th ACM/SPEC International Conference on Performance Engineering10.1145/3629527.3651428(14-20)Online publication date: 7-May-2024
  • (2024)POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor ComputationsProceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3626202.3637566(199-210)Online publication date: 1-Apr-2024
  • (2024)Cement: Streamlining FPGA Hardware Design with Cycle-Deterministic eHDL and SynthesisProceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3626202.3637561(211-222)Online publication date: 1-Apr-2024
  • (2024)FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine LearningProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651336(349-366)Online publication date: 27-Apr-2024
  • (2024)HIDA: A Hierarchical Dataflow Compiler for High-Level SynthesisProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624850(215-230)Online publication date: 27-Apr-2024
  • (2024)A Machine Learning-Based Toolbox for P4 Programmable Data-PlanesIEEE Transactions on Network and Service Management10.1109/TNSM.2024.340207421:4(4450-4465)Online publication date: Aug-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media