
Profile Guided Dataflow Transformation for FPGAs and CPUs

Published: 01 April 2017

Abstract

This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions needed to identify opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before they are lowered to HDL, and also improves CPU performance. The CPU transformations reduce runtime by 43%, and the FPGA transformations increase clock frequency from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as a computation model amenable to highly effective FPGA optimisation.
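The two headline figures can be restated in a common form with some simple arithmetic (our own reading of the stated numbers, not a result reported in the paper): the clock-frequency gain is roughly a 1.64× increase, which shortens the clock period from about 14.9 ns to about 9.1 ns, and a 43% runtime reduction corresponds to a speedup of about 1.75×. The frequency gain translates into an equal throughput gain only if the number of cycles per frame stays the same after transformation, which the abstract does not state.

```latex
\[
\frac{f_{\text{new}}}{f_{\text{old}}} = \frac{110\,\text{MHz}}{67\,\text{MHz}} \approx 1.64,
\qquad
T_{\text{old}} = \frac{1}{67\,\text{MHz}} \approx 14.9\,\text{ns},
\qquad
T_{\text{new}} = \frac{1}{110\,\text{MHz}} \approx 9.1\,\text{ns}
\]
\[
t_{\text{new}} = (1 - 0.43)\,t_{\text{old}} = 0.57\,t_{\text{old}}
\quad\Rightarrow\quad
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{0.57} \approx 1.75
\]
```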


Published In

Journal of Signal Processing Systems, Volume 87, Issue 1
April 2017
172 pages
ISSN:1939-8018
EISSN:1939-8115

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. CPU
  2. Dataflow
  3. FPGA
  4. Profiling
  5. Transformations

Qualifiers

  • Article

Cited By

  • (2023) Scalable Actor Networks with CAL. Proceedings of the 21st ACM-IEEE International Conference on Formal Methods and Models for System Design, pp. 169-179. DOI: 10.1145/3610579.3611074. Online publication date: 21-Sep-2023.
  • (2022) Optimizing data reshaping operations in functional IRs for high-level synthesis. Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 61-72. DOI: 10.1145/3519941.3535069. Online publication date: 14-Jun-2022.
  • (2022) Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators. ACM Transactions on Architecture and Code Optimization, 19(2), pp. 1-26. DOI: 10.1145/3501768. Online publication date: 31-Jan-2022.
  • (2019) High-level synthesis of functional patterns with Lift. Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, pp. 35-45. DOI: 10.1145/3315454.3329957. Online publication date: 8-Jun-2019.
  • (2018) RIPL. ACM Transactions on Reconfigurable Technology and Systems, 11(1), pp. 1-24. DOI: 10.1145/3180481. Online publication date: 14-Mar-2018.
