
Profile Guided Dataflow Transformation for FPGAs and CPUs

Authors: Robert Stewart, Deepayan Bhowmik, Andrew Wallace, Greg Michaelson

Journal of Signal Processing Systems, Volume 87, Issue 1

Pages 3 - 20

https://doi.org/10.1007/s11265-015-1044-y

Published: 01 April 2017

Abstract

This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improves CPU performance. Using the CPU transformations, runtime is reduced by 43%. Using the FPGA transformations, clock frequency is increased from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation.
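To put the two headline figures on a common footing, the short sketch below converts them into ratios. It is illustrative arithmetic only and is not taken from the paper: the function names are hypothetical, and reading the clock frequency ratio as a throughput gain assumes the number of cycles per unit of work is unchanged, which the abstract does not state.

```python
def speedup_from_runtime_reduction(reduction_fraction: float) -> float:
    """Speedup factor implied by cutting runtime by the given fraction."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


def frequency_ratio(before_mhz: float, after_mhz: float) -> float:
    """Ratio of the new clock frequency to the old one."""
    if before_mhz <= 0.0:
        raise ValueError("before_mhz must be positive")
    return after_mhz / before_mhz


# Figures quoted in the abstract.
cpu_speedup = speedup_from_runtime_reduction(0.43)  # 43% less runtime
fpga_ratio = frequency_ratio(67.0, 110.0)           # 67 MHz -> 110 MHz

print(f"CPU speedup: {cpu_speedup:.2f}x")        # prints "CPU speedup: 1.75x"
print(f"FPGA clock ratio: {fpga_ratio:.2f}x")    # prints "FPGA clock ratio: 1.64x"
```

Under these assumptions, a 43% runtime reduction corresponds to roughly a 1.75x speedup, and the clock frequency rises by a factor of roughly 1.64.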



Published In

Journal of Signal Processing Systems, Volume 87, Issue 1, April 2017 (172 pages)

ISSN: 1939-8018

EISSN: 1939-8115

Copyright © 2017 Springer Science+Business Media New York.

Publisher: Kluwer Academic Publishers, United States
