Research Article
Open Access

On Predictable Reconfigurable System Design

Published: 09 February 2021

Abstract

We propose a design methodology that facilitates rigorous development of complex applications targeting reconfigurable hardware. Our methodology relies on analytical estimation of system performance and area utilisation for a specific application and a particular system instance consisting of a control-flow machine working in conjunction with one or more reconfigurable dataflow accelerators. The targeted application is carefully analysed, and the parts identified for hardware acceleration are reimplemented as a set of representative software models. Next, guided by the results of this analysis, a suitable system architecture is devised and its performance is evaluated to identify bottlenecks, enabling predictable design. The architecture is iteratively refined until a final version is obtained that satisfies the specification requirements for performance and hardware area. We validate the presented methodology using a widely accepted convolutional neural network (VGG-16) and an important HPC application (BQCD). In both cases, our methodology revealed and alleviated all system bottlenecks before hardware implementation was started. As a result, both architectures were implemented right the first time, achieving state-of-the-art performance within 15% of our modelling estimates.
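
The analytical estimation step described above can be captured in a few lines of code. The sketch below is not the authors' tool; all parameter names (ops_per_cycle, mem_bw, dsp_used) and device figures are illustrative assumptions. It shows a first-order, roofline-style estimate that predicts whether a candidate dataflow accelerator instance is compute- or bandwidth-bound and whether it fits a hypothetical DSP budget, which is the kind of bottleneck check the methodology performs before any hardware is built.

    # Minimal sketch of an analytical performance/area model (assumed parameters).
    from dataclasses import dataclass

    @dataclass
    class Workload:
        ops: float            # total arithmetic operations in the kernel
        bytes_moved: float    # off-chip data volume in bytes

    @dataclass
    class Accelerator:
        clock_hz: float       # target clock frequency
        ops_per_cycle: float  # parallel operations sustained per cycle
        mem_bw: float         # usable off-chip bandwidth in bytes/s
        dsp_used: int         # DSP blocks required by this instance
        dsp_available: int    # DSP blocks on the target device

    def estimate(workload: Workload, acc: Accelerator) -> dict:
        """Predict runtime, the limiting resource, and whether the design fits."""
        t_compute = workload.ops / (acc.clock_hz * acc.ops_per_cycle)
        t_memory = workload.bytes_moved / acc.mem_bw
        return {
            "runtime_s": max(t_compute, t_memory),
            "bottleneck": "compute" if t_compute >= t_memory else "memory",
            "fits": acc.dsp_used <= acc.dsp_available,
        }

    # Example: a VGG-16-sized convolution layer on a hypothetical device.
    layer = Workload(ops=3.7e9, bytes_moved=1.2e8)
    dfe = Accelerator(clock_hz=200e6, ops_per_cycle=2048,
                      mem_bw=38e9, dsp_used=1800, dsp_available=2000)
    print(estimate(layer, dfe))

In an iterative refinement loop, such an estimate would be recomputed for each candidate architecture until the predicted runtime and resource usage meet the specification, mirroring the predictable-design flow the paper describes.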


Cited By

  • (2024) ExaFlexHH: an exascale-ready, flexible multi-FPGA library for biologically plausible brain simulations. Frontiers in Neuroinformatics 18. https://doi.org/10.3389/fninf.2024.1330875. Online publication date: 12-Apr-2024.
  • (2023) Distributed large-scale graph processing on FPGAs. Journal of Big Data 10:1. https://doi.org/10.1186/s40537-023-00756-x. Online publication date: 4-Jun-2023.
  • (2023) FPGA Acceleration for HPC Supercapacitor Simulations. Proceedings of the Platform for Advanced Scientific Computing Conference, 1-11. https://doi.org/10.1145/3592979.3593419. Online publication date: 26-Jun-2023.



Published In

ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 2
June 2021
190 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3450354
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2021
Accepted: 01 November 2020
Revised: 01 October 2020
Received: 01 June 2020
Published in TACO Volume 18, Issue 2


Author Tags

  1. FPGA
  2. analytical model
  3. architecture
  4. development methodology
  5. performance model
  6. reconfigurable systems

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Support from Maxeler, Intel, and Xilinx is gratefully acknowledged
  • United Kingdom EPSRC

