Research Article
Open Access

On Predictable Reconfigurable System Design

Published: 09 February 2021

Abstract

We propose a design methodology that facilitates rigorous development of complex applications targeting reconfigurable hardware. Our methodology relies on analytical estimation of system performance and area utilisation for a specific application and a particular system instance consisting of a control-flow machine working in conjunction with one or more reconfigurable dataflow accelerators. The targeted application is carefully analysed, and the parts identified for hardware acceleration are reimplemented as a set of representative software models. Next, guided by the results of this analysis, a suitable system architecture is devised and its performance is evaluated to identify bottlenecks, enabling predictable design. The architecture is iteratively refined until a final version is obtained that satisfies the specification requirements for performance and hardware area. We validate the presented methodology using a widely accepted convolutional neural network (VGG-16) and an important HPC application (BQCD). In both cases, our methodology revealed and alleviated all system bottlenecks before hardware implementation was started. As a result, both architectures were implemented right the first time, achieving state-of-the-art performance within 15% of our modelling estimates.
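
The analytical estimation step described above can be captured in a few lines of code. The sketch below is not the authors' tool; all parameter names (ops_per_cycle, mem_bw, dsp_used) and device figures are illustrative assumptions. It shows a first-order, roofline-style estimate that predicts whether a candidate dataflow accelerator instance is compute- or bandwidth-bound and whether it fits a hypothetical DSP budget, which is the kind of bottleneck check the methodology performs before any hardware is built.

    # Minimal sketch of an analytical performance/area model (assumed parameters).
    from dataclasses import dataclass

    @dataclass
    class Workload:
        ops: float            # total arithmetic operations in the kernel
        bytes_moved: float    # off-chip data volume in bytes

    @dataclass
    class Accelerator:
        clock_hz: float       # target clock frequency
        ops_per_cycle: float  # parallel operations sustained per cycle
        mem_bw: float         # usable off-chip bandwidth in bytes/s
        dsp_used: int         # DSP blocks required by this instance
        dsp_available: int    # DSP blocks on the target device

    def estimate(workload: Workload, acc: Accelerator) -> dict:
        """Predict runtime, the limiting resource, and whether the design fits."""
        t_compute = workload.ops / (acc.clock_hz * acc.ops_per_cycle)
        t_memory = workload.bytes_moved / acc.mem_bw
        return {
            "runtime_s": max(t_compute, t_memory),
            "bottleneck": "compute" if t_compute >= t_memory else "memory",
            "fits": acc.dsp_used <= acc.dsp_available,
        }

    # Example: a VGG-16-sized convolution layer on a hypothetical device.
    layer = Workload(ops=3.7e9, bytes_moved=1.2e8)
    dfe = Accelerator(clock_hz=200e6, ops_per_cycle=2048,
                      mem_bw=38e9, dsp_used=1800, dsp_available=2000)
    print(estimate(layer, dfe))

In an iterative refinement loop, such an estimate would be recomputed for each candidate architecture until the predicted runtime and resource usage meet the specification, mirroring the predictable-design flow the paper describes.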


Cited By

  • (2024) ExaFlexHH: an exascale-ready, flexible multi-FPGA library for biologically plausible brain simulations. Frontiers in Neuroinformatics 18. https://doi.org/10.3389/fninf.2024.1330875. Online publication date: 12-Apr-2024.
  • (2023) Distributed large-scale graph processing on FPGAs. Journal of Big Data 10:1. https://doi.org/10.1186/s40537-023-00756-x. Online publication date: 4-Jun-2023.
  • (2023) FPGA Acceleration for HPC Supercapacitor Simulations. Proceedings of the Platform for Advanced Scientific Computing Conference, 1-11. https://doi.org/10.1145/3592979.3593419. Online publication date: 26-Jun-2023.



Published In

ACM Transactions on Architecture and Code Optimization  Volume 18, Issue 2
June 2021
190 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3450354
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2021
Accepted: 01 November 2020
Revised: 01 October 2020
Received: 01 June 2020
Published in TACO Volume 18, Issue 2


Author Tags

  1. FPGA
  2. analytical model
  3. architecture
  4. development methodology
  5. performance model
  6. reconfigurable systems

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Support from Maxeler, Intel, and Xilinx is gratefully acknowledged
  • United Kingdom EPSRC

