research-article

Microarchitectural Comparison of the MXP and Octavo Soft-Processor FPGA Overlays

Published: 27 May 2017

Abstract

Field-Programmable Gate Arrays (FPGAs) can yield higher performance and lower power than software solutions on CPUs or GPUs. However, designing with FPGAs requires specialized hardware design skills and hours-long CAD processing times. To reduce and accelerate the design effort, we can implement an overlay architecture on the FPGA, on which we can then more easily construct the desired system, but at a large cost in performance and area relative to a direct FPGA implementation. In this work, we compare the micro-architecture, performance, and area of two soft-processor overlays: the Octavo multi-threaded soft-processor and the MXP soft vector processor. To measure the area and performance penalties of these overlays relative to the underlying FPGA hardware, we compare them against direct FPGA implementations of the micro-benchmarks, written both in C synthesized with the LegUp HLS tool and in the Verilog HDL. Overall, Octavo’s higher operating frequency and MXP’s more efficient code execution result in similar performance from both, within an order of magnitude of direct FPGA implementations, but with a penalty of an order of magnitude greater area.

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 3
September 2017
187 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3102109
  • Editor: Steve Wilton
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2017
Accepted: 01 February 2017
Revised: 01 January 2017
Received: 01 January 2016
Published in TRETS Volume 10, Issue 3

Author Tags

  1. Benchmarking
  2. FPGA
  3. multi-threading
  4. overlay
  5. soft-processor
  6. vector

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSERC
  • University of Toronto ECE Department
  • Queen Elizabeth II World Telecommunication Congress Graduate Scholarship in Science and Technology
  • Walter C. Sumner Foundation

Cited By

  • (2022) Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation. IEEE Transactions on Parallel and Distributed Systems 33:6 (1478-1490). DOI: 10.1109/TPDS.2021.3116859. Online publication date: 1-Jun-2022
  • (2022) An efficient FPGA overlay for MPI-2 RMA parallel applications. 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS) (412-416). DOI: 10.1109/NEWCAS52662.2022.9842139. Online publication date: 19-Jun-2022
  • (2022) Packed SIMD Vectorization of the DRAGON2-CB. 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) (85-92). DOI: 10.1109/MCSoC57363.2022.00023. Online publication date: Dec-2022
  • (2021) A Highly-Efficient and Tightly-Connected Many-Core Overlay Architecture. IEEE Access 9 (65277-65292). DOI: 10.1109/ACCESS.2021.3074171. Online publication date: 2021
  • (2019) Time-Multiplexed FPGA Overlay Architectures. ACM Transactions on Design Automation of Electronic Systems 24:5 (1-19). DOI: 10.1145/3339861. Online publication date: 23-Jul-2019
  • (2018) A time-multiplexed FPGA overlay with linear interconnect. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1075-1080). DOI: 10.23919/DATE.2018.8342171. Online publication date: Mar-2018
