research-article

Microarchitectural Comparison of the MXP and Octavo Soft-Processor FPGA Overlays

Authors:

Charles Eric LaForest,

Jason H. Anderson

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 10, Issue 3

Article No. 19, pages 1-25

https://doi.org/10.1145/3053679

Published: 27 May 2017

Abstract

Field-Programmable Gate Arrays (FPGAs) can yield higher performance and lower power than software running on CPUs or GPUs. However, designing with FPGAs requires specialized hardware design skills and hours-long CAD processing times. To reduce the design effort and speed up development, we can implement an overlay architecture on the FPGA and then construct the desired system on that overlay more easily, but at a large cost in performance and area relative to a direct FPGA implementation. In this work, we compare the microarchitecture, performance, and area of two soft-processor overlays: the Octavo multi-threaded soft-processor and the MXP soft vector processor. To measure the area and performance penalties of these overlays relative to the underlying FPGA hardware, we compare them against two kinds of direct FPGA implementation of the same micro-benchmarks: C code synthesized with the LegUp HLS tool, and designs written directly in the Verilog HDL. Overall, Octavo's higher operating frequency and MXP's more efficient code execution result in similar performance from both overlays, within an order of magnitude of the direct FPGA implementations, but at an area penalty of an order of magnitude.
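The performance parity described in the final sentence follows from the basic relation between run time, cycle count, and clock frequency: run time equals the number of cycles executed divided by the clock frequency. The minimal Python sketch below illustrates this relation. The helper `runtime_us` is hypothetical, and all numbers are made up for illustration; they are not measurements from this work. An overlay that clocks three times faster but needs three times as many cycles finishes in the same time as a slower but more code-efficient one.

```python
def runtime_us(cycles: int, fmax_mhz: float) -> float:
    """Hypothetical helper: wall-clock run time in microseconds.

    With the clock in MHz (cycles per microsecond), run time is cycles / fmax.
    """
    return cycles / fmax_mhz


# Made-up numbers for illustration only, not results from the paper.
# Overlay A: high clock frequency, but executes more cycles for the same task.
t_a = runtime_us(cycles=3000, fmax_mhz=600.0)
# Overlay B: lower clock frequency, but more efficient code (fewer cycles).
t_b = runtime_us(cycles=1000, fmax_mhz=200.0)

print(t_a, t_b)  # both 5.0 microseconds
```

In this sketch, neither frequency nor cycle count alone determines performance; only their ratio does.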




Published In


ACM Transactions on Reconfigurable Technology and Systems Volume 10, Issue 3

September 2017

187 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/3102109

Editor:
Steve Wilton
Department of Electrical and Computer Engineering/University of British Columbia/Kaiser 4112, 5500-2332 Main Mall/Vancouver, BC V6T 1Z4 Canada


Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2017

Accepted: 01 February 2017

Revised: 01 January 2017

Received: 01 January 2016

Published in TRETS Volume 10, Issue 3


Qualifiers

Research-article
Research
Refereed

Funding Sources

NSERC
University of Toronto ECE Department
Queen Elizabeth II World Telecommunication Congress Graduate Scholarship in Science and Technology
Walter C. Sumner Foundation


