The Molen compiler for reconfigurable processors

Published: 01 February 2007

Abstract

In this paper, we describe the compiler developed to target the Molen reconfigurable processor and programming paradigm. The compiler automatically generates optimized binary code for C applications, based on pragma annotations that mark the code to be executed on the reconfigurable hardware. For the IBM PowerPC 405 processor included in the Virtex II Pro platform FPGA, we implemented code generation, register allocation, and stack frame allocation following the PowerPC EABI (embedded application binary interface). The PowerPC backend has been extended to generate the appropriate instructions for the reconfigurable hardware and for data transfer, taking into account information about the specific hardware implementations and the system. Starting with an annotated C application, a complete design flow has been integrated to generate the executable bitstream for the reconfigurable processor. The flexible design of the proposed infrastructure allows the special features of reconfigurable architectures to be taken into account. In order to hide the reconfiguration latencies, we implemented an instruction-scheduling algorithm for the dynamic hardware configuration instructions. The algorithm schedules the hardware configuration instructions in advance, taking into account the conflicts between hardware operations for the reconfigurable hardware resources (FPGA area). To verify the Molen compiler, we used the multimedia video frame M-JPEG encoder, whose extended discrete cosine transform (DCT*) function was mapped onto the FPGA. We obtained an overall speedup of 2.5 (about 84% efficiency relative to the maximal theoretical speedup of 2.96). This efficiency was achieved with an automatically generated, nonoptimized DCT* hardware implementation. The instruction-scheduling algorithm has been tested for DCT, quantization, and VLC operations. Based on simulation results, we determine that, while a simple scheduling produces a significant performance decrease, our proposed scheduling contributes up to a 16x speedup of the M-JPEG encoder.
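
The efficiency figure quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the second function assumes that the "maximal theoretical speedup" is an Amdahl-style limit, which the abstract does not state.

```python
def efficiency(measured_speedup: float, max_speedup: float) -> float:
    """Fraction of the theoretical maximum speedup that was actually achieved."""
    return measured_speedup / max_speedup


def accelerated_fraction(max_speedup: float) -> float:
    """Share of the original run time spent in the accelerated code.

    ASSUMPTION (not stated in the abstract): the maximum is the Amdahl limit
    1 / (1 - f), reached when the accelerated part takes zero time.
    """
    return 1.0 - 1.0 / max_speedup


if __name__ == "__main__":
    # Values taken from the abstract: measured speedup 2.5, maximum 2.96.
    print(f"efficiency: {efficiency(2.5, 2.96):.1%}")
    print(f"implied accelerated fraction: {accelerated_fraction(2.96):.1%}")
```

With the abstract's numbers, the first ratio is roughly 0.84, consistent with the reported efficiency of about 84%. Under the Amdahl assumption only, a limit of 2.96 would correspond to about two thirds of the software run time being spent in the mapped function.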

Published In

ACM Transactions on Embedded Computing Systems  Volume 6, Issue 1
February 2007
210 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1210268

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. Instruction scheduling
  3. reconfigurable computing

Qualifiers

  • Article
