research-article

VirtualSoC: A Research Tool for Modern MPSoCs

Authors: Daniele Bortolotti, Andrea Marongiu, and Luca Benini

ACM Transactions on Embedded Computing Systems (TECS), Volume 16, Issue 1

Article No.: 3, Pages 1 - 27

https://doi.org/10.1145/2930665

Published: 13 October 2016

Abstract

In virtually every computing domain, architectural heterogeneity has proven to be an effective design paradigm for meeting an ever-increasing demand for computational power within tight energy budgets. Programmable manycore accelerators are now widely used not only in high-performance computing systems but also in embedded devices, where they operate as coprocessors under the control of a general-purpose CPU (the host processor). Such powerful hardware architectures are paired with sophisticated software ecosystems, comprising operating systems, programming models and their runtime engines, and increasingly complex user applications with their libraries. System modeling has always played a key role in early architectural exploration and in software development when the real hardware is not yet available. Coping efficiently with the huge HW/SW design space offered by these heterogeneous Systems on Chip (SoCs) calls for advanced full-system simulation methodologies and tools capable of assessing a range of metrics for both the functional and the nonfunctional properties of the target system. In this article, we describe VirtualSoC, a tool for the full-system simulation of massively parallel heterogeneous SoCs, and we describe how it has been successfully adopted in several research projects.
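The abstract describes manycore accelerators acting as coprocessors under the control of a host CPU. As a purely illustrative sketch of that arrangement (not VirtualSoC code or its API), the Python below models a synchronous offload: the host stages input data, launches a kernel on an accelerator, and blocks until every core has finished. `ToyAccelerator`, `static_partition`, and `host_offload` are hypothetical names; threads stand in for accelerator cores, and the contiguous chunking is an assumed static schedule. Only functional behavior is modeled, with no timing, memory hierarchy, or interconnect, which is the kind of nonfunctional detail full-system simulation is meant to assess.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def static_partition(n_items: int, n_cores: int) -> List[range]:
    """Split indices [0, n_items) into n_cores contiguous, near-equal chunks
    (an assumed static schedule): the first n_items % n_cores chunks get one extra item."""
    if n_cores < 1:
        raise ValueError("n_cores must be >= 1")
    base, extra = divmod(n_items, n_cores)
    chunks: List[range] = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


class ToyAccelerator:
    """Hypothetical stand-in for a programmable manycore accelerator.

    Threads play the role of accelerator cores; only functional behavior is
    modeled (no timing, memory hierarchy, or interconnect).
    """

    def __init__(self, n_cores: int) -> None:
        if n_cores < 1:
            raise ValueError("n_cores must be >= 1")
        self.n_cores = n_cores

    def run(self, kernel: Callable[[T], R], data: Sequence[T]) -> List[R]:
        out: List[Optional[R]] = [None] * len(data)

        def core_body(chunk: range) -> None:
            # Each "core" writes only its own disjoint slice of the output.
            for i in chunk:
                out[i] = kernel(data[i])

        with ThreadPoolExecutor(max_workers=self.n_cores) as pool:
            # list() waits for every core and re-raises any exception they raised.
            list(pool.map(core_body, static_partition(len(data), self.n_cores)))
        return out  # type: ignore[return-value]


def host_offload(acc: ToyAccelerator, kernel: Callable[[T], R], data: Sequence[T]) -> List[R]:
    """Host side of a synchronous offload: stage input, launch, block, return results."""
    staged = list(data)  # models copying input from host memory to the accelerator
    return acc.run(kernel, staged)


if __name__ == "__main__":
    acc = ToyAccelerator(n_cores=4)
    print(static_partition(10, 4))  # [range(0, 3), range(3, 6), range(6, 8), range(8, 10)]
    print(host_offload(acc, lambda x: 2 * x, range(10)))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The blocking call in `host_offload` mirrors the simplest host/coprocessor contract; a real offload runtime would also manage data movement between separate host and accelerator memories and could complete asynchronously.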

Index Terms

Computer systems organization → Embedded and cyber-physical systems → Embedded systems

Published In

ACM Transactions on Embedded Computing Systems Volume 16, Issue 1

Special Issue on VIPES, Special Issue on ICESS2015 and Regular Papers

February 2017

602 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3008024

Editor:
Sandeep K. Shukla
Indian Institute of Technology, India

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 13 October 2016

Accepted: 01 April 2016

Revised: 01 March 2016

Received: 01 October 2015

Published in TECS Volume 16, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

MULTITHERMAN
P-SOCRATES

Cited By

M. Gorgues Alonso, J. Flich, M. Turki, and D. Bertozzi. 2019. A Low-Latency and Flexible TDM NoC for Strong Isolation in Security-Critical Systems. In 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). 149--156. Online publication date: October 2019.
https://doi.org/10.1109/MCSoC.2019.00029
M. Turki and D. Bertozzi. 2019. An Interconnect-Centric Approach to the Flexible Partitioning and Isolation of Many-Core Accelerators for Fog Computing. In 2019 XXXIV Conference on Design of Circuits and Integrated Systems (DCIS). 1--6. Online publication date: November 2019.
https://doi.org/10.1109/DCIS201949030.2019.8959943
G. Tagliavini, D. Cesarini, and A. Marongiu. 2018. Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking. IEEE Transactions on Parallel and Distributed Systems 29, 9, 2150--2163. Online publication date: 1 September 2018.
https://doi.org/10.1109/TPDS.2018.2814602
J. Zarrin, R. Aguiar, and J. Barraca. 2017. Manycore simulation for peta-scale system design: Motivation, tools, challenges and prospects. Simulation Modelling Practice and Theory 72, 168--201. Online publication date: March 2017.
https://doi.org/10.1016/j.simpat.2016.12.014
