
VirtualSoC: A Research Tool for Modern MPSoCs

Published: 13 October 2016

Abstract

Architectural heterogeneity has proven to be an effective design paradigm to cope with an ever-increasing demand for computational power within tight energy budgets, in virtually every computing domain. Programmable manycore accelerators are currently widely used not only in high-performance computing systems, but also in embedded devices, in which they operate as coprocessors under the control of a general-purpose CPU (the host processor). Clearly, such powerful hardware architectures are paired with sophisticated and complex software ecosystems, composed of operating systems, programming models plus associated runtime engines, and increasingly complex user applications with related libraries. System modeling has always played a key role in early architectural exploration or software development when the real hardware is not available. The necessity of efficiently coping with the huge HW/SW design space provided by the described heterogeneous Systems on Chip (SoCs) calls for advanced full-system simulation methodologies and tools, capable of assessing various metrics for the functional and nonfunctional properties of the target system. In this article, we describe VirtualSoC, a simulation tool targeting the full-system simulation of massively parallel heterogeneous SoCs. We also describe how VirtualSoC has been successfully adopted in several research projects.

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 16, Issue 1
    Special Issue on VIPES, Special Issue on ICESS2015 and Regular Papers
    February 2017
    602 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3008024
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 October 2016
    Accepted: 01 April 2016
    Revised: 01 March 2016
    Received: 01 October 2015
    Published in TECS Volume 16, Issue 1

    Author Tags

    1. SystemC modeling
    2. Virtual platforms
    3. accuracy
    4. full-system simulation
    5. manycore accelerators

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • MULTITHERMAN
    • P-SOCRATES
