The Agamid design-space exploration framework

Authors:

Daniel Gregorek,

Alberto Garcia-Ortiz

Design Automation for Embedded Systems, Volume 22, Issue 4

Pages 293 - 314

https://doi.org/10.1007/s10617-018-9214-3

Published: 01 December 2018

Abstract

The emergence of many-core processors places new demands on system design. Power limitations and abundant parallelism call for efficient and scalable run-time management, and the integration of dedicated hardware to speed up the run-time management system is gaining importance. However, designing a run-time manager (RTM) for many-core systems generally suffers from excessive evaluation time. Previous simulation frameworks either lack the required flexibility or do not achieve a reasonable evaluation time. We propose Agamid, a novel simulation framework that fosters the development and evaluation of hardware-enhanced run-time management for many-core systems. Our transaction-level framework evaluates a design point of such a system on a timescale of seconds. We use a hybrid simulation approach that models the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager for comparing arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager supports direct execution on the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extensible system-call interface allows arbitrary interaction between the user application and the run-time management system. Thorough calibration of the RTM timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-clock time and design-space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, the hardware architecture and the user application into a single transaction-level framework.
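To make the idea of evaluating design points against a task-graph workload more concrete, the following is a minimal sketch. It is not Agamid's implementation and is not taken from the paper: the function `simulate`, its parameters, the single serialized manager, and the fixed per-dispatch cost `rtm_overhead` are all assumptions made for illustration. The sketch schedules a small fork-join task graph on a given number of cores and reports the makespan, so that the effect of core count and management overhead can be compared.

```python
import heapq


def simulate(exec_time, preds, n_cores, rtm_overhead):
    """Toy event-driven run of a task graph (assumed to be acyclic).

    exec_time    : dict mapping task name -> execution time
    preds        : dict mapping task name -> set of predecessor tasks
    n_cores      : number of identical cores
    rtm_overhead : hypothetical fixed cost per dispatch; the manager is
                   modeled as one resource that dispatches one task at a time
    Returns the makespan (finish time of the last task).
    """
    # Count unfinished predecessors and build successor lists.
    pending = {t: len(preds.get(t, ())) for t in exec_time}
    succs = {t: [] for t in exec_time}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)

    ready = sorted(t for t, n in pending.items() if n == 0)  # FIFO ready queue
    running = []        # min-heap of (finish_time, task)
    free_cores = n_cores
    now = 0.0
    rtm_free_at = 0.0   # time at which the manager can start its next dispatch
    done = 0

    while done < len(exec_time):
        # Dispatch ready tasks while cores are available.
        while ready and free_cores > 0:
            task = ready.pop(0)
            start = max(now, rtm_free_at) + rtm_overhead
            rtm_free_at = start
            heapq.heappush(running, (start + exec_time[task], task))
            free_cores -= 1
        # Advance simulated time to the next task completion.
        now, finished = heapq.heappop(running)
        free_cores += 1
        done += 1
        for s in succs[finished]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return now


# Fork-join workload: A fans out to B, C, D, which join in E.
exec_time = {"A": 2, "B": 4, "C": 4, "D": 4, "E": 2}
preds = {"B": {"A"}, "C": {"A"}, "D": {"A"}, "E": {"B", "C", "D"}}

# Sweep a tiny design space: core count x management overhead.
for cores in (1, 3):
    for overhead in (0, 1):
        print(cores, overhead, simulate(exec_time, preds, cores, overhead))
```

In this toy model the dispatch cost is paid serially by one manager, so it grows in relative weight as more cores become available. This is only meant to illustrate why management overhead is worth modeling alongside the application; the calibrated RTM timing model described in the abstract is far more detailed.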

ISSN:0929-5585

Copyright © 2018 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers

United States
