Abstract
The emergence of many-core processors places new demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. Integrating dedicated hardware to enhance the performance of the run-time management system is becoming increasingly important. However, the design of a run-time manager for many-core systems generally suffers from excessive evaluation time. Previous work either lacks the required flexibility or does not achieve a reasonable evaluation time for the simulation framework. We propose the novel simulation framework Agamid to foster the development and evaluation of hardware-enhanced run-time management for many-core systems. Our transaction-level framework evaluates a design point of hardware-enhanced run-time management on a timescale of seconds. We use a hybrid simulation approach that models the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager to compare arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager facilitates direct execution on the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extensible system-call interface allows arbitrary interaction between the user application and the run-time management system. The thorough calibration of the run-time manager (RTM) timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-clock time and design-space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, hardware architecture and user application into a single transaction-level framework.
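To make the idea of a task-graph workload with a modeled management overhead concrete, the following is a minimal Python sketch. It is not Agamid's implementation or API: the function `simulate`, its parameters, and the single fixed `rtm_overhead` charged per task dispatch are assumptions chosen for illustration, and the scheduling policy is plain first-come list scheduling on identical cores. The task graph is assumed to be acyclic and `n_cores` to be at least 1.

```python
import heapq


def simulate(durations, deps, n_cores, rtm_overhead):
    """Return the makespan of a task graph under simple list scheduling.

    durations:    dict task -> execution time (abstract time units)
    deps:         dict task -> set of predecessor tasks
    n_cores:      number of identical cores (assumed >= 1)
    rtm_overhead: hypothetical fixed management cost added per dispatch
    """
    # Number of unfinished predecessors per task.
    remaining = {t: len(deps.get(t, ())) for t in durations}
    successors = {t: [] for t in durations}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []  # min-heap of (finish_time, task)
    now = 0.0
    free_cores = n_cores
    done = 0

    while done < len(durations):
        # Dispatch ready tasks while cores are free; each dispatch
        # pays the assumed management overhead before the task runs.
        while ready and free_cores > 0:
            task = ready.pop(0)
            finish = now + rtm_overhead + durations[task]
            heapq.heappush(running, (finish, task))
            free_cores -= 1
        # Advance simulated time to the next task completion.
        now, finished = heapq.heappop(running)
        free_cores += 1
        done += 1
        for succ in successors[finished]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return now


# A small diamond-shaped graph: a -> (b, c) -> d.
durations = {"a": 2, "b": 3, "c": 3, "d": 1}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
for overhead in (0, 1):
    print(overhead, simulate(durations, deps, n_cores=2, rtm_overhead=overhead))
```

Sweeping `n_cores` and `rtm_overhead` in such a model is a toy analogue of comparing design points; a calibrated timing model would replace the single constant with measured costs per management operation.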
Cite this article
Gregorek, D., Garcia-Ortiz, A. The Agamid design-space exploration framework. Des Autom Embed Syst 22, 293–314 (2018). https://doi.org/10.1007/s10617-018-9214-3
DOI: https://doi.org/10.1007/s10617-018-9214-3