The Agamid design-space exploration framework

Published: 01 December 2018

Abstract

The emergence of many-core processors places new demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. Integrating dedicated hardware to enhance the performance of the run-time management system is becoming increasingly important. However, the design of a run-time manager for many-core systems generally suffers from long evaluation times. Previous works either lack the required flexibility or do not achieve a reasonable evaluation time for the simulation framework. We propose Agamid, a novel simulation framework that fosters the development and evaluation of hardware-enhanced run-time management for many-core systems. Our transaction-level framework evaluates a design point of hardware-enhanced run-time management on a timescale of seconds. We use a hybrid simulation approach that models the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager for comparing arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager supports direct execution on the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extensible system-call interface allows arbitrary interaction between the user application and the run-time management system. Thorough calibration of the run-time manager (RTM) timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall-clock time, and design-space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, hardware architecture, and user application into a single transaction-level framework.
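To make the idea of a task-graph workload with a modeled management overhead more concrete, here is a minimal sketch. It is not Agamid's implementation or API. The function `simulate`, the parameter `rtm_overhead`, the greedy first-come dispatch policy, and the example graph are all assumptions chosen for illustration. The sketch charges a fixed overhead on the executing core for every task dispatch and reports the resulting makespan.

```python
import heapq

def simulate(tasks, deps, num_cores, rtm_overhead):
    """Return the makespan of a task graph (must be a DAG) on num_cores cores.

    tasks:        dict mapping task name -> execution time (abstract units)
    deps:         dict mapping task name -> set of predecessor task names
    rtm_overhead: hypothetical fixed management cost charged per dispatch
    """
    # Count unfinished predecessors and build the successor lists.
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    successors = {t: [] for t in tasks}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []            # min-heap of (finish_time, task)
    free_cores = num_cores
    now = 0.0
    finished = 0

    while finished < len(tasks):
        # Dispatch ready tasks in arrival order while cores are free.
        while ready and free_cores > 0:
            task = ready.pop(0)
            finish = now + rtm_overhead + tasks[task]
            heapq.heappush(running, (finish, task))
            free_cores -= 1
        # Advance simulated time to the next task completion.
        now, done = heapq.heappop(running)
        free_cores += 1
        finished += 1
        for succ in successors[done]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return now

# Hypothetical diamond-shaped workload: a -> {b, c} -> d
EXAMPLE_TASKS = {"a": 2, "b": 3, "c": 1, "d": 2}
EXAMPLE_DEPS = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

if __name__ == "__main__":
    for cores in (1, 2):
        for overhead in (0.0, 0.5):
            makespan = simulate(EXAMPLE_TASKS, EXAMPLE_DEPS, cores, overhead)
            print(f"cores={cores} overhead={overhead} makespan={makespan}")
```

Sweeping `num_cores` and `rtm_overhead` in this way is a toy version of evaluating design points: each call is cheap, so many configurations can be compared quickly. A real framework would additionally model contention inside the manager itself, which this sketch ignores.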

Published In

Design Automation for Embedded Systems, Volume 22, Issue 4
December 2018
54 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Dedicated hardware
  2. Design space exploration
  3. Many-core
  4. Run-time management
