research-article

Runtime Resource Allocation for Software Pipelines

Authors: Janmartin Jahn, Santiago Pagani, Sebastian Kobbe, Jörg Henkel

ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 1

Article No.: 5, Pages 1 - 23

https://doi.org/10.1145/2742347

Published: 21 May 2015

Abstract

Efficiently allocating the computational resources of many-core systems is a prominent challenge, especially when resource requirements vary unpredictably at runtime. It becomes even harder with unreliable cores, a scenario that grows more common as the number of cores increases and integration sizes shrink.

To address this challenge, this article presents an optimal method for allocating resources to software-pipelined applications. We show how runtime observations of the resource requirements of tasks can be used to adapt resource allocations. Furthermore, we show how optimality can be traded for a high degree of scalability by clustering applications in a distributed, hierarchical manner. To diminish the negative effects of unreliable cores, we show how self-organization can effectively restore the integrity of such a hierarchy when it is corrupted by a failing core. Experiments on Intel’s 48-core Single-Chip Cloud Computer and in a many-core simulator show that a significant improvement in system throughput can be achieved over the current state of the art.
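The idea of adapting allocations from runtime observations can be illustrated with a small sketch. It is not the method presented in the article, only a generic greedy heuristic under a simplifying assumption: a pipeline stage with observed per-item work `w` that runs on `c` cores takes `w / c` time per item, so throughput is limited by the slowest stage. The names `allocate_cores`, `throughput`, and `stage_work` are hypothetical and introduced here for illustration.

```python
import heapq


def allocate_cores(stage_work, total_cores):
    """Greedily distribute cores over pipeline stages.

    stage_work: observed per-item work of each stage (arbitrary time units).
    Assumption (not from the article): a stage with c cores needs
    work / c time per item, i.e. it parallelizes perfectly.
    Returns the number of cores assigned to each stage.
    """
    n = len(stage_work)
    if total_cores < n:
        raise ValueError("need at least one core per stage")
    cores = [1] * n  # every stage gets one core to start with
    # heapq is a min-heap, so stage times are negated to pop the slowest stage.
    heap = [(-w, i) for i, w in enumerate(stage_work)]
    heapq.heapify(heap)
    for _ in range(total_cores - n):
        _, i = heapq.heappop(heap)  # current bottleneck stage
        cores[i] += 1
        heapq.heappush(heap, (-stage_work[i] / cores[i], i))
    return cores


def throughput(stage_work, cores):
    """Items per time unit, limited by the slowest stage."""
    return 1.0 / max(w / c for w, c in zip(stage_work, cores))
```

For example, `allocate_cores([4.0, 2.0, 1.0, 1.0], 8)` returns `[4, 2, 1, 1]`, which balances all four stages at one time unit per item. When new runtime measurements arrive, calling the function again with the updated `stage_work` yields an adapted allocation; a real system would also have to account for migration cost, communication, and failing cores, which this sketch ignores.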

Published In

ACM Transactions on Parallel Computing, Volume 2, Issue 1

Special Issue on SPAA 2012

May 2015

202 pages

ISSN: 2329-4949

EISSN: 2329-4957

DOI: 10.1145/2757213

Editor: Phillip B. Gibbons, Intel Labs, Pittsburgh, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2015

Accepted: 01 November 2014

Revised: 01 August 2014

Received: 01 August 2013

Published in TOPC Volume 2, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Deutsche Forschungsgemeinschaft
