Runtime Resource Allocation for Software Pipelines

Published: 21 May 2015

Abstract

Efficiently allocating the computational resources of many-core systems is a prominent challenge, especially when resource requirements vary unpredictably at runtime. It is harder still with unreliable cores, a scenario that becomes more common as the number of cores increases and integration sizes shrink.
To address this challenge, this article presents an optimal method for allocating resources to software-pipelined applications. We show how runtime observations of the resource requirements of tasks can be used to adapt resource allocations. Furthermore, we show how optimality can be traded for a high degree of scalability by clustering applications in a distributed, hierarchical manner. To diminish the negative effects of unreliable cores, we show how self-organization can effectively restore the integrity of such a hierarchy when it is corrupted by a failing core. Experiments on Intel's 48-core Single-Chip Cloud Computer and in a many-core simulator show that a significant improvement in system throughput can be achieved over the current state of the art.
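
The abstract does not spell out the allocation method, so the following is only a rough sketch of the general idea of adapting core allocations to runtime observations. It is not the article's algorithm. The function names, the greedy rule, and the assumption that a stage's time per item scales as work divided by cores are all hypothetical and chosen for illustration.

```python
import heapq


def allocate_cores(observed_work, total_cores):
    """Greedily distribute cores over the stages of one software pipeline.

    observed_work: measured single-core time per item for each stage.
    ASSUMPTION (illustrative only): a stage running on c cores needs
    observed_work / c time per item, i.e. perfect linear speedup.
    Returns a list with the number of cores given to each stage.
    """
    n = len(observed_work)
    if total_cores < n:
        raise ValueError("need at least one core per stage")

    cores = [1] * n  # every stage starts with one core
    # Max-heap (negated keys) ordered by each stage's current time per item.
    heap = [(-w, i) for i, w in enumerate(observed_work)]
    heapq.heapify(heap)

    # Hand each remaining core to whichever stage is currently the bottleneck.
    for _ in range(total_cores - n):
        _, i = heapq.heappop(heap)
        cores[i] += 1
        heapq.heappush(heap, (-observed_work[i] / cores[i], i))
    return cores


def throughput(observed_work, cores):
    """Items per time unit; the slowest stage limits the whole pipeline."""
    return 1.0 / max(w / c for w, c in zip(observed_work, cores))
```

Under these assumptions, adapting at runtime would amount to calling `allocate_cores` again whenever fresh measurements of `observed_work` arrive, for example after a stage's workload changes or a core is reported as failed and `total_cores` shrinks.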

Cited By

  • (2022) Resource allocation for task-level speculative scientific applications: A proof of concept using Parallel Trajectory Splicing. Parallel Computing 112, 102936. DOI: 10.1016/j.parco.2022.102936. Online publication date: Sep-2022

Published In

ACM Transactions on Parallel Computing, Volume 2, Issue 1
Special Issue on SPAA 2012
May 2015
202 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/2757213
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2015
Accepted: 01 November 2014
Revised: 01 August 2014
Received: 01 August 2013
Published in TOPC Volume 2, Issue 1

Author Tags

  1. Many-core systems
  2. distributed systems
  3. resource allocation
  4. runtime system management
  5. software pipelines
  6. task mapping

Qualifiers

  • Research-article
  • Research
  • Refereed
