Research article · Open access

User-Assisted Store Recycling for Dynamic Task Graph Schedulers

Published: 28 December 2016

Abstract

    The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways. Notably, they can support dynamic applications and precisely represent the available concurrency. However, for efficient execution, they also require nuanced algorithms for scheduling and memory management. In this article, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of user-provided or automatically generated alternatives. Because recycling functions can be input data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function.
    Experimental evaluation demonstrates that this approach to automatic store recycling incurs little to no overheads, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.
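The core validity question behind store recycling — may a memory slot be reused before every reader of its current occupant has run? — can be sketched with a small simulation. This is an illustrative model only: `is_valid_recycling`, the tuple encoding of the task graph, and the example graphs are hypothetical, and checking a single execution order is weaker than the runtime validation of dynamic schedules described in the article.

```python
def is_valid_recycling(order, readers, slot_of):
    """Check a candidate store-recycling function against one legal
    execution order of a task graph over single-assignment items.

    order:    list of (task, produced_items, read_items) tuples.
    readers:  item -> number of tasks that read it.
    slot_of:  candidate recycling function, item -> memory slot.

    A candidate is invalid if some task would write its output into a
    slot whose current occupant still has unsatisfied readers.
    """
    pending = dict(readers)   # remaining reads per item
    occupant = {}             # slot -> item currently live in it
    for task, produced, read in order:
        for item in read:
            pending[item] -= 1
            if pending[item] == 0:
                # Last reader done: the item's slot may be recycled.
                occupant.pop(slot_of[item], None)
        for item in produced:
            slot = slot_of[item]
            if slot in occupant:
                return False  # would clobber a still-live item
            if pending[item] > 0:
                occupant[slot] = item
    return True


# Task t3 reads both a and b, so items a and b are live simultaneously.
order = [("t1", ["a"], []), ("t2", ["b"], []), ("t3", ["c"], ["a", "b"])]
readers = {"a": 1, "b": 1, "c": 0}

ok = is_valid_recycling(order, readers, {"a": 0, "b": 1, "c": 0})   # True: c may reuse a's slot
bad = is_valid_recycling(order, readers, {"a": 0, "b": 0, "c": 1})  # False: b would clobber live a
```

In the valid candidate, two slots serve three items — the kind of memory saving store recycling targets — while the invalid candidate illustrates the input-dependent mistakes that the article's recovery mechanism is designed to tolerate.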


    Cited By

• (2021) Incremental, Distributed, and Concurrent Service Coordination for Reliable and Deterministic Systems-of-Systems. IEEE Systems Journal 15, 2 (Jul 2021), 2470–2481. DOI: 10.1109/JSYST.2020.3020430


    Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
    December 2016
    648 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3012405
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 28 December 2016
    Accepted: 01 November 2016
    Revised: 01 November 2016
    Received: 01 May 2016
    Published in TACO Volume 13, Issue 4


    Author Tags

    1. Task parallelism
2. Dynamic scheduling
3. Memory management

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • U.S. Department of Energy's (DOE) Office of Science
    • Battelle for DOE
    • Pacific Northwest National Laboratory
    • NSF
    • The Ohio State University
    • Office of Advanced Scientific Computing Research
    • Department of Energy (DOE)


    Article Metrics

• Downloads (last 12 months): 31
• Downloads (last 6 weeks): 4
    Reflects downloads up to 26 Jul 2024

