Research article · Open access

User-Assisted Store Recycling for Dynamic Task Graph Schedulers

Published: 28 December 2016

Abstract

    The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways. Notably, they can support dynamic applications and precisely represent the available concurrency. However, for efficient execution, they also require nuanced algorithms for scheduling and memory management. In this article, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of user-provided or automatically generated alternatives. Because recycling functions can be input data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function.
    Experimental evaluation demonstrates that this approach to automatic store recycling incurs little to no overheads, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.
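The core validity question behind store recycling — may a memory slot be reused before every reader of its current occupant has run? — can be sketched with a small simulation. This is an illustrative model only: `is_valid_recycling`, the tuple encoding of the task graph, and the example graphs are hypothetical, and checking a single execution order is weaker than the runtime validation of dynamic schedules described in the article.

```python
def is_valid_recycling(order, readers, slot_of):
    """Check a candidate store-recycling function against one legal
    execution order of a task graph over single-assignment items.

    order:    list of (task, produced_items, read_items) tuples.
    readers:  item -> number of tasks that read it.
    slot_of:  candidate recycling function, item -> memory slot.

    A candidate is invalid if some task would write its output into a
    slot whose current occupant still has unsatisfied readers.
    """
    pending = dict(readers)   # remaining reads per item
    occupant = {}             # slot -> item currently live in it
    for task, produced, read in order:
        for item in read:
            pending[item] -= 1
            if pending[item] == 0:
                # Last reader done: the item's slot may be recycled.
                occupant.pop(slot_of[item], None)
        for item in produced:
            slot = slot_of[item]
            if slot in occupant:
                return False  # would clobber a still-live item
            if pending[item] > 0:
                occupant[slot] = item
    return True


# Task t3 reads both a and b, so items a and b are live simultaneously.
order = [("t1", ["a"], []), ("t2", ["b"], []), ("t3", ["c"], ["a", "b"])]
readers = {"a": 1, "b": 1, "c": 0}

ok = is_valid_recycling(order, readers, {"a": 0, "b": 1, "c": 0})   # True: c may reuse a's slot
bad = is_valid_recycling(order, readers, {"a": 0, "b": 0, "c": 1})  # False: b would clobber live a
```

In the valid candidate, two slots serve three items — the kind of memory saving store recycling targets — while the invalid candidate illustrates the input-dependent mistakes that the article's recovery mechanism is designed to tolerate.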


    Cited By

• (2021) Incremental, Distributed, and Concurrent Service Coordination for Reliable and Deterministic Systems-of-Systems. IEEE Systems Journal 15, 2 (Jul 2021), 2470–2481. DOI: 10.1109/JSYST.2020.3020430


    Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
    December 2016
    648 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3012405
    © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 28 December 2016
    Accepted: 01 November 2016
    Revised: 01 November 2016
    Received: 01 May 2016
    Published in TACO Volume 13, Issue 4


    Author Tags

    1. Task parallelism
2. Dynamic scheduling
3. Memory management

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • U.S. Department of Energy's (DOE) Office of Science
    • Battelle for DOE
    • Pacific Northwest National Laboratory
    • NSF
    • The Ohio State University
    • Office of Advanced Scientific Computing Research
    • Department of Energy (DOE)


    Article Metrics

• Downloads (last 12 months): 31
• Downloads (last 6 weeks): 4
    Reflects downloads up to 26 Jul 2024

