DOI: 10.1145/2591635.2591647
research-article

Author retrospective for PYRROS: static task scheduling and code generation for message passing multiprocessors

Published: 10 June 2014

Abstract

Given a program with annotated task parallelism represented as a directed acyclic graph (DAG), the PYRROS project focused on fast DAG scheduling, code generation, and runtime execution on distributed-memory architectures. PYRROS scheduling proceeds through several processing stages, including clustering of tasks, cluster mapping, and task execution ordering. Since PYRROS was published, there have been advances in DAG scheduling algorithms, in the use of DAG scheduling for irregular and large-scale computation, and in software systems with annotated task parallelism on modern parallel and cloud architectures. This retrospective describes our experience with the project and its follow-up work, and reviews representative papers on DAG scheduling published in the last decade.
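
To make the three scheduling stages named in the abstract concrete, here is a minimal Python sketch of a clustering, cluster mapping, and task ordering pipeline on a toy DAG. It is an illustration only: the heuristics (joining each task to its heaviest-edge predecessor, greedy load balancing, topological ordering), the function names, and the example graph are all assumptions made for this sketch, not the algorithms or interfaces used in PYRROS.

```python
from collections import defaultdict

# Hypothetical toy DAG: task -> compute cost, (src, dst) -> communication cost.
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 5, ("c", "d"): 1}


def topo_order(cost, comm):
    """Return the tasks in a topological order (Kahn's algorithm)."""
    indeg = {t: 0 for t in cost}
    succ = defaultdict(list)
    for (u, v) in comm:
        indeg[v] += 1
        succ[u].append(v)
    ready = sorted(t for t in cost if indeg[t] == 0)
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def cluster_tasks(cost, comm):
    """Stage 1 (assumed heuristic): put each task in the cluster of the
    predecessor on its heaviest incoming edge, so that edge costs nothing.
    Each predecessor absorbs at most one successor, keeping clusters linear."""
    cluster_of = {}
    claimed = set()  # tasks that already absorbed a successor
    for t in topo_order(cost, comm):
        preds = [(w, u) for (u, v), w in comm.items()
                 if v == t and u not in claimed]
        if preds:
            _, best = max(preds)
            cluster_of[t] = cluster_of[best]
            claimed.add(best)
        else:
            cluster_of[t] = len(set(cluster_of.values()))  # new cluster id
    return cluster_of


def map_clusters(cluster_of, cost, num_procs):
    """Stage 2 (assumed heuristic): assign clusters, heaviest first,
    to the currently least-loaded processor."""
    load = defaultdict(int)
    for t, c in cluster_of.items():
        load[c] += cost[t]
    proc_load = [0] * num_procs
    proc_of = {}
    for c in sorted(load, key=lambda c: -load[c]):
        p = proc_load.index(min(proc_load))
        proc_of[c] = p
        proc_load[p] += load[c]
    return proc_of


def order_tasks(cluster_of, proc_of, cost, comm):
    """Stage 3 (assumed heuristic): run each processor's tasks in a
    global topological order, which respects every dependence edge."""
    schedule = defaultdict(list)
    for t in topo_order(cost, comm):
        schedule[proc_of[cluster_of[t]]].append(t)
    return dict(schedule)


clusters = cluster_tasks(cost, comm)
procs = map_clusters(clusters, cost, num_procs=2)
schedule = order_tasks(clusters, procs, cost, comm)
print(schedule)
```

On this toy graph the chain a, b, d shares one cluster because its edges carry the largest communication costs, while c runs on the second processor. A real scheduler would typically order tasks by a priority such as critical-path length rather than by plain topological order.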


Published In

ACM International Conference on Supercomputing 25th Anniversary Volume
June 2014
94 pages
ISBN: 9781450328401
DOI: 10.1145/2591635
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dag
  2. parallel processing
  3. scheduling
  4. task graph

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
