Optimal mapping of sequences of data parallel tasks

Authors:

Jaspal Subhlok,

Gary Vondran

PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 134 - 143

https://doi.org/10.1145/209936.209951

Published: 01 August 1995

Abstract

Many applications in a variety of domains including digital signal processing, image processing and computer vision are composed of a sequence of tasks that act on a stream of input data sets in a pipelined manner. Recent research has established that these applications are best mapped to a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire problem of mapping which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P⁴k²) time. We also present a heuristic algorithm that is linear in the number of processors and establish with theoretical and practical results that the solutions obtained are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.
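To make the mapping problem concrete, here is a minimal sketch of a dynamic program over a deliberately simplified version of it. It is not the paper's algorithm: it ignores inter-task communication, memory constraints, and module replication, all of which the formulation above includes. It only clusters consecutive tasks into modules and assigns processors so that the slowest module (the pipeline bottleneck) is as fast as possible. The names `map_pipeline`, `module_time`, and `toy_time`, along with the cost model itself, are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical cost model: module_time(i, j, q) is the time one data set
# spends in a module holding tasks i..j-1 when it runs on q processors.
# This signature is an assumption of the sketch, not the paper's model.
CostModel = Callable[[int, int, int], float]
Module = Tuple[int, int, int]  # (first_task, end_task_exclusive, processors)


def map_pipeline(k: int, P: int, module_time: CostModel) -> Tuple[float, List[Module]]:
    """Return (bottleneck_time, modules) for k pipelined tasks on P processors."""
    if k < 1 or P < 1:
        raise ValueError("need at least one task and one processor")
    INF = float("inf")
    # best[j][p]: smallest bottleneck for tasks 0..j-1 using at most p processors.
    best = [[INF] * (P + 1) for _ in range(k + 1)]
    choice = [[None] * (P + 1) for _ in range(k + 1)]
    for p in range(P + 1):
        best[0][p] = 0.0  # no tasks means nothing to run
    for j in range(1, k + 1):
        for p in range(1, P + 1):
            for i in range(j):               # last module holds tasks i..j-1
                for q in range(1, p + 1):    # and runs on q processors
                    t = max(best[i][p - q], module_time(i, j, q))
                    if t < best[j][p]:
                        best[j][p] = t
                        choice[j][p] = (i, q)
    if best[k][P] == INF:
        raise ValueError("no feasible mapping under this cost model")
    # Walk the recorded choices backwards to recover the modules.
    modules: List[Module] = []
    j, p = k, P
    while j > 0:
        i, q = choice[j][p]
        modules.append((i, j, q))
        j, p = i, p - q
    modules.reverse()
    return best[k][P], modules


if __name__ == "__main__":
    work = [4.0, 8.0, 2.0]  # made-up per-task work

    def toy_time(i: int, j: int, q: int) -> float:
        # Made-up model: perfect speedup plus a per-processor overhead.
        return sum(work[i:j]) / q + 0.25 * (q - 1)

    bottleneck, modules = map_pipeline(len(work), 8, toy_time)
    print("throughput:", 1.0 / bottleneck, "modules:", modules)
```

Throughput is the reciprocal of the returned bottleneck time. This toy version runs in O(k²P²) time; it does not reproduce the paper's algorithm or its O(P⁴k²) bound, which apply to the fuller problem.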

Published In

PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

August 1995

234 pages

ISBN:0897917006

DOI:10.1145/209936

Chairmen:
Jeanne Ferrante (Univ. of California, San Diego),
David Padua (Univ. of Illinois at Urbana-Champaign, Urbana)
Editor:
Richard L. Wexelblat (IRS IS:AO, Washington, DC)

ACM SIGPLAN Notices Volume 30, Issue 8
Aug. 1995
226 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/209937
Editors:
Richard L. Wexelblat (IRS IS:AO, Washington, DC),
Jeanne Ferrante (Univ. of California, San Diego),
David Padua (Univ. of Illinois at Urbana-Champaign, Urbana)

Copyright © 1995 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PPoPP95: Principles & Practices of Parallel Programming

Sponsor: SIGPLAN

July 19 - 21, 1995

Santa Barbara, California, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
