Efficient load balancing for wide-area divide-and-conquer applications

Authors: Rob V. van Nieuwpoort, Thilo Kielmann, Henri E. Bal

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

Pages 34 - 43

https://doi.org/10.1145/379539.379563

Published: 18 June 2001

Abstract

Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated workload has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth.
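To make the idea of Random Stealing concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation: the `Worker` class, the `pick_victim` and `steal` helpers, and the use of a double-ended queue per worker are assumptions made for illustration. An idle worker picks a victim uniformly at random among the other workers and takes the oldest job from that victim's queue.

```python
import random
from collections import deque


class Worker:
    """One CPU with a double-ended queue of pending jobs (hypothetical helper)."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.jobs = deque()

    def spawn(self, job):
        # Newly spawned jobs go on the owner's end of the deque.
        self.jobs.append(job)

    def pop_local(self):
        # The owner takes the most recently spawned job (LIFO order).
        return self.jobs.pop() if self.jobs else None


def pick_victim(thief_id, n_workers, rng):
    """Choose a victim uniformly at random among the other workers."""
    victim = rng.randrange(n_workers - 1)
    # Skip over the thief's own id so it never steals from itself.
    return victim if victim < thief_id else victim + 1


def steal(thief, workers, rng):
    """One steal attempt: take the oldest job from a random victim, if any."""
    victim = workers[pick_victim(thief.worker_id, len(workers), rng)]
    if victim.jobs:
        job = victim.jobs.popleft()  # thieves take from the opposite end
        thief.spawn(job)             # queue the stolen job locally
        return job
    return None  # the attempt failed; the caller may retry with a new victim
```

In a wide-area setting, every victim drawn from a remote cluster costs a WAN round trip before the thief learns whether the attempt succeeded, which is the inefficiency the abstract refers to.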

In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain suboptimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.
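The abstract does not describe how CRS works internally. As a rough illustration of what cluster-aware victim selection could look like, the Python sketch below assumes one plausible scheme: an idle node keeps at most one asynchronous steal request outstanding to a random node in a remote cluster, and meanwhile keeps choosing random victims inside its own cluster. The `IdleNode` class, its method names, and the `cluster_of` mapping are hypothetical and exist only for this example.

```python
import random


class IdleNode:
    """Tracks the steal requests of one idle node (illustrative only)."""

    def __init__(self, node_id, cluster_of, rng):
        self.node_id = node_id
        self.cluster_of = cluster_of      # maps node id -> cluster id
        self.rng = rng
        self.wan_request_pending = False  # assumption: at most one WAN steal in flight

    def _peers(self, same_cluster):
        # All other nodes that are (or are not) in this node's cluster.
        mine = self.cluster_of[self.node_id]
        return [n for n, c in self.cluster_of.items()
                if n != self.node_id and (c == mine) == same_cluster]

    def next_steal_targets(self):
        """Return (remote_victim_or_None, local_victim_or_None) for one round."""
        remote = None
        remote_peers = self._peers(same_cluster=False)
        if not self.wan_request_pending and remote_peers:
            remote = self.rng.choice(remote_peers)
            self.wan_request_pending = True  # sent asynchronously, not awaited
        local_peers = self._peers(same_cluster=True)
        local = self.rng.choice(local_peers) if local_peers else None
        return remote, local

    def on_wan_reply(self):
        # Called when the remote victim answers, with a job or a refusal.
        self.wan_request_pending = False
```

Under this assumed scheme the WAN latency overlaps with cheap local steal attempts, and no threshold or interval has to be tuned by hand, which is consistent with the properties the abstract claims for CRS.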

Published In

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

June 2001

142 pages

ISBN:1581133464

DOI:10.1145/379539

Chairmen: Michael Heath (Univ. of Illinois, Illinois, IN), Andrew Lumsdaine (Indiana Univ.)

ACM SIGPLAN Notices Volume 36, Issue 7
July 2001
143 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/568014

Copyright © 2001 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PPoPP01: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Sponsor: SIGPLAN

Snowbird, Utah, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Cited By

Datta L, Pal A (2023). A Task Scheduling Algorithm for Hierarchical Hybrid Semi-Distributed System for Balancing Load. 2023 International Conference on Integration of Computational Intelligent System (ICICIS), pages 1-7. Online publication date: 1-Nov-2023.
https://doi.org/10.1109/ICICIS56802.2023.10430288

Parikh H, Deodhar V, Gavrilovska A, Pande S (2021). Distributed Work Stealing at Scale via Matchmaking. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 250-260. Online publication date: Sep-2021.
https://doi.org/10.1109/Cluster48925.2021.00040

Neeraj R (2017). A Review Towards: Load Balancing Techniques. i-manager's Journal on Power Systems Engineering, 4:4 (47). Online publication date: 2017.
https://doi.org/10.26634/jps.4.4.11397

Chen Q, Guo M (2017). Work-Stealing for NUMA-enabled Architecture. Task Scheduling for Multi-core and Parallel Architectures, pages 73-111. Online publication date: 25-Nov-2017.
https://doi.org/10.1007/978-981-10-6238-4_4

Chen Q, Guo M (2017). Work-Stealing for Multi-socket Architecture. Task Scheduling for Multi-core and Parallel Architectures, pages 29-72. Online publication date: 25-Nov-2017.
https://doi.org/10.1007/978-981-10-6238-4_3

Peng Y, Wu S, Jin H (2016). Robinhood: Towards Efficient Work-Stealing in Virtualized Environments. IEEE Transactions on Parallel and Distributed Systems, 27:8 (2363-2376). Online publication date: 1-Aug-2016.
https://doi.org/10.1109/TPDS.2015.2492563

Peng Y, Wu S, Jin H, Balaji P, Xu C (2015). Towards efficient work-stealing in virtualized environments. Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pages 41-50. Online publication date: 4-May-2015.
https://dl.acm.org/doi/10.1109/CCGrid.2015.23

Zhang W, Tardieu O, Grove D, Herta B, Kamada T, Saraswat V, Takeuchi M, Kumar M, Jann J, Nagpurkar P (2014). GLB. Proceedings of the first workshop on Parallel programming for analytics applications, pages 31-40. Online publication date: 16-Feb-2014.
https://dl.acm.org/doi/10.1145/2567634.2567639

Vu T, Derbel B, Reed D, Sun X, Foster I (2014). Link-heterogeneous work stealing. Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pages 354-363. Online publication date: 26-May-2014.
https://dl.acm.org/doi/10.1109/CCGrid.2014.85

Wang Y, Zhang Y, Su Y, Wang X, Chen X, Ji W, Shi F (2014). An adaptive and hierarchical task scheduling scheme for multi-core clusters. Parallel Computing, 40:10 (611-627). Online publication date: 1-Dec-2014.
https://dl.acm.org/doi/10.1016/j.parco.2014.09.012
