Article, DOI: 10.1145/379539.379563

Efficient load balancing for wide-area divide-and-conquer applications

Published: 18 June 2001

Abstract

Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated workload has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth.
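As an illustration of the spawn and sync annotations mentioned above, the following is a minimal sketch using the standard Java fork/join framework rather than the system evaluated in the paper. Here `fork()` plays the role of spawn and `join()` the role of sync; the class name `SpawnSyncFib` and the cut-off `THRESHOLD` are hypothetical choices made for this example only.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative only: divide-and-conquer Fibonacci with spawn/sync-style calls.
class SpawnSyncFib {
    // Hypothetical cut-off: jobs at or below this size run sequentially.
    static final int THRESHOLD = 10;

    static long sequentialFib(int n) {
        return n < 2 ? n : sequentialFib(n - 1) + sequentialFib(n - 2);
    }

    static final class FibTask extends RecursiveTask<Long> {
        private final int n;

        FibTask(int n) {
            this.n = n;
        }

        @Override
        protected Long compute() {
            if (n <= THRESHOLD) {
                return sequentialFib(n);
            }
            FibTask left = new FibTask(n - 1);
            left.fork();                                // "spawn": may be stolen by an idle worker
            long right = new FibTask(n - 2).compute();  // keep working on the other half
            return left.join() + right;                 // "sync": wait for the spawned job
        }
    }

    static long parallelFib(int n) {
        return ForkJoinPool.commonPool().invoke(new FibTask(n));
    }

    public static void main(String[] args) {
        System.out.println(parallelFib(30)); // prints 832040
    }
}
```

The pool's worker threads balance the spawned jobs among themselves by work stealing, which is the single-machine analogue of the load-balancing problem the abstract addresses across clusters.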
In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain suboptimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.
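To make the cost of plain RS on a multi-cluster system concrete, the sketch below estimates how often a steal attempt crosses the WAN. It assumes that RS picks its victim uniformly at random among all other nodes and that all clusters have the same size; the abstract states neither, and the class name, method names, and the 4-by-16 configuration are hypothetical. It does not model CRS itself.

```java
import java.util.Random;

// Illustrative only: fraction of random steal attempts that leave the thief's cluster.
class RandomStealingCost {
    // Uniformly random victim among all nodes except the thief (assumed RS behaviour).
    static int randomVictim(int thief, int totalNodes, Random rng) {
        int v = rng.nextInt(totalNodes - 1);
        return v >= thief ? v + 1 : v;
    }

    // Nodes are numbered cluster by cluster: 0..n-1 in cluster 0, n..2n-1 in cluster 1, ...
    static int clusterOf(int node, int nodesPerCluster) {
        return node / nodesPerCluster;
    }

    // Exact probability that a uniformly chosen victim lies in another cluster.
    static double wideAreaFraction(int clusters, int nodesPerCluster) {
        int total = clusters * nodesPerCluster;
        return (double) (total - nodesPerCluster) / (total - 1);
    }

    // Monte Carlo estimate of the same quantity, with node 0 as the thief.
    static double simulatedFraction(int clusters, int nodesPerCluster, int attempts, long seed) {
        Random rng = new Random(seed);
        int total = clusters * nodesPerCluster;
        int remote = 0;
        for (int i = 0; i < attempts; i++) {
            int victim = randomVictim(0, total, rng);
            if (clusterOf(victim, nodesPerCluster) != 0) {
                remote++;
            }
        }
        return (double) remote / attempts;
    }

    public static void main(String[] args) {
        System.out.println("exact:     " + wideAreaFraction(4, 16));
        System.out.println("simulated: " + simulatedFraction(4, 16, 100_000, 42L));
    }
}
```

Under these assumptions, with 4 clusters of 16 nodes, 48 of the 63 possible victims are remote, so roughly 76% of steal attempts would pay wide-area latency. This is the kind of overhead that cluster-aware strategies aim to hide or avoid.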


Published In

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
June 2001
142 pages
ISBN:1581133464
DOI:10.1145/379539
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Java
  2. clustered wide-area systems
  3. distributed supercomputing

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

