Article

Advanced eager scheduling for Java-based adaptively parallel computing

Authors:

Michael O. Neary,

Peter Cappello

JGI '02: Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande

Pages 56 - 65

https://doi.org/10.1145/583810.583817

Published: 03 November 2002

Abstract

Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of an application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees its developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault-tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves load balancing across hosts when tasks have non-uniform computational loads that become evident only at execution time. Speedup measurements of actual performance are presented for up to 1,000 hosts. We analyze the expected performance degradation caused by unresponsive hosts and measure the actual degradation they cause.
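The abstract does not spell out the scheduler's internals. As a rough illustration of the general eager-scheduling idea it builds on, here is a minimal single-threaded Java sketch under stated assumptions; it is not Javelin 3's scheduler and omits the work-stealing and dynamic task-decomposition parts. Each task is first handed to an idle host once; after that, idle hosts are given still-unfinished tasks again, so work stuck on an unresponsive host is eventually recomputed elsewhere, and the first result for a task wins. The class and method names (`EagerScheduler`, `nextTask`, `complete`) and the round-robin reissue order are hypothetical choices made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical, single-threaded sketch of eager scheduling; not Javelin 3's code.
 * Phase 1 hands each task to an idle host once. Phase 2 (the "eager" phase) hands
 * still-unfinished tasks out again, round-robin, so a task held by an
 * unresponsive host is eventually recomputed by a responsive one.
 */
public class EagerScheduler {
    private final int taskCount;
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    // Tasks handed out at least once; finished ones are dropped lazily.
    private final Deque<Integer> inProgress = new ArrayDeque<>();
    // First result received for each task id (sorted for readable printing).
    private final Map<Integer, Long> results = new TreeMap<>();

    public EagerScheduler(int taskCount) {
        this.taskCount = taskCount;
        for (int id = 0; id < taskCount; id++) {
            unassigned.add(id);
        }
    }

    /** Next task id for an idle host, or empty once every task has a result. */
    public Optional<Integer> nextTask() {
        if (!unassigned.isEmpty()) {
            int id = unassigned.poll();
            inProgress.add(id);
            return Optional.of(id);
        }
        while (!inProgress.isEmpty()) {
            int id = inProgress.poll();
            if (!results.containsKey(id)) {
                inProgress.add(id); // still unfinished: stays eligible for reissue
                return Optional.of(id);
            }
            // finished task: drop it from the reissue rotation
        }
        return Optional.empty();
    }

    /** Records a result; returns false for a redundant (late or duplicate) result. */
    public boolean complete(int id, long value) {
        return results.putIfAbsent(id, value) == null;
    }

    public boolean allDone() {
        return results.size() == taskCount;
    }

    public Map<Integer, Long> results() {
        return results;
    }

    public static void main(String[] args) {
        EagerScheduler s = new EagerScheduler(4);
        int a = s.nextTask().get();  // 0, host A
        int b = s.nextTask().get();  // 1, host B
        int c = s.nextTask().get();  // 2, host C, which then stops responding
        s.complete(a, (long) a * a);
        s.complete(b, (long) b * b);
        int a2 = s.nextTask().get(); // 3, the last never-assigned task
        s.complete(a2, (long) a2 * a2);
        int b2 = s.nextTask().get(); // 2 again: host C's task is reissued to an idle host
        s.complete(b2, (long) b2 * b2);
        boolean late = s.complete(c, (long) c * c); // host C finally answers: ignored
        System.out.println(s.allDone() + " " + late + " " + s.results());
        // prints: true false {0=0, 1=1, 2=4, 3=9}
    }
}
```

In the demo, host C's task is recomputed by a responsive host once the pool of never-assigned tasks is empty, and C's late answer is discarded. The cost of this fault tolerance is some redundant computation; the sketch also leaves out timeouts, networking, and concurrency control, which a real distributed scheduler would need.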

Index Terms

Advanced eager scheduling for Java-based adaptively parallel computing
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        1. Concurrent programming languages

Information

Published In

JGI '02: Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande

November 2002

252 pages

ISBN:1581135998

DOI:10.1145/583810

General Chair: José E. Moreira (IBM Thomas J. Watson Research Center)

Program Chairs: Geoffrey C. Fox (Indiana University, Bloomington); Vladimir Getov (University of Westminster, London)

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

JGI02

Sponsor:

JGI02: Joint ACM Java Grande - ISCOPE 2002 Conference (co-located with OOPSLA 2002)

November 3 - 5, 2002

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 18 of 60 submissions, 30%

Bibliometrics & Citations

Article Metrics

Total Citations: 21
Total Downloads: 687
Downloads (last 12 months): 2
Downloads (last 6 weeks): 1

Reflects downloads up to 12 Nov 2024

Citations

Cited By

V. Janjic and K. Hammond (2013). How to be a successful thief. Proceedings of the 19th international conference on Parallel Processing, pp. 114-125. Online publication date: 26-Aug-2013. https://dl.acm.org/doi/10.1007/978-3-642-40047-6_14
V. Janjic and K. Hammond (2012). Using load information in work-stealing on distributed systems with non-uniform communication latencies. Proceedings of the 18th international conference on Parallel Processing, pp. 155-166. Online publication date: 27-Aug-2012. https://dl.acm.org/doi/10.1007/978-3-642-32820-6_17
R. Van Nieuwpoort, G. Wrzesińska, C. Jacobs, and H. Bal (2010). Satin. ACM Transactions on Programming Languages and Systems 32(3), pp. 1-39. Online publication date: 16-Mar-2010. https://dl.acm.org/doi/10.1145/1709093.1709096
K. Watanabe, M. Fukushi, and S. Horiguchi (2009). Optimal Spot-checking for Computation Time Minimization in Volunteer Computing. Journal of Grid Computing 7(4), pp. 575-600. Online publication date: 18-Aug-2009. https://doi.org/10.1007/s10723-009-9125-4
R. Rosinha, C. Geyer, and P. Vargas (2009). WSPE: a peer-to-peer grid programming environment. Concurrency and Computation: Practice and Experience 21(13), pp. 1709-1724. Online publication date: 11-Feb-2009. https://doi.org/10.1002/cpe.1392
E. Byun, S. Choi, H. Kim, C. Hwang, and S. Lee (2008). Advanced Job Scheduler Based on Markov Availability Model and Resource Selection in Desktop Grid Computing Environment. Metaheuristics for Scheduling in Distributed Computing Environments, pp. 153-171. Online publication date: 2008. https://doi.org/10.1007/978-3-540-69277-5_6
C. Dangelmayr and W. Blochinger (2008). Aspect-oriented component assembly—a case study in parallel software design. Software: Practice and Experience 39(9), pp. 807-832. Online publication date: 12-Dec-2008. https://doi.org/10.1002/spe.912
R. Rosinha, C. Geyer, P. Vargas, B. Schulze, O. Rana, J. Myers, and W. Cirne (2007). WSPE. Proceedings of the 5th international workshop on Middleware for grid computing: held at the ACM/IFIP/USENIX 8th International Middleware Conference, pp. 1-6. Online publication date: 26-Nov-2007. https://dl.acm.org/doi/10.1145/1376849.1376855
E. Byun, S. Choi, M. Baik, J. Gil, C. Park, and C. Hwang (2007). MJSA. Future Generation Computer Systems 23(4), pp. 616-622. Online publication date: 1-May-2007. https://dl.acm.org/doi/10.1016/j.future.2006.09.004
N. Ranaldo and E. Zimeo (2006). An economy-driven mapping heuristic for hierarchical master-slave applications in grid systems. Proceedings of the 20th international conference on Parallel and distributed processing, p. 162. Online publication date: 25-Apr-2006. https://dl.acm.org/doi/10.5555/1898953.1899090
