Malleable applications for scalable high performance computing

Published: 01 September 2007

Abstract

Iterative applications are known to run as slow as their slowest computational component. This paper introduces malleability, a new dynamic reconfiguration strategy to overcome this limitation. Malleability is the ability to dynamically change the data size and number of computational entities in an application. Malleability can be used by middleware to autonomously reconfigure an application in response to dynamic changes in resource availability in an architecture-aware manner, allowing applications to optimize the use of multiple processors and diverse memory hierarchies in heterogeneous environments.
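To make the opening claim concrete, the following minimal sketch (not taken from the paper) shows why a synchronized iteration is bounded by its slowest component, and how resizing each component's share of the data in proportion to processor speed removes that bottleneck. The function names, the speeds, and the row counts are all hypothetical and chosen only for illustration.

```python
def iteration_time(rows_per_worker, speeds):
    """Time of one synchronized iteration: every worker waits for the slowest."""
    return max(rows / speed for rows, speed in zip(rows_per_worker, speeds))


def proportional_shares(total_rows, speeds):
    """Split total_rows across workers in proportion to speed.

    Uses largest-remainder rounding so the shares always sum to total_rows.
    """
    total_speed = sum(speeds)
    exact = [total_rows * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]
    # Hand out the rows lost to truncation, largest fractional part first.
    leftovers = total_rows - sum(shares)
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: exact[i] - shares[i],
                         reverse=True)
    for i in by_fraction[:leftovers]:
        shares[i] += 1
    return shares


speeds = [1.0, 1.0, 2.0, 4.0]   # hypothetical: rows processed per time unit
even = [200, 200, 200, 200]     # equal split of 800 rows
balanced = proportional_shares(800, speeds)

print(iteration_time(even, speeds))      # 200.0
print(balanced)                          # [100, 100, 200, 400]
print(iteration_time(balanced, speeds))  # 100.0
```

With an equal split, the two slowest workers set the pace at 200 time units per iteration. With speed-proportional shares, every worker finishes at the same moment and the iteration takes 100.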
The modular Internet Operating System (IOS) was extended to reconfigure applications autonomously using malleability. Two different iterative applications were made malleable. The first, used in astronomical modeling and representative of maximum-likelihood applications, was made malleable in the SALSA programming language. The second models the diffusion of heat over a two-dimensional object, and is representative of applications such as partial differential equations and some types of distributed simulations. Versions of the heat application were made malleable in both SALSA and MPI. Algorithms for concurrent data redistribution are given for each type of application. Results show that using malleability for reconfiguration is 10 to 100 times faster on the tested environments. The algorithms are also shown to be highly scalable with respect to the quantity of data involved. While previous work has shown the utility of dynamically reconfigurable applications using only computational component migration, malleability is shown to provide up to a 15% speedup over component migration alone on a dynamic cluster environment.
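The abstract does not spell out the redistribution algorithms, so the following is only an illustrative sketch of the general idea, assuming a row-partitioned two-dimensional grid such as the heat application might use. When the number of workers changes, the rows that must move are the overlaps between old and new ownership ranges, and each resulting transfer is independent of the others, so they could proceed concurrently. The helper names `block_bounds` and `transfer_plan` are hypothetical, and ghost-row exchange for the stencil is deliberately left out.

```python
def block_bounds(total_rows, workers):
    """Return half-open (start, end) row ranges, as even as possible."""
    base, extra = divmod(total_rows, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def transfer_plan(old_bounds, new_bounds):
    """List (src, dst, start, end) for every row range that changes owner."""
    plan = []
    for src, (o_start, o_end) in enumerate(old_bounds):
        for dst, (n_start, n_end) in enumerate(new_bounds):
            # Rows owned by src before and by dst after the reconfiguration.
            start, end = max(o_start, n_start), min(o_end, n_end)
            if start < end and src != dst:
                plan.append((src, dst, start, end))
    return plan


old = block_bounds(12, 3)   # [(0, 4), (4, 8), (8, 12)]
new = block_bounds(12, 4)   # [(0, 3), (3, 6), (6, 9), (9, 12)]
for move in transfer_plan(old, new):
    print(move)
# (0, 1, 3, 4)
# (1, 2, 6, 8)
# (2, 3, 9, 12)
```

Growing from three workers to four moves only 6 of the 12 rows. Rows that stay with the same owner never appear in the plan.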
This work is part of an ongoing research effort to enable applications to be highly reconfigurable and autonomously modifiable by middleware in order to efficiently utilize distributed environments. Grid computing environments are becoming increasingly heterogeneous and dynamic, placing new demands on applications' adaptive behavior. This work shows that malleability is a key aspect in enabling effective dynamic reconfiguration of iterative applications in these environments.

Published In

Cluster Computing, Volume 10, Issue 3
Sep 2007
108 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Actors
  2. Dynamic reconfiguration
  3. High performance computing
  4. MPI
  5. Malleability
  6. SALSA

Qualifiers

  • Article
