
On the Performance of Malleable APGAS Programs and Batch Job Schedulers

Published: 27 March 2024

Abstract

Malleability—the ability for applications to dynamically adjust their resource allocations at runtime—presents great potential to enhance the efficiency and resource utilization of modern supercomputers. However, applications are rarely capable of growing and shrinking their number of nodes at runtime, and batch job schedulers provide only rudimentary support for such features. While numerous approaches have been proposed to enable application malleability, these typically focus on iterative computations and require complex code modifications. This amplifies the challenges for programmers, who already wrestle with the complexity of traditional MPI inter-node programming. Asynchronous Many-Task (AMT) programming presents a promising alternative. In AMT, computations are split into many fine-grained tasks, which are processed by workers. This makes transparent task relocation via the AMT runtime system possible, thus offering great potential for enabling efficient malleability. In this work, we propose an extension to an existing AMT system, namely APGAS for Java. We provide easy-to-use malleability programming abstractions, requiring only minor application code additions from programmers. Runtime adjustments, such as process initialization and termination, are automatically managed by our malleability extension. We validate our malleability extension by adapting a load balancing library handling multiple benchmarks. We show that both shrinking and growing operations incur only low execution-time overhead. In addition, we demonstrate compatibility with potential batch job schedulers by developing a prototype batch job scheduler that supports malleable jobs. Through extensive execution of real-world job batches on up to 32 nodes, involving rigid, moldable, and malleable programs, we evaluate the impact of deploying malleable APGAS applications on supercomputers.
Using the scheduling algorithms FCFS, Backfilling, and Easy-Backfilling, as well as one that exploits malleable jobs, the experimental results highlight significant improvements in several metrics for malleable jobs. We show a 13.09% reduction in makespan (the time needed to schedule and execute all jobs), a 19.86% increase in node utilization, and a 3.61% decrease in job turnaround time (the time a job takes from submission to completion) when using 100% malleable jobs in combination with our prototype batch job scheduler, compared to the best-performing scheduling algorithm with 100% rigid jobs.
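The abstract describes malleability abstractions that require only minor code additions: the application registers handlers that the runtime invokes when the scheduler grows or shrinks its node allocation. The following is a minimal, self-contained Java sketch of that programming model; the `MalleableHandler` interface, `LoadBalancer` class, and place names are hypothetical illustrations, not the actual APGAS API.

```java
import java.util.ArrayList;
import java.util.List;

public class MalleableSketch {
    // Hypothetical callback interface: a malleable application implements
    // these hooks so the runtime can notify it of allocation changes.
    interface MalleableHandler {
        void onGrow(List<String> addedPlaces);     // redistribute work onto new places
        void onShrink(List<String> removedPlaces); // evacuate work before places are released
    }

    // Toy "load balancer" that only tracks its current set of places.
    static class LoadBalancer implements MalleableHandler {
        final List<String> places = new ArrayList<>(List.of("place0", "place1"));

        public void onGrow(List<String> added) {
            places.addAll(added);      // a real implementation would also steal/ship tasks
        }

        public void onShrink(List<String> removed) {
            places.removeAll(removed); // a real implementation would first drain these places
        }
    }

    public static void main(String[] args) {
        LoadBalancer lb = new LoadBalancer();
        lb.onGrow(List.of("place2", "place3")); // scheduler grants two extra nodes
        lb.onShrink(List.of("place1"));         // scheduler reclaims one node
        System.out.println(lb.places);          // prints [place0, place2, place3]
    }
}
```

In this model, process startup and teardown stay with the runtime and scheduler; the application only states how its data and tasks react to a changed place set, which is consistent with the "minor code additions" claim above.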


Cited By

  • (2024) Evolving APGAS Programs: Automatic and Transparent Resource Adjustments at Runtime. In: Asynchronous Many-Task Systems and Applications, pp. 154–165. https://doi.org/10.1007/978-3-031-61763-8_15. Online publication date: 14 Feb 2024.



Published In

SN Computer Science  Volume 5, Issue 4
Apr 2024
1729 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 27 March 2024
Accepted: 18 January 2024
Received: 03 November 2023

Author Tags

  1. Malleable runtime system
  2. Malleable job scheduling
  3. APGAS

Qualifiers

  • Research-article

Funding Sources

  • Universität Kassel (3154)

