A Job Dispatcher for Large and Heterogeneous HPC Systems Running Modern Applications

Authors: Cristian Galleguillos, Zeynep Kiziltan, Ricardo Soto



File

LIPIcs.CP.2021.26.pdf
  • Filesize: 2.68 MB
  • 16 pages

Author Details

Cristian Galleguillos
  • Pontificia Universidad Católica de Valparaíso, Chile
  • University of Bologna, Italy
Zeynep Kiziltan
  • University of Bologna, Italy
Ricardo Soto
  • Pontificia Universidad Católica de Valparaíso, Chile

Acknowledgements

We thank A. Bartolini, L. Benini, M. Milano, M. Lombardi and the SCAI group at Cineca for providing the Eurora data, and A. Borghesi and T. Bridi for sharing the implementations of the original CP-based dispatchers. We also thank the School of Computer Engineering of PUCV in Chile for providing access to computing resources.

Cite As

Cristian Galleguillos, Zeynep Kiziltan, and Ricardo Soto. A Job Dispatcher for Large and Heterogeneous HPC Systems Running Modern Applications. In 27th International Conference on Principles and Practice of Constraint Programming (CP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 210, pp. 26:1-26:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CP.2021.26

Abstract

High-Performance Computing (HPC) systems have become essential instruments in modern society. As they approach exascale performance, HPC systems grow larger and more heterogeneous in their computing resources. With recent advances in AI, HPC systems are also increasingly used for applications that employ many short jobs with strict timing requirements. HPC job dispatchers therefore need to adopt techniques that go beyond the capabilities of those developed for small or homogeneous systems, or for traditional compute-intensive applications. In this paper, we present a job dispatcher suitable for today’s large and heterogeneous systems running modern applications. Unlike its predecessors, our dispatcher solves the entire dispatching problem using Constraint Programming (CP) with a model size independent of the system size. Experimental results based on a simulation study show that our approach can bring about significant performance gains over the existing CP-based dispatchers in a large or heterogeneous system.

Subject Classification

ACM Subject Classification
  • Theory of computation → Constraint and logic programming
  • Computing methodologies → Planning and scheduling
Keywords
  • Constraint programming
  • HPC systems
  • heterogeneous systems
  • large systems
  • on-line job dispatching
  • resource allocation
