DOI: 10.1145/3569951.3593605
Research article

Airavata Metascheduler: A Reliable, Fault Tolerant, and Resource-Aware Job Scheduling Service

Published: 10 September 2023

Abstract

Software-as-a-service science gateways provide user interfaces and middleware for accessing scientific software deployed on remote high-performance computing (HPC) resources and clusters. When several resources are available, choosing one for a particular job submission may be left to the user, who may lack the information needed to choose well. To address this problem, we have designed and developed an extensible, scalable metascheduling system that automates resource selection based on resource availability and other characteristics. We develop a system model based on queueing theory that guides our implementation, provides a basis for analysis, and yields an efficiency metric. We implement the metascheduler within the open-source Apache Airavata science gateway framework as a supplemental service that guides its job submission capabilities. Measuring efficiency in representative scenarios, we observe efficiencies above 70%, even with high input rates and low job acceptance rates.
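The abstract does not specify the scheduling policy or how efficiency is defined, so the sketch below is purely illustrative and is not the paper's method or the Apache Airavata API. It assumes hypothetical per-cluster fields (`queued_jobs`, `free_nodes`, `online`), ranks candidate clusters by queue length, and treats each submission attempt as an independent Bernoulli trial accepted with probability p. Under that independence assumption, falling back across up to k clusters places a job with probability 1 - (1 - p)^k, which gives one intuition for how a fault-tolerant metascheduler can keep placement rates high when per-cluster acceptance is low.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical names throughout; this is NOT the Airavata Metascheduler API.


@dataclass
class Cluster:
    """Hypothetical snapshot of one compute resource's state."""
    name: str
    queued_jobs: int   # jobs already waiting in the batch queue
    free_nodes: int    # nodes currently idle
    online: bool = True


def rank_clusters(clusters: List[Cluster], nodes_needed: int) -> List[Cluster]:
    """Keep online clusters with enough idle nodes; shortest queue first,
    breaking ties in favor of more idle nodes."""
    fits = [c for c in clusters if c.online and c.free_nodes >= nodes_needed]
    return sorted(fits, key=lambda c: (c.queued_jobs, -c.free_nodes))


def schedule(nodes_needed: int, clusters: List[Cluster],
             submit: Callable[[Cluster], bool]) -> Optional[str]:
    """Try ranked clusters in order, falling back to the next one whenever a
    submission is rejected. Returns the accepting cluster's name, or None."""
    for cluster in rank_clusters(clusters, nodes_needed):
        if submit(cluster):
            return cluster.name
    return None


def placement_probability(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent submissions,
    each accepted with probability p, succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def simulate(p: float, n_jobs: int, clusters: List[Cluster], seed: int = 0) -> float:
    """Fraction of single-node jobs placed when every attempt is an
    independent Bernoulli(p) trial (a simplifying assumption)."""
    rng = random.Random(seed)
    placed = sum(
        schedule(1, clusters, lambda _c: rng.random() < p) is not None
        for _ in range(n_jobs)
    )
    return placed / n_jobs


clusters = [
    Cluster("cluster-a", queued_jobs=12, free_nodes=4),
    Cluster("cluster-b", queued_jobs=3, free_nodes=2),
    Cluster("cluster-c", queued_jobs=3, free_nodes=8),
    Cluster("cluster-d", queued_jobs=0, free_nodes=16, online=False),
]
print([c.name for c in rank_clusters(clusters, 2)])  # ['cluster-c', 'cluster-b', 'cluster-a']
print(simulate(0.3, 10_000, clusters), placement_probability(0.3, 3))
```

The offline `cluster-d` is skipped even though its queue is empty, and the ranking key is only a placeholder: substituting a different wait-time estimate changes `rank_clusters` alone, while the fallback loop in `schedule` stays the same.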

Published In

ACM Conferences
PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good
July 2023
519 pages
ISBN:9781450399852
DOI:10.1145/3569951
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cyberinfrastructure
  2. metascheduling
  3. open source software
  4. queueing analysis
  5. science gateways

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '23

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Article Metrics

  • Total citations: 0
  • Total downloads: 56
  • Downloads (last 12 months): 39
  • Downloads (last 6 weeks): 5
Reflects downloads up to 17 Oct 2024.
