DOI:10.1145/3332186.3332233
research-article
Public Access

Implementing a Flexible, Fault Tolerant Job Management System for Science Gateways

Published: 28 July 2019

Abstract

This paper summarizes our experiences evaluating and deploying a new task execution management system within the open source Apache Airavata framework for science gateways. We base our choices on our operational requirements and our experience running Airavata as a multi-tenanted production service for multiple gateway clients. Our considerations include integrating semi-independent components, making major upgrades to those components while retaining the system's overall functionality, and choosing between third-party and in-house components. While we focus on Apache Airavata as the platform for evaluation, our results should be of general interest. We considered extending our previous in-house job management system with Apache Kafka or replacing it with Kubernetes, but ultimately chose Apache Helix, primarily for its ability to execute multiple tasks coupled into directed acyclic graphs. We have integrated this approach into Apache Airavata and tested it extensively over several months with many thousands of jobs, both from our internal throughput testing and from operational tests with early adopter science gateway clients. The new system has proven to be at least as reliable as the previous one, with three advantages: maintenance is simpler, we no longer need to support an in-house system that required extensive developer training to modify, and we can support more sophisticated job execution scenarios.
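To make the phrase "multiple tasks coupled into directed acyclic graphs" concrete, the following is a minimal sketch of the general idea, written with Python's standard library rather than the Apache Helix API. The task names (`stage_input`, `submit_job`, `monitor_job`, `stage_output`), the `run_dag` helper, and the retry policy are all hypothetical illustrations and are not taken from the paper or from Airavata.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, max_retries=2):
    """Run each task after all of its predecessors, retrying transient failures.

    tasks: mapping of task name -> zero-argument callable (hypothetical tasks)
    deps:  mapping of task name -> set of task names that must finish first
    """
    results = {}
    # static_order() yields every node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            try:
                results[name] = tasks[name]()
                break
            except RuntimeError:
                # Give up once the retry budget is exhausted.
                if attempt == max_retries:
                    raise
    return results


# Hypothetical job pipeline; the names are illustrative assumptions only.
log = []
attempts = {"submit_job": 0}


def stage_input():
    log.append("stage_input")
    return "staged"


def submit_job():
    attempts["submit_job"] += 1
    if attempts["submit_job"] == 1:
        # Simulate one transient failure so the retry path is exercised.
        raise RuntimeError("transient submission failure")
    log.append("submit_job")
    return "job-1"


def monitor_job():
    log.append("monitor_job")
    return "completed"


def stage_output():
    log.append("stage_output")
    return "archived"


tasks = {
    "stage_input": stage_input,
    "submit_job": submit_job,
    "monitor_job": monitor_job,
    "stage_output": stage_output,
}
deps = {
    "submit_job": {"stage_input"},
    "monitor_job": {"submit_job"},
    "stage_output": {"monitor_job"},
}

results = run_dag(tasks, deps)
```

This sketch runs tasks sequentially in a single process. A framework such as Helix would additionally distribute tasks across workers and persist their state, which is outside the scope of this illustration.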


Published In

PEARC '19: Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)
July 2019
775 pages
ISBN:9781450372275
DOI:10.1145/3332186
General Chair: Tom Furlani
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Apache Airavata
  2. Science gateways
  3. cyberinfrastructure
  4. job management

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '19

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
