DOI: 10.1145/3592533.3592809
research-article

A Study of Orchestration Approaches for Scientific Workflows in Serverless Computing

Published: 08 May 2023

Abstract

Scientific workflows are typically data- and compute-intensive. They consist of many stages, each of which may contain hundreds or even thousands of tasks. Traditionally, scientific workflows have been executed using the serverful computing model. Serverless computing presents an attractive alternative because it frees developers from managing and provisioning resources and offers a fine-grained pay-as-you-go pricing model. In this paper, we investigate the viability of using serverless computing to execute scientific workflows. Specifically, we discuss, implement, and evaluate three orchestration approaches for executing scientific workflows: serverful-centralized, serverless-centralized, and serverless-decentralized. This work is the first to implement and evaluate a purely serverless orchestration approach that does not require deploying a dedicated workflow manager. Our evaluation shows that serverless orchestration approaches cause a noticeable performance overhead for some workflow patterns (e.g., reduce stages) because they access large amounts of remote data. We propose two optimizations (i.e., prefetching file privileges and container placement) that exploit data locality to mitigate that impact. Our evaluation with the Montage application shows that a fully decentralized approach achieves performance comparable to a serverful approach. Our results also show that the prefetching file privileges and container placement optimizations improve performance by 26% and 44%, respectively, compared to an unoptimized version.
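
As a rough illustration of the difference between centralized and decentralized orchestration named in the abstract, the following minimal Python sketch runs a toy map/reduce-style workflow both ways. It is not the paper's implementation: the `DAG` contents, the function names (`run_task`, `centralized`, `decentralized`), and the in-memory `pending` counters are all hypothetical, and a real deployment would presumably invoke serverless functions and keep the dependency state in shared storage.

```python
from collections import defaultdict

# Hypothetical toy workflow: two map tasks feed one reduce task.
# Each key is a task; its value lists the tasks it depends on.
DAG = {"map_a": [], "map_b": [], "reduce": ["map_a", "map_b"]}


def run_task(name, log):
    """Stand-in for invoking a function; it only records the call order."""
    log.append(name)


def centralized(dag):
    """A single manager tracks workflow state and invokes every ready task."""
    log, done = [], set()
    while len(done) < len(dag):
        # The manager scans the whole DAG for tasks whose parents all finished.
        ready = [t for t in dag
                 if t not in done and all(p in done for p in dag[t])]
        for task in ready:
            run_task(task, log)
            done.add(task)
    return log


def decentralized(dag):
    """No manager: each finished task triggers the children it makes ready."""
    children = defaultdict(list)
    pending = {t: len(parents) for t, parents in dag.items()}
    for task, parents in dag.items():
        for parent in parents:
            children[parent].append(task)
    log = []

    def invoke(task):
        run_task(task, log)
        for child in children[task]:
            # Assumption: this counter would live in shared storage.
            pending[child] -= 1
            if pending[child] == 0:
                invoke(child)

    # Start from the tasks that have no dependencies.
    for root in [t for t, parents in dag.items() if not parents]:
        invoke(root)
    return log


if __name__ == "__main__":
    print(centralized(DAG))
    print(decentralized(DAG))
```

Both functions execute the tasks in a valid dependency order; the difference is only in who decides what runs next. In the decentralized variant the reduce task is triggered by whichever parent finishes last, which is one plausible way to avoid a dedicated workflow manager.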


Cited By

  • (2024) Escalonamento com Consciência Energética para Fluxos de Trabalho Científicos sem Servidor: Uma Abordagem de Aprendizado de Máquina. Anais da XV Escola Regional de Alto Desempenho de São Paulo (ERAD-SP 2024), pages 89-92. DOI: 10.5753/eradsp.2024.239934. Online publication date: 16-May-2024.
  • (2024) A survey on the cold start latency approaches in serverless computing: an optimization-based perspective. Computing. DOI: 10.1007/s00607-024-01335-5. Online publication date: 17-Aug-2024.
  • (2024) Electricity-cost-aware multi-workflow scheduling in heterogeneous cloud. Computing, 106:6, pages 1749-1775. DOI: 10.1007/s00607-024-01264-3. Online publication date: 24-Feb-2024.

    Published In

    SESAME '23: Proceedings of the 1st Workshop on SErverless Systems, Applications and MEthodologies
    May 2023
    64 pages
    ISBN:9798400701856
    DOI:10.1145/3592533
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. serverless
    2. scientific workflows
    3. workflow orchestration

    Qualifiers

    • Research-article

    Conference

    SESAME '23