DOI:10.1145/3631295.3631403
research-article

Scaling a Variant Calling Genomics Pipeline with FaaS

Published: 11 December 2023

Abstract

With the escalating complexity and volume of genomic data, the HPC capacity of biology institutions is reaching its limits. While the Cloud offers a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Alternatively, serverless computing allows workloads to scale with minimal developer burden. However, porting a scientific application to serverless is not straightforward. In this article, we present a Variant Calling genomics pipeline migrated from single-node HPC to a serverless architecture. We describe the inherent challenges of this approach and the engineering effort required to achieve scalability. We contribute by open-sourcing the pipeline, both as a basis for future systems research and as a scalable, user-friendly tool for the bioinformatics community.


Published In

WoSC '23: Proceedings of the 9th International Workshop on Serverless Computing
December 2023
68 pages
ISBN:9798400704550
DOI:10.1145/3631295
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FaaS
  2. genomics
  3. serverless
  4. workflow

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '23