DOI: 10.1145/3219104.3219141
research-article

Building a Science Gateway For Processing and Modeling Sequencing Data Via Apache Airavata

Published: 22 July 2018

Abstract

The amount of DNA sequencing data has grown exponentially over the past decade because of advances in sequencing technology. Processing and modeling large amounts of sequencing data can be computationally intractable on desktop computing platforms. High performance computing (HPC) resources offer more computing power and can serve as a general solution to these problems. However, using HPC resources directly requires skilled users who know their way around HPC systems, and acquiring such skills takes time. Science gateways act as a middle layer between users and HPC resources, giving users the means to accomplish compute-intensive tasks without requiring specialized expertise. We developed a web-based computing platform for genome biologists by customizing the PHP Gateway for Airavata (PGA) framework, which accesses publicly accessible HPC resources via Apache Airavata. This platform takes advantage of the Extreme Science and Engineering Discovery Environment (XSEDE), which provides the resources for gateway development, including access to CPU, GPU, and storage resources. We used this platform to develop a gateway for the dREG algorithm, which finds functional regions in mammalian genomes using nascent RNA sequencing data. The dREG gateway provides its users with a free, powerful, and user-friendly GPU computing resource based on XSEDE, so that biologists do not need specialized knowledge about installing, configuring, and running software on an HPC system. The dREG gateway is available at: https://dREG.dnasequence.org/.
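The abstract describes a science gateway as a middle layer between users and HPC resources. The following is a minimal, hypothetical Python sketch of that division of responsibility only; it is not the PGA or Apache Airavata API. Every class, method, resource name, and input key in it (`GatewaySketch`, `submit`, `status`, `cpu-cluster`, `gpu-cluster`, `input_file`) is an illustrative assumption. The user names an application and its inputs, and the gateway, not the user, picks a compute resource that satisfies the request (here, whether a GPU is needed) and tracks the job's state.

```python
from dataclasses import dataclass
from enum import Enum
import itertools


class JobState(Enum):
    # Hypothetical job states; a real middleware tracks many more.
    SUBMITTED = "SUBMITTED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"


@dataclass
class ComputeResource:
    name: str      # hypothetical resource label
    has_gpu: bool  # whether the resource offers GPU nodes


@dataclass
class Experiment:
    experiment_id: str
    application: str
    inputs: dict
    resource: str
    state: JobState = JobState.SUBMITTED


class GatewaySketch:
    """Hypothetical middle layer: users supply an application name and
    inputs; resource selection and state tracking stay hidden from them."""

    def __init__(self, resources):
        self.resources = list(resources)
        self.experiments = {}
        self._ids = itertools.count(1)  # sequential experiment IDs

    def submit(self, application, inputs, needs_gpu=False):
        # Keep GPU resources when a GPU is required; otherwise any resource.
        candidates = [r for r in self.resources if r.has_gpu or not needs_gpu]
        if not candidates:
            raise RuntimeError("no compute resource satisfies the request")
        exp = Experiment(
            experiment_id=f"exp-{next(self._ids)}",
            application=application,
            inputs=dict(inputs),
            resource=candidates[0].name,  # simplest policy: first match
        )
        self.experiments[exp.experiment_id] = exp
        return exp.experiment_id

    def status(self, experiment_id):
        return self.experiments[experiment_id].state


# Usage with hypothetical resources and a hypothetical input key.
gw = GatewaySketch([ComputeResource("cpu-cluster", False),
                    ComputeResource("gpu-cluster", True)])
exp_id = gw.submit("dREG", {"input_file": "sample_input"}, needs_gpu=True)
print(exp_id, gw.experiments[exp_id].resource, gw.status(exp_id).value)
# prints: exp-1 gpu-cluster SUBMITTED
```

In the system the abstract describes, these responsibilities would fall to Apache Airavata and the XSEDE resources behind it; the sketch models only the idea that the user never chooses or configures the HPC resource directly.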

Cited By

  • (2024) Science Gateways and AI/ML: How Can Gateway Concepts and Solutions Meet the Needs in Data Science? In Critical Infrastructure - Modern Approach and New Developments. DOI: 10.5772/intechopen.110144. Online publication date: 14-Feb-2024
  • (2024) Enhancing Vulnerability Prioritization in Cloud Computing Using Multi-View Representation Learning. Journal of Management Information Systems 41:3, 708-743. DOI: 10.1080/07421222.2024.2376384. Online publication date: 4-Sep-2024
  • (2020) An extensible Django-based web portal for Apache Airavata. In Practice and Experience in Advanced Research Computing 2020: Catch the Wave, 160-167. DOI: 10.1145/3311790.3396650. Online publication date: 26-Jul-2020


    Information

    Published In

    ACM Other conferences
    PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity
    July 2018
    652 pages
    ISBN:9781450364461
    DOI:10.1145/3219104
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2018


    Author Tags

    1. Apache Airavata
    2. Next Generation Sequencing
    3. Science gateway
    4. cloud computing
    5. sequencing data
    6. software-as-a-service

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • NHGRI
    • XSEDE allocations
    • NSF award

    Conference

    PEARC '18

    Acceptance Rates

    PEARC '18 Paper Acceptance Rate 79 of 123 submissions, 64%;
    Overall Acceptance Rate 133 of 202 submissions, 66%


    Bibliometrics & Citations


    Article Metrics

    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 16 Feb 2025


    Citations

    • (2018) Discovering Transcriptional Regulatory Elements From Run‐On and Sequencing Data Using the Web‐Based dREG Gateway. Current Protocols in Bioinformatics 66:1. DOI: 10.1002/cpbi.70. Online publication date: 27-Dec-2018
