DOI: 10.1145/2822332.2822340
research-article

Enabling workflow repeatability with virtualization support

Published: 15 November 2015

Abstract

The value of workflows to the scientific community spans time and space. Not only the results but also the performance and resource consumption of a workflow need to be replayed over time and in varying environments. Achieving such repeatability in practice is challenging because software and infrastructure change over time. In this work, we introduce a new abstraction that builds on the concept of a virtual appliance to enable workflow repeatability. We have also developed a novel architecture to leverage this abstraction and realized it in a system implementation that supports a popular workflow management system and builds on a federated in-production environment. We demonstrate the effectiveness of our approach by examining various aspects of workflow repeatability. Our results show that workflows can be replayed with 2% fidelity when considering their walltime as the performance metric.

Cited By

  • (2022) Sharing and performance optimization of reproducible workflows in the cloud. Future Generation Computer Systems, 98:C (487-502). DOI: 10.1016/j.future.2019.03.045. Online publication date: 21-Apr-2022
  • (2020) Deployable Self-contained Workflow Models. Service-Oriented and Cloud Computing (85-96). DOI: 10.1007/978-3-030-44769-4_7. Online publication date: 27-Mar-2020
  • (2016) A framework for scientific workflow reproducibility in the cloud. 2016 IEEE 12th International Conference on e-Science (e-Science) (81-90). DOI: 10.1109/eScience.2016.7870888. Online publication date: Oct-2016

Published In

WORKS '15: Proceedings of the 10th Workshop on Workflows in Support of Large-Scale Science
November 2015
98 pages
ISBN:9781450339896
DOI:10.1145/2822332

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. repeatability
  2. virtual appliance
  3. workflow

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation

Conference

SC15

Acceptance Rates

WORKS '15 Paper Acceptance Rate 9 of 13 submissions, 69%;
Overall Acceptance Rate 30 of 54 submissions, 56%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 27 Jan 2025
