DOI: 10.1145/2554850.2555066
research-article

Collecting cloud provenance metadata with Matriohska: a case study with genomic workflows

Published: 24 March 2014

Abstract

Scientific workflows are abstractions used to model in silico scientific experiments. Cloud environments still offer only incipient support for collecting and recording prospective and retrospective provenance. This paper presents an approach for collecting provenance metadata from in silico scientific experiments executed in public clouds. The strategy was implemented as a distributed, modular architecture named Matriohska. The paper also presents a provenance data model compatible with the PROV specification and reports preliminary results describing how provenance metadata was captured from the components running in the cloud.


      Published In

      SAC '14: Proceedings of the 29th Annual ACM Symposium on Applied Computing
      March 2014
      1890 pages
      ISBN:9781450324694
      DOI:10.1145/2554850

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cloud computing
      2. provenance
      3. scientific workflows

      Qualifiers

      • Research-article

      Conference

      SAC 2014
      SAC 2014: Symposium on Applied Computing
      March 24 - 28, 2014
      Gyeongju, Republic of Korea

      Acceptance Rates

      SAC '14 Paper Acceptance Rate 218 of 939 submissions, 23%;
      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
