Designing for Recommending Intermediate States in A Scientific Workflow Management System

Published: 29 May 2021

Abstract

To process large amounts of data sequentially and systematically, a Scientific Workflow Management System (SWfMS) must properly manage its workflow components (i.e., modules, data, configurations, and the associations among ports and links). Managing data with provenance in a SWfMS to support the reusability of workflows, modules, and data is not a simple task. Handling such components is even more burdensome for complex workflows that are frequently assembled and executed to investigate large datasets with different technologies (i.e., various learning algorithms or models). Although many studies propose techniques and technologies for managing and recommending services in a SWfMS, only a few consider managing data in a SWfMS to store it efficiently and facilitate workflow executions. Furthermore, no study has examined the effectiveness and efficiency of such data management in a SWfMS from a user perspective. In this paper, we present and evaluate a GUI version of a novel approach to intermediate data management with two use cases (Plant Phenotyping and Bioinformatics). The technique, which we call GUI-RISPTS (Recommending Intermediate States from Pipelines Considering Tool-States), can facilitate workflow executions with processed data (i.e., intermediate outcomes of modules in a workflow) and can thus reduce the computational time of some modules in a SWfMS. We integrated GUI-RISPTS with an existing workflow management system called SciWorCS. In SciWorCS, we present an interface through which users select recommended intermediate states (i.e., modules' outcomes). We investigated GUI-RISPTS's effectiveness from users' perspectives and measured its storage overhead and its efficiency in workflow execution.
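
The general idea of resuming a workflow from a stored intermediate outcome can be illustrated with a minimal Python sketch. It is not the GUI-RISPTS or SciWorCS implementation: `state_key`, `run_pipeline`, the in-memory `store`, and the toy modules are all hypothetical names chosen for illustration. The sketch also assumes that every module outcome is stored, so it does not model how the approach decides which states are worth keeping, nor the interface through which users pick a recommendation.

```python
import hashlib
import json


def state_key(input_id, steps):
    """Hash an input identifier plus the ordered (module, configuration) pairs."""
    described = [(name, cfg) for name, _func, cfg in steps]
    payload = json.dumps([input_id, described], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(input_id, data, steps, store):
    """Run steps, resuming from the longest prefix whose outcome is stored.

    Returns (result, names of the modules that were actually executed).
    """
    start = 0
    # Search for a reusable intermediate state, longest prefix first.
    for n in range(len(steps), 0, -1):
        key = state_key(input_id, steps[:n])
        if key in store:
            data, start = store[key], n
            break
    executed = []
    for i in range(start, len(steps)):
        name, func, cfg = steps[i]
        data = func(data, **cfg)
        executed.append(name)
        # Assumption: every outcome is kept; a real system would be selective.
        store[state_key(input_id, steps[: i + 1])] = data
    return data, executed


# Toy modules standing in for real workflow tools (hypothetical).
def scale(xs, factor):
    return [x * factor for x in xs]


def threshold(xs, cutoff):
    return [x for x in xs if x >= cutoff]


def total(xs):
    return sum(xs)


store = {}
pipeline_a = [
    ("scale", scale, {"factor": 2}),
    ("threshold", threshold, {"cutoff": 4}),
    ("total", total, {}),
]
result_a, executed_a = run_pipeline("dataset-1", [1, 2, 3], pipeline_a, store)

# Same first module and configuration, different threshold configuration.
pipeline_b = [
    ("scale", scale, {"factor": 2}),
    ("threshold", threshold, {"cutoff": 6}),
    ("total", total, {}),
]
result_b, executed_b = run_pipeline("dataset-1", [1, 2, 3], pipeline_b, store)

print(result_a, executed_a)
print(result_b, executed_b)
```

In this sketch the second run reuses the stored outcome of `scale` and executes only the two modules whose configuration prefix changed. Keying on both the module and its configuration is one simple way to read "considering tool-states"; the paper itself should be consulted for the actual scheme.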

Cited By

  • (2024) On the use of big data frameworks in big service management. Journal of Software: Evolution and Process 36:7. DOI:10.1002/smr.2642. Online publication date: 14-Jul-2024
  • (2023) MANSOR: A module alignment method based on neighbor information for scientific workflow. Concurrency and Computation: Practice and Experience 36:10. DOI:10.1002/cpe.7736. Online publication date: 19-Apr-2023
  • (2022) Challenges of Provenance in Scientific Workflow Management Systems. 2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS), 10-18. DOI:10.1109/WORKS56498.2022.00007. Online publication date: Nov-2022

    Published In

    Proceedings of the ACM on Human-Computer Interaction  Volume 5, Issue EICS
    EICS
    June 2021
    546 pages
    EISSN:2573-0142
    DOI:10.1145/3468527
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 May 2021
    Published in PACMHCI Volume 5, Issue EICS

    Author Tags

    1. association rules
    2. intermediate states
    3. pipeline design
    4. plant phenotyping
    5. workflow

    Qualifiers

    • Research-article

    Funding Sources

    • Canada First Research Excellence Fund (CFREF)
    • Global Institute for Food Security (GIFS)
