Search, adapt, and reuse: the future of scientific workflows

Published: 15 September 2011

Abstract

    Over the past few years, a number of scientific workflow management systems (SciWFM) have matured to the point where they should be usable in production-style environments. This is especially true for the life sciences, but SciWFM also attract considerable attention in fields such as geophysics and climate research. These developments, accompanied by the growing availability of analytical tools wrapped as (web) services, were driven by a series of attractive promises: end users will be empowered to develop their own pipelines; reuse of services will be enhanced by easier integration into custom workflows; the time needed to develop analysis pipelines will decrease; and so on. Despite all these efforts, SciWFM have not yet found widespread acceptance among their intended audience. In this paper, we argue that wider adoption of SciWFM will only be achieved if the focus of research and development shifts from methods for developing and running workflows to searching, adapting, and reusing existing workflows. Only through this shift can SciWFM reach the mass of domain scientists who actually perform scientific analyses but have little interest in developing workflows themselves. To this end, SciWFM need to be combined with community-wide workflow repositories that allow users to find solutions to their scientific needs, coded as workflows. In this vision paper, we show how and where such developments have already started and highlight the new research questions that arise.

        Published In

        ACM SIGMOD Record, Volume 40, Issue 2
        June 2011
        43 pages
        ISSN: 0163-5808
        DOI: 10.1145/2034863

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. data analysis
        2. scientific data
        3. scientific workflow systems
        4. workflow management

        Cited By

        • (2024) A qualitative assessment of using ChatGPT as large language model for scientific workflow development. GigaScience 13. DOI: 10.1093/gigascience/giae030. Online publication date: 19-Jun-2024
        • (2023) Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. Computational and Structural Biotechnology Journal 21 (2075-2085). DOI: 10.1016/j.csbj.2023.03.003. Online publication date: 2023
        • (2023) Workflows for Bioinformatics Data Integration. Biological Data Integration (53-85). DOI: 10.1002/9781394257317.ch3. Online publication date: 8-Dec-2023
        • (2022) Provenance-based Workflow Diagnostics Using Program Specification. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC) (292-301). DOI: 10.1109/HiPC56025.2022.00046. Online publication date: Dec-2022
        • (2022) A Consolidated View on Specification Languages for Data Analysis Workflows. Leveraging Applications of Formal Methods, Verification and Validation. Software Engineering (201-215). DOI: 10.1007/978-3-031-19756-7_12. Online publication date: 22-Oct-2022
        • (2020) The Workflow Trace Archive: Open-Access Data From Public and Private Computing Infrastructures. IEEE Transactions on Parallel and Distributed Systems 31:9 (2170-2184). DOI: 10.1109/TPDS.2020.2984821. Online publication date: 1-Sep-2020
        • (2019) BePT. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (1873-1882). DOI: 10.1145/3357384.3357882. Online publication date: 3-Nov-2019
        • (2019) Adaptation of Scientific Workflows by Means of Process-Oriented Case-Based Reasoning. Case-Based Reasoning Research and Development (388-403). DOI: 10.1007/978-3-030-29249-2_26. Online publication date: 8-Sep-2019
        • (2018) Scientific Workflow Clustering and Recommendation Leveraging Layer Hierarchical Analysis. IEEE Transactions on Services Computing 11:1 (169-183). DOI: 10.1109/TSC.2016.2542805. Online publication date: 1-Jan-2018
        • (2018) On the use of model-driven engineering principles for the management of simulation experiments. Journal of Simulation 13:2 (83-95). DOI: 10.1080/17477778.2017.1418638. Online publication date: 4-Jan-2018
