Abstract
e-Science has been greatly enhanced from the developing capability and usability of cyberinfrastructure. This chapter explains how scientific workflow systems can facilitate e-Science discovery in Grid environments by providing features including scientific process automation, resource consolidation, parallelism, provenance tracking, fault tolerance, and workflow reuse. We first overview the core services to support e-Science discovery. To demonstrate how these services can be seamlessly assembled, an open source scientific workflow system, called Kepler, is integrated into the University of California Grid. This architecture is being applied to a computational enzyme design process, which is a formidable and collaborative problem in computational chemistry that challenges our knowledge of protein chemistry. Our implementation and experiments validate how the Kepler workflow system can make the scientific computation process automated, pipelined, efficient, extensible, stable, and easy-to-use.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
John Taylor, Director General of Research Councils, Office of Science and Technology, UK.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
References
Foster I (2002) What is the Grid? – a three point checklist. GRIDtoday, Vol. 1, No. 6. http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
Sudholt W, Altintas I, Baldridge K (2006) Scientific workflow infrastructure for computational chemistry on the Grid. In: Proc. of the 1st Computational Chemistry and Its Applications Workshop at the 6th International Conference on Computational Science (ICCS 2006):69–76, LNCS 3993
Tiwari A, Sekhar AKT (2007) Workflow based framework for life science informatics. Computational Biology and Chemistry 31(5–6):305–319
Yang X, Bruin RP, Dove MT (2010) Developing an End-to-End Scientific Workflow: a Case Study of Using a Reliable, Lightweight, and Comprehensive Workflow Platform in e-Science. Computing in Science and Engineering, 12(3):52–61, May/June 2010, doi:10.1109/MCSE.2010.61
Taylor I, Deelman E, Gannon D, Shields M (eds) (2007), Workflows for e-Science. Springer, New York, Secaucus, NJ, USA, ISBN: 978-1-84628-519-6
Yu Y, Buyya R (2006) A Taxonomy of Workflow Management Systems for Grid Computing. J. Grid Computing, 2006 (3):171–200
Foster I, Kesselman C (eds) (2003) The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, The Elsevier Series in Grid Computing, ISBN 1558609334, 2nd edition
Berman F, Fox GC, Hey AJG (eds) (2003) Grid Computing: Making The Global Infrastructure a Reality. Wiley. ISBN 0-470-85319-0
Richardson L, Ruby S (2007) RESTful Web Services. O’Reilly Media, Inc., ISBN: 978-0-596-52926-0
Foster I, Kesselman C, Nick J, Tuecke S (2002) The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. www.globus.org/research/papers/ogsa.pdf
Singh MP, Huhns MN (2005) Service-Oriented Computing: Semantics, Processes, Agents. John Wiley & Sons
Buyya R (ed.) (1999) High Performance Cluster Computing: Architectures and Systems. Volume 1, ISBN 0-13-013784-7, Prentice Hall, NJ, USA
Buyya R (ed.) (1999) High Performance Cluster Computing: Programming and Applications. Volume 2, ISBN 0-13-013785-5, Prentice Hall, NJ, USA
El-Rewini H, Lewis TG, Ali HH (1994) Task Scheduling in Parallel and Distributed Systems, ISBN: 0130992356, PTR Prentice Hall
Dong F, Akl SG (2006) Scheduling Algorithms for Grid Computing: State of the Art and Open Problems. Technical Report No. 2006-504, Queen’s University, Canada, http://www.cs.queensu.ca/TechReports/Reports/2006-504.pdf
Chervenak A, Foster I, Kesselman C, Salisbury C, Tuecke S (2000) The data Grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications. 23(3): 187–200. July 2000, doi:10.1006/jnca.2000.0110
Gray J, Liu DT, Nieto-Santisteban M, Szalay A, DeWitt DJ, Heber G (2005) Scientific data management in the coming decade, ACM SIGMOD Record, 34(4):34–41, doi://10.1145/1107499.1107503
Shoshani A, Rotem D (eds) (2009) Scientific Data Management: Challenges, Existing Technology, and Deployment, Computational Science Series. Chapman & Hall/CRC
Moore RW, Jagatheesan A, Rajasekar A, Wan M, Schroeder W (2004) Data Grid Management Systems. In Proc. of the 21st IEEE/NASA Conference on Mass Storage Systems and Technologies (MSST)
Venugopal S, Buyya R, Ramamohanarao K (2006) A taxonomy of Data Grids for distributed data sharing, management, and processing. ACM Comput. Surv. 38(1)
Yick J, Mukherjee B, Ghosal D (2008) Wireless sensor network survey. Computer Networks, 52(12): 2292–2330, DOI: 10.1016/j.comnet.2008.04.002.
Fox G, Gadgil H, Pallickara S, Pierce M, Grossman RL, Gu Y, Hanley D, Hong X (2004) High Performance Data Streaming in Service Architecture. Technical Report. http://www.hpsearch.org/documents/HighPerfDataStreaming.pdf
Rajasekar A, Lu S, Moore R, Vernon F, Orcutt J, Lindquist K (2005) Accessing sensor data using meta data: a virtual object ring buffer framework. In: Proc. of the 2nd Workshop on Data Management for Sensor Networks (DMSN 2005): 35–42
Tilak S, Hubbard P, Miller M, Fountain T (2007) The Ring Buffer Network Bus (RBNB) Data Turbine Streaming Data Middleware for Environmental Observing Systems. eScience 2007: 125–133
J. Postel and J. Reynolds, File Transfer Protocol (FTP), Internet RFC-959 1985
secure copy, http://linux.die.net/man/1/scp
Greenberg J (2002) Metadata and the World Wide Web. The Encyclopedia of Library and Information Science, Vol.72: 224–261, Marcel Dekker, New York
Wittenburg P, Broeder D (2002) Metadata Overview and the Semantic Web. In Proc. of the International Workshop on Resources and Tools in Field Linguistics
Davies J, Fensel D, van Harmelen F. (eds.) (2002) Towards the Semantic Web: Ontology-driven Knowledge Management. Wiley
Wolstencroft K, Alper P, Hull D, Wroe C, Lord PW, Stevens RD, Goble C (2007) The myGrid Ontology: Bioinformatics Service Discovery. International Journal of Bioinformatics Research and Applications, 3(3):326–340
Ludäscher B, Altintas I, Bowers S, Cummings J, Critchlow T, Deelman E, Roure DD, Freire J, Goble C, Jones M, Klasky S, McPhillips T, Podhorszki N, Silva C, Taylor I, Vouk M (2009) Scientific Process Automation and Workflow Management. In Shoshani A, Rotem D (eds) Scientific Data Management: Challenges, Existing Technology, and Deployment, Computational Science Series. 476–508. Chapman & Hall/CRC
Deelman E, Gannon D, Shields MS, Taylor I (2009) Workflows and e-Science: An overview of workflow system features and capabilities. Future Generation Comp. Syst. 25(5): 528–540
Brooks C, Lee EA, Liu X, Neuendorffer S, Zhao Y, Zheng H (2007), Chapter 7: MoML, Heterogeneous Concurrent Modeling and Design in Java (Volume 1: Introduction to Ptolemy II), EECS Department, University of California, Berkeley, UCB/EECS-2007-7, http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-7.html
Scufl Language, Taverna 1.7.1 Manual, http://www.myGrid.org.uk/usermanual1.7/
SwiftScript Language Reference Manual. http://www.ci.uchicago.edu/swift/guides/historical/languagespec.php
Wang J, Altintas I, Berkley C, Gilbert L, Jones MB (2008) A High-Level Distributed Execution Framework for Scientific Workflows. In: Proc. of workshop SWBES08: Challenging Issues in Workflow Applications, 4th IEEE International Conference on e-Science (e-Science 2008):634–639
Pautasso C, Alonso G (2006) Parallel Computing Patterns for Grid Workflows, In: Proc. of Workshop on Workflows in Support of Large-Scale Science (WORKS06) http://www.iks.ethz.ch/publications/jop_grid_workflow_patterns
Flynn MJ (1972) Some Computer Organizations and Their Effectiveness. IEEE Trans. on Computers, C–21(9):948-960
Wieczorek M, Prodan R, Fahringer T (2005) Scheduling of scientific workflows in the ASKALON grid environment. SIGMOD Record 34(3): 56–62
Singh G, Kesselman C, Deelman E (2005) Optimizing Grid-Based Workflow Execution. J. Grid Comput. 3(3–4):201–219
Simmhan YL, Plale B, Gannon D (2005). A survey of data provenance in e-science. SIGMOD Record, 34(3):31–36
Davidson SB, Freire J (2008) Provenance and scientific workflows: challenges and opportunities. In: Proc. of SIGMOD Conference 2008:1345–1350
Wang J, Altintas I, Berkley C, Gilbert L, Jones MB (2008) A High-Level Distributed Execution Framework for Scientific Workflows. In: Proc. of the 2008 Fourth IEEE International Conference on e-Science (e-Science 2008):634–639
Tierney B, Aydt R, Gunter D, Smith W, Swany M, Taylor V, Wolski R (2002) A Grid Monitoring Architecture. GWDPerf-16–3, Global Grid Forum http://wwwdidc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-3.pdf
Friendly M (2009) Milestones in the history of thematic cartography, statistical graphics, and data visualization. Toronto, York University, http://www.math.yorku.ca/SCS/Gallery/milestone/milestone.pdf
Haber RB, McNabb DA (1990) Visualization Idioms: A Conceptual Model for Scientific Visualization Systems. IEEE Visualization in Scientific Computing:74–93
Singh JP, Gupta A, Levoy M (1994) Parallel Visualization Algorithms: Performance and Architectural Implications, Computer, 27(7):45–55 doi:10.1109/2.299410
Ahrens J, Brislawn K, Martin K, Geveci B, Law CC, Papka M (2001) Large-scale data visualization using parallel data streaming. IEEE Comput. Graph. Appl., 21(4):34–41
Strengert M, Magallón M, Weiskopf D, Guthe S, Ertl T (2004) Hierarchical visualization and compression of large volume datasets using GPU clusters. In: Proc. Eurographics symposium on parallel graphics and visualization (EGPGV04), Eurographics Association: 41–48
Welch V, Siebenlist F, Foster I, Bresnahan J, Czajkowski K, Gawor J, Kesselman C, Meder S, Pearlman L, Tuecke S (2003) Security for grid services. In: Proc. of the Twelfth International Symposium on High Performance Distributed Computing (HPDC-12). IEEE Press
Plankensteiner K, Prodan R, Fahringer T, Kertesz A, Kacsuk PK (2007). Fault-tolerant behavior in state-of-the-art grid workflow management systems. Technical Report. CoreGRID, http://www.coregrid.net/mambo/images/stories/TechnicalReports/tr-0091.pdf
Ludäscher B, Altintas I, Berkley C, Higgins D, Jaeger E, Jones M, Lee E, Tao J, Zhao Y (2005) Scientific workflow management and the Kepler system. Concurrency and Computa-tion: Practice and Experience, 18 (10):1039–1065
Brooks C, Lee EA, Liu X, Neuendorffer S, Zhao Y, Zheng H (2007) Heterogeneous Concurrent Modeling and Design in Java (Volume 3: Ptolemy II Domains), EECS Department, University of California, Berkeley, UCB/EECS-2007-9, http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-9.html
Mouallem P, Crawl D, Altintas I, Vouk M, Yildiz U (2010). A Fault-Tolerance Architecture for Kepler-based Distributed Scientific Workflows. In: Proc. of 22nd International Conference on Scientific and Statistical Database Management (SSDBM 2010):452–460
Lee EA, Parks T (1995) Dataflow Process Networks. In: Proc. of the IEEE, 83(5):773–799
Altintas I, Barney O, Jaeger-Frank E (2006) Provenance Collection Support in the Kepler Scientific Workflow System. In: Proc. of International Provenance and Annotation Workshop (IPAW2006):118–132
Wang J, Altintas I, Hosseini PR, Barseghian D, Crawl D, Berkley C, Jones MB (2009) Accelerating Parameter Sweep Workflows by Utilizing Ad-hoc Network Computing Resources: an Ecological Example. In: Proc. of IEEE 2009 Third International Workshop on Scientific Workflows (SWF 2009) at Congress on Services (Services 2009):267–274
Radetzki U, Leser U, Schulze-Rauschenbach SC, Zimmermann J, Lussem J, Bode T, Cremers AB (2006) Adapters, shims, and glue-service interoperability for in silico experiments. Bioinformatics, 22(9):1137–1143
Wang J, Korambath P, Kim S, Johnson S, Jin K, Crawl D, Altintas I, Smallen S, Labate B, Houk KN (2010) Theoretical Enzyme Design Using the Kepler Scientific Workflows on the Grid, In: Proc. of 5th Workshop on Computational Chemistry and Its Applications (5th CCA) at International Conference on Computational Science (ICCS 2010):1169–1178
Zanghellini A, Jiang L, Wollacott AM, Cheng G, Meiler J, Althoff EA, Röthlisberger D, Baker D (2006) New algorithms and an in silico benchmark for computational enzyme design. Protein Sci. 15(12):2785–2794
Tantillo DJ, Chen J, Houk KN (1998) Theozymes and compuzymes: theoretical models for biological catalysis. Curr Opin Chem Biol. 2(6):743–50
Dantas G, Kuhlman B, Callender D, Wong M, Baker D (2003) A Large scale test of computational protein desing: Folding and stability of nine completely redesigned globular proteins. J. Mol. Biol. 332(2):449–460
Meiler J, Baker D (2006) ROSETTALIGAND: Protein-small molecule docking with full side-chain flexibility. Proteins 65:538–548
Acknowledgments
The authors would like to thank the rest of the Kepler and UC Grid community for their collaboration. We also like to explicitly acknowledge the contribution of Tajendra Vir Singh, Shao-Ching Huang, Sveta Mazurkova, and Paul Weakliem during the UC Grid architecture design phase. This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE, NSF CEO:P Award No. DBI 0619060 for REAP, DOE SciDac Award No. DE-FC02-07ER25811 for SDM Center, and UCGRID Project. We also thank the support to the Houk group from NIH-NIGMS and DARPA.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Wang, J. et al. (2011). Facilitating e-Science Discovery Using Scientific Workflows on the Grid. In: Yang, X., Wang, L., Jie, W. (eds) Guide to e-Science. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-439-5_13
Download citation
DOI: https://doi.org/10.1007/978-0-85729-439-5_13
Published:
Publisher Name: Springer, London
Print ISBN: 978-0-85729-438-8
Online ISBN: 978-0-85729-439-5
eBook Packages: Computer ScienceComputer Science (R0)