Facilitating e-Science Discovery Using Scientific Workflows on the Grid

  • Chapter
Guide to e-Science

Abstract

e-Science has benefited greatly from the growing capability and usability of cyberinfrastructure. This chapter explains how scientific workflow systems can facilitate e-Science discovery in Grid environments by providing features including scientific process automation, resource consolidation, parallelism, provenance tracking, fault tolerance, and workflow reuse. We first give an overview of the core services that support e-Science discovery. To demonstrate how these services can be seamlessly assembled, an open source scientific workflow system called Kepler is integrated into the University of California Grid. This architecture is being applied to a computational enzyme design process, a formidable and collaborative problem in computational chemistry that challenges our knowledge of protein chemistry. Our implementation and experiments show how the Kepler workflow system can make the scientific computation process automated, pipelined, efficient, extensible, stable, and easy to use.

Acknowledgments

The authors would like to thank the rest of the Kepler and UC Grid community for their collaboration. We would also like to explicitly acknowledge the contributions of Tajendra Vir Singh, Shao-Ching Huang, Sveta Mazurkova, and Paul Weakliem during the UC Grid architecture design phase. This work was supported by NSF SDCI Award OCI-0722079 for Kepler/CORE, NSF CEO:P Award No. DBI 0619060 for REAP, DOE SciDAC Award No. DE-FC02-07ER25811 for the SDM Center, and the UCGRID Project. We also acknowledge the support provided to the Houk group by NIH-NIGMS and DARPA.

Corresponding author

Correspondence to Jianwu Wang.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Wang, J. et al. (2011). Facilitating e-Science Discovery Using Scientific Workflows on the Grid. In: Yang, X., Wang, L., Jie, W. (eds) Guide to e-Science. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-0-85729-439-5_13

  • DOI: https://doi.org/10.1007/978-0-85729-439-5_13

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-438-8

  • Online ISBN: 978-0-85729-439-5

  • eBook Packages: Computer Science (R0)