DOI: 10.1145/1646468.1646470
research-article

Exploring many task computing in scientific workflows

Published: 16 November 2009

Abstract

One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and record the whole workflow execution. Executing workflow activities with high performance computing (HPC), however, presents challenges for SWfMS execution control. Current solutions leave scheduling to the HPC queue system. Since the workflow execution engine does not run on the remote clusters, the SWfMS is not aware of the parallelization strategy used in the workflow execution. Consequently, remote execution control and provenance registration for the parallel activities are very limited on the SWfMS side. This work presents a set of components that can be included in the workflow specification of any SWfMS to control the parallelization of activities as many-task computing (MTC). In addition, these components can gather provenance data during remote workflow execution. Through these MTC components, the parallelization strategy can be registered and reused, and provenance data can be queried uniformly. We evaluated our approach by applying parameter sweep parallelization to the solution of the incompressible 3D Navier-Stokes equations. Experimental results show performance gains, with the additional benefit of distributed provenance support.
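The abstract describes components that run one workflow activity as many independent tasks (for example, a parameter sweep) while capturing provenance for each task so it can be queried uniformly afterwards. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The names `ProvenanceRecord`, `expand_sweep`, `run_sweep`, `query`, and the stand-in activity `solve` are hypothetical, and a local thread pool stands in for the remote cluster and its HPC queue system.

```python
"""Hypothetical sketch: a parameter sweep executed as many independent tasks,
with one provenance record captured per task. Not the paper's components."""
import itertools
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance schema: what ran, with which inputs, how it ended."""
    task_id: int
    activity: str
    params: Params
    output: Any
    status: str  # "ok" or "failed"
    started: float
    finished: float


def expand_sweep(space: Dict[str, List[Any]]) -> List[Params]:
    """Expand a parameter space into one task per combination (Cartesian product)."""
    keys = sorted(space)  # sorted so that task order is deterministic
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def run_sweep(activity: Callable[..., Any], space: Dict[str, List[Any]],
              workers: int = 4) -> List[ProvenanceRecord]:
    """Run `activity` once per parameter combination and record provenance.

    Assumption: a local thread pool stands in for remote cluster nodes.
    """
    name = getattr(activity, "__name__", repr(activity))

    def run_one(task_id: int, params: Params) -> ProvenanceRecord:
        started = time.time()
        try:
            output, status = activity(**params), "ok"
        except Exception as exc:  # a failed task is still recorded
            output, status = repr(exc), "failed"
        return ProvenanceRecord(task_id, name, params, output, status,
                                started, time.time())

    tasks = expand_sweep(space)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, i, p) for i, p in enumerate(tasks)]
        return [f.result() for f in futures]  # collected in task order


def query(records: List[ProvenanceRecord], **criteria: Any) -> List[ProvenanceRecord]:
    """Select records whose input parameters match every given criterion."""
    return [r for r in records
            if all(r.params.get(k) == v for k, v in criteria.items())]


def solve(reynolds: float, mesh: int) -> float:
    """Stand-in for one solver run (hypothetical; not a Navier-Stokes solver)."""
    return reynolds / mesh


if __name__ == "__main__":
    records = run_sweep(solve, {"reynolds": [100.0, 400.0], "mesh": [32, 64]})
    for r in query(records, mesh=64):
        print(r.task_id, r.params, r.output, r.status)
```

Because every record carries the parameters of the task that produced it, the same query works across all tasks regardless of where they ran; `query(records, mesh=64)` returns the two runs with `mesh=64`. In a real deployment the tasks would be submitted through the cluster's queue system and the records shipped back to the SWfMS, which this sketch does not model.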

Published In

ACM Conferences
MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
November 2009
131 pages
ISBN:9781605587141
DOI:10.1145/1646468
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational fluid dynamics
  2. parallelization
  3. provenance
  4. scientific workflows

Conference

SC '09

Cited By

  • (2021) An incremental reinforcement learning scheduling strategy for data-intensive scientific workflows in the cloud. Concurrency and Computation: Practice and Experience 33:11. DOI: 10.1002/cpe.6193. Online publication date: 26-Jan-2021.
  • (2017) Columbus: Enabling Scalable Scientific Workflows for Fast Evolving Spatio-Temporal Sensor Data. 2017 IEEE International Conference on Services Computing (SCC), pp. 9-18. DOI: 10.1109/SCC.2017.11. Online publication date: Jun-2017.
  • (2016) Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review. Archives of Computational Methods in Engineering 24:2, pp. 337-363. DOI: 10.1007/s11831-016-9165-4. Online publication date: 13-Jan-2016.
  • (2015) Dynamic steering of HPC scientific workflows. Future Generation Computer Systems 46:C, pp. 100-113. DOI: 10.1016/j.future.2014.11.017. Online publication date: 1-May-2015.
  • (2013) User-steering of HPC workflows. Proceedings of the 2nd ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, pp. 1-6. DOI: 10.1145/2499896.2499900. Online publication date: 23-Jun-2013.
  • (2013) Performance evaluation of parallel strategies in public clouds. Future Generation Computer Systems 29:7, pp. 1816-1825. DOI: 10.1016/j.future.2012.12.019. Online publication date: 1-Sep-2013.
  • (2013) Chiron: a parallel engine for algebraic scientific workflows. Concurrency and Computation: Practice and Experience 25:16, pp. 2327-2341. DOI: 10.1002/cpe.3032. Online publication date: 10-May-2013.
  • (2012) Evaluating parameter sweep workflows in high performance computing. Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, pp. 1-10. DOI: 10.1145/2443416.2443418. Online publication date: 20-May-2012.
  • (2012) A framework for readapting and running bioinformatics applications in the cloud. Proceedings of the 2012 ACM Research in Applied Computation Symposium, pp. 86-91. DOI: 10.1145/2401603.2401624. Online publication date: 23-Oct-2012.
  • (2012) A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds. Journal of Grid Computing 10:3, pp. 521-552. DOI: 10.1007/s10723-012-9227-2. Online publication date: 1-Sep-2012.
