research-article

Exploring many task computing in scientific workflows

Authors: Eduardo Ogasawara, Daniel de Oliveira, Fernando Chirigati, Carlos Eduardo Barbosa, Vanessa Braganholo, Alvaro Coutinho, Marta Mattoso

MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers

Article No.: 2, Pages 1 - 10

https://doi.org/10.1145/1646468.1646470

Published: 16 November 2009

Abstract

One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and register the whole workflow execution. Executing the activities of a workflow with high performance computing (HPC) presents challenges for SWfMS execution control. Current solutions leave the scheduling to the HPC queue system. Since the workflow execution engine does not run on remote clusters, SWfMS are not aware of the parallel strategy of the workflow execution. Consequently, remote execution control and the provenance registry of the parallel activities are very limited from the SWfMS side. This work presents a set of components to be included in the workflow specification of any SWfMS to control the parallelization of activities as many-task computing (MTC). In addition, these components can gather provenance data during remote workflow execution. Through these MTC components, the parallelization strategy can be registered and reused, and provenance data can be uniformly queried. We have evaluated our approach by performing parameter sweep parallelization in solving the incompressible 3D Navier-Stokes equations. Experimental results show performance gains together with the additional benefits of distributed provenance support.
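
The idea of a parameter sweep run as many independent tasks, with a provenance record gathered per task, can be illustrated with a minimal Python sketch. This is not the authors' MTC components: the function names, the record fields, and the toy `run_activity` computation are all hypothetical, and a local thread pool stands in for the remote cluster and its queue system.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_activity(params):
    """Hypothetical stand-in for one solver run; returns a toy value."""
    return params["reynolds"] * params["dt"]


def run_with_provenance(task_id, params):
    """Run one task and return a provenance record describing it."""
    start = time.time()
    result = run_activity(params)
    return {
        "task_id": task_id,
        "params": params,
        "result": result,
        "start": start,
        "end": time.time(),
    }


def parameter_sweep(grid, workers=4):
    """Expand the grid into independent tasks and run them concurrently."""
    names = sorted(grid)
    combos = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    # A thread pool is an assumption made for this sketch; it replaces
    # the remote HPC resources described in the abstract.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_with_provenance, task_id, params)
            for task_id, params in enumerate(combos)
        ]
        # Collect records in submission order so task ids stay sorted.
        return [future.result() for future in futures]


# Sweep two hypothetical parameters: 3 x 2 = 6 independent tasks.
provenance = parameter_sweep({"reynolds": [100, 400, 1000], "dt": [0.1, 0.05]})

# Because every task emits the same record shape, the records can be
# queried uniformly, for example by parameter value.
runs_at_400 = [rec for rec in provenance if rec["params"]["reynolds"] == 400]
```

Keeping the parameters and timing in the same record for every task is what makes the final query uniform; a real deployment would presumably persist these records rather than hold them in memory.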



Published In

MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers

November 2009, 131 pages

ISBN: 9781605587141

DOI: 10.1145/1646468

Conference Chairs: Ioan Raicu (Northwestern University), Ian Foster (University of Chicago & Argonne National Laboratory), Yong Zhao (Microsoft)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 16, 2009, Portland, Oregon

Sponsor: SIGARCH
