DOI: 10.1145/1851476.1851570
research-article

Weaver: integrating distributed computing abstractions into scientific workflows using Python

Published: 21 June 2010

Abstract

Weaver is a high-level framework that enables researchers to integrate distributed computing abstractions into their scientific workflows. Rather than develop a new workflow language, we built Weaver on top of the Python programming language. As such, Weaver takes advantage of users' familiarity with Python, minimizes barriers to adoption, and allows for integration with existing software. In this paper, we introduce Weaver's programming model, which consists of datasets, functions, and abstractions that users combine to organize and specify large-scale scientific workflows. We also explain how these specifications are compiled into a directed acyclic graph used by a workflow manager that dispatches the work to a variety of distributed computing engines. To examine how Weaver is used in scientific research, we present three example applications that demonstrate Weaver's ability to integrate into existing workflows and incorporate optimized distributed computing abstraction tools.
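The compilation step described in the abstract can be illustrated with a minimal sketch. This is not Weaver's API: `Task`, `Workflow`, and `map_abstraction` are hypothetical names invented here, and the only assumption taken from the abstract is that datasets, functions, and abstractions are combined into a specification that is then turned into a directed acyclic graph for a workflow manager. In this sketch a dataset is a plain list of file names, a function is a shell command template, and the abstraction applies the template to every file.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Task:
    """One DAG node (hypothetical): a command plus its input and output files."""
    command: str
    inputs: tuple
    outputs: tuple


@dataclass
class Workflow:
    """Collects tasks and derives dependencies from shared file names."""
    tasks: list = field(default_factory=list)

    def add(self, command, inputs, outputs):
        task = Task(command, tuple(inputs), tuple(outputs))
        self.tasks.append(task)
        return task

    def dag(self):
        # Map each output file to the index of the task that produces it.
        producer = {out: i for i, t in enumerate(self.tasks) for out in t.outputs}
        # A task depends on whichever tasks produce its input files.
        return {
            i: {producer[f] for f in t.inputs if f in producer}
            for i, t in enumerate(self.tasks)
        }

    def schedule(self):
        # Return tasks in an order where every dependency comes first.
        order = TopologicalSorter(self.dag()).static_order()
        return [self.tasks[i] for i in order]


def map_abstraction(workflow, template, dataset, suffix):
    """Hypothetical 'map' abstraction: apply a command template to each file."""
    outputs = []
    for path in dataset:
        out = path + suffix
        workflow.add(template.format(inp=path, out=out), [path], [out])
        outputs.append(out)
    return outputs


wf = Workflow()
raw = ["a.txt", "b.txt"]  # the dataset
counts = map_abstraction(wf, "wc -l {inp} > {out}", raw, ".count")
wf.add("cat " + " ".join(counts) + " > total.txt", counts, ["total.txt"])
order = [t.command for t in wf.schedule()]
```

Here the two `wc` tasks have no dependencies and the final `cat` task depends on both, so any valid schedule runs it last. A real workflow manager would dispatch independent tasks concurrently to a distributed engine rather than list them in sequence.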

Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN: 9781605589428
DOI: 10.1145/1851476
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
