Pegasus: A framework for mapping complex scientific workflows onto distributed systems

Published: 01 July 2005

Abstract

This paper describes Pegasus, a framework for mapping complex scientific workflows onto distributed resources. Pegasus enables users to represent workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present the results of improving application performance through workflow restructuring, which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.
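
To make the idea of clustering tasks into single entities concrete, here is a minimal sketch in Python. It is not Pegasus code and may differ from the restructuring the paper evaluates: it assumes a workflow given as a mapping from each task to its parent tasks, and it groups tasks that sit at the same depth of the dependency graph (and are therefore independent of one another) into clusters of a fixed size. The function names, the task names, and the cluster_size parameter are all hypothetical.

```python
from collections import defaultdict


def task_levels(deps):
    """Return task -> level, where tasks without parents are level 0.

    deps maps each task name to the set of tasks it depends on.
    The workflow is assumed to be acyclic.
    """
    levels = {}

    def level(task):
        if task not in levels:
            parents = deps.get(task, ())
            levels[task] = 0 if not parents else 1 + max(level(p) for p in parents)
        return levels[task]

    for task in deps:
        level(task)
    return levels


def cluster_by_level(deps, cluster_size):
    """Group same-level tasks into clusters of at most cluster_size tasks.

    Tasks on one level never depend on each other, so each cluster
    could be submitted as a single job (an assumption of this sketch).
    """
    levels = task_levels(deps)
    by_level = defaultdict(list)
    for task in sorted(levels):
        by_level[levels[task]].append(task)

    clusters = []
    for lvl in sorted(by_level):
        tasks = by_level[lvl]
        for i in range(0, len(tasks), cluster_size):
            clusters.append(tuple(tasks[i:i + cluster_size]))
    return clusters


# Hypothetical workflow: four independent tasks feeding one final task.
example = {
    "a1": set(), "a2": set(), "a3": set(), "a4": set(),
    "b": {"a1", "a2", "a3", "a4"},
}
clusters = cluster_by_level(example, cluster_size=2)
```

With this input, the five tasks would be scheduled as three jobs instead of five, which is the kind of reduction in per-job overhead that clustering aims for.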

Published In

Scientific Programming, Volume 13, Issue 3
July 2005
74 pages

Publisher

IOS Press

Netherlands