A Distributed Workflow Management System with Case Study of Real-life Scientific Applications on Grids

Published: 01 September 2012

Abstract

Next-generation scientific applications feature complex workflows composed of many computing modules with intricate inter-module dependencies. Supporting such scientific workflows in wide-area networks, especially Grids, and optimizing their performance are crucial to the success of collaborative scientific discovery. We develop a Scientific Workflow Automation and Management Platform (SWAMP), which enables scientists to conveniently assemble, execute, monitor, control, and steer computing workflows in distributed environments via a unified web-based user interface. The SWAMP architecture is built entirely on a seamless composition of web services: both its own functionality and its interactions with other tools or systems are provided through web services, allowing easy access over standard Internet protocols independent of platform and programming language. SWAMP also incorporates a class of efficient workflow mapping schemes, based on rigorous performance modeling and algorithm design, to achieve optimal end-to-end performance. The performance superiority of SWAMP over existing workflow mapping schemes is justified by extensive simulations, and the system efficacy is illustrated by large-scale experiments on real-life scientific workflows for climate modeling through effective system implementation, deployment, and testing on the Open Science Grid.


    Published In

    Journal of Grid Computing  Volume 10, Issue 3
    September 2012
    245 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Climate modeling
    2. Distributed computing
    3. Open Science Grid
    4. Scientific workflow

    Qualifiers

    • Article

    Cited By

    • (2021) Designing for Recommending Intermediate States in A Scientific Workflow Management System. Proceedings of the ACM on Human-Computer Interaction 5:EICS (1-29). DOI: 10.1145/3457145. Online publication date: 29-May-2021
    • (2019) LincoSim: a Web Based HPC-Cloud Platform for Automatic Virtual Towing Tank Analysis. Journal of Grid Computing 17:4 (771-795). DOI: 10.1007/s10723-019-09494-y. Online publication date: 1-Dec-2019
    • (2018) Survey of Scientific Programming Techniques for the Management of Data-Intensive Engineering Environments. Scientific Programming 2018. DOI: 10.1155/2018/8467413. Online publication date: 30-Oct-2018
    • (2018) Mechanisms for provenance collection in scientific workflow systems. Computing 100:5 (439-472). DOI: 10.1007/s00607-017-0578-1. Online publication date: 1-May-2018
    • (2017) A Distributed Infrastructure to Support Scientific Experiments. Journal of Grid Computing 15:4 (475-500). DOI: 10.1007/s10723-017-9401-7. Online publication date: 1-Dec-2017
    • (2015) Integration of grid, cluster and cloud resources to semantically annotate a large-sized repository of learning objects. Concurrency and Computation: Practice & Experience 27:17 (4603-4629). DOI: 10.1002/cpe.3427. Online publication date: 10-Dec-2015
    • (2014) Solving the Interoperability Problem by Means of a Bus. Journal of Grid Computing 12:1 (41-65). DOI: 10.1007/s10723-013-9276-1. Online publication date: 1-Mar-2014
    • (2013) On causes of GridFTP transfer throughput variance. Proceedings of the Third International Workshop on Network-Aware Data Management (1-10). DOI: 10.1145/2534695.2534701. Online publication date: 17-Nov-2013
    • (2013) A practical experience concerning the parallel semantic annotation of a large-scale data collection. Proceedings of the 9th International Conference on Semantic Systems (65-72). DOI: 10.1145/2506182.2506191. Online publication date: 4-Sep-2013
    • (2013) Using Kestrel and XMPP to Support the STAR Experiment in the Cloud. Journal of Grid Computing 11:2 (249-264). DOI: 10.1007/s10723-013-9253-8. Online publication date: 1-Jun-2013
