Adapting scientific workflow structures using multi-objective optimization strategies

Published: 19 April 2013

Abstract

Scientific workflows have become the primary mechanism for conducting analyses on distributed computing infrastructures such as grids and clouds. In recent years, optimization of scientific workflows has focused primarily on computational tasks and workflow makespan. However, as workflow-based analysis becomes ever more data intensive, data optimization is becoming a prime concern. Moreover, scientific workflows can scale along several dimensions: (i) the number of computational tasks, (ii) the heterogeneity of computational resources, and (iii) the size and type (static versus streamed) of the data involved. Adapting workflow structure in response to these scalability challenges remains an important research objective. This work explores how a workflow graph can be restructured automatically (through task merging, for instance) to address the constraints of a particular execution environment, using a multi-objective evolutionary approach. Our approach attempts to adapt the workflow structure to achieve both compute and data optimization. A novel termination criterion addresses the question of when to stop the evolutionary search in order to conserve computation. The results presented in this article demonstrate the feasibility of the termination criterion and show that significant optimization can be achieved with a multi-objective approach.
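The abstract does not spell out the termination criterion, so the following is only a minimal sketch of one common way such a check can be built: track the hypervolume of the current non-dominated set and stop once it stagnates. Everything here is an assumption made for illustration, not the article's method: the two minimized objectives (makespan and data transferred), the reference point, and the `window` and `tol` parameters are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical objective vector: (makespan, data_transferred), both minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    for p in sorted(set(points)):
        # After sorting, p is non-dominated only if it strictly improves
        # the best second objective seen so far.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    volume = 0.0
    prev_second = ref[1]
    for x, y in pareto_front(inside):
        # Add the horizontal slab between the previous and current y levels.
        volume += (ref[0] - x) * (prev_second - y)
        prev_second = y
    return volume


def should_stop(history: List[float], window: int = 5, tol: float = 1e-3) -> bool:
    """Stop when relative hypervolume gain over `window` generations is below `tol`."""
    if len(history) <= window:
        return False
    old, new = history[-window - 1], history[-1]
    if new == 0:
        return False
    return (new - old) / new < tol
```

In an evolutionary loop, one would append `hypervolume_2d(population_objectives, ref)` to `history` after each generation and end the search when `should_stop(history)` returns true. The reference point must be worse than every solution of interest in both objectives, otherwise those solutions contribute nothing to the measured area.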

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 8, Issue 1
April 2013
126 pages
ISSN:1556-4665
EISSN:1556-4703
DOI:10.1145/2451248
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 April 2013
Accepted: 01 August 2012
Revised: 01 August 2012
Received: 01 November 2011
Published in TAAS Volume 8, Issue 1

Author Tags

  1. Multi-objective optimization
  2. evolutionary computing
  3. hypervolume
  4. scientific workflows
  5. termination criteria
  6. workflow planning

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 21 Sep 2024

Cited By

  • (2024) Optimizing Data Analytics Workflows through User-driven Experimentation. Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI, 253-255. DOI: 10.1145/3644815.3644971. Online publication date: 14-Apr-2024
  • (2024) Responsible composition and optimization of integration processes under correctness preserving guarantees. Information Systems, 102400. DOI: 10.1016/j.is.2024.102400. Online publication date: Apr-2024
  • (2024) In Silico Evaluation and Prediction of Pesticide Supported by Reproducible Evolutionary Workflows. Optimization Under Uncertainty in Sustainable Agriculture and Agrifood Industry, 135-159. DOI: 10.1007/978-3-031-49740-7_6. Online publication date: 29-Mar-2024
  • (2022) Cost-aware process modeling in multiclouds. Information Systems 108, 101969. DOI: 10.1016/j.is.2021.101969. Online publication date: Sep-2022
  • (2022) A literature review on optimization techniques for adaptation planning in adaptive systems: State of the art and research directions. Information and Software Technology 149, 106940. DOI: 10.1016/j.infsof.2022.106940. Online publication date: Sep-2022
  • (2021) Video Analytics Framework for Human Action Recognition. Computers, Materials & Continua 68:3, 3841-3859. DOI: 10.32604/cmc.2021.016864. Online publication date: 2021
  • (2021) Prediction and Modelling of Traffic Flow of Human-driven Vehicles at a Signalized Road Intersection Using Artificial Neural Network Model: A South Africa Road Transportation System Scenario. Transportation Engineering, 100095. DOI: 10.1016/j.treng.2021.100095. Online publication date: Sep-2021
  • (2020) Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs). 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 27-36. DOI: 10.1109/BDCAT50828.2020.00028. Online publication date: Dec-2020
  • (2019) Intelligent Price Alert System for Digital Assets - Cryptocurrencies. Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion, 109-115. DOI: 10.1145/3368235.3368874. Online publication date: 2-Dec-2019
  • (2019) High Performance Dynamic Graph Model for Consistent Data Integration. Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, 263-272. DOI: 10.1145/3344341.3368806. Online publication date: 2-Dec-2019
