DOI: 10.1145/1851476.1851541
Research article

Comparison of resource platform selection approaches for scientific workflows

Published: 21 June 2010

Abstract

Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers the opportunity to scale out applications from desktops and local cluster resources. Each platform has different properties (e.g., queue wait times in high performance systems, virtual machine startup overhead in clouds) and characteristics (e.g., custom environments in the cloud) that make choosing among these diverse resource platforms for a workflow execution a challenge for scientists. Scientists often have to weigh resource platform selection trade-offs with limited information on the actual workflows. While many workflow planning methods have explored resource selection or task scheduling, these methods often require a fine-scale characterization of the workflow that is onerous for a scientist to produce. In this paper, we describe our early exploratory work in using blackbox characteristics for a cost-benefit analysis of using different resource platforms. Our blackbox method uses only limited high-level information on the workflow: its length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism, respectively. We compare the effectiveness of this approach to other resource selection models using two exemplar scientific workflows on desktop, local cluster, HPC center, and cloud platforms. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of a new resource platform, such as cloud resources, even before porting an existing workflow.
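
To make the idea of a blackbox estimate concrete, the following is a minimal illustrative sketch, not the model from the paper. It assumes a workflow is described only by its length, width, and data size, plus an assumed average task time, and that each platform is summarized by a few coarse numbers. All class names, fields, and the formula itself are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Workflow:
    length: int        # sequential stages (proxy for duration)
    width: int         # parallel tasks per stage (proxy for parallelism)
    data_gb: float     # data moved to and from the platform
    task_hours: float  # assumed average run time of one task


@dataclass
class Platform:
    name: str
    slots: int                    # tasks that can run at the same time
    startup_hours: float          # queue wait or VM startup overhead
    bandwidth_gb_per_hour: float  # float("inf") if the data is already local
    cost_per_slot_hour: float     # 0.0 for resources the scientist already owns


def estimate(wf: Workflow, p: Platform) -> tuple[float, float]:
    """Return (hours, cost) under the simplifying assumptions above."""
    # Each stage needs ceil(width / slots) waves of tasks.
    waves = math.ceil(wf.width / p.slots)
    compute = wf.length * waves * wf.task_hours
    transfer = wf.data_gb / p.bandwidth_gb_per_hour
    hours = p.startup_hours + transfer + compute
    # Charge for the slots actually used, for the compute time only.
    cost = min(wf.width, p.slots) * compute * p.cost_per_slot_hour
    return hours, cost


if __name__ == "__main__":
    wf = Workflow(length=4, width=8, data_gb=10.0, task_hours=0.5)
    platforms = [
        Platform("desktop", 2, 0.0, float("inf"), 0.0),
        Platform("cloud", 8, 0.25, 20.0, 0.25),
    ]
    for p in platforms:
        hours, cost = estimate(wf, p)
        print(p.name, hours, cost)
```

With these made-up numbers the cloud finishes sooner but costs more, which is the kind of trade-off the cost-benefit analysis is meant to surface before any porting work is done.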

Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN:9781605589428
DOI:10.1145/1851476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. eScience
  3. high performance computing
  4. resource management
  5. workflow patterns
  6. workflow structure

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
