DOI:10.1145/2755644.2755646
research-article

On Performance Resilient Scheduling for Scientific Workflows in HPC Systems with Constrained Storage Resources

Published: 16 June 2015

Abstract

Although storage capacity is increasing rapidly, dataset sizes are also growing, especially for HPC workflows that perform parameter sweep studies. Consequently, deadlock caused by storage competition between concurrent workflow instances remains a major practical concern, and storage management remains important for high-performance and high-throughput computing. In practice, there are various ways to address this issue, ranging from admission control to deadlock resolution. Although admission control is a simple solution, it is conservative and uses storage space inefficiently. Therefore, in this paper, we study the performance of the deadlock resolution approach by proposing a resource allocation algorithm whose performance is resilient across workflows with different characteristics. The algorithm builds on our previous result, called DDS, which takes advantage of the workflow's dataflow information to resolve deadlock based on the detection-and-recovery principle. We improve DDS so that it not only resolves deadlock but also overcomes the performance anomaly, a problem not investigated in our previous studies. We call the improved algorithm the performance-resilience algorithm, denoted DDS+. The studies in this paper can be viewed as follow-up research on DDS and show the performance behavior of the improved algorithm under various conditions. The results are therefore useful for adapting DDS+ to real workflows with different characteristics while keeping performance stable.

Cited By

  • (2024) Scrutinizing Variables for Checkpoint Using Automatic Differentiation. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 372--379. DOI:10.1109/SCW63240.2024.00056. Online publication date: 17-Nov-2024

Published In

ScienceCloud '15: Proceedings of the 6th Workshop on Scientific Cloud Computing
June 2015
46 pages
ISBN:9781450335706
DOI:10.1145/2755644

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deadlock detection
  2. performance anomaly
  3. performance resilience
  4. storage constraint
  5. workflow management
  6. workflow scheduling

Conference

HPDC'15

Acceptance Rates

ScienceCloud '15 Paper Acceptance Rate 3 of 6 submissions, 50%;
Overall Acceptance Rate 44 of 151 submissions, 29%
