Improving the availability of supercomputer job input data using temporal replication

Wang, Chao; Zhang, Zhe; Ma, Xiaosong; Vazhkudai, Sudharshan S.; Mueller, Frank

doi:10.1007/s00450-009-0082-8

Special Issue Paper
Published: 06 May 2009

Volume 23, pages 149–157, (2009)
Computer Science - Research and Development

Chao Wang¹,
Zhe Zhang¹,
Xiaosong Ma¹,
Sudharshan S. Vazhkudai² &
…
Frank Mueller¹

Abstract

Storage systems in supercomputers are a major reason for service interruptions. RAID solutions alone cannot provide sufficient protection, as 1) growing average disk recovery times make RAID groups increasingly vulnerable to disk failures during reconstruction, and 2) RAID does not help with higher-level faults such as failed I/O nodes.

This paper presents a complementary approach based on the observation that files in the supercomputer scratch space are typically accessed by batch jobs whose execution can be anticipated. Therefore, we propose to transparently, selectively, and temporarily replicate “active” job input data by coordinating the parallel file system with the batch job scheduler. We have implemented the temporal replication scheme in the popular Lustre parallel file system and evaluated it with real-cluster experiments. Our results show that the scheme allows for fast online data reconstruction, with a reasonably low overall space and I/O bandwidth overhead.
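The coordination idea can be illustrated with a minimal sketch. This is not the authors' Lustre implementation: the `Job` and `ReplicaManager` names, the `lookahead` window, and the choice to drop replicas when the last dependent job completes are all assumptions made for illustration. The sketch only tracks which files would need a replica; creating and deleting the copies is left to the file system.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    input_files: list[str]
    est_start: float  # scheduler's estimated start time, in seconds (assumed available)


@dataclass
class ReplicaManager:
    # Hypothetical policy knob: replicate inputs of jobs expected to start
    # within this many seconds.
    lookahead: float
    # Maps each replicated file to the ids of the jobs that still need it.
    replicas: dict[str, set[str]] = field(default_factory=dict)

    def on_schedule_update(self, now: float, queue: list[Job]) -> list[str]:
        """Return the files that newly need a replica after a queue update."""
        created = []
        for job in queue:
            if job.est_start - now <= self.lookahead:
                for path in job.input_files:
                    if path not in self.replicas:
                        self.replicas[path] = set()
                        created.append(path)
                    self.replicas[path].add(job.job_id)
        return created

    def on_job_complete(self, job: Job) -> list[str]:
        """Return the files whose replicas are no longer needed by any job."""
        removed = []
        for path in job.input_files:
            holders = self.replicas.get(path)
            if holders is None:
                continue
            holders.discard(job.job_id)
            if not holders:
                del self.replicas[path]
                removed.append(path)
        return removed
```

In this sketch a file shared by several queued jobs keeps its replica until the last of those jobs finishes, which is one simple way to keep the extra space usage temporary.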

Author information

Authors and Affiliations

Dept. of Computer Science, North Carolina State University, 890 Oval Drive EBII, 27695, Raleigh, NC, USA
Chao Wang, Zhe Zhang, Xiaosong Ma & Frank Mueller
Computer Science and Mathematics Division, ORNL, One Bethel Valley Road, PO Box 2008 MS6016, 37831, Oak Ridge, TN, USA
Sudharshan S. Vazhkudai

Corresponding author

Correspondence to Chao Wang.

About this article

Cite this article

Wang, C., Zhang, Z., Ma, X. et al. Improving the availability of supercomputer job input data using temporal replication. Comp. Sci. Res. Dev. 23, 149–157 (2009). https://doi.org/10.1007/s00450-009-0082-8

Issue Date: June 2009
DOI: https://doi.org/10.1007/s00450-009-0082-8
