
Improving the availability of supercomputer job input data using temporal replication

  • Special Issue Paper
  • Published:
Computer Science - Research and Development

Abstract

Storage systems in supercomputers are a major reason for service interruptions. RAID solutions alone cannot provide sufficient protection because 1) growing average disk recovery times make RAID groups increasingly vulnerable to disk failures during reconstruction, and 2) RAID does not help with higher-level faults such as failed I/O nodes.

This paper presents a complementary approach based on the observation that files in the supercomputer scratch space are typically accessed by batch jobs whose execution can be anticipated. Therefore, we propose to transparently, selectively, and temporarily replicate “active” job input data by coordinating the parallel file system with the batch job scheduler. We have implemented the temporal replication scheme in the popular Lustre parallel file system and evaluated it with real-cluster experiments. Our results show that the scheme allows for fast online data reconstruction, with a reasonably low overall space and I/O bandwidth overhead.
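The coordination idea in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's Lustre implementation: the names `Job`, `ReplicaManager`, `lead_time`, `on_schedule_update`, and `on_job_complete` are hypothetical, and the model merely tracks which input files would hold an extra replica rather than copying any data.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job record; a real scheduler exposes richer metadata.
    job_id: str
    input_files: list[str]
    est_start: float  # scheduler's estimated start time, in seconds


@dataclass
class ReplicaManager:
    """Toy bookkeeping of which files currently hold a temporary replica."""
    lead_time: float  # assumed policy: replicate once a job is this close to starting
    replicated: dict[str, set[str]] = field(default_factory=dict)  # file -> job ids

    def on_schedule_update(self, queue: list[Job], now: float) -> list[str]:
        """Return files that newly need a replica because a job is about to run."""
        created = []
        for job in queue:
            if job.est_start - now > self.lead_time:
                continue  # job is not "active" yet, so skip its inputs
            for path in job.input_files:
                users = self.replicated.setdefault(path, set())
                if not users:
                    created.append(path)  # first job to need this file
                users.add(job.job_id)
        return created

    def on_job_complete(self, job: Job) -> list[str]:
        """Return files whose replica can be dropped once no job needs them."""
        removed = []
        for path in job.input_files:
            users = self.replicated.get(path)
            if users is None:
                continue
            users.discard(job.job_id)
            if not users:
                del self.replicated[path]
                removed.append(path)
        return removed
```

In this sketch a file shared by two queued jobs is replicated once and released only after both jobs finish, which is one simple way to keep the replication both selective and temporary.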




Corresponding author

Correspondence to Chao Wang.


About this article

Cite this article

Wang, C., Zhang, Z., Ma, X. et al. Improving the availability of supercomputer job input data using temporal replication. Comp. Sci. Res. Dev. 23, 149–157 (2009). https://doi.org/10.1007/s00450-009-0082-8


  • DOI: https://doi.org/10.1007/s00450-009-0082-8
