Abstract
Rapid progress in mass storage technology has made it possible to build large data storage systems for a variety of applications. One efficient way to build such systems is to use RAIDs as basic storage modules. In general, a RAID can recover data only when a single disk fails. In large RAID systems, however, the probability of a fault grows with the number of disks, and high-capacity disks lengthen the recovery time, which raises the probability that a second disk fails before recovery completes. It is therefore necessary to develop methods that recover data after two or more disk failures. This paper proposes a fault-tolerant scheme based on an extended Reed-Solomon code, designs a recovery procedure, implemented jointly in software and hardware, that corrects up to two errors, and verifies the scheme by computer simulation. The scheme uses only two redundant disks to recover from up to two disk failures. The encoding and decoding methods and the software and hardware implementation are described, as is the application of the scheme to software RAIDs built on cluster computers. Compared with existing methods such as EVENODD and DH, the proposed scheme offers distinct improvements in implementation and redundancy.
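To make the idea of recovering two failed disks from two redundant disks concrete, the following is a minimal sketch of a generic Reed-Solomon-style P+Q construction over GF(2^8). It is not the extended Reed-Solomon code of the paper, whose details are not given in the abstract. The field polynomial 0x11d, the generator 2, and the names `encode` and `recover_two_data` are assumptions chosen for illustration.

```python
# Illustrative P+Q erasure code over GF(2^8); NOT the paper's exact scheme.
# Assumptions: primitive polynomial 0x11d, generator g = 2, at most 255 data disks.

# Build exponent/log tables so multiplication becomes table lookups.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a nonzero field element b."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def encode(data):
    """Return (P, Q) for a list of equal-length bytes blocks, one per data disk.

    P is the XOR of all blocks; Q is the sum of g^i * D_i over the field.
    """
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(data):
        g_i = EXP[i]
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(g_i, byte)
    return bytes(p), bytes(q)


def recover_two_data(data, p, q, a, b):
    """Rebuild data disks a and b (a != b) from the survivors plus P and Q.

    Entries data[a] and data[b] are ignored and may be None.
    """
    n = len(p)
    ps, qs = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (a, b):
            continue
        g_i = EXP[i]
        for j, byte in enumerate(block):
            ps[j] ^= byte
            qs[j] ^= gf_mul(g_i, byte)
    # Now ps = D_a + D_b and qs = g^a * D_a + g^b * D_b.
    g_a, g_b = EXP[a], EXP[b]
    denom = g_a ^ g_b  # nonzero because a != b
    d_a, d_b = bytearray(n), bytearray(n)
    for j in range(n):
        d_a[j] = gf_div(qs[j] ^ gf_mul(g_b, ps[j]), denom)
        d_b[j] = ps[j] ^ d_a[j]
    return bytes(d_a), bytes(d_b)
```

In this sketch, losing one data disk and one redundant disk, or both redundant disks, reduces to simpler cases (plain XOR recovery or re-encoding), so only the two-data-disk case is shown.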
References
Patterson D A, Gibson G A, Katz R. A case for redundant arrays of inexpensive disks. In Proc. SIGMOD Int. Conf. Data Management, Chicago, IL, 1988, pp.109–116.
Chen P, Lee E et al. RAID: High-performance, reliable secondary storage. ACM Computing Surveys, June, 1994, 26(2): 145–185.
Fang Liang. An architecture of mass storage systems for parallel computers [Dissertation]. National University of Defense Technology, Dec., 1994 (in Chinese).
Malhotra M, Reibman A L. Reliability analysis of redundant array of inexpensive disks. J. Parallel and Distributed Computing, Jan., 1993, 17: 146–151.
Gibson G, Patterson D. Designing disk arrays for high data reliability. J. Parallel and Distributed Computing, Jan., 1993, 17: 4–27.
Nam-Kyu Lee, Sung-Bong Yang, Kyoung-Woo Lee. Efficient parity placement schemes for tolerating up to two disk failures in disk arrays. Journal of Systems Architecture, 2000, 46: 1383–1402.
Chan-Ik Park. Efficient placement of parity and data to tolerate two disk failures in disk array systems. IEEE Trans. Parallel and Distributed Systems, Nov., 1995, 6(11): 1177–1184.
Zemor G, Cohen G D. Error-correcting WOM-codes. IEEE Trans. Inform. Theory, May, 1991, 37(3): 730–734.
Blahut R E. A universal Reed-Solomon decoder. IBM J. Res. Develop., Jan., 1984, 28(1): 150–158.
Blaum M, Brady J, Bruck J, Menon J. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Comput., Feb., 1995, 44(2): 192–202.
Cortes T. Software RAID and parallel file systems. In High Performance Cluster Computing, Buyya R (ed.), Prentice Hall, 1999, pp.463–496.
Liu Kuang Y. Architecture for VLSI design of Reed-Solomon decoders. IEEE Trans. Comput., Feb., 1984, C-33(2): 178–189.
Additional information
This research is supported by the National Natural Science Foundation of China (No. 69933030).
Cite this article
Fang, L., Lu, X. A cost effective fault-tolerant scheme for RAIDs. J. Comput. Sci. & Technol. 18, 230–234 (2003). https://doi.org/10.1007/BF02948889
DOI: https://doi.org/10.1007/BF02948889