High-Performance General Functional Regenerating Codes with Near-Optimal Repair Bandwidth

Published: 10 June 2017
    Abstract

    Erasure codes are widely used in modern distributed storage systems to protect against data loss caused by server failures. Regenerating codes are a class of erasure codes that trade storage efficiency and computation for reduced repair bandwidth. However, their non-unified coding parameters and heavy computational overhead hinder their practical application. Hence, we first propose a family of General Functional Regenerating (GFR) codes with uncoded repair, which balance storage efficiency and repair bandwidth under general parameters. The GFR codes rely on a heuristic repair algorithm that tries to use as little repair bandwidth as possible when repairing a single failure. Second, we present a scheduled shift multiplication (SSM) algorithm, which accelerates matrix products over the Galois field by scheduling the order of coding operations, so that encoding and repair of GFR codes can be performed with fast bitwise shifts and exclusive-OR (XOR) operations. Compared with the traditional table-lookup multiplication algorithm, our SSM algorithm achieves a 1.2× to 2× speedup in our experimental evaluations, with little effect on the repair success rate.
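
To make the storage versus repair-bandwidth tradeoff concrete, the sketch below evaluates the two extreme points of the functional-repair tradeoff derived by Dimakis et al. (2010): minimum-storage (MSR) and minimum-bandwidth (MBR) regenerating codes. It is an illustration under stated assumptions, not the GFR construction, whose parameters the abstract does not give. It assumes a file of size B, any k nodes sufficient to reconstruct it, and repair of a failed node by downloading β from each of d helpers (k ≤ d ≤ n − 1), so the repair bandwidth is γ = dβ. The function names are hypothetical.

```python
from fractions import Fraction


def cut_set_capacity(alpha, beta, k, d):
    """Largest file size B that (alpha, beta) can support under the functional-repair
    cut-set bound of Dimakis et al. (2010): sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(B, k, d):
    """Minimum-storage point: alpha = B/k, beta = B / (k (d - k + 1)).
    Returns (alpha, beta, gamma) with gamma = d * beta."""
    assert 1 <= k <= d, "the number of helpers d must be at least k"
    alpha = Fraction(B, k)
    beta = Fraction(B, k * (d - k + 1))
    return alpha, beta, d * beta


def mbr_point(B, k, d):
    """Minimum-bandwidth point: beta = 2B / (k (2d - k + 1)), alpha = gamma = d * beta."""
    assert 1 <= k <= d, "the number of helpers d must be at least k"
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta, d * beta


if __name__ == "__main__":
    B, k, d = 1, 3, 4  # normalized file size; any 3 nodes decode; 4 helpers per repair
    for name, (alpha, beta, gamma) in (("MSR", msr_point(B, k, d)), ("MBR", mbr_point(B, k, d))):
        # Both extreme points meet the cut-set bound with equality.
        assert cut_set_capacity(alpha, beta, k, d) == B
        print(f"{name}: storage per node = {alpha}, repair bandwidth = {gamma}")
    # A Reed-Solomon (MDS) repair downloads k blocks of size B/k, i.e. the whole file.
    print(f"RS:  storage per node = {Fraction(B, k)}, repair bandwidth = {B}")
```

With B = 1, k = 3 and d = 4, the MSR point stores 1/3 per node and repairs with 2/3, while the MBR point stores and repairs with 4/9; a Reed-Solomon code with the same k stores 1/3 but must download 1 to repair a node.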

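The abstract does not spell out the SSM schedule, so the following is only a minimal sketch of the idea it builds on: multiplying by a constant in GF(2^8) with bitwise shifts and XOR instead of log/antilog table lookups. It assumes the field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) with generator 2, a common choice for Reed-Solomon and RAID-6 codes, and it works on single bytes for clarity, whereas a practical coder would typically apply the same steps to whole buffers with word-wide operations. The "scheduling" shown, computing each source symbol's shifted copies once and reusing them for every output row, is an illustrative assumption and not necessarily the authors' SSM ordering. All function names are hypothetical.

```python
POLY = 0x11D  # assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1


def xtime(x):
    """Multiply x by 2 in GF(2^8): one left shift plus a conditional XOR reduction."""
    x <<= 1
    if x & 0x100:
        x ^= POLY
    return x


def gf_mul_shift(a, b):
    """Shift-and-XOR multiplication: XOR together a * 2^i for every set bit i of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Log/antilog tables for the conventional table-lookup method (for comparison).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x = xtime(_x)
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul_table(a, b):
    """Table-lookup multiplication: a * b = exp(log a + log b) for nonzero a, b."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def matvec_scheduled(matrix, vector):
    """Compute matrix * vector over GF(2^8) one input column at a time.

    For each input symbol v_j, its shifted copies v_j * 2^i (i = 0..7) are computed
    once with xtime and reused by every row, so each coefficient costs only XORs.
    """
    out = [0] * len(matrix)
    for j, v in enumerate(vector):
        powers = [v]
        for _ in range(7):
            powers.append(xtime(powers[-1]))
        for r, row in enumerate(matrix):
            c, bit = row[j], 0
            while c:
                if c & 1:
                    out[r] ^= powers[bit]
                c >>= 1
                bit += 1
    return out
```

For instance, `gf_mul_shift(2, 0x80)` yields `0x1D`: the shift overflows into bit 8 and is reduced by XOR with 0x11D. In this scheme each set bit of a coefficient costs one XOR per output symbol.
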

    Information

    Published In

    ACM Transactions on Storage, Volume 13, Issue 2
    Special Issue on MSST 2016 and Regular Papers
    May 2017
    199 pages
    ISSN: 1553-3077
    EISSN: 1553-3093
    DOI: 10.1145/3098275
    Editor: Sam H. Noh
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 June 2017
    Accepted: 01 February 2017
    Revised: 01 December 2016
    Received: 01 September 2015
    Published in TOS Volume 13, Issue 2

    Author Tags

    1. Erasure codes
    2. failure recovery
    3. failure tolerance
    4. performance and evaluation
    5. repair bandwidth

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
    • National High Technology Research and Development Program of China (863 Program)
    • National Basic Research Program of China (973 Program)
    • National Key Technology R&D Program of China
    • Natural Science Foundation of Hubei Province

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 26 Jul 2024

    Citations

    Cited By

    • (2022) On the Impact of Pollution Attacks on Coding-Based Distributed Storage Systems. IEEE Transactions on Information Forensics and Security 17, 292-302. DOI: 10.1109/TIFS.2022.3140924. Online publication date: 2022.
    • (2020) Hybrid Codes. ACM Transactions on Storage 16, 4, 1-26. DOI: 10.1145/3407193. Online publication date: 12 Nov 2020.
    • (2018) Stochastic Circuit Synthesis by Cube Assignment. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 12, 3109-3122. DOI: 10.1109/TCAD.2018.2789725. Online publication date: 19 Nov 2018.
    • (2018) CRL: Efficient Concurrent Regeneration Codes with Local Reconstruction in Geo-Distributed Storage Systems. Journal of Computer Science and Technology 33, 6, 1140-1151. DOI: 10.1007/s11390-018-1877-5. Online publication date: 19 Nov 2018.
