Abstract
This paper describes a Remote Data Disaster Recovery System that uses hashes to identify duplicate data blocks and avoid sending them between the Primary Node and the Secondary Node, thereby reducing the network bandwidth consumed by data replication, decreasing overhead, and improving network efficiency. On both nodes, extra storage space (the Hash Repository) is set aside alongside the data disks to record the hash of each data block on those disks. We extend the data replication protocol between the Primary Node and the Secondary Node: when a block is a duplicate, that is, its hash already exists in the Hash Repository, the block address is transferred instead of the data. This reduces the network bandwidth requirement, shortens synchronization time, and improves network efficiency.
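The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the block size, the use of SHA-256, the in-memory Hash Repository, and the names `Node`, `replicate_write`, `apply_message`, and the `"DATA"`/`"REF"` message kinds are all assumptions made for illustration. It also assumes both nodes start in sync and that hash collisions can be ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not state one


class Node:
    """A data disk plus its Hash Repository (kept in memory for illustration)."""

    def __init__(self, num_blocks):
        self.disk = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.addr_to_hash = {}  # block address -> digest of its current content
        self.hash_to_addr = {}  # digest -> one address that holds that content

    def lookup(self, digest):
        """Return an address already holding this content, or None."""
        return self.hash_to_addr.get(digest)

    def store(self, addr, data):
        """Write a block and keep the Hash Repository consistent."""
        old = self.addr_to_hash.get(addr)
        # Drop the stale entry if this address was the recorded holder
        # of the content that is about to be overwritten.
        if old is not None and self.hash_to_addr.get(old) == addr:
            del self.hash_to_addr[old]
        digest = hashlib.sha256(data).digest()
        self.disk[addr] = data
        self.addr_to_hash[addr] = digest
        self.hash_to_addr[digest] = addr


def apply_message(node, message):
    """Secondary side: apply either a full block or a block reference."""
    kind, addr, payload = message
    # For "REF" the payload is the address of an existing identical block.
    data = node.disk[payload] if kind == "REF" else payload
    node.store(addr, data)


def replicate_write(primary, secondary, addr, data):
    """Primary side: send the data only if its hash is not yet known."""
    digest = hashlib.sha256(data).digest()
    src = primary.lookup(digest)  # look up before updating the repository
    primary.store(addr, data)
    if src is not None:
        message = ("REF", addr, src)    # duplicate: send an address only
    else:
        message = ("DATA", addr, data)  # new content: send the full block
    apply_message(secondary, message)   # stands in for the network transfer
    return message
```

In this sketch, writing the same content to a second address produces a `"REF"` message carrying two block addresses rather than a full 4096-byte payload, which is the source of the bandwidth saving the abstract describes. A real system would also need to persist the repository and guard against the two nodes falling out of sync.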
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Liu, J., Tan, Ya., Li, Y., Zhang, X., Zhou, Z. (2011). A Method of Deduplication for Data Remote Backup. In: Li, D., Liu, Y., Chen, Y. (eds) Computer and Computing Technologies in Agriculture IV. CCTA 2010. IFIP Advances in Information and Communication Technology, vol 344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18333-1_10
DOI: https://doi.org/10.1007/978-3-642-18333-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18332-4
Online ISBN: 978-3-642-18333-1
eBook Packages: Computer Science, Computer Science (R0)