A Method of Deduplication for Data Remote Backup

Liu, Jingyu; Tan, Yu-an; Li, Yuanzhang; Zhang, Xuelan; Zhou, Zexiang

doi:10.1007/978-3-642-18333-1_10

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 344)

Included in the following conference series:

International Conference on Computer and Computing Technologies in Agriculture

Abstract

This paper describes a remote data disaster recovery system that uses hashes to identify duplicate data blocks and avoid sending them between the Primary Node and the Secondary Node, thereby reducing the network bandwidth needed for data replication, decreasing overhead, and improving network efficiency. On both nodes, extra storage space (the Hash Repository) is set aside alongside the data disks to record the hash of each data block on those disks. We extend the data replication protocol between the Primary Node and the Secondary Node: when a data block is a duplicate, that is, its hash already exists in the Hash Repository, the block address is transferred instead of the data. This reduces the network bandwidth requirement, shortens synchronization time, and improves network efficiency.
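
The replication scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the SHA-1 hash, the 4096-byte block size, the in-memory dictionaries standing in for the data disk and the Hash Repository, the `("DATA", ...)` and `("REF", ...)` message tuples, and the names `Node`, `replicate_write`, and `apply_message` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not specify one


class Node:
    """A node with a data disk and a Hash Repository (both in memory here)."""

    def __init__(self):
        self.disk = {}       # block address -> block data
        self.hash_repo = {}  # block hash -> address of one block holding that data
        self.addr_hash = {}  # block address -> hash of the data stored there

    def store(self, addr, data):
        # If this address is overwritten, drop the repository entry that
        # pointed at it. Other copies of the old data may still exist; losing
        # the entry only costs a missed deduplication, never correctness.
        old = self.addr_hash.get(addr)
        if old is not None and self.hash_repo.get(old) == addr:
            del self.hash_repo[old]
        digest = hashlib.sha1(data).digest()  # SHA-1 is an assumed choice
        self.disk[addr] = data
        self.addr_hash[addr] = digest
        self.hash_repo.setdefault(digest, addr)


def apply_message(node, message):
    """Apply a replication message on the receiving node."""
    kind, addr, payload = message
    # For "REF", payload is the address of an existing block with the same data.
    data = node.disk[payload] if kind == "REF" else payload
    node.store(addr, data)


def replicate_write(primary, secondary, addr, data):
    """Write a block on the primary, replicate it, and return the message sent."""
    digest = hashlib.sha1(data).digest()
    src = primary.hash_repo.get(digest)
    # Compare the stored block as well, to guard against hash collisions.
    if src is not None and primary.disk[src] == data:
        message = ("REF", addr, src)    # duplicate: send only a block address
    else:
        message = ("DATA", addr, data)  # new content: send the block itself
    primary.store(addr, data)
    apply_message(secondary, message)
    return message
```

In this sketch, writing the same block content to a second address produces a `("REF", ...)` message that carries two addresses rather than 4096 bytes of data, which is the source of the bandwidth saving. The sketch relies on both nodes applying the same writes in the same order, so that the block a `REF` message points to holds identical data on each side.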

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, P.R. China
Jingyu Liu, Yu-an Tan, Yuanzhang Li & Xuelan Zhang
School of Computer Science and Engineering, Hebei University of Technology, Tianjin, 300010, P.R. China
Jingyu Liu
Toyou Feiji Electronics CO., LTD, Beijing, 100081, P.R. China
Zexiang Zhou

Editor information

Editors and Affiliations

EU-China Center for Information & Communication Technologies (CICTA), China Agricultural University, 17 Tsinghua East Road, 100083, Beijing, P.R. China
Daoliang Li & Yingyi Chen
College of Mechanical and Electronic Engineering, East China Jiaotong University, Shuanggang Road, 330013, Jiangxi, Nanchang, China
Yande Liu

About this paper

Cite this paper

Liu, J., Tan, Ya., Li, Y., Zhang, X., Zhou, Z. (2011). A Method of Deduplication for Data Remote Backup. In: Li, D., Liu, Y., Chen, Y. (eds) Computer and Computing Technologies in Agriculture IV. CCTA 2010. IFIP Advances in Information and Communication Technology, vol 344. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18333-1_10

DOI: https://doi.org/10.1007/978-3-642-18333-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18332-4
Online ISBN: 978-3-642-18333-1
eBook Packages: Computer Science, Computer Science (R0)
