Abstract
As smart devices have advanced rapidly in recent years, people generate enormous amounts of data of various sizes in their daily lives and store them in local or remote file systems. Cheaper, easy-to-use private cloud storage appliances help meet the growing demand for storing and sharing large volumes of data, and effective file deduplication schemes can greatly increase space efficiency in private cloud storage systems while conserving network bandwidth. In this paper, we design and implement several file deduplication schemes built into a private cloud storage appliance, based on different duplicate-checking rules: file name, file size, and partial or full content hash value. Experimental results show that the file deduplication scheme based on partial content hashing achieves a reasonably balanced performance without overutilizing the limited local computational resources.
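The four duplicate-checking rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no prefix length or hash function, so the 4096-byte prefix (`PARTIAL_BYTES`), the use of SHA-256, and all function names below are assumptions chosen for illustration only.

```python
import hashlib
import os

# Assumed prefix length for partial hashing; the abstract does not specify one.
PARTIAL_BYTES = 4096


def name_key(path):
    """Rule 1: treat files with the same base name as duplicates."""
    return os.path.basename(path)


def size_key(path):
    """Rule 2: treat files with the same byte size as duplicates."""
    return os.path.getsize(path)


def partial_hash_key(path, nbytes=PARTIAL_BYTES):
    """Rule 3: hash only the first nbytes of the file (cheap, may collide)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()


def full_hash_key(path, chunk_size=1 << 20):
    """Rule 4: hash the whole file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths, key):
    """Return (duplicate, first_seen) pairs under the given checking rule."""
    seen = {}
    duplicates = []
    for path in paths:
        k = key(path)
        if k in seen:
            duplicates.append((path, seen[k]))
        else:
            seen[k] = path
    return duplicates
```

Calling `find_duplicates(paths, partial_hash_key)` reads at most `PARTIAL_BYTES` per file, which is the source of its lower computational cost; the trade-off is that two files sharing a prefix but differing later are reported as duplicates, whereas `full_hash_key` distinguishes them at the price of reading every byte.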
Acknowledgments
The authors gratefully acknowledge the financial support from the “Aiming For the Top University Program” funded by the Ministry of Education, Taiwan.
Cite this article
Su, KW., Leu, JS., Yu, MC. et al. Design and Implementation of Various File Deduplication Schemes on Storage Devices. Mobile Netw Appl 22, 40–50 (2017). https://doi.org/10.1007/s11036-016-0677-9
DOI: https://doi.org/10.1007/s11036-016-0677-9