research-article

ESDedup: An efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems

Published: 29 August 2022

Abstract

With the explosive growth of medical data, storing medical data in the cloud has become widespread. However, large-scale medical data put great pressure on cloud storage systems, because redundant data wastes storage space and increases cost. Security is also critical for medical data stored in the cloud. To reduce redundancy and ensure the security of medical data at the same time, this study proposes ESDedup, an efficient and secure deduplication scheme. Unlike existing works, it summarizes the redundancy of medical data and presents, for the first time, denoising for deduplication to decrease storage overhead. Its rewriting algorithm is based on similarity rather than the time attribute and eliminates more fragments. In addition, a blockchain-based auditing strategy is designed to improve auditing efficiency and security. Experiments demonstrate that ESDedup not only improves scalability and system performance, but also improves the deduplication ratio by 55.9% compared with the state-of-the-art method.
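
For readers unfamiliar with the terms, the following is a minimal sketch of generic chunk-level deduplication and of one way to compute a deduplication ratio. It is not the ESDedup algorithm: the fixed-size chunking, the SHA-256 fingerprints, the function names, and the ratio definition (logical size divided by stored size) are all assumptions made for illustration, since the abstract does not specify them.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    recipe is the ordered list of fingerprints needed to rebuild data.
    Fixed-size chunking is an illustrative assumption, not the paper's method.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep only the first copy
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes by following the recipe."""
    return b"".join(store[fp] for fp in recipe)


def dedup_ratio(data: bytes, store) -> float:
    """Logical size divided by stored size (an assumed definition)."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(data) / stored if stored else 1.0


# Eight 4 KiB chunks, of which only two are distinct.
data = (b"A" * 4096 + b"B" * 4096) * 4
store, recipe = deduplicate(data)
print(len(recipe), len(store), dedup_ratio(data, store))
```

In this toy input, eight logical chunks collapse to two stored chunks. Restoring a file means following the recipe across stored chunks, which is where the fragmentation targeted by rewriting algorithms comes from.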

Cited By

  • (2024) A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3499-3508. DOI: 10.1145/3664647.3681313. Online publication date: 28-Oct-2024
  • (2023) Blockchain-based data deduplication using novel content-defined chunking algorithm in cloud environment. International Journal of Network Management 33:6. DOI: 10.1002/nem.2249. Online publication date: 7-Nov-2023

Published In

The Journal of Supercomputing, Volume 79, Issue 3
Feb 2023
1178 pages

Publisher

Kluwer Academic Publishers
United States

Publication History

Published: 29 August 2022
Accepted: 31 July 2022

Author Tags

1. Redundancy
2. Deduplication
3. Data similarity
4. Blockchain
5. Storage systems
