
Information Leakage in Encrypted Deduplication via Frequency Analysis: Attacks and Defenses

Published: 29 March 2020

Abstract

Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption to preserve deduplication effectiveness. However, such deterministic encryption reveals the underlying frequency distribution of the original plaintext chunks. This allows an adversary to launch frequency analysis against the ciphertext chunks and infer the content of the original plaintext chunks. In this article, we study how frequency analysis affects information leakage in encrypted deduplication, from both attack and defense perspectives. Specifically, we target backup workloads and propose a new inference attack that exploits chunk locality to increase the coverage of inferred chunks. We further combine the new inference attack with the knowledge of chunk sizes and show its attack effectiveness against variable-size chunks. We conduct trace-driven evaluation on both real-world and synthetic datasets and show that our proposed attacks infer a significant fraction of plaintext chunks under backup workloads. To defend against frequency analysis, we present two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation shows that our combined MinHash encryption and scrambling scheme effectively mitigates the severity of the inference attacks, while maintaining high storage efficiency and incurring limited metadata access overhead.
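To make the leakage concrete, the following is a minimal Python sketch of classical frequency analysis against deterministically encrypted chunks. It is not the locality-based attack proposed in the article, whose details the abstract does not give. The names `deterministic_encrypt`, `frequency_analysis`, and the toy chunk values are hypothetical, and a keyed hash stands in for a real deterministic cipher. The sketch assumes the adversary holds auxiliary plaintext (for example, an older backup) whose chunk frequencies resemble those of the target.

```python
from collections import Counter
import hashlib


def deterministic_encrypt(chunk: bytes) -> bytes:
    """Toy stand-in for deterministic encryption (an assumption, not a real cipher).

    Identical plaintext chunks always map to identical ciphertext chunks,
    which is the property that preserves deduplication and leaks frequencies.
    """
    return hashlib.sha256(b"toy-key" + chunk).digest()


def frequency_analysis(ciphertext_chunks, auxiliary_plaintext_chunks):
    """Pair ciphertext and plaintext chunks that share the same frequency rank."""
    cipher_ranked = [c for c, _ in Counter(ciphertext_chunks).most_common()]
    plain_ranked = [p for p, _ in Counter(auxiliary_plaintext_chunks).most_common()]
    # The i-th most frequent ciphertext is guessed to be the i-th most
    # frequent plaintext chunk seen in the auxiliary data.
    return dict(zip(cipher_ranked, plain_ranked))


# Auxiliary knowledge: chunks from an earlier backup (hypothetical values).
auxiliary = [b"A"] * 5 + [b"B"] * 3 + [b"C"] * 1

# Target backup: similar, but chunk C was replaced by a new chunk D.
target_plain = [b"A"] * 6 + [b"B"] * 2 + [b"D"] * 1
target_cipher = [deterministic_encrypt(c) for c in target_plain]

inferred = frequency_analysis(target_cipher, auxiliary)

# Count how many unique ciphertext chunks were mapped to the right plaintext.
correct = sum(
    1 for p in set(target_plain) if inferred.get(deterministic_encrypt(p)) == p
)
print(f"correctly inferred {correct} of {len(set(target_plain))} unique chunks")
```

In this toy run the two most frequent chunks are recovered, while the new chunk `D` is wrongly guessed to be `C`. Rank-based matching of this kind degrades when frequencies shift or tie, which is the sort of limitation that exploiting chunk locality, as the abstract describes, is meant to overcome.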


Published In

ACM Transactions on Storage, Volume 16, Issue 1
ATC 2019 Special Section and Regular Papers
February 2020
155 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3386184
Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 March 2020
Accepted: 01 October 2019
Revised: 01 August 2019
Received: 01 March 2019
Published in TOS Volume 16, Issue 1


Author Tags

  1. Frequency analysis
  2. cloud storage
  3. encrypted deduplication

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Open Research Project of the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences
  • Research Grants Council of Hong Kong
  • National Key R&D Program of China


Cited By

  • (2024) Encrypted Data Reduction: Removing Redundancy from Encrypted Data in Outsourced Storage. ACM Transactions on Storage 20:4 (1-30). DOI: 10.1145/3685278. Online publication date: 29-Jul-2024
  • (2024) LSDedup: Layered Secure Deduplication for Cloud Storage. IEEE Transactions on Computers 73:2 (422-435). DOI: 10.1109/TC.2023.3331953. Online publication date: 1-Feb-2024
  • (2024) A Secure and Lightweight Client-Side Deduplication Approach for Resisting Side Channel Attacks. ICC 2024 - IEEE International Conference on Communications (1400-1406). DOI: 10.1109/ICC51166.2024.10622721. Online publication date: 9-Jun-2024
  • (2024) A randomized encryption deduplication method against frequency attack. Journal of Information Security and Applications 83 (103774). DOI: 10.1016/j.jisa.2024.103774. Online publication date: Jun-2024
  • (2024) Privacy leakage of certificateless public key authenticated searchable encryption via frequency analysis: Attacks and revises. Computer Standards & Interfaces 87 (103762). DOI: 10.1016/j.csi.2023.103762. Online publication date: Jan-2024
  • (2024) Redundancy elimination in IoT oriented big data: a survey, schemes, open challenges and future applications. Cluster Computing 27:1 (1063-1087). DOI: 10.1007/s10586-023-04209-1. Online publication date: 1-Feb-2024
  • (2024) Convergent encryption enabled secure data deduplication algorithm for cloud environment. Concurrency and Computation: Practice and Experience 36:21. DOI: 10.1002/cpe.8205. Online publication date: 21-Jun-2024
  • (2023) Threat Model and Defense Scheme for Side-Channel Attacks in Client-Side Deduplication. Tsinghua Science and Technology 28:1 (1-12). DOI: 10.26599/TST.2021.9010071. Online publication date: Feb-2023
  • (2023) Double-Layer Search and Adaptive Pooling Fusion for Reference-Based Image Super-Resolution. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-23). DOI: 10.1145/3604937. Online publication date: 25-Aug-2023
  • (2023) Scalable and Popularity-Based Secure Deduplication Schemes With Fully Random Tags. IEEE Transactions on Dependable and Secure Computing 21:3 (1484-1500). DOI: 10.1109/TDSC.2023.3285173. Online publication date: 13-Jun-2023
