Design and Implementation of Deduplication on F2FS

Published: 06 August 2024

Abstract

Data deduplication has gained popularity in modern file systems because it eliminates redundant writes and improves storage space efficiency. In recent years, the flash-friendly file system (F2FS) has been widely adopted on flash memory-based storage devices, including smartphones, high-speed servers, and Internet of Things devices. In this article, we propose F2DFS (deduplication-based F2FS), which makes three main design contributions. First, F2DFS integrates inline and offline deduplication into a hybrid scheme. Inline deduplication eliminates redundant writes and enhances flash device endurance, while offline deduplication mitigates the negative I/O performance impact and saves more storage space. Second, F2DFS follows a file system coupling design principle, leveraging the benefits of both deduplication and native F2FS. With the aid of this principle, F2DFS achieves high-performance and space-efficient incremental deduplication. Third, F2DFS adopts virtual indexing to mitigate the deduplication-induced many-to-one mapping updates during segment cleaning. We conducted comprehensive experimental comparisons between F2DFS, native F2FS, and other state-of-the-art deduplication schemes, using both synthetic and real-world workloads. For inline deduplication, F2DFS outperforms SmartDedup, Dmdedup, and ZFS in terms of both I/O bandwidth and deduplication rate. For offline deduplication, compared to SmartDedup, XFS, and BtrFS, F2DFS shows higher execution efficiency, lower resource usage, and greater storage space savings. Moreover, F2DFS demonstrates more efficient segment cleaning than native F2FS.
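
To make the many-to-one mapping problem concrete, the following is a minimal Python sketch of the general idea behind inline deduplication combined with a virtual index. It is not the F2DFS implementation: the class `ToyDedupStore`, its fields, and the choice of SHA-1 as the fingerprint are all assumptions made for illustration. Logical blocks point to a virtual address, and only the virtual index records the physical location, so relocating a shared block (as segment cleaning would) touches one index entry no matter how many logical blocks share it.

```python
import hashlib


class ToyDedupStore:
    """Hypothetical toy store: logical -> virtual -> physical, with inline dedup."""

    def __init__(self):
        self.fingerprints = {}   # content hash -> virtual address
        self.virtual_index = {}  # virtual address -> physical address
        self.refcount = {}       # virtual address -> number of logical references
        self.logical_map = {}    # (file, block number) -> virtual address
        self.physical = {}       # physical address -> block bytes
        self.next_vaddr = 0
        self.next_paddr = 0

    def write(self, file, blkno, data):
        """Inline dedup: store the block only if its content is new.

        Overwrites of an existing logical block are not handled in this sketch.
        """
        fp = hashlib.sha1(data).digest()  # fingerprint choice is an assumption
        vaddr = self.fingerprints.get(fp)
        if vaddr is None:
            vaddr, paddr = self.next_vaddr, self.next_paddr
            self.next_vaddr += 1
            self.next_paddr += 1
            self.physical[paddr] = data
            self.virtual_index[vaddr] = paddr
            self.fingerprints[fp] = vaddr
            self.refcount[vaddr] = 0
        self.refcount[vaddr] += 1
        self.logical_map[(file, blkno)] = vaddr

    def read(self, file, blkno):
        vaddr = self.logical_map[(file, blkno)]
        return self.physical[self.virtual_index[vaddr]]

    def relocate(self, vaddr):
        """Mimic segment cleaning: move one block and update one index entry."""
        old = self.virtual_index[vaddr]
        new = self.next_paddr
        self.next_paddr += 1
        self.physical[new] = self.physical.pop(old)
        self.virtual_index[vaddr] = new
        return 1  # mapping entries touched, independent of the reference count


store = ToyDedupStore()
for i in range(3):
    store.write(f"file{i}", 0, b"same 4 KiB payload")
shared = store.logical_map[("file0", 0)]
updates = store.relocate(shared)
print(len(store.physical), store.refcount[shared], updates)  # prints "1 3 1"
```

In this toy model, three files share one stored block, and moving that block requires a single update to the virtual index rather than one update per referencing file. Without the extra level of indirection, each of the three logical mappings would have to be rewritten when the block moves.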

[157]
Qirui Yang, Runyu Jin, and Ming Zhao. 2019. SmartDedup: Optimizing deduplication for resource-constrained devices. In Proceedings of the USENIX Annual Technical Conference (ATC’19).
[158]
Tianming Yang, Hong Jiang, Dan Feng, Zhongying Niu, Ke Zhou, and Yaping Wan. 2010. DEBAR: A scalable high-performance de-duplication storage system for backup and archiving. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS’10).
[159]
Miao-Chiang Yen, Shih-Yi Chang, and Li-Pin Chang. 2018. Lightweight, integrated data deduplication for write stress reduction of mobile flash storage. IEEE Trans. Comput.-Aid. Des. Integ. Circ. Syst. 37, 11 (2018).
[160]
Jianwei Yin, Yan Tang, Shuiguang Deng, Ying Li, and Albert Y. Zomaya. 2017. D³: A dynamic dual-phase deduplication framework for distributed primary storage. IEEE Trans. Comput. 67, 2 (2017).
[161]
Jianwei Yin, Yan Tang, Shuiguang Deng, Bangpeng Zheng, and Albert Y. Zomaya. 2020. MUSE: A multi-tiered and SLA-driven deduplication framework for cloud storage systems. IEEE Trans. Comput. 70, 5 (2020).
[162]
Chao Yu. 2020. Support Age-threshold based Garbage Collection for F2FS. Retrieved from https://lwn.net/Articles/828027
[163]
Yang Zhan, Alexander Conway, Yizheng Jiao, Nirjhar Mukherjee, Ian Groombridge, Michael A. Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Donald E. Porter, and Jun Yuan. 2020. How to copy files. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’20).
[164]
Datong Zhang, Yuhui Deng, Yi Zhou, Yifeng Zhu, and Xiao Qin. 2021. Improving the performance of deduplication-based backup systems via container utilization based hot fingerprint entry distilling. ACM Trans. Stor. 17, 4 (2021).
[165]
Jiacheng Zhang, Jiwu Shu, and Youyou Lu. 2016. ParaFS: A log-structured file system to exploit the internal parallelism of flash devices. In Proceedings of the USENIX Annual Technical Conference (ATC’16).
[166]
Tianmeng Zhang, Renhui Chen, Congming Gao, Youtao Zhang, and Jiwu Shu. 2023. Libra: A space-efficient, high-performance inline deduplication for emerging hybrid storage system. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA’23).
[167]
Yu Zhang, Ping Huang, Ke Zhou, Hua Wang, Jianying Hu, Yongguang Ji, and Bin Cheng. 2020. OSCA: An online-model based cache allocation scheme in cloud block storage systems. In Proceedings of the USENIX Annual Technical Conference (ATC’20).
[168]
Yuqi Zhang, Ni Xue, and Yangxu Zhou. 2021. Automatic I/O stream management based on file characteristics. In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’21).
[169]
Yucheng Zhang, Ye Yuan, Dan Feng, Chunzhi Wang, Xinyun Wu, Lingyu Yan, Deng Pan, and Shuanghong Wang. 2020. Improving restore performance for in-line backup system combining deduplication and delta compression. IEEE Trans. Parallel Distrib. Syst. 31, 10 (2020).
[170]
Nannan Zhao, Vasily Tarasov, Hadeel Albahar, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Arnab K. Paul, Keren Chen, and Ali R. Butt. 2020. Large-scale analysis of Docker images and performance implications for container storage systems. IEEE Trans. Parallel Distrib. Syst. 32, 4 (2020).
[171]
Xun Zhao, Yang Zhang, Yongwei Wu, Kang Chen, Jinlei Jiang, and Keqin Li. 2013. Liquid: A scalable deduplication file system for virtual machine images. IEEE Trans. Parallel Distrib. Syst. 25, 5 (2013).
[172]
Shawn Zhong, Chenhao Ye, Guanzhou Hu, Suyan Qu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Michael Swift. 2023. MadFS: Per-file virtualization for userspace persistent memory filesystems. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’23).
[173]
Su Zhou, Erci Xu, Hao Wu, Yu Du, Jiacheng Cui, Wanyu Fu, Chang Liu, Yingni Wang, Wenbo Wang, Shouqu Sun, Xianfei Wang, Bo Feng, Biyun Zhu, Xin Tong, Weikang Kong, Linyan Liu, Zhongjie Wu, Jinbo Wu, Qingchao Luo, and Jiesheng Wu. 2023. SMRSTORE: A storage engine for cloud object storage on HM-SMR drives. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’23).
[174]
Yongtao Zhou, Yuhui Deng, Laurence T. Yang, Ru Yang, and Lei Si. 2018. LDFS: A low latency in-line data deduplication file system. IEEE Access 6 (2018), 15743–15753.
[175]
You Zhou, Qiulin Wu, Fei Wu, Hong Jiang, Jian Zhou, and Changsheng Xie. 2021. Remap-SSD: Safely and efficiently exploiting SSD address remapping to eliminate duplicate writes. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’21).
[176]
Benjamin Zhu, Kai Li, and R. Hugo Patterson. 2008. Avoiding the disk bottleneck in the Data Domain deduplication file system. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’08).
[177]
Xiangyu Zou, Wen Xia, Philip Shilane, Haijun Zhang, and Xuan Wang. 2022. Building a high-performance fine-grained deduplication framework for backup storage with high deduplication ratio. In Proceedings of the USENIX Annual Technical Conference (ATC’22).
[178]
Xiangyu Zou, Jingsong Yuan, Philip Shilane, Wen Xia, Haijun Zhang, and Xuan Wang. 2021. The dilemma between deduplication and locality: Can both be achieved? In Proceedings of the USENIX Conference on File and Storage Technologies (FAST’21).
[179]
Pengfei Zuo, Yu Hua, Ming Zhao, Wen Zhou, and Yuncheng Guo. 2018. Improving the performance and endurance of encrypted non-volatile main memory through deduplicating writes. In Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’18).

Cited By

  • (2024) Speed-Dedup: A New Deduplication Framework for Enhanced Performance and Reduced Overhead in Scale-Out Storage. Electronics 13:22 (4393). DOI: 10.3390/electronics13224393. Online publication date: 9-Nov-2024
  • (2024) Efficient Data Placement in Deduplication Enabled ZenFS via CRC-Based Prediction. IEEE Access 12 (197233-197246). DOI: 10.1109/ACCESS.2024.3520184. Online publication date: 2024

Published In

ACM Transactions on Storage  Volume 20, Issue 4
November 2024
248 pages
EISSN:1553-3093
DOI:10.1145/3613729

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 August 2024
Online AM: 29 April 2024
Accepted: 30 March 2024
Revised: 16 February 2024
Received: 16 August 2023
Published in TOS Volume 20, Issue 4

Author Tags

  1. Deduplication
  2. F2FS
  3. file system
  4. storage system

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
