OrcFS: Orchestrated File System for Flash Storage

Published: 12 April 2018

Abstract

In this work, we develop the Orchestrated File System (OrcFS) for Flash storage. OrcFS vertically integrates the log-structured file system and the Flash-based storage device to eliminate the redundancies across the layers. A few modern file systems adopt sophisticated append-only data structures to match the append-only nature of Flash memory. While an append-only data structure is promising, it fills the stack of software layers with unnecessary redundancies, leaving substantial room for improvement. The redundancies include (i) redundant levels of indirection (address translation), (ii) duplicate efforts to reclaim the invalid blocks (i.e., segment cleaning in the file system and garbage collection in the storage device), and (iii) excessive over-provisioning (i.e., separate over-provisioning areas in each layer). OrcFS eliminates these redundancies by distributing address translation, segment cleaning (or garbage collection), bad block management, and wear-leveling across the layers. Existing solutions suffer from high segment cleaning overhead and cause significant write amplification due to the mismatch between the file system block size and the Flash page size. To optimize the I/O stack while avoiding these problems, OrcFS adopts three key technical elements.
First, OrcFS uses disaggregate mapping, whereby it partitions the Flash storage into two areas, managed by the file system and the Flash storage, respectively, with different granularities. In OrcFS, the metadata area and the data area are maintained at 4-Kbyte page granularity and 256-Mbyte superblock granularity, respectively. The superblock-based storage management aligns the file system section size, which is the unit of segment cleaning, with the superblock size of the underlying Flash storage. This lets OrcFS fully exploit the internal parallelism of the underlying Flash storage by taking advantage of the sequential workload characteristics of the log-structured file system. Second, OrcFS adopts quasi-preemptive segment cleaning to keep segment cleaning from interfering with foreground I/O operations. The latency to reclaim free space can be prohibitive in OrcFS because of its large file system section size, 256 Mbyte. OrcFS addresses this issue by adopting a polling-based segment cleaning scheme. Third, OrcFS introduces block patching to avoid unnecessary write amplification in partial page programs. OrcFS is an enhancement of the F2FS file system. We develop a prototype of OrcFS based on F2FS and a server-class SSD with modified firmware (Samsung 843TN). OrcFS reduces the device mapping table requirement to 1/465 and 1/4 of that of page mapping and of the smallest mapping scheme known to the public, respectively. By eliminating the redundancy between segment cleaning and garbage collection, OrcFS reduces the write volume by 1/3 under a heavy random write workload. OrcFS achieves a 56% performance gain over EXT4 under the varmail workload.
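
To give a feel for why mixing a 4-Kbyte and a 256-Mbyte granularity shrinks the mapping state, the following is a minimal arithmetic sketch, not the authors' implementation. The device capacity, the metadata area size, and all function names are hypothetical values chosen for illustration; only the two granularities come from the abstract. The sketch simply counts one entry per unit in each area and does not model which layer holds the entries, so it is not expected to reproduce the reported 1/465 figure.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_BYTES = 4 * KIB          # metadata-area granularity stated in the abstract
SUPERBLOCK_BYTES = 256 * MIB  # data-area granularity stated in the abstract


def mapping_entries(area_bytes: int, granularity_bytes: int) -> int:
    """Entries needed to cover an area with one entry per unit (ceiling division)."""
    return -(-area_bytes // granularity_bytes)


def disaggregate_entries(capacity_bytes: int, metadata_bytes: int) -> int:
    """Entries when the metadata area is tracked per page and the rest per superblock."""
    data_bytes = capacity_bytes - metadata_bytes
    return (mapping_entries(metadata_bytes, PAGE_BYTES)
            + mapping_entries(data_bytes, SUPERBLOCK_BYTES))


# Hypothetical device layout (assumption, not taken from the paper).
capacity = 256 * GIB
metadata = 512 * MIB

page_mapped = mapping_entries(capacity, PAGE_BYTES)
disaggregated = disaggregate_entries(capacity, metadata)

print(f"page mapping entries:         {page_mapped}")
print(f"disaggregate mapping entries: {disaggregated}")
print(f"reduction factor:             {page_mapped / disaggregated:.0f}x")
```

Under these assumed sizes, almost all of the remaining entries belong to the small page-mapped metadata area, which is why the choice of metadata area size dominates the result.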

Published In

ACM Transactions on Storage  Volume 14, Issue 2
May 2018
210 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3208078
Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 April 2018
Accepted: 01 November 2017
Revised: 01 September 2017
Received: 01 November 2016
Published in TOS Volume 14, Issue 2

Author Tags

  1. Flash memories
  2. Garbage Collection
  3. Log-structured File System

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Ministry of Science, ICT & Future Planning
  • ICT R&D program of MSIP/IITP
  • NRF
  • Ministry of Science, ICT & Future Planning under the ITRC support program
  • BK21 plus program
  • Ministry of Education of Korea
  • Basic Research Lab Program

Cited By

  • (2022) ELOFS: An Extensible Low-overhead Flash File System for Resource-scarce Embedded Devices. IEEE Transactions on Computers, 1-1. DOI: 10.1109/TC.2022.3152079. Online publication date: 2022
  • (2022) CoDiscard: A revenue model based cross-layer cooperative discarding mechanism for flash memory devices. Journal of Systems Architecture, 102564. DOI: 10.1016/j.sysarc.2022.102564. Online publication date: May-2022
  • (2020) Two-tier garbage collection for persistent object. Proceedings of the 35th Annual ACM Symposium on Applied Computing, 1246-1255. DOI: 10.1145/3341105.3373986. Online publication date: 30-Mar-2020
  • (2020) A Highly Reliable Metadata Service for Large-Scale Distributed File Systems. IEEE Transactions on Parallel and Distributed Systems 31:2, 374-392. DOI: 10.1109/TPDS.2019.2937492. Online publication date: 1-Feb-2020
  • (2018) VJournaling: Improving the Write Performance of a Virtualized Journaling File System. 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 1299-1304. DOI: 10.1109/SmartWorld.2018.00226. Online publication date: Oct-2018
