DOI: 10.1145/3600006.3613171

Enabling High-Performance and Secure Userspace NVM File Systems with the Trio Architecture

Published: 23 October 2023

Abstract

Userspace library file systems (LibFSes) promise to unleash the performance potential of non-volatile memory (NVM) by directly accessing it and enabling unprivileged applications to customize their LibFSes to their workloads. Unfortunately, such benefits pose a significant challenge to ensuring metadata integrity. Existing works either underutilize NVM's performance or forgo critical file system security guarantees.
We present Trio, a userspace NVM file system architecture that resolves this inherent tension with a clean decoupling among file system design, access control, and metadata integrity enforcement. Our key insight is that other state (i.e., auxiliary state) in a file system can be regenerated from its "ground truth" state (i.e., core state). Thus, Trio explicitly defines the data structure of a single core state and shares it as common knowledge among its LibFSes and the trusted entity. Enabled by this, a LibFS can directly access NVM without involving the trusted entity and can be customized with its private auxiliary state. The trusted entity enforces metadata integrity by verifying the core state of a file when its write access is transferred from one LibFS to another. We design a generic POSIX-like file system called ArckFS and two customized file systems based on the Trio architecture. Our evaluation shows that ArckFS outperforms existing NVM file systems by 3.1× to 17× on LevelDB while the customized file systems further outperform ArckFS by 1.3×.
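
The split between core state, auxiliary state, and verification on write-access transfer can be illustrated with a small sketch. This is not ArckFS code: the class names, the block-list layout of the core state, the specific integrity checks, and the choice to simply refuse a transfer when verification fails are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096   # hypothetical NVM page size
NUM_BLOCKS = 1024   # hypothetical number of pages in the NVM region


@dataclass
class CoreState:
    """Shared "ground truth" of one file (hypothetical layout)."""
    size: int = 0
    blocks: list = field(default_factory=list)  # NVM page numbers, in file order


class LibFS:
    """Untrusted library file system with a private auxiliary index."""

    def __init__(self, name):
        self.name = name
        self.index = {}  # auxiliary state: file offset -> NVM page number

    def rebuild_auxiliary(self, core):
        # Auxiliary state is regenerated purely from the core state.
        self.index = {i * BLOCK_SIZE: b for i, b in enumerate(core.blocks)}

    def append_block(self, core, block):
        # Direct access: the core state is updated without calling the
        # trusted entity, and the private index is kept in sync.
        core.blocks.append(block)
        core.size += BLOCK_SIZE
        self.index[(len(core.blocks) - 1) * BLOCK_SIZE] = block


def verify_core_state(core):
    """Illustrative integrity checks; the real checks are not given in the abstract."""
    in_range = all(0 <= b < NUM_BLOCKS for b in core.blocks)
    unique = len(set(core.blocks)) == len(core.blocks)
    size_ok = 0 <= core.size <= len(core.blocks) * BLOCK_SIZE
    return in_range and unique and size_ok


class TrustedEntity:
    """Tracks which LibFS holds write access to each file."""

    def __init__(self):
        self.writer = {}  # file id -> name of the LibFS with write access

    def transfer_write_access(self, file_id, core, new_owner):
        # Verification happens only when write access changes hands.
        if file_id in self.writer and not verify_core_state(core):
            return False  # assumed policy: refuse the transfer
        self.writer[file_id] = new_owner.name
        new_owner.rebuild_auxiliary(core)
        return True


if __name__ == "__main__":
    core, trusted = CoreState(), TrustedEntity()
    lib_a, lib_b = LibFS("A"), LibFS("B")
    trusted.transfer_write_access(1, core, lib_a)
    lib_a.append_block(core, 7)
    print(trusted.transfer_write_access(1, core, lib_b), lib_b.index)
```

In this sketch the trusted entity never inspects a LibFS's private index. It only checks the shared core state, which is what would let each LibFS choose its own auxiliary data structures.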

Published In

SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles
October 2023
802 pages
ISBN: 9798400702297
DOI: 10.1145/3600006
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • USENIX

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. userspace file systems
  2. library file systems
  3. direct access
  4. file system customization
  5. file system integrity
  6. persistent memory

Qualifiers

  • Research-article

Conference

SOSP '23
Acceptance Rates

SOSP '23 Paper Acceptance Rate: 43 of 232 submissions, 19%
Overall Acceptance Rate: 131 of 716 submissions, 18%
