DOI: 10.1145/3341301.3359656
research-article
Open access

File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution

Published: 27 October 2019

Abstract

For a decade, the Ceph distributed file system followed the conventional wisdom of building its storage backend on top of local file systems. This is a preferred choice for most distributed file systems today because it allows them to benefit from the convenience and maturity of battle-tested code. Ceph's experience, however, shows that this comes at a high price. First, developing a zero-overhead transaction mechanism is challenging. Second, metadata performance at the local level can significantly affect performance at the distributed level. Third, supporting emerging storage hardware is painstakingly slow.
Ceph addressed these issues with BlueStore, a new backend designed to run directly on raw storage devices. In only two years since its inception, BlueStore has outperformed previously established backends and been adopted by 70% of users in production. By running in user space and fully controlling the I/O stack, it has enabled space-efficient metadata and data checksums, fast overwrites of erasure-coded data, and inline compression; it has also decreased performance variability and avoided a series of performance pitfalls of local file systems. Finally, it makes the adoption of backwards-incompatible storage hardware possible, an important trait in a changing storage landscape that is learning to embrace hardware diversity.

Published In

SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles
October 2019
615 pages
ISBN: 9781450368735
DOI: 10.1145/3341301
In-Cooperation

  • USENIX Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. Ceph
  2. distributed file system
  3. file system
  4. object storage
  5. storage backend

Qualifiers

  • Research-article

Conference

SOSP '19: ACM SIGOPS 27th Symposium on Operating Systems Principles
October 27 - 30, 2019
Huntsville, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 131 of 716 submissions, 18%
