
Can Applications Recover from fsync Failures?

Published: 15 June 2021

Abstract

We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting vary). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. Our findings show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
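
The finding that pages are always marked clean suggests why simply retrying a failed fsync can be unsafe: if the dirty pages were already marked clean, a second call may report success without the data ever reaching disk. The following is a minimal sketch of one defensive pattern under that assumption, not a strategy taken from the article. The names `durable_append` and `DurabilityError`, and the injectable `fsync` parameter, are hypothetical and exist only for illustration.

```python
import os


class DurabilityError(Exception):
    """Hypothetical error type: the data could not be confirmed durable."""


def durable_append(path, data, fsync=os.fsync):
    """Append bytes to path and call fsync exactly once.

    The fsync parameter is an illustrative hook so that a failure can be
    simulated; real callers would leave it at the default, os.fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        offset = 0
        while offset < len(data):
            offset += os.write(fd, data[offset:])
        try:
            fsync(fd)
        except OSError as exc:
            # Assumption: after a failed fsync the affected pages may already
            # be marked clean, so a retry could succeed without persisting
            # them. Surface the failure instead of retrying.
            raise DurabilityError(f"fsync failed on {path}: {exc}") from exc
    finally:
        os.close(fd)
```

In this sketch the caller is expected to treat `DurabilityError` as a signal to stop and recover from a known-good state rather than continue as if the write had succeeded. Whether that is sufficient depends on the file system and application, which is the question the article studies.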

Cited By

  • (2024) High-Performance Computing Storage Performance and Design Patterns—Btrfs and ZFS Performance for Different Use Cases. Computers 13:6 (139). DOI: 10.3390/computers13060139. Online publication date: 3-Jun-2024
  • (2024) When Amnesia Strikes: Understanding and Reproducing Data Loss Bugs with Fault Injection. Proceedings of the VLDB Endowment 17:11 (3017-3030). DOI: 10.14778/3681954.3681980. Online publication date: 1-Jul-2024
  • (2024) MemSnap μCheckpoints: A Data Single Level Store for Fearless Persistence. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (622-638). DOI: 10.1145/3620666.3651334. Online publication date: 27-Apr-2024

Published In

ACM Transactions on Storage  Volume 17, Issue 2
May 2021
202 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3465461
  • Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2021
Accepted: 01 February 2021
Received: 01 November 2020
Published in TOS Volume 17, Issue 2

Author Tags

  1. Durability
  2. persistence
  3. fsync failures
  4. fsync
  5. file system

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • VMware
  • Intel
  • Seagate
  • Samsung
  • NSF
  • DOE

Article Metrics

  • Downloads (Last 12 months): 506
  • Downloads (Last 6 weeks): 74
Reflects downloads up to 04 Oct 2024
