Re: Frequent metadata corruption with ext3 + hard power-off

Author: Mats Ahlgren
Date:  
To: ext3-users
CC: Theodore Tso, ahlgren
Old-Topics: Re: Frequent metadata corruption with ext3 + hard power-off
Subject: Re: Frequent metadata corruption with ext3 + hard power-off
Hello,


After two months of use, the system-destroying problems have all disappeared
since I disabled the drive's write cache and set data=journal (instead of
data=ordered).

Just thought I should let everyone know. Thank you Ted.


If anyone has any insight into what was going on, I'd appreciate it:

Namely, I'm confused. I would guess that write caching simply delays when data
reaches the disk, and perhaps lets the drive write blocks in a different order
than they were submitted. But how could this cause a problem on a journaled
filesystem? If one is (theoretically) only appending to the journal,
checksumming or hashing entries so that incomplete ones can be detected after
a failure, and replaying only the consistent entries written since the last
checkpoint (which are idempotent), then, assuming all of that works, how could
caching cause massive corruption of the directory tree? (Is the above an
accurate model for ext3?)
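
To make the model in the paragraph above concrete, here is a minimal Python
sketch of a toy journal. It is only an illustration under stated assumptions,
not ext3's on-disk format or replay logic: the names make_txn and replay are
hypothetical, and the "torn" journal simulates a drive cache that wrote the
commit record before one of the data records. It shows that a replay which
trusts the commit record alone copies stale bytes into place, while a replay
that verifies a checksum skips the damaged transaction.

```python
import zlib

def make_txn(blocks):
    """Build toy journal records: data records, then a commit record
    carrying a CRC32 computed over all payloads in order."""
    crc = 0
    records = []
    for target, payload in blocks:
        records.append(("data", target, payload))
        crc = zlib.crc32(payload, crc)
    records.append(("commit", crc))
    return records

def replay(journal, disk, verify):
    """Apply committed transactions to disk (a dict of block name -> bytes).
    With verify=True, stop at the first transaction whose checksum is wrong.
    Returns the number of transactions applied."""
    applied = 0
    pending = []
    crc = 0
    for rec in journal:
        if rec[0] == "data":
            _, target, payload = rec
            pending.append((target, payload))
            crc = zlib.crc32(payload, crc)
        else:  # commit record
            if verify and rec[1] != crc:
                break
            for target, payload in pending:
                disk[target] = payload
            applied += 1
            pending, crc = [], 0
    return applied

disk = {"dir": b"old dir", "inode": b"old inode"}
txn = make_txn([("dir", b"new dir"), ("inode", b"new inode")])

# Simulated power cut: the commit record is on the platter, but the first
# data record never left the cache, so its journal slot holds stale bytes.
torn = [("data", "dir", b"stale junk")] + txn[1:]

d1 = dict(disk)
n1 = replay(torn, d1, verify=False)  # trusts the commit record blindly
d2 = dict(disk)
n2 = replay(torn, d2, verify=True)   # checks the CRC before applying
```

Without verification the stale bytes end up in the "dir" block; with
verification the transaction is skipped and the old, consistent contents
remain. Whether a real journal behaves like the first or the second case
depends on whether it checks entries this way and on whether the drive
honours the write ordering the filesystem asks for.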

Also, does anyone think that running with data=ordered instead of
data=journal had anything to do with it?


Sincerely,
Mats

On Sunday 18 March 2007 09:33:59 Theodore Tso wrote:
> It sounds like you have a disk which is doing very aggressive write
> caching. If you are using a new enough kernel (2.6.9 or greater
> should have this), adding "barrier=1" to your mount options should
> help. We should probably make this the default at this point...
>
>                         - Ted
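
One way to confirm which ext3 mounts actually carry the "barrier=1" option
suggested above is to read the mount table. The Python sketch below is an
assumption-laden example, not an official tool: mount_options is a
hypothetical helper, the sample text is made up, and whether a given kernel
lists the barrier setting among the mount options may vary by version.

```python
def mount_options(mounts_text, fstype="ext3"):
    """Map mount point -> set of options for each mount of the given type.
    Expects lines in the /proc/mounts layout:
    device mountpoint fstype options dump pass"""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result

# Made-up sample in the /proc/mounts layout (illustration only).
sample = (
    "/dev/md0 / ext3 rw,barrier=1,data=journal 0 0\n"
    "/dev/sdb1 /home ext3 rw,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)

opts = mount_options(sample)
# Mount points of type ext3 that do not list barrier=1.
missing = sorted(mp for mp, o in opts.items() if "barrier=1" not in o)
```

On a live system the same function could be fed the contents of
/proc/mounts, for example mount_options(open("/proc/mounts").read()).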


On Saturday 17 March 2007 21:42:17 Mats Ahlgren wrote:
> Hello.
>
> I'm having serious issues with ext3; any insight would be greatly
> appreciated:
>
>
> _____ Overview:
>
> I believe ext3 is supposed to be recoverable in the case of a power failure
> by replaying the log.
>
> However, on two separate computers (running different operating systems
> too), this has been anything but the case.
>
>
> _____ Specifics:
>
> Sometimes, my kernel will hard-freeze and I'll have to do a hard reboot.
> When this happens, sometimes fsck will insist on running and find some
> orphaned inodes, which it will proceed to put in the /lost+found directory.
>
> This is unacceptable: The last time this happened, random files in my
> operating system were plucked from the file system and stuffed in
> lost+found, corrupting the OS and forcing a reinstall. Another time, files
> (a final project) that I had moved a minute before the crash were orphaned
> and put in lost+found, effectively destroying the project.
>
> Why should a lost+found folder even be necessary when the file hierarchy is
> guaranteed to be consistent?
>
>
> In response to these problems, I changed the ext3 journaling mode to
> "journal" rather than "ordered" (frankly it seems deeply disturbing that
> "ordered" is the default). Since then, I've once had to hard-reboot and yet
> again found files in the /lost+found folder.
>
> Might anyone know why ext3 is not fulfilling its promise of an
> always-consistent file system?
>
>
> _____ Other interacting issues:
>
> I'm running RAID1 (mirroring) on one computer, but I've had the same issues
> on another computer without RAID.
>
> (In response to "you shouldn't hard-reboot your computer": I realize that
> most computers are not meant to be hard-rebooted, but I don't have a sysrq
> key and xmodmapping it has been difficult. I also realize that kernels
> shouldn't crash, but what's a person to do if the computer doesn't respond
> to ctrl-alt-f1 and doesn't leave any messages in the logs...)
>
> (In response to "maybe your drive is defective": This is not a problem with
> a defective drive; I've tried multiple drives.)
>
> (In response to "you should back up your data": Periodic backups clearly
> help, but it's ridiculous to restore a system from backup every week because
> a hard-freeze corrupted your filesystem...)
>
>
> Any insight would be greatly appreciated. These problems have been making me
> look for other file systems (such as zfs, which unfortunately I can't use to
> boot; or reiser4, which also makes a filesystem-is-always-consistent
> guarantee); I would prefer to use ext3, but I've never had these sorts of
> problems with old Mac OS, OS X, or Windows.
>
>
> Thank you,
> Mats
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users@???
> https://www.redhat.com/mailman/listinfo/ext3-users
>




