Re: Open source storage



S wrote:

....

XFS has had its own issues. Yes, you have on-disk continuity, but if
you lose power while XFS is building its extent, you've got data
corruption.
I'd like to see a credible reference for that allegation (unless you're
simply referring to the potential inconsistency that virtually all
update-in-place file systems have when *updating* - rather than writing
for the first time - multiple sectors at once).
See section 6.1: Delaying allocation
http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html
There's nothing there that even remotely hints at data corruption on
power loss. The defined semantics of any normal Unix-style file system
(including ZFS) specify that any user data that hasn't been explicitly
flushed to disk may or may not be on the disk, in whole or in part,
should power fail. That's what write-back caching is all about: if you
want atomic on-disk persistence, you use fsync or per-request
write-through, though even those won't necessarily guarantee
full-request, let alone multi-request, atomicity beyond the individual
file block level should power fail before the request completes, even on
ZFS. About the only difference with ZFS is that individual file block
disk writes are guaranteed to be atomic, rather than just the
near-guarantee that disks provide that individual sector writes will be
atomic.
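
For concreteness, here is a minimal Python sketch of the two options named above (an explicit fsync after writing, and per-request write-through via O_SYNC). The function names are made up for illustration, and the code assumes a POSIX-like system; where O_SYNC is not defined, the second function silently degrades to an ordinary buffered write. Neither variant makes a multi-block write atomic. Each only ensures the call does not return until the kernel reports the data as written.

```python
import os


def write_with_fsync(path: str, data: bytes) -> None:
    """Write data through the cache, then block until it is flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push this file's dirty data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_through(path: str, data: bytes) -> None:
    """Open with O_SYNC so each write request is write-through."""
    # Assumption: O_SYNC exists on this platform; fall back to 0 if not.
    sync_flag = getattr(os, "O_SYNC", 0)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

If power fails partway through either call, the file may still hold only part of the request, which is the point made above.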

You're right semantically. I understand the difference between sync
and async, but it seems like the XFS designers almost went out of
their way to ensure your data got corrupted when you lost power.

I see your point, but it strikes me as one of the 'a little bit pregnant' variety: absent explicit write-through or cache-flush control, *any* Unix file system will tend to produce data inconsistencies after interruption, the only question being just how many (not whether there will be any at all).

What the XFS designers went out of their way to do was to avoid writing data that never needed to be written (files that got deleted before ever making it to disk) and avoid fragmenting data that did get written (by deferring allocation and writing as long as feasible). As a by-product, dirty data in the cache didn't get flushed out as often as in more primitive file system environments where flushing data older than (e.g.) 30 seconds (ZFS uses 5 seconds as its default IIRC) didn't have any real down-side.

To put it another way, arbitrarily making data persistent frequently for an application or user that isn't sufficiently interested to have taken the appropriate steps to do so penalizes those applications and users that *have* taken such steps (by consuming system resources unnecessarily). And since you can never completely protect such negligent applications/users (unless you make every write synchronous), going to the opposite extreme (and thereby encouraging them actually to address the issue rather than merely hope that it won't bite them too frequently) has merit.
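
As an illustration of what "the appropriate steps" can look like for an application that cares, here is a sketch of the common write-to-temporary-then-rename pattern in Python. It is not taken from XFS or any particular application: the function name is hypothetical, it assumes a POSIX-like system where a directory can be opened and fsynced, and how well the rename survives a power failure still depends on the file system underneath.

```python
import os
import tempfile


def replace_file_durably(path: str, data: bytes) -> None:
    """Replace path so readers see the old or the new content, not a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same file system)
    # so that the final rename does not turn into a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # drain Python's user-space buffer
            os.fsync(tmp.fileno())   # force the new data to disk first
        os.replace(tmp_path, path)   # then switch the name over
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: POSIX. Flush the directory so the rename itself persists.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that does this pays for its own flushes only when it asks for them, which is the trade-off being argued for here.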

That said, a different design might have achieved more up-to-date persistence with minimal impact on system resource consumption (e.g., by dumping small user data updates lazily into the log temporarily).

The early SGI systems were designed with special hardware to shut down
gracefully in case of power loss, so maybe XFS was designed on the
assumption that this would always be the case.

Could be, but I kind of doubt it: with potentially gigabytes of discontiguous dirty data in system cache, you'd need a full-blown UPS to guarantee persistence in such a case (and since even UPSs have been known to fail, a pair of them suitably wired for redundancy).

I'll take boring old ext3 any time over XFS or ReiserFS; I don't like
to live life on the edge when it comes to my data or, worse, other
people's data.

Then you really should consider a system like VMS, where at least many writes are synchronous by default: Unix file systems *always* 'live on the edge' in the sense that you describe - the only question being just how sharp the edge is.

- bill