Re: Open source storage



On Feb 20, 5:08 pm, Bill Todd <billt...@xxxxxxxxxxxxx> wrote:
S wrote:
XFS has had its own issues. Yes, you have on-disk continuity, but if
you lose power while XFS is building its extent, you've got data
corruption.
I'd like to see a credible reference for that allegation (unless you're
simply referring to the potential inconsistency that virtually all
update-in-place file systems have when *updating* - rather than writing
for the first time - multiple sectors at once).
See section 6.1: Delaying allocation
http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html
There's nothing there that even remotely hints at data corruption on
power loss: the defined semantics of any normal Unix-style file system
(including ZFS) specifies that any user data that hasn't been explicitly
flushed to disk may or may not be on the disk, in whole or in part,
should power fail (that's what write-back caching is all about: if you
want atomic on-disk persistence, you use fsync or per-request
write-through - though even those won't necessarily guarantee
full-request, let alone multi-request, atomicity beyond the individual
file block level should power fail before the request completes, even on
ZFS; about the only difference with ZFS is that individual file block
disk writes are guaranteed to be atomic rather than just the
near-guarantee that disks provide that individual sector writes will be
atomic).
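
To make the two options named above concrete (an explicit fsync versus per-request write-through), here is a minimal Python sketch for a POSIX system. The function names write_durably and write_through are made up for illustration; os.fsync and the os.O_SYNC open flag are the standard calls. As noted above, neither one makes a multi-block request atomic if power fails partway through; they only control when the call returns relative to the data reaching the device.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is handed to the kernel (os.write may be partial)."""
    offset = 0
    while offset < len(data):
        offset += os.write(fd, data[offset:])


def write_durably(path: str, data: bytes) -> None:
    """Buffered write followed by one explicit flush (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)   # lands in the write-back cache first
        os.fsync(fd)           # block until the kernel reports it flushed
    finally:
        os.close(fd)


def write_through(path: str, data: bytes) -> None:
    """Per-request write-through via O_SYNC (hypothetical helper, POSIX only)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, data)   # each write() returns only after the flush
    finally:
        os.close(fd)
```

The first form pays for one flush per file; the second pays on every write call, which is why nobody wants it as a blanket default.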

You're right semantically. I understand the difference between sync
and async, but it seems like the XFS designers almost went out of
their way to ensure your data got corrupted when you lost power.

I see your point, but it strikes me as one of the 'a little bit
pregnant' variety: absent explicit write-through or cache-flush
control, *any* Unix file system will tend to produce data
inconsistencies after interruption, the only question being just how
many (not whether there will be any at all).

For a change, I agree somewhat with your initial analogy. Using XFS is
like having sex without a condom, it's great, but very unsafe if you
don't know what you're doing. Not something I'd use in an enterprise
scenario.

What the XFS designers went out of their way to do was to avoid writing
data that never needed to be written (files that got deleted before ever
making it to disk) and avoid fragmenting data that did get written (by
deferring allocation and writing as long as feasible). As a by-product,
dirty data in the cache didn't get flushed out as often as in more
primitive file system environments where flushing data older than (e.g.)
30 seconds (ZFS uses 5 seconds as its default IIRC) didn't have any real
down-side.

To put it another way, arbitrarily making data persistent frequently for
an application or user that isn't sufficiently interested to have taken
the appropriate steps to do so penalizes those applications and users
that *have* taken such steps (by consuming system resources
unnecessarily). And since you can never completely protect such
negligent applications/users (unless you make every write synchronous),
going to the opposite extreme (and thereby encouraging them actually to
address the issue rather than merely hope that it won't bite them too
frequently) has merit.
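
As one guess at what "the appropriate steps" can look like for an application that replaces a small file, here is a sketch of the common write-temp, fsync, rename pattern in Python. The helper name replace_atomically and the ".tmp" suffix are assumptions for illustration only, and the pattern relies on the file system treating rename as atomic, which POSIX file systems generally do; it is not a claim about what any particular application in this thread does.

```python
import os


def replace_atomically(path: str, data: bytes) -> None:
    """Replace `path` so the name refers to the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp name, same directory

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(data):
            offset += os.write(fd, data[offset:])
        os.fsync(fd)          # new contents flushed before the rename
    finally:
        os.close(fd)

    os.replace(tmp_path, path)  # rename over the old file

    # Flush the directory so the rename itself is persistent (works on Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that does this pays for two flushes per update and gets nothing from the file system flushing on its behalf every few seconds.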

Again, your points make great sense in theory, but wouldn't fly in the
real world. Obviously you wouldn't want to make every write
synchronous, but there is a gray area between that and the XFS/ReiserFS
approach, which just says "hey, you're on your own".

That said, a different design might have achieved more up-to-date
persistence with minimal impact on system resource consumption (e.g., by
dumping small user data updates lazily into the log temporarily).

This I agree with 100%.


The early SGI systems were designed with special hardware to shut down
gracefully in case of power loss, so maybe XFS was designed on the
assumption that this would always be the case.

Could be, but I kind of doubt it: with potentially gigabytes of
discontiguous dirty data in system cache, you'd need a full-blown UPS to
guarantee persistence in such a case (and since even UPSs have been
known to fail, a pair of them suitably wired for redundancy).

Check these out:
http://www.ibm.com/developerworks/linux/library/l-fs11.html
http://linuxmafia.com/faq/Filesystems/reiserfs.html

I'll take boring old ext3 anytime over XFS or ReiserFS; I don't like
to live life on the edge when it comes to my data or, worse, other
people's data.

Then you really should consider a system like VMS, where at least many
writes are synchronous by default: Unix file systems *always* 'live on
the edge' in the sense that you describe - the only question being just
how sharp the edge is.

Come on. Boring is one thing, dead is another.

S


- bill
