Re: Open source storage
- From: Bill Todd <billtodd@xxxxxxxxxxxxx>
- Date: Wed, 20 Feb 2008 20:08:37 -0500
S wrote:
....
XFS has had its own issues. Yes you have on-disk continuity, but if
you lose power while XFS is building its extent, you've got data
corruption.
I'd like to see a credible reference for that allegation (unless you're
simply referring to the potential inconsistency that virtually all
update-in-place file systems have when *updating* - rather than writing
for the first time - multiple sectors at once).
See section 6.1: Delaying allocation
http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html
There's nothing there that even remotely hints at data corruption on
power loss: the defined semantics of any normal Unix-style file system
(including ZFS) specifies that any user data that hasn't been explicitly
flushed to disk may or may not be on the disk, in whole or in part,
should power fail (that's what write-back caching is all about: if you
want atomic on-disk persistence, you use fsync or per-request
write-through - though even those won't necessarily guarantee
full-request, let alone multi-request, atomicity beyond the individual
file block level should power fail before the request completes, even on
ZFS; about the only difference with ZFS is that individual file block
disk writes are guaranteed to be atomic rather than just the
near-guarantee that disks provide that individual sector writes will be
atomic).
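For readers who want to see what "use fsync or per-request write-through" looks like in practice, here is a minimal Python sketch. The function names `durable_write` and `write_through` are made up for illustration, and `O_SYNC` is assumed to be available (it is on Linux; the code falls back to a plain open where it is not). As noted above, neither approach makes a multi-block request atomic if power fails partway through; they only control when the call returns relative to the data reaching the device.

```python
import os

def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]

def durable_write(path: str, data: bytes) -> None:
    """Write through the normal write-back cache, then flush explicitly."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # returns once the kernel reports the data flushed
    finally:
        os.close(fd)

def write_through(path: str, data: bytes) -> None:
    """Per-request write-through: each write() waits for the device."""
    # Assumption: O_SYNC exists on this platform; otherwise no extra flag is set.
    sync_flag = getattr(os, "O_SYNC", 0)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
```

An application that never does either of these is relying on whatever flush interval the file system happens to use.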
You're right semantically. I understand the difference between sync
and async, but it seems like the XFS designers almost went out of
their way to ensure your data got corrupted when you lost power.
I see your point, but it strikes me as one of the 'a little bit pregnant' variety: absent explicit write-through or cache-flush control, *any* Unix file system will tend to produce data inconsistencies after interruption, the only question being just how many (not whether there will be any at all).
What the XFS designers went out of their way to do was to avoid writing data that never needed to be written (files that got deleted before ever making it to disk) and avoid fragmenting data that did get written (by deferring allocation and writing as long as feasible). As a by-product, dirty data in the cache didn't get flushed out as often as in more primitive file system environments where flushing data older than (e.g.) 30 seconds (ZFS uses 5 seconds as its default IIRC) didn't have any real down-side.
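The trade-off can be illustrated with a toy model. This is only a sketch of a generic write-back cache with a flush-age threshold, not XFS's (or any real file system's) implementation; the class `ToyWriteBackCache` and all of its methods are hypothetical.

```python
class ToyWriteBackCache:
    """Toy model: dirty data stays in memory until it is max_age seconds old."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.dirty = {}       # name -> (data, time the file first became dirty)
        self.disk = {}        # name -> data that has "reached disk"
        self.disk_writes = 0  # number of flushes actually performed

    def write(self, name: str, data: bytes, now: float) -> None:
        # Keep the original dirty time if the file is already dirty.
        first_dirtied = self.dirty.get(name, (None, now))[1]
        self.dirty[name] = (data, first_dirtied)

    def delete(self, name: str) -> None:
        # A file deleted while still dirty never costs a disk write.
        self.dirty.pop(name, None)
        self.disk.pop(name, None)

    def tick(self, now: float) -> None:
        # Flush everything that has been dirty for at least max_age.
        for name, (data, dirtied) in list(self.dirty.items()):
            if now - dirtied >= self.max_age:
                self.disk[name] = data
                self.disk_writes += 1
                del self.dirty[name]

    def power_loss(self) -> dict:
        # Unflushed data is gone; only what reached disk survives.
        self.dirty.clear()
        return dict(self.disk)
```

With `max_age=5`, a temporary file written at t=0 and deleted at t=10 costs one disk write; with `max_age=30` it costs none. The price is that a file written at t=6 and not explicitly flushed is still absent from disk if power fails at t=10.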
To put it another way, arbitrarily making data persistent frequently for an application or user that isn't sufficiently interested to have taken the appropriate steps to do so penalizes those applications and users that *have* taken such steps (by consuming system resources unnecessarily). And since you can never completely protect such negligent applications/users (unless you make every write synchronous), going to the opposite extreme (and thereby encouraging them actually to address the issue rather than merely hope that it won't bite them too frequently) has merit.
That said, a different design might have achieved more up-to-date persistence with minimal impact on system resource consumption (e.g., by dumping small user data updates lazily into the log temporarily).
The early SGI systems were designed with special hardware to shut down
gracefully in case of power loss, so maybe XFS was designed on the
assumption that this would always be the case.
Could be, but I kind of doubt it: with potentially gigabytes of discontiguous dirty data in system cache, you'd need a full-blown UPS to guarantee persistence in such a case (and since even UPSs have been known to fail, a pair of them suitably wired for redundancy).
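A back-of-the-envelope estimate shows why. Every figure here is an assumption chosen for illustration (2 GiB of dirty data, scattered in 64 KiB pieces, about 10 ms of seek plus write time per piece on a single disk); none of them come from SGI hardware specifications, and the helper `flush_seconds` is hypothetical.

```python
def flush_seconds(dirty_bytes: int, chunk_bytes: int, seconds_per_chunk: float) -> float:
    """Estimate the time to write out discontiguous dirty data, one seek per chunk."""
    chunks = -(-dirty_bytes // chunk_bytes)  # ceiling division
    return chunks * seconds_per_chunk

# Assumed figures: 2 GiB dirty, 64 KiB chunks, 10 ms per chunk.
estimate = flush_seconds(2 * 2**30, 64 * 2**10, 0.010)
print(f"about {estimate / 60:.1f} minutes")
```

Under those assumptions the flush takes on the order of five minutes, far longer than a power supply's hold-up time could cover.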
I'll take boring old ext3 anytime over XFS or ReiserFS, I don't like
to live life on the edge when it comes to my data or worse, other
people's data.
Then you really should consider a system like VMS, where at least many writes are synchronous by default: Unix file systems *always* 'live on the edge' in the sense that you describe - the only question being just how sharp the edge is.
- bill