Re: Open source storage



S wrote:

....

if Reiser hasn't run his FS on any enterprise-class storage
how can we assume it's ready for prime-time, enterprise-class
deployment?

Because any failure of enterprise-class storage to faithfully mimic (e.g.) SCSI behavior should be considered to be an enterprise-storage bug rather than any problem with the file system?


XFS has had its own issues. Yes you have on-disk continuity, but if
you lose power while XFS is building its extent, you've got data
corruption.
I'd like to see a credible reference for that allegation (unless you're
simply referring to the potential inconsistency that virtually all
update-in-place file systems have when *updating* - rather than writing
for the first time - multiple sectors at once).

See section 6.1: Delaying allocation

http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

There's nothing there that even remotely hints at data corruption on power loss. The defined semantics of any normal Unix-style file system (including ZFS) specify that any user data that hasn't been explicitly flushed to disk may or may not be on the disk, in whole or in part, should power fail: that's what write-back caching is all about. If you want atomic on-disk persistence, you use fsync or per-request write-through, though even those won't necessarily guarantee full-request, let alone multi-request, atomicity beyond the individual file block level should power fail before the request completes, even on ZFS. About the only difference with ZFS is that individual file block disk writes are guaranteed to be atomic, rather than just the near-guarantee that disks provide that individual sector writes will be atomic.
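
To make the two options above concrete, here is a minimal Python sketch of an explicit flush versus per-request write-through. The function names are made up for illustration, and it assumes a POSIX-like system (O_SYNC is looked up defensively because not every platform defines it). Neither call makes a multi-block write atomic across a power failure; each only controls when the data is reported as being on stable storage.

```python
import os
import tempfile


def write_all(fd, data):
    """Loop until every byte is handed to the kernel (os.write may be partial)."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path, data):
    """Write through the normal write-back cache, then flush explicitly."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)
        os.fsync(fd)  # returns once the OS reports the data flushed
    finally:
        os.close(fd)


def write_through(path, data):
    """Open with O_SYNC so each write request is synchronous by itself."""
    # Assumption: O_SYNC exists on this platform; fall back to 0 if it does not.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)
```

An application that skips both steps is relying on the write-back cache, and its data may or may not be on disk after a power failure, which is the behavior described above.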

It's been many years since I read that paper, though, and it provided a pleasant trip down memory lane. XFS did a lot of interesting things for the early '90s, even if not all of them were necessarily optimal.

....

So I really think the issues stopping people from deploying open
source storage are:
1. Lack of snapshots, which may not be an issue if ZFS gains traction.
My impression is that snapshots have been available in Linux, BSD, and
for that matter Solaris itself for many years in various forms
associated with LVMs and/or file systems.

I believe you can only have 1 snapshot at a time in LVM. Nowhere near
the sophistication of WAFL snapshots.

But one is all that you need to do an on-line backup, one of the most important consumers of snapshot technology. Other uses of snapshots tend to be more like inferior substitutes for 'continuous data protection' facilities, though the advent of writable snapshots (clones) has opened up new uses (at least new imaginable uses: how much actual utility they have I'm not sure).

The old Solaris fssnap mechanism may have been limited to a single snapshot. Peter Braam et al. produced alpha and beta releases of a more general snapshot facility called snapfs in 2001 which I thought either got further developed or replaced with another product of the same name, but I didn't find further information on it. The Linux LVM and LVM2 support snapshots (the latter including writable snapshots) - and a quick glance at the documentation didn't seem to indicate that they supported only one at a time.


2. No coherent DR strategy. I don't consider rsync a mirroring
solution if it needs to walk the tree each time.
Synchronous mirroring at the driver level has been available for ages,
and is entirely feasible across distances of at least 100 miles - enough
to survive any disaster which your business is likely to survive as long
as your remote site is reasonably robust. If write performance
requirements can be relaxed a bit distances can be significantly
greater. I haven't looked recently, so I don't know how well those
facilities deal with temporary link interruptions and subsequent
catch-up (if you've got dedicated fiber to a robust back-up site that
may not be too likely to occur, but in other circumstances it would be
very desirable)

Can you name some examples of synchronous mirroring at the driver
level? Is it open source? Easy to deploy?

I'm not all that familiar with the offerings, but my impression is that DRBD may be the current Linux standard in this area; a 2003 description can be found at http://www.linux-mag.com/id/1502 , and it's still being developed (just Google it). You may have been able to roll your own remote replication before DRBD by using a remote disk paired (RAID-1-style) with a local disk under local LVM facilities.
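
As a toy illustration of the RAID-1-style pairing idea (not how DRBD or any LVM is actually implemented), the Python sketch below acknowledges a write only after both replicas have been flushed. The class name and paths are hypothetical, the "remote" replica is just a second local file standing in for a network-attached disk, and it assumes a POSIX system where os.pwrite and os.pread are available.

```python
import os


class MirroredDevice:
    """Toy synchronous mirror: write() returns only after both copies are flushed."""

    def __init__(self, local_path, remote_path, size):
        self.fds = []
        for path in (local_path, remote_path):
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            os.ftruncate(fd, size)  # give both replicas the same fixed size
            self.fds.append(fd)

    def write(self, offset, data):
        # Synchronous mirroring: the caller is not acknowledged until every
        # replica has the data on stable storage.
        for fd in self.fds:
            written = os.pwrite(fd, data, offset)
            if written != len(data):
                raise OSError("short write to replica")
            os.fsync(fd)

    def read(self, offset, length):
        # Either replica could serve reads; this sketch always uses the local one.
        return os.pread(self.fds[0], length, offset)

    def close(self):
        for fd in self.fds:
            os.close(fd)
```

In a real deployment the second flush crosses the link to the remote site, so every write pays at least one round trip; that is why distance and write performance trade off against each other.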

- bill


