Re: Odysseus update, change in file storage
- From: Bill Cole <bill@xxxxxxxxxxxxx>
- Date: Fri, 04 Apr 2008 14:42:20 -0400
In article <op.t81904hwnn735j@xxxxxxx>,
"John H Meyers" <jhmeyers@xxxxxxxxxxxxxx> wrote:
On Thu, 03 Apr 2008 13:27:26 -0500, Bill Cole wrote:
FWIW, here is how [Time Machine] works...
Thanks for the info.
That sort of backup mustering that is done all of the time by software
that is hooked into the filesystem layer is where the future is for all
routine backup...
A full backup of a discrete filesystem (i.e. usually a disk partition
or the logical equivalent) can be a lot faster than an incremental backup
of just what's changed since the last backup...
With a backup approach that tracks changes as they happen
rather than scanning for them retrospectively, most of the effort
of figuring out what to back up is eliminated.
Coming back down to the small task at hand, for me,
I just want an independent backup of my Eudora folder,
and even if my entire computer can be backed up over gigabit ethernet
to a Storage Area Network in the same time, I don't care :)
This is _personal_ backup, to be put on a CD,
or a USB stick, or "split" and emailed to myself at Gmail,
not a trip to Mars :) So I use good old "zip folders,"
which is mighty universal and compatible.
If Odysseus stores one message per file,
the efficiency of my backup would sink faster than stocks in 1929,
and in fact it won't even be possible to do it any more.
I guess I made my point poorly...
Your choice of a backup method for your mail is influenced by how the
mail is stored. Other methods exist that fit the bazillion-little-file
model better, and they are becoming more widespread in all sorts of
environments precisely because the numbers of files that people want to
keep backed up are exploding. On the Mac side, Time Machine is an
example, but my Windows colleagues have told me that there's some
similar thing on their side. Backup strategies traditionally have spent
a lot of compute time in order to economize on bytes transferred and
stored, but for desktop backup these days the economics of that trade-off have
changed. I can't say what exactly the Windows alternatives are, but I'm
sure they must exist.
Searching is another activity which is strained by per-file processing;
the only way to _seem_ to compensate for that is by an extraordinary
amount of up-front indexing, which only shifts the burden around,
also requiring far more storage, plus a complex and more fragile system,
where certain activities (e.g. editing messages) tax its coherence,
and which also commonly have more limitations in searching
(e.g. only "whole words" [like Google], "words beginning with" etc.)
I think you are overstating this. Any mail storage model has to include
indexing of the message fields that it presents to users in a message
list, or the UI ends up painfully slow.
I have no indexes at all, other than my TOCs (which don't index
anything for searching, except that "summaries" searches the TOC itself),
and my searches are not only plenty fast enough, but I can search
even for embedded strings within words, etc.
Searches that are restricted to the header fields in the TOC use the TOC
as an index. Meta-data fields that only exist in the TOC use the TOC as
an index.
Others who enable the "superfast X1 search" (Eudora v7 only, sorry)
write all the time about their computer "churning" and barely responding
while indexing (for rebuilding), about the huge indexes created,
about the inability to search text embedded in longer words,
about the indexes "breaking" fairly often, about losing
the flexibility to "drop in" or re-arrange mailboxes
and still keep searches working, etc.
So I like my searches better -- primitive implementation, not "super fast,"
but lowest overhead in other ways, and also most flexible.
With Odysseus' "one message per file," my searching would go down the drain,
or require me to build a highly complex and "indexed" system
for just a simple task. Even though I have ten years' mail on hand,
Eudora's current "dumb" system works best for me.
Again, I think you are overstating the slowdown from doing unindexed
searches of a large number of files. One of the reasons I use the
'tradspool' storage method (file per article) for most of my news server
is so that I can easily do arbitrary content searches (usually using
grep) across the whole spool (or in specific group subtrees) and find
specific articles, and doing so is not particularly slow.
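
To make the grep comparison concrete, here is a minimal Python sketch of an unindexed substring search over a file-per-message tree. The function name search_spool and the directory layout are assumptions for illustration only; this is not how any particular news server or MUA does it. It matches raw bytes, so it finds strings embedded inside longer words, but it does not decode MIME or transfer encodings.

```python
import os

def search_spool(root, needle):
    """Return sorted paths of files under root whose raw bytes contain needle.

    Hypothetical helper: a plain, unindexed scan, roughly what a
    recursive fixed-string grep does over a file-per-message spool.
    """
    needle_bytes = needle.encode("utf-8")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # Substring match on raw bytes, no word boundaries.
                    if needle_bytes in fh.read():
                        hits.append(path)
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
    return sorted(hits)
```

Restricting the search to one group subtree is just a matter of passing a deeper directory as root.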
Message content searching is actually a pretty complex issue with regard
to how mail is stored. Search that is done inside a single-threaded MUA
is made a little easier by having a few large files rather than many
small ones because the need to open and close each file is avoided, but
when you look at the issue in relation to the modern world of
multi-threaded apps on multi-core machines and search tools that work on
whole systems (e.g. Spotlight, Google Desktop Search, etc.) the picture
is less clear. One of the things Mac users sometimes complain about in
Eudora is that Spotlight can't be used to find text in specific
messages, and in fact there is no way to get that sort of search from
Spotlight without a file-per-message model, and it is unlikely that any
other system-wide search tool would be able to provide such a feature
without file-per-message storage.
I don't want a massive "system wide" tool for Eudora;
I want Eudora's own tool, which does the job for me perfectly.
"Lean and mean," less "big government bureaucracy" :)
And file-per-message does not prevent raw unindexed searches of all of
the message content by the MUA, if that's what one prefers. Different
people prefer different approaches, and where mbox makes finding a
specific message outside of the MUA (or some other mbox-aware tool)
impossible, file-per-message allows anything that can read a text file
to do a search and find specific messages.
Even returning to the issue of local file storage, the NTFS (Windows)..
Windows is off-topic here. :)
There was a specific question about whether Windows' NTFS was comparable,
and Odysseus (the topic) is still cross-platform, and needs to suit all.
You might find it interesting that the file-per-message Maildir++ model
has been made popular primarily by the growing use of IMAP, where
servers retain messages indefinitely, not transiently.
I did raise the point that it's more suitable for servers
than for personal computers :)
I was trying to address your conjecture that it was designed to fit
transient spools rather than permanent storage.
The original Maildir model was developed as a way to simplify concurrent
access to a message store without needing to have all software aware of
a common locking mechanism. That may not strike you as relevant to an MUA
mailstore...
Right :)
but with multiprocessor/multicore/hyperthreaded machines
becoming the norm and desktop OSes long supporting multithreaded apps
it is actually a significant issue.
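
As a rough illustration of how a Maildir-style store lets several writers work without a shared lock, here is a minimal Python sketch. The function name deliver and the exact unique-name format are assumptions for the example, not a faithful copy of any implementation. Each writer creates a uniquely named file under tmp/, finishes writing it, and only then renames it into new/, so a reader of new/ never sees a partial message.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()  # distinguishes deliveries within one process

def deliver(maildir, message_bytes):
    """Write one message into maildir/new without any shared lock.

    Hypothetical sketch: the name combines time, pid, a per-process
    counter and the host name so that concurrent writers do not collide.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL makes creation fail rather than overwrite if the name exists.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(message_bytes)
        fh.flush()
        os.fsync(fh.fileno())
    # The message becomes visible in new/ only once it is complete.
    os.rename(tmp_path, new_path)
    return new_path
```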
As I asked before, does this recommend the storage of database records
in a "one record per file" manner?
If you don't want to implement a database that has record-level locking,
and want to allow access from other tools that don't know about such
locking or some internal data structure, yes.
The traditional news spool layout has retained a fair number of users
for decades (with the addition of overview indexing) in large part
because many people have found the accessibility to be useful, but also
because it avoids locking issues and allows multiple concurrent writers
to be active in the same directory (i.e. newsgroup.)
And they need not worry about space allocation (not using the same
physical blocks), or when they can re-use the space, how to assure
that no two write requests write into the same place, etc.?
Right. Filesystem implementations make certain operations 'atomic' from
the perspective of userspace software.
There is always a "locking system" -- if you remove it from one place,
you have to make it re-appear in another.
Right, and with file-per-message most locking issues are pushed down
into the filesystem so that userspace processes never see an
inconsistent state. For example, moving a message between mailboxes is
just a matter of changing which directory the file is linked into, and
userspace code cannot catch that procedure in a half-done state.
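
A minimal Python sketch of that kind of move, assuming each mailbox is simply a directory on the same filesystem (move_message is a made-up name for the example). On POSIX systems a rename within one filesystem is atomic, so another process sees the message in the old mailbox or in the new one, never in a half-moved state.

```python
import os

def move_message(src_mailbox, dst_mailbox, filename):
    """Move one message file between two mailbox directories.

    Hypothetical helper: assumes both directories are on the same
    filesystem, where the rename is a single atomic step on POSIX.
    """
    src = os.path.join(src_mailbox, filename)
    dst = os.path.join(dst_mailbox, filename)
    os.rename(src, dst)  # no message content is read or copied
    return dst
```

Compare that with mbox, where the same move means appending the message to one file, removing or marking it in another, and updating both TOCs.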
Besides being again a _server_ issue, rather than a one-user-at-a-time
personal computer issue,
One user at a time does not mean one task at a time. The clunky
'background' mail checking in Eudora is an example, and it is degraded
by having to manipulate messages in mbox files rather than files in
directories. Similar issues can occur outside of the MUA as well, since
things like backup software and AV scanners can catch an mbox mid-write
or out of sync with its TOC.
you could do exactly the same by building
an "inner filing system" within a single OS file, which,
in effect, is what databases do, so I'm just not persuaded
that dumping the job onto the OS file system is a profound advance.
That's another approach, but doing that moves a MUA in the direction of
Outlook, with its opaque .pst files that nothing but Outlook can touch.
The thought to move it all to the OS file level
just doesn't grab me, but fortunately, I only "think small,"
and what I want is a personal email client that stands on its own,
doesn't make a mountain (of individual files)
out of what ain't the least broke, has fit the bill perfectly
for all this time, and don't need fixing :)
And we come back to my original point: the different approaches to mail
storage all have their own advantages and disadvantages. Maildir-like
models may waste a little allocation block space and complicate
filesystem structures, but they can make multi-threaded MUAs smoother,
bind metadata tightly to the messages, and open up the mail store to
other tools. On the other hand, mbox storage forces the MUA to keep a
separate record of where messages sit in the file and of per-message
metadata, and while mbox files reduce filesystem complexity and
theoretically waste less allocated-but-unused space, in practice they
usually are not compacted on every message delete, so they have internal
wastage that is hard to see. Purpose-built database-like storage can
be very fast, efficient with space, support multi-threaded access and
provide neat tricks like 'views' but all of that comes at the cost of
hiding the messages from other tools.
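
To illustrate the mbox side of that trade-off, here is a minimal Python sketch of the separate record an MUA has to keep, and of what compaction involves. The names build_toc and compact are made up for the example, and the parsing is deliberately simplified: it treats any line starting with "From " as a message separator, which real mbox variants handle more carefully. It is not Eudora's actual TOC format.

```python
import os

def build_toc(path):
    """Return a list of (offset, length) pairs, one per message in an mbox file."""
    starts = []
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"From "):  # simplified separator test
                starts.append(offset)
            offset += len(line)
    ends = starts[1:] + [offset]
    return [(s, e - s) for s, e in zip(starts, ends)]

def compact(path, toc, deleted):
    """Rewrite the mbox without the messages whose TOC indexes are in deleted.

    Until this runs, a "deleted" message still occupies space in the file.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    kept = [data[off:off + length]
            for i, (off, length) in enumerate(toc) if i not in deleted]
    tmp = path + ".compact"
    with open(tmp, "wb") as fh:
        fh.write(b"".join(kept))
    os.replace(tmp, path)  # swap the compacted file into place
    return build_toc(path)  # every later offset has changed
```

Deleting one message in the middle shifts the offsets of everything after it, which is why the whole file gets rewritten and why MUAs tend to defer compaction.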
On the other hand, if we abandon the entire "mailbox" concept,
in favor of a "views" concept, like Gmail or Opera's mail,
then it becomes imperative to think of each item as independent,
absolutely requiring an extensive and more complex indexing system
(though not necessarily wasting so much space as "one message per file").
It's no longer Eudora, however, nor apparently even Odysseus.
--
Now where did I hide that website...