Re: Dreadfully plonking Wifi question
- From: Tim Auton <tim.auton@xxxxxxxxxxxxxxxxx>
- Date: Fri, 4 May 2007 14:56:38 +0100
PeterD <pd.news@xxxxxxxxxxxxxxxxx> wrote:
> Tim Auton <tim.auton@xxxxxxxxxxxxxxxxx> wrote:
>> I'm working on a script to extract all the addresses I've used from
>> Mail; I can let you have a copy if you'd find it useful.
>
> I spoke too soon. I want to do what you're doing with Mail, to Eudora
> mailboxes. I think this is a 'simple' text file process to extract all
> chunks of text that start with < or space, followed by anylengthstring
> ending in @mydomain.com and append said chunk to a text file. I can make
> a unique list, but I'm sure there's a nixy command to uniquefy the list
> too.

The uk2 servers add a "Received:" header which includes the
foo@xxxxxxxxxxxxxxxxxx bit that we're after, even for messages where
that may not be obvious after redirection (mailing lists, for example),
so I look at that as well as some other headers.
This one-liner was as far as I got in bash:
find ~/Library/Mail -name '*.emlx' -exec grep -A 3 \
-e "^To:\|^From:\|^Cc:\|^Bcc:\|^BCC:\|^CC:\|^Received:" {} \; \
| sed -n 's/.*[^0-9a-zA-Z\.\-_]\([0-9a-zA-Z\.\-_]*@uton.org\).*/\1/p' \
| sort \
| uniq -c
Change the search path, filename pattern and domain name and it ought to
work equally well for Eudora, assuming it keeps its messages as plain
text.
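
On the "nixy command to uniquefy" point, a minimal sketch: sort -u gives a
plain unique list, while sort | uniq -c (as in the one-liner) also gives
counts. The helper names unique_list and counted_list are made up for
illustration; both read one address per line on standard input.

```shell
#!/bin/sh
# Hypothetical helpers; each reads one address per line on stdin.

# Plain de-duplicated list, sorted.
unique_list() {
    sort -u
}

# Each distinct address prefixed by how often it occurred, most frequent
# first. uniq only collapses adjacent duplicates, hence the sort before it.
counted_list() {
    sort | uniq -c | sort -rn
}
```

For example, piping the output of the sed stage into unique_list instead of
sort | uniq -c yields just the addresses, without the counts.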
Unfortunately, equally well is not ideally well. It handles headers
broken over multiple lines very clumsily, and actual addresses broken over
two lines not at all. The quickest fix for that was to do it in
AppleScript :) Printing multiple lines from grep (the -A 3) means an
address may be counted many times per email, so the numbers are only of
any use as rough indications.
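
For anyone who would rather stay in the shell, here is a rough sketch of one
way to soften both problems: join folded header lines back together before
matching, and list each address at most once per message. This is not the
AppleScript version mentioned above. The function names unfold_headers and
addresses_in are invented for the example, and it assumes a grep that
supports -o (GNU and BSD grep both do, but it is not in POSIX). It does not
try to repair an address that is itself split in the middle.

```shell
#!/bin/sh
# Sketch only: unfold_headers and addresses_in are hypothetical names.

# Join header continuation lines (those starting with a space or tab)
# onto the header they belong to, and stop at the first blank line,
# which marks the end of the header block.
unfold_headers() {
    awk '
        /^$/     { exit }                          # end of headers
        /^[ \t]/ { buf = buf $0; next }            # continuation line
                 { if (NR > 1) print buf; buf = $0 }  # start of a new header
        END      { if (buf != "") print buf }      # flush the last header
    ' "$@"
}

# Print each address in the given domain at most once for one message.
# $1 = domain, $2 = message file.
addresses_in() {
    # Escape dots so they match literally in the pattern below.
    domain_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    unfold_headers "$2" \
        | grep -i -E '^(To|From|Cc|Bcc|Received):' \
        | grep -o -i -E "[0-9A-Za-z._-]+@${domain_re}" \
        | sort -u
}
```

Run once per file and then counted, this gives the number of messages each
address appears in rather than the number of matching lines:

    find ~/Library/Mail -name '*.emlx' | while IFS= read -r f; do
        addresses_in uton.org "$f"
    done | sort | uniq -c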
The sed expression might not be ideal either, as it only searches for a
subset of the legal email address characters. It may also be a bit
write-only (are there any regular expressions which aren't?). Line by
line, sed is doing:
s - search-and-replace for
/
.* - whatever
[^0-9a-zA-Z\.\-_] - a character which isn't one of these
(
[0-9a-zA-Z\.\-_]* - any number of these characters
@uton.org - this exact string [1]
)
.* - whatever
replacing with
/
\1 - the first expression in brackets
/
p - printing only matching lines
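
Two details of how sed reads an expression like this are worth checking
when adapting it. Inside a POSIX bracket expression a backslash is an
ordinary character, so \-_ is read as a range from \ to _ and the hyphen
itself ends up outside the set; an address such as my-list@uton.org would
then come out as list@uton.org. And an unescaped . in the domain part
matches any character. The sketch below is a tightened variant under those
assumptions; extract_addr is a made-up wrapper name, and the domain is just
the one used above.

```shell
#!/bin/sh
# Hypothetical wrapper around a tightened form of the expression.
# Reads header lines on stdin, prints the matched address for each line
# that contains one.
extract_addr() {
    # Hyphen placed last in the brackets so it is taken literally;
    # the dot in the domain is escaped so it only matches a real dot.
    sed -n 's/.*[^0-9a-zA-Z._-]\([0-9a-zA-Z._-]*@uton\.org\).*/\1/p'
}
```

Because the leading .* is greedy, a line holding several addresses still
yields only the last one, so this does not replace a per-address extraction
step.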
Tim
[1] Spot the deliberate mistake!