Re: Dreadfully plonking Wifi question



PeterD <pd.news@xxxxxxxxxxxxxxxxx> wrote:
> Tim Auton <tim.auton@xxxxxxxxxxxxxxxxx> wrote:
>
>> I'm working on a script to extract all the addresses I've used from
>> Mail; I can let you have a copy if you'd find it useful.
>
> I spoke too soon. I want to do what you're doing with Mail, to Eudora
> mailboxes. I think this is a 'simple' text file process to extract all
> chunks of text that start with < or space, followed by anylengthstring
> ending in @mydomain.com and append said chunk to a text file. I can make
> a unique list, but I'm sure there's a nixy command to uniquefy the list
> too.

The uk2 servers add a "Received:" header which includes the
foo@xxxxxxxxxxxxxxxxxx bit that we're after, even for messages where
that may not be obvious after redirection (mailing lists, for example),
so I look at that as well as some other headers.

This one-liner was as far as I got in bash:

find ~/Library/Mail -name '*.emlx' -exec grep -A 3 \
-e "^To:\|^From:\|^Cc:\|^Bcc:\|^BCC:\|^CC:\|^Received:" {} \; \
| sed -n 's/.*[^0-9a-zA-Z\.\-_]\([0-9a-zA-Z\.\-_]*@uton.org\).*/\1/p' \
| sort \
| uniq -c

Change the search path, filename pattern and domain name and it ought to
work equally well for Eudora, assuming it keeps its messages as plain
text.

Unfortunately equally well is not ideally well. It handles headers
broken over multiple lines very clumsily and actual addresses broken over
two lines not at all. The quickest fix for that was to do it in
AppleScript :) Printing multiple lines from grep means addresses may be
counted many times per email, so the numbers are only of any use for
rough indications.
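
One way to stay in the shell and still soften both problems is to join
folded header lines before grepping, and to reduce each file to its
unique addresses before counting. What follows is only a sketch, not
the AppleScript version mentioned above. The names unfold_headers and
count_addresses are made up for illustration, it assumes one message
per file (as with .emlx), and it relies on grep -o, which GNU and BSD
grep both provide but POSIX does not require. The character class is
written with the hyphen last because a backslash inside a POSIX bracket
expression is an ordinary character, not an escape.

```shell
# Join folded header lines: a line starting with a space or tab is
# treated as a continuation of the previous line.
unfold_headers() {
  awk '
    NR > 1 && /^[ \t]/ { sub(/^[ \t]+/, " "); buf = buf $0; next }
    NR > 1             { print buf }
                       { buf = $0 }
    END                { if (NR > 0) print buf }
  '
}

# count_addresses DIR NAME_PATTERN DOMAIN   (hypothetical helper)
# Prints "count address" pairs, counting each address once per file.
count_addresses() {
  dir=$1
  pattern=$2
  # Escape dots so they match literally in the regular expression.
  domain_re=$(printf '%s\n' "$3" | sed 's/\./\\./g')

  find "$dir" -type f -name "$pattern" -print |
  while IFS= read -r file; do
    unfold_headers < "$file" |
      grep -i -E '^(To|From|Cc|Bcc|Received):' |
      grep -o -E "[0-9A-Za-z._-]+@$domain_re" |
      sort -u            # each address at most once per file
  done |
  sort | uniq -c
}

# Example call, using the path and domain from the one-liner:
# count_addresses ~/Library/Mail '*.emlx' 'uton.org'
```

The unfolding is applied to the whole file, body included, so a body
line that happens to start with "To:" is still picked up. For a
mailbox format that keeps many messages in one file, the per-file
de-duplication would under-count rather than over-count.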

The sed expression might not be ideal either, only searching for a
subset of legal email address characters. It may also be a bit
write-only (are there any regular expressions which aren't?). Sed is
doing a line-by-line:

s                    - search-and-replace for
/
.*                   - whatever
[^0-9a-zA-Z\.\-_]    - a character which isn't one of these
\(                   - start of the first expression in brackets
[0-9a-zA-Z\.\-_]*    - any number of these characters
@uton.org            - this exact string [1]
\)                   - end of the first expression in brackets
.*                   - whatever

replacing with
/
\1                   - the first expression in brackets
/
p                    - printing only matching lines (sed -n suppresses the rest)
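
To see the line-by-line behaviour, the sed stage can be fed a few
made-up header lines. This is a small sketch: extract_uton is a
hypothetical wrapper around the same expression, and LC_ALL=C is an
added assumption so that the bracket ranges are interpreted by byte
value whatever the current locale is.

```shell
# Hypothetical helper: the sed stage of the one-liner, run in the C locale.
extract_uton() {
  LC_ALL=C sed -n 's/.*[^0-9a-zA-Z\.\-_]\([0-9a-zA-Z\.\-_]*@uton.org\).*/\1/p'
}

# Three sample lines: one address, no address, two addresses.
printf '%s\n' \
  'From: Tim <foo@uton.org>' \
  'Subject: no address on this line' \
  'To: a@uton.org, b@uton.org' \
  | extract_uton
# prints:
#   foo@uton.org
#   b@uton.org
```

The Subject line is dropped because nothing matched. The To line yields
only its last address, because the leading .* is greedy and the whole
line is replaced by a single \1.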


Tim

[1] Spot the deliberate mistake!