Fixing mangled mbox 'From ' header lines?



Hello,

I have an archive of a 10-year-old public mailing list that I plan to
import into Google Groups for archival and retrieval. There are over
27,000 messages in the archive, which is in standard 'mbox' format.

In preparation for uploading to Google, I've been doing a lot of
cleanup of the archive -- finding duplicate and off-topic posts,
fixing some mangled headers, removing excess EOL spaces, etc. The
tools I've used for this cleanup are 'vi' and 'The Bat' email client.

One problem I've noticed is that over 2000 messages have badly misdated
'From ' lines (the first line of each message's header). The date on
that line is essentially bogus; however, the dates in the 'Date:' and
the various 'Received:' fields look correct.

So, is there a tool or script which will fix the 'From ' lines?
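
To make the request concrete, here is a rough Python sketch of the kind
of script I have in mind. It is only an illustration under several
assumptions: the function names are made up, a 'From ' line counts as a
message separator only at the start of the file or after a blank line,
the 'Date:' header is not folded across lines, and the new date is
written in UTC in asctime style. Messages with a missing or unparseable
'Date:' are left untouched.

```python
import time
from email.utils import parsedate_to_datetime


def from_line_date(header_lines):
    """Return an asctime-style UTC date from the first Date: header, or None."""
    for line in header_lines:
        if line.lower().startswith("date:"):
            try:
                parsed = parsedate_to_datetime(line[5:].strip())
            except (TypeError, ValueError):
                return None  # unparseable Date: header, leave the message alone
            return time.asctime(parsed.utctimetuple())
    return None


def fix_from_lines(lines):
    """Return a copy of the mbox lines with 'From ' dates taken from Date:."""
    fixed = []
    prev_blank = True  # the very first line of the file may be a separator
    for i, line in enumerate(lines):
        if prev_blank and line.startswith("From "):
            # Collect this message's header block (up to the first blank line).
            j = i + 1
            while j < len(lines) and lines[j].strip() != "":
                j += 1
            new_date = from_line_date(lines[i + 1:j])
            parts = line.split()
            if new_date and len(parts) >= 2:
                # Keep the envelope sender and the original line ending.
                ending = line[len(line.rstrip("\r\n")):]
                line = "From %s %s%s" % (parts[1], new_date, ending)
        fixed.append(line)
        prev_blank = line.strip() == ""
    return fixed


def rewrite_mbox(src_path, dst_path):
    """Read src_path, fix its 'From ' lines, and write the result to dst_path."""
    # latin-1 plus newline="" round-trips every byte and line ending unchanged.
    with open(src_path, encoding="latin-1", newline="") as src:
        lines = src.readlines()
    with open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.writelines(fix_from_lines(lines))
```

Something like rewrite_mbox("list.mbox", "list-fixed.mbox") (file names
are placeholders) would write a corrected copy and leave the original
archive alone, so the result can be compared with the original before
uploading.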

If you can, post your reply to this newsgroup.

Thanks!

Mark



