Re: Converting individual messages to mbox format

From: <jose.kahan_at_w3.org_at_hypermail-project.org>
Date: Fri, 29 Oct 1999 00:28:25 +0200 (MET DST)
Message-Id: <199910282228.AAA04419_at_tuvalu.inrialpes.fr>


In our previous episode, Ashley M. Kirchner said:
>
> for file in * ; do cat "$file" >> mbox ; echo "" >> mbox ; done (this is
> done under sh/bash)
>
> ...this would just cat each file, with a blank line between them, to
> another file called mbox (or whatever you want to call it).
>
> I -think- this would work. I don't have anything loose like that right
> now that I can try for you, but this wouldn't hurt to try, especially since you
> won't mess up anything. You're not deleting any files, just catting them into
> a different one. Then try parsing that big mbox file through hypermail and see
> what it does. There's another test for hypermail.

That's what my script does, except that it also adds a > character to any "From " line that's not the envelope (From ....). This is the IETF recommended mbox format.
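
For illustration, here is a minimal sketch of that kind of conversion. It is not the actual script. It assumes that each message file already starts with its own "From " envelope line, and that the escaping meant here is the common mbox convention of turning a body line that starts with "From " into ">From ". The function name to_mbox is made up for this example.

```shell
#!/bin/sh
# to_mbox DIR: write every regular file in DIR to stdout as one mbox stream.
# Assumption: line 1 of each file is already its "From " envelope line.
to_mbox() {
    for file in "$1"/*; do
        [ -f "$file" ] || continue
        # Keep line 1 (the envelope) unchanged; prefix ">" to any later
        # line that starts with "From ".
        awk 'NR > 1 && /^From / { print ">" $0; next } { print }' "$file"
        # Blank line between messages, as in the loop quoted above.
        echo ""
    done
}
```

Something like `to_mbox ./msgs > mbox` would then produce the combined file. Writing mbox outside the message directory keeps the glob from picking it up on a second run.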

After testing hypermail with some historical archives, I can tell you that the parser may get confused in many cases. For example, if someone copies some message headers into his own message asking "what does this mean", hypermail may think it's a new message.

The IETF proposal allows you to parse the mbox and be sure where a message starts. And my hypermail set_ietf_mbox option allows you to take advantage of this format :)
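
To show why the escaping helps, here is a small sketch of the reading side. It is not hypermail's code, and the function names count_messages and unescape_body are invented for this example. It assumes that every body line that began with "From " was stored as ">From ", so any line still starting with "From " can only be an envelope.

```shell
#!/bin/sh
# count_messages MBOX: print the number of messages in an escaped mbox.
# Under the assumption above, each "From " line marks one message start.
count_messages() {
    grep -c '^From ' "$1"
}

# unescape_body: filter stdin to stdout, stripping the ">" that was added
# in front of body lines starting with "From ".
unescape_body() {
    sed 's/^>From /From /'
}
```

On an unescaped mbox the same count would be too high whenever someone pasted a "From " line into a message body, which is the confusion described above.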

Of course, having a better parser is the best thing, but my simple example will continue breaking it... well, it's just an exception and won't happen often, I think.

-Jose

Received on Fri 29 Oct 1999 12:28:14 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:51 PM GMT