parsemail rewrite from John Finlay on 1999-04-23 (Hypermail Development List)

From: John Finlay <finlay_at_moeraki.com_at_hypermail-project.org>
Date: Fri, 23 Apr 1999 06:04:34 -0700
Message-ID: <37206FE2.5578DD4F_at_moeraki.com>

Just to follow up on my previous message. My thinking is that the strategy of parsemail is to extract each email message as a whole, breaking it into header and body sections but not processing the body any further until later. This is simple and straightforward. The headers can then be decoded to extract the sorting info. Then the headers of each message are canonicalized and the message is added to the hash tables. At this point the determination of whether the message is to be saved could be made, based on the msgid and other criteria.
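To make the header/body split concrete, here is a minimal sketch, assuming the whole raw message is already in memory and that the first empty line separates headers from body. The names raw_msg and split_message are made up for illustration and are not existing hypermail functions. Nothing is copied or decoded at this stage; the struct only records where each section starts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one raw message: pointers into the original
 * buffer, so the body stays untouched until it is actually needed. */
struct raw_msg {
    const char *hdr;     /* start of the header block */
    size_t hdr_len;      /* header bytes, up to and including the last header newline */
    const char *body;    /* first byte after the blank separator line */
    size_t body_len;     /* remaining bytes */
};

/* Split buf at the first empty line (LF LF or LF CR LF).
 * Returns 0 on success, or -1 if no empty line was found, in which
 * case the whole buffer is reported as headers with an empty body. */
int split_message(const char *buf, size_t len, struct raw_msg *out)
{
    size_t i, next;

    assert(buf != NULL && out != NULL);
    out->hdr = buf;

    for (i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        next = i + 1;
        if (next < len && buf[next] == '\r')   /* tolerate CRLF line endings */
            next++;
        if (next < len && buf[next] == '\n') {
            out->hdr_len = i + 1;
            out->body = buf + next + 1;
            out->body_len = len - (next + 1);
            return 0;
        }
    }

    out->hdr_len = len;
    out->body = buf + len;
    out->body_len = 0;
    return -1;
}
```

The header block found this way could then be decoded for the sort info, while MIME decoding of the body is left for later.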

The messages are added to a growable array of messages (grown in chunks of 512) that is sorted using quicksort, if necessary (the addheader function is deleted). The incremental case would employ a header file containing a cache of the header info, including the sort info, which is read in (or created on the fly if it does not exist) and used to populate the message table before processing any new messages. In the incremental case, if the header file is available, a merge sort of the new messages would be done.
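Roughly, the message table could look like the sketch below. This is only an illustration under some assumptions: the struct fields, the date-then-number sort key, and all of the names (msg_entry, msg_table, msg_table_add, msg_table_sort, merge_sorted) are hypothetical rather than taken from the hypermail sources. Note also that the C library's qsort() is not guaranteed to be a quicksort internally; it is used here as the obvious stand-in.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define MSG_CHUNK 512            /* grow the table 512 entries at a time */

struct msg_entry {               /* hypothetical per-message sort info */
    time_t date;
    int msgnum;
};

struct msg_table {
    struct msg_entry *items;
    size_t count;                /* entries in use */
    size_t cap;                  /* entries allocated */
};

/* Append one entry, growing by MSG_CHUNK when full. Returns 0, or -1
 * if memory could not be allocated (the table is left unchanged). */
int msg_table_add(struct msg_table *t, const struct msg_entry *e)
{
    assert(t != NULL && e != NULL);
    if (t->count == t->cap) {
        size_t ncap = t->cap + MSG_CHUNK;
        struct msg_entry *p = realloc(t->items, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        t->items = p;
        t->cap = ncap;
    }
    t->items[t->count++] = *e;
    return 0;
}

/* Order by date, then by message number to break ties. */
static int cmp_by_date(const void *a, const void *b)
{
    const struct msg_entry *x = a, *y = b;
    if (x->date != y->date)
        return (x->date > y->date) - (x->date < y->date);
    return (x->msgnum > y->msgnum) - (x->msgnum < y->msgnum);
}

/* Non-incremental case: sort the whole table in one go. */
void msg_table_sort(struct msg_table *t)
{
    assert(t != NULL);
    if (t->count > 1)
        qsort(t->items, t->count, sizeof t->items[0], cmp_by_date);
}

/* Incremental case: merge the cached (already sorted) entries a[] with
 * the sorted new entries b[] into out[], which must hold na + nb items. */
void merge_sorted(const struct msg_entry *a, size_t na,
                  const struct msg_entry *b, size_t nb,
                  struct msg_entry *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (cmp_by_date(&b[j], &a[i]) < 0) ? b[j++] : a[i++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}
```

In the incremental case only the new messages would need sorting before the merge, since the entries read from the header file are already in order.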

During the printing of the messages, the message decoding and MIME decoding of the message bodies would be done (though this can be done any time after it is decided to save the message).

I'm not sure I completely understand the handling of multipart messages, especially when they are recursive (e.g. multipart/digest). Is it the intention to handle these fully, i.e. multipart messages inside multipart messages? The comments seem to recognize the problem, but it's unclear to me whether full recursion has been implemented.

John

Received on Fri 23 Apr 1999 03:03:59 PM GMT

This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:11 AM GMT