> On Fri, 23 Apr 1999, John Finlay wrote:
>
> > > I agree that it sounds simple and straightforward enough, only a tad too
> > > memory-consuming for my taste.
>
> > I assume that you are referring to the pathological case you mention
> > below. I notice that the current code drops a lot of attachments. Is
> > this done because of these concerns?
>
> No. It is because of some bug. I've never seen this behaviour myself.
>
I meant the code intentionally drops some attachments (e.g. virtual business cards).
>
> > > Consider the case of reading a mail with a 10MB MIME attachment. I say it
> > > needs to be stored to disk as it is read, while your version would allocate
> > > memory for this beast and keep it around until later. What if you read a
> > > mailbox with 100 such attachments...?
>
> > I suspect that this would be a problem in any case for a lot of mail
> > systems, receivers, etc. I don't believe that attachments of this size occur
> > frequently in practice - I would think that many email gateways would
> > choke. The largest attachment I've ever received was about 2MB, but maybe
> > I've been lucky.
>
> Well, then consider a more realistic case where you have a mailbox with 1000
> attachments, each 2MB. What I am trying to say is that attachments are what
> make mails very big, not the text parts. In general, of course.
>
> My point is still that hypermail uses far fewer system resources if it
> extracts the attachments as early as possible, without needing to allocate
> memory to keep them in various lists.
OK
>
> > The incremental mode would fare much better by only having to deal with
> > one of these beasts at a time.
>
> Indeed. Still, we need to make it deal with complete archives properly too.
>
> > I'd like to avoid as much of the complexity of the current parsemail by
> > deferring the depth first message extraction as long as possible. I
> > believe that the proposed strategy will work perfectly in 99.99% of the
> > cases and degrade performance only in the case of a mailfile with lots of
> > large attachments.
>
> I don't think we gain source readability by changing where in the flow we
> extract the attachments. I think we gain it by re-structuring the parsemail()
> function into a set of smaller functions that each have a defined set of
> input and output parameters.
>
I guess I've been too focused on the incremental case and haven't thought about some of the problems of the eat-a-mailbox case. I notice that some of my mailfiles are larger than 20MB.
It seems that memory would be a problem in any event for large mailboxes, even if they aren't full of large attachments. To avoid memory allocation problems, either dumping the bodies out to individual files or dealing with each message completely before reading the next would give the lowest memory cost, since neither approach has to allocate storage for every body line.
>
> > Is it the goal to support all of the MIME types fully as the effort
> > progresses?
>
> I'd say it is, yes.
>
> --
> Daniel Stenberg - http://www.fts.frontec.se/~dast
> ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Received on Mon 26 Apr 1999 09:16:05 PM GMT
This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:11 AM GMT