Re: Nightly Archiving

From: Scott Rose <>
Date: Thu, 25 Jun 1998 11:07:43 -0500
Message-ID: <>

Charles Hall wrote:

> The problem is that by the 28th day of the month I'm re-processing the
> previous 27 days as well as the new day.
> MHonArc apparently has some sort of database it uses to remember what
> mail has already been processed, but at the cost of oodles of Perl
> scripts.

I process incoming mail as it arrives, and roll the archives only quarterly. Some of the message counts get into the mid-four-digits by the time of the roll, which makes each archive update pretty expensive towards the end of the quarter. hypermail 1.02 stores the relevant header information in the message files as HTML comments, which means that if you have an archive of 4000 messages, you have 4000 files to open and read before you are ready to add a new one.
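
To make that cost concrete, here is a rough Python sketch (not hypermail's actual C code) of the per-file approach: every message file in the archive is opened and its leading header comments are read. The `<!-- name="value" -->` comment layout, the `.html` suffix, and the function names are assumptions made for illustration only.

```python
import os
import re

# Assumed comment layout, for illustration only: <!-- subject="Re: foo" -->
COMMENT_RE = re.compile(r'<!--\s*(\w+)="(.*?)"\s*-->')


def read_headers_from_file(path):
    """Return the header fields stored as HTML comments in one message file."""
    headers = {}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = COMMENT_RE.search(line)
            if match:
                headers[match.group(1)] = match.group(2)
            elif headers:
                # The block of header comments has ended; stop reading.
                break
    return headers


def load_archive_headers(archive_dir):
    """Open every message file in the archive: one file open per message."""
    result = {}
    for name in sorted(os.listdir(archive_dir)):
        if name.endswith(".html"):
            path = os.path.join(archive_dir, name)
            result[name] = read_headers_from_file(path)
    return result
```

With 4000 messages, `load_archive_headers` performs 4000 opens and reads on every update, which is the expense described above.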

To cheapen that up, I started storing the header information in a GDBM database. I had to make one other change to get the performance to be reasonable: there was one function that was being called N^2 times, despite the fact that its result was the same each time. I can't give you numbers, but the performance of the hacked version is far superior to that of the out-of-the-box 1.02.
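
The general idea of keeping the headers in a DBM file can be sketched as follows. This is a Python illustration using the standard `dbm` module (which picks whatever DBM backend is available, GDBM or otherwise), not the patch itself; the key scheme, the JSON encoding of the values, and the function names are all assumptions.

```python
import dbm
import json


def add_message(index_path, msg_id, headers):
    """Store one message's header fields in the DBM index."""
    with dbm.open(index_path, "c") as db:
        # DBM keys and values are bytes, so encode both explicitly.
        db[msg_id.encode("utf-8")] = json.dumps(headers).encode("utf-8")


def load_index(index_path):
    """Read every stored header record from the single database."""
    with dbm.open(index_path, "c") as db:
        return {
            key.decode("utf-8"): json.loads(db[key].decode("utf-8"))
            for key in db.keys()
        }
```

Adding a message then touches one database instead of every message file in the archive:

```python
add_message("headers", "0001", {"subject": "Nightly Archiving"})
print(load_index("headers"))
```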

Ideally, I'd submit patches in time for 2.0, in case the new maintainer likes the idea. I haven't found the time for that yet, in part because it should really be generalized a bit more, for example to use other *DBM libraries.

I know that this isn't precisely an answer to the question on the table; I'm simply taking this perhaps overdue opportunity to share the information in case it should be useful to somebody.

That hacked version of 1.02 is at <URL:>. Other notable changes include fixes for every opportunity for a buffer overflow in the vanilla version (there were many) that I could find, which might still not be all of them.

Scott Rose               
Bicycling Community Page  <URL:>
Received on Mon 29 Jun 1998 04:50:21 PM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:49 PM GMT