Hypermail incremental mode from John Finlay on 1999-04-11 (Hypermail Development List)

From: John Finlay <finlay_at_moeraki.com_at_hypermail-project.org>
Date: Sun, 11 Apr 1999 12:49:46 -0700
Message-ID: <3710FCDA.BB2A327B_at_moeraki.com>

About 2 years ago I set up hypermail 1.02 to archive some mailing lists at work, using the incremental add mode of operation. After a short while it became apparent that:

- hypermail processing got very, very slow as the archives got larger
- hypermail used up a significant amount of server resources during processing
- hypermail tripped over itself because of internal race conditions when trying to handle 2 messages for the same list at the same time
- hypermail could get into a mode where it would spawn zillions of processes and use up almost all resources on the server. The cause was hypermail processes stuck in infinite loops, one per incoming message: a pathological extension of the race-condition problem above.

Of course, the first 2 problems made the race conditions worse.

Assuming that a few simple fixes would cure the problems, I set about adding them. Of course, I was wrong about how simple the fixes would be, and I spent much more time than expected improving performance and making hypermail's incremental mode robust under heavy load.

At the time I couldn't find anyone else working on hypermail, so I just sat on those fixes. Now that I have discovered that others are still interested in hypermail, I'm trying to come up to speed on the latest version(s) and their features.

I see that there have been a number of useful enhancements to hypermail, and I'm considering upgrading to a newer version, but first I want to make sure that someone has already added fixes that make incremental mode robust and make hypermail perform well on large archives.

It's been quite a while since I last looked at what I did but a quick summary is:

1. Added archive locking and a mechanism that lets multiple spawned copies of hypermail correctly update an archive while limiting the server resources used. Basically, if an archive was already being updated, additional hypermail processes would queue their work and exit; the active hypermail process would exit only when there was no work left to do. [This fixed the incremental-update race conditions.]
2. Added an optional file containing the archive's summary information, so an update no longer had to open every archive file. [This removed a big performance hit on large archives.]
3. Fixed up the cross-indexing of threads, etc. [More reliable indexing.]
4. Cleaned up the print.c routines. [Rationalized these.]
5. Fixed lots of string-handling errors in string.c. [A source of core dumps.]
6. Changed the addhash function and introduced a mail struct to avoid replicating each header string 4 times. [Big memory savings with large archives.]
7. Eliminated the on-the-fly sorting of messages by using qsort on the tables. [Big performance improvement for large archives.]
8. Fixed the date routines, which seemed to be broken in a number of ways. [These were the source of many core dumps.]
9. Fixed lots of string-handling errors in the parse.c routines. [Fixed a lot of core dumps.]
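To make the lock-and-queue idea in fix 1 concrete, here is a minimal sketch of one way it could work. It is an illustration, not the code from these patches: it assumes a POSIX system with flock(), one spool directory per archive, and a lock file beside it, and all names (process_fn, enqueue_message, try_lock_archive, spool_is_empty, drain_spool, run_updater) are hypothetical. The detail that matters is the final check after unlocking: a copy that queues a message just as the active copy finishes will fail to get the lock and exit, so the active copy must look at the spool once more before it exits. Error handling is kept minimal.

```c
#define _DEFAULT_SOURCE /* flock() and mkdtemp() on glibc */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical hook that adds one queued message to the archive (may be NULL). */
typedef void (*process_fn)(const char *path);

/* Queue a message: write a dot-named temp file, then rename() it into the spool
   so updaters never see half-written files; pid+time+seq keeps names unique. */
static int enqueue_message(const char *spooldir, const char *msg)
{
    static unsigned seq = 0;
    long pid = (long)getpid(), now = (long)time(NULL);
    char tmp[4096], dst[4096];
    FILE *f;
    snprintf(tmp, sizeof tmp, "%s/.tmp-%ld-%ld-%u", spooldir, pid, now, seq);
    snprintf(dst, sizeof dst, "%s/%ld-%ld-%u", spooldir, pid, now, seq);
    seq++;
    if ((f = fopen(tmp, "w")) == NULL)
        return -1;
    fputs(msg, f);
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, dst);   /* atomic within one filesystem */
}

/* Become the single active updater; -1 if another copy holds the lock. */
static int try_lock_archive(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Nonzero if nothing except dot-files is waiting in the spool. */
static int spool_is_empty(const char *spooldir)
{
    DIR *d = opendir(spooldir);
    struct dirent *e;
    int empty = 1;
    if (d == NULL)
        return 1;
    while (empty && (e = readdir(d)) != NULL)
        empty = (e->d_name[0] == '.');
    closedir(d);
    return empty;
}

/* One directory scan: process and delete each queued file; returns the count. */
static int drain_spool(const char *spooldir, process_fn process)
{
    DIR *d = opendir(spooldir);
    struct dirent *e;
    char path[4096];
    int n = 0;
    if (d == NULL)
        return 0;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')   /* ".", "..", and unfinished .tmp files */
            continue;
        snprintf(path, sizeof path, "%s/%s", spooldir, e->d_name);
        if (process != NULL)
            process(path);
        if (unlink(path) == 0)
            n++;
    }
    closedir(d);
    return n;
}

/* Called by every spawned copy after enqueue_message(); returns how many
   messages this copy handled (0 if another copy was already active). */
static int run_updater(const char *lockpath, const char *spooldir, process_fn process)
{
    int total = 0, n, fd;
    for (;;) {
        if ((fd = try_lock_archive(lockpath)) < 0)
            return total;            /* the active copy will pick up our file */
        while ((n = drain_spool(spooldir, process)) > 0)
            total += n;
        flock(fd, LOCK_UN);
        close(fd);
        /* Work queued after our last scan may belong to a copy that failed to
           lock and exited, so check once more before exiting ourselves. */
        if (spool_is_empty(spooldir))
            return total;
    }
}
```

Each spawned copy would call enqueue_message() with its message and then run_updater(). At most one copy does archive work at a time; the others exit at once, which also caps the number of live processes.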
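Fix 2 can be pictured as a small line-oriented index that an update reads instead of reopening every message file. The sketch below is an assumption, not hypermail's actual format: one tab-separated line per message holding number, date, author, and subject, with the subject last so it may itself contain tabs. The struct summary type and the summary_append, summary_parse, and summary_load functions are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-message record kept in the summary file. */
struct summary {
    long num;             /* message number */
    long date;            /* seconds since the epoch */
    char author[128];     /* assumed to contain no tab or newline */
    char subject[256];    /* may contain tabs, but no newline */
};

/* Append one record as "num\tdate\tauthor\tsubject\n". */
static int summary_append(FILE *f, const struct summary *s)
{
    return fprintf(f, "%ld\t%ld\t%s\t%s\n",
                   s->num, s->date, s->author, s->subject) < 0 ? -1 : 0;
}

/* Parse one line produced by summary_append(); 0 on success, -1 if malformed. */
static int summary_parse(char *line, struct summary *s)
{
    char *p = line, *end, *tab;

    line[strcspn(line, "\n")] = '\0';
    s->num = strtol(p, &end, 10);
    if (end == p || *end != '\t')
        return -1;
    p = end + 1;
    s->date = strtol(p, &end, 10);
    if (end == p || *end != '\t')
        return -1;
    p = end + 1;
    if ((tab = strchr(p, '\t')) == NULL)   /* author ends at the next tab */
        return -1;
    *tab = '\0';
    snprintf(s->author, sizeof s->author, "%s", p);
    snprintf(s->subject, sizeof s->subject, "%s", tab + 1);  /* rest of line */
    return 0;
}

/* Load up to max records without opening any per-message files. */
static size_t summary_load(FILE *f, struct summary *out, size_t max)
{
    char line[512];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, f) != NULL)
        if (summary_parse(line, &out[n]) == 0)
            n++;
    return n;
}
```

An incremental update would then load the summary once, append one line per new message, and regenerate the index pages from memory, so its cost no longer grows with the number of message files it would otherwise have to open.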
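Fixes 6 and 7 fit together: if each message's headers live in a single record, the date, subject, and author indexes can be arrays of pointers to that record, each sorted once with qsort() rather than kept in order message by message. A sketch under that assumption; the fields of struct mail, the comparators, and build_indexes() are hypothetical, not hypermail's real definitions.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One record per message: each header string is stored once and every
   index table points at this record. Field names are illustrative. */
struct mail {
    int num;              /* arrival order in the archive */
    time_t date;          /* parsed Date: header */
    const char *subject;
    const char *author;
};

/* Break ties by arrival order so every index has a total order. */
static int cmp_num(const struct mail *x, const struct mail *y)
{
    return (x->num > y->num) - (x->num < y->num);
}

/* qsort comparators: each element of the table is a struct mail pointer. */
static int by_date(const void *a, const void *b)
{
    const struct mail *x = *(struct mail *const *)a;
    const struct mail *y = *(struct mail *const *)b;

    if (x->date != y->date)
        return x->date < y->date ? -1 : 1;
    return cmp_num(x, y);
}

static int by_subject(const void *a, const void *b)
{
    const struct mail *x = *(struct mail *const *)a;
    const struct mail *y = *(struct mail *const *)b;
    int c = strcmp(x->subject, y->subject);

    return c != 0 ? c : cmp_num(x, y);
}

static int by_author(const void *a, const void *b)
{
    const struct mail *x = *(struct mail *const *)a;
    const struct mail *y = *(struct mail *const *)b;
    int c = strcmp(x->author, y->author);

    return c != 0 ? c : cmp_num(x, y);
}

/* Point all three tables at the shared records, then sort each table once
   instead of inserting every message into its sorted position on the fly. */
static void build_indexes(struct mail *msgs, size_t n, struct mail **date_idx,
                          struct mail **subj_idx, struct mail **auth_idx)
{
    for (size_t i = 0; i < n; i++)
        date_idx[i] = subj_idx[i] = auth_idx[i] = &msgs[i];
    qsort(date_idx, n, sizeof *date_idx, by_date);
    qsort(subj_idx, n, sizeof *subj_idx, by_subject);
    qsort(auth_idx, n, sizeof *auth_idx, by_author);
}
```

Because the tables hold only pointers, adding a sort order costs one pointer per message rather than another copy of every header string.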

I'll bet that most of these fixes (or similar ones) have been added in the current releases, though for me the most important ones are 1, 2, and 7.

I took a quick look at 20b3, and it does not seem to have fixes equivalent to 1, 2, 6, or 7. I didn't check for 3, 4, 5, 8, or 9, but I would hope these have been fixed since they are a source of core dumps.

After a quick look at 2a18, I noticed fixes equivalent to 1 and 6, but not to 2 or 7. I didn't check for the rest, though I hope they are fixed.

My questions are:

- Is my analysis of the current releases, with respect to my concerns, roughly correct?
- Does anyone else care about or use hypermail's incremental mode?
- Does anyone care about large archives, or is the current practice to slice the archives by month or week to avoid these inherent problems?
- Does 2a18 represent the current development effort? Is 20b3 an earlier, dead release?

Thanks

John

Received on Sun 11 Apr 1999 09:51:25 PM GMT

This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:11 AM GMT