About 2 years ago I set up hypermail 1.02 to archive some mailing lists at
work, using its incremental add mode of operation. After a short while
it was apparent that:
- hypermail processing got very, very slow as the archives grew
- hypermail used a significant amount of server resources during
processing
- hypermail tripped over itself because of internal race conditions when
two messages for the same list arrived at the same time
- hypermail could get into a mode where it would spawn zillions of
processes and use up almost all the resources on the server. The cause
was hypermail processes stuck in infinite loops, one per incoming
message: a pathological extension of the previous problem.
Of course, the first two problems made the race conditions worse.
Assuming that a few simple fixes would cure these problems, I set about
adding them. Of course, I was wrong about how simple the fixes would be,
and I spent much more time improving performance and making hypermail's
incremental mode robust under heavy load.
At the time, I didn't find anyone else working on hypermail, so I just
sat on those fixes. Now that I have discovered that others are still
interested in hypermail, I'm trying to come up to speed on the latest
version(s) and their features.
I see that there have been a number of useful enhancements to hypermail,
and I'm thinking about upgrading to a newer version, but first I want to
make sure that someone has already added fixes that make the incremental
mode robust and hypermail performant.
It's been quite a while since I last looked at what I did, but a quick
summary is:
1. added archive locking and a mechanism that lets multiple spawned
copies of hypermail update an archive correctly while limiting the
server resources used. Basically, if an archive was already being
updated, additional hypermail processes would queue their work and
exit. The active hypermail process would exit only when there was no
work left to do. [this fixed the incremental-update race conditions]
2. added an optional file containing the archive's summary info, so an
update no longer has to open every file in the archive. [this removed a
big performance hit on large archives]
3. fixed up the cross-indexing of threads, etc. [more reliable indexing]
4. cleaned up the print.c routines. [rationalized these]
5. fixed lots of string-handling errors in string.c. [a source of core
dumps]
6. changed the addhash function and introduced a mail struct to avoid
replicating each header string 4 times. [big saving in memory usage
with large archives]
7. eliminated the on-the-fly sorting of messages by using qsort on the
tables. [big performance improvement for large archives]
8. fixed up the date routines, which seemed to be broken in a number of
ways and were the source of many core dumps. [fixed a lot of core dumps]
9. fixed lots of string-handling errors in the parse.c routines. [fixed
a lot of core dumps]
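For readers curious how the archive-locking fix (the first item above) can avoid both the races and the process pile-up, here is a minimal sketch, not my original code. It assumes POSIX scandir/alphasort and BSD-style flock (available on Linux and the BSDs), a per-archive queue directory, and a lock file; the names run_update, drain_queue, queue_empty, log_added and the hook type add_fn are all hypothetical. Each spawned process is assumed to have already dropped its message into the queue directory under a dot-name and renamed it into place, so a half-written file is never picked up.

```c
#define _DEFAULT_SOURCE          /* glibc: expose scandir, alphasort, flock */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical hook: add one queued message file to the archive. */
typedef void (*add_fn)(const char *path);

/* Example hook for trying the sketch out: just report the file. */
static void log_added(const char *path)
{
    printf("added %s\n", path);
}

/* Skip ".", "..", the lock file, and dot-named files still being written. */
static int visible(const struct dirent *e)
{
    return e->d_name[0] != '.';
}

/* Add every queued file in name order; returns how many were added. */
static int drain_queue(const char *queuedir, add_fn add)
{
    struct dirent **names;
    int n = scandir(queuedir, &names, visible, alphasort);
    if (n < 0)
        return 0;
    for (int i = 0; i < n; i++) {
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", queuedir, names[i]->d_name);
        add(path);
        unlink(path);            /* handled: remove it from the queue */
        free(names[i]);
    }
    free(names);
    return n;
}

static int queue_empty(const char *queuedir)
{
    struct dirent **names;
    int n = scandir(queuedir, &names, visible, alphasort);
    if (n < 0)
        return 1;                /* unreadable queue: treat as empty */
    for (int i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return n == 0;
}

/* Returns how many messages this process added. 0 can mean another
   process held the lock, in which case our queued file is left to it. */
static int run_update(const char *lockpath, const char *queuedir, add_fn add)
{
    assert(lockpath != NULL && queuedir != NULL && add != NULL);
    int added = 0;
    for (;;) {
        int fd = open(lockpath, O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return added;
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
            close(fd);           /* an active updater will pick the work up */
            return added;
        }
        int n;
        while ((n = drain_queue(queuedir, add)) > 0)
            added += n;          /* keep going while new work arrives */
        flock(fd, LOCK_UN);
        close(fd);
        /* A file queued after the last drain but before the unlock belongs
           to a process that saw the lock held and exited: look once more. */
        if (queue_empty(queuedir))
            return added;
    }
}
```

The re-check after unlocking is what keeps a message from being stranded: a process that queues its file and then finds the lock held simply exits, so the lock holder must look at the queue one last time after releasing the lock and go around again if anything is there.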
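The memory and sorting fixes (the mail struct and the qsort change above) fit together naturally, and a minimal sketch of the idea, assuming a hypothetical struct mail with the fields shown, might look like this. Each message's header strings are stored once, every index (by date, by author, and so on) is just an array of pointers to those records, and each index is built with one qsort call instead of being kept sorted on the fly. The names struct mail, by_date, by_author, cmp_long and build_index are illustrative, not hypermail's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-message record: each header string is stored once
   and every index refers to it by pointer instead of copying it. */
struct mail {
    int   num;       /* message number within the archive */
    long  date;      /* parsed date, e.g. seconds since the epoch */
    char *subject;
    char *author;
    char *msgid;
};

/* Three-way compare without the overflow risk of subtraction. */
static int cmp_long(long a, long b)
{
    return (a > b) - (a < b);
}

/* qsort comparators: the array elements are struct mail pointers. */
static int by_date(const void *a, const void *b)
{
    const struct mail *x = *(struct mail *const *)a;
    const struct mail *y = *(struct mail *const *)b;
    int c = cmp_long(x->date, y->date);
    return c != 0 ? c : cmp_long(x->num, y->num);  /* tie-break on number */
}

static int by_author(const void *a, const void *b)
{
    const struct mail *x = *(struct mail *const *)a;
    const struct mail *y = *(struct mail *const *)b;
    int c = strcmp(x->author, y->author);
    return c != 0 ? c : cmp_long(x->num, y->num);
}

/* Build one sorted index over msgs. The caller frees the returned
   pointer array only; the messages and their strings stay shared. */
static struct mail **build_index(struct mail *msgs, size_t n,
                                 int (*cmp)(const void *, const void *))
{
    assert(n == 0 || msgs != NULL);
    struct mail **idx = malloc((n > 0 ? n : 1) * sizeof *idx);
    if (idx == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        idx[i] = &msgs[i];
    qsort(idx, n, sizeof *idx, cmp);
    return idx;
}
```

Sorting an array of pointers rather than the records themselves means the subject, author and message-ID strings exist exactly once no matter how many indexes are built, and each index costs one O(n log n) sort per update.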
I'll bet that most of these fixes (or similar ones) have made it into the
current releases, though for me the most important ones are 1, 2, and 7.
I took a quick look at 20b3, and it does not seem to have fixes
equivalent to 1, 2, 6, or 7. I didn't check for 3, 4, 5, 8, or 9, but I
would hope that these have been fixed, since they are a source of core
dumps.
After a quick look at 2a18, I noticed fixes equivalent to 1 and 6, but
not to 2 or 7. I didn't check for the rest, though I hope these are
fixed too.
My questions are:
- is my analysis of the current releases with respect to my concerns
roughly correct?
- does anyone else care about or use hypermail's incremental mode?
- does anyone care about large archives, or is the current practice to
slice the archives by month or week to avoid the inherent problems?
- does 2a18 represent the current development effort? Is 20b3 an
earlier, dead release?
Thanks
John
Received on Sun 11 Apr 1999 09:51:25 PM GMT