I'm a bit confused by the above statement; is it necessary to reprocess all 4.5M every time Hypermail is run? Does it not keep an on-disk database, and update things incrementally? Anyway, I'm the guy who wrote Pipermail, and will contribute a brief discussion of Pipermail's structure that hopefully will prove helpful.
At bottom, Pipermail is just a few abstract base classes: Database, Article, and a class representing an archiver. The base classes contain the logic of adding articles to an archive, threading them, writing index files, etc. None of the actual formatting of messages or indexes is specified in the base classes; I've written a set of classes which mimic Hypermail, and which can be subclassed further, but you could also write your own from scratch and produce something which looked nothing like Hypermail at all. I designed it this way because I looked around, noticed how many hacked versions of Hypermail there were, and wanted something that would be easily extensible.
So, for example, the archiver class has a get_archives() method that's passed the article, and returns a list naming the indexes where the article should be archived. You can therefore file a message into several indexes, or none at all; a hypothetical archiver for linux-kernel could check if a message contains a patch, and return ['linux-kernel', 'posted-patches'] if it did. Or get_archives() could check if the subject matched ^subscribe\s, and return an empty list, discarding the message. It's only limited by how much work you want to do in that function.
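A minimal sketch of the routing hook described above, not Pipermail's actual code. The Article stand-in, the LinuxKernelArchiver class name, and the patch-detection regex are all assumptions made for illustration; only the get_archives() name, the ^subscribe\s pattern, and the returned list names come from the description.

```python
import re


class Article:
    """Hypothetical stand-in for an article; only the fields used below."""

    def __init__(self, subject, body):
        self.subject = subject
        self.body = body


class LinuxKernelArchiver:
    """Hypothetical archiver showing only the get_archives() hook."""

    # Subject pattern taken from the text: administrative "subscribe" requests.
    SUBSCRIBE_RE = re.compile(r'^subscribe\s')
    # Assumed heuristic: a unified-diff header ("--- file" then "+++ file").
    PATCH_RE = re.compile(r'^--- \S+.*\n\+\+\+ \S+', re.MULTILINE)

    def get_archives(self, article):
        """Return the names of the indexes this article should be filed in."""
        if self.SUBSCRIBE_RE.match(article.subject):
            return []  # empty list: the message is discarded
        if self.PATCH_RE.search(article.body):
            return ['linux-kernel', 'posted-patches']
        return ['linux-kernel']
```

An empty list drops the message, a one-element list files it normally, and a longer list cross-files it into several indexes.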
Similarly, there are methods which are supposed to write the header of an index, the footer, and a single entry, to convert a message to HTML, write a table of contents listing all the archives, and so forth. It hasn't proven difficult to produce a customized interface on top of the Hypermail-lookalike classes, such as the SIG archives at python.org.
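The split between logic in the base class and formatting in subclasses can be sketched roughly as follows. Every class and method name here (BaseIndexWriter, write_index_header, and so on) is hypothetical and chosen for illustration; Pipermail's real method names are not given above.

```python
import io
from html import escape


class BaseIndexWriter:
    """Hypothetical base class: decides the order of calls, not the output."""

    def write_index(self, out, title, entries):
        # The sequence lives here; subclasses supply only the formatting.
        self.write_index_header(out, title)
        for subject, filename in entries:
            self.write_index_entry(out, subject, filename)
        self.write_index_footer(out)

    def write_index_header(self, out, title):
        raise NotImplementedError

    def write_index_entry(self, out, subject, filename):
        raise NotImplementedError

    def write_index_footer(self, out):
        raise NotImplementedError


class HTMLIndexWriter(BaseIndexWriter):
    """Hypothetical subclass producing a plain HTML list of links."""

    def write_index_header(self, out, title):
        out.write('<html><head><title>%s</title></head><body>\n<ul>\n'
                  % escape(title))

    def write_index_entry(self, out, subject, filename):
        out.write('<li><a href="%s">%s</a></li>\n'
                  % (escape(filename), escape(subject)))

    def write_index_footer(self, out):
        out.write('</ul>\n</body></html>\n')


buf = io.StringIO()
HTMLIndexWriter().write_index(buf, 'A & B', [('Re: <hi>', '0001.html')])
page = buf.getvalue()
```

A customized look then only needs a subclass that overrides one or more of the three formatting methods.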
The pseudo-code for archiving messages looks like this:
    class MyArchiver( ... ):
        ... define or override methods ...

    db = BSDDBDatabase()   # Or whatever Database subclass you write
    archiver = MyArchiver(base_directory, db)

    for each message to be added:
        # Make an article object from an rfc822.Message object
        # (read from a mailbox file or whatever)
        a = MyArticle(message)
        archiver.add_article( a )

    archiver.close()   # On closing, indexes get written, databases updated, etc.
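For readers who want something executable, here is a rough, self-contained rendering of the same flow. DictDatabase, Article, and Archiver are in-memory stand-ins invented for this sketch, not Pipermail classes. It also parses messages with the standard email module rather than rfc822, and nothing is written to disk.

```python
import email


class DictDatabase:
    """Stand-in for a Database subclass; keeps everything in memory."""

    def __init__(self):
        self.articles = {}  # archive name -> list of Article objects

    def add(self, archive, article):
        self.articles.setdefault(archive, []).append(article)


class Article:
    """Stand-in article built from a parsed message."""

    def __init__(self, message):
        self.subject = message.get('Subject', '(no subject)')
        self.msgid = message.get('Message-ID', '')


class Archiver:
    """Stand-in archiver: routes articles, then summarizes on close()."""

    def __init__(self, base_directory, db):
        self.base_directory = base_directory  # unused here; no files written
        self.db = db
        self.closed = False

    def get_archives(self, article):
        return ['default']

    def add_article(self, article):
        for name in self.get_archives(article):
            self.db.add(name, article)

    def close(self):
        # A real archiver would write index files here; the sketch just
        # returns the subjects filed under each archive name.
        self.closed = True
        return {name: [a.subject for a in arts]
                for name, arts in self.db.articles.items()}


raw_messages = [
    "From: a@example.org\nSubject: First post\n"
    "Message-ID: <1@example.org>\n\nHello.\n",
    "From: b@example.org\nSubject: Re: First post\n"
    "Message-ID: <2@example.org>\n\nHi.\n",
]

db = DictDatabase()
archiver = Archiver('/tmp/archive', db)
for raw in raw_messages:
    archiver.add_article(Article(email.message_from_string(raw)))
summary = archiver.close()
```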
Weak points of Pipermail:
--
A.M. Kuchling http://starship.skyport.net/crew/amk/
The mathematician lives long and lives young; the wings of his soul do not early drop off, nor do its pores become clogged with the earthy particles blown from the dusty highways of vulgar life.
    -- James Joseph Sylvester

Received on Thu 23 Apr 1998 05:39:20 PM GMT