Re: [hypermail] ultimate searchable archives

From: Bill Moseley
Date: Mon, 16 Sep 2002 21:13:21 -0700

At 01:24 PM 09/06/02 -0700, Bill Paxton wrote:
>Are there some pre-done htdig modifications out there?
>I checked contrib but nothing I could find. Is there
>something better than htdig?

I'm one of the developers of swish-e. I've used it for indexing hypermail archives -- there's a perl script in the swish-e distribution that I have used for parsing the metadata from the hypermail HTML messages.

The downside is that swish doesn't do incremental indexing, so for a very high volume list it might be a problem. On the other hand, swish is so damn fast[1] at indexing that for most applications you don't need incremental indexing. If your messages are not coming in every second or so then you can typically figure out a way to build an index quickly (e.g. have a master index created once a day, run indexing on just the new messages for the day every minute or so, and search both indexes at the same time).
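
To make the two-index idea concrete, here is a rough sketch in Python. It is not part of swish-e and does not call it; Message, Hit, split_for_indexing and search_both are made-up names for illustration. It only shows the bookkeeping: anything older than the last master build belongs to the daily index, anything newer goes into the small index that gets rebuilt every minute, and a search merges hits from both. It also assumes rank values from the two indexes are comparable, which may not hold in practice.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Message(NamedTuple):
    path: str
    mtime: float  # modification time, seconds since the epoch


class Hit(NamedTuple):
    path: str
    rank: int  # higher means a better match (assumed comparable across indexes)


def split_for_indexing(
    messages: Iterable[Message], master_built_at: float
) -> Tuple[List[str], List[str]]:
    """Split message paths into (master, incremental) by modification time.

    Messages modified at or before the last master build are already covered
    by the daily index; newer ones go into the small, frequently rebuilt index.
    """
    master: List[str] = []
    incremental: List[str] = []
    for msg in messages:
        if msg.mtime <= master_built_at:
            master.append(msg.path)
        else:
            incremental.append(msg.path)
    return master, incremental


def search_both(
    master_hits: Iterable[Hit], incremental_hits: Iterable[Hit], limit: int = 10
) -> List[Hit]:
    """Merge hits from both indexes and return the best `limit`, best first."""
    combined = list(master_hits) + list(incremental_hits)
    return heapq.nlargest(limit, combined, key=lambda hit: hit.rank)
```

In a real setup the two lists of paths would be handed to the indexer in separate runs, and the two hit lists would come from querying each index file.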

The swish-e list is a hypermail archive and it's searchable at

You could probably come up with a better looking interface.

[1] Fast is subjective, of course. On my Athlon I can index 100,000 2K text files in about three minutes. YMMV.

Bill Moseley
Received on Fri 20 Sep 2002 12:46:44 AM GMT

