RE: Time-based archive subdivision

From: Tom von Alten <>
Date: Fri, 22 Oct 1999 09:11:59 -0600
Message-ID: <005101bf1c9f$c9b19720$>

Thanks to Peter, Paul and Kent for the responses. (Hmm, do we have a new musical group there?)

Kent, you're exactly right that there are two issues here -- how to generate subdivided archives as we go, and my particular implementation issue of how to subdivide existing archives.

The latter involves our locally enhanced variant of v1.02, which has been handling attachments.

Regarding Peter's question, I think I can list several different issues, not all of which are addressed by simply paring down what's shown in "current" indexes.

  1. The size of the index page makes it slow to serve, transmit and view. (Recalling the slow performance of big tables that I've seen in some browsers reminds me that using a table-based layout may make this worse.)
  2. The time it takes hypermail to process a new article increases geometrically with the number of articles in the archive. This may not be an issue with v2, I don't know. (It may not even be *true* with v2, but it seems likely that it is.) All my really big archives are using our local v1+. I'll see if I can come up with a realistic way to "overpopulate" my test archive and see if v2 will care about this. If v2 is sufficiently fast and effective (or robust against intermittent overloading), then subdivided indexes will solve more of the overall concern.
  3. I need a reasonable method to let messages expire, or to delete (or "inactivate" -- I'd use the term "archive" if it weren't overloaded here) messages that are no longer of interest.
  4. In general, more recent information is of greater interest than "old" information; hierarchical indexing allows more useful organization based on how likely it is the information will be required.
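
As a rough illustration of points 3 and 4, here is a minimal Python sketch of time-based subdivision with expiry. It is not how hypermail does anything: the function names, the "YYYY-MM" period key and the keep_months parameter are all assumptions made for the example, and messages are represented simply as (id, date) pairs.

```python
from collections import defaultdict
from datetime import datetime


def period_key(sent):
    """Return a hypothetical 'YYYY-MM' sub-archive name for a message date."""
    return sent.strftime("%Y-%m")


def subdivide(messages):
    """Group (message_id, sent_date) pairs into per-month buckets."""
    buckets = defaultdict(list)
    for msg_id, sent in messages:
        buckets[period_key(sent)].append(msg_id)
    return dict(buckets)


def expired_periods(buckets, now, keep_months):
    """List the period keys more than keep_months months older than `now`."""
    # Count months on a single axis so year boundaries need no special case.
    current = now.year * 12 + (now.month - 1)
    expired = []
    for key in sorted(buckets):
        year, month = (int(part) for part in key.split("-"))
        if year * 12 + (month - 1) < current - keep_months:
            expired.append(key)
    return expired


messages = [
    ("a", datetime(1999, 8, 3)),
    ("b", datetime(1999, 10, 22)),
    ("c", datetime(1999, 10, 1)),
]
buckets = subdivide(messages)
stale = expired_periods(buckets, datetime(1999, 10, 22), keep_months=1)
```

With a layout like this, the "current" index only needs to cover the newest bucket, and expiring old material means dropping (or moving aside) whole buckets rather than picking through one large index.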

Finally, Peter wrote:
> There are still some performance reasons to want to break up large
> directories, but I suspect it will be a while before anyone implements
> a clean way to do that.

Yeah, I'm not quite running into size-of-directory problems, per se. Any subdivision (or other management) approach that involves saving incoming messages intact increases the overall filesystem load, although presumably one can put the mboxes in separate directories if that's an issue.

There is one other problem that no one's raised -- the "m10k" problem inherent in the NNNN.html filenaming scheme. My guess is all the other reasons to subdivide will come into play before our installation runs into that one, but ymmv.
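
To make the "m10k" point concrete, here is a small Python sketch. It assumes the NNNN.html names are zero-padded four-digit message numbers; the helper name message_filename is made up for the example and is not taken from hypermail.

```python
def message_filename(number):
    """Format a message number as a zero-padded NNNN.html name (assumed scheme)."""
    return "%04d.html" % number


def sorts_in_message_order(numbers):
    """Check whether a plain string sort of the filenames matches numeric order."""
    names = [message_filename(n) for n in sorted(numbers)]
    return names == sorted(names)


# Message 10000 no longer fits in four digits, so its name grows by one character.
names = [message_filename(n) for n in (9998, 9999, 10000)]
ok_below_limit = sorts_in_message_order(range(0, 10000))
ok_past_limit = sorts_in_message_order(range(0, 10001))
```

Up to message 9999 a plain directory listing sorts in message order; from 10000 on, the five-character name sorts ahead of the four-character ones. Subdividing by time period sidesteps this as long as no single period reaches 10,000 messages.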

I do see that a lot of archive management issues are simplified by saving all the incoming messages intact. I wish we'd taken that approach three years ago. :-/ Maybe we'll start better late than never.

_____________ Hewlett-Packard Computer Peripherals Bristol Tom von Alten

          This posting is for informational purposes only.
          It is not a statement of the Hewlett-Packard Co.
Received on Fri 22 Oct 1999 05:13:01 PM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:51 PM GMT