Re: Suggestions for hypermail - I may even be willing to help from Craig A Summerhill on 1999-03-16 (Hypermail Development List)

From: Craig A Summerhill <craig_at_cni.org_at_hypermail-project.org>
Date: Tue, 16 Mar 1999 20:15:42 -0500 (EST)
Message-Id: <9903170115.AA22983_at_a.cni.org>

On Tue, 16 Mar 1999, Daniel Stenberg <daniel.stenberg_at_sth.frontec.se> wrote:
>
> On Sun, 14 Mar 1999, Fred Cohen <fc_at_all.net> wrote:
> >
> > 2) Create a cross-reference by author that crosses all archive sets (I
> > do things by month, but would like the author index to cross all time).
>
> That would mean that hypermail should by itself get a better grip of
> time-split archives, I mean where to find different archives from
> different times etc.

This strikes me as something that is going to be considerably more difficult than what hypermail is currently doing... and is more apropos to a piece of indexing software than to hypermail (see comments at end of note).
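For what it's worth, here is a minimal Python sketch of the kind of cross-archive author index Fred is asking for. It is not something hypermail does; the function name, the archive labels, and the (author, subject, url) tuples are all hypothetical, and it assumes something else has already extracted those fields from each monthly archive.

```python
from collections import defaultdict

def build_author_index(archives):
    """Merge per-period message lists into one index keyed by author.

    `archives` maps an archive label such as "1999-03" (hypothetical
    naming) to a list of (author, subject, url) tuples.
    """
    index = defaultdict(list)
    # Visit archives in label order so each author's entries come out
    # oldest first when labels are YYYY-MM strings.
    for label in sorted(archives):
        for author, subject, url in archives[label]:
            # Fold case and stray whitespace so "Fred Cohen" and
            # "fred cohen " land under the same key.
            key = author.strip().lower()
            index[key].append((label, subject, url))
    return dict(index)
```

The hard part is everything this sketch assumes away: knowing where the other archives live and keeping the merged index current as they grow.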

> > 3) Dates should optionally be allowed to appear as YYYY MM DD format -
> > particularly nice for sorting!
>
> Ideally, the dates should be formatted using some kind of "%y %m %d"-type
> coded strings.

Given that it is 16 March 1999, I think that should be %Y %m %d... B^)
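To illustrate the difference, a short Python sketch using the standard strftime codes (the sample dates and variable names are mine, chosen only for the example). A two-digit %y sorts dates from 2000 ahead of dates from 1999, while %Y keeps them in order.

```python
from datetime import date

dates = [date(1999, 12, 31), date(2000, 1, 1)]

# %y gives a two-digit year, %Y a four-digit year.
two_digit = [d.strftime("%y %m %d") for d in dates]
four_digit = [d.strftime("%Y %m %d") for d in dates]

# Sorting on the formatted strings: with %y the year 2000 ("00")
# sorts before 1999 ("99"); with %Y the order is chronological.
sorted_two = sorted(dates, key=lambda d: d.strftime("%y %m %d"))
sorted_four = sorted(dates, key=lambda d: d.strftime("%Y %m %d"))
```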

(I know a couple of folks posted some notes last week or the week before about dates not sorting properly, and Daniel replied. I didn't read that stuff too carefully, so I hope this isn't a re-hash...)

Actually, one of the problems with many of the archive files I work with is that the date format exists in the Date: field of the message in several different formats:

   Date: Tue, 02 Mar 1999 15:12:22
   Date: Tue, 2  Mar 1999 15:12:22
   Date: Tue, 02 Mar 1999 15:12:22 -0500
   Date: 2 Mar 99 15:12:22
   Date: Tue, 02 Mar 99 15:12:22 -0800

I wouldn't mind seeing these date formats all normalized to some extent in the HTML markup. (It might not be a bad idea to maintain the original format as a , but display a Zulu or GMT time. Or vice versa.)
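As a rough sketch of what that normalization could look like, here is a bit of Python using the standard library's email.utils.parsedate_tz, which copes with all five variants above. The function name normalize_date and the output layout are my own choices, not anything hypermail does. Note one loud assumption: when the header carries no zone at all, the sketch treats the time as already being GMT, which may well be wrong for a given list.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_tz

def normalize_date(header_value):
    """Return 'YYYY-MM-DD HH:MM:SSZ' in GMT, or None if unparseable.

    `header_value` is the text after 'Date:'.
    """
    parsed = parsedate_tz(header_value)
    if parsed is None:
        return None
    # parsed[9] is the zone offset in seconds east of GMT, or None
    # when the header has no zone. ASSUMPTION: no zone means GMT.
    offset = parsed[9] or 0
    stamp = datetime(*parsed[:6])
    gmt = stamp - timedelta(seconds=offset)
    return gmt.strftime("%Y-%m-%d %H:%M:%SZ")

samples = [
    "Tue, 02 Mar 1999 15:12:22",
    "Tue, 2  Mar 1999 15:12:22",
    "Tue, 02 Mar 1999 15:12:22 -0500",
    "2 Mar 99 15:12:22",
    "Tue, 02 Mar 99 15:12:22 -0800",
]
normalized = [normalize_date(s) for s in samples]
```

Two-digit years are expanded by parsedate_tz itself (99 becomes 1999), so the sketch does not need to special-case them.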

Beyond sorting, as Fred points out, if you are using a simple text indexing tool (like SWISH-E for example), it is really difficult to limit your search to any meaningful time period -- short of saving indexes for each smaller time slice -- which makes searching the entire repository a drag. With a consistent date format in the HTML it would be fairly easy to put an HTML search form up that would allow people to limit their search to a particular time period (useful on active lists which have 10 years of archives...) In fact, I have a one-click link which allows people to see the "most recent" for each of our lists (usually the current month).

Anyway, I do think the date handling is something that falls under the purview of hypermail and could be cleaned up.
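To show why a consistent format helps with limiting a search by time, here is a small Python sketch. It assumes each message already carries a normalized date string beginning with YYYY-MM-DD; the function name and the (date, subject) tuples are hypothetical. With that layout, a date range check is just a string comparison.

```python
def limit_to_period(messages, start, end):
    """Keep messages dated between start and end, inclusive.

    `messages` is a list of (normalized_date, subject) tuples where
    the date begins with 'YYYY-MM-DD'; `start` and `end` use the
    same 'YYYY-MM-DD' layout.
    """
    # Zero-padded, year-first dates compare correctly as plain
    # strings, so no date parsing is needed here.
    return [m for m in messages if start <= m[0][:10] <= end]

messages = [
    ("1999-02-27 09:10:00Z", "Older thread"),
    ("1999-03-02 20:12:22Z", "Date formats"),
    ("1999-03-16 01:15:42Z", "Suggestions for hypermail"),
    ("1999-04-01 12:00:00Z", "Next month"),
]
march = limit_to_period(messages, "1999-03-01", "1999-03-31")
```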

> > The idea would be to create one giant archive that allows views by
> > author, by date, etc. - but with hierarchies of dates (see all, see
> > year, see month, see week, see day, (in my case) see hour.
>
> I'd love that!

As noted above, if the dates were normalized, you could probably throw together a system to do this now using a search form that does a date search against a SWISH-E index. Searching for date ranges would be less than optimal this way, and of course, you would lack the ability to sort the results by author, date, subject, and/or thread.
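Outside of a search index, the year/month/day hierarchy itself is not much code once the dates are normalized. A minimal Python sketch follows, assuming timestamps in a 'YYYY-MM-DD HH:MM:SS' layout; the function name and the sample data are hypothetical, and this says nothing about how hypermail would store or render such views.

```python
from collections import defaultdict
from datetime import datetime

def build_hierarchy(messages):
    """Group (timestamp, subject) pairs as tree[year][month][day]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for stamp, subject in messages:
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        tree[dt.year][dt.month][dt.day].append(subject)
    return tree

def subjects_in_month(tree, year, month):
    """Flatten one month of the tree into a list, in day order."""
    days = tree[year][month]
    return [s for day in sorted(days) for s in days[day]]

tree = build_hierarchy([
    ("1999-03-16 20:15:42", "Re: Suggestions for hypermail"),
    ("1999-03-14 11:02:00", "Suggestions for hypermail"),
    ("1999-02-20 08:30:00", "Dates not sorting"),
])
```

An hour level could be added the same way, by nesting one more dictionary keyed on dt.hour.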

> > 4) I would like to see a command line option for adding a mailbox to an
> > existing archive.
>
> Doesn't it already?

Um. I'm not sure it does, Daniel.

To be honest, I never got the incremental addition working with the older version 1.x hypermail. My recollection of this function is that I could incrementally add one message to an existing archive by piping the message through hypermail when it arrives, but that there was no facility for adding a large number of messages to an existing archive. I ended up with corrupted indexes as I recall...

(I haven't focused on this ability in the new version 2.x hypermail, but I should test again. I'll try later this week.)

As a general comment re: Fred's message, I want to say...

I appreciate the comments from Fred Cohen, and would add one additional comment. It would be nice to have the indexes for those kinds of views (by month, by week, etc.) generated dynamically.

However (and I think it is a big however), I am not sure those functions should necessarily be folded into hypermail. As Daniel noted, there is a real issue regarding who is doing the work. Secondly, it is probably worth considering whether this kind of functionality is consistent with the philosophy of how hypermail operates.

The kind of functions Fred requests are fairly easily accomplished using a backend DBMS or some full text indexing software and some intelligent CGI scripting/programming. I don't view hypermail as a tool which is trying to (or going to) gain the high ground in that kind of performance. It (hypermail) on the other hand is focused on relatively simple and static markup and displays of data (FIFO).

I've got access to both kinds of systems here. We have Dataware's BRS/SEARCH with C/Perl CGI interfaces that allow users to perform all kinds of complicated searches (including date limits, proximity searching in text, etc.). The reason I want to use hypermail is to get away from having to maintain BRS databases for our list archives. I have come to the conclusion that hypermail is better suited for the vast majority of list archives we want to make accessible.

   o  First off, disk is cheap.
   o  Secondly, once it is processed in hypermail, it never has to get
      touched again (dynamic views constantly re-read the same data).
   o  Finally, web crawlers and indexing tools can take advantage of
      the messages as hypermail marks them up, while the data stored
      in the BRS database is inaccessible.  This has advantages,
      sometimes.

Anyway, this is just my $.02...

-- 

   Craig A. Summerhill, Systems Coordinator and Program Officer
   Coalition for Networked Information
   21 Dupont Circle, N.W., Washington, D.C.   20036
   Internet: craig_at_cni.org   AT&Tnet (202) 296-5098

Received on Wed 17 Mar 1999 03:23:21 AM GMT

This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:11 AM GMT