Re: Getting started (Goals for the next generation hypermail "system") from Nick Arnett on 1998-04-24 (Hypermail Development List)

From: Nick Arnett <arnett_at_alink.net_at_hypermail-project.org>
Date: Fri, 24 Apr 1998 17:29:33 -0700
Message-ID: <B0003655265_at_mail.mccmedia.com>

At 10:23 AM 4/23/98 -0400, Gary Adams - SMI Software Development wrote:

>Personally, I find most mail archives useless without an accompanying
>search interface. Is there anything that can be done to make that a natural
>part of the parsing subsystem? (Today, a search engine is typically
>applied to the generated HTML with some loss in the document filtering,
>e.g., bad search hits on next and previous headers and footers as opposed to
>the actual message or attachments.)

Dublin Core-based meta tags are a no-brainer and would help with field searching, obviously.
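A minimal sketch, in Python, of what those tags could look like for one archived message, using the common Dublin-Core-in-HTML convention of a schema.DC link plus DC.* meta names. The function name dublin_core_meta and the choice of which mail header maps to which Dublin Core element are assumptions for illustration, not something hypermail provides.

```python
from html import escape

# Namespace of the Dublin Core element set, declared via a schema.DC link.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(subject: str, author: str, date_iso: str, message_id: str) -> str:
    """Return <link>/<meta> lines describing one archived message.

    Hypothetical helper: the header-to-element mapping below is an
    illustrative assumption.
    """
    fields = [
        ("DC.title", subject),        # Subject: header
        ("DC.creator", author),       # From: header (display name)
        ("DC.date", date_iso),        # Date: header, normalized to ISO 8601
        ("DC.identifier", message_id),  # Message-ID: header
        ("DC.type", "Text"),
    ]
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for name, value in fields:
        # Escape <, >, & and quotes so header text cannot break the tag.
        lines.append(f'<meta name="{name}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)
```

A field-aware engine could then restrict a query to, say, DC.creator without matching the same name in quoted text.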

As for multiple messages per page, that's a bit more difficult for the average engine. I'm not aware of one that lets you mark text as not to be indexed. However, there's a somewhat ugly way around it: you index a plain-text version of the body, but use the URL of the HTML version as the key. The ugliness arises when you need byte offsets into the body of the retrieved document, for example to highlight search terms, because offsets in the text version don't line up with the HTML. But that's beyond most simple search engines and isn't especially important for short documents.
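A minimal sketch of that workaround, assuming each message is available both as a plain-text body and as an HTML page at a known URL. Only the text is tokenized, so header and footer navigation never produces hits, while every posting points at the HTML URL. The tokenizer, the AND-only query semantics, and the names build_index and search are hypothetical simplifications; no byte offsets are stored, which sidesteps the highlighting problem rather than solving it.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages):
    """Build an inverted index from (html_url, plain_text_body) pairs.

    Terms come only from the plain-text body, but each posting records
    the URL of the HTML page, so a hit sends the reader to the HTML version.
    """
    index = defaultdict(set)
    for url, body in messages:
        for term in tokenize(body):
            index[term].add(url)
    return index


def search(index, query: str) -> set[str]:
    """Return HTML URLs whose text body contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get() does not insert keys into the defaultdict.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, indexing ("0001.html", "Search interface for mail archives") and ("0002.html", "Dublin Core meta tags help field searching") and searching for "mail archives" returns only 0001.html; the file names are made up.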

I think the big win, for large collections, will come from returning search results in the context of message subjects and other forms of categorization. This takes an extra index (or multiple searches) that records the parent-child relationships between messages in each subject thread. The results list is made up of subjects, rather than individual messages. Selecting a subject shows the subject thread and the messages that "hit" for your search terms. That level of contextualizing search results should be part of any archiver that is intended to store more than a few thousand messages (in my somewhat-experienced opinion).
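A sketch of turning a flat hit list into a list of subjects, assuming the extra index is a hypothetical parent_of dictionary mapping each reply's message ID to its parent's ID, plus a subject_of dictionary for thread roots. Ranking threads by how many hits they contain is an assumption here; any relevance measure could be substituted.

```python
from collections import defaultdict


def thread_root(msg_id, parent_of):
    """Follow parent links (the extra thread index) up to the thread's root."""
    seen = set()
    # The seen set guards against malformed, cyclic parent links.
    while msg_id in parent_of and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id


def group_hits_by_thread(hits, parent_of, subject_of):
    """Return (thread subject, hit message IDs) pairs, most hits first."""
    groups = defaultdict(list)
    for msg_id in hits:
        groups[thread_root(msg_id, parent_of)].append(msg_id)
    # sorted() is stable, so threads with equal hit counts keep first-seen order.
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return [(subject_of[root], ids) for root, ids in ranked]
```

Selecting a subject from this list would then display that thread with the listed messages marked as hits.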

A big step beyond that is to categorize messages into a hierarchical subject tree, which I'm hoping to get to in the next couple of months. That topology will scale even better in terms of search.

>Since computations are equally supported on the client or server side of the
>browser/http web environment, I'd like to get more powerful access to
>mail archives with traditional database capabilities. Hypermail already
>provides preformatted indexes, linked records, and formatted metadata
>in a fairly static form. Why not allow some dynamic presentation of
>those data structures? e.g. a folded outline of the threaded index
>in an applet viewer.

Ah, you get it, too. But add this: when you can return search results into that same folded outline, those long, useless results lists become useful again. I'm not sure what's public or not yet, but there are a number of companies developing UIs for navigating knowledge trees, at least in the form of trees, if not graphs. To clarify that last point, Yahoo!'s subjects are a graph, not just a tree. The practical implication is that subjects can have multiple parents. That's not a requirement for what we're talking about here, I would hazard.
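Returning results into a folded outline can be sketched as pruning the thread tree: any branch with no hit underneath stays folded, and every path down to a hit stays open. This assumes a strict tree (one parent per message), as suggested above, and a hypothetical children_of dictionary; the indented text rendering is a stand-in for whatever tree control the UI actually uses.

```python
def folded_outline(node, children_of, subject_of, hits, depth=0):
    """Return outline lines for the subtree rooted at node.

    Only branches containing at least one hit are kept (unfolded);
    hit messages are tagged with "[hit]".
    """
    child_lines = []
    for child in children_of.get(node, []):
        child_lines.extend(folded_outline(child, children_of, subject_of, hits, depth + 1))
    if node not in hits and not child_lines:
        return []  # nothing relevant below: keep this branch folded
    tag = " [hit]" if node in hits else ""
    return [f"{'  ' * depth}- {subject_of[node]}{tag}"] + child_lines
```

With a hit only on a second-level reply, the outline shows just the root, the thread containing it, and the reply itself; sibling threads with no hits disappear.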

As I mentioned earlier, I'm going to be somewhat of a side player here because I'm working primarily in Visual Basic. One reason for that is the ready availability of UI components, such as a tree viewer, as ActiveX controls. But it also gives me the Jet database, which is the same one that Microsoft Access is based on. After doing this kind of thing in HyperCard years ago, I was dying for a database.

Nick

--

Phone/fax: (408) 733-7613  E-mail: narnett_at_mccmedia.com

Received on Sat 25 Apr 1998 02:34:45 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:49 PM GMT