Re: Getting started (Goals for the next generation hypermail "system")

From: Gary Adams - SMI Software Development <Gary.Adams_at_East.Sun.COM_at_hypermail-project.org>
Date: Thu, 23 Apr 1998 10:23:00 -0400 (EDT)
Message-Id: <199804231423.KAA10835_at_zeppo.East.Sun.COM>


I never really thought of hypermail as a compute-intensive task. For the most part it's an offline document conversion task. One of the reasons I got back on the hypermail alias was to be a part of a more functional tool, e.g. more than just handling attachments better. (Personally, I'd been thinking about spinning off a Java implementation for my own use, perhaps with an initial RDF/XML approach.)

When hypermail maintenance stopped, most people here switched over to MHonArc, because it was a going concern. Has anyone compared hypermail and MHonArc features? Is one better than the other for particular tasks, e.g. more efficient for large folders, or better suited for large projects?

Personally, I find most mail archives useless without an accompanying search interface. Is there anything that can be done to make that a natural part of the parsing subsystem? (Today, a search engine is typically applied to the generated HTML, with some loss in the document filtering, e.g. bad search hits on next and previous headers and footers as opposed to the actual message or attachments.)
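
To make the idea concrete, here is a minimal sketch of indexing at parse time, so that only message bodies (not generated navigation headers and footers) produce search hits. The function names `index_message` and `search` are hypothetical and nothing here is existing hypermail code; it assumes single-part plain-text messages.

```python
import re
from collections import defaultdict
from email import message_from_string

def index_message(index, msg_id, raw_message):
    """Add the body words of one parsed message to an inverted index."""
    msg = message_from_string(raw_message)
    # Index only the text payload; headers and generated page
    # navigation never enter the index. Multipart is skipped here.
    body = msg.get_payload() if not msg.is_multipart() else ""
    for word in set(re.findall(r"[a-z0-9]+", body.lower())):
        index[word].add(msg_id)

def search(index, query):
    """Return ids of messages whose body contains every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)
```

A caller would keep one `defaultdict(set)` as the index and call `index_message` once per message as it is parsed; an external indexing agent could replace the in-memory dictionary.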

Since computation is equally well supported on the client or server side of the browser/HTTP web environment, I'd like to get more powerful access to mail archives with traditional database capabilities. Hypermail already provides preformatted indexes, linked records, and formatted metadata in a fairly static form. Why not allow some dynamic presentation of those data structures? e.g. a folded outline of the threaded index in an applet viewer.
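
As a rough illustration of a folded outline, the sketch below builds a thread tree from (id, parent id, subject) records and renders only the expanded branches. The record layout and the names `build_threads` and `outline` are assumptions for illustration, not hypermail's actual data structures; a real viewer would drive a tree widget instead of returning text lines.

```python
from collections import defaultdict

def build_threads(messages):
    """Group messages under their parent using (id, parent_id, subject) tuples."""
    children = defaultdict(list)
    ids = {msg_id for msg_id, _, _ in messages}
    for msg_id, parent_id, subject in messages:
        # A message whose parent is missing from the archive starts a thread.
        key = parent_id if parent_id in ids else None
        children[key].append((msg_id, subject))
    return children

def outline(children, expanded, parent=None, depth=0):
    """Return outline lines, descending only into ids listed in `expanded`."""
    lines = []
    for msg_id, subject in children.get(parent, []):
        has_kids = msg_id in children
        is_open = has_kids and msg_id in expanded
        # "-" marks an open branch, "+" a folded one, " " a leaf.
        marker = "-" if is_open else "+" if has_kids else " "
        lines.append("  " * depth + marker + " " + subject)
        if is_open:
            lines.extend(outline(children, expanded, msg_id, depth + 1))
    return lines
```

Toggling a branch is then just adding or removing its id from the `expanded` set and rendering again.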

http://java.sun.com/docs/books/tutorial/ui/swing/tree.html
http://java.sun.com/products/jfc/swingdoc-archive/jtable.html
http://java.sun.com/products/javahelp/features.html

>
> Writing it in C gives a few advantages:
> - For a speed-intensive task like this, nothing else
> gives you the control needed (don't get me wrong - Python is
> cool, just not something I'd personally choose to use here)

Performance requirements should address

> - We can mmap() in the mailbox file and speed up parsing a lot.

        We can unwind the loops in assembly language :-)         
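
For concreteness, the mmap() suggestion quoted above amounts to something like the following sketch, written in Python rather than C only to keep it short. It assumes a non-empty mbox file in which every line starting with "From " begins a new message (real mbox variants differ on quoting); `message_offsets` is a hypothetical name.

```python
import mmap

def message_offsets(path):
    """Return the byte offsets at which each mbox message starts."""
    offsets = []
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS pages data in as it is scanned.
        # Length 0 means "entire file" and fails on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if mm[:5] == b"From ":
                offsets.append(0)
            pos = mm.find(b"\nFrom ")
            while pos != -1:
                offsets.append(pos + 1)  # skip the newline itself
                pos = mm.find(b"\nFrom ", pos + 1)
    return offsets
```

Whether this is measurably faster than buffered reads depends on the platform and mailbox size, which is a portability question as much as a speed one.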

> - Other cool speedups (writev() for writing out the messages,
> for example) that require low-level manipulations.
> - There's probably more, I'm just not thinking of it right now.

	Some attention should be paid to portability if you want hypermail
	widely deployed again.
	
	I think there should also be a goal of extensibility. The Apache
	module architecture made it possible for lots of different orthogonal 
	contributions to be made to the basic web server architecture, because 
	the request/reply transaction was decomposed into incremental 
	stages where appropriate value-added capabilities could be added.
	e.g. I already mentioned the desire to contact an "external 
	indexing agent" when a message was fully parsed. It would also be good
	to allow a per attachment filter to be configured based on the type
	of the attachment (why not do the pdf to text conversion at 
	archive filtering time?)
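
A minimal sketch of what such staged extensibility could look like, assuming a hypothetical `Pipeline` class with two hook points: a per-MIME-type attachment filter and a callback fired once a message is fully parsed. All names and the message dictionary layout are illustrative only; this is not Apache's module API or anything in hypermail today.

```python
class Pipeline:
    """Hypothetical staged archiver with pluggable hooks."""

    def __init__(self):
        self.attachment_filters = {}  # MIME type -> converter function
        self.parsed_hooks = []        # called once per fully parsed message

    def on_attachment(self, mime_type, func):
        """Register a converter for attachments of the given MIME type."""
        self.attachment_filters[mime_type] = func

    def on_message_parsed(self, func):
        """Register a callback, e.g. an external indexing agent."""
        self.parsed_hooks.append(func)

    def process(self, message):
        """message: dict with 'id', 'body', 'attachments' [(mime_type, data)]."""
        texts = [message["body"]]
        for mime_type, data in message["attachments"]:
            convert = self.attachment_filters.get(mime_type)
            if convert is not None:
                # Attachments with no registered filter are left alone.
                texts.append(convert(data))
        for hook in self.parsed_hooks:
            hook(message["id"], texts)
        return texts
```

A PDF-to-text converter would be registered with `on_attachment("application/pdf", ...)`, and the indexing agent with `on_message_parsed(...)`, without either knowing about the other.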
	

>
> I would suggest using 'glib' (a set of utility classes for C, i.e. sane
> string manipulation, hash tables, tree, linked list, etc.). It would give
> us much better memory management, among other things, as well as GString -
> the infinitely-long buffer :)

>
> Perhaps we must rewrite hypermail for sanity, though - existing code has
> no concept of "buffer overflow" no matter how hard you try ;-)

How much code are we talking about today? 100, 1000, 10000 lines of code?

Sorry for the long reply. My hope is that I'll be able to contribute some Java and search engine experience toward getting more out of the generated archive indexes and formatted messages, i.e. hood ornaments rather than new V8 engines.

Received on Fri 24 Apr 1998 11:51:45 PM GMT
