RE: threading is a risky business

From: Tom von Alten <tom_vonalten_at_boi.hp.com_at_hypermail-project.org>
Date: Thu, 14 Oct 1999 16:05:30 -0600
Message-ID: <001f01bf1690$3a62abf0$c9d1020f_at_alien-nt.boi.hp.com>


Daniel Stenberg writes:
> 1. The thread index requires a mail that is not a reply to start a
> thread with.
>
> 2. The function hashreplynumlookup() (struct.c:362) does its best
> to look up possible replies.
>
> 3. In the problematic mailbox we have identical subjects "Re:
> Beschwerde" in all three mails, but only one of them has an
> In-Reply-To: header.

...
> 4. All this taken together, mail 0 is considered the reply to
> message 2 and the other two are considered replies to mail 0.
>
> I really can't see any good way out of this, other than not matching so
> willingly on subjects, but that will make hypermail less effective when
> clients that don't support In-Reply-To and similar headers are used.
>
> Anyone with ideas or suggestions around this?

I'm not exactly sure what you mean by #4, but it sounds like a circular reference. Is that what made the thread disappear?
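Taking #4 literally, the loop can be checked mechanically. The following is a small Python sketch, not hypermail's code (the `find_cycle` name and the parent map are made up for illustration). Each message points at the message it is taken to be a reply to, and following those pointers from mail 0 leads back to mail 0 via mail 2.

```python
def find_cycle(parent):
    """Return the messages on a parent-pointer loop, or None if there is none.

    parent maps a message number to the number it is treated as a reply to;
    messages that start a thread have no entry.
    """
    for start in parent:
        path = []
        node = start
        while node in parent:
            if node in path:
                return path[path.index(node):]  # came back: a loop
            path.append(node)
            node = parent[node]
    return None


# Point 4 as stated: 0 is a reply to 2, and 1 and 2 are replies to 0.
print(find_cycle({0: 2, 1: 0, 2: 0}))  # [0, 2]
```

If every message in a thread has a parent, none of them is a root, so an index built by walking down from root messages would never reach them. That is one plausible way a thread could disappear.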

Without diving into the code, it occurs to me that there needs to be a defined algorithm with no ties and no endless (circular) reply chains. It seems like the date (sure, but which one?!) should be a consideration. In the example you give, the way out might be that we find 0 is a reply to 2, 1 is a reply to 0, and 2 is a reply to 0... but disallow that last link as circular. So, this thread is
  2
   0
    1
and go on.
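The rule above (accept reply links one at a time and refuse any link that would make a message its own ancestor) could look like this in Python. It is only a sketch: `build_thread` and `render` are hypothetical names, and the order of the candidate links is assumed to be the tie-breaker. Sorting them by date first would be one way to make that order well defined.

```python
def build_thread(candidates):
    """Accept (child, parent) links in the given order; return child -> parent.

    A child keeps the first parent it is given, and a link is refused if
    the proposed parent already descends from the child (that would be a loop).
    """
    parent = {}
    for child, par in candidates:
        if child in parent:
            continue  # first parent wins, so there are no ties
        # Walk up from the proposed parent; reaching the child means a loop.
        node = par
        while node is not None and node != child:
            node = parent.get(node)
        if node is None:
            parent[child] = par
    return parent


def render(parent, messages):
    """Return the thread as lines indented one space per level."""
    lines = []

    def walk(msg, depth):
        lines.append(" " * depth + str(msg))
        for kid in messages:
            if parent.get(kid) == msg:
                walk(kid, depth + 1)

    for msg in messages:
        if msg not in parent:  # roots: messages with no accepted parent
            walk(msg, 0)
    return lines


# The links from #4, in this order: 0 under 2, then 1 and 2 under 0.
tree = build_thread([(0, 2), (1, 0), (2, 0)])
print(tree)  # {0: 2, 1: 0}
print("\n".join(render(tree, [0, 1, 2])))
```

With the links in that order, the last one (2 under 0) is the one refused, which gives the same tree as above: 2, then 0, then 1.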

As for suggestions, I have one that probably can't fly for v2, but it's something to consider:

When I first started hacking on v1.02 and the wrapper script a coworker gave me, I worked to improve threading based on the subject field alone. I quickly got lost in the cross-references in the hypermail code and chose to work on the wrapper shell script instead.

In outline, I tried to recognize and remove prefixes (including variations in their capitalization) and extract a string I called the "thread" (the title, if you will).
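A minimal sketch of that prefix stripping in Python. The prefix list is an assumption: "Re" and "Fwd"/"Fw", plus "AW" and "WG", the German Outlook forms of "Re" and "Fwd", which seem relevant given the "Beschwerde" mailbox. `thread_of` is a hypothetical name, not taken from hypermail or from the wrapper script described here.

```python
import re

# Assumed prefix list: "Re" and "Fwd"/"Fw", plus the German Outlook forms
# "AW" (Antwort) and "WG" (Weitergeleitet). Any number of them, in any
# capitalization, with optional spaces before the colon.
PREFIXES = re.compile(r"^(?:\s*(?:re|aw|fwd?|wg)\s*:)+\s*", re.IGNORECASE)


def thread_of(subject):
    """Strip leading reply/forward prefixes and collapse runs of whitespace."""
    return " ".join(PREFIXES.sub("", subject).split())


print(thread_of("RE: Re: re:  Beschwerde"))  # Beschwerde
```

Requiring the colon keeps ordinary words such as "Reading" from being mistaken for a prefix.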

After trying various approaches to managing this, I ended up with a separate file of threads which I use for comparison to incoming messages. If I find a match, I stick "Re: " in front of the thread string, and feed it to hypermail, which threads it by the subject.
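The thread-file lookup might be sketched like this in Python. It assumes (hypothetically) a file with one thread title per line, case-insensitive matching, and new titles being remembered as they arrive; `load_threads` and `rewrite_subject` are illustrative names, not taken from the actual wrapper script.

```python
import re

# Assumed prefix list, as for any subject normalizer: Re, Fw/Fwd, AW, WG.
PREFIXES = re.compile(r"^(?:\s*(?:re|aw|fwd?|wg)\s*:)+\s*", re.IGNORECASE)


def load_threads(lines):
    """Build a case-insensitive lookup from an iterable of lines, such as an
    open threads file with one title per line (hypothetical format)."""
    return {t.strip().casefold(): t.strip() for t in lines if t.strip()}


def rewrite_subject(subject, threads):
    """Return the subject to feed to hypermail.

    A known thread comes back as "Re: " plus the stored title; an unknown
    one is remembered as a new thread and passed through unchanged.
    """
    title = " ".join(PREFIXES.sub("", subject).split())
    key = title.casefold()
    if key in threads:
        return "Re: " + threads[key]
    threads[key] = title
    return subject


threads = load_threads(["Beschwerde\n"])
print(rewrite_subject("AW: AW: beschwerde", threads))  # Re: Beschwerde
print(rewrite_subject("New topic", threads))           # New topic
```

Matched messages always get the stored spelling of the title, so hypermail sees an identical subject no matter how the prefixes were written.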

It makes sense to strongly consider both "In-Reply-To" and the subject. At this point, I'd vote for whichever is closer to "done." Broken threading is not too big a tragedy, much as we'd like to avoid it. (Missing threads need to be fixed, of course!)
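One way to weigh both, sketched in Python under stated assumptions: In-Reply-To wins whenever it names a message in the archive; otherwise the message is attached to the earliest message with the same thread title, with (date, message number) as the tie-breaker. Because a subject match may only point at a strictly earlier message, subject links alone cannot form a loop. `Message`, `index` and `choose_parent` are hypothetical names, and the mailbox below is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    num: int                           # position in the mailbox
    msgid: str                         # Message-ID header
    thread: str                        # subject with prefixes stripped
    date: float                        # e.g. seconds since the epoch
    in_reply_to: Optional[str] = None  # In-Reply-To header, if any


def index(messages):
    """Map Message-IDs to messages, and each thread title to its earliest
    message, using (date, num) so there are never ties."""
    by_id = {m.msgid: m for m in messages}
    earliest = {}
    for m in messages:
        cur = earliest.get(m.thread)
        if cur is None or (m.date, m.num) < (cur.date, cur.num):
            earliest[m.thread] = m
    return by_id, earliest


def choose_parent(msg, by_id, earliest):
    """Return the number of msg's parent, or None if msg starts a thread."""
    # A known In-Reply-To target always wins.
    if msg.in_reply_to in by_id:
        return by_id[msg.in_reply_to].num
    # Otherwise fall back to the subject, but only to a strictly earlier
    # message, so subject links alone can never form a loop.
    first = earliest.get(msg.thread)
    if first is not None and (first.date, first.num) < (msg.date, msg.num):
        return first.num
    return None


# Hypothetical mailbox: three "Beschwerde" mails; mail 0's In-Reply-To
# names a message that is not in the archive.
msgs = [
    Message(0, "<a>", "Beschwerde", 300.0, in_reply_to="<x>"),
    Message(1, "<b>", "Beschwerde", 200.0),
    Message(2, "<c>", "Beschwerde", 100.0),
]
by_id, earliest = index(msgs)
parents = {m.num: choose_parent(m, by_id, earliest) for m in msgs}
print(parents)  # {0: 2, 1: 2, 2: None}
```

Subject-only replies all hang off the thread's first message, so the result is flatter than the real conversation. An In-Reply-To link that points forward in time (a skewed clock, say) could still close a loop together with a subject link, so a cycle check would still be worth keeping.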

Cheers,
_____________
Hewlett-Packard Computer Peripherals Bristol
Tom von Alten
mailto:Tom_vonAlten_at_boi.hp.com

          This posting is for informational purposes only.
          It is not a statement of the Hewlett-Packard Co.
Received on Fri 15 Oct 1999 12:06:16 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:51 PM GMT