Re: Duplicate message ids

From: Craig A Summerhill <craig_at_cni.org_at_hypermail-project.org>
Date: Wed, 5 May 1999 18:06:42 -0400 (EDT)
Message-Id: <9905052206.AA18025_at_a.cni.org>


On Wed, 5 May 1999, Paul Haldane <paul.haldane_at_newcastle.ac.uk> wrote:
>
> On Wed, 5 May 1999, Daniel Stenberg <daniel.stenberg_at_sth.frontec.se> wrote:
> >
> > On Tue, 4 May 1999, Paul Haldane wrote:
> > >
> > > Any more thoughts on what (if anything) we should do with messages with
> > > duplicate message ids?
> > >
> > > My inclination is to stick with what we do now - don't try to add the
> > > duplicates to the web archive but put out a warning message so that a
> > > human can fix things by hand.
> >
> > I think we could start with trying to think of reasons why this happens in
> > the first place. How do you add several mails to the archive using the same
> > Message-ID? Does it ever actually occur that two different mails have the
> > same ID?
>
> It does (see my previous messages). It _shouldn't_ happen but (because of
> broken mail systems) it does. In a previous existence I did some work on
> loop avoidance in a mailing list manager. One of the techniques I used
> was suppressing messages with the same msgid. I soon found that there
> were some (a few) systems out there that don't generate unique msgids. We
> made the decision to recognise those messages and skip the check.
>
> We're talking about a small number of messages here (in the test mailboxes
> I'm using at the moment, perhaps 4-5 out of 1000) but obviously this
> depends on the MUAs in use by people sending the mail that hypermail is
> archiving.

Daniel, Paul, et al.

My personal preference would be to include messages with duplicate IDs in the HTMLed archive. In my case, we are using a mailing list agent which checks for duplicates before the message ends up in the mbox files we are archiving. That check is plenty for me. (The MLA we use employs a combination of a Message-Id: check and an MD5 checksum on the body of the message, as well as a few other things like the SMTP envelope address, to determine whether a message is a duplicate.) If a message gets through that check, I *want* it in my archive even if it has a message id matching one already there.

If hypermail is going to do a duplicate check, I would prefer a switch to turn the feature on and off.
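
To make the idea concrete, here is a rough Python sketch of a switchable duplicate check along the lines described above. It is not hypermail code and not the code of the MLA mentioned; the class name DuplicateFilter, the enabled flag, and the rule that a message counts as a duplicate only when the Message-Id, the MD5 of the body, and the envelope sender all match are assumptions made for illustration.

```python
import hashlib
from email import message_from_string


class DuplicateFilter:
    """Hypothetical duplicate check with an on/off switch (illustration only)."""

    def __init__(self, enabled=True):
        # The "switch": when False, no message is ever reported as a duplicate.
        self.enabled = enabled
        self.seen = set()

    def is_duplicate(self, raw_message, envelope_sender=""):
        if not self.enabled:
            return False

        msg = message_from_string(raw_message)
        msgid = (msg.get("Message-Id") or "").strip()

        # Flatten the body to text so it can be hashed.
        if msg.is_multipart():
            body = "".join(part.as_string() for part in msg.get_payload())
        else:
            body = msg.get_payload()

        digest = hashlib.md5(body.encode("utf-8")).hexdigest()

        # Assumed rule: all three values must match for a duplicate.
        key = (msgid, digest, envelope_sender.lower())
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

With a rule like this, two different mails that share a Message-Id (the broken-mail-system case) still reach the archive because their bodies hash differently, while an exact resend is caught. Setting enabled=False skips the check entirely for sites that already filter upstream.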

-- 

   Craig A. Summerhill, Systems Coordinator and Program Officer
   Coalition for Networked Information
   21 Dupont Circle, N.W., Washington, D.C.   20036
   Internet: craig_at_cni.org   AT&Tnet (202) 296-5098
Received on Thu 06 May 1999 12:12:07 AM GMT
