Re: Duplicate message ids

From: Craig A Summerhill <>
Date: Wed, 5 May 1999 18:06:42 -0400 (EDT)
Message-Id: <>

On Wed, 5 May 1999, Paul Haldane <> wrote:
> On Wed, 5 May 1999, Daniel Stenberg <> wrote:
> >
> > On Tue, 4 May 1999, Paul Haldane wrote:
> > >
> > > Any more thoughts on what (if anything) we should do with messages with
> > > duplicate message ids?
> > >
> > > My inclination is to stick with what we do now - don't try to add the
> > > duplicates to the web archive but put out a warning message so that a
> > > human can fix things by hand.
> >
> > I think we could start with trying to think of reasons why this happens in
> > the first place. How do you add several mails to the archive using the same
> > Message-ID? Does it ever actually occur that two different mails have the
> > same ID?
> It does (see my previous messages). It _shouldn't_ happen but (because of
> broken mail systems) it does. In a previous existence I did some work on
> loop avoidance in a mailing list manager. One of the techniques I used
> was suppressing messages with the same msgid. I soon found that there
> were some (a few) systems out there that don't generate unique msgids. We
> made the decision to recognise those messages and skip the check.
> We're talking about a small number of messages here (in the test mailboxes
> I'm using at the moment, perhaps 4-5 out of 1000) but obviously this
> depends on the MUAs in use by people sending the mail that hypermail is
> archiving.

Daniel, Paul, et al.

My personal preference would be to include messages with duplicate IDs in the HTML archive. In my case, we are using a mailing list agent (MLA) which checks for duplicates before a message ends up in the mbox files we are archiving. That check is enough for me. (The MLA we use employs a combination of a Message-Id: check and an MD5 checksum on the body of the message, as well as a few other things like the SMTP envelope address, to determine whether a message is a duplicate.) If a message gets through that check, I *want* it in my archive even if it has a message ID matching one already there.

If hypermail is going to do a duplicate check, I would prefer a switch to turn the feature on and off.
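
To make the idea concrete, here is a rough Python sketch of a combined check with an on/off switch. It is not the code of the MLA we use, nor anything that exists in hypermail; the class name DuplicateFilter, the `enabled` flag, and the choice of key fields are all my own assumptions for illustration. It treats a message as a duplicate only when the Message-Id, the MD5 of the body, and the envelope sender all match one seen earlier.

```python
import hashlib


class DuplicateFilter:
    """Hypothetical duplicate check; names and fields are illustrative only."""

    def __init__(self, enabled=True):
        # The "switch": when disabled, nothing is ever reported as a duplicate.
        self.enabled = enabled
        self.seen = set()

    def is_duplicate(self, message_id, body, envelope_sender=""):
        if not self.enabled:
            return False
        # Combine Message-Id, an MD5 digest of the body, and the envelope sender.
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        key = (message_id.strip(), digest, envelope_sender.strip().lower())
        if key in self.seen:
            return True
        self.seen.add(key)
        return False


checker = DuplicateFilter(enabled=True)
for msgid, body in [("<1@example.org>", "hello"),
                    ("<1@example.org>", "hello"),
                    ("<1@example.org>", "a different message")]:
    status = "skip" if checker.is_duplicate(msgid, body) else "archive"
    print(msgid, status)
```

With a key like this, two different mails that happen to share a Message-Id (the broken-MUA case Paul describes) are both archived, while a true repeat of the same mail is skipped. Constructing the filter with enabled=False archives everything.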


