On Wed, 5 May 1999, Paul Haldane <paul.haldane_at_newcastle.ac.uk> wrote:
>
> On Wed, 5 May 1999, Daniel Stenberg <daniel.stenberg_at_sth.frontec.se> wrote:
> >
> > On Tue, 4 May 1999, Paul Haldane wrote:
> > >
> > > Any more thoughts on what (if anything) we should do with messages with
> > > duplicate message ids?
> > >
> > > My inclination is to stick with what we do now - don't try to add the
> > > duplicates to the web archive but put out a warning message so that a
> > > human can fix things by hand.
> >
> > I think we could start with trying to think of reasons why this happens in
> > the first place. How do you add several mails to the archive using the same
> > Message-ID? Does it ever actually occur that two different mails have the
> > same ID?
>
> It does (see my previous messages). It _shouldn't_ happen but (because of
> broken mail systems) it does. In a previous existence I did some work on
> loop avoidance in a mailing list manager. One of the techniques I used
> was suppressing messages with the same msgid. I soon found that there
> were some (a few) systems out there that don't generate unique msgids. We
> made the decision to recognise those messages and skip the check.
>
> We're talking about a small number of messages here (in the test mailboxes
> I'm using at the moment, perhaps 4-5 out of 1000) but obviously this
> depends on the MUAs in use by people sending the mail that hypermail is
> archiving.
Daniel, Paul, et al.
My personal preference would be to include messages with duplicate IDs in the HTML archive. In my case, we use a mailing list agent that checks for duplicates before a message ends up in the mbox files we are archiving. That check is enough for me. (The MLA we use combines a Message-Id: check with an MD5 checksum of the message body, plus a few other things such as the SMTP envelope address, to determine whether a message is a duplicate.) If a message gets through that check, I *want* it in my archive even if its message ID matches one already there.
If hypermail is going to do a duplicate check, I would prefer a switch to turn the feature on and off.
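To make the idea concrete, here is a rough sketch in Python of the kind of combined check and on/off switch described above. It is not our MLA's code and not anything in hypermail: the function names, the `check_duplicates` flag, and the exact choice of key fields are assumptions made up for illustration.

```python
import hashlib
from email import message_from_string


def duplicate_key(raw_message, envelope_sender):
    """Build a comparison key from Message-Id, an MD5 of the body,
    and the SMTP envelope sender (all three must match to be a duplicate)."""
    msg = message_from_string(raw_message)
    message_id = (msg.get("Message-Id") or "").strip()
    # Treat everything after the first blank line as the body.
    body = raw_message.partition("\n\n")[2]
    body_md5 = hashlib.md5(body.encode("utf-8", "replace")).hexdigest()
    return (message_id, body_md5, envelope_sender.lower())


def filter_messages(messages, check_duplicates=True):
    """Return the messages to archive.

    messages is a list of (raw_message, envelope_sender) pairs.
    check_duplicates is the hypothetical switch: when False,
    every message is kept regardless of its key.
    """
    seen = set()
    kept = []
    for raw, sender in messages:
        key = duplicate_key(raw, sender)
        if check_duplicates and key in seen:
            continue  # same id, same body, same sender: skip it
        seen.add(key)
        kept.append((raw, sender))
    return kept
```

With a key like this, two different mails that merely share a Message-Id still end up in the archive, because their body checksums differ; only true repeats are dropped, and only when the switch is on.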
--
Craig A. Summerhill, Systems Coordinator and Program Officer
Coalition for Networked Information
21 Dupont Circle, N.W., Washington, D.C. 20036
Internet: craig_at_cni.org   AT&Tnet (202) 296-5098

Received on Thu 06 May 1999 12:12:07 AM GMT
This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:50 PM GMT