Hello, everyone,
Here's the snippet of code I mentioned previously; it's pretty brainless in that it assumes that words after "<b" or "<S" (the way hypermail currently formats "Next Message", etc.) should not be indexed.
/* html2txt_hypermail.c -- a hack of the original html2txt.c
for use with a hypermail archive. Skips material on any given line
following "<b" or "<S", which we assume means the line in question is
an "administrative" line in the hypermail-generated HTML file. Avoids
excessive meaningless references when glimpse is invoked.
To use, compile with "gcc -o html2txt_hypermail html2txt_hypermail.c",
and edit .glimpse_filters in the relevant directory to call
html2txt_hypermail rather than html2txt.
Allin Cottrell (cottrell_at_wfu.edu), December 1997
June 5, 1998: added skip for lines beginning with "<s" (for "<strong>"), for compatibility with new hypermail. Untested!!!
*/
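#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int c;

    while (1) {
        c = getchar();
        if (c == EOF) exit(0);   /* end of input is the normal way out */
        if (c != '<') {
            putchar(c);
        } else {
            c = getchar();
            if (c == EOF) exit(0);
            if (c != 'b' && c != 'S') {
                /* ordinary tag: discard everything through the closing '>' */
                while (c != '>') {
                    c = getchar();
                    if (c == EOF) exit(0);
                }
            } else {
                /* "<b" or "<S": discard the rest of the line, keep the newline */
                while (c != '\n') {
                    c = getchar();
                    if (c == EOF) exit(0);
                }
                putchar('\n');
            }
        }
    }
}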
Cheers,
-darci
On Thu, 10 Sep 1998, Robert J. Lebowitz wrote:
> I assume you're trying to exclude phrases, not just specific words???
>
> No, I don't believe that Swish++ has this capability. You can specify a
> list of stop words but I don't think that it can be set to identify phrases.
>
> -----Original Message-----
> From: Allan Schaffer <allan_at_southpark.engr.sgi.com>
> To: hypermail_at_landfield.com <hypermail_at_landfield.com>
> Date: Thursday, September 10, 1998 3:49 PM
> Subject: Re: Searching hypermail
>
>
> >On Sep 10, 1:22pm, Robert J. Lebowitz wrote:
> >> Not true!! There is a new product called Swish++ available at
> >> http://www.best.com/~pjl/
> >> Much better than the original and incredibly fast.
> >
> >Swish comes pretty close to suiting my needs too but (as
> >minerva_at_phix.com mentioned) there doesn't seem to be a way to
> >suppress the indexing of the words in "Next Message", "Previous
> >Message", etc. Do you know if Swish++ has a way to get around this?
> >
> >Allan
> >
> >--
> >Allan Schaffer allan_at_sgi.com
> >Silicon Graphics http://reality.sgi.com/allan
>
--
information is not knowledge. knowledge is not wisdom. wisdom is not truth.
truth is not beauty. beauty is not love. love is not music. music is the best.
-- FZ

Received on Fri 11 Sep 1998 01:10:35 AM GMT