HTML Filter

From: Byron C. Darrah <>
Date: Mon, 30 Nov 1998 14:42:21 -0800 (PST)
Message-Id: <>

I just released a small change to the file below. I added a bunch of tags to the config file. The HTML filter now understands 50 HTML tags, which should enable it to work pretty well on just about any HTML input.


Date: Mon, 30 Nov 1998 11:12:52 -0800 (PST)
From: "Byron C. Darrah" <>

Alright, I put together a little something that I think will make a good start for an HTML filter. You can download it from:

Here's a little description of how it works:

  1. Comments, SGML commands, and unrecognized HTML tags are removed.
  2. Unmatched close tags are removed.
  3. The list of recognized tags is configurable, in a header file called filter_config.h.
  4. Recognized tags can be suppressed, i.e. removed.
  5. Recognized tags which are containers can cause all contained text to be suppressed.
  6. Close tags are generated for unclosed containers.
  7. In the case of 2 or 6, a comment is emitted into the output, denoting the problem.
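
As a rough illustration of rules 1, 2, 4, 6 and 7 above, here is a minimal sketch in C. It is not the released html_filter code: the tag table, the names (tag_info, html_filter_sketch), the wording of the emitted comments, and the fixed-size output buffer are all assumptions made for the example, and rule 5 is left out to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tag table. The real list lives in filter_config.h,
 * whose format is not shown in this message. */
typedef struct {
    const char *name;   /* lower-case tag name */
    int is_container;   /* expects a matching close tag */
    int suppress;       /* recognized, but removed from the output */
} tag_info;

static const tag_info TAGS[] = {
    { "b", 1, 0 }, { "i", 1, 0 }, { "br", 0, 0 }, { "font", 1, 1 },
};
#define MAX_DEPTH 32

static const tag_info *find_tag(const char *name)
{
    for (size_t i = 0; i < sizeof TAGS / sizeof TAGS[0]; i++)
        if (strcmp(TAGS[i].name, name) == 0)
            return &TAGS[i];
    return NULL;
}

/* Append n bytes of s to out, truncating rather than overflowing. */
static void append(char *out, size_t cap, const char *s, size_t n)
{
    size_t used = strlen(out);
    if (n > cap - 1 - used)
        n = cap - 1 - used;
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

/* Write </name>, plus a marker comment if the filter generated it. */
static void emit_close(char *out, size_t cap, const tag_info *t, int generated)
{
    char buf[64];
    if (t->suppress)
        return;
    snprintf(buf, sizeof buf, "</%s>%s", t->name,
             generated ? "<!-- close tag added -->" : "");
    append(out, cap, buf, strlen(buf));
}

void html_filter_sketch(const char *in, char *out, size_t cap)
{
    const tag_info *stack[MAX_DEPTH];   /* currently open containers */
    int depth = 0;

    assert(in != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    while (*in) {
        if (*in != '<') {                    /* plain text passes through */
            append(out, cap, in++, 1);
            continue;
        }
        if (strncmp(in, "<!--", 4) == 0) {   /* rule 1: drop comments */
            const char *c = strstr(in + 4, "-->");
            if (!c) break;
            in = c + 3;
            continue;
        }
        const char *end = strchr(in, '>');
        if (!end) break;                     /* unterminated tag: drop the rest */
        const char *p = in + 1;
        int closing = (*p == '/');
        if (closing) p++;
        char name[16];
        size_t n = 0;
        while (isalnum((unsigned char)*p) && n < sizeof name - 1)
            name[n++] = (char)tolower((unsigned char)*p++);
        name[n] = '\0';
        /* Over-long names and SGML commands ("<!DOCTYPE") end up unrecognized. */
        const tag_info *t = isalnum((unsigned char)*p) ? NULL : find_tag(name);

        if (t && !closing && !(t->is_container && depth == MAX_DEPTH)) {
            if (!t->suppress)                /* rule 4: suppressed tags vanish */
                append(out, cap, in, (size_t)(end - in) + 1);
            if (t->is_container)
                stack[depth++] = t;
        } else if (t && closing) {
            int k = depth - 1;
            while (k >= 0 && stack[k] != t) k--;
            if (k < 0) {                     /* rules 2 and 7 */
                const char *msg = "<!-- unmatched close tag removed -->";
                append(out, cap, msg, strlen(msg));
            } else {                         /* rules 6 and 7 */
                while (depth - 1 > k)
                    emit_close(out, cap, stack[--depth], 1);
                emit_close(out, cap, stack[--depth], 0);
            }
        }                                    /* rule 1: anything else is dropped */
        in = end + 1;
    }
    while (depth > 0)                        /* rule 6: close what is still open */
        emit_close(out, cap, stack[--depth], 1);
}
```

With this table, calling html_filter_sketch on "<b>bold <i>both</b> text" closes the dangling <i> before the </b> and marks it with a comment, while an unknown tag such as <blink> is simply dropped. The stack of open containers is what lets the sketch tell an unmatched close tag (rule 2) from one that merely arrives early (rule 6).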

The current version has a very small list of recognized tags. We need to expand that.

In order to guarantee no buffer overflows, the html_filter uses the dynamic_strings_t module that I offered to Kent (by way of this mailing list) a while back. So I think the current unreleased Landfield beta version of hypermail probably already has this module in it.
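
To show the general idea of a string that grows instead of overflowing, here is a small sketch in C. It is not the dynamic_strings_t API, which is not shown in this message; the names dstr, dstr_append and dstr_free are hypothetical and only illustrate the technique of reallocating before every write.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; NOT the dynamic_strings_t interface. */
typedef struct {
    char *data;    /* NUL-terminated once anything has been appended */
    size_t len;    /* bytes in use, excluding the NUL */
    size_t cap;    /* bytes allocated */
} dstr;

/* Append n bytes of s, growing the buffer first if it is too small.
 * Returns 0 on success, -1 if the memory cannot be obtained. */
int dstr_append(dstr *d, const char *s, size_t n)
{
    assert(d != NULL && s != NULL);
    if (n > SIZE_MAX - d->len - 1)
        return -1;                       /* total length would overflow size_t */
    size_t need = d->len + n + 1;        /* +1 for the terminating NUL */
    if (need > d->cap) {
        size_t cap = d->cap ? d->cap : 16;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {    /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        char *p = realloc(d->data, cap); /* realloc(NULL, ...) acts like malloc */
        if (p == NULL)
            return -1;                   /* old buffer is still valid */
        d->data = p;
        d->cap = cap;
    }
    memcpy(d->data + d->len, s, n);
    d->len += n;
    d->data[d->len] = '\0';
    return 0;
}

/* Release the buffer and reset the string to its empty state. */
void dstr_free(dstr *d)
{
    free(d->data);
    d->data = NULL;
    d->len = d->cap = 0;
}
```

A string starts out as `dstr d = { NULL, 0, 0 };` and every write goes through dstr_append, so the caller never has to guess a maximum size in advance. Doubling the capacity keeps the number of reallocations small even when text is appended a few bytes at a time.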

If you want to integrate this filter with a version of hypermail (or other program) that uses different code for handling arbitrary length strings, you may want to either change html_filter or hypermail so that they use the same code for this.

This is my first cut at this, so there may be bugs :-).

--Byron Darrah
Received on Tue 01 Dec 1998 12:45:55 AM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:50 PM GMT