Re: [hypermail] Latin1 subject with UTF-8 body

From: Zvi Har'El <rl_at_math.technion.ac.il_at_hypermail-project.org>
Date: Mon, 7 Apr 2003 19:39:27 +0300
Message-ID: <20030407163927.GA24892_at_fermat.math.technion.ac.il>


Dear Jose,

On Mon, 07 Apr 2003 18:09:47 +0200, Jose Kahan wrote about "Re: [hypermail] Latin1 subject with UTF-8 body":
> I did something similar in my XHTML hypermail work for converting
> the winlatin1 characters that are inserted inside messages coded with
> ISO-8859-1. That is, I'm converting them into the equivalent Unicode
> entities. I guess that in your case, the rule would be "if the
> message's charset is UTF-8, then convert the ISO-8859-1 set into
> the equivalent Unicode one."

This conversion has nothing to do with the message charset. My suggestion is to use character entities, which are ASCII representations of Unicode characters and are usable with any message charset (assuming, of course, that ASCII is a subset of the document charset, which is the basic HTML assumption). So I believe such a conversion should be done for any non-ASCII part of the subject, i.e., any part which is expressed using the rule

    encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

of RFC 2047, where 'encoding' is "Q" (quoted-printable) or "B" (base64).

It is true that once the conversion to Unicode is done, the printable ASCII range can still be represented as is, without recourse to character entities, to optimize the representation. This matters when the MUA (as in my case) uses the encoding even where it is not really needed: a few whole words of the subject were encoded, although it was necessary to encode only two letters in one of the words.
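
To make the idea concrete, here is a rough sketch in Python (not hypermail's actual C code, and not a proposed patch). It decodes the RFC 2047 encoded-words of a subject to Unicode, keeps the printable ASCII range as is, and writes everything else as numeric character references. The function name subject_to_html is hypothetical, and the sketch assumes the charset named in each encoded-word is one that Python knows.

```python
from email.header import decode_header
from html import escape


def subject_to_html(raw_subject: str) -> str:
    """Return an ASCII-only HTML rendering of a raw Subject header value.

    Hypothetical helper for illustration; assumes every charset label in
    the header is known to Python (an unknown one raises LookupError).
    """
    pieces = []
    for chunk, charset in decode_header(raw_subject):
        # Encoded-words come back as bytes plus their charset; unencoded
        # text comes back as str, or as bytes with charset None.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    text = "".join(pieces)

    # Escape the HTML-special characters first, then replace every
    # non-ASCII character with a numeric character reference (&#NNN;).
    # Plain ASCII letters pass through unchanged, whatever the MUA encoded.
    return escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(subject_to_html("=?iso-8859-1?Q?caf=E9_&_th=E9?="))
```

The result does not depend on the charset of the message body: an ISO-8859-1 subject and a UTF-8 subject carrying the same text give the same ASCII output.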

>
> You can see how I did it and then expand on that work.

Thanks. I'll look into it.

Best,

Zvi.

-- 
Dr. Zvi Har'El     mailto:rl_at_math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                                  Monday, 6 Nisan 5763,  7 April 2003,  7:11PM
Received on Mon 07 Apr 2003 06:45:49 PM GMT

This archive was generated by hypermail 2.2.0 : Thu 22 Feb 2007 07:33:54 PM GMT