Dear Jose,
On Mon, 07 Apr 2003 18:09:47 +0200, Jose Kahan wrote about "Re: [hypermail] Latin1 subject with UTF-8 body":
> I did something similar in my XHTML hypermail work for converting
> the winlatin1 characters that are inserted inside messages coded with
> ISO-8859-1. That is, I'm converting them into the equivalent Unicode
> entities. I guess that in your case, the rule would be "if the
> message's charset is UTF-8, then convert the ISO-8859-1 set into
> the equivalent Unicode one."
This conversion has nothing to do with the message charset. My suggestion is to use character entities, which are ASCII representations of Unicode characters and are usable with any message charset (assuming, of course, that ASCII is a subset of the document charset, which is the basic HTML assumption). I therefore believe such a conversion should be done for any non-ASCII part of the subject, i.e., any part expressed using the rule
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
of RFC 2047, where 'encoding' is "Q" (quoted-printable) or "B" (base64).
It is true that, once the conversion to Unicode is done, characters in the printable ASCII range can still be represented as is, without recourse to character entities, to optimize the representation. This matters when the MUA (as in my case) uses the encoding even where it is not really needed: a few words of the subject were encoded in full, although only two letters in one of the words actually needed encoding.
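To make the idea concrete, here is a minimal sketch in Python (not hypermail's actual C code). It decodes the RFC 2047 encoded-words of a subject to Unicode, then emits printable ASCII as is and everything else as numeric character references. The function name subject_to_html is hypothetical, and the fallback to ASCII for an unknown or missing charset is my own assumption.

```python
from email.header import decode_header
from html import escape


def subject_to_html(raw_subject: str) -> str:
    """Return an ASCII-only HTML rendering of a raw Subject header value.

    Hypothetical helper, for illustration only.
    """
    pieces = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            # Encoded-words arrive as bytes plus their declared charset;
            # unencoded runs next to them arrive as bytes with charset None.
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Assumption: treat an unknown charset name as ASCII.
                chunk = chunk.decode("ascii", errors="replace")
        pieces.append(chunk)
    text = "".join(pieces)

    # Escape HTML metacharacters first, then turn every non-ASCII
    # character into a decimal reference such as &#233;.
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, subject_to_html("=?iso-8859-1?Q?caf=E9?=") and subject_to_html("=?utf-8?B?Y2Fmw6k=?=") both yield caf&#233;, independent of the charset the message body uses. Only the two non-ASCII bytes end up as a reference; the letters "caf" stay plain ASCII even though the MUA encoded the whole word.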
>
> You can see how I did it and then expand on that work.
Thanks. I'll look into it.
Best,
Zvi.
-- 
Dr. Zvi Har'El     mailto:rl_at_math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
Monday, 6 Nisan 5763, 7 April 2003, 7:11PM
Received on Mon 07 Apr 2003 06:45:49 PM GMT