[hypermail] Latin1 subject with UTF-8 body

From: Zvi Har'El <rl_at_math.technion.ac.il_at_hypermail-project.org>
Date: Mon, 7 Apr 2003 16:23:18 +0300
Message-ID: <20030407132318.GB15873_at_fermat.math.technion.ac.il>


Dear Hypermail Developers,

I am using hypermail for archiving my mailing list, the "Jules Verne Forum", at
<http://JV.Gilead.org.il/forum/>. Yesterday I sent a message to the
list, written in English with a few French words. In particular, the subject line contained French accented characters. My mailer, mutt 1.4, is configured to send iso-8859-1 if it can, and utf-8 otherwise. In the body of the message I had a quoted French expression, and I hastily decided to use the Unicode non-ascii single quotes (U+2018 and U+2019) instead of the ascii single quote (U+0027). Therefore the body of the message was sent in utf-8, not iso-8859-1, and the headers looked as follows:

Subject: New mailing address for =?iso-8859-1?Q?the?=
        =?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne
Message-ID: <20030406194902.GB28158_at_fermat.math.technion.ac.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8

....

Now here is the problem: the mail itself is completely ok, and the index page, which is generated in iso-8859-1, is ok, but the message page, which was generated in utf-8, is not. The <title> and <h1> tags of this page contain the subject, and it is written in iso-8859-1 bytes rather than in the corresponding utf-8 bytes (ascii characters have the same representation in utf-8, but non-ascii characters, such as the accented French characters, do not). You can see the index file in
<http://JV.Gilead.org.il/forum/2003/04/> and the message file in
<http://JV.Gilead.org.il/forum/2003/04/0011.html>
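
To make the mismatch concrete, here is a small Python sketch. It is an illustration only, not hypermail code, and the helper name is_valid_utf8 is made up for this example. It shows that the iso-8859-1 bytes of the subject are not a valid utf-8 sequence, which is why a page declared as utf-8 cannot display them correctly.

```python
# Illustration only: the same character has different bytes in each charset.
E_ACUTE = "\u00e9"  # e-acute

latin1_bytes = E_ACUTE.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = E_ACUTE.encode("utf-8")         # two bytes: 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as utf-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


word = "Soci\u00e9t\u00e9"
subject_latin1 = word.encode("iso-8859-1")  # what ends up in <title> today
subject_utf8 = word.encode("utf-8")         # what a utf-8 page would need

latin1_ok_in_utf8_page = is_valid_utf8(subject_latin1)  # False
utf8_ok_in_utf8_page = is_valid_utf8(subject_utf8)      # True
```

Pure ascii text passes the check in either case, which is why only the accented characters are affected.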

My suggestion is the following: since RFC 2822 requires the message subject to be encoded in ascii, independently of the mime type of the body, the way to store a subject in the html file that is correct whatever the charset of the page is to write it in ascii as well, i.e., using numeric html entities. For example, translate =?iso-8859-1?Q?=E9?=, which is the e-acute character, to its entity equivalent, &#xe9; (in hexadecimal) or &#233; (in decimal). Since from a programming point of view the two forms are equivalent, the latter is perhaps better, since older browsers may not recognize the former. Therefore, the subject of the mail quoted above should be translated to the ascii string

    New mailing address for the Soci&#xe9;t&#xe9; Jules Verne

or

    New mailing address for the Soci&#233;t&#233; Jules Verne

and not to an iso-8859-1 string

    New mailing address for the Société Jules Verne

as it is currently translated.
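
The suggestion above can be sketched in a few lines of Python. This is not how hypermail (which is written in C) does or should implement it; it only illustrates the idea using the standard email.header and html modules. The function name subject_to_ascii_html and the constant RAW_SUBJECT are made up for the example.

```python
from email.header import decode_header
import html

# The Subject header from the message above, unfolded onto one line.
RAW_SUBJECT = ("New mailing address for =?iso-8859-1?Q?the?= "
               "=?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne")


def subject_to_ascii_html(raw_subject: str) -> str:
    """Decode RFC 2047 encoded words and return pure-ascii html text."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        # decode_header yields bytes when encoded words are present,
        # and a plain str when the header has none.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    text = "".join(parts)
    # Escape &, < and > first, then turn every non-ascii character into
    # a decimal numeric character reference such as &#233;.
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


example = subject_to_ascii_html(RAW_SUBJECT)
```

Because the result contains only ascii characters, it can be written unchanged into a page generated in iso-8859-1, utf-8, or any other ascii-compatible charset. The "xmlcharrefreplace" error handler produces the decimal form, which matches the preference stated above.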

I have not yet looked at how this should be implemented in the code, but I hope it will not be hard.

Best,

Zvi.

-- 
Dr. Zvi Har'El     mailto:rl_at_math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                                  Monday, 5 Nisan 5763,  7 April 2003,  3:53PM
Received on Mon 07 Apr 2003 03:29:37 PM GMT

This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:12 AM GMT