I am using hypermail for archiving my mailing list, the "Jules Verne Forum", at
<http://JV.Gilead.org.il/forum/>. Yesterday I sent a message to the list,
written in English with a few French words. In particular, the subject line
contained French accented characters. My mailer, mutt 1.4, is configured to
send iso-8859-1 if it can, and utf-8 otherwise. In the body of the message I had
a quoted French expression, and I hastily decided to use the Unicode non-ascii
single quotes (U+2018 and U+2019) instead of the ascii single quote (U+0027).
Therefore, the body of the message was sent in utf-8, not iso-8859-1, while the
subject was still encoded in iso-8859-1. The headers looked as follows:
Subject: New mailing address for =?iso-8859-1?Q?the?=
=?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne
Message-ID: <20030406194902.GB28158_at_fermat.math.technion.ac.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
....
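To see that the subject and the body really carry separate charsets, the headers above can be fed to Python's standard `email` package. This is only an illustration in Python (hypermail itself is written in C and does its own header decoding). `decode_header` and `make_header` are the standard-library functions for RFC 2047 encoded-words; the `RAW` string is simply the quoted headers with the Message-ID and the `....` lines left out, plus a placeholder body.

```python
from email import message_from_string
from email.header import decode_header, make_header

# The headers quoted above (Message-ID and "...." omitted); the Subject is folded
# onto a continuation line exactly as it was sent.
RAW = (
    "Subject: New mailing address for =?iso-8859-1?Q?the?=\n"
    " =?iso-8859-1?Q?_Soci=E9t=E9?= Jules Verne\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "(body omitted)\n"
)

msg = message_from_string(RAW)

# The body's charset comes from the Content-Type header ...
body_charset = msg.get_content_charset()

# ... while each RFC 2047 encoded-word in the Subject names its own charset.
chunks = decode_header(msg["Subject"])
subject_charsets = {cs for _, cs in chunks if cs is not None}

# make_header() reassembles the decoded chunks into one Unicode string.
subject = str(make_header(chunks))

print("body charset:   ", body_charset)
print("subject charset:", subject_charsets)
print("subject:        ", subject)
```

Nothing in the Content-Type header says anything about the Subject; a program that renders both must convert each from its own charset.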
Now here is the problem: although the mail itself is completely OK, and the
index page, which is generated in iso-8859-1, is OK, there was a problem with
the message page, which was generated in utf-8. The <title> and <h1> tags of
this page contain the subject, which is written out in iso-8859-1 characters
rather than the corresponding utf-8 characters. (For ascii characters the utf-8
representation is the same as ascii; for non-ascii characters, such as the
French accented characters, it is not.) You can see the index file at
<http://JV.Gilead.org.il/forum/2003/04/> and the message file at
<http://JV.Gilead.org.il/forum/2003/04/0011.html>.
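The symptom on the message page can be modelled at the byte level. This sketch assumes (it does not inspect hypermail's code) that the decoded subject ends up as iso-8859-1 bytes inside a page declared as utf-8. A strict utf-8 decoder then shows U+FFFD replacement characters where the accented letters should be, while the ascii part is untouched.

```python
# The decoded subject, as a Unicode string.
subject = "New mailing address for the Société Jules Verne"

# "é" is one byte (0xE9) in iso-8859-1 but two bytes (0xC3 0xA9) in utf-8;
# ascii characters are encoded identically by both.
latin1_bytes = subject.encode("iso-8859-1")
utf8_bytes = subject.encode("utf-8")

# Modelled symptom: iso-8859-1 bytes read back by a utf-8 decoder.
# A lone 0xE9 followed by an ascii byte is not valid utf-8, so each
# accented letter is replaced by U+FFFD.
as_rendered = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)
print(utf8_bytes)
print(as_rendered)
```

Real browsers may recover from such bytes in different ways, but none of them can reliably show "Société" from iso-8859-1 bytes on a page labelled utf-8.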
My suggestion is the following: since RFC 2822 requires the subject to be
encoded in ascii, independently of the MIME charset of the body (non-ascii
characters are carried in RFC 2047 encoded-words, each naming its own charset),
it is impossible in general to store a correct subject in the html file unless
it is encoded in ascii, i.e., as numeric html character references. For
example, translate =?iso-8859-1?Q?=E9?=, which is the e-acute character, to its
entity equivalent, &#xE9; (in hexadecimal) or &#233; (in decimal). The two forms
are equivalent from a programming point of view, but the latter is perhaps
better, since older browsers may not recognize the former. Therefore, the
subject of the message quoted above should be translated to the ascii string
New mailing address for the Soci&#xE9;t&#xE9; Jules Verne
or
New mailing address for the Soci&#233;t&#233; Jules Verne
and not to the iso-8859-1 string
New mailing address for the Société Jules Verne
as it is currently translated.
I haven't yet looked at how this should be implemented in code, but I hope it will not be hard.
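A minimal sketch of the proposed translation, written in Python rather than hypermail's C and assuming the subject has already been decoded to Unicode. The helper name `subject_to_ascii_html` and its `hexadecimal` flag are hypothetical. Escaping `&`, `<` and `>` first is an added precaution, not part of the proposal above, so that markup characters in a subject cannot break the page.

```python
import html


def subject_to_ascii_html(subject: str, hexadecimal: bool = False) -> str:
    """Hypothetical helper: render a decoded subject as pure-ascii HTML.

    Markup characters (&, <, >) are escaped first; every remaining non-ascii
    character becomes a numeric character reference, decimal (&#233;) by
    default or hexadecimal (&#xE9;) when hexadecimal=True.
    """
    escaped = html.escape(subject, quote=False)
    if not hexadecimal:
        # The built-in "xmlcharrefreplace" error handler emits decimal references.
        return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
    # Hexadecimal form: build each reference from the character's code point.
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped)


subject = "New mailing address for the Société Jules Verne"
print(subject_to_ascii_html(subject))                    # decimal references
print(subject_to_ascii_html(subject, hexadecimal=True))  # hexadecimal references
```

Because the result contains only ascii characters, its bytes are identical whether the page is labelled iso-8859-1 or utf-8, so the same string would be correct on both the index page and the message page.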
Best,
Zvi.
-- 
Dr. Zvi Har'El     mailto:rl_at_math.technion.ac.il
Department of Mathematics     tel:+972-54-227607
icq:179294841
Technion - Israel Institute of Technology     fax:+972-4-8293388
http://www.math.technion.ac.il/~rl/
Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
Monday, 5 Nisan 5763, 7 April 2003, 3:53PM
Received on Mon 07 Apr 2003 03:29:37 PM GMT
This archive was generated by hypermail 2.3.0 : Sat 13 Mar 2010 03:46:12 AM GMT