[1286] in BarnOwl Developers
Re: Zephyr charset-aware patch
daemon@ATHENA.MIT.EDU (Keith Winstein)
Thu Oct 29 18:14:58 2009
Resent-From: nelhage@mit.edu
Resent-To: barnowl-dev-mtg@charon.mit.edu
X-Original-To: nelhage@lunatique.mit.edu
Date: Tue, 3 Feb 2009 13:56:58 -0500 (EST)
From: Keith Winstein <keithw@MIT.EDU>
To: Jeffrey Hutzelman <jhutz@cmu.edu>
cc: Karl C Ramm <kcr@mit.edu>, Sam Hartman <hartmans@mit.edu>,
Greg Hudson <ghudson@mit.edu>, Anders Kaseorg <andersk@mit.edu>,
Geoffrey G Thomas <geofft@mit.edu>, Nelson Elhage <nelhage@mit.edu>,
asedeno@mit.edu, "Mark W. Eichin" <eichin@mit.edu>,
dirty-owl-hackers@mit.edu
In-Reply-To: <9EA9E2C19AF57466A66840E4@minbar.fac.cs.cmu.edu>
On Tue, 3 Feb 2009, Jeffrey Hutzelman wrote:
> How does this interact with the longstanding convention that zephyr notices
> be encoded in the ISO-8859-1 character set? If a client tries to send a
> notice encoded in latin-1 and the library reencodes it in UTF-8, you are
> going to decrease interoperability.
Hi Jeff,
Thanks for the thoughtful note.
At MIT, we have discovered that there is no longer an agreed-upon
convention for the on-the-wire character encoding of zephyr messages.
That's the problem that led to this patch.
Although there used to be a tacit agreement for Latin-1, that has wilted
over the last several years because (a) charset-agnostic zephyr just sends
outgoing messages, and interprets incoming messages, in whatever the
user's own character encoding happens to be, and (b) most terminals are
now UTF-8.
Many users are now attached to the full range of UCS characters.
This patch _establishes_ interoperability between Latin-1 users and UTF-8
users, where currently there is no interoperability. It also _preserves_
interoperability between two Latin-1 users, since each client will encode
the outgoing message from the user charmap (Latin-1) to UTF-8 on the wire,
and then back from UTF-8 to Latin-1 for display on the terminal.
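
To make the Latin-1 round trip concrete, here is a minimal sketch. It assumes Python's built-in codecs as a stand-in for whatever conversion layer the patch actually uses; the function names are hypothetical and are not taken from the patch.

```python
# Hypothetical helpers illustrating the round trip described above.
# The real patch is C code in the zephyr library; this only models the idea.

def encode_for_wire(typed: bytes, user_charset: str) -> bytes:
    """Convert bytes in the user's charset to UTF-8 for the wire."""
    return typed.decode(user_charset).encode("utf-8")

def decode_from_wire(wire: bytes, user_charset: str) -> bytes:
    """Convert UTF-8 wire bytes back to the user's charset for display."""
    return wire.decode("utf-8").encode(user_charset)

typed = b"caf\xe9"                          # "cafe" with e-acute, in Latin-1
wire = encode_for_wire(typed, "latin-1")    # UTF-8 on the wire
shown = decode_from_wire(wire, "latin-1")   # back to Latin-1 for the terminal
```

Under these assumptions the receiving Latin-1 terminal gets exactly the bytes the sending Latin-1 user typed, even though the wire format in between is UTF-8.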
It also preserves interoperability between a _sending_ Latin-1 user who
has not upgraded and a _receiving_ Latin-1 user who has upgraded (failed
conversions are relayed verbatim). Where it hurts interoperability is
between a _sending_ Latin-1 user who has upgraded and a receiving Latin-1
user who has not upgraded, at least when the sending user is sending
non-ASCII Latin-1 characters.
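
The two mixed cases can be sketched as follows. This is an illustration only: `receive` is a hypothetical name, and treating every kind of conversion failure as "relay verbatim" is an assumption about how the patch behaves.

```python
def receive(wire: bytes, user_charset: str) -> bytes:
    """Return the bytes an upgraded client would hand to the terminal."""
    try:
        return wire.decode("utf-8").encode(user_charset)
    except UnicodeError:
        # Assumption: any failed conversion falls back to the raw bytes.
        return wire

legacy_wire = b"caf\xe9"        # sent by a non-upgraded Latin-1 client
upgraded_wire = b"caf\xc3\xa9"  # sent by an upgraded client (UTF-8)

# Upgraded Latin-1 receiver, legacy sender: not valid UTF-8, so passed through.
shown_by_upgraded = receive(legacy_wire, "latin-1")

# Legacy Latin-1 receiver, upgraded sender: the raw UTF-8 bytes are
# interpreted as Latin-1, so the e-acute appears as two characters.
shown_by_legacy = upgraded_wire.decode("latin-1")
```

One caveat with a fallback like this: a Latin-1 byte sequence that happens to be valid UTF-8 (for example `b"\xc3\xa9"`) would be converted rather than passed through.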
An alternative would be to add extra fields to zephyrs so they would carry
both "legacy" Latin-1 data (to be read by legacy clients and those in a
Latin-1 locale) and parallel UTF-8 encodings of the same fields (to be
read by new clients and translated to the local character encoding, which
may be UTF-8). I think the complexity of trying to do this transparently
to legacy clients is not worth it.
> How does this interact with use of zephyr notices to carry non-textual data,
> for which implicit character set conversion may be totally inappropriate?
> For example, I know there are tools in use for carrying on encrypted
> conversations using zephyr notices to carry ciphertext.
I haven't tested it, but I would be interested to hear about problems...
One note: we are not redefining the semantics of any "notice field" except
the z_message, which is a series of null-terminated character strings in
an unspecified character encoding.
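
As a rough sketch of what converting only the z_message could look like, the snippet below treats the message body as null-separated fields and converts each one. `convert_fields` is a hypothetical name, and the per-field approach is an assumption, not a description of the patch.

```python
def convert_fields(z_message: bytes, src: str, dst: str) -> bytes:
    """Convert each null-terminated string in z_message from src to dst."""
    fields = z_message.split(b"\0")
    # Re-joining with NUL keeps the same number of fields and terminators.
    return b"\0".join(f.decode(src).encode(dst) for f in fields)

# Two fields, each null-terminated; the second holds a Latin-1 e-acute.
message = b"sig\0caf\xe9\0"
on_wire = convert_fields(message, "latin-1", "utf-8")
```

Because UTF-8 never produces a zero byte for any character other than U+0000, the field boundaries survive the conversion.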
-Keith