[1286] in BarnOwl Developers

Re: Zephyr charset-aware patch

daemon@ATHENA.MIT.EDU (Keith Winstein)
Thu Oct 29 18:14:58 2009

Resent-From: nelhage@mit.edu
Resent-To: barnowl-dev-mtg@charon.mit.edu
X-Original-To: nelhage@lunatique.mit.edu
Date: Tue, 3 Feb 2009 13:56:58 -0500 (EST)
From: Keith Winstein <keithw@MIT.EDU>
To: Jeffrey Hutzelman <jhutz@cmu.edu>
cc: Karl C Ramm <kcr@mit.edu>, Sam Hartman <hartmans@mit.edu>,
        Greg Hudson <ghudson@mit.edu>, Anders Kaseorg <andersk@mit.edu>,
        Geoffrey G Thomas <geofft@mit.edu>, Nelson Elhage <nelhage@mit.edu>,
        asedeno@mit.edu, "Mark W. Eichin" <eichin@mit.edu>,
        dirty-owl-hackers@mit.edu
In-Reply-To: <9EA9E2C19AF57466A66840E4@minbar.fac.cs.cmu.edu>

On Tue, 3 Feb 2009, Jeffrey Hutzelman wrote:

> How does this interact with the longstanding convention that zephyr notices 
> be encoded in the ISO-8859-1 character set?  If a client tries to send a 
> notice encoded in latin-1 and the library reencodes it in UTF-8, you are 
> going to decrease interoperability.

Hi Jeff,

Thanks for the thoughtful note.

At MIT, we have discovered that there is no longer an agreed-upon 
convention for the on-the-wire character encoding of zephyr messages. 
That's the problem that led to this patch.

Although there used to be a tacit agreement for Latin-1, that has wilted 
over the last several years because (a) charset-agnostic zephyr just sends 
outgoing messages, and interprets incoming messages, in whatever the 
user's own character encoding happens to be, and (b) most terminals are 
now UTF-8.

Many users are now attached to the full range of UCS characters.

This patch _establishes_ interoperability between Latin-1 users and UTF-8 
users, where currently there is no interoperability. It also _preserves_ 
interoperability between two Latin-1 users, since each client will encode 
the outgoing message from the user charmap (Latin-1) to UTF-8 on the wire, 
and then back from UTF-8 to Latin-1 for display on the terminal.
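A minimal sketch of that round trip, in Python rather than the C of the 
patch itself. The function names are hypothetical and are not taken from 
the patch; only the direction of each conversion comes from the 
description above. Replacing unrepresentable characters on receipt is an 
assumption made to keep the example short.

```python
def encode_for_wire(user_bytes: bytes, user_charset: str) -> bytes:
    """Sender side: user charmap -> UTF-8 on the wire (hypothetical helper)."""
    return user_bytes.decode(user_charset).encode("utf-8")


def decode_from_wire(wire_bytes: bytes, user_charset: str) -> bytes:
    """Receiver side: UTF-8 on the wire -> user charmap (hypothetical helper).

    Assumption: characters the local charset cannot represent become '?'.
    """
    return wire_bytes.decode("utf-8").encode(user_charset, errors="replace")


# Two Latin-1 users: the bytes typed are the bytes displayed.
typed = "caf\u00e9".encode("latin-1")          # bytes from a Latin-1 terminal
wire = encode_for_wire(typed, "latin-1")       # UTF-8 on the wire
shown_latin1 = decode_from_wire(wire, "latin-1")

# A UTF-8 user receiving the same notice sees the same character.
shown_utf8 = decode_from_wire(wire, "utf-8")
```

Because both ends convert through UTF-8, the sender's and receiver's 
locales no longer need to match.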

It also preserves interoperability between a _sending_ Latin-1 user who 
has not upgraded and a _receiving_ Latin-1 user who has upgraded (failed 
conversions are relayed verbatim). Where it hurts interoperability is 
between a _sending_ Latin-1 user who has upgraded and a receiving Latin-1 
user who has not upgraded, at least when the sending user is sending 
non-ASCII Latin-1 characters.
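The two mixed cases can be sketched as follows. This is an illustration 
in Python under stated assumptions, not the patch's code: the function 
names are hypothetical, and "conversion failed" is modeled as the wire 
bytes not being valid UTF-8.

```python
def receive_upgraded(wire_bytes: bytes, user_charset: str) -> bytes:
    """Upgraded receiver (hypothetical): convert from UTF-8 if possible,
    otherwise pass the bytes through unchanged."""
    try:
        text = wire_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, e.g. raw Latin-1 from a legacy sender.
        return wire_bytes
    return text.encode(user_charset, errors="replace")


def receive_legacy(wire_bytes: bytes, user_charset: str) -> str:
    """Legacy receiver (hypothetical): no conversion at all; the terminal
    simply interprets the bytes in the user's own charset."""
    return wire_bytes.decode(user_charset, errors="replace")


# Legacy Latin-1 sender -> upgraded Latin-1 receiver: still works.
legacy_wire = "caf\u00e9".encode("latin-1")
ok = receive_upgraded(legacy_wire, "latin-1")

# Upgraded Latin-1 sender -> legacy Latin-1 receiver: each non-ASCII
# character arrives as two UTF-8 bytes and is displayed as two characters.
upgraded_wire = "caf\u00e9".encode("utf-8")
garbled = receive_legacy(upgraded_wire, "latin-1")
```

One limitation of this kind of fallback: a Latin-1 byte sequence that 
happens to be valid UTF-8 would be converted rather than passed through.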

An alternative would be to add extra fields to zephyrs so they would carry 
both "legacy" Latin-1 data (to be read by legacy clients and those in a 
Latin-1 locale) and parallel UTF-8 encodings of the same fields (to be 
read by new clients and translated to the local character encoding, which 
may be UTF-8). I think the complexity of trying to do this transparently 
to legacy clients is not worth it.

> How does this interact with use of zephyr notices to carry non-textual data, 
> for which implicit character set conversion may be totally inappropriate? 
> For example, I know there are tools in use for carrying on encrypted 
> conversations using zephyr notices to carry ciphertext.

I haven't tested it, but I would be interested to hear about problems...

One note: we are not redefining the semantics of any "notice field" except 
the z_message, which is a series of null-terminated character strings in 
an unspecified character encoding.
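
As a rough illustration of what converting only the z_message involves, 
here is a Python sketch that re-encodes each null-terminated string and 
leaves the terminators in place. The function name and the sample 
message contents are hypothetical; the patch itself may structure this 
differently.

```python
def convert_fields(z_message: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode each null-terminated string in a message body.

    Hypothetical helper: splits on the NUL byte, converts each piece,
    and rejoins, so the number and order of strings is unchanged.
    """
    fields = z_message.split(b"\x00")
    converted = [f.decode(from_charset).encode(to_charset) for f in fields]
    return b"\x00".join(converted)


# Two strings, each terminated by a NUL byte (sample data, made up).
body = b"Keith\x00caf\xe9\x00"
on_wire = convert_fields(body, "latin-1", "utf-8")
```

Splitting on the zero byte is safe for UTF-8 output, since UTF-8 uses 
the byte 0x00 only to encode U+0000 itself.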

-Keith
