[5786] in www-talk@info.cern.ch

Re: ISO charsets; Unicode

daemon@ATHENA.MIT.EDU (Chris Lilley, Computer Graphics Un)
Mon Sep 26 13:36:48 1994

Date: Mon, 26 Sep 1994 18:32:28 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: lilley@v5.cgu.mcc.ac.uk
From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
To: Multiple recipients of list <www-talk@www0.cern.ch>

In message <9409261441.AA12654@midway.uchicago.edu> Richard L. Goerwitz said:

> The project head would be happy to plug it into the Web,
> but again the Web only knows ASCII.

Not so: the Web knows only ISO 8859-1. ASCII is a subset of that, so if you
send it ASCII it will work, but the two are not the same thing.

I agree with much of the posting, but:

> Then, of course, there's the giant database project called ARTFL, which
> essentially attempts to make the entire French literary corpus available
> online.  It's already here, and tied to the Web.  But they have no
> standard specs for how to allow users to input things as simple as an
> acute accent over an "a".

I suggest you check this. ISO 8859-1 covers most Western European languages and
should certainly handle French. "A acute" is doable already, and has been since
the Web started. See for example:

<http://info.mcc.ac.uk/CGU/staff/lilley/charset.html>
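A minimal sketch of the point, assuming Python 3 and its standard
"iso-8859-1" codec. The sample strings and the helper name fits_latin1 are
made up for illustration:

```python
# "a acute" (U+00E1) is a single byte in ISO 8859-1.
a_acute = "\u00e1".encode("iso-8859-1")

# ASCII is a subset of ISO 8859-1: pure ASCII text yields identical bytes
# under either encoding, which is why sending ASCII "just works".
ascii_text = "plain ASCII"  # hypothetical sample
same = ascii_text.encode("ascii") == ascii_text.encode("iso-8859-1")


def fits_latin1(s: str) -> bool:
    """Return True if every character of s is representable in ISO 8859-1."""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(a_acute)                     # the single byte 0xE1
print(same)                        # True
print(fits_latin1("\u00e9t\u00e9"))  # French "ete" with acute accents
print(fits_latin1("\u05d0"))       # Hebrew alef, as used to write Aramaic
```

The last check fails because Hebrew script lies outside ISO 8859-1, which is
the general case that remains unsolved.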

> They have an extremely competent staff to
> work on such problems - but I wonder:  Should this _be_ a problem?

Not in this particular instance, no. In the general case (Aramaic, etc.), yes,
it is currently a problem. There has been some discussion of this on the list
before: I seem to remember that we learned that SGML does not have the
expressive power to say that a given paragraph is in ISO 8859-9 or Shift-JIS or
whatever.

--
Chris
