[5844] in www-talk@info.cern.ch


Re: ISO charsets; Unicode

daemon@ATHENA.MIT.EDU (Chris Lilley, Computer Graphics Un)
Tue Sep 27 13:00:01 1994

Date: Tue, 27 Sep 1994 10:52:41 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: lilley@v5.cgu.mcc.ac.uk
From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
To: Multiple recipients of list <www-talk@www0.cern.ch>

In message <9409261553.AA22679@midway.uchicago.edu> Richard L. Goerwitz said:

> Still, is the general problem of multi-language text worth
> discussing?

Worth discussing and solving.

> For my part, I'd love to make a few of my non-English
> databases available online, but I don't know how to tell query
> forms to expect something other than ISO 8859-1.

You can't, currently. There is the internationalised version of the M-entity by 
Toshihiro Takada <http://www.ntt.jp/Mosaic-l10n/README.html> which addresses 
some of the problems of automatic switching on a *per document* basis. 

I agree that switching on a per-element basis is needed but, as I said, SGML does 
not appear to support this. Perhaps the SGML standard could do with an 
amendment in the light of this?

> Let me just toss off a suggestion here.  Say we suddenly move
> from English to Greek text:

> <language Greek encoding="ISO 8859-8">

I see what you are saying; I have said similar things myself in the past. You 
correctly separate the problem into two sub-problems: changing the language, and 
changing the encoding. 

For example, being able to tag the English, French and Italian sections of a 
document (all using ISO 8859-1) is doable with changes to the DTD. This would 
have desirable consequences in terms of targeted searching, for example.
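
As a rough sketch of what targeted searching over language-tagged sections 
might look like (the Section type, the language labels and the search function 
are all made up for illustration; nothing here is an existing DTD or client 
feature):

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    """A run of text with a language label (hypothetical structure)."""
    language: str  # illustrative label, not a proposed standard value
    text: str


# Three sections that could all be stored in ISO 8859-1.
document = [
    Section("English", "The library opens at nine."),
    Section("French", "La bibliothèque ouvre à neuf heures."),
    Section("Italian", "La biblioteca apre alle nove."),
]


def search(sections, word: str, language: Optional[str] = None):
    """Return sections containing `word`, optionally limited to one language."""
    needle = word.lower()
    return [
        s for s in sections
        if (language is None or s.language == language)
        and needle in s.text.lower()
    ]


# Without a language tag, "la" matches both the French and Italian sections.
untargeted = search(document, "la")
# With the tag, the search can be restricted to Italian only.
targeted = search(document, "la", language="Italian")
```

The point is only that the tag lets a search skip sections in languages the 
reader did not ask for; it says nothing about how the tag would be written in 
HTML.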

Changing the encoding apparently cannot be done like this. You cannot express it 
in the DTD. This is a severe limitation. As I have said before, the 
Much-of-Western-Europe-and-the-USA Wide Web is flourishing, but the "World" bit 
is sorely lacking.
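
To illustrate why the encoding has to be stated somewhere, here is a small 
sketch showing that the same bytes mean different things under different parts 
of ISO 8859. It uses Python's standard codecs; ISO 8859-7 is the Greek part of 
the series. The per-run list at the end is a hypothetical structure of my own, 
just to show what switching encoding inside one document would amount to.

```python
# The same three bytes, with no indication of which encoding they are in.
raw = bytes([0xE1, 0xE2, 0xE3])

# Read as ISO 8859-1 (Latin-1) they are accented Latin letters.
as_latin1 = raw.decode("iso8859_1")

# Read as ISO 8859-7 (Greek) they are alpha, beta, gamma.
as_greek = raw.decode("iso8859_7")

# Hypothetical per-run labelling: each run of bytes carries its own encoding,
# so a client can decode every run correctly and join the results.
runs = [
    ("iso8859_1", b"Greek: "),
    ("iso8859_7", raw),
]
decoded = "".join(data.decode(encoding) for encoding, data in runs)
```

Without the label on each run, a client has no way to tell which of the two 
readings of those bytes was intended.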

A related issue is defining overlaps between languages and their subsets etc. 
There are actually three things to juggle: the country, the language and 
the encoding (there may be more than one encoding possible).

For example, you may want to tag a section as US English. Or British English. Or 
US Medical English. But it needs to be expressed somehow that people who read 
British English can also understand US English (well, mostly ;-) ). Expressing 
this as a locale, e.g. Britain, doesn't help because people here speak Scots 
Gaelic or Welsh or Manx or Romany or Sheltie, in some cases as a first language, 
and we don't want any backdoor cultural imperialism, do we ;-(

In case that example was a little too UK-centric for some tastes: expressing 
locale as Switzerland doesn't help because the primary language may be Swiss 
German or Swiss French or Swiss Italian (or English if they are at CERN ;-) ) 
and it may in some cases be desirable to distinguish Swiss German from German 
German.

Or again, in the USA a lot of people have Latin American Spanish as a first 
language, I am told. In Spain, "Spanish" (Castilian) is widely used but there 
are a lot of people who speak Catalan. And so on. 
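
One way to picture keeping the three things separate is sketched below. The 
TextTag record, its field names and the two helper functions are hypothetical, 
invented for this illustration; they are not part of SGML, HTML or any client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextTag:
    """Hypothetical tag keeping language, country and encoding apart."""
    language: str
    country: Optional[str] = None   # a regional variety, not a locale default
    encoding: str = "ISO 8859-1"    # more than one encoding may be possible


sections = [
    TextTag("English", country="US"),
    TextTag("English", country="GB"),
    TextTag("Welsh", country="GB"),
    TextTag("German", country="CH"),
    TextTag("French", country="CH"),
]


def readable_by(tag: TextTag, reader_languages: set) -> bool:
    """A reader of a language can read any regional variety of it."""
    return tag.language in reader_languages


def languages_in(country: str, tags) -> list:
    """Languages that appear under one country: usually more than one."""
    return sorted({t.language for t in tags if t.country == country})


# A reader of English gets both the US and GB English sections, not the Welsh one.
for_english_reader = [t for t in sections if readable_by(t, {"English"})]
# Knowing only the country does not tell you the language.
swiss_languages = languages_in("CH", sections)
```

Matching on language first and treating country as a refinement is just one 
possible design; a finer scheme would be needed for cases such as US Medical 
English.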

> The question for me is just how sophisticated we want clients to
> get.  The Web is supposed to be worldwide, to be sure, and this
> would seem to imply multilinguality. But how are we supposed to
> be sure that all of the requisite fonts, with all of the requisite
> registries and encodings, are on every machine? 

I think you answer that well in your next paragraph - people have the fonts they 
use regularly on their machine. People acquire others according to need. Clients 
can help by offering to download fonts that are needed, by recovering 
gracefully, or just by saying "this passage is in Ancient Babylonian, you don't 
have the fonts". The user may care, or they may not. Clients start off with a 
minimal set and build up capabilities for what they are interested in reading.
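
The behaviour described above could be sketched roughly as follows. The 
function name, the action strings and the two font sets are assumptions made 
for the example; no existing client is known to work this way.

```python
def plan_rendering(script: str, installed: set, downloadable: set):
    """Decide what a client might do with a passage written in `script`.

    Returns a hypothetical (action, message) pair.
    """
    if script in installed:
        return ("render", "")
    if script in downloadable:
        return ("offer_download",
                f"Fonts for {script} are available; download them?")
    # Graceful recovery: tell the user instead of showing garbage.
    return ("notify", f"This passage is in {script}; you don't have the fonts.")


# A client starts off with a minimal set of fonts...
installed = {"Latin"}
downloadable = {"Greek"}

before = plan_rendering("Greek", installed, downloadable)

# ...and builds up capabilities as the user accepts downloads.
installed.add("Greek")
after = plan_rendering("Greek", installed, downloadable)

unknown = plan_rendering("Ancient Babylonian", installed, downloadable)
```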

> I'm sorry if I seem to be obtruding in a forum without knowing what I
> am doing.  

Hardly.

> As I noted above, I'm in the Humanities, and am simply trying to see if I 
> can be any help at all....

Good. Keep saying these things. Multilingual capability is essential. The more 
people who express opinions about it and keep it in the forefront, the better.

--
Chris
