[5797] in www-talk@info.cern.ch


WWW and non-English (was ISO charsets; Unicode )

daemon@ATHENA.MIT.EDU (Peter Svanberg)
Mon Sep 26 18:09:44 1994

Date: Mon, 26 Sep 1994 23:07:24 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: psv@nada.kth.se
From: Peter Svanberg <psv@nada.kth.se>
To: Multiple recipients of list <www-talk@www0.cern.ch>

Quoting:  "Richard L. Goerwitz" <goer@midway.uchicago.edu>
>
> Has a formal mechanism been considered for specifying various popular
> coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps
> off in the future) Unicode?

Good question! Here is the relevant HTML+ discussion text, from
<URL:http://info.cern.ch/hypertext/WWW/MarkUp/HTMLPlus/htmlplus_13.html>:

    By default, HTML+ documents are made up of 8-bit characters
    from the ISO 8859 Latin-1 character set. The network protocol
    used to retrieve documents may translate the character set into
    a locally acceptable form, e.g. EBCDIC. The HTTP protocol uses
    the MIME standard (RFC 1341) to specify the document type and
    character set. ISO SGML entity definitions are used to include
    characters which are missing from the character set or which
    would otherwise be confused with markup elements...

    Appendix II lists a broad range of characters and symbols,
    relating their ISO names to the corresponding character codes
    in common character sets. They allow authors to include
    accented characters in 7-bit ASCII documents. ...

    There are a large number of entities defined by the ISO,
    covering most languages and symbols for publishing and
    mathematics. Requiring all browsers to support these would
    be impractical, e.g. how should a dumb terminal show such
    symbols. In some cases there will be accepted ways of
    mapping them to normal characters, e.g. &aelig; as ae and
    &egrave; as e. Perhaps the safest recommendation is that
    where authors need to use a specialised character or
    symbol, they should use ISO entity names rather than
    inventing their own. Browsers should leave unrecognised
    entity names untranslated.

That is all I have found on this subject - not much.
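
The fallback behaviour the quoted text describes can be sketched
roughly as below. This is only an illustration, not anything taken
from the HTML+ draft: it assumes the usual SGML "&name;" reference
syntax, the function name degrade_entities and the table
ASCII_FALLBACK are hypothetical, and the table holds only the two
mappings mentioned in the quote. Unrecognised entity names are left
untranslated, as the text recommends.

```python
import re

# Hypothetical fallback table; only the two mappings named in the
# quoted HTML+ text are included here.
ASCII_FALLBACK = {"aelig": "ae", "egrave": "e"}

# Assumes SGML-style entity references of the form &name;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def degrade_entities(text, table=ASCII_FALLBACK):
    """Replace known entity references with a plain-ASCII stand-in.

    Entity names missing from the table are left exactly as written.
    """
    def replace(match):
        return table.get(match.group(1), match.group(0))

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(degrade_entities("&aelig;ther, tr&egrave;s, &unknown;"))
```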

> What ideas have been floated along the lines of making the Web more all-
> encompassing, linguistically speaking?  Are there any practical solutions
> the folks mentioned above could be working on now?  Where should I direct
> people who have questions about internationalization/multilingualism and
> the Web?  Can Humanities people help aid the process, even if many of them
> are not technically oriented?

A very important matter here is the choice of language:

   (1) in the client
   (2) in the documents

For (1) we must urge the client developers to internationalize
their programs, preferably through the standardized "i18n"
methods. Some work is being done for at least Mosaic (in
Germany and in Sweden), but apparently not in cooperation with
the development team, with all the disadvantages that entails.

Concerning (2), I have looked at the plans for future HTML and
HTTP and found (in
<URL:http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTRQ_Headers.html>)
that an HTTP request can contain

    Accept-Language: <list>

which is a list of "Language values which are preferable in the
response". In
<URL:http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Object_Headers.html>
the parallel specification in the "Object MetaInformation"
contained in the "header fields given with or in relation to
objects in HTTP" is given as

    Content-Language: <code>

This seems nice; I have just one comment:

Make both of these conform to the suggested
Content-Language header (draft-ietf-mailext-lang-tag-00.txt?),
with the semantic difference that the value of Accept-Language
is the user's priority list of desired languages.
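
To make the intended semantics concrete, here is a minimal sketch of
how a server might pick a document variant from such a priority
list. Everything in it is an assumption for illustration: the
function names parse_language_list and choose_language are
hypothetical, the list is taken to be comma-separated with the most
preferred language first, and tags are compared without regard to
case. The actual syntax is whatever the drafts end up specifying.

```python
def parse_language_list(header_value):
    """Split a comma-separated language list, keeping the user's order."""
    return [tag.strip().lower() for tag in header_value.split(",") if tag.strip()]


def choose_language(accept_language, available):
    """Return the first preferred language the server can supply.

    accept_language: value of the Accept-Language request field
                     (assumed here: comma-separated, best first).
    available:       Content-Language codes of the stored variants.
    Returns None when no preferred language is available.
    """
    by_lower = {code.lower(): code for code in available}
    for wanted in parse_language_list(accept_language):
        if wanted in by_lower:
            return by_lower[wanted]
    return None


if __name__ == "__main__":
    # A user who prefers Swedish but also reads English.
    print(choose_language("sv, en", ["en", "de"]))
```

With variants stored in English and German, the request above would
be answered with the English one; the response would then carry the
matching Content-Language value.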
---
Peter Svanberg, NADA, KTH		    Email: psv@nada.kth.se
Dept of Num An & CS,
Royal Inst of Tech			    Phone: +46 8 790 71 40
S-100 44  Stockholm, SWEDEN		    Fax:   +46 8 790 09 30
