[5846] in www-talk@info.cern.ch


Re: ISO charsets; Unicode

daemon@ATHENA.MIT.EDU (Jeff Smith)
Tue Sep 27 13:40:48 1994

Date: Tue, 27 Sep 1994 06:16:35 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: sumisu@slab.ntt.jp
From: Jeff Smith <sumisu@slab.ntt.jp>
To: Multiple recipients of list <www-talk@www0.cern.ch>

If you haven't noticed, Motif doesn't allow the mixing of character
sets in a single text widget - it takes more than a hack of the client
to display multiple character sets (e.g. Hebrew, Greek, Japanese) on
the same "page."

The only way to do this - I haven't tried - would be to use Mule.

js

 |>In article <8899@cernvm.cern.ch> you write:
 |>
 |>|>Has a formal mechanism been considered for specifying various popular
 |>|>coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps
 |>|>off in the future) Unicode?
 |>
 |>Yes, it is a parameter to the text/xxx content type:-
 |>
 |>text/html; charset=ISO8859-7
 |>
 |>Or some such stuff.
 |>
 |>
 |>|>The motivation for this question is essentially this:  Several really
 |>|>exciting developments are being stymied by the Web's largely ASCII/
 |>|>English-only focus.  As I discussed privately with several readers of
 |>|>this forum, there is, for example, a project afoot (nearly complete)
 |>|>to create a full lexicon and concordance of the Dead Sea Scrolls.  I
 |>|>imagine a system where users can look up words, and view the original
 |>|>scrolls as inlined images.  The problem is that the DSS are written
 |>|>in Greek, Aramaic, and Hebrew. 
 |>
 |>This is a Mosaic problem, not a WWW problem. Mosaic can handle multiple
 |>fonts but only one charset. At least one TBA browser supports mixed
 |>character set documents. HTML/3.0 is better here as well.
 |>
 |>
 |>|> Specially hacked clients are only just
 |>|>recently arriving that can do Japanese and a few other languages.  No
 |>|>general solution exists.  And (perhaps most importantly) there is no-
 |>|>thing in the HTML(+) descriptions that allows one to specify when text
 |>|>in one language ends and text in another begins, or to specify what
 |>|>encoding system is being used for either.  The few hacked clients I've
 |>|>seen also are not really geared for display of arbitrary languages.
 |>
 |>Hacked versions for any particular language are easy to come by. There
 |>is no browser that can display English, Greek and Hebrew together at
 |>present. This will change. At some point the difficult question of 
 |>mixing left/right scanning languages will have to be tackled.
 |>
 |>
 |>|>The DSS project isn't the only one that appears stymied.  There is a
 |>|>Cushitic etymological database (say that with a mouth full) at the U
 |>|>of Chicago that's machine readable, and comes replete with a standard
 |>|>interface.  The project head would be happy to plug it into the Web,
 |>|>but again the Web only knows ASCII.
 |>
 |>Here I suspect you need something quite a bit more sophisticated, which
 |>is at least 6 months off. You need a highly modular browser so you can drop
 |>your own module into it. That type of research tends to need highly
 |>specialised fonts and a lot more flexibility than first sight might imply.
 |>
 |>
 |>|>Other projects afoot are a comprehensive Aramaic dictionary.  Aramaic
 |>|>is the language of parts of the biblical book of Daniel and Ezra, and
 |>|>a stray verse in Jeremiah.  There is a huge corpus of early Christian
 |>|>literature written in it, as well as several fundamental Jewish docu-
 |>|>ments like the Talmud.
 |>
 |>Again for any ancient language I suspect you will need multiple character
 |>sets for different periods, different script styles etc. RFC-822 is pretty
 |>much the same in gothic or helvetica. But if you are discussing an ancient
 |>text, typeface questions can be very critical. This is especially so with
 |>cuneiform or hieroglyphic texts.
 |>
 |>
 |>|>Then, of course, there's the giant database project called ARTFL, which
 |>|>essentially attempts to make the entire French literary corpus availa-
 |>|>ble online.  It's already here, and tied to the Web.  But they have no
 |>|>standard specs for how to allow users to input things as simple as an
 |>|>acute accent over an "a".  They have an extremely competent staff to
 |>|>work on such problems - but I wonder:  Should this _be_ a problem?
 |>
 |>If the browser is a good one it should understand the accents in standard
 |>ISO code as well as entities. The entities are pretty much redundant
 |>for á, ö etc unless you have a deranged transport that is not 8 bit clean.
 |>
 |>|>				 	-> Richard L. Goerwitz
 |>|>					-> goer@mithra-orinst.uchicago.edu
 |>
 |>Well, since a large number of developers here have funny accents in their
 |>names, as I suspect your forebears would have done (Görwitz), extended Latin
 |>is pretty much catered for. Greek is essential for the maths and so will
 |>go in. Hebrew characters will probably arrive before mixing right/left
 |>scanning.
 |>
 |>If the 7bit mail transport strips off the accents then you might not
 |>understand some of the above bits...
 |>
 |>--
 |>Phillip M. Hallam-Baker
 |>
 |>Not Speaking for anyone else.
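
The closing remark about 7-bit mail transports can be made concrete. A transport that is not 8-bit clean typically clears the top bit of each byte, which turns accented ISO 8859-1 letters into unrelated ASCII ones. The sketch below is only an illustration under that assumption; the function name strip_high_bit and the sample strings are hypothetical, and the original encoding of the message is inferred rather than stated in the thread.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a transport that is not 8-bit clean by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


def through_7bit_transport(text: str) -> str:
    """Encode text as ISO 8859-1 (assumed), strip the high bit, and read it back as ASCII."""
    wire = text.encode("latin-1")
    return strip_high_bit(wire).decode("ascii")


if __name__ == "__main__":
    # o-umlaut is 0xF6 in ISO 8859-1; 0xF6 & 0x7F == 0x76, which is ASCII "v".
    print(through_7bit_transport("Görwitz"))
    # a-acute is 0xE1; 0xE1 & 0x7F == 0x61, which is ASCII "a".
    print(through_7bit_transport("á, ö"))
```

Plain ASCII text passes through unchanged, which is why only the accented letters are affected.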

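The charset parameter described in the quoted reply (text/html; charset=ISO8859-7) can be read from a Content-Type value and used to pick a decoder for the body. The following is a minimal sketch, assuming Python's standard email.message.Message for the parameter parsing; the helper names charset_of and decode_body, the us-ascii fallback, and the sample bytes are illustrative choices and do not come from the thread.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "us-ascii") -> str:
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns `default` when no charset parameter is present.
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode body bytes using the charset named in the Content-Type value."""
    return body.decode(charset_of(content_type))


if __name__ == "__main__":
    header = "text/html; charset=ISO8859-7"
    print(charset_of(header))
    # In ISO 8859-7, bytes 0xE1 0xE2 0xE3 are the Greek letters alpha, beta, gamma.
    print(decode_body(b"\xe1\xe2\xe3", header))
```

This only covers a single charset per document; it does not address mixing several character sets on one page, which is the harder problem raised in the thread.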