Date: Mon, 26 Sep 1994 16:02:50 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: goer@midway.uchicago.edu
From: "Richard L. Goerwitz" <goer@midway.uchicago.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>

Has a formal mechanism been considered for specifying various popular coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps off in the future) Unicode? It might be possible to use SGML entities for every conceivable character in every conceivable language, but as a practical solution to a current problem, this seems difficult at best.

The motivation for this question is essentially this: Several really exciting developments are being stymied by the Web's largely ASCII/English-only focus. As I discussed privately with several readers of this forum, there is, for example, a project afoot (nearly complete) to create a full lexicon and concordance of the Dead Sea Scrolls. I imagine a system where users can look up words, and view the original scrolls as inlined images.

The problem is that the DSS are written in Greek, Aramaic, and Hebrew. Specially hacked clients that can do Japanese and a few other languages are only just arriving. No general solution exists. And (perhaps most importantly) there is nothing in the HTML(+) descriptions that allows one to specify when text in one language ends and text in another begins, or to specify what encoding system is being used for either. The few hacked clients I've seen also are not really geared for display of arbitrary languages.

The DSS project isn't the only one that appears stymied. There is a Cushitic etymological database (say that with a mouth full) at the U of Chicago that's machine readable, and comes replete with a standard interface. The project head would be happy to plug it into the Web, but again the Web only knows ASCII.

Other projects afoot include a comprehensive Aramaic dictionary.
Aramaic is the language of parts of the biblical books of Daniel and Ezra, and a stray verse in Jeremiah. There is a huge corpus of early Christian literature written in it, as well as several fundamental Jewish documents like the Talmud.

Then, of course, there's the giant database project called ARTFL, which essentially attempts to make the entire French literary corpus available online. It's already here, and tied to the Web. But they have no standard specs for how to allow users to input things as simple as an acute accent over an "a". They have an extremely competent staff to work on such problems - but I wonder: Should this _be_ a problem?

I suppose I shouldn't bend anyone's ears any longer. Suffice it to say that there are many, many projects being worked on, and many people working on them. A lot of them simply won't be enhancing the Web in the near future because the Web isn't (yet) really world-wide (in a cultural or linguistic sense).

Always wanting to bring disciplines together, I'm led to ask, then: What ideas have been floated along the lines of making the Web more all-encompassing, linguistically speaking? Are there any practical solutions the folks mentioned above could be working on now? Where should I direct people who have questions about internationalization/multilingualism and the Web? Can Humanities people aid the process, even if many of them are not technically oriented?

-> Richard L. Goerwitz
-> goer@mithra-orinst.uchicago.edu