[5909] in www-talk@info.cern.ch


Re: FYI: Multilingual Encoding Discussion on www-talk@www0.cern.ch

daemon@ATHENA.MIT.EDU ((Frank Rojas ))
Thu Sep 29 01:35:07 1994

Date: Thu, 29 Sep 1994 06:32:30 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: fxrojas@nlsarch.austin.ibm.com
From: (Frank Rojas  ) <fxrojas@nlsarch.austin.ibm.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>


I received the following through the Unicode consortium mailing list.
I hate to jump into a discussion, but here are some comments based on 
my experience with Motif and X I18N development ... 

Since I am not signed onto this mailing list, you'll need to respond
directly...

 Frank Rojas                              
 AIX NLS Architecture                      VNET:    AUSTIN(FXROJAS)
 Advanced Workstation and System Division  Tie-line 678-8183  
 IBM, Mail 9652                            Phone:   (512) 838-8183
 Austin, TX 78758                          FAX:     (512) 838-3886
                                     AWD Net: fxrojas@nlsarch.austin.ibm.com

------------------

    ----- Begin Included Message -----

    Sender: www-talk@www0.cern.ch
    From:       Jeff Smith <sumisu@slab.ntt.jp>
    To: www-talk@www0.cern.ch
    Subject: Re: ISO charsets; Unicode
    To: Multiple recipients of list <www-talk@www0.cern.ch>


    If you haven't noticed, Motif doesn't allow the mixing of character
    sets in a single text widget - it takes more than a hack of the client
    to display multiple character sets (e.g. Hebrew, Greek, Japanese) on
    the same "page."

    The only way to do this - I haven't tried - would be to use Mule.

    js

This is not entirely true.  It depends on the localization of the
particular system.  AIX now provides a UTF-8 locale, and I believe 
Plan 9 is also doing something with UTF-8.

Actually, X11 Release 6 did include some support for UTF-8, but it was never
completed.  Finally, UTF-8 has been promoted by the X/Open and Uniforum
Joint Internationalization Working Group as the most portable means to 
support UCS on traditional XPG/POSIX systems....

And to prove that it is real...

On the recent AIX 4.1, we provide a UNIVERSAL locale that is based on 
UTF-8 (wchar_t = UCS-2)....  We are currently able to input and display
text, using the standard Motif 1.2 with our localization, for:

ISO8859-1,2,5,6,7,8,9
Japanese/Chinese/Korean
Hebrew/Arabic

This support was demonstrated at the last Unicode Workshop this month.

All of this uses standard Motif 1.2, which is internationalized such
that it can display in any locale.  In addition, we (AIX) provide over 50 
national locales that use the local code set of the territory.  All of this
actually works with the Common Desktop Environment.
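
To make the "UTF-8 locale, wchar_t = UCS-2" relationship concrete, here is a
minimal sketch in Python.  It is not the AIX implementation; the function
name utf8_to_ucs2 is made up for illustration.  It only shows the idea that
the multibyte form is UTF-8 (1 to 3 bytes per character in the 16-bit range)
while the wide-character form is one 16-bit UCS-2 value per character.

```python
def utf8_to_ucs2(data):
    """Decode UTF-8 bytes into a list of 16-bit UCS-2 code values.

    Hypothetical helper for illustration; characters that do not fit
    in 16 bits are rejected, since UCS-2 cannot represent them.
    """
    values = []
    for ch in data.decode("utf-8"):
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("U+%X does not fit in UCS-2" % code)
        values.append(code)
    return values


# Latin A, Greek alpha, Hebrew alef, and a Japanese kanji in one string.
sample = "A\u03b1\u05d0\u65e5"
encoded = sample.encode("utf-8")

# One wide value per character, regardless of script.
wide = utf8_to_ucs2(encoded)
print([hex(v) for v in wide])

# The multibyte length varies per character: 1 + 2 + 2 + 3 bytes here.
print(len(encoded))
```

The point of the sketch is that an application written against the wide
character type never needs to know which script a character belongs to.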

     |>In article <8899@cernvm.cern.ch> you write:
     |>
     |>|>Has a formal mechanism been considered for specifying various popular
     |>|>coding standards, such as ISO 8859-7, ISO 8859-8, etc., and (perhaps
     |>|>off in the future) Unicode?

UTF-8 seems to be the preferred vehicle... among X/Open members...

     |>|>The motivation for this question is essentially this:  Several really
     |>|>exciting developments are being stymied by the Web's largely ASCII/
     |>|>English-only focus.  As I discussed privately with several readers of
     |>|>this forum, there is, for example, a project afoot (nearly complete)
     |>|>to create a full lexicon and concordance of the Dead Sea Scrolls.  I
     |>|>imagine a system where users can look up words, and view the original
     |>|>scrolls as inlined images.  The problem is that the DSS are written
     |>|>in Greek, Aramaic, and Hebrew. 

This is what we built the UNIVERSAL locale for...

     |>This is a Mosaic problem, not a WWW problem. Mosaic can handle multiple
     |>fonts but only one charset. 

Using UTF-8 this should be sufficient.
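
As a rough illustration of why one charset is enough when that charset is
UTF-8, the sketch below (Python; the helper can_encode is a made-up name)
checks a string mixing Greek, Hebrew, and English against ISO 8859-7,
ISO 8859-8, and UTF-8.  The sample words are arbitrary.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


greek = "\u03bb\u03cc\u03b3\u03bf\u03c2"   # Greek word "logos"
hebrew = "\u05d3\u05d1\u05e8"              # Hebrew word "davar"
mixed = greek + " " + hebrew + " word"

for codec in ("iso8859_7", "iso8859_8", "utf-8"):
    print(codec, can_encode(mixed, codec))
```

Each regional code set covers its own script plus ASCII, so only UTF-8
accepts the mixed string.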

     |>At least one TBA browser supports mixed
     |>character set documents. HTML/3.0 is better here as well.

This would require bypassing the standard Motif localization and providing
your own localization (fonts, input methods, locale, etc...) along with 
the browser...  I think a better approach is to use the standard
Motif 1.2 internationalization APIs and depend on the localization provided
by the Motif implementation.

I know that the (COSE) CDE environment I18N is based on this and is
sufficient for their mail/edit/help/etc...

     |>|> Specially hacked clients are only just
     |>|>recently arriving that can do Japanese and a few other languages.  

I wonder if they are using standard Motif/X I18N functions?

     |>|>No general solution exists.  

I'd say that Motif/X I18N functions should meet the needs of regional 
documents.   

For multilingual documents, UTF-8 should be the preferred means...

     |>|>And (perhaps most importantly) there is nothing in the HTML(+)
     |>|>descriptions that allows one to specify when text
     |>|>in one language ends and text in another begins, or to specify what
     |>|>encoding system is being used for either.  

For display and input purposes this is not absolutely necessary.  We've
built a "universal input method" that allows the user to switch from one
language to another and to select characters using planes
of UCS...
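
One reason explicit language markers are not strictly needed for display:
in a universal code set each character already identifies its own script.
The sketch below (Python) splits a string into script runs by looking at
the first word of each character's Unicode name.  That name-based test is
a crude heuristic chosen for brevity, not how the AIX input method or
Motif works, and script_of and script_runs are made-up names.

```python
import unicodedata


def script_of(ch):
    """Guess a script label from the Unicode name; None for non-letters."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_runs(text):
    """Split text into (script, substring) runs.

    Spaces, digits, and punctuation are neutral and stay attached
    to the run they follow.
    """
    runs = []  # each entry is [script, substring]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            if runs[-1][0] is None:
                runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, part) for script, part in runs]


sample = "word \u03bb\u03cc\u03b3\u03bf\u03c2 \u05d3\u05d1\u05e8"
for script, part in script_runs(sample):
    print(script, repr(part))
```

This says nothing about language (as opposed to script), which is where
explicit markup would still help.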

     |>|>The DSS project isn't the only one that appears stymied.  There is a
     |>|>Cushitic etymological database (say that with a mouth full) at the U
     |>|>of Chicago that's machine readable, and comes replete with a standard
     |>|>interface.  The project head would be happy to plug it into the Web,
     |>|>but again the Web only knows ASCII.
     |>
     |>Here I suspect you need something quite a bit more sophisticated and 
     |>which is at least 6 months off. 

Just get a browser that uses the standard Motif 1.2 and X11 Release 5 
interfaces, and that should be enough for the time being...  

This will meet the requirements for documents shared in a regional 
environment.

Then, for multilingual documents, put the requirement for a UTF-8
localization on the Motif suppliers ... actually, we should also ask the 
X Consortium to finalize the UTF-8 localization in X11 Release 6 ...

     |> You need a highly modular browser and drop in your
     |>own module into it. That type of research tends to need highly specialised
     |>fonts and a lot more flexibility than first sight might imply.

True... such localization comes neither easily nor quickly...

