Date: Fri, 30 Sep 1994 00:57:33 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: jgrass@CNRI.Reston.VA.US
From: "Judith E. Grass" <jgrass@CNRI.Reston.VA.US>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Transliteration

Some languages are relatively easy to transliterate: e.g. Russian.

Some are really, really difficult: Japanese written in Kanji, where there is no simple one-to-one correspondence between a single character and the transliteration. To transliterate you need context, and you might even need to actually understand what is written in order to disambiguate some cases... and maybe even to put any word breaks into the transcription, since Japanese doesn't usually have any (and doesn't need any). Japanese is readable in English transcription, and for those gaijin who don't know a whole lot of Kanji, it can be easier. Once your kanji vocabulary is big enough, the kanji are actually easier to deal with.

BUT: transcription systems even for something like Russian are not simple. For English speakers there are at least three different systems in common use, and the choice depends pretty much on use:

  - Library of Congress system
  - Slavic Linguists system (lots of hacheks and diacritics)
  - Third system whose name escapes me (no diacritics, less precise than the ones above)

A fourth system has also appeared, based on an encoding in common use in the former USSR (I think this is KOI-8, but it may be one of the variants of it). It pretty much transliterates Cyrillic characters by chopping off the 8th bit of each character, which yields a rough ASCII transliteration that can be read off the screen directly and is perfectly understandable and useful. More than a little bit of email gets transmitted this way, and some of it never gets transliterated back into Cyrillic.
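To make the "chop off the 8th bit" trick concrete, here is a minimal sketch in Python. It assumes the encoding in question is KOI8-R (the post itself is not sure which KOI-8 variant is meant), and the function name strip_high_bit is made up for illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_r") -> str:
    """Encode text in a KOI-8 variant, clear bit 8 of every byte,
    and read the result back as 7-bit ASCII.

    Assumption: KOI8-R is the variant in use; it is one of the
    encodings shipped with Python's standard codecs.
    """
    raw = text.encode(encoding)
    # Clearing the top bit maps each Cyrillic letter onto the Latin
    # letter sharing its low seven bits; plain ASCII bytes are unchanged.
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("привет"))  # PRIWET
    print(strip_high_bit("Мир"))     # mIR
```

Note that the case comes out inverted (lowercase Cyrillic lands on uppercase Latin), and that the result is lossy: once the bit is gone, nothing in the text says whether a given letter was originally Latin or Cyrillic, which is presumably part of why some mail never gets converted back.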

A second thing to understand about transcription, beyond the fact that there may be multiple systems within a language, is that the transcription system is frequently different for different target languages. The reason you listen to ballets with music by "Tchaikowsky" is that his name came to us via the French. If it had come via some English-speaking Slavic scholar, it might have been "Chajkovskij" (maybe "C-hachek" rather than "Ch", though).

For a real good time, look at Korean: a fascinating writing system that I suspect would give a machine transliteration system a run for its money, although this is way out of the range of languages that I have studied.

An additional related point: the English-Arabic dictionary is one thing, but how about Armenian-Russian or Arabic-Chinese? One objection I have heard from the Russians and Balts I have spoken to is that even the attempts to standardize on expanded character sets have tended to ignore THESE kinds of mixtures, showing a kind of Western Europe fixation that does not solve THEIR problems.

-- 
Judy Grass, CNRI resident ex-slavicist