[2236] in java-interest


Re: regular expressions in Java

daemon@ATHENA.MIT.EDU (Thomas Breuel)
Wed Sep 27 08:26:30 1995

Date: Wed, 27 Sep 1995 02:47:50 -0700
From: Thomas Breuel <tmb@best.com>
To: Glen.Perkins@NativeGuide.com, java-interest@java.Eng.Sun.COM

I don't see what the big problem is with a regular expression package
that works with Unicode.  Whether the characters come from an 8-bit
set or a 16-bit set shouldn't make much difference.  I have written
FSM code that deals with large character sets that were neither
ASCII nor Unicode, and there was nothing conceptually difficult
about it.
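To make the point about character width concrete, here is a minimal sketch (in today's Java, not 1995 Java) of a DFA whose transitions are labelled with character ranges instead of indexing a 256-entry table. The class RangeDfa and its methods are hypothetical; state 0 is assumed to be the start state, and the code works on 16-bit char values, so characters outside the Basic Multilingual Plane would need code points instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal DFA whose transitions are labelled with character ranges, not byte-table entries. */
public class RangeDfa {
    // One transition: any char c with lo <= c <= hi moves to state 'target'.
    private static final class Edge {
        final char lo;
        final char hi;
        final int target;

        Edge(char lo, char hi, int target) {
            this.lo = lo;
            this.hi = hi;
            this.target = target;
        }
    }

    private final List<List<Edge>> edges = new ArrayList<>();
    private final List<Boolean> accepting = new ArrayList<>();

    /** Adds a state and returns its index; the first state added (index 0) is the start state. */
    public int addState(boolean accept) {
        edges.add(new ArrayList<>());
        accepting.add(accept);
        return edges.size() - 1;
    }

    /** Adds a transition from 'from' to 'to' on every char in [lo, hi]. */
    public void addEdge(int from, char lo, char hi, int to) {
        edges.get(from).add(new Edge(lo, hi, to));
    }

    /** Runs the DFA over the whole input; true if it ends in an accepting state. */
    public boolean matches(String input) {
        if (edges.isEmpty()) return false;
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int next = -1;
            for (Edge e : edges.get(state)) {
                if (c >= e.lo && c <= e.hi) {
                    next = e.target;
                    break;
                }
            }
            if (next < 0) return false; // no transition for this char: reject
            state = next;
        }
        return accepting.get(state);
    }

    public static void main(String[] args) {
        // Accepts one or more "letters": ASCII a-z or Cyrillic U+0430..U+044F.
        RangeDfa dfa = new RangeDfa();
        int start = dfa.addState(false);
        int word = dfa.addState(true);
        for (int s : new int[] {start, word}) {
            dfa.addEdge(s, 'a', 'z', word);
            dfa.addEdge(s, '\u0430', '\u044F', word);
        }
        System.out.println(dfa.matches("hello"));                                 // true
        System.out.println(dfa.matches("\u043F\u0440\u0438\u0432\u0435\u0442")); // true (Cyrillic)
        System.out.println(dfa.matches("hello1"));                                // false
    }
}
```

The matching loop is identical whether the ranges are ASCII or Cyrillic; only the edge labels change.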

For starters, it would be nice just to see an RE package that does
what Perl regular expressions do, for ordinary ASCII characters.
That is what most people (in the US and elsewhere) need.  You can
make a fairly arbitrary choice about how Russian or Japanese
characters are treated by default without offending anybody
or causing havoc.

Further support for Unicode would probably consist mostly of adding
new character classes (e.g., "Japanese character", "Kanji",
"Cyrillic", and "Katakana") and of adding mode changes that affect
the interpretation of constructs like "digit", "word constituent",
"complement", etc., as well as the treatment of characters that are
really ligatures but for one reason or another made their way into
Unicode (e.g., you may want to treat German "a-umlaut" as an "ae"
ligature in some contexts).  But I think those constructs can be
added as the need arises, and there is no need to delay implementing
the basic functionality until all those issues have been worked out.
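Java had no regex package when this was written; java.util.regex arrived later (Java 1.4), and since Java 7 it supports roughly the kind of script classes and mode switches described above. The following is only an illustration using that later API, not something proposed in this thread: \p{IsCyrillic} and \p{IsKatakana} are script classes, and the UNICODE_CHARACTER_CLASS flag changes what \d matches.

```java
import java.util.regex.Pattern;

public class UnicodeClassesDemo {
    public static void main(String[] args) {
        // Script classes, comparable to the "Cyrillic" and "Katakana" classes in the post.
        Pattern cyrillic = Pattern.compile("\\p{IsCyrillic}+");
        Pattern katakana = Pattern.compile("\\p{IsKatakana}+");

        // Mode switch: by default \d is [0-9]; UNICODE_CHARACTER_CLASS makes it any Unicode decimal digit.
        Pattern asciiDigits = Pattern.compile("\\d+");
        Pattern unicodeDigits = Pattern.compile("\\d+", Pattern.UNICODE_CHARACTER_CLASS);

        String russian = "\u043C\u0438\u0440";            // Cyrillic "mir"
        String katakanaWord = "\u30AB\u30BF\u30AB\u30CA"; // the word "katakana" written in Katakana
        String arabicIndic = "\u0661\u0662\u0663";        // Arabic-Indic digits one, two, three

        System.out.println(cyrillic.matcher(russian).matches());          // true
        System.out.println(cyrillic.matcher("mir").matches());            // false (Latin letters)
        System.out.println(katakana.matcher(katakanaWord).matches());     // true
        System.out.println(asciiDigits.matcher(arabicIndic).matches());   // false
        System.out.println(unicodeDigits.matcher(arabicIndic).matches()); // true
    }
}
```

Note that the ASCII-only default for \d is kept, which matches the suggestion that a basic, Perl-like default can come first and Unicode-aware modes can be switched on later. The "a-umlaut as ae" treatment has no direct counterpart in this API and is not shown.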

In fact, you could make the RE matching class extensible so that
users can add their own constructs, mode switches, and character
classes to it.
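A sketch of the extensibility idea, assuming a tiny hypothetical pattern syntax: a literal character, a named class written {name}, and a postfix * for zero or more repetitions, with the whole input required to match. The class ExtensibleMatcher and its defineClass method are made up for illustration; the point is only that named classes live in a table users can extend, and the backtracking matcher never needs to know which script a class covers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

public class ExtensibleMatcher {
    // Registry of named character classes; users may add their own via defineClass.
    private final Map<String, IntPredicate> classes = new HashMap<>();

    public ExtensibleMatcher() {
        classes.put("digit", c -> c >= '0' && c <= '9');
        classes.put("alpha", c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    }

    /** Registers (or replaces) a named class usable as {name} in patterns. */
    public void defineClass(String name, IntPredicate test) {
        classes.put(name, test);
    }

    // One compiled pattern element: a test on a single char, optionally repeated by '*'.
    private static final class Element {
        final IntPredicate test;
        final boolean star;

        Element(IntPredicate test, boolean star) {
            this.test = test;
            this.star = star;
        }
    }

    private List<Element> compile(String pattern) {
        List<Element> out = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            IntPredicate test;
            char c = pattern.charAt(i);
            if (c == '{') {
                int end = pattern.indexOf('}', i);
                if (end < 0) throw new IllegalArgumentException("unclosed {");
                String name = pattern.substring(i + 1, end);
                test = classes.get(name);
                if (test == null) throw new IllegalArgumentException("unknown class: " + name);
                i = end + 1;
            } else {
                final char lit = c; // literal character
                test = x -> x == lit;
                i++;
            }
            boolean star = i < pattern.length() && pattern.charAt(i) == '*';
            if (star) i++;
            out.add(new Element(test, star));
        }
        return out;
    }

    /** True if the entire text matches the pattern. */
    public boolean matches(String pattern, String text) {
        return matchHere(compile(pattern), 0, text, 0);
    }

    // Backtracking: each call either advances in the pattern or consumes one char.
    private boolean matchHere(List<Element> els, int e, String text, int t) {
        if (e == els.size()) return t == text.length();
        Element el = els.get(e);
        boolean charOk = t < text.length() && el.test.test(text.charAt(t));
        if (el.star) {
            // Try zero more repetitions first, then consume one char and stay on this element.
            return matchHere(els, e + 1, text, t) || (charOk && matchHere(els, e, text, t + 1));
        }
        return charOk && matchHere(els, e + 1, text, t + 1);
    }

    public static void main(String[] args) {
        ExtensibleMatcher m = new ExtensibleMatcher();
        // User-defined class: the Unicode Katakana block, U+30A0..U+30FF.
        m.defineClass("katakana", c -> c >= 0x30A0 && c <= 0x30FF);
        System.out.println(m.matches("{alpha}{alpha}*{digit}*", "abc123"));    // true
        System.out.println(m.matches("{katakana}*-{digit}", "\u30AB\u30BF-7")); // true
        System.out.println(m.matches("{digit}*", "12a"));                       // false
    }
}
```

Mode switches could be handled the same way, for example by swapping in a different registry, but that is left out of this sketch.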

				Thomas.

PS: I have studied German, English, French, Japanese, Russian,
Arabic, and Latin, and have some experience with other languages
and scripts, so I do have some basis for making these comments.
-
Note to Sun employees: this is an EXTERNAL mailing list!
Info: send 'help' to java-interest-request@java.sun.com
