[59] in java-interest
Guile with Java-related features
daemon@ATHENA.MIT.EDU (Tom Lord)
Mon May 8 12:22:11 1995
From: Tom Lord <lord@cygnus.com>
Date: Sun, 7 May 1995 22:29:27 -0700
To: java-interest@java.Eng.Sun.COM
Hi.
I am the coordinator of the Guile extension-language-library project
for GNU. I am also part of the GROW project at Cygnus Support -- a
project started partly from the idea that a free extensible web
browser that could download and run applications on the fly would make
an attractive investment for anyone building internet-related
infrastructure. From those two positions, I think I have an
interesting perspective on Sun's Java project.
I've enjoyed, been impressed by, and informed by the discussion on
this mailing list. I hope it will not be seen as out of place to
contribute news of how Java relates to the project's I'm working on.
This is a fairly long announcment, so I've broken it up into
several sections:
***
** Hacker's Java tools are included in the latest Guile snapshot.
*****
The latest snapshot of the Guile extension language library
can be found at:
ftp.cygnus.com:pub/lord/guile-ii.tar.gz
In the release is a subdirectory called 'latte'. It contains a number
of Scheme sources related to Java. These tools are not yet especially
documented or in a production state, but they are usable by anyone
willing to read the source. I think these tools and new Guile
features relating to them are interesting because they imply the
possibility of making a free Java or Java-like interpreter in a short
period of time, given appropriate resources.
More about these tools is explained below. Included are a macro
assembler, a disassembler, a class file parser/loader for the Java VM
(with some extensions and omissions), and the beginnings of a compiler
back-end for translating a C-like language into Java byte-codes.
The interpreter in the snapshot also includes a bytecode execution
engine. The language understood by the engine is mostly a subset of
the Java VM language. Not all Java VM instructions have been
implemented yet, and a few new instructions have been added. More
about this is explained below as well. For all instructions they have
in common, the two VMs use the same encoding. The two VMs have the
same register set and stack format.
The interpreter does not include a full implementation of the Java
object system. It does, however, include a low-level storage manager
for Java-like objects. (You can create Scheme-accessible objects with
some fields stored in native-scalar formats. These objects can also
have fast-access vtables that Scheme programs can design on-the-fly.
These facilities are useful for implementing higher-level object
systems like SOS/STklos (CLOS for Scheme) or Java.)
***
** Funny random performance numbers.
****
I implemented the byte-code interpreter as quickly as I could. My
goal was to make a pliable implementation that later on I can go back
and optimize and extend. Nevertheless, once I had the interpreter and
macro assembler, the temptation to make one benchmark was
irresistible.
I chose to use the fibonocci function for comparison. This leaves out
any measurement of object access or member function invocation.
This is a measure of iteration, local variable access, and small integer
math.
Please take these results with a grain of salt. They are accurate
to the best of my ability, but comparing languages for performance
is much more complex than just this one test.
I wrote the same function in four languages: C, Java VM macro
assembly, Scheme, and Tcl. The function takes two arguments: x and n.
It computes the xth fibonocci number, n times. To compensate for the
different time-scale of Tcl, I ran the Tcl version for 100 times fewer
iterations and multiplied the resulting time by 100. (I justified
that decision by observing that Tcl run-times varied linearly with the
number of iterations).
I did two trials, both with x=20.
N = 200000
% time ./reffib 20 200000
6765
0.48user [...]
C : Java : Scheme : Tcl
1 : 12 : 200 : 2204553
1 : 17 : 73029
1 : 12025
N = 100000
% time ./reffib 20 200000
6765
0.24user [.,.]
C : Java : Scheme : Tcl
1 : 12 : 195 : 2328566
1 : 16 : 75726
1 : 11926
n.b. "Java" refers to a hand-written macro-assembly
program running on the Guile bytecode interpreter.
This has no relation to the Sun interpreter other
than the external bytecode format and largely-the-same
instruction sets.
In other words, the Java byte-code interpreter ran the code at 1/12
the speed of C. Interpreted Scheme was 1/16 or 1/17 as bytecodes.
Tcl is pretty much off this particular map.
The C, Java, and Tcl versions of the function compute precisely the
same answer for all possible inputs. The Scheme function is slower
than Java, but more general. For example:
>> (reference-fib-loop 500 1)
;Evaluation took 10 mSec (0 in scm_gc) 2982 cells work, 11215 bytes other
139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125
***
** The snapshot includes a Java macro-assembler and disassembler.
****
The latest release includes a macro assembler and a disassembler for
the bytecode engine. The assembler emits Java bytecodes and a
constant table.
The input to the assembler is a Scheme list. This is (perhaps)
slightly less convenient than traditional assembler to write by hand,
but it is certainly easier to generate from a program than traditional
syntax.
Here is an example of its use:
;; Call the assembler and save the result
;; in a variable:
;;
>>(set! byte-code
(li-assemble '(
(decl x (ilocal 0))
(decl loop-count (ilocal 1))
(decl tn-2 (ilocal 2))
(decl tn-1 (ilocal 3))
(decl tn (ilocal 4))
(decl saved-x (ilocal 5))
(load x)
(store saved-x)
(iinc saved-x -2)
main-loop
(load 0)
(load loop-count)
[... much code omitted...]
) #f #f))
;; The assembled code:
;;
>> byte-code
"^Z6^E\204^E\376^C..."
;; And it can be disassembled. Notice that macros
;; like "(load x)" have been expanded into the most
;; efficient available opcodes, such as "(iload_0)":
;;
>> (pretty-print (li-disasm byte-code))
((0 iload_0)
(1 istore 5)
(3 iinc 5 -2)
(6 iconst_0)
(7 iload_1)
[...])
The assembler source code includes a complete table of Java VM
instructions mapping names to numeric opcodes and to a description of
the kinds of operands the instruction uses. I've found it very
convenient to use this data structure to generate C code. If your
project involves working with the Java instruction set -- I recommend
grabbing the assembler source and starting from that.
***
** A bytecode-compiler back-end is included in the snapshot.
****
For some time I've been experimenting with a C-like syntax for Scheme
called "ctax". Inspired by Java, I wrote a simple compiler back-end
for "ctax" (that is, syntax-tree->assembly). I added support
for "int" and "float" variables. Here is a typical use:
;; Here is some ctax for an unusual version of
;; fibonocci that takes its argument as a float.
;;
int
fib (float xp)
{
int x, tn, tn1, tn2;
x = xp;
for (x = x - 1; x; x = x - 1)
{
tn = tn1 + tn2;
tn2 = tn1;
tn1 = tn;
}
return tn;
}
;; This is a syntax-tree for that fibonocci
;; function.
;;
;; This syntax was generated by hand since the real
;; parser doesn't yet understand the "int" and "float" types
;; -- only the compiler back-end does.
;;
>>(define source '(fn int fib ((xp float)) ;; One float parameter.
;; Fib has four local variables:
;;
((tn2 int)
(tn1 int)
(tn int)
(x int))
;; Init some variables:
;;
((= tn2 (= tn1 (= tn 1)))
(= x xp)
;; Loop adding up pairs of fibonocci #s.
;;
(for (= x (- x 2))
x
(-= x 1)
(begin
(= tn (+ tn2 tn1))
(= tn2 tn1)
(= tn1 tn)))
;; All done.
;;
(return tn))))
;; Pretty-print the result of compiling the syntax-tree:
;;
>> (pp (cfn-compute-bytecodes (make-table) source))
(#(()) ;; Produced an empty constant table.
fib ;; Function name.
"(f)i" ;; Function signature.
5 ;; Number of locals.
2 ;; Amount of stack needed.
((decl x (ilocal 4))
(decl tn (ilocal 3))
(decl tn1 (ilocal 2))
(decl tn2 (ilocal 1))
(decl xp (flocal 0))
(load 1)
(dup)
(store tn)
(dup)
(store tn1)
(store tn2)
(load xp)
(f2i)
(store x)
[....]))
The ctax parser needs some touch-ups before the back-end will work on
syntax trees not made by hand.
The fibonocci function compiled above produced close to the code I had
previously written by hand. The differences could be eliminated by
means of a simple peephole optimizer.
***
** The snapshot includes a Java class file loader.
****
Guile includes a reader for the Java class file format. It is
available as the function "parse-latte" that takes a string argument
(the contents of a class file) and returns a vector describing the
class defined by the string. Typically, this is used as in
"(parse-latte (map-file class-file-name))". The file format
understood by the parser is that which is defined in an appendix of
the Java VM specification.
On top of that low-level reader, Guile includes a loader written in
Scheme. The loader keeps a cache of parsed files and an associated
search path. It can keep multiple caches with different paths
concurrently.
There are convenience functions that access the data returned by the
object file parser. For example, it is easy to construct a list of
all of the fields in an object type and their signatures. It is easy
to find the superclasses of a class.
Using the loader I was able to read many of the object files in the
alpha snapshot. (I never found one I couldn't read -- but I didn't do
an exhaustive test.)
One adjunct to the loader is a function that computes the layout of a
Java object. It returns a word-by-word description of how memory
allocated to an object of a particular class should be used. The
description specifies which words contain scalars and which contain
object references.
***
** The snapshot includes a low level object system that can define new
** structure types on the fly.
****
I've made the Guile object system extensible in a new way. It is now
possible to define on-the-fly new structure types that include scalar
fields stored in a native format. These structure types are very good
as representations for Java objects.
Structures in Guile are represented by a two-word handle, by a type
object, and by some malloced data. For example, here is the format of
a structure with two integer fields. Programs manipulating the object
pass around pointers to the handle:
struct point
{
int x;
int y;
};
handle: malloced data:
________ ________ ________ ________
| o 1| o---|-------> | int x | int y |
---|---- -------- -------- --------
|
V
type object (also malloced):
_______ ________ ________ __
| <opaque> | vtable ...
------- -------- -------- --
The handle and x and y fields are private to a particular instance of
the type "point". The type data is shared between all instances. The
contents of the vtable are arbitrary and can be determined by
arbitrary Scheme code. The fields x and y are stored as 32 bit
integers.
***
** Comparing the Guile VM and Sun's Java VM.
*****
The Guile bytecode interpreter currently implements an instruction set
which is mostly a subset of the Java virtual machine. 64 bit
operations, array operations, object operations, and the invoke
instructions have been omitted. More math operations and some
operations related to Scheme data types have been added.
I'm sure that Guile could be made into a 100% compatible superset of
Java. That is a tractable problem that could be solved relatively
quickly given appropriate resources. The result would be a system
programmable in both Java and Scheme (and other languages as well).
The world of Scheme objects would appear in Java as a new built-in
final subclass of Object. Java objects and functions would be
accessible from Scheme as new types of Scheme objects. Java would
gain access to the dynamic capabilities of Scheme and to some handy
libraries of Scheme code. Scheme would gain access to the performance
of Java and to some handy libraries of Java code.
With reference to some recent conversation about reflection, Java
programs are already reflected into Scheme by the loader. Since all
Scheme objects can be made available in Java as a built-in class, Java
programs could also optionally be permitted to reflect on the loader
data -- even to generate new code on the fly -- without loss of
safety.
Alas, I'm not currently motivated to embark on the particular task of
true Java compatibility. At the moment, I would have nothing to gain.
I don't know of any free software that runs on the Java VM -- so there
is little to gain from that perspective. I don't currently know of
anyone who would support the development of a free, license-less Java
so there is little to gain from that perspective either. If free
Java software becomes available, or if I find people who will support
the development of license-less Java, my motivations may change.
If a volunteer wants to work on 100% compatibility, that could get the
job done as well.
Feedback is especially welcome on these points. What is your interest
in a free (e.g. license-less) Java-compatible language? Would you
find such a thing valuable?
***
** Alternatives or Extensions to the Java Language.
****
Some features of ANSI-C, notably pointers, are missing from Java. Is
this is a misfeature? A technique exists for the efficient, safe
execution of C code that uses pointers (essentially, a fast
bit-masking operation is injected before every pointer indirection in
the program).
Is there interest in a Java virtual machine extended so as to include
safe pointers? How about an ANSI-C compiler for this extended VM?
***
** How fast?
****
The new bytecode execution engine in Guile is by no means optimized.
It is only a few weeks old and was constructed with swift development
in mind.
Several optimizations are now possible. The interpreter can be sped
up. A portable bytecode->native-code translator is also possible,
using the features of gcc. A Java or ctax->C compiler is also
possible.
I wonder what performance goals people have in mind for their
projects, at this stage. Feedback is welcome!
***
** Web Browser Plans
****
The GNU project has set the goal of providing the world with a free
software web browser. The plan is to start from already free sources
(specificly TkWWW), to convert the program to Scheme, and to upgrade
the user interface to use the features of the latest version of Tk.
The Guile snapshot announced in this message contains our (slightly)
modified version of Tk and a Scheme-based, wish-like program built
using Tk and Guile. There are a few demos of this Tk binding in
action.
I am personally interested in making sure that the GNU browser is
"applette-capable", in the spirit of HotJava. So, I am wondering
whether there is any interest in alternative security models to that
imposed by Java?
For example, Java allows one to run programs without necessarily
trusting the author of the code or the transmission line over which
code is sent. It may be easy to use a conventional language, such as
C in much the same way provided that you trust the program's author,
even if you do not trust the transmission line. (It may be possible
to support C regardless, but it appears to be easy under those
specific conditions of trust.) Are applettes-in-C a valuable feature
to consider implementing? How could your project use them?
Which features attract you to Java? What new features would you like
to see? Which are potentially valuable to you in a free interpreter?
- Portable implementation?
- Fast implementation?
- More like ANSI C?
- Web browser features?
- Features for shipping code over the net?
- The Java language specificly?
- Any sufficiently "C-like" language?
- Objects?
- The AWT library?
- Any portable graphics library?
- Freedom from licensing requirements or restrictions
against "commercial use"?
- Support for other languages? Which?
- Interactive programming features?
- Embedded use?
- Real-time features?
- Support for concurrency?
I am interested not only in "Yes that feature is valuable...", but
"Here is what I need that feature for...". Off-line feedback
is welcome! My address is "lord@cygnus.com".
-t
-
Note to Sun employees: this is an EXTERNAL mailing list!
Info: send 'help' to java-interest-request@java.sun.com