[2399] in java-interest
Java Numeric Confusion
daemon@ATHENA.MIT.EDU (Ken Dickey)
Sat Sep 30 09:02:18 1995
Date: Thu, 28 Sep 1995 12:41:11 -0700
To: java-interest@java@Sun.COM
From: KenD@apple.com (Ken Dickey)
Sung to the tune of "I don't understand the Parisians":
I don't understand Java Numerics,
Infs and NaNs every time they get the chance!
I don't understand Java Numerics--
have to wrap them as objects in advance!
But they seem to love them
and think nothing of them.
I don't understand Java Numerics!
I am really confused about Java numerics. I think I understand how Java's
numerics work. What I don't understand is WHY they were designed this way.
Perhaps some kind soul out there can help me out.
BSILFD (Byte Short Int Long Float Double): I understand why they are in C,
but I don't understand why they are in Java. Now I must admit that I have
done my 100+ KLOC of C, but I work every day using languages which have
integers, rationals (fractions, e.g. 1/3), real and complex numbers. To my
eye -3, 57, 1+0i and 642674746462346280438940238 are all fine integers.
I have come up with some theories as to why Java uses BSILFD, but on closer
examination none of them seem to be compelling.
>>First, some NON-REASONS:
* Safety
IEEE float arithmetic gives me the wrong answer fast (answers with double
precision but zero accuracy). In addition, Java's non-stop arithmetic gives
me no indication of various numeric errors--from which I might have
recovered.
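
A minimal sketch of the "no indication" complaint, assuming a current JDK.
The class and method names are hypothetical, and Math.addExact is a later
library addition rather than anything available when this was written:

```java
// Sketch only: names are illustrative, not taken from the post.
class SilentNumericErrors {
    // Integer overflow wraps around; no exception is raised.
    static int wrappedSum() {
        int max = Integer.MAX_VALUE;
        return max + 1; // wraps to Integer.MIN_VALUE
    }

    // Floating-point division by zero yields Infinity, not an error.
    static double divideByZero() {
        double zero = 0.0;
        return 1.0 / zero;
    }

    // 0.0 / 0.0 yields NaN, which propagates through later arithmetic.
    static double poisoned() {
        double zero = 0.0;
        double nan = zero / zero;
        return nan * 2.0 + 1.0;
    }

    // Checking is opt-in: Math.addExact throws ArithmeticException on overflow.
    static boolean checkedAddThrows() {
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappedSum());
        System.out.println(divideByZero());
        System.out.println(poisoned());
        System.out.println(checkedAddThrows());
    }
}
```

The first three methods complete normally, so the caller only learns that
something went wrong by inspecting the result.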
* Regularity
Does the compiler let me know that I should not use a byte (or short) as a
loop counter? I see where adding 1 coerces to an int. If I use the loop
index as a method argument, is it a byte or an int? If one goal for Java
is simplicity in learning, why do I need to know the storage size of
numbers (as opposed to their accuracy)? Why even write numeric methods
which dispatch differently based on BSIL or FD? If Java is a simple
language, why do I care what the representation length is? If I do care,
why would I use Java's numbers? (more on this below)
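
The byte-versus-int questions above can be sketched as follows. This is an
illustration only; PromotionDemo and its overloaded kind() methods are
hypothetical names introduced for the example:

```java
// Sketch only: shows how storage size leaks into dispatch and arithmetic.
class PromotionDemo {
    static String kind(byte b) { return "byte"; }
    static String kind(int i)  { return "int"; }

    // Passing the byte variable itself selects the byte overload.
    static String kindOfCounter() {
        byte b = 1;
        return kind(b);
    }

    // b + 1 is promoted to int, so the int overload is selected.
    // (Writing "byte c = b + 1;" would not compile without a cast.)
    static String kindOfCounterPlusOne() {
        byte b = 1;
        return kind(b + 1);
    }

    // b++ is allowed on a byte, but silently narrows the result:
    // incrementing 127 wraps to -128, so a byte loop counter
    // compared against a limit above 127 never reaches it.
    static int incrementPastMax() {
        byte b = Byte.MAX_VALUE;
        b++;
        return b;
    }

    public static void main(String[] args) {
        System.out.println(kindOfCounter());
        System.out.println(kindOfCounterPlusOne());
        System.out.println(incrementPastMax());
    }
}
```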
I have to wrap each number in order to use it as an object [e.g. Long(37)].
In most systems I use, numbers ARE objects. Why have two kinds of things,
Numbers and Objects, when one will do?
I find all these special cases confusing.
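
A small sketch of the two-kinds-of-things point, assuming a current JDK
(generics and Long.valueOf are later additions; WrappingDemo is a
hypothetical name). Later Java versions can insert the wrapping
automatically, but the primitive and the wrapper object remain distinct
types:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a primitive long must be wrapped to live among objects.
class WrappingDemo {
    static List<Object> mixed() {
        List<Object> items = new ArrayList<>();
        long n = 37L;
        items.add(Long.valueOf(n)); // explicit wrapper object
        items.add("a string");      // already an object
        return items;
    }

    // Getting the number back out needs a cast and an unwrap.
    static long unwrapFirst(List<Object> items) {
        return ((Long) items.get(0)).longValue();
    }

    public static void main(String[] args) {
        System.out.println(unwrapFirst(mixed()));
    }
}
```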
* GC Speed & Safety
Java currently uses a conservative collector. It should not have to. [E.g.
stack markers, stack descriptors, or a separate "binary" values stack could
be used (foreign values are either integers or string 'blobs')]. Being
conservative costs me something in gc speed and I suspect complicates
multithreaded gc in that stricter invariants must be maintained, as care
must be taken that separate Java runtime spaces don't interact when
pseudo-objects are marked. I would think that having all numbers be
objects would reduce work and gain me some gc speed.
* Portability
Other implementations of numbers and numeric system runtimes are already
portable, so I don't see a difference here.
>>Now some POSSIBLE REASONS to support BSILFD:
* Calculation Speed
Java uses a bytecode VM. I already get reasonable numeric performance with
bytecode VMs which use bignums and rationals. I don't expect that Java
numeric ops are significantly faster. I can get a CPU with a higher clock
rate or work on making the implementation better if it makes a real
difference. For major calculations I can make out-of-line (foreign) calls
for speed. Since I can't trust Java's numerics, if I do any serious
numeric calculation, I am going to call my own code, some library, or a
system such as Maple or Mathematica, which gives me control of numeric
speed/accuracy tradeoffs. Someone will have to show me how I can get a
qualitative gain here in Java.
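
For comparison, here is a rough sketch of exact rational arithmetic built
on java.math.BigInteger. The Rational class is hypothetical, written only
for this illustration, and is not a claim about how any particular Scheme
or Smalltalk runtime implements its numbers:

```java
import java.math.BigInteger;

// Sketch only: an exact fraction kept in lowest terms.
final class Rational {
    final BigInteger num;
    final BigInteger den;

    Rational(BigInteger n, BigInteger d) {
        if (d.signum() == 0) throw new ArithmeticException("zero denominator");
        if (d.signum() < 0) { n = n.negate(); d = d.negate(); }
        BigInteger g = n.gcd(d); // non-zero because d is non-zero
        num = n.divide(g);
        den = d.divide(g);
    }

    static Rational of(long n, long d) {
        return new Rational(BigInteger.valueOf(n), BigInteger.valueOf(d));
    }

    Rational plus(Rational o) {
        return new Rational(num.multiply(o.den).add(o.num.multiply(den)),
                            den.multiply(o.den));
    }

    @Override public String toString() {
        return den.equals(BigInteger.ONE) ? num.toString() : num + "/" + den;
    }

    public static void main(String[] args) {
        // Exact: 1/10 + 2/10 is 3/10.
        System.out.println(of(1, 10).plus(of(2, 10)));
        // Binary floating point: this comparison is false.
        System.out.println(0.1 + 0.2 == 0.3);
        // An integer far too large for a long is still just an integer.
        System.out.println(new BigInteger("642674746462346280438940238")
                .add(BigInteger.ONE));
    }
}
```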
* Space Savings
Small integers are represented as immediate (unboxed) values. I really
have a hard time believing that one saves a significant percentage of RAM
in storing bytes and shorts instead of ints. Given "the wrong answer fast"
how much numeric data is Java going to process? If not much, then the
space hit is small. If there is a lot of computation to be done, then it
probably will be done by out-of-line calls to some foreign processing
engine, so again the percentage of RAM saved is probably small.
Similar reasoning holds for "unboxed floats". Having a few variables
hold floats is not much of a space hit and arrays of floats can be stored
either boxed or unboxed as convenient to the runtime (as array elements are
all of the same type). Treating numbers in the same way as (other) objects
here would seem to be a compiler and runtime simplification--fewer special
cases to test.
* Foreign (C) calls are easier because less argument checking & coercion is
required.
[I have not yet looked at the Java foreign function (C) interface, but have
done work with foreign function interfaces in Lisp, Scheme, and Smalltalk].
Most calls to C pass byte, short, or int values which are typically passed
encoded as an immediate small-integer fixnum, which has to be shifted (1
cycle in most RISC CPUs, i.e. free), so frequently a range check must be
used here in addition to any check for a numeric type. Because immediate
values are tagged, short integers are usually 30 bit quantities, so if more
than 30 bits are used then a bignum must be converted into a 32 bit C int
(note that boxed values are unshifted and can be just copied onto the C
stack or left in a register). I have never found this to be a qualitative
problem in practice but have not benchmarked the problem, so someone could
convince me that it makes a difference. I think that the other costs of
making a foreign call and the safety of "fixnum overflow into bignum" make
the speed argument seem pretty weak.
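
The tagging scheme described above can be sketched roughly as follows. The
layout is an assumption made for illustration (two low tag bits, both zero
for a fixnum, leaving 30 bits of signed value); real runtimes differ, and
TaggedFixnum is a hypothetical name:

```java
import java.math.BigInteger;

// Sketch only: 30-bit immediate integers in a 32-bit word, with
// overflow promoted to a bignum instead of wrapping.
class TaggedFixnum {
    static final int TAG_BITS = 2;
    static final int FIXNUM_MIN = -(1 << 29);
    static final int FIXNUM_MAX = (1 << 29) - 1;

    // The range check mentioned in the text.
    static boolean fitsFixnum(long v) {
        return v >= FIXNUM_MIN && v <= FIXNUM_MAX;
    }

    // Encode: shift left so the low tag bits are zero.
    static int encode(int v) {
        if (!fitsFixnum(v)) throw new ArithmeticException("needs a bignum");
        return v << TAG_BITS;
    }

    // Decode: one arithmetic shift recovers the plain C-style int.
    static int decode(int word) {
        return word >> TAG_BITS;
    }

    // Add two tagged words. The result is an Integer holding a tagged
    // word when it fits, otherwise a BigInteger holding the exact sum.
    static Object add(int a, int b) {
        long sum = (long) decode(a) + (long) decode(b);
        if (fitsFixnum(sum)) return Integer.valueOf(encode((int) sum));
        return BigInteger.valueOf(sum);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(-5)));
        System.out.println(add(encode(2), encode(3)));
        System.out.println(add(encode(FIXNUM_MAX), encode(1)));
    }
}
```

The per-call cost in this sketch is one shift plus one comparison pair,
which is the overhead being weighed against the other costs of a foreign
call.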
* The compiler/runtime is better/easier/safer
Having worked for many years with bytecode compilers and runtimes, it
appears to me that the extra testing and special case code work saved by
making numbers be regular objects in the language balances out with the
work required to do the compile and runtime numeric checks (especially as C
code is already available to port--see below). I am not convinced that an
easier implementation of unboxed floats is worth the irregularity of making
them non-objects.
In summary, I find the numeric portion of Java to be unsafe, irregular and
hence confusing and have a hard time seeing how it pays for
itself--particularly for Java users. I understand that this is late in the
game to suggest changes, but will note that Java has the fewest users it
will ever have. I think that revisiting the design choices in this area
could pay off in a safer, more easily learned and used system.
Again, I have had little time to do Java programming and perhaps some kind
person can show me the reasons I have missed that make BSILFD worthwhile.
Cheers,
-Ken
====================================================================
A couple of language implementations with reasonable numerics (in C) are
Gambit 2-2 and VSCM (bytecode), both of which implement the full Scheme
numeric tower--where an integer is a rational is a real is a complex [I'm
sure there are many other fine implementations to choose from]. Sources
are available from the Scheme Repository
(www.cs.indiana.edu/scheme-repository/SRhome.html).