[626] in Release_7.7_team


Re: Athena 8.0 is slow on Solaris

daemon@ATHENA.MIT.EDU (John Hawkinson)
Mon Jul 8 10:48:15 1996

Date: Mon, 8 Jul 1996 10:48:08 -0400 (EDT)
To: Greg Hudson <ghudson@MIT.EDU>, Bill Cattey <wdc@MIT.EDU>
Cc: release-team@MIT.EDU, testers@MIT.EDU
In-Reply-To: "[625] in Release_7.7_team"
From: John Hawkinson <jhawk@MIT.EDU>

> We can't do anything about this for fall term, but this is bad.
> Inattention to this sort of problem is why the Decstations are so
> unusable today even though they were cool fast machines when I was a
> freshman.

This is somewhat disturbing to me; I'm curious what criteria define a
SHOWSTOPPER, but this certainly seems like it should be one.

[reordering paragraphs]
> Multiple performance benchmarks are not a bad thing, and they
> shouldn't take very long to try out, so I encourage anyone with a
> little time to try to come up with one in the next, say, 24 hours (the
> amount of time we have before 8.0 is scheduled to go public).

I would like to see the release delayed over this, if only a week.

I personally feel somewhat guilty for not having spent a lot of time
during the beta process on this; last night was the first time I really
did a build on a beta machine, and the results were pretty pitiful.

> Since kernel performance studies are not exactly our area of
> expertise, attention to the problem should probably fall along the
> following lines:

I would suggest that userland performance studies are not exactly our
area of expertise either, and we should not neglect kernel performance
benchmarks if it can be avoided.

> 	  * Someone should develop a vaguely repeatable benchmark that
> 	    demonstrates the 8.0 slowness (building a moderately-sized
> 	    package with the gnu locker gcc, perhaps, or running latex
> 	    multiple times on a document).  This will be a lot easier if
> 	    it's done while there are still 7.7 machines.

I'm doing this now; unfortunately, I had planned on getting a
significant amount of stuff done at work today, and don't have time
for a huge amount of instrumentation. Nevertheless, I think my results
(appended below) provide a clear indication of the magnitude of the
problem.

If no one else is willing, I can drop some stuff on the floor and try
to look into kernel benchmarking tonight, but it'd be nice if someone
else could commit the resources to that.

I think it would also make sense for someone to run a widely available
generic benchmarking program (e.g., lmbench) on both releases and
compare them; perhaps such a thing should be a standard part of the
release process.
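
For concreteness, something like the following on each machine; the
binary names are from the lmbench distribution as I remember it, and
where to stash the output and which microbenchmarks matter most are
just my guesses:

	# lmbench tools report their results on stderr
	( lat_syscall null ; lat_proc fork ; lat_ctx 2 ) \
		> /tmp/lm.`hostname` 2>&1
	# then, with one file from each release:
	diff /tmp/lm.bill-the-cat /tmp/lm.bart-savagewood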

> 	  * Someone should start replacing components on a test machine
> 	    (upgrading to Solaris 2.5 or 2.5.1, upgrading to newer
> 	    versions of AFS as they come out) and seeing how it affects
> 	    the benchmark.

To me, it seems like it might be more reasonable to start with 7.7 and
redo the changes that came with 8.0 and see if we can find an obvious
component that's at fault. This may or may not be feasible for lots of
reasons, I guess.

The specific case where I was seeing problems was building tcpdump from
sipb cell sources into the sipb cell. To avoid significant resource
contention issues, I'm benchmarking building tcpdump from sipb cell
sources onto local disk.

I ran the same set of commands (almost) on bill-the-cat, a 24MB SPARC Classic
running the 7.7V release, and bart-savagewood, a ??MB SPARC Classic running
8.0B (I probably should have made sure it was running 8.0C first,
sorry...).

Typescripts of both sessions are in ~jhawk/80bench.

The sequence of commands was:

	Symlink /afs/sipb/project/tcpdump/build/libpcap to /var/tmp/libpcap
	Build a symlink farm from /afs/sipb/project/tcpdump/src/tcpdump to
		/var/tmp/tcpdump
	Stat `which gcc`, `which gmake`, /srvd, and /os.
	"fs flushv" the above
	"fs getcacheparms" just for kicks.
	"uptime" just for kicks (in all cases I was the only one
		logged in on the machine, but I logged in under X, so was
		subject to my normal set of xterms. I killed zwgc, however.)
	"time ./configure"
	"time ./gmake"


Results:

On bill-the-cat (7.7):

configure:	24.09u 45.79s 1:54.10 61.2%
gmake:		201.55u 108.42s 5:43.23 90.3%

On bart-savagewood (8.0):

configure:	25.37u 49.39s 2:30.61 49.6%
gmake:		205.60u 811.90s 50:29.26 33.5%

My shell is sipb tcsh, so time reports user time, kernel (system)
time, elapsed time, and CPU utilization (user plus system time as a
percentage of elapsed time).

The important thing to note here is that the elapsed time for
configure+build under 7.7 was 7 minutes 37.33 seconds. Under 8.0 it
was seriously worse at 52 minutes 59.87 seconds, or a factor of 6.9
times SLOWER.
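
(Checking the arithmetic: 1:54.10 + 5:43.23 = 457.33 seconds of
elapsed time under 7.7, against 2:30.61 + 50:29.26 = 3179.87 seconds
under 8.0, and 3179.87/457.33 is about 6.95.)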

In hindsight, these benchmarks should probably have been performed
with Solaris' process accounting turned on. Whoever else does some
benchmarks should give that a shot, perhaps to see what's losing the most.
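
Roughly, and assuming root on the test machine (this is from memory,
so check the man pages):

	/usr/lib/acct/turnacct on	# start SVR4 process accounting
	# ... run the benchmark ...
	/usr/lib/acct/turnacct off
	acctcom /var/adm/pacct		# per-process user/sys/elapsed times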

--jhawk
