[16517] in Athena Bugs
hpcomm dumps core querying jobs that take a long time to process
daemon@ATHENA.MIT.EDU (Camilla R Fox)
Mon Nov 30 17:15:12 1998
To: bugs@MIT.EDU, bug-print@MIT.EDU
Date: Mon, 30 Nov 1998 17:15:08 EST
From: Camilla R Fox <cfox@MIT.EDU>
I'd been playing with fractals generated in PostScript over the
weekend. Figuring that it was a long weekend and I was the only one in
the zone, I felt free to abuse zone printers by sending them jobs
requiring them to do a lot of computation. However, today, Ted
mentioned that I still had a job on null. As far as I can tell, this
is what happened:
Null started out in powersaving mode.
Nov 28, 14:16: I sent the job to null. It woke up and flashed
"processing job".
Null processed for a while, and about 20 minutes after I sent the
job, I got a "printed successfully" zephyr, and then the job printed.
This afternoon (Nov 30, 3:0x), Ted sent a job to null, poked it to wake
it out of powersaving mode, looked at the queue, and saw my job, which
had a date stamp of Saturday afternoon. He punted his own job, and
asked me about it when I got in.
I am fairly certain that I didn't send the job a second time, so I
logged in to hobbes and looked at null's logs.
Notable things:
There was a second copy of the job sitting in null's output tray.
hobbes# ~ted/src/foo.solaris /usr/spool/printer/null/?f*
/usr/spool/printer/null/cfA549SMOKE-SCREEN.MIT.EDU Atime: Mon Nov 30 16:29:10 1998
/usr/spool/printer/null/cfA549SMOKE-SCREEN.MIT.EDU Ctime: Sat Nov 28 14:16:24 1998
/usr/spool/printer/null/dfA003smoke-screen.mit.edu Atime: Mon Nov 30 15:33:14 1998
/usr/spool/printer/null/dfA003smoke-screen.mit.edu Ctime: Sat Nov 28 14:16:24 1998
The Atime on cfA549SMOKE-SCREEN.MIT.EDU corresponds to me running lpq
or klpc modtime, while the Atime on dfA003smoke-screen.mit.edu
corresponds to when Ted poked null and it printed the job a second
time.
There's a core file from hpcomm corresponding to a few minutes after
Ted poked null and the job got spooled to null a second time.
-rw-rw-rw- 1 daemon daemon 289452 Nov 30 15:36 core
Neither null-acct nor null-log has any record of the job.
/var/adm/printlog has no mention of anything related, except for the
klpc queries that Ted and I both made afterwards.
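The Atime/Ctime listing above came from Ted's tool, whose source isn't shown here. For anyone who wants a similar listing of spool files, here is a minimal sketch in Python. The function names (stamp_lines, report) are hypothetical, and this is only a guess at what such a tool does, not a copy of it.

```python
import glob
import os
import sys
import time


def stamp_lines(path):
    """Return two lines giving the access and inode-change times of path."""
    st = os.stat(path)  # stat() itself does not update the access time
    return [
        f"{path} Atime: {time.ctime(st.st_atime)}",
        f"{path} Ctime: {time.ctime(st.st_ctime)}",
    ]


def report(pattern):
    """Return stamp lines for every file matching a shell-style pattern."""
    lines = []
    for path in sorted(glob.glob(pattern)):
        lines.extend(stamp_lines(path))
    return lines


if __name__ == "__main__":
    # Each argument is a path or a quoted pattern,
    # e.g. '/usr/spool/printer/null/?f*'
    for pattern in sys.argv[1:]:
        for line in report(pattern):
            print(line)
```

Note that Ctime here is the inode change time, not a creation time, so it only approximates when the job was spooled.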
Our guess as to what happened is that hpcomm queried null while it was
processing the job, got a confusing answer, and dumped core. When the
job finished printing, it remained in the queue, and didn't get printed
again until null was woken out of powersaving mode.
The job is sitting in /mit/cfox/ps/lorenz-color.ps, and the core file
in /mit/cfox/ops/core.hobbes (as well as hobbes:/var/spool/printer/null/core).
It's trivial to modify that print job so it executes fewer or more
iterations; at a guess, any job that takes more than three minutes to
process will tickle the bug, so it's not restricted to such abuses as
fractals, since some legitimate color print jobs are huge.
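If lorenz-color.ps isn't handy, a stand-in job with an adjustable processing time might serve for testing. The sketch below is not the fractal job: it is a hypothetical helper (busy_job) that writes a PostScript busy loop followed by a one-line page. It assumes that a plain loop keeps the printer "processing" in the same way the fractal does, which hasn't been verified, and the iteration count needed to exceed three minutes would have to be found by trial.

```python
def busy_job(iterations):
    """Return PostScript source that loops `iterations` times, then prints a page.

    Hypothetical stand-in for a slow job; `iterations` should fit in a
    PostScript integer (32 bits on most interpreters).
    """
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    return "\n".join([
        "%!PS",
        "% Stand-in job: burn interpreter time, then print one line",
        f"{iterations} {{ 1 2 add pop }} repeat",
        "/Helvetica findfont 12 scalefont setfont",
        "72 720 moveto",
        f"({iterations} iterations) show",
        "showpage",
        "",
    ])


if __name__ == "__main__":
    import sys
    # Usage: python busy_job.py 50000000 > busy.ps
    sys.stdout.write(busy_job(int(sys.argv[1])))
```

Raising or lowering the count plays the same role as changing the number of iterations in the fractal job.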
- Camilla