[1118] in linux-net channel archive
Oops: SCSI tape, Networking, 1.3.28, 1.2.13
daemon@ATHENA.MIT.EDU (Grant R. Guenther)
Tue Sep 19 16:52:00 1995
From: "Grant R. Guenther" <guenther@empress.com>
To: Linus.Torvalds@helsinki.fi, iialan@iifeak.swan.ac.uk,
linux-kernel@vger.rutgers.edu, linux-net@vger.rutgers.edu,
linux-scsi@vger.rutgers.edu
Date: Tue, 19 Sep 1995 09:12:49 -0400 (EDT)
First the hardware configuration:
i486/33, 16M
AHA 1542
Archive Python DAT tape drive
WD8013 ethercard
Panasonic CDrom on SMS (Lasermate clone) card
IDE disks: Conner CFS1275A, Seagate 94354-230
Software:
Slackware 2.0? and 2.3
Kernels 1.2.13 and 1.3.28
Operational context:
This machine runs a network backup script every night. The
script uses rsh to run cpio on about 30 machines, capturing and
storing output onto the DAT (it's a bit more complex than
that - but the rest is just local processing through a pipeline
of filters.)
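
For illustration only, the core of such a backup loop might look roughly like the sketch below. It is not the actual script: the host names, the tape device path, and the find/cpio options are all assumptions, and the local filter pipeline is left out.

```python
import subprocess

# Hypothetical values; the real host list and tape device are not given.
HOSTS = ["host01", "host02"]
TAPE = "/dev/nst0"  # assumed non-rewinding SCSI tape device


def remote_archive_cmd(host, root="/"):
    """Build the rsh command that streams a cpio archive of `root` to stdout."""
    remote = f"find {root} -xdev -print | cpio -o"
    return ["rsh", host, remote]


def backup(hosts=HOSTS, tape=TAPE):
    """Run cpio on each host in turn and write its output to the tape."""
    for host in hosts:
        # Reopening the device per host appends one archive per machine.
        with open(tape, "wb") as out:
            subprocess.run(remote_archive_cmd(host), stdout=out, check=True)
```

Calling `backup()` would walk the host list in order; a failure on one host raises an exception rather than silently continuing.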
The machine is a workstation during the day, but it is rebooted
and X is not active when the backup runs.
Ancient problem:
This system worked about 80% of the time; rather too often,
the ST tape driver would lock up, leaving processes in D wait.
New problem:
Starting early last week, the system began to panic at the same
time every night. That was with an old Slackware and
kernel 1.2.13. After the oops, the networking would be dead.
(There's no obvious correlation between the panic and the
state of the backup process. The amount of data varies
radically on each machine every day, but the trouble seems to
happen at about the same time.)
Once before - in another life - I encountered something like this
that turned out to be corrupt binaries, so I decided to eliminate
that possibility by completely reinstalling Linux: Slackware 2.3
and the latest kernel (since I noticed that some fixes have gone
into the ST driver recently). Result: NO CHANGE! At 2am this
morning I got the following oops:
Unable to handle kernel paging request at virtual address ce7659b8
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
EIP: 0010:0013e5c7
EFLAGS: 00010206
eax: 8310176d ebx: 00308b4c ecx: 83100088 edx: 001c720c
esi: 00308b0c edi: 00e31214 ebp: 001c720c esp: 001bb8d8
ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=001b9aa0)
Stack: 00308b20 003ee858 00308b4c 00e31214 00e31214 00141664 001416a5 00e31214
001c720c 00308b4c 00000001 1bdb2bc0 00e31214 003ee844 00e312a0 00000000
00308b4c 001c720c 00146828 bc24d16f 9d1132d3 00e31214 003ee844 1bdb2bc0
Call Trace: 00141664 001416a5 00146828 001a3f2e 001a0214 0013e32b 00138806
00117276 0010a78d 001b0018 001b0018 00109934 0010a809 00109443
Code: 28 88 4b 42 66 8b 46 02 86 c4 66 39 45 3e 73 29 6a 00 55 53
Aiee, killing interrupt handler
Unable to handle kernel paging request at virtual address c0001004
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00102067
*pte = 00000000
Oops: 0002
EIP: 0010:0011b492
EFLAGS: 00010046
eax: 00000000 ebx: 00000000 ecx: 001da118 edx: 001e8000
esi: fffff000 edi: 00001000 ebp: 00000000 esp: 001bb7c4
ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=001b9aa0)
Stack: 00001000 00102004 00001000 00400000 001da100 001e6002 00000202 00118cd0
00001000 00000000 001bbaa0 00000000 001bc110 001bb89c 00400000 00101000
00000000 40000000 00101000 0011d273 001bc110 00000000 40000000 001bc110
Call Trace: 00118cd0 0011d273 0011573d 00115957 0010af1e 0010ac65 02000000
01800000 00110153 0010fefa 0010ff08 0010a9cb 00140018 0013e5c7 00141664
001416a5 00146828 001a3f2e 001a0214 0013e32b 00138806 00117276 0010a78d
001b0018 001b0018 00109934 0010a809 00109443
Code: 89 4f 04 8b 90 18 a1 1d 00 89 17 89 7a 04 89 b8 18 a1 1d 00
kfree of non-kmalloced memory: 001bbae0, next= 00000000, order=0
kfree of non-kmalloced memory: 001bbad0, next= 00000000, order=0
kfree of non-kmalloced memory: 001bbf04, next= 00000000, order=0
idle task may not sleep
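
For reference, the number after "Oops:" is the i386 page-fault error code, whose low three bits say what kind of access faulted. The small sketch below decodes the two codes seen above; the function name is made up for illustration.

```python
def decode_page_fault(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault in kernel mode, 1 = fault in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


for text in ("0000", "0002"):
    print(text, decode_page_fault(int(text, 16)))
```

Read this way, the first oops is a kernel-mode read of an unmapped page and the second is a kernel-mode write to an unmapped page.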
Here's what ksymoops has to say:
EIP: 13e5c7 T _ip_queue_xmit+87/220
Trace: 141664 t _tcp_send_ack+264/2d0
Trace: 1416a5 t _tcp_send_ack+2a5/2d0
Trace: 146828 T _tcp_rcv+2388/23d0
Trace: 1a3f2e t _wd_block_input+be/100
Trace: 1a0214 t _pty_write+60/180
Trace: 13e32b T _ip_rcv+43b/4f0
Trace: 138806 T _net_bh+116/160
Trace: 117276 T _do_bottom_half+3e/a8
Trace: 10a78d t handle_bottom_half+d/20
Trace: 1b0018 t _aha1542_intr_handle+380/4a0
Trace: 1b0018 t _aha1542_intr_handle+380/4a0
Trace: 109934 T _sys_idle+44/50
Trace: 10a809 T _system_call+59/a0
Trace: 109443 T _start_kernel+1b3/1c0
EIP: 11b492 T _free_pages+ca/1c0
Trace: 118cd0 T _zap_page_range+120/1c0
Trace: 11d273 T _exit_mmap+63/b0
Trace: 11573d t _exit_mm+21/50
Trace: 115957 T _do_exit+4b/c0
Trace: 10af1e T _die_if_kernel+2b2/2e0
Trace: 10ac65 T _page_fault+155/15c
Trace: 2000000
Trace: 1800000
Trace: 110153 T _do_page_fault+24b/2d0
Trace: 10fefa T _si_meminfo+1aa/1b8
Trace: 110153 T _do_page_fault+24b/2d0
Trace: 10a9cb t error_code+4b/60
Trace: 140018 T _icmp_rcv+18/160
Trace: 13e5c7 T _ip_queue_xmit+87/220
Trace: 141664 t _tcp_send_ack+264/2d0
Trace: 1416a5 t _tcp_send_ack+2a5/2d0
Trace: 146828 T _tcp_rcv+2388/23d0
Trace: 1a3f2e t _wd_block_input+be/100
Trace: 1a0214 t _pty_write+60/180
Trace: 13e32b T _ip_rcv+43b/4f0
Trace: 138806 T _net_bh+116/160
Trace: 117276 T _do_bottom_half+3e/a8
Trace: 10a78d t handle_bottom_half+d/20
Trace: 1b0018 t _aha1542_intr_handle+380/4a0
Trace: 1b0018 t _aha1542_intr_handle+380/4a0
Trace: 109934 T _sys_idle+44/50
Trace: 10a809 T _system_call+59/a0
Trace: 109443 T _start_kernel+1b3/1c0
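
As a rough sketch of how lines like "_ip_queue_xmit+87/220" come about: each address is matched to the nearest symbol at or below it in a sorted symbol table, and the offset and symbol size are printed in hex. This is not the ksymoops source. The start address of _ip_queue_xmit is derived from the output above (13e5c7 minus 87); the name of the following symbol is a placeholder, and only its address (start plus 220) is implied by the report.

```python
import bisect

# (address, name) pairs sorted by address, in the style of a System.map file.
SYMBOLS = [
    (0x13E540, "_ip_queue_xmit"),            # derived: 0x13e5c7 - 0x87
    (0x13E760, "_next_symbol_placeholder"),  # hypothetical name, start + 0x220
]


def resolve(addr, symbols=SYMBOLS):
    """Return 'name+offset/size' for addr, or None if it is below all symbols."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    if i + 1 < len(symbols):
        # Size is taken as the distance to the next symbol.
        size = symbols[i + 1][0] - base
        return f"{name}+{addr - base:x}/{size:x}"
    return f"{name}+{addr - base:x}"


print(resolve(0x13E5C7))
```

With a full symbol table loaded in place of the two entries here, the same lookup would apply to every address in the Call Trace.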
Does anybody care to suggest what is going on ?
I can't say for sure, but this *may* coincide with the installation
of a FreeBSD machine on the same network. (The network is already very
heterogeneous: Linux, HP-UX, AIX, Solaris, SunOS, WfWg, NT, OSF-1, AUX,
you get the idea ...) We started installing our 1st FreeBSD machine at
about the same time. It's 2.1.0-950726-SNAP. I'm going to see what
happens if I take it off-line tonight.
All suggestions are welcome, thanks in advance !
----------------------------------------------------------------------------
Grant R. Guenther, System Administrator guenther@empress.com
Empress Software, 3100 Steeles Ave. E., Markham, Ont. 905-513-8888
----------------------------------------------------------------------------