[886] in SIPB_Linux_Development
Linux AFS and low memory
daemon@ATHENA.MIT.EDU (Erik Nygren)
Sun Jan 8 15:47:33 1995
To: linux-afs-bugs@MIT.EDU, linux-dev@MIT.EDU
Date: Sun, 08 Jan 1995 15:47:13 EST
From: Erik Nygren <nygren@MIT.EDU>
Hello,
I got some interesting insight into the Linux AFS problems this
morning when my machine crashed again. The memory corruption problem
(which sometimes manifests itself as a disk corruption problem) is
almost definitely related either to low-memory conditions or to X
doing memory mapping.
Foundation has been up for 17 days running Linux 1.1.68 and Linux AFS
1.1.67 without problems. For the entire time I (and two other people)
were regularly logged in remotely, so X wasn't running and memory
usage wasn't really extensive (about 6 MB was used in buffers and could
be quickly freed up). When logged in with X, the free memory shrinks
to a few hundred K and the buffers decrease significantly.
For those 17 days, I didn't have any AFS problems. However,
once I started running X, I had xv segfault while trying
to load a file that wasn't in the cache. In the logs I found:
Jan 6 02:23:21 foundation kernel: Unable to handle kernel paging request at virtual address c1488824
Jan 6 02:23:21 foundation kernel: current->tss.cr3 = 005ee000, Dr3 = 005ee000
Jan 6 02:23:21 foundation kernel: *pde = 00000000
(Actually, that happened before X was running...)
Anyway, a while after xv segfaulted, other apps (such as xterms)
started disappearing. Then the machine wedged solid and I had to
power-cycle it.
I rebooted it, started up X again, and was working.
Someone else tried to telnet in and got:
Jan 8 11:48:06 foundation kernel: general protection: 0000
Jan 8 11:48:06 foundation kernel: EIP: 0010:0012164f
Jan 8 11:48:06 foundation kernel: EFLAGS: 00010216
Jan 8 11:48:06 foundation kernel: eax: 00000004 ebx: 00e75474 ecx: 00a44258 edx: 00000a9f
Jan 8 11:48:06 foundation kernel: esi: 00000800 edi: 00e75474 ebp: 00dd0d10 esp: 00dd0cb8
Jan 8 11:48:06 foundation kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 8 11:48:06 foundation kernel: Process tcsh (pid: 331, process nr: 47, stackpage=00dd0000)
Jan 8 11:48:06 foundation kernel: Stack: 00e75474 00e75474 00121b1c 00e75474 00882414
Jan 8 11:48:06 foundation kernel: Code: d0 83 c4 04 c6 43 72 00 8d 43 40 50 e8 ec cf fe ff 83 c4 04
00121600 t _write_inode
00121670 t _read_inode
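The two symbol lines above bracket the EIP from the general protection
fault (0012164f), which presumably places the fault inside _write_inode.
Here is a minimal sketch of that lookup in Python, assuming the two lines
are adjacent entries from the kernel's symbol map. The names SYMBOLS and
resolve are hypothetical and exist only for this illustration.

```python
import bisect

# Symbol-table excerpt quoted in this message: (address, type, name).
# Assumption: these are adjacent entries in the running kernel's symbol map.
SYMBOLS = [
    (0x00121600, "t", "_write_inode"),
    (0x00121670, "t", "_read_inode"),
]


def resolve(eip, symbols=SYMBOLS):
    """Return (name, offset) for the last symbol at or below eip, else None."""
    table = sorted(symbols)
    addrs = [addr for addr, _, _ in table]
    # Index of the last symbol whose start address is <= eip.
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    addr, _, name = table[i]
    return name, eip - addr


if __name__ == "__main__":
    # EIP taken from the "general protection" report above.
    name, offset = resolve(0x0012164F)
    print(f"{name}+{offset:#x}")
```

With only two symbols in the table, any address above 00121670 would
resolve to _read_inode, so a real lookup would need the full symbol map.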
I just find it interesting that for 17 days I ran remotely with no
problems and then returned and had things start crashing right and
left. People on the kernel channel are really picky and I'm sure this
would have been mentioned by someone elsewhere if it went beyond Linux AFS.
The most suspicious problem to me is apps sometimes segfaulting when
reading files not yet in the cache. Someone suggested this might be
because files and binaries sometimes get corrupted during reads. (?)
--- Erik