[886] in SIPB_Linux_Development

Linux AFS and low memory

daemon@ATHENA.MIT.EDU (Erik Nygren)
Sun Jan 8 15:47:33 1995

To: linux-afs-bugs@MIT.EDU, linux-dev@MIT.EDU
Date: Sun, 08 Jan 1995 15:47:13 EST
From: Erik Nygren <nygren@MIT.EDU>


Hello,

I got some interesting insight into the Linux AFS problems this
morning when my machine crashed again.  The memory corruption problem
(which sometimes manifests itself as a disk corruption problem) is
almost definitely related either to low-memory conditions or to X
doing memory mapping.

Foundation has been up for 17 days running Linux 1.1.68 and Linux AFS
1.1.67 without problems.  For that entire time, I (and two other people)
was regularly logged in remotely, so X wasn't running and memory
usage wasn't very heavy (about 6 MB was used in buffers and could
be quickly freed up).  When I'm logged in with X, free memory shrinks
to a few hundred K and the buffers shrink significantly.
For those 17 days I didn't have any AFS problems.  However,
once I started running X, xv segfaulted while trying
to load a file that wasn't in the cache.  In the logs I found:

Jan  6 02:23:21 foundation kernel: Unable to handle kernel paging request at virtual address c1488824
Jan  6 02:23:21 foundation kernel: current->tss.cr3 = 005ee000, Dr3 = 005ee000
Jan  6 02:23:21 foundation kernel: *pde = 00000000

(Actually, that happened before X was running...)

Anyway, a while after xv segfaulted, other apps (such as xterms)
started disappearing.  Then the machine wedged solid and I had to
power-cycle it.

I rebooted it, started up X again, and was working.
Someone else tried to telnet in and got:

Jan  8 11:48:06 foundation kernel: general protection: 0000
Jan  8 11:48:06 foundation kernel: EIP:    0010:0012164f
Jan  8 11:48:06 foundation kernel: EFLAGS: 00010216
Jan  8 11:48:06 foundation kernel: eax: 00000004   ebx: 00e75474   ecx: 00a44258   edx: 00000a9f
Jan  8 11:48:06 foundation kernel: esi: 00000800   edi: 00e75474   ebp: 00dd0d10   esp: 00dd0cb8
Jan  8 11:48:06 foundation kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
Jan  8 11:48:06 foundation kernel: Process tcsh (pid: 331, process nr: 47, stackpage=00dd0000)
Jan  8 11:48:06 foundation kernel: Stack: 00e75474 00e75474 00121b1c 00e75474 00882414 
Jan  8 11:48:06 foundation kernel: Code: d0 83 c4 04 c6 43 72 00 8d 43 40 50 e8 ec cf fe ff 83 c4 04

00121600 t _write_inode
00121670 t _read_inode
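The EIP in the general-protection fault above (0012164f) falls between
the _write_inode and _read_inode entries, i.e. 0x4f bytes into
_write_inode.  A minimal Python sketch of that System.map-style lookup
follows; the helper names parse_map and symbolize are hypothetical, and
the table holds only the two entries quoted here rather than a full
System.map.

```python
# Hypothetical helper: map a faulting EIP to "symbol+offset" using
# System.map-style lines ("address type name").
import bisect

# Only the two entries quoted in the post; a real System.map has thousands.
SYSTEM_MAP = """\
00121600 t _write_inode
00121670 t _read_inode
"""

def parse_map(text):
    """Return a list of (address, name) tuples sorted by address."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        entries.append((int(addr, 16), name))
    entries.sort()
    return entries

def symbolize(eip, entries):
    """Attribute eip to the last symbol whose start address is <= eip.

    Symbol sizes are unknown here, so an address past the end of the
    last listed symbol is still attributed to that symbol.
    """
    addrs = [addr for addr, _ in entries]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below the first known symbol
    start, name = entries[i]
    return f"{name}+{eip - start:#x}"

entries = parse_map(SYSTEM_MAP)
print(symbolize(0x0012164F, entries))  # _write_inode+0x4f
```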

I just find it interesting that for 17 days I ran remotely with no
problems, and then I came back to the console and things started
crashing left and right.  People on the kernel channel are really
picky, and I'm sure someone would have mentioned this elsewhere if it
went beyond Linux AFS.

The most suspicious problem to me is apps sometimes segfaulting when
reading files not yet in the cache.  Someone suggested this might be
because files and binaries sometimes get corrupted during reads.  (?)

	--- Erik
