[2673] in linux-net channel archive

Re: Multiport hellp

daemon@ATHENA.MIT.EDU (Doug Ledford)
Fri Apr 26 16:33:01 1996

Date: 	Thu, 25 Apr 1996 23:41:41 -0500 (CDT)
From: Doug Ledford <dledford@dialnet.net>
To: Jon Lewis <jlewis@inorganic5.fdt.net>
cc: Linux Net Mailing List <linux-net@vger.rutgers.edu>
In-Reply-To: <Pine.LNX.3.91.960424135953.27145s-100000@inorganic5.chem.ufl.edu>

On Wed, 24 Apr 1996, Jon Lewis wrote:

> I've got 2 64-port terminal servers...ewok and endor.  Ewok was recently a
> 486-100 with 2 32 port RocketPort cards.  It would kernel panic often
> (like just about daily)...even after applying about every 1.2.13 bug-fix I
> could find.  I started to blame things on rocket.o (RocketPort driver)
> race conditions, heard that a faster CPU might do the trick, and so I
> upgraded ewok to a P100.  Since then it's been up 26 days. 

Out of curiosity, how many ports do you actually have in use on ewok?
My 64-port terminal and PPP server has 62 28.8Kbd modems and one 56Kbd
CSU/DSU hooked into it, so as you can guess, it gets hit pretty hard.  It
is a Pent100 machine.  Before I applied the bug fixes to the kernel, it
went down almost daily.  After putting in the bug fixes and using an
experimental RocketPort driver version that has a few printks followed by
immediate returns in the case of three select race conditions, I haven't
had a lockup since.  I should note, though, that there WOULD have been
two lockups on the machine if it hadn't been for the printk-return pairs:
I found two "rp: WARNING: rp_do_transmit called with info->tty==NULL"
messages in my syslog.  This condition used to cause an oops, but now the
driver bypasses the offending port until the next timer tick, which gives
the race condition time to settle itself.
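
For readers unfamiliar with the printk-return idea, here is a minimal
user-space sketch of that kind of guard.  It is not the actual driver
patch: the struct names, the function name do_transmit_sketch, and the
"pending"/"skipped" fields are hypothetical stand-ins, and fprintf takes
the place of printk.  Only the warning text comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real rocket.o structures. */
struct tty_sketch { int unused; };

struct port_sketch {
    struct tty_sketch *tty; /* assumed NULL while an open/close is racing */
    int pending;            /* bytes waiting to be transmitted */
    int skipped;            /* how many times the guard fired */
};

/*
 * Returns the number of bytes "sent", or 0 if the port was skipped.
 * Instead of dereferencing a NULL tty (the oops case), log a warning
 * and return at once, leaving the data for a later call.
 */
int do_transmit_sketch(struct port_sketch *info)
{
    int sent;

    assert(info != NULL); /* a NULL port is a caller bug, not the race */

    if (info->tty == NULL) {
        fprintf(stderr,
                "rp: WARNING: rp_do_transmit called with info->tty==NULL\n");
        info->skipped++;
        return 0; /* bypass this port until the next call */
    }

    sent = info->pending; /* stand-in for pushing data to the hardware */
    info->pending = 0;
    return sent;
}
```

Calling do_transmit_sketch() on a port whose tty is NULL logs the warning
and leaves the pending data untouched; a later call, once tty is valid
again, sends it.  That mirrors the "skip the port until the next timer
tick" behaviour described above, under the assumptions stated.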

> 
> In the interest of science and stable terminal servers, I decided to put 
> 64 ports of Cyclades gear into ewok's old 486-100 board.  It ran for 5 
> days while I configured things, then I put close to 40 modems on it.  I 
> call this one endor.  Endor then started locking up every 24 hours.  On a 
> tip from Cyclades, I turned off swap (it has plenty of RAM for what it 
> does) and it ran a few days.  I've kept swap disabled, and now instead of 
> locking up, it kernel panics every few days.
> 
> Endor just panic'd again...this time under very light load:
> 
> Wed Apr 24 13:30:02 EDT 1996
>   1:30pm  up 3 days, 11:34,  9 users,  load average: 0.00, 0.02, 0.04
>              total       used       free     shared    buffers
> Mem:         31168      30472        696       7172      19988
> -/+ buffers:            10484      20684
> Swap:            0          0          0
> 
> Apr 24 13:32:08 endor pppd[28341]: remote IP address 205.229.51.140
> Apr 24 13:32:49 endor pppd[29774]: pppd 2.2.0 started by topherjc, uid 333
> Apr 24 13:36:31 endor syslogd: restart
> Apr 24 13:36:32 endor kernel: Kernel logging (proc) started.
> Apr 24 13:36:32 endor kernel: kswap 2.2.1.3 (Exp 1995/06/03 04:10:43)
> 
> Again, it panic'd and then rebooted itself with the reset_on_panic 
> patch.  Nothing about the Oops got logged...it rarely does...but the 
> interesting thing is that this mode of crash, panic right as a PPP 
> session starts up, is exactly what the RocketPort based system used to 
> do.  It used to do this under pppd 2.1.2d, 2.2.0e, and 2.2.0f...so it 
> would seem unlikely that the PPP code is at fault, but it seems very much 
> to me that the problem is independent of the RocketPort and Cyclades 
> drivers, and must be elsewhere in the 1.2.13 kernel.
> 

I had this same problem with the RocketPort: panic without an oops,
lockup, reset, and no logging info.  Except once, when it came back with
an oops in rp_write.  It seems to me the race condition is either in
both drivers or in the kernel, in which case the PPP code writing to
the tty structure triggers the oops.  It can also happen with getty in
use, or with init if it is writing to the serial port.

> It should be noted that I'm not using a standard (known to be quite 
> buggy) 1.2.13...but a heavily bugfix patched one.  I'm using all of the 
> bugfixes at http://trishul.sci.gu.edu.au/~tony/linux/patches.html, and 
> have been for some time.
> 

I'm now using my homegrown 1.2.14-unofficial, which includes the above 
plus quite a bit more in the way of driver updates, and I haven't gotten 
paged by our monitoring system in over two weeks, whereas I used to get 
paged once or twice a day.

> Endor's P100 parts and case just came in today...so I'll start building
> that this afternoon.  I suspect endor will magically stabilize as ewok 
> did once it's on a P100 board.

I would agree to a certain extent, but if you push it from 40 to 64, I 
bet it gives you problems :(

*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford@dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************


