[4014] in linux-net channel archive


Re: Strange Problem with Linux 2.0 TCP

daemon@ATHENA.MIT.EDU (Eric Schenk)
Thu Aug 15 04:51:10 1996

To: Andi Kleen <andi@mlm.extern.lrz-muenchen.de>
cc: linux-net@vger.rutgers.edu
In-reply-to: Your message of "Sun, 11 Aug 1996 06:51:43 EDT."
             <199608111051.MAA23054@maja.mlm.extern.lrz-muenchen.de> 
Date: 	Tue, 13 Aug 1996 03:19:01 -0400
From: "Eric Schenk" <schenk@cs.toronto.edu>


Andi Kleen <andi@mlm.extern.lrz-muenchen.de> writes:
>I just discovered that telnet to our main linux server (RH 3.0.3/2.0.8)
>doesn't work anymore. I tried to track the problem down and it looks
>like a TCP bug:

Hmm. Not sure, I can't tell from the information I have here.

>and then nothing. ps shows that it hangs in the login process. strace 
>on the login process shows that it hangs in the write to stdout, strace
>on the telnetd process shows this:
>
>select(16, [0 3], [], [0], NULL

>tcp        0    117 maja:telnet            bse:1720 ESTABLISHED root

I assume the above is the netstat output? Which machine is it taken on? Maja?
Anyway, it appears to indicate that the machine has 117 bytes queued
to send. This seems inconsistent with the TCP behavior below...
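
For reference, here is a minimal sketch of how I am reading that line,
assuming the usual netstat column order (Proto, Recv-Q, Send-Q, local
address, foreign address, state). The function name is made up for
illustration; it is not part of any tool.

```python
def parse_netstat_line(line):
    """Split one netstat TCP line into named fields.

    Assumes the column order: Proto Recv-Q Send-Q Local Foreign State.
    Any trailing columns (such as the user) are ignored.
    """
    fields = line.split()
    return {
        "proto": fields[0],
        "recv_q": int(fields[1]),  # bytes received but not yet read by the application
        "send_q": int(fields[2]),  # bytes queued to send that the peer has not yet acked
        "local": fields[3],
        "remote": fields[4],
        "state": fields[5],
    }


entry = parse_netstat_line(
    "tcp        0    117 maja:telnet            bse:1720 ESTABLISHED root"
)
print(entry["send_q"], entry["state"])
```

The third column is the one I am calling "queued to send".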

>This is the tcpdump:

>12:35:58.454772 frob.1375 > maja.telnet: S 2728469057:2728469057(0) win 512 <mss 1460>
>12:35:58.454772 maja.telnet > frob.1375: S 3955995639:3955995639(0) ack 2728469058 win 31744 <mss 1460>

Ok. Frob and maja set up a connection.

>12:35:58.454772 frob.1375 > maja.telnet: . ack 1 win 2048 (DF)
>12:35:58.464772 frob.1375 > maja.telnet: . 1:25(24) ack 1 win 4096 (DF)
>12:35:58.594772 maja.telnet > frob.1375: P 1:4(3) ack 25 win 31744 (DF)
>12:35:58.594772 frob.1375 > maja.telnet: . 25:28(3) ack 4 win 4096 (DF)
>12:35:58.594772 maja.telnet > frob.1375: P 4:28(24) ack 25 win 31744 (DF)
>12:35:58.784772 maja.telnet > frob.1375: P 4:28(24) ack 28 win 31744 (DF)
>12:35:58.784772 frob.1375 > maja.telnet: . ack 28 win 4096 (DF)
>12:35:58.784772 maja.telnet > frob.1375: P 28:31(3) ack 28 win 31744 (DF)
>12:35:58.784772 frob.1375 > maja.telnet: . 28:37(9) ack 31 win 4096 (DF)
>12:35:59.124772 maja.telnet > frob.1375: . ack 37 win 31744
>12:35:59.124772 frob.1375 > maja.telnet: . 37:40(3) ack 31 win 4096 (DF)
>12:35:59.124772 maja.telnet > frob.1375: P 31:49(18) ack 40 win 31744 (DF)
>12:35:59.144772 frob.1375 > maja.telnet: . ack 49 win 4096 (DF)
>12:35:59.144772 frob.1375 > maja.telnet: . 40:74(34) ack 49 win 4096 (DF)
>12:35:59.154772 maja.telnet > frob.1375: P 49:52(3) ack 74 win 31744 (DF)
>12:35:59.154772 frob.1375 > maja.telnet: . 74:77(3) ack 52 win 4096 (DF)
>12:35:59.164772 maja.telnet > frob.1375: P 52:69(17) ack 77 win 31744 (DF)
>12:35:59.164772 frob.1375 > maja.telnet: . 77:80(3) ack 69 win 4096 (DF)
>12:35:59.194772 maja.telnet > frob.1375: . ack 80 win 31744

Maja sends out up to character 69. Frob acks it. Everyone _looks_ happy.
Neither end seems to think it has anything further to say to the other.
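
To make that reading checkable, here is a rough sketch that walks the
tail of the trace and reports how many bytes each side has sent that the
other side has not acked. It assumes tcpdump's relative sequence numbers
as printed above, skips SYN segments (which carry absolute numbers), and
ignores sequence wraparound. The regular expression and the function
name are my own, for illustration only.

```python
import re

# Matches lines such as:
#   12:35:59.164772 maja.telnet > frob.1375: P 52:69(17) ack 77 win 31744 (DF)
LINE_RE = re.compile(
    r"^\S+ (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)"
    r"(?: (?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\))?"
    r"(?: ack (?P<ack>\d+))?"
)


def unacked_bytes(trace_lines):
    """Return {sender: bytes sent but not yet acked by the peer}."""
    sent = {}   # highest "next sequence number" each host has sent
    acked = {}  # highest ack each host has received from its peer
    for line in trace_lines:
        m = LINE_RE.match(line)
        if m is None or "S" in m["flags"]:
            continue  # unparsable line, or a SYN with absolute sequence numbers
        if m["end"] is not None:
            sent[m["src"]] = max(sent.get(m["src"], 1), int(m["end"]))
        if m["ack"] is not None:
            acked[m["dst"]] = max(acked.get(m["dst"], 1), int(m["ack"]))
    # Relative sequence numbers start at 1, so that is the default ack.
    return {host: sent[host] - acked.get(host, 1) for host in sent}


trace = [
    "12:35:59.144772 frob.1375 > maja.telnet: . 40:74(34) ack 49 win 4096 (DF)",
    "12:35:59.154772 maja.telnet > frob.1375: P 49:52(3) ack 74 win 31744 (DF)",
    "12:35:59.154772 frob.1375 > maja.telnet: . 74:77(3) ack 52 win 4096 (DF)",
    "12:35:59.164772 maja.telnet > frob.1375: P 52:69(17) ack 77 win 31744 (DF)",
    "12:35:59.164772 frob.1375 > maja.telnet: . 77:80(3) ack 69 win 4096 (DF)",
    "12:35:59.194772 maja.telnet > frob.1375: . ack 80 win 31744",
]
print(unacked_bytes(trace))
```

A zero for both hosts is what I mean by "everyone looks happy": on the
wire, nothing that was sent is left unacknowledged.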

Anyway, after 8 seconds or so it looks like you gave up and killed
the connection.

>12:36:08.274772 frob.1375 > maja.telnet: F 80:80(0) ack 69 win 4096 (DF)
Frob sends out a FIN.

>12:36:08.274772 maja.telnet > frob.1375: . ack 81 win 31744
Maja acks it.

>12:36:08.284772 maja.telnet > frob.1375: F 69:69(0) ack 81 win 31744
Maja sends out its own FIN.

>12:36:08.284772 frob.1375 > maja.telnet: . ack 70 win 4096 (DF)
Frob acks it, connection gets shut down properly.

The fact that both ends of the connection shut down properly indicates
that they don't think anything is queued up that they need to send.
This is a bit weird, since the netstat output above seems to indicate
that at least one of them does...
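
The shutdown check itself is simple arithmetic: a FIN occupies one
sequence number, so the ack that covers it is the FIN's sequence number
plus one. A small sketch, using the relative numbers from the trace (the
helper name is hypothetical, and wraparound is ignored):

```python
def fin_is_acked(fin_seq, ack):
    """True if `ack` covers a FIN sent at sequence number `fin_seq`.

    A FIN consumes one sequence number, so it is acknowledged once the
    peer's ack reaches fin_seq + 1. Sequence wraparound is not handled.
    """
    return ack >= fin_seq + 1


# Frob's FIN at 80 is answered by maja's "ack 81";
# maja's FIN at 69 is answered by frob's "ack 70".
print(fin_is_acked(80, 81), fin_is_acked(69, 70))
```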

>and then it stops and and the connection hangs. Other servers
>(sendmail, ftp, finger, squid, cern httpd) work.

This is fishy. A TCP bug _should_ hit these services just as badly.

>sshd and rlogind hang too.
>I tried telnetting from a Linux 2.0.10 box, a box with
>2.0.10+pedro's netpatch and from FreeBSD 2.2. The strange thing
>is that it worked from the BSD box 2-3x when it didn't from one
>of the Linux boxes, but after 3 telnets or so it stopped working
>too.

Let's try to get a bit more information here:

(1) Does the server work with anything in the 2.0.x series?
    If so, which ones?
    [There have been VERY few tcp changes in the 2.0.x series,
    so this should help narrow down the cause considerably.]


(2) What is the network topology we are dealing with here?
    Everything connected by ethernet, or something else?

(3) What are the MTUs on the various links in the network?

-- eric

---------------------------------------------------------------------------
Eric Schenk                          www: http://www.cs.toronto.edu/~schenk
Department of Computer Science	               email: schenk@cs.toronto.edu
University of Toronto
