[1438] in linux-net channel archive


TCP getting confused

daemon@ATHENA.MIT.EDU (Greg Stein)
Tue Nov 28 09:55:22 1995

Date: Mon, 27 Nov 1995 19:53:40 -0800
To: linux-net@vger.rutgers.edu, linux-kernel@vger.rutgers.edu
From: greg_stein@eshop.com (Greg Stein)
Cc: greg_stein@eshop.com

Hello...

I ran into a situation today while FTP'ing a large (21 MB) file between my
Linux machine (1.3.18 kernel, P133, EtherExpress Pro) and a Sun
SPARCstation 5. The transfer went over two Ethernet segments connected by a
PC clone running the Novell MPR software. The FTP would repeatedly hang
during the transfer and my networking would stop functioning. Marking the
interface as down and bringing it back up did not clear the problem. After
trying a couple of times, and rebooting to clear things up, I grabbed a
1.3.45 kernel, built it, and tried again. Same symptoms.

So I went back to 1.3.18 and watched the transfer with a sniffer package on
a Macintosh (EtherPeek, btw). I found some interesting results. The packets
were 506 bytes on the wire (dunno why... the MTUs of both machines were
1500). Taking off the 14-byte Ethernet header and 4-byte CRC, each IP
packet is 488 bytes; taking off the 20-byte IP and 20-byte TCP headers
leaves 448 bytes of actual data. Note: I have dropped the high digits of
the sequence numbers below, and only the significant portions of the times
are shown, as seconds.milliseconds.
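
For anyone wanting to check the size arithmetic above, here is a minimal
sketch. It assumes the sniffer's 506-byte figure includes the 4-byte
Ethernet CRC and that neither the IP nor the TCP header carried options;
the function and constant names are made up for illustration.

```python
# Assumed header sizes: Ethernet II framing, no IP or TCP options.
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_CRC = 4       # frame check sequence, assumed counted by the sniffer
IP_HEADER = 20    # minimum IPv4 header
TCP_HEADER = 20   # minimum TCP header


def payload_sizes(frame_len):
    """Return (IP packet length, TCP data length) for one Ethernet frame."""
    ip_len = frame_len - ETH_HEADER - ETH_CRC
    tcp_data = ip_len - IP_HEADER - TCP_HEADER
    return ip_len, tcp_data


ip_len, tcp_data = payload_sizes(506)
print(ip_len, tcp_data)  # 488 448
```

The 448-byte result is also the step between consecutive sequence numbers
in the trace described below.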

At 28.216, my Linux machine sent out a packet with seq #35630. It continued
sending packets until it reached seq #45038 at 28.235. The Sun caught up
with its ACKs at 28.244, sending an ack for #45486.

Here is where things freaked out. The next packet Linux sent was at 28.440
with seq #45934. Note that a packet is missing: the Sun had acked up to
#45486, so the next packet should have been seq #45486, and it was skipped.
This was immediately followed by three more packets with times/seq numbers:
28.440/46382, 28.441/46830, and 28.441/35630. Note how it backed up on this
last packet!?!
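
The two oddities above (the skipped packet and the jump backwards) can be
picked out of the trace mechanically. This is only a sketch under stated
assumptions: the post gives just the first and last sequence numbers of the
initial burst, so the intermediate values are reconstructed here by assuming
every packet carried 448 bytes; sequence-number wraparound is ignored; and
check_trace is a hypothetical helper, not part of any tool.

```python
SEGMENT = 448  # bytes of TCP data per packet, as worked out earlier


def check_trace(seqs, segment=SEGMENT):
    """Flag packets that skip ahead of, or fall behind, the expected seq."""
    findings = []
    expected = seqs[0]
    for seq in seqs:
        if seq > expected:
            findings.append(("gap", expected, seq))
        elif seq < expected:
            findings.append(("backed_up", expected, seq))
        # Track the highest sequence number the sender has reached so far.
        expected = max(expected, seq + segment)
    return findings


# Assumed reconstruction of the first burst: 35630 through 45038 in
# 448-byte steps. Only the two endpoints are stated in the post.
burst = list(range(35630, 45038 + 1, SEGMENT))
# The four packets sent after the Sun's ack for 45486.
tail = [45934, 46382, 46830, 35630]

for kind, expected, got in check_trace(burst + tail):
    print(kind, "expected", expected, "got", got)
```

Under those assumptions it reports a gap where seq 45486 should have been
and a "backed_up" entry for the final 35630.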

At 28.443, the Sun resent four ack packets for #45486. At 28.858, the Linux
machine again sent packet #35630. From this point on, the Linux machine
kept resending packet #35630 and the Sun kept answering with acks for
#45486. This went on for a couple of minutes. I stopped the FTP at this
point.

After I killed the FTP, things were acting really strange. When I tried to
ping the Sun, I saw the ping packets go out (using the sniffer), but
nothing came back. After a while, when I tried again, the Sun _did_ send
back replies but the Linux machine did not see them. I tried to ping
another host, but nothing went out. Checking the ARP cache, I found this
new host listed with a hardware address of 00:00:00:00:00:00. Ick. I also
watched our 10Base-T hub to make sure it didn't think I was spamming the
net or something. No errors or warning lights there. Finally, I swapped my
card for another (same model and configuration). Didn't fix it.

A few more tidbits of information: my message log shows "eth0: transmit
timed out, network cable problem?" then "last message repeated 22 times".
I'm also running atalkd and got a message that its gateway went down, too
(hmm.. this might have been when I downed the interface; I can test and
verify if needed).

Any clues at all? Any way I can fix this? Is there more information that
would be handy? I have complete copies of the FTP packets, but not the
later ping packets (although I can reproduce the problem and capture those,
too if needed). I have no problems with building kernels if there are
patches that somebody would like me to apply to gather data.

Thanx much,
Greg Stein
greg_stein@eshop.com
