[1997] in linux-net channel archive
TCP timing out too fast when the connection goes down, patch included.
daemon@ATHENA.MIT.EDU (Eric Schenk)
Sun Mar 3 18:23:36 1996
To: linux-net@vger.rutgers.edu
cc: Alan Cox <alan@cymru.net>
Date: Sun, 3 Mar 1996 16:14:30 -0500
From: "Eric Schenk" <schenk@cs.toronto.edu>
I've been noticing a problem with the recent 1.3.X kernels
where the TCP stack seems to be a bit too hasty to timeout
a TCP socket when the network layer isn't responding.
The following message describes the problem and proposes a patch to
linux 1.3.70 that fixes the problem.
Here's a test I performed on 1.3.70.
I run an rlogin session to a remote
machine, in this scenario the connect gets performed over ppp
by diald. After the connection is established and I actually
have logged into the remote side I tell diald to block the link.
This causes diald to take down the ppp link and set the routes
to point to its proxy device. So, at this point, the network
layer still sees a route to the remote machine, but any packets
that get sent won't get acked.
Now, I type a character in my rlogin session, and watch the
packets that get passed to the proxy interface.
Looking at the session log I see:
Mar 3 13:49:22 icebox diald[1794]: filter accepted rule 9 proto 6 len 41 seq ec744c17 ack 94dce194 flags PUSH ACK packet 128.100.26.84,1023 => 128.100.1.15,513
Mar 3 13:50:01 icebox last message repeated 6 times
Mar 3 13:50:41 icebox diald[1794]: filter accepted rule 9 proto 6 len 41 seq ec744c17 ack 94dce194 flags PUSH ACK packet 128.100.26.84,1023 => 128.100.1.15,513
Mar 3 13:52:00 icebox diald[1794]: filter accepted rule 9 proto 6 len 41 seq ec744c17 ack 94dce194 flags PUSH ACK packet 128.100.26.84,1023 => 128.100.1.15,513
Mar 3 13:52:06 icebox schenk: flush
This indicates that the initial send of the packet happened, followed
by 8 resends. Just after the 8th resend my rlogin session terminates with:
rlogin: read: Connection timed out.
rlogin: connection closed.
Now, looking at the code in tcp_timer.c it seems that this should not
be happening until sk->retransmits > TCP_RETR2, which is 15.
My guess is that sk->retransmits is getting bumped by 2 every time a
retransmit occurs, but I can't see how this is happening at the moment.
Adding a bit of instrumentation to the kernel I get the following
trace ever time a packet is retransmitted:
Mar 3 15:39:19 icebox kernel: sk->retransmits bumped in tcp_time_write_timeout (sk = 7678976).
Mar 3 15:39:19 icebox kernel: sk->prot->retransmits bumped in tcp_time_write_timeout (sk = 7678976).
call the tcp_do_retransmit happens
Mar 3 15:39:19 icebox kernel: sk->prot->retransmits bumped in tcp_do_retransmit (sk = 7678976)
Mar 3 15:39:19 icebox kernel: sk->retransmits bumped in tcp_retransmit_time (sk = 7678976).
Mar 3 15:39:19 icebox kernel: sk->prot->retransmits bumped in tcp_retransmit_time (sk = 7678976).
Mar 3 15:39:19 icebox kernel: tcp_time_write_timeout completed.
Mar 3 15:39:19 icebox kernel: tcp_retransmit_timer done.
Clearly sk->retransmits is double counting, and in fact sk->prot->retransmits
seems to be triple counting!
What seems to happen:
tcp_time_write_timeout() gets called when the timeout occurs.
This in turn bumps sk->retransmits ,and sk->prot->retransmits by one
and calls sk->prot->retransmit(), which turns out to be tcp_do_retransmit().
This in turn bymps sk->prot->retransmits again and
sends the packet out to the device.
When the packet appears at the device tcp_rcv() gets called,
and this in turn calls tcp_ack(), which calls tcp_retransmit(),
which calls tcp_retransmit_time().
Now, tcp_retransmit_time() bumps both counters one more time for good measure.
I think what should be happening is that the counters should only be
bumped in the tcp_do_retransmit code. That way we are counting
real retransmits. Note that the KEEPALIVE stuff also has to bump
the counters since it is sending a packet without using tcp_do_retransmit(),
as does the tcp_send_probe0() routine.
Here is a set of patches that I think fixes the problem. Alan can you
comment on these?
-- eric
--CUT HERE-----------------------------------------------------------
--- linux/net/ipv4/tcp_output.c.orig Sun Mar 3 15:47:32 1996
+++ linux/net/ipv4/tcp_output.c Sun Mar 3 15:58:00 1996
@@ -464,7 +464,8 @@
*/
ct++;
- sk->prot->retransmits ++;
+ sk->retransmits++;
+ sk->prot->retransmits++;
tcp_statistics.TcpRetransSegs++;
--- linux/net/ipv4/tcp_timer.c.orig Sun Mar 3 15:47:38 1996
+++ linux/net/ipv4/tcp_timer.c Sun Mar 3 15:56:19 1996
@@ -70,8 +70,6 @@
* the 120 second clamps though!
*/
- sk->retransmits++;
- sk->prot->retransmits++;
sk->backoff++;
sk->rto = min(sk->rto << 1, 120*HZ);
tcp_reset_xmit_timer(sk, TIME_WRITE, sk->rto);
@@ -208,8 +206,6 @@
/*
* Retransmission
*/
- sk->retransmits++;
- sk->prot->retransmits++;
sk->prot->retransmit (sk, 0);
tcp_write_timeout(sk);
}
--CUT HERE-----------------------------------------------------------