[689] in linux-net channel archive
TCP bug and potential fix
daemon@ATHENA.MIT.EDU (Mark Yarvis)
Sat Jul 15 03:12:11 1995
Date: Fri, 14 Jul 95 10:25:10 -0700
From: yarvis@FICUS.CS.UCLA.EDU (Mark Yarvis)
To: linux-net@vger.rutgers.edu
In porting a custom program to Linux, I believe I've found a bug in the
Linux TCP code.
The bug causes TCP connections to be reset prematurely, so that the
last few kilobytes of transmitted data are dropped. The problem can be
recreated using a simple rsh command (described below).
I have built a patch which I believe properly fixes the problem. It is
my hope that someone more knowledgeable than I will examine my analysis
and that one of two results will follow: either this patch will be
adopted into future Linux kernel releases (1.2.x and 1.3.x), or a more
appropriate solution will be determined and a patch built.
The following text describes my investigation of the problem and my
proposed patch.
Mark Yarvis
yarvis@ficus.cs.ucla.edu
-=-
1. Boiling It Down
I boiled down my initial problem to the following simple command that
simulates the same effect:
cat file | rsh host cat | tail
where file is some sufficiently large file and host is the name of a
remote host. Following is a sample execution of one such command.
yeager> cat /usr/lib/ispell/ispell.words | rsh ogmore cat | tail
wish
wished
wisher
wishers
wishes
wishful
wishing
wisp
wisp's
wisyeager>
In this case, the file is a large text file, yeager is the name of the
local machine (running Linux/Slackware 2.1), and ogmore is the remote
machine (running SunOS 4.1.1).
The end of the file is being cut off before it is sent to the tail
program. Note that the point at which the cutoff occurs may vary from
one execution to the next.
After testing various combinations of local and remote OS, I determined
that the type of OS on the remote system was unrelated to the problem.
2. Taking Apart "rsh"
To track the problem further, I obtained source for rsh (1.1). I
determined that in rsh, the final read() call on the communication
socket was returning ECONNRESET. A call to ioctl() confirmed that
additional data was in the input buffer waiting to be read. The
error was preventing read() from returning the data.
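
The check described above can be sketched roughly as follows. This is
not the rsh source; the helper names are hypothetical, and the use of
the FIONREAD request is an assumption, since the text only says that a
call to ioctl() showed data still waiting.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of bytes queued for reading on fd, or -1 if the query fails.
 * FIONREAD is assumed here; the original text does not name the request. */
int pending_bytes(int fd)
{
    int avail = 0;

    if (ioctl(fd, FIONREAD, &avail) < 0)
        return -1;
    return avail;
}

/*
 * Hypothetical wrapper around read(): when read() fails with ECONNRESET,
 * record how many bytes were still queued, so the caller can report how
 * much data was stranded behind the error.
 */
ssize_t read_checking_reset(int fd, void *buf, size_t len, int *stranded)
{
    ssize_t n = read(fd, buf, len);
    int saved = errno;          /* only meaningful when n < 0 */

    *stranded = 0;
    if (n < 0 && saved == ECONNRESET) {
        *stranded = pending_bytes(fd);
        errno = saved;          /* ioctl() may have overwritten errno */
    }
    return n;
}
```

A caller that sees a negative return with a nonzero stranded count is
in the situation described above: data is in the input buffer, but
read() will not deliver it.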
3. In the Kernel
At this point, I decided to search the kernel code (1.2.9) for the
source of the error. Looking in the networking code, I found two
places where ECONNRESET can be set (both in the TCP code).
Further investigation revealed that ECONNRESET was, in this case,
being set from within tcp_std_reset(), which is called when a RST
packet is received.
Since the behavior encountered (the inability to get more data via
read) is consistent with the receipt of a RST packet, the next step
was to determine why a RST is being sent in the first place.
4. Verified Using "tcpdump"
Using the "tcpdump" utility, I watched the end of a conversation
generated by the rsh command.
[ The general format of the tcpdump output is:
Time-stamp src > dst: flags data-seqno ack window
where flags is a combination of S (SYN), F (FIN), P (PUSH),
R (RST), or . (no flags) ]
14:52:22.603771 yeager.1022 > ogmore.shell: P 348773:350209(1436)
ack 350208 win 14335
14:52:22.604144 yeager.1022 > ogmore.shell: P 350209:350430(221)
ack 350208 win 14335
14:52:22.604533 yeager.1022 > ogmore.shell: F 350430:350430(0)
ack 350208 win 14335
14:52:22.605010 yeager.1022 > ogmore.shell: . ack 351232 win 13751
14:52:22.605358 ogmore.shell > yeager.1022: . ack 348773 win 4096
14:52:22.605679 ogmore.shell > yeager.1022: P 351232:352668(1436)
ack 348773 win 4096
14:52:22.606033 ogmore.shell > yeager.1022: . ack 350431 win 2439
14:52:22.606404 ogmore.shell > yeager.1022: . ack 350431 win 4096
14:52:22.607408 ogmore.shell > yeager.1022: . 352668:354104(1436)
ack 350431 win 4096
14:52:22.607922 yeager.1022 > ogmore.shell: . ack 352668 win 12960
14:52:22.608538 yeager.1022 > ogmore.shell: . ack 354104 win 12170
14:52:22.609109 ogmore.shell > yeager.1022: FP 354104:354325(221)
ack 350431 win 4096
14:52:22.609702 yeager.1022 > ogmore.shell: . ack 354326 win 11987
14:52:22.613374 yeager.1022 > ogmore.shell: . ack 354326 win 14152
14:52:22.614038 ogmore.shell > yeager.1022: R 379554328:379554328(0)
win 4096
14:52:22.616608 ogmore.1022 > yeager.1021: F 379264001:379264001(0)
ack -615138292 win 4096
14:52:22.617133 yeager.1021 > ogmore.1022: . ack 1 win 14334
14:52:22.617470 yeager.1021 > ogmore.1022: F 1:1(0) ack 1 win 14335
14:52:22.617919 ogmore.1022 > yeager.1021: . ack 2 win 4095
(Note that port 1021 is handling stderr and port 1022 is handling
stdin/stdout.)
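
As an aid to reading the trace, the data-seqno field has the form
first:last(nbytes), and a FIN occupies one sequence number, which is
why the FIN ending at 354325 is acknowledged with 354326. The small
sketch below checks both relationships; the function names are
hypothetical and it parses only the seqno field, not a whole tcpdump
line.

```c
#include <stdio.h>

/*
 * Parse a tcpdump sequence field of the form "first:last(nbytes)".
 * Returns 1 if it parses and last - first == nbytes, 0 otherwise.
 */
int seq_field_consistent(const char *field)
{
    unsigned long first, last, nbytes;

    if (sscanf(field, "%lu:%lu(%lu)", &first, &last, &nbytes) != 3)
        return 0;
    return last - first == nbytes;
}

/*
 * Acknowledgment number expected for a segment whose seqno field ends
 * at "last". A FIN consumes one sequence number of its own.
 */
unsigned long expected_ack(unsigned long last, int has_fin)
{
    return last + (has_fin ? 1UL : 0UL);
}
```

For the segment "FP 354104:354325(221)" in the trace, the field is
consistent and the expected acknowledgment is 354326, which matches
the ACK yeager sends.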
In this conversation, yeager (the client) sends ogmore (the server) a
FIN (finish) to represent EOF on the input stream (a half close).
Ogmore acknowledges this FIN. Now yeager is in the state FIN_WAIT_2
and ogmore is in the state CLOSE_WAIT.
A short time later, ogmore completes the close by sending a FIN.
Ogmore is now in the LAST_ACK state. Yeager sends an ACK. Yeager is
now in the TIME_WAIT state and ogmore is CLOSED.
Immediately after this ACK, yeager sends another ACK. Since ogmore
is CLOSED, this is an error, and it replies with a RST (reset).
When yeager receives the RST, it sets the error code ECONNRESET on
the socket. All subsequent read() requests on this socket will fail
and return this code. Thus, if yeager receives the RST before rsh
performs its final read() on the socket, it will be unable to read
all of the data sent by ogmore.
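
The state changes on each side of this close can be traced with a
small transition function. This is a simplified sketch, not kernel
code: the event names are hypothetical, and only the transitions seen
in the trace are modeled (simultaneous close and timers are left out).

```c
enum tcp_state {
    ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2,
    CLOSE_WAIT, LAST_ACK, TIME_WAIT, CLOSED
};

/* Hypothetical event names covering the close sequence in the trace. */
enum tcp_event { EV_SEND_FIN, EV_RECV_FIN, EV_RECV_ACK_OF_FIN };

/* Next state for the given event; unmodeled events leave s unchanged. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ESTABLISHED:
        if (e == EV_SEND_FIN) return FIN_WAIT_1;   /* active close  */
        if (e == EV_RECV_FIN) return CLOSE_WAIT;   /* passive close */
        break;
    case FIN_WAIT_1:
        if (e == EV_RECV_ACK_OF_FIN) return FIN_WAIT_2;
        break;
    case FIN_WAIT_2:
        if (e == EV_RECV_FIN) return TIME_WAIT;    /* we ACK the FIN */
        break;
    case CLOSE_WAIT:
        if (e == EV_SEND_FIN) return LAST_ACK;
        break;
    case LAST_ACK:
        if (e == EV_RECV_ACK_OF_FIN) return CLOSED;
        break;
    default:
        break;
    }
    return s;
}
```

Feeding yeager the events send-FIN, ACK-of-FIN, receive-FIN leaves it
in TIME_WAIT; feeding ogmore receive-FIN, send-FIN, ACK-of-FIN leaves
it CLOSED, which is the point at which the extra ACK arrives.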
According to RFC 793 (under "Reset Processing" in section 3.4), when
a RST arrives, the user should be advised and the socket should
switch to the CLOSED state. It is unclear whether previously
received data should be available to the user. I believe, however,
that yeager's immediate return of ECONNRESET is appropriate.
According to RFC 793 (under "Reset Generation" in section 3.4),
ogmore should send a RST if it receives an ACK while it is in the
CLOSED state. So, it is yeager's extra ACK that is in error.
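
The RFC 793 rule for a segment arriving at a CLOSED connection can be
sketched as below. The struct and function names are hypothetical and
the segment is reduced to the fields the rule uses; the flag values
are the standard TCP header bits. It also shows why the RST in the
trace carries no ack field: the offending segment had ACK set, so the
reset takes its sequence number from that ACK and is sent as a bare
RST.

```c
#include <stdint.h>

#define FLAG_RST 0x04   /* standard TCP header flag bits */
#define FLAG_ACK 0x10

/* Hypothetical, reduced view of a TCP segment. */
struct segment {
    uint32_t seq;
    uint32_t ack;
    uint32_t len;    /* segment length, counting SYN and FIN */
    uint8_t  flags;
};

/*
 * Reply to a segment that arrives while the connection is CLOSED.
 * Returns 1 and fills *out if a RST should be sent, or 0 if the
 * incoming segment is itself a RST and must be discarded silently.
 */
int closed_state_reply(const struct segment *in, struct segment *out)
{
    if (in->flags & FLAG_RST)
        return 0;

    out->len = 0;
    if (in->flags & FLAG_ACK) {
        /* <SEQ=SEG.ACK><CTL=RST> */
        out->seq = in->ack;
        out->ack = 0;
        out->flags = FLAG_RST;
    } else {
        /* <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> */
        out->seq = 0;
        out->ack = in->seq + in->len;
        out->flags = FLAG_RST | FLAG_ACK;
    }
    return 1;
}
```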
5. Tracking Down the Extra ACK
Further kernel debugging revealed that the second ACK was being sent
in the function tcp_read_wakeup(). In this function, an ACK is sent
to update the window on the remote end.
This extra ACK is being sent from this function after the socket has
already acknowledged a FIN. At this point, the local socket is
either CLOSED or in the TIME_WAIT state. The remote socket is CLOSED
and should not receive any packets.
6. Possible Solutions
There are three possible solutions:
A. Delay the reporting of a RST to the user. This would allow the
user to perform read() calls to obtain the remaining data. I do
not believe, however, that this conforms to RFC 793.
B. Prevent tcp_read_wakeup() from being called on a CLOSED socket.
It is my belief, however, that tcp_read_wakeup() is being called
from outside the TCP layer. If this is the case, then the
calling code does not know the status of the socket.
C. Prevent tcp_read_wakeup() from sending an ACK if the socket is in
the CLOSED or TIME_WAIT state. This should work correctly, since a
socket should only be in one of these two states if the remote socket
is CLOSED. This avoids sending an ACK to a CLOSED socket.
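
Option C amounts to a simple predicate on the socket state. The
following user-space sketch models only that decision; the enum and
function name are hypothetical stand-ins, not the kernel's own
definitions, and the ack_backlog test mirrors the existing check at
the top of tcp_read_wakeup().

```c
/* Hypothetical stand-ins for the kernel's socket states. */
enum sketch_state {
    SK_ESTABLISHED, SK_FIN_WAIT2, SK_TIME_WAIT, SK_CLOSE
};

/*
 * Decide whether a window-update ACK should go out.
 * Returns 1 to send the ACK, 0 to stay silent.
 */
int should_send_window_ack(enum sketch_state state, int ack_backlog)
{
    if (!ack_backlog)
        return 0;       /* nothing outstanding to acknowledge */

    /* Option C: the peer is assumed CLOSED, so an ACK would draw a RST. */
    if (state == SK_CLOSE || state == SK_TIME_WAIT)
        return 0;

    return 1;
}
```

Because the test sits inside the function, callers outside the TCP
layer do not need to know the socket state, which is the concern
raised under option B.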
7. Building a Patch
I believe that option C is the correct fix. The following patch
implements that change. It places code in tcp_read_wakeup() that
checks the state before doing any work.
With this patch, the rsh problems described above will go away.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
--- linux/net/inet/tcp.c.orig Fri Jul 7 13:23:29 1995
+++ linux/net/inet/tcp.c Mon Jul 10 11:56:09 1995
@@ -135,6 +135,8 @@
* Alan Cox : tcp_data() doesn't ack illegal PSH
* only frames. At least one pc tcp stack
* generates them.
+ * Mark Yarvis : In tcp_read_wakeup(), don't send an
+ * ack if state is TCP_CLOSE or TCP_TIME_WAIT.
*
*
* To Fix:
@@ -1801,6 +1803,13 @@
if (!sk->ack_backlog)
return;
+
+ /*
+ * If we're closed, don't send an ack, or we'll get a RST
+ * from the closed destination.
+ */
+ if ((sk->state == TCP_CLOSE) || (sk->state == TCP_TIME_WAIT))
+ return;
/*
* FIXME: we need to put code here to prevent this routine from
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-