[689] in linux-net channel archive

TCP bug and potential fix

daemon@ATHENA.MIT.EDU (Mark Yarvis)
Sat Jul 15 03:12:11 1995

Date: Fri, 14 Jul 95 10:25:10 -0700
From: yarvis@FICUS.CS.UCLA.EDU (Mark Yarvis)
To: linux-net@vger.rutgers.edu

In porting a custom program to Linux, I believe I've found a bug in the
Linux TCP code.

The bug causes TCP connections to be prematurely reset and data lost.

The problem can be reproduced with a simple rsh command (described
below).  The bug causes the last few kilobytes of transmitted data to
be dropped.

I have built a patch that I believe properly fixes the problem.  I hope
that someone more knowledgeable than I will examine my analysis, and
that either this patch will be adopted into future Linux kernel
releases (1.2.x and 1.3.x) or a more appropriate solution will be
found and a patch built.

The following text describes my investigation of the problem and my
proposed patch.

Mark Yarvis
yarvis@ficus.cs.ucla.edu

-=-
-=-

1.  Boiling It Down

   I boiled down my initial problem to the following simple command that
   simulates the same effect:

	cat file | rsh host cat | tail

   where file is some sufficiently large file and host is the name of a
   remote host.  Following is a sample execution of one such command.

	yeager> cat /usr/lib/ispell/ispell.words | rsh ogmore cat | tail
	wish
	wished
	wisher
	wishers
	wishes
	wishful
	wishing
	wisp
	wisp's
	wisyeager> 

   In this case, the file is a large text file, yeager is the name of the 
   local machine (running Linux/Slackware 2.1), and ogmore is the remote 
   machine (running SunOS 4.1.1).

   The end of the file is being cut off before it is sent to the tail
   program.  Note that the point at which the cutoff occurs may vary from
   one execution to the next.

   After testing various combinations of local and remote OS, I determined
   that the type of OS on the remote system was unrelated to the problem.


2.  Taking Apart "rsh"

   To track the problem further, I obtained source for rsh (1.1).  I
   determined that in rsh, the final read() call on the communication
   socket was returning ECONNRESET.  A call to ioctl() confirmed that
   additional data was in the input buffer waiting to be read.  The
   error was preventing read() from returning the data.


3.  In the Kernel

   At this point, I decided to search the kernel code (1.2.9) for the
   source of the error.  Looking in the networking code, I found two
   places where ECONNRESET can be set (both in the TCP code).

   Further investigation revealed that ECONNRESET was, in this case,
   being set from within tcp_std_reset() which is called when a RST
   packet is received.

   Since the behavior encountered (the inability to get more data via
   read) is consistent with the receipt of a RST packet, the next step
   was to determine why a RST is being sent in the first place.


4.  Verified Using "tcpdump"

   Using the "tcpdump" utility, I watched the end of a conversation
   generated by the rsh command.

   [ The general format of the tcpdump output is:
	Time-stamp src > dst: flags data-seqno ack window
     where flags is a combination of S (SYN), F (FIN), P (PUSH), 
     R (RST), or . (no flags) ]

	14:52:22.603771  yeager.1022 > ogmore.shell: P 348773:350209(1436) 
							ack 350208 win 14335
	14:52:22.604144  yeager.1022 > ogmore.shell: P 350209:350430(221) 
							ack 350208 win 14335
	14:52:22.604533  yeager.1022 > ogmore.shell: F 350430:350430(0) 
							ack 350208 win 14335
	14:52:22.605010  yeager.1022 > ogmore.shell: . ack 351232 win 13751
	14:52:22.605358  ogmore.shell > yeager.1022: . ack 348773 win 4096
	14:52:22.605679  ogmore.shell > yeager.1022: P 351232:352668(1436) 
							ack 348773 win 4096
	14:52:22.606033  ogmore.shell > yeager.1022: . ack 350431 win 2439
	14:52:22.606404  ogmore.shell > yeager.1022: . ack 350431 win 4096
	14:52:22.607408  ogmore.shell > yeager.1022: . 352668:354104(1436) 
							ack 350431 win 4096
	14:52:22.607922  yeager.1022 > ogmore.shell: . ack 352668 win 12960
	14:52:22.608538  yeager.1022 > ogmore.shell: . ack 354104 win 12170
	14:52:22.609109  ogmore.shell > yeager.1022: FP 354104:354325(221) 
							ack 350431 win 4096
	14:52:22.609702  yeager.1022 > ogmore.shell: . ack 354326 win 11987
	14:52:22.613374  yeager.1022 > ogmore.shell: . ack 354326 win 14152
	14:52:22.614038  ogmore.shell > yeager.1022: R 379554328:379554328(0) 
							win 4096
	14:52:22.616608  ogmore.1022 > yeager.1021: F 379264001:379264001(0) 
							ack -615138292 win 4096
	14:52:22.617133  yeager.1021 > ogmore.1022: . ack 1 win 14334
	14:52:22.617470  yeager.1021 > ogmore.1022: F 1:1(0) ack 1 win 14335
	14:52:22.617919  ogmore.1022 > yeager.1021: . ack 2 win 4095

   (Note that port 1021 is handling stderr and 1022 is handling
   stdin/stdout)

   In this conversation, yeager (the client) sends ogmore (the server) a
   FIN (finish) to represent EOF on the input stream (a half close).
   Ogmore acknowledges this FIN.  Now yeager is in the state FIN_WAIT_2
   and ogmore is in the state CLOSE_WAIT.

   A short time later, ogmore completes the close by sending a FIN.
   Ogmore is now in the LAST_ACK state.  Yeager sends an ACK.  Yeager is
   now in the TIME_WAIT state and ogmore is CLOSED.

   Immediately after this ACK, yeager sends another ACK (the second
   "ack 354326" line in the trace, which differs only in its window).
   Since ogmore is CLOSED, this is an error, and it replies with a RST
   (reset).

   When yeager receives the RST, it sets the error code ECONNRESET on
   the socket.  All subsequent read() requests on this socket will fail
   and return this code.  Thus, if yeager receives the RST before rsh
   performs its final read() on the socket, it will be unable to read
   all of the data sent by ogmore.
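
   The effect of the pending error on read() can be illustrated with a
   toy model.  This is only a sketch, not the kernel's code: struct
   toy_sock, toy_rst_received() and toy_read() are made-up names, and
   the model simply assumes the pending error is checked before any
   buffered data is handed out, which matches the behavior observed
   above.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a socket; not the kernel's struct sock. */
struct toy_sock {
	int    err;		/* pending error, 0 if none */
	size_t buffered;	/* bytes received but not yet read */
};

/* Record an incoming RST as a pending ECONNRESET. */
void toy_rst_received(struct toy_sock *sk)
{
	sk->err = ECONNRESET;
}

/*
 * Return the number of bytes consumed, or a negative errno value.
 * The pending error is checked first, so once it is set the
 * buffered data is never returned.
 */
long toy_read(struct toy_sock *sk, size_t len)
{
	size_t n;

	if (sk->err)
		return -sk->err;
	n = len < sk->buffered ? len : sk->buffered;
	sk->buffered -= n;
	return (long)n;
}
```

   In this model, a RST that arrives while bytes are still buffered
   leaves those bytes stranded: every later toy_read() returns
   -ECONNRESET and the buffered count never drops.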

   According to RFC 793 (under "Reset Processing" in section 3.4), when
   a RST arrives, the user should be advised and the socket should
   switch to the CLOSED state.  It is unclear whether previously
   received data should be available to the user.  I believe, however,
   that yeager's immediate return of ECONNRESET is appropriate.

   According to RFC 793 (under "Reset Generation" in section 3.4),
   ogmore should send a RST if it receives an ACK while it is in the
   CLOSED state.  So, it is yeager's extra ACK that is in error.
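
   The close sequence walked through above can be written down as a
   small state machine.  This is a deliberately reduced sketch of the
   RFC 793 transitions involved here, with hypothetical names
   (conn_close(), conn_receive()); it ignores sequence numbers,
   timers, retransmission and simultaneous close.

```c
enum conn_state {
	CS_ESTABLISHED, CS_FIN_WAIT_1, CS_FIN_WAIT_2, CS_TIME_WAIT,
	CS_CLOSE_WAIT, CS_LAST_ACK, CS_CLOSED
};

enum segment { SEG_NONE, SEG_FIN, SEG_ACK, SEG_RST };

/* The application closes its sending side; returns the segment sent. */
enum segment conn_close(enum conn_state *st)
{
	switch (*st) {
	case CS_ESTABLISHED: *st = CS_FIN_WAIT_1; return SEG_FIN;
	case CS_CLOSE_WAIT:  *st = CS_LAST_ACK;   return SEG_FIN;
	default:             return SEG_NONE;
	}
}

/* A segment arrives; update the state and return the reply, if any. */
enum segment conn_receive(enum conn_state *st, enum segment in)
{
	switch (*st) {
	case CS_ESTABLISHED:
		if (in == SEG_FIN) { *st = CS_CLOSE_WAIT; return SEG_ACK; }
		break;
	case CS_FIN_WAIT_1:
		if (in == SEG_ACK) { *st = CS_FIN_WAIT_2; return SEG_NONE; }
		break;
	case CS_FIN_WAIT_2:
		if (in == SEG_FIN) { *st = CS_TIME_WAIT; return SEG_ACK; }
		break;
	case CS_LAST_ACK:
		if (in == SEG_ACK) { *st = CS_CLOSED; return SEG_NONE; }
		break;
	case CS_CLOSED:
		/* Anything other than a RST is answered with a RST. */
		if (in == SEG_FIN || in == SEG_ACK)
			return SEG_RST;
		break;
	default:
		break;
	}
	return SEG_NONE;
}
```

   Feeding the model the exchange from the trace (yeager closes,
   ogmore ACKs and later closes, yeager ACKs) leaves yeager in
   TIME_WAIT and ogmore CLOSED; one more SEG_ACK delivered to ogmore
   then yields SEG_RST, which is the reset seen in the trace.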


5.  Tracking Down the Extra ACK

   Further kernel debugging revealed that the second ACK was being sent
   in the function tcp_read_wakeup().  In this function, an ACK is sent
   to update the window on the remote end.

   This extra ACK is being sent from this function after the socket has
   already acknowledged a FIN.  At this point, the local socket is
   either CLOSED or in the TIME_WAIT state.  The remote socket is CLOSED
   and should not receive any packets.

6.  Possible Solutions

   There are three possible solutions:

   A.  Delay the reporting of a RST to the user.  This would allow the
       user to perform read() calls to obtain the remaining data.  I do
       not believe, however, that this conforms to RFC 793.

   B.  Prevent tcp_read_wakeup() from being called on a CLOSED socket.
       It is my belief, however, that tcp_read_wakeup() is being called
       from outside the TCP layer.  If this is the case, then the
       calling code does not know the status of the socket.

   C.  Prevent tcp_read_wakeup() from sending an ACK if the socket is
       in the CLOSED or TIME_WAIT state.  This should work correctly
       since a socket should only be in these two states if the remote
       socket is CLOSED.  This will avoid sending an ACK to a CLOSED
       socket.
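
   The decision that option C adds can be isolated as a small
   predicate.  This is a user-space sketch only: struct wk_sock,
   enum wk_state and wk_should_ack() are hypothetical names.  The
   ack_backlog field borrows its name from the kernel source, and its
   meaning here (a count of acknowledgements still owed) is an
   assumption.

```c
#include <stdbool.h>

enum wk_state { WK_ESTABLISHED, WK_FIN_WAIT_2, WK_TIME_WAIT, WK_CLOSED };

/* Toy stand-in for the fields the check looks at. */
struct wk_sock {
	enum wk_state state;
	int ack_backlog;	/* assumed: acknowledgements still owed */
};

/* Decide whether a read-side wakeup should emit a window-update ACK. */
bool wk_should_ack(const struct wk_sock *sk)
{
	if (!sk->ack_backlog)
		return false;
	/* Option C: in these states the peer is taken to be gone. */
	if (sk->state == WK_CLOSED || sk->state == WK_TIME_WAIT)
		return false;
	return true;
}
```

   Putting the state test after the ack_backlog test keeps the
   existing early exit as the common fast path and only adds work
   when an ACK would otherwise have been sent.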


7.  Building a Patch

   I believe that option C is the correct fix.  The following patch
   implements that change.  It places code in tcp_read_wakeup() that
   checks the state before doing any work.

   With this patch, the rsh problems described above will go away.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
--- linux/net/inet/tcp.c.orig	Fri Jul  7 13:23:29 1995
+++ linux/net/inet/tcp.c	Mon Jul 10 11:56:09 1995
@@ -135,6 +135,8 @@
  *		Alan Cox	:	tcp_data() doesn't ack illegal PSH
  *					only frames. At least one pc tcp stack
  *					generates them.
+ *		Mark Yarvis	:	In tcp_read_wakeup(), don't send an
+ *					ack if state is TCP_CLOSE or TIME_WAIT.
  *
  *
  * To Fix:
@@ -1801,6 +1803,13 @@
 
 	if (!sk->ack_backlog) 
 		return;
+
+	/*
+	 * If we're closed, don't send an ack, or we'll get a RST
+	 * from the closed destination.
+	 */
+	if ((sk->state == TCP_CLOSE) || (sk->state == TCP_TIME_WAIT))
+		return; 
 
 	/*
 	 * FIXME: we need to put code here to prevent this routine from
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
