[1403] in linux-net channel archive

again: output of /usr/bin/rsh truncated

daemon@ATHENA.MIT.EDU (Dick Streefland)
Sun Nov 19 04:49:22 1995

To: linux-net@vger.rutgers.edu
Date: Sat, 18 Nov 1995 22:55:11 +0100 (MET)
Reply-To: dicks@tasking.nl
From: rnews@tasking.nl (Dick Streefland)

For some time, we have been experiencing occasional truncation of the
output from the 'rsh' command. We were running the rsh command from
NetKit-B-0.05, and it turned out that this version has a bug that causes
rsh to terminate prematurely when the stderr socket is closed before
any output is available on the stdout socket. This bug was fixed
in NetKit-B-0.06.
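The fix amounts to tracking the stdout and stderr sockets independently and leaving the copy loop only when both have reached end-of-file. A minimal sketch of such a loop follows. It is not the NetKit-B-0.06 code, and the names relay_until_both_closed(), pump_once(), and write_all() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Write all cc bytes of buf to fd, retrying after short writes. */
static int write_all(int fd, const char *buf, ssize_t cc)
{
    while (cc > 0) {
        ssize_t n = write(fd, buf, (size_t)cc);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        cc -= n;
    }
    return 0;
}

/* Read once from src and copy the data to dst. Clears *open_flag on EOF.
 * Returns -1 on a hard error (e.g. ECONNRESET), otherwise 0. */
static int pump_once(int src, int dst, int *open_flag, const char *name)
{
    char buf[4096];
    ssize_t cc = read(src, buf, sizeof buf);

    if (cc > 0)
        return write_all(dst, buf, cc);
    if (cc == 0) {
        *open_flag = 0;              /* this stream is finished */
        return 0;
    }
    if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                    /* nothing to read yet: try again */
    perror(name);
    return -1;
}

/* Hypothetical relay loop: copy the remote command's stdout (rem) and
 * stderr (rfd2) to out_fd and err_fd until BOTH streams reach EOF. */
static int relay_until_both_closed(int rem, int rfd2, int out_fd, int err_fd)
{
    int rem_ok = 1, rfd2_ok = 1;

    while (rem_ok || rfd2_ok) {      /* EOF on one stream is not enough */
        fd_set readfrom;
        int maxfd = -1;

        FD_ZERO(&readfrom);
        if (rfd2_ok) {
            FD_SET(rfd2, &readfrom);
            maxfd = rfd2;
        }
        if (rem_ok) {
            FD_SET(rem, &readfrom);
            if (rem > maxfd)
                maxfd = rem;
        }
        if (select(maxfd + 1, &readfrom, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            perror("select");
            return -1;
        }
        if (rfd2_ok && FD_ISSET(rfd2, &readfrom) &&
            pump_once(rfd2, err_fd, &rfd2_ok, "relay (stderr)") < 0)
            return -1;
        if (rem_ok && FD_ISSET(rem, &readfrom) &&
            pump_once(rem, out_fd, &rem_ok, "relay (stdout)") < 0)
            return -1;
    }
    return 0;
}
```

Pipes can stand in for the two sockets when trying it out: close the stderr pipe first, and the loop should still deliver everything written to the stdout pipe, which is the case that made NetKit-B-0.05 exit early.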

However, even with the new rsh command, there are still situations
where output is lost. This time the problem is probably in the kernel
(1.3.40). It is simple to reproduce:

$ gzip -9 < /bin/bash | wc
    391    2308  105965
$ rsh localhost 'cat /bin/bash' < /dev/null | gzip -9 | wc
    391    2308  105965
$ rsh localhost 'cat /bin/bash' < /dev/zero | gzip -9 | wc
    345    2133   97100
$ rsh localhost 'cat /bin/bash' < /dev/zero | gzip -9 | wc
    357    2135   97549

The gzip command is included to slow down the output of rsh. The
truncation occurs because the read() from the stdin/stdout socket
in rsh returns an ECONNRESET error.

Because the timing is so critical, I built a test version of rsh
which adds a 0.1 second delay in the loop that reads the remote
output from the socket. I also added perror() calls to report read
or write errors. The modifications are included at the end of this
message.
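The test patch inserts usleep(100000), a 0.1 second pause, before each select() so that the reader falls behind the remote writer. usleep() was marked obsolescent and later removed in POSIX.1-2008, so a sketch of an equivalent pause using nanosleep(), which also resumes if a signal interrupts it, is shown here. The helper name sleep_ms() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for ms milliseconds, continuing with the
 * remaining time if a signal interrupts nanosleep(). Returns 0 on
 * success, -1 on any error other than EINTR. */
static int sleep_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) < 0) {
        if (errno != EINTR)
            return -1;
        req = rem;          /* interrupted: sleep for what is left */
    }
    return 0;
}
```

In the patched loop, sleep_ms(100) would take the place of usleep(100000).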

The modified version of rsh shows the problem more clearly.
Depending on the amount of output of the remote command, the read()
from the socket fails with either ECONNRESET or EPIPE (!):

$ rsh localhost 'cat /bin/ls' < /dev/null | wc
    110     840   25604
$ rsh localhost 'cat /bin/ls' < /dev/zero | wc
rsh (stdout): Connection reset by peer
      1       6    2048
$ rsh localhost 'cat /bin/cat' < /dev/null | wc
     30     303   13316
$ rsh localhost 'cat /bin/cat' < /dev/zero | wc
rsh (stdout): Broken pipe
     17     107    2048

I think something like the following is happening:
1) The remote process terminates, shutting down the socket for
   stdin, before the local rsh has read all stdout data.
2) The local rsh writes another stdin data block (successfully).
3) The local rsh tries to read the next data block, but gets the
   error status of the preceding write to the socket.

Any experts out there to shed some light on this?
-- 
Dick Streefland              ////         Tasking Software BV
dicks@tasking.nl            (@ @)             The Netherlands
------------------------oOO--(_)--OOo------------------------

--- NetKit-B-0.06/rsh/rsh.c.orig	Tue Aug 30 09:26:35 1994
+++ NetKit-B-0.06/rsh/rsh.c	Sat Nov 18 19:08:45 1995
@@ -355,6 +355,7 @@
 		if (wc < 0) {
 			if (errno == EWOULDBLOCK)
 				goto rewrite;
+			perror("rsh (stdin)");
 			goto done;
 		}
 		bp += wc;
@@ -375,6 +376,7 @@
 			FD_SET(rfd2, &readfrom);
 		if (rem_ok)
 			FD_SET(rem, &readfrom);
+usleep( 100000 );
 		if (select(16, &readfrom, 0, 0, 0) < 0) {
 			if (errno != EINTR) {
 				(void)fprintf(stderr,
@@ -397,6 +399,8 @@
 				(void)write(2, buf, cc);
 			else if (cc == 0 || errno != EWOULDBLOCK)
 				rfd2_ok = 0;
+			if (cc < 0)
+				perror("rsh (stderr)");
 		}
 		if (FD_ISSET(rem, &readfrom)) {
 			errno = 0;
@@ -412,6 +416,8 @@
 				(void)write(1, buf, cc);
 			else if (cc == 0 || errno != EWOULDBLOCK)
 				rem_ok = 0;
+			if (cc < 0)
+				perror("rsh (stdout)");
 		}
 	}
 }
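In the patch above, the added perror() calls fire for any negative cc, including the EWOULDBLOCK case that the existing code treats as "try again". A minimal sketch that separates the possible outcomes of one read(), so that only hard errors such as ECONNRESET or EPIPE would be reported, might look like this. The enum and classify_read() are hypothetical names, and treating EINTR as retryable is an assumption that is not in the original loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of one read() on a non-blocking socket. */
enum read_outcome {
    READ_DATA,    /* cc > 0: forward the bytes */
    READ_EOF,     /* cc == 0: the peer closed this stream */
    READ_RETRY,   /* transient: EWOULDBLOCK/EAGAIN or EINTR */
    READ_ERROR    /* hard error such as ECONNRESET or EPIPE: report it */
};

static enum read_outcome classify_read(ssize_t cc, int err)
{
    if (cc > 0)
        return READ_DATA;
    if (cc == 0)
        return READ_EOF;
    if (err == EWOULDBLOCK || err == EAGAIN || err == EINTR)
        return READ_RETRY;
    return READ_ERROR;
}
```

With this split, the perror() call would move into the READ_ERROR branch only.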