[1403] in linux-net channel archive
again: output of /usr/bin/rsh truncated
daemon@ATHENA.MIT.EDU (Dick Streefland)
Sun Nov 19 04:49:22 1995
To: linux-net@vger.rutgers.edu
Date: Sat, 18 Nov 1995 22:55:11 +0100 (MET)
Reply-To: dicks@tasking.nl
From: rnews@tasking.nl (Dick Streefland)
For some time, we have been experiencing occasional truncation of
the output from the 'rsh' command. We were running the rsh command
from NetKit-B-0.05, and it turned out that this version has a bug
causing rsh to terminate prematurely when the stderr socket is closed
before there is output available on the stdout socket. This bug was
fixed in NetKit-B-0.06.
However, even with the new rsh command, there are still situations
where output is lost. This time the problem is probably in the kernel
(1.3.40). It is simple to reproduce:
$ gzip -9 < /bin/bash | wc
391 2308 105965
$ rsh localhost 'cat /bin/bash' < /dev/null | gzip -9 | wc
391 2308 105965
$ rsh localhost 'cat /bin/bash' < /dev/zero | gzip -9 | wc
345 2133 97100
$ rsh localhost 'cat /bin/bash' < /dev/zero | gzip -9 | wc
357 2135 97549
The gzip command is included to slow down the output of rsh. The
truncation occurs because the read() from the stdin/stdout socket
in rsh returns an ECONNRESET error.
Because the timing is so critical, I built a test version of rsh
which adds a 0.1 second delay in the loop that reads the remote
output from the socket. I also added perror() calls to report read
or write errors. The modifications are included at the end of this
message.
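The instrumentation described above can be sketched as a standalone
helper. This is an illustration, not the actual change to rsh.c: the
name delayed_read() and its label parameter are hypothetical,
nanosleep() stands in for usleep(), and unlike the patch it stays
silent for EWOULDBLOCK, EAGAIN and EINTR so that only real failures
are reported.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() under strict -std= modes */
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of NetKit): wait 0.1 second, then do
 * one read() from fd.  Returns the byte count, 0 on end of file, or
 * -1 on error.  Errors other than the transient ones are reported
 * with perror(), prefixed by `label`; errno is preserved for the
 * caller.
 */
ssize_t delayed_read(int fd, char *buf, size_t len, const char *label)
{
    struct timespec delay = { 0, 100000000L }; /* 0.1 second */
    ssize_t cc;

    nanosleep(&delay, NULL);

    errno = 0;
    cc = read(fd, buf, len);
    if (cc < 0 && errno != EWOULDBLOCK && errno != EAGAIN &&
        errno != EINTR) {
        int saved = errno;      /* perror() may change errno */

        perror(label);
        errno = saved;
    }
    return cc;
}
```

A caller would use it in place of a bare read(), for example
delayed_read(rem, buf, sizeof buf, "rsh (stdout)"), and treat a
return value of 0 as end of file rather than as an error.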
The modified version of rsh shows the problem more clearly.
Depending on the amount of output of the remote command, an
ECONNRESET or EPIPE (!) error is generated for the read() from
the socket:
$ rsh localhost 'cat /bin/ls' < /dev/null | wc
110 840 25604
$ rsh localhost 'cat /bin/ls' < /dev/zero | wc
rsh (stdout): Connection reset by peer
1 6 2048
$ rsh localhost 'cat /bin/cat' < /dev/null | wc
30 303 13316
$ rsh localhost 'cat /bin/cat' < /dev/zero | wc
rsh (stdout): Broken pipe
17 107 2048
I think something like the following is happening:
1) The remote process terminates, shutting down the socket for
stdin, before the local rsh has read all stdout data.
2) The local rsh writes another stdin data block (successfully).
3) The local rsh tries to read the next data block, but gets the
error status of the preceding write to the socket.
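The three steps above can be tried outside rsh with a small loopback
experiment. This is only a sketch under assumptions: provoke_reset(),
struct reset_result and pause_ms() are hypothetical names, a plain TCP
connection to 127.0.0.1 stands in for the rsh/rshd pair, and the
"remote" side sends no output at all, so the sketch shows how the
error surfaces but not the loss of pending data. Which call reports
the error, and with which errno, depends on the kernel, so no result
is claimed here.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() under strict -std= modes */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one run; all names here are hypothetical. */
struct reset_result {
    int     setup_ok;     /* 1 if the loopback connection was set up */
    ssize_t write_rc;     /* write() after the peer closed (step 2)  */
    int     write_errno;
    ssize_t read_rc;      /* read() after that write (step 3)        */
    int     read_errno;
};

static void pause_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

struct reset_result provoke_reset(void)
{
    struct reset_result r;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char block[2048];
    int lfd, cfd, sfd = -1;

    memset(&r, 0, sizeof r);
    memset(block, 0, sizeof block);
    signal(SIGPIPE, SIG_IGN);   /* report EPIPE instead of dying */

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;          /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (sfd = accept(lfd, NULL, NULL)) < 0)
        goto out;

    /* "stdin" data that the remote side never reads. */
    if (write(cfd, block, sizeof block) < 0)
        goto out;
    r.setup_ok = 1;
    pause_ms(100);

    /* Step 1: the remote side goes away with input still unread. */
    close(sfd);
    sfd = -1;
    pause_ms(100);

    /* Step 2: the local side writes another stdin block. */
    errno = 0;
    r.write_rc = write(cfd, block, sizeof block);
    r.write_errno = errno;

    /* Step 3: the local side tries to read remote output. */
    errno = 0;
    r.read_rc = read(cfd, block, sizeof block);
    r.read_errno = errno;

out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return r;
}
```

Printing write_errno and read_errno with strerror() after a call to
provoke_reset() shows which of the two calls a given kernel charges
with the error.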
Any experts out there to shed some light on this?
--
Dick Streefland //// Tasking Software BV
dicks@tasking.nl (@ @) The Netherlands
------------------------oOO--(_)--OOo------------------------
--- NetKit-B-0.06/rsh/rsh.c.orig Tue Aug 30 09:26:35 1994
+++ NetKit-B-0.06/rsh/rsh.c Sat Nov 18 19:08:45 1995
@@ -355,6 +355,7 @@
if (wc < 0) {
if (errno == EWOULDBLOCK)
goto rewrite;
+ perror("rsh (stdin)");
goto done;
}
bp += wc;
@@ -375,6 +376,7 @@
FD_SET(rfd2, &readfrom);
if (rem_ok)
FD_SET(rem, &readfrom);
+usleep( 100000 );
if (select(16, &readfrom, 0, 0, 0) < 0) {
if (errno != EINTR) {
(void)fprintf(stderr,
@@ -397,6 +399,8 @@
(void)write(2, buf, cc);
else if (cc == 0 || errno != EWOULDBLOCK)
rfd2_ok = 0;
+ if (cc < 0)
+ perror("rsh (stderr)");
}
if (FD_ISSET(rem, &readfrom)) {
errno = 0;
@@ -412,6 +416,8 @@
(void)write(1, buf, cc);
else if (cc == 0 || errno != EWOULDBLOCK)
rem_ok = 0;
+ if (cc < 0)
+ perror("rsh (stdout)");
}
}
}