[414] in linux-net channel archive
(fwd) RDIST again: CPIO & RMT also?
daemon@ATHENA.MIT.EDU (Thomas Koenig)
Sun Jun 4 18:13:47 1995
Date: Sun, 4 Jun 1995 22:57:53 +0200
From: Thomas Koenig <ig25@mvmampc66.ciw.uni-karlsruhe.de>
To: linux-net@vger.rutgers.edu
Sorry if this appears twice; serious mail problems at the moment :-(
Has anybody gotten rmt to work?
From: andreas@vlsivie.tuwien.ac.at (Andreas Haumer)
Newsgroups: comp.os.linux.networking
Subject: RDIST again: CPIO & RMT also?
Followup-To: comp.os.linux.networking
Date: 03 Jun 1995 14:28:33 GMT
Organization: Technical University Vienna, Austria
Lines: 244
Distribution: world
Message-ID: <ANDREAS.95Jun3162833@anaphi.vlsivie.tuwien.ac.at>
NNTP-Posting-Host: anaphi.vlsivie.tuwien.ac.at
Hi!
A few days ago there was a discussion about problems with "rdist" hanging
and Thomas Koenig (and others I forgot, sorry!) posted a patch to solve this
problem.
I have similar problems when trying to read a backup with cpio on "Machine A"
from a SCSI tape connected to a remote system "Machine B" using rmt.
After a few MBytes have been read, the cpio/rmt processes on both systems just hang.
It works fine when using cpio on the local device.
I don't use rdist, so I can't check whether it's the same problem, but it looks
like it.
I checked out the "rdist" diffs and I was confused. Why do we have to put these
"while" statements around the write() system calls? Am I missing something here?
I tried to modify cpio/rmt according to the rdist patches, but as I don't
understand exactly what's going on, I had no luck.
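A note on those "while" statements: the usual reason for such a loop is that write() on a socket or pipe may accept fewer bytes than requested, or be interrupted by a signal, so the caller retries until everything has been handed to the kernel. The following is only a minimal sketch of that general idiom, not the rdist patch itself (which is not reproduced here); the name write_all is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the rdist patch: keep calling
 * write() until all len bytes are accepted or a real error occurs.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine error */
        }
        p += n;                 /* short write: advance and loop again */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that in the traces below every write() already returns the full count (6 or 5120), so a loop like this by itself may not explain the hang described here.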
Here's what happens on my system:
Machine A:
=========
Command: cpio -it -H newc -B -F fucker:/dev/rmt0
(GNU cpio version 2.3)
"strace" of the cpio command:
[...]
read(6, "A", 1) = 1
read(6, "5", 1) = 1
read(6, "1", 1) = 1
read(6, "2", 1) = 1
read(6, "0", 1) = 1
read(6, "\n", 1) = 1
read(6, "\213\3\203\303\4\377\320\203;\0u"..., 5120) = 4096
read(6, "gprof.c\t5.6 (Berkeley) 6/1/90\0"..., 1024) = 1024
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6) = 6
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6, "A", 1) = 1
read(6, "5", 1) = 1
read(6, "1", 1) = 1
read(6, "2", 1) = 1
read(6, "0", 1) = 1
read(6, "\n", 1) = 1
read(6, "\377\377\377\377\377\377\377\377"..., 5120) = 4096
read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6) = 6
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6, "A", 1) = 1
read(6, "5", 1) = 1
read(6, "1", 1) = 1
read(6, "2", 1) = 1
read(6, "0", 1) = 1
read(6, "\n", 1) = 1
read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 4096
read(6, "\t.text\n\0\t.fill %d,1,0\n\0\t."..., 1024) = 1024
write(1, "usr/bin/mkimage\n", 16) = 16
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6) = 6 <= next "R" (read) command
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6,                                 <= here it hangs, waiting for data from machine B!
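The client side of this exchange follows a simple pattern visible in the trace: the reply header "A5120\n" is read one byte at a time, and the 5120-byte payload then arrives over two read() calls (4096 + 1024). A minimal sketch of that pattern is shown here. The function names read_full and read_ack are hypothetical and not taken from the GNU cpio source, and any reply that does not start with "A" is simply treated as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes unless EOF or an error intervenes.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL || len == 0);
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF before len bytes arrived */
        got += (size_t)n;       /* short read, e.g. 4096 of 5120 */
    }
    return (ssize_t)got;
}

/* Parse a reply of the form "A<digits>\n", one byte at a time.
 * Returns the byte count announced by the peer, or -1 on anything else. */
long read_ack(int fd)
{
    char c;
    long count = 0;
    int digits = 0;

    if (read_full(fd, &c, 1) != 1 || c != 'A')
        return -1;
    for (;;) {
        if (read_full(fd, &c, 1) != 1)
            return -1;
        if (c == '\n')
            return digits > 0 ? count : -1;
        if (c < '0' || c > '9' || digits >= 9)
            return -1;          /* not a plain decimal count */
        count = count * 10 + (c - '0');
        digits++;
    }
}
```

A caller would first call read_ack() on the descriptor and then read_full() with the returned count. Both calls block until data arrives, which is exactly where the cpio process above is stuck.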
Machine B:
=========
"strace" of running /etc/rmt process ("rmt" from GNU cpio version 2.3):
[...]
write(1, "A5120\n", 6) = 6
write(1, "\213\3\203\303\4\377\320\203;\0u"..., 5120) = 5120
read(0, "R", 1) = 1
read(0, "5", 1) = 1
read(0, "1", 1) = 1
read(0, "2", 1) = 1
read(0, "0", 1) = 1
read(0, "\n", 1) = 1
read(4, "\377\377\377\377\377\377\377\377"..., 5120) = 5120
write(1, "A5120\n", 6) = 6
write(1, "\377\377\377\377\377\377\377\377"..., 5120) = 5120
read(0, "R", 1) = 1
read(0, "5", 1) = 1
read(0, "1", 1) = 1
read(0, "2", 1) = 1
read(0, "0", 1) = 1
read(0, "\n", 1) = 1
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 5120
write(1, "A5120\n", 6) = 6
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 5120
read(0,                                 <= here it hangs, waiting for the next command from machine A
In this situation, netstat -a shows (excerpt):
Machine A:
=========
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)     User
[...]
tcp        0      6 gatekeeper.my.lin:1023 fucker.my.linux.:shell ESTABLISHED root
[...]
Machine B:
=========
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)     User
[...]
tcp        0      0 fucker.my.linux.:shell gatekeeper.my.lin:1023 ESTABLISHED root
[...]
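It looks like the last cpio "R" command is stuck (forever?) somewhere in the
network send queue on machine A ?!? Send-Q there shows 6 bytes, exactly the
length of "R5120\n", while machine B shows nothing pending in either queue.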
The last write() from "cpio" on machine A to the "rmt" server on machine B
returned 6 (6 bytes written to file descriptor 5), which indicates that
everything went OK (according to POSIX). So the "cpio" process goes on to
wait for the answer, which never comes because the "rmt" process never
received this command -> deadlock!
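A successful write() only means the kernel accepted the bytes locally. It says nothing about whether the peer has received them, which is why cpio cannot notice the problem. One way to at least detect such a stall, instead of blocking in read() forever, is to bound the wait with select(). The sketch below is an illustration only: wait_readable is a hypothetical helper, it is not part of cpio or rmt, and it does not fix whatever keeps the data in the send queue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>     /* read(), which a caller would use afterwards */

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if a read() would not block (data, EOF or error
 * pending), 0 on timeout, -1 if select() itself failed. */
int wait_readable(int fd, long timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A client could call wait_readable() on the reply descriptor (fd 6 in the trace above) before each read() and report a timeout to the user rather than hanging silently.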
To terminate the hanging processes, I had to interrupt the cpio process (with
SIGINT, Ctrl-C).
The sockets then entered the following states, for at least several minutes:
Machine A:
=========
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)     User
[...]
tcp        0      7 gatekeeper.my.lin:1023 fucker.my.linux.:shell LAST_ACK    root
[...]
Machine B:
=========
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)     User
[...]
tcp        0      0 fucker.my.linux.:shell gatekeeper.my.lin:1023 FIN_WAIT2   root
[...]
This seems to be a Linux-related problem (bug?), even if there may be
some way to work around it!
Here's my system setup:
Machine A:
=========
Hardware:
i386DX-33, ISA, 8MB Ram
scsi0 : Adaptec 1542 at IO:330, IRQ 11, DMA priority 5
eth0: SMC Ultra at 0x300, 00 00 C0 9E 48 6A, IRQ 10 memory 0xcc000-0xcffff
uname -a:
Linux gatekeeper 1.2.5 #1 Tue Apr 18 20:17:21 MET DST 1995 i386
uptime:
1:04am up 40 days, 8:03, 4 users, load average: 0.10, 0.24, 0.18
ifconfig:
lo Link encap:Local Loopback
inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0
UP BROADCAST LOOPBACK RUNNING MTU:2000 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0
TX packets:11501 errors:8 dropped:0 overruns:0
eth0 Link encap:10Mbps Ethernet HWaddr 00:00:C0:9E:48:6A
inet addr:192.168.123.1 Bcast:192.168.123.31 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2877594 errors:0 dropped:0 overruns:0
TX packets:2838685 errors:0 dropped:0 overruns:0
Interrupt:10 Base address:0x310 Memory:cc000-d0000
route:
Kernel routing table
Destination Gateway Genmask Flags MSS Window Use Iface
localhost * 255.255.255.255 UH 1936 0 13177 lo
192.168.123.0 * 255.255.255.224 U 1436 0 2838590 eth0
Machine B:
=========
Hardware:
i486DX2-66, EISA, 32MB Ram
scsi0: Adaptec 1742 at IO:1c80, IRQ 11
eth0: 3c509 at 0x2000 tag 0, BNC port, address 00 60 8c 52 ec 77, IRQ 10.
Tape: HP35480A, external, on /dev/rmt0
uname -a:
Linux fucker 1.2.8 #1 Fri May 5 20:10:05 MET DST 1995 i486
uptime:
1:03am up 43 min, 9 users, load average: 0.18, 0.14, 0.15
ifconfig:
lo Link encap:Local Loopback
inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0
UP BROADCAST LOOPBACK RUNNING MTU:2000 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0
TX packets:4859 errors:0 dropped:0 overruns:0
eth0 Link encap:10Mbps Ethernet HWaddr 00:60:8C:52:EC:77
inet addr:192.168.123.2 Bcast:192.168.123.31 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:22159 errors:0 dropped:0 overruns:0
TX packets:20012 errors:0 dropped:0 overruns:0
Interrupt:10 Base address:0x2000
route:
Kernel routing table
Destination Gateway Genmask Flags MSS Window Use Iface
localhost * 255.255.255.255 UH 1936 0 4927 lo
192.168.123.0 * 255.255.255.224 U 1436 0 20053 eth0
default gatekeeper.my.l * UG 1436 0 0 eth0
Any ideas, anyone?
- andreas
--
----------------------+------------------------------+-------------------------
andreas haumer | andreas@vlsivie.tuwien.ac.at |
buchengasse 67/8 | tel: +43.1.6001508 (ISDN) |
a-1100 vienna | +43.664.3004449 (GSM) | god is real -
austria | fax: +43.1.6001084 | unless declared integer