[414] in linux-net channel archive


(fwd) RDIST again: CPIO & RMT also?

daemon@ATHENA.MIT.EDU (Thomas Koenig)
Sun Jun 4 18:13:47 1995

Date: Sun, 4 Jun 1995 22:57:53 +0200
From: Thomas Koenig <ig25@mvmampc66.ciw.uni-karlsruhe.de>
To: linux-net@vger.rutgers.edu

Sorry if this appears twice; serious mail problems at the moment :-(

Has anybody gotten rmt to work?

From: andreas@vlsivie.tuwien.ac.at (Andreas Haumer)
Newsgroups: comp.os.linux.networking
Subject: RDIST again: CPIO & RMT also?
Followup-To: comp.os.linux.networking
Date: 03 Jun 1995 14:28:33 GMT
Organization: Technical University Vienna, Austria
Lines: 244
Distribution: world
Message-ID: <ANDREAS.95Jun3162833@anaphi.vlsivie.tuwien.ac.at>
NNTP-Posting-Host: anaphi.vlsivie.tuwien.ac.at

Hi!

A few days ago there was a discussion about problems with "rdist" hanging
and Thomas Koenig (and others I forgot, sorry!) posted a patch to solve this
problem.

I have similar problems when trying to read a backup with cpio on "Machine A"
from a SCSI tape connected to a remote system "Machine B" using rmt.
After reading a few MBytes, the cpio/rmt processes on both systems just hang.
It works fine when using cpio on the local device.

I don't use rdist, so I can't check whether it's the same problem, but it
looks like it.
I checked out the "rdist" diffs and I was confused. Why do we have to put these
"while" statements around the write() system calls? Am I missing something here?
I tried to modify cpio/rmt according to the rdist patches, but as I don't
understand exactly what's going on, I had no luck.
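
On the question about the "while" statements: write() on a pipe or socket is
allowed to accept fewer bytes than requested, or to fail with EINTR when a
signal arrives, so callers commonly loop until everything has been handed to
the kernel. The following is only a minimal sketch of that general idiom, not
the actual rdist patch; the name write_all is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not taken from rdist, cpio or rmt):
 * keep calling write() until all `len` bytes have been accepted
 * or a real error occurs. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any byte was written */
            return -1;          /* real error, errno is set by write() */
        }
        p += n;                 /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that such a loop only guards against short writes. In the traces below
every write() returns the full count, so a loop like this would not change
the outcome there.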

Here's what happens on my system:

Machine A:
=========

Command: cpio -it -H newc -B -F fucker:/dev/rmt0
(GNU cpio version 2.3)

"strace" of the cpio command:

[...]
read(6, "A", 1)                         = 1
read(6, "5", 1)                         = 1
read(6, "1", 1)                         = 1
read(6, "2", 1)                         = 1
read(6, "0", 1)                         = 1
read(6, "\n", 1)                        = 1
read(6, "\213\3\203\303\4\377\320\203;\0u"..., 5120) = 4096
read(6, "gprof.c\t5.6 (Berkeley) 6/1/90\0"..., 1024) = 1024
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6)                  = 6
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6, "A", 1)                         = 1
read(6, "5", 1)                         = 1
read(6, "1", 1)                         = 1
read(6, "2", 1)                         = 1
read(6, "0", 1)                         = 1
read(6, "\n", 1)                        = 1
read(6, "\377\377\377\377\377\377\377\377"..., 5120) = 4096
read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6)                  = 6
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6, "A", 1)                         = 1
read(6, "5", 1)                         = 1
read(6, "1", 1)                         = 1
read(6, "2", 1)                         = 1
read(6, "0", 1)                         = 1
read(6, "\n", 1)                        = 1
read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 4096
read(6, "\t.text\n\0\t.fill %d,1,0\n\0\t."..., 1024) = 1024
write(1, "usr/bin/mkimage\n", 16)       = 16
sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}) = 0
write(5, "R5120\n", 6)                  = 6    <= next "R" (read) command
sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}) = 0
read(6,                                        <= here it hangs, waiting
                                                  for data from machine B!

Machine B:
=========

"strace" of running /etc/rmt process ("rmt" from GNU cpio version 2.3):

[...]
write(1, "A5120\n", 6)                  = 6
write(1, "\213\3\203\303\4\377\320\203;\0u"..., 5120) = 5120
read(0, "R", 1)                         = 1
read(0, "5", 1)                         = 1
read(0, "1", 1)                         = 1
read(0, "2", 1)                         = 1
read(0, "0", 1)                         = 1
read(0, "\n", 1)                        = 1
read(4, "\377\377\377\377\377\377\377\377"..., 5120) = 5120
write(1, "A5120\n", 6)                  = 6
write(1, "\377\377\377\377\377\377\377\377"..., 5120) = 5120
read(0, "R", 1)                         = 1
read(0, "5", 1)                         = 1
read(0, "1", 1)                         = 1
read(0, "2", 1)                         = 1
read(0, "0", 1)                         = 1
read(0, "\n", 1)                        = 1
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 5120
write(1, "A5120\n", 6)                  = 6
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 5120) = 5120
read(0,                                        <= Here it hangs, waiting
                                                  for the next command from 
                                                  machine A
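
The exchange visible in both traces (request "R5120\n", reply "A5120\n"
followed by the data, which may arrive in pieces such as 4096 + 1024 bytes)
can be sketched from the client side roughly as follows. This is an
illustration reconstructed from the traces above, not the GNU cpio source;
the function names are hypothetical, and any reply that does not start with
"A" is simply treated as a failure.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly `len` bytes, looping over short reads. */
static int read_full(int fd, char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;          /* error or unexpected end of file */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical client side of one "R" request.
 * Returns the number of data bytes delivered into buf, or -1. */
long rmt_read_block(int to_rmt, int from_rmt, char *buf, size_t want)
{
    char line[32];
    size_t i = 0;
    long count;
    int len = snprintf(line, sizeof line, "R%lu\n", (unsigned long)want);

    if (len < 0 || (size_t)len >= sizeof line)
        return -1;
    /* For brevity a short write of the request counts as a failure. */
    if (write(to_rmt, line, (size_t)len) != len)
        return -1;

    /* Read the status line one byte at a time, as cpio does in the trace. */
    for (;;) {
        if (read_full(from_rmt, &line[i], 1) != 0)
            return -1;
        if (line[i] == '\n')
            break;
        if (++i == sizeof line - 1)
            return -1;          /* status line too long */
    }
    line[i] = '\0';

    if (line[0] != 'A')
        return -1;              /* anything but "A<count>" is a failure here */
    count = strtol(line + 1, NULL, 10);
    if (count < 0 || (size_t)count > want)
        return -1;
    if (read_full(from_rmt, buf, (size_t)count) != 0)
        return -1;
    return count;
}
```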

In this situation, netstat -a shows (excerpt):

Machine A:
=========

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)       User
[...]
tcp        0      6 gatekeeper.my.lin:1023 fucker.my.linux.:shell ESTABLISHED   root
[...]


Machine B:
=========

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)       User
[...]
tcp        0      0 fucker.my.linux.:shell gatekeeper.my.lin:1023 ESTABLISHED   root
[...]

It looks like the last cpio "R" command is stuck (forever?) in the network
send queue on machine A: Send-Q there shows 6 bytes, exactly the length of
"R5120\n".

The last write() from "cpio" on machine A to the "rmt" server on machine B
returned 6 (6 bytes written to file descriptor 5), so as far as cpio can tell
everything went OK (according to ANSI and POSIX standards). The "cpio" process
therefore goes on waiting for the answer, which never comes because the "rmt"
process never received the command -> deadlock!
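
A successful write() on a socket only means the kernel has queued the bytes,
not that the peer has received them, so the client cannot detect this stall
from the return value. One way to at least notice it instead of blocking
forever is to wait for the reply with a timeout. The sketch below uses
select() for that; it is a diagnostic aid under that assumption, not a fix
for the underlying problem, and wait_readable is a hypothetical name.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for data on fd.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error
 * (including interruption by a signal). */
int wait_readable(int fd, int seconds)
{
    fd_set rfds;
    struct timeval tv;
    int r;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return r > 0 ? 1 : 0;
}
```

A client could call something like wait_readable(6, 60) before the blocking
read() of the status line and report an error when it returns 0.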

To terminate the hanging processes, I had to interrupt the cpio process (with
SIGINT, Ctrl-C).

The sockets then entered the following states, for at least several minutes:

Machine A:
=========

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)       User
[...]
tcp        0      7 gatekeeper.my.lin:1023 fucker.my.linux.:shell LAST_ACK      root
[...]

Machine B:
=========

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (State)       User
[...]
tcp        0      0 fucker.my.linux.:shell gatekeeper.my.lin:1023 FIN_WAIT2     root
[...]


This seems to be a Linux-related problem (bug?), even if there may be some
way to work around it!


Here's my system setup:

Machine A:
=========

Hardware:
i386DX-33, ISA, 8MB Ram
scsi0 : Adaptec 1542 at IO:330, IRQ 11, DMA priority 5
eth0: SMC Ultra at 0x300, 00 00 C0 9E 48 6A, IRQ 10 memory 0xcc000-0xcffff

uname -a:
Linux gatekeeper 1.2.5 #1 Tue Apr 18 20:17:21 MET DST 1995 i386

uptime:
  1:04am  up 40 days,  8:03,  4 users,  load average: 0.10, 0.24, 0.18

ifconfig:
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:2000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:11501 errors:8 dropped:0 overruns:0

eth0      Link encap:10Mbps Ethernet  HWaddr 00:00:C0:9E:48:6A
          inet addr:192.168.123.1  Bcast:192.168.123.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2877594 errors:0 dropped:0 overruns:0
          TX packets:2838685 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x310 Memory:cc000-d0000 

route:
Kernel routing table
Destination     Gateway         Genmask         Flags MSS    Window Use Iface
localhost       *               255.255.255.255 UH    1936   0    13177 lo
192.168.123.0   *               255.255.255.224 U     1436   0   2838590 eth0


Machine B:
=========

Hardware:
i486DX2-66, EISA, 32MB Ram
scsi0: Adaptec 1742 at IO:1c80, IRQ 11
eth0: 3c509 at 0x2000 tag 0, BNC port, address  00 60 8c 52 ec 77, IRQ 10.
Tape: HP35480A, external, on /dev/rmt0

uname -a:
Linux fucker 1.2.8 #1 Fri May 5 20:10:05 MET DST 1995 i486

uptime:
  1:03am  up 43 min,  9 users,  load average: 0.18, 0.14, 0.15

ifconfig:
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:2000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0
          TX packets:4859 errors:0 dropped:0 overruns:0

eth0      Link encap:10Mbps Ethernet  HWaddr 00:60:8C:52:EC:77
          inet addr:192.168.123.2  Bcast:192.168.123.31  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22159 errors:0 dropped:0 overruns:0
          TX packets:20012 errors:0 dropped:0 overruns:0
          Interrupt:10 Base address:0x2000 

route:
Kernel routing table
Destination     Gateway         Genmask         Flags MSS    Window Use Iface
localhost       *               255.255.255.255 UH    1936   0     4927 lo
192.168.123.0   *               255.255.255.224 U     1436   0    20053 eth0
default         gatekeeper.my.l *               UG    1436   0        0 eth0


Any ideas, anyone?

- andreas
--
----------------------+------------------------------+-------------------------
 andreas haumer       | andreas@vlsivie.tuwien.ac.at |
 buchengasse 67/8     | tel:  +43.1.6001508  (ISDN)  |
 a-1100 vienna        |       +43.664.3004449 (GSM)  | god is real - 
 austria              | fax:  +43.1.6001084          | unless declared integer
