[15496] in Athena Bugs
Athena 8.1.9: Solaris nfs doesn't track errors right...
daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Sep 14 13:28:44 1997
Date: Sun, 14 Sep 1997 13:28:39 -0400
To: bugs@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>
This is not a complete bug report.
I first noticed this with the 2.6 August beta installed
on deathtongue, and found similar symptoms under 2.5.1 on portnoy.
It appears that Solaris doesn't properly handle errors received from
remote nfs servers. There are two problems:
1. It shouldn't do what's causing the error in the first
place.
2. It should update the error countes when it gets an
error.
A specific example is mounting the NFS-AFS translators, for instance
ni.mit.edu. By default Solaris mounts with an rsize of 8192 bytes.
Somehow this results in a problem such that "ls /afs" returns no
output.
Performing this sequence:
Mount ni:/afs w/ rsize=8192
Start tcpdump
Zero nfs client statistics (nfsstat -cz)
ls /ni/afs
nfssstat -c
Produces two items of note. tcpdump (-Nvvs1500) shows:
13:14:35.665630 portnoy.5c3682d2 > NI.nfs: 160 readdir fh 0,1/1 1048 bytes @ 0 (DF) (ttl 255, id 47226)
13:14:35.671014 NI.nfs > portnoy.5c3682d2: reply ok 28 readdir ERROR: Invalid argument [|nfs] (ttl 253, id 50143)
(there's likely a tcpdump bug if it claims that the packet is truncated...)
A diff of /tmp/[12] shows no interesting incrementing error counters. Here's
the dump:
---cut nfs
Client rpc:
Connection oriented:
calls badcalls badxids timeouts newcreds badverfs timers
0 0 0 0 0 0 0
cantconn nomem interrupts
0 0 0
Connectionless:
calls badcalls retrans badxids timeouts newcreds badverfs
4 0 0 0 0 0 0
timers nomem cantsend
0 0 0
Client nfs:
calls badcalls clgets cltoomany
4 0 4 0
Version 2: (4 calls)
null getattr setattr root lookup readlink read
0 0% 3 75% 0 0% 0 0% 0 0% 0 0% 0 0%
wrcache write create remove rename link symlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
mkdir rmdir readdir statfs
0 0% 0 0% 1 25% 0 0%
Version 3: (0 calls)
null getattr setattr lookup access readlink read
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
write create mkdir symlink mknod remove rmdir
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
rename link readdir readdir+ fsstat fsinfo pathconf
0 0% 0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
commit
0 0%
Client nfs_acl:
Version 2: (0 calls)
null getacl setacl getattr access
0 0% 0 0% 0 0% 0 0% 0 0%
Version 3: (0 calls)
null getacl setacl
0 0% 0 0% 0 0%
---cut nfs
Running snoop (-s 1500) instead gives us:
portnoy.MIT.EDU -> NI.MIT.EDU NFS C READDIR2 FH=0920 Cookie=0
NI.MIT.EDU -> portnoy.MIT.EDU NFS R READDIR2 Invalid argument
or with -V:
________________________________
portnoy.MIT.EDU -> NI.MIT.EDU ETHER Type=0800 (IP), size = 202 bytes
portnoy.MIT.EDU -> NI.MIT.EDU IP D=18.72.1.157 S=18.70.0.218 LEN=188, ID=63938
portnoy.MIT.EDU -> NI.MIT.EDU UDP D=2049 S=1022 LEN=168
portnoy.MIT.EDU -> NI.MIT.EDU RPC C XID=1547076324 PROG=100003 (NFS) VERS=2 PROC=16
portnoy.MIT.EDU -> NI.MIT.EDU NFS C READDIR2 FH=0920 Cookie=0
________________________________
NI.MIT.EDU -> portnoy.MIT.EDU ETHER Type=0800 (IP), size = 70 bytes
NI.MIT.EDU -> portnoy.MIT.EDU IP D=18.70.0.218 S=18.72.1.157 LEN=56, ID=65247
NI.MIT.EDU -> portnoy.MIT.EDU UDP D=1022 S=2049 LEN=36
NI.MIT.EDU -> portnoy.MIT.EDU RPC R (#7) XID=1547076324 Success
NI.MIT.EDU -> portnoy.MIT.EDU NFS R READDIR2 Invalid argument
or with -v (output editted):
portnoy -> ni
RPC: ----- SUN RPC Header -----
RPC:
RPC: Transaction id = 1547076330
RPC: Type = 0 (Call)
RPC: RPC version = 2
RPC: Program = 100003 (NFS), version = 2, procedure = 16
RPC: Credentials: Flavor = 1 (Unix), len = 80 bytes
RPC: Time = 14-Sep-97 17:24:29
RPC: Hostname = portnoy
RPC: Uid = 0, Gid = 1
RPC: Groups = 33536 32630 1 0 2 3 4 5 6 7 8 9 12
RPC: Verifier : Flavor = 0 (None), len = 0 bytes
RPC:
NFS: ----- Sun NFS -----
NFS:
NFS: Proc = 16 (Read from directory)
NFS: File handle = 01000000010000000900002001000000
NFS: 01000000000000000000000000000000
NFS: Cookie = 0
NFS: Count = 1048
NFS:
ni -> portnoy
RPC: ----- SUN RPC Header -----
RPC:
RPC: Transaction id = 1547076330
RPC: Type = 1 (Reply)
RPC: This is a reply to frame 11
RPC: Status = 0 (Accepted)
RPC: Verifier : Flavor = 0 (None), len = 0 bytes
RPC: Accept status = 0 (Success)
RPC:
NFS: ----- Sun NFS -----
NFS:
NFS: Proc = 16 (Read from directory)
NFS: Status = 22 (Invalid argument)
NFS:
Anyhow, the problem is that the EINVAL should be detectable somehow
by the user. You should just get:
[portnoy!jhawk] ~# ls /mnt
[portnoy!jhawk] ~#
And assume that everything was working ok. Arguably this is the most
important bug here (the interface for detecting problems is often the most
important part of an OS).
I'd reserve judgement as to whether the EINVAL is Solaris' fault or not.
A NetBSD client seems to be able to mount ni:/afs with an rsize
of 8192 bytes, so there's reason to suspect Solaris is at fault.
--jhawk