[15496] in Athena Bugs

home help back first fref pref prev next nref lref last post

Athena 8.1.9: Solaris nfs doesn't track errors right...

daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Sep 14 13:28:44 1997

Date: Sun, 14 Sep 1997 13:28:39 -0400
To: bugs@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>

This is not a complete bug report.

I first noticed this with the 2.6 August beta installed
on deathtongue, and found similar symptoms under 2.5.1 on portnoy.

It appears that Solaris doesn't properly handle errors received from
remote nfs servers. There are two problems:

	1.	It shouldn't do what's causing the error in the first
		place.

	2.	It should update the error countes when it gets an
		error.

A specific example is mounting the NFS-AFS translators, for instance
ni.mit.edu. By default Solaris mounts with an rsize of 8192 bytes.
Somehow this results in a problem such that "ls /afs" returns no
output.

Performing this sequence:

	Mount ni:/afs w/ rsize=8192
	Start tcpdump
	Zero nfs client statistics (nfsstat -cz)
	ls /ni/afs
	nfssstat -c

Produces two items of note. tcpdump (-Nvvs1500) shows:

13:14:35.665630 portnoy.5c3682d2 > NI.nfs: 160 readdir fh 0,1/1 1048 bytes @ 0 (DF) (ttl 255, id 47226)
13:14:35.671014 NI.nfs > portnoy.5c3682d2: reply ok 28 readdir ERROR: Invalid argument [|nfs] (ttl 253, id 50143)

(there's likely a tcpdump bug if it claims that the packet is truncated...)

A diff of /tmp/[12] shows no interesting incrementing error counters. Here's
the dump:

---cut nfs
Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs   timers     
0          0          0          0          0          0          0          
cantconn   nomem      interrupts 
0          0          0          
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds   badverfs   
4          0          0          0          0          0          0          
timers     nomem      cantsend   
0          0          0          
 
Client nfs:
calls      badcalls   clgets     cltoomany  
4          0          4          0          
Version 2: (4 calls)
null       getattr    setattr    root       lookup     readlink   read       
0 0%       3 75%      0 0%       0 0%       0 0%       0 0%       0 0%       
wrcache    write      create     remove     rename     link       symlink    
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       
mkdir      rmdir      readdir    statfs     
0 0%       0 0%       1 25%      0 0%       
Version 3: (0 calls)
null       getattr    setattr    lookup     access     readlink   read       
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       
write      create     mkdir      symlink    mknod      remove     rmdir      
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf   
0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       0 0%       
commit     
0 0%       
 
Client nfs_acl:
Version 2: (0 calls)
null       getacl     setacl     getattr    access     
0 0%       0 0%       0 0%       0 0%       0 0%       
Version 3: (0 calls)
null       getacl     setacl     
0 0%       0 0%       0 0%   
---cut nfs

Running snoop (-s 1500) instead gives us:

portnoy.MIT.EDU -> NI.MIT.EDU   NFS C READDIR2 FH=0920 Cookie=0
  NI.MIT.EDU -> portnoy.MIT.EDU NFS R READDIR2 Invalid argument

or with -V:

________________________________
portnoy.MIT.EDU -> NI.MIT.EDU   ETHER Type=0800 (IP), size = 202 bytes
portnoy.MIT.EDU -> NI.MIT.EDU   IP  D=18.72.1.157 S=18.70.0.218 LEN=188, ID=63938
portnoy.MIT.EDU -> NI.MIT.EDU   UDP D=2049 S=1022 LEN=168
portnoy.MIT.EDU -> NI.MIT.EDU   RPC C XID=1547076324 PROG=100003 (NFS) VERS=2 PROC=16
portnoy.MIT.EDU -> NI.MIT.EDU   NFS C READDIR2 FH=0920 Cookie=0
________________________________
  NI.MIT.EDU -> portnoy.MIT.EDU ETHER Type=0800 (IP), size = 70 bytes
  NI.MIT.EDU -> portnoy.MIT.EDU IP  D=18.70.0.218 S=18.72.1.157 LEN=56, ID=65247
  NI.MIT.EDU -> portnoy.MIT.EDU UDP D=1022 S=2049 LEN=36
  NI.MIT.EDU -> portnoy.MIT.EDU RPC R (#7) XID=1547076324 Success
  NI.MIT.EDU -> portnoy.MIT.EDU NFS R READDIR2 Invalid argument

or with -v (output editted):

portnoy -> ni
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Transaction id = 1547076330
RPC:  Type = 0 (Call)
RPC:  RPC version = 2
RPC:  Program = 100003 (NFS), version = 2, procedure = 16
RPC:  Credentials: Flavor = 1 (Unix), len = 80 bytes
RPC:     Time = 14-Sep-97 17:24:29
RPC:     Hostname = portnoy
RPC:     Uid = 0, Gid = 1
RPC:     Groups = 33536 32630 1 0 2 3 4 5 6 7 8 9 12 
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 16 (Read from directory)
NFS:  File handle = 01000000010000000900002001000000
NFS:                01000000000000000000000000000000
NFS:  Cookie = 0
NFS:  Count = 1048
NFS:  
ni -> portnoy
RPC:  ----- SUN RPC Header -----
RPC:  
RPC:  Transaction id = 1547076330
RPC:  Type = 1 (Reply)
RPC:  This is a reply to frame 11
RPC:  Status = 0 (Accepted)
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  Accept status = 0 (Success)
RPC:  
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 16 (Read from directory)
NFS:  Status = 22 (Invalid argument)
NFS:  
 

Anyhow, the problem is that the EINVAL should be detectable somehow
by the user. You should just get:

[portnoy!jhawk] ~# ls /mnt
[portnoy!jhawk] ~# 

And assume that everything was working ok. Arguably this is the most
important bug here (the interface for detecting problems is often the most
important part of an OS).

I'd reserve judgement as to whether the EINVAL is Solaris' fault or not.
A NetBSD client seems to be able to mount ni:/afs with an rsize
of 8192 bytes, so there's reason to suspect Solaris is at fault.

--jhawk

home help back first fref pref prev next nref lref last post