[17594] in Athena Bugs
Re: problems with large directories on SGI's?
daemon@ATHENA.MIT.EDU (Greg Hudson)
Mon Feb 28 18:01:44 2000
Message-Id: <200002282301.SAA08560@small-gods.mit.edu>
To: Stan Zanarotti <srz@MIT.EDU>
Cc: bugs@MIT.EDU
In-Reply-To: Your message of "Mon, 28 Feb 2000 00:23:56 EST."
<200002280523.AAA20011@x15-cruise-basselope.mit.edu>
Date: Mon, 28 Feb 2000 18:01:39 -0500
From: Greg Hudson <ghudson@MIT.EDU>
> Are there any known AFS bugs with the SGI dealing with large
> directories?
This is the first I've heard of it. Thanks for reporting this. Since we
are upgrading to AFS 3.5 in the next release, we should probably test
with AFS 3.5 before reporting a bug, but here are my findings under
AFS 3.4 ports 1.73 (the version we're running on SGIs in the cluster
now):
I tried making a directory containing 8192 files in the dev cell and
doing an ls -l on it. (A rough sketch of the test setup is appended
at the end of this message.) On whirlpool (an Ultra 5) without the
cache preloaded, this took 12.84 seconds, or around 638 stats per
second; on pyramids (an R4600 Indy), system call tracing revealed
that the process was plodding along at only about 213 stats per
second. tcpdump revealed a repeating pattern of:
pyramids: rx data seq 1 <client-init>,<last-pckt> fs bulk-stat (396) (ttl 60, id 52991)
pyramids: rx data seq 1 <client-init>,<last-pckt> fs bulk-stat (396) (ttl 60, id 52991)
hum: rx data seq 1 <req-ack> (1444) (DF) (ttl 254, id 37014)
pyramids: rx ack seq 1 <client-init> (62) (ttl 60, id 52992)
pyramids: rx ack seq 1 <client-init> (62) (ttl 60, id 52992)
hum: rx data seq 2 <req-ack>,<more-pckts> (1444) (DF) (ttl 254, id 37015)
hum: rx data seq 3 <last-pckt> (108) (DF) (ttl 254, id 37016)
pyramids: rx ack seq 2 <client-init> (63) (ttl 60, id 52993)
pyramids: rx ack seq 2 <client-init> (63) (ttl 60, id 52993)
pyramids: rx data seq 1 <client-init>,<last-pckt> fs fetch-status fid 536871712/47816/178256 (44) (ttl 60, id 53019)
pyramids: rx data seq 1 <client-init>,<last-pckt> fs fetch-status fid 536871712/47816/178256 (44) (ttl 60, id 53019)
hum: rx data seq 1 <last-pckt> (148) (DF) (ttl 254, id 37017)
(For conciseness, I have elided timestamps and shown only the
originating host of each packet.) Each pattern involves a different
fid in the fetch-status; some successive fids are:
536871712/47814/178255
536871712/47816/178256
536871712/47818/178257
536871712/47820/178258
536871712/47822/178259
So it looks like the AFS client is trying to use the bulk-stat
optimization but is failing somehow and falling back to an individual
fetch-status call for each file.
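
To make the suspected behavior concrete, here is a minimal sketch of
the bulk-then-fallback pattern the trace suggests. This is not the
actual cache manager code; do_bulk_stat() and do_fetch_status() are
hypothetical stand-ins for the bulk-stat and fetch-status RPCs, and
the failure is simulated for illustration:

#include <stdio.h>

struct afs_fid {
    unsigned long volume;
    unsigned long vnode;
    unsigned long unique;
};

/* Hypothetical stand-in for the bulk-stat RPC. Returns 0 on success,
 * nonzero if the bulk results could not be used. */
static int do_bulk_stat(struct afs_fid *fids, int nfids)
{
    printf("bulk-stat of %d fids\n", nfids);
    return -1;          /* simulate the bulk path failing */
}

/* Hypothetical stand-in for the per-file fetch-status RPC. */
static void do_fetch_status(struct afs_fid *fid)
{
    printf("fetch-status fid %lu/%lu/%lu\n",
           fid->volume, fid->vnode, fid->unique);
}

/* Stat a batch of directory entries: try one bulk-stat call and, if
 * that fails (or its results are thrown away), fall back to a
 * separate fetch-status call per entry. The per-entry round trips
 * are what would make the fallback path so slow on pyramids. */
static void stat_entries(struct afs_fid *fids, int nfids)
{
    int i;

    if (do_bulk_stat(fids, nfids) == 0)
        return;
    for (i = 0; i < nfids; i++)
        do_fetch_status(&fids[i]);
}

int main(void)
{
    /* Fids taken from the trace above. */
    struct afs_fid fids[] = {
        { 536871712, 47814, 178255 },
        { 536871712, 47816, 178256 },
        { 536871712, 47818, 178257 },
    };

    stat_entries(fids, 3);
    return 0;
}

If the real client is doing something along these lines, the question
is why the bulk results are usable on whirlpool but not on pyramids.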
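
For reference, here is a rough sketch of the kind of test I ran.
This is a reconstruction rather than the exact commands; the
file-creation loop and the readdir/stat pass stand in for populating
the directory and for what ls -l does, and the timing is only
approximate here because creating the files leaves the cache warm
(the numbers above were measured on a client that had not preloaded
the cache):

#include <sys/stat.h>
#include <sys/time.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[1024];
    const char *dir;
    int i, fd, nfiles, nstats = 0;
    DIR *dp;
    struct dirent *de;
    struct stat st;
    struct timeval start, end;
    double elapsed;

    if (argc != 3) {
        fprintf(stderr, "usage: %s dir nfiles\n", argv[0]);
        return 1;
    }
    dir = argv[1];
    nfiles = atoi(argv[2]);

    /* Populate the directory with empty files. */
    for (i = 0; i < nfiles; i++) {
        sprintf(path, "%s/f%05d", dir, i);
        fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) {
            perror(path);
            return 1;
        }
        close(fd);
    }

    /* Time a readdir plus a stat of every entry, which is roughly
     * the work ls -l does. */
    gettimeofday(&start, NULL);
    if ((dp = opendir(dir)) == NULL) {
        perror(dir);
        return 1;
    }
    while ((de = readdir(dp)) != NULL) {
        sprintf(path, "%s/%s", dir, de->d_name);
        if (stat(path, &st) == 0)
            nstats++;
    }
    closedir(dp);
    gettimeofday(&end, NULL);

    elapsed = (end.tv_sec - start.tv_sec)
        + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%d stats in %.2f seconds (%.0f stats/sec)\n",
           nstats, elapsed, nstats / elapsed);
    return 0;
}

Run against a fresh directory in the dev cell from a client that has
not seen it (or after flushing the cache), this should make it easy
to compare the SGI and Sun clients and to retest once we move to AFS
3.5.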