[31492] in Perl-Users-Digest
Perl-Users Digest, Issue: 2751 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jan 4 03:10:01 2010
Date: Mon, 4 Jan 2010 00:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 4 Jan 2010 Volume: 11 Number: 2751
Today's topics:
Re: How to put '#!/usr/bin/env perl -w' at the beginnin <nospam-abuse@ilyaz.org>
Re: How to put '#!/usr/bin/env perl -w' at the beginnin <sysadmin@example.com>
Re: How to put '#!/usr/bin/env perl -w' at the beginnin <sysadmin@example.com>
most efficient way to get number of files in a director <guba@vi-anec.de>
Re: most efficient way to get number of files in a dire <jurgenex@hotmail.com>
Re: most efficient way to get number of files in a dire <uri@StemSystems.com>
Re: most efficient way to get number of files in a dire <rvtol+usenet@xs4all.nl>
Re: most efficient way to get number of files in a dire <sysadmin@example.com>
Re: most efficient way to get number of files in a dire <john@castleamber.com>
Re: most efficient way to get number of files in a dire <ben@morrow.me.uk>
Re: most efficient way to get number of files in a dire <nospam-abuse@ilyaz.org>
Re: most efficient way to get number of files in a dire <sysadmin@example.com>
Re: most efficient way to get number of files in a dire <m@rtij.nl.invlalid>
perl in BartPE: locale warning <nospam-abuse@ilyaz.org>
Re: WWW::Scripter or Javascript and perl <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 3 Jan 2010 17:34:10 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: How to put '#!/usr/bin/env perl -w' at the beginning of a perl script?
Message-Id: <slrnhk1l8i.ot6.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-01-03, Wanna-Be Sys Admin <sysadmin@example.com> wrote:
>>>> Every system you've seen gives write permission
>>>> on /usr/bin to ordinary users?
>>
>>> Nope. I never said this was for /usr/bin. The symlink would (or
>>> should) be from their user's account path, and point that to
>>> wherever,
>>> if they were going to use a symlink. This would allow them to run it
>>> on whatever system, and just have to point the one symlink to the
>>> system's Perl install.
>>
>> Won't work. Here, /home directories migrate between different mount
>> points...
> Would it matter what home directory the account had, if they created
> a symlink to the unique path, or are you saying the mount points
> alternate without notice and the path might change and break after the
> initial link creation?
When I "invented" your proposed solution, I investigated why it does
not work. There were several reasons; what I remember:
a) when entering from different computers on the net, $HOME may
point to different locations (via local disk, or via NFS mount
points - which in turn may be mounted differently on different
architectures);
b) when a disk carrying (a bunch of $HOME directories) would fill
up, $HOME may migrate to a different disk on a different computer.
I do not think "prior notice" matters - should I hunt for all
symlinks which are going to be broken? And *when* I'm going to
do this? AND: I do not remember any notice anyway...
Yours,
Ilya
------------------------------
Date: Sun, 03 Jan 2010 15:38:58 -0800
From: Wanna-Be Sys Admin <sysadmin@example.com>
Subject: Re: How to put '#!/usr/bin/env perl -w' at the beginning of a perl script?
Message-Id: <nU90n.13887$DY5.2010@newsfe08.iad>
Ilya Zakharevich wrote:
> On 2010-01-03, Wanna-Be Sys Admin <sysadmin@example.com> wrote:
>>>>> Every system you've seen gives write permission
>>>>> on /usr/bin to ordinary users?
>>>
>>>> Nope. I never said this was for /usr/bin. The symlink would (or
>>>> should) be from their user's account path, and point that to
>>>> wherever,
>>>> if they were going to use a symlink. This would allow them to run
>>>> it on whatever system, and just have to point the one symlink to
>>>> the system's Perl install.
>>>
>>> Won't work. Here, /home directories migrate between different mount
>>> points...
>
>> Would it matter what home directory the account had, if they created
>> a symlink to the unique path, or are you saying the mount points
>> alternate without notice and the path might change and break after
>> the initial link creation?
>
> When I "invented" your proposed solution, I investigated why it does
> not work. There were several reasons; what I remember:
I think there is some confusion. I wasn't the one who proposed
modifying $HOME or creating symlinks. I talked about them in response
to what some others had suggested.
> a) when entering from different computers on the net, $HOME may
> point to different locations (via local disk, or via NFS mount
> points - which in turn may be mounted differently on different
> architectures);
Yes, they might. The OP's issue was having the script work across
different systems; they stated the path has to be fixed to use whatever
unique install. There are only so many ways to go about that, and
several people offered various solutions. If it's always a unique
install under the local account, then perhaps they can set the path
without making it absolute (and static). Who knows what will really
work well for them, since they didn't outline the exact scenarios that
make porting their script to different systems a problem.
> b) when a disk carrying (a bunch of $HOME directories) would fill
> up, $HOME may migrate to a different disk on a different
> computer.
True. I was simply saying that usually an account is set up on one
directory, partition, drive, or mount (local, NFS, iSCSI, or whatever)
and not moved, since new accounts are usually set up with a different
path if the existing one starts to become full. But you're right,
accounts may be moved after the fact to clear up room. I'd hope any
system admin who moved accounts would be smart enough to do one of two
things: 1) inform the client so they can update any absolute paths in
scripts that point to their home directory, or better yet 2) move the
account and create a symlink from the old path to the new one (e.g.,
ln -s /home2/account /home), so the paths remain the same. Of course,
there's no guarantee the OP will put their script on a system where
someone thought to do that if data did move and the $PATH or $HOME
variable changed (or needs to).
Anyway, I never recommended using $HOME.
> I do not think "prior notice" matters
See above about what I had meant, and I agree it shouldn't matter.
> - should I hunt for all
> symlinks which are going to be broken?
Nope, but as a sys admin it's wise not to force clients to update a
bunch of absolute paths in their scripts (for any number of things)
when you can simply create a link to the new target so things continue
to work as normal (a symlink takes a single inode, so space/inode
issues have no relevance). Anyway, I just responded previously to state
that a symlink solution could work, though that depends on a few
variables, just like most any other solution. There's not enough
information to say what in particular will work best for them, but I
agree that symlinks and modifying $HOME aren't what I'd usually
recommend either.
> And *when* I'm going to
> do this? AND: I do not remember any notice anyway...
I don't know what the above is supposed to mean, but it doesn't really
matter; I'm not arguing or disagreeing with you about anything
regarding symlinks or modifying $HOME. In fact, I posted previously to
say it might work, but that it probably wasn't the best solution. I
guess it's been a long year???
--
Not really a wanna-be, but I don't know everything.
------------------------------
Date: Sun, 03 Jan 2010 15:44:09 -0800
From: Wanna-Be Sys Admin <sysadmin@example.com>
Subject: Re: How to put '#!/usr/bin/env perl -w' at the beginning of a perl script?
Message-Id: <eZ90n.541$Mv3.149@newsfe05.iad>
l v wrote:
> Create a symbolic link for /usr/bin/perl to your non-standard location
> and you should then be able to use the correct shebang in your Perl
> scripts
Just to update a response on this: I believe I misread this post
earlier. It looks like the user is just a normal account-level user on
the system (uid >= 500), so they couldn't change the perl binary path
to be a symlink to somewhere else; and if they had the power to do
that, they may as well install the perl binary to be _at_ /usr/bin/perl
(or whatever default path is used) rather than use a symlink. I believe
the issue was that they had a unique perl binary built and located in
another path or mount, most likely in the account's own directory that
they upload their script to. The details of what they need, and how it
might work (or break), aren't really clear, but either way it doesn't
sound like they have super-user access to change the system-installed
perl binary or its path.
--
Not really a wanna-be, but I don't know everything.
------------------------------
Date: Sun, 3 Jan 2010 14:46:50 -0800 (PST)
From: "guba@vi-anec.de" <guba@vi-anec.de>
Subject: most efficient way to get number of files in a directory
Message-Id: <9faa6298-a642-4000-8288-44b2cb602daa@j5g2000yqm.googlegroups.com>
Hello,
I am searching for the most efficient way to get the number of files
in a directory (up to 10^6 files). I will use the number as a stop
condition of a generation process, so the method must be applied many
times during this process. Therefore it must be efficient, and opendir
is not the choice.
I am thinking about the bash command "ls | wc -l",
but I don't know how to get this into a perl variable.
Thank you very much for any help!
------------------------------
Date: Sun, 03 Jan 2010 15:10:28 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <5n82k5tb6e0t8vjquet8lcf9gf0kno5q9g@4ax.com>
"guba@vi-anec.de" <guba@vi-anec.de> wrote:
>I am searching for the most efficient way to get the number of files
>in a directory (up to 10^6 files). I will use the number as a stop
>condition of a generation process, so the method must be applied many
>times during this process. Therefore it must be efficient, and opendir
>is not the choice.
opendir() or glob() would have been my first suggestion. But you will
have to run your own benchmark tests, I doubt that anyone has ever
investigated performance in such a scenario before.
>I am thinking about the bash command "ls | wc -l",
>but I don't know how to get this into a perl variable.
Use backticks:
my $captured = `ls | wc -l`;
Of course, whether launching two external processes and initiating IPC
is indeed faster than using Perl's built-in functions has to be tested.
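Both approaches can be sketched side by side (a minimal sketch; the sub names are mine, not from the thread):

```perl
use strict;
use warnings;

# Count directory entries with Perl's built-in opendir/readdir,
# skipping the "." and ".." entries.
sub count_opendir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my $n = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return $n;
}

# Capture the shell pipeline with backticks, as suggested above.
# Note: plain ls skips dotfiles, so the two counts can differ
# when hidden files are present.
sub count_backticks {
    my ($dir) = @_;
    my $out = `ls '$dir' | wc -l`;
    chomp $out;
    return 0 + $out;
}
```

Benchmark both with the core Benchmark module on a directory of realistic size before trusting either.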
jue
------------------------------
Date: Sun, 03 Jan 2010 18:28:22 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <87y6keydih.fsf@quad.sysarch.com>
>>>>> "JE" == Jürgen Exner <jurgenex@hotmail.com> writes:
JE> "guba@vi-anec.de" <guba@vi-anec.de> wrote:
>> I am searching the most efficient way to get the number of files
>> in a directory (up to 10^6 files). I will use the nr as a stop
>> condition
>> of of generation process so the method must be applied during this
>> process
>> a lot of times. Therefore it must be efficient and opendir is not the
>> choice.
JE> opendir() or glob() would have been my first suggestion. But you will
JE> have to run your own benchmark tests, I doubt that anyone has ever
JE> investigated performance in such a scenario before.
how would opendir be slower than any other method (perl, shell, ls, glob
or other)? they ALL must do a system call to opendir underneath as that
is the only normal way to read a dir (you can 'open' a dir as a file but
then you have to parse it out yourself which can be painful).
JE> Of course, if launching two external processes and initiating IPC is
JE> indeed faster than using Perl's buildin functions has to be tested.
i can't see how they would ever be faster unless they can buffer the
dirnames better than perl's opendir can (when assigning to an
array). the fork overhead should easily lose out in this case but i
won't benchmark it with 10k files in a dir! :)
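for what it's worth, a scalar-context readdir loop never has to hold the whole listing in memory, which matters at 10^6 entries (a sketch; the sub name is mine):

```perl
use strict;
use warnings;

# Stream the directory one entry at a time instead of slurping
# readdir into a million-element array.
sub count_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my $count = 0;
    while (defined(my $entry = readdir $dh)) {
        $count++ unless $entry eq '.' or $entry eq '..';
    }
    closedir $dh;
    return $count;
}
```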
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Mon, 04 Jan 2010 00:42:13 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <4b412b55$0$22917$e4fe514c@news.xs4all.nl>
guba@vi-anec.de wrote:
> I am searching for the most efficient way to get the number of files
> in a directory (up to 10^6 files). I will use the number as a stop
> condition of a generation process, so the method must be applied many
> times during this process. Therefore it must be efficient, and opendir
> is not the choice.
>
> I am thinking about the bash command "ls | wc -l",
> but I don't know how to get this into a perl variable.
Why have so many files in a directory? You could create them in
subdirectories named after the first few characters of the filename.
Or maybe you are looking for a database solution?
Or add a byte to a metafile, each time a new file is created, and check
the size of that file?
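The metafile idea can be sketched in a few lines (the sub names are mine; this assumes a single writer, otherwise locking would be needed):

```perl
use strict;
use warnings;

# Append one byte to a metafile each time a new file is created...
sub bump_counter {
    my ($meta) = @_;
    open my $fh, '>>', $meta or die "open $meta: $!";
    print {$fh} "x";
    close $fh;
}

# ...then the current count is just the metafile's size in bytes,
# a single stat call instead of a scan over 10^6 directory entries.
sub read_counter {
    my ($meta) = @_;
    return -e $meta ? -s $meta : 0;
}
```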
--
Ruud
------------------------------
Date: Sun, 03 Jan 2010 15:48:07 -0800
From: Wanna-Be Sys Admin <sysadmin@example.com>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <X0a0n.542$Mv3.36@newsfe05.iad>
Jürgen Exner wrote:
> opendir() or glob() would have been my first suggestion. But you will
> have to run your own benchmark tests, I doubt that anyone has ever
> investigated performance in such a scenario before.
Hmm, I've not looked, so you might be right, but I'd think someone has
probably benchmarked this before. Then again, maybe not: the number of
files in the directory is ridiculously large, so people may not have
bothered and may have used a better directory structure for the files
instead. Daily, I see this as a common issue with clients, who ask why
their FTP program doesn't show files after the 2000th one, and whether
we can modify FTP to allow the listing of 10-20K files. That's when the
education has to begin for the client.
--
Not really a wanna-be, but I don't know everything.
------------------------------
Date: Sun, 03 Jan 2010 17:58:56 -0600
From: John Bokma <john@castleamber.com>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <877hrybv0f.fsf@castleamber.com>
"Dr.Ruud" <rvtol+usenet@xs4all.nl> writes:
> guba@vi-anec.de wrote:
>
>> I am searching for the most efficient way to get the number of files
>> in a directory (up to 10^6 files). I will use the number as a stop
>> condition of a generation process, so the method must be applied many
>> times during this process. Therefore it must be efficient, and opendir
>> is not the choice.
>>
>> I am thinking about the bash command "ls | wc -l",
>> but I don't know how to get this into a perl variable.
>
> Why have so many files in a directory? You could create them in
> subdirectories named after the first few characters of the filename.
I've used the first few characters of the md5 hex digest of the
filename; depending on how the files are named [1], this might
distribute the files more evenly
(e.g. if a lot of files start with "the", you might end up with a lot
of files in the "the" directory).
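With the core Digest::MD5 module that scheme is nearly a one-liner (the sub name is mine):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Derive a bucket subdirectory from the first two hex characters of
# the filename's MD5, giving 256 roughly even buckets ("00" .. "ff")
# even when many names share a common prefix like "the".
sub bucket_for {
    my ($name) = @_;
    return substr(md5_hex($name), 0, 2);
}
```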
--
John Bokma
Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/
------------------------------
Date: Sun, 3 Jan 2010 23:48:59 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <b0n717-uh6.ln1@osiris.mauzo.dyndns.org>
Quoth "guba@vi-anec.de" <guba@vi-anec.de>:
>
> I am searching for the most efficient way to get the number of files
> in a directory (up to 10^6 files). I will use the number as a stop
> condition of a generation process, so the method must be applied many
> times during this process. Therefore it must be efficient, and opendir
> is not the choice.
Your algorithm is broken. Find a different way of detecting when you
have finished. Apart from anything else, on many filesystems it's very
inefficient to put that many files in a single directory.
> I am thinking about the bash command "ls | wc -l",
> but I don't know how to get this into a perl variable.
ls calls opendir, so that won't help. Depending on your OS, you may be
able to use FAM or inotify or some equivalent file-change-
notification mechanism, which could allow you to do a single scan at the
start and then count new files as they are added. You would need to
watch carefully for race conditions, though.
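On Linux, for instance, the CPAN module Linux::Inotify2 (not core, Linux-only; the sub name and structure here are mine, a rough sketch rather than a tested solution) would let you seed the count once and then bump it per creation event:

```perl
use strict;
use warnings;

# Wait until $dir holds at least $target files, counting creation
# events instead of rescanning the directory each time.
# Race-condition caveat (as noted above): here the watch is registered
# *before* the initial scan, so an entry may at worst be counted twice,
# never missed; for a stop condition an overcount is the safer error.
sub wait_for_files {
    my ($dir, $target) = @_;

    require Linux::Inotify2;                  # CPAN module, Linux-only
    my $inotify = Linux::Inotify2->new
        or die "inotify init: $!";

    my $count = 0;
    $inotify->watch($dir, Linux::Inotify2::IN_CREATE(), sub { $count++ });

    # Seed with a single initial scan.
    opendir my $dh, $dir or die "opendir $dir: $!";
    $count += grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    $inotify->poll while $count < $target;    # poll blocks until events arrive
    return $count;
}
```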
Ben
------------------------------
Date: Mon, 4 Jan 2010 02:36:15 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <slrnhk2l0u.ppg.nospam-abuse@powdermilk.math.berkeley.edu>
On 2010-01-03, Wanna-Be Sys Admin <sysadmin@example.com> wrote:
> Jürgen Exner wrote:
>
>> opendir() or glob() would have been my first suggestion. But you will
>> have to run your own benchmark tests, I doubt that anyone has ever
>> investigated performance in such a scenario before.
>
> Hmm, I've not looked, so you might be right, but I'd think someone has
> probably benchmarked this before. Then again, maybe not: the number of
> files in the directory is ridiculously large, so people may not have
> bothered and may have used a better directory structure for the files
> instead. Daily, I see this as a common issue with clients, who ask why
> their FTP program doesn't show files after the 2000th one, and whether
> we can modify FTP to allow the listing of 10-20K files. That's when the
> education has to begin for the client.
???? Just upgrade the server to use some non-brain-damaged
filesystem. 100K files in a directory should not be a big deal...
E.g., AFAIK, with HPFS386 even 1M files would not be much
user-noticeable.
Ilya
P.S. Of course, if one uses some brain-damaged API (like POSIX, which
AFAIK does not allow "merged" please_do_readdir_and_stat()
call), this may significantly slow down things even with
average-intelligence FSes...
------------------------------
Date: Sun, 03 Jan 2010 22:25:40 -0800
From: Wanna-Be Sys Admin <sysadmin@example.com>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <ERf0n.1238$rL7.410@newsfe23.iad>
Ilya Zakharevich wrote:
> On 2010-01-03, Wanna-Be Sys Admin <sysadmin@example.com> wrote:
>> Jürgen Exner wrote:
>>
>>> opendir() or glob() would have been my first suggestion. But you
>>> will have to run your own benchmark tests, I doubt that anyone has
>>> ever investigated performance in such a scenario before.
>>
>> Hmm, I've not looked, so you might be right, but I'd think someone
>> has probably benchmarked this before. Then again, maybe not: the
>> number of files in the directory is ridiculously large, so people may
>> not have bothered and may have used a better directory structure for
>> the files instead. Daily, I see this as a common issue with clients,
>> who ask why their FTP program doesn't show files after the 2000th
>> one, and whether we can modify FTP to allow the listing of 10-20K
>> files. That's when the education has to begin for the client.
>
> ???? Just upgrade the server to use some non-brain-damaged
> filesystem. 100K files in a directory should not be a big deal...
> E.g., AFAIK, with HPFS386 even 1M files would not be much
> user-noticeable.
>
A lot of the systems I have to fix things on are not ones I make the
call for. ext3 is about as good as it gets, which is fine, but...
Anyway, this is also about the programs users are limited to by
management, such as pure-ftpd, where it becomes a resource issue if it
has to list 20K+ files in each directory. But I do understand what
you're getting at.
--
Not really a wanna-be, but I don't know everything.
------------------------------
Date: Mon, 4 Jan 2010 07:44:10 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: most efficient way to get number of files in a directory
Message-Id: <qaf817-f0a.ln1@news.rtij.nl>
On Sun, 03 Jan 2010 14:46:50 -0800, guba@vi-anec.de wrote:
> I am thinking about the bash command "ls | wc -l" but I don't know how
> to get this in a perl variable.
Perl's opendir is better, but if you use ls, you probably want to use
the unsorted flag to ls.
M4
------------------------------
Date: Mon, 4 Jan 2010 02:27:59 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: perl in BartPE: locale warning
Message-Id: <slrnhk2khf.ppg.nospam-abuse@powdermilk.math.berkeley.edu>
I cannot manage to build BartPE in a way which allows perl (AS flavor)
to run without warnings about (rephrasing):
setting locale failed
reverting to "C" locale
Does somebody know a workaround? Is some environment setting missing,
or do I need some more "language support" components?
Thanks,
Ilya
P.S. I'm getting similar warnings from some other Unixish projects
(e.g., Hugin)...
------------------------------
Date: Sun, 3 Jan 2010 15:27:23 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: WWW::Scripter or Javascript and perl
Message-Id: <rjp617-hk1.ln1@osiris.mauzo.dyndns.org>
Quoth Nathan <nathanabu@gmail.com>:
>
> I found out that wsp.pl actually works, but it doesnt work on that
> given site because of security...it doesnt dump anything when I browse
> that site, but other sites works fine.
> so, I believe there is another way to see the the JS is issuing when I
> click on that 'Submit' button, and then I can do the same with a POST
> request,isnt it??
You can use the LiveHTTPHeaders FF extension, but this is now OT for
clpmisc.
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2751
***************************************