[28124] in Perl-Users-Digest


Perl-Users Digest, Issue: 9489 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jul 17 17:13:35 2006

Date: Mon, 17 Jul 2006 11:10:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 17 Jul 2006     Volume: 10 Number: 9489

Today's topics:
    Re: What is a type error? <cdsmith@twu.net>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <dnew@san.rr.com>
    Re: What is a type error? <dnew@san.rr.com>
    Re: What is a type error? <cdsmith@twu.net>
    Re: What is a type error? <dnew@san.rr.com>
    Re: What is a type error? <eval.apply@gmail.com>
    Re: When would you use qr// on a literal string? <1usa@llenroc.ude.invalid>
    Re: When would you use qr// on a literal string? <rvtol+news@isolution.nl>
    Re: When would you use qr// on a literal string? <tadmc@augustmail.com>
    Re: When would you use qr// on a literal string? <simon.chao@fmr.com>
        why is  perl -e 'unlink(glob("*"))' so much faster than ewaguespack@gmail.com
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  <1usa@llenroc.ude.invalid>
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  <glennj@ncf.ca>
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  <spam@bsb.me.uk>
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  <rvtol+news@isolution.nl>
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  xhoster@gmail.com
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  xhoster@gmail.com
    Re: why is  perl -e 'unlink(glob("*"))' so much faster  <sherm@Sherm-Pendleys-Computer.local>
    Re: why is perl -e 'unlink(glob("*"))' so much faster t ewaguespack@gmail.com
    Re: why is perl -e 'unlink(glob("*"))' so much faster t xhoster@gmail.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 17 Jul 2006 09:42:35 -0600
From: Chris Smith <cdsmith@twu.net>
Subject: Re: What is a type error?
Message-Id: <MPG.1f255dd3a7fa689198969c@news.altopia.net>

Joachim Durchholz <jo@durchholz.org> wrote:
> I fail to see an example that would support such a claim.
> 
> On the other hand, UPDATE can assign any value to any field of any 
> record, so it's doing exactly what an assignment does. INSERT/DELETE can 
> create resp. destroy records, which is what new and delete operators 
> would do.
> 
> I must really be missing the point.

I *think* I understand Marshall here.  When you are saying "assignment", 
you mean assignment to values of attributes within tuples of the cell.  
When Marshall is saying "assignment", he seems to mean assigning a 
completely new *table* value to a relation; i.e., wiping out the entire 
contents of the relation and replacing it with a whole new set of 
tuples.  Your assignment is indeed less powerful than DML, whereas 
Marshall's assignment is more powerful than DML.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation


------------------------------

Date: 17 Jul 2006 08:54:54 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1153151694.674061.133510@p79g2000cwp.googlegroups.com>

Chris Smith wrote:
> Joachim Durchholz <jo@durchholz.org> wrote:
> > I fail to see an example that would support such a claim.
> >
> > On the other hand, UPDATE can assign any value to any field of any
> > record, so it's doing exactly what an assignment does. INSERT/DELETE can
> > create resp. destroy records, which is what new and delete operators
> > would do.
> >
> > I must really be missing the point.
>
> I *think* I understand Marshall here.  When you are saying "assignment",
> you mean assignment to values of attributes within tuples of the cell.
> When Marshall is saying "assignment", he seems to mean assigning a
> completely new *table* value to a relation; i.e., wiping out the entire
> contents of the relation and replacing it with a whole new set of
> tuples.  Your assignment is indeed less powerful than DML, whereas
> Marshall's assignment is more powerful than DML.

Exactly.


Marshall



------------------------------

Date: 17 Jul 2006 09:27:06 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1153153626.781332.263490@i42g2000cwa.googlegroups.com>

Joachim Durchholz wrote:
> Marshall schrieb:
> > Joachim Durchholz wrote:
> >> Marshall schrieb:
> >>> Good point. Perhaps I should have said "relational algebra +
> >>> variables with assignment." It is interesting to consider
> >>> assignment vs. the more restricted update operators: insert,
> >>> update, delete.
> >> Actually I see it the other way round: assignment is strictly less
> >> powerful than DML since it doesn't allow creating or destroying
> >> variables, while UPDATE does cover assignment to fields.
> >
> > Oh, my.
> >
> > Well, for all table variables T, there exists some pair of
> > values v and v', such that we can transition the value of
> > T from v to v' via assignment, but not by any single
> > insert, update or delete.
>
> I fail to see an example that would support such a claim.

variable T : unary relation of int
T = { 1, 2, 3 };  // initialization
T := { 4, 5 };   // assignment

The above transition of the value of T cannot be
done by any one single insert, update or delete.
Two would suffice, however. (In fact, any assignment
can be modeled as a full delete followed by an insert
of the new value.)
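
The point above can be sketched in Perl, assuming a unary relation of
int is modeled as a hash used as a set. The helper names (rel_new,
rel_insert, rel_delete, rel_values, rel_assign) are hypothetical and
exist only for this illustration:

```perl
use strict;
use warnings;

# Model a unary relation of int as a set: the hash keys are the tuples.
# All helper names here are illustrative assumptions.
sub rel_new    { my %t = map { $_ => 1 } @_; return \%t }
sub rel_insert { my ($t, @vals) = @_; $t->{$_} = 1 for @vals; return $t }

# Delete every tuple for which the predicate returns true.
sub rel_delete {
    my ($t, $pred) = @_;
    delete @{$t}{ grep { $pred->($_) } keys %$t };
    return $t;
}

# Return the tuples in ascending order (call in list context).
sub rel_values { my ($t) = @_; return sort { $a <=> $b } keys %$t }

# Assignment expressed as a full delete followed by one insert.
sub rel_assign {
    my ($t, @vals) = @_;
    rel_delete($t, sub { 1 });
    return rel_insert($t, @vals);
}

my $T = rel_new(1, 2, 3);      # T = { 1, 2, 3 }
rel_assign($T, 4, 5);          # T := { 4, 5 }
print join(', ', rel_values($T)), "\n";
```

Here rel_assign needs two steps because neither a single insert nor a
single delete can take the set from { 1, 2, 3 } to { 4, 5 }.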


> On the other hand, UPDATE can assign any value to any field of any
> record,

Yes.

> so it's doing exactly what an assignment does.

No. The variable is the table, not the records. Relations are not
arrays. Records are not lvalues.


> INSERT/DELETE can
> create resp. destroy records, which is what new and delete operators
> would do.
>
> I must really be missing the point.
>
>  > Further, it is my understanding
> > that your claim of row identity *depends* on the restricted
> > nature of DML; if the only mutator operation is assignment,
> > then there is definitely no record identity.
>
> Again, I fail to connect.
>
> I and others have given aliasing examples that use just SELECT and UPDATE.

Sure, but update's semantics are defined in a per-record way,
which is consistent with record identity. Assignment's isn't.


> >> (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
> >> at which point there is not much of a difference.)
> >
> > I am not sure what this means. Delete can be expressed in
> > terms of assignment. (As can insert and update.)
>
> INSERT cannot be expressed in terms of assignment. INSERT creates a new
> record; there's no way that assignment in a language like C can create a
> new data structure!
> The same goes for DELETE.

I was intending to discuss a hypothetical relation-based language,
so while I generally agree with your statement about C, I don't see
how it applies.


>  > (Assignment can also be expressed in terms of insert and delete.)
>
> Agreed.
>
> I also realize that this makes it a bit more difficult to nail down the
> nature of identity in a database.

I would propose that variables have identity, and values do not.
In part this is via the supplied definition of identity, in which, when
you change one thing, if something else changes as well, they
share identity. Since one cannot change values, they necessarily
lack identity.


> It's certainly not storage location:
> if you DELETE a record and then INSERT it with the same values, it may
> be allocated somewhere entirely else, and our intuition would say it's
> not "the same" (i.e. not identical).

Well, it would depend on how our intuition had been primed. If it
was via implementation techniques in low level languages, we
might reach a different conclusion than if our intuition was primed
via logical models and relation theory.


> (In a system with OID, it would
> even be impossible to recreate such a record, since it would have a
> different OID. I'm not sure whether this makes OID systems better or
> worse at preserving identity, but that's just a side track.)

OIDs are something of a kludge, and they break set semantics.


> Since intuition gives me ambivalent results here, I'll go back to my
> semiformal definition (and take the opportunity to make it a bit more
> precise):
> Two path expressions (lvalues, ...) are aliases if and only if the
> referred-to values compare equal, and if they stay equal after applying
> any operation to the referred-to value through either of the path
> expressions.

Alas, this leaves me confused. I don't see how a path expression
(in this case, SELECT ... WHERE) can be an l-value. You cannot
apply imperative operations to the result. (Also I think the use
of equality here is too narrow; it is only necessary to show that
two things both change, not that they change in the same way.)

I was under the impression you agreed that "i+2" was not
a "path expression". If our hypothetical language lacks record
identity, then I would say that any query is simply an expression
that returns a value, as in "i+2."


> In the context of SQL, this means that identity isn't the location where
> the data is stored. It's also not the values stored in the record -
> these may change, including key data. SQL record identity is local, it
> can be defined from one operation to the next, but there is no such
> thing as a global identity that one can memorize and look up years
> later, without looking at the intermediate states of the store.

Yes, however all of this depends on record identity.


> It's a gross concept, now that I think about it. Well, or at least
> rather alien for us programmers, who are used to taking the address of a
> variable to get a tangible identity that will stay stable over time.

It is certainly alien if one is not used to relation semantics, which
is the default case.


> On the other hand, variable addresses as tangible identities don't hold
> much water anyway.
> Imagine data that's written out to disk at program end, and read back
> in. Further imagine that while the data is read into main memory,
> there's a mechanism that redirects all further reads and writes to the
> file into the read-in copy in memory, i.e. whenever any program changes
> the data, all other programs see the change, too.
> Alternatively, think about software agents that move from machine to
> machine, carrying their data with them. They might be communicating with
> each other, so they need some means of establishing identity
> ("addressing") the memory buffers that they use for communication.

These are exactly why content-based addressing is so important.
Location addressing depends on an address space, and this
concept does not distribute well.


>  > I don't know what "new" would be in a value-semantics, relational
> > world.
>
> It would be INSERT.
>
> Um, my idea of "value semantics" is associated with immutable values.
> SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

Sorry, I was vague. Compare, in OOP, the difference between a value
object and a "regular" object.


> So by my definition, SQL doesn't have value semantics, by your
> definition, it would have value semantics but updates which are enough
> to create aliasing problems, so I'm not sure what point you're making
> here...
>
> >> Filters are just like array indexing: both select a subset of variables
> >> from a collection.
> >
> > I can't agree with this wording. A filter produces a collection
> > value from a collection value. I don't see how variables
> > enter in to it.
>
> A collection can consist of values or variables.
>
> And yes, I do think that WHERE is a selection over a bunch of variables
> - you can update records after all, so they are variables! They don't
> have a name, at least none which is guaranteed to be constant over their
> lifetime, but they can be mutated!

We seem to have slipped from the hypothetical relation language
with only assignment back to SQL.


>  > One can filter either a collection constant or
> > a collection variable; if one speaks of filtering a collection
> > variable, on is really speaking of filtering the collection value
> > that the variable currently contains; filtering is not an operation
> > on the variable as such, the way the "address of" operator is.
> > Note you can't update the result of a filter.
>
> If that's your definition of a filter, then WHERE is not a filter,
> simple as that.

Fair enough! Can you correct my definition of filter, though?
I am still unaware of the difference.


> >> In SQL, you select a subset of a table, in a
> >> programming language, you select a subset of an array.
> >>
> >> (The SQL selection mechanism is far more flexible in the kinds of
> >> filtering you can apply, while array indexing allows filtering just by
> >> ordinal position. However, the relevant point is that both select things
> >> that can be updated.)
> >
> > When you have been saying "select things that can be updated"
> > I have been assuming you meant that one can derive values
> > from variables, and that some other operation can update that
> > variable, causing the expression, if re-evaluated, to produce
> > a different value.
>
> That's what I meant.
>
>  > However the phrase also suggests that
> > you mean that the *result* of the select can *itself* be
> > updated.
>
> The "that" in "things that can be updated" refers to the selected
> things. I'm not sure how this "that" could be interpreted to refer to
> the selection as a whole (is my understanding of English really that bad?)

Your English is extraordinary. I could easily conclude that you
were born in Boston and educated at Harvard, and either have
Germanic ancestry or have simply adopted a Germanic name
out of whimsy. If English is not your native tongue, there is no
way to detect it.

Argh, late for dropping off my daughter at school now. Must run.
Sorry if I was a bit unclear due to being rushed.


Marshall



------------------------------

Date: Mon, 17 Jul 2006 16:49:31 GMT
From: Darren New <dnew@san.rr.com>
Subject: Re: What is a type error?
Message-Id: <vcPug.24229$Z67.18188@tornado.socal.rr.com>

Chris Smith wrote:

> Darren New <dnew@san.rr.com> wrote:
> 
>>I'm not sure what linear or uniqueness typing is. It's typestate, and if 
>>I remember correctly the papers I read 10 years ago, the folks at 
>>TJWatson that invented Hermes also invented the concept of typestate. 
>>They at least claim to have coined the term.
> 
> Coining the term is one thing, but I feel pretty confident that the idea 
> was not invented in 1986 with the Hermes language, but rather far 
> earlier.

Yes. However, the guys who invented Hermes didn't come up with it out of 
the blue. It was around (in NIL - Network Implementation Language) for 
years beforehand. I read papers about these things in graduate school, 
but I don't know where my photocopies are.

NIL was apparently quite successful, but a niche language, designed by 
IBM for programming IBM routers. Hermes was an attempt years later to 
take the same successful formula and turn it into a general-purpose 
programming system, which failed (I believe) for the same reason that a 
general purpose operating system that can't run C programs will fail.

> Perhaps they may have invented the concept of considering it 
> any different from other applications of types, though.  

 From what I can determine, the authors seem to imply that typestate is 
dataflow analysis modified in (at least) two ways:

1) When control flow joins, the new typestate is the intersection of 
typestates coming into the join, whereas dataflow analysis doesn't 
guarantee that. (They imply they think dataflow analysis is allowed to 
say "the variable might or might not be initialized here", while 
typestate would ensure the variable is uninitialized.)

2) The user has control over the typestate, so the user can (for 
example) assert a variable is uninitialized at some point, and by doing 
so, make it so.

How this differs from theoretical lambda types and all I couldn't say.

> What is being named here is the overcoming of a limitation that 
> programming language designers imposed upon themselves, whether from not 
> understanding the theoretical research or not believing it important, I 
> don't know.

I believe there's also a certain level of common-senseness needed to 
make a language even marginally popular. :-)  While it's possible that 
there's really no difference between type and typestate at the 
theoretical level, I think most practical programmers would have trouble 
wrapping their head around that, just as programming in an entirely 
recursive pattern when one is used to looping can be disorienting.

-- 
   Darren New / San Diego, CA, USA (PST)
     This octopus isn't tasty. Too many
     tentacles, not enough chops.


------------------------------

Date: Mon, 17 Jul 2006 16:57:34 GMT
From: Darren New <dnew@san.rr.com>
Subject: Re: What is a type error?
Message-Id: <2kPug.24230$Z67.2556@tornado.socal.rr.com>

Joachim Durchholz wrote:
> of no assertion language that can express such temporal relationships, 
> and even if there is (I'm pretty sure there is), I'm rather sceptical 
> that programmers would be able to write correct assertions, or correctly 
> interpret them - temporal logic offers several traps for the unwary. 

FWIW, this is exactly the area that LOTOS (Language Of Temporal 
Ordering Specification) is targeted at. It's essentially based on 
CSP, but somewhat extended. It's pretty straightforward to learn and 
understand, too.  Some have even added "realtime" constraints to it.

-- 
   Darren New / San Diego, CA, USA (PST)
     This octopus isn't tasty. Too many
     tentacles, not enough chops.


------------------------------

Date: Mon, 17 Jul 2006 11:03:20 -0600
From: Chris Smith <cdsmith@twu.net>
Subject: Re: What is a type error?
Message-Id: <MPG.1f2570c1692d9dd698969f@news.altopia.net>

Marshall <marshall.spight@gmail.com> wrote:
> We seem to have slipped back from the hypothetical relation language
> with only assignement back to SQL.

I missed the point where we started discussing such a language.  I 
suspect it was while some of us were still operating under the 
misconception that you meant assignment to attributes of tuples, rather 
than to entire relations.

I don't see how such a language (limited to assignment of entire 
relations) is particularly helpful to consider.  If the relations are to 
be considered opaque, then there's clearly no aliasing going on.  
However, such a language doesn't seem to solve any actual problems.  It 
appears to be nothing other than a toy language, with a fixed finite set 
of variables having only value semantics, no scope, etc.  I assume that 
relational databases will have the equivalent of SQL's update statement; 
and if that's not the case, then I would need someone to explain how to 
accomplish the same goals in the new relational language; i.e. it would 
need some way of expressing transformations of relations, not just 
complete replacement of them with new relations that are assumed to 
appear out of thin air.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation


------------------------------

Date: Mon, 17 Jul 2006 17:19:44 GMT
From: Darren New <dnew@san.rr.com>
Subject: Re: What is a type error?
Message-Id: <QEPug.24235$Z67.2146@tornado.socal.rr.com>

Marshall wrote:
> I would propose that variables have identity, and values do not.
> In part this is via the supplied definition of identity, in which, when
> you change one thing, if something else changes as well, they
> share identity.

Maybe you gave a better definition the first time, but this one doesn't 
really hold up.

> of equality here is too narrow; it is only necessary to show that
> two things both change, not that they change in the same way.)

If I change X, then Y[X] changes also. I don't think X is identical to Y 
or Y[X], nor is it an alias of either.  I think that's where the 
equality comes into it.

-- 
   Darren New / San Diego, CA, USA (PST)
     This octopus isn't tasty. Too many
     tentacles, not enough chops.


------------------------------

Date: 17 Jul 2006 11:00:24 -0700
From: "Joe Marshall" <eval.apply@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1153159224.849151.88170@m79g2000cwm.googlegroups.com>


Marshall wrote:
>
> I am having a hard time with this very broad definition of aliasing.

How about this definition:  Consider three variables, i, j, and k, and
a functional equivalence predicate (EQUIVALENT(i, j) returns true if
for every pure function F, F(i) = F(j)).  Now suppose i and j are
EQUIVALENT at some point, then a side effecting function G is invoked
on k, after which i and j are no longer equivalent.  Then there is
aliasing.

This is still a little awkward, but there are three main points:
  1.  Aliasing occurs between variables (named objects).
  2.  It is tied to the notion of equivalence.
  3.  You can detect it when a procedure that has no access to a value
can nonetheless modify the value.

In a call-by-value language, you cannot alias values directly, but if
the values are aggregate data structures (like in Java), you may be
able to modify a shared subcomponent.
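
A minimal Perl sketch of that definition, under some loud assumptions:
the "variables" are hash references, the hypothetical equivalent() sub
is only a crude stand-in for EQUIVALENT (it compares one observable
field rather than every pure function), and mutate() plays the role of
the side-effecting G:

```perl
use strict;
use warnings;

# Two aggregates share one subcomponent; a third has equal contents
# but its own storage.
my $shared = [1, 2, 3];
my $i = { data => $shared };
my $j = { data => [1, 2, 3] };   # equal contents, separate storage
my $k = { data => $shared };     # shares the array with $i

# Crude stand-in for EQUIVALENT: compare the observable contents.
sub equivalent {
    my ($x, $y) = @_;
    return "@{ $x->{data} }" eq "@{ $y->{data} }";
}

# Stand-in for G: a side-effecting function that is only handed $k.
sub mutate { my ($v) = @_; push @{ $v->{data} }, 4; return }

my $before = equivalent($i, $j);   # $i and $j look the same here
mutate($k);                        # never touches $i or $j by name
my $after  = equivalent($i, $j);   # $i changed through $k

print "before: ", ($before ? 1 : 0), ", after: ", ($after ? 1 : 0), "\n";
```

The mutation reaches $i only through the subcomponent it shares with
$k, which is the shared-subcomponent case described above.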



------------------------------

Date: Mon, 17 Jul 2006 15:23:30 +0000 (UTC)
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: When would you use qr// on a literal string?
Message-Id: <Xns980373E0FDB3Easu1cornelledu@132.236.56.8>

"it_says_BALLS_on_your forehead" <simon.chao@fmr.com> wrote in 
news:1153145251.119197.122300@35g2000cwc.googlegroups.com:

> I thought that qr was mainly used for pre-compiling variables into
> regex patterns, but a colleague uses it like so:
> 
> my ( $name, $value ) = split qr/=/, $string;
> 
> Are there any benefits to doing this? He claims that the use of the
> qr// op here can help capture an error if the STRING is not a valid
> regex. Does this make any sense?

The first argument to split is a regex, whether you use regex notation or 
not. 

So, I am not fond of:

split '=', $string;

because it obscures that fact.

On the other hand, I don't see what additional mileage qr gets you other 
than saying: I really want to signal that this is a regex to other 
programmers.

As for what mileage that gets you in terms of preventing errors, take a 
look at:

#!/usr/bin/perl

use strict;
use warnings;

my $data = 'name$sinan';

my @a = split '$', $data;
my @b = split /$/, $data;
my @c = split qr/$/, $data;

# correct 
my @d = split /\$/, $data;

print "@a\n@b\n@c\n@d\n";

__END__
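
A smaller sketch of the same comparison, counting fields instead of
printing them. The variable names are only for illustration:

```perl
use strict;
use warnings;

my $data = 'name$sinan';

# Unescaped, $ is the end-of-string anchor, so no field is split off.
my @unescaped = split /$/,  $data;
# Escaped, \$ matches the literal dollar sign.
my @escaped   = split /\$/, $data;

printf "unescaped: %d field(s), escaped: %d field(s)\n",
    scalar @unescaped, scalar @escaped;
```

Neither form raises an error, so writing the pattern with qr// would
not have flagged the unescaped version either.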

Sinan


------------------------------

Date: Mon, 17 Jul 2006 17:31:50 +0200
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: When would you use qr// on a literal string?
Message-Id: <e9ghpm.1hs.1@news.isolution.nl>

it_says_BALLS_on_your forehead schreef:

> I thought that qr was mainly used for pre-compiling variables into
> regex patterns, but a colleague uses it like so:
>
> my ( $name, $value ) = split qr/=/, $string;
>
> Are there any benefits to doing this? He claims that the use of the
> qr// op here can help capture an error if the STRING is not a valid
> regex. Does this make any sense?

I don't see any difference in using /+/ or qr/+/, both give the same
error.
My gut feeling says that split does a precompile on a /PATTERN/.


$ perl -MO=Deparse -e 'print split ".+"'
print split(/.+/, $_, 0);

$ perl -wle 'print split "+"'
Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
/ at -e line 1.

$ perl -wle 'print split /+/'
Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
/ at -e line 1.

$ perl -wle 'print split qr/+/'
Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
/ at -e line 1.

-- 
Affijn, Ruud

"Gewoon is een tijger."




------------------------------

Date: Mon, 17 Jul 2006 10:33:08 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: When would you use qr// on a literal string?
Message-Id: <slrnebnbdk.vf2.tadmc@magna.augustmail.com>

it_says_BALLS_on_your forehead <simon.chao@fmr.com> wrote:
> I thought that qr was mainly used for pre-compiling variables into
> regex patterns, but a colleague uses it like so:
> 
> my ( $name, $value ) = split qr/=/, $string;
> 
> Are there any benefits to doing this? 


Not that I can see.


> He claims that the use of the
> qr// op here can help capture an error if the STRING is not a valid
> regex. 


I have no idea what "capture an error" might mean...


> Does this make any sense?


No, unless he can give an example where using qr// gives more
info than an m// with the same pattern.

These both make the same output for instance.

   perl -e 'qr/(/'

and

   perl -e 'm/(/'

Can your colleague provide a counter-example that shows qr// as 
somehow "better"?
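
For contrast, here is a sketch of the case where qr// does help with
error handling: a pattern that arrives in a variable at run time. This
is not a counter-example for literal patterns, which fail at compile
time either way. The compile_pattern helper is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern held in a string. Returns the
# compiled regex, or undef plus the error message if it is invalid.
sub compile_pattern {
    my ($text) = @_;
    my $re = eval { qr/$text/ };
    return ($re, undef) if defined $re;
    return (undef, $@);
}

my ($ok_re,  $ok_err)  = compile_pattern('=');
my ($bad_re, $bad_err) = compile_pattern('+');

# The compiled regex can be handed straight to split.
my ($name, $value) = split $ok_re, 'name=sinan';
print "$name / $value\n";
print "rejected: $bad_err" if defined $bad_err;
```

Wrapping the qr// in eval lets the program report a bad pattern once,
where it is compiled, rather than dying wherever it is first used.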


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: 17 Jul 2006 08:47:37 -0700
From: "it_says_BALLS_on_your forehead" <simon.chao@fmr.com>
Subject: Re: When would you use qr// on a literal string?
Message-Id: <1153151257.236269.292830@b28g2000cwb.googlegroups.com>


Dr.Ruud wrote:
> it_says_BALLS_on_your forehead schreef:
>
> > I thought that qr was mainly used for pre-compiling variables into
> > regex patterns, but a colleague uses it like so:
> >
> > my ( $name, $value ) = split qr/=/, $string;
> >
> > Are there any benefits to doing this? He claims that the use of the
> > qr// op here can help capture an error if the STRING is not a valid
> > regex. Does this make any sense?
>
> I don't see any difference in using /+/ or qr/+/, both give the same
> error.
> My gut feeling says that split does a precompile on a /PATTERN/.
>
>
> $ perl -MO=Deparse -e 'print split ".+"'
> print split(/.+/, $_, 0);
>
> $ perl -wle 'print split "+"'
> Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
> / at -e line 1.
>
> $ perl -wle 'print split /+/'
> Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
> / at -e line 1.
>
> $ perl -wle 'print split qr/+/'
> Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
> / at -e line 1.

Great, that's informative. Thanks Doc.



------------------------------

Date: 17 Jul 2006 08:16:35 -0700
From: ewaguespack@gmail.com
Subject: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <1153149395.583924.157680@35g2000cwc.googlegroups.com>

i had a situation that required that i remove several thousand zero
byte files, and i tried this first:

# find . -type f -exec rm -f {} \;

this was taking ages, so on a hunch I decided to try this to see if I
got any better results:

# perl -e 'unlink(glob("*"))'

surprisingly the perl unlink took about a quarter of a second to remove
1000 files versus 30 seconds with find / rm

any idea why?



------------------------------

Date: Mon, 17 Jul 2006 15:25:54 +0000 (UTC)
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <Xns9803744A68A2Fasu1cornelledu@132.236.56.8>

ewaguespack@gmail.com wrote in news:1153149395.583924.157680@
35g2000cwc.googlegroups.com:

> i had a situation that required that i remove several thousand zero
> byte files, and i tried this first:
> 
> # find . -type f -exec rm -f {} \;

This executes rm separately for each file found.

> this was taking ages, so on a hunch I decided to try this to see it I
> got any better results:
> 
> # perl -e 'unlink(glob("*"))'
> 
> surprisingly the perl unlink took about a quarter of a second to remove
> 1000 files versus 30 seconds with find / rm

How about

rm -f *

?

Sinan


------------------------------

Date: 17 Jul 2006 15:38:51 GMT
From: Glenn Jackman <glennj@ncf.ca>
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <slrnebnbob.m92.glennj@smeagol.ncf.ca>

At 2006-07-17 11:25AM, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>  ewaguespack@gmail.com wrote in news:1153149395.583924.157680@
>  35g2000cwc.googlegroups.com:
>  
> > i had a situation that required that i remove several thousand zero
> > byte files, and i tried this first:
> > 
> > # find . -type f -exec rm -f {} \;
>  
>  This executes rm separately for each file found.

Additionally, 'find' is descending into subdirectories.

> > this was taking ages, so on a hunch I decided to try this to see it I
> > got any better results:
> > 
> > # perl -e 'unlink(glob("*"))'
> > 
> > surprisingly the perl unlink took about a quarter of a second to remove
> > 1000 files versus 30 seconds with find / rm
>  
>  How about
>  
>  rm -f *

These solutions look in the current directory only.

-- 
Glenn Jackman
Ulterior Designer


------------------------------

Date: 17 Jul 2006 15:44:33 GMT
From: Ben Bacarisse <spam@bsb.me.uk>
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <44bbb061$0$5233$db0fefd9@news.zen.co.uk>

ewaguespack@gmail.com wrote:
> i had a situation that required that i remove several thousand zero
> byte files, and i tried this first:
> 
> # find . -type f -exec rm -f {} \;
> 
> this was taking ages, so on a hunch I decided to try this to see it I
> got any better results:
> 
> # perl -e 'unlink(glob("*"))'

I smell a rat.  What an odd command to post!  For one thing, it does
not do the same as the find above and, secondly, a single rm would
surely be faster still?

With luck, no one will have tried either command out!

-- 
Ben.


------------------------------

Date: Mon, 17 Jul 2006 17:50:01 +0200
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <e9givh.1ho.1@news.isolution.nl>

Glenn Jackman schreef:

>>  rm -f *
>
> These solutions look in the current directory only.

  rm -rf *

-- 
Affijn, Ruud

"Gewoon is een tijger."




------------------------------

Date: 17 Jul 2006 15:59:02 GMT
From: xhoster@gmail.com
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <20060717120355.016$8B@newsreader.com>

ewaguespack@gmail.com wrote:
> i had a situation that required that i remove several thousand zero
> byte files, and i tried this first:
>
> # find . -type f -exec rm -f {} \;
>
> this was taking ages, so on a hunch I decided to try this to see it I
> got any better results:

That fires up a separate rm process for each file.  Using strace -f, it
looks like this involves 99 system calls per rm (not counting the ones done
in the parent process), only one of which is related to the actual unlink.

> # perl -e 'unlink(glob("*"))'

This doesn't do the -type f checking.  If you don't really need to
do the -type f checking, why did you use find (rather than "rm -f *")
in the first place?  One possible reason is if that gives you an argument
list too long error.  I use the perl -le 'unlink(glob($ARGV[0]))' construct
frequently for just that reason.
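
If the -type f check and the recursion are both wanted, a sketch along
these lines keeps everything in one process by using the core
File::Find module. The unlink_plain_files name is hypothetical, and
the code does nothing unless a directory is given on the command line:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: remove every plain file under $dir in-process,
# descending into subdirectories as "find . -type f" does.
# Returns the number of files removed.
sub unlink_plain_files {
    my ($dir) = @_;
    my $removed = 0;
    find(
        {
            no_chdir => 1,    # keep full paths in $File::Find::name
            wanted   => sub {
                my $path = $File::Find::name;
                # -f follows symlinks, so skip them to mirror -type f.
                return if -l $path or !-f $path;
                $removed++ if unlink $path;
            },
        },
        $dir
    );
    return $removed;
}

# Only act when a directory is named explicitly.
if (@ARGV) {
    my $count = unlink_plain_files($ARGV[0]);
    print "removed $count file(s)\n";
}
```

Directories are left in place, just as with the find command quoted
above.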

> surprisingly the perl unlink took about a quarter of a second to remove
> 1000 files versus 30 seconds with find / rm

That really surprises me.  Not because of the difference between the two
methods, but because both of them are about 20 times slower for you than
they are on my not-particularly fast machine.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 17 Jul 2006 16:07:24 GMT
From: xhoster@gmail.com
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <20060717121217.777$BT@newsreader.com>

Ben Bacarisse <spam@bsb.me.uk> wrote:
> ewaguespack@gmail.com wrote:
> > i had a situation that required that i remove several thousand zero
> > byte files, and i tried this first:
> >
> > # find . -type f -exec rm -f {} \;
> >
> > this was taking ages, so on a hunch I decided to try this to see if I
> > got any better results:
> >
> > # perl -e 'unlink(glob("*"))'
>
> I smell a rat.  What an odd command to post!  For one thing, it does
> not do the same as the find above and, secondly, a single rm would
> surely be faster still?
>
> With luck, no one will have tried either command out!

I tried out both commands.  In a test directory made for just such a
purpose, of course.  Sheesh.  You'd think the part about "remove several
thousand...files" as well as the "rm" and "unlink" showing up in all their
undisguised glory would be a pretty good tip-off that one should not try
them in root and as root.

Xho



------------------------------

Date: Mon, 17 Jul 2006 13:23:09 -0400
From: Sherm Pendley <sherm@Sherm-Pendleys-Computer.local>
Subject: Re: why is  perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <m28xms87sy.fsf@Sherm-Pendleys-Computer.local>

ewaguespack@gmail.com writes:

> i had a situation that required that i remove several thousand zero
> byte files, and i tried this first:
>
> # find . -type f -exec rm -f {} \;
>
> this was taking ages, so on a hunch I decided to try this to see if I
> got any better results:
>
> # perl -e 'unlink(glob("*"))'
>
> surprisingly the perl unlink took about a quarter of a second to remove
> 1000 files versus 30 seconds with find / rm
>
> any idea why?

The find was spawning a new instance of 'rm' for each file - very inefficient.

The efficient counterpart to your find command, and a closer match to your
Perl code, is to have find produce the list of files and let 'xargs' pass it
to as few instances of 'rm' as possible (usually just one):

    find . -type f -print0 | xargs -0 rm -f
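
The -print0 / -0 pair matters because file names may contain spaces or quotes that plain xargs would split or misparse. A small sketch of the pipeline above on awkward names, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do) plus mktemp; the scratch directory and file names are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: NUL-separated names survive spaces and quotes on the way to rm.
# The scratch directory and the file names are illustrative assumptions.
scratch=$(mktemp -d) || exit 1
: > "$scratch/plain"
: > "$scratch/has space"
: > "$scratch/it's quoted"
mkdir "$scratch/keep"

# find emits each path terminated by a NUL byte; xargs -0 splits on NUL only
# and hands the names to rm in as few invocations as will fit.
find "$scratch" -type f -print0 | xargs -0 rm -f

left=$(ls -A "$scratch")
printf 'left behind: %s\n' "$left"
rm -rf "$scratch"
```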

sherm--

-- 
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: 17 Jul 2006 09:30:12 -0700
From: ewaguespack@gmail.com
Subject: Re: why is perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <1153153812.285262.253060@m79g2000cwm.googlegroups.com>



xhoster@gmail.com wrote:
> ewaguespack@gmail.com wrote:
> > i had a situation that required that i remove several thousand zero
> > byte files, and i tried this first:
> >
> > # find . -type f -exec rm -f {} \;
> >
> > this was taking ages, so on a hunch I decided to try this to see if I
> > got any better results:
>
> That fires up a separate rm process for each file.  Using strace -f, it
> looks like this involves 99 system calls per rm (not counting the ones done
> in the parent process), only one of which is related to the actual unlink.
>
> > # perl -e 'unlink(glob("*"))'
>
> This doesn't do the -type f checking.  If you don't really need to
> do the -type f checking, why did you use find (rather than "rm -f *")
> in the first place?  One possible reason is if that gives you an argument
> list too long error.  I use the perl -le 'unlink(glob($ARGV[0]))' construct
> frequently for just that reason.
>
> > surprisingly the perl unlink took about a quarter of a second to remove
> > 1000 files versus 30 seconds with find / rm
>
> That really surprises me.  Not because of the difference between the two
> methods, but because both of them are about 20 times slower for you than
> they are on my not-particularly fast machine.
>
> Xho

I used find because rm -f * would not delete the original set of files;
i got the "argument list is too long" error

i think part of the problem is that the server in question was
experiencing high iowait times....

when I ran the rm command on an idle server it was much faster.

I am still curious why it was so much faster.



------------------------------

Date: 17 Jul 2006 17:12:58 GMT
From: xhoster@gmail.com
Subject: Re: why is perl -e 'unlink(glob("*"))' so much faster than rm ?
Message-Id: <20060717131752.375$4A@newsreader.com>

ewaguespack@gmail.com wrote:
> xhoster@gmail.com wrote:
> > ewaguespack@gmail.com wrote:
> > > i had a situation that required that i remove several thousand zero
> > > byte files, and i tried this first:
> > >
> > > # find . -type f -exec rm -f {} \;
> > >
> > > this was taking ages, so on a hunch I decided to try this to see if I
> > > got any better results:
> >
> > That fires up a separate rm process for each file.  Using strace -f, it
> > looks like this involves 99 system calls per rm (not counting the ones
> > done in the parent process), only one of which is related to the actual
> > unlink.
> >
 ...
> i think part of the problem is that the server in question was
> experiencing high iowait times....
>
> when I ran the rm command on an idle server it was much faster.

When you have very large directories with multiple handles open to them
at the same time, things can degenerate spectacularly.  Manipulating
directory entries has to be transactional, and I suspect the overhead of
making that so is very high.

> I am still curious why it was so much faster.

I no longer know what "it" refers to, or what part of the answers you have
been given you don't understand/believe.

Xho



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9489
***************************************

