[32385] in Perl-Users-Digest

Perl-Users Digest, Issue: 3652 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 28 18:09:48 2012

Date: Wed, 28 Mar 2012 15:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 28 Mar 2012     Volume: 11 Number: 3652

Today's topics:
        Getting old values back (Tim McDaniel)
        How to speed up this slow part of my program <justin.1203@purestblue.com>
    Re: How to speed up this slow part of my program <rweikusat@mssgmbh.com>
    Re: How to speed up this slow part of my program <ben@morrow.me.uk>
    Re: How to speed up this slow part of my program (Tim McDaniel)
    Re: How to speed up this slow part of my program <rweikusat@mssgmbh.com>
    Re: How to speed up this slow part of my program <ben@morrow.me.uk>
    Re: How to speed up this slow part of my program <rweikusat@mssgmbh.com>
        reason for local($_) ? jaialai.technology@gmail.com
    Re: reason for local($_) ? (Randal L. Schwartz)
    Re: reason for local($_) ? (Tim McDaniel)
    Re: subroutine exists <nospam.gravitalsun@hotmail.com.nospam>
    Re: subroutine exists <NoSpamPleaseButThisIsValid3@gmx.net>
    Re: subroutine exists <rweikusat@mssgmbh.com>
    Re: Why a different result? (Tim McDaniel)
    Re: Why a different result? <rweikusat@mssgmbh.com>
    Re: Why a different result? <ben@morrow.me.uk>
    Re: Why a different result? (Tim McDaniel)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 28 Mar 2012 05:11:03 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Getting old values back
Message-Id: <jku6h7$cv7$1@reader1.panix.com>

In article <jku4qp$h9a$1@reader1.panix.com>,
Tim McDaniel <tmcd@panix.com> wrote:
>(As a tangent: I think there was a way in Perl 4 to assign to a
>subscript of an array, shrink the array to be smaller than the
>subscript, regrow past that subscript, and see the assigned value
>again.  Does anyone remember the details?  Please tell me that it no
>longer works.)

Found it in "man perldata":

    The length of an array is a scalar value.  You may find the length
    of array @days by evaluating $#days, as in csh.  However, this
    isn't the length of the array; it's the subscript of the last
    element, which is a different value since there is ordinarily a
    0th element.  Assigning to $#days actually changes the length of
    the array.  Shortening an array this way destroys intervening
    values.  Lengthening an array that was previously shortened does
    not recover values that were in those elements.  (It used to do so
    in Perl 4, but we had to break this to make sure destructors were
    called when expected.)
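
A small sketch of the current behaviour that passage describes; the array
contents are made up for illustration:

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri);

$#days = 1;    # shorten to two elements; Wed..Fri are destroyed
$#days = 4;    # lengthen again; the new slots are undef, not the old values

# Render undefined slots as the word 'undef' so they can be printed safely.
my @state = map { defined $_ ? $_ : 'undef' } @days;
print "@state\n";    # Mon Tue undef undef undef
```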

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 28 Mar 2012 16:24:40 +0100
From: Justin C <justin.1203@purestblue.com>
Subject: How to speed up this slow part of my program
Message-Id: <o2nb49-dnt.ln1@zem.masonsmusic.co.uk>

We have a database of thousands of clothing items. Some of the items are
almost identical apart from their size. Consequently we use the same
image in our web-shop to advertise items of the same style, design and
colour.

In a program I have that gets new images from the art guy's computer, I
end up grepping the entire list of items $#(list-of-items) times; there
must be a better way. The file names are exactly the same as the style
codes apart from the size suffix being dropped. I'm using File::Find.

Here's some code:

find(\&set_flag, (keys %{ $stock_groups->{text2code} }));

sub set_flag {
	return unless (-f $_ );
	
	(my $item_code_part = $_) =~ s/\.jpg//;
	$item_code_part = uc($item_code_part);
	$item_code_part =~ s|_|/|g;
	
	my @matches = grep(/$item_code_part/, keys %{ $stock_items });
	foreach my $i (@matches) {
		$stock_items->{$i}{got_image} = 1;
	}
}

The 'find' iterates through two top level directories, 36 next level
directories in each of the top level, and about 20k files across the
entire 72 level 2 directories. It then passes the filename to the sub
which compares the filename (which is only a partial stock code because
it may have several matches) with the hash of stock_items, pulling out
matches. Those matches are then iterated over and the items with an
available image get a hash element added and set to 1.

I can probably do the grep and iteration over the matches with map{...},
grep{...}, keys %{ $stock_items}; but I don't think that'll save much
time, and I'm not certain how to do it. I can see how to create a new hash,
but I'm not sure if changing the hash while grep iterates through it is
a good idea.

The bottle-neck, as I see it, is running grep 20k times, once for each
image found. Can anyone suggest a better way?

   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Wed, 28 Mar 2012 17:21:59 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: How to speed up this slow part of my program
Message-Id: <87hax8wva0.fsf@sapphire.mobileactivedefense.com>

Justin C <justin.1203@purestblue.com> writes:

[...]

> find(\&set_flag, (keys %{ $stock_groups->{text2code} }));
>
> sub set_flag {
> 	return unless (-f $_ );
> 	
> 	(my $item_code_part = $_) =~ s/\.jpg//;
> 	$item_code_part = uc($item_code_part);
> 	$item_code_part =~ s|_|/|g;
> 	
> 	my @matches = grep(/$item_code_part/, keys %{ $stock_items });
> 	foreach my $i (@matches) {
> 		$stock_items->{$i}{got_image} = 1;
> 	}
> }
>
> The 'find' iterates through two top level directories, 36 next level
> directories in each of the top level, and about 20k files across the
> entire 72 level 2 directories. It then passes the filename to the sub
> which compares the filename (which is only a partial stock code because
> it may have several matches) with the hash of stock_items, pulling out
> matches. Those matches are then iterated over and the items with an
> available image get a hash element added and set to 1.

[...]

> The bottle-neck, as I see it, is running grep 20k times, once for each
> image found. Can anyone suggest a better way?

"Doing linear scans over an associative array is like trying to club
someone to death with a loaded Uzi."

An obvious suggestion would be to traverse the stock_items keys once
and build a second hash which maps each item_code_part to an array
reference containing all the corresponding stock_items keys (or even
to a reference to the corresponding stock_items element) and use this
second hash to locate the keys which need to be changed (or the hash
locations which need to be changed) in constant time.
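
A minimal sketch of that idea, under assumptions not stated in the
original post: the stock codes are invented, the size suffix is taken to
be everything after the last "/", and set_flag_for is a hypothetical
stand-in for the File::Find callback.

```perl
use strict;
use warnings;

# Hypothetical sample data: keys are full stock codes; the part after the
# last "/" is assumed to be the size suffix.
my $stock_items = {
    'TS/RED/S'  => {},
    'TS/RED/M'  => {},
    'TS/BLUE/L' => {},
    'HD/BLK/XL' => {},
};

# One pass over the keys: style code => list of entries sharing it.
my %by_style;
for my $code (keys %$stock_items) {
    (my $style = $code) =~ s{/[^/]+\z}{};    # assumed suffix rule
    push @{ $by_style{$style} }, $stock_items->{$code};
}

# Per-file work is now a single hash lookup instead of a grep over all keys.
sub set_flag_for {
    my ($filename) = @_;
    (my $style = uc $filename) =~ s/\.JPG\z//;
    $style =~ tr{_}{/};
    $_->{got_image} = 1 for @{ $by_style{$style} || [] };
}

set_flag_for($_) for qw(ts_red.jpg hd_blk.jpg);    # hypothetical file names

print "$_: ", ($stock_items->{$_}{got_image} ? 1 : 0), "\n"
    for sort keys %$stock_items;
```

Building the index costs one pass over the stock items; after that each
image costs one lookup, so the total work grows with items plus images
rather than items times images.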

NB: Someone should probably repost that since 'Justin who keeps coming
out of my killfile' has chosen to 'block' himself from seeing my
answers to his questions for 'clearly indecent behaviour' ...


------------------------------

Date: Wed, 28 Mar 2012 20:35:02 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to speed up this slow part of my program
Message-Id: <6o5c49-d65.ln1@anubis.morrow.me.uk>


Quoth Justin C <justin.1203@purestblue.com>:
> 
> We have a database of thousands of clothing items. Some of the items are
> almost identical apart from their size. Consequently we use the same
> image in our web-shop to advertise items of the same style, design and
> colour.
> 
> In a program I have that gets new images from the art guy's computer, I
> end up grepping the entire list of items $#(list-of-items) times; there
> must be a better way. The file names are exactly the same as the style
> codes apart from the size suffix being dropped. I'm using File::Find.
> 
> Here's some code:
> 
> find(\&set_flag, (keys %{ $stock_groups->{text2code} }));
> 
> sub set_flag {
> 	return unless (-f $_ );
> 	
> 	(my $item_code_part = $_) =~ s/\.jpg//;
> 	$item_code_part = uc($item_code_part);
> 	$item_code_part =~ s|_|/|g;
> 	
> 	my @matches = grep(/$item_code_part/, keys %{ $stock_items });

Careful: you want \Q there, even if you think you're sure the filenames
are all safe.

> 	foreach my $i (@matches) {
> 		$stock_items->{$i}{got_image} = 1;
> 	}
> }

I would probably turn this into a big pattern match. Something like
this:

    use File::Find::Rule;

    my ($imgs) = map qr/$_/, join "|", map "\Q\U$_",
        map { (my ($x) = /(.*)\.jpg/) =~ s!_!/!g; $x }
        File::Find::Rule->file->in(keys %{...});

    while (my ($item, $entry) = each %$stock_items) {
        $item =~ $imgs and $entry->{got_image} = 1;
    }

If you're using 5.14 you can get rid of the ugly map block using s///r
and tr///r:

    map tr!_!/!r, map s/\.jpg//r,

This assumes the entries in %$stock_items are already hashrefs; if they
aren't a 'for (keys %$stock_items)' loop will be easier.

If you want to speed it up more, use Regexp::Assemble. My experiments
suggest it speeds a match like this up considerably when you've got
20_000 entries.
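
A self-contained variant of the big-pattern approach, run on made-up data
instead of File::Find::Rule output (the file names and stock codes here
are hypothetical). It also shows why the quoting matters: without it,
the "." in the third name would match any character.

```perl
use strict;
use warnings;

# Hypothetical image names, standing in for the File::Find::Rule results.
my @files = qw(ts_red.jpg hd_blk.jpg a.b_c.jpg);

my @codes = map {
    (my $x = uc $_) =~ s/\.JPG\z//;    # drop the extension
    $x =~ tr{_}{/};                    # "_" in file names stands for "/"
    quotemeta $x;                      # same effect as \Q inside a pattern
} @files;

my $alternation = join '|', @codes;
my $imgs        = qr/$alternation/;

# Hypothetical stock codes; AXB/C/M would match "A.B/C" if "." were unquoted.
my %stock_items = map { $_ => {} } qw(TS/RED/S TS/BLUE/L HD/BLK/XL AXB/C/M);

while (my ($item, $entry) = each %stock_items) {
    $entry->{got_image} = 1 if $item =~ $imgs;
}

print "$_: ", ($stock_items{$_}{got_image} ? 1 : 0), "\n"
    for sort keys %stock_items;
```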

Ben



------------------------------

Date: Wed, 28 Mar 2012 20:17:05 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: How to speed up this slow part of my program
Message-Id: <jkvrk1$t6q$1@reader1.panix.com>

In article <6o5c49-d65.ln1@anubis.morrow.me.uk>,
Ben Morrow  <ben@morrow.me.uk> wrote:
>I would probably turn this into a big pattern match. Something like
>this:
>
>    use File::Find::Rule;
>
>    my ($imgs) = map qr/$_/, join "|", map "\Q\U$_",
>        map { (my ($x) = /(.*)\.jpg/) =~ s!_!/!g; $x }
>        File::Find::Rule->file->in(keys %{...});
>
>    while (my ($item, $entry) = each %$stock_items) {
>        $item =~ $imgs and $entry->{got_image} = 1;
>    }
>
>If you're using 5.14 you can get rid of the ugly map block using s///r
>and tr///r:
>
>    map tr!_!/!r, map s/\.jpg//r,
>
>This assumes the entries in %$stock_items are already hashrefs; if they
>aren't a 'for (keys %$stock_items)' loop will be easier.

What's s///r and tr///r?  I looked in "man perlop" and "man perlre"
for a system with 5.14.2, but I didn't see them.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 28 Mar 2012 21:37:07 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: How to speed up this slow part of my program
Message-Id: <871uocbgy4.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Justin C <justin.1203@purestblue.com>:

[...]

>> find(\&set_flag, (keys %{ $stock_groups->{text2code} }));
>> 
>> sub set_flag {
>> 	return unless (-f $_ );
>> 	
>> 	(my $item_code_part = $_) =~ s/\.jpg//;
>> 	$item_code_part = uc($item_code_part);
>> 	$item_code_part =~ s|_|/|g;
>> 	
>> 	my @matches = grep(/$item_code_part/, keys %{ $stock_items });
>
> Careful: you want \Q there, even if you think you're sure the filenames
> are all safe.
>
>> 	foreach my $i (@matches) {
>> 		$stock_items->{$i}{got_image} = 1;
>> 	}
>> }
>
> I would probably turn this into a big pattern match. Something like
> this:
>
>     use File::Find::Rule;
>
>     my ($imgs) = map qr/$_/, join "|", map "\Q\U$_",
>         map { (my ($x) = /(.*)\.jpg/) =~ s!_!/!g; $x }
>         File::Find::Rule->file->in(keys %{...});
>
>     while (my ($item, $entry) = each %$stock_items) {
>         $item =~ $imgs and $entry->{got_image} = 1;
>     }

Something I should already have written last time: A hash lookup is
(supposed to be) an O(1) operation. Matching against a set of
alternative patterns is O(n). In this particular case, this means that
the algorithm you suggest still scales as badly as the originally
used one in this respect (run time proportional to the number of patterns
times the number of stock items), except that it is possibly somewhat faster
(although I wouldn't want to bet on that blindly).

The sensible way to do a lot of dictionary lookups is using a
dictionary, especially considering that Perl has native support for
that.




------------------------------

Date: Wed, 28 Mar 2012 21:54:33 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to speed up this slow part of my program
Message-Id: <9dac49-446.ln1@anubis.morrow.me.uk>


Quoth tmcd@panix.com:
> In article <6o5c49-d65.ln1@anubis.morrow.me.uk>,
> Ben Morrow  <ben@morrow.me.uk> wrote:
> >
> >If you're using 5.14 you can get rid of the ugly map block using s///r
> >and tr///r:
> >
> >    map tr!_!/!r, map s/\.jpg//r,
> 
> What's s///r and tr///r?  I looked in "man perlop" and "man perlre"
> for a system with 5.14.2, but I didn't see them.

From 5.14.2's perlop, under s/// and tr///:

|   If the "/r" (non-destructive) option is used then it runs the
|   substitution on a copy of the string and instead of returning
|   the number of substitutions, it returns the copy whether or not
|   a substitution occurred.  The original string is never changed
|   when "/r" is used.  The copy will always be a plain string,
|   even if the input is an object or a tied variable.
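
A short sketch of what that means in practice, assuming Perl 5.14 or
later; the file names are made up:

```perl
use strict;
use warnings;
use 5.014;    # s///r and tr///r need at least this version

my $file = 'ts_red.jpg';

# /r returns the modified copy and leaves $file alone.
my $stem = $file =~ s/\.jpg\z//r;
my $code = uc($stem =~ tr!_!/!r);

# The same two steps applied to a list, in the spirit of the map chain above.
my @codes = map { tr!_!/!r } map { s/\.jpg\z//r } qw(a_b.jpg c_d.jpg);

print "$file -> $code\n";    # ts_red.jpg -> TS/RED
print "@codes\n";            # a/b c/d
```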

Ben



------------------------------

Date: Wed, 28 Mar 2012 22:38:52 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: How to speed up this slow part of my program
Message-Id: <87wr649zir.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

> Ben Morrow <ben@morrow.me.uk> writes:
>> Quoth Justin C <justin.1203@purestblue.com>:
>
> [...]
>
>>> find(\&set_flag, (keys %{ $stock_groups->{text2code} }));
>>> 
>>> sub set_flag {
>>> 	return unless (-f $_ );
>>> 	
>>> 	(my $item_code_part = $_) =~ s/\.jpg//;
>>> 	$item_code_part = uc($item_code_part);
>>> 	$item_code_part =~ s|_|/|g;
>>> 	
>>> 	my @matches = grep(/$item_code_part/, keys %{ $stock_items });
>>
>> Careful: you want \Q there, even if you think you're sure the filenames
>> are all safe.
>>
>>> 	foreach my $i (@matches) {
>>> 		$stock_items->{$i}{got_image} = 1;
>>> 	}
>>> }
>>
>> I would probably turn this into a big pattern match. Something like
>> this:
>>
>>     use File::Find::Rule;
>>
>>     my ($imgs) = map qr/$_/, join "|", map "\Q\U$_",
>>         map { (my ($x) = /(.*)\.jpg/) =~ s!_!/!g; $x }
>>         File::Find::Rule->file->in(keys %{...});
>>
>>     while (my ($item, $entry) = each %$stock_items) {
>>         $item =~ $imgs and $entry->{got_image} = 1;
>>     }

[...]

> Matching against a set of alternative patterns is O(n). In this
> particular case, this means that the algorithm you suggest still
> scales as badly as the originally used one

OTOH, I have now learnt why 'autothreading' is supposed to be
useful. I can imagine the following set of 'optimizations' which led
to it:

        1. Do it as in the original example. The algorithm is
        quadratic and doesn't scale, performance sucks.

        2. Assemble a giant regexp: The algorithm is still quadratic
        and doesn't scale, performance sucks.

        3. Turn the giant regexp into a suitable kind of search tree:
        Unfortunately, the algorithm is still quadratic and doesn't
        scale, performance sucks.

        4. Go back to 2, use as many cores as available in order to
        try alternatives in parallel: Given that the algorithm
        remains quadratic and doesn't scale, performance still sucks
        but now, it at least flattens the complete system.

        [5. Give up in despair and use a Hadoop-cluster. Add more
        nodes whenever the fact that quadratic algorithms don't scale
        and performance sucks because of this becomes too obvious].

Steps 1 - 5 also involve a successive increase in code complexity,
starting from 'more complicated than necessary' and ending with
'absolutely insanely more complicated than necessary' ...


------------------------------

Date: Wed, 28 Mar 2012 15:29:08 -0400
From: jaialai.technology@gmail.com
Subject: reason for local($_) ?
Message-Id: <70b7b$4f736685$6c1489b1$29143@news.eurofeeds.com>

I recently saw some Perl code at work in which the original author
starts every subroutine with the line local($_);
Every.single.one. Like this:

sub whatever{
     local($_);
     print "Whatever man!\n";
}

Of course that isn't a real example but my point is
that even in subroutines that don't use $_ this is done.
I have no idea in what context this would ever make sense to do?
Any thoughts?
I suspect that this is some construct from Perl versions < 5. Is it?
If so, what did it mean back then?


------------------------------

Date: Wed, 28 Mar 2012 13:07:57 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
To: jaialai.technology@gmail.com
Subject: Re: reason for local($_) ?
Message-Id: <86limkcwv6.fsf@red.stonehenge.com>

>>>>> "jaialai" == jaialai technology <jaialai.technology@gmail.com> writes:

jaialai> sub whatever{
jaialai>     local($_);
jaialai>     print "Whatever man!\n";
jaialai> }

jaialai> Of course that isn't a real example but my point is
jaialai> that even in subroutines that don't use $_ this is done.
jaialai> I have no idea in what context this would ever make sense to do?
jaialai> Any thoughts?

Voodoo.  It was needed in one place, but he does it everywhere now out
of fear, rather than thinking about it.

It's like all those crazy people who put "i" and "s" modifiers on *all*
regex even if they aren't using any feature affected by it.  Silly.

print "Just another Perl hacker,"; 

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: Wed, 28 Mar 2012 20:34:48 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: reason for local($_) ?
Message-Id: <jkvsl8$h7r$1@reader1.panix.com>

In article <70b7b$4f736685$6c1489b1$29143@news.eurofeeds.com>,
 <jaialai.technology@gmail.com> wrote:
>I recently saw some Perl code at work in which the original author
>starts every subroutine with the line local($_);
>Every.single.one. Like this:
>
>sub whatever{
>     local($_);
>     print "Whatever man!\n";
>}
>
>Of course that isn't a real example but my point is
>that even in subroutines that don't use $_ this is done.
>I have no idea in what context this would ever make sense to do?

Um, I usually did that.  As you know, $_ is a global variable that's
used implicitly by a lot of operations.  local()ing it in a sub makes
absolutely sure that anything done in this sub or under it will not
screw up the caller.

I did it in each sub because it was just part of how I started a sub.
It was very quick to type and not likely to cause much overhead.
Otherwise, when editing a sub, I would have had to consider whether
the new code might change $_, and if so go to the top of the sub to
see whether there's a "local $_" already there.  And if you only
"local $_" when it's necessary, then there's a chance that you'll
overlook a code path and fail to local() it, causing a bug that may be
subtle.

It's true that 95% of the time I used $_, it was as the implicit
control variable of a foreach loop, which automatically localizes its
control variable.  But better an unnecessary local than a silent bug.
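
A sketch of the kind of silent bug meant here. The two subs are
hypothetical; they differ only in the "local $_" line, and both read
from an in-memory filehandle so the example needs no files:

```perl
use strict;
use warnings;

# while (<$fh>) assigns each line to the global $_ without localizing it.
sub count_lines_unsafe {
    my ($text) = @_;
    open my $fh, '<', \$text or die "open: $!";
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

sub count_lines_safe {
    my ($text) = @_;
    local $_;    # the caller's $_ is restored when this sub returns
    open my $fh, '<', \$text or die "open: $!";
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

$_ = 'caller data';
count_lines_safe("a\nb\n");
my $after_safe = $_;

$_ = 'caller data';
count_lines_unsafe("a\nb\n");
my $after_unsafe = $_;    # left undef by the final read at end of input

print "after safe:   ", (defined $after_safe   ? $after_safe   : 'undef'), "\n";
print "after unsafe: ", (defined $after_unsafe ? $after_unsafe : 'undef'), "\n";
```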

"my $_" is relatively newish, coming in in Perl 5.9, it appears.

"man perlsub" has a section on "When to Still Use local".

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 28 Mar 2012 11:19:38 +0300
From: "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: subroutine exists
Message-Id: <jkuhhb$20s$1@news.ntua.gr>

Some::Package->can("some_method")  is good. The side effect is that it
executes the method (if it exists) when you might only want to check.




------------------------------

Date: Wed, 28 Mar 2012 11:09:18 +0200
From: Wolf Behrenhoff <NoSpamPleaseButThisIsValid3@gmx.net>
Subject: Re: subroutine exists
Message-Id: <4f72d53e$0$7617$9b4e6d93@newsspool1.arcor-online.net>

Am 28.03.2012 10:19, schrieb George Mpouras:
> Some::Package->can("some_method")  is good. The side effect is that it
> executes the method (if it exists) when you might only want to check.

I don't observe such a side effect. It doesn't execute the function.

from perltoot:
The can() method, called against that object or class, reports back
whether its string argument is a callable method name in that class.  In
fact, it gives you back a function reference to that method:
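
A small check of that, as a sketch: the counter $calls is an illustrative
addition that goes up only when the method body actually runs.

```perl
use strict;
use warnings;

my $calls = 0;    # incremented only when the method body actually runs

package Some::Package;
sub some_method { $calls++; return 'called' }

package main;

my $code    = Some::Package->can('some_method');       # code ref, nothing runs
my $checked = $calls;                                   # still 0 at this point
my $missing = Some::Package->can('no_such_method');    # false
my $result  = Some::Package->$code();                   # now the method runs

print "is code ref:       ", (ref($code) eq 'CODE' ? 'yes' : 'no'), "\n";
print "calls after can(): $checked\n";
print "calls after call:  $calls\n";
```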

- Wolf


------------------------------

Date: Wed, 28 Mar 2012 16:54:36 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: subroutine exists
Message-Id: <87pqbwwwjn.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:

[...]

> The correct check for a method is to use ->can:
>
>     Some::Package->can("some_method")
>
> This will also detect inherited methods, which is important.
>
> A correct AUTOLOAD implementation will also supply a ->can method which
> gives the right answers (though this is surprisingly difficult to get
> right, in practice, and not all modules do it correctly).

This is inherently impossible because Perl is not a static language in
the sense that the set of existing or potentially existing named
subroutines at time X is necessarily a subset of the set of existing
or potentially existing named subroutines at some time Y, Y > X.
While the set of entries into existing symbol tables is a less dynamic
environment than the set of directory entries in a UNIX(*) filesystem,
checking for the existence of named subroutines before calling them
generally suffers from the same TOCTOU races that checking for named
files before doing something with them does.


------------------------------

Date: Wed, 28 Mar 2012 04:42:01 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Why a different result?
Message-Id: <jku4qp$h9a$1@reader1.panix.com>

In article <87ty19pvg4.fsf@stemsystems.com>,
Uri Guttman  <uri@stemsystems.com> wrote:
>>>>>> "J" == James  <hslee911@yahoo.com> writes:
>
>  J> undef %o, %e;
>
>first off it isn't even needed before the first loop.

Sometimes I like to have an explicit initialization, as documentation
that I have considered the issue and believe that the assignment is
needed at that point.  It's also nice if I have to move the block of
code somewhere else.  But I'm idiosyncratic in that respect -- I suspect
few people do that.

>secondly it doesn't do what you think it does.

If nothing else, for one reason not mentioned yet.  The undef docs say

    undef EXPR
    undef

    ... Note that this is a unary operator, not a list operator.

That is, it takes either zero or one argument.  It certainly doesn't
take two or more.  So the quoted line is equivalent to

    (undef(%o)), %e;

It undefs only %o and doesn't touch %e.  %e is evaluated in a void
context, and if it returns a value, it's thrown away.

The output under Perl 5.14 of

    #! /usr/bin/perl -w
    use strict;
    use warnings;
    my %x;
    my %y = ('sparkly' => "You might think this is deleted by undef");
    undef %x, %y;
    print "$y{'sparkly'}. If so, you'd be wrong.\n";
    exit 0;

is

    $ perl local/test/095.pl
    Useless use of private hash in void context at local/test/095.pl line 6.
    You might think this is deleted by undef. If so, you'd be wrong.

Note well: if the original poster had used
    use strict;
    use warnings;
as ought to be done, the bug would have been found.

Similarly for "my" taking only one argument (though it can be a list
of variables), and that "use strict; use warnings;" is great.
     $ perl -e 'use warnings; my $x = 3, $y = 4;'
     Name "main::y" used only once: possible typo at -e line 1.
That is, the warning about $main::y indicates that $y was NOT my-ed.
     $ perl -e 'use strict; use warnings; my $x = 3, $y = 4;'
     Global symbol "$y" requires explicit package name at -e line 1.
     Execution of -e aborted due to compilation errors.
is better, having a fatal error due to the undeclared $y.
The usual way for multiple initializations in my is with separate
statements
    $ perl -e 'use strict; use warnings; my $x = 3; my $y = 4; print "$x $y\n"'
    3 4
or with lists
    $ perl -e 'use strict; use warnings; my ($x, $y) = (3, 4); print "$x $y\n"'
    3 4

>the proper way to clear hashes is to assign an empty list to them:
>
>	%o = () ;
>
>undef is not meant to be used on aggregates (arrays and hashes). it
>not only clears the data, it reclaims all storage inside it.

Um, if (by Grice's heuristics) you're trying to imply that %o=() does
NOT reclaim all storage, you just made undef sound better.  In Perl
5.14, the scalar value of the hash is the same for deleting every
existing element, assigning (), or undeffing:

    #! /usr/bin/perl -w
    my %x;
    @x{0..1000} = ();
    print '%x is ', scalar(%x), "\n";
    delete @x{0..1000};
    print '%x is ', scalar(%x), "\n";
    @x{0..1000} = ();
    %x = ();
    print '%x is ', scalar(%x), "\n";
    @x{0..1000} = ();
    undef %x;
    print '%x is ', scalar(%x), "\n";
    exit 0;

results in

    $ perl local/test/097.pl
    %x is 630/1024
    %x is 0
    %x is 0
    %x is 0

There may well be a way to look at internals to know how much storage
space is used for each behind the scenes, but I don't know what such a
method might be.

(As a tangent: I think there was a way in Perl 4 to assign to a
subscript of an array, shrink the array to be smaller than the
subscript, regrow past that subscript, and see the assigned value
again.  Does anyone remember the details?  Please tell me that it no
longer works.)

>and it leads to a worse problem which is using defined on aggregates
>to see if they have any elements.

That is to say, using "undef %x" might lead you to think that
"defined %x" is also usable, but "defined" on a hash table is a trap
and a snare, because

>and that is very wrong as defined on a hash which has been undef'ed
>will be false but if it ever had elements but is empty now, defined
>on it will be true. and that is almost never what you expect.

Indeed.  To illustrate that,

     #! /usr/bin/perl -w
     use strict;
     use warnings;

     my %x;
     sub checkx($) {
         print "After $_[0]\n";
         print "    defined? ", (defined %x ? "yes\n" : "no\n");
         print "    boolean? ", (%x ? "true\n" : "false\n");
         print "\n";
     }
     checkx("start");
     %x = ();
     checkx("empty list");
     # Quotes aren't needed around 'fred' or 'sparkly',
     # but I'd rather not go into the bareword rules.
     $x{'fred'} = 'barney';
     $x{23} = 45;
     delete $x{23};
     delete $x{'fred'};
     checkx("deleting last element");
     %x = ();
     checkx("again empty list");
     undef %x;
     checkx("undeffing");

     exit 0;

results in

    $ perl local/test/094.pl
    defined(%hash) is deprecated at local/test/094.pl line 8.
            (Maybe you should just omit the defined()?)
    After start
        defined? no
        boolean? false

    After empty list
        defined? no
        boolean? false

    After deleting last element
        defined? yes
        boolean? false

    After again empty list
        defined? yes
        boolean? false

    After undeffing
        defined? no
        boolean? false

That is, "use warnings;" again comes through.  And the program shows
that a simple use of %x in boolean context like
    if (! %x)
tells you accurately in each case that %x has no elements, but
    if (! defined %x)
is not a reliable indication of that.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 28 Mar 2012 13:23:04 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Why a different result?
Message-Id: <87sjgssymv.fsf@sapphire.mobileactivedefense.com>

tmcd@panix.com (Tim McDaniel) writes:
> In article <87pqbxpvcr.fsf@stemsystems.com>,
> Uri Guttman  <uri@stemsystems.com> wrote:
>>>>>>> "J" == James  <hslee911@yahoo.com> writes:
>>
>>  J> Thanks. So parenthesizing individual expressions is the key.
>>  J> ($f%2) ? ($o{$f}= -1) : ($e{$f}=1) ;
>>
>>NO! using ?: the correct way is the key. it is for returning one
>>expression from the pair. it is NOT for side effects like assignment
>>or calling functions.
>
> It was made to work, though with some difficulty, so it is a large
> terminological inexactitude to call it flatly "incorrect" and to yell
> "NO!"

Below is a quote from the perlop(1) manpage:

       The operator may be assigned to if both the 2nd and 3rd
       arguments are legal lvalues (meaning that you can assign to
       them):

           ($a_or_b ? $a : $b) = $c;

       Because this operator produces an assignable result, using
       assignments without parentheses will get you in trouble.  For
       example, this:

           $a % 2 ? $a += 10 : $a += 2

       Really means this:

           (($a % 2) ? ($a += 10) : $a) += 2

       Rather than this:

           ($a % 2) ? ($a += 10) : ($a += 2)

       That should probably be written more simply as:

           $a += ($a % 2) ? 10 : 2;

I would usually avoid the second form because it discards the result
of the ?: and needlessly repeats the variable and the operator in both
terms. OTOH, results are frequently discarded, so that's not a 'hard'
reason. I have certainly used the first form whenever it was
convenient[*]. That ?: returns an lvalue in Perl is different from the
C ?: which implies some conscious design decision.
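
A runnable sketch of the perlop behaviour quoted above; the variable
names and starting values are chosen for illustration:

```perl
use strict;
use warnings;

# ?: yields an lvalue when both branches are lvalues.
my ($x, $y, $pick_x) = (0, 0, 1);
($pick_x ? $x : $y) = 5;          # assigns to $x, leaves $y alone

# The precedence trap: this parses as (($n % 2) ? ($n += 10) : $n) += 2,
# so an odd $n gets both additions.
my $n = 3;
$n % 2 ? $n += 10 : $n += 2;

# The simpler form recommended in perlop.
my $m = 3;
$m += ($m % 2) ? 10 : 2;

print "x=$x y=$y n=$n m=$m\n";    # x=5 y=0 n=15 m=13
```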

[*] I actually used to do

	(a ? func_a : func_b)(a, b, c);

in C or

        ($a ? \&func_a : \&func_b)->($a, $b, $c)

in Perl until I sadly realized that no compiler will ever 'optimize
that' in the seemingly obvious way: After all, it's not a common
coding blunder mathematically oriented perpetual newbies with a
desire to avoid learning are wont to make ...


------------------------------

Date: Wed, 28 Mar 2012 14:09:27 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Why a different result?
Message-Id: <75fb49-i5.ln1@anubis.morrow.me.uk>


Quoth tmcd@panix.com:
> In article <87ty19pvg4.fsf@stemsystems.com>,
> Uri Guttman  <uri@stemsystems.com> wrote:
> >>>>>> "J" == James  <hslee911@yahoo.com> writes:
> >
> >  J> undef %o, %e;
> >
> >first off it isn't even needed before the first loop.
> 
> Sometimes I like to have an explicit initialization, as documentation
> that I have considered the issue and believe that the assignment is
> needed at that point.  It's also nice if I have to move the block of
> code somewhere else.  But I'm idiosyncratic in that respect -- I suspect
> few people do that.

Not at all: it's a common mistake made by those who haven't learned how
to use 'my' properly. The correct way to have written the OP's example
would be

    {
        my (%o, %e);
        ...;
    }
    {
        my (%o, %e);
        ...;
    }

That way each block is self-contained, doesn't affect the code outside
it, and can be moved anywhere else in the program safely.

> >the proper way to clear hashes is to assign an empty list to them:
> >
> >	%o = () ;
> >
> >undef is not meant to be used on aggregates (arrays and hashes). it
> >not only clears the data, it reclaims all storage inside it.
> 
> Um, if (by Grice's heuristics) you're trying to imply that %o=() does
> NOT reclaim all storage, you just made undef sound better.

    ~% perl -MDevel::Peek -e'my %h = 1..20; %h = (); Dump \%h'
    SV = IV(0x624288) at 0x624290
      REFCNT = 1
      FLAGS = (TEMP,ROK)
      RV = 0x624080
      SV = PVHV(0x60b070) at 0x624080
        REFCNT = 2
        FLAGS = (PADMY,SHAREKEYS)
        ARRAY = 0x609230
        KEYS = 0
        FILL = 0
        MAX = 15
        RITER = -1
        EITER = 0x0

Notice that ARRAY is still set, and MAX is 15, so this hash still has
its bucket array allocated (16 buckets, none of which are currently
being used, since KEYS is 0).

    ~% perl -MDevel::Peek -e'my %h = 1..20; undef %h; Dump \%h'
    SV = IV(0x624288) at 0x624290
      REFCNT = 1
      FLAGS = (TEMP,ROK)
      RV = 0x624080
      SV = PVHV(0x60b070) at 0x624080
        REFCNT = 2
        FLAGS = (PADMY,SHAREKEYS)
        ARRAY = 0x0
        KEYS = 0
        FILL = 0
        MAX = 7
        RITER = -1
        EITER = 0x0

This time ARRAY has been freed and MAX reset to 7 (which is what it
always starts at). 

However, this is not necessarily a good thing: if you are about to reuse
it (and if you aren't, why is it still in scope?) then perl will just
have to reallocate all that memory again, which is a waste of time.
Leaving it allocated is part of perl's general policy of trading space
for speed. It is worth being aware, though, that if you've allocated a
really big hash, *and* you're definitely not going to use that hash
again, it may pay to explicitly undef it when you've finished using
it.

Ben



------------------------------

Date: Wed, 28 Mar 2012 15:24:57 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Why a different result?
Message-Id: <jkvag9$dgo$1@reader1.panix.com>

Ben, thank you for your clear and cogent reply (as usual).  I quite
agree with you that restricting scope of a variable by

>    {
>        my (%o, %e);
>        ...;
>    }

is often a good technique, and that @a=() may be more time-efficient
than undef @a if you want to add elements again to @a "(and if you
aren't, why is it still in scope?)".

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3652
***************************************

