[32046] in Perl-Users-Digest
Perl-Users Digest, Issue: 3310 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 7 18:09:24 2011
Date: Mon, 7 Mar 2011 15:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 7 Mar 2011 Volume: 11 Number: 3310
Today's topics:
Re: contexts <cwilbur@chromatico.net>
Re: contexts <mvdwege@mail.com>
efficiency of if ( my @a = /pattern/g ) { print "@a\n" jidanni@jidanni.org
Re: efficiency of if ( my @a = /pattern/g ) { print "@a <m@rtij.nl.invlalid>
Re: efficiency of if ( my @a = /pattern/g ) { print "@a <glex_no-spam@qwest-spam-no.invalid>
Re: efficiency of if ( my @a = /pattern/g ) { print "@a <uri@StemSystems.com>
Re: efficiency of if ( my @a = /pattern/g ) { print "@a <jwkrahn@example.com>
Re: going from CPAN to RPM <agw@dsm.fordham.edu>
Re: Hashes are good, but not good enough. <tzz@lifelogs.com>
Re: Hashes are good, but not good enough. <hjp-usenet2@hjp.at>
Re: Hashes are good, but not good enough. <tzz@lifelogs.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Mon, 07 Mar 2011 11:04:20 -0500
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: contexts
Message-Id: <86oc5mex2z.fsf@mithril.chromatico.net>
>>>>> "UG" == Uri Guttman <uri@StemSystems.com> writes:
UG> as i said, i won't read it just because of his stupid comments
UG> on perl. you can knock perl (or ANY lang) for many legit reasons
UG> but to flame about it and show so little understanding of
UG> contexts says he is a dope and not worth reading.
Actually - in context, it's pretty funny. If you're familiar with C,
C++, Lisp, or Java, the languages that the author skewers before he gets
around to Perl, it's clear that he's far from serious. The OP's mistake
is in reading over-the-top commentary as if it were straight.
The core of the commentary is that flattening lists in Perl4 was a bad
decision because it makes hierarchical data structures difficult, that
objects and references seem like a bolted-on solution to people who are
accustomed to languages that are OO from the ground up, and that
contexts baffle people who are unaccustomed to them. Also, Perl has a
lot of edge cases where what code should do is not apparent or
intuitive. It's expressed in over-the-top hyperbole, but none of these
are statements I could reasonably disagree with.
Charlton
--
Charlton Wilbur
cwilbur@chromatico.net
------------------------------
Date: Mon, 07 Mar 2011 21:55:25 +0100
From: Mart van de Wege <mvdwege@mail.com>
Subject: Re: contexts
Message-Id: <86tyfe4pmq.fsf@gareth.avalon.lan>
Charlton Wilbur <cwilbur@chromatico.net> writes:
>>>>>> "UG" == Uri Guttman <uri@StemSystems.com> writes:
>
> UG> as i said, i won't read it just because of his stupid comments
> UG> on perl. you can knock perl (or ANY lang) for many legit reasons
> UG> but to flame about it and show so little understanding of
> UG> contexts says he is a dope and not worth reading.
>
> Actually - in context, it's pretty funny. If you're familiar with C,
> C++, Lisp, or Java, the languages that the author skewers before he gets
> around to Perl, it's clear that he's far from serious. The OP's mistake
> is in reading over-the-top commentary as if it were straight.
>
From reading the comments on his various blogs, most people seem to have
that problem with Steve Yegge.
I *like* reading the guy, but I recognise the obvious satire and
over-the-top humour; apparently this is not always clear to everyone.
Mart
--
"We will need a longer wall when the revolution comes."
--- AJS, quoting an uncertain source.
------------------------------
Date: Mon, 07 Mar 2011 03:33:33 +0800
From: jidanni@jidanni.org
Subject: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }
Message-Id: <il0rme$vb4$1@news.datemas.de>
Gentlemen, is this the most efficient way to do this, or should one use
split() and more arrays and loops?
use strict;
use warnings FATAL => 'all';
while (<DATA>) {
    if ( my @a = /BB|BM|MY|SG|TW|US/g ) { print "@a\n" }
}
__DATA__
BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS...
BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR...
BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ...
PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES...
------------------------------
Date: Sun, 6 Mar 2011 22:14:38 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }
Message-Id: <u2ad48-53b.ln1@news.rtij.nl>
On Mon, 07 Mar 2011 03:33:33 +0800, jidanni wrote:
> Gentlemen, is this the most efficient way to do this, or should one use
> split() and more arrays and loops?
>
> use strict;
> use warnings FATAL => 'all';
> while (<DATA>) {
>     if ( my @a = /BB|BM|MY|SG|TW|US/g ) { print "@a\n" }
> }
> __DATA__
> BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS...
> BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR...
> BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ...
> PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES...
Iff your input data is as clean as this, your way is tops.
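A minimal sketch of what "clean" buys you, assuming hypothetical input in which a wanted code also turns up inside a longer token (the sample line is made up and is not from the posted data). The bare alternation matches substrings, while a \b-anchored variant only accepts whole two-letter tokens:

```perl
use strict;
use warnings;

# Hypothetical "dirty" line: USA and TWIN merely contain wanted codes.
my $line = "USA BB TWIN MY\n";

# Bare alternation, as in the original post: matches anywhere in the line.
my @loose = $line =~ /BB|BM|MY|SG|TW|US/g;

# Word-boundary anchors restrict matches to whole two-letter tokens.
my @strict = $line =~ /\b(?:BB|BM|MY|SG|TW|US)\b/g;

print "loose:  @loose\n";     # loose:  US BB TW MY
print "strict: @strict\n";    # strict: BB MY
```

With the data as posted (two-letter tokens separated by single spaces) both patterns return the same list, so the extra anchors only matter once the input gets messier.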
M4
------------------------------
Date: Mon, 07 Mar 2011 11:11:51 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }
Message-Id: <4d7511d7$0$9078$815e3792@news.qwest.net>
jidanni@jidanni.org wrote:
> Gentlemen, is this the most efficient way to do this, or should one use
> split() and more arrays and loops?
>
> use strict;
> use warnings FATAL => 'all';
> while (<DATA>) {
>     if ( my @a = /BB|BM|MY|SG|TW|US/g ) { print "@a\n" }
> }
> __DATA__
> BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS...
> BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR...
> BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ...
> PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES...
You can answer that yourself by benchmarking (perldoc Bench)
other solutions. The following isn't exactly the same, since
it's looking for the exact values, instead of something
that might contain BB or BM or MY, etc. but looking at
your data, you're possibly after the exact value.
my %item = map{ $_ => 1 } qw( BB BM MY SG TW US );
while( <DATA> )
{
    my @a;
    for my $k ( split( / / ) )
    {
        push( @a, $k ) if $item{ $k };
    }
    print "@a\n" if @a;
}
That's 37% faster, on my machine.
If this is all your code is doing, then it would be good
to experiment a bit. If the code is doing many other things,
then it won't matter that much. e.g. A slight
optimization, depending on the input, would be to only
split up the values, if one exists in the line:
next unless /BB|BM|MY|SG|TW|US/;
or maybe using egrep might be better:
egrep 'BB|BM|MY|SG|TW|US' file | script.pl
Of course, if every line has one of those values, then
that's a useless thing to do.
If you're really doing this a lot or are just playing around
to find if something else might be faster, experiment a
bit with different solutions. The Bench module provides
a means to measure different solutions against each other.
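As a rough sketch of such a comparison, the following uses cmpthese from the core Benchmark module. The sample lines are modelled on the posted __DATA__ section with the trailing "..." dropped, and the sub names by_regex and by_hash are made up for this illustration. It is not the benchmark that produced the 37% figure above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample lines modelled on the posted data (the elided "..." is left out).
my @lines = (
    "BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS\n",
    "BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR\n",
    "BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ\n",
    "PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES\n",
);

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );

# Original approach: one alternation regex, all matches in list context.
sub by_regex {
    my ($line) = @_;
    return $line =~ /BB|BM|MY|SG|TW|US/g;
}

# Alternative: split on whitespace, keep the tokens found in the hash.
sub by_hash {
    my ($line) = @_;
    return grep { $item{$_} } split ' ', $line;
}

# Run each variant for at least one CPU second and print a rate table.
cmpthese( -1, {
    regex => sub { for my $l (@lines) { my @a = by_regex($l) } },
    hash  => sub { for my $l (@lines) { my @a = by_hash($l) } },
} );
```

The rates and the percentage difference depend on the machine, the perl version, and the shape of the real input, so it is worth running this against representative data before drawing conclusions.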
------------------------------
Date: Mon, 07 Mar 2011 12:49:46 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }
Message-Id: <87r5ain7lx.fsf@quad.sysarch.com>
>>>>> "JG" == J Gleixner <glex_no-spam@qwest-spam-no.invalid> writes:
JG> You can answer that yourself by benchmarking (perldoc Bench)
JG> other solutions. The following isn't exactly the same, since
JG> it's looking for the exact values, instead of something
JG> that might contain BB or BM or MY, etc. but looking at
JG> your data, you're possibly after the exact value.
it is Benchmark.
JG> my %item = map{ $_ => 1 } qw( BB BM MY SG TW US );
JG> while( <DATA> )
JG> {
JG>     my @a;
JG>     for my $k ( split( / / ) )
JG>     {
JG>         push( @a, $k ) if $item{ $k };
JG>     }
JG>     print "@a\n" if @a;
JG> }
JG> That's 37% faster, on my machine.
i was thinking a hash as well. alternation in regexes can be slow (some
optimizations have been done recently though).
i wouldn't even do the split. i think it would be faster (i am not in
the mood to benchmark it) to do a grab loop like this (untested). it saves
building up the list of tokens in each line loop.
    while( $line =~ /(\w\w)/g ) {
        next if $item{ $1 } ;
        ...
i forget the boolean test direction so the if could be unless.
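a filled-in sketch of that grab loop, under a few assumptions: the sub name wanted_codes is made up for illustration, the hash is the %item from the earlier reply, and every token is exactly two word characters (otherwise \w\w would pair up letters from longer tokens). to keep the wanted codes the test has to be unless:

```perl
use strict;
use warnings;

my %item = map { $_ => 1 } qw( BB BM MY SG TW US );

# Return the wanted two-letter codes found in one line, in input order.
sub wanted_codes {
    my ($line) = @_;
    my @found;

    # In scalar context, //g resumes where the previous match ended,
    # so each pass through the loop grabs the next two-letter token.
    while ( $line =~ /(\w\w)/g ) {
        next unless $item{$1};    # skip codes we are not looking for
        push @found, $1;
    }
    return @found;
}

my @sample = wanted_codes("PR FR GU IL KR VI CA JP IT US TW NZ\n");
print "@sample\n";    # US TW
```

whether this beats split plus a hash lookup is still something to measure rather than assume.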
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Mon, 07 Mar 2011 15:07:03 -0800
From: "John W. Krahn" <jwkrahn@example.com>
Subject: Re: efficiency of if ( my @a = /pattern/g ) { print "@a\n" }
Message-Id: <syddp.55459$195.11503@newsfe05.iad>
J. Gleixner wrote:
> jidanni@jidanni.org wrote:
>> Gentlemen, is this the most efficient way to do this, or should one use
>> split() and more arrays and loops?
>>
>> use strict;
>> use warnings FATAL => 'all';
>> while (<DATA>) {
>>     if ( my @a = /BB|BM|MY|SG|TW|US/g ) { print "@a\n" }
>> }
>> __DATA__
>> BD BE BF BG BA BB WF BM BN BO BH BI BJ BT JM BV BW WS...
>> BD BE BF BG BA BB WF BN BO BH BI BJ BT JM BV BW WS BR...
>> BD WF BF BG BA BB BE BM BN BO BH BI BJ BT JM JO WS BS BY BZ...
>> PR FR GU IL KR VI CA JP IT US TW NZ AU GB BR IN NL IE MX ES...
>
> You can answer that yourself by benchmarking (perldoc Bench)
> other solutions. The following isn't exactly the same, since
> it's looking for the exact values, instead of something
> that might contain BB or BM or MY, etc. but looking at
> your data, you're possibly after the exact value.
>
> my %item = map{ $_ => 1 } qw( BB BM MY SG TW US );
> while( <DATA> )
> {
>     my @a;
>     for my $k ( split( / / ) )
Why split( / / ) and not just:
    for my $k ( split )
Besides, split( / / ) won't remove the newline.
>     {
>         push( @a, $k ) if $item{ $k };
>     }
>     print "@a\n" if @a;
> }
>
> That's 37% faster, on my machine.
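To illustrate the point about the newline, here is a small sketch with a made-up sample line. A bare split on $_ behaves like split ' ', which is what the second call below spells out explicitly:

```perl
use strict;
use warnings;

my $line = "IN NL IE MX ES\n";

# split / / breaks on each single space only; "\n" stays on the last field.
my @by_space = split / /, $line;

# split ' ' (what a bare split does on $_) breaks on runs of whitespace
# and skips leading whitespace, so the trailing newline disappears.
my @by_white = split ' ', $line;

print "split / / keeps the newline\n" if $by_space[-1] eq "ES\n";
print "split ' ' drops the newline\n" if $by_white[-1] eq "ES";
```

With split( / / ), a hash lookup on the last field of each line uses a key that still ends in "\n", so a wanted code in that position would be missed unless the line is chomped first.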
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
------------------------------
Date: Sun, 06 Mar 2011 21:28:20 -0500
From: Art Werschulz <agw@dsm.fordham.edu>
Subject: Re: going from CPAN to RPM
Message-Id: <2mipvvu0jf.fsf@sobolev.dsm.fordham.edu>
Hi.
Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Sun, 06 Mar 2011 11:01:54 -0500, Art Werschulz wrote:
>
>> Hi.
>>
>> "Peter J. Holzer" <hjp-usenet2@hjp.at> writes:
>>
>>> On Redhat, packages installed via RPM generally reside in vendor_perl,
>>> while packages installed via CPAN reside in site_perl.
>>
>> I'm not finding a vendor_perl subdirectory in /usr/lib/perl5.
>
> I do, both on CentOS (RHEL) and Fedora. Are you sure you're on (a) RedHat
> (derivative)?
I'm using Fedora 14.
When I print the contents of @INC, I get
/usr/local/lib/perl5
/usr/local/share/perl5
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl5
/usr/share/perl5
/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.10.0
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl
/usr/lib/perl5/site_perl
But:
$ ls /usr/lib/perl5/vendor_perl
ls: cannot access /usr/lib/perl5/vendor_perl: No such file or directory
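A short sketch for checking this more systematically: it walks @INC and reports which entries exist as directories. The sub name inc_status is made up for this example. As the ls output above shows, an entry in @INC need not exist on disk.

```perl
use strict;
use warnings;

# Pair each plain directory entry with a flag saying whether it exists.
sub inc_status {
    my @dirs = @_;
    my @status;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code refs (hooks); skip them
        push @status, [ $dir, ( -d $dir ? 1 : 0 ) ];
    }
    return @status;
}

for my $entry ( inc_status(@INC) ) {
    my ( $dir, $exists ) = @$entry;
    printf "%-7s %s\n", ( $exists ? 'ok' : 'missing' ), $dir;
}
```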
--
Art Werschulz (8-{)} "Metaphors be with you." -- bumper sticker
GCS/M (GAT): d? -p+ c++ l++ u+ P++ e--- m* s n+ h f g+ w+ t+ r-
Net: agw@dsm.fordham.edu http://www.dsm.fordham.edu/~agw
Phone: Fordham U. (212) 636-6325, Columbia U. (646) 775-6035
------------------------------
Date: Mon, 07 Mar 2011 09:28:47 -0600
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Hashes are good, but not good enough.
Message-Id: <87oc5n3q6o.fsf@lifelogs.com>
On Sat, 5 Mar 2011 18:25:10 +0100 "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
PJH> On 2011-02-22 15:02, Ted Zlatanov <tzz@lifelogs.com> wrote:
>> Sure, once it's loaded, it's fast. But the startup time is atrocious
PJH> The startup time is O(n). For most other data structures it's worse (for
PJH> trees it's O(n * log n), for example). The insertion time for each
PJH> element may be a bit long (on my test system, inserting a hash member
PJH> with a short key takes about 2µs - 10 times as long as inserting an
PJH> array member), but that's the same for small hashes as for large hashes,
PJH> so it's not an argument against large hashes specifically.
PJH> Of course loading a hash with 100 million members into memory and then
PJH> doing a single lookup would be silly. But again that's true for any data
PJH> structure. The time to build the data structure must amortize itself
PJH> over the lifetime of the structure. So a huge hash only makes sense if
PJH> the time to build it is small compared to the time saved by doing fast
PJH> lookups. Adding 100 million records into a database takes even longer
PJH> (a *lot* longer), but if it lives long enough it may eventually amortize
PJH> itself.
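The per-element insertion cost quoted above can be checked locally with a rough sketch like this one, using the core Benchmark module. The element count, key format, and sub names are assumptions chosen for illustration, not the test that produced the 2µs figure.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 100_000;    # assumed number of elements per run

# Build a hash with short string keys.
sub fill_hash {
    my ($count) = @_;
    my %h;
    $h{"k$_"} = $_ for 1 .. $count;
    return \%h;
}

# Build an array of the same size for comparison.
sub fill_array {
    my ($count) = @_;
    my @a;
    push @a, $_ for 1 .. $count;
    return \@a;
}

# Run each builder for at least one CPU second and print a rate table.
cmpthese( -1, {
    hash  => sub { fill_hash($n) },
    array => sub { fill_array($n) },
} );
```

The ratio between the two rates varies with the perl version, key length, and machine, so treat any single number as a local measurement.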
A tiny MySQL database will happily chug through the same number of
records as a 100GB hash, have very cheap startup time once the records
are in persistent storage, will be accessible from multiple processes
and network locations, and have other nice properties like better
queries and deterministic enumeration. I don't buy your argument, which
boils down to "fast lookups are worth a lot of trouble, like locking
yourself down to a single process, starting up slowly every time, and
using up a lot of memory." It doesn't make technical or business sense
to propose it as a solution unless you're certain there is no need for a
database and anticipate no need for long-term maintenance or extensions.
PJH> You can also walk the whole hash in hash order. If you have different
PJH> requirements, don't use a hash. But again that's mostly independent of
PJH> the hash size.
Well, no. The hash size can make it impossible to construct a list of
the keys if memory runs out. So it certainly matters and you have to be
VERY careful to avoid making a copy of the keys. A trie, by comparison,
can be comfortably walked without using much memory.
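For what it's worth, a minimal sketch of walking a hash without first building the key list, using each instead of keys in list context. The three-element hash is a stand-in for a large one.

```perl
use strict;
use warnings;

my %h = ( alpha => 5, beta => 4, gamma => 5 );

# keys in list context would first build a list of every key:
#     for my $k ( keys %h ) { ... }
# each hands back one key/value pair per call instead.
my $total = 0;
while ( my ( $key, $value ) = each %h ) {
    $total += $value;
}
print "total: $total\n";    # total: 14
```

The usual caveats apply: the hash should not have keys added while the loop runs, and the iterator is shared per hash, so calling keys or values on the same hash inside the loop resets it.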
PJH> You forgot that it may also hurt a lot. So the advice to always
PJH> partition large hashes without looking at the specific use case may
PJH> backfire badly.
I think all along I've shown great willingness to look at specific use cases.
>> This is similar to the case where I ended up with Sybase because the
>> Perl-based approach didn't scale.
PJH> That doesn't have much to do with the scalability of Perl hashes (within
PJH> RAM limits). You delegated a task to a program written for that task in
PJH> C instead of writing it in Perl and discovered that it was faster at it
PJH> than your Perl program. Big surprise.
But Peter, we're talking about how to approach a problem here. The tool
matters, certainly: you want to delegate to tools written for managing
large amounts of data. Perl hashes are not such a tool. They are a
general-purpose storage structure.
PJH> Other than that I suspect that you just ran out of RAM. Perl is a memory
PJH> hog and adds a large overhead. But again, that's perl's weakness in
PJH> general and not specific to hashes (much less large hashes).
I think you'll find most hash table implementations in any language have
that problem. They must overallocate by quite a bit, for instance, to
amortize the malloc costs.
PJH> There are use cases where hashes are appropriate and some where
PJH> they are not. This depends on a lot of factors but the hash
PJH> size is of relatively small importance.
I agree as long as the resulting hash size is manageable in terms of the
specific needs of the user. But when it becomes a problem (AKA the
point at which the CompSci major calls the CompEng major to complain the
system is slow) it's a *big* problem. So I'm very suspicious of large
hash tables.
PJH> In fact, hashes scale better than almost any other data structure.
That's definitely not true. They make terrible arrays ;)
PJH> I strongly disagree with "small hashes good, large hashes bad".
I don't think I said that. I use verbs, not grunts.
Ted
------------------------------
Date: Mon, 7 Mar 2011 17:00:36 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Hashes are good, but not good enough.
Message-Id: <slrnina094.mq.hjp-usenet2@hrunkner.hjp.at>
On 2011-03-07 15:28, Ted Zlatanov <tzz@lifelogs.com> wrote:
> Well, no. The hash size can make it impossible to construct a list of
> the keys if memory runs out.
How often do I have to repeat "as long as it fits into memory"? I don't
think it makes sense to continue this discussion if you don't read what
I write.
hp
------------------------------
Date: Mon, 07 Mar 2011 13:07:58 -0600
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Hashes are good, but not good enough.
Message-Id: <87bp1mzr3l.fsf@lifelogs.com>
On Mon, 7 Mar 2011 17:00:36 +0100 "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
PJH> On 2011-03-07 15:28, Ted Zlatanov <tzz@lifelogs.com> wrote:
>> Well, no. The hash size can make it impossible to construct a list of
>> the keys if memory runs out.
PJH> How often do I have to repeat "as long as it fits into memory"? I don't
PJH> think it makes sense to continue this discussion if you don't read what
PJH> I write.
By "it" you meant the hash itself, right? I am saying that the list of
keys may be large enough that constructing it for any purpose will make
you run out of memory even if the hash fits by itself.
Also you haven't mentioned memory overallocation and hash resizing.
These performance optimizations make it hard to judge exactly when the
hash will stop fitting in memory, with unfortunate consequences.
Ted
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3310
***************************************