[28199] in Perl-Users-Digest
Perl-Users Digest, Issue: 9563 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Aug 4 18:05:41 2006
Date: Fri, 4 Aug 2006 15:05:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 4 Aug 2006 Volume: 10 Number: 9563
Today's topics:
Re: 5 uncommonly known things about Perl <mgarrish@gmail.com>
Re: How probably not to hand over a variable from one p <tzz@lifelogs.com>
Re: How to "convert" a string into a variable name? <uri@stemsystems.com>
Re: Perl hash of hash efficiency. <yekasi@gmail.com>
Re: Perl hash of hash efficiency. <yekasi@gmail.com>
Re: Perl hash of hash efficiency. <1usa@llenroc.ude.invalid>
Re: Perl hash of hash efficiency. <someone@example.com>
Reading HTTP response body that is gzip'd *and* in UTF- <jsm@jmarshall.com>
Re: Recursion anno4000@radom.zrz.tu-berlin.de
Re: Recursion <tzz@lifelogs.com>
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Recursion <koko_loko_0@yahoo.co.uk>
Re: Regex...HTML::Parser...Getting webpage data? <wbresson@gmail.com>
Re: Regex...HTML::Parser...Getting webpage data? <mritty@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 4 Aug 2006 12:45:10 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: 5 uncommonly known things about Perl
Message-Id: <1154720709.983501.309280@m79g2000cwm.googlegroups.com>
David H. Adler wrote:
> On 2006-08-04, Matt Garrish <mgarrish@gmail.com> wrote:
> >
> > Matt Garrish wrote:
> >
> >> usenet@DavidFilmer.com wrote:
> >>
> >> > Taher wrote:
> >> > > I was once asked to name some uncommonly known things about Perl
> >> >
> >> > Perl comes with documentation (including a FAQ). This doesn't seem to
> >> > be commonly known...
> >> >
> >>
> >> And isn't spelled PERL
> >
> > > And, as I think about it, that Perl != CGI. Of course, a lot of the
> > people who think Perl == CGI tend to write Perl as PERL, which leads me
> > to think they're acronym dyslexic.
>
> I tend to not put it quite that way:
>
> ~ 0:48:00% perl -e 'print "yikes!\n" if CGI == Perl;'
> yikes!
>
> I'll leave it as an exercise for the reader to figure out why that is.
Well, that leads us to that other uncommonly known fact, that all
scripts should use the warnings and strictures pragmas... :)
Matt
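
For readers who want the exercise spelled out, here is a minimal sketch
(not part of the original posts). It assumes a reasonably modern perl and
uses string eval only so that both behaviours can be shown in one script.
Without strictures the barewords CGI and Perl are taken as strings, and
both are 0 in numeric context; with strictures the expression does not
compile.

```perl
use strict;
use warnings;

# Without strict, CGI and Perl are treated as the strings "CGI" and "Perl".
# Neither looks like a number, so both become 0 and == is true.
my $loose = eval q{ no strict; no warnings; (CGI == Perl) ? 1 : 0 };

# With strict in effect the same expression is a compile-time error,
# so eval returns undef and the message lands in $@.
my $strict_result = eval q{ use strict; (CGI == Perl) ? 1 : 0 };
my $error = $@;

print "without strict: ", ($loose ? "equal" : "not equal"), "\n";
print "with strict:    ", (defined $strict_result ? "compiled" : "refused to compile"), "\n";
```

Turning warnings on as well would flag the non-numeric comparison even
in code that does not use strict.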
------------------------------
Date: Fri, 04 Aug 2006 15:57:22 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: How probably not to hand over a variable from one perl script to another
Message-Id: <g69y7u46zq5.fsf@CN1374059D0130.kendall.corp.akamai.com>
On 4 Aug 2006, nospam-abuse@ilyaz.org wrote:
>> Data::Dumper may also save data you don't want saved, e.g. a password
>> in a data structure you didn't know you were dumping.
>>
>> Pass data with a data language. I suggested YAML or XML - there are
>> many others.
>
> Ted, I think the last objection is misplaced. Data language wouldn't
> help if you have some data you don't want saved...
True. I think the advice should be, "think before you save data, to
make sure you're not saving things you shouldn't" - regardless of the
data saving method or language. Thanks for catching that.
> And, of course, now there are builtin *quick* serialization methods.
>
> perldoc Storable
I remember how, at a previous job, some genius decided it would be a
good idea to freeze() complex structures into a database column. I
didn't have to deal with it, fortunately, but I can imagine the fun.
Anyhow, definitely for speed I would go with Storable, but a neutral
data language is generally better otherwise.
Ted
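
A minimal sketch of the two options discussed above, added for
illustration. The %session data and the list of fields to keep are
hypothetical, and JSON (via the core JSON::PP module) stands in here for
the YAML or XML suggested in the thread. The point is only that the
"think before you save" step is the same whichever serializer is used.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use JSON::PP;

# Hypothetical structure that happens to contain a secret.
my %session = ( user => 'alice', password => 'secret', hits => 3 );

# Decide explicitly what gets saved, regardless of the format.
my %safe = map { $_ => $session{$_} } grep { $_ ne 'password' } keys %session;

# Storable: quick, but a Perl-specific binary format.
my $frozen = freeze( \%safe );
my $thawed = thaw($frozen);

# A neutral data language: readable text that other tools can parse.
my $json    = JSON::PP->new->canonical->encode( \%safe );
my $decoded = JSON::PP->new->decode($json);

print "Storable round trip: user=$thawed->{user}\n";
print "JSON text: $json\n";
```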
------------------------------
Date: Fri, 04 Aug 2006 17:28:53 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: How to "convert" a string into a variable name?
Message-Id: <x77j1otckq.fsf@mail.sysarch.com>
>>>>> "p" == perlistpaul <betterdie@gmail.com> writes:
p> You shouldn't practice the use of symbolic reference, if you use under
p> 'strict' it will complain.
p> Why not using symbolic reference vs. soft reference.
those are names for the same thing, so why the vs?
p> You check yourself why not the use it or what reason?
obviously english is not your primary language but i can't figure out
what you mean there.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
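
For context, a small sketch (not from the original posts) of what
"strict will complain" means in practice, together with the usual
alternative of keeping the names as hash keys. The variable names are
made up for the example.

```perl
use strict;
use warnings;

our $colour = 'blue';      # a package variable, hypothetical
my  $name   = 'colour';    # the string we would like to use as a name

# Symbolic (soft) reference: under "strict refs" this dies at run time.
my $via_symref = eval { ${$name} };
my $complaint  = $@;

# The usual alternative: the "variable name" becomes a hash key.
my %value_for = ( colour => 'blue', size => 'large' );
my $via_hash  = $value_for{$name};

print "symbolic reference: ", ( defined $via_symref ? $via_symref : "refused" ), "\n";
print "hash lookup:        $via_hash\n";
```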
------------------------------
Date: 4 Aug 2006 14:36:56 -0700
From: "tak" <yekasi@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154727416.683423.205940@i3g2000cwc.googlegroups.com>
Hello,
Here is the partial code that reads the data from a txt file.
open(mainfile) or die("Could not open master file '$mainfile'.");
foreach my $line (<mainfile>) {
    $i++;
    chomp($line);
    my @values = split(/\|/, $line);
    $Master_Hash{$values[3]} = \@values;
    if ($i % 10000 == 0) {
        #print ("loaded $i lines in hash so far - last entry was: $values[3] \n");
        my $size = keys(%Master_Hash);
        my $scalarSize = scalar %Master_Hash;
        print "Loaded $i entries - #ofKeys: $size - ScalarSize: $scalarSize\n";
    }
}
And each line is about 420 characters.
$ wc -l -L 07*file
238348 449 07302006file
After it finishes loading these 240k lines into the hash, the Windows XP
Task Manager reports 1.91 GB of memory usage.
-Tak
------------------------------
Date: 4 Aug 2006 14:39:57 -0700
From: "tak" <yekasi@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154727597.822657.109240@m73g2000cwd.googlegroups.com>
Oh, I should add that each of the 240k lines is about 420 characters
long and becomes a 97-element array (after the split), which is what is
stored in the hash.
------------------------------
Date: Fri, 04 Aug 2006 21:52:50 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <Xns9815B5FCF3B52asu1cornelledu@127.0.0.1>
"tak" <yekasi@gmail.com> wrote in news:1154727416.683423.205940
@i3g2000cwc.googlegroups.com:
> Here is the partial code that reads the data from a txt file.
Please read the posting guidelines for this group, especially the sections
on quoting and posting code (along with data).
use strict;
use warnings;
missing.
> open(mainfile)
Wot?
> or die("Could not open master file '$mainfile'.");
> foreach my $line (<mainfile>) {
Haven't you read any of the responses??? I pointed this out as a distinct
possibility. You are slurping the entire file, just to process it line by
line.
Given this snippet, I presume the rest of your code is just as silly, and
it is no shock that you run out of memory.
Please read the posting guidelines. Especially the section about posting
code.
You have just wasted our time trying to guess what your problem was. I
won't be seeing you.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Fri, 04 Aug 2006 21:56:48 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <AoPAg.138500$A8.103036@clgrps12>
tak wrote:
>
> Here is the partial code that reads the data from a txt file.
>
> open(mainfile) or die("Could not open master file '$mainfile'.");
> foreach my $line (<mainfile>) {
>     $i++;
>     chomp($line);
>     my @values = split(/\|/, $line);
>
>     $Master_Hash{$values[3]} = \@values;
>
>     if ($i % 10000 == 0) {
>         #print ("loaded $i lines in hash so far - last entry was: $values[3] \n");
>         my $size = keys(%Master_Hash);
>         my $scalarSize = scalar %Master_Hash;
>         print "Loaded $i entries - #ofKeys: $size - ScalarSize: $scalarSize\n";
>     }
> }
>
> And each line is about 420 characters.
>
> $ wc -l -L 07*file
> 238348 449 07302006file
>
> After it finishes loading these 240k lines into the hash - the xp task
> manager reports 1.91 GB of usage.
foreach operates on lists so the contents of $mainfile are first loaded into a
list in memory before the contents are processed. Use a while loop instead so
that only one line is loaded into memory at a time. So change:
foreach my $line (<mainfile>) {
To:
while ( my $line = <mainfile> ) {
John
--
use Perl;
program
fulfillment
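
A self-contained sketch of the while-loop version, added for
illustration. It reads from an in-memory string instead of the poster's
master file, and the lexical filehandle and three-argument open are
assumptions of this example rather than part of the advice above.

```perl
use strict;
use warnings;

# Stand-in for the pipe-delimited master file (hypothetical data).
my $data = "a|b|c|key1|e\nf|g|h|key2|j\n";
open my $fh, '<', \$data or die "Could not open in-memory data: $!";

my %master_hash;
my $count = 0;

# <$fh> in scalar context returns one line per iteration, so only the
# current line (plus whatever is stored in the hash) is held in memory.
while ( my $line = <$fh> ) {
    chomp $line;
    my @values = split /\|/, $line;
    $master_hash{ $values[3] } = \@values;
    $count++;
}
close $fh;

print "Loaded $count lines, ", scalar( keys %master_hash ), " keys\n";
```

Note that the hash itself still holds every field of every line, so the
loop change removes the extra copy of the file but not the cost of the
stored data.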
------------------------------
Date: Fri, 4 Aug 2006 11:36:24 -0700
From: James Marshall <jsm@jmarshall.com>
Subject: Reading HTTP response body that is gzip'd *and* in UTF-8
Message-Id: <20060804102410.P33234@jmarshall.com>
I'm writing an HTTP client that handles gzip'd content as well as UTF-8
text, including when a response body is both gzip'd and in UTF-8.
I'm newish to both compression and PerlIO layers, so I'd like a second
opinion from someone who knows them better than I do. Does the code below
look correct? The goal is to end up with the uncompressed body in $body,
and interpreted as UTF-8 if identified as such by "charset".
I appreciate not wanting to use utf8::upgrade() ; is there a better way to
handle it in this case, or is this one of those cases where it's
legitimately needed?
Finally, does anyone know if Compress::Zlib::memGzip() handles UTF-8 input
correctly, or do I need to "utf8::downgrade($body)" before compressing it?
=======================================================
use Compress::Zlib ;
# Assume S is the socket, and $is_gzipped and $is_utf8 are set correctly
# from the HTTP response headers, which have just been read from S.
if ($is_gzipped) {
$body= &read_full_body(S) ;
$body= Compress::Zlib::memGunzip($body) ;
if ($is_utf8) {
utf8::upgrade($body) ;
}
} else { # not gzip'd
if ($is_utf8) {
binmode(S, ':encoding(utf8)') ;
}
$body= &read_full_body(S) ;
}
# $body should now contain response body in workable format.
.
.
.
$body= Compress::Zlib::memGzip($body) ;
print $status, $headers, $body ;
=======================================================
Thanks for any thoughts, I appreciate it!
James
............................................................................
James Marshall james@jmarshall.com Berkeley, CA @}-'-,--
"Teach people what you know."
............................................................................
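
No reply to this question appears in this digest. As a hedged sketch
only: utf8::upgrade() changes Perl's internal representation of a string
without interpreting its bytes as UTF-8, so decoding is probably the
step wanted after gunzipping. One common approach uses the Encode
module, decoding after memGunzip() and encoding back to bytes before
memGzip(). The sample text is made up, and the socket handling from the
post is left out.

```perl
use strict;
use warnings;
use Compress::Zlib;
use Encode qw(decode encode);

# Hypothetical body text containing non-ASCII characters.
my $text = "caf\x{e9} \x{263a}";

# Sending side: characters -> UTF-8 bytes -> gzip'd bytes.
my $bytes   = encode( 'UTF-8', $text );
my $gzipped = Compress::Zlib::memGzip($bytes);

# Receiving side: gzip'd bytes -> UTF-8 bytes -> characters.
my $raw  = Compress::Zlib::memGunzip($gzipped);
my $body = decode( 'UTF-8', $raw );

print length($raw), " bytes on the wire (uncompressed), ",
      length($body), " characters after decoding\n";
```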
------------------------------
Date: 4 Aug 2006 18:14:45 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: Recursion
Message-Id: <4jhh4lF7mungU1@news.dfncis.de>
kokolo <koko_loko_0@yahoo.co.uk> wrote in comp.lang.perl.misc:
>
> <xhoster@gmail.com> wrote in message
> news:20060804115842.187$8B@newsreader.com...
> > It also gives the wrong answer!
> >
> > > I wonder how much it can be optimized and what are bottlenecks here:
> >
> > Don't wonder, profile! Devel::SmallProf
> >
> > >
> > > sub qs{
> > > my $left=shift;
> > > my $right=shift;
> > > my @smaller;
> > > my @bigger;
> > >
> > > for ($i=$left;$i<$right;$i++){
> > >
> > > if ($array[$i] <= $array[$right]) {push @smaller,$array[$i]}
> > > else {push @bigger,$array[$i]}
> > > }
> > >
> > > $pivot = $left + @smaller;
> > > $array[$pivot] = $array[$right];
> >
> > Your array now has two of whatever was in $array[$right], and none of
> > whatever was in $array[$pivot].
> >
> > Xho
>
> Damn, I can't get it. It is obviously wrong but it works 3-4 times out of 5
> and it tricked me. It looked suspicious to me as there were too many
> repeating numbers.
> As for what you mentioned about $pivot, $array[$pivot] is saved either in
> @smaller or @bigger.
> I need to re-think it as I cannot figure out why it works if I extract
>
> if ($pivot-1>$left){qs($left,$pivot-1)}
> if ($pivot+1<$right){qs($pivot+1,$right)}
>
> from the "if" statements and put them at the end of the sub.
I think most of all you need to understand what it means to manipulate
an array in place. You're still copying partial arrays all over the
place.
The key is to learn to swap elements of an array(ref). In Perl, the
idiom to swap the elements at $i and $k in an arrayref $l is
@$l[ $i, $k] = @$l[ $k, $i];
Instead of copying list elements back and forth, find a strategy
to move the elements into place by always swapping them with another.
Work on (parts of) the original list only, don't create "auxiliary"
arrays. Standard implementations of quicksort in other languages
also work like that.
Anno
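
A minimal sketch of what such an in-place version could look like, added
for illustration. It is not the poster's code: it takes the array
reference as a parameter and uses the last element as the pivot (a
Lomuto-style partition), both of which are choices made for this
example.

```perl
use strict;
use warnings;

# Sort the elements of arrayref $l between indices $left and $right,
# moving elements only by swapping them within the original array.
sub qs {
    my ( $l, $left, $right ) = @_;
    return if $left >= $right;

    my $pivot = $l->[$right];    # pivot value: the last element
    my $store = $left;           # next slot for an element <= pivot

    for my $i ( $left .. $right - 1 ) {
        if ( $l->[$i] <= $pivot ) {
            @$l[ $store, $i ] = @$l[ $i, $store ];    # the swap idiom
            $store++;
        }
    }

    # Put the pivot between the two partitions, then sort each side.
    @$l[ $store, $right ] = @$l[ $right, $store ];
    qs( $l, $left,      $store - 1 );
    qs( $l, $store + 1, $right );
    return;
}

my @numbers = ( 5, 3, 8, 1, 9, 2, 8 );
qs( \@numbers, 0, $#numbers );
print "@numbers\n";    # prints "1 2 3 5 8 8 9"
```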
------------------------------
Date: Fri, 04 Aug 2006 16:10:39 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Recursion
Message-Id: <g69r6zw6z40.fsf@CN1374059D0130.kendall.corp.akamai.com>
On 4 Aug 2006, koko_loko_0@yahoo.co.uk wrote:
> I made it with sending $left and $right to the subroutine and
> sorting the original array between these limits. You can take a look
> at the other post for the code.
OK. I see you tried to make it work, but it's incorrect (as pointed
out by Xho and Anno). I won't add to their answers, except to point
out that perhaps Quick Sort is a little ambitious, and you should try
implementing a Bubble Sort first. QS will always be there, but if you
keep trying to solve a hard problem without progressing to that level
naturally, you may become too frustrated. BS is a good start,
especially since it involves swapping entries in an array just like
QS, except of course the algorithm is a lot simpler.
> Is there still a need to send the array reference? Even in your
> example is it a benefit if you work with the reference and not with
> the original array?
If you have a choice, always write your functions so they can work
without awareness of the rest of your program. In the context of this
thread, write qs() so it gets all data from its parameters, including
the array it has to sort. This is the essence of reusable, modular
code (modules extend the idea, but it's still the driving force).
Ted
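
As an illustration of both suggestions, here is a minimal Bubble Sort
sketch that gets the array to sort from its parameter instead of from a
global. The function name and the sample data are made up for this
example.

```perl
use strict;
use warnings;

# Sort the array behind $list in place; nothing outside the
# parameter list is read or written.
sub bubble_sort {
    my ($list) = @_;
    for my $end ( reverse 1 .. $#$list ) {
        for my $i ( 0 .. $end - 1 ) {
            # Swap neighbours that are out of order.
            @$list[ $i, $i + 1 ] = @$list[ $i + 1, $i ]
                if $list->[$i] > $list->[ $i + 1 ];
        }
    }
    return $list;
}

my @first  = ( 4, 1, 3 );
my @second = ( 10, 7, 9, 8 );
bubble_sort( \@first );
bubble_sort( \@second );
print "@first | @second\n";    # prints "1 3 4 | 7 8 9 10"
```

Because the array arrives as a parameter, the same function can sort any
number of different arrays in one program.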
------------------------------
Date: Fri, 4 Aug 2006 21:39:46 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eb0bam$ubp$1@news.al.sw.ericsson.se>
<anno4000@radom.zrz.tu-berlin.de> wrote in message
news:4jhh4lF7mungU1@news.dfncis.de...
> I think most of all you need to understand what it means to manipulate
> an array in place. You're still copying partial arrays all over the
> place.
>
> The key is to learn to swap elements of an array(ref). In Perl, the
> idiom to swap the elements at $i and $k in an arrayref $l is
>
> @$l[ $i, $k] = @$l[ $k, $i];
>
> Instead of copying list elements back and forth, find a strategy
> to move the elements into place by always swapping them with another.
> Work on (parts of) the original list only, don't create "auxiliary"
> arrays. Standard implementations of quicksort in other languages
> also work like that.
>
Thanks a lot for the advice. So, creating temporary arrays by "push" and
then copying back is much slower than direct swapping, although there
will be more loops and conditions?
I'll re-write it to work with in-place swapping and compare it. When I
wrote the first version, it looked very nice, and the faster it gets the
uglier it is :)
kokolo
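
One way to answer that kind of question is to measure it. The sketch
below, added for illustration, uses the core Benchmark module to time a
single partitioning pass done both ways. The data size, pivot value and
iteration count are arbitrary assumptions, and the numbers it prints
will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my @data  = map { int rand 1000 } 1 .. 2000;    # hypothetical test data
my $pivot = 500;

# Partition by pushing into two auxiliary arrays and copying back.
sub partition_by_copy {
    my @work = @data;
    my ( @smaller, @bigger );
    for my $x (@work) {
        if ( $x <= $pivot ) { push @smaller, $x }
        else                { push @bigger,  $x }
    }
    @work = ( @smaller, @bigger );
    return scalar @smaller;
}

# Partition by swapping elements inside the one working array.
sub partition_in_place {
    my @work  = @data;
    my $store = 0;
    for my $i ( 0 .. $#work ) {
        if ( $work[$i] <= $pivot ) {
            @work[ $store, $i ] = @work[ $i, $store ];
            $store++;
        }
    }
    return $store;
}

# Run each version 200 times and print the timings.
my $results = timethese( 200, {
    copy_with_push => \&partition_by_copy,
    swap_in_place  => \&partition_in_place,
} );
```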
------------------------------
Date: Fri, 4 Aug 2006 21:42:23 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eb0bfi$uhn$1@news.al.sw.ericsson.se>
"Ted Zlatanov" <tzz@lifelogs.com> wrote in message
news:g69r6zw6z40.fsf@CN1374059D0130.kendall.corp.akamai.com...
> OK. I see you tried to make it work, but it's incorrect (as pointed
> out by Xho and Anno). I won't add to their answers, except to point
> out that perhaps Quick Sort is a little ambitious, and you should try
> implementing a Bubble Sort first. QS will always be there, but if you
> keep trying to solve a hard problem without progressing to that level
> naturally, you may become too frustrated. BS is a good start,
> especially since it involves swapping entries in an array just like
> QS, except of course the algorithm is a lot simpler.
Actually, I did write BS, Fast BS, BiBS, Fast BiBS, Insertion and Shell
sorts.
By saying that, I don't claim I did anything special; it was just
exercise for my weak Perl, and all these algorithms are well known and
already written in every language. I just wanted to do it by myself.
After that I wanted to write QS to see how much faster it is.
It is way faster, but not by as much as I expected, I believe mostly due
to my bad coding and approach.
But while improving it I'm learning where the bottlenecks in a program
are, and I have realized by now that unnecessary copying, which consumes
too much memory, is a major slowdown even though it keeps the code nice
and simple.
For instance, I'd like to know which part of my code is the slowest and
how it should be rewritten.
Thanks for the advice.
kokolo
------------------------------
Date: Fri, 4 Aug 2006 22:19:48 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eb0dln$v7j$1@news.al.sw.ericsson.se>
<anno4000@radom.zrz.tu-berlin.de> wrote in message
news:4jhh4lF7mungU1@news.dfncis.de...
>
> Instead of copying list elements back and forth, find a strategy
> to move the elements into place by always swapping them with another.
> Work on (parts of) the original list only, don't create "auxiliary"
> arrays. Standard implementations of quicksort in other languages
> also work like that.
>
> Anno
I found an implementation without sub-arrays:
http://www.thescripts.com/forum/thread49795.html
I changed it so that it works and tried it on 99999 elements.
It took 13 seconds, while my code takes 14.
How quick can it get at all?
kokolo
------------------------------
Date: 4 Aug 2006 12:13:28 -0700
From: "Wesley Bresson" <wbresson@gmail.com>
Subject: Re: Regex...HTML::Parser...Getting webpage data?
Message-Id: <1154718807.959738.244660@m73g2000cwd.googlegroups.com>
Jim Gibson wrote:
> In article <1154706298.589975.262540@s13g2000cwa.googlegroups.com>,
> Wesley Bresson <wbresson@gmail.com> wrote:
>
> > xhoster@gmail.com wrote:
>
> > > #!/usr/bin/perl
> > > use strict;
> > > use warnings;
> > > $/=undef; # same as the -0777 command line
> > > $_=<>; # slurp
> > > s/\s+/ /g;
> > > /2006 1oz Silver American Eagles.+?20 - 99.*?\$(\d{1,5}\.\d\d)/
> > > and print "$1\n";
> > > __END__
> > >
> >
> > Thanks, I can see that works now. Now, hang in with a newbie, but I'm
> > trying to understand why exactly your code works.
> >
> >
> > $/=undef --- inputs the whole file instead of one line by one line
> > correct ? Why is it needed ?
>
> To read the entire file at once and process it, after getting rid of
> newlines, since the text you are looking for may be on more than one
> line.
>
> >
> > $_=<> --not sure....is this what inputs the file off of the command
> > line ?
>
> The <> is the input operator and returns the result of a read-line
> operation. Since $/ is undef, your input file is treated as one big
> line, and the whole file ends up as a string in the $_ variable.
>
> >
> > s/\s+/ /g; --not sure...is this taking out the white spaces ? If so why
> > is it needed ?
>
> It is changing all occurrences of whitespace (\s) to a single space,
> concatenating any successive whitespace characters (\s+) into one space
> character. Since newlines (\n) are whitespace, this also removes all
> newlines from your string and you can use space characters in your
> regular expression.
Thanks people, I'm slowly getting it. Examples help a lot compared to
hard-to-read documentation. If anyone knows of any good regex
documentation or books listing all of the options and variables, let me
know; I'd appreciate it. I'm working on fully understanding how this
code works for this page first, and then I'll move on to some pages of
my own and see how I do there. Thanks for your help.
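
To make the three steps concrete, here is a self-contained sketch added
for illustration. The HTML fragment and the price in it are made up, and
the text is read from an in-memory filehandle rather than from a file
named on the command line; the regular expression is the one quoted
above.

```perl
use strict;
use warnings;

# Hypothetical page fragment; the text of interest spans several lines.
my $html = <<'END';
<td>2006 1oz Silver American
Eagles</td>
<td>20 - 99</td>   <td>$13.45</td>
END

open my $fh, '<', \$html or die "Could not open in-memory page: $!";

# With $/ undefined, a single read returns the whole input as one string.
my $page = do { local $/; <$fh> };
close $fh;

# Collapse every run of whitespace, newlines included, to one space.
$page =~ s/\s+/ /g;

# .+? and .*? are non-greedy, so the first price after "20 - 99" is taken.
my $price;
if ( $page =~ /2006 1oz Silver American Eagles.+?20 - 99.*?\$(\d{1,5}\.\d\d)/ ) {
    $price = $1;
    print "$price\n";    # prints "13.45"
}
```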
------------------------------
Date: 4 Aug 2006 12:20:55 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Regex...HTML::Parser...Getting webpage data?
Message-Id: <1154719255.683564.304640@p79g2000cwp.googlegroups.com>
Wesley Bresson wrote:
> Thanks people, I'm slowly getting it. Examples help a lot compared to
> hard-to-read documentation. If anyone knows of any good regex
> documentation or books listing all of the options and variables, let me
> know; I'd appreciate it. I'm working on fully understanding how this
> code works for this page first, and then I'll move on to some pages of
> my own and see how I do there. Thanks for your help.
perldoc perlretut
perldoc perlre
perldoc perlreref
(in that order)
If you don't like the built-in documentation for some reason, I suggest
Mastering Regular Expressions (
http://www.oreilly.com/catalog/regex3/index.html ). Note that it
covers more than just Perl regular expressions...
Paul Lalli
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 9563
***************************************