[31786] in Perl-Users-Digest
Perl-Users Digest, Issue: 3049 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 28 18:09:25 2010
Date: Wed, 28 Jul 2010 15:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 28 Jul 2010 Volume: 11 Number: 3049
Today's topics:
Re: 4D hash iteration <ben@morrow.me.uk>
Re: 4D hash iteration sln@netherlands.com
Re: 4D hash iteration <jurgenex@hotmail.com>
Re: 4D hash iteration <hjp-usenet2@hjp.at>
Re: Confusion about the smart matching operator <klaus03@gmail.com>
Re: FAQ 5.13 How can I open a filehandle to a string? <hjp-usenet2@hjp.at>
Re: FAQ 5.13 How can I open a filehandle to a string? <justin.1007@purestblue.com>
If Perl is compiled on a 32-bit system, and the system <usenet@davidfilmer.com>
Re: Speed of reading some MB of data using qx(...) <w.c.humann@arcor.de>
Re: Speed of reading some MB of data using qx(...) <hjp-usenet2@hjp.at>
Re: XML::Simple - Processing Query <janedunnie@gmail.com>
Re: XML::Simple - Processing Query <ben@morrow.me.uk>
Re: XML::Simple - Processing Query <helius@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 28 Jul 2010 11:30:03 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: 4D hash iteration
Message-Id: <bed5i7-vn1.ln1@osiris.mauzo.dyndns.org>
Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
> On 2010-07-28 00:36, Jürgen Exner <jurgenex@hotmail.com> wrote:
> > "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
> >>On 2010-07-27 14:33, Jürgen Exner <jurgenex@hotmail.com> wrote:
> >>> Michael <a@a.com> wrote:
> >>>>foreach $a(...)
> >>>>{
> >>>> foreach $b(...)
> >>>> {
> [...]
> >>>> }
> >>>>}
> >>>>
> >>>>How to do this? What should be in braces (...)?
> >>>
> >>> Maybe something trivial like
> >>>
> >>> keys %a
> >>> keys %b
> >>> keys %c
> >>> keys %d
> >>
> >>Maybe. But where do %a, %b, %c and %d come from?
> >
> > Maybe from
> > %a = %hash
> > %b = $a{$a}
> > %c = $b{$b}
> > %d=$c{$c}
>
> Make that
>
> %a = %hash
> %b = %{ $a{$a} }
> %c = %{ $b{$b} }
> %d = %{ $c{$c} }
>
> and it will actually work.
For some value of 'work', depending on what the for loop does in the
end. Much better would be
    for my $a (keys %hash) {
        for my $b (keys %{$hash{$a}}) {
but that's really ugly at the bottom; better still would be
    while (my ($a, $ah) = each %hash) {
        while (my ($b, $bh) = each %$ah) {
Just use a module.
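A minimal sketch of the nested-keys approach carried through all four levels. The sample data and the name leaves_4d are made up for illustration, and the loop variables are called $k1..$k4 rather than $a and $b, since those two names are also used by sort.

```perl
use strict;
use warnings;

# Hypothetical sample data: a four-level hash of hashes.
my %hash = (
    x => { y => { z => { p => 1, q => 2 } } },
    u => { v => { w => { r => 3 } } },
);

# Collect [key1, key2, key3, key4, value] for every leaf.
# Keys are sorted only to make the order repeatable.
sub leaves_4d {
    my ($h) = @_;
    my @rows;
    for my $k1 (sort keys %$h) {
        my $h1 = $h->{$k1};
        for my $k2 (sort keys %$h1) {
            my $h2 = $h1->{$k2};
            for my $k3 (sort keys %$h2) {
                my $h3 = $h2->{$k3};
                for my $k4 (sort keys %$h3) {
                    push @rows, [$k1, $k2, $k3, $k4, $h3->{$k4}];
                }
            }
        }
    }
    return @rows;
}

print join(' ', @$_), "\n" for leaves_4d(\%hash);
```

Holding each inner hash reference in its own variable ($h1, $h2, $h3) avoids repeating the full $hash{...}{...}{...} lookup at the bottom of the nest.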
Ben
------------------------------
Date: Wed, 28 Jul 2010 06:19:44 -0700
From: sln@netherlands.com
Subject: Re: 4D hash iteration
Message-Id: <bfb056lb0r18urd16sdebgd3oib0slkbmr@4ax.com>
On Tue, 27 Jul 2010 22:30:23 -0700, "John W. Krahn" <jwkrahn@example.com> wrote:
>Michael wrote:
>> I need a construction like this:
>>
>> foreach $a(...)
>> {
>> foreach $b(...)
>> {
>> foreach $c(...)
>> {
>> foreach $d(...)
>> {
>> $myval=$hash{$a}{$b}{$c}{$d};
>> ...
>> }
>> }
>> }
>> }
>>
>> How to do this? What should be in braces (...)?
>
>foreach my $a ( values %hash ) {
            ^
and if $a isn't a hash reference?
> foreach my $b ( values %$a ) {
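One way to guard against that, sketched under the assumption that anything which is not a hash reference should be treated as a leaf. The sub name walk and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Walk a nested hash of unknown depth; descend only into hash references.
sub walk {
    my ($node, $visit, @path) = @_;
    if (ref($node) eq 'HASH') {
        walk($node->{$_}, $visit, @path, $_) for sort keys %$node;
    }
    else {
        $visit->(\@path, $node);   # leaf: anything that is not a hash ref
    }
    return;
}

# Hypothetical data: one branch is four levels deep, one is a plain string.
my %hash = (a => { b => { c => { d => 42 } } }, e => 'not a hash');

my @seen;
walk(\%hash, sub { my ($path, $val) = @_; push @seen, join('/', @$path) . "=$val" });
print "$_\n" for @seen;
```

With the ref() check, a short branch is reported as a leaf instead of dying under "use strict" with a "Can't use string as a HASH ref" error.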
-sln
------------------------------
Date: Wed, 28 Jul 2010 06:34:50 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: 4D hash iteration
Message-Id: <3cc05693139s1eha3juu2mr5ke0kclvmu1@4ax.com>
"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>Please make some effort to ensure that your postings are correct and
>include all the relevant information.
Why? After all I can be certain that _YOU_ will never fail to correct
any sloppiness on my part.
jue
------------------------------
Date: Wed, 28 Jul 2010 17:58:20 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: 4D hash iteration
Message-Id: <slrni50kss.sc1.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-28 13:34, Jürgen Exner <jurgenex@hotmail.com> wrote:
> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>Please make some effort to ensure that your postings are correct and
>>include all the relevant information.
>
> Why?
If the answer to that isn't obvious ...
> After all I can be certain that _YOU_ will never fail to correct any
> sloppiness on my part.
Don't be too sure. I do that only in some cases and the frequency is
decreasing.
hp
------------------------------
Date: Wed, 28 Jul 2010 05:15:09 -0700 (PDT)
From: Klaus <klaus03@gmail.com>
Subject: Re: Confusion about the smart matching operator
Message-Id: <9ae58e82-337d-4362-a856-49e9506dfbe3@w12g2000yqj.googlegroups.com>
On 28 Jul, 10:48, "Peter J. Holzer" <hjp-usen...@hjp.at> wrote:
> On 2010-07-28 07:03, Klaus <klau...@gmail.com> wrote:
>
> > On 24 Jul, 08:38, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> >> On 2010-07-23, jl_p...@hotmail.com <jl_p...@hotmail.com> wrote:
>
> >> >    I find this a bit counter-intuitive.
>
> >> There is nothing "a bit counter-intuitive" about the smart matching
> >> operation.  It is just *absolutely useless*, since there is no
> >> humanly-possible way to predict what it would do.
>
> > I perfectly agree.
>
> I haven't tried to use it in anger yet (as mentioned in another posting,
> I have still too many machines running 5.8.x), but from reading the docs
> I tend to agree: It's way too complicated and I don't think I can
> remember that stuff. It is of course possible that those rules exactly
> match my intuition, but somehow I doubt it.
>
> > Another feature in smart matching that is counter-intuitive/useless
> > where there is no humanly possible way to predict what it would do is:
>
> > the rule that if the lefthand side of a smartmatch is a number and the
> > righthand side is a string that *looks like a number*, then that
> > string is treated like a number.
>
> > First of all, it is impossible in Perl 5 (due to dualvars) to see
> > whether or not a variable contains a number or not.
>
> Right.
>
> > Secondly, the rule whether or not a string looks like a number is not
> > straightforward:
>
> What's not straightforward about that? Except maybe for leading and
> trailing whitespace the results are what I expected.
Agreed, the rule for whether or not a string looks like a number is
straightforward as such.
However, what trips me up with smartmatching is the combination of a
number on the lefthand side with what looks like a number on the
righthand side.
Here is my (admittedly contrived) example:
******************************
use strict;
use warnings;
use 5.010;
my $val = '  3';
checkvalue($val);
my $formatted = sprintf '%6.2f', $val;
checkvalue($val);
sub checkvalue {
    if    ($_[0] ~~ '3')    { say "there is no space"; }
    elsif ($_[0] ~~ ' 3')   { say "there is one space"; }
    elsif ($_[0] ~~ '  3')  { say "there are two spaces"; }
    elsif ($_[0] ~~ '   3') { say "there are three spaces"; }
    else                    { say "I don't know what to say..."; }
}
******************************
Here is the output:
******************************
there are two spaces
there is no space
******************************
(please note that for exactly the same subroutine call
checkvalue($val); we get different output, depending on whether $val
has been part of an sprintf-call or not)
The thing that annoys me here is that each time I use a smartmatch
with a string literal on the righthand side, I have to think about
whether or not the string looks like a number (in which case I had
better use the old "eq" instead of "~~").
Which leads me to the conclusion that I had better use the old "eq"
in all cases. That includes cases like
$val ~~ ['3', ' 3', '  3', '   3'], which would be better written as
$val eq '3' or $val eq ' 3' or $val eq '  3' or $val eq '   3'.
Final verdict: Smartmatching works as expected in 99.9999% of all
cases. If you are concerned about the remaining 0.0001% of the cases
(such as $val ~~ ['3', ' 3', '  3', '   3']), then better stick with
the old "eq" and let others debug their own smartmatching code.
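For the list case, a small sketch of the "eq" alternative that avoids writing out every comparison by hand. The helper name str_in is made up for this example; it relies only on grep and eq.

```perl
use strict;
use warnings;

# True if $val is string-equal to any of @candidates.
# Uses eq only, so no comparison is ever done numerically.
sub str_in {
    my ($val, @candidates) = @_;
    return scalar grep { $val eq $_ } @candidates;
}

my $val = '  3';
my $formatted = sprintf '%6.2f', $val;   # uses $val in numeric context

my $in_list = str_in($val, '3', ' 3', '  3', '   3');
my $is_bare = str_in($val, '3');
printf "in list: %s, equals '3': %s\n",
    ($in_list ? 'yes' : 'no'), ($is_bare ? 'yes' : 'no');
```

Because eq always compares the string values, the result does not depend on whether $val has been used as a number beforehand.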
--
Klaus
------------------------------
Date: Wed, 28 Jul 2010 13:53:25 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: FAQ 5.13 How can I open a filehandle to a string?
Message-Id: <slrni506hn.m5c.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-28 10:04, Justin C <justin.1007@purestblue.com> wrote:
> On 2010-07-27, PerlFAQ Server <brian@theperlreview.com> wrote:
>> 5.13: How can I open a filehandle to a string?
>>
>> (contributed by Peter J. Holzer, hjp-usenet2@hjp.at)
>>
>> Since Perl 5.8.0 a file handle referring to a string can be created by
>> calling open with a reference to that string instead of the filename.
>> This file handle can then be used to read from or write to the string:
>>
>> open(my $fh, '>', \$string) or die "Could not open string for writing";
>> print $fh "foo\n";
>> print $fh "bar\n"; # $string now contains "foo\nbar\n"
>>
>> open(my $fh, '<', \$string) or die "Could not open string for reading";
>> my $x = <$fh>; # $x now contains "foo\n"
>
> I can see how it works, but I wonder in what circumstances one might
> want to do this?
Sometimes you have a module which is intended to read from a file. But
you have the data already in memory. Of course you could write it to a
temporary file and then invoke the module to read it back. But that's
not very elegant - much better if you can tell it to read from a scalar.
Same thing the other way: You have a module which can only write to a
file but you need the output in a variable.
I regularly use this in test scripts.
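A minimal sketch of both directions. count_lines and write_report are hypothetical stand-ins for module routines that only accept a filehandle; they are not from any real module.

```perl
use strict;
use warnings;

# Stand-in for a module routine that only knows how to read from a handle.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
    }
    return $n;
}

# Stand-in for a module routine that only knows how to write to a handle.
sub write_report {
    my ($fh, @items) = @_;
    print {$fh} "item: $_\n" for @items;
    return;
}

# Data already in memory is read through an ordinary filehandle.
my $input = "foo\nbar\nbaz\n";
open(my $in, '<', \$input) or die "Could not open string for reading: $!";
my $lines = count_lines($in);
close $in;

# Output meant for a file is captured in a scalar instead.
my $output = '';
open(my $out, '>', \$output) or die "Could not open string for writing: $!";
write_report($out, qw(a b));
close $out;

print "lines=$lines\n$output";
```

In a test script the captured $output can then be compared directly against the expected text, without any temporary files.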
hp
------------------------------
Date: Wed, 28 Jul 2010 14:23:23 +0100
From: Justin C <justin.1007@purestblue.com>
Subject: Re: FAQ 5.13 How can I open a filehandle to a string?
Message-Id: <bjn5i7-qd5.ln1@zem.masonsmusic.co.uk>
On 2010-07-28, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2010-07-28 10:04, Justin C <justin.1007@purestblue.com> wrote:
>> On 2010-07-27, PerlFAQ Server <brian@theperlreview.com> wrote:
>>> 5.13: How can I open a filehandle to a string?
>>>
>>> (contributed by Peter J. Holzer, hjp-usenet2@hjp.at)
>>>
>>> Since Perl 5.8.0 a file handle referring to a string can be created by
>>> calling open with a reference to that string instead of the filename.
>>> This file handle can then be used to read from or write to the string:
>>>
>>> open(my $fh, '>', \$string) or die "Could not open string for writing";
>>> print $fh "foo\n";
>>> print $fh "bar\n"; # $string now contains "foo\nbar\n"
>>>
>>> open(my $fh, '<', \$string) or die "Could not open string for reading";
>>> my $x = <$fh>; # $x now contains "foo\n"
>>
>> I can see how it works, but I wonder in what circumstances one might
>> want to do this?
>
> Sometimes you have a module which is intended to read from a file. But
> you have the data already in memory. Of course you could write it to a
> temporary file and then invoke the module to read it back. But that's
> not very elegant - much better if you can tell it to read from a scalar.
> Same thing the other way: You have a module which can only write to a
> file but you need the output in a variable.
>
> I regularly use this in test scripts.
Thank you for the explanation. I suppose I've either never used a module
like that or I've always gone the long way round. I will try to keep
this in mind should I encounter such a problem.
Justin.
--
Justin C, by the sea.
------------------------------
Date: Wed, 28 Jul 2010 15:04:48 -0700 (PDT)
From: David Filmer <usenet@davidfilmer.com>
Subject: If Perl is compiled on a 32-bit system, and the system is upgraded to 64-bit...
Message-Id: <1e2036de-8a9b-40ac-9268-5da247f093fd@y32g2000prc.googlegroups.com>
If Perl is compiled on a 32-bit system, and the system is later
upgraded to 64-bit hardware and O/S, would Perl programs then be able
to use the full amount of memory that a 64-bit system would allow?
Or would I need to re-compile Perl in the 64-bit environment to access
the larger memory?
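For reference, a sketch of how to inspect the way a given perl binary was built, using the core Config module. The sub name build_summary is made up; the keys archname, ptrsize, ivsize and use64bitall are standard Config entries. A ptrsize of 4 indicates a binary built with 32-bit pointers.

```perl
use strict;
use warnings;
use Config;

# Summarise the build settings of the running perl binary.
sub build_summary {
    return {
        archname    => $Config{archname},
        ptrsize     => $Config{ptrsize},               # bytes per pointer
        ivsize      => $Config{ivsize},                # bytes per integer value
        use64bitall => $Config{use64bitall} || 'undef',
    };
}

my $info = build_summary();
printf "%-12s %s\n", $_, $info->{$_} for sort keys %$info;
```

The same values are also shown by "perl -V" on the command line.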
Thanks!
------------------------------
Date: Wed, 28 Jul 2010 08:09:28 -0700 (PDT)
From: Wolfram Humann <w.c.humann@arcor.de>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <8b936131-0010-4482-b243-cf0a42d0f8e0@x21g2000yqa.googlegroups.com>
On Jul 28, 2:50 am, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> On 2010-07-27, Wolfram Humann <w.c.hum...@arcor.de> wrote:
>
> > sv.c). Perl_sv_grow then needs to decide if the string's memory is
> > already sufficient or really needs to grow. In the latter case,
> > safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
> > interesting point is: how much memory does it request? The answer is:
>
> > newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
>
> > I.e. it requests 10 times as much memory as is required for the
> > current append operation. So when I loop 10000 times and each time
> > append 100 chars to an initial string size of 10 million, the memory
> > grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
> > it ends at 11.000e6.
>
> Good l*rd!
>
> The current algorithm is optimized to work in tandem with "my"
> malloc(), which would round up to a certain geometric progression
> anyway.  So if one uses different malloc()s, one should better use
>
>   newlen += (newlen >> 4) + 10; /* avoid copy each time */
I finally managed to compile my own win32 perl. (Actually it was quite
easy once I stopped making mistakes so stupid I do not dare to
talk about them...)
Now I could modify Perl_sv_grow() and insert debugging prints and I
found good and bad news.
The bad news: Looks like I was *overly optimistic* (LOL!) concerning
the efficiency of the current string memory allocation on win32. The
"newlen += 10 * (newlen - SvCUR(sv))" line is only executed if
SvOOK(sv) -- i.e. in most cases it is *not* executed. Therefore win32
system realloc is not called every tenth string-append operation but
*every* time something gets appended to a string.
The good news: A single additional line of code makes win32 perl
100...1000 times faster!
(for code that appends to strings very frequently)
I went with Ilya's proposal but inserted the line a little further
down, just after
if (newlen > SvLEN(sv)) { /* need more room? */
So now we have:
if (newlen > SvLEN(sv)) { /* need more room? */
    newlen += (newlen >> 2) + 10;
#ifndef Perl_safesysmalloc_size
    newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
    if (SvLEN(sv) && s) {
        s = (char*)saferealloc(s, newlen);
    }
The remaining question is by what ratio a string's memory should grow.
I tried several values from (newlen >> 0) to (newlen >> 6) for the
best compromise between execution time and memory usage and my
personal favorite is (newlen >> 2). What do others here think? At the
end of this post I will attach the results for my benchmark script
starting with Cygwin Perl followed by several versions of (newlen >>
x) and finally the unpatched Strawberry Perl. These reports now also
include memory footprint info (courtesy of pslist from the
Sysinternals suite). I also went back to my original task of reading a
12 MB postscript file using qx(cat ...) and in some cases I also
report times for that -- here Cygwin (70 ms) still beats my modified
perl (210 ms), but that's still waaaaay better than the original 18000
ms :-)
I will also report to p5p.
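To illustrate why the extra line matters, here is a rough simulation in Perl (not the actual sv.c logic). It only counts how often the capacity has to be raised and ignores PERL_STRLEN_ROUNDUP and allocator behaviour; count_reallocs and the policy names are made up for this sketch.

```perl
use strict;
use warnings;

# Count how often a buffer must be reallocated while appending $count chunks
# of $chunk bytes to a string of $start bytes. $grow maps the minimum
# required length to the length actually requested.
sub count_reallocs {
    my ($start, $chunk, $count, $grow) = @_;
    my ($cur, $cap, $reallocs) = ($start, $start, 0);
    for (1 .. $count) {
        my $newlen = $cur + $chunk;
        if ($newlen > $cap) {              # need more room?
            $cap = $grow->($newlen);
            $reallocs++;
        }
        $cur = $newlen;
    }
    return ($reallocs, $cap);
}

my $exact   = sub { $_[0] };                       # request exactly what is needed
my $quarter = sub { $_[0] + ($_[0] >> 2) + 10 };   # newlen += (newlen >> 2) + 10

# Same shape as the benchmark line "1E7 chars + 1E4 x 1E2 chars".
for my $policy ([exact => $exact], [quarter => $quarter]) {
    my ($name, $grow) = @$policy;
    my ($n, $cap) = count_reallocs(10_000_000, 100, 10_000, $grow);
    printf "%-8s reallocs=%d final capacity=%d\n", $name, $n, $cap;
}
```

In this model the exact-fit policy reallocates on every one of the 10000 appends, while the quarter-growth policy reallocates once, at the cost of a larger final capacity.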
Wolfram
###########################################################
c:\cygwin\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 1.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms
1E7 chars + 1E5 x 1E1 chars: 12.2 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.8 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.9 ms
Private MB: 326.5
Peak Private MB: 326.5
--------------
qx(cat postscriptfile.ps): 68.7 ms
Private MB: 38.5
Peak Private MB: 38.5
###########################################################
newlen += (newlen >> 0) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 2.2 ms
1E6 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E5 x 1E1 chars: 10.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.0 ms
Private MB: 378.3
Peak Private MB: 418.0
--------------
qx(cat postscriptfile.ps): 181.2 ms
Private MB: 25.1
Peak Private MB: 40.3
###########################################################
newlen += (newlen >> 1) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 2.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms
1E7 chars + 1E5 x 1E1 chars: 9.6 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms
1E7 chars + 1E3 x 1E3 chars: 0.7 ms
1E7 chars + 1E2 x 1E4 chars: 0.7 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.4 ms
Private MB: 290.2
Peak Private MB: 319.5
###########################################################
newlen += (newlen >> 2) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 9.2 ms
1E6 chars + 1E4 x 1E2 chars: 5.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms
1E7 chars + 1E5 x 1E1 chars: 9.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.4 ms
Private MB: 244.9
Peak Private MB: 270.1
--------------
qx(cat postscriptfile.ps): 209.8 ms
Private MB: 16.2
Peak Private MB: 29.0
###########################################################
newlen += (newlen >> 3) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 12.1 ms
1E6 chars + 1E4 x 1E2 chars: 6.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E5 x 1E1 chars: 10.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.5 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.6 ms
Private MB: 221.9
Peak Private MB: 244.3
###########################################################
newlen += (newlen >> 4) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 17.0 ms
1E6 chars + 1E4 x 1E2 chars: 13.8 ms
1E7 chars + 1E4 x 1E2 chars: 11.2 ms
1E7 chars + 1E5 x 1E1 chars: 19.4 ms
1E7 chars + 1E4 x 1E2 chars: 10.1 ms
1E7 chars + 1E3 x 1E3 chars: 10.9 ms
1E7 chars + 1E2 x 1E4 chars: 11.1 ms
1E7 chars + 1E1 x 1E5 chars: 11.0 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.3 ms
Private MB: 219.4
Peak Private MB: 233.8
--------------
qx(cat postscriptfile.ps): 312.0 ms
Private MB: 14.0
Peak Private MB: 25.8
###########################################################
newlen += (newlen >> 6) + 10;
C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 57.7 ms
1E6 chars + 1E4 x 1E2 chars: 59.8 ms
1E7 chars + 1E4 x 1E2 chars: 67.9 ms
1E7 chars + 1E5 x 1E1 chars: 69.4 ms
1E7 chars + 1E4 x 1E2 chars: 71.6 ms
1E7 chars + 1E3 x 1E3 chars: 69.6 ms
1E7 chars + 1E2 x 1E4 chars: 64.8 ms
1E7 chars + 1E1 x 1E5 chars: 53.8 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.7 ms
Private MB: 219.8
Peak Private MB: 230.0
###########################################################
unpatched Strawberry Perl
c:\strawberry\perl\bin\perl d:\exe\LongStrings.pl
1E5 chars + 1E4 x 1E2 chars: 96.2 ms
1E6 chars + 1E4 x 1E2 chars: 325.7 ms
1E7 chars + 1E4 x 1E2 chars: 2655.9 ms
1E7 chars + 1E5 x 1E1 chars: 2687.3 ms
1E7 chars + 1E4 x 1E2 chars: 2687.4 ms
1E7 chars + 1E3 x 1E3 chars: 2656.1 ms
1E7 chars + 1E2 x 1E4 chars: 1093.6 ms
1E7 chars + 1E1 x 1E5 chars: 108.3 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.1 ms
Private MB: 200.4
Peak Private MB: 210.2
--------------
qx(cat postscriptfile.ps): 18187.5 ms
Private MB: 13.2
Peak Private MB: 24.9
------------------------------
Date: Wed, 28 Jul 2010 21:38:07 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <slrni511ov.std.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-28 15:09, Wolfram Humann <w.c.humann@arcor.de> wrote:
> I went with Ilya's proposal but inserted the line a little further
> down, just after
> if (newlen > SvLEN(sv)) { /* need more room? */
>
> So now we have:
> if (newlen > SvLEN(sv)) { /* need more room? */
> newlen += (newlen >> 2) + 10;
> #ifndef Perl_safesysmalloc_size
> newlen = PERL_STRLEN_ROUNDUP(newlen);
> #endif
> if (SvLEN(sv) && s) {
> s = (char*)saferealloc(s, newlen);
> }
>
> The remaining question is by what ratio a string's memory should grow.
> I tried several values from (newlen >> 0) to (newlen >> 6) for the
> best compromise between execution time and memory usage and my
> personal favorite is (newlen >> 2). What do others here think?
That sounds about right. I've used factors between 1.2 and 1.5 in the
past for similar problems. I suggest you base the growth on the old
size, though, something like:
if (newlen > SvLEN(sv)) { /* need more room? */
    size_t min = SvLEN(sv) * 5/4 + 10;
    if (newlen < min) newlen = min;
    ...
This gives you the same growth pattern if the increments are small, but
it doesn't allocate extra memory if you append a large chunk.
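The difference between the two policies can be sketched in Perl with plain integers standing in for SvLEN(sv) and newlen. The sub names and the example sizes are made up for illustration.

```perl
use strict;
use warnings;

# Capacity requested when $needed bytes are required and $old bytes are
# currently allocated.

# Policy A (from the patch): pad the required length by a quarter.
sub grow_from_new {
    my ($old, $needed) = @_;
    return $needed + ($needed >> 2) + 10;
}

# Policy B (suggested above): at least 5/4 of the old size plus 10,
# otherwise exactly what is needed.
sub grow_from_old {
    my ($old, $needed) = @_;
    my $min = int($old * 5 / 4) + 10;
    return $needed < $min ? $min : $needed;
}

# First case: a small append to a 1000-byte buffer.
# Second case: one large chunk appended to the same buffer.
for my $case ([1000, 1010], [1000, 1_000_000]) {
    my ($old, $needed) = @$case;
    printf "old=%d needed=%d  from_new=%d  from_old=%d\n",
        $old, $needed, grow_from_new($old, $needed), grow_from_old($old, $needed);
}
```

For the small append both policies request roughly a quarter more than the old size; for the large chunk only policy A adds a quarter of the new length on top.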
hp
------------------------------
Date: Wed, 28 Jul 2010 04:06:59 -0700 (PDT)
From: "Jane D." <janedunnie@gmail.com>
Subject: Re: XML::Simple - Processing Query
Message-Id: <ea981241-158c-4bcb-bce0-d0a0da1015a0@5g2000yqz.googlegroups.com>
Okay, here's the first bit of the Data Dumper result, content edited
for brevity:
$VAR1 = {
'count' => '10',
'story' => {
'16647039' => {
'link' => 'http://www...somelink.html',
'topic' => {
'Health' => {
'short_name' =>
'health'
}
},
'status' => 'upcoming',
'submit_date' => '1256158570',
'container' => {
'Lifestyle' => {
'short_name' => 'lifestyle'
}
},
'comments' => '0',
'description' => [
'Some description
in here'
],
'diggs' => '1',
'media' => 'news',
'href' => 'http://digg.com/
restofurl',
'user' => {
'diggusername' => {
'icon'
=> '',
'registered' => '1255514015',
'profileviews' => '105'
}
},
'shorturl' => [
{
'view_count' =>
'0',
'short_url' =>
'http://digg.com/restofurl'
}
],
'title' => [
'Some title of some
article'
]
},
'22914259' => {
'link' => 'http://www...next-
link.html',
'topic' => {
'World News' => {
'short_name' => 'world_news'
}
},
etc,
etc
Hope that helps, and that somebody is able to assist. Much
appreciated.
------------------------------
Date: Wed, 28 Jul 2010 14:04:17 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: XML::Simple - Processing Query
Message-Id: <hfm5i7-m03.ln1@osiris.mauzo.dyndns.org>
Quoth "Jane D." <janedunnie@gmail.com>:
> Okay, here's the first bit of the Data Dumper result, content edited
> for brevity:
Please wrap your lines to 76 characters, and reformat the data so it's
comprehensible when wrapped like that. I don't know what you did to
cause the key indentation to get so confused; whatever it was, please
don't do it again. (DDumper output is not, by default, formatted very
well. I prefer Data::Dump for this reason.)
> $VAR1 = {
>
> 'count' => '10',
> 'story' => {
>
> '16647039' => {
> 'link' => 'http://www...somelink.html',
> 'topic' => {
> 'Health' => {
> 'short_name' =>
> 'health'
> }
> },
You need to re-read the section on 'KeyAttr' in XML::Simple. Your
<story>s have been flattened into a hashref, meaning you lose the order
but can look up stories by their 'id' attribute. If you just want to
process the stories in the order they are in the file, you want to
specify KeyAttr => [] (and probably ForceArray => 1 as well), in which
case you will get a structure like
    $data = {
        count => 10,
        story => [
            {
                id   => '16647039',
                link => '...',
                ...,
            },
            ...,
        ],
        ...,
    }
which you can iterate over with
for (@{$data->{story}}) {
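A minimal sketch of that loop over hand-written sample data, so it runs without XML::Simple. The links, the titles and the name story_lines are hypothetical; the title is shown as a one-element array on the assumption that ForceArray => 1 is in effect.

```perl
use strict;
use warnings;

# Hypothetical data in the shape described above: 'story' is an array
# reference of hash references, in document order.
my $data = {
    count => 10,
    story => [
        { id => '16647039', link => 'http://example.com/a.html', title => ['First story']  },
        { id => '22914259', link => 'http://example.com/b.html', title => ['Second story'] },
    ],
};

# Return "id: title" for each story, preserving file order.
sub story_lines {
    my ($d) = @_;
    my @lines;
    for my $story (@{ $d->{story} }) {
        push @lines, "$story->{id}: $story->{title}[0]";
    }
    return @lines;
}

print "$_\n" for story_lines($data);
```

Since the stories are in an array, the former hash key is now just another field ('id') inside each story.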
Ben
------------------------------
Date: Wed, 28 Jul 2010 07:53:50 -0700 (PDT)
From: Jasper2000 <helius@gmail.com>
Subject: Re: XML::Simple - Processing Query
Message-Id: <c1e6bd7c-2d11-4d21-9663-6988d90e1571@w12g2000yqj.googlegroups.com>
Thanks for that Ben, much appreciated. I can play with that now I have
an idea what's going on.
Thanks again!
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3049
***************************************