[31784] in Perl-Users-Digest
Perl-Users Digest, Issue: 3047 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Jul 27 18:09:23 2010
Date: Tue, 27 Jul 2010 15:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Tue, 27 Jul 2010 Volume: 11 Number: 3047
Today's topics:
Re: 4D hash iteration <ben@morrow.me.uk>
Re: 4D hash iteration <tadmc@seesig.invalid>
Re: 4D hash iteration <jurgenex@hotmail.com>
Re: 4D hash iteration <a@a.com>
Re: 4D hash iteration <hjp-usenet2@hjp.at>
Re: 4D hash iteration sln@netherlands.com
Re: Confusion about the smart matching operator <jl_post@hotmail.com>
Re: Speed of reading some MB of data using qx(...) <w.c.humann@arcor.de>
Re: Speed of reading some MB of data using qx(...) <ben@morrow.me.uk>
Re: Speed of reading some MB of data using qx(...) <w.c.humann@arcor.de>
Re: Speed of reading some MB of data using qx(...) <ben@morrow.me.uk>
Re: Speed of reading some MB of data using qx(...) <jl_post@hotmail.com>
Re: Speed of reading some MB of data using qx(...) <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 27 Jul 2010 12:45:20 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: 4D hash iteration
Message-Id: <gft2i7-5t02.ln1@osiris.mauzo.dyndns.org>
Quoth Michael <a@a.com>:
> I need a construction like this:
>
> foreach $a(...)
> {
> foreach $b(...)
> {
> foreach $c(...)
> {
> foreach $d(...)
> {
> $myval=$hash{$a}{$b}{$c}{$d};
This is messy, and gets worse as the hash gets deeper. I would use
something like Data::Walk instead.
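A minimal sketch of the generic-walk idea, written as a plain recursive subroutine rather than with Data::Walk itself (that module's API is not shown in this thread). The helper name walk_hash and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every non-hash leaf of a nested hash and call
# $visit->(\@path, $value) with the list of keys that leads to the leaf.
sub walk_hash {
    my ($node, $visit, @path) = @_;
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        if (ref $value eq 'HASH') {
            walk_hash($value, $visit, @path, $key);   # descend one level
        }
        else {
            $visit->([@path, $key], $value);          # reached a leaf
        }
    }
    return;
}

# Made-up sample data, four levels deep.
my %hash = (
    x => { p => { m => { one => 1, two => 2 } } },
    y => { q => { n => { three => 3 } } },
);

my @seen;
walk_hash(\%hash, sub {
    my ($path, $value) = @_;
    push @seen, join('/', @$path) . " = $value";
});
print "$_\n" for @seen;
```

The same subroutine works unchanged if the hash gets deeper, which is the point of avoiding one hand-written loop per level.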
Ben
------------------------------
Date: Tue, 27 Jul 2010 07:43:13 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: 4D hash iteration
Message-Id: <slrni4tkv0.rdh.tadmc@tadbox.sbcglobal.net>
Michael <a@a.com> wrote:
Apply "Use Rule 1" from:
perldoc perlreftut
> I need a construction like this:
>
> foreach $a(...)
foreach my $a (keys %hash)
> {
> foreach $b(...)
foreach my $b (keys %{$hash{$a}})
> {
> foreach $c(...)
foreach my $c (keys %{$hash{$a}{$b}})
> {
> foreach $d(...)
foreach my $d (keys %{$hash{$a}{$b}{$c}})
> {
> $myval=$hash{$a}{$b}{$c}{$d};
> ...
> }
> }
> }
> }
>
> How to do this? What should be in braces (...)?
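Put together, the four loops above might look like the following sketch. The sample hash is made up, and $k1..$k4 stand in for the thread's $a..$d (an editorial choice, since $a and $b are also the variables sort() uses).

```perl
use strict;
use warnings;

# Made-up four-level hash for illustration.
my %hash = (
    north => { 2010 => { jul => { mon => 3, tue => 5 } } },
    south => { 2010 => { jul => { mon => 7 } } },
);

my $total = 0;
# Each level dereferences the hash reference stored one level up,
# following "Use Rule 1" from perlreftut: %{ ... } around the reference.
for my $k1 (sort keys %hash) {
    for my $k2 (sort keys %{ $hash{$k1} }) {
        for my $k3 (sort keys %{ $hash{$k1}{$k2} }) {
            for my $k4 (sort keys %{ $hash{$k1}{$k2}{$k3} }) {
                my $myval = $hash{$k1}{$k2}{$k3}{$k4};
                print "$k1 $k2 $k3 $k4 => $myval\n";
                $total += $myval;
            }
        }
    }
}
print "total: $total\n";
```

This assumes every value above the last level really is a hash reference; under "use strict", a plain string at an intermediate level would make the dereference die.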
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
------------------------------
Date: Tue, 27 Jul 2010 07:33:13 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: 4D hash iteration
Message-Id: <ldrt46lip3fp9vclehbl9846hdnj9httq8@4ax.com>
Michael <a@a.com> wrote:
>I need a construction like this:
>
>foreach $a(...)
>{
> foreach $b(...)
> {
> foreach $c(...)
> {
> foreach $d(...)
> {
> $myval=$hash{$a}{$b}{$c}{$d};
> ...
> }
> }
> }
>}
>
>How to do this? What should be in braces (...)?
Maybe something trivial like
keys %a
keys %b
keys %c
keys %d
jue
------------------------------
Date: Tue, 27 Jul 2010 21:58:20 +0700
From: michael20545 <a@a.com>
Subject: Re: 4D hash iteration
Message-Id: <i2ms6j$1a53$1@adenine.netfront.net>
27.07.2010 19:43, Tad McClellan wrote:
> Michael<a@a.com> wrote:
>
>
> Apply "Use Rule 1" from:
>
> perldoc perlreftut
>
>
>> I need a construction like this:
>>
>> foreach $a(...)
>
> foreach my $a (keys %hash)
>
>> {
>> foreach $b(...)
>
> foreach my $b (keys %{$hash{$a}})
>
>> {
>> foreach $c(...)
>
> foreach my $c (keys %{$hash{$a}{$b}})
>
>> {
>> foreach $d(...)
>
> foreach my $d (keys %{$hash{$a}{$b}{$c}})
>
>> {
>> $myval=$hash{$a}{$b}{$c}{$d};
>> ...
>> }
>> }
>> }
>> }
>>
>> How to do this? What should be in braces (...)?
>
>
Thank you!
------------------------------
Date: Tue, 27 Jul 2010 17:06:09 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: 4D hash iteration
Message-Id: <slrni4ttf1.v56.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-27 14:33, Jürgen Exner <jurgenex@hotmail.com> wrote:
> Michael <a@a.com> wrote:
>>foreach $a(...)
>>{
>> foreach $b(...)
>> {
>> foreach $c(...)
>> {
>> foreach $d(...)
>> {
>> $myval=$hash{$a}{$b}{$c}{$d};
>> ...
>> }
>> }
>> }
>>}
>>
>>How to do this? What should be in braces (...)?
>
> Maybe something trivial like
>
> keys %a
> keys %b
> keys %c
> keys %d
Maybe. But where do %a, %b, %c and %d come from?
hp
------------------------------
Date: Tue, 27 Jul 2010 09:09:02 -0700
From: sln@netherlands.com
Subject: Re: 4D hash iteration
Message-Id: <l11u46phu4beg9mc9s273rq9e7rla7r9m0@4ax.com>
On Tue, 27 Jul 2010 21:58:20 +0700, michael20545 <a@a.com> wrote:
>27.07.2010 19:43, Tad McClellan wrote:
>> Michael<a@a.com> wrote:
>>
>>
>> Apply "Use Rule 1" from:
>>
>> perldoc perlreftut
>>
>>
>>> I need a construction like this:
>>>
>>> foreach $a(...)
>>
>> foreach my $a (keys %hash)
>>
>>> {
>>> foreach $b(...)
>>
>> foreach my $b (keys %{$hash{$a}})
>>
>>> {
>>> foreach $c(...)
>>
>> foreach my $c (keys %{$hash{$a}{$b}})
>>
>>> {
>>> foreach $d(...)
>>
>> foreach my $d (keys %{$hash{$a}{$b}{$c}})
>>
>>> {
>>> $myval=$hash{$a}{$b}{$c}{$d};
>>> ...
>>> }
>>> }
>>> }
>>> }
>>>
>>> How to do this? What should be in braces (...)?
>>
>>
>
>Thank you!
>
>--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Make sure you read the docs because it gets even more nasty.
-sln
--------------------
use strict;
use warnings;

my %hash = (
    av1 => 'av-1',
    av2 => {
        bv1 => 'bv-1',
        bv2 => {
            cv1 => 'cv-1',
            cv2 => {
                dv1 => 'dv-1',
                dv2 => 'dv-2',
                dv3 => 'dv-3',
            },
            cv3 => ['foo3','bar3'],
        },
        bv3 => 'bv-3',
    },
    av3 => 'av-3',
    av4 => sub {print "av4\n"}
);

foreach my $a (keys %hash)
{
    if (ref($hash{$a}) eq "HASH") {
        foreach my $b (keys %{$hash{$a}})
        {
            if (ref($hash{$a}{$b}) eq "HASH") {
                foreach my $c (keys %{$hash{$a}{$b}})
                {
                    if (ref($hash{$a}{$b}{$c}) eq "HASH") {
                        foreach my $d (keys %{$hash{$a}{$b}{$c}})
                        {
                            my $myval=$hash{$a}{$b}{$c}{$d};
                            print "$d = $myval\n";
                        }
                    }
                }
            }
        }
    }
}
------------------------------
Date: Tue, 27 Jul 2010 07:50:33 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: Confusion about the smart matching operator
Message-Id: <22ba29dc-83fd-4619-b250-c1297f4d2308@m18g2000vbg.googlegroups.com>
> In article <a4526931-df5f-4acd-9b48-66e5143f9...@i31g2000yqm.googlegroups.com>,
> jl_p...@hotmail.com <jl_p...@hotmail.com> wrote:
>
> >   use Scalar::Util qw(looks_like_number);
> >   if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "false"
> >   {
> >      print "true";
> >   }
> >   else
> >   {
> >      print "false"
> >   }
>
> >the smart matching operator will return a false value.  In this case,
> >the smart matching operator behaves like an "all()" function,
On Jul 23, 3:55 pm, pac...@kosh.dhis.org (Alan Curry) replied:
>
> No it doesn't.
>
> Try it with @a=(1,2,3); it's still false.
>
> The coderef is only called once, with \@a as the argument. You passed an
> arrayref to looks_like_number, which is going to be false no matter what.
Hmmm... contrary to what you're saying, I tried it with @a=(1,2,3);
and it's evaluating to true for me. Here is the sample script I used
to test it with:
#!/usr/bin/perl
use strict;
use warnings;
my @a = (1, 2, 3);
use Scalar::Util qw(looks_like_number);
if ( @a ~~ sub { looks_like_number($_[0]) } )  # prints "true"
{
   print "true";
}
else
{
   print "false"
}
__END__
In fact, if I change the sub { } to be:
sub { print "$_[0]\n" }
then I see:
1
2
3
true
(The "true" is there because print() is returning a true value.) So I
have to disagree with the statement that the coderef is only called
once, with \@a as its argument.
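Since the thread shows that readers disagree about what '~~' does with an array and a coderef, one way to sidestep the question is to spell out the "all elements" test explicitly. The following is only a sketch of that alternative, not a description of how smart matching is implemented; the helper name all_numeric is made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: true only if every element looks like a number.
# Note that an empty list also counts as "all numeric" here.
sub all_numeric {
    my @items = @_;
    # grep in scalar context counts the elements that fail the test.
    my $failures = grep { !looks_like_number($_) } @items;
    return $failures == 0;
}

for my $case ([1, 2, 3], [1, 'two', 3]) {
    my $verdict = all_numeric(@$case) ? 'true' : 'false';
    print "(@$case): $verdict\n";
}
```

Written this way, the callback visibly receives one element at a time, so there is no doubt about whether it sees the elements or a reference to the whole array.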
> You're right about one thing at least: it's not easy to predict what ~~ will
> do based on the documentation.
Yeah... evidently I'm not the only one who's confused about aspects
of the '~~' operator. That's too bad -- it seems like it has a lot of
potential.
Cheers,
-- Jean-Luc
------------------------------
Date: Tue, 27 Jul 2010 06:15:15 -0700 (PDT)
From: Wolfram Humann <w.c.humann@arcor.de>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <33c4086f-90bb-4cc0-af4c-e01185d51ed8@z25g2000vbn.googlegroups.com>
Alright, I did some profiling and code-reading and what I found is
something that I would consider a bug or at least fairly poor coding
practice in the core.
Opinions very welcome!
In Strawberry almost the entire time is spent in the following call
sequence:
Perl_sv_catpvn_flags -> Perl_sv_grow -> Perl_safesysrealloc -> realloc
(msvcrt)
Perl_sv_catpvn_flags (in sv.c) is documented as "Concatenates the
string onto the end of the string which is in the SV". That's what my
code does all the time. So far so good.
Perl_sv_catpvn_flags *always* calls SvGrow -> Perl_sv_grow (also in
sv.c). Perl_sv_grow then needs to decide if the string's memory is
already sufficient or really needs to grow. In the latter case,
safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
interesting point is: how much memory does it request? The answer is:
newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
I.e. it requests 10 times as much memory as is required for the
current append operation. So when I loop 10000 times and each time
append 100 chars to an initial string size of 10 million, the memory
grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
it ends at 11.000e6. I can sort of confirm this to be true if I look
at the memory graph in Process Explorer: it grows smoothly (no
discernible steps), becoming incrementally slower towards the end
(because the amount of memory that needs to be copied for each realloc
increases).
Growing memory in such tiny increments is what I consider bad
practice.
By the way: I estimate the time required for each realloc to be around
3 ms for 10e6 chars, growing linearly with the amount of data -- I
consider that a fair speed and no reason to blame win32.
What happens in Cygwin? The stack-sampling profiler is of little help
because it easily misses infrequent events. I would expect that
Perl_sv_grow is called just as often as in Strawberry Perl. The
difference is that safesysrealloc does not call Perl_safesysrealloc ->
realloc, it calls Perl_realloc. And Perl_realloc (in malloc.c) seems
to have its own logic (something with '<<' and 'LOG' and 'pow' which
I did not try to fully understand) to determine what amount of memory
it finally allocates. When I add some sleep() to the string append
process, I can see how memory grows in Process Explorer: There are 5
steps (probably corresponding to 5 calls to Perl_realloc) of growing
size when I start with 0.1e6 chars and then grow to 1.1e6 chars. When
I start with 10e6 chars and grow to 11e6 chars, there is just 1 step
in memory size. This looks like a sensible memory growth strategy to
me. It explains why Cygwin is several 100 times faster than Strawberry
Perl. It also explains why I observed during my experiments that
Cygwin Perl consistently needs more memory than Strawberry Perl -- but
that's a small price to pay for such a dramatic speedup.
Wolfram
------------------------------
Date: Tue, 27 Jul 2010 15:13:37 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <h563i7-7r52.ln1@osiris.mauzo.dyndns.org>
Quoth Wolfram Humann <w.c.humann@arcor.de>:
> Alright, I did some profiling and code-reading and what I found is
> something that I would consider a bug or at least fairly poor coding
> practice in the core.
> Opinions very welcome!
>
> In Strawberry almost the entire time is spent in the following call
> sequence:
>
> Perl_sv_catpvn_flags -> Perl_sv_grow -> Perl_safesysrealloc -> realloc
> (msvcrt)
>
> Perl_sv_catpvn_flags (in sv.c) is documented as "Concatenates the
> string onto the end of the string which is in the SV". That's what my
> code does all the time. So far so good.
> Perl_sv_catpvn_flags *always* calls SvGrow -> Perl_sv_grow (also in
> sv.c). Perl_sv_grow then needs to decide if the string's memory is
> already sufficient or really needs to grow. In the latter case,
> safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
> interesting point is: how much memory does it request? The answer is:
>
> newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
>
> I.e. it requests 10 times as much memory as is required for the
> current append operation. So when I loop 10000 times and each time
> append 100 chars to an initial string size of 10 million, the memory
> grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
> it ends at 11.000e6. I can sort of confirm this to be true if I look
> at the memory graph in Process Explorer: it grows smoothly (no
> discernible steps), becoming incrementally slower towards the end
> (because the amount of memory that needs to be copied for each realloc
> increases).
>
> Growing memory in such tiny increments is what I consider bad
> practice.
Possibly; I don't know what the rationale behind that choice was.
Certainly Perl seems to expect whatever malloc it's using to be smart
about pre-allocating extra memory and using that to satisfy reallocs.
> By the way: I estimate the time required for each realloc to be around
> 3 ms for 10e6 chars, growing linearly with the amount of data -- I
> consider that a fair speed and no reason to blame win32.
If you timed perl's own realloc, you would (I believe) find it does much
better than this. AFAICS from the code, it has a fixed set of block
sizes it actually allocates. Enlarging a block such that it doesn't go
over the block size actually allocated is *free*, not even linear in the
size of the block, since all it does is adjust the end marker. This is
the logic you are expecting sv_grow to implement, but perl has decided
that this is the allocator's responsibility.
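To make the "enlarging within a block is free" argument concrete, here is a rough simulation. It assumes power-of-two block sizes purely for illustration (perl's malloc.c uses a more elaborate scheme), it assumes the string starts with exactly its own length allocated, and the names bucket_size and simulate are made up.

```perl
use strict;
use warnings;

# Assumption: blocks come in power-of-two sizes. This only illustrates
# the principle; it is not the real size table from malloc.c.
sub bucket_size {
    my ($bytes) = @_;
    my $size = 1;
    $size *= 2 while $size < $bytes;
    return $size;
}

# Simulate appending $count chunks of $chunk bytes to a string of $start
# bytes, requesting memory with the growth rule quoted from sv_grow.
# Returns (number of realloc requests, number that outgrew the block).
sub simulate {
    my ($start, $chunk, $count) = @_;
    my ($cur, $requested) = ($start, $start);
    my $block = bucket_size($start);
    my ($requests, $moves) = (0, 0);
    for (1 .. $count) {
        my $newlen = $cur + $chunk;
        if ($newlen > $requested) {
            $newlen += 10 * ($newlen - $cur);   # the quoted growth rule
            $requested = $newlen;
            $requests++;
            if ($requested > $block) {          # only now must the block move
                $block = bucket_size($requested);
                $moves++;
            }
        }
        $cur += $chunk;
    }
    return ($requests, $moves);
}

for my $start (100_000, 10_000_000) {
    my ($requests, $moves) = simulate($start, 100, 10_000);
    print "start $start: $requests realloc requests, $moves block moves\n";
}
```

Under these assumptions the number of realloc requests is the same in both runs, but only a handful of them (or none) need a new block; the rest stay inside the block already handed out.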
> What happens in Cygwin? The stack-sampling profiler is of little help
> because it easily misses infrequent events. I would expect that
> Perl_sv_grow is called just as often as in Strawberry Perl. The
> difference is that safesysrealloc does not call Perl_safesysrealloc ->
> realloc, it calls Perl_realloc.
Right, so your Cygwin perl is built with -Dusemymalloc.
> And Perl_realloc (in malloc.c) seems
> to have its own logic (something with '<<' and 'LOG' and 'pow' which
> I did not try to fully understand) to determine what amount of memory
> it finally allocates. When I add some sleep() to the string append
> process, I can see how memory grows in Process Explorer: There are 5
> steps (probably corresponding to 5 calls to Perl_realloc
No, I think not. I think Perl_realloc gets called just as often as
realloc got called with Strawberry, it's just that most of the time
realloc can return the new block without having to call sbrk (or
whatever Cygwin uses instead) and without having to do any copying.
> ) of growing
> size when I start with 0.1e6 chars and then grow to 1.1e6 chars. When
> I start with 10e6 chars and grow to 11e6 chars, there is just 1 step
> in memory size. This looks like a sensible memory growth strategy to
> me. It explains why Cygwin is several 100 times faster than Strawberry
> Perl. It also explains why I observed during my experiments that
> Cygwin Perl consistently needs more memory than Strawberry Perl -- but
> that's a small price to pay for such a dramatic speedup.
OK, I just ran your benchmark with the following perls:
5.8.8-vanilla i386-freebsd, default build options
5.8.8-malloc i386-freebsd, -Dusemymalloc
(chosen solely because this is the only matched pair of mymalloc/not
perls I have lying around) and got these results:
~/src/perl% runperl 5.8.8-vanilla realloc
1E5 chars + 1E4 x 1E2 chars: 420.3 ms
1E6 chars + 1E4 x 1E2 chars: 1043.1 ms
1E7 chars + 1E4 x 1E2 chars: 7159.0 ms
1E7 chars + 1E5 x 1E1 chars: 7590.6 ms
1E7 chars + 1E4 x 1E2 chars: 7148.9 ms
1E7 chars + 1E3 x 1E3 chars: 7158.1 ms
1E7 chars + 1E2 x 1E4 chars: 2948.6 ms
1E7 chars + 1E1 x 1E5 chars: 326.1 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 5.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 15.4 ms
~/src/perl% runperl 5.8.8-malloc realloc
1E5 chars + 1E4 x 1E2 chars: 18.6 ms
1E6 chars + 1E4 x 1E2 chars: 18.5 ms
1E7 chars + 1E4 x 1E2 chars: 45.3 ms
1E7 chars + 1E5 x 1E1 chars: 86.1 ms
1E7 chars + 1E4 x 1E2 chars: 7.4 ms
1E7 chars + 1E3 x 1E3 chars: 6.5 ms
1E7 chars + 1E2 x 1E4 chars: 3.1 ms
1E7 chars + 1E1 x 1E5 chars: 3.4 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 39.6 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 8.7 ms
~/src/perl%
So the difference you are seeing is precisely the difference between
using the system malloc and using perl's. (FreeBSD's malloc, unlike
Win32's, has a reputation for being rather efficient, so this lets
Microsoft off the hook.)
Would you be able to repeat these tests with 5.12.0: that is, build
(under Cygwin, if you don't have access to a Unix system) matched perls
configured with and without -Dusemymalloc, and run the test on both?
I'll try and do the same here, but I can't promise I'll have time. If
the slowdown still exists in 5.12, I think you have a good case for a
bug report. I'm not sure how possible it would be to fix, but it would
clearly be a big win under some circumstances to be able to build Win32
perl with perl's malloc.
Ben
------------------------------
Date: Tue, 27 Jul 2010 08:03:35 -0700 (PDT)
From: Wolfram Humann <w.c.humann@arcor.de>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <4ca80c6b-c82e-4913-b5b7-ac142282b6aa@t11g2000vbj.googlegroups.com>
On Jul 27, 4:13 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Would you be able to repeat these tests with 5.12.0: that is, build
> (under Cygwin, if you don't have access to a Unix system) matched perls
> configured with and without -Dusemymalloc, and run the test on both?
> I'll try and do the same here, but I can't promise I'll have time. If
> the slowdown still exists in 5.12, I think you have a good case for a
> bug report. I'm not sure how possible it would be to fix, but it would
> clearly be a big win under some circumstances to be able to build Win32
> perl with perl's malloc.
I do have a Linux machine and I did compile my own perl there, so I
think I could redo that (possibly easier than recompiling Perl for
Cygwin). The strange thing is that on Cygwin perl -V says
'usemymalloc=y' while the one on Linux says 'usemymalloc=n'. And on
Linux my benchmark runs everything under 12 ms. Are you certain
changing usemymalloc would have much effect there?
What I would much more *like* to try is recompile a perl (e.g.
strawberry perl) on win32 and replace
newlen += 10 * (newlen - SvCUR(sv));
with something like
newlen += 10 * (newlen - SvCUR(sv)) + 0.5 * SvCUR(sv);
(with the factor reasonably somewhere between 0.2 and 1)
but a quick attempt to follow http://perldoc.perl.org/perlwin32.html
was not successful :(
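Without rebuilding perl, the effect of the proposed change can at least be estimated on paper. The sketch below only counts how often the buffer would have to be reallocated; it assumes the string starts with exactly its own length allocated and ignores everything else sv_grow does. The name count_reallocs is made up.

```perl
use strict;
use warnings;

# Count how often the buffer must be reallocated when $count chunks of
# $chunk bytes are appended to a string of $start bytes. $factor is the
# proposed extra fraction of the current length; 0 gives the existing rule.
sub count_reallocs {
    my ($start, $chunk, $count, $factor) = @_;
    my ($cur, $alloc, $reallocs) = ($start, $start, 0);
    for (1 .. $count) {
        my $newlen = $cur + $chunk;
        if ($newlen > $alloc) {
            $newlen += 10 * ($newlen - $cur) + int($factor * $cur);
            $alloc = $newlen;
            $reallocs++;
        }
        $cur += $chunk;
    }
    return $reallocs;
}

for my $factor (0, 0.2, 0.5, 1) {
    my $n = count_reallocs(10_000_000, 100, 10_000, $factor);
    print "factor $factor: $n reallocs\n";
}
```

For the 10 million character case, any factor in the suggested range covers the whole 1 million characters of growth with the first reallocation, at the cost of allocating more memory than is finally used.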
Wolfram
------------------------------
Date: Tue, 27 Jul 2010 18:07:58 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <ecg3i7-ke72.ln1@osiris.mauzo.dyndns.org>
Quoth Wolfram Humann <w.c.humann@arcor.de>:
> On Jul 27, 4:13 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > Would you be able to repeat these tests with 5.12.0: that is, build
> > (under Cygwin, if you don't have access to a Unix system) matched perls
> > configured with and without -Dusemymalloc, and run the test on both?
> > I'll try and do the same here, but I can't promise I'll have time. If
> > the slowdown still exists in 5.12, I think you have a good case for a
> > bug report. I'm not sure how possible it would be to fix, but it would
> > clearly be a big win under some circumstances to be able to build Win32
> > perl with perl's malloc.
>
> I do have a Linux machine and I did compile my own perl there, so I
> think I could redo that (possibly easier than recompiling Perl for
> Cygwin). The strange thing is that on Cygwin perl -V says
> 'usemymalloc=y' while the one on Linux says 'usemymalloc=n'. And on
> Linux my benchmark runs everything under 12 ms. Are you certain
> changing usemymalloc would have much effect there?
No. It's possible that glibc's malloc already behaves the way perl is
expecting it to, so using perl's malloc doesn't change the performance
much.
Given that we know Win32's malloc behaves badly, one thing to try would
be building Win32 perls without USE_IMP_SYS, but with and without
PERL_MALLOC. I will try to repeat the FreeBSD tests with 5.12, since
that seems to show the symptoms.
> What I would much more *like* to try is recompile a perl (e.g.
> strawberry perl) on win32 and replace
>
> newlen += 10 * (newlen - SvCUR(sv));
>
> with something like
>
> newlen += 10 * (newlen - SvCUR(sv)) + 0.5 * SvCUR(sv);
>
> (with the factor reasonably somewhere between 0.2 and 1)
> but a quick attempt to follow http://perldoc.perl.org/perlwin32.html
> was not successful :(
Last time I built perl on Win32 I started by installing Strawberry and
putting c:\strawberry\c\bin in %PATH%, and setting INCLUDE and LIB to
c:\strawberry\c\include and \lib respectively. That gives you a
known-good toolchain to start with. It's best to make sure you don't
have anything unnecessary in %PATH%; in particular, you mustn't have
some other copy of perl. Also remember that your build directory must
not have any spaces in its name.
Ben
------------------------------
Date: Tue, 27 Jul 2010 10:57:54 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <b84b8820-5754-459f-ae10-9c461d9604a4@e35g2000vbl.googlegroups.com>
On Jul 22, 7:02 am, Wolfram Humann <w.c.hum...@arcor.de> wrote:
> I have a program that processes PDF files by converting them to
> Postscript, read the ps and do something with it. I use pdftops (from
> xpdf) for the pdf->ps conversion and retrieve the result like this:
>
> $ps_text = qx( pdftops $infile - );
>
> On win32 using strawberry perl (tried 5.10 and 5.12) this takes much
> more time than I expected so I did a test and first converted the PDF
> to Postscript, then read the Postscript (about 12 MB) like this (cat
> on win32 provided by cygwin):
>
> perl -E" $t = qx(cat psfile.ps); say length $t "
>
> This takes about 16 seconds on win32 but only <1 seconds on Linux.
Dear Wolfram,
I've encountered a similar problem on Strawberry Perl before.
I'm curious: Could you try "pre-allocating" the needed space to
$ps_text (or $t) before you set it? For example, try this:
perl -E "$t = ' ' x (-s 'psfile.ps'); $t = qx(cat psfile.ps); say length $t"
See if that helps. I've found that setting my variable to the
target length BEFORE it's set to the proper string can reduce time
significantly (when it is eventually being set to its target value).
I'm not sure why this is so, but I can guess that it's because it can
avoid the time-consuming process of "growing" the string a little at a
time.
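The same idea applied to a string that is built up piece by piece might look like the sketch below. It rests on an assumption that is not verified here: that perl keeps the large buffer when the variable is emptied again. The helper name build_preextended is made up, and whether this actually saves time should be measured on the perl in question.

```perl
use strict;
use warnings;

# Hypothetical helper: append $count copies of $chunk to a string that
# was first sized to its expected final length.
sub build_preextended {
    my ($chunk, $count) = @_;
    my $expected = length($chunk) * $count;
    my $buf = ' ' x $expected;   # allocate roughly the final size up front
    $buf = '';                   # empty it again (assumed to keep the buffer)
    $buf .= $chunk for 1 .. $count;
    return $buf;
}

my $text = build_preextended('x' x 100, 10_000);
print length($text), "\n";
```

The result is identical to plain appending; only the allocation pattern is meant to differ.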
I hope this helps,
-- Jean-Luc
------------------------------
Date: Tue, 27 Jul 2010 23:09:34 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Speed of reading some MB of data using qx(...)
Message-Id: <slrni4uioe.2os.hjp-usenet2@hrunkner.hjp.at>
On 2010-07-27 17:07, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Wolfram Humann <w.c.humann@arcor.de>:
>> On Jul 27, 4:13 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>> > Would you be able to repeat these tests with 5.12.0: that is, build
>> > (under Cygwin, if you don't have access to a Unix system) matched perls
>> > configured with and without -Dusemymalloc, and run the test on both?
>> > I'll try and do the same here, but I can't promise I'll have time. If
>> > the slowdown still exists in 5.12, I think you have a good case for a
>> > bug report. I'm not sure how possible it would be to fix, but it would
>> > clearly be a big win under some circumstances to be able to build Win32
>> > perl with perl's malloc.
>>
>> I do have a Linux machine and I did compile my own perl there, so I
>> think I could redo that (possibly easier than recompiling Perl for
>> Cygwin). The strange thing is that on Cygwin perl -V says
>> 'usemymalloc=y' while the one on Linux says 'usemymalloc=n'. And on
>> Linux my benchmark runs everything under 12 ms. Are you certain
>> changing usemymalloc would have much effect there?
>
> No. It's possible that glibc's malloc already behaves the way perl is
> expecting it to, so using perl's malloc doesn't change the performance
> much.
I'm pretty sure that GNU malloc doesn't round up to powers of two or
something like that. However, the performance difference between GNU
malloc and Perl malloc is rather small:
perl 5.12.1, default config, EGLIBC 2.11.2-2:
1E5 chars + 1E4 x 1E2 chars: 3.9 ms
1E6 chars + 1E4 x 1E2 chars: 3.8 ms
1E7 chars + 1E4 x 1E2 chars: 4.4 ms
1E7 chars + 1E5 x 1E1 chars: 28.4 ms
1E7 chars + 1E4 x 1E2 chars: 4.5 ms
1E7 chars + 1E3 x 1E3 chars: 2.6 ms
1E7 chars + 1E2 x 1E4 chars: 2.0 ms
1E7 chars + 1E1 x 1E5 chars: 1.9 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 2.0 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 4.4 ms
perl 5.12.1, usemymalloc=y, EGLIBC 2.11.2-2:
1E5 chars + 1E4 x 1E2 chars: 2.6 ms
1E6 chars + 1E4 x 1E2 chars: 3.8 ms
1E7 chars + 1E4 x 1E2 chars: 2.5 ms
1E7 chars + 1E5 x 1E1 chars: 18.8 ms
1E7 chars + 1E4 x 1E2 chars: 2.5 ms
1E7 chars + 1E3 x 1E3 chars: 0.9 ms
1E7 chars + 1E2 x 1E4 chars: 0.9 ms
1E7 chars + 1E1 x 1E5 chars: 1.1 ms
1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.9 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 3.4 ms
That may be accidental, though: An strace output (for GNU malloc) shows
that very few reallocations actually result in a different address -
mostly the allocated area can be grown because there is nothing after
it. This is even true if two strings grow in parallel - each time one of
the strings moves it leaves a hole which the other string can grow into,
so in practice this works like a binary backoff. I guess there are
allocation patterns which spoil this effect (e.g. if you allocate lots
of small objects while growing large strings) but I haven't tried to
find them.
hp
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3047
***************************************