[31559] in Perl-Users-Digest
Perl-Users Digest, Issue: 2818 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 15 14:09:23 2010
Date: Mon, 15 Feb 2010 11:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 15 Feb 2010 Volume: 11 Number: 2818
Today's topics:
hashes (geo::coder::google) <fgnowfg@gmail.com>
Re: hashes (geo::coder::google) <hjp-usenet2@hjp.at>
How to get offset position from unpack()? <jl_post@hotmail.com>
Re: Is a merge interval function available? <e9427749@stud4.tuwien.ac.at>
Re: know-how(-not) about regular expressions sln@netherlands.com
Re: know-how(-not) about regular expressions <hjp-usenet2@hjp.at>
Re: know-how(-not) about regular expressions sln@netherlands.com
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 14 Feb 2010 23:17:20 -0800 (PST)
From: Faith Greenwood <fgnowfg@gmail.com>
Subject: hashes (geo::coder::google)
Message-Id: <567c00a9-d53d-4ff3-b6d7-066a6944ff0b@d27g2000yqn.googlegroups.com>
I am trying to do some geocoding w/ geo::coder::google. It works fine
but my problem is accessing the values in the hash. How can I return
the coordinates (i.e. '-122.397323', '37.778993')?
#here is the hash (as posted on the CPAN page for Geo::Coder::Google)
##Link for a more legible version:
http://search.cpan.org/~miyagawa/Geo-Coder-Google-0.06/lib/Geo/Coder/Google.pm
{
'AddressDetails' => {
'Country' => {
'AdministrativeArea' => {
'SubAdministrativeArea' => {
'SubAdministrativeAreaName' => 'San Francisco',
'Locality' => {
'PostalCode' => {
'PostalCodeNumber' => '94107'
},
'LocalityName' => 'San Francisco',
'Thoroughfare' => {
'ThoroughfareName' => '548 4th St'
}
}
},
'AdministrativeAreaName' => 'CA'
},
'CountryNameCode' => 'US'
}
},
'address' => '548 4th St, San Francisco, CA 94107, USA',
'Point' => {
'coordinates' => [
'-122.397323',
'37.778993',
0
]
}
}
------------
my code is as follows:
..
my $geocoder = Geo::Coder::Google->new(apikey=>'my_key');
my $location = $geocoder->geocode(
    location => '548 4th St, San Francisco, CA 94107, USA');
while (my ($key,$value)=each(%$location)){
print "$key=>$value\n";
}
..
#unfortunately, I am unable to get past this part and can't find
anything within the module to help.
thx!
------------------------------
Date: Mon, 15 Feb 2010 12:10:57 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: hashes (geo::coder::google)
Message-Id: <slrnhniau3.o0j.hjp-usenet2@hrunkner.hjp.at>
On 2010-02-15 07:17, Faith Greenwood <fgnowfg@gmail.com> wrote:
> I am trying to do some geocoding w/ geo::coder::google. It works fine
> but my problem is accessing the values in the hash. How can I return
> the coordinates (ie '-122.397323','37.778993',)?
>
> #here is the hash (as posted on CSPAN page for Geo::Coder::Google)
> ##Link for a more legible version:
> http://search.cpan.org/~miyagawa/Geo-Coder-Google-0.06/lib/Geo/Coder/Google.pm
> {
> 'AddressDetails' => {
[...]
> },
> 'address' => '548 4th St, San Francisco, CA 94107, USA',
> 'Point' => {
> 'coordinates' => [
> '-122.397323',
> '37.778993',
> 0
> ]
> }
> }
>
>
> ------------
> my code is as follows:
>
> ..
> my $geocoder = Geo::Coder::Google->new(apikey=>'my_key');
> my $location=undef; $location=$geocoder->geocode(location=>"548 4th
> St, San Francisco, CA 94107, USA");
>
> while (my ($key,$value)=each(%$location)){
> print "$key=>$value\n";
> }
> ..
Read perldoc perlreftut for an introduction to references.
If $location contains the hashref above, then
$location->{Point} is
{
'coordinates' => [
'-122.397323',
'37.778993',
0
]
}
$location->{Point}{coordinates} is
[
'-122.397323',
'37.778993',
0
]
and finally, $location->{Point}{coordinates}[0] is
'-122.397323'.
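Putting that together as a runnable sketch (the hashref below is hand-built to mirror the structure quoted above, so no API key or network access is needed to follow along):

```perl
use strict;
use warnings;

# Hand-built hashref mirroring the geocoder result quoted above;
# only the parts needed for the coordinates are included.
my $location = {
    address => '548 4th St, San Francisco, CA 94107, USA',
    Point   => { coordinates => [ '-122.397323', '37.778993', 0 ] },
};

# Each arrow peels one level off the reference chain; the @{...}[0,1]
# slice pulls longitude and latitude out of the coordinates arrayref.
my ($lng, $lat) = @{ $location->{Point}{coordinates} }[0, 1];
print "lng=$lng, lat=$lat\n";   # lng=-122.397323, lat=37.778993
```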
hp
------------------------------
Date: Mon, 15 Feb 2010 08:39:44 -0800 (PST)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: How to get offset position from unpack()?
Message-Id: <1533bf00-bf43-4c21-9de2-d21f751b5cca@t31g2000prh.googlegroups.com>
Hi,
The unpack() function is very, very useful for me, as I regularly
do a lot of unpacking of non-Perl-created data strings to see what
information they hold. If I didn't have the use of the unpack()
function, certain tasks would be much more difficult.
However, there's something I want to do with unpack() that I
haven't figured out how to do: I'd like to unpack part of a string,
but keep track of where the unpacking ended, so I can resume unpacking
the string (at a later time) where I left off.
Here's a trivial example:
Let's say I have a data string that holds lists of strings, like
this:
" 2 5hello 5world 2 2hi 5there"
The first number (" 2") signifies that the first list holds two
strings. The next number (" 5") signifies that the first encoded
string is 5 characters long. The next number (also a " 5") signifies
the same for the next encoded string.
So I could write a format string for unpack() to be: "a2/(a2/a)"
So the lines of code:
my $dataString = ' 2 5hello 5world extra data';
my @a = unpack 'a2/(a2/a)', $dataString;
print "$_\n" foreach @a;
would output:
hello
world
My question becomes: What if I want to parse out the extra data
later with a different pack string? It would be nice if there was a
way to return the current offset somehow with unpack(), so that I
could unpack again with something like this:
my @b = unpack "\@$offset $newPackString", $dataString;
Now, I could calculate this offset myself by examining what was
placed in @a, but this gets tricky fast with packstrings that use "Z",
"A", and 'a' (and combinations).
(Incidentally, C's sscanf() function has a little-known "%n" format
character that returns the number of characters consumed. I'm hoping
that unpack() has a similar feature.)
I posted a similar question back in 2004, and Anno Siegel responded
with the suggestion of adding "a*" to my first packstring, and then
using the length() of the last element to calculate the offset, like
this:
my $dataString = ' 2 5hello 5world extra data';
my @a = unpack 'a2/(a2/a) a*', $dataString;
my $offset = length($dataString) - length( pop(@a) );
print "$_\n" foreach @a;
my @b = unpack "\@$offset $newPackString", $dataString;
While this approach technically works, repeatedly using "a*" at the
end of a packstring in a continual loop creates an O(n^2) algorithm.
This isn't a problem for short $dataStrings, but is a significant
problem when $dataStrings are long and/or have no limit in length.
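Until unpack() grows such a feature, one workaround is to do the offset bookkeeping by hand with the "@" absolute-position code, which stays O(n) because nothing past the current field is ever copied. This is only a sketch for the example format above (2-character count fields), not a general solution:

```perl
use strict;
use warnings;

my $dataString = ' 2 5hello 5world 2 2hi 5there';

# Track the offset ourselves; "@N" in a template seeks to absolute
# position N before unpacking, so no trailing "a*" copy is needed.
my $offset = 0;
my $count  = unpack("\@$offset a2", $dataString) + 0;  # list length
$offset += 2;

my @a;
for (1 .. $count) {
    my $len = unpack("\@$offset a2", $dataString) + 0;  # string length
    push @a, unpack '@' . ($offset + 2) . " a$len", $dataString;
    $offset += 2 + $len;
}

print "$_\n" for @a;                       # hello, world
print substr($dataString, $offset), "\n";  # " 2 2hi 5there" remains
```

The same $offset can then seed the next unpack with a different packstring, as in the "\@$offset $newPackString" idea above.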
I've noticed that Perl 5.10 added lots of convenient new features
to pack() and unpack() (such as the ability to pack floats and doubles
in an endian-ness different than your own), so I'm hoping that
unpack() now has a way to return the $dataString offset. However,
I've read both "perldoc -f unpack" and "perldoc -f pack" but I can't
seem to find this behavior documented, if it exists at all.
So does anyone know if I can get unpack() to return an offset?
Thanks!
-- Jean-Luc
------------------------------
Date: Sun, 14 Feb 2010 23:01:32 +0100
From: Josef <e9427749@stud4.tuwien.ac.at>
Subject: Re: Is a merge interval function available?
Message-Id: <4b7872c1$0$11352$3b214f66@tunews.univie.ac.at>
Peng Yu wrote:
> I'm wondering there is already a function in perl library that can
> merge intervals.
Maybe CPAN is with you: Set::Infinite, Set::IntSpan
br,
Josef
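If pulling in a module is overkill, the classic sort-and-fold approach is short in plain Perl. A sketch, assuming closed [start, end] numeric intervals (not from either module's API, just hand-rolled):

```perl
use strict;
use warnings;

# Merge a list of [start, end] arrayrefs: sort by start, then fold
# each interval into the previous one when they overlap or touch.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the previous interval: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$iv ];   # copy, don't alias caller's data
        }
    }
    return @merged;
}

my @out = merge_intervals([1, 3], [2, 6], [8, 10], [15, 18]);
print join(' ', map { "[$_->[0],$_->[1]]" } @out), "\n";   # [1,6] [8,10] [15,18]
```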
------------------------------
Date: Sun, 14 Feb 2010 14:22:26 -0800
From: sln@netherlands.com
Subject: Re: know-how(-not) about regular expressions
Message-Id: <38tgn5lun2imorfencb58hnraj5aqgk6d0@4ax.com>
On Sun, 14 Feb 2010 13:11:13 +0100, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2010-02-13 17:15, sln@netherlands.com <sln@netherlands.com> wrote:
>> On Fri, 12 Feb 2010 15:22:20 -0800, sln@netherlands.com wrote:
>>
>>>On Fri, 12 Feb 2010 12:40:14 +0100, Helmut Richter <hhr-m@web.de> wrote:
>>>
>> [snip]
>>
>>>Yea, this is better. Slow but maybe try to reduce copying with a while(/.../g) type
>>>of thing.
>>
>> [snip]
>>
>>>>This version runs considerably slower, by a factor of three
>>
>> [snip]
>>
>>>I didn't bench the code, its probably fairly quick.
>>
>> [snip]
>>
>> I did bench the code on a 7 mbyte file 'mscore.xml'.
>> What really makes it slow on large files is the constant
>> "appending" to a variable. Its roughly 2 times + slower doing
>> it this way.
>>
>> The fastest way to do it, is to write it to the disk as you
>> get it. Pass in a filehandle, or some other method.
>>
>> Perl would have to spend all its time on realloc() because
>> of all the appending.
>
>That's a surprising result. Perl doubles the size of a string every time
>it needs to expand it, so it shouldn't have to realloc much
>(only O(log(length($MarkupNew))) times).
>
>As it is, I cannot reproduce your result. Trying it on a 22 MB file I
>get these times:
>
>append 9.031 9.041 9.150
>tempfile 9.285 9.370 9.479
>
>As you can see, appending is consistently faster than writing to a
>temporary file and reading it back.
>
>According to Devel::NYTProf nearly all of the time is spent in these
>lines:
>
>
> while ($$markup =~ /$Rxmarkup/g)
>
> $begin_pos = pos($$markup);
>
> while ($$strref =~ /$Rxent/g) {
>
>where the second is the end of the loop started in the first, so I
>suspect that the time attributed to the second line is really spent in
>the match, not the pos call.
>
> hp
>
>PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/
I looked at that profiling result. Impressive utility. Is it free?
To isolate what I am seeing, I am posting a benchmark that simulates
what I found in the other code. It shows huge performance degradation.
I don't know if it's the Perl build 5.10.0 (from ActiveState) or what.
Run this and compare the relative numbers with your build.
I'd feel better knowing Perl is not like this and that there is a grave
error on my part and/or in my build.
Thanks.
-sln
-----------------------
## bench.pl
## ----------
use strict;
use warnings;
use Benchmark ':hireswallclock';
my @limit = (
0, # 0
1_000_000, # 1 MB
2_000_000, # 2 MB
3_000_000, # 3 MB
4_000_000, # 4 MB
5_000_000 # 5 MB
);
my @buf = ('') x scalar(@limit);
my $append = '<RXZWQ>sdfgg<oo/>';
print "Starting ...\n";
for (1 .. 2)
{
print "\n",'-' x 30,"\n>> Pass $_:\n";
for my $ndx (0 .. $#limit)
{
my ($t0,$t1);
$buf[$ndx] = 'P' x $limit[$ndx]; # pre-allocate buffer from limit array
$buf[$ndx] = ''; # clear buffer
$t0 = Benchmark->new;
for ( 1 .. 235_000 ) { # simulate 235,000 segment appends
$buf[$ndx] .= $append; # from 'mscorlib.xml'
}
$t1 = Benchmark->new;
printf STDERR "\nBuf[%d]", $ndx;
printf STDERR ", start size = %.0fmb", $limit[$ndx]/1_000_000;
printf STDERR ", current size = %d bytes\n", length $buf[$ndx];
print STDERR "code metrics: ",timestr( timediff($t1, $t0) ),"\n";
}
}
print "\n", '-' x 30, "\n";
system ('perl -V');
__END__
Output =
Starting ...
------------------------------
>> Pass 1:
Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 2.32798 wallclock secs ( 1.52 usr + 0.81 sys = 2.33 CPU)
Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 2.23181 wallclock secs ( 1.47 usr + 0.77 sys = 2.23 CPU)
Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 1.7917 wallclock secs ( 1.34 usr + 0.45 sys = 1.80 CPU)
Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 1.0548 wallclock secs ( 0.78 usr + 0.28 sys = 1.06 CPU)
Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0685248 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.0682061 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
------------------------------
>> Pass 2:
Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.0659492 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.0691559 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.069617 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.0686679 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0811398 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.068722 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
------------------------------
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread
uname=''
config_args='undef'
hint=recommended, useposix=true, d_sigaction=undef
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cl', ccflags ='-nologo -GF -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DUSE_SITECUSTOMIZE -DPRIVLIB_LAST_IN_INC -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
optimize='-MD -Zi -DNDEBUG -O1',
cppflags='-DWIN32'
ccversion='12.0.8804', gccversion='', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
libpth=\lib
libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
libc=msvcrt.lib, so=dll, useshrplib=true, libperl=perl510.lib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
Locally applied patches:
ActivePerl Build 1004 [287188]
33741 avoids segfaults invoking S_raise_signal() (on Linux)
33763 Win32 process ids can have more than 16 bits
32809 Load 'loadable object' with non-default file extension
32728 64-bit fix for Time::Local
Built under MSWin32
Compiled at Sep 3 2008 13:16:37
@INC:
C:/Perl/site/lib
C:/Perl/lib
.
------------------------------
Date: Mon, 15 Feb 2010 00:10:56 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: know-how(-not) about regular expressions
Message-Id: <slrnhnh0o2.ovp.hjp-usenet2@hrunkner.hjp.at>
On 2010-02-14 22:22, sln@netherlands.com <sln@netherlands.com> wrote:
> On Sun, 14 Feb 2010 13:11:13 +0100, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>
>>On 2010-02-13 17:15, sln@netherlands.com <sln@netherlands.com> wrote:
>>> I did bench the code on a 7 mbyte file 'mscore.xml'.
>>> What really makes it slow on large files is the constant
>>> "appending" to a variable. Its roughly 2 times + slower doing
>>> it this way.
>>>
>>> The fastest way to do it, is to write it to the disk as you
>>> get it. Pass in a filehandle, or some other method.
>>>
>>> Perl would have to spend all its time on realloc() because
>>> of all the appending.
>>
>>That's a surprising result. Perl doubles the size of a string every time
>>it needs to expand it, so it shouldn't have to realloc much
>>(only O(log(length($MarkupNew))) times).
>>
>>As it is, I cannot reproduce your result. Trying it on a 22 MB file I
>>get these times:
>>
>>append 9.031 9.041 9.150
>>tempfile 9.285 9.370 9.479
>>
>>As you can see, appending is consistently faster than writing to a
>>temporary file and reading it back.
>>
>>According to Devel::NYTProf nearly all of the time is spent in these
>>lines:
[...]
>>PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/
>
> I looked at that profiling result. Impressive utility. Is it free?
Yes. Available from CPAN.
Devel::NYTProf is really nice. However, it adds a rather large overhead
(smaller than most other Perl profilers, but still large), so it is
impractical for programs which run for a long time and sometimes the
overhead hides the real performance bottleneck.
> To isolate what I am seeing, I am posting a benchmark that simulates
> what I found on the other code. It shows huge performance degredation.
> I don't know if its the Perl build 5.10.0 (from ActiveState) or what.
[...]
> ------------------------------
>>> Pass 1:
>
> Buf[0], start size = 0mb, current size = 3995000 bytes
> code metrics: 2.32798 wallclock secs ( 1.52 usr + 0.81 sys = 2.33 CPU)
>
> Buf[1], start size = 1mb, current size = 3995000 bytes
> code metrics: 2.23181 wallclock secs ( 1.47 usr + 0.77 sys = 2.23 CPU)
>
> Buf[2], start size = 2mb, current size = 3995000 bytes
> code metrics: 1.7917 wallclock secs ( 1.34 usr + 0.45 sys = 1.80 CPU)
>
> Buf[3], start size = 3mb, current size = 3995000 bytes
> code metrics: 1.0548 wallclock secs ( 0.78 usr + 0.28 sys = 1.06 CPU)
>
> Buf[4], start size = 4mb, current size = 3995000 bytes
> code metrics: 0.0685248 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[5], start size = 5mb, current size = 3995000 bytes
> code metrics: 0.0682061 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> ------------------------------
>>> Pass 2:
>
> Buf[0], start size = 0mb, current size = 3995000 bytes
> code metrics: 0.0659492 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[1], start size = 1mb, current size = 3995000 bytes
> code metrics: 0.0691559 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[2], start size = 2mb, current size = 3995000 bytes
> code metrics: 0.069617 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[3], start size = 3mb, current size = 3995000 bytes
> code metrics: 0.0686679 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
>
> Buf[4], start size = 4mb, current size = 3995000 bytes
> code metrics: 0.0811398 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> Buf[5], start size = 5mb, current size = 3995000 bytes
> code metrics: 0.068722 wallclock secs ( 0.08 usr + 0.00 sys = 0.08 CPU)
>
> ------------------------------
Ouch. That's really a ludicrous slowdown.
Here are the results from my home system:
------------------------------
>> Pass 1:
Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.093436 wallclock secs ( 0.08 usr + 0.01 sys = 0.09 CPU)
Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.105453 wallclock secs ( 0.10 usr + 0.01 sys = 0.11 CPU)
Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.10132 wallclock secs ( 0.07 usr + 0.03 sys = 0.10 CPU)
Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.10031 wallclock secs ( 0.05 usr + 0.04 sys = 0.09 CPU)
Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0609372 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.060972 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
------------------------------
>> Pass 2:
Buf[0], start size = 0mb, current size = 3995000 bytes
code metrics: 0.058821 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[1], start size = 1mb, current size = 3995000 bytes
code metrics: 0.0602 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[2], start size = 2mb, current size = 3995000 bytes
code metrics: 0.060935 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[3], start size = 3mb, current size = 3995000 bytes
code metrics: 0.0601468 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[4], start size = 4mb, current size = 3995000 bytes
code metrics: 0.0608931 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Buf[5], start size = 5mb, current size = 3995000 bytes
code metrics: 0.0607629 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
------------------------------
The base time (0.06 seconds) is about the same as for you so I assume
that we use a processor of roughly the same speed (Intel Core2 6300 @
1.86GHz in my case). But I have only a slowdown of less than 2
(0.10/0.06), and you have a slowdown of almost 35 (2.33/0.068).
I don't have a plausible explanation for that. It seems most likely
that ActiveState perl extends strings linearly instead of
exponentially, but why it would do such a stupid thing is beyond me.
hp
------------------------------
Date: Mon, 15 Feb 2010 08:36:21 -0800
From: sln@netherlands.com
Subject: Re: know-how(-not) about regular expressions
Message-Id: <hhtin5d27i3auno8snmg9u6cgts9eamm9m@4ax.com>
On Mon, 15 Feb 2010 00:10:56 +0100, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2010-02-14 22:22, sln@netherlands.com <sln@netherlands.com> wrote:
>> On Sun, 14 Feb 2010 13:11:13 +0100, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>
>>>On 2010-02-13 17:15, sln@netherlands.com <sln@netherlands.com> wrote:
>>>> I did bench the code on a 7 mbyte file 'mscore.xml'.
>>>> What really makes it slow on large files is the constant
>>>> "appending" to a variable. Its roughly 2 times + slower doing
>>>> it this way.
>>>>
>>>> The fastest way to do it, is to write it to the disk as you
>>>> get it. Pass in a filehandle, or some other method.
>>>>
>>>> Perl would have to spend all its time on realloc() because
>>>> of all the appending.
>>>
>>>That's a surprising result. Perl doubles the size of a string every time
>>>it needs to expand it, so it shouldn't have to realloc much
>>>(only O(log(length($MarkupNew))) times).
>>>
>>>As it is, I cannot reproduce your result. Trying it on a 22 MB file I
>>>get these times:
>>>
>>>append 9.031 9.041 9.150
>>>tempfile 9.285 9.370 9.479
>>>
>>>As you can see, appending is consistently faster than writing to a
>>>temporary file and reading it back.
>>>
>>>According to Devel::NYTProf nearly all of the time is spent in these
>>>lines:
>[...]
>>>PS: The nytprofhtml output is at http://www.hjp.at/junk/nytprof/
>>
>> I looked at that profiling result. Impressive utility. Is it free?
>
>Yes. Available from CPAN.
>
>Devel::NYTProf is really nice. However, it adds a rather large overhead
>(smaller than most other Perl profilers, but still large), so it is
>impractical for programs which run for a long time and sometimes the
>overhead hides the real performance bottleneck.
>
>
>> To isolate what I am seeing, I am posting a benchmark that simulates
>> what I found on the other code. It shows huge performance degredation.
>> I don't know if its the Perl build 5.10.0 (from ActiveState) or what.
>[...]
>> ------------------------------
>>>> Pass 1:
[snip]
>> ------------------------------
>
>Ouch. That's really a ludicrous slowdown.
>
>Here are the results from my home system:
>------------------------------
[snip]
>The base time (0.06 seconds) is about the same as for you so I assume
>that we use a processor of roughly the same speed (Intel Core2 6300 @
>1.86GHz in my case). But I have only a slowdown of less than 2
>(0.10/0.06), and you have a slowdown of almost 35 (2.33/0.068).
>
>I don't have a plausible explanation for that. It seems most likely that
>Activestate perl extends strings linearly instead of exponentially but
>why it would do such a stupid thing is beyond me.
>
> hp
Yep, I have a 2.35 GHz Opteron 170 (overclocked) dual core,
2 GB RAM, on Windows XP.
My ActiveState perl was built against the MS CRT, so it's using
realloc() from Win32.
Apparently the Win32 CRT realloc() and its flavors are crap. Using
a custom malloc helps; to quote from the link below:
"Compiling perl 5.10.1 without USE_IMP_SYS
and with USE_PERL_MALLOC makes a huge difference."
(though that disables threading).
Ha ha. M$hit strikes again.
The gory details are to be found here (@ 11/09):
(btw, some guy used an example like mine)
--------------------
Subject:
"Why is Windows 100 times slower than Linux when growing a large scalar?"
http://www.perlmonks.org/?node_id=810276
Subquote:
"The problem seems to lie with the CRT realloc() which grows
the heap in iddy-biddy chunks each time"
----------------------
Not many Windows programs use realloc() (I know I never use it);
they just use malloc and free.
But in a dynamic, typeless language built on primitive C,
$var .= "..." dictates the simplest approach, i.e. realloc().
In C++, operator overloading can append using a private growth
scheme without using realloc(). Helpful if you're on Win32 anyway.
In circumstances such as these, if the final size is nearly known,
preallocating with $var = 'a' x $size; or $var = 'a' x ($size * 2);
should mitigate this dreadful circumstance.
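As a runnable sketch of that preallocation idiom (the same trick the benchmark earlier in this thread uses): assigning '' to a scalar empties the string but leaves its internal buffer allocated, so subsequent appends on an affected build avoid the CRT realloc() churn:

```perl
use strict;
use warnings;

my $expected = 4_000_000;    # rough final size, if nearly known
my $buf = 'P' x $expected;   # force one big allocation up front
$buf = '';                   # empties the string, keeps the buffer

# The same 235,000 17-byte appends as in the benchmark above;
# they now grow within the preallocated buffer.
$buf .= '<RXZWQ>sdfgg<oo/>' for 1 .. 235_000;
print length($buf), "\n";    # 3995000
```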
I'm actually mortified by this situation.
-sln
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2818
***************************************