[32029] in Perl-Users-Digest
Perl-Users Digest, Issue: 3293 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 21 03:09:23 2011
Date: Mon, 21 Feb 2011 00:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 21 Feb 2011 Volume: 11 Number: 3293
Today's topics:
Re: arithmetic persistence <willem@turtle.stack.nl>
Re: arithmetic persistence <marc.girod@gmail.com>
Re: arithmetic persistence <nospam-abuse@ilyaz.org>
Re: Hashes are good, but not good enough. <nospam-abuse@ilyaz.org>
Re: Hashes are good, but not good enough. <nospam-abuse@ilyaz.org>
Re: List Separator $, behaving oddly <nospam-abuse@ilyaz.org>
Re: List Separator $, behaving oddly sharma__r@hotmail.com
Re: List Separator $, behaving oddly <derykus@gmail.com>
Re: Please help me how is easiest way to extract text b <jurgenex@hotmail.com>
Re: Please help me how is easiest way to extract text b <tadmc@seesig.invalid>
Re: Please help me how is easiest way to extract text b sharma__r@hotmail.com
Text::DAWG (was Re: Hashes are good, but not good enoug <blgl@stacken.kth.se>
Re: Text::DAWG (was Re: Hashes are good, but not good e <nospam.gravitalsun@hotmail.com.nospam>
Re: Text::DAWG (was Re: Hashes are good, but not good e <nospam.gravitalsun@hotmail.com.nospam>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 20 Feb 2011 19:14:16 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: arithmetic persistence
Message-Id: <slrnim2q08.7n6.willem@turtle.stack.nl>
Marc Girod wrote:
) On Feb 20, 6:39 pm, Willem <wil...@turtle.stack.nl> wrote:
)
)> Like this, you're skipping a lot of calculations indeed,
)> but at the cost of sorting the digits.
)
) ...which is a rising cost, and ends up being prohibitive...
Nah. What's prohibitive is the memory footprint.
)> By the way, here's a simple version that's marginally faster
)> even, and doesn't require lots of memory.  It uses a simple
)> pruning trick to cut off calculation when it knows that a
)> result isn't good enough.
)
) Yes, a much simpler idea, indeed.
) I have to get out of my first mindset of getting the value anyway.
)
)> I also wrote this in C, using 64-bit ints, and it turns out that
)> 3778888999 is the first of p(10), which my box found in 2m50.
)
) And I am nowhere near this, of course.
Well, this is almost all arithmetic, so Perl just doesn't compare.
0.8 seconds to find P(9) in C, versus 1m36 in Perl.
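For reference, the quantity being searched for can be sketched in a few
lines of Perl. This is only an illustration, not the code behind the
timings above, and the helper name persistence() is made up here. It
counts how many times the digits of a number have to be multiplied
together before a single digit remains.

```perl
use strict;
use warnings;

# Hypothetical helper: number of digit-product steps needed to
# reduce $n to a single digit.
sub persistence {
    my ($n) = @_;
    my $steps = 0;
    while ($n >= 10) {
        my $prod = 1;
        $prod *= $_ for split //, $n;   # multiply the decimal digits
        $n = $prod;
        $steps++;
    }
    return $steps;
}

print "$_ has persistence ", persistence($_), "\n"
    for 39, 77, 679, 3778888999;
```

No pruning or caching is attempted here; it is the brute-force definition
that the faster versions in this thread try to avoid evaluating in full.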
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
------------------------------
Date: Sun, 20 Feb 2011 11:27:29 -0800 (PST)
From: Marc Girod <marc.girod@gmail.com>
Subject: Re: arithmetic persistence
Message-Id: <dd72f29f-9120-4441-a93f-82ad9a48fd84@n18g2000vbq.googlegroups.com>
On Feb 20, 7:14 pm, Willem <wil...@turtle.stack.nl> wrote:
> Nah.  What's prohibitive is the memory footprint.
But this rises quite slowly...
324 keys in the hash for 10 million
459 for 100 million
596 for 1 billion...
I'd need to profile.
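A cache keyed on the sorted digits might look like the sketch below.
This is an assumption about the approach under discussion, not the
actual code; the names %seen and persistence_sorted() are invented for
illustration. Numbers that are permutations of the same digits share a
single hash entry, which would explain why the key count grows slowly.

```perl
use strict;
use warnings;

my %seen;    # sorted digit string => persistence

# Hypothetical helper: persistence with results cached per digit multiset.
sub persistence_sorted {
    my ($n) = @_;
    return 0 if $n < 10;
    my $key = join '', sort split //, $n;   # canonical form of the digits
    unless (exists $seen{$key}) {
        my $prod = 1;
        $prod *= $_ for split //, $key;
        $seen{$key} = 1 + persistence_sorted($prod);
    }
    return $seen{$key};
}

persistence_sorted($_) for 679, 697, 769, 796, 967, 976;
print scalar(keys %seen), " keys after six permutations of 679\n";
```

The sort on every lookup is the rising cost mentioned earlier in the
thread; the hash itself stays small.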
> Well, this is almost all arithmetics, so Perl just doesn't compare.
> 0.8 seconds to find P(9) in C, versus 1m36 in Perl.
Interesting. I was not aware of this ratio.
Thanks.
Marc
------------------------------
Date: Mon, 21 Feb 2011 00:48:58 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: arithmetic persistence
Message-Id: <slrnim3djq.if5.nospam-abuse@powdermilk.math.berkeley.edu>
On 2011-02-20, Willem <willem@turtle.stack.nl> wrote:
> Oh I see. Does that help ? I would imagine that looking up
> the results in an array would give a big speedup.
Me too. But it looks like there is little speedup or even slowdown; see below.
> use strict;
> use warnings;
> use List::Util qw(reduce);
>
> my $found = 0;
> my $fnum = 0;
>
> for (my $i = 10; $found < 8; $i++) {
>     my $prod = reduce { $a * $b } split('', $i);
>     next if ($prod < $fnum);
>     my $cnt = 1;
>     while ($prod >= 10) {
>         $prod = reduce { $a * $b } split('', $prod);
>         $cnt++;
>     }
>     if ($cnt > $found) {
>         $found = $cnt;
>         $fnum = $i;
>         print "$i is the first of p($cnt)\n";
>     }
> }
On my machine, the version below is almost an order of magnitude faster.
It also allows tuning: the first argument is the target for $found
(8 above); the second argument gives the size of the cache in decimal
digits (it should be at least half of the size of the answer). On my
machine, arguments 8 4, 8 5, and 8 6 finish in very similar times; this
means that the benefits of caching are eaten by not being able to prune
while caching...
Hope this helps,
Ilya
#!/usr/bin/perl -w
use strict;
use List::Util qw(reduce);
my $found = 0;                      # largest persistence reported so far
my $fnum = 0;                       # first number with that persistence
my $lim = shift;                    # target persistence
my $cache_lim10 = shift;            # cache size in decimal digits
my $cache_lim = 10**$cache_lim10;
my $rex_lim = '.' x $cache_lim10;
my (@prod, @perc, $prod, $p1, $p2, $cnt, $i);
sub report_size ($$) {
    my ($i, $cnt) = (shift, shift);
    $found = $cnt;
    $fnum = $i;
    print "$i is the first of p($cnt)\n";
}
# @prod: digit product; @perc: persistence, for every cached number
$prod[$_] = $_, $perc[$_] = 0 for 0..9;
$#prod = $#perc = $cache_lim;
for my $i (10..$cache_lim-1) {      # Round 1: cache, no pruning
    if ($i =~ /0/) {
        $prod = $prod[$i] = 0;
    } else {
        $prod = $prod[$i] = ($i%10) * $prod[int($i/10)];
    }
    report_size($i, $p1)
        if ($p1 = $perc[$i] = $perc[$prod] + 1) > $found;
}
LOOP:                               # Round 2: non-hashing, pruning
for (my $i = $cache_lim; $found < $lim; $i++) {
    next if $i =~ /0/;
    # digit product = product of the cached low and high halves
    $prod = $prod[$i % $cache_lim]*$prod[int($i / $cache_lim)];
    next if $prod < $fnum;          # Prune
    $cnt = 1;
    while ($prod >= $cache_lim) {
        next LOOP if $prod =~ /0/;
        $prod = $prod[$prod % $cache_lim]*$prod[int($prod / $cache_lim)];
        ++$cnt;
    }
    $cnt += $perc[$prod];
    report_size($i, $cnt) if $cnt > $found;
}
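The cache lookup used in round 2 can be shown in isolation. The sketch
below assumes a 3-digit cache and an invented helper name,
split_product(); it only demonstrates the identity, with no pruning. The
digit product of a number is the product of the cached values for its
low and high halves, provided the number has no zero digit and at most
twice as many digits as the cache covers.

```perl
use strict;
use warnings;

my $digits    = 3;                  # assumed cache size in decimal digits
my $cache_lim = 10**$digits;

# Digit products for 0 .. $cache_lim-1, built one digit at a time.
my @prod = (0 .. 9);
$prod[$_] = ($_ % 10) * $prod[int($_ / 10)] for 10 .. $cache_lim - 1;

# Hypothetical helper. Valid for $cache_lim <= $n < $cache_lim**2 when
# $n contains no zero digit (a zero would be lost as a leading zero of
# the low half).
sub split_product {
    my ($n) = @_;
    return $prod[$n % $cache_lim] * $prod[int($n / $cache_lim)];
}

print "digit product of 234567: ", split_product(234567), "\n";
```

This is why the second argument should cover at least half of the digits
of the answer: otherwise the high half falls outside the cache.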
------------------------------
Date: Mon, 21 Feb 2011 01:07:57 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Hashes are good, but not good enough.
Message-Id: <slrnim3enc.if5.nospam-abuse@powdermilk.math.berkeley.edu>
On 2011-02-18, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> There is no problem on deletion.
>
> A deletion may force you...
Now I see how much I goofed here... Thanks!
>> (And I was considering "DAWG vs *folded* tries"
>> - those where a node contains a substring, not a char.)
> I found several references to "folded tries", but all of them were about
> storing unicode strings and they stored only part of a character per
> node. Anyway, whatever optimization you use, you can also use it for a
> DAWG, so I don't see the difference.
Won't work for DAWGs: E.g., in a folded trie for /usr/share/dict/words,
after I reach prefix "a" "b" "d" "i", all I need to store is "cate". I do
not expect you would get any significant number of such chains of letters
in the corresponding DAWG.
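To make the distinction concrete, here is a small sketch that counts
vertices in a plain (char-per-node) trie and in its folded form, where
every chain of single-child nodes collapses into one string-per-node
vertex. The three-word sample and the helper name count_vertices() are
made up for illustration; nothing here measures a real word list.

```perl
use strict;
use warnings;

my @words = qw(abdicate abdomen able);   # made-up sample

# Plain trie as nested hashes, one level per character.
# The empty-string key marks the end of a word.
my %trie;
for my $word (@words) {
    my $node = \%trie;
    for my $ch (split //, $word) {
        $node->{$ch} ||= {};
        $node = $node->{$ch};
    }
    $node->{''} = 1;
}

# Count non-root vertices. In folded mode a vertex with exactly one
# child and no end-of-word flag is merged into its successor.
sub count_vertices {
    my ($node, $folded) = @_;
    my $total = 0;
    for my $ch (grep { $_ ne '' } keys %$node) {
        my $child = $node->{$ch};
        my $kids  = grep { $_ ne '' } keys %$child;
        my $keep  = !$folded || exists $child->{''} || $kids != 1;
        $total += ($keep ? 1 : 0) + count_vertices($child, $folded);
    }
    return $total;
}

my $n_plain  = count_vertices(\%trie, 0);
my $n_folded = count_vertices(\%trie, 1);
print "plain trie: $n_plain vertices, folded trie: $n_folded vertices\n";
```

In the folded form the tails "icate", "omen" and "le" each occupy a single
vertex, which is the kind of chain the paragraph above refers to.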
>>> If you use it to store
>>> data without common suffixes it will degenerate into a trie, so it should
>>> never be worse than a trie, IMHO.
>>
>> DAWG may be MUCH worse than a folded trie - even if you fold a DAWG.
>
> You didn't show why my assumption that DAWG will degenerate into a trie
> in the worst case is false.
I just do not see why "optimizing" a folded trie into a (folded) DAWG
would ALWAYS save space. At LEAST, this should depend on the size of
overhead per node. Storing M strings of length L would take about LM
bytes of memory (imagine these strings as tails in a folded trie).
Now convert them into a DAWG - you get N nodes of size S. Obviously,
for large enough S, NS is much more than LM.
My gut feeling is that even with reasonable implementations, QUITE
OFTEN one would have NS > LM.
Yours,
Ilya
------------------------------
Date: Mon, 21 Feb 2011 01:39:48 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Hashes are good, but not good enough.
Message-Id: <slrnim3gj3.ihm.nospam-abuse@powdermilk.math.berkeley.edu>
On 2011-02-19, Bo Lindbergh <blgl@stacken.kth.se> wrote:
> The /usr/share/dict/words that comes with Mac OS X 10.6 has 234936 words
> totalling 2251877 bytes (newlines excluded). A trie matching these
> words (with a one-bit flag per vertex instead of end-of-word edges) has
> 791089 vertices. An optimal DAWG based on this trie has 130892 vertices.
> I don't know about you people, but I call a factor of six "significant
> savings".
This is an example of a completely useless stat. What is the URL for
this list? Do you consider tries or folded tries (those with a
string-per-node, not char-per-node)? A 6x difference in node count MAY
still be trivial if node sizes differ.
Note that if you implement your DAWG
as struct {struct node*, char}
then on a 64-bit machine it would take 2094272 bytes, so it would provide
little space advantage over the initial word list (but blazing-fast
retrieval).
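The arithmetic behind that figure, as a sketch: the 16-byte node size is
an assumption (an 8-byte pointer plus one char, padded for alignment);
the vertex count and word-list size are the ones quoted above.

```perl
use strict;
use warnings;

my $vertices      = 130_892;     # optimal DAWG vertices, as quoted above
my $wordlist_size = 2_251_877;   # bytes in the word list, newlines excluded
my $node_size     = 16;          # assumed: 8-byte pointer + char, padded

my $dawg_size = $vertices * $node_size;
printf "DAWG: %d bytes, word list: %d bytes, ratio %.2f\n",
    $dawg_size, $wordlist_size, $dawg_size / $wordlist_size;
```

A different node layout changes $node_size and hence the whole
comparison, which is the point being made about raw node counts.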
Ilya
------------------------------
Date: Sun, 20 Feb 2011 19:37:38 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: List Separator $, behaving oddly
Message-Id: <slrnim2rc2.huf.nospam-abuse@powdermilk.math.berkeley.edu>
On 2011-02-20, sharma__r@hotmail.com <sharma__r@hotmail.com> wrote:
> So what you are implying is that Perl looks at the "address" of $,
> to enable its magic.
I do not think that is a productive way to describe the situation.
The ACTUAL way things happen is that nobody ever reads $, . A certain
container is attached to the "scalar" slot of *, ; Perl internals read
THIS CONTAINER when print() happens. *foo = \BAR reassigns the scalar
slot of *foo.
> And why doesn't this behavior impacting the Exporter ?
It is not clear what you are asking. After
*foo = \$, ;
assignments to $foo would change the same container as assignments to
$, .
> perl -wle "*foo = \$, ; $foo = 12; print 1,2"
1122
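The same aliasing can be seen with two ordinary package variables, where
no print() magic is involved. A minimal sketch; the variable names are
arbitrary.

```perl
use strict;
use warnings;

our ($foo, $bar);
$bar = 'before';

*foo = \$bar;   # the scalar slot of *foo now holds $bar's container
$foo = 12;      # writes through the shared container

print "bar is $bar\n";
print "same container\n" if \$foo == \$bar;
```

After the glob assignment, $foo and $bar are two names for one container,
so comparing references to them yields the same address.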
Ilya
------------------------------
Date: Sun, 20 Feb 2011 18:16:10 -0800 (PST)
From: sharma__r@hotmail.com
Subject: Re: List Separator $, behaving oddly
Message-Id: <98195f92-7b01-4fda-8b37-b1a457c8ac51@t19g2000prd.googlegroups.com>
On Feb 21, 12:37 am, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> On 2011-02-20, sharma...@hotmail.com <sharma...@hotmail.com> wrote:
>
> > So what you are implying is that Perl looks at the "address" of $,
> > to enable its magic.
>
> Do not think it is a productive way to describe the situation.
>
> The ACTUAL way things happen is that nobody ever reads $, .  A certain
> container is attached to a "scalar" slot of *, ; Perl internals read
> THIS CONTAINER when print() happens.  *foo = \BAR reassign the scalar
> slot of *foo.
>
> > And why doesn't this behavior impacting the Exporter ?
>
> It is not clear what are you asking.  After
>   *foo = \$, ;
> assignments to $foo would change the same container as assignments to
> $, .
>
>   > perl -wle "*foo = \$, ; $foo = 12; print 1,2"
>   1122
>
> Ilya
How do we explain what is going on in this scenario:
#!/usr/local/bin/perl
use strict; use warnings;
print "Before:...";
print "\$,=[$,]";
print "\\\$,=",\$,;
print qw(A B); #< -- AB
no strict 'refs';
        *{"::,"} = \do{":"};
use strict 'refs';
print "After1:...";
print "\$,=[$,]";
print "\\\$,=",\$,;
print qw(A B); # <---- AB
        $, = "+";
print "After2:...";
print "\$,=[$,]";
print "\\\$,=",\$,; #
print qw(A B); # <--- AB
__END__
Even when $, is reassigned to "+", print does not pick it up. Maybe this
is because the container is still pointing to the do{":"} value even now.
That would mean perl has stored the address of the "original" container
of $, and looks at that when print() is invoked. And that is
an undocumented feature.
--Rakesh
------------------------------
Date: Sun, 20 Feb 2011 21:02:26 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: List Separator $, behaving oddly
Message-Id: <02d881be-0e08-49e7-84fd-b6ea3fc4821f@8g2000prt.googlegroups.com>
On Feb 20, 6:16 pm, sharma...@hotmail.com wrote:
> On Feb 21, 12:37 am, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
>
> > On 2011-02-20, sharma...@hotmail.com <sharma...@hotmail.com> wrote:
>
> > > So what you are implying is that Perl looks at the "address" of $,
> > > to enable its magic.
>
> > Do not think it is a productive way to describe the situation.
>
> > The ACTUAL way things happen is that nobody ever reads $, .  A certain
> > container is attached to a "scalar" slot of *, ; Perl internals read
> > THIS CONTAINER when print() happens.  *foo = \BAR reassign the scalar
> > slot of *foo.
>
> > > And why doesn't this behavior impacting the Exporter ?
>
> > It is not clear what are you asking.  After
> >   *foo = \$, ;
> > assignments to $foo would change the same container as assignments to
> > $, .
>
> >   > perl -wle "*foo = \$, ; $foo = 12; print 1,2"
> >   1122
>
> > Ilya
>
> How do we explain what is going on in this scenario:
>
> #!/usr/local/bin/perl
> use strict; use warnings;
>
> print "Before:...";
> print "\$,=[$,]";
> print "\\\$,=",\$,;
> print qw(A B); #< -- AB
>
> no strict 'refs';
>         *{"::,"} = \do{":"};
> use strict 'refs';
> print "After1:...";
> print "\$,=[$,]";
> print "\\\$,=",\$,;
> print qw(A B); # <---- AB
>
>         $, = "+";
> print "After2:...";
> print "\$,=[$,]";
> print "\\\$,=",\$,; #
> print qw(A B); # <--- AB
> __END__
>
> Even when $, is reassigned as "+" the print is not taking it. This is
> maybe coz container is still pointing to the do{":}" even now.
> That means perl has stored the address of the "original" container of
> $, and looks at that when print() is invoked. And that is
> an undocumented feature.
What version? Program output is considerably
different with modern Perl versions:
This is perl, v5.10.1 (*) built for amd64-freebsd
=================================================
Before:...
Use of uninitialized value $, in concatenation (.)
or string...
$,=[]
\$,=SCALAR(0x9054ca8)
AB
After1:...
$,=[:]
\$,=SCALAR(0x9054e28)
AB
Modification of a read-only value attempted...
This is perl 5, version 12, subversion 2 (v5.12.2)
built for MSWin32-x86-multi-thread
==================================================
Before:...
Use of uninitialized value $, in concatenation (.)
$,=[]
\$,=SCALAR(0x1a8028c)
AB
After1:...
$,=[:]
\$,=:SCALAR(0x90af74)
A:B
After2:...
$,=[+]
\$,=+SCALAR(0x90af74)
A+B
--
Charles DeRykus
------------------------------
Date: Sun, 20 Feb 2011 17:27:55 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Please help me how is easiest way to extract text between some variable text
Message-Id: <qkf3m6p5ad1tktmuvf00djvet6p7pgb1b2@4ax.com>
"Mladen" <mladen.g@gmail.com> wrote:
>Please help me how is easiest way to extract text between some variable text
>
>Original text
><TH class=name width=100>New name</TH> need to
>extract: New name
>
><TH class=name width=50>Test name </TH> need to
>extract: Test name
>
><TH class=name width=65>Name 2</TH> need
>to extract: Name 2
You have a well-defined data structure. Treating it and analysing it as
if it were plain text would be foolish. Instead take advantage of the
existing structure and use a parser that can parse this data structure.
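As one possible illustration of that advice, the sketch below uses
HTML::TreeBuilder from CPAN. That module is a choice made here, not one
named in the original question, and the surrounding table markup is an
assumption because the complete HTML was not posted.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, not part of core Perl

# Assumed input: the three cells from the question inside a table row.
my $html = <<'HTML';
<table><tr>
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
<TH class=name width=65>Name 2</TH>
</tr></table>
HTML

my $tree  = HTML::TreeBuilder->new_from_content($html);
my @names = map { $_->as_trimmed_text }
            $tree->look_down(_tag => 'th', class => 'name');
$tree->delete;           # free the parse tree

print "$_\n" for @names;
```

The parser copes with the unquoted attributes and varying width values
without any pattern having to anticipate them.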
jue
------------------------------
Date: Sun, 20 Feb 2011 13:43:44 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Please help me how is easiest way to extract text between some variable text
Message-Id: <slrnim2rdi.dk4.tadmc@tadbox.sbcglobal.net>
Mladen <mladen.g@gmail.com> wrote:
>
> Please help me how is easiest way to extract text between some variable text
><TH class=name width=100>New name</TH> need to
> extract: New name
The easiest way to process HTML data is to use a module that
processes HTML data.
HTML::TableExtract should be able to do it, but since you did not
provide us complete HTML source, we cannot provide you a complete
solution.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
------------------------------
Date: Sun, 20 Feb 2011 21:40:58 -0800 (PST)
From: sharma__r@hotmail.com
Subject: Re: Please help me how is easiest way to extract text between some variable text
Message-Id: <9ff2b436-bbb8-4eec-82be-b76dd198d93f@k15g2000prk.googlegroups.com>
On Feb 20, 11:33 pm, "Mladen" <mlade...@gmail.com> wrote:
> Please help me how is easiest way to extract text between some variable text
>
> Original text
>
> <TH class=name width=100>New name</TH> need to
> extract: New name
>
> <TH class=name width=50>Test name </TH> need to
> extract: Test name
>
> <TH class=name width=65>Name 2</TH> need
> to extract: Name 2
>
> Thanks in advance
#!/usr/local/bin/perl
use strict;
use warnings;
local $\ = qq{\n};
my $np;
$np =
    qr{
        [<]
        (?:
            (?> [^<>]+ )
            |
            (??{ $np })
        )*
        [>]
    }xms
;
my $var ='
original text
<TH class=name width=100>New name</TH>
<TH class=name width=50>Test name </TH>
need to
<TH class=name width=65>Name 2</TH>
need
Thanks in advance
';
while ($var =~ m/ $np /xmsg) {
    print $1 if $var =~ m/\G(.*?)<\/TH>/xmscg;
}
__END__
------------------------------
Date: Sun, 20 Feb 2011 22:14:52 +0100
From: Bo Lindbergh <blgl@stacken.kth.se>
Subject: Text::DAWG (was Re: Hashes are good, but not good enough.)
Message-Id: <ijs089$8op$1@speranza.aioe.org>
In article <3d380234-e788-4512-8b65-b9c5cc663c33@y31g2000prd.googlegroups.com>,
scattered <tooscattered@gmail.com> wrote:
> Having said all this, a DAWG implementation in Perl would be quite
> nice, though it would clearly belong in CPAN rather than as a core
> part of the language. It is hard to believe that it hasn't been done
> by anybody
Maybe the high resource usage during the construction phase
makes people think it's not worth the effort to write?
Anyway, since it isn't _completely_ useless, I named it Text::DAWG
rather than Acme::DAWG. Coming soon to a CPAN mirror near you!
/Bo Lindbergh
------------------------------
Date: Mon, 21 Feb 2011 09:30:02 +0200
From: "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: Text::DAWG (was Re: Hashes are good, but not good enough.)
Message-Id: <ijt48e$2blh$1@ulysses.noc.ntua.gr>
"Bo Lindbergh" <blgl@stacken.kth.se> wrote in message
news:ijs089$8op$1@speranza.aioe.org...
> In article
> <3d380234-e788-4512-8b65-b9c5cc663c33@y31g2000prd.googlegroups.com>,
> scattered <tooscattered@gmail.com> wrote:
>> Having said all this, a DAWG implementation in Perl would be quite
>> nice, though it would clearly belong in CPAN rather than as a core
>> part of the language. It is hard to believe that it hasn't been done
>> by anybody
>
> Maybe the high resource usage during the construction phase
> makes people think it's not worth the effort to write?
>
> Anyway, since it isn't _completely_ useless, I named it Text::DAWG
> rather than Acme::DAWG. Coming soon to a CPAN mirror near you!
>
>
> /Bo Lindbergh
Hi Bo,
I tried to make one over the weekend, but I ended up with two Bloom
filter variants. I am waiting for your module; I really want to see its
insides. Please tell us when it is ready!
G.MPouras
------------------------------
Date: Mon, 21 Feb 2011 09:38:15 +0200
From: "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: Text::DAWG (was Re: Hashes are good, but not good enough.)
Message-Id: <ijt4ns$2d6t$1@ulysses.noc.ntua.gr>
"Bo Lindbergh" <blgl@stacken.kth.se> wrote in message
news:ijs089$8op$1@speranza.aioe.org...
> In article
> <3d380234-e788-4512-8b65-b9c5cc663c33@y31g2000prd.googlegroups.com>,
> scattered <tooscattered@gmail.com> wrote:
>> Having said all this, a DAWG implementation in Perl would be quite
>> nice, though it would clearly belong in CPAN rather than as a core
>> part of the language. It is hard to believe that it hasn't been done
>> by anybody
>
> Maybe the high resource usage during the construction phase
> makes people think it's not worth the effort to write?
>
> Anyway, since it isn't _completely_ useless, I named it Text::DAWG
> rather than Acme::DAWG. Coming soon to a CPAN mirror near you!
>
>
> /Bo Lindbergh
Also, try to avoid an object-oriented interface because it slows things
down; classic functions are preferred here.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3293
***************************************