[32382] in Perl-Users-Digest


Perl-Users Digest, Issue: 3649 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 26 06:09:27 2012

Date: Mon, 26 Mar 2012 03:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 26 Mar 2012     Volume: 11 Number: 3649

Today's topics:
        Filter content from a list: hard-coded expression or re <massion@gmx.de>
    Re: Filter content from a list: hard-coded expression o <rvtol+usenet@xs4all.nl>
        naming modules <oneingray@gmail.com>
    Re: naming modules <ben@morrow.me.uk>
    Re: naming modules <oneingray@gmail.com>
    Re: naming modules <rvtol+usenet@xs4all.nl>
    Re: Problem with splitting data <hjp-usenet2@hjp.at>
    Re: Problem with splitting data <hjp-usenet2@hjp.at>
    Re: Problem with splitting data <uri@stemsystems.com>
    Re: Problem with splitting data <hjp-usenet2@hjp.at>
    Re: Problem with splitting data <rvtol+usenet@xs4all.nl>
    Re: Problem with splitting data <rweikusat@mssgmbh.com>
    Re: Problem with splitting data <hjp-usenet2@hjp.at>
    Re: Problem with splitting data <uri@stemsystems.com>
    Re: Problem with splitting data <ben@morrow.me.uk>
    Re: yet another question about numbers and strings (hymie!)
    Re: yet another question about numbers and strings <ben@morrow.me.uk>
        Your Regex Brain <xahlee@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 25 Mar 2012 23:00:02 -0700 (PDT)
From: Francois Massion <massion@gmx.de>
Subject: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <6d70d89c-6209-4176-b04d-bf843cde30c6@9g2000vbq.googlegroups.com>

Newbie question:
I have a list of strings like the following list:

Log file content
a long date
the mandatory check
Mark text to replace

I want to keep only the strings which do not begin with certain words.
So far I have done it with a hard coded list of words but this list
may vary and can be very long. I wonder how I could read the list from
a file and achieve the same result.
Here is the code which works:

open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
@sentence = <INPUT>;
close(INPUT);
foreach $sentence (@sentence) {
	chomp $sentence;
	if ($sentence !~ m/^a |^the |^therefore /i) { # Actually a very long list
		push (@filteredresult,$sentence);
	}
}



------------------------------

Date: Mon, 26 Mar 2012 09:48:16 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <4f701f40$0$6902$e4fe514c@news2.news.xs4all.nl>


On 2012-03-26 08:00, Francois Massion wrote:

> Newbie question:

See also the beginners list @perl.org.


> [...]
> open(INPUT,'mytext.txt') || die("File cannot be opened!\n");

   my $infile = 'mytext.txt';

   open my $input, '<', $infile
     or die "Error opening '$infile': $!\n";


> @sentence =<INPUT>;

No need to slurp the file in, when you will process it by line.

   my @words = qw/ a the therefore /;

   my $re = join '|', @words;

   while ( <$input> ) {
       next if /^(?:$re)\x{20}/;
       ...;
   }
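
The original question was how to take the word list from a file rather than hard-coding it. A minimal sketch of that, not from the thread: the word list is read through an in-memory filehandle so the example runs as-is (in practice it would be opened from a file such as a hypothetical 'stopwords.txt', one word per line), and the sample input lines are the ones from the question. quotemeta and the /i flag are additions here, the latter to match the original poster's case-insensitive test.

```perl
use strict;
use warnings;

# Assumption: one word per line. A real script would open a file instead,
# e.g. open my $wl, '<', 'stopwords.txt' or die ...
my $wordlist = "a\nthe\ntherefore\n";
open my $wl, '<', \$wordlist or die "Cannot open word list: $!\n";
chomp( my @words = <$wl> );
close $wl;

# quotemeta keeps regex metacharacters in the list from being interpreted;
# empty lines are skipped so they cannot produce an empty alternative.
my $alt = join '|', map { quotemeta($_) } grep { length } @words;
my $re  = qr/^(?:$alt)\x{20}/i;

my @input = (
    'Log file content',
    'a long date',
    'the mandatory check',
    'Mark text to replace',
);

# Keep only the lines that do not begin with one of the listed words.
my @filtered = grep { !/$re/ } @input;
print "$_\n" for @filtered;
```

Compiling the pattern once with qr// means the alternation is built a single time, however long the list gets.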

-- 
Ruud


------------------------------

Date: Mon, 26 Mar 2012 13:23:23 +0700
From: Ivan Shmakov <oneingray@gmail.com>
Subject: naming modules
Message-Id: <867gy7rick.fsf@gray.siamics.net>

	[Cross-posting to news:comp.lang.perl.misc, as
	news:comp.lang.perl.modules doesn't seem too active.]

	I've decided to put certain Perl sources to CPAN, but I'm having
	trouble inventing some nice names for them.

	In particular, I understand that a module name shouldn't be
	comprised of just the lowercase letters, so to avoid a potential
	name clash with a future Perl pragma.  However, is there a
	reason to avoid all-lowercase names containing colons (as in,
	e. g., common::sense)?

	The other question is whether I should use foo::bar or
	App::Foo::Bar for the modules related to an application Foo?

	And then the third one.  I intend to provide a module to compute
	multiple SHA digests at the same time (as may be used, e. g., to
	generate Debian list files.)  The working title is
	Digest::SHA::combined, but I wonder if I should use
	Digest::SHA::Combined for consistency instead?
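
For readers wondering what "multiple SHA digests at the same time" could look like, here is a rough sketch under stated assumptions. It is not the poster's module: combined_sha_hex is a hypothetical helper name, the 64 KiB chunk size is arbitrary, and the data comes from an in-memory filehandle so the example is self-contained. It only uses the documented Digest::SHA object interface (new, add, hexdigest).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex sha256_hex);

# Hypothetical helper: read the handle once and feed every chunk
# to one Digest::SHA context per requested algorithm.
sub combined_sha_hex {
    my ( $fh, @algs ) = @_;
    my %ctx = map { ( $_ => Digest::SHA->new($_) ) } @algs;
    while ( read $fh, my $buf, 65536 ) {
        $_->add($buf) for values %ctx;
    }
    return map { ( $_ => $ctx{$_}->hexdigest ) } @algs;
}

my $data = "hello world\n" x 1000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
binmode $fh;

my %hex = combined_sha_hex( $fh, 1, 256 );
print "SHA-$_: $hex{$_}\n" for sort { $a <=> $b } keys %hex;
```

The point of the single pass is that the input is read only once, however many digests are requested.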

	TIA.

-- 
FSF associate member #7257


------------------------------

Date: Mon, 26 Mar 2012 08:18:42 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: naming modules
Message-Id: <irh549-khn2.ln1@anubis.morrow.me.uk>


Quoth Ivan Shmakov <oneingray@gmail.com>:
> 	[Cross-posting to news:comp.lang.perl.misc, as
> 	news:comp.lang.perl.modules doesn't seem too active.]
> 
> 	I've decided to put certain Perl sources to CPAN, but I'm having
> 	trouble inventing some nice names for them.
> 
> 	In particular, I understand that a module name shouldn't be
> 	comprised of just the lowercase letters, so to avoid a potential
> 	name clash with a future Perl pragma.  However, is there a
> 	reason to avoid all-lowercase names containing colons (as in,
> 	e. g., common::sense)?

I would say that module names with the first part all in lowercase
should be considered reserved: the core has warnings::register, for
instance, and there are version::AlphaBeta and version::Limit on CPAN
which are subclasses of version. OTOH there are quite a lot of modules
with parts after the first in lowercase: LWP::Protocol::*, for instance,
or quite a few of the DBD modules.

common::sense considers itself to be pragmatic in nature, since all it's
doing is turning on various core pragmas.

> 	The other question is whether I should use foo::bar or
> 	App::Foo::Bar for the modules related to an application Foo?

App:: is for implementations of applications, not modules which relate
to them. So, for instance, App::Ack implements the guts of ack(1),
rather than being an interface for calling it.

> 	And then the third one.  I intend to provide a module to compute
> 	multiple SHA digests at the same time (as may be used, e. g., to
> 	generate Debian list files.)  The working title is
> 	Digest::SHA::combined, but I wonder if I should use
> 	Digest::SHA::Combined for consistency instead?

Yes. Unless you've got a good reason not to, go with WikiWords with
initial caps and no underscores.

Ben



------------------------------

Date: Mon, 26 Mar 2012 15:24:09 +0700
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: naming modules
Message-Id: <86398vrcra.fsf@gray.siamics.net>

>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:

[...]

 >> In particular, I understand that a module name shouldn't be
 >> comprised of just the lowercase letters, so to avoid a potential
 >> name clash with a future Perl pragma.  However, is there a reason to
 >> avoid all-lowercase names containing colons (as in, e. g.,
 >> common::sense)?

 > I would say that module names with the first part all in lowercase
 > should be considered reserved: the core has warnings::register, for
 > instance, and there are version::AlphaBeta and version::Limit on CPAN
 > which are subclasses of version.  OTOH there are quite a lot of
 > modules with parts after the first in lowercase: LWP::Protocol::*,
 > for instance, or quite a few of the DBD modules.

	ACK.  Thanks!

	However, for LWP::Protocol::* it's due to the fact that the URI
	schema names are all-lowercase themselves, I suppose.

	Unfortunately, even though I dislike mixed-case identifiers, I
	have no good reason at hand to avoid it for my Perl code.

 > common::sense considers itself to be pragmatic in nature, since all
 > it's doing is turning on various core pragmas.

	There still is a potential clash should the Perl developers
	choose to implement their own common::sense.

 >> The other question is whether I should use foo::bar or
 >> App::Foo::Bar for the modules related to an application Foo?

 > App:: is for implementations of applications, not modules which
 > relate to them.  So, for instance, App::Ack implements the guts of
 > ack(1), rather than being an interface for calling it.

	Actually, there are to be modules that handle a format, or
	perhaps a family of formats, specific to this particular
	application.

	If App::MyApp::MyFormat doesn't fit, should it be, e. g.,
	Data::MyApp::MyFormat?

 >> And then the third one.  I intend to provide a module to compute
 >> multiple SHA digests at the same time (as may be used, e. g., to
 >> generate Debian list files.)  The working title is
 >> Digest::SHA::combined, but I wonder if I should use
 >> Digest::SHA::Combined for consistency instead?

 > Yes.  Unless you've got a good reason not to, go with WikiWords with
 > initial caps and no underscores.

	Having C, Shell and Lisp dialects as my "programming
	background", I find it quite hard to write in CamelCase.

PS.  I'll try to file an RT ticket against Digest::SHA on whether my
	module could be added to the distribution.

-- 
FSF associate member #7257


------------------------------

Date: Mon, 26 Mar 2012 10:41:55 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: naming modules
Message-Id: <4f702bd3$0$6852$e4fe514c@news2.news.xs4all.nl>

On 2012-03-26 10:24, Ivan Shmakov wrote:

> Unfortunately, even though I dislike mixed-case identifiers, I
> have no good reason at hand to avoid it for my Perl code.


package PurgeAndArchive;

use strict;
use warnings;

our $VERSION; BEGIN { $VERSION = "0.99" }

my %_cache;

 ...

1;  # satisfy require

-- 
Ruud


------------------------------

Date: Thu, 22 Mar 2012 20:40:11 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Problem with splitting data
Message-Id: <slrnjmn00r.9ba.hjp-usenet2@hrunkner.hjp.at>

On 2012-03-21 16:33, Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "TM" == Tim McDaniel <tmcd@panix.com> writes:
>  >> you don't even know the proper idiom to bypass the open call.
>
>  TM> I'm afraid I don't remember such a thing.  Will you please explain?
>  TM> Not `cat FOO`, I hope?
>
> nope.
>
> my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
>
> that is the (im)proper idiom for slurping in a file. no open needed as
> it is done by the <> on the values in @ARGV. slow as hell too!

Have you actually benchmarked this in the last 10 years?

On my systems 
    my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
and 
    my $text = read_file($filename);
are almost exactly the same speed for largish files (for very small
files the former is even a bit faster).

However,
    read_file($filename, buf_ref => \$text);
is a lot (factor 3-4) faster, since it avoids the extra copy.

All tests were made with files which were already cached in memory -
when the files have to be read from disk, all differences will probably
be negligible.
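
For reference, a small self-contained sketch of the idiom under discussion next to an explicit-open slurp. The temporary file and its contents are assumptions made only so the example runs; File::Slurp itself is left out because it is not a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the example is runnable anywhere.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "line 1\nline 2\n";
close $fh or die "Cannot close '$filename': $!\n";

# The idiom: the list assignment gives @ARGV the file name and leaves $/
# undefined, so <> opens the file itself and returns it in one read.
my $text = do { local ( @ARGV, $/ ) = $filename; <> };

# The same result with an explicit open and a localized $/.
my $text2 = do {
    open my $in, '<', $filename or die "Cannot open '$filename': $!\n";
    local $/;
    <$in>;
};

print length($text), " bytes via <>, ", length($text2), " bytes via open\n";
```

Both local-izations end with the do block, so @ARGV and $/ are restored for the rest of the program.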

	hp (satisfied user of File::Slurp)


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Thu, 22 Mar 2012 21:05:25 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Problem with splitting data
Message-Id: <slrnjmn1g5.9ba.hjp-usenet2@hrunkner.hjp.at>

On 2012-03-21 16:38, Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "RW" == Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>  RW> Yes. And it doesn't use some 380 lines of code written by you I
>  RW> wouldn't want to touch with a ten feet pole if only because of the
>  RW> O(n*n) algorithm used to deal with Windows text files:
>
>  RW> 		$buf =~ s/\015\012/\n/g if $is_win32 ;
>
> and you know a better faster way? try reading some winblows files your
> way and benchmark them. oh, there already is a benchmark script in the
> distro you can use. ever think that perl already does the same crap to
> convert cr/lf to \n in stdio? oh, you didn't think about that. it has to
> be done regardless of whose code it doing it.

Well, there are better and worse ways of doing it. If s/\015\012/\n/g
was indeed O(n²) it would be pretty bad for large files. However, a
simple test shows that it's actually O(n):

#!/usr/bin/perl
use warnings;
use strict;
use Time::HiRes qw(time);

for (my $lines = 1; $lines < 100_000_000; $lines *= 2) {
    my $text = (("a" x 80) . "\r\n") x $lines;
    my $len0 = length($text);
    my $t0 = time();
    $text =~ s/\r\n/\n/g;
    my $t1 = time();
    my $len1 = length($text);
    my $dt = $t1 - $t0;
    printf "%d %d %g %g\n", $len0, $len1, $dt, $len0 / $dt;
}
__END__
82 81 5.00679e-06 1.63778e+07
164 162 2.86102e-06 5.73222e+07
 ...
41984 41472 0.000169992 2.46976e+08
83968 82944 0.000283003 2.96704e+08
167936 165888 0.000559092 3.00373e+08
 ...
171966464 169869312 0.57812 2.97458e+08
343932928 339738624 1.17936 2.91627e+08
687865856 679477248 2.35096 2.92589e+08

(Note that the 4th column is almost constant at about 3E8)

On a side note, the script dies with the message 

    Substitution loop at repl line 10.

which is a bug in Perl I think (reproducible with perl 5.8.8, 5.10.1,
5.14.2 on x86_64).


> but no, you know better about these things.

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Sun, 25 Mar 2012 01:02:34 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Problem with splitting data
Message-Id: <87sjgxqnmd.fsf@stemsystems.com>

>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:

  PJH> On 2012-03-21 16:33, Uri Guttman <uri@stemsystems.com> wrote:

  >> my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
  >> 
  >> that is the (im)proper idiom for slurping in a file. no open needed as
  >> it is done by the <> on the values in @ARGV. slow as hell too!

  PJH> Have you actually benchmarked this in the last 10 years?

  PJH> On my systems 
  PJH>     my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
  PJH> and 
  PJH>     my $text = read_file($filename);
  PJH> are almost exactly the same speed for largish files (for very small
  PJH> files the former is even a bit faster).

  PJH> However,
  PJH>     read_file($filename, buf_ref => \$text);
  PJH> is a lot (factor 3-4) faster, since it avoids the extra copy.

yes. and that is mentioned in the docs as the fastest style of slurp.

and the benchmark script shows that as well. given that i rewrote the
benchmark script last year (to improve the structure, options and
such), you know i benchmarked all the slurps recently. even if something
tied file::slurp in some cases it doesn't in all cases and also they lack
the flexibility. so that means you won't lose any speed, likely will
gain a fair amount of speed and will also have more error and i/o
handling options.

  PJH> All tests were made with files which were already cached in memory -
  PJH> when the files have to be read from disk, all differences will probably
  PJH> be negligible.

the benchmark script uses Benchmark.pm and so it runs on the same files
many times. if you run the script twice in a row it will almost for sure
have the files cached in ram.

uri



------------------------------

Date: Sun, 25 Mar 2012 13:25:58 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Problem with splitting data
Message-Id: <slrnjmu067.909.hjp-usenet2@hrunkner.hjp.at>

On 2012-03-25 05:02, Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:
>
>  PJH> On 2012-03-21 16:33, Uri Guttman <uri@stemsystems.com> wrote:
>
>  >> my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
>  >> 
>  >> that is the (im)proper idiom for slurping in a file. no open needed as
>  >> it is done by the <> on the values in @ARGV. slow as hell too!
>
>  PJH> Have you actually benchmarked this in the last 10 years?
>
>  PJH> On my systems 
>  PJH>     my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
>  PJH> and 
>  PJH>     my $text = read_file($filename);
>  PJH> are almost exactly the same speed for largish files (for very small
>  PJH> files the former is even a bit faster).
>
>  PJH> However,
>  PJH>     read_file($filename, buf_ref => \$text);
>  PJH> is a lot (factor 3-4) faster, since it avoids the extra copy.
>
> yes. and that is mentioned in the docs as the fastest style of slurp.

It is not, however, mentioned in the synopsis.

I bet most users just use
    my $text = read_file($filename);

OTOH, performance probably isn't an issue for most users.

> and the benchmark script shows that as well. given that i rewrote the
> benchmark script last year (to improve the structure, options and
> such), you know i benchmarked all the slurps recently.

Your benchmark script doesn't include the case 
    $text = do { local( @ARGV, $/ ) = $filename ; <> } ;

It includes a case 
    my $text = orig_slurp_scalar( $file_name )

where orig_slurp_scalar then calls orig_slurp, which does the above. So
that adds two function calls and at least one, more likely several extra
copies (I don't know how scalar returns are implemented in perl). 

I have added this to the end of bench_scalar_slurp and rerun the script:

                direct_slurp_scalar =>
                        sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },

The result is surprising. I would have expected that to be about as fast
as FS::read_file (because that's what I've seen in my own benchmarks),
but it's a lot faster, even faster than FS::read_file_buf_ref2:

                           Rate  orig_slurp  FS::read_file  FS::read_file_buf_ref2 direct_slurp_scalar
file_contents             169/s        -76%           -81%                    -90%                -92%
file_contents_no_OO       170/s        -75%           -81%                    -90%                -92%
orig_read_file            560/s        -19%           -39%                    -67%                -73%
orig_slurp                694/s          --           -24%                    -59%                -66%
FS12::read_file           907/s         31%            -0%                    -46%                -56%
FS::read_file             910/s         31%             --                    -46%                -55%
old_sysread_file          919/s         32%             1%                    -45%                -55%
FS::read_file_scalar_ref 1047/s         51%            15%                    -37%                -49%
FS::read_file_buf_ref    1051/s         52%            15%                    -37%                -49%
old_read_file            1232/s         78%            35%                    -26%                -40%
FS::read_file_buf_ref2   1673/s        141%            84%                      --                -18%
direct_slurp_scalar      2043/s        195%           124%                     22%                  --

(irrelevant columns omitted)

I wonder if there is a systematic error here ...
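
A stripped-down way to look at the sub-call overhead in isolation, as a sketch rather than a replacement for the File::Slurp benchmark script: slurp_via_sub is a hypothetical wrapper standing in for the extra function call, and the file size and iteration count are arbitrary. Only the core Benchmark and File::Temp modules are used.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use File::Temp qw(tempfile);

# Arbitrary test file: 1000 lines of 80 characters.
my ( $fh, $file_name ) = tempfile( UNLINK => 1 );
my $content = ( "a" x 80 . "\n" ) x 1000;
print {$fh} $content;
close $fh or die "Cannot close '$file_name': $!\n";

# Hypothetical wrapper: same idiom, but behind one function call.
sub slurp_via_sub { local ( @ARGV, $/ ) = $_[0]; return <> }

# Both cases assign the result to a variable so they stay comparable.
my $results = timethese(
    5000,
    {
        direct  => sub { my $text = do { local ( @ARGV, $/ ) = $file_name; <> } },
        wrapped => sub { my $text = slurp_via_sub($file_name) },
    },
    'none',
);
cmpthese($results);
```

The numbers will vary by machine and perl version; the sketch only separates "idiom inline" from "idiom behind a call".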

>  PJH> All tests were made with files which were already cached in memory -
>  PJH> when the files have to be read from disk, all differences will probably
>  PJH> be negligible.
>
> the benchmark script uses Benchmark.pm and so it runs on the same files
> many times. if you run the script twice in a row it will almost for sure
> have the files cached in ram.

Yes, I know. I just wanted to mention that in real life the files you
have to read are not always already in memory, but often on disk, which
is a lot slower. So my benchmarks (like yours) exaggerate the
differences (If you have to wait for 20 disk seeks it doesn't matter if
you save 1 millisecond or not).

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Sun, 25 Mar 2012 15:13:32 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Problem with splitting data
Message-Id: <4f6f19fc$0$6926$e4fe514c@news2.news.xs4all.nl>

On 2012-03-25 13:25, Peter J. Holzer wrote:

>  direct_slurp_scalar =>
>    sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<>  } },

What is the role of the "my $text = do {...}" wrapper?

I would expect just:

   direct_slurp_scalar =>
     sub { local( @ARGV, $/ ) = $file_name; <> },

-- 
Ruud


------------------------------

Date: Sun, 25 Mar 2012 15:55:16 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Problem with splitting data
Message-Id: <87398wraqz.fsf@sapphire.mobileactivedefense.com>

"Dr.Ruud" <rvtol+usenet@xs4all.nl> writes:
> On 2012-03-25 13:25, Peter J. Holzer wrote:
>
>>  direct_slurp_scalar =>
>>    sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<>  } },
>
> What is the role of the "my $text = do {...}" wrapper?

Make the code appear more complicated than it actually is.


------------------------------

Date: Sun, 25 Mar 2012 17:51:46 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Problem with splitting data
Message-Id: <slrnjmufoi.46e.hjp-usenet2@hrunkner.hjp.at>

On 2012-03-25 13:13, Dr.Ruud <rvtol+usenet@xs4all.nl> wrote:
> On 2012-03-25 13:25, Peter J. Holzer wrote:
>>  direct_slurp_scalar =>
>>    sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<>  } },
>
> What is the role of the "my $text = do {...}" wrapper?
>
> I would expect just:
>
>    direct_slurp_scalar =>
>      sub { local( @ARGV, $/ ) = $file_name; <> },

All the other benchmarks assign the result to a variable. So I
have to do that here, too, to make the results comparable.

There are various ways in which the assignments can happen, so it makes
sense to benchmark the effect of those ways. Just throwing away the
result doesn't make much sense, however.

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Sun, 25 Mar 2012 20:21:45 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Problem with splitting data
Message-Id: <87obrkqkiu.fsf@stemsystems.com>

>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:

  PJH> Your benchmark script doesn't include the case 
  PJH>     $text = do { local( @ARGV, $/ ) = $filename ; <> } ;

  PJH> It includes a case 
  PJH>     my $text = orig_slurp_scalar( $file_name )

  PJH> where orig_slurp_scalar then calls orig_slurp, which does the above. So
  PJH> that adds two function calls and at least one, more likely several extra
  PJH> copies (I don't know how scalar returns are implemented in perl). 

true. i didn't account for the overhead in the extra sub calls.

  PJH> I have added this to the end of bench_scalar_slurp and rerun the script:

  PJH>                 direct_slurp_scalar =>
  PJH>                         sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },

  PJH> The result is surprising. I would have expected that to be about as fast
  PJH> as FS::read_file (because that's what I've seen in my own benchmarks),
  PJH> but it's a lot faster, even faster than FS::read_file_buf_ref2:

what size file are you testing? the script has the option of selecting
multiple file sizes. slurp's speed wins more for larger files as it has
less overhead (much of that is in arg processing and error checking).

  PJH>                            Rate  orig_slurp  FS::read_file  FS::read_file_buf_ref2 direct_slurp_scalar
  PJH> file_contents             169/s        -76%           -81%                    -90%                -92%
  PJH> file_contents_no_OO       170/s        -75%           -81%                    -90%                -92%
  PJH> orig_read_file            560/s        -19%           -39%                    -67%                -73%
  PJH> orig_slurp                694/s          --           -24%                    -59%                -66%
  PJH> FS12::read_file           907/s         31%            -0%                    -46%                -56%
  PJH> FS::read_file             910/s         31%             --                    -46%                -55%
  PJH> old_sysread_file          919/s         32%             1%                    -45%                -55%
  PJH> FS::read_file_scalar_ref 1047/s         51%            15%                    -37%                -49%
  PJH> FS::read_file_buf_ref    1051/s         52%            15%                    -37%                -49%
  PJH> old_read_file            1232/s         78%            35%                    -26%                -40%
  PJH> FS::read_file_buf_ref2   1673/s        141%            84%                      --                -18%
  PJH> direct_slurp_scalar      2043/s        195%           124%                     22%                  --

i wouldn't call that much faster. also as i said, file sizes matter
too. and perl could have improved the guts of <> since i first wrote
that (it needed it badly). even so, it is such a fugly idiom that i
would never teach it.

  PJH> I wonder if there is a systematic error here ...

  PJH> All tests were made with files which were already cached in memory -
  PJH> when the files have to be read from disk, all differences will probably
  PJH> be negligible.

not exactly as requesting larger reads is still faster than what stdio
would do. but sure, disk is much slower than ram as we all know.

when i get to the next version (maybe in a couple of weeks) i will add
your entry to the benchmark. i have a couple of other minor fixes to
make.

uri


------------------------------

Date: Mon, 26 Mar 2012 02:04:59 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Problem with splitting data
Message-Id: <rur449-8td2.ln1@anubis.morrow.me.uk>


Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
> 
> Your benchmark script doesn't include the case 
>     $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
> 
> It includes a case 
>     my $text = orig_slurp_scalar( $file_name )
> 
> where orig_slurp_scalar then calls orig_slurp, which does the above. So
> that adds two function calls and at least one, more likely several extra
> copies (I don't know how scalar returns are implemented in perl). 

The value to be returned gets pushed onto the stack, and the returned-to
opcode pops it off again. What's more interesting is how scalar
assignment is implemented: normally the various parts of the source
scalar are copied into the destination scalar; but if the source is a
temporary (plus a number of other conditions), the string part will be
directly transferred from one scalar to the other, without copying it.

I strongly suspect (but can't seem to find a way to prove) that

    $text = do {...; <> };

fulfils the conditions for copy-free assignment, but

    $text = do { ...; my $x = <>; $x };

does not, since the $x return value is not a temporary (it's a lexical).

Ben



------------------------------

Date: 25 Mar 2012 12:07:54 GMT
From: hymie@lactose.homelinux.net (hymie!)
Subject: Re: yet another question about numbers and strings
Message-Id: <4f6f0a9a$0$32592$882e7ee2@usenet-news.net>

In our last episode, the evil Dr. Lacto had captured our hero,
  Ben Morrow <ben@morrow.me.uk>, who said:
>
>Quoth hymie@lactose.homelinux.net (hymie!):
>> In our last episode, the evil Dr. Lacto had captured our hero,
>>   Ben Morrow <ben@morrow.me.uk>, who said:
>> >Quoth hymie@lactose.homelinux.net (hymie!):
>> 
>> >Odd. What do you get from this?
>> >
>> >    use DBI;
>> >    use Devel::Peek;
>> >
>> >    my $dbh = DBI->connect("dbi:Sybase:...", ...);
>> >    my $bigint = $dbh->selectcol_arrayref(<<SQL);
>> >        SELECT bigint FROM table WHERE ...
>> >    SQL
>> >
>> >    Dump $bigint;
>> 
>> SV = PVNV(0x7c9f60) at 0x7eef40
>
>Have you stripped some of the Dump output, or did you run something
>different? I was expecting $bigint to be an arrayref.

I'm sorry, I misunderstood.  I didn't actually use selectcol_arrayref.
I just took my existing script and added Devel::Peek to it.

Here is one result:

SV = RV(0x759088) at 0x63e1a0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,ROK)
  RV = 0x6f9350
  SV = PVAV(0x6f8b90) at 0x6f9350
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
    IV = 0
    NV = 0
    ARRAY = 0x7eba90
    FILL = 9
    MAX = 13
    ARYLEN = 0x0
    FLAGS = (REAL)
    Elt No. 0
    SV = NV(0x5289b8) at 0x7d49a0
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 1.71104846977343e+18
    Elt No. 1
    SV = NV(0x5289c0) at 0x7eb090
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 1.69309275828193e+18
    Elt No. 2
    SV = NV(0x5289d0) at 0x7eb0a0
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 1.69318491698104e+18
    Elt No. 3
    SV = NV(0x5289d8) at 0x7eb0b0
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 1.68108084934812e+18

Here is a second result:

SV = RV(0x759938) at 0x630fb0
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,ROK)
  RV = 0x6f9590
  SV = PVAV(0x6f9040) at 0x6f9590
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY)
    IV = 0
    NV = 0
    ARRAY = 0x7ec180
    FILL = 9
    MAX = 13
    ARYLEN = 0x0
    FLAGS = (REAL)
    Elt No. 0
    SV = NV(0x5313a8) at 0x7eb4c0
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 9.21814636810375e+18
    Elt No. 1
    SV = NV(0x5313b0) at 0x7eb620
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 9.21474030305498e+18
    Elt No. 2
    SV = NV(0x5313c0) at 0x7eb630
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 9.21281446114217e+18
    Elt No. 3
    SV = NV(0x5313c8) at 0x7eb640
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      NV = 9.2061546186557e+18


I tried to do a "SELECT TOP 10" and "SELECT TOP 20" , but it only seems
to report 4.

>Are you *sure* you're using the newer version?

I couldn't find a specific "print version" command to add to my script,
but I try this:

use DBD::Sybase 1.15;

the resulting error is

DBD::Sybase version 1.15 required--this is only version 1.14 at ./test.pl
line 13.

So yes, I'm as sure as I can be that I'm using the newer version.

> (And that you didn't
>define SYB_NATIVE_NUM?)

Not that I can tell:

/*
 *
 * #define SYB_NATIVE_NUM
 */


>Something else you could try, if MSSQL
>will let you, is to cast the bigint to varchar or something equivalent
>before returning it. CAST(column AS varchar) would be the SQL92 syntax;
>I don't know if MSSQL implements that or if there's some equivalent.

Yes, that exists and works, if I know the names of the columns in advance.
My intent was to make this a generic script that would work on any table.
But maybe I can do something with that idea.

--hymie!    http://lactose.homelinux.net/~hymie    hymie@lactose.homelinux.net
-------------------------------------------------------------------------------


------------------------------

Date: Sun, 25 Mar 2012 16:07:36 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: yet another question about numbers and strings
Message-Id: <ouo349-fh02.ln1@anubis.morrow.me.uk>


Quoth hymie@lactose.homelinux.net (hymie!):
> In our last episode, the evil Dr. Lacto had captured our hero,
>   Ben Morrow <ben@morrow.me.uk>, who said:
> >
> >Quoth hymie@lactose.homelinux.net (hymie!):
> >> In our last episode, the evil Dr. Lacto had captured our hero,
> >>   Ben Morrow <ben@morrow.me.uk>, who said:
> >> >Quoth hymie@lactose.homelinux.net (hymie!):
> >> 
> >> >Odd. What do you get from this?
> >> >
> >> >    use DBI;
> >> >    use Devel::Peek;
> >> >
> >> >    my $dbh = DBI->connect("dbi:Sybase:...", ...);
> >> >    my $bigint = $dbh->selectcol_arrayref(<<SQL);
> >> >        SELECT bigint FROM table WHERE ...
> >> >    SQL
> >> >
> >> >    Dump $bigint;
> >> 
> >> SV = PVNV(0x7c9f60) at 0x7eef40
> >
> >Have you stripped some of the Dump output, or did you run something
> >different? I was expecting $bigint to be an arrayref.
> 
> I'm sorry, I misunderstood.  I didn't actually use selectcol_arrayref.
> I just took my existing script and added Devel::Peek to it.
> 
> Here is one result:
> 
<snip>
>     Elt No. 0
>     SV = NV(0x5289b8) at 0x7d49a0
>       REFCNT = 1
>       FLAGS = (NOK,pNOK)
>       NV = 1.71104846977343e+18

 ...and they're still NVs. (In this case pure NVs, since the values have
never been stringified.) Odd.
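
Why NVs matter for values of this size: a double has a 53-bit mantissa, so integers above 2**53 (9007199254740992) cannot all be represented, and the dumped values are around 1.7e18. A small sketch of the effect, with a made-up 19-digit value; pack/unpack 'd' is used here only to force a round trip through a native double, which is what an NV is on most (not all) perl builds.

```perl
use strict;
use warnings;
use Math::BigInt;

# Made-up value of the same magnitude as the dumped NVs, kept as a string.
my $exact = '1711048469773430017';

# Round trip through a native double, then print it without an exponent.
my $as_double = unpack 'd', pack 'd', $exact;
my $rounded   = sprintf '%.0f', $as_double;

print "exact:   $exact\n";
print "rounded: $rounded\n";
print "low-order digits were lost\n" if $rounded ne $exact;

# Kept as a string (or handed to Math::BigInt), every digit survives.
my $big = Math::BigInt->new($exact);
print "bigint:  ", $big->bstr, "\n";
```

This is also why fetching the column as a string, as suggested with CAST, sidesteps the problem.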

> I tried to do a "SELECT TOP 10" and "SELECT TOP 20" , but it only seems
> to report 4.

Yes, that's a feature of Devel::Peek, to stop it printing too much. I
can never remember how to turn it off... It doesn't matter, I only needed
to see one row, anyway.

> >Are you *sure* you're using the newer version?
> 
> I couldn't find a specific "print version" command to add to my script,

The correct answer is

    warn "Using DBD::Sybase version " . DBD::Sybase->VERSION;

though there are compelling reasons for no one to take advantage of the
fact ->VERSION is a method, so in practice

    warn "Using DBD::Sybase version $DBD::Sybase::VERSION";

is fine.
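
The same two spellings tried on a core module, as a sketch for anyone without DBD::Sybase installed; List::Util and the version number 9999 are stand-ins chosen for the example. It also shows that ->VERSION with an argument dies when the installed version is too old, which is the check "use Module VERSION" relies on.

```perl
use strict;
use warnings;
use List::Util ();

# Method form and package-variable form report the same value.
my $via_method   = List::Util->VERSION;
my $via_variable = $List::Util::VERSION;
print "Using List::Util version $via_method\n";

# With an argument, ->VERSION dies if the module is older than requested.
my $ok = eval { List::Util->VERSION(9999); 1 };
print "version check failed as expected: $@" unless $ok;
```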

> but I try this:
> 
> use DBD::Sybase 1.15;
> 
> the resulting error is
> 
> DBD::Sybase version 1.15 required--this is only version 1.14 at ./test.pl
> line 13.

That works too.

ISTM you've got to the point where this can be reported as a bug in
DBD::Sybase. It's definitely not behaving as it should.

Ben



------------------------------

Date: Sat, 24 Mar 2012 16:30:28 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Your Regex Brain
Message-Id: <ea530c3a-3c1a-49af-97f1-ddfd0a70ba1c@to5g2000pbc.googlegroups.com>

〈Your Regex Brain〉
http://xahlee.org/comp/your_regex_brain.html

Yours truly,

 Xah


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3649
***************************************

