[32383] in Perl-Users-Digest


Perl-Users Digest, Issue: 3650 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Mar 26 21:09:28 2012

Date: Mon, 26 Mar 2012 18:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 26 Mar 2012     Volume: 11 Number: 3650

Today's topics:
    Re: Filter content from a list: hard-coded expression o <rweikusat@mssgmbh.com>
    Re: Filter content from a list: hard-coded expression o <massion@gmx.de>
    Re: Filter content from a list: hard-coded expression o <rweikusat@mssgmbh.com>
    Re: Filter content from a list: hard-coded expression o <cartercc@gmail.com>
    Re: Filter content from a list: hard-coded expression o <ben@morrow.me.uk>
    Re: Filter content from a list: hard-coded expression o <tzz@lifelogs.com>
    Re: Filter content from a list: hard-coded expression o <ben@morrow.me.uk>
    Re: naming modules <ben@morrow.me.uk>
    Re: naming modules <rweikusat@mssgmbh.com>
        perldoc: the key to perl <xahlee@gmail.com>
        Problem using OODoc::Meta and customer user properties. (Fergus McMenemie)
    Re: Problem using OODoc::Meta and customer user propert <ben@morrow.me.uk>
    Re: Problem with splitting data <hjp-usenet2@hjp.at>
    Re: Your Regex Brain sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 26 Mar 2012 14:59:48 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <87vclr4g4r.fsf@sapphire.mobileactivedefense.com>

Francois Massion <massion@gmx.de> writes:
> I have a list of strings like the following list:
>
> Log file content
> a long date
> the mandatory check
> Mark text to replace
>
> I want to keep only the strings which do not begin with certain words.
> So far I have done it with a hard coded list of words but this list
> may vary and can be very long. I wonder how I could read the list from
> a file and achieve the same result.
> Here the code which works:
>
> open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
> @sentence = <INPUT>;
> close(INPUT);
> foreach $sentence (@sentence) {
> 	chomp $sentence;
> 	if ($sentence !~ m/^a |^the |^therefore /i) { # Actually a very long list
> 	push (@filteredresult,$sentence);
> }

My suggestion would be to put the exclusion list into a hash (this is
uncompiled example code), ie,

open($fh, '<', '/path/to/list');
%excls = map { chomp; $_, 1; } <$fh>;

and then check it as follows:

next if $sentence =~ /^(\W*)/ && $excls{lc($1)};

(push coming after this line) or

push(@result, $sentence) unless $sentence =~ /^(\W*)/ && $excls{lc($1)}
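
Put together, the hash approach might look like the sketch below. It is only an illustration: the sub name filter_by_first_word and the in-memory sample data are made up for the example, and it assumes the intent is to look up the leading word, so it captures with \w+ (the \W* above would capture leading non-word characters instead).

```perl
use strict;
use warnings;

# Hypothetical helper: keep the sentences whose first word is not excluded.
# $excl_fh yields one excluded word per line; $in_fh yields the sentences.
sub filter_by_first_word {
    my ($excl_fh, $in_fh) = @_;

    # Exclusion words become lower-cased hash keys for quick lookup.
    my %excls = map { chomp; (lc($_) => 1) } <$excl_fh>;

    my @result;
    while (my $sentence = <$in_fh>) {
        chomp $sentence;
        # Assumption: the first run of word characters is "the first word".
        next if $sentence =~ /^(\w+)/ && $excls{lc($1)};
        push @result, $sentence;
    }
    return @result;
}

# In-memory filehandles stand in for the real files.
my $excl_text = "a\nthe\ntherefore\n";
my $sentences = "Log file content\na long date\n"
              . "the mandatory check\nMark text to replace\n";
open my $excl_fh, '<', \$excl_text or die "open: $!";
open my $in_fh,   '<', \$sentences or die "open: $!";

print "$_\n" for filter_by_first_word($excl_fh, $in_fh);
```

With the sample data this should leave only the "Log file content" and "Mark text to replace" lines.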


------------------------------

Date: Mon, 26 Mar 2012 07:41:37 -0700 (PDT)
From: Francois Massion <massion@gmx.de>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <18881864-4505-4891-a452-e17f9d0d2dc4@q11g2000vbu.googlegroups.com>

On 26 Mar., 15:59, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Francois Massion <mass...@gmx.de> writes:
> > I have a list of strings like the following list:
>
> > Log file content
> > a long date
> > the mandatory check
> > Mark text to replace
>
> > I want to keep only the strings which do not begin with certain words.
> > So far I have done it with a hard coded list of words but this list
> > may vary and can be very long. I wonder how I could read the list from
> > a file and achieve the same result.
> > Here the code which works:
>
> > open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
> > @sentence = <INPUT>;
> > close(INPUT);
> > foreach $sentence (@sentence) {
> >    chomp $sentence;
> >    if ($sentence !~ m/^a |^the |^therefore /i) { # Actually a very long list
> >    push (@filteredresult,$sentence);
> > }
>
> My suggestion would be to put the exclusion list into a hash (this is
> uncompiled example code), ie,
>
> open($fh, '<', '/path/to/list');
> %excls = map { chomp; $_, 1; } <$fh>;
>
> and then check it as follows:
>
> next if $sentence =~ /^(\W*)/ && $excls{lc($1)};
>
> (push coming after this line) or
>
> push(@result, $sentence) unless $sentence =~ /^(\W*)/ && $excls{lc($1)}
>
>

I have tested 2 versions, unsuccessfully:

Version # 1 (based on Rainer's suggestion):
#!/usr/bin/perl -w

my $infile = 'a.txt';
open my $input, '<', $infile;
open($fh, '<', 'b.txt');
%excls = map { chomp; $_, 1; } <$fh>;
next if $input =~ /^(\W*)/ && $excls{lc($1)};
push(@result, $input) unless $input =~ /^(\W*)/ && $excls{lc($1)} ;
foreach (@result) {
	print "$_\n";
}

RESULT: GLOB(0x36f178)
(No idea what this means)

Version # 2 (based on Dr Ruud and Ben's suggestion; sorry if I messed
it up):

#!/usr/bin/perl -w

   my $infile = 'a.txt';

   open my $input, '<', $infile;
   open my $WORDS, '<', 'b.txt';
   my @words = <$WORDS>;
   my $re = join "|", map quotemeta, @words;
   while ( <$input> ) {
       next if /^(?:$re)\x{20}/;
        push (@filteredresult,$input);

foreach (@filteredresult) {
	print "$_\n";
}}

RESULT:
GLOB(0x1ff178)
GLOB(0x1ff178)
GLOB(0x1ff178)
 ...



------------------------------

Date: Mon, 26 Mar 2012 16:06:16 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <87ehsf4d1z.fsf@sapphire.mobileactivedefense.com>

Francois Massion <massion@gmx.de> writes:
> On 26 Mar., 15:59, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
>> Francois Massion <mass...@gmx.de> writes:
>> > I have a list of strings like the following list:
>>
>> > Log file content
>> > a long date
>> > the mandatory check
>> > Mark text to replace
>>
>> > I want to keep only the strings which do not begin with certain words.
>> > So far I have done it with a hard coded list of words but this list
>> > may vary and can be very long. I wonder how I could read the list from
>> > a file and achieve the same result.
>> > Here the code which works:
>>
>> > open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
>> > @sentence = <INPUT>;
>> > close(INPUT);
>> > foreach $sentence (@sentence) {
>> >    chomp $sentence;
>> >    if ($sentence !~ m/^a |^the |^therefore /i) { # Actually a very long
>> > list
>> >    push (@filteredresult,$sentence);
>> > }
>>
>> My suggestion would be to put the exclusion list into a hash (this is
>> uncompiled example code), ie,
>>
>> open($fh, '<', '/path/to/list');
>> %excls = map { chomp; $_, 1; } <$fh>;
>>
>> and then check it as follows:
>>
>> next if $sentence =~ /^(\W*)/ && $excls{lc($1)};
>>
>> (push coming after this line) or
>>
>> push(@result, $sentence) unless $sentence =~ /^(\W*)/ && $excls{lc($1)}
>>
>>
>
> I have tested 2 versions, unsuccessfully:
>
> Version # 1 (based on Rainer's suggestion):
> #!/usr/bin/perl -w
>
> my $infile = 'a.txt';
> open my $input, '<', $infile;
> open($fh, '<', 'b.txt');
> %excls = map { chomp; $_, 1; } <$fh>;
> next if $input =~ /^(\W*)/ && $excls{lc($1)};
> push(@result, $input) unless $input =~ /^(\W*)/ && $excls{lc($1)} ;
> foreach (@result) {
> 	print "$_\n";
> }
>
> RESULT: GLOB(0x36f178)
> (No idea what this means)

The reason why I wrote 'you can do this OR that' was that these were
supposed to be mutually exclusive options. Also, you obviously need
some kind of input processing loop and test the condition against the
sentences, NOT against the result of stringfying the input file handle
(which is 'some glob').


------------------------------

Date: Mon, 26 Mar 2012 08:49:28 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <5ba684e8-f70c-4552-9f8f-0cd051c86510@px4g2000pbc.googlegroups.com>

On Mar 26, 2:00 am, Francois Massion <mass...@gmx.de> wrote:
> Newbie question:
> I have a list of strings like the following list:
>
> Log file content
> a long date
> the mandatory check
> Mark text to replace
>
> I want to keep only the strings which do not begin with certain words.

It would have been more helpful (for me, anyway) if you had posted
your actual data, but that's okay.

I have found that these kinds of tasks often decompose into a
particular pattern, illustrated below. The pattern has three phases:
(1) read the file contents into a data structure, (2) munge the data,
and (3) write the data to a file. The following (hypothetical) script
illustrates this:

#! perl
use strict;
use warnings;

my %data;
read_file_contents();
munge_data();
write_data_to_file();
exit(0);

sub read_file_contents
{
  open FILE, '<', 'data_file.csv' or die "$!";
  while (<FILE>)
  {
    next unless /\w/; #skip empty lines
    next if /your REGEX to skip/; #skip unneeded lines
    chomp;
    my ($val1, $val2, $val3, ...) = split(/?/, $_); #? stands for your delimiter
    $data{$val1} = {
      KEY2 => $val2,
      KEY3 => $val3,
      KEY4 => $val4,
      ...,
    };
  }
  close FILE;
}
sub munge_data
{
  #you now have your data in a convenient structure
  #so you can manipulate it how you please
  foreach my $key (keys %data) { munge_record($data{$key}); }
}
sub write_data_to_file
{
  open OUT, '>', 'output.csv' or die "$!";
  print OUT qq("COL1","COL2","COL3", ...);
  foreach my $key (keys %data)
  {
    print OUT qq("$key","$data{$key}{KEY2}"," ...);
  }
  close OUT;
}
sub munge_record
{
  my $record = shift;
  # munge here
}


------------------------------

Date: Mon, 26 Mar 2012 18:05:18 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <e7k649-gcp1.ln1@anubis.morrow.me.uk>


Quoth Francois Massion <massion@gmx.de>:
> 
> I have tested 2 versions, unsuccessfully:
> 
> Version # 1 (based on Rainer's suggestion):
> #!/usr/bin/perl -w
> 
> my $infile = 'a.txt';
> open my $input, '<', $infile;
> open($fh, '<', 'b.txt');
> %excls = map { chomp; $_, 1; } <$fh>;
> next if $input =~ /^(\W*)/ && $excls{lc($1)};
> push(@result, $input) unless $input =~ /^(\W*)/ && $excls{lc($1)} ;
> foreach (@result) {
> 	print "$_\n";
> }
> 
> RESULT: GLOB(0x36f178)
> (No idea what this means)

In this case it means 'this is a filehandle'. 

> Version # 2 (based on Dr Ruud and Ben's suggestion; sorry if I messed
> it up):
> 
> #!/usr/bin/perl -w
> 
>    my $infile = 'a.txt';
> 
>    open my $input, '<', $infile;
>    open my $WORDS, '<', 'b.txt';
>    my @words = <$WORDS>;
>    my $re = join "|", map quotemeta, @words;
>    while ( <$input> ) {
>        next if /^(?:$re)\x{20}/;
>         push (@filteredresult,$input);

    push (@filteredresult, $_);

$input is the filehandle you are reading from; the line of text you just
read is in $_, because perl converts

    while (<$input>) {

into

    while (defined($_ = <$input>)) {

for convenience.
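
A small sketch of the difference, using an in-memory filehandle so that it runs on its own (the sample text and the variable names are invented for the illustration):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";
open my $input, '<', \$text or die "open: $!";

# Stringifying the handle itself gives something like GLOB(0x...),
# which is what ended up in @filteredresult.
my $handle_as_string = "$input";

my @lines;
while (<$input>) {      # shorthand for: while (defined($_ = <$input>))
    chomp;              # operates on $_, the line just read
    push @lines, $_;    # store the line, not the handle
}

print "handle: $handle_as_string\n";
print "line:   $_\n" for @lines;
```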

Ben



------------------------------

Date: Mon, 26 Mar 2012 20:06:22 -0400
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <87mx72q54x.fsf@lifelogs.com>

On Mon, 26 Mar 2012 07:41:37 -0700 (PDT) Francois Massion <massion@gmx.de> wrote: 

FM> I have tested 2 versions, unsuccessfully:

Hi Francois,

if you're OK with using different tools, maybe try the GNU egrep tool.

Given files a and b:

% grep . a b
a:1
a:2
a:3
a:4
a:5
b:^[12]
b:^[4]

You can just use the -f option to read patterns from b to filter a:

% egrep -f b a
1
2
4

This approach may work better for you, depending on the OS platforms you
have to support, the size of the file, and the complexity of the regular
expressions.  Try it out.
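
If you would rather stay in Perl, roughly the same thing can be sketched as below. This is an illustration, not a drop-in replacement for egrep: the sub name grep_with_patterns is made up, the arrays stand in for the contents of files a and b, and each pattern line is compiled as a Perl regex, which is not identical to egrep's syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match at least one pattern.
# Each pattern is used as a regex as-is, not as a literal word.
sub grep_with_patterns {
    my ($patterns, $lines) = @_;
    my @compiled = map { qr/$_/ } grep { length($_) } @$patterns;
    my @kept;
    for my $line (@$lines) {
        push @kept, $line if grep { $line =~ $_ } @compiled;
    }
    return @kept;
}

my @file_a = (1 .. 5);              # contents of a, one item per line
my @file_b = ('^[12]', '^[4]');     # contents of b, one regex per line

print "$_\n" for grep_with_patterns(\@file_b, \@file_a);
```

Like the egrep call above, this keeps the matching lines. To drop them instead (which is what the original question asks for, and what grep -v does), negate the test inside the loop.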

Ted


------------------------------

Date: Mon, 26 Mar 2012 12:25:56 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Filter content from a list: hard-coded expression or read from a file?
Message-Id: <4b0649-ils1.ln1@anubis.morrow.me.uk>


Quoth "Dr.Ruud" <rvtol+usenet@xs4all.nl>:
> 
> On 2012-03-26 08:00, Francois Massion wrote:
> 
> > Newbee question:
> 
> See also the beginners list @perl.org.
> 
> 
> > [...]
> > open(INPUT,'mytext.txt') || die("File cannot be opened!\n");
> 
>    my $infile = 'mytext.txt';
> 
>    open my $input, '<', $infile
>      or die "Error opening '$infile': $!\n";

or

    use autodie;

> > @sentence =<INPUT>;
> 
> No need to slurp the file in, when you will process it by line.
> 
>    my @words = qw/ a the therefore /;

The spec was to read it from a file, so

    open my $WORDS, "<", "words";
    my @words = <$WORDS>;

(I wouldn't worry about closing it in a tiny script like this. In
something longer I would let the filehandle close automatically at the
end of the scope, unless I had a reason to check the return value of
close().)

>    my $re = join '|', @words;

These are supposed to be words, not patterns.

    my $re = join "|", map quotemeta, @words;

>    while ( <$input> ) {
>        next if /^(?:$re)\x{20}/;
>        ...;
>    }
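
One detail to guard against when the words come from a file: each element of @words still carries its trailing newline, and quotemeta escapes that newline into the pattern, so nothing matches until the list is chomped. A minimal sketch of the whole filter follows, with in-memory data and a made-up sub name (build_prefix_re); the /i flag is an assumption carried over from the original hard-coded pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: build one anchored regex from a list of literal words.
sub build_prefix_re {
    my @words = @_;
    chomp @words;                                   # drop trailing newlines
    @words = grep { length($_) } @words;            # ignore blank lines
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/^(?:$alternation)\x{20}/i;            # a word, then a space
}

my $word_text = "a\nthe\ntherefore\n";
my $sentences = "Log file content\na long date\n"
              . "the mandatory check\nMark text to replace\n";

# In-memory filehandles stand in for the real files.
open my $WORDS, '<', \$word_text or die "open: $!";
open my $input, '<', \$sentences or die "open: $!";

my $re = build_prefix_re(<$WORDS>);

my @filteredresult;
while (<$input>) {
    chomp;
    next if /$re/;              # skip lines starting with a listed word
    push @filteredresult, $_;
}
print "$_\n" for @filteredresult;
```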

Ben



------------------------------

Date: Mon, 26 Mar 2012 12:35:59 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: naming modules
Message-Id: <vt0649-ils1.ln1@anubis.morrow.me.uk>


Quoth Ivan Shmakov <oneingray@gmail.com>:
> >>>>> Ben Morrow <ben@morrow.me.uk> writes:
> >>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
> 
> 	Unfortunately, even though I dislike mixed-case identifiers, I
> 	have no good reason at hand to avoid it for my Perl code.

I find them quite convenient for package names, even though I dislike
them for function and variable names: they're different enough from the
norm to stand out a bit. <shrug> Maybe I'm just used to it :).

>  > common::sense considers itself to be pragmatic in nature, since all
>  > it's doing is turning on various core pragmas.
> 
> 	There still is a potential clash should the Perl developers
> 	choose to implement their own common::sense.

Yes, though in practice new core modules tend to start on CPAN and then
migrate in, nowadays.

>  >> The other question is whether I should use foo::bar or
>  >> App::Foo::Bar for the modules related to an application Foo?
> 
>  > App:: is for implementations of applications, not modules which
>  > relate to them.  So, for instance, App::Ack implements the guts of
>  > ack(1), rather than being an interface for calling it.
> 
> 	Actually, there's to be the modules that handle a format, or
> 	perhaps a family of formats, specific to this particular
> 	application.
> 
> 	If App::MyApp::MyFormat doesn't fit, should it be, e. g.,
> 	Data::MyApp::MyFormat?

If there's no more specific top-level that fits, then yes, that sounds
good.

> PS.  I'll try to file an RT ticket against Digest::SHA on whether my
> 	module could be added to the distribution.

I strongly suspect the answer will be 'no', given that Digest::SHA is in
the core and really rather important (it's part of the CPAN toolchain,
for instance). Still, there's no harm in asking.

Ben



------------------------------

Date: Mon, 26 Mar 2012 13:17:54 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: naming modules
Message-Id: <87obrj5zf1.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Ivan Shmakov <oneingray@gmail.com>:
>> >>>>> Ben Morrow <ben@morrow.me.uk> writes:
>> >>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
>> 
>> 	Unfortunately, even though I dislike mixed-case identifiers, I
>> 	have no good reason at hand to avoid it for my Perl code.
>
> I find them quite convenient for package names, even though I dislike
> them for function and variable names: they're different enough from the
> norm to stand out a bit. <shrug> Maybe I'm just used to it :).

I'm usually opposed to camel case because I'm convinced that there's a
good reason that the writing style of the Romans,
justuseasequenceoflettersandletthereaderworryaboutbreakingitupinordertoreconstructwords,
was abandoned a long time ago. OTOH, the Perl convention is such that
camel case is supposed to be used for module names and sticking to an
existing convention in areas of minor relevance is better than
inventing a new one.

Random piece of information: Reportedly, the mixed-case convention was
originally invented to work around the fact that the keyboard used for
the Xerox Alto lacked an underscore key.


------------------------------

Date: Mon, 26 Mar 2012 13:06:55 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: perldoc: the key to perl
Message-Id: <5a116730-161b-461e-bf0b-4bee038abb3d@z5g2000pbu.googlegroups.com>

〈Perl Documentation: The Key to Perl〉
http://xahlee.org/perl-python/key_to_perl.html

plain text follows
-------------------------------------

So, i wanted to know what the option perl -C does. So, here's perldoc
perlrun. Excerpt:

        -C [*number/list*]
             The -C flag controls some of the Perl Unicode features.

             As of 5.8.1, the -C can be followed either by a number or a list of
             option letters. The letters, their numeric values, and effects are
             as follows; listing the letters is equal to summing the numbers.

                 I     1   STDIN is assumed to be in UTF-8
                 O     2   STDOUT will be in UTF-8
                 E     4   STDERR will be in UTF-8
                 S     7   I + O + E
                 i     8   UTF-8 is the default PerlIO layer for input streams
                 o    16   UTF-8 is the default PerlIO layer for output streams
                 D    24   i + o
                 A    32   the @ARGV elements are expected to be strings encoded
                           in UTF-8
                 L    64   normally the "IOEioA" are unconditional,
                           the L makes them conditional on the locale environment
                           variables (the LC_ALL, LC_TYPE, and LANG, in the order
                           of decreasing precedence) -- if the variables indicate
                           UTF-8, then the selected "IOEioA" are in effect
                 a   256   Set ${^UTF8CACHE} to -1, to run the UTF-8 caching code in
                           debugging mode.

             For example, -COE and -C6 will both turn on UTF-8-ness on both
             STDOUT and STDERR. Repeating letters is just redundant, not
             cumulative nor toggling.

             The "io" options mean that any subsequent open() (or similar I/O
             operations) in the current file scope will have the ":utf8" PerlIO
             layer implicitly applied to them, in other words, UTF-8 is expected
             from any input stream, and UTF-8 is produced to any output stream.
             This is just the default, with explicit layers in open() and with
             binmode() one can manipulate streams as usual.

             -C on its own (not followed by any number or option list), or the
             empty string "" for the "PERL_UNICODE" environment variable, has
             the same effect as -CSDL. In other words, the standard I/O handles
             and the default "open()" layer are UTF-8-fied *but* only if the
             locale environment variables indicate a UTF-8 locale. This
             behaviour follows the *implicit* (and problematic) UTF-8 behaviour
             of Perl 5.8.0.

             You can use -C0 (or "0" for "PERL_UNICODE") to explicitly disable
             all the above Unicode features.

             The read-only magic variable "${^UNICODE}" reflects the numeric
             value of this setting. This variable is set during Perl startup and
             is thereafter read-only. If you want runtime effects, use the
             three-arg open() (see "open" in perlfunc), the two-arg binmode()
             (see "binmode" in perlfunc), and the "open" pragma (see open).

             (In Perls earlier than 5.8.1 the -C switch was a Win32-only switch
             that enabled the use of Unicode-aware "wide system call" Win32
             APIs. This feature was practically unused, however, and the command
             line switch was therefore "recycled".)

             Note: Since perl 5.10.1, if the -C option is used on the "#!" line,
             it must be specified on the command line as well, since the
             standard streams are already set up at this point in the execution
             of the perl interpreter. You can also use binmode() to set the
             encoding of an I/O stream.

reading that is like an adventure. It's like this:

    The -C is a key to unlock many secrets. Just get it, and you'll be all
    good to go, except in cases you may need the inner key. You'll find a
    hinge in the key, open it, then there's a subkey. On the subkey,
    there's a number. Take that number to the lock, it will open with
    keyX. When you use keyX, it must be matched with the previous inner
    key with 8th bit. keyX doesn't have an ID, but you can make one by
    finding the number at the place you found the key C. Key C is actually
    optional, but when inner key and keyX's number matches, it changes the
    nature of the lock. This is when you need to turn on keyMode …
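
For what it's worth, the letter arithmetic in the perlrun excerpt above can be checked with a few lines of Perl. This is only a sketch: the sub name c_flags_to_number is made up, the table is copied from the excerpt, and the bits are OR-ed rather than added so that repeated or overlapping letters stay redundant, as the excerpt says they are.

```perl
use strict;
use warnings;

# Letter values as listed in the perlrun excerpt.
my %C_VALUE = (
    I => 1,  O => 2,  E => 4,  S => 7,
    i => 8,  o => 16, D => 24,
    A => 32, L => 64, a => 256,
);

# Hypothetical helper: turn a -C letter list into its numeric value.
sub c_flags_to_number {
    my ($letters) = @_;
    my $number = 0;
    for my $letter (split //, $letters) {
        die "unknown -C letter '$letter'\n" unless exists $C_VALUE{$letter};
        $number |= $C_VALUE{$letter};   # OR, so repeats do not accumulate
    }
    return $number;
}

printf "-C%-3s is the same as -C%d\n", $_, c_flags_to_number($_)
    for qw(OE SDL SD);
```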

 Xah


------------------------------

Date: Mon, 26 Mar 2012 22:23:46 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Problem using OODoc::Meta and customer user properties.
Message-Id: <1khl4a0.17y03iadahkk0N%fergus@twig-me-uk.not.here>

Hi,

Using Jean-Marie's rather useful OODoc::Meta to try and automate
some invoice manipulation. However the docs say not to use its
user_defined() function. But how else do I extract a list of all
defined user property names for a document?

The preferred method getUserPropertyElements() returns a list of
all user properties but does not allow me to discover their names.

Am I missing something?
 


------------------------------

Date: Mon, 26 Mar 2012 23:34:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Problem using OODoc::Meta and customer user properties.
Message-Id: <ig7749-d1v1.ln1@anubis.morrow.me.uk>


Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> 
> Using Jean-Marie's rather useful OODoc::Meta to try and automate
> some invoice manipulation. However the docs say not to use its
> user_defined() function. But how else do I extract a list of all
> defined user property names for a document?
> 
> The preferred method getUserPropertyElements() returns a list of
> all user properties but does not allow me to discover their names.

When in doubt, read the source:

    sub user_defined {
        my $self = shift;

        ...;
        my @elements = $self->getElementList('//meta:user-defined');

        ...;
        my %fields = ();
        foreach my $element (@elements) {
            my $name	    = $self->getAttribute($element, 'meta:name');
            my $content	    = $self->getText($element);
            $fields{$name}  = $content;
        }

        return %fields;
    }

    sub getUserPropertyElements {
        my $self = shift;
        return $self->getElementList('//meta:user-defined');
    }

(reformatted and abbreviated, obviously; that module has seriously
bizarre formatting...). So by the looks of things you can use
$doc->getAttribute($element, "meta:name") to extract a property's name.
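
Something along these lines, perhaps. It is an untested sketch: the sub name user_property_names is made up, and $doc is assumed to be an OODoc::Meta object you have already constructed. Only the two methods quoted from the source above are called on it.

```perl
use strict;
use warnings;

# Hypothetical helper: list the names of all user-defined properties.
# $doc is assumed to be an already-constructed OODoc::Meta object.
sub user_property_names {
    my ($doc) = @_;
    # One 'meta:name' attribute per user-defined property element.
    return map { $doc->getAttribute($_, 'meta:name') }
           $doc->getUserPropertyElements;
}
```

Given such a $doc, `print "$_\n" for user_property_names($doc);` would then list the names one per line.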

Ben



------------------------------

Date: Mon, 26 Mar 2012 20:48:02 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Problem with splitting data
Message-Id: <slrnjn1ef5.oim.hjp-usenet2@hrunkner.hjp.at>

On 2012-03-26 00:21, Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:
>
>  PJH> Your benchmark script doesn't include the case 
>  PJH>     $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
>
>  PJH> It includes a case 
>  PJH>     my $text = orig_slurp_scalar( $file_name )
>
>  PJH> where orig_slurp_scalar then calls orig_slurp, which does the above. So
>  PJH> that adds two function calls and at least one, more likely several extra
>  PJH> copies (I don't know how scalar returns are implemented in perl). 
>
> true. i didn't account for the overhead in the extra sub calls.
>
>  PJH> I have added this to the end of bench_scalar_slurp and rerun the script:
>
>  PJH>                 direct_slurp_scalar =>
>  PJH>                         sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },
>
>  PJH> The result is surprising. I would have expected that to be about as fast
>  PJH> as FS::read_file (because that's what I've seen in my own benchmarks),
>  PJH> but it's a lot faster, even faster than FS::read_file_buf_ref2:
>
> what size file are you testing?

Sorry, I accidentally deleted that line. These times are from the 1MB
scalar read test case (on a 3GHz Core2). 

For the smaller sizes (512B, 10kB) orig_slurp is *faster* than
FS::read_file and direct_slurp_scalar is still faster, but
old_sysread_file beats them all ;-).

>  PJH>                            Rate  orig_slurp  FS::read_file  FS::read_file_buf_ref2 direct_slurp_scalar
>  PJH> file_contents             169/s        -76%           -81%                    -90%                -92%
>  PJH> file_contents_no_OO       170/s        -75%           -81%                    -90%                -92%
>  PJH> orig_read_file            560/s        -19%           -39%                    -67%                -73%
>  PJH> orig_slurp                694/s          --           -24%                    -59%                -66%
>  PJH> FS12::read_file           907/s         31%            -0%                    -46%                -56%
>  PJH> FS::read_file             910/s         31%             --                    -46%                -55%
>  PJH> old_sysread_file          919/s         32%             1%                    -45%                -55%
>  PJH> FS::read_file_scalar_ref 1047/s         51%            15%                    -37%                -49%
>  PJH> FS::read_file_buf_ref    1051/s         52%            15%                    -37%                -49%
>  PJH> old_read_file            1232/s         78%            35%                    -26%                -40%
>  PJH> FS::read_file_buf_ref2   1673/s        141%            84%                      --                -18%
>  PJH> direct_slurp_scalar      2043/s        195%           124%                     22%                  --
>
> i wouldn't call that much faster.

Well, you called orig_slurp "slow as hell", but FS::read_file is only
31% faster, while direct_slurp_scalar is 124% faster than FS::read_file.


> also as i said, file sizes matter too.

Yes, of course.


> and perl could have improved the guts of <> since i first wrote that
> (it needed it badly).

That's why I asked whether you had repeated your benchmarks in the last
ten years. Perl I/O has been significantly revamped for 5.8.x and it
hasn't used stdio by default for a long time (it's still available as a
compile time option I think). Oh and the last time we had this
discussion (about 2 years ago) you quoted benchmark results from a 300
MHz SPARC (IIRC), which wasn't exactly bleeding edge at the time.


> even so, it is such a fugly idiom that i would never teach it.

That I agree with.


>  PJH> I wonder if there is a systematic error here ...
>
>  PJH> All tests were made with files which were already cached in memory -
>  PJH> when the files have to be read from disk, all differences will probably
>  PJH> be negligible.
>
> not exactly as requesting larger reads is still faster than what stdio
> would do.

Even stdio is much faster than disk and has been for a long time (at
least on Linux). A CPU can burn an awful lot of cycles while waiting for
the next block. And perl doesn't use stdio anyway.
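
For anyone who wants to repeat the comparison on their own machine, a stripped-down sketch using only core modules could look like this. It is not the benchmark script discussed above: the sub names, the 1 MB scratch file and the choice of just two candidates are assumptions made for the illustration, and the numbers will depend on perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Assumption: a 1 MB scratch file stands in for the real test data.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
print {$tmp_fh} 'x' x (1024 * 1024);
close $tmp_fh or die "close: $!";

# The idiom under discussion: localised @ARGV and $/ with the <> operator.
sub direct_slurp {
    my ($name) = @_;
    my $text = do { local (@ARGV, $/) = $name; <> };
    return $text;
}

# A more conventional slurp for comparison: explicit open, undefined $/.
sub plain_slurp {
    my ($name) = @_;
    open my $fh, '<', $name or die "Cannot open '$name': $!";
    local $/;
    my $text = <$fh>;
    return $text;
}

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    direct_slurp => sub { my $text = direct_slurp($file_name) },
    plain_slurp  => sub { my $text = plain_slurp($file_name) },
});
```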

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Mon, 26 Mar 2012 17:02:19 -0700
From: sln@netherlands.com
Subject: Re: Your Regex Brain
Message-Id: <oe02n75navdq5lqdd47abmpo9fmn433kv0@4ax.com>

On Sat, 24 Mar 2012 16:30:28 -0700 (PDT), Xah Lee <xahlee@gmail.com> wrote:

>〈Your Regex Brain〉
>http://xahlee.org/comp/your_regex_brain.html
>

That's more like a brain cell.
This is more like a regex brain.

'
<img 
  (?=\s) 
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) width \s*=
      (?: (?> \s* ([\'"]) \s* (?<WIDTH>.*?) \s* \g{-2} )
        | (?> (?!\s*[\'"]) \s* (?<WIDTH>[^\s>]*) (?=\s|>) )   
      )
  )
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) src \s*=
      (?: (?> \s* ([\'"]) \s* (?<SRC>.*?) \s* \g{-2} )
        | (?> (?!\s*[\'"]) \s* (?<SRC>[^\s>]*) (?=\s|>) )   
      )
  )
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) height \s*=
      (?: (?> \s* ([\'"]) \s* (?<HEIGHT>.*?) \s* \g{-2} )
        | (?> (?!\s*[\'"]) \s* (?<HEIGHT>[^\s>]*) (?=\s|>) )   
      )
  )
  (?= (?:[^>"\']|"[^"]*"|\'[^\']*\')*? (?<=\s) alt \s*=
      (?: (?> \s* ([\'"]) \s* (?<ALT>.*?) \s* \g{-2} )
        | (?> (?!\s*[\'"]) \s* (?<ALT>[^\s>]*) (?=\s|>) )   
      )
  )
  (?> \s+ (?:".*?"|\'.*?\'|[^>]*?)+ > ) (?<!/>)
'

-sln



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3650
***************************************

