[30117] in Perl-Users-Digest


Perl-Users Digest, Issue: 1360 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Mar 13 18:10:14 2008

Date: Thu, 13 Mar 2008 15:09:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 13 Mar 2008     Volume: 11 Number: 1360

Today's topics:
    Re: command output <glex_no-spam@qwest-spam-no.invalid>
    Re: command output <noreply@gunnar.cc>
    Re: command output <mritty@gmail.com>
    Re: command output <jurgenex@hotmail.com>
        comparing a 2D array <rose@russ.org>
    Re: comparing a 2D array <glex_no-spam@qwest-spam-no.invalid>
    Re: comparing a 2D array <abigail@abigail.be>
    Re: comparing a 2D array <rose@russ.org>
    Re: comparing a 2D array <rose@russ.org>
    Re: comparing a 2D array <someone@example.com>
    Re: comparing a 2D array <glex_no-spam@qwest-spam-no.invalid>
        m// on very long lines leaks memory <sjackman@gmail.com>
    Re: m// on very long lines leaks memory <someone@example.com>
    Re: m// on very long lines leaks memory <someone@example.com>
    Re: m// on very long lines leaks memory xhoster@gmail.com
    Re: Matching multiple subexpressions in a regular expre <devnull4711@web.de>
    Re: Matching multiple subexpressions in a regular expre <sjackman@gmail.com>
    Re: Matching multiple subexpressions in a regular expre xhoster@gmail.com
    Re: Matching multiple subexpressions in a regular expre <devnull4711@web.de>
        req example uncompressing mybigdir.tar.gz file <rsarpi@gmail.com>
    Re: Style and subroutines in Perl Programs xhoster@gmail.com
    Re: Style and subroutines in Perl Programs <pgodfrin@gmail.com>
    Re: Style and subroutines in Perl Programs xhoster@gmail.com
    Re: Style and subroutines in Perl Programs <pgodfrin@gmail.com>
    Re: Style and subroutines in Perl Programs <someone@example.com>
    Re: Style and subroutines in Perl Programs <bik.mido@tiscalinet.it>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 13 Mar 2008 13:29:56 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: command output
Message-Id: <47d972a4$0$33227$815e3792@news.qwest.net>

jammer wrote:
> There is no output from this:
>                 `/usr/bin/ls -l "$backupDir/$backupName"`;
> 
> This shows correct values:
>                 print "/usr/bin/ls -l $backupDir/$backupName";

Correct values???  It prints:

/usr/bin/ls -l /

It sounds like you answered your own question though, whatever
that question was.

Possibly this is what you're after:

perldoc -q "Why can't I get the output of a command with system"


------------------------------

Date: Thu, 13 Mar 2008 19:33:25 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: command output
Message-Id: <63tabmF29gbouU1@mid.individual.net>

jammer wrote:
> There is no output from this:
>                 `/usr/bin/ls -l "$backupDir/$backupName"`;

Are you sure that the ls program resides in /usr/bin? On my box it's 
located in /bin, but in any case

     `ls -l "$backupDir/$backupName"`

ought to be sufficient.

Otherwise, if you for some reason don't see what is in STDERR, you may 
want to try:

     `/usr/bin/ls -l "$backupDir/$backupName" 2>&1`

Please see http://perldoc.perl.org/perlop.html#qx/STRING/

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Thu, 13 Mar 2008 11:51:38 -0700 (PDT)
From: Paul Lalli <mritty@gmail.com>
Subject: Re: command output
Message-Id: <f3ef3d62-4f2a-471a-83fd-7fe7e721b02c@d45g2000hsc.googlegroups.com>

On Mar 13, 2:01 pm, jammer <jamesloc...@mail.com> wrote:
> There is no output from this:
>                 `/usr/bin/ls -l "$backupDir/$backupName"`;


Why would there be?  You're capturing the output and then discarding
it by using `` in a void context.

Read perldoc perlop (search for qx//) to learn what `` actually do,
because you don't seem to understand them.
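
For illustration, a minimal sketch of the difference, assuming a Unix-like
system with ls on the PATH. The variable $dir is only a stand-in for the
original "$backupDir/$backupName":

```perl
use strict;
use warnings;

# Stand-in for "$backupDir/$backupName"; any existing path will do.
my $dir = '.';

# Backticks capture the command's output instead of displaying it,
# so keep the return value and print it yourself.
my $listing = `ls -l "$dir" 2>&1`;
die "ls exited with status ", $? >> 8, "\n" if $?;
print $listing;

# If the output only needs to reach the terminal, system() lets the
# command write to STDOUT directly.
system('ls', '-l', $dir) == 0 or die "system failed: $?\n";
```

The list form of system() also avoids passing the path through a shell.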

Paul Lalli


------------------------------

Date: Thu, 13 Mar 2008 20:09:32 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: command output
Message-Id: <492jt35ses6i2hr7925cn02p1chgrqe3n5@4ax.com>

jammer <jameslockie@mail.com> wrote:
>There is no output from this:
>                `/usr/bin/ls -l "$backupDir/$backupName"`;

Does that surprise you? It shouldn't.

>This shows correct values:
>                print "/usr/bin/ls -l $backupDir/$backupName";

Whatever "correct values" is supposed to mean. It will print that given
string with the two variables $backupDir and $backupName expanded. Did you
expect something else?
 
Of course backticks do something different than double quotes. Does that
surprise you?

What is your question/issue/...?

jue


------------------------------

Date: Fri, 14 Mar 2008 03:53:07 +0800
From: "Rose" <rose@russ.org>
Subject: comparing a 2D array
Message-Id: <frc0n7$ror$1@ijustice.itsc.cuhk.edu.hk>

For the following 2D array comparison codes modified from perllol google 
search, I wonder why the output generated is not what I expected. Could 
anybody tell me what i should modify in order to have an exact match of all 
the attributes of a row from an individual file? Thanks a lot~



#!/usr/bin/perl

use warnings;

$usage='prog t1 t2 cmpname';
die "Usage: $usage\n" if $#ARGV < 1;

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];
$cmpcout = $outname . ".cmpofcmp.xls";

open(FP1, $file1);
open(FP2, $file2);
open(CMP, ">$cmpcout");

$line = <FP1>;  #get header
while ($line ne "") {
        $line = <FP1>;
    @attr1 = split(/[\t ]+/, $line);
        push @row1, [ @attr1 ];
}

$line = <FP2>;  #get header
while ($line ne "") {
        $line = <FP2>;
    @attr2 = split(/[\t ]+/, $line);
        push @row2, [ @attr2 ];
}

for $aref1 (@row1) {
        for $aref2 (@row2) {
                if (@$aref1 == @$aref2) {
                        print "@$aref1\n";
                }
        }
}

========================
File t1:
a1      a2      a3
1       AAAA    99
0       TT      88
99      what    -888
File t2:
a1      a2      a3
1       AAAAA   99
0       TCCT    88
-10     TT      88
99      what    888

==========================
Output:
1 AAAA 99

1 AAAA 99

1 AAAA 99

1 AAAA 99

0 TT 88

0 TT 88

0 TT 88

0 TT 88

99 what -888

99 what -888

99 what -888

99 what -888






------------------------------

Date: Thu, 13 Mar 2008 15:04:10 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: comparing a 2D array
Message-Id: <47d988ba$0$89867$815e3792@news.qwest.net>

Rose wrote:
> For the following 2D array comparison codes modified from perllol google 
> search, I wonder why the output generated is not what I expected. Could 
> anybody tell me what i should modify in order to have an exact match of all 
> the attributes of a row from an individual file? Thanks a lot~
> 
> 
> 
> #!/usr/bin/perl
> 
> use warnings;

Forgot:

use strict;

> 
> $usage='prog t1 t2 cmpname';
> die "Usage: $usage\n" if $#ARGV < 1;
> 
> $outname = $ARGV[2];
> $file1 = $ARGV[0];
> $file2 = $ARGV[1];
> $cmpcout = $outname . ".cmpofcmp.xls";
> 
> open(FP1, $file1);
> open(FP2, $file2);
> open(CMP, ">$cmpcout");

Check that they were successful.

> 
> $line = <FP1>;  #get header
> while ($line ne "") {
>         $line = <FP1>;
>     @attr1 = split(/[\t ]+/, $line);
No need for @attr1, just put the split in the [ ], below.
>         push @row1, [ @attr1 ];
> }
> 
> $line = <FP2>;  #get header
> while ($line ne "") {
>         $line = <FP2>;
>     @attr2 = split(/[\t ]+/, $line);
>         push @row2, [ @attr2 ];
> }
> 
> for $aref1 (@row1) {
>         for $aref2 (@row2) {
>                 if (@$aref1 == @$aref2) {

perldoc -q "How do I test whether two arrays or hashes are equal"

>                         print "@$aref1\n";
>                 }
>         }
> }

[...]


------------------------------

Date: 13 Mar 2008 20:39:30 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: comparing a 2D array
Message-Id: <slrnftj482.154.abigail@alexandra.abigail.be>

                              _
Rose (rose@russ.org) wrote on VCCCVIII September MCMXCIII in
<URL:news:frc0n7$ror$1@ijustice.itsc.cuhk.edu.hk>:
??  For the following 2D array comparison codes modified from perllol google 
??  search, I wonder why the output generated is not what I expected. Could 
??  anybody tell me what i should modify in order to have an exact match of all 
??  the attributes of a row from an individual file? Thanks a lot~
??  
??  
??  
??  #!/usr/bin/perl
??  
??  use warnings;
??  
??  $usage='prog t1 t2 cmpname';
??  die "Usage: $usage\n" if $#ARGV < 1;
??  
??  $outname = $ARGV[2];
??  $file1 = $ARGV[0];
??  $file2 = $ARGV[1];
??  $cmpcout = $outname . ".cmpofcmp.xls";
??  
??  open(FP1, $file1);
??  open(FP2, $file2);
??  open(CMP, ">$cmpcout");

You are blindly assuming that the open will succeed. What if an input
file doesn't exist, or if you don't have permission to open the output
file?

??  
??  $line = <FP1>;  #get header
??  while ($line ne "") {

Eh, did you by any chance follow a Perl course organized by HP education?
They have the only books I've encountered that don't use the canonical way
of iterating over a file:

    while ($line = <FP1>) {
        ...
    }

But I wouldn't use bare file handles any more. With any Perl from the 
current century, you could write:

    open my $fh, '<', $file or die "open: $!";
    while (my $line = <$fh>) {
        # Do something with $line
    }
    close $fh or die "close: $!";

??          $line = <FP1>;
??      @attr1 = split(/[\t ]+/, $line);
??          push @row1, [ @attr1 ];
??  }


??  
??  $line = <FP2>;  #get header
??  while ($line ne "") {
??          $line = <FP2>;
??      @attr2 = split(/[\t ]+/, $line);
??          push @row2, [ @attr2 ];
??  }
??  
??  for $aref1 (@row1) {
??          for $aref2 (@row2) {
??                  if (@$aref1 == @$aref2) {
??                          print "@$aref1\n";

This just compares whether $aref1 and $aref2 point to arrays of the 
same length.

You should compare each element individually.

??                  }
??          }
??  }

Untested code:

    my $equal = 1;
LOOP:
    foreach my $row1 (@row1) {
        foreach my $row2 (@row2) {
            unless (@$row1 == @$row2) {
                $equal = 0;
                last LOOP;
            }
            for (my $i = 0; $i < @$row1; $i ++) {
                my $elem1 = $$row1 [$i];
                my $elem2 = $$row2 [$i];
                unless ($elem1 eq $elem2) {
                    $equal = 0;
                    last LOOP;
                }
            }
        }
    }

    say "Arrays are equal" if $equal;

        

Abigail
-- 
A perl rose:  perl -e '@}>-`-,-`-%-'


------------------------------

Date: Fri, 14 Mar 2008 04:41:18 +0800
From: "Rose" <rose@russ.org>
Subject: Re: comparing a 2D array
Message-Id: <frc3hi$tfp$1@ijustice.itsc.cuhk.edu.hk>


"J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote in message 
news:47d988ba$0$89867$815e3792@news.qwest.net...

>> $line = <FP1>;  #get header
>> while ($line ne "") {
>>         $line = <FP1>;
>>     @attr1 = split(/[\t ]+/, $line);
> No need for @attr1, just put the split in the [ ], below.
>>         push @row1, [ @attr1 ];
>> }

put the split in the []? I don't quite get what you mean....

After modifying the codes in perlfaq as:

for ($i = 0; $i < @$aref1; $i++) {
            if ($aref1->[$i] ne $aref2->[$i]){
                $match = 0;
                last;
            }
        }

        if ($match == 1) {
            print "@$aref1\n";
        }
}

this time it successfully finds the matches, but it prints out extra new 
lines, why does that happen? 




------------------------------

Date: Fri, 14 Mar 2008 04:52:57 +0800
From: "Rose" <rose@russ.org>
Subject: Re: comparing a 2D array
Message-Id: <frc47d$tnn$1@ijustice.itsc.cuhk.edu.hk>

> Untested code:
>
>    my $equal = 1;
> LOOP:
>    foreach my $row1 (@row1) {
>        foreach my $row2 (@row2) {
>            unless (@$row1 == @$row2) {
>                $equal = 0;
>                last LOOP;
>            }
>            for (my $i = 0; $i < @$row1; $i ++) {
>                my $elem1 = $$row1 [$i];
>                my $elem2 = $$row2 [$i];
>                unless ($elem1 eq $elem2) {
>                    $equal = 0;
>                    last LOOP;
>                }
>            }
>        }
>    }
>
>    say "Arrays are equal" if $equal;
>
>
>
> Abigail
> -- 
> A perl rose:  perl -e '@}>-`-,-`-%-'


Thanks, Abigail. But it seems that your untested code tests the whole 
array of arrays rather than individual rows; maybe the last statement should 
go after the second-to-last curly bracket. 
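
A row-by-row variant might look like the sketch below. It is not Abigail's
code; the helper name rows_equal and the @common array are made up for
illustration, and the sample rows are copied from the t1 and t2 files posted
at the start of the thread:

```perl
use strict;
use warnings;

# True if two array refs hold the same strings in the same order.
sub rows_equal {
    my ($r1, $r2) = @_;
    return 0 unless @$r1 == @$r2;
    for my $i (0 .. $#$r1) {
        return 0 if $r1->[$i] ne $r2->[$i];
    }
    return 1;
}

# Sample rows from the posted files (headers already skipped).
my @row1 = ([qw(1 AAAA 99)], [qw(0 TT 88)], [qw(99 what -888)]);
my @row2 = ([qw(1 AAAAA 99)], [qw(0 TCCT 88)], [qw(-10 TT 88)],
            [qw(99 what 888)]);

# Collect every row of the first table that also occurs in the second.
my @common;
for my $r1 (@row1) {
    push @common, $r1 if grep { rows_equal($r1, $_) } @row2;
}

print "@$_\n" for @common;
print "no rows in common\n" unless @common;
```

With the sample data as posted, no row of t1 appears unchanged in t2, so the
last line is the one that fires.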




------------------------------

Date: Thu, 13 Mar 2008 20:57:56 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: comparing a 2D array
Message-Id: <ozgCj.108165$C61.20057@edtnps89>

Rose wrote:
> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote in message 
> news:47d988ba$0$89867$815e3792@news.qwest.net...
> 
>>> $line = <FP1>;  #get header
>>> while ($line ne "") {
>>>         $line = <FP1>;
>>>     @attr1 = split(/[\t ]+/, $line);
>> No need for @attr1, just put the split in the [ ], below.
>>>         push @row1, [ @attr1 ];
>>> }
> 
> put the split in the []? I don't quite get what you mean....

         push @row1, [ split /[\t ]+/, $line ];


John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Thu, 13 Mar 2008 16:50:01 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: comparing a 2D array
Message-Id: <47d9a18a$0$48223$815e3792@news.qwest.net>

Rose wrote:
> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote in message 
> news:47d988ba$0$89867$815e3792@news.qwest.net...
> 
>>> $line = <FP1>;  #get header
>>> while ($line ne "") {
>>>         $line = <FP1>;
[...]
> this time it successfully finds the matches, but it prints out extra new 
> lines, why does that happen? 

Because you're not removing it from your input.

perldoc -f chomp

Either chomp() the input or remove the "\n" from your print.
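
A minimal sketch of the chomp() route, assuming the same whitespace-separated
input as in the original post. The helper name parse_line is made up for
illustration:

```perl
use strict;
use warnings;

# Split one input line into fields after removing its trailing newline.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return [ split /[\t ]+/, $line ];
}

my $row = parse_line("1\tAAAA    99\n");
print "@$row\n";    # the only newline printed is the one added here
```

Without the chomp, the last field keeps its "\n", and the print adds a
second one.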


------------------------------

Date: Thu, 13 Mar 2008 14:26:18 -0700 (PDT)
From: ShaunJ <sjackman@gmail.com>
Subject: m// on very long lines leaks memory
Message-Id: <7f5c9c41-d199-470c-8789-d10562a0033b@i29g2000prf.googlegroups.com>

The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
 ./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regex
takes about 200 MB. However the following matching loop leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or version of Perl,
that would be interesting too.

Cheers,
Shaun

#!/usr/bin/perl
use strict;
use English;
open REFILE, '<' . shift;
chomp (my @restrings = <REFILE>);
close REFILE;
my @re = map { qr/$_/ } @restrings;

open TEXTFILE, '<' . shift;
chomp (my @text = <TEXTFILE>);
close TEXTFILE;
my $text = join '', @text;

foreach my $re (@re) {
	if ($text =~ m/$re/) {
		print $LAST_MATCH_START[0], "\n";
	}
}


------------------------------

Date: Thu, 13 Mar 2008 21:47:28 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: m// on very long lines leaks memory
Message-Id: <QhhCj.108179$C61.84238@edtnps89>

ShaunJ wrote:
> The following snippet leaks memory until it breaks and falls down when
> m// is used on a very long line. It works fine if the line lengths are
> short. Try
> ./test.pl /usr/share/dict/words /usr/share/dict/words
> Depending on your dictionary, you'll see that compiling the regex
> takes about 200 MB. However the following matching loop leaks memory
> at an alarming rate. Start up `top` and watch it run. I'm using Perl
> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
> or deny this behaviour for other architectures or version of Perl,
> that would be interesting too.
> 
> Cheers,
> Shaun
> 
> #!/usr/bin/perl
> use strict;
> use English;
> open REFILE, '<' . shift;
> chomp (my @restrings = <REFILE>);
> close REFILE;
> my @re = map { qr/$_/ } @restrings;
> 
> open TEXTFILE, '<' . shift;
> chomp (my @text = <TEXTFILE>);
> close TEXTFILE;
> my $text = join '', @text;
> 
> foreach my $re (@re) {
> 	if ($text =~ m/$re/) {
> 		print $LAST_MATCH_START[0], "\n";
> 	}
> }

I tested it and if I remove the English module it works fine.
(So don't use English.pm!)



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Thu, 13 Mar 2008 21:53:53 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: m// on very long lines leaks memory
Message-Id: <RnhCj.108185$C61.74913@edtnps89>

John W. Krahn wrote:
> ShaunJ wrote:
>> The following snippet leaks memory until it breaks and falls down when
>> m// is used on a very long line. It works fine if the line lengths are
>> short. Try
>> ./test.pl /usr/share/dict/words /usr/share/dict/words
>> Depending on your dictionary, you'll see that compiling the regex
>> takes about 200 MB. However the following matching loop leaks memory
>> at an alarming rate. Start up `top` and watch it run. I'm using Perl
>> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
>> or deny this behaviour for other architectures or version of Perl,
>> that would be interesting too.
>>
>> Cheers,
>> Shaun
>>
>> #!/usr/bin/perl
>> use strict;
>> use English;
>> open REFILE, '<' . shift;
>> chomp (my @restrings = <REFILE>);
>> close REFILE;
>> my @re = map { qr/$_/ } @restrings;
>>
>> open TEXTFILE, '<' . shift;
>> chomp (my @text = <TEXTFILE>);
>> close TEXTFILE;
>> my $text = join '', @text;
>>
>> foreach my $re (@re) {
>>     if ($text =~ m/$re/) {
>>         print $LAST_MATCH_START[0], "\n";
>>     }
>> }
> 
> I tested it and if I remove the English module it works fine.
> (So don't use English.pm!)

Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

use English qw( -no_match_vars );
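
Another option, sketched here under the assumption that only the match
offset is needed, is to skip English.pm entirely: @- is the built-in name
behind @LAST_MATCH_START. The helper name match_start and the sample text are
made up for illustration:

```perl
use strict;
use warnings;

# Return the start offset of the first match of $re in $text, or undef.
sub match_start {
    my ($text, $re) = @_;
    return $text =~ $re ? $-[0] : undef;
}

my $text = 'x' x 1000 . 'needle';
my $pos  = match_start($text, qr/needle/);
print defined $pos ? "$pos\n" : "no match\n";
```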



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: 13 Mar 2008 22:01:08 GMT
From: xhoster@gmail.com
Subject: Re: m// on very long lines leaks memory
Message-Id: <20080313180110.774$tG@newsreader.com>

ShaunJ <sjackman@gmail.com> wrote:
> The following snippet leaks memory until it breaks and falls down when
> m// is used on a very long line. It works fine if the line lengths are
> short. Try
> ./test.pl /usr/share/dict/words /usr/share/dict/words
> Depending on your dictionary, you'll see that compiling the regex
> takes about 200 MB. However the following matching loop leaks memory
> at an alarming rate. Start up `top` and watch it run. I'm using Perl
> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
> or deny this behaviour for other architectures or version of Perl,
> that would be interesting too.

Technically, this does not seem to be a leak.  If I put an infinite
loop around your foreach my $re (@re) loop, then memory grows only
up to 15.5 GB by the time the inner loop completes.  Upon the next iteration
of the outer loop, memory stops growing.  So it seems to be an
inefficiency rather than a leak.  As idle speculation, I'd say that each
$re maintains some kind of independent state, that this state is
proportional to the size of the string it was last used on, and that the
storage is reused the next time that $re gets invoked, but not before then.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Thu, 13 Mar 2008 21:08:11 +0100
From: Frank Seitz <devnull4711@web.de>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <63tfteF29csd1U2@mid.individual.net>

ShaunJ wrote:
> On Mar 12, 7:40 pm, Frank Seitz <devnull4...@web.de> wrote:
>>
>>400'000 precompiled regexes are quite a lot!
>>Why don't you read and create them in chunks of, say, 1000?
> 
> 400'000 * 27 = 12 MB. I wouldn't say it's that large.

Oh, this is a very naive calculation. A compiled regex
is much, much bigger, but I can't tell how big.

> In any case, the following 9-line code snippet burns through 100MB of
> memory a second using Perl 5.8.6! Certainly a memory leak. The only
> explanation I can think of is if the m/$re/g expression were
> recompiling the regex every time and the previously compiled regex
> weren't being discarded.

No, Perl precompiles the patterns (your 400'000 regexes)
into an internal representation at the moment of qr//,
that is at the beginning of your program.

> my @restrings = <REFILE>;
> my @re = map { qr/$_/x } @restrings;
> while (<>) {
> 	foreach my $re (@re) {
> 		while (m/$re/g) {
> 			print $LAST_MATCH_START[0], "\n";
> 		}
> 	}
> }

Try to exchange the loops, as I proposed in
<63rld7F28brs2U6@mid.individual.net>.

Frank
-- 
Dipl.-Inform. Frank Seitz; http://www.fseitz.de/
Anwendungen für Ihr Internet und Intranet
Tel: 04103/180301; Fax: -02; Industriestr. 31, 22880 Wedel


------------------------------

Date: Thu, 13 Mar 2008 14:02:20 -0700 (PDT)
From: ShaunJ <sjackman@gmail.com>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <b4883dca-6dcd-4eb6-b0ca-5e46838b8cfe@s13g2000prd.googlegroups.com>

On Mar 13, 1:08 pm, Frank Seitz <devnull4...@web.de> wrote:
 ...
> > In any case, the following 9-line code snippet burns through 100MB of
> > memory a second using Perl 5.8.6! Certainly a memory leak. The only
> > explanation I can think of is if the m/$re/g expression were
> > recompiling the regex every time and the previously compiled regex
> > weren't being discarded.
>
> No, Perl precompiles the patterns (your 400'000 regexes)
> into an internal representation at the moment of qr//,
> that is at the beginning of your program.

That is my intention. A quick experiment with `top` shows that the
400'000 regex use 500 MB of memory once they're compiled, which is
fine by me. It's the following loop that leaks memory like crazy, and
it shouldn't use any additional memory. Any ideas why?

Swapping the loops won't have any effect, as there's only one string
(one line in the input file) for the first while(<>) loop.

Cheers,
Shaun


------------------------------

Date: 13 Mar 2008 21:28:39 GMT
From: xhoster@gmail.com
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <20080313172842.016$Hq@newsreader.com>

ShaunJ <sjackman@gmail.com> wrote:

> Hi John,
>
> If I structure my program as in the example, using many small regex
> instead of one big regex, Perl 5.8.6 runs out of memory and dies:
> vm_allocate failed, Out of memory! I have 400'000 regex of exactly 27
> characters each, and the input string is one line 100 kB long. The
> machine has 2 GB of memory and free disk space, which should be
> enough, so I presume the code is somehow leaking memory. It's only a
> dozen or so lines long, so I've posted my code below. Can you see an
> obvious leak?
>
> Thanks,
> Shaun
>
> my @restrings = <REFILE>;
> my @re = map { qr/$_/x } @restrings;
> while (<>) {
 ...

Can you produce a version that we can run?  (I.e. that doesn't
depend on REFILE or STDIN, which we don't have access to?)

The below doesn't leak on  v5.8.3, 5.8.7, or 5.8.8.

use strict;
use warnings;
use English;

my @re = map { qr/$_/x } '000000'..'400000';
push @re, qr/\d/; #just to make sure something matches

foreach
('000000000000000000000000000000'..'000000000000000000000001000000') {
        print "$_\n";
        my $i = 0;
        foreach my $re (@re) {
                $i++;
                pos = 0;
                while (m/$re/g) {
                        print $i, "\t",
                                $LAST_MATCH_START[0] + 1, "\t",
                                $&, "\n";
                        pos = $LAST_MATCH_START[0] + 1;
                }
        }
}
__END__



------------------------------

Date: Thu, 13 Mar 2008 22:32:21 +0100
From: Frank Seitz <devnull4711@web.de>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <63tkr7F29csd1U3@mid.individual.net>

ShaunJ wrote:
> On Mar 13, 1:08 pm, Frank Seitz <devnull4...@web.de> wrote:
>>
>>No, Perl precompiles the patterns (your 400'000 regexes)
>>into an internal representation at the moment of qr//,
>>that is at the beginning of your program.
> 
> That is my intention. A quick experiment with `top` shows that the
> 400'000 regex use 500 MB of memory once they're compiled, which is
> fine by me. It's the following loop that leaks memory like crazy, and
> it shouldn't use any additional memory. Any ideas why?

No.

> Swapping the loops won't have any effect, as there's only one string
> (one line in the input file) for the first while(<>) loop.

Sure? This (untested) code should solve the problem:

my $str = <>;
while (<REFILE>) {
   my $re = qr/$_/x;
   while ($str =~ /$re/g) {
       print $LAST_MATCH_START[0], "\n";
   }
}

Frank
-- 
Dipl.-Inform. Frank Seitz; http://www.fseitz.de/
Anwendungen für Ihr Internet und Intranet
Tel: 04103/180301; Fax: -02; Industriestr. 31, 22880 Wedel


------------------------------

Date: Thu, 13 Mar 2008 12:55:26 -0700 (PDT)
From: monk <rsarpi@gmail.com>
Subject: req example uncompressing mybigdir.tar.gz file
Message-Id: <2068ac78-3ed0-4353-93df-1836c92c9622@i29g2000prf.googlegroups.com>

I'm looking for the equivalent of the command `tar xvzpf
mybigdir.tar.gz`.

I'd like to use core modules to uncompress gzipped tarballs.

Take mybigdir.tar.gz.  Inside the tarball, there are two subdirectories (foo
and lando).  Each subdirectory contains regular ASCII files.

Do you mind providing an example that decompresses myfile.tar.gz
while keeping the directory structure, extracting it either into the same
root directory or into another directory?

Thanks in advance,
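
One possible shape for such an example, offered as a sketch rather than a
tested recipe: it assumes Archive::Tar and IO::Zlib are installed (both ship
with Perl from 5.10 on; older perls need them from CPAN). The helper name
untar_into and the directory name 'unpacked' are made up for illustration:

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Path qw(mkpath);

# Extract a .tar.gz archive into $dest_dir, keeping its directory structure.
# Returns the number of entries extracted.
sub untar_into {
    my ($archive, $dest_dir) = @_;

    # read() detects gzip compression when IO::Zlib is available.
    my $tar = Archive::Tar->new;
    $tar->read($archive)
        or die "Cannot read $archive: ", $tar->error, "\n";

    # extract() writes relative to the current directory, so move there.
    my $old_cwd = getcwd();
    mkpath($dest_dir) unless -d $dest_dir;
    chdir $dest_dir or die "Cannot chdir to $dest_dir: $!\n";
    my @extracted = $tar->extract;
    chdir $old_cwd or die "Cannot chdir back to $old_cwd: $!\n";

    die "Extraction failed: ", $tar->error, "\n" unless @extracted;
    return scalar @extracted;
}

# Hypothetical usage:
# my $count = untar_into('mybigdir.tar.gz', 'unpacked');
# print "extracted $count entries\n";
```

This reads the whole archive into memory before extracting, which may matter
for a very large tarball.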







------------------------------

Date: 13 Mar 2008 18:15:49 GMT
From: xhoster@gmail.com
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <20080313141551.021$zt@newsreader.com>

pgodfrin <pgodfrin@gmail.com> wrote:
> Greetings,
>
> I've browsed the Camel book and perldoc.perl.org  about style in a
> program and I was wondering what most people think.
>
> I'm in the habit of the following pseudo-code structure of my programs
> (or are they scripts :) ):
>
> 0.  shebang and comments

The shebang has to be first if it is going to be meaningful.
Comments go wherever the thing that needs commenting is.
On some scripts, there are comments right after the shebang line
giving the purpose of the script as a whole.  But if there is
a usage message, then I don't also include comments with the same
info.


> 1.  Use statements

If the module being used supports only one sub or code-section and is
unlikely to be useful outside that context, then the use statement goes in
the sub it supports.  Otherwise it goes up near the top.  (use strict and
use warnings go right up after shebang/purpose.)


> 2.  'global' variables or constants
> 3.  INIT or BEGINs

If the INIT or BEGIN supports a well-defined sub-set of the code, then it
goes with the code it supports.  Same with END blocks.  Otherwise
I usually put BEGIN blocks before the "use"s, other than strict and
warnings, because I might want whatever BEGIN does to happen before the
modules are run.

> 4.  Subroutine definitions

I generally put the subroutines after the main code.  This was drilled into
me by a teacher who was a big advocate of "top-down" programming.  It
always seemed kind of silly to me, because tying the lexical order of the
code to the temporal order of its planning/conceptualization seemed to be
such weak symbolism as to be trivial.  But it has mostly stuck with me
anyway.  If I am taking over someone else's code and they put the
subroutines first, I rarely take the effort to rearrange it.  If you are
into B&D, then your order has the advantage that perl will not compile
subroutines using lexical variable names that are "my"ed in the main body
of code.  With the other order you can get away with this sin (provided the main
code is at file scope.)
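
A small sketch of that difference, using string eval so both orderings can
be tried in one script. The sub names and $total are made up for
illustration:

```perl
use strict;
use warnings;

# Subroutine first: $total is not yet declared when the sub is compiled,
# so under "use strict" the code is rejected.
my $sub_first = eval q{
    sub show_total_a { return $total }
    my $total = 42;
    1;
};
print "subs first: ", ($sub_first ? "compiled" : "rejected"), "\n";

# Main code first: the lexical declared earlier is visible to the later sub.
my $main_first = eval q{
    my $total = 42;
    sub show_total_b { return $total }
    1;
};
print "main first: ", ($main_first ? "compiled" : "rejected"), "\n";
```

The first eval fails with a "Global symbol ... requires explicit package
name" error in $@, while the second compiles quietly.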

> 5.  main program, using a "MAIN:" label (just so I can find it)

Do you put the main program in a "bare" block just so you can label it like
this?  I'd just use a comment, # MAIN:, instead of adding a spurious scope.

> 6.  an exit; statement

I usually don't use unconditional exit statements. I just let it fall off
the end of the code and exit that way.  If the code is so messy that it
is not obvious where the end of the file-scope-level code is, and I don't
want to take the time to clean it up at the moment, then I might add an
unconditional exit just as a reminder.


Xho



------------------------------

Date: Thu, 13 Mar 2008 12:26:17 -0700 (PDT)
From: pgodfrin <pgodfrin@gmail.com>
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <438bc395-e2ca-42c6-ad97-237e986c0d64@i29g2000prf.googlegroups.com>

On Mar 13, 1:15 pm, xhos...@gmail.com wrote:
> pgodfrin <pgodf...@gmail.com> wrote:
> > Greetings,
>
> > I've browsed the Camel book and perldoc.perl.org  about style in a
> > program and I was wondering what most people think.
>
> > I'm in the habit of the following pseudo-code structure of my programs
> > (or are they scripts :) ):
>
> > 0.  shebang and comments
>
> The shebang has to be first if it is going to be meaningful.
> Comments go wherever the thing that needs commenting is.
> On some scripts, there are comments right after the shebang line
> giving the purpose of the script as a whole.  But if there is
> a usage message, then I don't also include comments with the same
> info.
>
> > 1.  Use statements
>
> If the module being used supports only one sub or code-section and is
> unlikely to be useful outside that context, then the use statement goes in
> the sub it supports.  Otherwise it goes up near the top.  (use strict and
> use warnings go right up after shebang/purpose.)
>
> > 2.  'global' variables or constants
> > 3.  INIT or BEGINs
>
> If the INIT or BEGIN supports a well-defined sub-set of the code, then it
> goes with the code it supports.  Same with END blocks.  Otherwise
> I usually put BEGIN blocks before the "use"s, other than strict and
> warnings, because I might want whatever BEGIN does to happen before the
> modules are run.
>
> > 4.  Subroutine definitions
>
> I generally put the subroutines after the main code.  This was drilled into
> me by a teacher who was a big advocate of "top-down" programming.  It
> always seemed kind of silly to me, because tying the lexical order of the
> code to the temporal order of its planning/conceptualization seemed to be
> such weak symbolism as to be trivial.  But it has mostly stuck with me
> anyway.  If I am taking over someone else's code and they put the
> subroutines first, I rarely take the effort to rearrange it.  If you are
> into B&D, then your order has the advantage that perl will not compile
> subroutines using lexical variable names that are "my"ed in the main body
> of code.  With the other order you can get away with this sin (provided the main
> code is at file scope.)
>
> > 5.  main program, using a "MAIN:" label (just so I can find it)
>
> Do you put the main program in a "bare" block just so you can label it like
> this?  I'd just use a comment, # MAIN:, instead of adding a spurious scope.
>
> > 6.  an exit; statement
>
> I usually don't use unconditional exit statements. I just let it fall off
> the end of the code and exit that way.  If the code is so messy that it
> is not obvious where the end of the file-scope-level code is, and I don't
> want to take the time to clean it up at the moment, then I might add an
> unconditional exit just as a reminder.
>
> Xho
>

Thanks Xho - very good observations. What is B&D ?
pg


------------------------------

Date: 13 Mar 2008 19:49:58 GMT
From: xhoster@gmail.com
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <20080313155001.048$B7@newsreader.com>

pgodfrin <pgodfrin@gmail.com> wrote:
>
> Thanks Xho - very good observations. What is B&D ?

"bondage and discipline", originating from some salacious/erotic activity.

In a programming context, it is slang to refer to highly rigid languages or
philosophies that force you to obey "good practices" or highly structured
programming, even when some compromise on those standards might be
convenient (and even when such compromise would lead to better code, some
would argue).

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Thu, 13 Mar 2008 12:59:43 -0700 (PDT)
From: pgodfrin <pgodfrin@gmail.com>
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <b9e73d5a-a511-4f64-9389-286e19fbc29b@s8g2000prg.googlegroups.com>

On Mar 13, 2:49 pm, xhos...@gmail.com wrote:
> pgodfrin <pgodf...@gmail.com> wrote:
>
> > Thanks Xho - very good observations. What is B&D ?
>
> "bondage and discipline", originating from some salacious/erotic activity.
>
> In a programming context, it is slang to refer to highly rigid languages or
> philosophies that force you to obey "good practices" or highly structured
> programming, even when some compromise on those standards might be
> convenient (and even when such compromise would lead to better code, some
> would argue).
>
> Xho
>

k - got it...I got into the habit of putting the subroutines first so
I didn't need to "forward declare". But I'm beginning to think that
you really don't need to anyway...
pg


------------------------------

Date: Thu, 13 Mar 2008 20:35:24 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <gegCj.108161$C61.93935@edtnps89>

pgodfrin wrote:
> On Mar 13, 2:49 pm, xhos...@gmail.com wrote:
>> pgodfrin <pgodf...@gmail.com> wrote:
>>
>>> Thanks Xho - very good observations. What is B&D ?
>> "bondage and discipline", originating from some salacious/erotic activity.
>>
>> In a programming context, it is slang to refer to highly rigid languages or
>> philosophies that force you to obey "good practices" or highly structured
>> programming, even when some compromise on those standards might be
>> convenient (and even when such compromise would lead to better code, some
>> would argue).
> 
> k - got it...I got into the habit of putting the subroutines first so
> I didn't need to "forward declare". But I'm beginning to think that
> you really don't need to anyway...

You don't really need to, but if the subroutines are defined first then
they can be called without parentheses, like Perl's built-in operators.

sub my_sub { return "In my_sub" }

print my_sub;


However, if they are defined at the end, then parentheses are required:

print my_sub();

sub my_sub { return "In my_sub" }


You could also declare the subroutines at the start and then define them 
later:

sub my_sub;

print my_sub;

sub my_sub { return "In my_sub" }
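
As a rough sketch of the difference under "use strict", the following
compiles two small snippets with string eval and compares the results.
The helper try_snippet and the sub names late_a and late_b are made up
for illustration only; the point is that a paren-less call before the
definition is rejected as a bareword, while the same layout with a
forward declaration compiles and runs.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet under strict.
# Returns the snippet's value, or undef (error left in $@) on failure.
sub try_snippet {
    my ($code) = @_;
    my $value = eval "use strict; $code";
    return $value;
}

# Paren-less call before the definition: a bareword under strict subs.
my $no_decl       = try_snippet('my $v = late_a; sub late_a { 42 } $v');
my $no_decl_error = $@;

# Same layout plus a forward declaration: parsed as a subroutine call.
my $with_decl = try_snippet('sub late_b; my $v = late_b; sub late_b { 42 } $v');

print defined $no_decl
    ? "no declaration: $no_decl\n"
    : "no declaration: compile error\n";
print defined $with_decl
    ? "with declaration: $with_decl\n"
    : "with declaration: compile error\n";
```

Without strict the first form may still compile, but it is unlikely to
do what you meant, so the parentheses (or a declaration) are the safer
habit.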



John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Thu, 13 Mar 2008 21:47:45 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Style and subroutines in Perl Programs
Message-Id: <9l4jt31g760s5k9lqa44o6oskodlinegc1@4ax.com>

On Thu, 13 Mar 2008 10:19:34 -0700 (PDT), pgodfrin
<pgodfrin@gmail.com> wrote:

>I've seen some code where the subroutines are placed after the "main"
>program - this way (my way) seems logical to me, but short of starting
>a war of opinion, I was wondering what others thought about the
>placement of subroutines?

The advantage of putting them at the beginning is that they will be
known as the main program is compiled and you can e.g. skip some
parens. Alternatively you can place them at the end, and use
declarations.
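
One way to write such declarations, sketched here on the assumption
that the standard "subs" pragma is acceptable in your code, is to
predeclare the names at the top and keep the definitions at the
bottom. The sub name greet is invented for the example.

```perl
use strict;
use warnings;
use subs qw(greet);    # predeclare; the definition lives at the bottom

# Because greet is already known to the compiler, no parens are needed.
my $message = greet 'world';
print "$message\n";

sub greet {
    my ($name) = @_;
    return "Hello, $name";
}
```

A plain "sub greet;" line at the top would serve the same purpose.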


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1360
***************************************
