
Perl-Users Digest, Issue: 4122 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jan 25 23:08:11 2014

Date: Wed, 22 Jan 2014 21:09:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 22 Jan 2014     Volume: 11 Number: 4122

Today's topics:
    Re: file size <marius@ieval.ro>
    Re: file size <ben@morrow.me.uk>
    Re: file size <rweikusat@mobileactivedefense.com>
    Re: file size <ben@morrow.me.uk>
    Re: file size <rweikusat@mobileactivedefense.com>
        Regex replacement via external command <sun_tong_001@users.sourceforge.net>
    Re: Regex replacement via external command <tim@tim-landscheidt.de>
    Re: Regex replacement via external command <rweikusat@mobileactivedefense.com>
    Re: Regex replacement via external command <rweikusat@mobileactivedefense.com>
    Re: Regex replacement via external command <gravitalsun@hotmail.foo>
    Re: Regex replacement via external command <derykus@gmail.com>
    Re: Regex replacement via external command <derykus@gmail.com>
    Re: Regex replacement via external command <sun_tong_001@users.sourceforge.net>
    Re: Regex replacement via external command <sun_tong_001@users.sourceforge.net>
    Re: Regex replacement via external command <derykus@gmail.com>
    Re: Regex replacement via external command <tim@tim-landscheidt.de>
    Re: Regex replacement via external command <sun_tong_001@users.sourceforge.net>
        sys call length limitation <sun_tong_001@users.sourceforge.net>
    Re: sys call length limitation (Tim McDaniel)
    Re: sys call length limitation <gravitalsun@hotmail.foo>
    Re: sys call length limitation <rweikusat@mobileactivedefense.com>
    Re: sys call length limitation <sun_tong_001@users.sourceforge.net>
    Re: sys call length limitation <rweikusat@mobileactivedefense.com>
    Re: sys call length limitation <sun_tong_001@users.sourceforge.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 21 Jan 2014 13:46:35 +0200
From: Marius Gavrilescu <marius@ieval.ro>
Subject: Re: file size
Message-Id: <87bnz5pnfo.fsf@ieval.ro>


George Mpouras <gravitalsun@foo.com> writes:

> On 21/1/2014 12:08, George Mpouras wrote:
>> I have a file already open (in fact could be 100s). How can I get the
>> size faster

use Benchmark;

This code:

	#!/usr/bin/perl -w

	use Benchmark qw/cmpthese/;

	open FILE, '<file';
	cmpthese(10000000, {
		-s => sub { -s FILE },
		seektell => sub { seek FILE, 0, 2; tell FILE },
	})

produces this output over here (Debian amd64, perl v5.18.2):

                  Rate seektell       -s
    seektell 3952569/s       --     -48%
    -s       7575758/s      92%       --

So -s is almost twice as fast as seektell.

If the seek is only done once, the result becomes:

                   Rate       -s seektell
    -s        7022472/s       --     -82%
    seektell 39370079/s     461%       --

So tell is significantly faster than -s, but the combination of
seek+tell is slower. Of course, YMMV. Run the benchmark and see what
works best in your case.
-- 
Marius Gavrilescu



------------------------------

Date: Tue, 21 Jan 2014 13:50:39 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: file size
Message-Id: <fia2ra-95o1.ln1@anubis.morrow.me.uk>


Quoth Marius Gavrilescu <marius@ieval.ro>:
> 
> So tell is significantly faster than -s, but the combination of
> seek+tell is slower. Of course, YMMV. Run the benchmark and see what
> works best in your case.

That's expected. tell (as opposed to calling sysseek and looking at the
return value) does not make a system call, it just looks into PerlIO's
data structures to see where the file pointer is now. seek calls
lseek(2) every time it is called (in fact, the PerlIO implementation
calls lseek twice, once to 'seek' the :unix layer and then again to
'tell' it), so it will be (much) slower than tell. -s calls fstat(2), so
it's likely to be faster than seek but rather slower than tell.

Ben



------------------------------

Date: Tue, 21 Jan 2014 14:34:46 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: file size
Message-Id: <8761pd769l.fsf@sable.mobileactivedefense.com>

Marius Gavrilescu <marius@ieval.ro> writes:
> George Mpouras <gravitalsun@foo.com> writes:
>> Στις 21/1/2014 12:08, ο/η George Mpouras έγραψε:
>>> I have a file already open (in fact could be 100s). How can I get the
>>> size faster
>
> use Benchmark;
>
> This code:
>
> 	#!/usr/bin/perl -w
> 	
> 	use Benchmark qw/cmpthese/;
> 	
> 	open FILE, '<file';
> 	cmpthese(10000000, {
> 		-s => sub { -s FILE },
> 		seektell => sub { seek FILE, 0, 2; tell FILE },
> 	})
>
> produces this output over here (Debian amd64, perl v5.18.2):
>
>                   Rate seektell       -s
>     seektell 3952569/s       --     -48%
>     -s       7575758/s      92%       --
>
> So -s is almost twice as fast as seektell.
>
> If the seek is only done once, the result becomes:
>
>                    Rate       -s seektell
>     -s        7022472/s       --     -82%
>     seektell 39370079/s     461%       --
>
> So tell is significantly faster than -s, but the combination of
> seek+tell is slower.

The first seek moves the current file position to the end of the
file, which causes the old current position to be lost. All subsequent
seeks don't do any actual seeking. This should rather be something like

---------
#!/usr/bin/perl -w
	
use Benchmark qw/cmpthese/;
	
open FILE, '<file';
cmpthese(10000000, {
	-s => sub {
	    -s FILE
	},
	seektell => sub {
	    my ($old, $rc);

	    $old = tell(FILE);
	    seek FILE, 0, 2;
	    $rc = tell FILE;
	    seek FILE, $old, 0;
	    return $rc;
	},
})
----------


------------------------------

Date: Tue, 21 Jan 2014 16:12:36 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: file size
Message-Id: <ksi2ra-6kp1.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
> Marius Gavrilescu <marius@ieval.ro> writes:
> >
> > 	cmpthese(10000000, {
> > 		-s => sub { -s FILE },
> > 		seektell => sub { seek FILE, 0, 2; tell FILE },
> > 	})
> 
> The first seek moves the current file position to the end of the
> file, which causes the old current position to be lost. All subsequent
> seeks don't do any actual seeking.

They may not move the (OS) file pointer, but they will still make two
lseek(2) calls, which is what takes the time. (Moving the file pointer
from within the kernel is obviously entirely trivial.) Perl doesn't
know, until the OS tells it, that the file hasn't changed length since
the last time it found the end.

> This should rather be something like
> 
> ---------
> #!/usr/bin/perl -w
> 	
> use Benchmark qw/cmpthese/;
> 	
> open FILE, '<file';
> cmpthese(10000000, {
> 	-s => sub {
> 	    -s FILE
> 	},
> 	seektell => sub {
> 	    my ($old, $rc);
> 
> 	    $old = tell(FILE);
> 	    seek FILE, 0, 2;
> 	    $rc = tell FILE;
> 	    seek FILE, $old, 0;
> 	    return $rc;
> 	},
> })

This will do four lseek(2)s per iteration vs one fstat(2); not exactly a
fair comparison. If you want a fairer comparison of seeking vs stat, you
want to use 

    lseek => sub {
        sysseek FILE, 0, SEEK_END;
    }

which, because it's not going through the PerlIO interface, only calls
lseek(2) once.

Ben



------------------------------

Date: Tue, 21 Jan 2014 16:44:05 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: file size
Message-Id: <87ob355lpm.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
>> Marius Gavrilescu <marius@ieval.ro> writes:
>> >
>> > 	cmpthese(10000000, {
>> > 		-s => sub { -s FILE },
>> > 		seektell => sub { seek FILE, 0, 2; tell FILE },
>> > 	})
>> 
>> The first seek moves the current file position to the end of the
>> file, which causes the old current position to be lost. All subsequent
>> seeks don't do any actual seeking.
>
> They may not move the (OS) file pointer, but they will still make two
> lseek(2) calls, which is what takes the time. (Moving the file pointer
> from within the kernel is obviously entirely trivial.) Perl doesn't
> know, until the OS tells it, that the file hasn't changed length since
> the last time it found the end.

It seems to me that a better implementation should be possible here, but
that's somewhat beside the point, which was that the first seek moves the
file pointer to the end of the file, where it then stays for the purpose
of this benchmark: there's nothing which magically causes it to revert to
the 'current position' prior to the seek, hence


[...]

>> 	seektell => sub {
>> 	    my ($old, $rc);
>> 
>> 	    $old = tell(FILE);
>> 	    seek FILE, 0, 2;
>> 	    $rc = tell FILE;
>> 	    seek FILE, $old, 0;
>> 	    return $rc;
>> 	},
>> })
>
> This will do four lseek(2)s per iteration vs one fstat(2); not exactly a
> fair comparison.

would be a fairer comparison because fstat doesn't destroy the current
file position. 


------------------------------

Date: Wed, 22 Jan 2014 14:31:16 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Regex replacement via external command
Message-Id: <UCQDu.202578$Yb6.126769@fx29.iad>

On Wed, 22 Jan 2014 06:25:18 +0000, Tim McDaniel wrote:

> Don't do `...` when there may be a lot of output.  Please see the
> perlopentut man page, specifically "pipe open", and don't try to pass
> long values on the command line.

The problem I was dealing with is: I need to pick out a big chunk of an 
input string (>200K, by regex), feed it to an external program (which is 
pipe after pipe after pipe), then replace the matching string with the 
processed result. What's the proper way to do it (for big matching 
chunks)? 

A thousand-times over-simplified version is:

  perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
  <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'

The problem is that I not only need to process this big chunk of 
matching string via the external program, but I also need to replace the 
matching string with the result of the external process. Putting the two 
together is where the problem lies for me. 

Please help.

Thanks


------------------------------

Date: Wed, 22 Jan 2014 16:02:16 +0000
From: Tim Landscheidt <tim@tim-landscheidt.de>
Subject: Re: Regex replacement via external command
Message-Id: <87fvogvwc7.fsf@passepartout.tim-landscheidt.de>

(anonymous) wrote:

>> Don't do `...` when there may be a lot of output.  Please see the
>> perlopentut man page, specifically "pipe open", and don't try to pass
>> long values on the command line.

> The problem I was dealing with is, I need to pick out a big chunk of
> input string (>200K, by regex), feed it to external program (which is
> pipe after pipe after pipe), then replace the matching string with the
> processed result. what's the proper way to do it (for big matching
> chunks)?

> A thousand time over-simplified version is:

>   perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
>   <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'

> The problem is  that I not only need to process this big chunk of
> matching string via the external program, but I also need to replace the
> matching string with the result of the external process. Putting two
> together is where the problem for me.

You could replace (in this example) the call of `echo $1 |
wc -c` with a double-sided pipe where you feed $1 on stdin
to wc and collect wc's stdout.  You need to look at
IPC::Open2 & Co. on how to achieve that; see "perldoc -q
'How can I open a pipe both to and from a command?'" for
pointers.

Another approach (as always) would be temporary files.

Tim


------------------------------

Date: Wed, 22 Jan 2014 16:27:56 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex replacement via external command
Message-Id: <87fvogrng3.fsf@sable.mobileactivedefense.com>

Tim Landscheidt <tim@tim-landscheidt.de> writes:
> (anonymous) wrote:
>
>>> Don't do `...` when there may be a lot of output.  Please see the
>>> perlopentut man page, specifically "pipe open", and don't try to pass
>>> long values on the command line.
>
>> The problem I was dealing with is, I need to pick out a big chunk of
>> input string (>200K, by regex), feed it to external program (which is
>> pipe after pipe after pipe), then replace the matching string with the
>> processed result. what's the proper way to do it (for big matching
>> chunks)?
>
>> A thousand time over-simplified version is:
>
>>   perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
>>   <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'
>
>> The problem is  that I not only need to process this big chunk of
>> matching string via the external program, but I also need to replace the
>> matching string with the result of the external process. Putting two
>> together is where the problem for me.
>
> You could replace (in this example) the call of `echo $1 |
> wc -c` with a double-sided pipe where you feed $1 on stdin
> to wc and collect wc's stdout.  You need to look at
> IPC::Open2 & Co. on how to achieve that; see "perldoc -q
> 'How can I open a pipe both to and from a command?'" for
> pointers.

That's almost certainly a recipe for disaster for 'large amounts of
data' because the process writing to the input pipe will block once the
'input' pipe buffer is full and the external command will block once the
'output' pipe buffer is full, ie, the whole thing will deadlock.

> Another approach (as always) would be temporary files.

In case the program is really supposed to work as a filter, another
possible approach would be to use a 'Perl lexer', e.g., for the example
above, assuming the input is in $s (untested):

for ($s) {
	/\G(x+)/gc and do {
        	my $fh;

                open($fh, '|command');
                print $fh ($1);
                $fh = undef;

                redo;
	};

        /\G([^x]+)/gc and print($1), redo;
}

and simply let the output of the external command appear 'in the right
place' of the stdout output of the perl script (since they'll share the
same stdout).


------------------------------

Date: Wed, 22 Jan 2014 16:30:42 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex replacement via external command
Message-Id: <87bnz4rnbh.fsf@sable.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[...]


> for ($s) {
> 	/\G(x+)/gc and do {
>         	my $fh;
>
>                 open($fh, '|command');
>                 print $fh ($1);
>                 $fh = undef;

The last line isn't really needed.


------------------------------

Date: Wed, 22 Jan 2014 20:44:41 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Regex replacement via external command
Message-Id: <lbp3j6$td$1@news.ntua.gr>

On 22/1/2014 16:31, * Tong * wrote:
> On Wed, 22 Jan 2014 06:25:18 +0000, Tim McDaniel wrote:
>
>> Don't do `...` when there may be a lot of output.  Please see the
>> perlopentut man page, specifically "pipe open", and don't try to pass
>> long values on the command line.
>
> The problem I was dealing with is, I need to pick out a big chunk of
> input string (>200K, by regex), feed it to external program (which is
> pipe after pipe after pipe), then replace the matching string with the
> processed result. what's the proper way to do it (for big matching
> chunks)?
>
> A thousand time over-simplified version is:
>
>    perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
>    <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'
>
> The problem is  that I not only need to process this big chunk of
> matching string via the external program, but I also need to replace the
> matching string with the result of the external process. Putting two
> together is where the problem for me.
>
> Please help.
>
> Thanks
>


Here you go, do something similar (no size limit):


	perl -e 'while(<>) { s/aa/bb/;  print }'  HttpBody




------------------------------

Date: Wed, 22 Jan 2014 18:47:57 -0800
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: Regex replacement via external command
Message-Id: <lbpvt3$q15$1@speranza.aioe.org>

On 1/22/2014 6:31 AM, * Tong * wrote:
> On Wed, 22 Jan 2014 06:25:18 +0000, Tim McDaniel wrote:
>
>> Don't do `...` when there may be a lot of output.  Please see the
>> perlopentut man page, specifically "pipe open", and don't try to pass
>> long values on the command line.
>
> The problem I was dealing with is, I need to pick out a big chunk of
> input string (>200K, by regex), feed it to external program (which is
> pipe after pipe after pipe), then replace the matching string with the
> processed result. what's the proper way to do it (for big matching
> chunks)?
>
> A thousand time over-simplified version is:
>
>    perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
>    <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'
>
> The problem is  that I not only need to process this big chunk of
> matching string via the external program, but I also need to replace the
> matching string with the result of the external process. Putting two
> together is where the problem for me.
>

I'd recommend putting the mule train on a separate line:

     if ( ($pre, $match, $post) = $huge_str =~ /...pattern.../p ) {
          my $new_match = ...  # external process  | ... | ...
          say $pre, $new_match, $post;
     }

-- 
Charles DeRykus






------------------------------

Date: Wed, 22 Jan 2014 19:47:56 -0800
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: Regex replacement via external command
Message-Id: <lbq3dn$gd$1@speranza.aioe.org>

On 1/22/2014 6:47 PM, Charles DeRykus wrote:
> ...
>
>      if ( ($pre, $match, $post) = $huge_str =~ /...pattern.../p ) {
>           my $new_match = ...  # external process  | ... | ...
>           say $pre, $new_match, $post;
>      }
>


Make that:

        if ( $huge_str =~ /...pattern.../p ) {
            (my $new_match = ${^MATCH}) = ext process ... | ...
            say ${^PREMATCH}, $new_match, ${^POSTMATCH};
        }

-- 
Charles DeRykus


------------------------------

Date: Thu, 23 Jan 2014 04:01:58 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: Regex replacement via external command
Message-Id: <Wu0Eu.368970$DB4.29583@fx14.iad>

On Wed, 22 Jan 2014 16:27:56 +0000, Rainer Weikusat wrote:

>>> The problem is  that I not only need to process this big chunk of
>>> matching string via the external program, but I also need to replace
>>> the matching string with the result of the external process. Putting
>>> two together is where the problem for me.
>>
>> You could replace (in this example) the call of `echo $1 |
>> wc -c` with a double-sided pipe where you feed $1 on stdin to wc and
>> collect wc's stdout.  You need to look at IPC::Open2 & Co. on how to
>> achieve that; see "perldoc -q 'How can I open a pipe both to and from a
>> command?'" for pointers.
> 
> That's almost certainly a recipe for disaster for 'large amounts of
> data' because the process writing to the input pipe will block once the
> 'input' pipe buffer is full and the external command will block once the
> 'output' pipe buffer is full, ie, the whole thing will deadlock.

BINGO. I tried IPC::Open2, as I was trying to solve the problem myself, 
but it froze up. Now I know exactly why.

Thanks


------------------------------

Date: Thu, 23 Jan 2014 04:08:28 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: Regex replacement via external command
Message-Id: <0B0Eu.56659$J12.13269@fx10.iad>

On Wed, 22 Jan 2014 18:47:57 -0800, Charles DeRykus wrote:

> I'd recommend putting the mule train on a separate line . . .

Thank you, and for the code correction too. 

I'll see if I can make it work first, otherwise try Rainer's. 

Thanks again, everyone, for your help. 



------------------------------

Date: Wed, 22 Jan 2014 20:12:06 -0800
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: Regex replacement via external command
Message-Id: <lbq4qt$2v0$1@speranza.aioe.org>

On 1/22/2014 7:47 PM, Charles DeRykus wrote:
> On 1/22/2014 6:47 PM, Charles DeRykus wrote:
>> ...
>>
>>      if ( ($pre, $match, $post) = $huge_str =~ /...pattern.../p ) {
>>           my $new_match = ...  # external process  | ... | ...
>>           say $pre, $new_match, $post;
>>      }
>>
>
>
> Make that:
>
>         if ( $huge_str =~ /...pattern.../p ) {
>             (my $new_match = ${^MATCH}) = ext process ... | ...
                ^^^^^^^^^^^^^^^^^^^^^^^^^
                my $match = ${^MATCH};
                my $new_match =  ext process .... $match | ... ;


-- 
Charles DeRykus



------------------------------

Date: Thu, 23 Jan 2014 04:32:45 +0000
From: Tim Landscheidt <tim@tim-landscheidt.de>
Subject: Re: Regex replacement via external command
Message-Id: <87txcvuxle.fsf@passepartout.tim-landscheidt.de>

Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:

>>>> Don't do `...` when there may be a lot of output.  Please see the
>>>> perlopentut man page, specifically "pipe open", and don't try to pass
>>>> long values on the command line.

>>> The problem I was dealing with is, I need to pick out a big chunk of
>>> input string (>200K, by regex), feed it to external program (which is
>>> pipe after pipe after pipe), then replace the matching string with the
>>> processed result. what's the proper way to do it (for big matching
>>> chunks)?

>>> A thousand time over-simplified version is:

>>>   perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
>>>   <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'

>>> The problem is  that I not only need to process this big chunk of
>>> matching string via the external program, but I also need to replace the
>>> matching string with the result of the external process. Putting two
>>> together is where the problem for me.

>> You could replace (in this example) the call of `echo $1 |
>> wc -c` with a double-sided pipe where you feed $1 on stdin
>> to wc and collect wc's stdout.  You need to look at
>> IPC::Open2 & Co. on how to achieve that; see "perldoc -q
>> 'How can I open a pipe both to and from a command?'" for
>> pointers.

> That's almost certainly a recipe for disaster for 'large amounts of
> data' because the process writing to the input pipe will block once the
> 'input' pipe buffer is full and the external command will block once the
> 'output' pipe buffer is full, ie, the whole thing will deadlock.

> [...]

"& Co.".  Personally, I prefer IPC::Run and:

| use IPC::Run qw(start pump finish);

| my ($stdin, $stdout, $stderr) = ('', '', '');

| my $h = start (['/usr/bin/wc'], \$stdin, \$stdout, \$stderr);
| for (1..1000000) {
|     $stdin .= "another line\n";
|     pump ($h);
| }
| finish ($h);

| print "wc output: ", $stdout;

doesn't block.

Tim


------------------------------

Date: Thu, 23 Jan 2014 04:45:46 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: Regex replacement via external command
Message-Id: <_71Eu.61663$wD3.10472@fx04.iad>

On Wed, 22 Jan 2014 16:27:56 +0000, Rainer Weikusat wrote:

>> Another approach (as always) would be temporary files.
> 
> In case the program is really supposed to work as a filter, a possible
> other aproach would be to use a 'Perl lexer', eg, for the example above,
> assuming the input is in $s (untested)
> 
> for ($s) {
> 	/\G(x+)/gc and do {
>         	my $fh;
> 
>                 open($fh, '|command');
>                 print $fh ($1);
>                 $fh = undef;
> 
>                 redo;
> 	};
> 
>         /\G([^x]+)/gc and print($1), redo;
> }
> 
> and simply let the output of the external command appear 'in the right
> place' of the stdout output of the perl script (since they'll share the
> same stdout).

This would be quite a mouthful for me. I'll have to try it out to 
understand exactly what's going on. 

But first, a quick question: this looks to me like a filter program in 
Perl that writes out matches ($1) itself, and uses an external command 
to further process the matches ($1) as well. The external command takes 
the match string from pipe input and writes its result in the right 
place of the stdout output, correct? I.e., both the Perl script and the 
external command write their output to stdout, correct?

The problem for me is that I not only need to process this big chunk of 
matching string via the external program, but I also need to replace the 
matching string with the result of the external process. The above code 
doesn't take care of grabbing the output of the external command and 
using it as the replacement, correct?

Just curious, because as Tim suggested, I can always use temporary files.

Thanks



------------------------------

Date: Tue, 21 Jan 2014 23:58:58 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: sys call length limitation
Message-Id: <6RDDu.243526$2R1.24158@fx13.iad>

Hi, 

I just found out that there is a strict limit on how many characters 
one can stuff between `` sys calls:

  $ echo "`cat HttpBody`" | wc -c
  238566

  $ cat HttpBody | perl -e '$s = <>; print length $s; print `echo $s`'
  238566

I.e., the echo works fine in my shell, but not within Perl. 

Is there any way to increase the sys call length limit, i.e. the limit on 
how many characters one can stuff between ``? 

Thanks


------------------------------

Date: Wed, 22 Jan 2014 06:25:18 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: sys call length limitation
Message-Id: <lbno8d$qo6$1@reader1.panix.com>

In article <6RDDu.243526$2R1.24158@fx13.iad>,
* Tong *  <sun_tong_001@users.sourceforge.net> wrote:
>I just found out that there is a strict limitation on how many characters 
>one can stuff in between `` sys calls:
>
>  $ echo "`cat HttpBody`" | wc -c
>  238566
>
>  $ cat HttpBody | perl -e '$s = <>; print length $s; print `echo $s`'
>  238566
>
>I.e., the echo works fine in my shell, but not OK within Perl.

To be more precise, not OK within *your* Perl.  On the system I'm on
at the moment,

$ perl -e 'print("x" x 238565, "\n")' > HttpBody
$ echo "`cat HttpBody`" | wc -c
  238566
$ <HttpBody perl -e '$s = <>; print length $s, "\n"; print `echo $s`' |
less
238566
xxxxxxxxxxx...

But that's something of a quibble.  On this system, using 2385651
works in the echo...wc... command but does not work in Perl.

(As a side note, "cat ... |" is usually a useless use of a program.
Ending the perl call with "<HttpBody" would work as well or better.  As
shown above, at least in bash on this system, "<" and other file
redirection characters work anywhere on the command line -- I usually
put them at the end, but putting them at the front, as above, makes
them more visually evident.)

>Any way to increase the sys call length limitation, ie the limitation
>how many characters one can stuff in between ``?

There may well be, but there's an old joke:
    "Doctor, doctor, it hurts when I do this!"
    "Then don't do this!"

Don't do `...` when there may be a lot of output.  Please see the
perlopentut man page, specifically "pipe open", and don't try to pass
long values on the command line.  Mind you, translating it naively to
a pipe open of echo doesn't work either on my system for 2385651:

$ <HttpBody perl -e '$s = <>; print length $s, "\n"; open my $f, "-|", "echo", $s or die "$!";  print <$f>'
2385651
Argument list too long at -e line 1, <> line 1.

but of course it is silly to try to put it on the command line.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 22 Jan 2014 09:46:30 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: sys call length limitation
Message-Id: <lbnsvf$jq$1@news.ntua.gr>

Bash, maximum command line length

	getconf ARG_MAX
	
	echo $(( $(getconf ARG_MAX) - $(env | wc -c) ))


------------------------------

Date: Wed, 22 Jan 2014 11:38:35 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: sys call length limitation
Message-Id: <87d2jki6v8.fsf@sable.mobileactivedefense.com>

* Tong * <sun_tong_001@users.sourceforge.net> writes:
> I just found out that there is a strict limitation on how many characters 
> one can stuff in between `` sys calls:
>
>   $ echo "`cat HttpBody`" | wc -c
>   238566
>
>   $ cat HttpBody | perl -e '$s = <>; print length $s; print `echo $s`'
>   238566
>
> I.e., the echo works fine in my shell, but not OK within Perl. 
>
> Any way to increase the sys call length limitation, ie the limitation how 
> many characters one can stuff in between ``? 

As George already pointed out, this limit usually applies to the combined
size of the environment and the command-line arguments.  Usually, it can
only be changed by changing the kernel.


------------------------------

Date: Wed, 22 Jan 2014 14:17:29 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: sys call length limitation
Message-Id: <ZpQDu.302695$Rp6.63250@fx15.iad>

On Wed, 22 Jan 2014 11:38:35 +0000, Rainer Weikusat wrote:

>> I just found out that there is a strict limitation on how many
>> characters one can stuff in between `` sys calls:
>>
>>   $ echo "`cat HttpBody`" | wc -c
>>   238566
>>
>>   $ cat HttpBody | perl -e '$s = <>; print length $s; print `echo $s`'
>>   238566
>>
>> I.e., the echo works fine in my shell, but not OK within Perl.
>>
>> Any way to increase the sys call length limitation, ie the limitation
>> how many characters one can stuff in between ``?
> 
> As George already pointed out, this limit usually limits the combined
> size of environment and command-line arguments.  

That's only partly true:

 $ getconf ARG_MAX
 2097152

That's way bigger than the 238566 of my file. That's why the echo on the 
shell doesn't fail. 

> Usually, it can only be changed by changing the kernel.

But how come it works with Tim's Perl but not mine?

My env is Ubuntu. Tested in both 12.10 and 13.10. Both are the same. 




------------------------------

Date: Wed, 22 Jan 2014 14:55:43 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: sys call length limitation
Message-Id: <87vbxcrrps.fsf@sable.mobileactivedefense.com>

* Tong * <sun_tong_001@users.sourceforge.net> writes:
> On Wed, 22 Jan 2014 11:38:35 +0000, Rainer Weikusat wrote:
>
>>> I just found out that there is a strict limitation on how many
>>> characters one can stuff in between `` sys calls:
>>>
>>>   $ echo "`cat HttpBody`" | wc -c
>>>   238566
>>>
>>>   $ cat HttpBody | perl -e '$s = <>; print length $s; print `echo $s`'
>>>   238566
>>>
>>> I.e., the echo works fine in my shell, but not OK within Perl.
>>>
>>> Any way to increase the sys call length limitation, ie the limitation
>>> how many characters one can stuff in between ``?
>> 
>> As George already pointed out, this limit usually limits the combined
>> size of environment and command-line arguments.  
>
> That's only partly true:

{ARG_MAX}
    Maximum length of argument to the exec functions including environment data.
    Minimum Acceptable Value: {_POSIX_ARG_MAX}
    
    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html

NB: You may need to register before being allowed to access that.


>  $ getconf ARG_MAX
>  2097152
>
> That's way bigger than the 238566 of my file. That's why the echo on the 
> shell doesn't fail.

The reason the echo in the shell doesn't fail is most likely because
echo is a shell-builtin, ie, there is no exec involved here. When I try
this on my machine with a file of size 140,000,

[rw@sable]/tmp#echo "`cat x`" | wc -c
140001

but

[rw@sable]/tmp#/bin/echo "`cat x`" | wc -c
bash: /bin/echo: Argument list too long
0

The perl-variant can deal with at most 131,067 bytes (getconf ARG_MAX
returns the same value) and strace -f perl ... shows that

[pid  2123] execve("/bin/sh", ["sh", "-c", "echo ;;;;;;;;;;;;;;;;;;;;;;;;;;;"...], [/* 26 vars */]) = -1 E2BIG (Argument list too long)

see also

"Limits on size of arguments and environment" section in the Linux
execve(2) manpage, in particular,

	Additionally, the limit per string is 32 pages (the kernel
	constant MAX_ARG_STRLEN)
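
With the usual 4 KiB pages, 32 pages works out to the ~131 KB per-string
ceiling observed above (a sketch; PAGESIZE varies by architecture):

```shell
# MAX_ARG_STRLEN = 32 pages on Linux; 32 * 4096 = 131072 on x86
echo $(( 32 * $(getconf PAGESIZE) ))
```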
        


------------------------------

Date: Thu, 23 Jan 2014 04:15:17 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: sys call length limitation
Message-Id: <pH0Eu.56660$J12.7888@fx10.iad>

On Wed, 22 Jan 2014 14:55:43 +0000, Rainer Weikusat wrote:

> The reason the echo in the shell doesn't fail is most likely because
> echo is a shell-builtin, ie, there is no exec involved here. When I try
> this on my machine with a file of size 140,000,

Thanks!
Figured that out myself the hard way, 'cause I wasn't able to access the 
NG for the whole day, till now when I came back home. 


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4122
***************************************

