[32856] in Perl-Users-Digest
Perl-Users Digest, Issue: 4124 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jan 25 23:05:10 2014
Date: Sat, 25 Jan 2014 11:09:03 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 25 Jan 2014 Volume: 11 Number: 4124
Today's topics:
Re: [OT] sys call length limitation (Tim McDaniel)
Re: [OT] sys call length limitation <rweikusat@mobileactivedefense.com>
Re: [OT] sys call length limitation <rweikusat@mobileactivedefense.com>
Re: [OT] sys call length limitation (Tim McDaniel)
Re: [OT] sys call length limitation <rweikusat@mobileactivedefense.com>
Re: [OT] sys call length limitation (Tim McDaniel)
Re: [OT] sys call length limitation <sun_tong_001@users.sourceforge.net>
Re: [OT] sys call length limitation <sun_tong_001@users.sourceforge.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 24 Jan 2014 17:03:06 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: [OT] sys call length limitation
Message-Id: <lbu6ca$d0u$1@reader1.panix.com>
In article <87fvod9y2r.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>Ben Morrow <ben@morrow.me.uk> writes:
>> my $IN = File::Temp->new;
>> my $OUT = File::Temp->new;
>>
>> write_file $IN, $data;
>> system sprintf "wc -c <%s >%s", $IN->filename, $OUT->filename;
>> return read_file $OUT;
>
>By default, a File::Temp object stringifies to its filename.
>Considering that `` undergoes double-quote interpolation, the
>->filename calls, read_file, and in fact the output file altogether
>can be omitted via
>
>`wc -c < $IN`
In this example, yes. I don't remember the complete description of
the actual task, but I thought it was some sort of filter where a
large block of HTML (too large for the command line) was being passed
to some filters, which would output a similarly large munged block.
If I'm remembering right, `<$IN his | filter | stuff` would also fail.
>That's 380 LOC (in a 1261-line text file) of a spurious, external
>dependency with no tangible effect save slowing down compilation of
>the code somewhat, avoided at the clearly intolerable expense of
>having to write less code.
Oh, please hush up about that.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Fri, 24 Jan 2014 17:20:07 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: [OT] sys call length limitation
Message-Id: <87vbx98fg8.fsf@sable.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <87fvod9y2r.fsf@sable.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>Ben Morrow <ben@morrow.me.uk> writes:
>>> my $IN = File::Temp->new;
>>> my $OUT = File::Temp->new;
>>>
>>> write_file $IN, $data;
>>> system sprintf "wc -c <%s >%s", $IN->filename, $OUT->filename;
>>> return read_file $OUT;
>>
>>By default, a File::Temp object stringifies to its filename.
>>Considering that `` undergoes double-quote interpolation, the
>>->filename calls, read_file, and in fact the output file altogether
>>can be omitted via
>>
>>`wc -c < $IN`
>
> In this example, yes. I don't remember the complete description of
> the actual task, but I thought it was some sort of filter where a
> large block of HTML (too large for the command line) was being passed
> to some filters, which would output a similarly large munged block.
It is read into a single Perl scalar either way. If it is too large to
keep in memory, neither approach is suitable. But speculating about the
differences between the posted 'contrived example' and the unknown
'real situation' is rather pointless, since anyone is free to make up
whatever details seem suitable to him.
>>That's 380 LOC (in a 1261-line text file) of a spurious, external
>>dependency with no tangible effect save slowing down compilation of
>>the code somewhat, avoided at the clearly intolerable expense of
>>having to write less code.
>
> Oh, please hush up about that.
Sorry, but these are facts: File::Slurp doesn't provide a benefit here
(or anywhere, for that matter) or at least I don't know what this
benefit might be (Please feel free to argue that it does provide a
benefit, even if only the benefit of "if I write it that way, I won't
have to write it the other way"). But using it is not 'free', costing
both developer time and computer time. While I'm convinced that I'm
right, this doesn't mean I actually am. "Shut up if you don't just love
Justin Bieber" is not a counterargument, though.
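For reference, the core-Perl equivalents are short. A minimal sketch
(the sub names mirror File::Slurp's read_file/write_file purely for
illustration; this covers only the simple whole-file case discussed
here, not the module's full interface):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal core-Perl stand-ins for File::Slurp's read_file/write_file.
# Only the simple whole-file case is handled.
sub read_file {
    my ($name) = @_;
    open my $fh, '<', $name or die "open $name: $!";
    local $/;                      # slurp mode: read everything at once
    return scalar <$fh>;
}

sub write_file {
    my ($name, @data) = @_;
    open my $fh, '>', $name or die "open $name: $!";
    print $fh @data or die "print $name: $!";
    close $fh       or die "close $name: $!";
}
```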
------------------------------
Date: Fri, 24 Jan 2014 18:29:03 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: [OT] sys call length limitation
Message-Id: <87r47x8c9c.fsf@sable.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
[...]
> "Shut up if you don't just love Justin Bieber" is not a
> counterargument, though.
Interesting/useful text touching on this:
http://www.paulgraham.com/disagree.html
------------------------------
Date: Fri, 24 Jan 2014 19:20:13 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: [OT] sys call length limitation
Message-Id: <lbuedd$drg$1@reader1.panix.com>
In article <87vbx98fg8.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> In article <87fvod9y2r.fsf@sable.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>>Ben Morrow <ben@morrow.me.uk> writes:
>>>> my $IN = File::Temp->new;
>>>> my $OUT = File::Temp->new;
>>>>
>>>> write_file $IN, $data;
>>>> system sprintf "wc -c <%s >%s", $IN->filename, $OUT->filename;
>>>> return read_file $OUT;
>>>
>>>By default, a File::Temp object stringifies to its filename.
>>>Considering that `` undergoes double-quote interpolation, the
>>>->filename calls, read_file, and in fact the output file
>>>altogether can be omitted via
>>>
>>>`wc -c < $IN`
>>
>> In this example, yes. I don't remember the complete description of
>> the actual task, but I thought it was some sort of filter where a
>> large block of HTML (too large for the command line) was being passed
>> to some filters, which would output a similarly large munged block.
>
>It is read into a single Perl scalar either way. If it is too large to
>keep in memory, neither approach is suitable.
The problem wasn't that it was too large to keep in memory -- it was
stated that the problem was with the shell command line length for
`command "the data he was trying to process"`.
Moreover, if it were too large for a Perl scalar, that code could be
modified to read line-by-line, or block-by-block, or whatever. `...`
cannot.
>But speculating about the differences between the posted 'contrived
>example' and the unknown 'real situation'
As I recall, he wrote that he tried `...` and it didn't work because
it was too large for the command line. It was diagnosed as being too
large for the shell in particular: he ran into a problem passing 2000K
of data through `...` when a Perl scalar would have no problem with
that size.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Fri, 24 Jan 2014 19:33:17 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: [OT] sys call length limitation
Message-Id: <877g9p89aa.fsf@sable.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <87vbx98fg8.fsf@sable.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>tmcd@panix.com (Tim McDaniel) writes:
>>> In article <87fvod9y2r.fsf@sable.mobileactivedefense.com>,
>>> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>>>Ben Morrow <ben@morrow.me.uk> writes:
>>>>> my $IN = File::Temp->new;
>>>>> my $OUT = File::Temp->new;
>>>>>
>>>>> write_file $IN, $data;
>>>>> system sprintf "wc -c <%s >%s", $IN->filename, $OUT->filename;
>>>>> return read_file $OUT;
>>>>
>>>>By default, a File::Temp object stringifies to its filename.
>>>>Considering that `` undergoes double-quote interpolation, the
>>>>->filename calls, read_file, and in fact the output file
>>>>altogether can be omitted via
>>>>
>>>>`wc -c < $IN`
>>>
>>> In this example, yes. I don't remember the complete description of
>>> the actual task, but I thought it was some sort of filter where a
>>> large block of HTML (too large for the command line) was being passed
>>> to some filters, which would output a similarly large munged block.
>>
>>It is read into a single Perl scalar either way. If it is too large to
>>keep in memory, neither approach is suitable.
>
> The problem wasn't that it was too large to keep in memory -- it was
> stated that the problem was with the shell command line length for
> `command "the data he was trying to process"`.
That's not happening here because $IN will (in the given context)
stringify to the name of the temporary file the data was written to.

Remark about File::Temp: It struck me that this is really a nice, simple
example of 'an object' in the OOP sense, i.e., not just an overengineered
way to (ab)use Perl hashes in order to emulate C structs: a File::Temp
object is 'the temporary file' as far as Perl code is concerned. It can
be used as a file handle, and then it behaves like a file handle. It can
also be used as a file name, and then it behaves like a file name.
Lastly, when the object is destroyed, the file is automatically
unlinked. There's no need to mess around with 'the mechanics' anywhere
here.
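The three behaviours above can be sketched in a few lines (a minimal
illustration, not part of the original discussion):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;

# Sketch of the three File::Temp behaviours described above.
my $tmp = File::Temp->new;

# 1. Used as a file handle:
print $tmp "some data\n";
$tmp->flush;                   # flush so other readers see the data

# 2. Used as a file name -- the object stringifies to its path:
open my $rd, '<', "$tmp" or die "open $tmp: $!";
my $line = <$rd>;
close $rd;
print "read back via the path: $line";

# 3. When $tmp is destroyed, the file is unlinked automatically.
```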
------------------------------
Date: Fri, 24 Jan 2014 21:05:58 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: [OT] sys call length limitation
Message-Id: <lbukjm$jnf$1@reader1.panix.com>
In article <877g9p89aa.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> In article <87vbx98fg8.fsf@sable.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>>tmcd@panix.com (Tim McDaniel) writes:
>>>> In article <87fvod9y2r.fsf@sable.mobileactivedefense.com>,
>>>> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>>>>>Ben Morrow <ben@morrow.me.uk> writes:
>>>>>> my $IN = File::Temp->new;
>>>>>> my $OUT = File::Temp->new;
>>>>>>
>>>>>> write_file $IN, $data;
>>>>>> system sprintf "wc -c <%s >%s", $IN->filename, $OUT->filename;
>>>>>> return read_file $OUT;
>>>>>
>>>>>By default, a File::Temp object stringifies to its filename.
>>>>>Considering that `` undergoes double-quote interpolation, the
>>>>>->filename calls, read_file, and in fact the output file
>>>>>altogether can be omitted via
>>>>>
>>>>>`wc -c < $IN`
>>>>
>>>> In this example, yes. I don't remember the complete description of
>>>> the actual task, but I thought it was some sort of filter where a
>>>> large block of HTML (too large for the command line) was being passed
>>>> to some filters, which would output a similarly large munged block.
>>>
>>>It is read into a single Perl scalar either way. If it is too large to
>>>keep in memory, neither approach is suitable.
>>
>> The problem wasn't that it was too large to keep in memory -- it was
>> stated that the problem was with the shell command line length for
>> `command "the data he was trying to process"`.
>
>That's not happening here because $IN will (in the given context)
>stringify to the name of the temporary file the data was written to.
While I wasn't objecting to that, I was getting confused at times
about what was too big. You're right.
`...` would work as well (or as badly) as Ben's proposal. A command
like `... <some_input` does not hit the command-line limit either.
Ben's code above and `...` would both hit Perl memory limitations, or
neither would. Since the example he provided was on the order of 200K,
your `...` solution should be fine there on most modern systems.
On the system I'm on right now, the knee of the curve starts bending
way up at 0.5GB:
$ time perl -e 'my $size = (1<<28); print "size $size\n"; my $x = "x" x $size'
size 268435456
real 0m1.379s
user 0m0.111s
sys 0m1.257s
$ time perl -e 'my $size = (1<<29); print "size $size\n"; my $x = "x" x $size'
size 536870912
real 0m5.722s
user 0m0.248s
sys 0m2.444s
$ time perl -e 'my $size = (1<<30); print "size $size\n"; my $x = "x" x $size'
size 1073741824
^C
real 1m43.568s
user 0m0.334s
sys 0m6.197s
but of course Your System's Mileage May Vary.
If one were ever faced with a problem too large for the system
involved, neither `...` nor Ben's solution would work unmodified. At
that point, `...` could not be helped: it would have to be abandoned in
favor of files or temp files written to and/or read from record by
record, or block by block, or something like that.
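The block-by-block fallback might be sketched like this
(filter_by_blocks and the 64 KiB block size are illustrative choices of
mine, not code from this thread; the idea is that no complete copy of
the data ever has to live in a single Perl scalar):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;

# Stream a source handle into a temporary file, run the external
# command on it, and stream the result back out in fixed-size blocks.
sub filter_by_blocks {
    my ($src, $dst, $cmd) = @_;    # open handles plus a shell command
    my $in  = File::Temp->new;
    my $out = File::Temp->new;

    while (read($src, my $block, 65536)) {   # 64 KiB blocks, arbitrary
        print $in $block;
    }
    $in->flush;                    # make the data visible to $cmd

    system("$cmd <$in >$out") == 0 or die "'$cmd' failed: $?";

    while (read($out, my $block, 65536)) {
        print $dst $block;
    }
}
```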
Thank you for bearing with my confusion about what's too big.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Sat, 25 Jan 2014 00:59:23 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: [OT] sys call length limitation
Message-Id: <L%DEu.216375$Yb6.90559@fx29.iad>
On Fri, 24 Jan 2014 13:35:34 +0000, Rainer Weikusat wrote:
> Or consider using Perl. It's not that difficult:
>
> ---------------
> sub filter {
> {
> my $fh = File::Temp->new();
> print $fh (@_);
> `wc -c < $fh`;
> }
> }
> ---------------
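The quoted filter can be written in self-contained form as follows; the
explicit flush and the File::Temp import are my additions, not part of
the code quoted above (print output is buffered by default, so without
a flush the external wc could see a truncated or empty file):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp;

# Variant of the quoted filter with an explicit flush.  $fh
# interpolates into the backticks as the temporary file's name because
# File::Temp objects stringify to their path; the file is unlinked
# when $fh goes out of scope.
sub filter {
    my $fh = File::Temp->new;
    print $fh @_;
    $fh->flush;
    return `wc -c < $fh`;
}
```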
Thanks. That works really well, with that extra pair of braces whose
purpose I didn't understand at first.

Being poked fun at doesn't feel good. But anyway, thanks.
------------------------------
Date: Sat, 25 Jan 2014 01:10:01 GMT
From: * Tong * <sun_tong_001@users.sourceforge.net>
Subject: Re: [OT] sys call length limitation
Message-Id: <J9EEu.308830$Wm1.206009@fx17.iad>
On Fri, 24 Jan 2014 19:20:13 +0000, Tim McDaniel wrote:
>>>>`wc -c < $IN`
>>>
>>> In this example, yes. I don't remember the complete description of
>>> the actual task, but I thought it was some sort of filter where a
>>> large block of HTML (too large for the command line) was being passed
>>> to some filters, which would output a similarly large munged block.
>>
>>It is read into a single Perl scalar either way. If it is too large to
>>keep in memory, neither approach is suitable.
>
> The problem wasn't that it was too large to keep in memory -- it was
> stated that the problem was with the shell command line length for
> `command "the data he was trying to process"`.
>
> Moreover, if it were too large for a Perl scalar, that code could be
> modified to read line-by-line, or block-by-block, or whatever. `...`
> cannot.
>
>>But speculating about the differences between the posted 'contrived
>>example' and the unknown 'real situation'
>
> As I recall, he wrote that he tried `...` and it didn't work because it
> was too large for the command line. It was diagnosed as being too large
> for the shell in particular:
Yes, that's exactly what happened -- I couldn't stuff over 200k into
the `...` call. That's why, on seeing Rainer's new code, the first
thing I tested was to verify that a `...` call can in fact return
>200k strings.
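That check might look like the following sketch (the 300_000 size is an
arbitrary value above the 200k that caused trouble): ARG_MAX limits the
command line and environment handed to the shell, not what a command
writes back, so backticks can return strings far larger than that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Verify that backticks can return a large string: the size limit
# applies only to the command line passed to the shell, not to the
# command's output.
my $out = `perl -e 'print "x" x 300_000'`;
printf "backticks returned %d bytes\n", length $out;
```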
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4124
***************************************