[33052] in Perl-Users-Digest


Perl-Users Digest, Issue: 4328 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Dec 15 03:09:22 2014

Date: Mon, 15 Dec 2014 00:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 15 Dec 2014     Volume: 11 Number: 4328

Today's topics:
    Re: [Q] How can I create a binary-distribution? oblivionsoldier1@gmail.com
    Re: [Q] How can I create a binary-distribution? <rweikusat@mobileactivedefense.com>
    Re: Both substitute and filter <gamo@telecable.es>
    Re: Both substitute and filter <stevem_@nogood.com>
    Re: Both substitute and filter <ben.usenet@bsb.me.uk>
    Re: Both substitute and filter <gamo@telecable.es>
    Re: Both substitute and filter <ben.usenet@bsb.me.uk>
    Re: Both substitute and filter <gamo@telecable.es>
    Re: Both substitute and filter <rweikusat@mobileactivedefense.com>
    Re: Both substitute and filter <rweikusat@mobileactivedefense.com>
    Re: Both substitute and filter <rweikusat@mobileactivedefense.com>
    Re: Both substitute and filter <gamo@telecable.es>
    Re: Both substitute and filter <stevem_@nogood.com>
    Re: Both substitute and filter (Tim McDaniel)
    Re: Both substitute and filter (Tim McDaniel)
    Re: Both substitute and filter (Tim McDaniel)
    Re: Both substitute and filter (Tim McDaniel)
    Re: Both substitute and filter <derykus@gmail.com>
    Re: Latest version of "dedup", and questions on open fi <rweikusat@mobileactivedefense.com>
    Re: Latest version of "dedup", and questions on open fi <whynot@pozharski.name>
    Re: Latest version of "dedup", and questions on open fi <hjp-usenet3@hjp.at>
    Re: Latest version of "dedup", and questions on open fi <whynot@pozharski.name>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 13 Dec 2014 22:30:57 -0800 (PST)
From: oblivionsoldier1@gmail.com
Subject: Re: [Q] How can I create a binary-distribution?
Message-Id: <402589d5-0f59-44c9-b468-af14c273b58c@googlegroups.com>

legend has it Volker is still trying for the life of him to create a binary distribution



------------------------------

Date: Sun, 14 Dec 2014 12:08:44 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: [Q] How can I create a binary-distribution?
Message-Id: <87k31ufn77.fsf@doppelsaurus.mobileactivedefense.com>

oblivionsoldier1@gmail.com writes:
> legend has it Volker is still trying for the life of him to create a
> binary distribution

perl -e 'print("0\t1\n")'


------------------------------

Date: Sat, 13 Dec 2014 19:26:21 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Both substitute and filter
Message-Id: <m6i0ca$aja$1@speranza.aioe.org>

El 13/12/14 a las 00:49, Tim McDaniel escribió:
> I keep asking about idiomatic / readable / concise / cool Perl, and
> here's my latest question.
>
my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");

> @results = ()
> for each member of @sources
>      if s/^TARGET// matched
>          push onto @results the result of the s///
>      else
>          do nothing

I was dreaming about the problem and I doubt anything can beat:

for (@sources){
	next unless ( /^TARGET/ );
	s/^TARGET//;
	push @results, $_;
}
print "@results\n";

~/test$ time perl test.s
23 blarg

real	0m0.004s
user	0m0.002s
sys	0m0.002s


-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sat, 13 Dec 2014 10:35:55 -0800
From: Steve May <stevem_@nogood.com>
Subject: Re: Both substitute and filter
Message-Id: <gG%iw.656281$Hb3.389872@fx03.iad>

On 12/12/2014 03:49 PM, Tim McDaniel wrote:
> I keep asking about idiomatic / readable / concise / cool Perl, and
> here's my latest question.
>
> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
> @results = ()
> for each member of @sources
>      if s/^TARGET// matched
>          push onto @results the result of the s///
>      else
>          do nothing
>
> Is there a better way of doing that than a loop like the above?
> Here are my initial thoughts.
>

Being self-mistaught and simpleminded, I'd probably do something like:

#! /usr/bin/perl

use strict;
use warnings;

my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");

my @results = ();

for ( @sources ) {  /^TARGET(.+)/ and push @results, $1; }

for (@results) { print "$_\n" }

# Output:
23
blarg


hth,

Steve


------------------------------

Date: Sat, 13 Dec 2014 19:22:16 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Both substitute and filter
Message-Id: <87y4qb5p93.fsf@bsb.me.uk>

gamo <gamo@telecable.es> writes:

> El 13/12/14 a las 00:49, Tim McDaniel escribió:
>> I keep asking about idiomatic / readable / concise / cool Perl, and
>> here's my latest question.
>>
> my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>
>> @results = ()
>> for each member of @sources
>>      if s/^TARGET// matched
>>          push onto @results the result of the s///
>>      else
>>          do nothing
>
> I was dreaming about the problem and I doubt anything can beat:
>
> for (@sources){
> 	next unless ( /^TARGET/ );
> 	s/^TARGET//;
> 	push @results, $_;
> }
> print "@results\n";

The simpler and shorter

  @results = map { /^TARGET(.*)$/ } @sources;

is also faster according to Benchmark::cmpthese.
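
The one-liner relies on a match in list context returning its captures on success and an empty list on failure, so map both filters and transforms. A minimal sketch of that behaviour, reusing the sample data from the thread:

```perl
use strict;
use warnings;

my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");

# In list context, m// returns the captured groups when it matches and
# an empty list when it does not, so non-matching elements add nothing.
my @results = map { /^TARGET(.*)$/ } @sources;

print "@results\n";              # the captured suffixes
print scalar(@sources), "\n";    # @sources itself is left untouched
```

Because nothing is substituted in place, @sources keeps all six original strings.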

-- 
Ben.


------------------------------

Date: Sat, 13 Dec 2014 22:49:06 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Both substitute and filter
Message-Id: <m6ic8f$7if$1@speranza.aioe.org>

El 13/12/14 a las 20:22, Ben Bacarisse escribió:
> gamo <gamo@telecable.es> writes:
>
>> El 13/12/14 a las 00:49, Tim McDaniel escribió:
>>> I keep asking about idiomatic / readable / concise / cool Perl, and
>>> here's my latest question.
>>>

my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");

>>
>>> @results = ()
>>> for each member of @sources
>>>       if s/^TARGET// matched
>>>           push onto @results the result of the s///
>>>       else
>>>           do nothing
>>
>> I was dreaming about the problem and I doubt anything can beat:
>>
>> for (@sources){
>> 	next unless ( /^TARGET/ );
>> 	s/^TARGET//;
>> 	push @results, $_;
>> }
>> print "@results\n";
>
> The simpler and shorter
>
>    @results = map { /^TARGET(.*)$/ } @sources;
>
> is also faster according to Benchmark::cmpthese.
>

My test, run many times, doesn't say so:

use Benchmark qw(cmpthese);

cmpthese (-3, {
                    'MIO' => sub {
                        my @results;
                        for (@sources){
                            next unless ( /^TARGET/ );
                            s/^TARGET//;
                            push @results, $_;
                        }
#                      print "@results\n";
                    },
                    'MAP' => sub {
                        @results = map { /^TARGET(.*)$/ } @sources;
                    },
                });


-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sun, 14 Dec 2014 00:52:34 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: Both substitute and filter
Message-Id: <87h9wz59yl.fsf@bsb.me.uk>

gamo <gamo@telecable.es> writes:

> El 13/12/14 a las 20:22, Ben Bacarisse escribió:
>> gamo <gamo@telecable.es> writes:
>>
>>> El 13/12/14 a las 00:49, Tim McDaniel escribió:
>>>> I keep asking about idiomatic / readable / concise / cool Perl, and
>>>> here's my latest question.
>>>>
>
> my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>
>>>
>>>> @results = ()
>>>> for each member of @sources
>>>>       if s/^TARGET// matched
>>>>           push onto @results the result of the s///
>>>>       else
>>>>           do nothing
>>>
>>> I was dreaming about the problem and I doubt anything can beat:
>>>
>>> for (@sources){
>>> 	next unless ( /^TARGET/ );
>>> 	s/^TARGET//;
>>> 	push @results, $_;
>>> }
>>> print "@results\n";
>>
>> The simpler and shorter
>>
>>    @results = map { /^TARGET(.*)$/ } @sources;
>>
>> is also faster according to Benchmark::cmpthese.
>>
>
> My test, run many times, doesn't say so:
>
> use Benchmark qw(cmpthese);
>
> cmpthese (-3, {
>                    'MIO' => sub {
>                        my @results;
>                        for (@sources){
>                            next unless ( /^TARGET/ );
>                            s/^TARGET//;
>                            push @results, $_;
>                        }
> #                      print "@results\n";
>                    },
>                    'MAP' => sub {
>                        @results = map { /^TARGET(.*)$/ } @sources;
>                    },
>                });

Your code modifies @sources so only the first run ever alters the
strings and builds the result.  Even so, the map method can come out
faster if you give the code a name that causes it to be run second!
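
The aliasing effect described here can be seen without Benchmark. The following is only a sketch with a shortened data set; the names @first_pass and @second_pass are made up for illustration:

```perl
use strict;
use warnings;

my @sources = ("hi", "TARGET23", "TARGETblarg");

# First pass: $_ is an alias for each element, so s/// edits @sources.
my @first_pass;
for (@sources) {
    next unless /^TARGET/;
    s/^TARGET//;
    push @first_pass, $_;
}

# Second pass over the same array: the prefixes are already gone,
# so nothing matches and no work is done.
my @second_pass = grep { /^TARGET/ } @sources;

print "first pass:  @first_pass\n";
print "second pass: ", scalar(@second_pass), " matches\n";
print "sources now: @sources\n";
```

In a benchmark every iteration after the first behaves like the second pass, which is why the loop version looked cheaper than it is.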

-- 
Ben.


------------------------------

Date: Sun, 14 Dec 2014 07:44:28 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Both substitute and filter
Message-Id: <m6jbkb$1ft$1@speranza.aioe.org>

El 14/12/14 a las 01:52, Ben Bacarisse escribió:
> Your code modifies @sources so only the first run ever alters the
> strings and builds the result.  Even so, the map method can come out
> faster if you give the code a name that causes it to be run second!

You are right. My code modifies @sources, so the second pass is different.
Under equal conditions, my method is 27% slower. I don't understand why,
because map is a loop and (.*) is another. Maybe it is the overhead of
using more functions.

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sun, 14 Dec 2014 12:40:14 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Both substitute and filter
Message-Id: <87fvciflqp.fsf@doppelsaurus.mobileactivedefense.com>

tmcd@panix.com (Tim McDaniel) writes:
> In article <548BA5F6.9010007@todbe.com>, $Bill <news@todbe.com> wrote:
>>On 12/12/2014 15:49, Tim McDaniel wrote:
>>> I keep asking about idiomatic / readable / concise / cool Perl, and
>>> here's my latest question.
>>>
>>> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>>> @results = ()
>>> for each member of @sources
>>>      if s/^TARGET// matched
>>>          push onto @results the result of the s///
>>>      else
>>>          do nothing

[...]

> I tried this just wondering what it would do:
>
>     @results = map { /^TARGET(.*)$/ } @sources;

[...]

> That's not bad, and I don't think it could be bettered.  Even the loop
> would, I think, be no more clear.

The $ can be omitted.
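
For the single-line strings in the example both patterns give the same result. The sketch below checks where they would differ; the inputs with newlines are an assumption added for illustration and do not come from the thread:

```perl
use strict;
use warnings;

my @sources = ("TARGET23", "TARGETblarg\n", "TARGETtwo\nlines");

# Without /s, "." never matches a newline, so (.*) stops at the first
# one.  Without /m, "$" only matches at the very end of the string or
# just before a final newline.
my @with_anchor    = map { /^TARGET(.*)$/ } @sources;
my @without_anchor = map { /^TARGET(.*)/  } @sources;

print scalar(@with_anchor),    " matches with \$\n";
print scalar(@without_anchor), " matches without \$\n";
```

Only a string with an embedded (non-final) newline separates the two: the anchored pattern rejects it, while the unanchored one captures the text up to the newline.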


------------------------------

Date: Sun, 14 Dec 2014 12:49:27 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Both substitute and filter
Message-Id: <87bnn6flbc.fsf@doppelsaurus.mobileactivedefense.com>

Steve May <stevem_@nogood.com> writes:
> On 12/12/2014 03:49 PM, Tim McDaniel wrote:
>> I keep asking about idiomatic / readable / concise / cool Perl, and
>> here's my latest question.
>>
>> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>> @results = ()
>> for each member of @sources
>>      if s/^TARGET// matched
>>          push onto @results the result of the s///
>>      else
>>          do nothing

[...]

> #! /usr/bin/perl
>
> use strict;
> use warnings;
>
> my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>
> my @results = ();

This is pointless: You're assigning an empty list to an empty array
which results in an empty array.

> for ( @sources ) {  /^TARGET(.+)/ and push @results, $1; }

That's different from both the pseudo-code above and the eventual real
code: This pattern will only match strings starting with TARGET and
followed by at least one other character. A lone TARGET will end up as
empty string in the result with /^TARGET(.*)/.
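
A short sketch of that difference, using a lone "TARGET" as the test case (the three-element list is made up for the illustration):

```perl
use strict;
use warnings;

my @sources = ("TARGET", "TARGET23", "other");

# (.+) needs at least one character after the prefix, so a bare
# "TARGET" is skipped; (.*) accepts it and captures an empty string.
my @plus = map { /^TARGET(.+)/ } @sources;
my @star = map { /^TARGET(.*)/ } @sources;

print scalar(@plus), " result(s) with (.+)\n";
print scalar(@star), " result(s) with (.*)\n";
```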




------------------------------

Date: Sun, 14 Dec 2014 13:02:04 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Both substitute and filter
Message-Id: <877fxufkqb.fsf@doppelsaurus.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> El 14/12/14 a las 01:52, Ben Bacarisse escribió:
>> Your code modifies @sources so only the first run ever alters the
>> strings and builds the result.  Even so, the map method can come out
>> faster if you give the code a name that causes it to be run second!
>
> You are right. My code modifies @sources, so the second pass is different.
> Under equal conditions, my method is 27% slower. I don't understand why,
> because map is a loop and (.*) is another. Maybe it is the overhead of
> using more functions.

Even if (.*) ends up as a loop, that's a loop run by native code inside
the perl regex engine and not a Perl loop. And it surely doesn't have
to, at least not for determining what .* will be as the length of the
string is known in advance. OTOH, for

--------
use Benchmark qw(cmpthese);

my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");


cmpthese (-3, {
	       'MAP' => sub {
		   my @result;
		   @result =  map { /^TARGET(.*)/ } @sources;
	       },

	       'MOP' => sub {
		   my @result;
		   push(@result, /^TARGET(.*)/) for @sources;
	       }});
--------

the 2nd is consistently somewhat faster for me (and they're IMHO as
identical as they can be).


------------------------------

Date: Sun, 14 Dec 2014 14:46:53 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Both substitute and filter
Message-Id: <m6k4cb$o9j$1@speranza.aioe.org>

El 14/12/14 a las 14:02, Rainer Weikusat escribió:
> 	       'MOP' => sub {
> 		   my @result;
> 		   push(@result, /^TARGET(.*)/) for @sources;
> 	       }});
> --------
>
> the 2nd is consistently somewhat faster for me (and they're IMHO as
> identical as they can be).


Thank you!

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sun, 14 Dec 2014 20:19:11 -0800
From: Steve May <stevem_@nogood.com>
Subject: Re: Both substitute and filter
Message-Id: <4jtjw.1006660$412.892157@fx30.iad>

On 12/14/2014 04:49 AM, Rainer Weikusat wrote:
> Steve May <stevem_@nogood.com> writes:
>> On 12/12/2014 03:49 PM, Tim McDaniel wrote:
>>> I keep asking about idiomatic / readable / concise / cool Perl, and
>>> here's my latest question.
>>>
>>> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>>> @results = ()
>>> for each member of @sources
>>>       if s/^TARGET// matched
>>>           push onto @results the result of the s///
>>>       else
>>>           do nothing
>
> [...]
>
>> #! /usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>>
>> my @results = ();
>
> This is pointless: You're assigning an empty list to an empty array
> which results in an empty array.

Pointless from a strictly required standpoint, mostly yes.

From a self-documentation standpoint, for ease of reading, and in some
specific cases, I'm not so sure.

Though not required, I tend to use the form from long habit and to make 
the original declaration stand out when old eyes come back to look at 
years old code. I don't think the idiom is ambiguous or harmful in any 
way and I'm unaware of any functional difference or unintended side effects.

Assuming only stylistic issues, I will probably continue to declare 
arrays that way as long as I'm able to continue working in Perl.


>
>> for ( @sources ) {  /^TARGET(.+)/ and push @results, $1; }
>
> That's different from both the pseudo-code above and the eventual real
> code: This pattern will only match strings starting with TARGET and
> followed by at least one other character. A lone TARGET will end up as
> empty string in the result with /^TARGET(.*)/.
>
>

Ah.... OK, I think. Though it did not appear that's what the OP wanted. 
And I posted my response prior to any other listed responses on my news 
server so, whatever.

But if the OP wanted to capture the empty strings for a count or some 
such, it would seem simple to change (.+) to (.*).

\s





------------------------------

Date: Mon, 15 Dec 2014 05:59:43 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Both substitute and filter
Message-Id: <m6ltcf$7lv$1@reader1.panix.com>

In article <m6fuv4$kj1$1@reader1.panix.com>,
Tim McDaniel <tmcd@panix.com> wrote:
>I keep asking about idiomatic / readable / concise / cool Perl, and
>here's my latest question.
>
>@sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>@results = ()
>for each member of @sources
>    if s/^TARGET// matched
>        push onto @results the result of the s///
>    else
>        do nothing

I just realized that I didn't make it clear that I didn't want to
modify @sources, but given that foreach aliases to each element in
turn and s/// would change it, the pseudocode kind of implied it.
(Tho' my specific solutions shown didn't modify @sources and there'd
be no reason to have @results if @sources could be munged.)  My
apologies.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 15 Dec 2014 06:03:42 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Both substitute and filter
Message-Id: <m6ltju$1ip$1@reader1.panix.com>

In article <8fltlb-ndq.ln1@news.rtij.nl>,
Martijn Lievaart  <m@rtij.nl.invlalid> wrote:
>I might use something like (untested)
>
>@results = map s/^TARGET//, grep /^TARGET, @sources;

I really don't like to use map or grep without curly braces.

I considered that, but I don't like the repetition, but I don't know
that many people know that s//whatever/ means to use the last regexp
pattern, but I don't know that the pattern could be set in the grep
and used in the map or vice versa.  I don't like big buts.

>or even
>
>@results = map substr($_,6), grep /^TARGET, @sources;

(Why do you keep forgetting to close the regexp?)

I also don't like substr in Perl.  (The current code uses it, in fact,
and I'd like to lobby against it.)

>I would write your solution 3) as
>
>@results = map {/^TARGET(.*)/ ? $1 : ()}  @sources;

As I noted later, if the pattern matches, it'll return $1, and if it
doesn't match, it returns (), so the conditional is unneeded and it
can be just

>@results = map { /^TARGET(.*)/ }  @sources;

Unless you think that's too obscure and the ?$1:() is needed for
clarity.
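
A side note on the quoted grep-then-map variant: s/// returns the number of substitutions rather than the changed string, and it edits the elements of @sources through the aliases that grep and map pass along. On Perl 5.14 or newer the /r modifier returns a modified copy instead. A sketch under that version assumption:

```perl
use 5.014;    # the /r modifier needs Perl 5.14 or newer
use strict;
use warnings;

my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");

# grep selects the matching elements; s///r returns a changed copy of
# each one and leaves the original element alone.
my @results = map { s/^TARGET//r } grep { /^TARGET/ } @sources;

print "@results\n";
print "$sources[2]\n";    # still carries its TARGET prefix
```

This keeps the repetition of the pattern that was objected to above, so it is an alternative spelling rather than an improvement on the plain map form.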

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 15 Dec 2014 06:06:06 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Both substitute and filter
Message-Id: <m6ltoe$geu$1@reader1.panix.com>

In article <m6ha5t$1u1h$1@news.ntua.gr>,
George Mpouras  <gravitalsun@hotmail.foo> wrote:
>On 13/12/2014 01:49, Tim McDaniel wrote:
>> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>> @results = ()
>> for each member of @sources
>>      if s/^TARGET// matched
>>          push onto @results the result of the s///
>>      else
>>          do nothing
>
>
>use strict;
>use warnings;
>
>/^TARGET(.*)$(?{print "$^N\n"})/ for  qw/hi there TARGET23 hello 
>TARGETblarg world/

I do believe we have the winner for worst solution.  Can you work in
an eval somewhere for bonus points?

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 15 Dec 2014 06:09:46 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Both substitute and filter
Message-Id: <m6ltva$qii$1@reader1.panix.com>

In article <4jtjw.1006660$412.892157@fx30.iad>,
Steve May  <stevem_@nogood.com> wrote:
>On 12/14/2014 04:49 AM, Rainer Weikusat wrote:
>> Steve May <stevem_@nogood.com> writes:
>>> On 12/12/2014 03:49 PM, Tim McDaniel wrote:
>>>> I keep asking about idiomatic / readable / concise / cool Perl, and
>>>> here's my latest question.
>>>>
>>>> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>>>> @results = ()
>>>> for each member of @sources
>>>>       if s/^TARGET// matched
>>>>           push onto @results the result of the s///
>>>>       else
>>>>           do nothing
>>
>> [...]
>>
>>> #! /usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>>
>>> my @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
>>>
>>> my @results = ();
>>
>> This is pointless: You're assigning an empty list to an empty array
>> which results in an empty array.
>
>Pointless from a strictly required standpoint, mostly yes.
>
>From a self-documentation standpoint, for ease of reading, and in some
>specific cases, I'm not so sure.
>
>Though not required, I tend to use the form from long habit and to
>make the original declaration stand out when old eyes come back to
>look at years old code. I don't think the idiom is ambiguous or
>harmful in any way and I'm unaware of any functional difference or
>unintended side effects.
>
>Assuming only stylistic issues, I will probably continue to declare
>arrays that way as long as I'm able to continue working in Perl.

I entirely agree.  I use it to say that I have considered the proper
initial value and that that is what I want it to be.

>>> for ( @sources ) {  /^TARGET(.+)/ and push @results, $1; }
>>
>> That's different from both the pseudo-code above and the eventual real
>> code: This pattern will only match strings starting with TARGET and
>> followed by at least one other character. A lone TARGET will end up as
>> empty string in the result with /^TARGET(.*)/.
>
>Ah.... OK, I think. Though it did not appear that's what the OP
>wanted.

Yeah, it is what I wanted: s/^TARGET// would have that effect.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Sun, 14 Dec 2014 22:28:07 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Both substitute and filter
Message-Id: <928251a4-ed41-4a8e-aaab-e089868810d1@googlegroups.com>

On Sunday, December 14, 2014 10:06:10 PM UTC-8, Tim McDaniel wrote:
> In article <m6ha5t$1u1h$1@news.ntua.gr>,
> George Mpouras  <gravitalsun@hotmail.foo> wrote:
> >On 13/12/2014 01:49, Tim McDaniel wrote:
> >> @sources = ("hi", "there", "TARGET23", "hello", "TARGETblarg", "world");
> >> @results = ()
> >> for each member of @sources
> >>      if s/^TARGET// matched
> >>          push onto @results the result of the s///
> >>      else
> >>          do nothing
> >
> >
> >use strict;
> >use warnings;
> >
> >/^TARGET(.*)$(?{print "$^N\n"})/ for  qw/hi there TARGET23 hello 
> >TARGETblarg world/
> 
> I do believe we have the winner for worst solution.  Can you work in
> an eval somewhere for bonus points?
> 

s/worst/ugly, obscure, noisy, indirect/.

However, embedded code assertions are no longer deprecated in the latest
Perl versions IIRC.  So tis the season of TIMTOWTDI :)

-- 
Charles DeRykus   



------------------------------

Date: Sun, 14 Dec 2014 12:07:26 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Latest version of "dedup", and questions on open files, unlink, and rm.
Message-Id: <87oar6fn9d.fsf@doppelsaurus.mobileactivedefense.com>

Eric Pozharski <whynot@pozharski.name> writes:
> with <slrnm8p5cf.j3o.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:

[...]

> Probably, ':unix' layer is to blame.  Is it safe to assume (without
> reading sources) that open() and sysopen() are no different anymore
> (both operate on IO::Handle after all),

Even without 'reading sources', one should be able to discard this as
'obviously baseless rumour' as the functionality provided by both is
substantially different. Without spending too much time on 'reading
sources', one can see that pp_open and pp_sysopen are different
functions doing different things. While both cases used to be handled in
do_openn (in 5.10.1), the current development version also splits the
backends (Perl_do_open_raw vs Perl_do_open6).

> and <$handle>/print() are just sysread()/syswrite() without fine
> control?

This depends on your definition of 'fine control': When ignoring the
differences, print is surely the same as syswrite (and, for that matter,
an elephant is really the same as a mouse -- quadruped mammals, after
all, why bother with all these cumbersome details!). The main point of
syswrite and sysread is that they provide a real-time I/O facility,
something one usually wants for programs interacting with other programs
(in other ways than 'bulk data' pipeline processing).
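
One visible part of that difference is buffering: print goes through the PerlIO buffer, while syswrite hands the data to the operating system at once. The sketch below uses a File::Temp scratch file (an assumption of this example, not something from the thread) and watches the file size from outside the handle:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

my ($fh, $path) = tempfile(UNLINK => 1);

# print only fills the PerlIO buffer; a small write has normally not
# reached the file yet.
print {$fh} "abc";
my $after_print = (stat $path)[7];

# Flushing pushes the buffered bytes out.
$fh->flush;
my $after_flush = (stat $path)[7];

# syswrite bypasses the buffer and calls write(2) directly.
syswrite($fh, "de");
my $after_syswrite = (stat $path)[7];

print "size after print:    $after_print\n";
print "size after flush:    $after_flush\n";
print "size after syswrite: $after_syswrite\n";
```

The handle is flushed before syswrite is used because mixing buffered and unbuffered writes on one handle without a flush can reorder the data.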



------------------------------

Date: Sat, 13 Dec 2014 16:24:13 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Latest version of "dedup", and questions on open files, unlink, and rm.
Message-Id: <slrnm8oj0d.mlu.whynot@orphan.zombinet>

with <slrnm8mv00.2ff.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2014-12-11 22:47, Robbie Hatley <see.my.sig@for.my.address> wrote:

*SKIP*
>> Or can there be cases when it's not?
> Even when a file is opened in append mode, it is initially positioned
> at the start. Only when you write to it, the file pointer jumps to the
> end.

My perl disagrees:

	% perl -wE ' 
	  say +(stat "foo.Wyoxu5.dump")[7]; 
	  open $h, ">>", "foo.Wyoxu5.dump"; 
	  say tell $h' 
	4
	4

But I've found this (ISO distributes it as latest draft -- is it C90?
section 7.21.5.3(6) (The 'fopen' function) anyway):

	Opening a file with append mode ('a' as the first character in
	the mode argument) causes all subsequent writes to the file to
	be forced to the then current end-of-file, regardless of
	intervening calls to the fseek function. In some
	implementations, opening a binary file with append mode ('b' as
	the second or third character in the above list of mode argument
	values) may initially position the file position indicator for
	the stream beyond the last data written, because of null
	character padding.

POSIX.pm doesn't help:

	% perl -MPOSIX -wE ' 
	  say +(stat "foo.Wyoxu5.dump")[7];
	  $h = fopen "foo.Wyoxu5.dump", "ab"'
	Name "main::h" used only once: possible typo at -e line 3.
	4
	Use method IO::File::open() instead at -e line 3

sysopen() is about open(2), not fopen(3), but anyway:

	% perl -MFcntl -wE '
	  say +(stat "foo.Wyoxu5.dump")[7];
	  sysopen $h, "foo.Wyoxu5.dump", O_APPEND|O_WRONLY|O_BINARY;
	  say join " .. ", PerlIO::get_layers $h;
	  say tell $h' 
	4
	unix .. perlio
	4

and ":raw" isn't binary either:

	% perl -wE '
	  say +(stat "foo.Wyoxu5.dump")[7];
	  open $h, ">>:raw", "foo.Wyoxu5.dump";
	  say join " .. ", PerlIO::get_layers $h; 
	  say tell $h'
	4 
	unix .. perlio
	4


What other approach is there to get the "ab" mode of fopen(3)?

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sat, 13 Dec 2014 20:37:48 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Latest version of "dedup", and questions on open files, unlink, and rm.
Message-Id: <slrnm8p5cf.j3o.hjp-usenet3@hrunkner.hjp.at>

On 2014-12-13 14:24, Eric Pozharski <whynot@pozharski.name> wrote:
> with <slrnm8mv00.2ff.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
>> On 2014-12-11 22:47, Robbie Hatley <see.my.sig@for.my.address> wrote:
>
> *SKIP*
>>> Or can there be cases when it's not?
>> Even when a file is opened in append mode, it is initially positioned
>> at the start. Only when you write to it, the file pointer jumps to the
>> end.
>
> My perl disagrees:
>
> 	% perl -wE ' 
> 	  say +(stat "foo.Wyoxu5.dump")[7]; 
> 	  open $h, ">>", "foo.Wyoxu5.dump"; 
> 	  say tell $h' 
> 	4
> 	4

You are right. Perl apparently seeks to the end of the file if it is
opened in ">>" mode:

open("foo.Wyoxu5.dump", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 3
_llseek(3, 0, [4], SEEK_END)            = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0xbf879938) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(3, 0, [4], SEEK_CUR)            = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=4, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
write(1, "4\n", 24
)                      = 2

The open system call itself does NOT seek:

------------------------------------------------------------------------
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int fd = open(argv[1], O_WRONLY | O_APPEND);
    off_t pos;

    pos = lseek(fd, 0, SEEK_CUR);
    printf("initial: %ld\n", (long)pos);

    write(fd, "a", 1);

    pos = lseek(fd, 0, SEEK_CUR);
    printf("after writing one byte: %ld\n", (long)pos);

    pos = lseek(fd, 0, SEEK_SET);
    printf("after rewind: %ld\n", (long)pos);

    char buf[1];
    int rc = read(fd, buf, 1);

    pos = lseek(fd, 0, SEEK_CUR);
    printf("after reading %d bytes: %ld\n", rc, (long)pos);

    write(fd, "b", 1);

    pos = lseek(fd, 0, SEEK_CUR);
    printf("after writing one byte: %ld\n", (long)pos);

}
------------------------------------------------------------------------
initial: 0
after writing one byte: 5
after rewind: 0
after reading -1 bytes: 0
after writing one byte: 6
------------------------------------------------------------------------
and after changing O_WRONLY to O_RDWR:
------------------------------------------------------------------------
initial: 0
after writing one byte: 7
after rewind: 0
after reading 1 bytes: 1
after writing one byte: 8
------------------------------------------------------------------------


> sysopen() is about open(2), not fopen(3), but anyway:
>
> 	% perl -MFcntl -wE '
> 	  say +(stat "foo.Wyoxu5.dump")[7];
> 	  sysopen $h, "foo.Wyoxu5.dump", O_APPEND|O_WRONLY|O_BINARY;
> 	  say join " .. ", PerlIO::get_layers $h;
> 	  say tell $h' 
> 	4
> 	unix .. perlio
> 	4

Interesting. Even sysopen does the extra lseek.

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 14 Dec 2014 10:24:00 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Latest version of "dedup", and questions on open files, unlink, and rm.
Message-Id: <slrnm8qi90.6m8.whynot@orphan.zombinet>

with <slrnm8p5cf.j3o.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2014-12-13 14:24, Eric Pozharski <whynot@pozharski.name> wrote:

*SKIP*
>> sysopen() is about open(2), not fopen(3), but anyway:
>>
>> 	% perl -MFcntl -wE '
>> 	  say +(stat "foo.Wyoxu5.dump")[7];
>> 	  sysopen $h, "foo.Wyoxu5.dump", O_APPEND|O_WRONLY|O_BINARY;
>> 	  say join " .. ", PerlIO::get_layers $h;
>> 	  say tell $h' 
>> 	4
>> 	unix .. perlio
>> 	4
> Interesting. Even sysopen does the extra lseek.

Probably, ':unix' layer is to blame.  Is it safe to assume (without
reading sources) that open() and sysopen() are no different anymore
(both operate on IO::Handle after all), and <$handle>/print() are just
sysread()/syswrite() without fine control?  Ought to remember that.

And POSIX.pm doesn't know O_BINARY:

	% perl -MDevel::Peek -MPOSIX -wE '
	  say +(stat "foo.Wyoxu5.dump")[7];
	  $h = POSIX::open "foo.Wyoxu5.dump", POSIX::O_APPEND|POSIX::O_WRONLY;
	  Dump $h;
	  say POSIX::lseek( $h, 0, POSIX::SEEK_CUR )'
	4
	SV = IV(0x8d92030) at 0x8d92034
	  REFCNT = 1
	  FLAGS = (IOK,pIOK)
	  IV = 3
	0
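
The two observations in this subthread can be put side by side in one script. This is only a sketch: the scratch file comes from File::Temp (an assumption of the example), and the positions match what was reported above on the posters' systems:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Create a four-byte file to append to.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "data";
close $tmp or die "close: $!";

# Perl's open in ">>" mode: the reported position is already the end.
open(my $fh, '>>', $path) or die "open: $!";
my $perl_pos = tell $fh;
close $fh or die "close: $!";

# POSIX::open is a thin wrapper around open(2): no seek happens until
# something is written.
my $fd = POSIX::open($path, POSIX::O_WRONLY() | POSIX::O_APPEND());
defined $fd or die "POSIX::open: $!";
my $raw_pos = POSIX::lseek($fd, 0, POSIX::SEEK_CUR());
POSIX::close($fd);

print "open '>>':   $perl_pos\n";
print "POSIX::open: $raw_pos\n";
```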

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4328
***************************************
