[31745] in Perl-Users-Digest


Perl-Users Digest, Issue: 3008 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 28 11:09:42 2010

Date: Mon, 28 Jun 2010 08:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 28 Jun 2010     Volume: 11 Number: 3008

Today's topics:
    Re: Accessing Web Site Files Questions <tadmc@seesig.invalid>
    Re: bulk flush input <nospam-abuse@ilyaz.org>
    Re: bulk flush input <nospam-abuse@ilyaz.org>
    Re: bulk flush input <nospam-abuse@ilyaz.org>
    Re: bulk flush input <jak@isp2dial.com>
    Re: bulk flush input <derykus@gmail.com>
    Re: bulk flush input <peter@makholm.net>
    Re: Proposing a new module: Parallel::Loops <4ux6as402@sneakemail.com>
    Re: Proposing a new module: Parallel::Loops <ben@morrow.me.uk>
    Re: Proposing a new module: Parallel::Loops <willem@turtle.stack.nl>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 27 Jun 2010 16:33:46 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Accessing Web Site Files Questions
Message-Id: <slrni2fgl9.trk.tadmc@tadbox.sbcglobal.net>

E.D.G. <edgrsprj@ix.netcom.com> wrote:

> This particular program application is quite important in my opinion. 


Every program is important to the programmer who wrote it,
else it would never have been written.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.


------------------------------

Date: Sun, 27 Jun 2010 23:45:30 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: bulk flush input
Message-Id: <slrni2fokq.cjn.nospam-abuse@powdermilk.math.berkeley.edu>

On 2010-06-26, C.DeRykus <derykus@gmail.com> wrote:
> On Jun 25, 11:16 pm, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
>> On 2010-06-26, C.DeRykus <dery...@gmail.com> wrote:
>>
>> >       () = <>;
>>
>> Try to do it with a terabyte file...
>>
>
> Hm, sounds like I need to look more closely...
>
> So a humongous temp array gets built with only
> the resulting  assignment being optimized away...?

Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
and would somehow advise pp_readline() that its output is going to be
ignored.  However, given the frequency of this construct, I doubt this
was ever done.

> perl -MO=Concise -e "()=<>"
> 8  <@> leave[1 ref] vKP/REFC ->(end)
> 1     <0> enter ->2
> 2     <;> nextstate(main 1 -e:1) v:{ ->3
> 7     <2> aassign[t3] vKS ->8
> -        <1> ex-list lK ->6
> 3           <0> pushmark s ->4
> 5           <1> readline[t2] lK/1 ->6
> 4              <#> gv[*ARGV] s ->5
> -        <1> ex-list lK ->7
> 6           <0> pushmark s ->7
> -           <0> stub lPRM* ->-

The only way I know to advise an OP is via flags.  So one should
compare flags on the `readline' OP with those on "usual" list contents
readline.  If they are identical, there is little chance that this
construct is memory-optimized.  (But they may differ by "other
reasons" as well...)
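
The flag comparison suggested above can be sketched from a script with
B::Concise instead of the command line. This is only a sketch under
assumptions: the helper name readline_ops is hypothetical, STDIN stands
in for the ARGV handle, and the flag strings printed depend on the Perl
version, so no output is shown here.

```perl
use strict;
use warnings;
use B::Concise ();

# Return the B::Concise lines that mention the readline op for a code ref.
# The code ref is only compiled and inspected, never called.
sub readline_ops {
    my ($code) = @_;
    my $buf = '';
    B::Concise::walk_output(\$buf);                  # render into $buf
    my $walker = B::Concise::compile('-basic', $code);
    $walker->();
    return grep { /\breadline\b/ } split /\n/, $buf;
}

my @discard = readline_ops(sub { () = <STDIN> });          # result thrown away
my @keep    = readline_ops(sub { my @lines = <STDIN> });   # "usual" list read

print "discard: $_\n" for @discard;
print "keep:    $_\n" for @keep;
```

If the flags printed after "readline[...]" are the same on both lines,
that is consistent with no special handling of the discarding form.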

Yours,
Ilya


------------------------------

Date: Sun, 27 Jun 2010 23:53:10 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: bulk flush input
Message-Id: <slrni2fp36.cjn.nospam-abuse@powdermilk.math.berkeley.edu>

On 2010-06-26, Willem <willem@turtle.stack.nl> wrote:
> I don't know if <> is smart enough to recognize void context though,
> can probably be tested with a large file and a memory checker tool.

I use void-context-<> all the time (to skip one line); but only with
defined $/.

I vaguely remember that about 10 years ago, I put some code to
optimize behaviour of pp_readline() in void context (or at least, had
a WISH to do so; no way to distinguish now, sigh).  And, definitely,
about the same time I had the same problem as the OP: avoiding SIGPIPE
on the OTHER side of the pipe.

Putting 2 and 2 together, I MIGHT have put there
optimization-of-<>-with-undefined-$/-in-void-context.  But no, I have
no memory of actually doing it.  And I have strong doubts about
somebody else doing it as well...

So I think it is not wise to expect that the core of Perl would be
able to help with this problem.  I would just do $/ = \(1<<20), and do
a loop.
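
A minimal sketch of such a loop follows. perlvar documents that
record-sized reads need $/ set to a reference to an integer; a plain
number would be taken as a separator string. The sub name drain_input
and the in-memory test data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Read and discard everything left on a filehandle in fixed-size chunks.
# Returns the number of bytes thrown away.
sub drain_input {
    my ($fh, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;
    local $/ = \$chunk_size;    # reference to an integer: record-sized reads
    my $discarded = 0;
    while (defined(my $chunk = <$fh>)) {
        $discarded += length $chunk;    # only one chunk is held at a time
    }
    return $discarded;
}

# Demonstration on an in-memory filehandle, so no real pipe is needed.
my $payload = "x" x 2_500_000;
open my $fh, '<', \$payload or die "Cannot open in-memory handle: $!";
my $bytes = drain_input($fh, 1 << 20);
print "discarded $bytes bytes\n";    # discarded 2500000 bytes
```

Memory use stays bounded by the chunk size, however large the input is.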

Hope this helps,
Ilya


------------------------------

Date: Sun, 27 Jun 2010 23:56:42 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: bulk flush input
Message-Id: <slrni2fp9q.cjn.nospam-abuse@powdermilk.math.berkeley.edu>

On 2010-06-27, Willem <willem@turtle.stack.nl> wrote:
> ) void context always translates to scalar context for built-in functions.
> )
> ) void context usually translates to scalar context for user-defined functions.

> What about map ?

> AFAIK, when you call <map> in void context, it turns into a <for> internally.

Irrelevant: SIDE EFFECTS of map in scalar and list context are the same.

Hope this helps,
Ilya


------------------------------

Date: Mon, 28 Jun 2010 00:41:24 +0000
From: John Kelly <jak@isp2dial.com>
Subject: Re: bulk flush input
Message-Id: <6irf26p7nj6u5jup6ro2au4escmrc7tluj@4ax.com>

On Sun, 27 Jun 2010 23:53:10 +0000 (UTC), Ilya Zakharevich
<nospam-abuse@ilyaz.org> wrote:

>I had the same problem as the OP: avoiding SIGPIPE
>on the OTHER side of the pipe.

>So I think it is not wise to expect that the core of Perl would be
>able to help with this problem.  I would just do $/ = \(1<<20), and do
>a loop.

I went with the loop.

(1<<20) is 1 meg of memory.  (1<<15) may run nearly as fast on a large
file (untested).



-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
 


------------------------------

Date: Sun, 27 Jun 2010 22:11:34 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: bulk flush input
Message-Id: <05d74c57-2c75-4cb1-8f39-a885a2382ca8@m35g2000prn.googlegroups.com>

On Jun 27, 4:45 pm, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> On 2010-06-26, C.DeRykus <dery...@gmail.com> wrote:
>
> > On Jun 25, 11:16 pm, Ilya Zakharevich <nospam-ab...@ilyaz.org> wrote:
> >> On 2010-06-26, C.DeRykus <dery...@gmail.com> wrote:
>
> >> > () = <>;
>
> >> Try to do it with a terabyte file...
>
> > Hm, sounds like I need to look more closely...
>
> > So a humongous temp array gets built with only
> > the resulting assignment being optimized away...?
>
> Hmm, IN PRINCIPLE, one could have coded recognition of this construct,
> and would somehow advise pp_readline() that its output is going to be
> ignored.  However, given the frequency of this construct, I doubt this
> was ever done.
>
> > perl -MO=Concise -e "()=<>"
> > 8  <@> leave[1 ref] vKP/REFC ->(end)
> > 1     <0> enter ->2
> > 2     <;> nextstate(main 1 -e:1) v:{ ->3
> > 7     <2> aassign[t3] vKS ->8
> > -        <1> ex-list lK ->6
> > 3           <0> pushmark s ->4
> > 5           <1> readline[t2] lK/1 ->6
> > 4              <#> gv[*ARGV] s ->5
> > -        <1> ex-list lK ->7
> > 6           <0> pushmark s ->7
> > -           <0> stub lPRM* ->-
>
> The only way I know to advise an OP is via flags.  So one should
> compare flags on the `readline' OP with those on "usual" list contents
> readline.  If they are identical, there is little chance that this
> construct is memory-optimized.  (But they may differ by "other
> reasons" as well...)
>

Thanks for all the explanations Ilya.  As you mention,
this is infrequent. Probably a ton of work to fix too.


--
Charles DeRykus


------------------------------

Date: Mon, 28 Jun 2010 09:06:35 +0200
From: Peter Makholm <peter@makholm.net>
Subject: Re: bulk flush input
Message-Id: <87sk47pres.fsf@vps1.hacking.dk>

John Kelly <jak@isp2dial.com> writes:

> This code reads STDIN and remembers the first non-empty line.  That's
> all it cares about.
>
> But it also keeps reading till EOF, acting like the "cat" utility, to
> flush the extra input and avoid broken pipe errors.
>
> But reading line by line, just to throw away the unwanted garbage, is
> inefficient.  I would like to jump out of the loop and "bulk flush" the
> remaining input stream.

If your input stream is a terminal on a POSIX system you can use the
tcflush() function.

    tcflush(0, TCIFLUSH)
        or warn "Couldn't flush stdin: $!";

This of course only works if stdin is a terminal and not a pipe from
some other program. It might not work on non-POSIX systems, and it
might not fit your specific need.

This works for me:

#!/usr/bin/perl

use strict;
use warnings;

use POSIX;

my $data = '';

sleep 5;

while (<>) {
    chomp;
    /^\s*$/ and next;
    $data = $_;
    print "data=\"$data\"\n";
    last;
}

tcflush(0, TCIFLUSH)
    or warn "Couldn't flush stdin: $!";

<> or die "1 EOF\n";
<> or die "2 EOF\n";
<> or die "3 EOF\n";
<> or die "4 EOF\n";
<> or die "5 EOF\n";
<> or die "6 EOF\n";
<> or die "7 EOF\n";

__END__
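
Since tcflush() only applies to terminals, one way to cover the pipe
case as well is to test the handle first and fall back to reading in
chunks. A hedged sketch: the sub name discard_stdin and the 64 KB chunk
size are assumptions, and the two branches are not equivalent, because
the terminal branch drops pending input immediately while the fallback
reads until EOF.

```perl
use strict;
use warnings;
use POSIX qw(tcflush TCIFLUSH);

# Discard pending input on STDIN and report which strategy was used.
sub discard_stdin {
    if (-t STDIN) {
        # Terminal: drop whatever has been typed but not yet read.
        tcflush(fileno(STDIN), TCIFLUSH)
            or warn "Couldn't flush stdin: $!";
        return 'tcflush';
    }
    # Pipe or file: read to EOF in 64 KB records and throw them away.
    local $/ = \65536;
    while (defined(my $chunk = <STDIN>)) { }
    return 'read';
}

print "strategy: ", discard_stdin(), "\n";
```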


------------------------------

Date: Mon, 28 Jun 2010 01:05:07 -0700 (PDT)
From: Peter Valdemar Mørch <4ux6as402@sneakemail.com>
Subject: Re: Proposing a new module: Parallel::Loops
Message-Id: <4b73d195-1bf2-49e1-b4b7-597ab9cfd38a@u26g2000yqu.googlegroups.com>

On Jun 26, 10:52 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> I was still looking at the question 'why aren't you simply using
> forks?'. forks handles all this for you.

Well, because I don't want the forks API. I want the foreach
syntax. :-) The main reason is that it is so much easier to write and
read later on.

I could've implemented it using forks, but I didn't. Forks _is_
mentioned in the "SEE ALSO" section so users have a chance to explore
alternatives.

> When you say 'global' you mean 'shared in all P::L instances', right?

Yes.

> Is this a problem?

A little bit. To me, that speaks in favor of

my %output;
$pl->share(\%output)

over

my %output : Shared;

(apart from the fact that $pl->share() seems much simpler to
understand and implement)

> (You don't even need to do that if you just weaken the refs in your
> master list. Perl will replace any that go out of scope with undef.)

Ah, good point.
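
A small sketch of that behaviour, using Scalar::Util's weaken. The
@master_list registry is hypothetical and only stands in for whatever
list the module keeps internally.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @master_list;    # hypothetical registry of shared variables

{
    my %output = (answer => 42);
    push @master_list, \%output;
    weaken($master_list[-1]);    # the registry no longer keeps %output alive
    print "inside scope:  ",
        (defined $master_list[0] ? "defined" : "undef"), "\n";
}

# %output has gone out of scope, so the weakened slot is now undef.
print "outside scope: ",
    (defined $master_list[0] ? "defined" : "undef"), "\n";
```

The array slot itself stays in place; only its value becomes undef, so
the registry would still need to skip or prune such entries.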

> I don't know how P::L deals with copying the results back. Presumably
> you have no idea whether a variable has been modified in the sub-process
> or not? What do you do if two sub-processes change the same shared var
> in different ways?

I've mentioned in the pod that only setting of hash keys and pushing
to arrays is supported in the child. I'll append to that that setting
the same key from different iterations preserves a random one of them.

> FWIW, I would cast this API rather differently.

Yeah, I'm beginning to gather that! :-) Fine, you won't be one of
P::L's users I take it...

> You don't seem to be
> trying to emulate the forks API of 'you can do anything you like', but
> instead restricting yourself to iterating over a list.

Exactly.

> In that case, why not have the API like
>
>     my $PL = Parallel::Loops->new(sub { dosomething($_) });
>     my %results = $PL->foreach(0..9);

I guess if I change that to:

  my $PL = Parallel::Loops->new( 4 );
  my %results = $PL->foreach( [0..9], sub {
      ( $_ => dosomething($_) )
  });

We could be in business. I'm presuming I can use wantarray() in the
foreach method to test if the caller is going to use the return value
and only transfer the return value from the child if it is going to be
used. It kind of breaks the analogy with foreach but doesn't hurt
otherwise, so why not.
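
For reference, a sketch of how wantarray() tells the three calling
contexts apart. The sub name run_loop is hypothetical and does no
parallel work; it only records the context it was called in.

```perl
use strict;
use warnings;

my @seen;

# Record the calling context: true => list, defined but false => scalar,
# undef => void (the caller ignores the return value).
sub run_loop {
    my $context = wantarray         ? 'list'
                : defined wantarray ? 'scalar'
                :                     'void';
    push @seen, $context;
    return;
}

run_loop();                  # void: results need not be transferred
my $count   = run_loop();    # scalar
my %results = run_loop();    # list

print "@seen\n";    # void scalar list
```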

> Well, if the user wrote
>
>     my %results;
>     {
>         my $pl = Parallel::Loops->new;
>         $pl->share(\%results);
>         $pl->async(sub { $results{$_} = foobar($_) })
>             for 0..4;
>     }
>     useResults \%results;
>
> then a call to ->joinAll in DESTROY would ensure it was called. Since
> variables (particularly those containing potentially-expensive objects,
> like $pl) should be minimally-scoped, this would be the correct way to
> write that code.

I don't understand how that can be guaranteed. perldoc perltoot says:

> Perl's notion of the right time to call a destructor is not well-defined
> currently, which is why your destructors should not rely on when they
> are called.

Given that, how can I be sure that DESTROY has been called at the time
of the useResults call?

Peter


------------------------------

Date: Mon, 28 Jun 2010 14:29:59 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Proposing a new module: Parallel::Loops
Message-Id: <nnkmf7-quj2.ln1@osiris.mauzo.dyndns.org>


Quoth Peter Valdemar Mørch <4ux6as402@sneakemail.com>:
> On Jun 26, 10:52 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > I was still looking at the question 'why aren't you simply using
> > forks?'. forks handles all this for you.
> 
> Well, because I don't want the forks API. I want the foreach
> syntax. :-) The main reason is that it is so much easier to write and
> read later on.

OK.

> > You don't seem to be
> > trying to emulate the forks API of 'you can do anything you like', but
> > instead restricting yourself to iterating over a list.
> 
> Exactly.
> 
> > In that case, why not have the API like
> >
> >     my $PL = Parallel::Loops->new(sub { dosomething($_) });
> >     my %results = $PL->foreach(0..9);
> 
> I guess if I change that to:
> 
>   my $PL = Parallel::Loops->new( 4 );
>   my %results = $PL->foreach( [0..9], sub {
>       ( $_ => dosomething($_) )
>   });
> 
> We could be in business. I'm presuming I can use wantarray() in the
> foreach method to test if the caller is going to use the return value
> and only transfer the return value from the child if it is going to be
> used. It kind of breaks the analogy with foreach but doesn't hurt
> otherwise, so why not.

It's now more analogous to map than foreach, but I don't see that as a
problem.

> 
> > Well, if the user wrote
> >
> >     my %results;
> >     {
> >         my $pl = Parallel::Loops->new;
> >         $pl->share(\%results);
> >         $pl->async(sub { $results{$_} = foobar($_) })
> >             for 0..4;
> >     }
> >     useResults \%results;
> >
> > then a call to ->joinAll in DESTROY would ensure it was called. Since
> > > variables (particularly those containing potentially-expensive objects,
> > like $pl) should be minimally-scoped, this would be the correct way to
> > write that code.
> 
> I don't understand how that can be guaranteed. perldoc perltoot says:
> 
> > Perl's notion of the right time to call a destructor is not well-defined
> > currently, which is why your destructors should not rely on when they
> > are called.
> 
> Given that, how can I be sure that DESTROY has been called at the time
> of the useResults call?

Hmm, I'd forgotten that was there. It's complete nonsense: in Perl 5,
destructors are always called promptly, and there are *lots* of modules
relying on that fact so it isn't going to go away. (Perl 6 is a
different matter, of course.)
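
A minimal sketch of that behaviour under Perl 5's reference counting.
The Guard class is hypothetical and stands in for an object like $pl
whose DESTROY would call ->joinAll.

```perl
use strict;
use warnings;

package Guard;

sub new {
    my ($class, $log) = @_;
    return bless { log => $log }, $class;
}

# Runs as soon as the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    push @{ $self->{log} }, 'destroyed';
}

package main;

my @events;
{
    my $guard = Guard->new(\@events);
    push @events, 'in scope';
}    # $guard's reference count drops to zero here
push @events, 'after scope';

print join(', ', @events), "\n";    # in scope, destroyed, after scope
```

This holds as long as nothing else keeps a reference to the object, for
example a closure or a reference cycle.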

Ben



------------------------------

Date: Mon, 28 Jun 2010 15:07:10 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: Proposing a new module: Parallel::Loops
Message-Id: <slrni2heku.1e0t.willem@turtle.stack.nl>

Peter Valdemar Mørch wrote:
)> > my %output;
)> > $pl->tieOutput( \%output );
)>
)> Why are you using tie here?
)
) Hmm... I thought the idea would be more obvious than it apparently
) is...
)
) Outside the $pl->foreach() loop, we're running in the parent process.
) Inside the $pl->foreach() loop, we're running in a child process.
) $pl->tieOutput is actually the raison d'etre of Parallel::Loops. When the
) child process has a result, it stores it in %output (which is tied
) with Tie::Hash behind the scenes in the child process).
)
) Behind the scenes, when the child process exits, it sends the results
) (the keys written to %output) back to the parent process's version/
) copy of %output, so that the user of Parallel::Loops doesn't have to
) do any inter-process communication.

Isn't there some easier method, where you don't have to screw around with
output maps at all ?

If the following API would work, that would be the easiest, IMO:

  my @result = async_map { do_something($_) } @array;

Where async_map takes care of all the details of creating the threads,
gathering all the output, et cetera.  Or does that already exist ?

(The simple implementation is only a few lines of code, but it could
 then be easily extended to use a limited number of threads, or keep
 a thread pool handy, or something like that.)
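
As a rough sketch of what such a simple implementation might look like:
the version below uses fork and pipes rather than threads, starts one
child per element with no limit, and only passes back results that
survive being printed as a string. async_map is the hypothetical name
from the example above, not an existing module function.

```perl
use strict;
use warnings;
use POSIX ();

# Run the block once per item, each in its own child process, and return
# the results in input order. Each result travels back over a pipe.
sub async_map(&@) {
    my ($code, @items) = @_;
    my @children;
    for my $item (@items) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                      # child process
            close $reader;
            local $_ = $item;
            print {$writer} scalar $code->();
            close $writer;                    # flushes the result
            POSIX::_exit(0);                  # skip END blocks and destructors
        }
        close $writer;                        # parent keeps only the read end
        push @children, [ $pid, $reader ];
    }
    my @results;
    for my $child (@children) {
        my ($pid, $reader) = @$child;
        local $/;                             # slurp the whole result
        my $out = <$reader>;
        close $reader;
        waitpid $pid, 0;
        push @results, $out;
    }
    return @results;
}

my @squares = async_map { $_ * $_ } 1 .. 5;
print "@squares\n";    # 1 4 9 16 25
```

Returning structured data would need a serializer on both ends of the
pipe, and limiting the number of simultaneous children would need a
queue around the fork call.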


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3008
***************************************
