[31790] in Perl-Users-Digest


Perl-Users Digest, Issue: 3053 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jul 31 06:09:23 2010

Date: Sat, 31 Jul 2010 03:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 31 Jul 2010     Volume: 11 Number: 3053

Today's topics:
    Re: Can this be done (by a noob :)) <thomas@tifozi.net>
    Re: Can this be done (by a noob :)) <sherm.pendley@gmail.com>
    Re: Can this be done (by a noob :)) <ben@morrow.me.uk>
    Re: Can this be done (by a noob :)) <thomas@tifozi.net>
    Re: Can this be done (by a noob :)) <thomas@tifozi.net>
    Re: Can this be done (by a noob :)) <jurgenex@hotmail.com>
    Re: Can this be done (by a noob :)) <thomas@tifozi.net>
    Re: Can this be done (by a noob :)) <sherm.pendley@gmail.com>
    Re: Can this be done (by a noob :)) <uri@StemSystems.com>
    Re: How can I tell if a perl interpreter was built for  <rurban@x-ray.at>
        More perl-compiler optimizations <rurban@x-ray.at>
    Re: piped open and shell metacharacters <derykus@gmail.com>
    Re: piped open and shell metacharacters <jak@isp2dial.com>
    Re: piped open and shell metacharacters <derykus@gmail.com>
    Re: piped open and shell metacharacters <jak@isp2dial.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 31 Jul 2010 02:33:06 +0200
From: "Thomas Andersson" <thomas@tifozi.net>
Subject: Re: Can this be done (by a noob :))
Message-Id: <8bh999F9jvU1@mid.individual.net>

Hmm, been playing around a bit and gotten further than I had thought.
I open a file and read in the next webpage to be processed (an ID number) and 
set the page count to 1 (each ID to process can have any number of 
pages).
I create my URL from the page count and the current ID (pid).
The idea is that it will loop as long as there is a page to grab, 
increasing the page count each time (I realised this plan was flawed, but 
that's another problem).
As it is now it keeps grabbing the same page over and over thousands of 
times (creating a new file on each loop).

#Create URL for sid list from pid and page count.
my $pcnt = 1;
my $page = get 
"http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";
while ($page) {
 if ($page) {
     print "Site is alive\n";
 }
 else {
     print "Site is not accessible\n";
 };

#Create filename and write file, then save grabbed webpage into it.
open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;
print FILE $page;
$pcnt += 1;
};

I guess the URL doesn't get updated by the increased pagecount, any 
suggestions on how to fix that part? 




------------------------------

Date: Fri, 30 Jul 2010 21:33:27 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: Can this be done (by a noob :))
Message-Id: <m2hbjg4eqg.fsf@sherm.shermpendley.com>

"Thomas Andersson" <thomas@tifozi.net> writes:

> As it is now it keeps grabbing the same page over and over thousands of 
> times (creating new files for each loop).

Not quite - the get() is outside of the loop, so it's grabbing the page
only once, and saving it over and over.

> #Create URL for sid list from pid and page count.
> my $pcnt = 1;

I'd put the "base" URL in a separate variable, to avoid repetition:

  my $base = 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

> my $page = get 
> "http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";

So, using the "base" url, this would become:

  my $page = get "$base?page=$pcnt&pid=$pid";

> while ($page) {

The if() is redundant here; if $page is false, the while() will exit
and the if() won't be reached.

>      print "Site is alive\n";

> #Create filename and write file, then save grabbed webpage into it.
> open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;

You can use forward slashes on Windows too - it's only the command
shell (aka "DOS Box") that requires backslashes. Also, it's a good idea
to include the filename you're trying to open when reporting an error,
because that can help you figure out why it failed.

  my $outfile = "c:/scr/$pid-pg$pcnt.txt";
  open FILE, ">", $outfile or die "Could not open $outfile: $!";

> print FILE $page;
> $pcnt += 1;

Now that you've updated $pcnt, you need to fetch the next page and
store it in $page.

  $page = get "$base?page=$pcnt&pid=$pid";

> };
>
> I guess the URL doesn't get updated by the increased pagecount

Right. When you interpolate a variable into a string, it's a one-time
deal. The current value of the interpolated variable is used, but no
long-lasting relationship exists between them, so the string is not
updated when the interpolated variable's value changes.

For example, this will print the same thing ten times:

#!/usr/bin/perl
use warnings;
use strict;

my $num = 0;
my $string = "Num: $num\n";
for $num (1 .. 10) {
  print $string;
}

Compare that with this, where a new value is assigned to $string each
time around the loop:

#!/usr/bin/perl
use warnings;
use strict;

for my $num (1 .. 10) {
  my $string = "Num: $num\n";
  print $string;
}

sherm--

-- 
Sherm Pendley                <www.shermpendley.com>
                             <www.camelbones.org>
Cocoa Developer


------------------------------

Date: Sat, 31 Jul 2010 02:35:18 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Can this be done (by a noob :))
Message-Id: <m7bci7-4e82.ln1@osiris.mauzo.dyndns.org>


Quoth "Thomas Andersson" <thomas@tifozi.net>:
> Hmm, been playing around a bit and gotten further than I had thought.
> I open a file and read in the next webpage to be processed (a id number) and 
> set up the page count to 1 (each ID to process can have any number of 
> pages).
> I create my URL from page count and current ID (pid)
> The idea I have is that it will loop as long as there is a page to grab by 
> increasing the page count (this plan was flawed I realised though, but 
> that's another problem).
> As it is now it keeps grabbing the same page over and over thousands of 
> times (creating new files for each loop).
> 
> #Create URL for sid list from pid and page count.
> my $pcnt = 1;
> my $page = get 
> "http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";

This happens once, before the loop, when $pcnt = 1.

> while ($page) {
>  if ($page) {
>      print "Site is alive\n";
>  }
>  else {
>      print "Site is not accessible\n";
>  };
> 
> #Create filename and write file, then save grabbed webpage into it.
> open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;

This happens every time around the loop, with different values of $pcnt.

> print FILE $page;
> $pcnt += 1;
> };
> 
> I guess the URL doesn't get updated by the increased pagecount, any 
> suggestions on how to fix that part? 

You seem to be expecting Perl variables to act like macros; they don't.
If you want to recreate the URL and re-fetch the new page every time you
go round the loop, you need the 'my $page = get...' line *inside* the
loop.

Also: get into the habit, now, of keeping your filehandles in proper
variables. It will make life easier later.

    open my $FILE, ">", "..." or ...;
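
A minimal sketch of what a lexical filehandle looks like in use, in case
it helps: the variable holds a handle to the open file rather than the
path string, so printing to it writes into the file. The helper names
save_page and read_page and the temporary file are assumptions made for
illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to $path through a lexical filehandle.
sub save_page {
    my ($path, $content) = @_;

    # $fh holds a handle to the open file, not the path string.
    open my $fh, '>', $path or die "Could not open $path: $!";
    print {$fh} $content;
    close $fh or die "Could not close $path: $!";
    return;
}

# Hypothetical helper: read a whole file back in one go.
sub read_page {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    local $/;                  # slurp mode: no record separator
    my $content = <$fh>;
    close $fh;
    return $content;
}

# A temporary file stands in for c:/scr/... so the sketch runs anywhere.
my (undef, $path) = tempfile(UNLINK => 1);
save_page($path, "page body\n");
print read_page($path);
```

The braces in print {$fh} are optional here; they only make it obvious
which argument is the handle and which is the data.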

Ben



------------------------------

Date: Sat, 31 Jul 2010 05:07:52 +0200
From: "Thomas Andersson" <thomas@tifozi.net>
Subject: Re: Can this be done (by a noob :))
Message-Id: <8bhibpFjraU1@mid.individual.net>

Sherm Pendley wrote:

> I'd put the "base" URL in a separate variable, to avoid repetition:
>  my $base =
> 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';

Excellent idea, just realised that the links I will collect from the page
also use the same base. Thanks for the examples, helps me a lot!

>> while ($page) {
> The if() is redundant here; if $page is false, the while() will exit
> and the if() won't be reached.

Sorry, didn't quite get what you were saying here?
One problem I've realised that kinda breaks this is that if you just up the
page count it will never fail and exit, as you just keep getting empty sortie
pages back with an ever higher page number (there's a string "No more
sorties found" on them though that I guess could be detected and used to
exit the loop).

> You can use forward slashes on Windows too - it's only the command
> shell (aka "DOS Box") that requires backslashes. Also, it's a good
> idea to include the filename you're trying to open when reporting an
> error, because that can help you figure out why it failed.

Ah, didn't realize, good to know, will definitely follow your suggestion
(might as well pick up good habits early on).
Thanks for your good advice, I really appreciate it (and will likely come
back time and again for more ;) ).

Best Wishes
Thomas 




------------------------------

Date: Sat, 31 Jul 2010 05:13:17 +0200
From: "Thomas Andersson" <thomas@tifozi.net>
Subject: Re: Can this be done (by a noob :))
Message-Id: <8bhiltFldqU1@mid.individual.net>

> Also: get into the habit, now, of keeping your filehandles in proper
> variables. It will make life easier later.
>
>    open my $FILE, ">", "..." or ...;

Will definitely try to pick up good habits on coding and formatting, so 
thanks for the advice.
But if I create a variable for the filehandle like this, won't it contain the 
filepath, so when I do the print $FILE it will print the filepath 
instead of the content of the file as I want? Or am I misunderstanding? 
(quite likely).

Best Wishes
Thomas 




------------------------------

Date: Fri, 30 Jul 2010 20:14:40 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Can this be done (by a noob :))
Message-Id: <t35756lo3mokncornv21lad6l9bgk5a4t1@4ax.com>

"Thomas Andersson" <thomas@tifozi.net> wrote:
>As it is now it keeps grabbing the same page over and over thousands of 
>times (creating new files for each loop).
>
>my $pcnt = 1;
>my $page = get 
>"http://csr.wwiionline.com/scripts/services/persona/sorties.jsp?page=$pcnt&pid=$pid";
>while ($page) {
> if ($page) {
>     print "Site is alive\n";
> }
> else {
>     print "Site is not accessible\n";
> };
>
>#Create filename and write file, then save grabbed webpage into it.
>open FILE, ">", "c:\\scr\\$pid-pg$pcnt.txt" or die $!;
>print FILE $page;
>$pcnt += 1;
>};
>
>I guess the URL doesn't get updated by the increased pagecount, any 
>suggestions on how to fix that part? 

It may or it may not. Had you used better indentation then you might
have spotted that your get() is outside of the loop, therefore it is
executed only once, therefore the value of $page never changes, and
therefore of course your loop never terminates because the loop
condition will always be the same value as in the first test.

jue


------------------------------

Date: Sat, 31 Jul 2010 05:46:47 +0200
From: "Thomas Andersson" <thomas@tifozi.net>
Subject: Re: Can this be done (by a noob :))
Message-Id: <8bhkkhFu1qU1@mid.individual.net>

Using the suggestions from here I've rewritten it a bit, now it works as far 
as grabbing additional pages and storing them. Now I just need to figure out how 
to make it exit the loop under either of two conditions (found a processed 
link or reached end of pages).
Eventually an additional loop needs to be inserted to process the subpages we 
collect the links for in these pages. (The plan is to build lists of links 
from these pages and then collect data from those pages, which in turn 
contain lists with a variable number of data.)

# Define some variables.
my $pbase = 
'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';
my $pcnt = 1;
my $pidfile = 'c:/scr/pidlist.txt';
# Open list of pid's and set first one as current pid.
open PIDLIST, "<", $pidfile or die "Could not open $pidfile: $!";
my $pid = <PIDLIST>;
print $pid; # print just so we know we have a pid to process.
chomp $pid; # Remove endline from pid.
#Create URL for sid list from pid and page count.
my $page = get "$pbase?page=$pcnt&pid=$pid";
while ($page) {
  # Create file for storing pages containing the sids.
  my $tmpf = "c:/scr/$pid.txt";
  open TEMPF, ">>", $tmpf or die "Could not open $tmpf: $!";
  print TEMPF $page;  # Store grabbed webpage into the file
  $pcnt += 1; # Update page number for next grab.
  $page = get "$pbase?page=$pcnt&pid=$pid"; # Grab next page.
};




------------------------------

Date: Sat, 31 Jul 2010 00:08:54 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: Can this be done (by a noob :))
Message-Id: <m2vd7w5m3t.fsf@sherm.shermpendley.com>

"Thomas Andersson" <thomas@tifozi.net> writes:

> Sherm Pendley wrote:
>
>>> while ($page) {
>> The if() is redundant here; if $page is false, the while() will exit
>> and the if() won't be reached.
>
> Sorry, didn't quite get what you were saying here?

You had originally written something like this:

  while ($page) {
    if ($page) {
      # do stuff
    } else {
    }
  }

Since the while() loop repeats only if $page evaluates to a true
value, you don't need to check $page again with an if(). If $page is
false, the body of the loop will not execute at all, so by the time
you reach the line that the if() is on, you already know that $page
is true. So, the if() block will always run, and the else block never
will; that being the case, it's simpler to just omit the if():

  while ($page) {
    # do stuff
  }

Note that while() only checks its condition *once* before repeating
its block of code. So you can't omit the if(), if the value of $page
might get changed inside the while(), before reaching the if():

  while ($page) {

    # code that might change $page

    # check $page again, because it might have been changed, and
    # the while() loop won't check again until the next time we get
    # to the top of the loop

    if ($page) {
      # do stuff
    }
  }

sherm--

-- 
Sherm Pendley                <www.shermpendley.com>
                             <www.camelbones.org>
Cocoa Developer


------------------------------

Date: Sat, 31 Jul 2010 00:32:11 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Can this be done (by a noob :))
Message-Id: <87tyngp8z8.fsf@quad.sysarch.com>

>>>>> "TA" == Thomas Andersson <thomas@tifozi.net> writes:

  TA> Using the suggestions from here I've rewritten it a bit, now it
  TA> works as far as grabbing additional pages and storing them. Now I
  TA> just need to figure out how to make it exit the loop under either
  TA> of two conditions (found a processed link or reached end of pages).
  TA> Eventually an additional loop needs to be inserted to process the
  TA> subpages we collect the links for in these pages. (The plan is to
  TA> build lists of links from these pages and then collect data from
  TA> those pages, which in turn contain lists with a variable number of
  TA> data.)

so you need to put some conditionals in the loop. first, how would you
know when the pages are done? can you look for a link to the next page
and exit the loop if it isn't there? then define what a 'processed link'
is. keep track (likely in a hash) of processed links and if you find one
exit the loop. exiting a loop is easy, use the last function.
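
a rough sketch of those two exit conditions, assuming the end of the
list is marked by the "No more sorties found" string mentioned earlier
in the thread. the canned pages, the sid=NNN link format, and the names
fetch_page and collect_sids are invented so the example runs offline;
the real pages will need their own pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Canned pages stand in for the real site so the sketch runs offline.
# The link format and sid values are invented for illustration.
my %fake_site = (
    1 => 'sortie.jsp?sid=301 sortie.jsp?sid=302',
    2 => 'sortie.jsp?sid=303',
    3 => 'No more sorties found',
);

# Hypothetical stand-in for LWP::Simple's get().
sub fetch_page {
    my ($pcnt) = @_;
    return $fake_site{$pcnt};
}

# Collect new sortie ids, stopping at the end marker or at the first
# id that was already processed on an earlier run.
sub collect_sids {
    my ($seen) = @_;           # hash ref of already processed sids
    my @new;
    my $pcnt = 1;

    PAGE:
    while (1) {
        my $page = fetch_page($pcnt);
        last PAGE unless defined $page;
        last PAGE if $page =~ /No more sorties found/;

        for my $sid ($page =~ /sid=(\d+)/g) {
            last PAGE if $seen->{$sid};   # found a processed link
            $seen->{$sid} = 1;
            push @new, $sid;
        }
        $pcnt++;
    }
    return @new;
}

my @sids = collect_sids({ 303 => 1 });
print "new sids: @sids\n";
```

the loop label lets last leave the outer while from inside the inner
for, which a plain last would not do.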


  TA> # Define some variables.

use fewer comments. make your comments mean something outside the
code. code is what, comments are why. and you are writing code to be
read by a maintainer. always keep that person in your mind and your code
will be better for it.

  TA> my $pbase = 
  TA> 'http://csr.wwiionline.com/scripts/services/persona/sorties.jsp';
  TA> my $pcnt = 1;
  TA> my $pidfile = 'c:/scr/pidlist.txt';
  TA> # Open list of pid's and set first one as current pid.

have you ever heard of white space? jamming lines of code together makes
major migraines when reading it. loosen up a little. blank lines between
sections are a good idea.

  TA> open PIDLIST, "<", $pidfile or die "Could not open $pidfile: $!";
  TA> my $pid = <PIDLIST>;
  TA> print $pid; # print just so we know we have a pid to process.

comments on the code line are a poor idea in most cases. when they are
long comments it is a horrible idea.

  TA> chomp $pid; # Remove endline from pid.

again, you are telling us what you just did. redundant to anyone who
knows what chomp is.

  TA> #Create URL for sid list from pid and page count.

this is actually getting the page AND building the url.

  TA> my $page = get "$pbase?page=$pcnt&pid=$pid";
  TA> while ($page) {

bah. it is not clear why you are testing page in the loop. and you have
two duplicate lines with the get. make it an infinite loop and exit when
the get fails.

  TA>   # Create file for storing pages containing the sids.
  TA>   my $tmpf = "c:/scr/$pid.txt";
  TA>   open TEMPF, ">>", $tmpf or die "Could not open $tmpf: $!";
  TA>   print TEMPF $page;  # Store grabbed webpage into the file

you can do that with getstore or use File::Slurp's write_file (from cpan). 

use File::Slurp ;

	write_file( "c:/scr/$pid.txt", $page ) ;

much easier to read.

  TA>   $pcnt += 1; # Update page number for next grab.
  TA>   $page = get "$pbase?page=$pcnt&pid=$pid"; # Grab next page.
  TA> };

here is a better loop:

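	while( 1 ) {

		my $page = get "$pbase?page=$pcnt&pid=$pid";
		last unless $page ;

		# append, so earlier pages are not overwritten
		write_file( "c:/scr/$pid.txt", { append => 1 }, $page ) ;

		# move on to the next page; without this the loop refetches page 1
		$pcnt++ ;
	}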

short, easy to read, easy to maintain. now you can add in the checks for
exiting the loop and it will be easier.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sat, 31 Jul 2010 10:53:10 +0200
From: Reini Urban <rurban@x-ray.at>
Subject: Re: How can I tell if a perl interpreter was built for 32 or 64 bits?
Message-Id: <4c53e477$0$15756$91cee783@newsreader04.highway.telekom.at>

Ben Morrow wrote:
>
> Quoth "Peter J. Holzer"<hjp-usenet2@hjp.at>:
>> On 2010-07-29 18:43, Sherm Pendley<sherm.pendley@gmail.com>  wrote:
>>> David Filmer<usenet@davidfilmer.com>  writes:
>>>> How can I tell if a perl interpreter was built at 32 or 64 bits?
>>
>> That depends on what you mean by "built at X bits".
>>
>> IV size? pointer size? Register size of the architecture?
>> I'm guessing the latter.
>
> 'Register size of the architecture' is an ambiguous term. An x86-64
> machine running in compatibility mode (which is how 32bit programs are
> run under a 64bit OS) has 16bit, 32bit and 64bit registers available,
> but the default address size is still 32 bits.
>
> The normal meaning of this question is 'what size are my pointers', to
> which the answer is perl -V:ptrsize.
>
>>>> If I do "perl -V" I see:
>>>>     use64bitint=undef, use64bitall=undef
>>>
>>> Yep, that's how you can tell.
>>
>> No, not really.
>>
>> use64bitint says whether IVs are 64 bit. This can be achieved on a 32
>> bit system if "long long" is available:
>>
>> |Summary of my perl5 (revision 5 version 12 subversion 1) configuration:
>> |
>> |  Platform:
>> |    osname=linux, osvers=2.6.32-3-686, archname=i686-linux-64int
>> [...]
>> |    use64bitint=define, use64bitall=undef, uselongdouble=undef
>> [...]
>> |    ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
>> |    Off_t='off_t', lseeksize=8
>>
>> OTOH, you probably can have a 32bit IV even on a 64 bit system.
>>
>> use64bitall might be more conclusive (at least you need both 64 bit ints
>> and 64 bit pointers to get it, which almost certainly means a 64bit
>> architecture).
>
> It's possible at least on some architectures to choose whether to
> use64bitall or not (otherwise the option wouldn't exist). As I said
> earlier, the only values that actually matter are ptrsize and ivsize.

Yes, yes, yes.
The only right answer is

$ perl -V:ptrsize
ptrsize='4';

on 32-bit

$ perl -V:ptrsize
ptrsize='8';

on 64-bit
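
The same values can be read from inside a script through the core
Config module, which exposes what perl -V reports. A minimal sketch;
the helper name perl_bitness is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical helper: report the two sizes that matter, in bits.
sub perl_bitness {
    return (
        pointer_bits => $Config{ptrsize} * 8,
        integer_bits => $Config{ivsize} * 8,
    );
}

my %bits = perl_bitness();
printf "pointers: %d bits, IVs: %d bits\n",
    $bits{pointer_bits}, $bits{integer_bits};
```

On a 32-bit perl built with use64bitint the two numbers differ, which
is why ptrsize and ivsize are worth checking separately.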


------------------------------

Date: Sat, 31 Jul 2010 11:14:34 +0200
From: Reini Urban <rurban@x-ray.at>
Subject: More perl-compiler optimizations
Message-Id: <4c53e97b$0$15758$91cee783@newsreader04.highway.telekom.at>

Immediately after I released B-C-1.27
   http://search.cpan.org/dist/B-C/
I added two new compiler optimizations to svn
http://code.google.com/p/perl-compiler/source/detail?r=498

-fro-inc: read-only strings for @INC and %INC entries and for
   global curpad names and symbols,
assuming that nobody wants to change those names and paths at run-time

and -fno-destruct: no perl_destruct,
which leaves the optree and sv data cleanup to exit.

A single process has no need to clean up by itself.
But this is tricky, since perl_destruct does not only do the destruction 
but much more.
But I want it desperately, since copy-on-grow on strings cannot be used 
since 5.10, because there are no flags for hek's which let the compiler 
skip static hek's (as was supported until 5.10).
That means all hek strings have to be allocated at run-time, which is 
slow. With -fno-destruct, -fcog (copy-on-grow) can be enabled again.

copy-on-grow: we start with static strings, and once someone wants to 
extend one, we realloc it to the heap.

We save a lot of time with static init, but we lose a lot of time with 
run-time destruct.

What is needed and what does perl_destruct really do?
DESTROY hooks? hmm, this would mean to run through all the sv's and 
check for DESTROY hooks.
END blocks? call_list(PL_scopestack_ix, PL_endav) looks like so.
IO teardown for sure.
thread cancellation for sure.

e.g. without perl_destruct perl -e'print "bla"' would not print anything.
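
The lost output can be seen from plain Perl with a loose analogy:
POSIX::_exit leaves the process without any interpreter teardown, so a
buffered "bla" never reaches the pipe. This is only a sketch of the
symptom, not of what -fno-destruct does internally; the helper name
run_child is made up, and the list form of a piped open is assumed to
be available on the platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a one-liner in a child perl and return
# whatever it wrote to standard output.
sub run_child {
    my ($code) = @_;
    open my $out, '-|', $^X, '-e', $code
        or die "Could not start $^X: $!";
    local $/;                  # slurp mode
    my $text = <$out>;
    close $out;
    return defined $text ? $text : '';
}

# Normal exit: the interpreter tears down its IO layers and flushes.
my $normal = run_child('print "bla"');

# POSIX::_exit leaves at once, so the buffered "bla" is never written.
my $abrupt = run_child('use POSIX (); print "bla"; POSIX::_exit(0)');

printf "normal exit: '%s', abrupt exit: '%s'\n", $normal, $abrupt;
```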

This is my current attempt:
int fast_perl_destruct( PerlInterpreter *my_perl ) {
     dVAR;
     VOL signed char destruct_level;  /* see possible values in intrpvar.h */
     HV *hv;
#ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
     pid_t child;
#endif

     PERL_ARGS_ASSERT_PERL_DESTRUCT;
#ifndef MULTIPLICITY
     PERL_UNUSED_ARG(my_perl);
#endif

     assert(PL_scopestack_ix == 1);

     /* wait for all pseudo-forked children to finish */
     PERL_WAIT_FOR_CHILDREN;

     destruct_level = PL_perl_destruct_level;
#ifdef DEBUGGING
     {
         const char * const s = PerlEnv_getenv("PERL_DESTRUCT_LEVEL");
         if (s) {
             const int i = atoi(s);
             if (destruct_level < i)
                 destruct_level = i;
         }
     }
#endif

     if (PL_exit_flags & PERL_EXIT_DESTRUCT_END) {
         dJMPENV;
         int x = 0;

         JMPENV_PUSH(x);
         PERL_UNUSED_VAR(x);
         if (PL_endav && !PL_minus_c)
             call_list(PL_scopestack_ix, PL_endav);
         JMPENV_POP;
     }
     LEAVE;
     FREETMPS;
     assert(PL_scopestack_ix == 0);

     /* Need to flush since END blocks can produce output */
     my_fflush_all();

     if (CALL_FPTR(PL_threadhook)(aTHX)) {
         /* Threads hook has vetoed further cleanup */
         PL_veto_cleanup = TRUE;
         return STATUS_EXIT;
     }
     PerlIO_destruct(aTHX);

     /* report the exit status, as on the early-return path above */
     return STATUS_EXIT;
}

-- 
Reini


------------------------------

Date: Fri, 30 Jul 2010 16:20:45 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: piped open and shell metacharacters
Message-Id: <446e0d21-ef77-4eb5-bda4-2cc685395f48@x18g2000pro.googlegroups.com>

On Jul 30, 7:39 am, John Kelly <j...@isp2dial.com> wrote:
> The Camel book, 16.3.1. Anonymous Pipes says:
>
> > Perl uses your default system shell (/bin/sh on Unix) whenever a pipe
> > command contains special characters that the shell cares about. If
> > you're only starting one command, and you don't need--or don't want--to
> > use the shell, you can use the multi-argument form of a piped open ...
> > ... But then you don't get I/O redirection, wildcard expansion, or
> > multistage pipes, since Perl relies on your shell to do those.
>
> and 29.2.104. open says
>
> > Any pipe command containing shell metacharacters such as wildcards or
> > I/O redirections is passed to your system's canonical shell (/bin/sh on
> > Unix), so those shell-specific constructs can be processed first. If no
> > metacharacters are found, Perl launches the new process itself without
> > calling the shell.
>
> I have a script that traps the standard output of any command passed in
> as args to the script.   My piped open uses 2>&1 to grab stderr as well
> as stdout.  I thought > was a shell metacharacter so I expected to see
> /bin/sh between my script and the trapped command when doing ps ax.  But
> in many cases, Perl runs the trapped command directly, without needing
> /bin/sh.
>
> You can see that by running the script like this:
>
> ./myscript sleep 10
>
> and then doing a ps ax before the sleep ends.  Is the book wrong, or
> am I missing something?  Here is the script:
>
> #!/usr/bin/perl
>
> #   Define author
> #       John Kelly, July 28, 2010
>
> #   Define copyright
> #       Copyright John Kelly, 2010. All rights reserved.
>
> #   Define license
> #       Licensed under the Apache License, Version 2.0 (the "License");
> #       you may not use this work except in compliance with the License.
> #       You may obtain a copy of the License at:
> #      http://www.apache.org/licenses/LICENSE-2.0
>
> #   Define symbols and (words)
> #       OT ...........  Output Trap
> #       bas0 .........  basename of $0
> #       binx .........  binary executable
> #       tt ...........  temporary time
>
> use strict;
> use FileHandle;
> use File::Basename;
> use POSIX qw (strftime);
>
> STDOUT->autoflush (1);
> STDERR->autoflush (1);
>
> my $bas0 = basename ($0);
> my $binx;
>
> unless ($binx = shift @ARGV) {
>     print "Usage: ", $bas0, " binary.executable [args]\n";
>     exit 1;
>
> }
>
> my $basx = basename ($binx);
>
  my $kid;  # edit #1

> if (!defined ($kid = open OT, "$binx @ARGV 2>&1 |")) {  # edit # 2
>     printf "%s -> %s: failure starting %s: $!\n", &tt, $bas0, $binx;
>     exit 1;}
>
  print "parent shell=",getppid()," perl process=$$",   # edit # 3
                                  " perl kid=$kid\n";
> while (<OT>) {
>     /^\s*$/ && next;
>     printf "%s -> %s: ", &tt, $basx;
>     print $_;}
>
> if (!(close OT) && $!) {
>     printf "%s -> %s: failure closing OT: $!\n", &tt, $bas0;} else {
>
>     if ($? & 127) {
>         printf "%s -> %s: %s signal %d, %s coredump\n", &tt, $bas0, $basx,
>           ($? & 127), ($? & 128) ? 'with' : 'without';
>     } else {
>         printf "%s -> %s: %s exit value %d\n", &tt, $bas0, $basx, $? >> 8;
>     }
>
> }
>
> sub tt {
>     strftime "%a %b %e %H:%M:%S %Z %Y", localtime;}
>
>


On FreeBSD (with small edits above), I don't see that
happening.

$ myscript.pl  sleep 60
parent shell=71889 perl process=75147  perl kid=75148

$ ps -ax
  PID  TT  STAT      TIME COMMAND
71889   2  SNs    0:00.01 -bash (bash)
75147   2  SN+    0:00.02 /usr/bin/perl ./shell.pl sleep 60
75148   2  SN+    0:00.00 sleep 60
 ....

I believe execl in the perl kid launches a shell
which then gets overlaid by sleep(). From doio.c:

PerlProc_execl(PL_sh_path, "sh", "-c", cmd, (char *)NULL);

--
Charles DeRykus


------------------------------

Date: Sat, 31 Jul 2010 00:21:38 +0000
From: John Kelly <jak@isp2dial.com>
Subject: Re: piped open and shell metacharacters
Message-Id: <ahq6565usrtdkpga7lipj9l0qgd9coltjf@4ax.com>

On Fri, 30 Jul 2010 16:20:45 -0700 (PDT), "C.DeRykus"
<derykus@gmail.com> wrote:

>On Jul 30, 7:39 am, John Kelly <j...@isp2dial.com> wrote:

>> I have a script that traps the standard output of any command passed in
>> as args to the script.   My piped open uses 2>&1 to grab stderr as well
>> as stdout.  I thought > was a shell metacharacter so I expected to see
>> /bin/sh between my script and the trapped command when doing ps ax.  But
>> in many cases, Perl runs the trapped command directly, without needing
>> /bin/sh.

>On FreeBSD (with small edits above), I don't see that
>happening.
>
>$ myscript.pl  sleep 60
>parent shell=71889 perl process=75147  perl kid=75148

That's what I'm saying; the kid pid is only 1 greater than the perl pid,
which means there was never a shell pid launched.


>$ ps -ax
>  PID  TT  STAT      TIME COMMAND
>71889   2  SNs    0:00.01 -bash (bash)
>75147   2  SN+    0:00.02 /usr/bin/perl ./shell.pl sleep 60
>75148   2  SN+    0:00.00 sleep 60
>....
>
>I believe execl in the perl kid launches a shell
>which then gets overlaid by sleep().

I don't think so.

You could overlay the shell by prefixing the command with the "exec"
shell builtin, but I didn't do that.

Seems like perl is recognizing 2>&1 as a limited special case, copying
fd1 to fd2 , then exec'ing the binary directly, without using a shell.
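
Whichever way perl treats the trailing 2>&1, the shell can be left out
of the question entirely by forking through open and redirecting
STDERR by hand before exec. A minimal sketch, assuming a Unix-like
system with a real fork; the helper name capture_both is made up for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run @cmd without a shell and return its stdout
# and stderr as one string.
sub capture_both {
    my (@cmd) = @_;

    # A '-|' open with no command forks; the child's STDOUT feeds $ot.
    my $kid = open my $ot, '-|';
    die "Could not fork: $!" unless defined $kid;

    if ($kid == 0) {
        # Child: point STDERR at the pipe, then exec without a shell.
        open STDERR, '>&', \*STDOUT or die "Could not dup STDOUT: $!";
        exec { $cmd[0] } @cmd or die "Could not exec $cmd[0]: $!";
    }

    # Parent: read everything the child wrote on either stream.
    local $/;
    my $text = <$ot>;
    close $ot;
    return defined $text ? $text : '';
}

my $text = capture_both($^X, '-e', 'print STDERR "err\n"; print "out\n"');
print $text;
```

Because the command is passed as a list, arguments containing spaces
or metacharacters reach the program untouched, and ps ax shows only
the script and the command it runs.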


-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
 


------------------------------

Date: Fri, 30 Jul 2010 21:40:51 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: piped open and shell metacharacters
Message-Id: <ac557f9f-0a98-40cb-b6cb-0689b380f1dc@a4g2000prm.googlegroups.com>

On Jul 30, 5:21 pm, John Kelly <j...@isp2dial.com> wrote:
> On Fri, 30 Jul 2010 16:20:45 -0700 (PDT), "C.DeRykus"
>
> <dery...@gmail.com> wrote:
> >On Jul 30, 7:39 am, John Kelly <j...@isp2dial.com> wrote:
> >> ...

> >On FreeBSD (with small edits above), I don't see that
> >happening.
>
> >$ myscript.pl  sleep 60
> >parent shell=71889 perl process=75147  perl kid=75148
>
JK> That's what I'm saying; the kid pid is only 1 greater than
JK> the perl pid, which means there was never a shell pid
JK> launched.
>
> >$ ps -ax
> >  PID  TT  STAT      TIME COMMAND
> >71889   2  SNs    0:00.01 -bash (bash)
> >75147   2  SN+    0:00.02 /usr/bin/perl ./shell.pl sleep 60
> >75148   2  SN+    0:00.00 sleep 60
> >....
>
> >I believe execl in the perl kid launches a shell
> >which then gets overlaid by sleep().
>
JK> I don't think so.
JK>
JK> You could overlay the shell by prefixing the command
JK> with the "exec" shell builtin, but I didn't do that.
JK>
JK> Seems like perl is recognizing 2>&1 as a limited special
JK> case, copying
JK> fd1 to fd2 , then exec'ing the binary directly, without
JK> using a shell.

Maybe I missed something but I don't see anything supporting your
supposition in the source or docs:

perlfaq8:
   If the second argument to a piped open() contains shell
   metacharacters, perl fork()s, then exec()s a shell to
   decode the metacharacters and eventually run the desired
   program...

perlopentut:
   But if the command contains special shell characters, such
   as ">" or "*", called 'metacharacters', Perl does not execute
   the command directly. Instead, Perl runs the shell, which then
   tries to run the command.

--
Charles DeRykus


------------------------------

Date: Sat, 31 Jul 2010 05:21:50 +0000
From: John Kelly <jak@isp2dial.com>
Subject: Re: piped open and shell metacharacters
Message-Id: <blc756lihvbobllkdl5lchofrjsl1d83f1@4ax.com>

On Fri, 30 Jul 2010 21:40:51 -0700 (PDT), "C.DeRykus"
<derykus@gmail.com> wrote:

>JK> Seems like perl is recognizing 2>&1 as a limited special
>JK> case, copying
>JK> fd1 to fd2 , then exec'ing the binary directly, without
>JK> using a shell.

>Maybe I missed something but I don't see anything supporting your
>supposition in the source or docs:
>
>perlfaq8:
>   If the second argument to a piped open() contains shell
>   metacharacters, perl fork()s, then exec()s a shell to
>   decode the metacharacters and eventually run the desired
>   program...
>
>perlopentut:
>   But if the command contains special shell characters, such
>   as ">" or "*", called 'metacharacters', Perl does not execute
>   the command directly. Instead, Perl runs the shell, which then
>   tries to run the command.

Seems the docs are incomplete.


-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
 


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3053
***************************************
