[23631] in Perl-Users-Digest


Perl-Users Digest, Issue: 5838 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Nov 20 21:10:40 2003

Date: Thu, 20 Nov 2003 18:10:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 20 Nov 2003     Volume: 10 Number: 5838

Today's topics:
    Re: trying to understand fork and wait (John)
    Re: trying to understand fork and wait <usenet@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 20 Nov 2003 16:10:41 -0800
From: jguad98@hotmail.com (John)
Subject: Re: trying to understand fork and wait
Message-Id: <a964da31.0311201610.77690266@posting.google.com>

Ben Morrow <usenet@morrow.me.uk> wrote in message news:<bpj1t3$7te$1@wisteria.csv.warwick.ac.uk>...
> jguad98@hotmail.com (John) wrote:
> > #!/usr/local/bin/perl
> > #=================================================================#
> 
> You don't need all these: they really don't make things any clearer.
> 

Sorry ... old habits based on learning to script in REXX on the 
mainframe ... mandatory templates with comment boxes & what not ...

> You do need
> 
>   use warnings;
>   use strict;
> 

I understand that 'use strict' forces 'good programming' but I don't
know where to find more thorough explanation of the precise 'good
programming' structures that 'strict' is enforcing ... i.e., I think it
would be nice to know to some degree what is legal and illegal when
using 'strict' before I write & execute (even though my tendency is
usually to write first and debug later. :-)

> > #-----------------------------------------------------------------
> > # some global vars
> > #-----------------------------------------------------------------
> >   $mypid=$$;
> 
>   my $mypid = $$;
> 
> and so on throughout.

is this to say I should use whitespace in or around variable
assignments? that would cost me 2 more keystrokes! (j/k)

> 
> >   $basedir="/paf/rpas";
> >   @domainlist=`ls $basedir | egrep '^(top|key)..$'`;
> 
>   my @domainlist = <$basedir/{top,key}??>;
> 
> see perldoc -f glob

roger, wilco.  Thank you.

> 
> >   $patternfile="/path/to/patterns.list";
> >   open(PATTERNS,$patternfile);
> 
>   open my $PATTERNS, $patternfile or die "can't open pattern file: $!";

is the use/lack of use of parentheses here a personal style issue, or
is there a reason for this choice?

> 
> >   @patternlist=<PATTERNS>; # this is my store of error messages to 
> >   close(PATTERNS);
> > #-----------------------------------------------------------------
> > # scan the directories
> > #-----------------------------------------------------------------
> >   foreach $domain (@domainlist) {
> 
>   for my $domain (@domainlist) {
> 

why "for" and not "foreach" ?

> >     chomp($domain);
> >     $userdirbase="$basedir/$domain/users";
> >     opendir(USERS, "$userdirbase");
> >     #-------------------------------------------------------------
> >     # read the dir
> >     #-------------------------------------------------------------
> >     while ($file = readdir(USERS)) {
> >       #---------------------------------------------------------
> >       # only select files matching the username pattern
> >       #---------------------------------------------------------
> >       next unless $file =~ /^[a-z][a-z]\d\d\d\d\d$/;
> 
>   /^[a-z][a-z]\d{5}$/

thank you

> 
> >       #---------------------------------------------------------
> >       # check if we have a user dir
> >       #---------------------------------------------------------
> >       $currentuserdir="$userdirbase/$file";
> >       if (-d "$currentuserdir") {
> >         #-----------------------------------------------------
> >         # check if there is an applog in the user dir
> >         #-----------------------------------------------------
> >         $logfile="$currentuserdir/applog";
> >         if ( -e "$logfile") {
> >           #-------------------------------------------------
> >           # is the log open or closed?
> >           #-------------------------------------------------
> >           $return=`grep 'known end of session message' $logfile`;
> > 
> 
> Better than this would be to delete the log when it finishes; unless
> you need to keep them, in which case I would lock the logfile while
> the child reads it.
> 

I don't own the log, am only allowed to read it.  The application which
creates the log does nothing with it when a session ends, and overwrites
it upon the next session startup.

> >           #-------------------------------------------------
> >           # if open, fire the monitor
> >           #-------------------------------------------------
> >           if (! $return) {# grep failed, ergo file is active
> 
> grep exits with 0 if it succeeds, so this test is precisely backwards
> :).
>  

hmmm... variable $return is being instantiated with output from `grep`
(note backticks?) so I would expect that if grep fails, there is no 
output, so $return will be empty.  I thought that an empty var will 
fail an 'exists' test, so I coded "if not exists" ... a response of 
"exists" indicates the EOS message was found, which is a do-nothing 
situation for me (drop through to bottom/go back to top of loop). I 
only want to "do something" if there is _no_ EOS found.

Is there a better way to test for the "EOS" string in $logfile?

> >           #-------------------------------------------------
> >           # line below is because everything I've read about fork 
> >           # says I gotta wait on the child pid (?) or it will
> >           # become a zombie which is bad.  I don't understand
> >           # what this does or how it works.
> >           #-------------------------------------------------
> >           $SIG{'CHLD'} = sub { wait(); };
> 
> When a process exits, it returns a status code (like grep exited with
> 0 or 1 above). The parent process can collect this by calling wait or
> waitpid. Until it does, the process has to sit around occupying a slot
> in the process table, just to keep a hold of the exit code. So, for
> instance, the system() call above does (NB this is simplified C)
> 
> pid = fork();
> if(pid)
>   return waitpid(pid);
> else
>   exec("grep", ...);
> 
> internally.
> 
> The easy way to solve this is $SIG{CHLD} = 'IGNORE';, which says you
> don't care about exit codes. That fails on some systems, though; if it
> does you need to read the discussion under 'Signals' in perldoc perlipc.
> 

hmmm ... okay, I understand that I need to reap(?) my children to
prevent zombies.  But I am still hazy on the mechanism and its effects
on the rest of the program.  As you see, the main body is designed to
loop around looking for user logs to read ... if at some loop iteration
I find an active log and fork the child, I need some tool to wait on
the child I just spawned.

The thing that has been bothering me is this:  if at loop iteration 34 I
spawn a child and issue "wait", will my main program continue on to loop
iteration 35, or does it pause there on loop 34 until the child exits to
satisfy the wait function?
  
I don't want the program to pause because the child ("logreader") is 
expected to run indefinitely (well, up to 12 hours which is the target
application's activity window) and I want my parent program to keep
discovering logs and spawning logreaders for as long as the activity 
window is open.

> >           #-------------------------------------------------
> >           # now we fork if and only if we are the original program
> >           # we consider ourselves too young to have grandchildren
> >           #-------------------------------------------------
> >           if ($$ == $mypid) {
> >             $kidpid = fork() or die "cannot fork: $!";
> 
> This is wrong. fork returns 0 to the child, <pid> > 0 to the parent and
> undef if it fails. Children will always die, with this... You want
> 
> my $kidpid = fork;
> defined $kidpid or die "cannot fork: $!";
> 

thank you ... I'd seen it done your way, didn't understand why the 
use of "defined" ... now it makes more sense.

> # From this point on we have two almost identical copies of the
> # program running. The only difference is that one has $kidpid set to
> # 0, the other to some positive number.
> 
> if($kidpid) {
>     # parent
> } else {
>  # child
> }
> 

My intent was to aid my own newbie understanding by explicitly testing
'kidpid == 0' as opposed to the implicit "exists" test indicated above
in order to identify the child process.

> which is why you need two blocks in general. In your case, the parent
> doesn't want to do anything but loop round again, and the child loops
> over a file and exits. So you want <untested>
> 

Yes, the parent wants nought else but to loop around again, but note that
in any given loop, the parent may spawn another child.  Each child is
expected to be rather persistent as they have to watch the open user log
for as long as the log is in use (there's that 12 hour window I
mentioned).

Also, however many users there are active on the system (between 100 and
300) is however many children I expect to spawn -- i.e., one child per
active user.  I want to ensure that the children do not step on each
other (i.e., no more than one child per userlog), and I want to ensure
that the one original parent image is the only one to spawn children
(hence the little quip about no grandchildren).

> unless($kidpid) {   # if we are the parent, just loop back

so if $kidpid is > 0 we are the parent ... the "unless kidpid" will fail
if "$kidpid ge 0", am I getting that correct?  If $kidpid is zero, the
unless succeeds and we proceed to the next line ...

>     open my $LOG, $file or die "can't open logfile $file: $!";
> 
>     while ( (my $line = <LOG>) !~ /known EOS/ ) {

is that "my $line = <LOG>" an implicit loop?  hmmm ... don't recall such
being mentioned in my Perl class ... do you know if this is explained in
the Camel book or some other common reference material?

>         my $now = localtime;
> 
>         system "logger $now $domain $user $line" 
>             for grep { $line =~ /$_/ } @patternlist;
>         # or you would probably be better off using Sys::Syslog
>     }
> 

setting $now on each line of LOG is unnecessary for my purposes ... I
only want to know the time if I find a match because the target
application does not consistently put timestamps in the user logs (I
told you it was stupid :-)

>     close $LOG;   #
>     unlink $file; # if you decide to delete them when you're done
> 
>     exit 0; # so we won't get out into the parent's code
> }
> 
> and ditch all this...
> 
> >           } else {
> >             #-------------------------------------------------
> >             # are we in the child process?
> >             #-------------------------------------------------
> >             if ($kidpid == 0) {
> >               #---------------------------------------------
> >               # read the user log
> >               #---------------------------------------------
> >               open(LOG,"$file");
> >               while ($line = <LOG>) {
> >                 close(LOG), exit if ($line =~ /known EOS message/);
> >                 foreach $patt (@patternlist) {
> >                   if ($line =~ /$patt/) {
> >                     $now=localtime(time);
> >                     system("logger $now $domain $user $line");
> >                   }#end child's if pattern match
> >                 }#end child's foreach loop
> >               }#end child's while loop
> >             }# this is the end of the child block
> >           }#end if mypid
> >         }#end if $return
> >       }#end if logfile exists
> >     }#end if directory (go to top of parent while loop)
> >   } #end parent while loop (go to top of parent foreach loop)
> 
> ...down to here.

thank you

> 
> >     closedir(USERS); 
> >   }#end parent foreach loop
> > #-----------------------------------------------------------------
> >   exit 0;
> 
> No need for this: 'falling off the end' is a perfectly valid way to
> end a Perl program.
> 

the explicit exit, along with a bit more flower-boxing is a habit I
developed to ensure that I know where the intended bottom of the script
is ... I've had bad experiences with cut-n-paste'd code and incomplete
copies where a hunk of a script has gone missing without me realizing
it.  If I was really on top of things, I'd also annotate line counts
periodically and indicate at the top how many lines I should have ;-).

> > #== EOF ==========================================================#
> 
> <snip>
> 
> > Will "wait" make me wait for a return or not?
> 
> Yes. waitpid() will wait for the specified child process to exit and
> return its exitcode; wait() (or waitpid with a pid of -1) will wait
> for *any* child to exit and return its exitcode. Depending on your
> system, if you are ignoring SIGCHLD both may return -1.
> 
> See wait(2).
> 

Is "wait(2)" different from "wait"?  If so, where do I find it? man & 
perldoc fail me ...

  #>man wait\(2\)
    No manual entry for wait(2).
  #>perldoc -f wait\(2\)
    No documentation for perl function `wait(2)' found

> > I have not actually run the above program mostly because I don't
> > understand how the wait & fork will work and I fear unintended
> > consequences.
> 
> Heh. :) When learning things like this, you need a test box to try
> stuff on. Remember, you can always kill the parent process, and then
> mop up the children, if it starts doing things you don't expect.
> 
> Ben

Agreed, Ben, agreed.  I whipped this script up based on requirements
presented to me by the application owners, and they have no "test"
environment running the target application in which I could test my
script.  So I have to build a dummy copy of the script (without any
real application-specific features) to test on a test box that I have
access to.  But I couldn't build the dummy script until I understood
which functions could be and needed to be tested, and how to test them,
know what I mean?

Thank you very much for your help and commentary so far, and I look
forward to further analysis from you or other newsgroup contributors.

Regards,

John G.


------------------------------

Date: Fri, 21 Nov 2003 01:26:22 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: trying to understand fork and wait
Message-Id: <bpjpju$3a9$1@wisteria.csv.warwick.ac.uk>

jguad98@hotmail.com (John) wrote:
> Ben Morrow <usenet@morrow.me.uk> wrote in message
> news:<bpj1t3$7te$1@wisteria.csv.warwick.ac.uk>...
> > You do need
> > 
> >   use warnings;
> >   use strict;
> 
> I understand that 'use strict' forces 'good programming' but I don't
> know where to find more thorough explanation of the precise 'good
> programming' structures that 'strict' is enforcing ... i.e., I think
> it would be nice to know to some degree what is legal and illegal
> when using 'strict' before I write & execute (even though my
> tendency is usually to write first and debug later. :-)

perldoc strict

> > >   $mypid=$$;
> > 
> >   my $mypid = $$;
> > 
> > and so on throughout.
> 
> is this to say I should use whitespace in or around variable
> assignments?
> that would cost me 2 more keystrokes! (j/k)

:) no, no, the important addition is the 'my'.

> > >   open(PATTERNS,$patternfile);
> > 
> >   open my $PATTERNS, $patternfile or die "can't open pattern file: $!";
> 
> is the use/lack of use of parentheses here a personal style issue,
> or is there a reason for this choice?

Sorry, personal style. It seems to be generally preferred in the Perl
community... but really doesn't matter. The important thing is the 'or
die'.
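A small sketch of why the 'or' matters, for anyone tempted to write
'||' instead. The path is hypothetical and chosen so that the open
fails, and the three-argument form of open is my own choice here rather
than anything from the script under discussion:

```perl
use strict;
use warnings;

my $patternfile = '/no/such/dir/patterns.list';    # hypothetical path

# '||' binds tighter than the comma, so this parses as
#   open(my $fh, '<', ($patternfile || die ...))
# and the die cannot fire for a non-empty filename.
my $checked_with_pipes = eval {
    open my $fh, '<', $patternfile || die "can't open: $!";
    1;
};

# 'or' has the lowest precedence, so it tests the result of open itself.
my $checked_with_or = eval {
    open my $fh, '<', $patternfile or die "can't open $patternfile: $!";
    1;
};

print "with ||: ", ($checked_with_pipes ? "failure went unnoticed" : "died"), "\n";
print "with or: ", ($checked_with_or    ? "failure went unnoticed" : "died"), "\n";
```

With parentheses around the arguments, open(...) || die behaves the
same as the 'or' version, which is one reason both styles are seen.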

> > >   foreach $domain (@domainlist) {
> > 
> >   for my $domain (@domainlist) {
>
> why "for" and not "foreach" ?

Again, simply style... 'for' and 'foreach' are precise synonyms, so I
prefer to save typing :).

> > >           $return=`grep 'known end of session message' $logfile`;
<snip>
> > >           if (! $return) {# grep failed, ergo file is active
> > 
> > grep exits with 0 if it succeeds, so this test is precisely backwards
> > :).
> 
> hmmm... variable $return is being instantiated with output from `grep`
> (note backticks?) so I would expect that if grep fails, there is no 
> output, so $return will be empty.

Sorry, yes, brain not switched on... I would have used

  my $return = system "grep 'known end' $logfile";

which returns the wait status of 'grep': 0 if it found a match, non-zero
(the exit code shifted left by 8 bits) if not, so I assumed you had :).
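To make the difference between the two forms concrete, here is a
minimal sketch. It uses the running perl ($^X) as a stand-in for grep,
purely so that it does not depend on any particular logfile existing;
that substitution is an assumption of the example, not part of the
original script:

```perl
use strict;
use warnings;

# A command that prints nothing and exits with code 1, much as grep
# does when it finds no match.
my @fails = ($^X, '-e', 'exit 1');

my $status = system @fails;    # the wait status, not the bare exit code
my $code   = $status >> 8;     # the child's exit code

# Backticks capture standard output instead of the status.
my $output = `$^X -e "exit 1"`;

print "system returned $status, exit code $code\n";
print "backticks returned ",
    (length $output ? "some output" : "an empty string"), "\n";
```

So testing the backtick result for emptiness and testing the system
result for non-zero amount to the same question asked two ways.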

> Is there a better way to test for the "EOS" string in $logfile?

I would always open the file and grep it in Perl, rather than forking
an external process. This can be as simple as: <untested>

  open my $FILE, "< $file" or die "horribly: $!";
  my @file = <$FILE>;
  my $live = not grep /end of session marker/, @file;
  close $FILE;

[I've re-ordered the text below as I think (hope:) it makes things clearer]

> > >           $SIG{'CHLD'} = sub { wait(); };
> > 
> > When a process exits, it returns a status code (like grep exited with
> > 0 or 1 above). The parent process can collect this by calling wait or
> > waitpid. Until it does, the process has to sit around occupying a slot
> > in the process table, just to keep a hold of the exit code.

> hmmm ... okay, I understand that I need to reap(?)

To 'reap' a child process is to call 'wait' or 'waitpid' on it. The
analogy is with the Grim Reaper collecting up dead men's souls.

> my children to prevent zombies.

A zombie is one of these processes that has died but is still hanging
around to return its exit code.

> But I am still hazy on the mechanism and its effects on the rest of
> the program.  As you see, the main body is designed to loop around
> looking for user logs to read ... if at some loop iteration I find
> an active log and fork the child, I need some tool to wait on the
> child I just spawned.

> The thing that has been bothering me is this: if at loop iteration
> 34 I spawn a child and issue "wait", will my main program continue
> on to loop iteration 35, or does it pause there on loop 34 until the
> child exits to satisfy the wait function?

When you call wait, your process stops until a child dies. So in your
program, the main parent loop never wants to call wait: that would
stop the loop, which isn't what you want.

This is what the CHLD signal is for. Whenever a child process dies,
the parent process is sent SIGCHLD, which means that perl will stop
whatever it is doing and jump (asynchronously) to the piece of code
you put in $SIG{CHLD}. If you put the wait in there, then your parent
process will never try to wait for a child until one has just died,
which means that it'll never hang around waiting for one to die.
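The difference between a wait that blocks and one that does not can be
seen directly with a throwaway script. This is only a sketch: the child
just sleeps for two seconds as a stand-in for a log reader, and WNOHANG
(from the POSIX module) asks waitpid to return at once rather than wait:

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # provides WNOHANG

my $pid = fork;
defined $pid or die "cannot fork: $!";

if ($pid == 0) {    # child: pretend to work, then exit
    sleep 2;
    exit 0;
}

# Non-blocking: returns 0 straight away because the child is still running.
my $early = waitpid($pid, WNOHANG);
print "non-blocking waitpid returned $early\n";

# Blocking: the parent stops here until the child exits, then gets its pid.
my $late = waitpid($pid, 0);
print "blocking waitpid returned $late (child pid was $pid)\n";
```

A plain wait() behaves like the second call, which is why it belongs in
a CHLD handler rather than in the main loop.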

One small caveat is that if several children die 'at the same time',
i.e. another child dies before the parent gets a chance to handle the
SIGCHLD from the first, you will only receive one signal for the
lot. So a CHLD handler has to keep wait()ing for dead children until
they're all gone, which is what

> > the discussion under 'Signals' in perldoc perlipc.

is about. However,

> > The easy way to solve this is $SIG{CHLD} = 'IGNORE';, which says you
> > don't care about exit codes.

 ...i.e. all children will be automatically reaped as soon as they die,
and you don't need to worry about it any more :).

> > # From this point on we have two almost identical copies of the
> > # program running. The only difference is that one has $kidpid set to
> > # 0, the other to some positive number.
> > 
> > if($kidpid) {
> >     # parent
> > } else {
> >  # child
> > }
> > 
> 
> My intent was to aid my own newbie understanding by explicitly testing
> 'kidpid == 0' as opposed to the implicit "exists" test indicated above
> in order to identify the child process.

This is not an 'exists' test. This is a test of truth: a value is
false in Perl if it is undef, 0, "0", or ""; true otherwise. So

  if($kidpid != 0)

is equivalent to

  if($kidpid)

provided $kidpid is always going to be numeric.

> I want to ensure that the children do not step on each other (i.e.,
> no more than one child per userlog),

This is the tricky part: your business with grepping the
logfile *should* ensure that is the case. If it isn't reliable, or
even if it is, you may prefer to keep a hash in the parent program of
the files you have created processes for. So you want

  use POSIX qw/:sys_wait_h/; # for WNOHANG, which says 'don't wait if
                             # we've run out of dead children'
  my (%pids, %files);

  $SIG{CHLD} = sub {
      while( (my $dead = waitpid -1, WNOHANG) > 0 ) {
          delete $pids{$files{$dead}}; # remove the dead child from
          delete $files{$dead};        # our records
      }
  };

at the top (if your system uses SysV signal semantics, see the
aforementioned section of perlipc), replace the 'unless' below with

  if($kidpid) { # parent
      $pids{$file}    = $kidpid; # record the new child
      $files{$kidpid} = $file;
  } 
  else { # child

and then test $pids{$file} to see if you have a child reading that
file. The reason for the two hashes is that we need to go both ways:
from a pid to a file, and from a file to a pid.
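Put together, the bookkeeping might look like the sketch below. The
file names are hypothetical, and each child simply sleeps for two
seconds in place of reading a log; 'a.log' is listed twice to show the
duplicate check:

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my (%pids, %files);    # file => pid, and pid => file

$SIG{CHLD} = sub {
    # Reap every child that has exited, without blocking.
    while ((my $dead = waitpid(-1, WNOHANG)) > 0) {
        next unless exists $files{$dead};
        delete $pids{ $files{$dead} };
        delete $files{$dead};
    }
};

my $started = 0;
for my $file (qw(a.log b.log a.log)) {    # hypothetical log names
    next if $pids{$file};    # a child is already reading this file

    my $kidpid = fork;
    defined $kidpid or die "cannot fork: $!";

    if ($kidpid) {    # parent: record the new child and loop round
        $pids{$file}    = $kidpid;
        $files{$kidpid} = $file;
        $started++;
    }
    else {            # child: stand-in for the log reader
        sleep 2;
        exit 0;
    }
}
print "started $started children\n";

# A signal cuts sleep short, so keep sleeping until every child is reaped.
sleep 1 while %pids;
print "all children reaped\n";
```

One gap this sketch does not close: a child that exits before the
parent has recorded its pid would be reaped without its entry being
removed. Blocking SIGCHLD around the fork (POSIX::sigprocmask) is one
way to deal with that if it matters.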

> and I want to ensure that the one original parent image is the only
> one to spawn children (hence the little quip about no
> grandchildren).

Yes: here, the child process can never escape the 'unless' block
(because of the exit at the end) so it will never fork again.

> > unless($kidpid) {   # if we are the parent, just loop back
>
> so if $kidpid is > 0 we are the parent ... the "unless kidpid" will
> fail if "$kidpid ge 0", am I getting that correct?

Yes, except that you mean $kidpid > 0: ge in Perl is string
comparison, the opposite of shell.

> >     open my $LOG, $file or die "can't open logfile $file: $!";
> > 
> >     while ( (my $line = <LOG>) !~ /known EOS/ ) {

Sorry, typo:                ^^^^^ <$LOG>

> is that "my $line = <LOG>" an implicit loop?

No. The loop is the 'while'. The stuff inside the brackets of the
while is equivalent to

  my $line;             # declare $line
  $line = <$LOG>;       # read the next line from $LOG into $line
  $line !~ /known EOS/; # this is true iff the line doesn't match

so it will keep reading a line at a time into $line until it hits one
which matches.
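The same read-until-marker loop can be tried without a real log by
opening a filehandle on a string. The log contents here are made up,
and the defined() test is my addition: at end of file <$LOG> returns
undef, which does not match the marker either, so the condition as
written above would keep going round once the file runs out.

```perl
use strict;
use warnings;

# An in-memory "file" standing in for the user log (hypothetical contents).
my $text = "user logged in\nerror 42\nknown EOS\nnever read\n";
open my $LOG, '<', \$text or die "can't open in-memory log: $!";

my @seen;
while (defined(my $line = <$LOG>)) {    # stop at end of file...
    last if $line =~ /known EOS/;       # ...or at the end-of-session marker
    chomp $line;
    push @seen, $line;
}
close $LOG;

print "read before the marker: @seen\n";
```

For a log that is still being written, running out of lines is not the
same as the session ending, so a real reader would have to sleep and
retry at that point rather than stop.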

> >         my $now = localtime;
> > 
> >         system "logger $now $domain $user $line" 
> >             for grep { $line =~ /$_/ } @patternlist;
> >         # or you would probably be better off using Sys::Syslog
> >     }
> 
> setting $now on each line of LOG is unnecessary for my purposes

True... you could do

  my $now = localtime, system "..." for grep {...} @patternlist;

if you'd rather, or 

  if( grep {...} @patternlist ) {
      my $now = localtime;
      system "...";
  }

which may be better anyway as the others will make multiple log
entries if more than one pattern matches.
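A quick way to see the difference, with made-up patterns and a made-up
log line, and counters standing in for the calls to logger:

```perl
use strict;
use warnings;

my @patternlist = ('error', 'disk full', 'full');    # hypothetical patterns
my $line        = "disk full on /paf/rpas";

# In list context grep returns the patterns that matched...
my @hits = grep { $line =~ /$_/ } @patternlist;

# ...and in scalar context it returns how many matched.
my $count = grep { $line =~ /$_/ } @patternlist;

my $entries_for = 0;
$entries_for++ for @hits;            # 'for grep': one entry per matching pattern

my $entries_if = $count ? 1 : 0;     # 'if( grep )': one entry if anything matched

print "matching patterns: @hits\n";
print "entries with for: $entries_for, with if: $entries_if\n";
```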

> > >   exit 0;
> > 
> > No need for this: 'falling off the end' is a perfectly valid way to
> > end a Perl program.
> 
> the explicit exit, along with a bit more flower-boxing is a habit I
> developed to ensure that I know where the intended bottom of the
> script is ... I've had bad experiences with cut-n-paste'd code and
> incomplete copies where a hunk of a script has gone missing without
> me realizing it.

The standard way to indicate the end of a Perl script is __END__,
which perl itself will also understand. It makes anything after that
available as the DATA filehandle, so if you really want you can check
it's there at the top of the script with

  die "End of script missing!" unless *DATA{IO};

DATA and the *foo{THING} notation are described in perldoc perldata,
or *foo{THING} rather better in the Camel, chapter 8, the section
entitled 'Symbol table references'. Or you can just treat it as black
magic :).

> > See wait(2).
> 
> Is "wait(2)" different from "wait"?

wait(2) is a standard way to refer to the manpage 'wait' in section 2
(which deals with syscalls). Read it with 'man 2 wait'. The reason I
wrote wait(2) was to show I specifically meant your system's manpage,
rather than the perl documentation.

Ben

-- 
'Deserve [death]? I daresay he did. Many live that deserve death. And some die
that deserve life. Can you give it to them? Then do not be too eager to deal
out death in judgement. For even the very wise cannot see all ends.'
 :-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-:-: ben@morrow.me.uk


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 5838
***************************************

