[10854] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 4455 Volume: 8

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Dec 18 18:07:39 1998

Date: Fri, 18 Dec 98 15:00:19 -0800
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 18 Dec 1998     Volume: 8 Number: 4455

Today's topics:
    Re: $&, $', and $` and parens.... (Larry Rosler)
    Re: ($e_mail !~ /\w+[-\w]*\@\w+[-\w]*\.\w+/) (Richard J. Rauenzahn)
    Re: Close and ReOpen File (Tad McClellan)
    Re: encoding URL <gellyfish@btinternet.com>
    Re: extracting domain names <gellyfish@btinternet.com>
        FMTEYEWTK on open <tchrist@mox.perl.com>
    Re: Help .... looking for /etc/group parser <gellyfish@btinternet.com>
        I know this sounds stupid... (Devin Redlich)
        Special: Digest Administrivia (Last modified: 12 Dec 98 (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 18 Dec 1998 12:47:53 -0800
From: lr@hpl.hp.com (Larry Rosler)
Subject: Re: $&, $', and $` and parens....
Message-Id: <MPG.10e45f565c38dfc59898da@nntp.hpl.hp.com>

[Posted to comp.lang.perl.misc and a copy mailed.]

In article <367ab3ce.258641@news.skynet.be> on Fri, 18 Dec 1998 20:05:42 
GMT, Bart Lateur <bart.lateur@skynet.be> says...
 ...
> So the simple regex + use of ($`,$&,$') is 2 to 3 times faster than the
> far more complicated (from the point of view of the regex engine!) one.
> 
> And the penaly for using those special variables does exist, but the
> speed drop appears to be around 1% or less. Oh dear. What were we
> loosing sleep over.

Before I change any of my code, I'd like to hear more, from those who 
have repeatedly imprecated against the special variables.  It is 
possible that improvements in the regex engine have obsoleted those 
warnings.
 
> BTW it's the numbers between the parens that are important, right? But I
> don't get why there's such a sometimes extreme difference between CPU
> time and total time.

It has to do with sharing the processor on a loaded webserver.  
Sometimes a CPU-bound process takes far more total time than even those 
numbers show!  (The first set of benchmarks was on my very own Windows 
NT system, with no one else competing for the CPU.  Note that the 
Benchmark module actually uses a different output format for non-Unix 
systems.)

-- 
(Just Another Larry) Rosler
Hewlett-Packard Company
http://www.hpl.hp.com/personal/Larry_Rosler/
lr@hpl.hp.com


------------------------------

Date: 18 Dec 1998 22:28:06 GMT
From: nospam@hairball.cup.hp.com (Richard J. Rauenzahn)
Subject: Re: ($e_mail !~ /\w+[-\w]*\@\w+[-\w]*\.\w+/)
Message-Id: <914020079.760322@hpvablab.cup.hp.com>

emclean@slip.net (Emmett McLean) writes:
>
> I have a valid email address but nothing ever came.
>

Still no FAQ?  That's strange.  I received one just last week when
posting from a new address.

[...]
> Now, when I contributed on comp.lang.apl there was a lot of sad feelings
> that APL is/was dying. Some people felt encouraged and pleased to see
> that new people were using the language. It didn't matter so much that
> you asked a FAQ, it mattered that you were commited to learning J.

I think there's a matter of scale that changes the personality of this
group.  I checked out the APL group and noticed the number of posts is
quite small -- this group has at least 10 times the posts, and I was
caught up just a few days ago.  

When there are a lot more newbies asking FAQ's and people answering them
instead of pointing to the FAQ's (and adding no new ideas and _lots_ of
errors) a lot of noise is generated that makes it harder to find the
posts that are not FAQ's.

The experts could ignore the uninteresting posts because they're FAQ's,
but then the non-experts try to answer them, often incorrectly, even
though the answer is clearly documented in the FAQ after having been
looked at by 100's of critical eyes.

Rich
-- 
Rich Rauenzahn ----------+xrrauenza@cup.hp.comx+ Hewlett-Packard Company
Technical Consultant     | I speak for me,     |   19055 Pruneridge Ave. 
Development Alliances Lab|            *not* HP |                MS 46TU2
IASD / Enterprise Systems Group +--------------+---- Cupertino, CA 95014


------------------------------

Date: Fri, 18 Dec 1998 15:53:14 -0600
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Close and ReOpen File
Message-Id: <asie57.rpu.ln@magna.metronet.com>

Bart Lateur (bart.lateur@skynet.be) wrote:
: Tad McClellan wrote:

: >{
: >   local $/ = '';  # paragraph mode

: Is that the empty string? You don't need that. undef is just as
: acceptable. 


   it is just as acceptable if you don't mind the different semantics...


: So "local $/" is all you need. ("Look ma, no warnings!")

: >   while (defined ($para = <>)) {
: >      # do stuff
: >   }
: >}

: Urm... Won't the WHOLE file be read in one statement?


   It won't if you do what I did.

   It will if you do what you did.

   ;-)

   para mode ( when $/ is the empty string) is distinct from 
   slurp mode ( when $/ is undefined)


--
    Tad McClellan                          SGML Consulting
    tadmc@metronet.com                     Perl programming
    Fort Worth, Texas


------------------------------

Date: 18 Dec 1998 19:44:38 -0000
From: Jonathan Stowe <gellyfish@btinternet.com>
Subject: Re: encoding URL
Message-Id: <75ebb6$bd$1@gellyfish.btinternet.com>

On Thu, 10 Dec 1998 13:09:03 GMT umsee <NOSPAM_umsee@microasia.com> wrote:
> I would like to encode a string so that I can pass the string to perl using 
> GET method.
> 
> i.e convert all spaces to '+'
<snip>

Bear in mind that this conversion is only a *convention*: to rigidly comply
with the specifications spaces should be converted to %20 just as any other
'special' character - you might want to read the relevant RFC for some 
clarification (rfc1945 I think) .

/J\
-- 
Jonathan Stowe <jns@btinternet.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>


------------------------------

Date: 18 Dec 1998 22:35:44 -0000
From: Jonathan Stowe <gellyfish@btinternet.com>
Subject: Re: extracting domain names
Message-Id: <75elc0$e9$1@gellyfish.btinternet.com>

On 17 Dec 1998 18:42:14 GMT slidge@tugger.net wrote:
> 
> 
> -- 
> slidge
> slidge@tugger.net
> See http://tugger.net/slidge for the fee schedule to use my email address.

Hmm,
and the question is :-?

/J\
-- 
Jonathan Stowe <jns@btinternet.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>


------------------------------

Date: 18 Dec 1998 20:49:25 GMT
From: Tom Christiansen <tchrist@mox.perl.com>
Subject: FMTEYEWTK on open
Message-Id: <75ef4l$1cg$1@csnews.cs.colorado.edu>

    [updated text to supersede previous incarnation]

=head1 NAME

perlopentut - the basics on opening things in Perl

=head1 DESCRIPTION

Perl has two simple, built-in ways to open files: the shell way for
convenience, and the C way for precision.  The choice is yours.

=head1 Open E<agrave> la shell

Perl's C<open> function was designed to mimic the way command-line
redirection in the shell works.  Here are some basic examples
from the shell:

    $ myprogram file1 file2 file3
    $ myprogram    <  inputfile
    $ myprogram    >  outputfile
    $ myprogram    >> outputfile
    $ myprogram    |  otherprogram 
    $ otherprogram |  myprogram

And here are some more advanced examples:

    $ otherprogram      | myprogram f1 - f2
    $ otherprogram 2>&1 | myprogram -
    $ myprogram     <&3
    $ myprogram     >&4

Programmers accustomed to constructs like those above can take comfort
in learning that Perl directly supports these familiar constructs using
virtually the same syntax as the shell.

=head2 Simple Opens

The C<open> function takes two arguments: the first is a filehandle,
and the second is a single string comprising both what to open and how
to open it.  C<open> returns true when it works, and when it fails,
returns a false value and sets the special variable $! to reflect
the system error.

For example:

    open(INFO,      "datafile") || die("can't open datafile: $!");
    open(INFO,   "<  datafile") || die("can't open datafile: $!");
    open(RESULTS,">  runstats") || die("can't open runstats: $!");
    open(LOG,    ">> logfile ") || die("can't open logfile:  $!");

If you prefer the low-punctuation version, you could write that this way:

    open INFO,   "<  datafile"  or die "can't open datafile: $!";
    open RESULTS,">  runstats"  or die "can't open runstats: $!";
    open LOG,    ">> logfile "  or die "can't open logfile:  $!";

A few things to notice.  First, the leading less-than is optional.
If omitted, Perl assumes that you want to open the file for reading.

Next, a filehandle is not really directly assignable, but you can
look at the FMTEYEWTK on Indirect Filehandles at language.perl.com for
alternative strategies.  Instead, you select a filehandle by choosing
an identifier.  This is a symbol much like a subroutine name, and is in
the package namespace.  By convention, we don't use all lower-case for
fear of clashing with a reserved word.  For example, if we had used C<log>
instead of C<LOG>, the best we could hope for is some very strange errors
as perl replaces that symbol with the result of calculating the natural
logarithm of the value currently stored in $_.  Not a pretty picture.

The other important thing to notice is that, just as in the shell,
any white space before or after the filename is ignored.  This is good,
because you wouldn't want these to do different things:

    open INFO,   "<datafile"   
    open INFO,   "< datafile" 
    open INFO,   "<  datafile"

It also helps for when you read a filename in from a different
file, and forget to trim it before opening:

    $filename = <INFO>;         # oops, \n still there
    open(EXTRA, "< $filename") || die "can't open $filename: $!";

This is not a bug, but a feature.  Because C<open> mimics the shell in
its style of using redirection arrows to specify how to open the file,
it follows the Principle of Least Surprise with respect to extra white
space around the filename itself as well.  For accessing files with
naughty names, see L</"Dispelling the Dweomer">.

=head2 Pipe Opens

In C, when you want to open a file using the standard I/O library,
you use the C<fopen> function, but when opening a pipe, you use the
C<popen> function.  But in the shell, you just use a different redirection
character.  That's also the case for Perl.  The C<open> call 
remains the same--just its argument differs.  

If the leading character is a pipe symbol, you start up a new command and
open a write-only filehandle leading into that command.  This lets you
write into that handle and have what you write show up on that command's
standard input.  For example:

    open(PRINTER, "| lpr -Plp1")    || die "cannot fork: $!";
    print PRINTER "stuff\n";
    close(PRINTER)                  || die "can't close lpr: $!";

If the trailing character is a pipe, you start up a new command and open a
read-only filehandle leading out of that command.  This lets whatever that
command writes to its standard output show up on your handle for reading.
For example:

    open(NET, "netstat -i -n |")    || die "cannot fork: $!";
    while (<NET>) { }               # do something with input
    close(NET)                      || die "can't close netstat: $!";

What happens if you try to open a pipe to or from a non-existent command?
In most systems, such an C<open> will not return an error. That's
because in the traditional C<fork>/C<exec> model, running the other
program happens only in the forked child process, which means that
the failed C<exec> can't be reflected in the return value of C<open>.
Only a failed C<fork> shows up there.  See L<perlfaq8/"Why doesn't open()
return an error when a pipe open fails?"> to see how to cope with this.
There's also an explanation in L<perlipc>.

If you would like to open a bidirectional pipe, the IPC::Open2
library will handle this for you.  Check out L<perlipc/"Bidirectional
Communication with Another Process">

=head2 The Minus File

Again following the lead of the standard shell utilities, Perl's
C<open> function treats a file whose name is a single minus, "-", in a
special way.  If you open minus for reading, it really means to access
the standard input.  If you open minus for writing, it really means to
access the standard output.

If minus can be used as the default input or default output?  What happens
if you open a pipe into or out of minus?  What's the default command it
would run?  The same script as you're current running!  This is actually
a stealth C<fork> hidden inside an C<open> call.  See L<perlipc/"Safe Pipe
Opens"> for details.

=head2 Mixing Reads and Writes

It is possible to specify both read and write access.  All you do is
add a "+" symbol in front of the redirection.  But as in the shell,
using a less-than on a file never creates a new file; it only opens an
existing one.  On the other hand, using a greater-than always clobbers
(truncates to zero length) an existing file, or creates a brand-new one
if there isn't an old one.  Adding a "+" for read-write doesn't affect
whether it only works on existing files or always clobbers existing ones.

    open(WTMP, "+< /usr/adm/wtmp") 
        || die "can't open /usr/adm/wtmp: $!";

    open(SCREEN, "+> /tmp/lkscreen")
        || die "can't open /tmp/lkscreen: $!";

    open(LOGFILE, "+>> /tmp/applog"
        || die "can't open /tmp/applog: $!";

The first one won't create a new file, and the second one will always
clobber an old one.  The third one will create a new file if necessary
and not clobber an old one, and it will allow you to read at any point
in the file, but all writes will always go to the end.  In short, the
first case is substantially more common than the second and third cases,
which are almost always wrong.

In fact, when it comes to updating a file, unless you're working on
a binary file as in the WTMP case above, you probably don't want to
use this approach for updating.  Instead, Perl's B<-i> flag comes to
the rescue.  The following command takes all the C, C++, or yacc source
or header files and changes all their foo's to bar's, leaving
the old version in the original file name with a ".orig" tacked
on the end:

    $ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]

This is a short cut for some renaming games that are really
the best way to update textfiles.  See the second question in 
L<perlfaq5> for more details.

=head2 Filters 

One of the most common uses for C<open> is one you never
even notice.  When you process the ARGV filehandle using
C<E<lt>ARGVE<gt>>, Perl actually does an implicit open 
on each file in @ARGV.  Thus a program called like this:

    $ myprogram file1 file2 file3

Can have all its files opened and processed one at a time
using a construct no more complex than:

    while (<>) {
        # do something with $_
    } 

If @ARGV is empty when the loop first begins, Perl pretends you've opened
up minus, that is, the standard input.  In fact, $ARGV, the currently
open file during C<E<lt>ARGVE<gt>> processing, is even set to "-"
in these circumstances.

You are welcome to pre-process your @ARGV before starting the loop to
make sure it's to your liking.  One reason to do this might be to remove
command options beginning with a minus.  While you can always roll the
simple ones by hand, the Getopts modules are good for this.

    use Getopt::Std;

    # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
    getopts("vDo:");            

    # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
    getopts("vDo:", \%args);    

Or the standard Getopt::Long module to permit named arguments:

    use Getopt::Long;
    GetOptions( "verbose"  => \$verbose,        # --verbose
                "Debug"    => \$debug,          # --Debug
                "output=s" => \$output );       
	    # --output=somestring or --output somestring

Another reason for preprocessing arguments is to make an empty
argument list default to all files:

    @ARGV = glob("*") unless @ARGV;

You could even filter out all but plain, text files.  This is a bit
silent, of course, and you might prefer to mention them on the way.

    @ARGV = grep { -f && -T } @ARGV;

If you're using the B<-n> or B<-p> command-line options, you
should put changes to @ARGV in a C<BEGIN{}> block.

Remember that a normal C<open> has special properties, in that it might
call fopen(3S) or it might called popen(3S), depending on what its
argument looks like; that's why it's sometimes called "magic open".
Here's an example:

    $pwdinfo = `domainname` =~ /^(\(none\))?$/
                    ? '< /etc/passwd'
                    : 'ypcat passwd |';

    open(PWD, $pwdinfo)                 
                or die "can't open $pwdinfo: $!";

This sort of thing also comes into play in filter processing.  Because
C<E<lt>ARGVE<gt>> processing employs the normal, shell-style Perl C<open>,
it respects all the special things we've already seen:

    $ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile

That program will read from the file F<f1>, the process F<cmd1>, standard
input (F<tmpfile> in this case), the F<f2> file, the F<cmd2> command,
and finally the F<f3> file.

Yes, this also means that if you have a file named "-" (and so on) in
your directory, that they won't be processed as literal files by C<open>.
You'll need to pass them as "./-" much as you would for the I<rm> program.
Or you could use C<sysopen> as described below.

One of the more interesting applications is to change files of a certain
name into pipes.  For example, to autoprocess gzipped or compressed
files by decompressing them with I<gzip>:

    @ARGV = map { /^\.(gz|Z)$/ ? "gzip -dc $_ |" : $_  } @ARGV;

Or, if you have the I<GET> program installed from LWP,
you can fetch URLs before processing them:

    @ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;

It's not for nothing that this is called magic C<E<lt>ARGVE<gt>>.
Pretty nifty, eh?

=head1 Open E<agrave> la C

If you want the convenience of the shell, then Perl's C<open> is
definitely the way to go.  On the other hand, if you want finer precision
than C's simplistic fopen(3S) provides, then you should look to Perl's
C<sysopen>, which is a direct hook into the open(2) system call.
That does mean it's a bit more involved, but that's the price of 
precision.

C<sysopen> takes 3 (or 4) arguments.

    sysopen HANDLE, PATH, FLAGS, [MASK]

The HANDLE argument is a filehandle just as with C<open>.  The PATH is
a literal path, one that doesn't pay attention to any greater-thans or
less-thans or pipes or minuses, nor ignore white space.  If it's there,
it's part of the path.  The FLAGS argument contains one or more values
derived from the Fcntl module that have been or'd together using the
bitwise "|" operator.  The final argument, the MASK, is optional; if
present, it is combined with the user's current umask for the creation
mode of the file.  You should usually omit this.

Although the traditional values of read-only, write-only, and read-write
are 0, 1, and 2 respectively, this is known not to hold true on some
systems.  Instead, it's best to load in the appropriate constants first
from the Fcntl module, which supplies the following standard flags:

    O_RDONLY            Read only
    O_WRONLY            Write only
    O_RDWR              Read and write
    O_CREAT             Create the file if it doesn't exist
    O_EXCL              Fail if the file already exists
    O_APPEND            Append to the file
    O_TRUNC             Truncate the file
    O_NONBLOCK          Non-blocking access

Less common flags that are sometimes available on some operating systems
include C<O_BINARY>, C<O_TEXT>, C<O_SHLOCK>, C<O_EXLOCK>, C<O_DEFER>,
C<O_SYNC>, C<O_ASYNC>, C<O_DSYNC>, C<O_RSYNC>, C<O_NOCTTY>, C<O_NDELAY>
and C<O_LARGEFILE>.  Consult your open(2) manpage or its local equivalent
for details.

Here's how to use C<sysopen> to emulate the simple C<open> calls we had
before.  We'll omit the C<|| die $!> checks for clarity, but make sure
you always check the return values in real code.  These aren't quite
the same, since C<open> will trim leading and trailing white space,
but you'll get the idea:

To open a file for reading:

    open(FH, "< $path");
    sysopen(FH, $path, O_RDONLY);

To open a file for writing, creating a new file if needed or else truncating
an old file:

    open(FH, "> $path");
    sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);

To open a file for appending, creating one if necessary:

    open(FH, ">> $path");
    sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);

To open a file for update, where the file must already exist:

    open(FH, "+< $path");
    sysopen(FH, $path, O_RDWR);

And here are things you can do with C<sysopen> that you cannot do with
a regular C<open>.  As you see, it's just a matter of controlling the
flags in the third argument.

To open a file for writing, creating a new file which must not previously
exist:

    sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);

To open a file for appending, where that file must already exist:

    sysopen(FH, $path, O_WRONLY | O_APPEND);

To open a file for update, creating a new file if necessary:

    sysopen(FH, $path, O_RDWR | O_CREAT);

To open a file for update, where that file must not already exist:

    sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);

To open a file without blocking, creating one if necessary:

    sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);

=head2 Permissions E<agrave> la mode

If you omit the MASK argument to C<sysopen>, Perl uses the octal value
0666.  The normal MASK to use for executables and directories should
be 0777, and for anything else, 0666.

Why so permissive?  Well, it isn't really.  The MASK will be modified
by your process's current C<umask>.  A umask is a number representing
I<disabled> permissions bits; that is, bits that will not be turned on
in the created files' permissions field.

For example, if your C<umask> were 027, then the 020 part would
disable the group from writing, and the 007 part would disable others
from reading, writing, or executing.  Under these conditions, passing
C<sysopen> 0666 would create a file with mode 0640, since C<0666 &~ 027>
is 0640.

You should seldom use the MASK argument to C<sysopen()>.  That takes
away the user's freedom to choose what permission new files will have.
Denying choice is almost always a bad thing.  One exception would be for
cases where sensitive or private data is being stored, such as with mail
folders or cookie files.

=head1 Obscure Open Tricks

=head2 Re-Opening Files (dups)

Sometimes you already have a filehandle open, and want to make another
handle that's a duplicate of the first one.  In the shell, we place an
ampersand in front of a file descriptor number when doing redirections.
For example, C<2E<gt>&1> makes descriptor 2 (that's STDERR in Perl)
be redirected into descriptor 1 (which is usually Perl's STDOUT).
The same is essentially true in Perl: a filename that begins with an
ampersand is treated instead as a file descriptor if a number, or as a
filehandle if a string.

    open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
    open(MHCONTEXT, "<&4")     || die "couldn't dup fd4: $!";

That means that if a function is expecting a filename, but you don't
want to give it a filename because you already have the file open, you
can just pass the filehandle with a leading ampersand.  It's best to
use a fully qualified handle though, just in case the function happens
to be in a different package:

    somefunction("&main::LOGFILE");

This way if somefunction() is planning on opening its argument, it can
just use the already opened handle.  This differs from passing a handle,
because with a handle, you don't open the file.  Here you have something
you can pass to open.

If you have one of those tricky, newfangled I/O objects that the C++
are raving about, then this doesn't work because those aren't a proper
filehandle in the native Perl sense.  You'll have to use fileno() to
pull out the proper descriptor number, assuming you can:

    use IO::Socket;
    $handle = IO::Socket::INET->new("www.perl.com:80");
    $fd = $handle->fileno;
    somefunction("&$fd");  # not an indirect function call

It can be easier (and certainly will be faster) just to use real
filehandles though:

    use IO::Socket;
    local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
    die "can't connect" unless defined(fileno(REMOTE));
    somefunction("&main::REMOTE");

If the filehandle or descriptor number is preceded not just with a simple
"&" but rather with a "&=" combination, then Perl will not create a
completely new descriptor opened to the same place using the dup(2)
system call.  Instead, it will just make something of an alias to the
exist oneusing the fdopen(3S) library call  This is slightly more
parsimonious of systems resources, although this is less a concern
these days.  Here's an example of that:

    $fd = $ENV{"MHCONTEXTFD"};
    open(MHCONTEXT, "<&=$fd")   or die "couldn't fdopen $fd: $!";

If you're using magic C<E<lt>ARGVE<gt>>, you could even pass in as a
command line argument in @ARGV something like C<"E<lt>&=$MHCONTEXTFD">,
but we've never seen anyone actually do this.

=head2 Dispelling the Dweomer

Perl is more of a DWIMmer language than something like Java--where DWIM
is an acronym for "do what I mean".  But this principle sometimes leads
to more hidden magic than one knows what to do with.  In this way, Perl
is also filled with I<dweomer>, an obscure word meaning an enchantment.
Sometimes, Perl's DWIMmer is just too much like dweomer for comfort.

If magic C<open> is a bit too magical for you, you don't have to turn
to C<sysopen>.  To open a file with arbitrary weird characters in
it, it's necessary to protect any leading and trailing whitespace.
Leading whitespace is protected by inserting a C<"./"> in front of a
filename that starts with whitespace.  Trailing whitespace is protected
by appending an ASCII NUL byte (C<"\0">) at the end off the string.

    $file =~ s#^(\s)#./$1#;
    open(FH, "< $file\0")   || die "can't open $file: $!";

This assumes, of course, that your system considers dot the current
working directory, slash the directory separator, and disallows ASCII
NULs within a valid filename.  Most systems follow these conventions,
including all POSIX systems as well as proprietary Microsoft systems.
The only vaguely popular system that doesn't work this way is the
proprietary Macintosh system, which uses a colon where the rest of us
use a slash.  Maybe C<sysopen> isn't such a bad idea after all.

If you want to use C<E<lt>ARGVE<gt>> processing in a totally boring
and non-magical way, you could do this first:

    #   "Sam sat on the ground and put his head in his hands.  
    #   'I wish I had never come here, and I don't want to see 
    #   no more magic,' he said, and fell silent."
    for (@ARGV) { 
        s#^([^./])#./$1#;
        $_ .= "\0";
    } 
    while (<>) {  
        # now process $_
    } 

But be warned that users will not appreciate being unable to use "-"
to mean standard input, per the standard convention.

=head2 Paths as Opens

You've probably noticed how Perl's C<warn> and C<die> functions can
produce messages like:

    Some warning at scriptname line 29, <FH> chunk 7.

That's because you opened a filehandle FH, and had read in seven records
from it.  But what was the name of the file, not the handle?

If you aren't running with C<strict refs>, or if you've turn them off
temporarily, then all you have to do is this:

    open($path, "< $path") || die "can't open $path: $!";
    while (<$path>) {
        # whatever
    } 

Since you're using the pathname of the file as its handle,
you'll get warnings more like

    Some warning at scriptname line 29, </etc/motd> chunk 7.

=head2 Single Argument Open

Remember how we said that Perl's open took two arguments?  That was a
passive prevarication.  You see, it can also take just one argument.
If and only if the variable is a global variable, not a lexical, you
can pass C<open> just one argument, the filehandle, and it will 
get the path from the global scalar variable of the same name.

    $FILE = "/etc/motd";
    open FILE or die "can't open $FILE: $!";
    while (<FILE>) {
        # whatever
    } 

Why is this here?  Someone has to cater to the hysterical porpoises.
It's something that's been in Perl since the very beginning, if not
before.

=head2 Playing with STDIN and STDOUT

One clever move with STDOUT is to explicitly close it when you're done
with the program.

    END { close(STDOUT) || die "can't close stdout: $!" }

If you don't do this, and your program fills up the disk partition due
to a command line redirection, it won't report the error exit with a
failure status.

You don't have to accept the STDIN and STDOUT you were given.  You are
welcome to reopen them if you'd like.

    open(STDIN, "< datafile")
	|| die "can't open datafile: $!";

    open(STDOUT, "> output")
	|| die "can't open output: $!";

And then these can be read directly or passed on to subprocesses.
This makes it look as though the program were initially invoked
with those redirections from the command line.

It's probably more interesting to connect these to pipes.  For example:

    $pager = $ENV{PAGER} || "(less || more)";
    open(STDOUT, "| $pager")
	|| die "can't fork a pager: $!";

This makes it appear as though your program were called with its stdout
already piped into your pager.  You can also use this kind of thing
in conjunction with an implicit fork to yourself.  You might do this
if you would rather handle the post processing in your own program,
just in a different process:

    head(100);
    while (<>) {
        print;
    } 

    sub head {
        my $lines = shift || 20;
        return unless $pid = open(STDOUT, "|-");
        die "cannot fork: $!" unless defined $pid;
        while (<STDIN>) {
            print;
            last if --$lines < 0;
        } 
        exit;
    } 

This technique can be applied to repeatedly push as many filters on your
output stream as you wish.

=head1 Other I/O Issues

These topics aren't really arguments related to C<open> or C<sysopen>,
but they do affect what you do with your open files.

=head2 Opening Non-File Files

When is a file not a file?  Well, you could say when it exists but
isn't a plain file.   We'll check whether it's a symbolic link first,
just in case.

    if (-l $file || ! -f _) {
        print "$file is not a plain file\n";
    } 

What other kinds of files are there than, well, files?  Directories,
symbolic links, named pipes, Unix-domain sockets, and block and character
devices.  Those are all files, too--just not I<plain> files.  This isn't
the same issue as being a text file. Not all text files are plain files.
Not all plain files are textfiles.  That's why there are separate C<-f>
and C<-T> file tests.

To open a directory, you should use the C<opendir> function, then
process it with C<readdir>, carefully restoring the directory 
name if necessary:

    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    while (defined($file = readdir(DIR))) {
        # do something with "$dirname/$file"
    }
    closedir(DIR);

If you want to process directories recursively, it's better
to use the File::Find module.  For example, this prints out
all files recursively, add adds a slash to their names if
the file is a directory.

    @ARGV = qw(.) unless @ARGV;
    use File::Find;
    find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;

To open a symbolic link, you can just pretend it is what
it points to.  Or, if you want to know I<what> it points
to, then C<readlink> is called for:

    if (-l $file) {
        if (defined($whither = readlink($file))) {
            print "$file points to $whither\n";
        } else {
            print "$file points nowhere: $!\n";
        } 
    } 

Named pipes are a different matter.  You pretend they're regular files,
but their opens will normally block until there is both a reader and
a writer.  You can read more about them in L<perlipc/"Named Pipes">.
Unix-domain sockets are rather different beasts as well; they're
described in L<perlipc/"Unix-Domain TCP Clients and Servers">.

When it comes to opening devices, it can be easy and it can tricky.
We'll assume that if you're opening up a block device, you know what
you're doing.  The character devices are more interesting.  These are
typically used for modems, mice, and some kinds of printers.  This is
described in L<perlfaq8/"How do I read and write the serial port?">
It's often enough to open them carefully:

    sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
        or die "can't open /dev/ttyS1: $!";
    open(TTYOUT, "+>&TTYIN)
        or die "can't dup TTYIN: $!";

    $ofh = select(TTYOUT); $| = 1; select($ofh);

    print TTYOUT "+++at\015";
    $answer = <TTYIN>;

What else can you open?  To open a connection using sockets, you won't use
one of Perl's two open functions.  See L<perlipc/"Sockets: Client/Server
Communication"> for that.  Here's an example.  Once you have it,
you can use FH as a bidirectional filehandle.

    use IO::Socket;
    local *FH = IO::Socket::INET->new("www.perl.com:80");

For opening up a URL, the LWP modules from CPAN are just what
the doctor ordered.  There's no filehandle interface, but
it's still easy to get the contents of a document:

    use LWP::Simple;
    $doc = get('http://www.sn.no/libwww-perl/');

=head2 Binary Files

On certain legacy systems with terminally convoluted (some would say
broken) I/O models, a file isn't a file--at least, not with respect to
the C standard I/O library.  On these old systems whose libraries (but
not kernels) distinguish between text and binary streams, to get files
to behave properly you'll have to bend over backwards to avoid nasty
problems.  On such infelicitous systems, sockets and pipes are already
opened in binary mode, and there is currently no way to turn that off.
With files, you have more options.

Another option is to use the C<binmode> function on the appropriate
handles before doing regular I/O on them:

    binmode(STDIN);
    binmode(STDOUT);
    while (<STDIN>) { print } 

Passing C<sysopen> a non-standard flag option will also open the file in
binary mode on those systems that support it.  This is the equivalent of
opening the file normally, then calling C<binmode>ing on the handle.

    sysopen(BINDAT, "records.data", O_RDWR | O_BINARY)
        || die "can't open records.data: $!";

Now you can use C<read> and C<print> on that handle without worrying
about the system non-standard I/O library breaking your data.  It's not
a pretty picture, but then, legacy systems seldom are.  CP/M will be
with us until the end of days, and after.

On systems with exotic I/O systems, it turns out that, astonishingly
enough, even unbuffered I/O using C<sysread> and C<syswrite> might do
sneaky data mutilation behind your back.

    while (sysread(WHENCE, $buf, 1024)) {
        syswrite(WHITHER, $buf, length($buf));
    } 

Depending on the vicissitudes of your runtime system, even these calls
may need C<binmode> or C<O_BINARY> first.  Systems known to be free of
such difficulties includes Unix, the Mac OS, Plan9, and Inferno.

=head2 File Locking

In a multitasking environment, you may need to be careful not to collide
with other processes who want to do I/O on the same files as others
are working on.  You'll often need shared or exclusive locks
on files for reading and writing respectively.  You might just
pretend that only exclusive locks exist.

Never use the existence of a file C<-e $file> as a locking indication,
because there is a race condition between the test for the existence of
the file and its creation.  Atomicity is critical.

Perl's most portable locking interface is via the C<flock> function,
whose simplicity is emulated on systems that don't directly support it,
such as SysV or WindowsNT.  The underlying semantics may affect how
it all works, so you should learn how C<flock> is implemented on your
system's port of Perl.

File locking I<does not> lock out another process that would like to
do I/O.  A file lock only locks out others trying to get a lock, not
processes trying to do I/O.  Because locks are advisory, if one process
uses locking and another doesn't, all bets are off.

By default, the C<flock> call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is no
exclusive locker.  A request for a exclusive lock will be granted as
soon as there is no locker of any kind.  Locks are on file descriptors,
not file names.  You can't lock a file until you open it, and you can't
hold on to a lock once the file has been closed.

Here's how to get a blocking shared lock on a file, typically used
for reading:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename")
        or die "can't open filename: $!";
    flock(FH, LOCK_SH)
        or die "can't lock filename: $!";
    # now read from FH

You can get a non-blocking lock by using C<LOCK_NB>.

    flock(FH, LOCK_SH | LOCK_NB)
        or die "can't lock filename: $!";

This can be useful for producing more user-friendly behaviour by warning
if you're going to be blocking:

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    open(FH, "< filename") or die "can't open filename: $!";
    unless (flock(FH, LOCK_SH | LOCK_NB)) {
	$| = 1;
	print "Waiting for lock...";
	flock(FH, LOCK_SH) or die "can't lock filename: $!";
	print "got it.\n"
    } 
    # now read from FH

To get an exclusive lock, typically used for writing, you have to be
careful.  We C<sysopen> the file so it can be locked before it gets
emptied.  You can get a nonblocking version using C<LOCK_EX | LOCK_NB>.

    use 5.004;
    use Fcntl qw(:DEFAULT :flock);
    sysopen(FH, "filename", O_WRONLY | O_CREAT)
        or die "can't open filename: $!";
    flock(FH, LOCK_EX)
        or die "can't lock filename: $!";
    truncate(FH, 0)
        or die "can't truncate filename: $!";
    # now write to FH

Finally, because everyone wants to know how to update a file in place
just because they cannot be dissuaded from wasting cycles on useless
hit counters, here's how you increment a number in a file:

    use Fcntl qw(:DEFAULT :flock);

    sysopen(FH, "numfile", O_RDWR | O_CREAT)
        or die "can't open numfile: $!";
    # autoflush FH
    $ofh = select(FH); $| = 1; select ($ofh);
    flock(FH, LOCK_EX)
        or die "can't write-lock numfile: $!";

    $num = <FH> || 0;
    seek(FH, 0, 0)
        or die "can't rewind numfile : $!";
    print FH $num+1, "\n"
        or die "can't write numfile: $!";

    truncate(FH, tell(FH))
        or die "can't truncate numfile: $!";
    close(FH)
        or die "can't close numfile: $!";

=head2 POSIX Tty Handling

[* Does this section belong in this document? *]

Rather than losing yourself in a morass of twisting, turning C<ioctl>s,
all dissimilar, if you're going to manipulate ttys, it's best to
make calls out to the stty(1) program if you have it, or else use the
portable POSIX interface.  To figure this all out, you'll need to read the
termios(3) manpage, which describes the POSIX interface to tty devices,
and then L<POSIX>, which describes Perl's interface to POSIX.

For example, here we'll set the speed on particular tty:

    use POSIX qw(:termios_h);

    $term = POSIX::Termios->new;
    sysopen(CUA, "+</dev/cua1", O_RDWR | O_NDELAY | O_NOCTTY)
        or die "can't open /dev/cua1: $!";
    $fd = fileno(CUA);
    $term->getattr($fd));
    $term->setospeed(B9600);
    $term->setispeed(B9600);
    $term->setattr($fd, TCSANOW);

Here's a longer demo of manipulating termios to play games with the the
erase and kill characters:

    use POSIX qw(:termios_h);

    $term = POSIX::Termios->new;
    $term->getattr(fileno(STDIN));

    $erase = $term->getcc(VERASE);
    $kill = $term->getcc(VKILL);
    printf "Erase is character %d, %s\n", $erase, uncontrol(chr($erase));
    printf "Kill is character %d, %s\n", $kill, uncontrol(chr($kill));

    $term->setcc(VERASE, ord('#'));
    $term->setcc(VKILL, ord('@'));
    $term->setattr(1, TCSANOW);

    print("erase is #, kill is @; type something: ");
    $line = <STDIN>;
    print "You typed: $line";

    $term->setcc(VERASE, $erase);
    $term->setcc(VKILL, $kill);
    $term->setattr(1, TCSANOW);

    sub uncontrol {
        local $_ = shift;
        s/([\200-\377])/sprintf("M-%c",ord($1) & 0177)/eg;
        s/([\0-\37\177])/sprintf("^%c",ord($1) ^ 0100)/eg;
        return $_;
    } 

And here's an example for manipulating single character input
without echo and without waiting for a carriage return:

    $| = 1;
    while (1) {
        print "Enter `q' to quit: ";
        $key = getone();  
        last if ($key eq 'q');
        print "$key is not it\n";
    } 
    print "\nbye!\n";

    BEGIN {
       use POSIX qw(:termios_h);

       my ($term, $oterm, $echo, $noecho, $fd_tty);

       open(TTY, "+< /dev/tty") || die "no /dev/tty access: $!";

       $fd_tty = fileno(TTY);

       $term     = POSIX::Termios->new();
       $term->getattr($fd_tty);
       $oterm     = $term->getlflag();

       $echo     = ECHO | ECHOK | ICANON;
       $noecho   = $oterm & ~$echo;

       sub cbreak {
           $term->setlflag($noecho);
           $term->setcc(VTIME, 1);
           $term->setattr($fd_tty, TCSANOW);
       }

       sub cooked {
           $term->setlflag($oterm);
           $term->setcc(VTIME, 0);
           $term->setattr($fd_tty, TCSANOW);
       }

       sub getone {
           my $key = '';
           cbreak();
           sysread(TTY, $key, 1);
           cooked();
           return $key;
       }

    }

    END { cooked() }

If you wanted nonblocking input, you would open the tty this way instead:

    use Fcntl;
    sysopen(TTY, "/dev/tty", O_RDWR | O_NONBLOCK)
        || die "no /dev/tty access: $!";

With descriptors that you haven't opened using C<sysopen>, such as a
socket, you can set them to be non-blocking using C<fcntl>:

    use Fcntl;
    fcntl(Connection, F_SETFL, O_NONBLOCK) 
        or die "can't set non blocking: $!";

There are some nice modules on CPAN that can help you with these
games.  Check out Term::ReadKey and Term::ReadLine.

=head1 AUTHOR and COPYRIGHT

Copyright 1998 Tom Christiansen.  
All Rights Reserved.

=head1 HISTORY

None yet.  This is a pre-release.
-- 
Never hit a man with glasses.  Hit him with a baseball bat.


------------------------------

Date: 18 Dec 1998 21:58:53 -0000
From: Jonathan Stowe <gellyfish@btinternet.com>
Subject: Re: Help .... looking for /etc/group parser
Message-Id: <75ej6t$cv$1@gellyfish.btinternet.com>

On Fri, 18 Dec 1998 11:17:38 -0600 Santosh kutty <santosh_kutty@compuware.com> wrote:
> This is a multi-part message in MIME format.
> --------------4EEEF4B1965910416FC6649D
> Content-Type: text/plain; charset=us-ascii
> Content-Type: text/plain; charset=us-ascii
> Content-Transfer-Encoding: 7bit
> Content-Transfer-Encoding: 7bit
> 

Purleeze ...

> Hi,
> I am looking for a script that parses the /etc/group file and does all
> the standard things like adding a group, deleting a group, adding users
> to a group etc .... I am sure that there is one out there ... I just
> don't want to re-invent the wheel !!
> 
> Does anyone have a similar script ... or know where I can find one ??
> I would appreciate any help .... you email me directly at
> santosh_kutty@compuware.com
> Thank you in advance,

something other than provided by groupadd,groupmod,groupdel or (on my system)
gpasswd ?

You could try looking at CPAN at <URL:http://www.perl.com/CPAN> to see if
there are any Modules that might achieve what you want.

/J\
-- 
Jonathan Stowe <jns@btinternet.com>
Some of your questions answered:
<URL:http://www.btinternet.com/~gellyfish/resources/wwwfaq.htm>
Hastings: <URL:http://www.newhoo.com/Regional/UK/England/East_Sussex/Hastings>


------------------------------

Date: Fri, 18 Dec 1998 22:50:31 GMT
From: devin@pctc.com (Devin Redlich)
Subject: I know this sounds stupid...
Message-Id: <75em34$mi@netaxs.com>

I downloaded libwww-perl-5.36 onto my development system, a custom built 
Pentium 233 running SCO Openserver 5.  Unpriviledged, I downloaded and 
built the module, then as root I did a make install and all was well.  
Later, I repeated this process on our production server, an HP Netserver, 
dual PPro, Mylex RAID controller, etc, also running SCO Openserver 5.  
Again, all was well.

This was about a month or two ago.

Today, I decided I'd finally get rid of the distribution directory on the 
production server.  As an unpriviledged user, I:

rm -rf libwww-perl-5.36

And, instead of a nice, simple deletion, the OS kernel panic'd and the 
system crashed.  Nice, I thought to myself, another stellar example of SCO 
at its finest.  Rebooted the system, numerous fscks, etc...

Later, I returned to my development system, and I wanted to install the 
newer libwww-perl-5.41.  There was that old 5.36 distribution directory 
still sitting there, so I thought I should get rid of it to avoid any 
confusion.  Another:

rm -rf libwww-perl-5.36

And what happens?  Different system, same result.  Another kernel panic and 
another crash.  I couldn't believe it.  Now, I'm no perl expert - I just 
use it for a few quick utilities now and then - but I *am* allowed to 
delete the distribution directory after I do a "make install", am I not?

If I'm doing something wrong, please clue me in.  Thx.


------------------------------

Date: 12 Dec 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Special: Digest Administrivia (Last modified: 12 Dec 98)
Message-Id: <null>


Administrivia:

Well, after 6 months, here's the answer to the quiz: what do we do about
comp.lang.perl.moderated. Answer: nothing. 

]From: Russ Allbery <rra@stanford.edu>
]Date: 21 Sep 1998 19:53:43 -0700
]Subject: comp.lang.perl.moderated available via e-mail
]
]It is possible to subscribe to comp.lang.perl.moderated as a mailing list.
]To do so, send mail to majordomo@eyrie.org with "subscribe clpm" in the
]body.  Majordomo will then send you instructions on how to confirm your
]subscription.  This is provided as a general service for those people who
]cannot receive the newsgroup for whatever reason or who just prefer to
]receive messages via e-mail.

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.

The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V8 Issue 4455
**************************************

home help back first fref pref prev next nref lref last post