[12293] in Perl-Users-Digest
Perl-Users Digest, Issue: 5892 Volume: 8
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jun 4 19:07:23 1999
Date: Fri, 4 Jun 99 16:01:27 -0700
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 4 Jun 1999 Volume: 8 Number: 5892
Today's topics:
Opening a remote file? <thurley@globalnet.co.uk>
Re: Opening a remote file? <marc@www.com>
Re: Opening a remote file? <marc@www.com>
Re: Opening a remote file? <jeromeo@atrieva.com>
Re: Opening a remote file? <rootbeer@redcat.com>
Re: Opening a remote file? <tchrist@mox.perl.com>
Re: Opening a remote file? <cassell@mail.cor.epa.gov>
Special: Digest Administrivia (Last modified: 12 Dec 98 (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 4 Jun 1999 22:01:05 +0100
From: "Thurley" <thurley@globalnet.co.uk>
Subject: Opening a remote file?
Message-Id: <7j9etc$lh8$1@gxsn.com>
To my surprise, I found that
"open(DATA,"http://www.somewhere.com/myfiles/data.html");" wouldn't work in
my CGI. Does anyone know of an open function which knows what a URL is, or how
to open a file on a remote server? I need this as my site is split across
servers.
Thanks,
James.
------------------------------
Date: Fri, 04 Jun 1999 14:12:21 -0700
From: Marc Northover <marc@www.com>
To: Thurley <thurley@globalnet.co.uk>
Subject: Re: Opening a remote file?
Message-Id: <37584135.57B83BE5@www.com>
Look into the LWP packages.
This prints the contents of a remote URL from the command line:
perl -MLWP::Simple -e 'getprint "http://www.blah.com"'
perldoc LWP::Simple for more information
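For use inside a script rather than on the command line, a minimal sketch might look like the following. It assumes the LWP distribution from CPAN is installed; `fetch_lines` is a hypothetical helper name, not part of LWP, and the URL in the usage comment is the placeholder from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a remote document, or die.
sub fetch_lines {
    my ($url) = @_;
    require LWP::Simple;                  # loaded on first use; needs LWP from CPAN
    my $content = LWP::Simple::get($url); # undef on any failure
    die "can't fetch $url\n" unless defined $content;
    return split /^/, $content;           # one element per line, newlines kept
}

# Where the CGI script tried open(DATA, "http://..."), it could do this instead:
#   for my $line (fetch_lines("http://www.somewhere.com/myfiles/data.html")) {
#       print $line;
#   }
```

LWP::Simple's `get` reports failure only by returning undef; if the script needs the HTTP status or headers, LWP::UserAgent is the fuller interface.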
Thurley wrote:
>
> To my surprise, I found that
> "open(DATA,"http://www.somewhere.com/myfiles/data.html");" wouldn't work in
> my CGI. Does anyone know of an open function which knows what a URL is, or how
> to open a file on a remote server? I need this as my site is split across
> servers.
>
> Thanks,
> James.
------------------------------
Date: Fri, 04 Jun 1999 14:29:37 -0700
From: Jerome O'Neil <jeromeo@atrieva.com>
To: Thurley <thurley@globalnet.co.uk>
Subject: Re: Opening a remote file?
Message-Id: <37584541.E4C9DBBE@atrieva.com>
Thurley wrote:
>
> To my surprise, I found that
> "open(DATA,"http://www.somewhere.com/myfiles/data.html");" wouldn't work in
> my CGI.
Why does that surprise you? When you read the documentation for open(),
it said something different?
> Does anyone know of an open function which knows what a URL is, or how
> to open a file on a remote server?
LWP knows about URLs. I'm sure this is the module that you need. You
can find out all the exciting detail on LWP by visiting a CPAN archive
near you. Start at http://www.perl.com.
Good Luck!
--
Jerome O'Neil, Operations and Information Services
Atrieva Corporation, 600 University St., Ste. 911, Seattle, WA 98101
jeromeo@atrieva.com - Voice:206/749-2947
The Atrieva Service: Safe and Easy Online Backup http://www.atrieva.com
------------------------------
Date: Fri, 4 Jun 1999 15:23:00 -0700
From: Tom Phoenix <rootbeer@redcat.com>
Subject: Re: Opening a remote file?
Message-Id: <Pine.GSO.4.02A.9906041522130.10794-100000@user2.teleport.com>
On Fri, 4 Jun 1999, Thurley wrote:
> Does anyone know of an open function which knows what
> a URL is, or how to open a file on a remote server?
Try the LWP package from CPAN. Cheers!
--
Tom Phoenix Perl Training and Hacking Esperanto
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/
------------------------------
Date: 4 Jun 1999 16:25:45 -0700
From: Tom Christiansen <tchrist@mox.perl.com>
Subject: Re: Opening a remote file?
Message-Id: <37585269@cs.colorado.edu>
[courtesy cc of this posting mailed to cited author]
Cursed by Microsoft Outlook Express 4.72.3110.1 to torture
comp.lang.perl.misc, "Thurley" <thurley@globalnet.co.uk> writes:
+------------------------------------------------------------+
| To my surprise, I found that |
| "open(DATA,"http://www.somewhere.com/myfiles/data.html");" |
| wouldn't work in my CGI. Does anyone know of an open |
| function which knows what a URL is, or how to open a file |
| on a remote server? I need this as my site is split across |
| servers. |
+------------------------------------------------------------+
Ha! Ha ha! Ha-ha ha-ha! Ha ha ha ha ha ha ha ha ha ha ha ha ha ha!
Ok, I've gotten back on my chair again.
Here, read this. It's the manpage for Perl's open function. If your
negligent system administrator hasn't run splitpod etc, then you'll find
it in the standard perlfunc manpage, where it has lain in waiting for
time immemorial.
======================================================================
NAME
open - open a file, pipe, or descriptor
SYNOPSIS
open FILEHANDLE,EXPR
open FILEHANDLE
DESCRIPTION
Opens the file whose filename is given by EXPR, and associates
it with FILEHANDLE. If FILEHANDLE is an expression, its value is
used as the name of the real filehandle wanted. If EXPR is
omitted, the scalar variable of the same name as the FILEHANDLE
contains the filename. (Note that lexical variables--those
declared with `my'--will not work for this purpose; so if you're
using `my', specify EXPR in your call to open.) See the
perlopentut manpage for a kinder, gentler explanation of opening
files.
If the filename begins with `'<'' or nothing, the file is opened
for input. If the filename begins with `'>'', the file is
truncated and opened for output, being created if necessary. If
the filename begins with `'>>'', the file is opened for
appending, again being created if necessary. You can put a `'+''
in front of the `'>'' or `'<'' to indicate that you want both
read and write access to the file; thus `'+<'' is almost always
preferred for read/write updates--the `'+>'' mode would clobber
the file first. You can't usually use either read-write mode for
updating textfiles, since they have variable length records. See
the -i switch in the perlrun manpage for a better approach. The
file is created with permissions of `0666' modified by the
process' `umask' value.
The prefix and the filename may be separated with spaces. These
various prefixes correspond to the fopen(3) modes of `'r'',
`'r+'', `'w'', `'w+'', `'a'', and `'a+''.
If the filename begins with `'|'', the filename is interpreted
as a command to which output is to be piped, and if the filename
ends with a `'|'', the filename is interpreted as a command
which pipes output to us. See the section on "Using open() for
IPC" in the perlipc manpage for more examples of this. (You are
not allowed to `open' to a command that pipes both in *and* out,
but see the IPC::Open2 manpage, the IPC::Open3 manpage, and the
section on "Bidirectional Communication" in the perlipc manpage
for alternatives.)
Opening `'-'' opens STDIN and opening `'>-'' opens STDOUT. Open
returns nonzero upon success, the undefined value otherwise. If
the `open' involved a pipe, the return value happens to be the
pid of the subprocess.
If you're unfortunate enough to be running Perl on a system that
distinguishes between text files and binary files (modern
operating systems don't care), then you should check out the
"binmode" entry in the perlfunc manpage for tips for dealing
with this. The key distinction between systems that need
`binmode' and those that don't is their text file formats.
Systems like Unix, MacOS, and Plan9, which delimit lines with a
single character, and which encode that character in C as
`"\n"', do not need `binmode'. The rest need it.
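As a rough illustration of the advice above, the sketch below writes raw bytes through a handle that has been put into binary mode. The helper name `write_raw` and the scratch directory are assumptions made for the example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write $bytes to $path with no newline translation and return the
# resulting file size in bytes.
sub write_raw {
    my ($path, $bytes) = @_;
    open(OUT, "> $path") or die "can't open $path: $!";
    binmode(OUT);                         # harmless on Unix, needed on DOS-like systems
    print OUT $bytes;
    close(OUT) or die "can't close $path: $!";
    return -s $path;
}

my $dir = tempdir(CLEANUP => 1);          # scratch directory for the demo
print write_raw("$dir/raw.bin", "a\nb\n"), " bytes written\n";
```

Calling `binmode` unconditionally keeps the script portable: on systems that do not distinguish text from binary files it changes nothing.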
When opening a file, it's usually a bad idea to continue normal
execution if the request failed, so `open' is frequently used in
connection with `die'. Even if `die' won't do what you want
(say, in a CGI script, where you want to make a nicely formatted
error message (but there are modules that can help with that
problem)) you should always check the return value from opening
a file. The infrequent exception is when working with an
unopened filehandle is actually what you want to do.
Examples:
$ARTICLE = 100;
open ARTICLE or die "Can't find article $ARTICLE: $!\n";
while (<ARTICLE>) {...
open(LOG, '>>/usr/spool/news/twitlog'); # (log is reserved)
# if the open fails, output is discarded
open(DBASE, '+<dbase.mine') # open for update
or die "Can't open 'dbase.mine' for update: $!";
open(ARTICLE, "caesar <$article |") # decrypt article
or die "Can't start caesar: $!";
open(EXTRACT, "|sort >/tmp/Tmp$$") # $$ is our process id
or die "Can't start sort: $!";
# process argument list of files along with any includes
foreach $file (@ARGV) {
process($file, 'fh00');
}
sub process {
my($filename, $input) = @_;
$input++; # this is a string increment
unless (open($input, $filename)) {
print STDERR "Can't open $filename: $!\n";
return;
}
local $_;
while (<$input>) { # note use of indirection
if (/^#include "(.*)"/) {
process($1, $input);
next;
}
#... # whatever
}
}
You may also, in the Bourne shell tradition, specify an EXPR
beginning with `'>&'', in which case the rest of the string is
interpreted as the name of a filehandle (or file descriptor, if
numeric) to be duped and opened. You may use `&' after `>',
`>>', `<', `+>', `+>>', and `+<'. The mode you specify should
match the mode of the original filehandle. (Duping a filehandle
does not take into account any existing contents of stdio
buffers.) Here is a script that saves, redirects, and restores
STDOUT and STDERR:
#!/usr/bin/perl
open(OLDOUT, ">&STDOUT");
open(OLDERR, ">&STDERR");
open(STDOUT, ">foo.out") || die "Can't redirect stdout";
open(STDERR, ">&STDOUT") || die "Can't dup stdout";
select(STDERR); $| = 1; # make unbuffered
select(STDOUT); $| = 1; # make unbuffered
print STDOUT "stdout 1\n"; # this works for
print STDERR "stderr 1\n"; # subprocesses too
close(STDOUT);
close(STDERR);
open(STDOUT, ">&OLDOUT");
open(STDERR, ">&OLDERR");
print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";
If you specify `'<&=N'', where `N' is a number, then Perl will
do an equivalent of C's `fdopen' of that file descriptor; this
is more parsimonious of file descriptors. For example:
open(FILEHANDLE, "<&=$fd")
If you open a pipe on the command `'-'', i.e., either `'|-'' or
`'-|'', then there is an implicit fork done, and the return
value of open is the pid of the child within the parent process,
and `0' within the child process. (Use `defined($pid)' to
determine whether the open was successful.) The filehandle
behaves normally for the parent, but i/o to that filehandle is
piped from/to the STDOUT/STDIN of the child process. In the
child process the filehandle isn't opened--i/o happens from/to
the new STDOUT or STDIN. Typically this is used like the normal
piped open when you want to exercise more control over just how
the pipe command gets executed, such as when you are running
setuid, and don't want to have to scan shell commands for
metacharacters. The following pairs are more or less equivalent:
open(FOO, "|tr '[a-z]' '[A-Z]'");
open(FOO, "|-") || exec 'tr', '[a-z]', '[A-Z]';
open(FOO, "cat -n '$file'|");
open(FOO, "-|") || exec 'cat', '-n', $file;
See the section on "Safe Pipe Opens" in the perlipc manpage for
more examples of this.
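The implicit fork can be sketched as follows, assuming a system with a working fork(2). The sub name `lines_from_child` is hypothetical; the point is only that the child's STDOUT becomes the parent's filehandle and that no shell is involved.

```perl
use strict;
use warnings;

# Fork a copy of this program, have the child print @words one per
# line, and return those lines as read by the parent.
sub lines_from_child {
    my @words = @_;
    my $pid = open(KID, "-|");            # implicit fork
    die "can't fork: $!" unless defined $pid;
    if ($pid == 0) {                      # child: STDOUT is the pipe
        print "$_\n" for @words;
        exit 0;
    }
    my @lines = <KID>;                    # parent: read what the child printed
    close(KID) or die "child failed: status $?";
    return @lines;
}

print lines_from_child("alpha", "beta");
```

Testing `defined $pid` rather than its truth matters here, because a successful open returns 0 in the child.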
NOTE: On any operation that may do a fork, any unflushed buffers
remain unflushed in both processes, which means you may need to
set `$|' to avoid duplicate output. On systems that support a
close-on-exec flag on files, the flag will be set for the newly
opened file descriptor as determined by the value of $^F. See
the section on "$^F" in the perlvar manpage.
Closing any piped filehandle causes the parent process to wait
for the child to finish, and returns the status value in `$?'.
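A small sketch of reading that status, assuming a Unix-like system with `sh` and `true` on the PATH; `exit_status_of` is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Run $command through a piped open, discard its output, and return
# the exit code that close() leaves in $?.
sub exit_status_of {
    my ($command) = @_;
    open(CMD, "$command |") or die "can't start $command: $!";
    my @output = <CMD>;
    close(CMD);                           # waits for the child and sets $?
    return $? >> 8;                       # the high byte holds the exit code
}

print "status: ", exit_status_of("sh -c 'exit 3'"), "\n";
```

Because `close` returns false when the child exits nonzero, code that cares about the command's success should check `close` as carefully as it checks `open`.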
The filename passed to open will have leading and trailing
whitespace deleted, and the normal redirection characters
honored. This property, known as "magic open", can often be used
to good effect. A user could specify a filename of "rsh cat file
|", or you could change certain filenames as needed:
$filename =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;
open(FH, $filename) or die "Can't open $filename: $!";
However, to open a file with arbitrary weird characters in it,
it's necessary to protect any leading and trailing whitespace:
$file =~ s#^(\s)#./$1#;
open(FOO, "< $file\0");
If you want a "real" C `open' (see the open(2) manpage on your
system), then you should use the `sysopen' function, which
involves no such magic. This is another way to protect your
filenames from interpretation. For example:
use Fcntl; # supplies the O_* constants
use IO::Handle;
sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL)
or die "sysopen $path: $!";
$oldfh = select(HANDLE); $| = 1; select($oldfh);
print HANDLE "stuff $$\n";
seek(HANDLE, 0, 0);
print "File contains: ", <HANDLE>;
Using the constructor from the `IO::Handle' package (or one of
its subclasses, such as `IO::File' or `IO::Socket'), you can
generate anonymous filehandles that have the scope of whatever
variables hold references to them, and automatically close
whenever and however you leave that scope:
use IO::File;
#...
sub read_myfile_munged {
my $ALL = shift;
my $handle = new IO::File;
open($handle, "myfile") or die "myfile: $!";
$first = <$handle>
or return (); # Automatically closed here.
mung $first or die "mung failed"; # Or here.
return $first, <$handle> if $ALL; # Or here.
$first; # Or here.
}
See the "seek" entry in the perlfunc manpage for some details
about mixing reading and writing.
======================================================================
Was that too brief? Here, try this. It's the perlopentut manpage
that comes with Perl as of very recent releases:
======================================================================
NAME
perlopentut - tutorial on opening things in Perl
DESCRIPTION
Perl has two simple, built-in ways to open files: the shell way
for convenience, and the C way for precision. The choice is
yours.
Open à la shell
Perl's `open' function was designed to mimic the way command-
line redirection in the shell works. Here are some basic
examples from the shell:
$ myprogram file1 file2 file3
$ myprogram < inputfile
$ myprogram > outputfile
$ myprogram >> outputfile
$ myprogram | otherprogram
$ otherprogram | myprogram
And here are some more advanced examples:
$ otherprogram | myprogram f1 - f2
$ otherprogram 2>&1 | myprogram -
$ myprogram <&3
$ myprogram >&4
Programmers accustomed to constructs like those above can take
comfort in learning that Perl directly supports these familiar
constructs using virtually the same syntax as the shell.
Simple Opens
The `open' function takes two arguments: the first is a
filehandle, and the second is a single string comprising both
what to open and how to open it. `open' returns true when it
works, and when it fails, returns a false value and sets the
special variable $! to reflect the system error. If the
filehandle was previously opened, it will be implicitly closed
first.
For example:
open(INFO, "datafile") || die("can't open datafile: $!");
open(INFO, "< datafile") || die("can't open datafile: $!");
open(RESULTS,"> runstats") || die("can't open runstats: $!");
open(LOG, ">> logfile ") || die("can't open logfile: $!");
If you prefer the low-punctuation version, you could write that
this way:
open INFO, "< datafile" or die "can't open datafile: $!";
open RESULTS,"> runstats" or die "can't open runstats: $!";
open LOG, ">> logfile " or die "can't open logfile: $!";
A few things to notice. First, the leading less-than is
optional. If omitted, Perl assumes that you want to open the
file for reading.
The other important thing to notice is that, just as in the
shell, any white space before or after the filename is ignored.
This is good, because you wouldn't want these to do different
things:
open INFO, "<datafile"
open INFO, "< datafile"
open INFO, "<  datafile"
Ignoring surrounding whitespace also helps for when you read a
filename in from a different file, and forget to trim it before
opening:
$filename = <INFO>; # oops, \n still there
open(EXTRA, "< $filename") || die "can't open $filename: $!";
This is not a bug, but a feature. Because `open' mimics the
shell in its style of using redirection arrows to specify how to
open the file, it also does so with respect to extra white space
around the filename itself as well. For accessing files with
naughty names, see the section on "Dispelling the Dweomer".
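The whitespace handling described above can be checked with a short sketch. The file name `data.txt`, the scratch directory, and the helper `first_line_of` are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the first line of the file named by $name, which may still
# carry the newline or padding it was read in with.
sub first_line_of {
    my ($name) = @_;
    open(EXTRA, "< $name") or die "can't open $name: $!";
    my $line = <EXTRA>;
    close(EXTRA);
    return $line;
}

my $dir = tempdir(CLEANUP => 1);          # scratch directory for the demo
open(OUT, "> $dir/data.txt") or die "can't create data.txt: $!";
print OUT "first line\n";
close(OUT) or die "can't close data.txt: $!";

print first_line_of("$dir/data.txt\n");   # note the stray newline in the name
```

The same call would fail with `sysopen`, which treats the newline as part of the path.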
Pipe Opens
In C, when you want to open a file using the standard I/O
library, you use the `fopen' function, but when opening a pipe,
you use the `popen' function. But in the shell, you just use a
different redirection character. That's also the case for Perl.
The `open' call remains the same--just its argument differs.
If the leading character is a pipe symbol, `open' starts up a
new command and opens a write-only filehandle leading into that
command. This lets you write into that handle and have what you
write show up on that command's standard input. For example:
open(PRINTER, "| lpr -Plp1") || die "cannot fork: $!";
print PRINTER "stuff\n";
close(PRINTER) || die "can't close lpr: $!";
If the trailing character is a pipe, you start up a new command
and open a read-only filehandle leading out of that command.
This lets whatever that command writes to its standard output
show up on your handle for reading. For example:
open(NET, "netstat -i -n |") || die "cannot fork: $!";
while (<NET>) { } # do something with input
close(NET) || die "can't close netstat: $!";
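Here is a self-contained sketch of the write-side pipe, under the assumption of a Unix-like system with sort(1) available and an output path free of shell metacharacters. `sorted_via_pipe` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Sort @items by piping them through sort(1) into $outfile, then read
# the sorted lines back from that file.
sub sorted_via_pipe {
    my ($outfile, @items) = @_;
    open(SORTER, "| sort > $outfile") or die "cannot fork: $!";
    print SORTER "$_\n" for @items;
    close(SORTER) or die "can't close sort: status $?";

    open(RESULT, "< $outfile") or die "can't open $outfile: $!";
    my @sorted = <RESULT>;
    close(RESULT);
    return @sorted;
}

# Usage: my @lines = sorted_via_pipe("/tmp/sorted.$$", "pear", "apple", "fig");
```

The `close` on the pipe has to come before the file is read, since that is what waits for sort to finish writing.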
What happens if you try to open a pipe to or from a non-existent
command? In most systems, such an `open' will not return an
error. That's because in the traditional `fork'/`exec' model,
running the other program happens only in the forked child
process, which means that the failed `exec' can't be reflected
in the return value of `open'. Only a failed `fork' shows up
there. See the section on "Why doesn't open() return an error
when a pipe open fails?" in the perlfaq8 manpage to see how to
cope with this. There's also an explanation in the perlipc
manpage.
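One hedged way to cope is to treat a failure of either `open` or `close` as a failed command, since where the error surfaces varies between systems and Perl versions. The sketch assumes a Unix-like system with `true` on the PATH; `command_ran_ok` is a hypothetical name.

```perl
use strict;
use warnings;

# Return 1 if $command could be started and exited with status 0,
# and 0 otherwise, whichever call reported the problem.
sub command_ran_ok {
    my ($command) = @_;
    no warnings 'exec';                   # keep a failed exec quiet
    my $pid = open(CMD, "$command |");
    return 0 unless $pid;                 # the failure was visible to open
    my @discard = <CMD>;
    return close(CMD) ? 1 : 0;            # false if the child exited nonzero
}

print command_ran_ok("true") ? "ran\n" : "failed\n";
```

This deliberately does not distinguish "command not found" from "command ran and failed"; inspecting `$?` after `close` is the place to start if that difference matters.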
If you would like to open a bidirectional pipe, the IPC::Open2
library will handle this for you. Check out the section on
"Bidirectional Communication with Another Process" in the
perlipc manpage.
The Minus File
Again following the lead of the standard shell utilities, Perl's
`open' function treats a file whose name is a single minus, "-",
in a special way. If you open minus for reading, it really means
to access the standard input. If you open minus for writing, it
really means to access the standard output.
If minus can be used as the default input or default output,
what happens if you open a pipe into or out of minus? What's the
default command it would run? The same script as you're currently
running! This is actually a stealth `fork' hidden inside an
`open' call. See the section on "Safe Pipe Opens" in the perlipc
manpage for details.
Mixing Reads and Writes
It is possible to specify both read and write access. All you do
is add a "+" symbol in front of the redirection. But as in the
shell, using a less-than on a file never creates a new file; it
only opens an existing one. On the other hand, using a greater-
than always clobbers (truncates to zero length) an existing
file, or creates a brand-new one if there isn't an old one.
Adding a "+" for read-write doesn't affect whether it only works
on existing files or always clobbers existing ones.
open(WTMP, "+< /usr/adm/wtmp")
|| die "can't open /usr/adm/wtmp: $!";
open(SCREEN, "+> /tmp/lkscreen")
|| die "can't open /tmp/lkscreen: $!";
open(LOGFILE, "+>> /tmp/applog")
|| die "can't open /tmp/applog: $!";
The first one won't create a new file, and the second one will
always clobber an old one. The third one will create a new file
if necessary and not clobber an old one, and it will allow you
to read at any point in the file, but all writes will always go
to the end. In short, the first case is substantially more
common than the second and third cases, which are almost always
wrong. (If you know C, the plus in Perl's `open' is historically
derived from the one in C's fopen(3S), which it ultimately
calls.)
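A sketch of the first, common case: overwriting bytes in place in a file that must already exist. `patch_bytes` is a hypothetical helper, and the fixed-width overwrite is the assumption that makes "+<" appropriate here.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Overwrite part of an existing file, starting at byte $offset,
# without truncating it or changing its length.
sub patch_bytes {
    my ($path, $offset, $bytes) = @_;
    open(FH, "+< $path") or die "can't open $path for update: $!";
    binmode(FH);
    seek(FH, $offset, SEEK_SET) or die "can't seek in $path: $!";
    print FH $bytes;
    close(FH) or die "can't close $path: $!";
    return 1;
}

# Usage: patch_bytes("record.dat", 6, "perl!");   # file must already exist
```

Since "+<" never creates a file, a missing file shows up as a failed open instead of a silently created empty one.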
In fact, when it comes to updating a file, unless you're working
on a binary file as in the WTMP case above, you probably don't
want to use this approach for updating. Instead, Perl's -i flag
comes to the rescue. The following command takes all the C, C++,
or yacc source or header files and changes all their foo's to
bar's, leaving the old version in the original file name with a
".orig" tacked on the end:
$ perl -i.orig -pe 's/\bfoo\b/bar/g' *.[Cchy]
This is a short cut for some renaming games that are really the
best way to update textfiles. See the second question in the
perlfaq5 manpage for more details.
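The same in-place edit can be done from inside a script by setting the `$^I` variable, which is what the -i switch sets. This is a sketch under that assumption; `replace_in_files` is a hypothetical name.

```perl
use strict;
use warnings;

# Replace whole-word occurrences of $from with $to in each file,
# leaving the old contents in "file.orig", like perl -i.orig -pe.
sub replace_in_files {
    my ($from, $to, @files) = @_;
    return unless @files;                 # an empty list would read STDIN
    local ($^I, @ARGV) = ('.orig', @files);
    local $_;
    while (<>) {
        s/\b\Q$from\E\b/$to/g;
        print;                            # goes to the rewritten file, not STDOUT
    }
    return;
}

# Usage: replace_in_files("foo", "bar", glob("*.[Cchy]"));
```

Localizing both variables confines the in-place behavior to this one loop, so later uses of `<>` in the program are unaffected.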
Filters
One of the most common uses for `open' is one you never even
notice. When you process the ARGV filehandle using `<ARGV>',
Perl actually does an implicit open on each file in @ARGV. Thus
a program called like this:
$ myprogram file1 file2 file3
Can have all its files opened and processed one at a time using
a construct no more complex than:
while (<>) {
# do something with $_
}
If @ARGV is empty when the loop first begins, Perl pretends
you've opened up minus, that is, the standard input. In fact,
$ARGV, the currently open file during `<ARGV>' processing, is
even set to "-" in these circumstances.
You are welcome to pre-process your @ARGV before starting the
loop to make sure it's to your liking. One reason to do this
might be to remove command options beginning with a minus. While
you can always roll the simple ones by hand, the Getopt modules
are good for this.
use Getopt::Std;
# -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o
getopts("vDo:");
# -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
getopts("vDo:", \%args);
Or the standard Getopt::Long module to permit named arguments:
use Getopt::Long;
GetOptions( "verbose" => \$verbose, # --verbose
"Debug" => \$debug, # --Debug
"output=s" => \$output );
# --output=somestring or --output somestring
Another reason for preprocessing arguments is to make an empty
argument list default to all files:
@ARGV = glob("*") unless @ARGV;
You could even filter out all but plain, text files. This is a
bit silent, of course, and you might prefer to mention them on
the way.
@ARGV = grep { -f && -T } @ARGV;
If you're using the -n or -p command-line options, you should
put changes to @ARGV in a `BEGIN{}' block.
Remember that a normal `open' has special properties, in that it
might call fopen(3S) or it might call popen(3S), depending on
what its argument looks like; that's why it's sometimes called
"magic open". Here's an example:
$pwdinfo = `domainname` =~ /^(\(none\))?$/
? '< /etc/passwd'
: 'ypcat passwd |';
open(PWD, $pwdinfo)
or die "can't open $pwdinfo: $!";
This sort of thing also comes into play in filter processing.
Because `<ARGV>' processing employs the normal, shell-style Perl
`open', it respects all the special things we've already seen:
$ myprogram f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
That program will read from the file f1, the process cmd1,
standard input (tmpfile in this case), the f2 file, the cmd2
command, and finally the f3 file.
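The mixing of files and commands can be tried from inside a program by filling @ARGV by hand. The sketch assumes a Unix-like system with `echo` available; `slurp_args` is a hypothetical name.

```perl
use strict;
use warnings;

# Return every line that magic <ARGV> delivers for the given argument
# list, where each argument may be a filename or a "command |".
sub slurp_args {
    return () unless @_;                  # an empty list would read STDIN
    local @ARGV = @_;
    local $_;
    my @lines;
    while (<>) {
        push @lines, $_;
    }
    return @lines;
}

# Usage: my @lines = slurp_args("f1", "cmd1 |", "f2");
```

Because each argument is handed to the magic `open`, a program that takes filenames from untrusted input should not use `<>` this way.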
Yes, this also means that if you have files named "-" (and so
on) in your directory, they won't be processed as literal
files by `open'. You'll need to pass them as "./-" much as you
would for the *rm* program. Or you could use `sysopen' as
described below.
One of the more interesting applications is to change files of a
certain name into pipes. For example, to autoprocess gzipped or
compressed files by decompressing them with *gzip*:
@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV;
Or, if you have the *GET* program installed from LWP, you can
fetch URLs before processing them:
@ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;
It's not for nothing that this is called magic `<ARGV>'. Pretty
nifty, eh?
Open à la C
If you want the convenience of the shell, then Perl's `open' is
definitely the way to go. On the other hand, if you want finer
precision than C's simplistic fopen(3S) provides, then you
should look to Perl's `sysopen', which is a direct hook into the
open(2) system call. That does mean it's a bit more involved,
but that's the price of precision.
`sysopen' takes 3 (or 4) arguments.
sysopen HANDLE, PATH, FLAGS, [MASK]
The HANDLE argument is a filehandle just as with `open'. The
PATH is a literal path, one that doesn't pay attention to any
greater-thans or less-thans or pipes or minuses, nor ignore
white space. If it's there, it's part of the path. The FLAGS
argument contains one or more values derived from the Fcntl
module that have been or'd together using the bitwise "|"
operator. The final argument, the MASK, is optional; if present,
it is combined with the user's current umask for the creation
mode of the file. You should usually omit this.
Although the traditional values of read-only, write-only, and
read-write are 0, 1, and 2 respectively, this is known not to
hold true on some systems. Instead, it's best to load in the
appropriate constants first from the Fcntl module, which
supplies the following standard flags:
O_RDONLY Read only
O_WRONLY Write only
O_RDWR Read and write
O_CREAT Create the file if it doesn't exist
O_EXCL Fail if the file already exists
O_APPEND Append to the file
O_TRUNC Truncate the file
O_NONBLOCK Non-blocking access
Less common flags that are sometimes available on some operating
systems include `O_BINARY', `O_TEXT', `O_SHLOCK', `O_EXLOCK',
`O_DEFER', `O_SYNC', `O_ASYNC', `O_DSYNC', `O_RSYNC',
`O_NOCTTY', `O_NDELAY' and `O_LARGEFILE'. Consult your open(2)
manpage or its local equivalent for details.
Here's how to use `sysopen' to emulate the simple `open' calls
we had before. We'll omit the `|| die $!' checks for clarity,
but make sure you always check the return values in real code.
These aren't quite the same, since `open' will trim leading and
trailing white space, but you'll get the idea:
To open a file for reading:
open(FH, "< $path");
sysopen(FH, $path, O_RDONLY);
To open a file for writing, creating a new file if needed or
else truncating an old file:
open(FH, "> $path");
sysopen(FH, $path, O_WRONLY | O_TRUNC | O_CREAT);
To open a file for appending, creating one if necessary:
open(FH, ">> $path");
sysopen(FH, $path, O_WRONLY | O_APPEND | O_CREAT);
To open a file for update, where the file must already exist:
open(FH, "+< $path");
sysopen(FH, $path, O_RDWR);
And here are things you can do with `sysopen' that you cannot do
with a regular `open'. As you see, it's just a matter of
controlling the flags in the third argument.
To open a file for writing, creating a new file which must not
previously exist:
sysopen(FH, $path, O_WRONLY | O_EXCL | O_CREAT);
To open a file for appending, where that file must already
exist:
sysopen(FH, $path, O_WRONLY | O_APPEND);
To open a file for update, creating a new file if necessary:
sysopen(FH, $path, O_RDWR | O_CREAT);
To open a file for update, where that file must not already
exist:
sysopen(FH, $path, O_RDWR | O_EXCL | O_CREAT);
To open a file without blocking, creating one if necessary:
sysopen(FH, $path, O_WRONLY | O_NONBLOCK | O_CREAT);
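As a complete sketch of the "must not previously exist" case from the list above, the following creates a file exclusively and reports whether it won the race. `create_exclusively` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Fcntl;                                # supplies the O_* constants

# Try to create $path; return 1 if this call created it and 0 if the
# file was already there (or could not be created at all).
sub create_exclusively {
    my ($path) = @_;
    sysopen(NEWFILE, $path, O_WRONLY | O_EXCL | O_CREAT) or return 0;
    print NEWFILE "$$\n";                 # record our process id
    close(NEWFILE) or die "can't close $path: $!";
    return 1;
}

# Usage: create_exclusively("/tmp/myapp.pid") or die "already running?\n";
```

A caller that needs to tell "already exists" apart from other errors can inspect `$!` after a false return.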
Permissions à la mode
If you omit the MASK argument to `sysopen', Perl uses the octal
value 0666. The normal MASK to use for executables and
directories should be 0777, and for anything else, 0666.
Why so permissive? Well, it isn't really. The MASK will be
modified by your process's current `umask'. A umask is a number
representing *disabled* permissions bits; that is, bits that
will not be turned on in the created files' permissions field.
For example, if your `umask' were 027, then the 020 part would
disable the group from writing, and the 007 part would disable
others from reading, writing, or executing. Under these
conditions, passing `sysopen' 0666 would create a file with mode
0640, since `0666 &~ 027' is 0640.
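The arithmetic can be checked directly, and also against a real file on a Unix-like filesystem. In this sketch `mode_created_under` is a hypothetical helper; it sets the umask temporarily and restores it afterwards.

```perl
use strict;
use warnings;
use Fcntl;

# Create $path with a requested mode of 0666 while $mask is the umask,
# and return the permission bits the file actually received.
sub mode_created_under {
    my ($path, $mask) = @_;
    my $old = umask($mask);
    my $ok  = sysopen(NEWFILE, $path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    my $err = $!;
    umask($old);                          # restore the caller's umask
    die "can't create $path: $err" unless $ok;
    close(NEWFILE);
    return (stat($path))[2] & 07777;
}

printf "requested 0666 under umask 027 gives %04o\n", 0666 & ~027;
```

Printing modes with `%o` avoids the usual confusion of reading an octal permission value as a decimal number.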
You should seldom use the MASK argument to `sysopen()'. That
takes away the user's freedom to choose what permission new
files will have. Denying choice is almost always a bad thing.
One exception would be for cases where sensitive or private data
is being stored, such as with mail folders, cookie files, and
internal temporary files.
Obscure Open Tricks
Re-Opening Files (dups)
Sometimes you already have a filehandle open, and want to make
another handle that's a duplicate of the first one. In the
shell, we place an ampersand in front of a file descriptor
number when doing redirections. For example, `2>&1' makes
descriptor 2 (that's STDERR in Perl) be redirected into
descriptor 1 (which is usually Perl's STDOUT). The same is
essentially true in Perl: a filename that begins with an
ampersand is treated instead as a file descriptor if a number,
or as a filehandle if a string.
open(SAVEOUT, ">&SAVEERR") || die "couldn't dup SAVEERR: $!";
open(MHCONTEXT, "<&4") || die "couldn't dup fd4: $!";
That means that if a function is expecting a filename, but you
don't want to give it a filename because you already have the
file open, you can just pass the filehandle with a leading
ampersand. It's best to use a fully qualified handle though,
just in case the function happens to be in a different package:
somefunction("&main::LOGFILE");
This way if somefunction() is planning on opening its argument,
it can just use the already opened handle. This differs from
passing a handle, because with a handle, you don't open the
file. Here you have something you can pass to open.
If you have one of those tricky, newfangled I/O objects that the
C++ folks are raving about, then this doesn't work because those
aren't a proper filehandle in the native Perl sense. You'll have
to use fileno() to pull out the proper descriptor number,
assuming you can:
use IO::Socket;
$handle = IO::Socket::INET->new("www.perl.com:80");
$fd = $handle->fileno;
somefunction("&$fd"); # not an indirect function call
It can be easier (and certainly will be faster) just to use real
filehandles though:
use IO::Socket;
local *REMOTE = IO::Socket::INET->new("www.perl.com:80");
die "can't connect" unless defined(fileno(REMOTE));
somefunction("&main::REMOTE");
If the filehandle or descriptor number is preceded not just with
a simple "&" but rather with a "&=" combination, then Perl will
not create a completely new descriptor opened to the same place
using the dup(2) system call. Instead, it will just make
something of an alias to the existing one using the fdopen(3S)
library call. This is slightly more parsimonious of system
resources, although this is less a concern these days. Here's an
example of that:
$fd = $ENV{"MHCONTEXTFD"};
open(MHCONTEXT, "<&=$fd") or die "couldn't fdopen $fd: $!";
If you're using magic `<ARGV>', you could even pass in as a
command line argument in @ARGV something like
`"<&=$MHCONTEXTFD"', but we've never seen anyone actually do
this.
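The difference between "&" and "&=" shows up in the descriptor numbers. The sketch below opens one file three ways and reports the numbers; `descriptor_numbers` is a hypothetical name, and the handles are looked up in package main.

```perl
use strict;
use warnings;

# Open $path, dup it with "<&", alias it with "<&=", and return the
# three descriptor numbers in that order.
sub descriptor_numbers {
    my ($path) = @_;
    open(ORIG, "< $path") or die "can't open $path: $!";
    open(DUPED, "<&ORIG") or die "couldn't dup ORIG: $!";       # new descriptor
    open(ALIAS, "<&=" . fileno(ORIG))
        or die "couldn't fdopen: $!";                           # same descriptor
    my @fds = (fileno(ORIG), fileno(DUPED), fileno(ALIAS));
    close(ALIAS);
    close(DUPED);
    close(ORIG);
    return @fds;
}

# Usage: my ($orig, $duped, $alias) = descriptor_numbers("/etc/motd");
```

The alias reports the same number as the original handle, while the dup gets a descriptor of its own.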
Dispelling the Dweomer
Perl is more of a DWIMmer language than something like Java--
where DWIM is an acronym for "do what I mean". But this
principle sometimes leads to more hidden magic than one knows
what to do with. In this way, Perl is also filled with
*dweomer*, an obscure word meaning an enchantment. Sometimes,
Perl's DWIMmer is just too much like dweomer for comfort.
If magic `open' is a bit too magical for you, you don't have to
turn to `sysopen'. To open a file with arbitrary weird
characters in it, it's necessary to protect any leading and
trailing whitespace. Leading whitespace is protected by
inserting a `"./"' in front of a filename that starts with
whitespace. Trailing whitespace is protected by appending an
ASCII NUL byte (`"\0"') at the end of the string.
$file =~ s#^(\s)#./$1#;
open(FH, "< $file\0") || die "can't open $file: $!";
This assumes, of course, that your system considers dot the
current working directory, slash the directory separator, and
disallows ASCII NULs within a valid filename. Most systems
follow these conventions, including all POSIX systems as well as
proprietary Microsoft systems. The only vaguely popular system
that doesn't work this way is the proprietary Macintosh system,
which uses a colon where the rest of us use a slash. Maybe
`sysopen' isn't such a bad idea after all.
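For comparison, here is a rough sketch of the `sysopen' route, which takes the path literally and so needs no protection at all. The file name with surrounding spaces and the scratch directory are invented for the demonstration, and it assumes a Unix-like filesystem that permits such names.

```perl
use strict;
use warnings;
use Fcntl;                          # O_* constants
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $path = "$dir/ odd name ";       # leading and trailing spaces

# sysopen never strips whitespace or interprets mode characters.
sysopen(OUT, $path, O_WRONLY | O_CREAT | O_EXCL, 0644)
    or die "can't create '$path': $!";
print OUT "hello\n";
close(OUT) or die "can't close '$path': $!";

sysopen(IN, $path, O_RDONLY)
    or die "can't open '$path': $!";
my $line = <IN>;
close(IN);

print "read back: $line";
```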
If you want to use `<ARGV>' processing in a totally boring and
non-magical way, you could do this first:
# "Sam sat on the ground and put his head in his hands.
# 'I wish I had never come here, and I don't want to see
# no more magic,' he said, and fell silent."
for (@ARGV) {
s#^([^./])#./$1#;
$_ .= "\0";
}
while (<>) {
# now process $_
}
But be warned that users will not appreciate being unable to use
"-" to mean standard input, per the standard convention.
Paths as Opens
You've probably noticed how Perl's `warn' and `die' functions
can produce messages like:
Some warning at scriptname line 29, <FH> chunk 7.
That's because you opened a filehandle FH, and had read in seven
records from it. But what was the name of the file, not the
handle?
If you aren't running with `strict refs', or if you've turned them
off temporarily, then all you have to do is this:
open($path, "< $path") || die "can't open $path: $!";
while (<$path>) {
# whatever
}
Since you're using the pathname of the file as its handle,
you'll get warnings more like
Some warning at scriptname line 29, </etc/motd> chunk 7.
Single Argument Open
Remember how we said that Perl's open took two arguments? That
was a passive prevarication. You see, it can also take just one
argument. If and only if the variable is a global variable, not
a lexical, you can pass `open' just one argument, the
filehandle, and it will get the path from the global scalar
variable of the same name.
$FILE = "/etc/motd";
open FILE or die "can't open $FILE: $!";
while (<FILE>) {
# whatever
}
Why is this here? Someone has to cater to the hysterical
porpoises. It's something that's been in Perl since the very
beginning, if not before.
Playing with STDIN and STDOUT
One clever move with STDOUT is to explicitly close it when
you're done with the program.
END { close(STDOUT) || die "can't close stdout: $!" }
If you don't do this, and your program fills up the disk
partition due to a command line redirection, it won't report the
error or exit with a failure status.
You don't have to accept the STDIN and STDOUT you were given.
You are welcome to reopen them if you'd like.
open(STDIN, "< datafile")
|| die "can't open datafile: $!";
open(STDOUT, "> output")
|| die "can't open output: $!";
And then these can be read directly or passed on to
subprocesses. This makes it look as though the program were
initially invoked with those redirections from the command line.
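If you only want the redirection for part of the program, one common approach is to dup the original STDOUT first and put it back afterwards. This is only a sketch: SAVEOUT and LOG are arbitrary handle names, and the scratch file comes from File::Temp rather than from anything in the text above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmpfh, $logfile) = tempfile(UNLINK => 1);   # scratch file for the demo
close($tmpfh);

open(SAVEOUT, ">&STDOUT")   or die "can't save STDOUT: $!";
open(STDOUT, "> $logfile")  or die "can't redirect STDOUT: $!";
print "this line lands in the file\n";
close(STDOUT)               or die "can't close STDOUT: $!";
open(STDOUT, ">&SAVEOUT")   or die "can't restore STDOUT: $!";
close(SAVEOUT);

# Read the file back to show where the output went.
open(LOG, "< $logfile")     or die "can't read $logfile: $!";
my $captured = <LOG>;
close(LOG);

print "captured: $captured";
```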
It's probably more interesting to connect these to pipes. For
example:
$pager = $ENV{PAGER} || "(less || more)";
open(STDOUT, "| $pager")
|| die "can't fork a pager: $!";
This makes it appear as though your program were called with its
stdout already piped into your pager. You can also use this kind
of thing in conjunction with an implicit fork to yourself. You
might do this if you would rather handle the post processing in
your own program, just in a different process:
head(100);
while (<>) {
print;
}
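sub head {
my $lines = shift || 20;
# the parent gets the child's pid and returns to keep printing
return if $pid = open(STDOUT, "|-");
die "cannot fork: $!" unless defined $pid;
# the child filters the parent's output, which arrives on STDIN
while (<STDIN>) {
last if --$lines < 0;
print;
}
exit;
}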
This technique can be applied to repeatedly push as many filters
on your output stream as you wish.
Other I/O Issues
These topics aren't really arguments related to `open' or
`sysopen', but they do affect what you do with your open files.
Opening Non-File Files
When is a file not a file? Well, you could say when it exists
but isn't a plain file. We'll check whether it's a symbolic link
first, just in case.
if (-l $file || ! -f _) {
print "$file is not a plain file\n";
}
What other kinds of files are there than, well, files?
Directories, symbolic links, named pipes, Unix-domain sockets,
and block and character devices. Those are all files, too--just
not *plain* files. This isn't the same issue as being a text
file. Not all text files are plain files. Not all plain files
are textfiles. That's why there are separate `-f' and `-T' file
tests.
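The file tests can be strung together to classify whatever a path names. The following is a sketch only; `file_kind' is a made-up helper, and the labels it returns are ours.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: report what sort of file a path names.
sub file_kind {
    my $path = shift;
    return "symbolic link"    if -l $path;       # lstat: looks at the link itself
    return "nonexistent"      unless -e $path;   # stat: fills the "_" cache
    return "plain file"       if -f _;
    return "directory"        if -d _;
    return "named pipe"       if -p _;
    return "socket"           if -S _;
    return "block device"     if -b _;
    return "character device" if -c _;
    return "something else";
}

my $dir = tempdir(CLEANUP => 1);                 # scratch directory
open(TMP, "> $dir/plain") or die "can't create $dir/plain: $!";
close(TMP);

print "$_: ", file_kind($_), "\n" for $dir, "$dir/plain", "$dir/missing";
```

Reusing `_' after the first `-e' avoids a fresh stat(2) call for each test.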
To open a directory, you should use the `opendir' function, then
process it with `readdir', carefully restoring the directory
name if necessary:
opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
# do something with "$dirname/$file"
}
closedir(DIR);
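The reason for restoring the directory name is that `readdir' returns bare entry names, so a file test on them looks in the current directory instead. Here is a small sketch of the idea; `plain_files' is a hypothetical helper and the scratch directory contents are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: full paths of the plain files in one directory.
sub plain_files {
    my $dirname = shift;
    opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
    # readdir yields bare names, so put the directory back in front
    my @paths = map { "$dirname/$_" } readdir(DIR);
    closedir(DIR);
    my @files = grep { -f } @paths;
    @files = sort @files;
    return @files;
}

my $dir = tempdir(CLEANUP => 1);     # scratch directory for the demo
for my $name ("a.txt", "b.txt") {
    open(OUT, "> $dir/$name") or die "can't create $dir/$name: $!";
    close(OUT);
}
mkdir("$dir/sub", 0755) or die "can't mkdir $dir/sub: $!";

my @found = plain_files($dir);
print "$_\n" for @found;
```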
If you want to process directories recursively, it's better to
use the File::Find module. For example, this prints out all
files recursively, and adds a slash to their names if the file
is a directory.
@ARGV = qw(.) unless @ARGV;
use File::Find;
find sub { print $File::Find::name, -d && '/', "\n" }, @ARGV;
This finds all bogus symbolic links beneath a particular
directory:
find sub { print "$File::Find::name\n" if -l && !-e }, $dir;
As you see, with symbolic links, you can just pretend that it is
what it points to. Or, if you want to know *what* it points to,
then `readlink' is called for:
if (-l $file) {
if (defined($whither = readlink($file))) {
print "$file points to $whither\n";
} else {
print "$file points nowhere: $!\n";
}
}
Named pipes are a different matter. You pretend they're regular
files, but their opens will normally block until there is both a
reader and a writer. You can read more about them in the section
on "Named Pipes" in the perlipc manpage. Unix-domain sockets are
rather different beasts as well; they're described in the
section on "Unix-Domain TCP Clients and Servers" in the perlipc
manpage.
When it comes to opening devices, it can be easy and it can be
tricky. We'll assume that if you're opening up a block device,
you know what you're doing. The character devices are more
interesting. These are typically used for modems, mice, and some
kinds of printers. This is described in the section on "How do I
read and write the serial port?" in the perlfaq8 manpage. It's
often enough to open them carefully:
use Fcntl;
sysopen(TTYIN, "/dev/ttyS1", O_RDWR | O_NDELAY | O_NOCTTY)
# (O_NOCTTY no longer needed on POSIX systems)
or die "can't open /dev/ttyS1: $!";
open(TTYOUT, "+>&TTYIN")
or die "can't dup TTYIN: $!";
$ofh = select(TTYOUT); $| = 1; select($ofh);
print TTYOUT "+++at\015";
$answer = <TTYIN>;
With descriptors that you haven't opened using `sysopen', such
as a socket, you can set them to be non-blocking using `fcntl':
use Fcntl;
fcntl(Connection, F_SETFL, O_NONBLOCK)
or die "can't set non blocking: $!";
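Setting F_SETFL to O_NONBLOCK alone replaces whatever status flags were already on the descriptor. A more cautious variant, sketched here on a pipe rather than a socket so that it runs by itself, fetches the current flags first and ORs the new one in. READER and WRITER are arbitrary names, and the sketch assumes a Unix-like system.

```perl
use strict;
use warnings;
use Fcntl;   # F_GETFL, F_SETFL, O_NONBLOCK

pipe(READER, WRITER) or die "can't create pipe: $!";

# Fetch the existing flags, then add O_NONBLOCK to them.
my $flags = fcntl(READER, F_GETFL, 0)
    or die "can't get flags: $!";
fcntl(READER, F_SETFL, $flags | O_NONBLOCK)
    or die "can't set non blocking: $!";

# Nothing has been written yet, so a blocking read would hang here.
my $got = sysread(READER, my $buf, 1024);
my $would_block = !defined($got) && ($!{EAGAIN} || $!{EWOULDBLOCK});

print $would_block ? "read would block\n" : "unexpected result\n";

close(READER);
close(WRITER);
```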
Rather than losing yourself in a morass of twisting, turning
`ioctl's, all dissimilar, if you're going to manipulate ttys,
it's best to make calls out to the stty(1) program if you have
it, or else use the portable POSIX interface. To figure this all
out, you'll need to read the termios(3) manpage, which describes
the POSIX interface to tty devices, and then the POSIX manpage,
which describes Perl's interface to POSIX. There are also some
high-level modules on CPAN that can help you with these games.
Check out Term::ReadKey and Term::ReadLine.
What else can you open? To open a connection using sockets, you
won't use one of Perl's two open functions. See the section on
"Sockets: Client/Server Communication" in the perlipc manpage
for that. Here's an example. Once you have it, you can use FH as
a bidirectional filehandle.
use IO::Socket;
local *FH = IO::Socket::INET->new("www.perl.com:80");
For opening up a URL, the LWP modules from CPAN are just what
the doctor ordered. There's no filehandle interface, but it's
still easy to get the contents of a document:
use LWP::Simple;
$doc = get('http://www.sn.no/libwww-perl/');
Binary Files
On certain legacy systems with what could charitably be called
terminally convoluted (some would say broken) I/O models, a file
isn't a file--at least, not with respect to the C standard I/O
library. On these old systems whose libraries (but not kernels)
distinguish between text and binary streams, to get files to
behave properly you'll have to bend over backwards to avoid
nasty problems. On such infelicitous systems, sockets and pipes
are already opened in binary mode, and there is currently no way
to turn that off. With files, you have more options.
One option is to use the `binmode' function on the
appropriate handles before doing regular I/O on them:
binmode(STDIN);
binmode(STDOUT);
while (<STDIN>) { print }
Passing `sysopen' a non-standard flag option will also open the
file in binary mode on those systems that support it. This is
the equivalent of opening the file normally, then calling
`binmode' on the handle.
sysopen(BINDAT, "records.data", O_RDWR | O_BINARY)
|| die "can't open records.data: $!";
Now you can use `read' and `print' on that handle without
worrying about the system's non-standard I/O library breaking your
data. It's not a pretty picture, but then, legacy systems seldom
are. CP/M will be with us until the end of days, and after.
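A quick way to convince yourself that a handle really is in binary mode is to push some awkward bytes through it and compare what comes back. This sketch uses a scratch file from File::Temp, and the byte string is just an invented sample. On Unix the `binmode' calls are harmless no-ops.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);   # scratch file for the demo
binmode($out);
my $bytes = "\x00\x0d\x0a\x1a\xff";         # bytes a text-mode layer might alter
print $out $bytes;
close($out) or die "can't close $path: $!";

open(IN, "< $path") or die "can't open $path: $!";
binmode(IN);
my $count = read(IN, my $copy, 1024);
close(IN);

print "wrote ", length($bytes), " bytes, read back $count\n";
print "contents ", ($copy eq $bytes ? "match" : "differ"), "\n";
```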
On systems with exotic I/O systems, it turns out that,
astonishingly enough, even unbuffered I/O using `sysread' and
`syswrite' might do sneaky data mutilation behind your back.
while (sysread(WHENCE, $buf, 1024)) {
syswrite(WHITHER, $buf, length($buf));
}
Depending on the vicissitudes of your runtime system, even these
calls may need `binmode' or `O_BINARY' first. Systems known to
be free of such difficulties include Unix, the Mac OS, Plan9,
and Inferno.
File Locking
In a multitasking environment, you may need to be careful not to
collide with other processes that want to do I/O on the same
files you are working on. You'll often need shared or
exclusive locks on files for reading and writing respectively.
You might just pretend that only exclusive locks exist.
Never use the existence of a file `-e $file' as a locking
indication, because there is a race condition between the test
for the existence of the file and its creation. Atomicity is
critical.
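If you do want a lock file, the test and the creation have to be one step. Here is a sketch of that idea using `sysopen' with O_CREAT | O_EXCL, which the open(2) manpage describes as failing when the file already exists. `grab_lockfile' and the lock file name are hypothetical.

```perl
use strict;
use warnings;
use Fcntl;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/demo.lock";     # hypothetical lock file name

# Hypothetical helper: true only for whoever creates the file.
sub grab_lockfile {
    my $path = shift;
    # O_EXCL makes "create only if absent" a single atomic step
    return sysopen(LOCK, $path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

my $first = grab_lockfile($lockfile) ? 1 : 0;
close(LOCK);

my $second = grab_lockfile($lockfile) ? 1 : 0;
my $eexist = $!{EEXIST} ? 1 : 0;

print "first attempt:  ", ($first  ? "got it" : "failed"), "\n";
print "second attempt: ", ($second ? "got it" : "failed"), "\n";
```

Whoever holds the lock must unlink the file when done, and a crashed holder leaves a stale file behind, which is one reason `flock' is usually the better tool.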
Perl's most portable locking interface is via the `flock'
function, whose simplicity is emulated on systems that don't
directly support it, such as SysV or WindowsNT. The underlying
semantics may affect how it all works, so you should learn how
`flock' is implemented on your system's port of Perl.
File locking *does not* lock out another process that would like
to do I/O. A file lock only locks out others trying to get a
lock, not processes trying to do I/O. Because locks are
advisory, if one process uses locking and another doesn't, all
bets are off.
By default, the `flock' call will block until a lock is granted.
A request for a shared lock will be granted as soon as there is
no exclusive locker. A request for an exclusive lock will be
granted as soon as there is no locker of any kind. Locks are on
file descriptors, not file names. You can't lock a file until
you open it, and you can't hold on to a lock once the file has
been closed.
Here's how to get a blocking shared lock on a file, typically
used for reading:
use 5.004;
use Fcntl qw(:DEFAULT :flock);
open(FH, "< filename") or die "can't open filename: $!";
flock(FH, LOCK_SH) or die "can't lock filename: $!";
# now read from FH
You can get a non-blocking lock by using `LOCK_NB'.
flock(FH, LOCK_SH | LOCK_NB)
or die "can't lock filename: $!";
This can be useful for producing more user-friendly behaviour by
warning if you're going to be blocking:
use 5.004;
use Fcntl qw(:DEFAULT :flock);
open(FH, "< filename") or die "can't open filename: $!";
unless (flock(FH, LOCK_SH | LOCK_NB)) {
$| = 1;
print "Waiting for lock...";
flock(FH, LOCK_SH) or die "can't lock filename: $!";
print "got it.\n"
}
# now read from FH
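You can watch `LOCK_NB' decline a lock without needing a second process. The sketch below opens a scratch file twice and assumes a system whose `flock' is backed by a native flock(2), such as Linux or the BSDs, where separately opened handles compete even inside one process. Where `flock' is emulated with fcntl(2) the outcome may differ. HOLDER and RIVAL are arbitrary names.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempfile);

my ($tmpfh, $path) = tempfile(UNLINK => 1);   # scratch file for the demo
close($tmpfh);

open(HOLDER, "< $path") or die "can't open $path: $!";
open(RIVAL,  "< $path") or die "can't open $path: $!";

flock(HOLDER, LOCK_SH) or die "can't lock $path: $!";

# A second shared lock is compatible with the first.
my $shared_ok = flock(RIVAL, LOCK_SH | LOCK_NB) ? 1 : 0;
flock(RIVAL, LOCK_UN);

# An exclusive lock must wait for every other locker, so with
# LOCK_NB it returns false at once instead of blocking.
my $exclusive_ok = flock(RIVAL, LOCK_EX | LOCK_NB) ? 1 : 0;

print "shared request:    ", ($shared_ok    ? "granted" : "refused"), "\n";
print "exclusive request: ", ($exclusive_ok ? "granted" : "refused"), "\n";

close(RIVAL);
close(HOLDER);
```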
To get an exclusive lock, typically used for writing, you have
to be careful. We `sysopen' the file so it can be locked before
it gets emptied. You can get a nonblocking version using
`LOCK_EX | LOCK_NB'.
use 5.004;
use Fcntl qw(:DEFAULT :flock);
sysopen(FH, "filename", O_WRONLY | O_CREAT)
or die "can't open filename: $!";
flock(FH, LOCK_EX)
or die "can't lock filename: $!";
truncate(FH, 0)
or die "can't truncate filename: $!";
# now write to FH
Finally, due to the uncounted millions who cannot be dissuaded
from wasting cycles on useless vanity devices called hit
counters, here's how to increment a number in a file safely:
use Fcntl qw(:DEFAULT :flock);
sysopen(FH, "numfile", O_RDWR | O_CREAT)
or die "can't open numfile: $!";
# autoflush FH
$ofh = select(FH); $| = 1; select ($ofh);
flock(FH, LOCK_EX)
or die "can't write-lock numfile: $!";
$num = <FH> || 0;
seek(FH, 0, 0)
or die "can't rewind numfile: $!";
print FH $num+1, "\n"
or die "can't write numfile: $!";
truncate(FH, tell(FH))
or die "can't truncate numfile: $!";
close(FH)
or die "can't close numfile: $!";
SEE ALSO
The `open' and `sysopen' functions in perlfunc(1); the standard
open(2), dup(2), fopen(3), and fdopen(3) manpages; the POSIX
documentation.
AUTHOR and COPYRIGHT
Copyright 1998 Tom Christiansen.
When included as part of the Standard Version of Perl, or as
part of its complete documentation whether printed or otherwise,
this work may be distributed only under the terms of Perl's
Artistic License. Any distribution of this file or derivatives
thereof outside of that package require that special
arrangements be made with copyright holder.
Irrespective of its distribution, all code examples in these
files are hereby placed into the public domain. You are
permitted and encouraged to use this code in your own programs
for fun or for profit as you see fit. A simple comment in the
code giving credit would be courteous but is not required.
HISTORY
First release: Sat Jan 9 08:09:11 MST 1999
Do you not have the open(2) manpage? Here's the one from
my system. Note very carefully what the pathname is.
======================================================================
OPEN(2) OpenBSD Programmer's Manual OPEN(2)
NAME
open - open or create a file for reading or writing
SYNOPSIS
#include <fcntl.h>
int
open(const char *path, int flags, mode_t mode);
DESCRIPTION
The file name specified by path is opened for reading and/or writing as
specified by the argument flags and the file descriptor returned to the
calling process. The flags argument may indicate the file is to be cre-
ated if it does not exist (by specifying the O_CREAT flag), in which case
the file is created with mode mode as described in chmod(2) and modified
by the process' umask value (see umask(2)).
The flags specified are formed by OR'ing the following values
O_RDONLY open for reading only
O_WRONLY open for writing only
O_RDWR open for reading and writing
O_NONBLOCK do not block on open or for data to become available
O_APPEND append on each write
O_CREAT create file if it does not exist
O_TRUNC truncate size to 0
O_EXCL error if create and file exists
O_SHLOCK atomically obtain a shared lock
O_EXLOCK atomically obtain an exclusive lock
Opening a file with O_APPEND set causes each write on the file to be ap-
pended to the end. If O_TRUNC is specified and the file exists, the file
is truncated to zero length. If O_EXCL is set with O_CREAT and the file
already exists, open() returns an error. This may be used to implement a
simple exclusive access locking mechanism. If O_EXCL is set and the last
component of the pathname is a symbolic link, open() will fail even if
the symbolic link points to a non-existent name. If the O_NONBLOCK flag
is specified, do not wait for the device or file to be ready or avail-
able. If the open() call would result in the process being blocked for
some reason (e.g., waiting for carrier on a dialup line), open() returns
immediately. This flag also has the effect of making all subsequent I/O
on the open file non-blocking.
When opening a file, a lock with flock(2) semantics can be obtained by
setting O_SHLOCK for a shared lock, or O_EXLOCK for an exclusive lock.
If creating a file with O_CREAT, the request for the lock will never fail
(provided that the underlying filesystem supports locking).
If successful, open() returns a non-negative integer, termed a file de-
scriptor. It returns -1 on failure. The file pointer used to mark the
current position within the file is set to the beginning of the file.
When a new file is created it is given the group of the directory which
contains it.
The new descriptor is set to remain open across execve(2) system calls;
see close(2) and fcntl(2).
The system imposes a limit on the number of file descriptors open simul-
taneously by one process. getdtablesize(3) returns the current system
limit.
ERRORS
The named file is opened unless:
[ENOTDIR] A component of the path prefix is not a directory.
[ENAMETOOLONG]
A component of a pathname exceeded {NAME_MAX} characters,
or an entire path name exceeded {PATH_MAX} characters.
[ENOENT] O_CREAT is not set and the named file does not exist.
[ENOENT] A component of the path name that must exist does not ex-
ist.
[EACCES] Search permission is denied for a component of the path
prefix.
[EACCES] The required permissions (for reading and/or writing) are
denied for the given flags.
[EACCES] O_CREAT is specified, the file does not exist, and the di-
rectory in which it is to be created does not permit writ-
ing.
[ELOOP] Too many symbolic links were encountered in translating the
pathname.
[EISDIR] The named file is a directory, and the arguments specify it
is to be opened for writing.
[EINVAL] The flags specified for opening the file are not valid.
[EROFS] The named file resides on a read-only file system, and the
file is to be modified.
[EMFILE] The process has already reached its limit for open file de-
scriptors.
[ENFILE] The system file table is full.
[ENXIO] The named file is a character special or block special
file, and the device associated with this special file does
not exist.
[EINTR] The open() operation was interrupted by a signal.
[EOPNOTSUPP] O_SHLOCK or O_EXLOCK is specified but the underlying
filesystem does not support locking.
[ENOSPC] O_CREAT is specified, the file does not exist, and the di-
rectory in which the entry for the new file is being placed
cannot be extended because there is no space left on the
file system containing the directory.
[ENOSPC] O_CREAT is specified, the file does not exist, and there
are no free inodes on the file system on which the file is
being created.
[EDQUOT] O_CREAT is specified, the file does not exist, and the di-
rectory in which the entry for the new file is being placed
cannot be extended because the user's quota of disk blocks
on the file system containing the directory has been ex-
hausted.
[EDQUOT] O_CREAT is specified, the file does not exist, and the us-
er's quota of inodes on the file system on which the file
is being created has been exhausted.
[EIO] An I/O error occurred while making the directory entry or
allocating the inode for O_CREAT.
[ETXTBSY] The file is a pure procedure (shared text) file that is be-
ing executed and the open() call requests write access.
[EFAULT] path points outside the process's allocated address space.
[EEXIST] O_CREAT and O_EXCL were specified and the file exists.
[EOPNOTSUPP] An attempt was made to open a socket (not currently imple-
mented).
[EAGAIN] O_NONBLOCK and either O_EXLOCK or O_SHLOCK are set and the
file is already locked.
SEE ALSO
chmod(2), close(2), dup(2), flock(2), lseek(2), read(2), umask(2),
write(2), getdtablesize(3)
HISTORY
An open() function call appeared in Version 6 AT&T UNIX.
OpenBSD 2.5 November 16, 1993
======================================================================
I'll be kind and not make you read through namei.c; suffice it
to say, open paths are not what you think they are. This is
an excerpt from the perlfaq9 manpage, standard with every release
of Perl for the last couple of years:
======================================================================
NAME
perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23
16:08:30 $)
DESCRIPTION
This section deals with questions related to networking, the
internet, and a few on the web.
[...deletia...]
How do I fetch an HTML file?
One approach, if you have the lynx text-based HTML browser
installed on your system, is this:
$html_code = `lynx -source $url`;
$text_data = `lynx -dump $url`;
The libwww-perl (LWP) modules from CPAN provide a more powerful
way to do this. They don't require lynx, but like lynx, can
still work through proxies:
# simplest version
use LWP::Simple;
$content = get($URL);
# or print HTML from a URL
use LWP::Simple;
getprint "http://www.sn.no/libwww-perl/";
# or print ASCII from HTML from a URL
# also need HTML-Tree package from CPAN
use LWP::Simple;
use HTML::Parse;
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.perl.com/");
defined $html
or die "Can't fetch HTML from http://www.perl.com/";
$ascii = HTML::FormatText->new->format(parse_html($html));
print $ascii;
How do I automate an HTML form submission?
If you're submitting values using the GET method, create a URL
and encode the form using the `query_form' method:
use LWP::Simple;
use URI::URL;
my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
$url->query_form(module => 'DB_File', readme => 1);
$content = get($url);
If you're using the POST method, create your own user agent and
encode the content appropriately.
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
$ua = LWP::UserAgent->new();
my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
[ module => 'DB_File', readme => 1 ];
$content = $ua->request($req)->as_string;
[...deletia...]
======================================================================
Well, if you've read this far, you not only have your answer, you're
also a lot better informed than you were before. :-) To be honest,
I don't know why nobody has written a more magical version of open
that does what you would like. Larry Wall has expressed that he
wouldn't mind seeing this. Actually, I do know why they haven't.
It's because LWP's architecture does not lend itself to incremental
processing. You might also want to push an html2text post-processing
filter on that. Oh well, might as well just open a pipe from lynx.
H T H + H A N D
o h e   a   i a
p i l   v   c y
e s p   e   e
    s
--tom
======================================================================
PS #1: (this was used in the closing)
#!/usr/bin/perl
# rot90 - tchrist@perl.com
$/ = '';
# uncomment for easier to read, but not reversible
#@ARGV = map { "fmt -20 $_ |" } @ARGV;
while ( <> ) {
chomp;
@lines = split /\n/;
$MAXCOLS = -1;
for (@lines) { $MAXCOLS = length if $MAXCOLS < length; }
@vlines = ( " " x @lines ) x $MAXCOLS;
for ( $row = 0; $row < @lines; $row++ ) {
for ( $col = 0; $col < $MAXCOLS; $col++ ) {
$char = ( length($lines[$row]) > $col )
? substr($lines[$row], $col, 1)
: ' ';
substr($vlines[$col], $row, 1) = $char;
}
}
for (@vlines) {
# uncomment for easier to read, but not reversible
s/(.)/$1 /g;
print $_, "\n";
}
print "\n";
}
======================================================================
PS #2: (this was used in the opening)
#!/usr/bin/perl
# boxit - quote paragraphs prettily
# tchrist@perl.com
$/ = '';
while (<>) {
# /m makes ^ and $ match at line boundaries (replaces the obsolete $*)
s/^-- ?$//m if eof;
s/^[-+]{2}\w+$//m if eof;
# split into a named array; newer perls no longer split into @_
next unless @lines = split(/\n/);
$max = 0;
for (@lines) {
1 while s/\t+/' 'x (length($&) * 8 - length($`) % 8)/e;
$max = ($max > length) ? $max : length;
}
$edge = "+" . "-" x ($max+2) . "+\n";
print $edge;
for (@lines) { printf "| %-${max}s |\n", $_; }
print $edge, "\n";
}
--
"They acquit the vultures but condemn the doves."
- Juvenal
------------------------------
Date: Fri, 04 Jun 1999 15:34:31 -0700
From: David Cassell <cassell@mail.cor.epa.gov>
Subject: Re: Opening a remote file?
Message-Id: <37585477.5D5E55D@mail.cor.epa.gov>
Thurley wrote:
>
> To my surprise, I found that
> "open(DATA,"http://www.somewhere.com/myfiles/data.html");" wouldn't work in
> my CGI.
You shouldn't be surprised. URLs are not a filesystem. They
just provide a notation which *looks* like a single filesystem
to the user. In fact, they require the HTTP protocol to find
and open links. open() works on files [in the unix sense of
the word].
> Does anyone know of an open function which knows what a URL or how
> to open a file on a remote server. I need this as my site is split accross
> servers.
Since you're using M$ Outlook Distress, I'm guessing you're
using win32 platforms. In that case, you probably have
ActiveState Perl installed on them. So you already have
the LWP::Simple module on your system(s). Just read the
docs that come with it (HTML tree if you prefer - it's on
your Start menu) and you'll find a nice set of examples
etc. for your viewing pleasure.
HTH,
David
--
David Cassell, OAO cassell@mail.cor.epa.gov
Senior computing specialist
mathematical statistician
------------------------------
Date: 12 Dec 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Special: Digest Administrivia (Last modified: 12 Dec 98)
Message-Id: <null>
Administrivia:
Well, after 6 months, here's the answer to the quiz: what do we do about
comp.lang.perl.moderated. Answer: nothing.
]From: Russ Allbery <rra@stanford.edu>
]Date: 21 Sep 1998 19:53:43 -0700
]Subject: comp.lang.perl.moderated available via e-mail
]
]It is possible to subscribe to comp.lang.perl.moderated as a mailing list.
]To do so, send mail to majordomo@eyrie.org with "subscribe clpm" in the
]body. Majordomo will then send you instructions on how to confirm your
]subscription. This is provided as a general service for those people who
]cannot receive the newsgroup for whatever reason or who just prefer to
]receive messages via e-mail.
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V8 Issue 5892
**************************************