[6298] in Perl-Users-Digest
Perl-Users Digest, Issue: 920 Volume: 7
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 10 03:13:24 1997
Date: Mon, 10 Feb 97 00:00:25 -0800
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 10 Feb 1997 Volume: 7 Number: 920
Today's topics:
Re: ??? Formating numbers (Dave Thomas)
[Q] C++/perl interface <wade@cs.ualberta.ca>
Re: Advice: Asynchronous Socket Comm. <rick@agenetra.theage.com.au>
Efficient keyword/value lookups using keyword substring <bdg@endpoint.com>
Re: Help is anyone any good with sockets (Phil Gross)
Re: Help please (Tad McClellan)
Re: HELP: How to speed up Perl scripts ? <nick@mail.g440.com>
HTML layout manager like Tk <sfarrell@phaedrus.uchicago.edu>
Re: opening a URL from my perl script (Nathan V. Patwardhan)
Pattern Match Prob (Claudia Ma)
Re: Pattern Match Prob (brian d foy)
Re: Perl vs Korn Shell (Abigail)
Re: POP Mail module. PLEASE HELP! (Shishir Gundavaram)
sgmlstripper not stripping (or how to turn html to text (Phil Gross)
Re: sgmlstripper not stripping (or how to turn html to (Dave Thomas)
Re: Using $ENV{'QUERY_STRING'} (Tad McClellan)
Re: Using $ENV{'QUERY_STRING'} <morpheus+@andrew.cmu.edu>
Digest Administrivia (Last modified: 8 Jan 97) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 9 Feb 1997 23:00:58 GMT
From: dave@fast.thomases.com (Dave Thomas)
Subject: Re: ??? Formating numbers
Message-Id: <slrn5fsll4.b4s.dave@fast.thomases.com>
On 9 Feb 1997 22:56:20 GMT, Daniel L. Kreitz <kreitzd@atlantic.net> wrote:
> How do I get my variables to print out in a special format--specifically,
> currency formats? The output removes the extra zeros (i.e., $3.5 instead of
> $3.50). How do I fix this?
>
(s)printf
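A minimal sketch of the (s)printf approach, for illustration only. The
helper name format_currency is made up for this example and is not part
of Perl; the "%.2f" conversion is what does the work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format a number as dollars with exactly two decimals.
sub format_currency {
    my ($amount) = @_;
    return sprintf('$%.2f', $amount);   # %.2f pads or rounds to two decimals
}

print format_currency(3.5), "\n";       # prints $3.50
printf("\$%.2f\n", 12);                 # printf prints directly: $12.00
```

sprintf returns the formatted string, printf prints it; the format
string is the same for both.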
--
_________________________________________________________________________
| Dave Thomas - Dave@Thomases.com - Unix and systems consultancy - Dallas |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------
Date: 09 Feb 1997 20:56:06 -0700
From: Wade Holst <wade@cs.ualberta.ca>
Subject: [Q] C++/perl interface
Message-Id: <r73ev5qppl.fsf@sunchild.cs.ualberta.ca>
I am having problems finding (or just verifying that what I have is) the
latest and greatest in perl documentation for various problems. I have been
perusing the CPAN sites, and it appears that almost all of the files under
CPAN/doc were last modified in Jan-Feb of 1996. Are these truly the latest
versions?
Some areas on which I need documentation:
- IMPORTANT: The documentation for how perl interacts with C always
mentions that it is also possible to have perl and C++ communicate with
one another. However, I have found only one example of C++ code, and
no explicit discussion of how to link perl and C++ code together.
The 'c_plus_plus' code in CPAN requires a remake of the perl code -
does that mean that C++ is NOT handled without the patch supplied
with c_plus_plus? h2xs seems to generate code that knows about C++
(at least, there are preprocessor directives doing conditional inclusion
based on whether __cplusplus is defined or not) - this seems to imply
that the official "latest" version should work. Very few extensions
are written in C++, and those that exist are pre-alpha
(e.g. SGML::SP, Neural, Marpa).
SUMMARY: I have a very large C++ application with a small number of
interface routines. I want to write XS code that allows me to
invoke these interface routines from perl. How do I do it?
- Although the CPAN/doc/manual/info/perl-info.tar.gz exists, it would be
nice if a robust 'pod2texinfo' script were provided with the distribution
so that info files for *all* modules can be created. Are there any plans
for this? Note that the CPAN/doc/pod2x/pod2texinfo script generates
texinfo files which cause errors during conversion to info format. The
author mentions in the script that it is not yet robust - has anyone done
more?
Summary of my perl5 (5.0 patchlevel 3 subversion 0) configuration:
Platform:
osname=sunos, osver=4.1.4, archname=sun4-sunos
uname='sunos sunchild 4.1.4 1 sun4m '
hint=recommended, useposix=true, d_sigaction=define
Compiler:
cc='gcc', optimize='-O', gccversion=2.7.2
cppflags='-I/usr/local/include -I/usr/gnu/include'
ccflags ='-I/usr/local/include -I/usr/gnu/include'
stdchar='unsigned char', d_stdstdio=define, usevfork=false
voidflags=15, castflags=0, d_casti32=define, d_castneg=define
intsize=4, alignbytes=8, usemymalloc=y, randbits=31
Linker and Libraries:
ld='ld', ldflags =' -L/usr/local/lib -L/usr/gnu/lib'
libpth=/usr/local/lib /usr/gnu/lib /lib /usr/lib /usr/ucblib
libs=-lnsl -ldbm -ldl -lm -lc -lposix
libc=/lib/libc.so.1.9, so=so
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=, ccdlflags=' '
cccdlflags='-fpic', lddlflags='-assert nodefinitions -L/usr/local/lib -L/usr/gnu/lib'
@INC: . /usr/dapp3/grad/wade/root/lib/Perl /usr/dapp3/grad/wade/.r/dnd/src /usr/dapp3/grad/wade/root/lib/perl5 /usr/dapp3/grad/wade/Research/DTF/DTF/Exec /usr/dapp3/grad/wade/Programs/Languages/Perl5/Tk400.200/blib/arch /usr/dapp3/grad/wade/Programs/Languages/Perl5/Tk400.200/blib/lib /usr/dapp3/grad/wade/root/lib/perl5/sun4-sunos/5.003 /usr/dapp3/grad/wade/root/lib/perl5 /usr/dapp3/grad/wade/root/lib/perl5/site_perl/sun4-sunos /usr/dapp3/grad/wade/root/lib/perl5/site_perl .
------------------------------
Date: Mon, 10 Feb 1997 14:56:56 +1100
From: Richard Kirk <rick@agenetra.theage.com.au>
Subject: Re: Advice: Asynchronous Socket Comm.
Message-Id: <32FE9C88.3DCC@agenetra.theage.com.au>
alex@ceara.net wrote:
>
> I am somewhat of a beginner with regards to socket programming, if someone
> could give me some pointers on the following I would greatly appreciate it.
>
> I am trying to implement 2-way socket communication between a Java Applet Client
> and a Perl Server. I am using the server example from the second edition
> of the Camel book p. 350-351.
<Stuff deleted>
O.K. I had this same problem not so long back and managed to figure it
out from the blue Camel book. I've now written a multiway client/server
app that seems to work every time. I am posting the code here in the
hope that
A) People like yourself may get benefit from it, and
B) People out there who know better may be able to point out the
mistakes I am making, e.g. there is no code to handle timeouts etc.
I'd be interested to hear from anyone with some helpful feedback. Like I
said, this code works, but I'd like it to be bulletproof next time I use
it :)
The Server:
#!/usr/local/bin/perl -w
require 5.003; #Uses perl 5.003 minimum.
use Socket; #Uses Socket.pm for compatibility.
$SERVERPORT=8000; #Service port.
$USER="nobody"; #default username.
$NAME="nowhere"; #Default remote hostname.
$LOG="/tmp/server.log"; #Server log file.
setsocket(); #Set up the server socket.
sub logmsg; #Forward declaration.
my $paddr;
my $waitedpid=0;
$SIG{CHLD}=\&REAPER; #Setup signal handler.
while(1)
{
for ( $waitedpid=0 ; ($paddr = accept(Client,Server)) || $waitedpid;
$waitedpid=0,close Client)
{
next if $waitedpid;
my ($port,$iaddr) = sockaddr_in($paddr);
$NAME = gethostbyaddr($iaddr,AF_INET);
my $addr = inet_ntoa($iaddr);
my $logreq;
select(Client); $|=1; select(STDOUT);
handleconnect();
}
}
sub REAPER()
{
$SIG{CHLD}=\&REAPER;
$waitedpid=wait;
}
sub logmsg
{
if (open (LOG,">>$LOG"))
{
my $time=scalar localtime;
#print "$time : @_\n";
print LOG "$time : @_\n";
close (LOG);
}
}
sub handleconnect()
{
my $pid;
if (!defined($pid=fork))
{
logmsg "cannot fork: $!";
return;
}
elsif ($pid)
{
return; #I'm the parent
}
#else I'm the child.
$request=<Client>;
chomp ($request); # Read request from client
($request,$USER)=split(/,/,$request); # Chop out the username for logging.
if ((length($request) > 50))
{
$logreq=substr($request,0,45); # Keep it short for the log
$logreq.="....";
}
else
{
$logreq=$request;
}
logmsg "$$ : $USER at $NAME requests function ($logreq)."; # Record connection.
@arguments=split(/ +/,$request); #make array of args.
@results=docommand(@arguments); #Perform the required task.
foreach $result (@results) #Send back results line by line.
{
chomp($result);
print Client "$result\n";
}
exit;
}
sub setsocket()
{
my $port = $SERVERPORT;
my $proto = getprotobyname('tcp');
socket (Server,PF_INET,SOCK_STREAM,$proto) || die "Unable to get socket: $!";
setsockopt (Server,SOL_SOCKET,SO_REUSEADDR,pack("l", 1)) || die "Unable to setsockopt: $!";
bind (Server,sockaddr_in($port,INADDR_ANY)) || die "Unable to bind: $!";
listen (Server,SOMAXCONN) || die "Unable to listen: $!";
logmsg "-=> Version Database server (pid : $$) started on port $port. <=-";
select(Server); $|=1; select(STDOUT);
}
sub docommand()
{
my @args=@_;
my $numargs=@args;
my @results;
$results[0]="Invalid server request (@args).\n"; #default unknown result.
$results[0]=scalar localtime if ($args[0] eq "time");
@results="Available Commands : time, help." if ($args[0] eq "help");
return(@results);
}
The Client:
#!/usr/local/bin/perl -w
require 5.003; # Needs at least perl v5.003.
use Socket; # Uses Socket.pm
use Curses; # and Curses.
$MACHINE="<Enter name here>"; # Server machine.
$SERVPORT="<Enter Port Here>"; # Base server port No.
my $waitedpid=0;
$SIG{CHLD}=\&REAPER;
MAIN:
{
if (@ARGV == 0) #No command line args so
{
#do gui stuff here;
}
else
{
@output=sendget(@ARGV); #processes args in @ARGV from command line.
print @output; #return output to STDOUT.
}
exit;
}
sub REAPER()
{ $SIG{CHLD}=\&REAPER;
$waitedpid=wait;
}
#####################################################################################################
# sendget()
# Sends the request to the server and receives
# the output in array @output
# Returns array @output.
#####################################################################################################
sub sendget()
{
my @args=@_; #Read in args.
my $USER=$ENV{"USER"}; #Get the Username.
$USER="nobody" if (! ($USER)); #Default if none found.
my($remote) = $MACHINE; #Server machine.
my($serverport) = $SERVPORT; #Base server port No.
my $pid;
get_socket($remote,$serverport);#Establish a connection to the server.
select (SOCK); $|=1; select(STDOUT);
if (!defined($pid=fork))
{
die "Unable to fork child client : $!";
return;
}
elsif ($pid)
{
@output=<SOCK>; #Parent gets output from the server.
return(@output);
}
print SOCK "@args,$USER\n"; #Child sends request to server
exit; #and dies.
}
#####################################################################################################
# get_socket()
# Opens a connection to "serverport" on "remote" machine.
# Initialises the SOCK filehandle or dies in the
# process.
#####################################################################################################
sub get_socket()
{
my($remote,$serverport)=@_;
my($iaddr, $paddr, $proto);
if ($serverport =~ /\D/)
{
$serverport = getservbyname($serverport, 'tcp');
}
die "No port" unless $serverport;
$iaddr = inet_aton($remote) || die "no host: $remote";
$paddr = sockaddr_in($serverport, $iaddr);
$proto = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "Unable to get socket: $!";
connect(SOCK, $paddr) || die "Unable to connect to server: $!";
}
Okay, there it is. Probably not very pretty to those more guruish than
me, which is most :)
It also handles several clients at once (strictly speaking by forking a
child process per connection, not by threads). You might want to put a
counter on the server to monitor the number of simultaneous connections
you have got and limit it if necessary. I didn't need to do it for this
application, it's not that busy...
Hope this helps.
Constructive criticism welcome!!!
-Rick.
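The connection counter suggested above could look roughly like the
sketch below. It is not part of the posted server: the names
$MAX_CHILDREN, spawn_child, reap_children and live_children are
hypothetical, and the limit of 5 is arbitrary. It also reaps with
waitpid and WNOHANG in a loop, because one SIGCHLD can stand for several
exited children and a single wait would then leave zombies behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

my $MAX_CHILDREN = 5;       # hypothetical limit, tune to taste
my %children;               # pid => 1 for every child not yet reaped

# Reap every child that has already exited; returns how many were reaped.
sub reap_children {
    my $reaped = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $children{$pid};
        $reaped++;
    }
    return $reaped;
}

# Number of children we still believe to be running.
sub live_children {
    return scalar keys %children;
}

# Fork a worker unless the limit is reached.
# Returns the child's pid, or undef if refused or if fork failed.
sub spawn_child {
    my ($work) = @_;
    reap_children();
    return undef if live_children() >= $MAX_CHILDREN;
    my $pid = fork;
    return undef unless defined $pid;
    if ($pid == 0) {            # child: do the work, then leave quietly
        $work->();
        POSIX::_exit(0);
    }
    $children{$pid} = 1;        # parent: remember the child
    return $pid;
}
```

In a server like the one above, spawn_child would take the place of the
bare fork in handleconnect, and a refused connection could be closed
with a "server busy" line.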
> Alex Panagides
> Ceara, Brazil
------------------------------
Date: Mon, 10 Feb 1997 00:01:02 -0600
From: Ben Goldstein <bdg@endpoint.com>
Subject: Efficient keyword/value lookups using keyword substrings
Message-Id: <32FEB99D.7705@endpoint.com>
The associative arrays perl provides are great for doing fast lookups
using keywords, but apparently the keywords can't be regular expressions
(If they can, please tell me how!) so how does one go about efficiently
doing keyword/value lookups when only substrings are provided for the
keywords? (Yes, if "keyword substrings" are used, then potentially
multiple keyword matches may be generated for any given lookup, so I
suppose a list will have to be returned.)
I thought this would be a straightforward (and common) task to do in
perl, but I'm stumped. Of course, a tree structure of some kind would
handle the job. Is there a library module somewhere for this that I've
missed, or some other perl hack that finesses the problem?
P.S., Yes, I've done my level-best to read the FAQs! I guess I'm just
dense.
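For what it's worth, the simplest approach is a linear scan over the
keys. The sketch below assumes the fragment is a literal substring, not
a regular expression, and the sub names keys_containing and
lookup_by_substring are made up for illustration. Every lookup costs
time proportional to the number of keys, so for very large tables a
tree or a precomputed index may still be needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return, sorted, the keys of %$table that contain $fragment literally.
sub keys_containing {
    my ($table, $fragment) = @_;
    my @hits = grep { index($_, $fragment) >= 0 } keys %$table;
    return sort @hits;
}

# Return key/value pairs for every key that contains $fragment.
sub lookup_by_substring {
    my ($table, $fragment) = @_;
    my %found;
    $found{$_} = $table->{$_} for keys_containing($table, $fragment);
    return %found;
}

my %color = (apple => 'red', pineapple => 'yellow', plum => 'purple');
my %hits  = lookup_by_substring(\%color, 'apple');
print "$_ => $hits{$_}\n" for sort keys %hits;
# prints:
# apple => red
# pineapple => yellow
```

Swapping the index() test for a pattern match ($_ =~ /$pattern/) would
turn the same scan into a regular-expression lookup.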
------------------------------
Date: 9 Feb 1997 17:29:28 -0500
From: philbo@bronze.lcs.mit.edu (Phil Gross)
Subject: Re: Help is anyone any good with sockets
Message-Id: <5dlj48$o49@bronze.lcs.mit.edu>
I'm using w3mir, which can get single pages, or lots of pages. It's
available at http://www.ifi.uio.no/~janl/w3mir.html
Good luck!
--Phil Gross
Philbo's Omnibus
http://www.philbo.com
------------------------------
Date: Sun, 9 Feb 1997 17:16:19 -0600
From: tadmc@flash.net (Tad McClellan)
Subject: Re: Help please
Message-Id: <3slld5.lj3.ln@localhost>
[ emailed, posted 'cause it would appear he is MUCH too busy to
take the time to come back here to get the answer ]
Nathan V. Patwardhan (nvp@shore.net) wrote:
: Magnus Gräsberg (magnus@grasberg.se) wrote:
: : Subject: Help please
^^^^^^^^^^^
Golly Magnus, I'd bet you would get more help with your Usenet
postings if you could be bothered to actually put a subject in
your Subject: ...
This seems to be a recurring phenomenon for you.
Here we have "Help please".
( where you asked a question that had NOTHING to do with Perl )
Your only other post to this newsgroup was "Cgi script help, please!"
While your post to comp.lang.perl.tk was "help please...."
( it also had NOTHING to do with tk )
While your post to comp.lang.perl.modules was
"help please.....with pearl script!"
( Where you asked the SAME question as in the "Cgi script help, please!"
posted to comp.lang.perl.misc AND the "help please...." posted to
comp.lang.perl.tk. What you have done there is known on Usenet
as 'spam'. It is NOT GOOD to spam, nor is it good to be known as a
spammer, as you now are.
)
In the comp.lang.java.* newsgroups you branched out to "Help needed!"
and "Help please!"
: : Is it possible to send PARAMS from a text fil to a java applet instead
: : of giving the params in the HTML code???
: Check comp.infosystems.www.authoring.cgi. This group has *nothing* to
^^^^^^^^^^^^
: do with Java.
^^^^^^^^^^^^
( Other than drinking significant quantities of the stuff. Preferably
_before_ attempting to answer questions here... ;-)
Golly Magnus, I'd bet you would get more help with your Usenet
postings if you could be bothered to actually post to an on-topic
newsgroup...
: : Pleae mail me the answer..........
Golly Magnus, I'd bet you would get more help with your Usenet
postings if you could be bothered to come back here to get
your answer...
It appears you can't be bothered.
Know something?
Neither can I.
Now let me see... Press the 'K' key while holding down the Control key...
--
Tad McClellan SGML Consulting
Tag And Document Consulting Perl programming
tadmc@flash.net
------------------------------
Date: Sun, 09 Feb 1997 19:34:25 +0000
From: Nick Bauman <nick@mail.g440.com>
To: konink@telebyte.nl
Subject: Re: HELP: How to speed up Perl scripts ?
Message-Id: <32FE26C0.2DC0@mail.g440.com>
There is a Perl Compiler in alpha right now. (check into CPAN for this)
I have tried it and it didn't work for me. It was looking for some
modules I didn't have. What it does is generate C source code from your
Perl program by (I think) walking the internal form that perl compiles
your program into. You then compile the C source.
There is also a way to dump a snapshot of your Perl program using the -u
switch that can be 'undumped' into a binary on some architectures. It
will, however, be HUGE (>600k) and not likely to be faster _running_,
but will appear to be invoked faster because it doesn't need to be
compiled.
Hope this helps.
-N
------------------------------
Date: Mon, 10 Feb 1997 06:27:15 GMT
From: stephen farrell <sfarrell@phaedrus.uchicago.edu>
Subject: HTML layout manager like Tk
Message-Id: <87bu9tmb0c.fsf@phaedrus.uchicago.edu>
having been working in PERL/CGI (like the rest of the world) for quite
a while now, i've stumbled across an idea that i think would solve the
embedding HTML in perl code problem. i can't use the PHP/FI, web.sql,
or similar approaches i've seen. i can see their use, but my stuff is
overwhelmingly perl code, and is more of an application than a
collection of documents. as such, it seems that it would be much
nicer to have a layout manager for perl that outputted HTML.
basically like Tk or the java layout managers. in fact, it would be
quite nice if it were compatible with Tk Perl, albeit supporting only
a subset of Tk's functionality.
so you'd create a template for your pages, and then could add buttons,
forms, etc as in Tk Perl. it would use tables to pack (or possibly
alternative, like <pre> packing for old browsers or specifying exact
pixels for new browsers as promised in NS4). most importantly, it
would handle icky stuff like getting the "name" correct for items on a
long form, get nasty HTML out of perl cgi code, and make large forms
much less irritating (especially those generated by cgi, and
*especially* those that are dynamic, e.g., have select boxes that
vary).
having thought of this, it seems so obvious that i'm hoping someone
has done it. has anyone?
(scanning CPAN...)
this looks interesting: HTML-Stream-1.36.tar.gz -- would make a great
starting point for this sort of project, just lacks the 'layout
manager' and wrappers (if you really wanted it to be easily portable
to/from TkPerl).
btw -- a simplification would be an HTML table generator for making
forms embedded in tables. basically, you pass it a hash and a list like
%tableattr = ( "height" => "100%", "width" => "100%", "balanced" => "yes" );
@form = (
{ "header" => "Whatever", "name" => "wh", "type" => "text" },
{ "header" => "Nothing", "name" => "no", "type" => "text" }
);
$thispage->pack(\%tableattr,\@form); # call this pack... (?)
$thispage->print(\*STDOUT); # or whatever, just screwing around
and would output something like:
<table width=100% height=100%>
<tr><th width=50%>Whatever</th><th width=50%>Nothing</th></tr>
<tr><td width=50%><input type="text" name="wh"></td>
<td width=50%><input type="text" name="no"></td></tr>
</table>
(note that there are obvious things missing/wrong from/with this example, but
it's just supposed to get the idea across).
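a rough sketch of such a table generator, under loud assumptions: the
sub name pack_form is invented here, it takes the attribute hash and
field list by reference, it splits the width evenly across columns, and
it does no HTML escaping at all. it is not an existing module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one header row plus one row of <input> fields.
# $attr is a hash ref of table attributes, $fields an array ref of
# hashes with "header", "name" and "type" keys.
sub pack_form {
    my ($attr, $fields) = @_;

    my $attrs = '';
    $attrs .= qq{ $_="$attr->{$_}"} for sort keys %$attr;

    # Give every column an equal share of the width.
    my $width = @$fields ? int(100 / @$fields) : 100;

    my $html = "<table$attrs>\n<tr>";
    $html .= qq{<th width="$width%">$_->{header}</th>} for @$fields;
    $html .= "</tr>\n<tr>";
    $html .= qq{<td width="$width%"><input type="$_->{type}" name="$_->{name}"></td>}
        for @$fields;
    $html .= "</tr>\n</table>\n";
    return $html;
}

my %tableattr = ( "height" => "100%", "width" => "100%" );
my @form = (
    { "header" => "Whatever", "name" => "wh", "type" => "text" },
    { "header" => "Nothing",  "name" => "no", "type" => "text" },
);
print pack_form(\%tableattr, \@form);
```

a real layout manager would keep the field names in one place, which is
the point of the exercise: the "name" values never get typed into HTML
by hand.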
thoughts? please cc: steve@farrell.org, as i find this newsgroup a
little overwhelming sometimes (too many posts to keep up with!)
------------------------------
Date: 10 Feb 1997 00:59:47 GMT
From: nvp@shore.net (Nathan V. Patwardhan)
Subject: Re: opening a URL from my perl script
Message-Id: <5dlru3$cvu@fridge-nf0.shore.net>
Mike Russ (miker@lainet.com) wrote:
: if ("lainet" eq "lainet") {
: print "location: http://www.lainet.com";
: }
How about:
if("lainet" eq "lainet") {
print("Location: http://www.lainet.com\n\n");
}
In the future, please re-direct CGI-related questions to
comp.infosystems.www.authoring.cgi. Thank you.
Hope this helps!
--
Nathan V. Patwardhan
nvp@shore.net
"What is the wind speed of a sparrow?"
------------------------------
Date: 10 Feb 1997 05:24:24 GMT
From: maclaudi@cps.msu.edu (Claudia Ma)
Subject: Pattern Match Prob
Message-Id: <5dmbe8$1nra@msunews.cl.msu.edu>
Hi there,
Can someone tell me how to use a var within s/ / /g?
Say $var1 = ABC, $var2 = A, should I say $var1 =~ m/^$var2/ ?
Thanks,
Claudia
--
============================================================
Claudia Y. Ma, Computer Science Dept., MSU
Email: maclaudi@cps.msu.edu
URL: http://www.cps.msu.edu/~maclaudi
============================================================
------------------------------
Date: Sun, 09 Feb 1997 02:02:31 -0500
From: comdog@computerdog.com (brian d foy)
Subject: Re: Pattern Match Prob
Message-Id: <comdog-0902970202310001@nntp.netcruiser>
In article <5dmbe8$1nra@msunews.cl.msu.edu>, maclaudi@cps.msu.edu
(Claudia Ma) wrote:
> Hi there,
>
> Can someone tell me how to use a var within s/ / /g?
#!/usr/bin/perl
$foo = 'csh';
$bar = 'perl';
$thing = 'just another csh hacker'; #we like csh
$thing =~ s/$foo/$bar/g; #oops, we don't like csh
print $thing;
__END__
Output:
just another perl hacker
> Say $var1 = ABC, $var2 = A, should I say $var1 =~ m/^$var2/ ?
what happened when you tried it? :)
check for this message on <URL:http://www.dejanews.com> to see
a brief discussion of what happens to variables in m//;
Subject: Re: Regular Expression Problem
From: comdog@computerdog.com (brian d foy)
Date: 1997/01/27
Message-Id: <comdog-2701970323360001@nntp.netcruiser>
References: <5cjfl5$31k@Holly.aa.net>
Newsgroups: comp.lang.perl.misc
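For the m/^$var2/ case specifically, here is a small sketch (the sub
name starts_with is made up for illustration). The variable is
interpolated into the pattern, so any regex metacharacters in it are
live unless they are quoted with \Q...\E:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if $string begins with the literal text held in $prefix.
sub starts_with {
    my ($string, $prefix) = @_;
    # \Q...\E quotes metacharacters so "." or "*" in $prefix match literally.
    return $string =~ /^\Q$prefix\E/ ? 1 : 0;
}

my ($var1, $var2) = ('ABC', 'A');
print "match\n"    if starts_with($var1, $var2);     # prints match
print "no match\n" unless starts_with('abc', 'a.');  # prints no match
```

Without the \Q, the prefix 'a.' would match 'abc', because the dot
would be treated as "any character".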
--
brian d foy <URL:http://computerdog.com>
unsolicited commercial email is not appreciated
------------------------------
Date: Mon, 10 Feb 1997 00:12:17 GMT
From: abigail@ny.fnx.com (Abigail)
Subject: Re: Perl vs Korn Shell
Message-Id: <E5D1wH.6Mw@nonexistent.com>
On 8 Feb 1997 02:06:01 GMT, Ilya Zakharevich wrote in comp.lang.perl.misc:
++ [A complimentary Cc of this posting was sent to Nathan Wagner
++ <nw@hydaspes.if.org>],
++ who wrote in article <32FB95DE.3E2D@hydaspes.if.org>:
++ > Abigail wrote:
++ >
++ > > I agree perl is "better" than sed or awk. Yet I still use awk for
++ > > two reasons:
++ > >
++ > > - Sometimes awk is less typing (or less thinking) than perl.
++ > > I just can't beat "| awk '{print $2}'" in perl.
++ >
++ > I suppose you realize this, but how about
++ > | perl -pe '(split)[1]'
++
++ Did you try this before posting? Please do the next time.
++
++ Meanwhile what about
++ | perl -lane 'print $F[2]'
That should be "| perl -lane 'print $F[1]'"
Since it isn't that obvious (two people making mistakes while trying
to point out the perl alternative), I'll stick to "| awk '{print $2}'".
No need to think twice, and less typing as well.
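The off-by-one is easier to see spelled out. A small sketch, with the
sub name awk_field invented for illustration: awk numbers fields from 1,
while the @F array that perl -a fills is indexed from 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the Nth whitespace-separated field, counting from 1 as awk does.
sub awk_field {
    my ($line, $n) = @_;
    my @F = split ' ', $line;   # the same splitting that perl -a performs
    return $F[$n - 1];          # awk's $2 is perl's $F[1]
}

print awk_field("alpha beta gamma", 2), "\n";   # prints beta
```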
Abigail
------------------------------
Date: 10 Feb 1997 06:54:02 GMT
From: shishir@ruby.ora.com (Shishir Gundavaram)
To: shishir@ora.com
Subject: Re: POP Mail module. PLEASE HELP!
Message-Id: <5dmgma$97h@amber.ora.com>
mojo (mojo@oregoncoast.com) wrote:
: Is there a POP mail module for perl?
: I know that there is an FTP and a PING one.
: Net::FTP
: and
: Net::Ping
: But is there a POP mail one?
Yep, it's called POP3Client, and you can get it by pointing your browser
at:
http://www.perl.com/cgi-bin/cpan_mod?module=Mail::POP3Client
--Shishir
------------------------------
Date: 9 Feb 1997 17:59:34 -0500
From: philbo@bronze.lcs.mit.edu (Phil Gross)
Subject: sgmlstripper not stripping (or how to turn html to text)
Message-Id: <5dlksm$o9a@bronze.lcs.mit.edu>
I've got an html file that I want to turn to plain text. I've searched
through CPAN, and found exactly what I need, sgmlstripper, by Robert
Seymour.
It works fine for most html, but I can't get it to work for one file.
Here's the relevant sgmlstripper code (c) by Robert Seymour and
Springer Verlag
## Use STDIN if no files are given
$ARGV[0] = "-" unless @ARGV;
## Strip out anything contained in an SGML markup tag. This is not
## very pretty and rather inefficient, but it does take care of tags
## which cross line or paragraph boundaries.
foreach $file (@ARGV) {
open(INPUT,$file);
while($char = getc(INPUT)) {
if($char eq "<") {
IGNORE: for(;;) {
last IGNORE if (getc(INPUT) eq ">");
}
} else {
print $char;
}
}
close(INPUT);
}
Here's the HTML (for example):
<!DOCTYPE HTML PUBLIC "html.dtd">
<HTML><BODY><H4>Massachusetts ZONE FORECASTS
</H4><P>National Weather Service Taunton MA
1107 AM EST Sun Feb 9 1997
<P>
<!--100400-->
<LI>Eastern Essex-Eastern Norfolk-Eastern Plymouth-Southeast
Middlesex-
</LI><BR>Suffolk-
<BR>Including the cities of, Gloucester, Lynn, Salem, Cambridge,
<BR>Waltham, Woburn, Boston, Quincy, Cohasset, Plymouth
<BR>1107 AM EST Sun Feb 9 1997
<FONT SIZE="-1">Last Modified: February 09, 1997</F
ONT></P>
Now, sgmlstripper should give me everything outside of angle brackets,
but it doesn't. It only gives me the first three lines (the first of
which is blank, as it should be).
Any suggestions on what's wrong with the code? It looks pretty
straightforward to me, but could it be a problem with escaping
characters?
Thanks for the help!
--Phil Gross
www.philbo.com
philbo@philbo.com
------------------------------
Date: 9 Feb 1997 23:28:42 GMT
From: dave@fast.thomases.com (Dave Thomas)
Subject: Re: sgmlstripper not stripping (or how to turn html to text)
Message-Id: <slrn5fsn93.bas.dave@fast.thomases.com>
On 9 Feb 1997 17:59:34 -0500, Phil Gross <philbo@bronze.lcs.mit.edu> wrote:
> while($char = getc(INPUT)) {
^^^^^^^^^^^^^^^^^
This is a 'C' programmer writing Perl! In Perl, numbers and strings are
(mostly) interchangeable. The number zero, and likewise the string "0", is
false in a logical context. So when the input comes across a zero character
(as in '1107 AM EST'), the loop terminates. Getc returns a null string at EOF,
so the loop has to test for that rather than for truth.
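A sketch of the loop with the end-of-file test made explicit. The sub
name strip_tags is made up for illustration, and the in-memory
filehandle (open on a scalar reference) assumes a perl newer than the
5.003 discussed here. The EOF check accepts both an undefined value and
an empty string, so it does not depend on which of the two getc hands
back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return $html with everything between "<" and ">" removed.
sub strip_tags {
    my ($html) = @_;
    open(my $in, '<', \$html) or die "cannot open in-memory handle: $!";
    my $text = '';
    while (1) {
        my $char = getc($in);
        # Stop at EOF only; a "0" character is defined and has length 1.
        last unless defined $char && length $char;
        if ($char eq '<') {
            # Skip to the closing bracket, giving up at EOF as well.
            while (1) {
                my $skip = getc($in);
                last if !defined $skip || !length $skip || $skip eq '>';
            }
        } else {
            $text .= $char;
        }
    }
    close($in);
    return $text;
}

print strip_tags("<BR>1107 AM EST Sun Feb 9 1997\n");
# prints: 1107 AM EST Sun Feb 9 1997
```

The inner loop checks for EOF too; otherwise an unterminated tag at the
end of the file would spin forever.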
As an alternative, have you had a look at HTML::FormatText in the libwww
package?
package HTML::FormatText;
# $Id: FormatText.pm,v 1.12 1996/06/09 14:49:58 aas Exp $
=head1 NAME
HTML::FormatText - Format HTML as text
=head1 SYNOPSIS
require HTML::FormatText;
$html = parse_htmlfile("test.html");
$formatter = new HTML::FormatText;
print $formatter->format($html);
Never used it myself, but it looks like it does what you want.
Regards
Dave
--
_________________________________________________________________________
| Dave Thomas - Dave@Thomases.com - Unix and systems consultancy - Dallas |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
------------------------------
Date: Sun, 9 Feb 1997 17:27:47 -0600
From: tadmc@flash.net (Tad McClellan)
Subject: Re: Using $ENV{'QUERY_STRING'}
Message-Id: <jhmld5.rl3.ln@localhost>
Kevin Woodward (kevin@redsun.com) wrote:
: I'm having trouble with $ENV variables. I'd like to set a variable
^^^^^^^^^^^^^^^^^^^^^^^^^^
: ($quy) to the string entered after the ? when called.
: I assume (from what I've read on the web) that the line should look
: like:
I think you have confused perl scripting with some of the particulars
of CGI scripting.
The QUERY_STRING environment variable is set as part of the CGI.
: $quy = $ENV{'QUERY_STRING'};
: However this fails. I've tried printing $ENV{'QUERY_STRING'} to the
: UNIX prompt, but had no luck. I have been attempting this simple
: task for many hours this weekend and this is my last resort.
: Is there some command to set up these variables, or to pass them to the
: process? Or am I simply implementing it wrong?
: I apologise for the 'non-technical' terms; I blame it on my newness to
: the language.
Now, I don't know what your _real_ question is. So I'll have to make
some guesses:
1) to set a _perl_ variable from an _environment_ variable:
$quy = $ENV{'QUERY_STRING'}; # this assumes that QUERY_STRING has
# been set to something by someone
2) to set an _environment_ variable from a _perl_ variable:
$ENV{'QUERY_STRING'} = $quy;
3) to parse the URL encoded input to a CGI script:
a) ask about CGI in the CGI newsgroup: comp.infosystems.www.authoring.cgi
b) use the CGI.pm perl module
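For guess 3, a bare-bones sketch of what decoding QUERY_STRING by hand
involves. The sub name parse_query is invented for illustration, and
the code makes simplifying assumptions: the last value wins when a name
repeats, and nothing is checked or untainted. CGI.pm handles these
cases and many more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "a=1&b=two+words" into a name => value hash.
sub parse_query {
    my ($query) = @_;
    $query = '' unless defined $query;
    my %param;
    for my $pair (split /[&;]/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                               # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # "%XX" encodes one byte
        }
        $param{$name} = $value;
    }
    return %param;
}

# Under a web server QUERY_STRING is set by the CGI; at a shell prompt
# it is normally unset, which is why printing it there shows nothing.
my %param = parse_query($ENV{'QUERY_STRING'});
print "$_ = $param{$_}\n" for sort keys %param;
```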
Hope this helps!
--
Tad McClellan SGML Consulting
Tag And Document Consulting Perl programming
tadmc@flash.net
------------------------------
Date: Mon, 10 Feb 1997 00:07:53 -0500
From: Gautam Srikanth <morpheus+@andrew.cmu.edu>
Subject: Re: Using $ENV{'QUERY_STRING'}
Message-Id: <smzeodG00YUu1IJoA0@andrew.cmu.edu>
Excerpts from netnews.comp.lang.perl.misc:
9-Feb-97 Re: Using $ENV{'QUERY_STRING'} by Tad McClellan@flash.net
> 2) to set an _environment_ variable from a _perl_ variable:
>
> $ENV{'QUERY_STRING'} = $quy;
Tad, are you sure that'll work? I just tried setting an environment
variable from inside a Perl script, and it doesn't carry through to the
calling environment (Solaris 2.5.1).
I've been under the understanding that no program can modify the calling
environment under Unix. Is there indeed some way to do so?
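A small sketch of which direction the environment travels. It assumes a
Unix-like system and a perl that supports list-form pipe opens; the sub
name child_sees is made up for illustration. Assigning to %ENV changes
the environment of the running script and of any child it starts
afterwards, while the shell that launched the script keeps its own copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start a child perl and report what it sees for one environment variable.
sub child_sees {
    my ($name) = @_;
    my $code = 'my $v = $ENV{$ARGV[0]}; print defined $v ? $v : "(unset)"';
    # $^X is the perl binary running this script; the list form avoids a shell.
    open(my $child, '-|', $^X, '-e', $code, $name)
        or die "cannot run $^X: $!";
    my $out = do { local $/; <$child> };
    close($child);
    return $out;
}

$ENV{'QUERY_STRING'} = 'name=value';        # set inside this process
print child_sees('QUERY_STRING'), "\n";     # prints name=value
```

After the script exits, echo $QUERY_STRING in the calling shell still
shows whatever it showed before.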
-- Gautam
Gautam Srikanth \ "The point of the journey
morpheus+@andrew.cmu.edu \ is not to arrive!"
www.andrew.cmu.edu/~morpheus \ (Rush, "Prime Mover")
------------------------------
Date: 8 Jan 97 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 8 Jan 97)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V7 Issue 920
*************************************