[10322] in Perl-Users-Digest
Perl-Users Digest, Issue: 3916 Volume: 8
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Oct 7 12:07:53 1998
Date: Wed, 7 Oct 98 09:01:35 -0700
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 7 Oct 1998 Volume: 8 Number: 3916
Today's topics:
Re: perl html tag parser ekaull@my-dejanews.com
Re: perl html tag parser <jdf@pobox.com>
Perl reg exp more reliable than sed? <koji.kiyokawa@which.net>
Pizza Hut advertises for CPAN miko@idocs.com
Re: Pizza Hut advertises for CPAN <wodehouse@cheerful.com>
Q: Any List of Win32 Error Codes ? (Eisen Chao)
Re: Q: Speed up a regular expression (John Moreno)
Re: Reading all lines from datafile (Mark-Jason Dominus)
Re: Reading all lines from datafile (Doran L. Barton)
Re: regexp with variable substitution values (Brand Hilton)
scope of my using () <due@whitecrow.net>
Re: scope of my using () (Mark-Jason Dominus)
Syntax Question (What the 'ell does that mean?!) (Omri Mezrich)
system command on Win32 (Joe Mahan)
typeglobs <alexb@sig.net>
Re: typeglobs <jdf@pobox.com>
Re: Win32 <adrian@pearl.demon.co.uk>
XML::Parser::Expat <michel.prevost@cactuscom.ca_REMOVE_TO_MAIL>
Special: Digest Administrivia (Last modified: 12 Mar 98) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 07 Oct 1998 13:57:05 GMT
From: ekaull@my-dejanews.com
Subject: Re: perl html tag parser
Message-Id: <6vfrvh$36t$1@nnrp1.dejanews.com>
Thanks so much.
I'm probably going to go the way of regex. I've already put 2 days into this
code and probably can't justify any more. I'll have to learn the cool way
later.
if ($listing[$index]=~/href+\s*\=+\"+(.*)\s*\"+/i)
{
# $html =~ s/(href=")([^"]*)/$1\L$2/gi; # yours
$listing[$index]=~s/(href+\s*\=+\"+)([^"]*)/$1\L$2/gi; # mine works
}
I tested it and it seems that mine works. There may be exceptions but I don't
see them yet. Whaddaya think?
Ed
In article <m3ogrphvox.fsf@joshua.panix.com>,
Jonathan Feinberg <jdf@pobox.com> wrote:
> ekaull@my-dejanews.com writes:
>
> > Jonathan Feinberg <jdf@pobox.com> wrote:
>
> > > require HTML::Filter;
> > > @ISA = qw( HTML::Filter );
> > > sub start {
> > > my $self = shift;
> > > my ($tag, $attr, $attrseq, $text) = @_;
> > > if ($tag eq 'a') {
> > > $attr->{href} = lc $attr->{href} ;
> > > $_[3] = '<a ' . (join ' ', map qq($_="$attr->{$_}"),@$attrseq) . '>';
> > > }
> > > $self->SUPER::start(@_);
> > > }
>
> > To be honest, this code is a bit over my head.
>
> You know, when I wrote and tested the example, I struggled to make it
> simple. I'd hoped that just tweaking the $attr hash itself would
> affect the HTML::Filter output, but you must re-create the raw HTML
> itself and pass it on to the superclass. Looking at it now, I'd use
> "$text" instead of "$_[3]", and then explicitly invoke
> $self->SUPER::start($tag, $attr, $attrseq, $text).
>
> In order to understand this code you'll need to read and understand
> several docs: perlref (since $attr and $attrseq are references to a
> hash, and a list, respectively), perlmod (packages), perlobj &
> perltoot (objects, the SUPER:: pseudo-package, the @ISA array), and
> especially the documentation for HTML::Parse and HTML::Filter.
>
> > If I could only get inside the quotes to change it I'd be all set.
>
> For this you'll want perlre and the book _Mastering Regular
> Expressions_. Here's a very simple idea that *will* fail in some
> cases:
>
>
>
> --
> Jonathan Feinberg jdf@pobox.com Sunny Brooklyn, NY
> http://pobox.com/~jdf
>
-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
------------------------------
Date: 07 Oct 1998 17:13:20 +0200
From: Jonathan Feinberg <jdf@pobox.com>
To: ekaull@my-dejanews.com
Subject: Re: perl html tag parser
Message-Id: <m3soh0pfqn.fsf@joshua.panix.com>
ekaull@my-dejanews.com writes:
> if ($listing[$index]=~/href+\s*\=+\"+(.*)\s*\"+/i)
^^ ^^^^^^^^^^ ^^^
this is wrong.
I don't think you understand what + does in a regex. Also, almost all
of those backslashes are unnecessary, and make it harder to read your
regex. But the most serious problem is that if the line in question
looks like
<a href="http://foo.com/">A</a> <a href="http://bar.com/">B</a>
then $1 will contain
http://foo.com/">A</a> <a href="http://bar.com/
which is certainly not what you intend. Use [^"] instead of .
> {
> $listing[$index]=~s/(href+\s*\=+\"+)([^"]*)/$1\L$2/gi; # mine works
>
> }
Why are you doing the regex twice? Just do the substitution. If the
stuff isn't there, nothing will happen to the string.
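A minimal sketch of that advice, not taken from either poster's code: one substitution, a negated character class instead of `.`, and no redundant quantifiers or backslashes. The helper name `lc_hrefs` and the sample line are made up for illustration, and allowing optional whitespace around `=` is an assumption about the input. Like any regex approach to HTML, it can still fail on unquoted or single-quoted attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the value of every double-quoted href.
sub lc_hrefs {
    my ($html) = @_;
    # $1 is the attribute name up to the opening quote (left untouched);
    # $2 is the value; \L lowercases everything that follows it.
    $html =~ s/(href\s*=\s*")([^"]*)/$1\L$2/gi;
    return $html;
}

my $line = '<a href="http://FOO.com/">A</a> <a HREF="http://Bar.com/X">B</a>';
print lc_hrefs($line), "\n";
```

Because of /g, both links on the line are handled in one pass, and a line with no href comes back unchanged.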
--
Jonathan Feinberg jdf@pobox.com Sunny Brooklyn, NY
http://pobox.com/~jdf
------------------------------
Date: Wed, 7 Oct 1998 15:33:54 +0100
From: "koji kiyokawa" <koji.kiyokawa@which.net>
Subject: Perl reg exp more reliable than sed?
Message-Id: <6vft4p$hk6$1@news.uk.ibm.com>
Hello,
Recently I had to remove a whole bunch of null characters from a
file:
So I did
sed 's/\0//' filename
But this actually removed some zeros too.
When I used Perl
$input=<FILE>;
$input=~s/\0//;
Everything worked fine. Is there a logical explanation to this?
Also, if I wanted to zap out all foreign characters (those European
o's and a's with two dots on top etc.), is there a list of character codes
for them?
Many thanks for your help
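For what it's worth, in a Perl pattern \0 is the NUL character, while a given sed may not treat \0 that way. Note also that s/\0// without /g removes only the first NUL in the string, and a single $input=<FILE> reads just one line. Here is a rough sketch of removing every NUL. The helper names are invented, and the second helper assumes single-byte text (Latin-1 or similar), where the accented letters live in the range 0x80-0xFF.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every NUL byte from a string.
sub strip_nuls {
    my ($text) = @_;
    $text =~ tr/\0//d;          # tr with /d deletes all listed characters
    return $text;
}

# Hypothetical helper: delete bytes 0x80-0xFF.
# Assumption: the data is single-byte text such as Latin-1, not UTF-8.
sub strip_high_bytes {
    my ($text) = @_;
    $text =~ tr/\x80-\xFF//d;
    return $text;
}

my $sample = "a\0b0\0c\n";
print strip_nuls($sample);      # the NULs go, the digit 0 stays
```

On a whole file, the same tr can be applied to each line inside a while (<FILE>) loop.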
------------------------------
Date: Wed, 07 Oct 1998 13:48:37 GMT
From: miko@idocs.com
Subject: Pizza Hut advertises for CPAN
Message-Id: <6vfrfl$2h2$1@nnrp1.dejanews.com>
Has anyone else noticed that the new Pizza Hut logo for their pan pizza looks
like it reads "CPAN"? I keep glancing at the TV to discover that there are
now wonderfully filmed national ads for CPAN, and I think "wow, CPAN must
really want some traffic".
-miko
--
Miko O'Sullivan
Author of The Idocs Guide to HTML
http://www.idocs.com/tags/
------------------------------
Date: Wed, 7 Oct 1998 16:25:40 +0200
From: "ch" <wodehouse@cheerful.com>
Subject: Re: Pizza Hut advertises for CPAN
Message-Id: <6vftjc$bgh$1@dinkel.civ.utwente.nl>
miko@idocs.com wrote in message <6vfrfl$2h2$1@nnrp1.dejanews.com>...
>Has anyone else noticed that the new Pizza Hut logo for their pan pizza looks
>like it reads "CPAN"? I keep glancing at the TV to discover that there are
>now wonderfully filmed national ads for CPAN, and I think "wow, CPAN must
>really want some traffic".
You're damn right (see www.pizzahut.com). I wonder how many new Perl users
we'll see before the end of the year who were triggered by unconscious
desires for CPAN.
Casper H.
http://come.to/quotes
------------------------------
Date: 7 Oct 1998 15:47:48 GMT
From: echao@interaccess.com (Eisen Chao)
Subject: Q: Any List of Win32 Error Codes ?
Message-Id: <6vg2f4$o0t$1@supernews.com>
To All:
Does anyone out there know where to find a list of all the error
codes returned by Win32 modules like NetResources? I'm getting
some non-zero error codes but I have no idea what they mean.
Thanks in advance,
Eisen
------------------------------
Date: Wed, 7 Oct 1998 11:02:20 -0500
From: phenix@interpath.com (John Moreno)
Subject: Re: Q: Speed up a regular expression
Message-Id: <1dgizeo.u7o9yz1yw0p8gN@roxboro0-005.dyn.interpath.net>
Larry Rosler <lr@hpl.hp.com> wrote:
> Larry Rosler <lr@hpl.hp.com> says...
-snip-
> >
> > Rather than start up two regex matches (in case the first one fails),
> > one should write simply:
> >
> > if ($line =~ /(?:^| )$Search /io)
>
> I felt guilty about posting conjectures about performance without
> benchmarking them, so I wrote a simple benchmark. Now I am completely
> baffled, because my reasoning^Wintuition has failed completely yet again.
>
> #!/usr/local/bin/perl -w
> use strict;
> use Benchmark;
>
> timethese (1 << (shift || 0), {
> one_regex0 => sub { $_ = 'fox'; /(?:^| )foo/ },
> one_regex1 => sub { $_ = 'foo'; /(?:^| )foo/ },
> one_regex2 => sub { $_ = ' foo'; /(?:^| )foo/ },
> two_regex0 => sub { $_ = 'fox'; /^foo/ || / foo/ },
> two_regex1 => sub { $_ = 'foo'; /^foo/ || / foo/ },
> two_regex2 => sub { $_ = ' foo'; /^foo/ || / foo/ },
> } );
> __END__
>
> Benchmark: timing 262144 iterations of one_regex0, one_regex1,
> one_regex2, two_regex0, two_regex1, two_regex2...
> one_regex0: 14 wallclock secs (14.33 usr + 0.00 sys = 14.33 CPU)
> one_regex1: 12 wallclock secs (12.29 usr + 0.00 sys = 12.29 CPU)
> one_regex2: 13 wallclock secs (13.40 usr + 0.00 sys = 13.40 CPU)
> two_regex0: 8 wallclock secs ( 8.57 usr + 0.00 sys = 8.57 CPU)
> two_regex1: 12 wallclock secs (11.80 usr + 0.00 sys = 11.80 CPU)
> two_regex2: 11 wallclock secs ( 9.29 usr + 0.00 sys = 9.29 CPU)
>
> Conclusion:
>
> Two regexes *are* faster than one regex with alternation. Why???
> (The factoring in the single regex has minor performance implications.)
>
> This is perl 5.005_02 on Wintel.
Benchmark: timing 262144 iterations of one_regex0, one_regex1,
one_regex2, two_regex0, two_regex1, two_regex2...
one_regex0: 9 secs ( 9.60 usr 0.00 sys = 9.60 cpu)
one_regex1: 12 secs (13.22 usr 0.00 sys = 13.22 cpu)
one_regex2: 14 secs (14.37 usr 0.00 sys = 14.37 cpu)
two_regex0: 14 secs (12.80 usr 0.00 sys = 12.80 cpu)
two_regex1: 15 secs (12.33 usr 0.00 sys = 12.33 cpu)
two_regex2: 18 secs (15.98 usr 0.00 sys = 15.98 cpu)
This is perl 5.004 on the Mac.
--
John Moreno
------------------------------
Date: 7 Oct 1998 10:56:59 -0400
From: mjd@op.net (Mark-Jason Dominus)
Subject: Re: Reading all lines from datafile
Message-Id: <6vfvfr$hbq$1@monet.op.net>
In article <361B3860.40EC@edoc.co.za>, Nico <info@edoc.co.za> wrote:
>while (<CAT>)
Evaluating a filehandle in angle brackets yields the next line
from that file... Ordinarily you must assign that value to a
variable, but there is one situation where an automatic
assignment happens. If and ONLY if the input symbol is the
only thing inside the conditional of a while or for(;;) loop,
the value is automatically assigned to the variable $_.
(`perlop' manual page)
So this automatically reads a line from the file into the variable $_.
>$ty =<CAT>;
And then this reads the next line into the variable $ty.
>print "$ty\n";}
And that's why you only print out the even-numbered lines.
Rewrite it like this:
while (<CAT>) {
print $_;
}
You don't need \n here because the line you read in already has a
newline on the end of it.
------------------------------
Date: 7 Oct 1998 09:31:27 -0600
From: fozz@xmission.xmission.com (Doran L. Barton)
Subject: Re: Reading all lines from datafile
Message-Id: <6vg1gf$79a$1@xmission.xmission.com>
Nico <info@edoc.co.za> writes:
>What am I doing wrong?
>#Read datalines in file
>$filename1 = 'c:\myfiles\testfile.txt';
>open(CAT,$filename1) || die "can't open file $filename1";
>while (<CAT>)
>{
>$ty =<CAT>;
>print "$ty\n";}
>close(CAT);
When you do a 'while(<CAT>)' you are setting up a while() loop in which
each iteration of the loop reads a line from the file pointed to by the
file handle CAT and places it in the default scalar variable $_. Therefore,
when you do '$ty = <CAT>' you are reading in every other line in the file
since the previous lines have already been read and assigned to the $_
variable by the 'while(<CAT>)' statement.
This is one of the many things that may confuse C programmers learning
Perl. The 'while(<CAT>)' is more than just an EOF test. It also does other
work by reading data and putting it into the default scalar variable.
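If you want the line in a named variable such as $ty rather than in $_, one way is to do the assignment in the loop condition yourself. This is only a sketch: the in-memory filehandle (which needs a perl that supports opening a scalar reference) stands in for the real CAT file so the example runs on its own.

```perl
use strict;
use warnings;

# Assumption for a self-contained demo: read from a string instead of a file.
my $data = "first\nsecond\nthird\n";
open(my $cat, '<', \$data) or die "can't open in-memory file: $!";

my @seen;
# Exactly one read per iteration; defined() keeps a line like "0" from
# ending the loop early.
while (defined(my $ty = <$cat>)) {
    chomp $ty;                  # drop the trailing newline
    push @seen, $ty;
    print "$ty\n";
}
close($cat);
```

Every line is printed, because nothing else inside the loop reads from the handle.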
Hope that helps.
-=Fozz
--
Doran L. Barton = fozz@xmission.com && http://www.xmission.com/~fozz/;
"Where do you want Microsoft to go today?" --Ron Barry <ronb@cc.usu.edu>
"This may seem a bit weird, but that's okay, because it is weird."
-- Larry Wall <lwall@sems.com> in the Perl v5 man page
------------------------------
Date: 7 Oct 1998 14:17:40 GMT
From: bhilton@tsg.adc.com (Brand Hilton)
Subject: Re: regexp with variable substitution values
Message-Id: <6vft64$7p114@mercury.adc.com>
In article <6vet89$n2u$1@statler.server.colt.net>,
Paul Makepeace <Paul.Makepeace@POBox.com> wrote:
>Brand Hilton wrote:
>> BZZZT! Thank you for playing :-)
>
>
>Paul, browned on both sides, who promises not to post after 1am BST. (Doh,
>again)
Hey, at least mine had a smiley ;-)
--
_____
|/// | Brand Hilton bhilton@adc.com
| ADC| ADC Telecommunications, ATM Transport Division
|_____| Richardson, Texas
------------------------------
Date: 7 Oct 1998 13:24:49 GMT
From: "AmD" <due@whitecrow.net>
Subject: scope of my using ()
Message-Id: <6vfq31$4i8$0@206.165.167.139>
Hi Folks,
I encountered an unexpected behavior today and was hoping someone could
explain it to me. If you run the following it will print "Hash is defined"
using Perl version 5.004_02 in Win95 (I know, I know but I have no choice in
the matter). However, if I remove the parentheses in line one so it reads:
my %my_hash
I observe the behavior I originally expected which is that nothing is
printed. Is this a bug, feature, or something to do with the scope of my?
I have read the sections on my in Programming Perl, so if the answer is
contained therein, it eludes me. Any edification would be appreciated.
Allan M. Due
---------------------------------
my (%my_hash);
%my_hash = hash_work();
print "Hash is defined\n" if defined %my_hash;
sub hash_work {
my (%in_hash);
%in_hash = ();
return %in_hash;
}
------------------------------
Date: 7 Oct 1998 11:08:23 -0400
From: mjd@op.net (Mark-Jason Dominus)
Subject: Re: scope of my using ()
Message-Id: <6vg057$hfv$1@monet.op.net>
In article <6vfq31$4i8$0@206.165.167.139>, AmD <Allan@due.net> wrote:
>If you run the following it will print "Hash is defined"
At present, `defined' is not meaningful for hashes or arrays; only for
scalars. Using it returns a bizarre result.
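To ask whether a hash has anything in it, test the hash itself in boolean context or count its keys. Below is a small sketch reusing the names from the original post; the $state and $count variables are added for illustration. Later Perl releases deprecated and then disallowed defined on a hash, so this form is also the one that keeps working.

```perl
use strict;
use warnings;

sub hash_work {
    my %in_hash = ();
    return %in_hash;            # returns an empty list
}

my %my_hash = hash_work();

# A hash in boolean context is true only if it holds at least one pair.
my $state = %my_hash ? "Hash has entries" : "Hash is empty";
print "$state\n";

# keys in scalar context gives the number of pairs.
my $count = keys %my_hash;
```

With or without the parentheses around the my declaration, the answer here is the same: the hash is empty.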
------------------------------
Date: Wed, 07 Oct 1998 14:21:20 GMT
From: mezsez@netvision.net.il (Omri Mezrich)
Subject: Syntax Question (What the 'ell does that mean?!)
Message-Id: <361e78cd.7815878@news.netvision.net.il>
$header =~ s/\n\s+/ /g; # fix continuation lines
%hdrs = (UNIX_FROM => split /^(.*?):\s*/m, $header);
It's supposed to split the entire header of a normal Unix email
message (in $header) into fields and their values.
Explain.
Thanks,
Omri Mezrich
mezsez@netvision.net.il
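A hedged walk-through of what those two lines do. The first folds continuation lines (a newline followed by whitespace) into single lines. In the second, split is given a pattern with capturing parentheses, so it returns the captured field names as well as the text between them: chunk before the first field, name, value, name, value, and so on. The leading chunk is the Unix "From " line, and prepending UNIX_FROM gives it a key so the whole list pairs up into a hash. The sample header below is invented. The sketch also swaps in the stricter pattern ([-\w]+) for (.*?), since (.*?) can match up to a colon inside the timestamp of the "From " line.

```perl
use strict;
use warnings;

# Made-up sample header for illustration.
my $header = join '',
    "From joe\@example.com Wed Oct  7 09:01:35 1998\n",
    "From: joe\@example.com\n",
    "Subject: a long\n",
    "  subject line\n",
    "Date: Wed, 7 Oct 98 09:01:35 -0700\n";

$header =~ s/\n\s+/ /g;         # fix continuation lines

# Capturing parens make split return the field names too.
# Assumption: field names consist of word characters and hyphens.
my %hdrs = (UNIX_FROM => split /^([-\w]+):\s*/m, $header);

chomp(%hdrs);                   # chomp on a hash trims the values only

print "$_ => $hdrs{$_}\n" for sort keys %hdrs;
```

Each value still carries its trailing newline after the split, which is why the sketch chomps the hash before printing.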
------------------------------
Date: Wed, 07 Oct 1998 14:42:06 GMT
From: mahan@Intone.lkg.dec.com (Joe Mahan)
Subject: system command on Win32
Message-Id: <361b7cd1.503200273@mrnews.mro.dec.com>
If I understand the docs correctly the system command, as follows,
executes my DOS command and waits for error code return:
system ("my DOS command");
Question is, when I remove the parentheses, what really happens?
system "my DOS command";
Will execute the command, but what happens if there is an error with
my DOS command?
TIA,
Joe
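As far as the language goes, the parentheses make no difference: both forms call the same function, wait for the command to finish, and return its wait status, which is also left in $?. A sketch follows, using the running perl itself ($^X) as a stand-in for "my DOS command" so it is portable. The variable names are invented, and the list form of system is used only to sidestep shell quoting.

```perl
use strict;
use warnings;

# Stand-in command (assumption): run this same perl and exit with code 3.
my @cmd = ($^X, '-e', 'exit 3');

my $status_parens = system(@cmd);   # with parentheses
my $status_bare   = system @cmd;    # without: same call, same return value

# The exit code of the command is in the high byte of the status.
my $exit_code = $status_bare >> 8;

if ($status_bare == -1) {
    print "could not start the command: $!\n";
} else {
    print "command exited with code $exit_code\n";
}
```

An error in the command is therefore never raised for you; the script has to look at the return value (or $?) in either form.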
------------------------------
Date: Wed, 07 Oct 1998 08:05:48 -0500
From: Alexander Bibighaus <alexb@sig.net>
Subject: typeglobs
Message-Id: <361B672C.78A0A4ED@sig.net>
I have been reading the book "Advanced Perl Programming" and
I am confused on typeglobs.
Can someone give me an example of how they use typeglobs?
thanks,
alexander
------------------------------
Date: 07 Oct 1998 17:29:28 +0200
From: Jonathan Feinberg <jdf@pobox.com>
To: Alexander Bibighaus <alexb@sig.net>
Subject: Re: typeglobs
Message-Id: <m3ogropezr.fsf@joshua.panix.com>
Alexander Bibighaus <alexb@sig.net> writes:
> I have been reading the book "Advanced Perl Programming" and
> I am confused on typeglobs.
>
> Can someone give me an example of how they use typeglobs?
Check out perldata, perlref, perlsub and perlmod for their discussions
of typeglobs. You'll also find some good stuff in perlfaq5 and
perlfaq7.
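In the meantime, here is a small sketch of three common uses, with all names invented for illustration: aliasing a variable, giving a subroutine a second name, and passing a filehandle to a subroutine.

```perl
use strict;
use warnings;

our @list = (1, 2, 3);
our @alias;                     # declare the name so strict accepts @alias
*alias = \@list;                # @alias is now another name for @list
push @alias, 4;                 # changes @list as well

sub greet { return "hello, $_[0]" }
*hi = \&greet;                  # install greet() under a second name
my $greeting = hi("world");

# A reference to a typeglob is a classic way to pass a filehandle around.
sub write_to {
    my ($fh, $msg) = @_;
    print {$fh} $msg;
}
write_to(\*STDOUT, "$greeting (@list)\n");
```

Assigning a reference to a glob replaces only the matching slot (array, code, and so on), which is why *alias = \@list leaves any $alias or %alias alone.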
--
Jonathan Feinberg jdf@pobox.com Sunny Brooklyn, NY
http://pobox.com/~jdf
------------------------------
Date: Wed, 7 Oct 1998 14:30:43 +0100
From: Adrian Albin-Clark <adrian@pearl.demon.co.uk>
Subject: Re: Win32
Message-Id: <uFGoYKAD02G2Ew1$@pearl.demon.co.uk>
Looks like we've both got something in common.
It must be something simple that I am not doing.
I want to use PERL scripts with NT Server 4 but have had no luck at all.
This is what I have tried.
0) Initially I tried all of the following on a FAT16 installation, but
thought that maybe it needed to be NTFS, so then tried using that, but
still no joy.
1) Downloaded (from the net) and installed perl5_00402-bindist04-bc.zip.
This claims to have been tested extensively under Windows NT.
2) Downloaded (from the net) various sample scripts including 'Flexbook'
a simple form that acts as a guestbook, allowing the user to add some
comments to an evergrowing list (effectively modifying the source HTML
of the page being viewed).
3) Put the path to the perl binary in the path variable e.g. g:\perl\bin
4) Made sure IUSR.. (the www user) has read/write access on the
gbook.html file.
5) Looked in all docs with perl but could find nothing of any help.
6) Found a helpful tip on the net which suggested setting up an
association in registry thus:
H_KEY_LOCAL_MACHINE/SYSTEM/CURRENT_CONTROL_SET/SERVICES/W3SVC/PARAMETERS/SCRIPT_MAP/
If you don't have anything in there for a .pl or .cgi extension, add it
with the path to your perl.exe and %s %s
So I did this. STILL NO JOY!
*****************************************************
Have I omitted some crucial things?
By the way, what is the correct syntax for the whizbang line in a Perl
script under NT?
For the following file (g:\perl\bin\perl.exe) I have tried:
#!/perl/bin/perl.exe
/perl/bin/perl.exe
/perl/bin/perl
/e|/perl/bin/perl.exe
#!/e|/perl/bin/perl.exe
******************************************************
The most I have managed to get is the flexbook.pl file loaded in the
browser as a text file, which is obviously not what is required!!
PLEASE HELP!!!!
--
Adrian Albin-Clark
------------------------------
Date: Wed, 07 Oct 1998 10:37:00 -0400
From: Michel Prevost <michel.prevost@cactuscom.ca_REMOVE_TO_MAIL>
Subject: XML::Parser::Expat
Message-Id: <361B7C8C.4E22E43F@cactuscom.ca_REMOVE_TO_MAIL>
Hi All
I have a small problem with that module. When parsing an XML file, I get
the following error:
Parse position is outside of buffer at
/home/mprevost/perl/lib/XML/Parser/Expat.pm line 163.
Has anyone ever encountered this error?
Tx
Michel Prevost
------------------------------
Date: 12 Jul 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Special: Digest Administrivia (Last modified: 12 Mar 98)
Message-Id: <null>
Administrivia:
Special notice: in a few days, the new group comp.lang.perl.moderated
should be formed. I would rather not support two different groups, and I
know of no other plans to create a digested moderated group. This leaves
me with two options: 1) keep on with this group 2) change to the
moderated one.
If you have opinions on this, send them to
perl-users-request@ruby.oce.orst.edu.
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.
------------------------------
End of Perl-Users Digest V8 Issue 3916
**************************************