[16123] in Perl-Users-Digest
Perl-Users Digest, Issue: 3535 Volume: 9
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jul 2 06:05:29 2000
Date: Sun, 2 Jul 2000 03:05:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <962532310-v9-i3535@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Sun, 2 Jul 2000 Volume: 9 Number: 3535
Today's topics:
Re: ***Do not use this code!*** Re: Perl Help Please! (Villy Kruse)
Re: 2Q re HTML::, was: HTML::Parser docs and homepage Tom_Roche@ncsu.edu
Emacs modules for Perl programming (Jari Aalto+mail.perl)
Re: how to get text enclosed by matching () ? <bwalton@rochester.rr.com>
Re: how to get text enclosed by matching () ? (Neil Kandalgaonkar)
Re: HTML::Parser docs and homepage <j.bessels@quicknet.nl>
insert non-ascii character through DBD-Oracle dwang999@my-deja.com
Is this code dangerous? eval{$$_ = $q->param($_)} <abuse@localhost>
Re: Is this code dangerous? eval{$$_ = $q->param($_)} (brian d foy)
Message board software <philipc@i-cable.com>
Re: off topic question about NT command shell [was: Dum (jason)
Perl Newbie Question (Costas Menico)
Re: Perl Newbie Question <rob13@rock13.com>
Re: Perl Newbie Question <flavell@mail.cern.ch>
Re: Perl Newbie Question <bcaligari@shipreg.com>
Perl Subroutines - some help required anuragmenon@my-deja.com
Re: Perl Subroutines - some help required (Tad McClellan)
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 2 Jul 2000 09:03:52 GMT
From: vek@pharmnl.ohout.pharmapartners.nl (Villy Kruse)
Subject: Re: ***Do not use this code!*** Re: Perl Help Please!
Message-Id: <slrn8lu1bn.5mk.vek@pharmnl.ohout.pharmapartners.nl>
On 1 Jul 2000 22:28:25 GMT,
danny@lennon.postino.com <danny@lennon.postino.com> wrote:
>User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.14-5.0 (i586))
>
>In comp.lang.perl.misc Paul Taylor <pap@notheresotonians.org.uk> wrote:
>[snip]
>> Two problems arisen, neither answered in a constructive manner.
>> Perl is supposed to be a community. In a community, people contribute when
>> they can, right or wrong. There are always going to be leaders in a community,
>> and certain individuals are going to know a great deal more about the
>> mechanisms of that community.
>> However, when that is the case, such individuals should educate, not berate
>> those in need of help. In this thread, you had every right to point out the
>> security shortcomings in my script. You could have suggested an alternative,
>> or added to the script provided in order to make it safe.
>
>Hear, hear. I count 19 posts in this thread, lots of accusations and flames,
>yet not a single example of how to use taint checking to safely open and
>read a file. Sometimes "read the FAQ" just doesn't cut it, and this is one of
>those times.
>
Reading the output from 'perldoc perlsec' would probably be a good idea.
All 8 pages of it. Then, if there are still questions, ask.
Villy
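For readers who wanted the example the quoted posters asked for, here is a minimal sketch of the kind of check perlsec describes. It assumes the script is started with the -T switch, and that the files live under a made-up directory, /var/data/public; safe_name is a hypothetical helper, not something posted in the thread. As perlsec documents it, taint mode does not object to opening a tainted name read-only, so the explicit pattern check matters either way.

```perl
use strict;
use warnings;

# Return a validated (and, under -T, untainted) file name, or nothing
# if the input looks unsafe. Only a bare name made of word characters,
# dots and hyphens is accepted; anything containing ".." is refused.
sub safe_name {
    my ($name) = @_;
    return unless defined $name;
    return if $name =~ /\.\./;
    my ($clean) = $name =~ /\A([\w.-]+)\z/;   # a regex capture clears taint
    return $clean;
}

my $dir  = '/var/data/public';                # hypothetical directory
my $file = safe_name($ARGV[0] // '');

if (defined $file) {
    # three-argument open: the name is never interpreted as a mode or pipe
    if (open my $fh, '<', "$dir/$file") {
        print while <$fh>;
        close $fh;
    }
    else {
        warn "cannot open $file: $!\n";
    }
}
else {
    warn "refusing suspicious file name\n";
}
```

The pattern is a whitelist: it names what is allowed rather than trying to list every dangerous character.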
------------------------------
Date: Sat, 01 Jul 2000 21:12:26 -0400
From: Tom_Roche@ncsu.edu
To: sburke@spinn.net
Subject: Re: 2Q re HTML::, was: HTML::Parser docs and homepage
Message-Id: <395E96FA.109AB64C@ncsu.edu>
I wrote:
>> * howto? Can you point me to any references (articles, etc)
>> illustrating the use of the HTML:: modules that are somewhat less
>> terse than the CPAN documentation or "The HTML Module" in _Perl in
>> a Nutshell_?
sburke@spinn.net wrote:
> I was so busy refurbishing TreeBuilder and Element that I've had
> hardly a moment to document them. But this is changing. Have a look,
> notably, at this:
> http://www.speech.cs.cmu.edu/~sburke/y2c.html
Thanks for your assistance! While unfortunately this reached me late
for the first project on which I was working, it should be useful for
the next two. I got the first done using HTML::Parser, after finding
the following pointers:
1 Jonathan Stowe wrote:
> I have a very brief and sometime to be finished page at :
> <http://www.gellyfish.com/htexamples/>
> This only deals with version 2 of the module - I have some
> examples using version 3 but you can search Deja News for them
2 A search pointer to Ken MacFarlane's excellent article
http://www.itknowledge.com/tpj/issues/vol5_1/tpj0501-0003.html
Although HTML::Parser worked for me, I'm leaning toward using
HTML::TreeBuilder for the upcoming projects, since HTML::TreeBuilder
seems essentially to extend HTML::Parser with the functionality of
HTML::Element. Unfortunately my surmise is nowhere plainly stated
other than in this mere comment from TreeBuilder.pm v2.96:
> # It's not that we ARE an element AND a parser.
> # We ARE an element, but one that knows how to handle signals
> # (method calls) from Parser in order to elaborate its subtree.
Your responses to the following three questions (and your attention to
the following three notes) are appreciated:
? Is it in fact correct to state that HTML::TreeBuilder essentially
extends HTML::Parser with the functionality of HTML::Element?
? Presuming the above characterization true, would it be correct to
state that one cannot fully employ the functionality of
HTML::TreeBuilder without knowledge of HTML::Parser and
HTML::Element? I don't notice any documented wrappers to the
HTML::Parser functionality; rather it appears to be assumed that a
user will know that HTML::TreeBuilder::start, ::stop, etc are there,
and how to use them.
! Presuming the previous characterization true, I, and I suppose many
other Perl lightweights, would appreciate some documentation that
synthesized the various material available on those three modules.
It seems to me that available materials on HTML::Parser makes no
reference to HTML::TreeBuilder and HTML::Element, and vice versa.
? My first project required transforming to-do/problem-tracking
information from records in a large HTML table into records in a
database. What I wanted (and IMHO only kludged with HTML::Parser)
was a way to suck out the entire contents of table cells (i.e. the
'text' of a 'td'), including any internal tagging they might
possess. I.e.
I did _not_ want to handle the internal start's, text's, and end's
encountered, and do the appending of contents required, until I got
to the first </td>. (Which would in any case be a bad algorithm if
that cell contained another table, which further illustrates my
desire for this functionality :-)
Is this functionality provided via HTML::TreeBuilder (or other tool
of your acquaintance)?
! Editorial note: in
http://www.speech.cs.cmu.edu/~sburke/y2c.html
> <IMG SRC="foo.png">
> is represented, internally as a blessed hashref:
> bless {
> '_tag' => 'img',
> '_parent' => undef,
> '_content' => [],
> 'src' => 'Stuff.png'
> }, 'HTML::Element'
> However, you would never make objects of your own by blessing them
> into that class; instead you'd call:
> my $i = HTML::Element->new('img', 'src' => 'Stuff.png');
Should not the two instances of 'Stuff.png' be replaced with
'foo.png'?
! An even more minor editorial note:
http://www.speech.cs.cmu.edu/~sburke/y2c.html
> <CODE>$node-&gt;as_text</CODE>
is IMHO incorrect. But someone's gotta proofread :-)
Your assistance is appreciated, Tom_Roche@ncsu.edu
------------------------------
Date: 02 Jul 2000 09:25:29 GMT
From: <jari.aalto@poboxes.com> (Jari Aalto+mail.perl)
Subject: Emacs modules for Perl programming
Message-Id: <perl-faq/emacs-lisp-modules_962529840@rtfm.mit.edu>
Archive-name: perl-faq/emacs-lisp-modules
Posting-Frequency: 2 times a month
URL: http://home.eu.org/~jari/ema-keys.html
Maintainer: Jari Aalto <jari.aalto@poboxes.com>
Announcement: "What Emacs lisp modules can help with programming Perl"
Preface
Emacs is your friend if you have to do anything concerning software
development: It offers plug-in modules, written in the Emacs lisp
(elisp) language, that make all your programming wishes come
true. Please introduce yourself to Emacs and your programming era
will get a new light.
Where to find Emacs
XEmacs/Emacs is available for various platforms:
o Unix:
If you don't have one, bust your sysadm.
http://www.gnu.org/software/emacs/emacs.html
http://www.xemacs.org/
Emacs resources at http://home.eu.org/~jari/emacs-elisp.html
o W9x/NT:
http://www.gnu.org/software/emacs/windows/ntemacs.html
Emacs Perl Modules
Cperl -- Perl programming mode
.ftp://ftp.math.ohio-state.edu/pub/users/ilya/perl
.<olson@mcs.anl.gov> Bob Olson (started 1991)
.<ilya@math.ohio-state.edu> Ilya Zakharevich
Major mode for editing perl files. Forget the default
`perl-mode' that comes with Emacs, this is much better. Comes
standard in the newest Emacs.
TinyPerl -- Perl related utilities
.http://home.eu.org/~jari/tiny-tools-beta.zip
.http://home.eu.org/~jari/emacs-tiny-tools.html
If you ever wonder how to deal with Perl POD pages or how to find
documentation from all perl manpages, this package is for you.
A couple of keystrokes and all the documentation is in your hands.
o Instant function help: See documentation of `shift', `pop'...
o Show Perl manual pages in *pod* buffer
o Load source code into Emacs, like Devel::DProf.pm
o Grep through all Perl manpages (.pod)
o Follow POD manpage references to next pod page with TinyUrl
o Coloured pod pages with `font-lock'
o Separate `tiperl-pod-view-mode' for jumping topics and pages
forward and backward in *pod* buffer.
o TinyUrl is used to jump to URLs (other pod pages, man pages etc)
mentioned in POD pages. (It's a general URL minor mode)
TinyIgrep -- Perl Code browsing and easy grepping
[TinyIgrep is included in the tgz mentioned above]
To grep from all installed Perl modules, define a database for
TinyIgrep. There is an example in the tgz (ema-tigr.ini) that shows
how to set up databases for Perl5, Perl4, or whatever you have
installed.
TinyIgrep calls Igrep.el to run the find for you. You can adjust
recursive grep options, ignore case, and add user grep options.
You can get the `igrep.el' module from <kevinr@ihs.com>. Ask for a copy.
Check also ftp://ftp.ihs.com/pub/kevinr/
TinyCompile -- Browsing grep results in Emacs *compile* buffer
TinyCompile is a minor mode for the *compile* buffer in which
you can collapse unwanted lines and shorten the file paths:
/asd/asd/asd/asd/ads/as/da/sd/as/as/asd/file1:NNN: MATCHED TEXT
/asd/asd/asd/asd/ads/as/da/sd/as/as/asd/file2:NNN: MATCHED TEXT
-->
cd /asd/asd/asd/asd/ads/as/da/sd/as/as/asd/
file1:NNN: MATCHED TEXT
file2:NNN: MATCHED TEXT
End
------------------------------
Date: Sun, 02 Jul 2000 04:20:08 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: how to get text enclosed by matching () ?
Message-Id: <395EC33E.7E47CFE6@rochester.rr.com>
RonR wrote:
> a line: "blabla bla (xyz, abc(n), xyz, abc(n)) etc etc"
> I want to get the list enclosed by the matching braces
> giving: "( xyz, abc(n), xyz, abc(n) )" or eventually without the braces.
>
> Is there an easy way to do this ? The braces are always in pairs (if that
> helps).
>
> any input will be appreciated,
> --
> Ronald. < \_ _ o o
In general, you can't do this with a single regular expression. However, if
there is a limit to the depth to which your parentheses are nested, you
can use this code originally written by Jeffrey Friedl (with a slight
modification by me) to generate a regex that will match the longest
string containing balanced sets of parens up to a given depth. Note:
this routine dates from the Perl 4 days, and could be updated with my
and qr//, but works fine as-is. It used to give weird results if the
paren level was above a couple hundred; haven't tested that recently.
Complete example:
$bal=make_parenmatching_regex(10); #make pattern
$string=<DATA>;
chomp $string;
$string=~/\(($bal)/; #matches a ( followed by longest balanced string
print "$1\n";
#
## Given DEPTH, return a regex which will match a string with up
## to DEPTH levels of nested parens.
##
sub make_parenmatching_regex {
local($depth) = @_;
local($nonparen) = '[^()]';
"($nonparen|\\(" x $depth . "$nonparen*" . '\))*' x ($depth-1) .
'\))+';
}
__END__
blabla bla (xyz, abc(n), xyz, abc(n)) etc etc
prints:
xyz, abc(n), xyz, abc(n)
--
Bob Walton
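As the post itself notes, the routine could be updated with my and qr//. A possible version is sketched below; it is an illustration rather than Bob Walton's or Jeffrey Friedl's own update. The only changes are lexical variables, non-capturing groups, and a compiled pattern as the return value; the test string is the one from the post.

```perl
use strict;
use warnings;

# Given DEPTH, return a compiled regex that matches a string containing
# parentheses nested up to DEPTH levels deep.
sub make_parenmatching_regex {
    my ($depth) = @_;
    my $nonparen = '[^()]';
    my $pattern  = "(?:$nonparen|\\(" x $depth
                 . "$nonparen*"
                 . '\))*' x ($depth - 1)
                 . '\))+';
    return qr/$pattern/;
}

my $bal    = make_parenmatching_regex(10);
my $string = 'blabla bla (xyz, abc(n), xyz, abc(n)) etc etc';

# An opening paren followed by the longest balanced run after it.
my ($inner) = $string =~ /\(($bal)/;
print "$inner\n";    # prints: xyz, abc(n), xyz, abc(n)
```

Using (?: ) instead of plain parentheses keeps $1 pointing at the one group the caller cares about.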
------------------------------
Date: Sun, 02 Jul 2000 07:54:21 GMT
From: neil@brevity.org (Neil Kandalgaonkar)
Subject: Re: how to get text enclosed by matching () ?
Message-Id: <8jms2u$ibg$1@localhost.localdomain>
In article <395EC33E.7E47CFE6@rochester.rr.com>,
Bob Walton <bwalton@rochester.rr.com> wrote:
>RonR wrote:
>> a line: "blabla bla (xyz, abc(n), xyz, abc(n)) etc etc"
>> I want to get the list enclosed by the matching braces
>> giving: "( xyz, abc(n), xyz, abc(n) )" or eventually without the braces.
>>
>> Is there an easy way to do this ? The braces are always in pairs (if that
^^^^^^^^^^^^^^^
>In general, you can't do this with a single regular expression.
If
1. the parens are known to be in pairs, and
2. well-balanced (?),
3. and 'escaped' parens are not allowed, e.g. \)
then surely:
($in_parens) = ( $string =~ /(\(.*\))/ );
works?
--
Neil Kandalgaonkar <neil@brevity.org>
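A small comparison may help show where the greedy pattern stops being enough. The sketch below uses a made-up test string with two separate groups on one line; the second pattern relies on the recursive (?1) construct, which assumes Perl 5.10 or later and so was not available when this thread was written.

```perl
use strict;
use warnings;

# Hypothetical input with two top-level groups on the same line.
my $string = 'f(a, b(c)) and g(d)';

# Greedy: runs from the first "(" to the LAST ")" on the line.
my ($greedy) = $string =~ /(\(.*\))/;

# Recursive: "(" then any mix of non-parens or nested groups, then ")".
my ($balanced) = $string =~ /(\((?:[^()]++|(?1))*\))/;

print "greedy:   $greedy\n";      # (a, b(c)) and g(d)
print "balanced: $balanced\n";    # (a, b(c))
```

On the original poster's line, which holds a single top-level group, both patterns return the same text.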
------------------------------
Date: Sun, 02 Jul 2000 11:10:54 +0200
From: Jan Bessels <j.bessels@quicknet.nl>
Subject: Re: HTML::Parser docs and homepage
Message-Id: <395F071D.94B3E46A@quicknet.nl>
> The most recent version of The Perl Journal has a very good article
> about HTML::Parser
>
> http://www.itknowledge.com/tpj/
I've also been struggling with HTML::Parser, and it's (surprising but true)
quite difficult to find good articles about it. The module is flexible
but quite complex. One URL missing from the list mentioned so far is
http://www.stonehenge.com/merlyn/WebTechniques/col22.html, a very good
article by Randal L. Schwartz which did the trick for me: rel2abs parsing
of hrefs.
Usenet is all about sharing information. Hence I've included the text of the
mentioned TPJ article, which is quite good actually. Enjoy.
Parsing HTML with HTML::Parser
Ken MacFarlane
Packages Used:
HTML::Parser...............................CPAN
Perl is often used to manipulate the HTML files constituting web pages.
For instance, one common task is removing tags from an HTML file to
extract the plain text. Many solutions for such tasks usually use
regular expressions, which often end up complicated, unattractive, and
incomplete (or wrong). The alternative, described here, is to use the
HTML::Parser module available on the CPAN (http://www.perl.com/CPAN).
HTML::Parser is an excellent example of what Sean Burke noted earlier in
this issue: some object-oriented modules require extra explanation for
casual users.
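To make the "incomplete (or wrong)" point concrete, here is a small sketch of the usual regex approach going astray. The HTML snippet is invented for illustration; the substitution is the common "delete everything between angle brackets" idiom.

```perl
use strict;
use warnings;

# Made-up input: the alt attribute legitimately contains a ">" character.
my $html = '<p>Price <img src="gt.png" alt="a > b"> today</p>';

# Naive tag stripper: remove "<", any run of non-">" characters, then ">".
(my $stripped = $html) =~ s/<[^>]*>//g;

print "$stripped\n";    # prints: Price  b"> today
```

The pattern treats the ">" inside the quoted attribute as the end of the tag, so part of the attribute leaks into the text. A parser that understands quoted attribute values avoids this class of mistake.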
HTML::Parser scans HTML input and breaks it up into
segments according to how a browser would interpret the text. For
instance, this input:
<A HREF="index.html">This is a link</A>
would be broken up into three segments: a start tag (<A
HREF="index.html">), text (This is a link), and an end tag (</A>). As
each segment is detected, the parser passes it to an appropriate
subroutine. There's a subroutine for start tags, one for end tags, and
another for plain text. There are subroutines for comments and
declarations as well.
In this article, I'll first give a simple example on how to read and
print out all the information found by HTML::Parser. Next, I'll
demonstrate differences in the events triggered by the parser. Finally,
I'll show how to access specific information passed along by the parser.
As of this writing, there are two major versions of HTML::Parser
available. Both version 2 and version 3 let you subclass the
module. For this article, I will mostly concentrate on the subclassing
method, because it will work with both major versions, and is a bit
easier to understand for those not overly familiar with some of Perl's
finer details. In version 3, there is more of an emphasis on the use of
references, anonymous subroutines, and similar topics; advanced users
who may be interested will see a brief example at the end of this
article.
Getting Started
The first thing to be aware of when using HTML::Parser is that, unlike
other modules, it appears to do absolutely nothing. When I first
attempted to use this module, I used code similar to this:
#!/usr/bin/perl -w
use strict;
use HTML::Parser;
my $p = new HTML::Parser;
$p->parse_file("index.html");
No output whatsoever. If you look at the source code to the module,
you'll see why:
sub text
{
# my($self, $text) = @_;
}
sub declaration
{
# my($self, $decl) = @_;
}
sub comment
{
# my($self, $comment) = @_;
}
sub start
{
# my($self, $tag, $attr, $attrseq, $origtext) = @_;
# $attr is reference to a HASH, $attrseq is reference to an ARRAY
}
sub end
{
# my($self, $tag, $origtext) = @_;
}
The whole idea of the parser is that as it chugs along through the HTML,
it calls these subroutines whenever it finds an appropriate snippet
(start tag, end tag, and so on). However, these subroutines do nothing.
My program works, and the HTML is being parsed - but I never instructed
the program to do anything with the parse results.
The Identity Parser
The following is an example of how HTML::Parser can be subclassed, and
its methods overridden, to produce meaningful output. This example
simply prints out the original HTML file, unmodified:
1 #!/usr/bin/perl -w
2
3 use strict;
4
5 # define the subclass
6 package IdentityParse;
7 use base "HTML::Parser";
8
9 sub text {
10 my ($self, $text) = @_;
11 # just print out the original text
12 print $text;
13 }
14
15 sub comment {
16 my ($self, $comment) = @_;
17 # print out original text with comment marker
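18 print "<!--", $comment, "-->";
19 }
20
21 sub start {
22 my ($self, $tag, $attr, $attrseq, $origtext) = @_;
23 # print out original text
24 print $origtext;
25 }
26
27 sub end {
28 my ($self, $tag, $origtext) = @_;
29 # print out original text
30 print $origtext;
31 }
32
33 my $p = new IdentityParse;
34 $p->parse_file("index.html");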
Lines 6 and 7 declare the IdentityParse package, having it inherit from
HTML::Parser. (Type perldoc perltoot for more information on
inheritance.) We then override the text(), comment(), start(), and end()
subroutines so that they print their original values. The result is a
script which reads an HTML file, parses it, and prints it to standard
output in its original form.
The HTML Tag Stripper
Our next example strips all the tags from the HTML file and prints just
the text:
1 #!/usr/bin/perl -w
2
3 use strict;
4
5 package HTMLStrip;
6 use base "HTML::Parser";
7
8 sub text {
9 my ($self, $text) = @_;
10 print $text;
11 }
12
13 my $p = new HTMLStrip;
14 # parse line-by-line, rather than the whole file at once
15 while (<>) {
16 $p->parse($_);
17 }
18 # flush and parse remaining unparsed HTML
19 $p->eof;
Since we're only interested in the text and not the HTML tags, we override
only the text() subroutine. Also note that in lines 13-17, we invoke the
parse() method instead of parse_file(). This lets us read files provided
on the command line. When using parse() instead of parse_file(), we must
also call the eof() method (line 19); this tells the parser that the input
has ended so it can flush whatever is left in its internal buffer.
Another Example: HTML Summaries
Suppose you've hand-crafted your own search engine for your web site,
and you want to be able to generate summaries for each hit. You could
use the HTML::Summary module described in TPJ #16, but we'll describe a
simpler solution here. We'll assume that some (but not all) of your
site's pages use a META tag to describe the content:
<META NAME="DESCRIPTION" CONTENT="description of file">
When a page has a META tag, your search engine should use the CONTENT
for the summary. Otherwise, the summary should be the first H1 tag if
one exists. And if that fails, we'll use the TITLE. Our third example
generates such a summary:
1 #!/usr/bin/perl -w
2
3 use strict;
4
5 package GetSummary;
6 use base "HTML::Parser";
7
8 my $meta_contents;
9 my $h1 = "";
10 my $title = "";
11
12 # set state flags
13 my $h1_flag = 0;
14 my $title_flag = 0;
15
16 sub start {
17 my ($self, $tag, $attr, $attrseq, $origtext) = @_;
18
19 if ($tag =~ /^meta$/i && $attr->{'name'} =~ /^description$/i) {
20 # set if we find <META NAME="DESCRIPTION"
21 $meta_contents = $attr->{'content'};
22 } elsif ($tag =~ /^h1$/i && ! $h1) {
23 # set state if we find <H1> or <TITLE>
24 $h1_flag = 1;
25 } elsif ($tag =~ /^title$/i && ! $title) {
26 $title_flag = 1;
27 }
28 }
29
30 sub text {
31 my ($self, $text) = @_;
32 # If we're in <H1>...</H1> or <TITLE>...</TITLE>, save text
33 if ($h1_flag) { $h1 .= $text; }
34 if ($title_flag) { $title .= $text; }
35 }
36
37 sub end {
38 my ($self, $tag, $origtext) = @_;
39
40 # reset appropriate flag if we see </H1> or </TITLE>
41 if ($tag =~ /^h1$/i) { $h1_flag = 0; }
42 if ($tag =~ /^title$/i) { $title_flag = 0; }
43 }
44
45 my $p = new GetSummary;
46 while (<>) {
47 $p->parse($_);
48 }
49 $p->eof;
50
51 print "Summary information: ", $meta_contents ||
52 $h1 || $title || "No summary information found.", "\n";
The magic happens in lines 19-27. The variable $attr contains a
reference to a hash where the tag attributes are represented with
key/value pairs. The keys are lowercased by the module, which is a
code-saver; otherwise, we'd need to check for all casing possibilities
(name, NAME, Name, and so on).
Lines 19-21 check to see if the current tag is a META tag and has a
field NAME set to DESCRIPTION; if so, the variable $meta_contents is set
to the value of the CONTENT field. Lines 22-27 likewise check for an H1
or TITLE tag. In these cases, the information we want is in the text
between the start and end tags, and not the tag itself. Furthermore,
when the text subroutine is called, it has no way of knowing which tags
(if any) its text is between. This is why we set a flag in start()
(where the tag name is known) and check the flag in text() (where it
isn't). Lines 22 and 25 also check whether or not $h1 and $title have
been set; since we only want the first match, subsequent matches are
ignored.
Another Fictional Example
Suppose that your company has been running a successful product site at
http://www.bar.com/foo/. However, the web marketing team decides that
http://foo.bar.com/ looks better in the company's advertising materials,
so a redirect is set up from the new address to the old.
Fast forward to Friday, 4:45 in the afternoon, when the phone rings. The
frantic voice on the other end says, "foo.bar.com just crashed! We need
to change all the links back to the old location!" Just when you thought
a simple search-and-replace would suffice, the voice adds: "And
marketing says we can't change the text of the web pages, only the
links."
"No problem", you respond, and quickly hack together a program that
changes the links in A HREF tags, and nowhere else.
1 #!/usr/bin/perl -w -i.bak
2
3 use strict;
4
5 package ChangeLinks;
6 use base "HTML::Parser";
7
8 sub start {
9 my ($self, $tag, $attr, $attrseq, $origtext) = @_;
10
11 # we're only interested in changing <A ...> tags
12 unless ($tag =~ /^a$/) {
13 print $origtext;
14 return;
15 }
16
17 if (defined $attr->{'href'}) {
18 $attr->{'href'} =~ s[foo\.bar\.com][www.bar.com/foo];
19 }
20
21 print "<A ";
22 # print each attribute of the <A ...> tag
23 foreach my $i (@$attrseq) {
24 print $i, qq(="$attr->{$i}" );
25 }
26 print ">";
27 }
28
29 sub text {
30 my ($self, $text) = @_;
31 print $text;
32 }
33
34 sub comment {
35 my ($self, $comment) = @_;
36 print "<!--", $comment, "-->";
37 }
38
39 sub end {
40 my ($self, $tag, $origtext) = @_;
41 print $origtext;
42 }
43
44 my $p = new ChangeLinks;
45 while (<>) {
46 $p->parse($_);
47 }
48 $p->eof;
Line 1 specifies that the files will be edited in place, with the
original files being renamed with a .bak extension. The real fun is in
the start() subroutine, lines 8-27. First, in lines 12-15, we check for
an A tag; if that's not what we have, we simply print the original tag and return.
Lines 17-19 check for the HREF and make the desired substitution.
$attrseq appears in line 23. This variable is a reference to an array
with the tag attributes in their original order of appearance. If the
attribute order needs to be preserved, this array is necessary to
reconstruct the original order, since the hash $attr will jumble them
up. Here, we dereference $attrseq and then recreate each tag. The
attribute names will appear lowercase regardless of how they originally
appeared. If you'd prefer uppercase, change the first $i in line 24 to
uc($i).
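The rebuilding step can also be tried on its own, without a parser, which makes the roles of $attr and $attrseq easier to see. The sketch below is an illustration only: rebuild_tag is a hypothetical helper, and the attribute data is made up to resemble what start() would receive for an A tag.

```perl
use strict;
use warnings;

# Rebuild a start tag from a tag name, a hash ref of attribute values
# ($attr) and an array ref of attribute names in source order ($attrseq).
sub rebuild_tag {
    my ($tag, $attr, $attrseq) = @_;
    my @pairs = map { qq($_="$attr->{$_}") } @$attrseq;
    return '<' . join(' ', $tag, @pairs) . '>';
}

# Made-up data standing in for what the parser would pass along.
my %attr    = (href => 'http://foo.bar.com/index.html', class => 'nav');
my @attrseq = ('href', 'class');

# Same substitution as in the article, applied to the href only.
$attr{href} =~ s{foo\.bar\.com}{www.bar.com/foo};

print rebuild_tag('a', \%attr, \@attrseq), "\n";
# prints: <a href="http://www.bar.com/foo/index.html" class="nav">
```

Joining the pieces with a single space avoids the trailing blank before ">" that the loop in the listing produces. The sketch does not escape quote characters inside attribute values, which real input might need.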
Using HTML::Parser Version 3
Version 3 of the module provides more flexibility in how the handlers
are invoked. One big change is that you no longer have to use
subclassing; rather, event handlers can be specified when the
HTML::Parser constructor is called. The following example is equivalent
to the previous program but uses some of the version 3 features:
1 #!/usr/bin/perl -w -i.bak
2
3 use strict;
4 use HTML::Parser;
5
6 # specify events here rather than in a subclass
7 my $p = HTML::Parser->new( api_version => 3,
8 start_h => [\&start,
9 "tagname, attr, attrseq, text"],
10 default_h => [sub { print shift }, "text"],
11 );
12 sub start {
13 my ($tag, $attr, $attrseq, $origtext) = @_;
14
15 unless ($tag =~ /^a$/) {
16 print $origtext;
17 return;
18 }
19
20 if (defined $attr->{'href'}) {
21 $attr->{'href'} =~ s[foo\.bar\.com][www.bar.com/foo];
22 }
23
24 print "<A ";
25 foreach my $i (@$attrseq) {
26 print $i, qq(="$attr->{$i}" );
27 }
28 print ">";
29 }
30
31 while (<>) {
32 $p->parse($_);
33 }
34 $p->eof;
The key changes are in lines 7-10. In line 8, we specify that the start
event is to be handled by the start() subroutine. Another key change is
line 10; version 3 of HTML::Parser supports the notion of a default
handler. In the previous example, we needed to specify separate handlers
for text, end tags, and comments; here, we use the default_h option as a
catch-all. This turns out to be a code saver as well.
Take a closer look at line 9, and compare it to line 9 of the previous
example. Note that $self hasn't been passed. In version 3 of
HTML::Parser, the list of attributes which can be passed along to the
handler subroutine is configurable. If our program only needed to use
the tag name and text, we could change the string "tagname, attr, attrseq,
text" to "tagname, text" and then change the start() subroutine to only use
two parameters. Also, handlers are not limited to subroutines. If we
changed the default handler like this, the text that would have been
printed would instead be pushed onto @lines.
my $p = HTML::Parser->new( api_version => 3,
start_h => [\&start,
"tagname, attr, attrseq, text"],
default_h => [\@lines, "text"],
);
Version 3 of HTML::Parser also adds some new features; notably, one can
now set options to recognize and act upon XML constructs, such as <TAG/>
and <?TAG?>. There are also multiple methods of accessing tag
information, instead of the $attr hash. Rather than go into further
detail, I encourage you to explore the flexibility and power of this
module on your own.
Acknowledgments
The HTML::Parser module was written by Gisle Aas and Michael A. Chase.
Excerpts of code and documentation from the module are used here with
the authors' permission.
__END__
------------------------------
Date: Sun, 02 Jul 2000 06:26:57 GMT
From: dwang999@my-deja.com
Subject: insert non-ascii character through DBD-Oracle
Message-Id: <8jmnb8$8co$1@nnrp1.deja.com>
I am using:
1) Oracle 8i Release 2 on Solaris 2.6 running an instance with UTF8 as
the character set
2) Perl 5.005_03 built for sun4-solaris 2.7, DBI 1.14, DBD-Oracle-1.03
Problem:
To insert some non-ascii characters within the Perl program I wrote, it
returns:
ORA-01756: quoted string not properly terminated (DBD ERROR:
OCIStmtPrepare)
The same insert statement entered on the SQLPlus interface works fine.
Does DBD-Oracle support non-ascii chars? How do I solve this problem?
Thanks.
------------------------------
Date: Sun, 2 Jul 2000 16:15:36 +0800
From: "multiplexor" <abuse@localhost>
Subject: Is this code dangerous? eval{$$_ = $q->param($_)}
Message-Id: <8jmt8p$i8k39@imsp212.netvigator.com>
I am new to Unix and security. I read in a CGI FAQ that exposing client
data to the shell is dangerous because someone may type "rm -fr" or something
like that. However, I don't know if the following kind of eval is dangerous.
###
use CGI;
$q = new CGI;
@form_field = qw/name email/;
foreach (@form_field) {
eval{$$_ = $q->param($_)}
}
###
As I understand it, what the foreach loop does is equivalent to the following:
$name = $q->param('name');
$email = $q->param('email');
That's why I can't find any security hole when someone types a dangerous
command. Can you comment on this code?
Thanks for your time.
------------------------------
Date: Sun, 02 Jul 2000 04:51:32 -0400
From: brian@smithrenaud.com (brian d foy)
Subject: Re: Is this code dangerous? eval{$$_ = $q->param($_)}
Message-Id: <brian-ya02408000R0207000451320001@news.panix.com>
In article <8jmt8p$i8k39@imsp212.netvigator.com>, "multiplexor" <abuse@localhost> posted:
> use CGI;
> eval{$$_ = $q->param($_)}
why not just use import_names() ?
--
brian d foy
CGI Meta FAQ <URL:http://www.smithrenaud.com/public/CGI_MetaFAQ.html>
Perl Mongers <URL:http://www.perl.org/>
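A note on the question above: the block form of eval only traps runtime errors and does not compile strings, and the variable names come from a fixed list rather than from the client, so the loop does not hand user input to Perl or the shell. What it does do is create package globals through symbolic references, which fail under "use strict" (and the eval then hides that failure). CGI.pm's import_names() copies parameters into a namespace of your choosing; another common alternative is a plain hash, sketched below. The param() stand-in and its data are made up so the example runs without a web server; in a real script the calls would go to $q->param().

```perl
use strict;
use warnings;

# Stand-in for $q->param(): hypothetical form data, for illustration only.
my %submitted = (name => 'Alice', email => 'alice@example.com');
sub param { my ($field) = @_; return $submitted{$field} }

my @form_field = qw(name email);

# One hash holds every field, so no variables are created by name.
my %form;
$form{$_} = param($_) for @form_field;

print "$form{name} <$form{email}>\n";    # prints: Alice <alice@example.com>
```

The hash keeps the form data in one place and works unchanged under strict.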
------------------------------
Date: Sun, 2 Jul 2000 16:28:55 +0800
From: "Philip Chan" <philipc@i-cable.com>
Subject: Message board software
Message-Id: <8jmu74$72k5@rain.i-cable.com>
I'm looking for a good message board software, any suggestions?
Thanks.
------------------------------
Date: Sun, 02 Jul 2000 08:29:20 GMT
From: elephant@squirrelgroup.com (jason)
Subject: Re: off topic question about NT command shell [was: Dumb Perl (win32) Q]
Message-Id: <MPG.13c99aa5327fcd9b989765@news>
Jonathan Stowe writes ..
>On Sat, 01 Jul 2000 03:40:33 GMT jason wrote:
>>
>> BUT .. you should be debugging CGI scripts in a CGI environment ..
>>
>
>That really does depend on what part of the programming you are debugging.
>If you are debugging parts that are specific to the CGI then
>perhaps, but in other cases it might be more appropriate to test at the
>command line as it is easier to control the environment and avail yourself of
>diagnostic information.
in a general sense that's certainly true .. I meant my comment in a
specific sense .. the originator reported scrolling pages of HTML output
.. sounds like their debugging could benefit from a proper CGI
environment to me
--
jason - elephant@squirrelgroup.com -
------------------------------
Date: Sun, 02 Jul 2000 04:27:22 GMT
From: costas_menico@mindspring.com (Costas Menico)
Subject: Perl Newbie Question
Message-Id: <395ec382.2107099@news.bellatlantic.net>
I am new to Perl. I need to uniquely identify the user who accesses
my application. There is no user or password login.
I wanted to know the recommended way of tracking a user's
actions/inputs when they jump from form to form in the application.
Are there equivalents of the Application or Session variables in
ASP/VBScript?
Also, is using cookies the recommended way? I have been using the IP
address of the user and storing it in a table on the server. Does
anyone see a problem with this method?
Is there some website that discusses these issues?
Thanks
Costas
------------------------------
Date: Sun, 02 Jul 2000 00:56:42 -0400
From: "Rob - Rock13.com" <rob13@rock13.com>
Subject: Re: Perl Newbie Question
Message-Id: <395ECB8A.37C545AF@rock13.com>
Costas Menico wrote:
>
> I am new to Perl. I need to uniquely identify the user who accesses
> my application. There is no user or password login.
>
> I wanted to know the recommended way of tracking a user's
> actions/inputs when they jump from form to form in the application.
> Are there equivalents of the Application or Session variables in
> ASP/VBScript?
> Also, is using cookies the recommended way? I have been using the IP
> address of the user and storing it in a table on the server. Does
> anyone see a problem with this method?
Cookies are typical, set for the session only - longer than that and
it may not be the same person.
Not everyone has a fixed IP. I don't: I'm on a dialup connection, so I
get a random IP from the ISP.
> Is there some website that discusses these issues?
Try a search engine.
--
Rob - http://rock13.com/
Web Stuff: http://rock13.com/webhelp/
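As a rough illustration of the session-only cookie suggested above, the sketch below generates an identifier and formats the header a CGI script would send. The cookie name sid, both helper names, and the way the identifier is derived are assumptions made for the example; in particular rand() is not a cryptographic source, so a real application would want a stronger one.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Build a hard-to-guess session ID from the time, the process ID and
# two random numbers. NOTE: illustrative only; rand() is predictable.
sub new_session_id {
    return sha256_hex(join '|', time(), $$, rand(), rand());
}

# Format a session-only cookie header. With no Expires or Max-Age
# attribute, the browser drops the cookie when it is closed.
sub session_cookie_header {
    my ($id) = @_;
    return "Set-Cookie: sid=$id; Path=/";
}

my $sid = new_session_id();
print session_cookie_header($sid), "\n";
```

The server would then key its stored per-user data on the identifier instead of on the client's IP address.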
------------------------------
Date: Sun, 2 Jul 2000 11:00:13 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Perl Newbie Question
Message-Id: <Pine.GHP.4.21.0007021054380.23066-100000@hpplus03.cern.ch>
On Sun, 2 Jul 2000, Costas Menico wrote:
> I am new to Perl. I need to uniquely identify the user who accesses
> my application. There is no user or password login.
Your problem, as stated, is impossible. You defined it to be
impossible. The programming language is irrelevant.
Redefine the problem.
Consult Nick Kew's CGI FAQ first, and the
comp.infosystems.www.authoring.cgi usenet group when you have a better
idea of what you're trying to achieve.
------------------------------
Date: Sun, 2 Jul 2000 12:05:35 +0200
From: "Brendon Caligari" <bcaligari@shipreg.com>
Subject: Re: Perl Newbie Question
Message-Id: <8jn3ib$hpb$1@news.news-service.com>
"Costas Menico" <costas_menico@mindspring.com> wrote in message
news:395ec382.2107099@news.bellatlantic.net...
> I am new to Perl. I need to uniquely identify the user who accesses
> my application. There is no user or password login.
Interesting... I assume you're talking about a web-based application.
>
> I wanted to know the recommended way of tracking a user's
> actions/inputs when they jump from form to form in the application.
> Is there an equivalent Application or Session variables like in
> ASP/VBScript?
Cookies. These really identify the browser, not the person. So if my gf
comes and uses my computer, your application will still believe it's me.
>
> Also, is using cookies the recommended way? I have been using the IP
> address of the user and storing it in a table on the server. Does
> anyone see a problem with this method?
>
Yes. If the users are behind NAT, or get their IP through some sort of
DHCP or address pool (say, like dialup users on the internet), you can
get the same user using different IPs or different users using the
same IP.
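
To make both failure modes concrete, here is a small sketch over a
made-up request log. The addresses come from the documentation ranges
and the session names are invented for illustration; nothing here is
real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log entries: [client IP, session cookie value].
my @requests = (
    ['192.0.2.10',   'alice-session'],  # Alice, first dialup call
    ['192.0.2.77',   'alice-session'],  # Alice again, new IP from the pool
    ['198.51.100.5', 'bob-session'],    # Bob, behind a NAT gateway
    ['198.51.100.5', 'carol-session'],  # Carol, behind the same gateway
);

# Index the log both ways.
my (%sessions_at_ip, %ips_for_session);
for my $r (@requests) {
    my ($ip, $session) = @$r;
    $sessions_at_ip{$ip}{$session}  = 1;
    $ips_for_session{$session}{$ip} = 1;
}

# One IP, several users: keying on IP would merge them.
my @shared_ips = grep { scalar(keys %{ $sessions_at_ip{$_} }) > 1 }
                 sort keys %sessions_at_ip;

# One user, several IPs: keying on IP would split them.
my @roaming = grep { scalar(keys %{ $ips_for_session{$_} }) > 1 }
              sort keys %ips_for_session;

print "IPs shared by several sessions: @shared_ips\n";  # 198.51.100.5
print "Sessions seen from several IPs: @roaming\n";     # alice-session
```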
> Is there some website that discusses these issues?
>
There are loads of goodies at http://www.cpan.org . I'm no web/perl
expert (2 week perl exposure) but all roads seem to lead to the above
web site ;>
Hope I could help.
B.
------------------------------
Date: Sun, 02 Jul 2000 06:25:52 GMT
From: anuragmenon@my-deja.com
Subject: Perl Subroutines - some help required
Message-Id: <8jmn98$8cg$1@nnrp1.deja.com>
Hi all..
I am a perl newbie and have received some valuable help from this
newsgroup. I have one more question. Could somebody "demystify" Perl
Subroutines for me. I am used to the traditional parameter passing
routines and when I look up documentation surprisingly they are all the
SAME and I don't get it. I guess I am a little thick at the moment but
can't help it!
This is what I want to do
1. Pass a string to a subroutine: How do I pass the parameter?
It should return a file name after doing some manipulations to the
string. This file then needs to be just directly opened on a browser
from the calling program which is a CGI - script. The entire
application is a CGI script. I don't plan to keep the subroutine in a
separate file. Just at the end of the same .pl file.
I am writing some pseudocode down here. Let me know of your comments
---Main Program---
-lines of code -
-lines of code -
my $stringtobepassed = test;
my $filename = GenerateFileName($stringtobepassed);
- Open this $filename in a browser - I am not sure how to do this - so
any suggestions here will be welcome too - should I just construct the
entire url like 'http://www.blah.edu/blah/blah/$filename' and try to
open it?- pardon my ignorance if that is a stupid approach! -
Now, the subroutine
sub GenerateFileName{
----
----
-----
----
return $NewFileName;
}
How do I access/pass the string $stringtobepassed in the subroutine?
I appreciate all suggestions and code and pseudocode and everything!
Vinod.
------------------------------
Date: Sun, 2 Jul 2000 02:45:38 -0400
From: tadmc@metronet.com (Tad McClellan)
Subject: Re: Perl Subroutines - some help required
Message-Id: <slrn8ltp8i.9b4.tadmc@magna.metronet.com>
On Sun, 02 Jul 2000 06:25:52 GMT, anuragmenon@my-deja.com <anuragmenon@my-deja.com> wrote:
>
>I have one more question. Could somebody "demystify" Perl
>Subroutines for me.
We generally try to demystify things that are mysterious.
Passing arguments is well documented (i.e. not mysterious).
>I am used to the traditional parameter passing
^^^^^^^^^^^
What does that mean?
What would untraditional parameter passing be?
>routines and when I look up documentation
^^^^^^^^^^^^^
You mean like
perldoc perlsub
?
>surprisingly they are all the
>SAME and I don't get it.
I don't get what you are having a problem with...
>I guess I am a little thick at the moment but
>can't help it!
ditto, it would appear.
>This is what I want to do
>
>1. Pass a string to a subroutine: How do I pass the parameter?
> my $stringtobepassed = test;
Barewords are bad!
Do not use barewords.
Do enable warnings to be warned about barewords.
my $stringtobepassed = 'test';
> my $filename = GenerateFileName($stringtobepassed);
You already know how to pass parameters to a subroutine!
You do not have a problem at all!
> - Open this $filename in a browser - I am not sure how to do this
Me either. Because I cannot figure out what "opening a file
in a browser" might mean.
I think you may have a profound misunderstanding of how the
technology (CGI) works.
Nothing much of interest to the CGI programmer happens in
a browser.
A browser supplies input to the CGI program, and the CGI
program's output is sent to a browser.
The browser is just a (somewhat fancy) input/output device
for the CGI program running on the server.
>- so
>any suggestions here will be welcome too - should I just construct the
>entire url like 'http://www.blah.edu/blah/blah/$filename' and try to
>open it?- pardon my ignorance if that is a stupid approach! -
Maybe there is some CGI way of "redirect"ing to another web
page, but that is a CGI question, best asked in a newsgroup
that has some connection with WWW stuff, such as:
comp.infosystems.www.authoring.cgi
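
For what it's worth, a CGI program can redirect by printing a Location
header in place of a document. A rough sketch, using the placeholder
URL from the question; redirect_header is a hypothetical helper name,
and the file-name check is an added assumption about what counts as
safe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a redirect response for a file under $base_url, or return undef
# if the name is not a plain file name (hypothetical helper).
sub redirect_header {
    my ($base_url, $filename) = @_;
    return unless defined $filename
        && $filename =~ /\A[A-Za-z0-9_.-]+\z/   # no slashes, no odd characters
        && $filename !~ /\A\./;                 # no leading dot
    return "Location: $base_url$filename\r\n\r\n";
}

my $header = redirect_header('http://www.blah.edu/blah/blah/', 'test.html');

if (defined $header) {
    print $header;   # the server turns this into a redirect response
}
else {
    print "Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\n";
    print "Bad file name\n";
}
```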
If the file is on the server, you can just open()
and print() it.
perldoc -f open
perldoc -f print
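
Something along these lines, as a sketch only: send_file is a made-up
name, the path is a placeholder, and the content type assumes the file
is HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy the file at $path to the handle $out, preceded by a CGI header.
# Returns 1 on success, 0 if the file cannot be opened.
sub send_file {
    my ($path, $out) = @_;
    open(my $in, '<', $path) or return 0;
    print {$out} "Content-Type: text/html\r\n\r\n";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    return 1;
}

my $path = '/var/www/html/test.html';   # hypothetical location on the server

unless (send_file($path, \*STDOUT)) {
    print "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\n";
    print "No such file\n";
}
```

If any part of the path comes from user input, check it before calling
open().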
>Now, the subroutine
>
>sub GenerateFileName{
>
>----
>----
>-----
>----
>return $NewFileName;
>}
>
>How do I access/pass the string $stringtobepassed in the subroutine?
^^^^^^
my($str) = @_; # first line in body of subroutine (usually)
Just like it says near the very top of perlsub.pod...
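
Putting the pieces together, a complete version of the pseudocode might
look like this. The "manipulation" (lowercase, squash other characters
to underscores, add .html) is invented for illustration, since the
original did not say what it should be.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $stringtobepassed = 'test';
my $filename = GenerateFileName($stringtobepassed);
print "$filename\n";    # prints "test.html"

# Arguments arrive in @_; copy them into lexicals on the first line.
sub GenerateFileName {
    my ($str) = @_;

    # Placeholder manipulation: lowercase, then turn each run of
    # anything that is not a letter or digit into one underscore.
    (my $NewFileName = lc $str) =~ s/[^a-z0-9]+/_/g;

    return "$NewFileName.html";
}
```

Calling with parentheses, as above, works even though the sub is defined
at the end of the file.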
--
Tad McClellan SGML Consulting
tadmc@metronet.com Perl programming
Fort Worth, Texas
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V9 Issue 3535
**************************************