[10412] in Perl-Users-Digest
Perl-Users Digest, Issue: 4005 Volume: 8
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Oct 19 01:02:29 1998
Date: Sun, 18 Oct 98 22:00:14 -0700
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 18 Oct 1998 Volume: 8 Number: 4005
Today's topics:
Re: File I/O (Ronald J Kimball)
Re: File I/O <eugene@vertical.net>
Re: File I/O <eugene@vertical.net>
Re: I have a few elementary Perl problems <rick.delaney@shaw.wave.ca>
Re: I have a few elementary Perl problems (Ronald J Kimball)
Re: I have a few elementary Perl problems (Ilya)
Re: Looking for a shopping cart kmcarthur@my-dejanews.com
looking for programmer or programmers to make script fo (PuFF DaDDz)
Perl for Win 3.11 (Ei/No)
Re: problems calling procmail (Nick Halloway)
Re: Slow Sort? (Ilya Zakharevich)
Re: Slow Sort? <uri@sysarch.com>
Re: Slow Sort? (Ilya Zakharevich)
Re: Sorry (Ronald J Kimball)
Re: sub and return (Ronald J Kimball)
Re: The space deletion woes... (Ronald J Kimball)
What's the word on self-tie? and other questions ... <scribble@pobox.com>
Special: Digest Administrivia (Last modified: 12 Mar 98) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 18 Oct 1998 22:14:42 -0400
From: rjk@coos.dartmouth.edu (Ronald J Kimball)
Subject: Re: File I/O
Message-Id: <1dh45xq.yah7nkaj035wN@bos-ip-2-208.ziplink.net>
Bob L. <newsonly@usa.net> wrote:
> 2. I tried using +> when I open the file for read and write but for some
> reason it will not write the counters to the file. Any idea why?
Are you sure the problem isn't that the program won't read the counter
from the file? If you open with +>, you're clobbering the contents of
the file. Try +< instead.
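A minimal sketch of the difference, assuming a scratch file called counter_demo.txt (a made-up name, not taken from the original script). It seeds the file, reads it back through +<, then reads it again through +> to show that the second mode has already truncated it.

```perl
use strict;
use warnings;

# Hypothetical scratch file; the name is an assumption for illustration.
my $file = "counter_demo.txt";

# Seed the file with a counter value.
open(my $seed, '>', $file) or die "cannot create $file: $!";
print $seed "41\n";
close($seed) or die "cannot close $file: $!";

# "+<" opens for read/write without truncating: the old value is readable.
open(my $rw, '+<', $file) or die "cannot open $file: $!";
my $kept = <$rw>;
close($rw);

# "+>" truncates first, so there is nothing left to read.
open(my $trunc, '+>', $file) or die "cannot open $file: $!";
my $clobbered = <$trunc>;
close($trunc);

unlink $file;

print "+< read: ", (defined $kept      ? $kept      : "nothing\n");
print "+> read: ", (defined $clobbered ? $clobbered : "nothing\n");
```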
--
_ / ' _ / - aka - rjk@coos.dartmouth.edu
( /)//)//)(//)/( Ronald J Kimball chipmunk@m-net.arbornet.org
/ http://www.ziplink.net/~rjk/
"It's funny 'cause it's true ... and vice versa."
------------------------------
Date: Mon, 19 Oct 1998 00:42:45 -0400
From: "Eugene Sotirescu" <eugene@vertical.net>
Subject: Re: File I/O
Message-Id: <362ac42a.0@news.dca.net>
Bob L. wrote in message ...
>2. I tried using +> when I open the file for read and write but for some
>reason it will not write the counters to the file. Any idea why?
>
>Thank you very much.
Bob,
don't use +>
It is equivalent to:
sysopen (FH, $file, O_RDWR|O_TRUNC|O_CREAT);
i.e., it will clobber your existing file.
------------------------------
Date: Mon, 19 Oct 1998 00:51:10 -0400
From: "Eugene Sotirescu" <eugene@vertical.net>
Subject: Re: File I/O
Message-Id: <362ac622.0@news.dca.net>
Bob L. wrote in message <1puW1.172$H9.53270@proxye1.nycap.rr.com>...
>Hi and thank you:
>
>I have included the I/O code below. I tried +> when I open the file and it
>will not write to the file. Actually, the file gets wiped out. My first
>question was answered but now another short one. if 2 cgi's need to access
>the same text file and one of the cgi's is running, when the second
>executes, what happens when the file is locked as shown below?
>
>I really appreciate your help on this.
>
>Bob
>
>
># Open the file containing phrases and read it in.
>open(FILE,"$random_file") || &error('open->random_file',$random_file);
>flock(FILE,2);
>@FILE = <FILE>;
>#close(FILE);&print_line;
>
>
>
>sub print_line {
>
> THE VARIABLE $NEWFILE IS SET IN THE CODE I DELETED
>
># open(CHANGE, ">$random_file");
> print FILE $newfile;
> flock(FILE,8);
> close(FILE);
>
> exit;
> }
1. If this is the code you're running, nothing should happen since you've
commented out the call to print_line().
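On the locking question: a plain exclusive flock() waits until the other process releases its lock, so the second CGI simply blocks at that line. Note also that the quoted code opens the file read-only, so the later print to FILE cannot succeed. A minimal sketch of a locked read-modify-write follows; update_locked, the callback, and the file name are hypothetical names for illustration, not part of the original script.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Read-modify-write a file under an exclusive lock.
# $modify receives the current lines and returns the new lines.
sub update_locked {
    my ($path, $modify) = @_;
    open(my $fh, '+<', $path) or die "cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $path: $!";    # waits if another process holds the lock
    my @lines = <$fh>;
    my @new   = $modify->(@lines);
    seek($fh, 0, SEEK_SET)    or die "cannot rewind $path: $!";
    truncate($fh, 0)          or die "cannot truncate $path: $!";
    print $fh @new            or die "cannot write $path: $!";
    close($fh)                or die "cannot close $path: $!";   # closing releases the lock
    return @new;
}

# Usage with a made-up file holding a single counter line.
my $path = "locked_demo.txt";
open(my $seed, '>', $path) or die "cannot create $path: $!";
print $seed "7\n";
close($seed) or die "cannot close $path: $!";

my @result = update_locked($path, sub {
    my ($n) = @_;
    chomp $n;
    return ($n + 1) . "\n";
});
print @result;
```

Adding LOCK_NB to the flock() flags makes the call return false immediately instead of waiting, which lets a script report "busy" rather than hang.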
------------------------------
Date: Mon, 19 Oct 1998 01:39:58 GMT
From: Rick Delaney <rick.delaney@shaw.wave.ca>
Subject: Re: I have a few elementary Perl problems
Message-Id: <362A99DD.3868CCA5@shaw.wave.ca>
[posted & mailed]
Ilya wrote:
>
> #!/usr/contrib/bin/perl
> # Open a file in the format:
> # fully_qualified_name1
> # fully_qualified_name2
> # fully_qualified_name3
>
> open (MASTER_LIST, "<master_list"); #open master_list for reading
>
Check success of open, yada, yada, yada.
> while (<MASTER_LIST>) # read until EOF
> {
>
> chop;
chomp would be better, particularly since you probably have \r\n as your
line terminators to get the kind of behaviour you specify. If so, set
$/ = "\r\n" and use chomp.
> system ("remsh $_ command > /home/ilya/web/sysinfo/$_/command");
>
> }
> close (MASTER_LIST);
>
>
> However, the problem is that in the example above,
> /home/ilya/web/sysinfo/$_/command apparently does not expand to
> /home/ilya/web/sysinfo/machineXX/command file as it should and I am
> getting an error message:
>
If you want to know what it expands to, print it. You can run it
through od -c to see if there are invisible characters that might be
screwing you up, or maybe put some brackets around it so you can see
things like carriage returns easily.
print "remsh ($_) command > /home/ilya/web/sysinfo/($_)/command\n";
> $ ./build_page.pl
> sh: /home/ilya/web/sysinfo/machine1: Cannot create the specified
> file.
That looks like what you would get if $_ had a carriage return on the
end.
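A small sketch of the chomp behaviour described above, using a made-up sample line. With the default $/ only the newline is removed and the carriage return stays; with $/ set to "\r\n" the whole terminator goes.

```perl
use strict;
use warnings;

my $line = "machine1\r\n";    # a line as it might arrive from a DOS-format file

# Default $/ is "\n", so chomp leaves the carriage return behind.
my $default = $line;
chomp $default;

# With $/ set to "\r\n", chomp removes the whole terminator.
my $crlf = $line;
{
    local $/ = "\r\n";
    chomp $crlf;
}

printf "default chomp leaves %d chars, CRLF chomp leaves %d\n",
    length($default), length($crlf);
```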
--
Rick Delaney
rick.delaney@shaw.wave.ca
------------------------------
Date: Sun, 18 Oct 1998 22:14:45 -0400
From: rjk@coos.dartmouth.edu (Ronald J Kimball)
Subject: Re: I have a few elementary Perl problems
Message-Id: <1dh462g.cv1pyjvyknsN@bos-ip-2-208.ziplink.net>
[posted and mailed]
[BOO to stealth cc's]
[followups set]
Ilya wrote:
>
> | Do the directories already exist? If not, perhaps the error message
> | means that the shell cannot even create the directory in which you have
> | asked it to create the file.
>
> Yes, the directories already exist.
>
> All of the following create files the way I want to:
>
> system ("remsh $_ command > /tmp/test.$_");
> system ("remsh $_ command > /home/test.$_");
> system ("remsh $_ command > /home/ilya/test.$_");
> system ("remsh $_ command > /home/ilya/web/test.$_");
> system ("remsh $_ command > /home/ilya/web/sysinfo/test.$_");
>
> But
>
> system ("remsh $_ command > /home/ilya/web/sysinfo/$_/test.$_");
>
> Gives me the error. What's up with that?
> system ("remsh $_ command > /home/ilya/web/sysinfo/$_/command");
Hmm.... Could there be a space at the end of $_, perhaps because the
file has spaces at the end of each line?
Try adding this to your script, before the system line:
warn "executing: remsh $_ command > /home/ilya/web/sysinfo/$_/command\n";
just to make sure the command you're sending is what you expect.
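Trailing spaces and carriage returns are hard to spot in a plain warn. One possible refinement, sketched here with a hypothetical helper named visible() and made-up host names, is to bracket the value and escape anything outside printable ASCII.

```perl
use strict;
use warnings;

# Render a string with brackets around it and non-printable characters
# escaped, so a trailing space or carriage return cannot hide.
# visible() is a hypothetical helper, not part of the original script.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/ge;
    return "[$s]";
}

for my $host ("machine1", "machine1 ", "machine1\r") {
    warn "host is ", visible($host), "\n";
}
```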
Hope that helps!
Ronald
------------------------------
Date: 19 Oct 1998 03:23:53 GMT
From: ilya@ns1.foothill.net (Ilya)
Subject: Re: I have a few elementary Perl problems
Message-Id: <70ebc9$m2a$1@ns2.foothill.net>
Elaine -HappyFunBall- Ashton (eashton@bbnplanet.com) wrote:
: Patience is never easy, but it will help you see the problems with your
: approach.
: /me sends poster a big mug of Newcastle.
: e.
Thanks.
I was helped to figure out what the problem was by the posts here, but I
still have not solved it. :( I want to execute a command on the remote
machine and redirect the output to the local host. The problem is that
Perl tries to create that output file on the remote machine. What I was
saying is this:
system ("remsh $_ command > /home/ilya/web/sysinfo/$_/test.$_")
What I really want is:
system ("remsh $_ command" > /home/ilya/web/sysinfo/$_/test.$_)
But that gives me an error as well. The $_ variable is fine.
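If the goal is for the output file to end up on the local host, one option that does not depend on shell redirection at all is to let Perl read the command's output and write the file itself. The sketch below assumes a Perl that supports the list form of pipe open; save_output is a hypothetical helper, and echo stands in for the real remsh command so the example can run anywhere.

```perl
use strict;
use warnings;

# Run a command, capture its standard output, and write it to a local file.
# No shell is involved, so nothing in the arguments is re-parsed.
sub save_output {
    my ($outfile, @cmd) = @_;
    open(my $pipe, '-|', @cmd)     or die "cannot run @cmd: $!";
    open(my $out,  '>',  $outfile) or die "cannot write $outfile: $!";
    print $out $_ while <$pipe>;
    close($out)  or die "cannot close $outfile: $!";
    close($pipe) or warn "@cmd exited with status $?\n";
    return;
}

# Stand-in for something like:
#   save_output("/home/ilya/web/sysinfo/$host/test.$host", "remsh", $host, "command");
save_output("test.localhost", "echo", "hello from the remote side");
```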
Ilya
------------------------------
Date: Mon, 19 Oct 1998 01:00:32 GMT
From: kmcarthur@my-dejanews.com
Subject: Re: Looking for a shopping cart
Message-Id: <70e2vg$19$1@nnrp1.dejanews.com>
You might be interested in a survey that we are taking of actual costs paid
for shopping cart websites. The information is at http://www.mbsinternet.com
Kenneth A. McArthur
MBS Internet
Internet Services for Business
www.mbsinternet.com
mailto:kmcarthur@mbsinternet.com
-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
------------------------------
Date: 19 Oct 1998 01:48:39 GMT
From: puffdaddz@aol.com (PuFF DaDDz)
Subject: looking for programmer or programmers to make script for free webspace provider like geocities, etc...
Message-Id: <19981018214839.05269.00002268@ng-fc1.aol.com>
will need to make sign up script and login area parts---look at
www.hypermart.net or www.geocities.com ways to see kinda what i want...
contact me by icq 6707316 or by email
brandon@telaweb.com
------------------------------
Date: Mon, 19 Oct 1998 03:20:47 GMT
From: cock-a-doodle-doo@copacabana.com (Ei/No)
Subject: Perl for Win 3.11
Message-Id: <362aafc3.1161444@news.kolumbus.fi>
Hi!
Where can I find Perl compiled for Windows 3.11?
Ei/No
------------------------------
Date: 19 Oct 1998 01:36:55 GMT
From: snowe@rain.org (Nick Halloway)
Subject: Re: problems calling procmail
Message-Id: <70e53n$j7$1@news.rain.org>
Nick Halloway (snowe@rain.org) wrote:
@ I'm having problems with the perl version 5.0 code below.
@ It doesn't seem to know that save-incoming is a .procmailrc file,
@ it gives errors
@
@ :0:: not found
@ !snowe@rain.org: not found
@
@ -- the lines in save-incoming are
@ :0:
@ !snowe@rain.org
@
@ but it doesn't die.
It thinks my procmail rc file is an executable. When I chmod to 644 I
get an error message: Permission denied. Why is it doing that?
@
@ Thanks ...
@
@ ##################################################################
@ # archive
@ open( COMMAND, "| procmail -f- $MNG_ROOT/etc/procmail/save-incoming" )
@ || die "open procmail didnt work";
@
@ print COMMAND $Body || die "print to procmail didnt work";
@
@ close( COMMAND );
@
@ unlink( $TmpFile );
@
@
@
:
:
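Separately from the procmail question, one detail in the quoted code is worth a note: || binds more tightly than the list operator print, so "print COMMAND $Body || die ..." tests $Body, not the result of print. A minimal sketch with an empty body (STDOUT stands in for the procmail pipe):

```perl
use strict;
use warnings;

my $body = "";    # an empty (false) message body

# Parsed as: print STDOUT ($body || die ...), so the die fires on a false
# $body before print is ever called.
my $died = !eval { print STDOUT $body || die "print failed\n"; 1 };

# "or" has lower precedence than the list operator, so this tests print itself.
my $ok = eval { print STDOUT $body or die "print failed\n"; 1 };

print "with ||: ", ($died ? "died" : "ok"),
      "; with or: ", ($ok ? "ok" : "died"), "\n";
```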
------------------------------
Date: 19 Oct 1998 00:10:47 GMT
From: ilya@math.ohio-state.edu (Ilya Zakharevich)
Subject: Re: Slow Sort?
Message-Id: <70e027$d7c$1@mathserv.mps.ohio-state.edu>
[A complimentary Cc of this posting was sent to Steve Monson
<monson@tri.sbc.com>],
who wrote in article <70dsqv$9l6@euphony.tri.sbc.com>:
> I brought up the original problem. As I mentioned in my post, I
> am not doing a sort of some convoluted structure. I have performed
> a few computational sections, and ended up with a list @s containing
> 102,442 small integers (ranging from 1 to about 100 or so). The
> single line which took over 40 minutes was
>
> @s = sort {$b <=> $a} @s;
>
> I am currently doing a bit of checking to see why my program is taking
> up so much memory (the raw data is only 6MB or so, but by the time I
> get to this sort (having built some hash tables on the way), 28MB
> is in use), and where I can put the sort to run in the 10-15 seconds
> which it should take.
timeit perl -e "$#s = 100000; push @s, int rand 300 while ++$i <= 100000; \
@s = sort {$b <=> $a} @s"
Elapsed: 0:00:13.06
Memory usage:
Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
15700 free: 124 117 2 28 10 8 0 1 1 0 0
432 57 25 26 5
63432 used: 131 137 187 65 6 8 4 15 1 1 1
79 113 315 100 20
Total sbrk(): 81920/14:163. Odd ends: pad+heads+chain+tail: 0+740+0+2048.
Memory allocation statistics after execution: (buckets 4(4)..1052668(1048576)
2413388 free: 124 117 0 28 10 7 0 1 0 0 0 0 0 0 0 0 0 0
432 57 100029 25 5
10948268 used: 131 137 189 65 6 9 4 5235 2 1 1 0 0 0 0 0 0 3
79 113 100316 101 20
Total sbrk(): 13740032/87:236. Odd ends: pad+heads+chain+tail: 0+40456+0+337920.
Looks pretty normal. Probably you have some old version of perl AND
broken system libraries.
Ilya
------------------------------
Date: 18 Oct 1998 23:06:22 -0400
From: Uri Guttman <uri@sysarch.com>
To: Alexis Huxley <alexis@danae.demon.co.uk>
Subject: Re: Slow Sort?
Message-Id: <x7u311s11t.fsf@sysarch.com>
>>>>> "AH" == Alexis Huxley <alexis@danae.demon.co.uk> writes:
AH> I have a hash %attribs containing structured records detailing
AH> various file attributes. This is indexed on device and inode
AH> number as returned by stat() (or lstat() for symlinks).
are you sure it is the sort that kills you and not the printf? a sort of
27k items like you show should not be killer. the printf on the
other hand could be since it is much less efficient than print.
AH> Having built the hashes, I then output the attributes for each
AH> filename with this bit of code:
i can't say for sure why your sort is slow but i have some ideas to speed
up your code. i notice you are using the old perl4 convention of hashes
with comma separated lists of keys being made into pseudo arrays. this
should be done with multilevel hashes as they won't have any problems
with potential data use of $;
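a rough sketch of the two styles side by side, with made-up device and inode numbers. the flat form joins the keys with $; behind the scenes and needs a split to get them back; the nested form keeps them as separate hash levels.

```perl
use strict;
use warnings;

my ($dev, $inode) = (2049, 131073);    # made-up sample values

# Perl 4 style: one flat hash whose key is join($;, $dev, $inode).
my %flat;
$flat{$dev, $inode} = { type => 'f', uid => 0 };
my ($d, $i) = split($;, (keys %flat)[0]);

# Perl 5 style: a hash of hashes keyed first by device, then by inode.
my %attribs;
$attribs{$dev}{$inode} = { type => 'f', uid => 0 };

for my $device (sort { $a <=> $b } keys %attribs) {
    for my $ino (sort { $a <=> $b } keys %{ $attribs{$device} }) {
        print "$device/$ino: type=$attribs{$device}{$ino}{type}\n";
    }
}
```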
also you use main:: which incurs an extra symbol table lookup (i am not
sure if this is runtime or compile time). it does clutter up the
code. maybe using a typeglob to localize the vars to the current package
would simplify and possibly speedup the code.
AH> foreach $key (sort {$main::names{$a} cmp $main::names{$b}} (keys %main::names)) {
maybe use a temp variable for the keys and then sort that and then do
the loop. perl may have to allocate large temp arrays and not knowing the
size in advance may cause extra overhead. again it is cluttered code
because of all the main::
AH> ($dev, $inode, $i) = split($;, $key);
AH> # output all interesting information except list of other links.
AH> printf $out_handle "%s$main::field_separator%s$main::field_separator%d$main::field_separator%d$main::field_separator%o$main::field_separator%d$main::field_separator%s", $main::names{$key}, $main::attribs{$dev,$inode}->{type}, $main::attribs{$dev,$inode}->{uid}, $main::attribs{$dev,$inode}->{gid}, $main::attribs{$dev,$inode}->{mode}, $main::attribs{$dev,$inode}->{nlink}, $main::attribs{$dev,$inode}->{contents};
this statement could be done with one long here-doc string. it would
save a large amount of time since you are just printf'ing strings and
numbers without any actual formatting.
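a possible shape for that, sketched with join and print rather than a here-doc, and with a made-up record. the only field in the quoted printf that needs real formatting is the octal mode (%o), so it alone goes through sprintf here. the variable names are stand-ins for the $main:: ones. note that %d would also truncate non-integers, which join does not do.

```perl
use strict;
use warnings;

my $field_separator = ":";    # stand-in for $main::field_separator
my $name = "/etc/motd";       # made-up sample name
my %rec  = (type => 'f', uid => 0, gid => 0, mode => 0644,
            nlink => 1, contents => '');

# Same field order as the quoted printf: name, type, uid, gid, mode, nlink, contents.
my $line = join $field_separator,
    $name,
    @rec{qw(type uid gid)},
    sprintf("%o", $rec{mode}),
    @rec{qw(nlink contents)};

print $line, "\n";
```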
AH> # If there are other links then also list them all except the
AH> # one by which we're listing it of course!
AH> if ($main::attribs{$dev,$inode}->{namecnt} != 1) {
AH> for ($j=0; $j<$main::attribs{$dev,$inode}->{namecnt}; $j++) {
AH> if ($j != $i) {
AH> print $out_handle "$main::field_separator$main::names{$dev, $inode, $j}";
AH> }
AH> }
AH> }
AH> print $out_handle "$main::record_separator";
AH> }
hth,
uri
--
Uri Guttman ----------------- SYStems ARCHitecture and Software Engineering
Perl Hacker for Hire ---------------------- Perl, Internet, UNIX Consulting
uri@sysarch.com ------------------------------------ http://www.sysarch.com
The Best Search Engine on the Net ------------- http://www.northernlight.com
------------------------------
Date: 19 Oct 1998 04:31:25 GMT
From: ilya@math.ohio-state.edu (Ilya Zakharevich)
Subject: Re: Slow Sort?
Message-Id: <70efat$nqu$1@mathserv.mps.ohio-state.edu>
[A complimentary Cc of this posting was sent to Steve Monson
<monson@tri.sbc.com>],
who wrote in article <70dsqv$9l6@euphony.tri.sbc.com>:
> I am currently doing a bit of checking to see why my program is taking
> up so much memory (the raw data is only 6MB or so, but by the time I
> get to this sort (having built some hash tables on the way), 28MB
> is in use), and where I can put the sort to run in the 10-15 seconds
> which it should take.
I'm sorry, my previous test program was absolutely buggy and was doing
complete junk. Here is a better one:
timeit perl -e "$#s = $l = shift; $#s = -1; \
push @s, int rand 300 while ++$i < $l; \
@s = sort {$b <=> $a} @s" 100000
1
Elapsed: 0:00:06.31
With PERL_DEBUG_MSTATS=2 :
Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
16932 free: 123 116 53 24 10 8 0 1 1 0 0
429 56 23 25 5
64216 used: 132 138 199 69 6 8 4 15 1 1 1
82 114 317 101 20
Total sbrk(): 81920/14:163. Odd ends: pad+heads+chain+tail: 0+772+0+0.
1
Memory allocation statistics after execution: (buckets 4(4)..528380(524288)
2414612 free: 122 116 51 24 10 7 0 1 0 0 0 0 0 0 0 0 0
429 56 100027 24 5
6917996 used: 133 138 201 69 6 9 4 2825 2 1 1 0 0 0 0 0 3
82 114 100318 102 20
Total sbrk(): 9449472/99:248. Odd ends: pad+heads+chain+tail: 0+30848+0+86016.
Without taking an `int':
Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
16980 free: 123 116 54 25 10 8 0 1 1 0 0
429 56 23 24 5
64168 used: 132 138 198 68 6 8 4 15 1 1 1
82 114 317 102 20
Total sbrk(): 81920/14:163. Odd ends: pad+heads+chain+tail: 0+772+0+0.
1
Memory allocation statistics after execution: (buckets 4(4)..528380(524288)
13496 free: 122 116 52 25 10 7 0 0 0 0 0 0 0 0 0 0 0
429 56 21 23 5
5753192 used: 133 138 200 68 6 9 4 4036 2 1 1 0 0 0 0 0 3
82 114 319 103 20
Total sbrk(): 5851136/82:231. Odd ends: pad+heads+chain+tail: 0+16864+0+67584.
This is quite marvelous. Only 58.5 bytes per one sorted number.
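As far as can be read from the two command lines, the earlier test pre-extended @s and then pushed onto it, so the array held the undefined slots plus the pushed values. The later one resets the array to empty before pushing. A small sketch with a reduced count (1000 instead of 100000), followed by the bytes-per-number arithmetic from the last sbrk() total above:

```perl
use strict;
use warnings;

my $n = 1000;    # small count for illustration; the posts use 100000

# First test: pre-extending to index $n and then pushing appends after the
# $n + 1 undefined slots instead of filling them.
my @first;
$#first = $n;
my $i = 0;
push @first, int rand 300 while ++$i <= $n;

# Second test: the array is pre-extended, then reset to empty before pushing.
my @second;
$#second = $n;
$#second = -1;
my $j = 0;
push @second, int rand 300 while ++$j < $n;

printf "first: %d elements, second: %d elements\n",
    scalar(@first), scalar(@second);

# Total sbrk() of 5851136 bytes spread over 100000 numbers.
printf "%.1f bytes per number\n", 5851136 / 100000;
```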
Ilya
------------------------------
Date: Sun, 18 Oct 1998 22:14:47 -0400
From: rjk@coos.dartmouth.edu (Ronald J Kimball)
Subject: Re: Sorry
Message-Id: <1dh473k.10zuh53ojl7nrN@bos-ip-2-208.ziplink.net>
Lee Brandson <rlb@intrinsix.ca> wrote:
> In article <1dgyuxt.1s9yanzvwev18N@bos-ip-1-106.ziplink.net>,
> rjk@coos.dartmouth.edu (Ronald J Kimball) wrote:
>
> > [posted and mailed]
>
> Why? Posting was sufficient.
Because I wanted to make sure you read it, of course.
I did not CC this one. I hope you read it anyway.
> > Lee Brandson <rlb@intrinsix.ca> wrote:
> >
> > > As an occasional watcher of this ng and its predecessor for some two years
> > > now, I would like to ask whether it is strictly necessary to be as rude as
> > > possible when answering (or not answering, as the case may be) a question?
> > > Is this what it takes to be "in the club"?
> >
> > If you were more than an "occasional" watcher of this ng and its
> > predecessor, then you would know that this very question has been asked
> > numerous times. No, it's not in the FAQ. That's probably because there
> > is no agreed upon answer; instead it just leads to another drawn-out
> > debate between people who are unlikely to be convinced to change their
> > own opinion on the matter.
>
> The question was not the topic of my comment. I was addressing the nature
> of the reply. More generally, I was addressing the general tone of replies
> throughout this ng. This is not to say that all replies are rude, dismissive,
> insulting, quarrelsome... but opportunities for such replies are seldom
> overlooked.
You completely misunderstood. I was referring to *your* question.
[In fact, I don't see any other questions asked in the thread leading up
to your post. What else could I have been referring to? :-) ]
"As an occasional watcher of this ng and its predecessor for some two
years now, I would like to ask whether it is strictly necessary to be as
rude as possible when answering (or not answering, as the case may be) a
question? Is this what it takes to be 'in the club'?"
That question has been asked numerous times. No, it's not in the FAQ.
That's probably because there is no agreed upon answer; instead it just
leads to another drawn-out debate between people who are unlikely to be
convinced to change their own opinion on the matter.
> > > Do you enjoy the unending long threads of justifications for such
> > > rudeness?
> >
> > Oh, apparently you do know. No, I don't enjoy the threads. And thank
> > you *so much* for starting yet one more!
>
> I did not start it. I merely jumped in to suggest that a polite reply
> would have all of the benefits and none of the unfortunate side effects of
> a rude reply.
Yes, you did start the thread. Your post is the one which started the
justification for such rudeness. [1]
Once again, thank you *so much*.
[1] I'm sure you'll try to say that you didn't. To save you the
trouble, please refer to dejanews and answer the following questions:
How many posts justifying rudeness are sent in response to posts asking
FAQs?
How many posts justifying rudeness are sent in response to rude replies
to FAQs?
How many posts justifying rudeness are sent in response to posts
condemning rude replies?
--
_ / ' _ / - aka - rjk@coos.dartmouth.edu
( /)//)//)(//)/( Ronald J Kimball chipmunk@m-net.arbornet.org
/ http://www.ziplink.net/~rjk/
"It's funny 'cause it's true ... and vice versa."
------------------------------
Date: Sun, 18 Oct 1998 22:14:57 -0400
From: rjk@coos.dartmouth.edu (Ronald J Kimball)
Subject: Re: sub and return
Message-Id: <1dh47rc.jtakye1h812gxN@bos-ip-2-208.ziplink.net>
Tad McClellan <tadmc@flash.net> wrote:
> ( I assume that "brackets" means "parentheses" over there?
>
> I know you talk funny (now I know who was first, and who
> it really is that talks funny, but I don't want to
> acknowledge that in a public forum such as this ;-)
> )
It could be worse... One of my coworkers, who is just learning Perl,
refers to *every* set of paired characters as 'quotes'. To him, the
following code contains four 'quoted' parts:
if (/$foo/) {
    print "foo\n";
}
!!!
--
_ / ' _ / - aka - rjk@coos.dartmouth.edu
( /)//)//)(//)/( Ronald J Kimball chipmunk@m-net.arbornet.org
/ http://www.ziplink.net/~rjk/
"It's funny 'cause it's true ... and vice versa."
------------------------------
Date: Sun, 18 Oct 1998 22:14:59 -0400
From: rjk@coos.dartmouth.edu (Ronald J Kimball)
Subject: Re: The space deletion woes...
Message-Id: <1dh47z9.1bzlb1rpm0qj1N@bos-ip-2-208.ziplink.net>
Brent Michalski <perlguy@technologist.com> wrote:
> > > =~ s/\s+?//g;
> >
> > Useless use of non-greedy matching. That will never match more than one
> > space, because it never has a reason to.
> >
>
> =~ s/\s+?//g; works fine for me.
I didn't say it didn't work, did I? I said that s/\s+?//g will never
match more than one space. In other words, it's functionally equivalent
to s/\s//g;. Not very efficient.
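A quick sketch with a made-up sample string. All three forms produce the same result; the difference is how many matches the regex engine performs to get there.

```perl
use strict;
use warnings;

my $text = "a  b \t c\n";

(my $lazy   = $text) =~ s/\s+?//g;    # each match takes one whitespace character
(my $single = $text) =~ s/\s//g;      # same result, stated directly
(my $greedy = $text) =~ s/\s+//g;     # same result, one match per run

# Count how many matches each quantifier needs on the same input.
my $lazy_count   = () = $text =~ /\s+?/g;
my $greedy_count = () = $text =~ /\s+/g;

print "lazy: $lazy_count matches, greedy: $greedy_count matches\n";
```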
> I was under the impression that Perl regex's were _greedy_ unless you
> used the ?: operator to make them non-greedy...
I think you meant the '?' quantifier. ?: is a trinary logical operator
that has nothing to do with regular expressions.
Anyway, I'm not sure what the point is that you're trying to make.
Obviously, Perl's regex quantifiers are greedy, unless you add a '?'.
--
_ / ' _ / - aka - rjk@coos.dartmouth.edu
( /)//)//)(//)/( Ronald J Kimball chipmunk@m-net.arbornet.org
/ http://www.ziplink.net/~rjk/
"It's funny 'cause it's true ... and vice versa."
------------------------------
Date: 18 Oct 1998 23:01:24 -0500
From: Tushar Samant <scribble@pobox.com>
Subject: What's the word on self-tie? and other questions ...
Message-Id: <70edik$18o@tekka.wwa.com>
[These unrelated questions are in the same post because they
came up in the same software I am trying to write]
1. I just heard "rumours" that tying a filehandle to an object
which itself contains the filehandle has bugs. Could someone
enlighten?
2. Is there going to be a final word on overload? What is probably
the most enjoyable man page in the entire perl documentation is
marred by a big disclaimer out front saying there is no final
word on it. Hasn't that been the case for some years now?
Description of what I am working on:
(any advice on how to do it better is also much appreciated)
These are a few modules to parse HTML with 'high fidelity' to
the source HTML. That is to say, you want to parse while losing
as little of the original source formatting as possible. Ideally,
if you did nothing to the parsed structure, then applying an
'as_HTML' function to it should recreate *exactly* the original
HTML.
This comes up when you are modifying some given HTML or writing
a 'wizard' to generate HTML; I am convinced that the only way not
to create unreadable code is to have something such as the above.
Secondly, I found HTML::Parser (from which my code is stolen)
increasingly difficult for quick-and-dirty parsing. In fact,
most people around here, including me, are web people primarily
and are loath to use object oriented notation.
With these considerations in mind, I have written three modules
(everything including names is provisional):
Tag.pm: Exports function 'Tag', which parses an HTML-like tag
into an overloaded object. Example:
$t = Tag '<body bgcolor="#0000ff">';
$t |= '<bgcolor="white" background="x.gif">';
print $t;
#prints <body bgcolor="#0000ff" background="x.gif">
Six other commonly needed functions are overloaded, mostly treating
the list of attributes as a "bitmap". User can generally avoid object
methods. (This is the reason for the overload question).
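For readers unfamiliar with overload, here is a rough sketch of how an object could support the two operations in the example above. It is not the actual Tag.pm: the package name SketchTag, its internals, and the attribute parsing (double-quoted values only) are assumptions made for illustration.

```perl
use strict;
use warnings;

package SketchTag;    # hypothetical stand-in, not the poster's Tag.pm

use overload
    '""' => \&as_string,    # used by print $t
    '|'  => \&merge;        # "|=" is autogenerated from "|"

# Parse '<name attr="value" ...>' into a name plus ordered attributes.
sub new {
    my ($class, $html) = @_;
    my ($inner) = $html =~ /^<\s*(.*?)\s*>\s*\z/s or die "not a tag: $html\n";
    # A leading word not followed by "=" is the tag name; it may be absent.
    my $name = $inner =~ s/^([A-Za-z][\w-]*)(?=\s|\z)// ? $1 : '';
    my $self = bless { name => $name, attr => {}, order => [] }, $class;
    while ($inner =~ /([\w-]+)\s*=\s*"([^"]*)"/g) {
        $self->_add($1, $2);
    }
    return $self;
}

# Record an attribute unless it is already present (first value wins).
sub _add {
    my ($self, $key, $value) = @_;
    return if exists $self->{attr}{$key};
    push @{ $self->{order} }, $key;
    $self->{attr}{$key} = $value;
    return;
}

# Rebuild the tag text, keeping attributes in their original order.
sub as_string {
    my ($self) = @_;
    my @parts = grep { length } $self->{name};
    push @parts, map { sprintf('%s="%s"', $_, $self->{attr}{$_}) } @{ $self->{order} };
    return "<@parts>";
}

# $tag | '<...>' : copy the tag and add any attributes it does not have yet.
sub merge {
    my ($self, $other) = @_;
    $other = SketchTag->new($other) unless ref $other;
    my $new = SketchTag->new("$self");    # copy via the string form
    $new->_add($_, $other->{attr}{$_}) for @{ $other->{order} };
    return $new;
}

package main;

my $t = SketchTag->new('<body bgcolor="#0000ff">');
$t |= '<bgcolor="white" background="x.gif">';
print "$t\n";
```

Because only "|" is overloaded, Perl builds "|=" from it, and existing attributes are kept while new ones are appended.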
TagHash.pm: This is a combination of IxHash and CPHash, precisely
what you want in order to know all about the attributes of a tag
"as it was written". I found that just knowing the hash (and attr-
seq as in HTML::Parser) was not always sufficient, nor very smooth.
TagStream.pm: This exports two functions, parse_on and parse_off.
You pass them open filehandles:
open(HTML, "a.html") or die;
parse_on(\*HTML);
Now readline on HTML will give you tokens (which are overloaded
Tag objects, as above):
while (<HTML>) {
    if ($_ == 'tag:body') {
        $_ .= '<body bgcolor=#cccccc>';
        print;
        last;
    }
    print;
}
(That unconditionally changed bgcolor to grey.) Then to stop parsing
you say:
parse_off(\*HTML);
and you are back to normal.
while (<HTML>) {
    print;
}
The advantage is I don't have to think to use this, and it treats the
HTML with the lightest touch. But now the question of tied filehandles
has come up, and I'd like to know what the deal is.
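For reference, a minimal sketch of a tied filehandle that hands back tokens instead of lines. It is not TagStream.pm. It sidesteps the self-tie question by tying one glob (TOKENS) to an object that holds a different, untied handle. The package name, the slurp-then-tokenize approach, and the in-memory input are all assumptions, and it only supports reads in scalar context.

```perl
use strict;
use warnings;

package SketchTokenHandle;    # hypothetical name, not the poster's TagStream.pm

# tie *FH, 'SketchTokenHandle', $inner_handle;
sub TIEHANDLE {
    my ($class, $inner) = @_;
    return bless { inner => $inner, buffer => undef }, $class;
}

# Called for <FH> in scalar context: return the next tag or run of text,
# or undef when nothing more can be tokenized.
sub READLINE {
    my ($self) = @_;
    unless (defined $self->{buffer}) {
        my $inner = $self->{inner};
        local $/;                          # slurp the whole input once
        $self->{buffer} = <$inner>;
        $self->{buffer} = '' unless defined $self->{buffer};
    }
    # \G with /gc continues from where the previous call stopped.
    return $self->{buffer} =~ /\G(<[^>]*>|[^<]+)/gc ? $1 : undef;
}

package main;

my $html = qq{<html><body bgcolor="white">Hello</body></html>};
open(my $source, '<', \$html) or die "cannot open in-memory file: $!";

tie *TOKENS, 'SketchTokenHandle', $source;    # TOKENS is tied, $source is not
my @tokens;
while (my $token = <TOKENS>) {
    push @tokens, $token;
}
untie *TOKENS;

print join("|", @tokens), "\n";
```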
Another example--make a web page tired:
$wired = 0; #innocent until proved guilty
while (<HTML>) {
    $wired = 1 if $_ == 'tag:applet';
    if (!$wired and $_ != 'tag:blink' and $_ != '/tag:blink') {
        print $_;
    }
    $wired = 0 if $_ == '/tag:applet';
}
If anybody is interested in using and improving these modules, please
tell me. The above is only a sketchy description of what they do, not
to speak of numerous (current) limitations, though I think the actual
potential may be big--for instance a proxy which opens a socket to an
HTTP GET and filters things to your heart's content.
------------------------------
Date: 12 Jul 98 21:33:47 GMT (Last modified)
From: Perl-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Special: Digest Administrivia (Last modified: 12 Mar 98)
Message-Id: <null>
Administrivia:
Special notice: in a few days, the new group comp.lang.perl.moderated
should be formed. I would rather not support two different groups, and I
know of no other plans to create a digested moderated group. This leaves
me with two options: 1) keep on with this group 2) change to the
moderated one.
If you have opinions on this, send them to
perl-users-request@ruby.oce.orst.edu.
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.misc (and this Digest), send your
article to perl-users@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
The Meta-FAQ, an article containing information about the FAQ, is
available by requesting "send perl-users meta-faq". The real FAQ, as it
appeared last in the newsgroup, can be retrieved with the request "send
perl-users FAQ". Due to their sizes, neither the Meta-FAQ nor the FAQ
are included in the digest.
The "mini-FAQ", which is an updated version of the Meta-FAQ, is
available by requesting "send perl-users mini-faq". It appears twice
weekly in the group, but is not distributed in the digest.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V8 Issue 4005
**************************************