[22362] in Perl-Users-Digest
Perl-Users Digest, Issue: 4583 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 19 06:06:01 2003
Date: Wed, 19 Feb 2003 03:05:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 19 Feb 2003 Volume: 10 Number: 4583
Today's topics:
Re: -MCPAN to install older PM <randy@theoryx5.uwinnipeg.ca>
Re: calling mrtg from a perlscript <goldbb2@earthlink.net>
Clearing an array (Jason Singleton)
Re: Clearing an array <bernard.el-hagin@DODGE_THISlido-tech.net>
Compiling Perl with Intel Linux C++ Compiler (Seth Brundle)
dbd-pg (postgres) on windows <swen@news.com>
Re: dbd-pg (postgres) on windows <goldbb2@earthlink.net>
Error out routine (Jeff Mott)
Re: Error out routine <tassilo.parseval@post.rwth-aachen.de>
Re: Extracting patterns <remccart@uiuc.edu.spam>
Re: Extracting patterns <goldbb2@earthlink.net>
Re: Need Help with Pattern Matching <marc.beyer@gmx.li>
Re: Need Help with Pattern Matching <goldbb2@earthlink.net>
Re: Problem with buffering on non-blocking socket under <goldbb2@earthlink.net>
Re: Problem with buffering on non-blocking socket under <Ed+nospam@ewildgoose.demon.co.uk@>
Re: Removing HTML tags. <tassilo.parseval@post.rwth-aachen.de>
Re: Removing HTML tags. <acm2@ukc.ac.uk>
Re: Removing HTML tags. <peakpeek@purethought.com>
Re: Returning the field list from DBI <goldbb2@earthlink.net>
Re: Returning the field list from DBI <barbr-en@online.no>
Re: Unsuccesful "Print" (Jay Tilton)
Re: Unsuccesful "Print" <bwalton@rochester.rr.com>
Re: Unsuccesful "Print" <tony_curtis32@yahoo.com>
Re: Unsuccesful "Print" <goldbb2@earthlink.net>
Re: Unsuccesful "Print" <goldbb2@earthlink.net>
Re: Uploading a file (g)
Re: use DBI; (Helgi Briem)
Re: use DBI; (g)
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 19 Feb 2003 01:47:21 -0600
From: "Randy Kobes" <randy@theoryx5.uwinnipeg.ca>
Subject: Re: -MCPAN to install older PM
Message-Id: <4EG4a.45807$7_.202277@news1.mts.net>
"Andrew McGregor" <andy@misk.co.uk> wrote in message
news:3E520F2F.1080203@misk.co.uk...
[ .. ]
> How can I force the CPAN/shell program to install an older version of a
> Perl module, saving me from upgrading the Perl interpreter?
Try giving the explicit name of the distribution, as in
cpan> install A/AB/ABC/some_package-1.23.tar.gz
best regards,
randy kobes
------------------------------
Date: Tue, 18 Feb 2003 23:10:35 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: calling mrtg from a perlscript
Message-Id: <3E5303BB.91F9E784@earthlink.net>
Rob wrote:
>
> Hello group,
>
> We use MRTG with RRD to monitor our servers.
>
> We have a perl script called mrtgmon.pl which executes the mrtg script
> with a "do" function from a subroutine, it works but the perl process
> runs out of memory. It seems that it is the MRTG perl script because
> if we use a standard system call from the mrtgmon.pl script, the
> memory of the perl process keeps low, because the separate process for
> MRTG is terminated. This way the memory allocated by MRTG is freed
> again.
>
> Why change it? Because if we use the do function we gain about 200
> percent speed in monitoring all the servers(about 150 hosts in 45 sec.
> instead of 200 sec.), while there are no separate processes started
> for mrtg. In this way both the mrtmon.pl and mrtg script run in the
> same interpreter.
>
> Our question? Is it possible with the do function to free the memory
> that the MRTG script uses, like in the system call.
The problem is that your mrtg script that gets do()ed has been written
in such a way that it allocates memory, and never explicitly frees it.
When you run the script via system, the memory of course gets freed when
the process finishes.
But when you run the script via do(), there is nothing to cause the
memory to get freed... And since you run it again and again in the same
process, that process grows bigger and bigger.
There's two solutions that I can see:
1) Examine the mrtg script, find out how and where it's allocating
memory, and alter it so that finishing the script frees that memory.
This requires learning what exactly it's doing, and *understanding*
that... it requires thought. (Or it requires hiring a programmer to do
your thinking for you.)
2) Use fork() to cause the do() to occur in a separate process, as
follows:
defined(my $pid = fork) or die "fork failed: $!";
if( $pid == 0 ) {
    local $!;
    do "mrtg_thingy.pl";
    die $@ || $! if $@ || $!;
    exit(0);
}
# waitpid returns -1 on failure, which is a true value, so compare with $pid
waitpid($pid, 0) == $pid or die "waitpid failed: $!";
die "mrtg failed [\$? == $?]\n" if $?;
[untested]
This is similar to what system() does internally, except that where I
use do(), the system() function uses the much more expensive exec()
system call.
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: 19 Feb 2003 02:34:19 -0800
From: jsn@microlib.demon.co.uk (Jason Singleton)
Subject: Clearing an array
Message-Id: <5b7e0283.0302190234.3a075b23@posting.google.com>
According to the perl documentation you clear an array by using one of
the following
@whatever = ();
$#whatever = -1;
But this doesn't work, it doesn't do anything. I have got it to work with
%whatever=();
I store entries in the array like $whatever{'STUFF'}="data"; am I
misunderstanding something here?
btw I'm using ActivePerl 5.8 on WinXP
------------------------------
Date: Wed, 19 Feb 2003 10:40:57 +0000 (UTC)
From: "Bernard El-Hagin" <bernard.el-hagin@DODGE_THISlido-tech.net>
Subject: Re: Clearing an array
Message-Id: <Xns932776283C653elhber1lidotechnet@62.89.127.66>
Jason Singleton wrote:
> According to the perl documentation you clear an array by using one of
> the following
>
> @whatever = ();
> $#whatever = -1;
>
> But this doesn't work, it doesn't do anything. I have got it to work with
>
> %whatever=();
@whatever is an array, %whatever is a hash. You're filling a hash and
clearing an array. For more info:
perldoc perldata
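To make the distinction concrete, here is a small sketch (the variable contents are made up for illustration) showing that @whatever and %whatever are two unrelated variables, so clearing one leaves the other alone:

```perl
use strict;
use warnings;

# @whatever (array) and %whatever (hash) are separate variables
# that merely share a name.
my @whatever = (1, 2, 3);
my %whatever = (STUFF => "data");

# $whatever{...} with curly braces addresses the hash,
# $whatever[...] with square brackets addresses the array.
@whatever = ();                 # empties the array only

printf "array elements: %d\n", scalar @whatever;
printf "hash keys: %d\n",      scalar keys %whatever;

%whatever = ();                 # this is what empties the hash
printf "hash keys after clearing: %d\n", scalar keys %whatever;
```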
--
Cheers,
Bernard
--
echo 42|perl -pe '$#="Just another Perl hacker,"'
------------------------------
Date: 18 Feb 2003 20:39:24 -0800
From: brundlefly76@hotmail.com (Seth Brundle)
Subject: Compiling Perl with Intel Linux C++ Compiler
Message-Id: <53e2ec95.0302182039.54ead4f4@posting.google.com>
I was wondering if anyone had successfully compiled Perl with this
compiler.
A CERN scientist reports 30% faster executables with it (on Intel
machines I would imagine), and good GNU compatibility.
------------------------------
Date: Tue, 18 Feb 2003 21:31:55 -0800
From: swen <swen@news.com>
Subject: dbd-pg (postgres) on windows
Message-Id: <3E5316CB.592734A6@news.com>
Hi,
I'm guessing someone here has probably used postgres with perl on
windows. I'm having problems at DBI->connect. for example,
$dbh = DBI->connect('dbi:Pg:dbname=test','','');
dies with error:
"DBI->connect(dbname=test) failed: could not create socket: An address
incompatible with the requested protocol was used"
I am running postmaster from cygwin and can connect to the server from
psql just fine. Postmaster doesn't output any messages when I try to
connect from perl. I read the perldocs for dbd::pg and dbi but I don't
see or I missed any answers there.
Thanks for any help.
dbd::pg is 1.3
postgres is 7.2
------------------------------
Date: Wed, 19 Feb 2003 03:21:34 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: dbd-pg (postgres) on windows
Message-Id: <3E533E8E.805FC425@earthlink.net>
swen wrote:
>
> Hi,
>
> I'm guessing someone here has probably used postgres with perl on
> windows. I'm having problems at DBI->connect. for example,
>
> $dbh = DBI->connect('dbi:Pg:dbname=test','','');
>
> dies
How could it possibly die? I see no call to perl's die() operator
there.
> with error:
>
> "DBI->connect(dbname=test) failed: could not create socket: An address
> incompatible with the requested protocol was used"
This looks like the contents of $! ... you should be printing the
contents of $DBI::errstr.
Remember, the $! variable only contains valid information when a system
call has failed -- if a higher level function fails (eg, a module
function, such as DBI->connect), there are no guarantees that it will
contain anything in particular, *unless* that function explicitly is
documented as doing so (and DBI->connect is *not* documented as setting
the $! variable on failure; thus, there's a fair chance of it containing
garbage...)
> I am running postmaster from cygwin and can connect to the server from
> psql just fine. Postmaster doesn't output any messages when I try to
> connect from perl. I read the perldocs for dbd::pg and dbi but I don't
> see or I missed any answers there.
>
> Thanks for any help.
>
> dbd::pg is 1.3
> postgres is 7.2
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: 18 Feb 2003 23:06:29 -0800
From: jeffmott@twcny.rr.com (Jeff Mott)
Subject: Error out routine
Message-Id: <f9c0ce19.0302182306.77598686@posting.google.com>
What is the good programming practice way of reporting an error in a
subroutine from malformed input or a mishap in the execution:
returning undef and forcing the user (programmer) to check the return
value for every call, be extremely strict and die (croak) out giving
the user little to no choice over the decision, or working around the
error and leaving the user to remain ignorant of an error?
------------------------------
Date: 19 Feb 2003 08:23:25 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@post.rwth-aachen.de>
Subject: Re: Error out routine
Message-Id: <b2vett$bc0$1@nets3.rz.RWTH-Aachen.DE>
Also sprach Jeff Mott:
> What is the good programming practice way of reporting an error in a
> subroutine from malformed input or a mishap in the execution:
> returning undef and forcing the user (programmer) to check the return
> value for every call, be extremely strict and die (croak) out giving
> the user little to no choice over the decision, or working around the
> error and leaving the user to remain ignorant of an error?
A very common thing to do is:
sub func {
    ...
    return $value if $all_went_fine; # $value should be true
    return; # else
}
The empty 'return' does the right thing, ie. it will return the
empty list in list context and undef in scalar context.
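A short sketch of that difference, using two made-up subs (bare_return and undef_return are illustrative names, not from the original post):

```perl
use strict;
use warnings;

sub bare_return  { return }        # empty list in list context, undef in scalar
sub undef_return { return undef }  # a one-element list (undef) in list context

my @bare   = bare_return();
my @undef  = undef_return();
my $scalar = bare_return();

print "bare return, list context: ",  scalar(@bare),  " element(s)\n";
print "return undef, list context: ", scalar(@undef), " element(s)\n";
print "bare return, scalar context is ",
    defined $scalar ? "defined" : "undef", "\n";

# Only the bare return is false when the result is tested as a list.
print "bare return is false as a list\n"    unless @bare;
print "'return undef' is true as a list\n"  if     @undef;
```

This is why an explicit 'return undef' can surprise a caller who writes something like "if (my @result = func())".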
A croak() could also be an option, but depending on the circumstances it
can be annoying. If you want to proceed even if the function detected a
case where it would croak(), you'd have to wrap that into an eval{}.
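As a rough illustration of the croak()/eval{} combination, assuming a made-up function parse_positive (the name and its validation rule are hypothetical):

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical function that croaks on malformed input.
sub parse_positive {
    my ($input) = @_;
    croak "not a positive integer: '$input'"
        unless defined $input && $input =~ /^[1-9][0-9]*$/;
    return $input + 0;
}

# The caller decides whether the error is fatal by trapping it with eval{}.
for my $candidate ("42", "forty-two") {
    my $value = eval { parse_positive($candidate) };
    if (defined $value) {
        print "parsed $value\n";
    } else {
        print "recovered from error: $@";
    }
}
```

The eval{} block returns undef when the function croaks, and the message is left in $@.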
You could also (and additionally to the above) keep a variable
containing an error-string that is set when something goes wrong:
my $errstr;
sub func {
    ...
    return $value if $all_went_fine; # $value should be true
    # else
    $errstr = "Some error description";
    return;
}
This allows a rather natural Perl idiom:
func(@args) or print $errstr;
There is no definite answer to your question. What to do could also
depend on whether you are actually writing a module where error-handling
is an issue of its own and might need to be more sophisticated than in
an ordinary program with merely a few subroutines.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Tue, 18 Feb 2003 21:59:05 -0600
From: Ryan McCarthy <remccart@uiuc.edu.spam>
Subject: Re: Extracting patterns
Message-Id: <HdD4a.101$o7.2813@vixen.cso.uiuc.edu>
That is exactly what I mean.
Gunnar Hjalmarsson wrote:
> Ryan McCarthy wrote:
>
>> I am trying to extract to an array all the different permutations of a
>> string formatted like this:
>>
>> ABCD(E|F)GHI...
>>
>> Where this would produce an array with two entries
>>
>> 1. ABCDEGHI
>> 2. ABCDFGHI
>>
>> The trick is that there can be 0 or more of these (X|Y) notations.
>> Anybody have any suggestions?
>
>
> Do you mean that two (X|Y) notations would result in 2 x 2 = 4 entries
> in the array, and that three of them would result in 2 x 2 x 2 = 8
> entries, etc.?
>
> / Gunnar
>
------------------------------
Date: Wed, 19 Feb 2003 00:07:06 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Extracting patterns
Message-Id: <3E5310FA.81B73F76@earthlink.net>
Ryan McCarthy wrote:
>
> I am trying to extract to an array all the different permutations of a
> string formatted like this:
>
> ABCD(E|F)GHI...
>
> Where this would produce an array with two entries
>
> 1. ABCDEGHI
> 2. ABCDFGHI
>
> The trick is that there can be 0 or more of these (X|Y) notations.
> Anybody have any suggestions?
sub expand_pattern {
    my $pattern = shift;
    if( $pattern =~ /\( ([^()]*) \)/xs ) {
        my $left = substr( $pattern, 0, $-[0] );
        my $right = substr( $pattern, $+[0] );
        return map expand_pattern($_),
            map "$left$_$right", split /\|/, $1, -1;
    } elsif( $pattern =~ tr/()// ) {
        die "Pattern '$pattern' contains unbalanced parens\n";
    } else {
        return split /\|/, $pattern, -1;
    }
}
print $_, "\n" for expand_pattern("ABCD(E|F)GHI...");
__END__
[tested]
This is designed to work properly even with nested parenthesized
thingies. For example, if you have ".(1|2).(3|<(4|5)>)." as your
pattern, it will print out:
.1.3.
.1.<4>.
.1.3.
.1.<5>.
.2.3.
.2.<4>.
.2.3.
.2.<5>.
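The nested example lists ".1.3." and ".2.3." twice, because the alternative "3" survives both expansions of the inner "(4|5)". If duplicates are unwanted, one possible sketch is to filter the result through a %seen hash. The wrapper name expand_unique is an assumption for illustration; expand_pattern is repeated from above so the example runs on its own:

```perl
use strict;
use warnings;

# expand_pattern as posted above.
sub expand_pattern {
    my $pattern = shift;
    if ( $pattern =~ /\( ([^()]*) \)/xs ) {
        my $left  = substr( $pattern, 0, $-[0] );
        my $right = substr( $pattern, $+[0] );
        return map expand_pattern($_),
            map "$left$_$right", split /\|/, $1, -1;
    } elsif ( $pattern =~ tr/()// ) {
        die "Pattern '$pattern' contains unbalanced parens\n";
    } else {
        return split /\|/, $pattern, -1;
    }
}

# Hypothetical wrapper: keep only the first occurrence of each expansion.
sub expand_unique {
    my %seen;
    return grep { !$seen{$_}++ } expand_pattern(@_);
}

print "$_\n" for expand_unique(".(1|2).(3|<(4|5)>).");
```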
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: Wed, 19 Feb 2003 03:08:00 +0000
From: Marc <marc.beyer@gmx.li>
Subject: Re: Need Help with Pattern Matching
Message-Id: <b2uscm$1gbpk7$1@ID-123680.news.dfncis.de>
Hi,
regular expressions are your friend :-). Use this as your sort subroutine:
sub make_sort {
    ($a =~ m/[0-9\.]*\s\d*\s([a-zA-Z ]*)\s\d\.\d+/)[0]
        cmp
    ($b =~ m/[0-9\.]*\s\d*\s([a-zA-Z ]*)\s\d\.\d+/)[0];
}
Note that this currently sorts on the first name (same as your "split" would
have done), if you want a sort on the last name you'd have to use this regex:
m/[0-9\.]*\s\d*\s[a-zA-Z ]*\s(\w*)\s\d\.\d+/
Have fun,
Marc
P.S. I have no idea how many records you need to sort and in what timeframe,
this solution might be too slow for you.
------------------------------
Date: Tue, 18 Feb 2003 22:37:21 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Need Help with Pattern Matching
Message-Id: <3E52FBF1.D875220@earthlink.net>
Mikey wrote:
> Benjamin Goldberg wrote:
> > Mikey wrote:
> > [snip]
> > > I'm trying to sort the following data by Height:
> > > (this data resides in store.dat, each field separated by a space)
> > >
> > > DOB Cash Name Height
> > >
> > > 12.01.1963 100 John Smith 5.11
> > > 27.02.1955 200 Jimmy Dolan Andrews 5.8
> > > 16.03.1977 400 Johnny Barrosh 5.4
> > > 09.01.1987 200 Steve McHooter 5.5
> > > 01.02.1983 100 Blaro Van Daronimo 4.4
> > >
> > > I can't work out a correct pattern matching line which will
> > > successfully separate these fields.
> >
> > The problem is that the fields are separated by spaces, and the Name
> > field has a variable number of embedded spaces.
> >
> > Thus, you can't just split on spaces and get the 4th element after
> > splitting.
> >
> > However... the height field is always the *last* field when
> > splitting on whitespace, even if the number of fields before it
> > vary.
>
> This is true but I also may want to access the other fields later.
What exactly are you asking for? Are you trying to figure out how to
parse the fields into variables?
while( <IN> ) {
    my ($dob, $cash, $name_height) = split ' ', $_, 3;
    my ($name, $height) = $name_height =~ /(.*) (.*)/;
    # do stuff with $dob, $cash, $name, $height.
}
> Would this type of pattern matching be possible despite the fact that
> the separators are spaces and a field can have a random number of spaces
> in it?
As long as there's only *one* field which can have a random number of
spaces in it, yes.
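As a self-contained sketch of that parsing step, with two sample records copied from the original post standing in for the file (the sub name parse_record and the hash layout are assumptions for illustration):

```perl
use strict;
use warnings;

# Sample records from the post; in the real script they come from store.dat.
my @lines = (
    "12.01.1963 100 John Smith 5.11\n",
    "27.02.1955 200 Jimmy Dolan Andrews 5.8\n",
);

sub parse_record {
    my ($line) = @_;
    chomp $line;
    # The first two fields never contain spaces, so split them off first.
    my ($dob, $cash, $name_height) = split ' ', $line, 3;
    # The greedy (.*) takes everything up to the last space as the name.
    my ($name, $height) = $name_height =~ /^(.*) (\S+)$/;
    return { dob => $dob, cash => $cash, name => $name, height => $height };
}

for my $line (@lines) {
    my $rec = parse_record($line);
    print join(" | ", @{$rec}{qw(dob cash name height)}), "\n";
}
```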
> >So, you can do the following:
> >
> > open( FH, "<", "store.dat" ) or die $!;
> > print sort {
> > (split(' ', $a))[-1] <=> (split(' ', $b))[-1]
> > } <FH>;
> > __END__
> >
>
> This compiles and runs but how do I print the newly sorted data?
There's a print statement in there... Does nothing get output to your
screen when you run it?
Or are you asking something else (eg, how do I store the newly sorted
data, or maybe how do I extract the Cash field from each row of the
newly sorted data, etc.)?
> > Note that for large files, this repeated splitting is
> > inefficient... One may get faster results with:
> >
> > open( FH, "<", "store.dat" ) or die $!;
> > print $_->[0] for
> > sort { $a->[1] <=> $b->[1] }
> > map [ $_, /(\S+)$/ ],
> > <FH>; # read the data in.
> > __END__
> >
> > This is the 'Schwartzian Transform', a commonly used optimization when
> > sorting data with perl.
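For readers who want to try the transform without a data file, here is a sketch that runs on the sample records from the post held in an array (the sub name sort_by_last_field is made up). The last field is compared as a plain decimal number, exactly as in the quoted code, so 5.11 sorts before 5.4; the thread does not say whether that is the intended reading of the Height column.

```perl
use strict;
use warnings;

my @lines = (
    "12.01.1963 100 John Smith 5.11\n",
    "27.02.1955 200 Jimmy Dolan Andrews 5.8\n",
    "16.03.1977 400 Johnny Barrosh 5.4\n",
    "09.01.1987 200 Steve McHooter 5.5\n",
    "01.02.1983 100 Blaro Van Daronimo 4.4\n",
);

sub sort_by_last_field {
    return
        map  { $_->[0] }                # 3. unwrap the original line
        sort { $a->[1] <=> $b->[1] }    # 2. compare the precomputed keys
        map  { [ $_, /(\S+)$/ ] }       # 1. pair each line with its last field
        @_;
}

print sort_by_last_field(@lines);
```

Each line's key is extracted once, instead of once per comparison as in the version that calls split inside the sort block.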
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: Tue, 18 Feb 2003 23:33:45 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Problem with buffering on non-blocking socket under win32
Message-Id: <3E530929.3118F102@earthlink.net>
Ed W wrote:
>
> In fact I have just noticed that if I comment out the
> $remote->blocking(0) then syswrite still returns immediately and I can
> see from the performance monitor that the OS/Perl carries on in the
> background actually writing the stuff over the wire...
>
> ...is this normal?
Yes. This is not affected by turning $handle->blocking on or off.
If you want to block until the data is actually sent over the wire, use
the $handle->sync method.
> I am writing over a very slow satellite modem and reading from an
> email client on the local machine so I really need to be able to see
> how much data is outstanding to be written to the modem (it could take
> a very long time to clear down!)
Although you can use ->sync to wait until all the data that's been
written to a socket gets sent, there's no way of finding out *how much*
of the data has been sent ... your only possible states of knowledge
are:
1/ You've called ->sync, and not written any data since then. Thus,
all the data you've written has been sent.
2/ select() has indicated that the handle is not writable. Therefore,
the OS level outgoing buffer for that handle is completely full, and
syswrite() will block. Logically, the amount that's been sent is equal
to the amount that's been written since the last ->sync, minus the size
of the OS level buffer.
3/ You've written some data, haven't called ->sync, and select
indicates that the handle is writable. The amount actually sent is
greater than or equal to zero, and less than the amount that you'd know
was written if you were in state (2). If you've written more bytes than
the size of the OS level buffer, you know that the amount that's
been sent is *at least* the amount that you've written minus the size of
the OS buffer.
[Obviously, if you don't know how big the OS buffer is, your states are
"all of it has been sent" (after calling sync), "I've no idea how much
has been sent" (written some data, select says writable), and "At least
*some* has been sent, but I've no idea how much" (written some data,
select says not writable)]
If you want to be able to use select() to find out when all the data for
a socket has been sent, then you could do something like this:
open(my($fh), "-|") or $socket->sync, exit;
$sel->add($fh); # $sel is an IO::Select object.
The $fh filehandle will become readable (with an EOF condition) when the
child process exits, which in turn will happen when the ->sync method
finishes.
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: Wed, 19 Feb 2003 11:04:16 GMT
From: "Edward Wildgoose" <Ed+nospam@ewildgoose.demon.co.uk@>
Subject: Re: Problem with buffering on non-blocking socket under win32
Message-Id: <QwJ4a.391576$1y5.2803628@news.easynews.com>
Benjamin, thanks once again for what is an exceptional and very detailed
answer!
I think I can summarise what you said as: "can_write($socket)" tells me if
Perl is buffering extra data, however, after that can_write returns true
even though the OS is buffering some more data?
ie if I write 1Mb, then 32Kb, (say) is in the OS buffer, and once there is
only 32Kb left to write, then "can_write" starts to return true.
Two questions occur. I think I saw on here a mention of extra params when
creating the socket to control "window size", which I assume is what the OS
buffer is going to size itself on? (Perhaps...) Can someone please
remind me what these options would be (i.e. I would like to turn down the
size of my send buffer on Windows 2K/XP)?
Secondly, why does the write function send too much data, ie from the perl
cookbook we see:
my $n = $remote->send($buf, 0);
Where $n is supposed to be the number of chars written, BUT I am *always*
seeing $n == length($buf)
How can I just fill up just the OS buffers and then have it set
"$!{EWOULDBLOCK}" - my options appear to be: blocking writes, or
non-blocking writes which accept as much data as they see fit (I managed to
force 100MB "into" the socket without problems or notice of "EWOULDBLOCK")?
This seems a little odd.
Thanks all
Ed W
------------------------------
Date: 19 Feb 2003 08:09:47 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@post.rwth-aachen.de>
Subject: Re: Removing HTML tags.
Message-Id: <b2ve4b$am0$1@nets3.rz.RWTH-Aachen.DE>
Also sprach Tony McNulty:
> On Wed, 19 Feb 2003 02:17:38 +0100, Tore Aursand <tore@aursand.no> wrote:
>
>> On Tue, 18 Feb 2003 23:54:54 +0000, Tony McNulty wrote:
>>>> HTML::Parse perhaps?
>>
>>> Hasn't that been deprecated?
>>
>> Whoa! You know that the module is deprecated, but you don't know what
>> replacement modules to use? You should really start _reading_ the module
>> documentation from now on!
>
> Hmm, read through and played around with Parse, Parser and TreeBuilder.
> Decided none of them are what I want to use now. Figured it might just be
> simpler to design a regex, and turns out it is :)
You are wrong here.
> Hopefully the site aren't planning on changing their layout soon :)
The following snippet is a complete HTML-stripper. It won't get much
simpler, will it? It's also completely independent from the site's
layout:
#! /usr/bin/perl -w
use strict;
package Strip;
use base qw/HTML::Parser/;
{
    my $data;
    sub text { $data .= $_[1] }
    sub get { $data }
}
package main;
use LWP::Simple;
my $p = Strip->new;
$p->parse( get(shift) );
$p->eof; # flush any text the parser is still buffering
print $p->get;
It gets the URL to parse as first argument from the command-line.
> Apologies for sparking the reactions I did... I've never held an interest
> in html parsing before (generally just JAPH's and perl golf), and often
> skip over the posts on the matter. A quick search through groups.google.com
> proved how much I had actually overlooked :)
You are still overlooking a lot if you want to do it with regexes.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Wed, 19 Feb 2003 10:15:59 -0000
From: Tony McNulty <acm2@ukc.ac.uk>
Subject: Re: Removing HTML tags.
Message-Id: <oprkt6sxn62czp9w@news.ukc.ac.uk>
On 19 Feb 2003 08:09:47 GMT, Tassilo v. Parseval
<tassilo.parseval@post.rwth-aachen.de> wrote:
> Also sprach Tony McNulty:
>> On Wed, 19 Feb 2003 02:17:38 +0100, Tore Aursand <tore@aursand.no>
>> wrote:
>>
>>> On Tue, 18 Feb 2003 23:54:54 +0000, Tony McNulty wrote:
>>>>> HTML::Parse perhaps?
>>>
>>>> Hasn't that been deprecated?
>>>
>>> Whoa! You know that the module is deprecated, but you don't know what
>>> replacement modules to use? You should really start _reading_ the
>>> module
>>> documentation from now on!
>>
>> Hmm, read through and played around with Parse, Parser and TreeBuilder.
>> Decided none of them are what I want to use now. Figured it might just
>> be simpler to design a regex, and turns out it is :)
>
> You are wrong here.
Umm, for my purposes, it is simpler.
As I stated in my original post: "stripping the html tags from it to make
it easier to extract the text that I need".
The reason is that by leaving the tags in, I can use them as reference
points to get at the bits of the page that I need to. I managed to find a
couple reference points that let me extract the text I needed with only two
lines of regexes (that could probably be shrunk to one, but that's a task
for another night).
>> Hopefully the site aren't planning on changing their layout soon :)
>
> The following snippet is a complete HTML-stripper. It won't get much
> simpler, will it? It's also completely independent from the site's
> layout:
>
<snip the snippet>
> It gets the URL to parse as first argument from the command-line.
Thanks for that, it looks exactly like what I was asking for initially, and
I'm sorry that I've decided against that method, but actually stripping the
html away makes my overall task harder :D
Well, that was an interesting experience for my first posting here :D
Apologies for wasting people's time. Sometimes it takes a couple of silly
questions before actually understanding what I'm trying to do :D
Thanks,
Tony - crawling back into lurker mode for another year
------------------------------
Date: Wed, 19 Feb 2003 11:03:05 +0000
From: Sharon Grant <peakpeek@purethought.com>
Subject: Re: Removing HTML tags.
Message-Id: <2an65v4eksmbbos8c4qt7s8et0m565ej9j@4ax.com>
On Wed, 19 Feb 2003 10:15:59 -0000, in comp.lang.perl.misc, Tony McNulty <acm2@ukc.ac.uk> wrote:
>by leaving the tags in, I can use them as reference
>points to get at the bits of the page that I need to. I managed to find a
>couple reference points that let me extract the text I needed with only two
>lines of regexes (that could probably be shrunk to one, but that's a task
>for another night).
>
>>> Hopefully the site aren't planning on changing their layout soon
If you're not expecting the layout to change, you could document
the layout once-only (off-line) using the dump() method - examples
in perldoc HTML::Element
dump() will display the hierarchical addresses of all the elements
in the document
Then it might be simple (depending partly on the quality of the
HTML) to locate the text you need using the address() method and
hard-coding the specific address you discovered by examining the
dump() output by eye
I had this working once before, but have since lost the code
I'm still struggling with the sheer scope of function in
HTML::Parser and HTML::TreeBuilder and have not yet found any
simple examples showing how to traverse the tree in order to
find one or two particular elements
Most of the examples do things like extract a list of links,
strip the tags to convert the document to text - all of which
are useful in their context, but not enough to use the full
power of these tools
FWIW, the challenge I was working on is to extract the currency
conversion information from the HTML returned by Xenon's
currency converter:
http://www.xe.com/ucc/
The HTML is not pleasant to work with, but I do have a set of
regular expressions which works - in a sledgehammer sort of way
--
Sharon
------------------------------
Date: Tue, 18 Feb 2003 22:27:02 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Returning the field list from DBI
Message-Id: <3E52F986.5995034F@earthlink.net>
Sharon Grant wrote:
> Benjamin Goldberg wrote:
>>Jason Singleton wrote:
>>> Also how do I change the read position in the returned data? I
>>> don't want to return all records to the user at once I want it to
>>> display the returned data in pages like a search engine.
>
> SQL is a set-oriented data access language. The general advice
> has always been that there should not be a need to return part
> of a set. Instead the query should be refined to return a
> smaller set
>
>> There's two possible ways, both of which
>> require support from the SQL driver.
>>
>> One is to use a LIMIT clause; this is the preferred way.
>>
>> The other is to use the ->func method on the statement handle, to
>> tell the driver that you want to skip forward some number of records.
>> I'm not sure what databases, if any, have this kind of functionality.
>>
>> There's sort of a third way -- simply read and discard as many records
>> as you want to skip. This is inefficient, but portable
>
> All the above suggestions assume that the DBMS is going to
> return the data in the same order each time, which an SQL DBMS
> may not do unless:
> - the query has an ORDER BY clause, and
> - the column names in the ORDER BY clause specify a unique key
>
> This is explained more eloquently in the PostgreSQL
> documentation of the LIMIT clause:
> http://tinyurl.com/61r3
Hmm, it sounds like it's saying that if a LIMIT clause is present, but
you've left out ORDER BY, PostgreSQL will intentionally mangle the order
of the data in such a way that LIMIT and OFFSET are useless for
providing subsets of the data.
I seriously doubt that this is a common effect; I would expect that the
only *normal* reasons why the order would be inconsistent are if rows
are inserted/deleted/altered, or if an index is added/removed/remade for
a table, or something else is done which alters the tables in question,
or alters their metadata.
(Hmm... I suppose that if a table has a separate index on one or more of
the fields, and that index is done using a splay tree, instead of a
hash, then that too could conceivably result in an inconsistent order,
especially if someone does another query in between the two
select-with-LIMIT queries. But who uses splay trees?)
> Now, since a uniquely keyed ORDER BY clause is required in any
> case, paging can be implemented (portably) by saving the key
> of the last row of the current page, and including that key in
> the search conditions of the WHERE clause of the query which
> is to produce the next page
Are you suggesting something like:
my $sth = $dbh->prepare(q(
    SELECT * FROM table
    WHERE primarykey > ?
    ORDER BY primarykey
));
$sth->execute( $where_last_search_finished );
while( my $data = $sth->fetchrow_hashref ) {
    output($data);
    $key_of_last_row_output = $data->{primarykey};
    last if ++$rows >= $items_per_page;
}
$sth->finish;
# store $key_of_last_row_output somewhere.
? I suppose it would work, but if the database implementation does a
lot of prefetching (transfers large numbers from the database server
process to the client, even before they're fetched), there's still a bit
of inefficiency here (unless a LIMIT clause is used, but some databases
don't support that).
Plus of course there's your other note:
> The syntax of the search conditions gets a bit verbose for a
> multi-column key, but that's part of the cost of SQL
.....
IMHO, the *best* way of doing this is to have the query for the first
page of data to fetch *all* the rows, but write them to a cache file (or
rather, a bunch of cache files, one per page of data). Subsequent
requests would then read from those files. This avoids the need to do
unportable things like LIMIT, and the need to add potentially expensive
things, like ORDER BY.
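A minimal sketch of the cache-file idea, using only core modules. Everything specific here is an assumption for illustration: the sub names write_page_cache and read_page, the "pageN.sto" file naming, the use of Storable, and the in-memory @rows array that stands in for rows fetched once from the database.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Storable qw(nstore retrieve);

# Write the rows to one cache file per page; returns the number of pages.
sub write_page_cache {
    my ($dir, $per_page, @rows) = @_;
    my $page = 0;
    while (my @chunk = splice @rows, 0, $per_page) {
        nstore(\@chunk, File::Spec->catfile($dir, "page$page.sto"));
        $page++;
    }
    return $page;
}

# Read one page back; returns an empty list if the page does not exist.
sub read_page {
    my ($dir, $page) = @_;
    my $file = File::Spec->catfile($dir, "page$page.sto");
    return unless -e $file;
    return @{ retrieve($file) };
}

# Stand-in for rows fetched once from the database (hypothetical data).
my @rows  = map { +{ id => $_, name => "row $_" } } 1 .. 7;
my $dir   = tempdir(CLEANUP => 1);
my $pages = write_page_cache($dir, 3, @rows);

print "pages written: $pages\n";
print "page 2 ids: ", join(",", map { $_->{id} } read_page($dir, 2)), "\n";
```

A real application would also need a way to tie the cache directory to a particular search and to expire it.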
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: Wed, 19 Feb 2003 11:32:47 +0100
From: Kåre Olai Lindbach <barbr-en@online.no>
Subject: Re: Returning the field list from DBI
Message-Id: <g9m65voldrk8tl7r8pp4as57ifd3mm63e9@4ax.com>
On Tue, 18 Feb 2003 22:27:02 -0500, Benjamin Goldberg
<goldbb2@earthlink.net> wrote:
>Sharon Grant wrote:
>> Benjamin Goldberg wrote:
>>>Jason Singleton wrote:
>>>> Also how do I change the read position in the returned data? I
>>>> don't want to return all records to the user at once I want it to
>>>> display the returned data in pages like a search engine.
[snip a lot of useful material]
>IMHO, the *best* way of doing this is to have the query for the first
>page of data to fetch *all* the rows, but write them to a cache file (or
>rather, a bunch of cache files, one per page of data). Subsequent
>requests would then read from those files. This avoids the need to do
>unportable things like LIMIT, and the need to add potentially expensive
>things, like ORDER BY.
There is a caveat, though:
The longer you keep this cache and read only from it, the more out of
date your data will be, if the underlying table is updated a lot.
--------------------------------------
I myself use some sort of "limit/top", "order by xxx asc/desc" and
keep track of where to start next/previous page.
If I want to make it somewhat portable, I set up my SQL like this:
   "SELECT $top x,y,z from table WHERE x $next_prev ORDER BY x $direction
   $limit"
where I usually use:
   $top = $dbtype eq "MSSQL" ? "TOP $n" : "";
   $limit = $dbtype eq "PGSQL" ? "LIMIT $n" : "";
$next_prev is either ">= $high_key_value" or "<= $low_key_value", and
$direction is either "ASC" or "DESC", according to the direction in
which the data should be shown.
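Put together as a function, the scheme above might look like the
sketch below. The sub name page_query and the table name some_table
are hypothetical, and one thing is deliberately changed from the
description: the key value is left as a "?" placeholder to be passed
to execute(), rather than interpolated into the SQL string.

```perl
use strict;
use warnings;

# Build a one-page query for the given database type.
# $dbtype is "MSSQL" or "PGSQL" as in the text above; $n is the page
# size; $forward is true for the next page, false for the previous one.
sub page_query {
    my ($dbtype, $n, $forward) = @_;
    # $n is interpolated into the SQL, so insist on a plain integer.
    die "page size must be a positive integer\n"
        unless $n =~ /\A[1-9][0-9]*\z/;
    my $top       = $dbtype eq 'MSSQL' ? "TOP $n "   : '';
    my $limit     = $dbtype eq 'PGSQL' ? " LIMIT $n" : '';
    my $next_prev = $forward ? '>= ?' : '<= ?';
    my $direction = $forward ? 'ASC'  : 'DESC';
    return "SELECT ${top}x, y, z FROM some_table WHERE x $next_prev"
         . " ORDER BY x $direction$limit";
}

print page_query('MSSQL', 10, 1), "\n";
print page_query('PGSQL', 10, 0), "\n";
```

Other database types fall through with neither TOP nor LIMIT, so the
caller would have to stop fetching after $n rows by hand.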
--
mvh/Regards
Kåre Olai Lindbach
------------------------------
Date: Wed, 19 Feb 2003 03:10:47 GMT
From: tiltonj@erols.com (Jay Tilton)
Subject: Re: Unsuccesful "Print"
Message-Id: <3e52f4e4.32989123@news.erols.com>
"Mikey" <PleaseDontThrowSpam@Me.com> wrote:
: This is a cut down version that has the exact same problems - it wont output
: anything!
:
: (perl perlfile.pl -h yields nothing, as does the -h anything option)
: (perl perlfile.pl -x will give Unknown option: x)
:
: use strict;
: use Getopt::Std;
: my($opt_h, $opt_a);
: getopts('ha:');
: if ( $opt_h ) {
: print "\nUsage: perlfile [ options ] arg\n";
: }
:
: if ( $opt_a ) {
: print "\nYou entered $opt_a\n";
: }
getopts() will assign values to package variables, not lexicals.
Change
   my($opt_h, $opt_a);
to
   our($opt_h, $opt_a);
------------------------------
Date: Wed, 19 Feb 2003 03:32:31 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re: Unsuccesful "Print"
Message-Id: <3E52FAC5.2030502@rochester.rr.com>
Mikey wrote:
> "Tony Curtis" <tony_curtis32@yahoo.com> wrote in message
> news:87isvhuk41.fsf@limey.hpcc.uh.edu...
>
...
>
> This is a cut down version that has the exact same problems - it wont output
> anything!
>
> (perl perlfile.pl -h yields nothing, as does the -h anything option)
> (perl perlfile.pl -x will give Unknown option: x)
>
> use strict;
>
> use Getopt::Std;
>
> my($opt_h, $opt_a);
>
>
> getopts('ha:');
>
>
> if ( $opt_h ) {
>
> print "\nUsage: perlfile [ options ] arg\n";
>
> }
>
> if ( $opt_a ) {
>
> print "\nYou entered $opt_a\n";
> }
>
If you follow what perldoc Getopt::Std says to do and declare $opt_h and
$opt_a using "our" rather than "my", then it works as the docs say it will:
perl junk.pl -h
gives the usage message,
perl junk.pl -axxx
prints xxx, and
perl junk.pl -b
prints the canned bad option message.
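For reference, a minimal sketch of the corrected script with "our".
The assignment to @ARGV is only a stand-in for a real command line, so
that the example can be run without arguments; the die on a failed
getopts() is an addition, not part of the original script:

```perl
use strict;
use warnings;
use Getopt::Std;

# Package variables, so that getopts() can set them.
our ($opt_h, $opt_a);

# Stand-in for a command line such as: perl junk.pl -h -a xxx
@ARGV = ('-h', '-a', 'xxx');

# getopts() returns false if it met an unknown option.
getopts('ha:') or die "Usage: perlfile [ options ] arg\n";

print "\nUsage: perlfile [ options ] arg\n" if $opt_h;
print "\nYou entered $opt_a\n" if defined $opt_a;
```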
--
Bob Walton
------------------------------
Date: Tue, 18 Feb 2003 21:39:01 -0600
From: Tony Curtis <tony_curtis32@yahoo.com>
Subject: Re: Unsuccesful "Print"
Message-Id: <87fzqluega.fsf@limey.hpcc.uh.edu>
>> On Wed, 19 Feb 2003 03:32:31 GMT,
>> Bob Walton <bwalton@rochester.rr.com> said:
> Mikey wrote:
>> "Tony Curtis" <tony_curtis32@yahoo.com> wrote in
>> message news:87isvhuk41.fsf@limey.hpcc.uh.edu...
>>
> ...
No I didn't. Please get the attributions right.
------------------------------
Date: Tue, 18 Feb 2003 22:57:49 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Unsuccesful "Print"
Message-Id: <3E5300BD.3274C7CE@earthlink.net>
Mikey wrote:
[snip]
> my($opt_h, $opt_a);
>
> getopts('ha:');
>
> if ( $opt_h ) {
The getopts() function sets the variable $main::opt_h. Your if()
statement is examining a lexical variable named $opt_h, which is an
entirely different variable.
The reason your program doesn't print anything is because the lexical
variable $opt_h is never set to anything.
Instead of declaring those two $opt_ variables with my(), declare them
either with our(), or with use vars qw().
Or, better yet, pass a hash reference to getopts() as a second argument,
and examine that hash, instead of dealing with multiple separate
variables. E.g.:
   use Getopt::Std qw(getopts);
   use strict;
   use warnings;
   my %opts;
   getopts("ha:", \%opts);
   if( $opts{h} ) {
      # stuff.
   }
   if( $opts{a} ) {
      # stuff.
   }
[untested]
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: Wed, 19 Feb 2003 00:26:03 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Unsuccesful "Print"
Message-Id: <3E53156B.DFEA66ED@earthlink.net>
Tony Curtis wrote:
>
> >> On Wed, 19 Feb 2003 03:32:31 GMT,
> >> Bob Walton <bwalton@rochester.rr.com> said:
>
> > Mikey wrote:
> >> "Tony Curtis" <tony_curtis32@yahoo.com> wrote in
> >> message news:87isvhuk41.fsf@limey.hpcc.uh.edu...
> >>
> > ...
>
> No I didn't. Please get the attributions right.
He's not saying that you wrote that
code, he's saying that you wrote '...'.
Now, if he'd snipped 'Mikey wrote:', and removed the '>' in front of
'"Tony Curtis" ... wrote ...', *then* he would be (incorrectly) saying
that you wrote the code.
--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print
------------------------------
Date: 19 Feb 2003 02:57:15 -0800
From: the_game_is_never_over@yahoo.co.uk (g)
Subject: Re: Uploading a file
Message-Id: <2059e247.0302190257.69aecec2@posting.google.com>
I put the if/else block in my code, but I get an error saying "Use of
uninitialized value in -T". Why does this happen?
#!\usr\bin\perl\bin\perl.exe -w
use CGI qw/:standard/;
use CGI::Carp 'fatalsToBrowser';
$in = new CGI();
if (-T $in) {
    $INPUT = $in->param('original');
    $outtie = $in->param('newname');
    $OUTPUT = ">./$outtie";
    open (OUTPUT) || die "Unable to open file for writing";
    print $in->header();
    print $in->start_html("Thank You");
    print "<h2>Thank You For Sending a review $INPUT</h2>";
    while($line = <$INPUT>) {
        print $line; # display contents back in browser
        print OUTPUT $line; # save in a file
    }
    close OUTPUT;
    print $in->end_html();
} else {
    print "not a text file sorry";
}
exit(0);
------------------------------
Date: Wed, 19 Feb 2003 10:36:02 GMT
From: helgi@decode.is (Helgi Briem)
Subject: Re: use DBI;
Message-Id: <3e535c5c.1031961211@news.cis.dfn.de>
On 18 Feb 2003 18:13:58 GMT, Abigail <abigail@abigail.nl>
wrote:
>Helgi Briem (helgi@decode.is) wrote on MMMCDLVIII September MCMXCIII in
><URL:news:3e523a9b.957778142@news.cis.dfn.de>:
>))
>)) That last exit is completely unnecessary. Where does
>)) it come from? I keep seeing this sort of thing in
>)) cargo cult code.
>))
>)) >exit(0);
>
>That has nothing to do with 'cargo cult'. Cargo cult programming is
>about using constructs you don't understand, but which seem vital
>to your program.
I always see this in old Perl4 CGI programs that people
have downloaded from the internet somewhere and are trying
to fix. I doubt that many of them understand
the code at all. Most haven't even looked at it.
When I first encountered Perl, I was shown it by a C/C++
programmer. He put exit(0) all over the place. For a
while, I did too. I have always assumed that it was a C-ism
but I don't speak C so I can't be sure.
>I doubt the 'exit(0)' was put there without knowing what
>the exit() does.
I am pretty sure most of the people who use it do it
just because it was in the script they downloaded or
copied. I am almost certain most of them haven't got a
clue what it's there for.
>Nor does it mysteriously change the behaviour of the program.
No, it just does absolutely nothing that wouldn't have
happened anyway.
>Just because it isn't "necessary" doesn't make an exit() at the end
>of the program cargo cult, just like using parens where not required,
>or an $_ where not required make "cargo cult".
I thought the definition of "cargo cult" was using something
in a program because you saw it somewhere else without
knowing why you are using it.
Do you use exit(0) a lot? And if so, what for?
--
Regards, Helgi Briem
helgi AT decode DOT is
------------------------------
Date: 19 Feb 2003 02:39:34 -0800
From: the_game_is_never_over@yahoo.co.uk (g)
Subject: Re: use DBI;
Message-Id: <2059e247.0302190239.71b4e7fe@posting.google.com>
Hi, thanks for your help so far. It looks like I can now connect to the
database, but I get an error message saying "malformed header from
script. Bad header=". It then displays the first 2 fields from my
database (in my error log). What does that mean?
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 4583
***************************************