[29399] in Perl-Users-Digest
Perl-Users Digest, Issue: 643 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jul 11 09:10:10 2007
Date: Wed, 11 Jul 2007 06:09:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 11 Jul 2007 Volume: 11 Number: 643
Today's topics:
Re: croak/confess from within File::Find anno4000@radom.zrz.tu-berlin.de
Re: croak/confess from within File::Find <mritty@gmail.com>
Re: Hash ref, but what is the key? with Chemistry::File <hjp-usenet2@hjp.at>
Re: Hash ref, but what is the key? with Chemistry::File <hjp-usenet2@hjp.at>
Re: Hash ref, but what is the key? with Chemistry::File <mritty@gmail.com>
Re: How to turn off taint checking in cgi <savagebeaste@yahoo.com>
Re: How to turn off taint checking in cgi <hjp-usenet2@hjp.at>
Re: How to turn off taint checking in cgi <usenet@larseighner.com>
new CPAN modules on Wed Jul 11 2007 (Randal Schwartz)
Re: Perl 5.6 vs 5.8 <clarke.n.o.s.p.a.m@hyperformix.com>
PERL/HTML: extract repetitive information seminex@gmail.com
Re: PERL/HTML: extract repetitive information <tadmc@seesig.invalid>
Re: PERL/HTML: extract repetitive information <thepoet_nospam@arcor.de>
Re: PERL/HTML: extract repetitive information <seminex@gmail.com>
Re: Portable general timestamp format, not 2038-limited <martin@see.sig.for.address>
Re: Remove a specific element from an Array <tadmc@seesig.invalid>
Re: TXL-like capability? <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 11 Jul 2007 09:22:21 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: croak/confess from within File::Find
Message-Id: <5fjlqdF3dda1cU1@mid.dfncis.de>
Paul Lalli <mritty@gmail.com> wrote in comp.lang.perl.misc:
> I've been staring at this and playing with variations for over an hour
> now. Can someone help me out? My goal is to be able to use croak()
> from within a subroutine that is called by the &wanted subroutine
> which is passed to File::Find::find(). I want the error message
> printed as a result of this croak() to list the line number of the
> call to the final subroutine. Here is a short-but-complete script to
> demonstrate the problem I'm having:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Carp;
> use File::Find;
> die "Usage: $0 [croak|confess] [level]\n" unless @ARGV == 2;
> my ($which, $carplevel) = @ARGV;
>
> sub err {
> $Carp::CarpLevel = $carplevel;
> if ($which eq 'croak') {
> croak "You did something bad!"; #line 12
> } else {
> confess "You did something bad!"; #line 14
> }
> }
>
> sub wanted {
> err(); #line 19
> }
Add this:
push our @CARP_NOT, 'File::Find';
>
> find(\&wanted, '.'); #line 22
> __END__
[snip]
Carp has a hard time assigning an error location in callback situations.
It climbs the stack, essentially watching for a change in the calling
package. When called from a callback it meets that change earlier than
intended (from your main to File::Find). The variable @CARP_NOT is
checked and a package change is ignored if one of the packages is on
the other's @CARP_NOT.
Anno
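A minimal sketch of the mechanism described above. The packages Runner (standing in for File::Find) and Lib are hypothetical, invented only for this illustration; Carp and @CARP_NOT are the real parts. With Runner listed in @Lib::CARP_NOT, croak should keep climbing past the frames inside Runner and blame the line in main that started the call:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for File::Find: it only invokes a callback.
package Runner;
sub run { my ($cb) = @_; $cb->() }

# Hypothetical library whose callback ends up croaking.
package Lib;
use Carp;
# Calls that merely pass through Runner are not the culprit, so tell
# Carp to ignore the package change between Lib and Runner.
our @CARP_NOT = ('Runner');
sub check    { croak "You did something bad!" }
sub callback { check() }

package main;

# Remember the line of the call so it can be compared with the report.
my $call_line = __LINE__ + 1;
eval { Runner::run(\&Lib::callback) };
my $error = $@;

print "call made on line $call_line\n";
print "croak reported: $error";
```

If every package on the stack ends up trusted, Carp has no caller left to blame, which is the situation the follow-up in this thread runs into.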
------------------------------
Date: Wed, 11 Jul 2007 05:53:44 -0700
From: Paul Lalli <mritty@gmail.com>
Subject: Re: croak/confess from within File::Find
Message-Id: <1184158424.034385.323730@d55g2000hsg.googlegroups.com>
On Jul 11, 5:22 am, anno4...@radom.zrz.tu-berlin.de wrote:
> Paul Lalli <mri...@gmail.com> wrote in comp.lang.perl.misc:
>
> > My goal is to be able to use croak()
> > from within a subroutine that is called by the &wanted subroutine
> > which is passed to File::Find::find(). I want the error message
> > printed as a result of this croak() to list the line number of the
> > call to the final subroutine.
>
> Add this:
>
> push our @CARP_NOT, 'File::Find';
>
> Carp has a hard time assigning an error location in callback situations.
> It climbs the stack, essentially watching for a change in the calling
> package. When called from a callback it meets that change earlier than
> intended (from your main to File::Find). The variable @CARP_NOT is
> checked and a package change is ignored if one of the packages is on
> the other's @CARP_NOT.
Anno,
Thank you for the information. Annoyingly, it seems that if all the
packages are "trusted" (as per Carp's docs), then croak behaves
exactly like confess:
#!/opt2/perl/bin/perl
use strict;
use warnings;
use Carp;
use File::Find;

sub err {
    croak ("You did something bad!"); #line 8
}

sub wanted {
    err(); #line 12
}
push our @CARP_NOT, 'File::Find';
find(\&wanted, '.'); #line 15
__END__
$ ./ff_carp.pl
You did something bad! at ./ff_carp.pl line 8
        main::err() called at ./ff_carp.pl line 12
        main::wanted() called at /opt2/Perl5_8_4/lib/perl5/5.8.4/File/Find.pm line 810
        File::Find::_find_dir('HASH(0x13e8d4)', ., 2) called at /opt2/Perl5_8_4/lib/perl5/5.8.4/File/Find.pm line 690
        File::Find::_find_opt('HASH(0x13e8d4)', .) called at /opt2/Perl5_8_4/lib/perl5/5.8.4/File/Find.pm line 1193
        File::Find::find('CODE(0x15e498)', .) called at ./ff_carp.pl line 15
I suppose that's better than nothing. It just bugs me that there
doesn't seem to be a way to just have it print where the subroutine
was called from. I suppose I could fiddle with caller() to get
exactly what I want, but doesn't it seem like carp/croak should be
able to do this on its own?
Paul Lalli
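The "fiddle with caller()" idea can be sketched as follows. This is only an illustration, not something Carp provides: err() asks caller() where it was called from and builds the message by hand. The variable $err_call_line exists only so the sketch can check itself; wanted() is called directly here, but the same err() would behave the same way when wanted() is invoked by File::Find.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub err {
    # caller() with no argument describes the frame that called err():
    # package name, file name and line number.
    my (undef, $file, $line) = caller;
    die "You did something bad! at $file line $line\n";
}

my $err_call_line;
sub wanted {
    $err_call_line = __LINE__ + 1;   # line of the err() call below
    err();
}

eval { wanted() };
my $message = $@;
print $message;
```

The trailing newline in the die message stops Perl from appending its own "at ... line ..." suffix, so only the caller's location is shown.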
------------------------------
Date: Wed, 11 Jul 2007 11:02:11 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Hash ref, but what is the key? with Chemistry::File::SDF mod
Message-Id: <slrnf9974j.sr0.hjp-usenet2@zeno.hjp.at>
On 2007-07-11 04:05, Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "p" == pistachio <hpbenton@gmail.com> writes:
> p> for (my $i=0; $i < @mols; $i++){
>
> no need for that as you can loop over @mols directly and use a hash for
> the other parts. this also means you can drop all of those arrays. you
> have a parallel set of arrays which means you should be using an array
> of hashes instead.
>
> foreach my $mol ( @mols ) {
>
> my %stuff ;
>
> p> $ID[$i]= $mols[$i]->attr("sdf/data")->{LM_ID};
>
> $stuff{id} = $mol->attr("sdf/data")->{LM_ID};
That hash exists only inside the loop. He could get the same effect by
just using scalar variables $ID, $name, etc.
I assume that he is filling all those arrays with values because he
intends to use them after the loop. So either he needs to keep the loop
counter (which is best if he needs to keep the correspondence to @mols)
or he can use another unique id. If LM_ID is unique and suitable for the
purpose, something like
my $id = $mols[$i]->attr("sdf/data")->{LM_ID};
$stuff{$id}{name} = $mols[$i]->attr("sdf/data")->{COMMON_NAME};
$stuff{$id}{formula} = $mols[$i]->attr("sdf/data")->{FORMULA};
...
would work.
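As a self-contained sketch of that hash-of-hashes layout: the array @sdf_data below is invented sample data standing in for the hashrefs returned by attr("sdf/data"), so that the example runs without Chemistry::File::SDF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample data: each element plays the role of the hashref
# returned by $mol->attr("sdf/data").
my @sdf_data = (
    { LM_ID => 'LM0001', COMMON_NAME => 'water',   FORMULA => 'H2O' },
    { LM_ID => 'LM0002', COMMON_NAME => 'methane', FORMULA => 'CH4' },
);

# Declared outside the loop so it is still available afterwards.
my %stuff;
for my $data (@sdf_data) {
    my $id = $data->{LM_ID};
    $stuff{$id}{name}    = $data->{COMMON_NAME};
    $stuff{$id}{formula} = $data->{FORMULA};
}

# Records can now be looked up by id instead of by loop counter.
for my $id (sort keys %stuff) {
    print "$id: $stuff{$id}{name} ($stuff{$id}{formula})\n";
}
```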
However, none of that seems to have anything to do with the OP's
problem.
hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
------------------------------
Date: Wed, 11 Jul 2007 11:13:15 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Hash ref, but what is the key? with Chemistry::File::SDF mod
Message-Id: <slrnf997pb.sr0.hjp-usenet2@zeno.hjp.at>
On 2007-07-11 03:47, pistachio <hpbenton@gmail.com> wrote:
[...]
> for (my $i=0; $i < @mols; $i++){
> #print $i;
> $ID[$i]= $mols[$i]->attr("sdf/data")->{LM_ID};
> $name[$i]=$mols[$i]->attr("sdf/data")->{COMMON_NAME};
> $formula[$i]=$mols[$i]->attr("sdf/data")->{FORMULA};
> $mass[$i]=$mols[$i]->attr("sdf/data")->{EXACT_MASS};
> my @smile;
> $sdf[$i]=$mols[$i]->attr("sdf/data");
> #my $temp={$sdf[$i]};
> #$sdf[$i]=$mols[$i]->attr("sdf/data")->{SDF} . "\t";
> #print ".";
> print $mass[$i] . $name[$i] . $formula[$i] . $date . $ID[$i]. "\n";
> }
> print $sdf[1] ."\n";
I assume this line is the problem? Please be more explicit in the
future. It is hard to answer questions if you don't know what the
question is.
At this point $sdf[1] contains the return value from
$mols[1]->attr("sdf/data"), which (judging from the code above) is a
hashref which contains at least the keys LM_ID, COMMON_NAME, FORMULA and
EXACT_MASS.
You cannot print a hashref directly; you need to print its individual
components, e.g.,
print $sdf[1]{EXACT_MASS}, " ",
$sdf[1]{COMMON_NAME}, " ",
$sdf[1]{FORMULA}, " ",
$sdf[1]{LM_ID}, "\n";
or use a module which returns a suitable string representation, like
Data::Dumper ...
> print Dumper $sdf[2] . "\n";
... but you did that already.
hp
------------------------------
Date: Wed, 11 Jul 2007 03:54:36 -0700
From: Paul Lalli <mritty@gmail.com>
Subject: Re: Hash ref, but what is the key? with Chemistry::File::SDF mod
Message-Id: <1184151276.583620.71010@w3g2000hsg.googlegroups.com>
On Jul 10, 11:48 pm, pistachio <hpben...@gmail.com> wrote:
> $sdf[$i]=$mols[$i]->attr("sdf/data");
Here you assign $sdf[$i] to be a hashref that's returned from the
attr() method.
> print $mass[$i] . $name[$i] . $formula[$i] . $date . $ID[$i]. "\n";}
>
> print $sdf[1] ."\n";
Here you attempt to print out the hashref, concatenated to the
newline. Both of those operations "stringify" the hashref. It is no
longer a reference. It is just a string that contains the word HASH
followed by a memory location.
> print Dumper $sdf[2] . "\n";
Here you are attempting to print out a Dump of the hashref, but you
made the mistake of using the hashref in a concatenation. That
stringified the hashref, turning it into that "HASH(0x1234556)"
string, and then you dumped that string.
You seem to be concatenation-happy. Stop it. :-P
print Dumper($sdf[2]);
If for some reason you want an extra newline at the end of your
output, print it after the dump:
print Dumper($sdf[2]), "\n";
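The difference can be seen in a small stand-alone sketch. The hash contents are invented sample data; only the behaviour of concatenation and Data::Dumper is the point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Invented sample data in place of the real sdf/data hashref.
my $sdf = { FORMULA => 'H2O', COMMON_NAME => 'water' };

# Concatenation turns the reference into a plain string of the
# form "HASH(0x...)" followed by the newline.
my $stringified = $sdf . "\n";

# Dumping that string shows only the string, not the hash contents.
my $dump_of_string = Dumper($sdf . "\n");

# Dumping the reference itself shows the keys and values.
my $dump_of_ref = Dumper($sdf);

print "stringified:    $stringified";
print "dump of string: $dump_of_string";
print "dump of ref:    $dump_of_ref";
```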
------------------------------
Date: Wed, 11 Jul 2007 00:57:26 -0700
From: "Clenna Lumina" <savagebeaste@yahoo.com>
Subject: Re: How to turn off taint checking in cgi
Message-Id: <5fjgrdF3dh7qlU1@mid.individual.net>
Paul Lalli wrote:
> On Jul 10, 9:18 am, Lars Eighner <use...@larseighner.com> wrote:
>> How can I turn off taint checking in a perl cgi script?
>
> By removing the -T option from the shebang in the script and/or the
> perl executable line in your webserver configuration file.
I don't ever recall seeing anything like that in Apache.
> I have now handed you a loaded hand gun and taught you how to point it
> at your head. Good luck.
It's amazing how it's always assumed that there isn't a valid reason for
a question like this. The OP may have a perfectly good reason for doing
what he's doing (like, say, debugging in some way), even if we don't
know what it is. Granted, there's nothing wrong with reiterating a
danger where danger potentially exists, but at the same time, it seems
some people act too much like a "parental controls" mechanism, which
only serves to take the focus from the original question itself.
Give warning, by all means, but please try to address the question
itself.
AFAIK, if you are using setuid perl (where tainting is on by default,
iirc), then a common method is using a c wrapper that's suid'ed to the
desired user and executes the script, returning the output.
--
CL
------------------------------
Date: Wed, 11 Jul 2007 11:30:21 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: How to turn off taint checking in cgi
Message-Id: <slrnf998pd.sr0.hjp-usenet2@zeno.hjp.at>
On 2007-07-11 07:57, Clenna Lumina <savagebeaste@yahoo.com> wrote:
> Paul Lalli wrote:
>> On Jul 10, 9:18 am, Lars Eighner <use...@larseighner.com> wrote:
>>> How can I turn off taint checking in a perl cgi script?
>>
>> By removing the -T option from the shebang in the script and/or the
>> perl executable line in your webserver configuration file.
>
> I don't ever recall seeing anything like that in Apache.
Then your CGI scripts are probably not taint checked.
AFAIK there are only two ways to turn taint checking on:
1) Explicitly via the -T flag
2) Implicitly by invoking the perl interpreter with a different ruid
and euid.
>> I have now handed you a loaded hand gun and taught you how to point it
>> at your head. Good luck.
[...]
> Give warning, by all means, but please try to address the question
> itself.
He did, didn't he?
> AFAIK, if you are using setuid perl (where tainting is on by default,
> iirc), then a common method is using a c wrapper that's suid'ed to the
> desired user and executes the script, returning the output.
CGI scripts are generally not invoked with a different ruid and euid, so
this is unlikely to be the reason. However, this can be checked with a
simple script like
#!/usr/bin/perl
use warnings;
use strict;
print "Content-Type: text/plain\n";
print "\n";
print "ruid = $<\n";
print "euid = $>\n";
If this prints the same uid twice, the -T flag is the only explanation
left.
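A complementary check, assuming Perl 5.8 or later, is to ask the interpreter directly through the special variable ${^TAINT}. The %meaning hash is just a label table made up for this sketch.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# ${^TAINT} is 1 under -T, -1 under -t (taint warnings only), 0 otherwise.
my %meaning = (
    1  => 'on (-T)',
    -1 => 'warnings only (-t)',
    0  => 'off',
);
my $taint = ${^TAINT};

print "Content-Type: text/plain\n";
print "\n";
print "taint mode: $meaning{$taint}\n";
```

Run as a CGI script, this reports what the interpreter itself believes, whichever way taint mode was switched on.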
Another question to the OP: Are you really using CGI or are you using
mod_perl? If you use the latter, you share one perl interpreter between
scripts, so either all of them have taint checking on or none does.
hp
------------------------------
Date: 11 Jul 2007 11:41:13 GMT
From: Lars Eighner <usenet@larseighner.com>
Subject: Re: How to turn off taint checking in cgi
Message-Id: <slrnf99gck.121.usenet@goodwill.larseighner.com>
In our last episode, <5fjgrdF3dh7qlU1@mid.individual.net>, the lovely and
talented Clenna Lumina broadcast on comp.lang.perl.misc:
> Paul Lalli wrote:
>> On Jul 10, 9:18 am, Lars Eighner <use...@larseighner.com> wrote:
>>> How can I turn off taint checking in a perl cgi script?
>>
>> By removing the -T option from the shebang in the script and/or the
>> perl executable line in your webserver configuration file.
> I don't ever recall seeing anything like that in Apache.
>> I have now handed you a loaded hand gun and taught you how to point it
>> at your head. Good luck.
> It's amazing how it's always assumed that there isn't a valid reason for
> a question like this. The OP may have a perfectly good reason for doing
> what he's doing
Well, as a matter of fact I do. I am putting together tools to use
to build static pages which are then uploaded to a server which is actually
open to the public, unlike my local one. I've got one user. Me.
> (like, say, debugging in some way), even if we don't
> know what it is. Granted, there's nothing wrong with reiterating a
> danger where danger potentially exists, but at the same time, it seems
> some people act too much like a "parental controls" mechanism, which
> only serves to take the focus from the original question itself.
> Give warning, by all means, but please try to address the question
> itself.
> AFAIK, if you are using setuid perl (where tainting is on by default,
> iirc), then a common method is using a c wrapper that's suid'ed to the
> desired user and executes the script, returning the output.
Well, apparently it is an apache problem -- or at least I have an
apache problem. I cannot get suexec to work on 2.0.x; and nothing
containing backticks, system, etc. works. Similar things also do not
work in sh scripts (although of course everything runs fine from the
commandline). Perl (when taint is given sneering lip service) and sh
scripts work fine with apache 1.3.xx, but I don't have a php5 module that
works with that. Likewise with versions of apache above 2.0.xx -- no
php handler.
Yes. I have compiled and installed five versions of apache in the last 24
hours, and still cannot get a combination that will do both php5 and
cgi, although each of them will do one or the other.
Look folks, I'd like to sell you my new super secure Server-Scripting
combination. It's called Rock v. 1.0. It is unhackable. It won't
execute malicious code, because it won't execute any code at all.
You can put it on your desk or collocate it -- like on the ground with other
rocks. And you don't have to worry about your data, because it doesn't have
any of your data. It's a Rock. And that seems to be what developers are
trying to create.
Hell, if I ran Windoz, it would be just as useless but at least I could
watch the ripped-off videos on you tube.
--
Lars Eighner <http://larseighner.com/> <http://myspace.com/larseighner>
Countdown: 559 days to go.
Friends of Lizbeth: help replace failed a/c at Austin's no-kill shelter
<https://secure.groundspring.org/dn/index.php?aid=12349>
------------------------------
Date: Wed, 11 Jul 2007 04:42:12 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Wed Jul 11 2007
Message-Id: <JKzzqC.1o21@zorch.sf-bay.org>
The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN). You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.
Alter-0.03
http://search.cpan.org/~anno/Alter-0.03/
Alter Ego Objects
----
Apache-StrReplace-0.01
http://search.cpan.org/~askadna/Apache-StrReplace-0.01/
Filter between string replace
----
Apache-StrReplace-0.01b
http://search.cpan.org/~askadna/Apache-StrReplace-0.01b/
Filter between string replace
----
Apache-StrReplace-0.02
http://search.cpan.org/~askadna/Apache-StrReplace-0.02/
Filter between string replace
----
Astro-FITS-CFITSIO-Simple-0.15
http://search.cpan.org/~djerius/Astro-FITS-CFITSIO-Simple-0.15/
read and write FITS tables
----
B-Generate-1.10
http://search.cpan.org/~jjore/B-Generate-1.10/
Create your own op trees.
----
BerkeleyDB-0.32
http://search.cpan.org/~pmqs/BerkeleyDB-0.32/
Perl extension for Berkeley DB version 2, 3 or 4
----
Bio-FASTASequence-File-0.04
http://search.cpan.org/~reneeb/Bio-FASTASequence-File-0.04/
Perl extension for Bio::FASTASequence
----
Bundle-Perl-Critic-0.02
http://search.cpan.org/~thaljef/Bundle-Perl-Critic-0.02/
A CPAN bundle for Perl::Critic and related modules
----
Carp-REPL-0.07
http://search.cpan.org/~sartak/Carp-REPL-0.07/
read-eval-print-loop on die
----
Crypt-SSLeay-0.56
http://search.cpan.org/~dland/Crypt-SSLeay-0.56/
OpenSSL support for LWP
----
FabForce-DBDesigner4-0.08
http://search.cpan.org/~reneeb/FabForce-DBDesigner4-0.08/
Parse/Analyse XML-Files created by DBDesigner 4 (FabForce)
----
FabForce-DBDesigner4-DBIC-0.01
http://search.cpan.org/~reneeb/FabForce-DBDesigner4-DBIC-0.01/
create DBIC scheme for DBDesigner4 xml file
----
File-BSED-0.2
http://search.cpan.org/~asksh/File-BSED-0.2/
Search/Replace in Binary Files.
----
File-BSED-0.3
http://search.cpan.org/~asksh/File-BSED-0.3/
Search/Replace in Binary Files.
----
File-Next-1.00_01
http://search.cpan.org/~jjore/File-Next-1.00_01/
File-finding iterator
----
Filter-Crypto-1.19
http://search.cpan.org/~shay/Filter-Crypto-1.19/
Create runnable Perl files encrypted with OpenSSL libcrypto
----
HTML-DOM-0.002
http://search.cpan.org/~sprout/HTML-DOM-0.002/
A Perl implementation of the HTML Document Object Model
----
HTML-Template-Compiled-Plugin-Comma-0.01
http://search.cpan.org/~hagy/HTML-Template-Compiled-Plugin-Comma-0.01/
HTC Plugin to commify numbers
----
HTML-Template-Default-1.01
http://search.cpan.org/~leocharre/HTML-Template-Default-1.01/
unless template file is on disk, use default hard coded
----
JSON-XS-1.41
http://search.cpan.org/~mlehmann/JSON-XS-1.41/
JSON serialising/deserialising, done correctly and fast
----
Locale-Country-Geo-0.01
http://search.cpan.org/~clkao/Locale-Country-Geo-0.01/
Module for country geographic location data
----
Mail-Postini-0.07
http://search.cpan.org/~scottw/Mail-Postini-0.07/
Perl extension for talking to Postini
----
MooseX-Daemonize-0.01
http://search.cpan.org/~perigrin/MooseX-Daemonize-0.01/
provides a Role that daemonizes your Moose based application.
----
Options-1.5.1
http://search.cpan.org/~pchriste/Options-1.5.1/
Yet another Perl module to provide support for command-line option parsing and usage generation.
----
POE-Component-CPAN-YACSmoke-0.23
http://search.cpan.org/~bingos/POE-Component-CPAN-YACSmoke-0.23/
bringing the power of POE to CPAN smoke testing.
----
POE-Component-IRC-5.33_01
http://search.cpan.org/~bingos/POE-Component-IRC-5.33_01/
a fully event-driven IRC client module.
----
Parse-Apache-ServerStatus-0.02
http://search.cpan.org/~bloonix/Parse-Apache-ServerStatus-0.02/
Simple module to parse apache's server-status.
----
Parse-Eyapp-1.07
http://search.cpan.org/~casiano/Parse-Eyapp-1.07/
Extensions for Parse::Yapp
----
Perl-Repository-APC-1.251
http://search.cpan.org/~andk/Perl-Repository-APC-1.251/
Class modelling "All Perl Changes" repository
----
RT-SimpleGPGVerify-0.04
http://search.cpan.org/~jesse/RT-SimpleGPGVerify-0.04/
----
SOAP-WSDL-2.00_05
http://search.cpan.org/~mkutter/SOAP-WSDL-2.00_05/
SOAP with WSDL support
----
Sys-Statistics-Linux-0.11_03
http://search.cpan.org/~bloonix/Sys-Statistics-Linux-0.11_03/
Front-end module to collect system statistics
----
Task-Compress-Zlib-0.01
http://search.cpan.org/~ski/Task-Compress-Zlib-0.01/
Installs everything needed for Compress::Zlib
----
Test-FITesque-0.01_002
http://search.cpan.org/~konobi/Test-FITesque-0.01_002/
the FITesque framework!
----
Test-Inline-2.202
http://search.cpan.org/~adamk/Test-Inline-2.202/
Lets you put tests in your modules, next to tested code
----
Test-Inline-2.203
http://search.cpan.org/~adamk/Test-Inline-2.203/
Lets you put tests in your modules, next to tested code
----
WWW-Facebook-API-v0.4.0
http://search.cpan.org/~unobe/WWW-Facebook-API-v0.4.0/
Facebook API implementation
----
Win32-InstallShield-0.3
http://search.cpan.org/~kbaucom/Win32-InstallShield-0.3/
InstallShield data file interface
----
Win32-SharedFileOpen-3.36
http://search.cpan.org/~shay/Win32-SharedFileOpen-3.36/
Open a file for shared reading and/or writing
----
Win32-UTCFileTime-1.46
http://search.cpan.org/~shay/Win32-UTCFileTime-1.46/
Get/set UTC file times with stat/utime on Win32
----
XML-Generator-1.01
http://search.cpan.org/~bholzman/XML-Generator-1.01/
Perl extension for generating XML
----
Xpriori-XMS-0.01
http://search.cpan.org/~kwitknr/Xpriori-XMS-0.01/
Perl extension for Xpriori::XMS Database.
----
Yahoo-Photos-0.0.1
http://search.cpan.org/~daxim/Yahoo-Photos-0.0.1/
Manage Yahoo Photos
----
re-engine-POSIX-0.02
http://search.cpan.org/~avar/re-engine-POSIX-0.02/
POSIX (IEEE Std 1003.1-2001) regular expressions
If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.
This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
http://www.stonehenge.com/merlyn/LinuxMag/col82.html
print "Just another Perl hacker," # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: Wed, 11 Jul 2007 07:34:52 -0500
From: "AC" <clarke.n.o.s.p.a.m@hyperformix.com>
Subject: Re: Perl 5.6 vs 5.8
Message-Id: <4694ce84$0$17210$39cecf19@news.twtelecom.net>
"Robert Hicks" <sigzero@gmail.com> wrote in message
news:1184114929.175250.103930@o61g2000hsh.googlegroups.com...
> We are migrating from a 5.6 PA-RISC setup to a 5.8 Itanium setup. It
> is "mostly" working except it is dropping messages that weren't
> before. I am mostly wondering whether something changed in how Perl
> handles hashes and arrays between the two versions.
>
> Any help or suggestions would be appreciated.
>
> Robert
>
Please post back to the group anything you run into, as I am considering
making the jump from 5.6 to 5.8 soon. My project is also complicated because
Perl/Tk is involved for the GUI.
Allan
------------------------------
Date: Wed, 11 Jul 2007 08:05:11 -0000
From: seminex@gmail.com
Subject: PERL/HTML: extract repetitive information
Message-Id: <1184141111.741731.256460@i38g2000prf.googlegroups.com>
Hi all,
I have an HTML page (~5 MB) like this:
<HTML>
<BODY BGCOLOR=#FFFFFF LINK=000066 VLINK=000066 TOPMARGIN=0
LEFTMARGIN=0 MARGINWIDTH=0 MARGINHEIGHT=0><font size=12
color=#000000>Date debut: 07/07/2007<br>Date fin: 08/07/2007<br>Heure
debut : 01:00:00<br>Heure fin: 01:00:00<br>FTI : access<br><br><TABLE
BORDER=>
<tr bgcolor=#FFFFE8>
<th width=100 align=CENTER>login</th>
<th width=70 align=CENTER>ip</th>
<th width=230 align=CENTER>Num. appelant</th>
<th width=80 align=CENTER>J debut</th>
<th width=60 align=CENTER>H debut</th>
<th width=80 align=CENTER>J fin</th>
<th width=60 align=CENTER>H fin</th>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.26</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-06</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>23:59:50</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:00</td>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:02</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:12</td>
</tr>
</HTML>
I would like to extract only the first time in each row (the "H debut"
column); in this example, "23:59:50" and "00:00:02".
I have tried several Perl programs that use a regular expression
( /^(\d\d):(\d\d):(\d\d)/ ) to extract the times, but the cell is often
empty (an error in the HTML), and then I have this:
[..]
1 <tr>
2 <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
3 <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
4 <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
5 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
6 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
7 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
8 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
9 </tr>
[..]
So I am asking whether anybody has a sample that extracts _only_ line 6.
Because this:
sub tparse {
@input = @_;
chomp(@input);
if($input[0] =~ /^(\d\d):(\d\d):(\d\d)/){
push (@tableau, $input[0]);
}
}
my $p = HTML::Parser->new( api_version => 3,
text_h => [\&tparse, "dtext"]);
$p->parse_file(shift || die "Ne peut ouvrir le fichier ! ($!)\n") ||
die $!;
extracts lines 6 and 8, but ONLY when they contain a time such as
00:00:01. When a cell is empty, my script picks up the next matching
value instead, which throws off the rest of the script.
Thanks in advance.
------------------------------
Date: Wed, 11 Jul 2007 06:08:14 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: PERL/HTML: extract repetitive information
Message-Id: <slrnf99egu.v6f.tadmc@tadmc30.sbcglobal.net>
seminex@gmail.com <seminex@gmail.com> wrote:
> I would extract only first hours, in this example, "23:59:50" and
> "00:00:02".
> 1 <tr>
> 2 <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
> 3 <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
> 4 <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
> 5 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> 6 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
> 7 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> 8 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
> 9 </tr>
> [..]
>
> So, I ask you if anybody have some sample to extract _only_ line 6..
Regexes are not the Right Tool for parsing context free languages
such as HTML.
Use a module that understands HTML for processing HTML data:
----------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TableExtract;
my $html = do { local $/; <DATA> };
my $te = HTML::TableExtract->new();
$te->parse($html);
# Examine all matching tables
foreach my $ts ($te->table_states) {
    foreach my $row ($ts->rows) {
        print "found '$row->[4]'\n";
    }
}
__DATA__
<HTML>
<BODY BGCOLOR=#FFFFFF LINK=000066 VLINK=000066 TOPMARGIN=0
LEFTMARGIN=0 MARGINWIDTH=0 MARGINHEIGHT=0><font size=12
color=#000000>Date debut: 07/07/2007<br>Date fin: 08/07/2007<br>Heure
debut : 01:00:00<br>Heure fin: 01:00:00<br>FTI : access<br><br><TABLE
BORDER=>
<tr bgcolor=#FFFFE8>
<th width=100 align=CENTER>login</th>
<th width=70 align=CENTER>ip</th>
<th width=230 align=CENTER>Num. appelant</th>
<th width=80 align=CENTER>J debut</th>
<th width=60 align=CENTER>H debut</th>
<th width=80 align=CENTER>J fin</th>
<th width=60 align=CENTER>H fin</th>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.26</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-06</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>23:59:50</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:00</td>
</tr>
<tr>
<td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
<td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
<td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:02</td>
<td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
<td width=60 align=LEFT bgcolor=#FFFFE8>00:00:12</td>
</tr>
</HTML>
----------------------------------
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Wed, 11 Jul 2007 13:23:12 +0200
From: Christian Winter <thepoet_nospam@arcor.de>
Subject: Re: PERL/HTML: extract repetitive information
Message-Id: <4694bda0$0$31624$9b4e6d93@newsspool3.arcor-online.net>
seminex@gmail.com wrote:
> Hi all,
>
> I've an HTML (~5Mo) page like that :
>
> <HTML>
> <BODY BGCOLOR=#FFFFFF LINK=000066 VLINK=000066 TOPMARGIN=0
> LEFTMARGIN=0 MARGINWIDTH=0 MARGINHEIGHT=0><font size=12
> color=#000000>Date debut: 07/07/2007<br>Date fin: 08/07/2007<br>Heure
> debut : 01:00:00<br>Heure fin: 01:00:00<br>FTI : access<br><br><TABLE
> BORDER=>
> <tr bgcolor=#FFFFE8>
> <th width=100 align=CENTER>login</th>
> <th width=70 align=CENTER>ip</th>
> <th width=230 align=CENTER>Num. appelant</th>
> <th width=80 align=CENTER>J debut</th>
> <th width=60 align=CENTER>H debut</th>
> <th width=80 align=CENTER>J fin</th>
> <th width=60 align=CENTER>H fin</th>
> </tr>
> <tr>
> <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
> <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.26</td>
> <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
> <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-06</td>
> <td width=60 align=LEFT bgcolor=#FFFFE8>23:59:50</td>
> <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> <td width=60 align=LEFT bgcolor=#FFFFE8>00:00:00</td>
> </tr>
> <tr>
> <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
> <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
> <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
> <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> <td width=60 align=LEFT bgcolor=#FFFFE8>00:00:02</td>
> <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> <td width=60 align=LEFT bgcolor=#FFFFE8>00:00:12</td>
> </tr>
> </HTML>
>
> I would extract only first hours, in this example, "23:59:50" and
> "00:00:02".
> I've tried more perl program, but I use regular expression ( /^(\d\d):
> (\d\d):(\d\d)/) ) to extract my hours but often, they are nothing
> (html error), and I've this :
>
> [..]
> 1 <tr>
> 2 <td width=100 align=CENTER bgcolor=#FFFFE8>login/access</td>
> 3 <td width=70 align=LEFT bgcolor=#FFFFE8>192.168.30.41</td>
> 4 <td width=230 align=LEFT bgcolor=#FFFFE8>Supervision ACCESLIBRE</td>
> 5 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> 6 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
> 7 <td width=80 align=LEFT bgcolor=#FFFFE8>2007-07-07</td>
> 8 <td width=60 align=LEFT bgcolor=#FFFFE8> </td>
> 9 </tr>
> [..]
>
> So, does anybody have a sample that extracts _only_ line 6?
>
> Because this:
>
> sub tparse {
>     @input = @_;
>     chomp(@input);
>     if ($input[0] =~ /^(\d\d):(\d\d):(\d\d)/) {
>         push(@tableau, $input[0]);
>     }
> }
>
> my $p = HTML::Parser->new( api_version => 3,
>                            text_h      => [\&tparse, "dtext"]);
> $p->parse_file(shift || die "Ne peut ouvrir le fichier ! ($!)\n") ||
>     die $!;
>
> extracts lines 6 and 8, but ONLY if they contain times like 00:00:01.
> If a cell is empty, my script picks up the next value instead, which
> throws off the rest of the script.
I've never been fond of using HTML::Parser directly. In this case,
I'd use HTML::TreeBuilder, which is IMHO more intuitive:
--------------------------------------------------------------------
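#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
my @files = ("test.html");
my @hits;
foreach my $fn ( @files )
{
    my $h = HTML::TreeBuilder->new_from_file( $fn );
    foreach my $row ( $h->look_down('_tag', 'tr' ) )
    {
        # The fifth <td> (index 4) of each row is the "H debut" column.
        my $field = ($row->look_down('_tag', 'td' ))[4];
        # Header rows contain <th> cells only, so there is no fifth <td>.
        next unless $field;
        my $text = $field->as_text();
        # Empty cells simply fail the match and are skipped.
        push @hits, $text if $text =~ /^\d\d:\d\d:\d\d$/;
    }
    # Free the parse tree explicitly once the file is done.
    $h->delete;
}
print "$_\n" foreach @hits;
__END__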
--------------------------------------------------------------------
HTH
-Chris
------------------------------
Date: Wed, 11 Jul 2007 11:46:03 -0000
From: Seminex <seminex@gmail.com>
Subject: Re: PERL/HTML: extract repetitive information
Message-Id: <1184154363.885546.159200@n60g2000hse.googlegroups.com>
Great, it works!
Two solutions, two ways of proceeding, that's great!
Thanks, all!
Thanks a lot!
I'm discovering HTML::TableExtract, it's fun :p
Thanks!
:)
------------------------------
Date: Wed, 11 Jul 2007 13:39:20 +0100
From: Martin Gregorie <martin@see.sig.for.address>
Subject: Re: Portable general timestamp format, not 2038-limited
Message-Id: <qkvem4-slt.ln1@zoogz.gregorie.org>
Ilya Zakharevich wrote:
> [A complimentary Cc of this posting was sent to
> Martin Gregorie
> <martin@see.sig.for.address>], who wrote in article <u1fdm4-32o.ln1@zoogz.gregorie.org>:
>> It's in "A Short History of Time". Sorry I can't quote chapter or page,
>> but a friend borrowed my copy and lent me Dawkins' "Climbing Mount
>> Improbable" before vanishing, never to be seen since. Not an equal
>> exchange: I preferred ASHOT to CMI.
>
Oops - I should have written "A Brief History of Time". It was the first
edition, so I don't know if it was altered/edited out of later versions.
> I would prefer a reference to a peer-reviewed paper. ;-)
>
Sure, but I don't think you'll find one. It was in a descriptive, rather
than rigorous, passage. But then, the book famously had only one
equation in it.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
------------------------------
Date: Wed, 11 Jul 2007 05:54:22 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Remove a specific element from an Array
Message-Id: <slrnf99dmu.v6f.tadmc@tadmc30.sbcglobal.net>
Petr Vileta <stoupa@practisoft.cz> wrote:
> Tad McClellan wrote:
>> Petr Vileta <stoupa@practisoft.cz> wrote:
>>> Sumit wrote:
>>
>>>> #!/usr/bin/perl -w
>>
>>> if (@updateNames[$item] eq $item2)
>>
>>
>> You should not ignore the warnings that perl issues.
>>
> Well, sorry ;-) Should be
>
> if ($updateNames[$item] eq $item2)
>
> but on my Perl 5.6.1 both forms work well ;-)
Sometimes it doesn't make a difference, but sometimes it does.
See:
perldoc -q difference
What is the difference between $array[1] and @array[1]?
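To make the difference concrete, here is a small sketch (the array contents and variable names are invented for illustration, they are not from the thread). $names[1] is a single scalar element, while @names[1] is a one-element array slice. In an "eq" comparison both happen to yield the same value, which is probably why the original test appeared to work, but on the left-hand side of an assignment the slice imposes list context:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @names = ('alpha', 'beta', 'gamma');
my @extra = ('x', 'y', 'z');

# Scalar element: the assignment is in scalar context,
# so an array on the right yields its element count.
$names[1] = @extra;          # $names[1] is now 3

# One-element slice: the assignment is in list context,
# so the first element of the list is stored instead.
# With warnings enabled, perl complains at compile time that
# the scalar value @names[2] is better written as $names[2].
@names[2] = @extra;          # $names[2] is now 'x'

print "$names[1] $names[2]\n";   # prints "3 x"
```

The warning is the hint that the slice form was almost certainly not what the author meant.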
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Wed, 11 Jul 2007 11:51:41 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: TXL-like capability?
Message-Id: <slrnf99a1d.sr0.hjp-usenet2@zeno.hjp.at>
On 2007-07-10 14:27, SomeDeveloper <somedeveloper@gmail.com> wrote:
> Can I do source to source transformations elegantly in Perl?
>
> I'm a compilers and TXL newbie, but as I'm reading more about TXL, I'm
> seeing that what it provides at the end of the day is:
> 1. the ability to define/specify arbitrary grammars, and
> 2. the ability to specify semantic actions (for grammar
> productions) via search/replace patterns.
>
> Since Perl is the king of regex's,
Regexes (even perl regexes, which are more powerful than regular
expressions) are not sufficient to specify arbitrary grammars.
Assuming that by TXL you mean the programming language available from
http://www.txl.ca/, they use BNF to specify the grammar, not regexes.
> I'm wondering whether I can stay
> within Perl for all my source to source transformation needs.
You can use Parse::RecDescent to build parsers from a BNF-like
description.
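As a rough sketch of what that looks like, the following uses Parse::RecDescent (a CPAN module, not part of core Perl) to parse and evaluate simple arithmetic with nested parentheses, something a single classical regular expression cannot describe. The grammar, the rule names (start, expr, term, factor) and the input string are made up for illustration; they are not taken from TXL or from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;   # from CPAN

# Illustrative grammar: each rule is written BNF-style, with an
# optional Perl action in braces that computes the rule's value.
# A production without an action returns the value of its last item.
my $grammar = q{
    start  : expr /\Z/            { $return = $item[1] }
    expr   : term '+' expr        { $return = $item[1] + $item[3] }
           | term
    term   : factor '*' term      { $return = $item[1] * $item[3] }
           | factor
    factor : /\d+/
           | '(' expr ')'         { $return = $item[2] }
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

my $input  = '2 + 3 * (4 + 1)';
my $result = $parser->start($input);   # undef if the input does not parse

print defined $result ? "$input = $result\n" : "parse failed\n";
```

Here the actions evaluate the expression (17 for the sample input, since "*" binds tighter than "+"). For a source to source transformation, the actions would instead build and return the rewritten text or a tree to be printed later.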
hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 643
**************************************