[24245] in Perl-Users-Digest
Perl-Users Digest, Issue: 6436 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Apr 21 03:05:44 2004
Date: Wed, 21 Apr 2004 00:05:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 21 Apr 2004 Volume: 10 Number: 6436
Today's topics:
Re: "cloning" perl / CPAN install <spamtrap@dot-app.org>
.plx <robin @ infusedlight.net>
Re: .plx <bmb@ginger.libs.uga.edu>
Re: .plx <spamtrap@dot-app.org>
Re: .plx <matthew.garrish@sympatico.ca>
Re: .plx <bmb@ginger.libs.uga.edu>
Re: .plx <jurgenex@hotmail.com>
Re: .plx <tassilo.parseval@rwth-aachen.de>
Re: .plx <robin @ infusedlight.net>
Re: .plx <robin @ infusedlight.net>
Re: activeperl + -T option <robin @ infusedlight.net>
Re: activeperl + -T option <robin @ infusedlight.net>
Re: activeperl + -T option <tassilo.parseval@rwth-aachen.de>
cgi.pm <robin @ infusedlight.net>
Re: Extract data using Curl Unix Command & Perl Script (Fiaz Idris)
Re: incorrect value for HOSTNAME executed as cron job <robin @ infusedlight.net>
Re: Is it possible to 'word wrap' lines just using RegE <dave@dave.org.uk>
Match Offset: length($`) <hawkesm@on3etel.n5et.u9k>
Re: Match Offset: length($`) <uri@stemsystems.com>
Re: Match Offset: length($`) <tassilo.parseval@rwth-aachen.de>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 20 Apr 2004 22:05:29 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: "cloning" perl / CPAN install
Message-Id: <XJidnRl-0JJ0SxjdRVn-jg@adelphia.com>
Walter Roberson wrote:
> poster's question about where exactly perl -MCPAN -e shell was
> documented.
perldoc CPAN.pm
> As for your reference to the "help" command: here is what the help
> menu says about autobundle as of perl 5.8.3:
>
> [...]
> autobundle Snapshot force cmd unconditionally do
> cmd
> cpan> help autobundle
> Detailed help not yet implemented
Yep, I was too quick on the trigger this time. Sorry. :-(
The documentation as it stands is pretty bad; 'perldoc CPAN' gets you a
*very* terse description of the 'cpan' command from the POD docs
in /usr/bin/CPAN. Using 'perldoc CPAN.pm' instead will get you the older,
much more thorough, documentation found in the module.
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Tue, 20 Apr 2004 18:03:52 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: .plx
Message-Id: <c64j7i$m0m$1@reader2.nmix.net>
What's the difference between .plx and .pl when used with cgi?
cc to address below.
--
Regards,
-Robin
--
[ webmaster @ infusedlight.net ]
------------------------------
Date: Tue, 20 Apr 2004 21:26:10 -0400
From: Brad Baxter <bmb@ginger.libs.uga.edu>
Subject: Re: .plx
Message-Id: <Pine.A41.4.58.0404202125260.12704@ginger.libs.uga.edu>
On Tue, 20 Apr 2004, Robin wrote:
> What's the difference between .plx and .pl when used with cgi?
none.
------------------------------
Date: Tue, 20 Apr 2004 22:21:51 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: .plx
Message-Id: <iZGdnc0szJFdRxjdRVn-iQ@adelphia.com>
Robin wrote:
> What's the difference between .plx and .pl when used with cgi?
In the context of handling a form posting to a web server, .plx is most
commonly used on Windows servers for ISAPI scripts. ISAPI is the IIS
server's moral equivalent of mod_perl.
Unless you've moved to a Windows server, you can't use it. Unless you need
to handle an absurdly high amount of traffic, you don't need it. And unless
you're an advanced Perl programmer, you won't know how to write for it.
In other words, you probably can't use it, almost certainly don't need it,
and definitely aren't (yet) ready for it. So fuhgeddaboudit. ;-)
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Tue, 20 Apr 2004 22:14:38 -0400
From: "Matt Garrish" <matthew.garrish@sympatico.ca>
Subject: Re: .plx
Message-Id: <gOkhc.50736$Gp4.1082541@news20.bellglobal.com>
"Brad Baxter" <bmb@ginger.libs.uga.edu> wrote in message
news:Pine.A41.4.58.0404202125260.12704@ginger.libs.uga.edu...
> On Tue, 20 Apr 2004, Robin wrote:
>
> > What's the difference between .plx and .pl when used with cgi?
>
> none.
>
Not true, generally speaking. When you install ActivePerl on a Windoze
system .pl is mapped to the perl executable while .plx gets mapped to the
perlis.dll, which is what I assume he's after.
To our good friend the OP, take your own advice and RTFM to find out why
that makes a difference...
Matt
------------------------------
Date: Tue, 20 Apr 2004 23:40:19 -0400
From: Brad Baxter <bmb@ginger.libs.uga.edu>
Subject: Re: .plx
Message-Id: <Pine.A41.4.58.0404202338400.12674@ginger.libs.uga.edu>
On Tue, 20 Apr 2004, Matt Garrish wrote:
> "Brad Baxter" <bmb@ginger.libs.uga.edu> wrote in message
> news:Pine.A41.4.58.0404202125260.12704@ginger.libs.uga.edu...
> > On Tue, 20 Apr 2004, Robin wrote:
> >
> > > What's the difference between .plx and .pl when used with cgi?
> >
> > none.
> >
> Not true, generally speaking. When you install ActivePerl on a Windoze
> system .pl is mapped to the perl executable while .plx gets mapped to the
> perlis.dll, which is what I assume he's after.
Hmmm. Wrong twice in one night. I think I'll give it a break.
Regards,
Brad
------------------------------
Date: Wed, 21 Apr 2004 04:05:29 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: .plx
Message-Id: <dqmhc.32052$L31.25947@nwrddc01.gnilink.net>
Robin wrote:
> What's the difference between .plx and .pl when used with cgi?
The letter "x"
> cc to address below.
Huh? Why would anyone want to do that?
jue
------------------------------
Date: 21 Apr 2004 06:19:35 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: .plx
Message-Id: <c653pm$85mcb$1@ID-231055.news.uni-berlin.de>
Also sprach Jürgen Exner:
> Robin wrote:
>> What's the difference between .plx and .pl when used with cgi?
>
> The letter "x"
>
>> cc to address below.
>
> Huh? Why would anyone want to do that?
Maybe because he asks for it? It's not a violation against usenet
etiquette to request a CCed reply since there'll still be a follow-up in
the group for others to see.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Tue, 20 Apr 2004 23:56:55 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: Re: .plx
Message-Id: <c6576r$smf$1@reader2.nmix.net>
"Sherm Pendley" <spamtrap@dot-app.org> wrote in message
news:iZGdnc0szJFdRxjdRVn-iQ@adelphia.com...
> Robin wrote:
>
> > What's the difference between .plx and .pl when used with cgi?
>
> In the context of handling a form posting to a web server, .plx is most
> commonly used on Windows servers for ISAPI scripts. ISAPI is the IIS
> server's moral equivalent of mod_perl.
>
> Unless you've moved to a Windows server, you can't use it. Unless you need
> to handle an absurdly high amount of traffic, you don't need it. And unless
> you're an advanced Perl programmer, you won't know how to write for it.
>
> In other words, you probably can't use it, almost certainly don't need it,
> and definitely aren't (yet) ready for it. So fuhgeddaboudit. ;-)
>
> sherm--
>
actually I know quite a bit about nt servers.... plx would probably be cool
though.
> --
> Cocoa programming in Perl: http://camelbones.sourceforge.net
> Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Tue, 20 Apr 2004 23:57:32 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: Re: .plx
Message-Id: <c6576s$smf$2@reader2.nmix.net>
"Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote in message
news:c653pm$85mcb$1@ID-231055.news.uni-berlin.de...
> Also sprach Jürgen Exner:
>
> > Robin wrote:
> >> What's the difference between .plx and .pl when used with cgi?
> >
> > The letter "x"
> >
> >> cc to address below.
> >
> > Huh? Why would anyone want to do that?
>
> Maybe because he asks for it? It's not a violation against usenet
> etiquette to request a CCed reply since there'll still be a follow-up in
> the group for others to see.
gotcha. Thanks- robin
------------------------------
Date: Tue, 20 Apr 2004 18:07:08 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: Re: activeperl + -T option
Message-Id: <c64j7l$m0m$4@reader2.nmix.net>
"Clyde Ingram"
<clydenospamorham@nospamorhamgetofftheline.freeservenospamorham.co.uk> wrote
in message news:9o7hc.52$pl4.33@newsfe3-win.server.ntli.net...
> Robin,
>
> "Robin" <robin @ infusedlight.net> wrote in message
> news:c62hsr$qci$1@reader2.nmix.net...
> > I am running active perl 8.2.3 Build 809 and I'm wondering why when I turn
> > on taint mode checking on the #!/usr/bin/perl line whenever I run the script
> > it gives me an error "Too late for -T option at bbs.pl line 1." and whenever
> > I run the script with perl -T bbs.pl it works fine
> You have not said which platform you are running on.
> What you hint at should behave correctly on most UNIX systems.
> There are several well documented bugs in how DROSS and Windoze systems
> invoke Perl programs . . .
>
> > ....is there any
> > configuration file I can edit so perl will automatically understand it to be
> > run with a -T option? I want to run the script with a Perl IDE that I've
> > downloaded and it gives me this error unless I take out the -T option. Do I
> > have to take out the -T every time I run the script with the IDE or is there
> > something I can do?
>
> I assume Windoze.
>
> From the ActivePerl user guide, look at
>
> file://E:\Perl\html\faq\Windows\ActivePerl-Winfaq4.html#What_s_the_equivalent_of_the_she
> (for E:, substitute the drive you have installed ActivePerl on):
> <QUOTE>Unfortunately, Win32 platforms don't provide the shebang syntax, or
> anything like it. You can try one of the two following methods to run a
> script from the command line. If all else fails, you can always just call
> the perl interpreter directly, as in perl myscript.pl.
>
> . . .
>
> For Windows NT 4.0/2000, the coolest method is to use associated file types
> (see How do I associate Perl scripts with perl?). If you've associated Perl
> scripts with the .pl extension, you can just type the name of your file at
> the command line and Windows NT/2000 will launch perl.exe for you.
>
> </QUOTE>
>
> I guess you could hard-wire "-T" into the Perl command line associated with
> extension ".pl", but that would impose taint checking everywhere, which
> would give you headaches.
>
>
>
> If you change the PATHEXT environment variable to include .pl files, like
> this:
>
> SET PATHEXT=.pl;%PATHEXT%
> you can just type the file name without an extension, and Windows NT/2000
> will find the first .pl file in your path with that name. You may want to
> set PATHEXT in the System control panel rather than on the command line.
> Otherwise, you'll have to re-enter it each time the command prompt window
> closes.
>
> <QUOTE> Note that the file association method does not work for Windows 9x,
> nor does it work with Windows NT/2000 if you have command extensions
> disabled. You can, however, still start the Perl script from an Explorer
> window if the extension is associated with perl.
>
> Another option is to use the pl2bat utility distributed with ActivePerl to
> convert your Perl script into a batch file. What this does is tag some Win32
> batch language to the front of your script so that the system calls the perl
> interpreter on the file. It's quite a clever piece of batch coding,
> actually.
>
> If you call the pl2bat utility on your Perl script helloworld.pl, like this:
>
> C:\> pl2bat helloworld.pl
> it will produce a batch file, helloworld.bat. You can then invoke the script
> just like this:
>
> C:\> helloworld
> Hello, World!
> You can pass command line parameters, as well. Your script can be in your
> PATH, or in another directory, and the pl2bat code will usually find it and
> execute it correctly. The big advantage of this over file associations is
> that I/O redirection will work correctly.
>
> pl2bat has a number of useful command line options to affect how the
> wrapping is done, what command line switches to pass to perl, etc. Running
> perldoc pl2bat at the command line will show a full description of these
> options.
>
> </QUOTE>
>
> When I run this little script, called "trial_shebang.pl":
> #!e:\perl\bin\perl.exe -wT
>
> use strict;
> print "Howdy do there\n"
>
> I see what you saw:
> D:\Clyde\perldev\Trial>trial_shebang.pl
> Too late for "-T" option at D:\Clyde\perldev\Trial\trial_shebang.pl line 1.
>
> When I run:
> D:\Clyde\perldev\Trial>pl2bat trial_shebang.pl
>
> pl2bat creates a DROSS batch file "trial_shebang.bat"
> When I run it, I see this:
>
> D:\Clyde\perldev\Trial>trial_shebang
> Howdy do there
>
> Now, whether this is any help to you depends on how your IDE invokes your
> Perl programs.
> Which IDE was it?
> .
> Regards,
> Clyde
Yeah, optiperl still isn't working with this, but it's cool, it's still
running. Check out new scripts at www.infusedlight.net
-Later,
Robin
------------------------------
Date: Tue, 20 Apr 2004 18:07:43 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: Re: activeperl + -T option
Message-Id: <c64j7n$m0m$5@reader2.nmix.net>
"Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de> wrote in message
news:c6304s$7edtm$1@ID-231055.news.uni-berlin.de...
> Also sprach Robin:
>
> > I am running active perl 8.2.3 Build 809 and I'm wondering why when I turn
> > on taint mode checking on the #!/usr/bin/perl line whenever I run the script
> > it gives me an error "Too late for -T option at bbs.pl line 1." and whenever
> > I run the script with perl -T bbs.pl it works fine....is there any
> > configuration file I can edit so perl will automatically understand it to be
> > run with a -T option? I want to run the script with a Perl IDE that I've
> > downloaded and it gives me this error unless I take out the -T option. Do I
> > have to take out the -T every time I run the script with the IDE or is there
> > something I can do?
>
> The reason why this happens is that your operating system (most probably
> Windows) doesn't take the shebang line into account. However, perl does.
> It executes the script and looks at the shebang line to see whether it
> should include some switches (like -w). This doesn't work with the -T
> switch because a perl instance cannot switch to tainted mode. It has to
> know right from the start that it should use taintedness.
>
> You can probably tell your IDE to use 'perl -T' instead of 'perl' as the
> interpreter.
>
> Tassilo
> --
> $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
> pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
> $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
So Perl's not that smart... I'm so burned out on perl, what's perl.
-Robin
------------------------------
Date: 21 Apr 2004 06:25:41 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: activeperl + -T option
Message-Id: <c65455$7uns0$1@ID-231055.news.uni-berlin.de>
Also sprach Robin:
> So Perl's not that smart... I'm so burned out on perl, what's perl.
perl is the thing which runs programs written in Perl. See
What's the difference between "perl" and "Perl"?
in perlfaq1.
Other than that, the tainting-happening-too-late issue is not a problem of
the language. It's about the interpreter not being smart enough to do it.
However, tainted mode hooks very deeply into the interpreter and it is
not trivial to switch from untainted to tainted mode at runtime. It may
look trivial to you but taintedness has some serious implications for
the whole interpreter in nearly all aspects.
That's why the interpreter has to be told about it right at the start.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
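Since a running perl cannot switch itself into taint mode, about the most a script can do is check how it was launched. A minimal sketch, assuming Perl 5.8 or later where the read-only variable ${^TAINT} is available; the name taint_status is made up for illustration and is not part of any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ${^TAINT} is read-only: 1 when perl was started with -T,
# -1 when only taint warnings are enabled (-t), 0 otherwise.
# taint_status() is a hypothetical helper name.
sub taint_status {
    return ${^TAINT} > 0 ? 'on'
         : ${^TAINT} < 0 ? 'warn-only'
         :                 'off';
}

print "taint mode: ", taint_status(), "\n";
```

Started as "perl -T script.pl" this should report "on"; started by an IDE or a file association that passes no switches it should report "off", which tells you the launcher is what needs changing.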
------------------------------
Date: Tue, 20 Apr 2004 23:30:54 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: cgi.pm
Message-Id: <c655kv$s58$1@reader2.nmix.net>
What are the easter eggs in cgi.pm... ? I must ask.
--
Regards,
-Robin
--
[ webmaster @ infusedlight.net ]
------------------------------
Date: 20 Apr 2004 20:37:53 -0700
From: ifiaz@hotmail.com (Fiaz Idris)
Subject: Re: Extract data using Curl Unix Command & Perl Script from Webpage
Message-Id: <93c1947c.0404201937.503951bc@posting.google.com>
I happened to solve my original problem by using the following
Perl script. There are two problems with this script:
1) After about 90-100 times inside the loop, the loop doesn't
progress anymore but just waits. So I have to Ctrl+C the script
and use a new starting count and start again. And the same happens
again and again...
2) Occasionally the behaviour is uncertain.
Could someone point out what I should change in the script, or give
any other valuable advice? Thanks.
I am using cygwin on a windows machine with perl 5.8.2
Script
-------
#!/usr/bin/perl -w
use LWP::Simple;
use HTML::TableExtract;
use LWP::UserAgent;
my $browser = LWP::UserAgent->new;
for ($regno=2225700; $regno<=2230000; $regno=$regno+50) {
    sleep 5;
    print STDERR "$regno\n";
    print "\n";
    my $response = $browser->post(
        'http://www.chennaionline.com/msuniversity/result.asp',
        [
            'Codeid' => 'BA',
            'Exam_Registration_Number' => $regno
        ],
    );
    $curcontent = $response->{_content};
    my $all_te = new HTML::TableExtract( depth=>1, count=> 2 );
    my $all_tem = new HTML::TableExtract( depth=>1, count=> 3);
    #$all_te->parse_file("flt.txt");
    $all_te->parse($curcontent);
    $all_tem->parse($curcontent);
    foreach $ts ($all_te->table_states) {
        foreach $row($ts->rows) {
            for($i=0; $i<@$row; $i++) {
                my $temprow = $row->[$i];
                #print "***<$temprow>***\n";
                $temprow =~ s/^[\s\W\n]+(.*)\s+$/$1/g;
                #$temprow =~ s/$unknownchar//g;
                if ($temprow =~ /Registration/) { next; }
                if ($temprow =~ /Name/) { next; }
                if ($temprow =~ /College/) { next; }
                print "$temprow, ";
            }
            #print "\n"
        }
    }
    foreach $ts ($all_tem->table_states) {
        foreach $row($ts->rows) {
            for($i=0; $i<@$row; $i++) {
                my $temprow = $row->[$i];
                #print "***<$temprow>***\n";
                $temprow =~ s/^[\s\W\n]+(.*)\s+$/$1/g;
                #$temprow =~ s/$unknownchar//g;
                if ($temprow =~ /Subject/) { next; }
                if ($temprow =~ /Marks/) { next; }
                if ($temprow =~ /Result/) { next; }
                if ($temprow =~ /CONTROLLER/) { next; }
                print "$temprow, ";
            }
            #print "\n";
        }
    }
}
__END__
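The cause of the stall is not established in this post. One thing that may be worth ruling out is a request that never comes back: LWP::UserAgent's documented default timeout is 180 seconds, and the script never checks whether a request succeeded. The sketch below is only an assumption about what might help, not a confirmed fix. The helper name post_with_retry is hypothetical, and it accepts any object that behaves like an LWP::UserAgent, so the block itself loads no non-core modules. It also reads the body through the content method instead of the response object's internal _content slot.

```perl
use strict;
use warnings;

# Hypothetical helper: post a form and retry a few times on failure.
# $ua is expected to behave like an LWP::UserAgent object.
sub post_with_retry {
    my ($ua, $url, $form, $tries) = @_;
    $tries ||= 3;
    for my $attempt (1 .. $tries) {
        my $response = $ua->post($url, $form);
        return $response->content if $response->is_success;
        warn "attempt $attempt failed: ", $response->status_line, "\n";
        sleep 1 if $attempt < $tries;    # short pause before retrying
    }
    return;    # undef once every attempt has failed
}

# Intended use inside the loop above (needs LWP::UserAgent installed):
#   my $browser = LWP::UserAgent->new(timeout => 30);
#   my $html = post_with_retry($browser, $url,
#       [ Codeid => 'BA', Exam_Registration_Number => $regno ]);
#   next unless defined $html;    # skip this $regno and carry on
```

With a shorter timeout and an explicit success check, a slow or failed request should show up as a warning on STDERR rather than as a silent wait.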
------------------------------
Date: Tue, 20 Apr 2004 18:06:14 -0700
From: "Robin" <robin @ infusedlight.net>
Subject: Re: incorrect value for HOSTNAME executed as cron job
Message-Id: <c64j7j$m0m$3@reader2.nmix.net>
"Julian" <jrodri@HotPop.com> wrote in message
news:fd6c6323.0404201006.a2fce49@posting.google.com...
> Hi.
>
> I have a problem with a script in perl that obtain value of HOSTNAME
> environment variable.
>
> #!/usr/bin/perl
>
> # Autoflush
> $|=1;
>
> do("/home/sixsl/scripts/constantes.pl");
>
> $lock_dir="/var/dbsync";
>
> $host_actual=$ENV{HOSTNAME};
>
> ...
>
> When I executed the script manually from the shell it obtain correct
> value for hostname (harpo for this case). But when I put the script in
> the cron the perl obtain incorrect value, it gets localhost. Cron job
> is executed with the same user that I executed the script from the
> shell.
>
> The cron daemon is vixie cron. The perl version is 5.8.0. The Linux
> box is RedHat 9.0.
>
> Julian.
probably something with your crontabs.... or check the paths.
-Robin
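One likely explanation, offered as an assumption rather than a diagnosis: cron starts jobs with a much smaller environment than a login shell, so a HOSTNAME variable set up by the interactive shell may be missing or different there. A sketch that sidesteps the environment by asking the system directly through the core Sys::Hostname module; short_hostname is a made-up helper name:

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);    # core module

# Ask the system for its host name instead of trusting %ENV.
# hostname() may return a short or a fully qualified name,
# so keep only the part before the first dot.
# short_hostname() is a hypothetical helper name.
sub short_hostname {
    my ($short) = split /\./, hostname();
    return $short;
}

my $host_actual = short_hostname();
print "running on $host_actual\n";
```

Because this no longer reads $ENV{HOSTNAME}, it does not depend on what cron or the login shell put into the environment.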
------------------------------
Date: Wed, 21 Apr 2004 08:01:35 +0100
From: Dave Cross <dave@dave.org.uk>
Subject: Re: Is it possible to 'word wrap' lines just using RegEx?
Message-Id: <pan.2004.04.21.07.01.34.248930@dave.org.uk>
On Tue, 20 Apr 2004 09:16:38 +0100, chris-usenet wrote:
> W. D. <NewsGroups@us-webmasters.com> wrote:
>> Is there a way to split a line like the following, *ONLY* using a regular
>> expression?
[ snip ]
> (
> echo 'The quick brown fox jumped over the lazy sleeping dog. The rain in';
> echo 'Spain falls mainly on the plain.'
> ) |
> perl -ape 's/\s(in)\n/\n$1 /'
But that doesn't use only regular expressions. It uses the substitution
operator too :)
> I wonder if Text::Format is what you really want?
Or Text::Wrap.
Dave...
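For reference, a small sketch of the Text::Wrap route using the sample sentence from the quoted shell snippet. The 40-column width and the wrap_text helper name are arbitrary choices for illustration:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);    # core module

my $text = 'The quick brown fox jumped over the lazy sleeping dog. '
         . 'The rain in Spain falls mainly on the plain.';

# Text::Wrap keeps every output line shorter than $columns characters.
# wrap_text() is a hypothetical helper name.
sub wrap_text {
    my ($string, $columns) = @_;
    local $Text::Wrap::columns = $columns;
    return wrap('', '', $string);    # no indent on first or later lines
}

print wrap_text($text, 40), "\n";
```

The two empty strings are the indents for the first and the following lines; localising $Text::Wrap::columns keeps the setting from leaking into other callers.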
------------------------------
Date: Wed, 21 Apr 2004 07:01:33 +0100
From: Marco <hawkesm@on3etel.n5et.u9k>
Subject: Match Offset: length($`)
Message-Id: <Y0ohc.10$DK4.3@newsfe1-win>
I need the offset of a matched substring. I can run the match and
then get the length of $` ($PREMATCH), but the Camel Book says this
entails a performance loss for every other match in the program.
However, Benchmark::timethis shows that using $` is actually the
fastest way.
# Benchmark::timethis -- runs 108225.11 times per second
sub get_matched_offset_1 {
$text =~ /\b$keyword\b/;
length $`
}
# Benchmark::timethis -- runs 89766.61 times per second
sub get_matched_offset_2 {
$text =~ /(\b$keyword\b)/;
index $text, $1;
}
# Benchmark::timethis -- runs 72150.07 times per second
sub get_matched_offset_3 {
$text =~ /(.*?)\b$keyword\b/;
length $1;
}
I benchmarked each subroutine separately because I didn't want
get_matched_offset_1's use of $` to harm the performance of
the latter 2. Any opinions on how I should implement
get_matched_offset?
Marco
----------------------------------------------------
Please remove digits from e-mail address (tr/0-9//d)
------------------------------
Date: Wed, 21 Apr 2004 06:27:25 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Match Offset: length($`)
Message-Id: <x7n055ong2.fsf@mail.sysarch.com>
>>>>> "M" == Marco <hawkesm@on3etel.n5et.u9k> writes:
M> I need the offset of a matched substring. I can run the match and
M> then get the length of $` ($PREMATCH), but the Camel Book says this
M> entails a performance loss for every other match in the program.
M> However, Benchmark::timethis shows that using $` is actually the
M> fastest way.
M> # Benchmark::timethis -- runs 108225.11 times per second
M> sub get_matched_offset_1 {
M> $text =~ /\b$keyword\b/;
M> length $`
M> }
M> # Benchmark::timethis -- runs 89766.61 times per second
M> sub get_matched_offset_2 {
M> $text =~ /(\b$keyword\b)/;
M> index $text, $1;
that isn't guaranteed to be correct. what if $keyword appears inside
another earlier word? the index will find that and not the matched one.
have you looked at @- and @+ in perlvar? they do what you want.
M> I benchmarked each subroutine separately because I didn't want
M> get_matched_offset_1's use of $` to harm the performance of
M> the latter 2. Any opinions on how I should implement
M> get_matched_offset?
you didn't bypass the $` issue as you don't understand it. what it does
is force a complete copy of all strings matched in m// or s/// that don't
already have grabs (strings with grabs are already copied). this copy is
needed since the code doesn't know where $& will be used (since it is a
global). and since you didn't use s/// the use of $& is moot. but
regardless of the efficiency issues, using $& is bad perl coding as it is
a global and that is not nice.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: 21 Apr 2004 06:32:22 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: Match Offset: length($`)
Message-Id: <c654hl$83pu7$1@ID-231055.news.uni-berlin.de>
Also sprach Marco:
> I need the offset of a matched substring. I can run the match and
> then get the length of $` ($PREMATCH), but the Camel Book says this
> entails a performance loss for every other match in the program.
It does.
> However, Benchmark::timethis shows that using $` is actually the
> fastest way.
But does your benchmark also time other matches (where you don't need
$PREMATCH)? The slow-down means that even inconspicuous lines such as
if (/^\d+$/)
are now made capturing by perl.
The slow-down does not happen for matches that are capturing anyway. Try
it on non-capturing matches and see what happens.
> # Benchmark::timethis -- runs 108225.11 times per second
> sub get_matched_offset_1 {
> $text =~ /\b$keyword\b/;
> length $`
> }
>
> # Benchmark::timethis -- runs 89766.61 times per second
> sub get_matched_offset_2 {
> $text =~ /(\b$keyword\b)/;
> index $text, $1;
> }
>
> # Benchmark::timethis -- runs 72150.07 times per second
> sub get_matched_offset_3 {
> $text =~ /(.*?)\b$keyword\b/;
> length $1;
> }
>
> I benchmarked each subroutine separately because I didn't want
> get_matched_offset_1's use of $` to harm the performance of
> the latter 2. Any opinions on how I should implement
> get_matched_offset?
Have a look at @- and @+ in perlvar.pod. It even explains how to emulate
$`, $' and $& with these two arrays.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
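A sketch of what the @- suggestion could look like in practice. $-[0] holds the start offset of the last successful match, so neither $` nor a capture group is needed. The \Q...\E quoting is an addition not present in the original subs; it assumes $keyword is meant as literal text rather than as a pattern:

```perl
use strict;
use warnings;

# Offset of the first whole-word occurrence of $keyword in $text,
# or undef if there is none. $-[0] is the start of the last
# successful match and avoids the program-wide cost of $`.
sub get_matched_offset {
    my ($text, $keyword) = @_;
    return $text =~ /\b\Q$keyword\E\b/ ? $-[0] : undef;
}

my $text = 'concatenate the cat';
print get_matched_offset($text, 'cat'), "\n";    # 16
print index($text, 'cat'), "\n";                 # 3, inside "concatenate"
```

The second print illustrates the objection raised against the index() variant: index finds the keyword inside an earlier word, while the match offset points at the whole-word occurrence.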
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6436
***************************************