[31516] in Perl-Users-Digest

Perl-Users Digest, Issue: 2775 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jan 18 18:09:41 2010

Date: Mon, 18 Jan 2010 15:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 18 Jan 2010     Volume: 11 Number: 2775

Today's topics:
    Re: embedded Perl process in Apache using mod_perl <paduille.4061.mumia.w+nospam@earthlink.net>
    Re: embedded Perl process in Apache using mod_perl <hjp-usenet2@hjp.at>
    Re: embedded Perl process in Apache using mod_perl <ben@morrow.me.uk>
    Re: FAQ 9.16 How do I decode a CGI form? <paduille.4061.mumia.w+nospam@earthlink.net>
    Re: perl runtime model <pengyu.ut@gmail.com>
    Re: perl runtime model <pengyu.ut@gmail.com>
    Re: perl runtime model <smallpond@juno.com>
    Re: perl runtime model sln@netherlands.com
    Re: perl runtime model <sreservoir@gmail.com>
    Re: perl runtime model <pengyu.ut@gmail.com>
    Re: perl runtime model <sreservoir@gmail.com>
    Re: perl runtime model <jurgenex@hotmail.com>
    Re: search and replace in Perl <paduille.4061.mumia.w+nospam@earthlink.net>
    Re: search and replace in Perl <tadmc@seesig.invalid>
    Re: search and replace in Perl <tadmc@seesig.invalid>
    Re: Subroutines and $_[0] <willem@stack.nl>
    Re: Subroutines and $_[0] sln@netherlands.com
    Re: Subroutines and $_[0] <uri@StemSystems.com>
    Re: Subroutines and $_[0] sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 18 Jan 2010 12:49:37 -0600
From: "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net>
Subject: Re: embedded Perl process in Apache using mod_perl
Message-Id: <wbSdnU5yluMmKcnWnZ2dnUVZ_gGdnZ2d@earthlink.com>

On 01/18/2010 10:03 AM, ccc31807 wrote:
> Could someone give a simple and short explanation of what it means to
> say that mod_perl embeds a Perl process in Apache?
> 
> I understand that classical CGI passes requests for a script to Perl,
> which fires up a Perl process, runs the script, returns a value, and
> shuts down. I also understand that FastCGI does the same thing except
> that the Perl process doesn't start and stop with each request but
> remains running in the background.
> 
> However, I don't understand the mechanism of an embedded Perl process in
> mod_perl, and the question seems so elementary that the documentation
> doesn't address it. Or maybe I just haven't read the right
> documentation.
> 
> Thanks, CC.

mod_perl adds itself to Apache's "core." Apache, the web server, can now
run Perl code. This means that it's no longer necessary for Apache to
invoke an external process (slow) to interpret Perl code. When Apache
needs to do so, it calls /itself/.

That arrangement allows for a dramatic speedup in processing Perl 
scripts while adding considerable complexity as well. Apache's "core" 
isn't simple.



------------------------------

Date: Mon, 18 Jan 2010 21:21:33 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: embedded Perl process in Apache using mod_perl
Message-Id: <slrnhl9gmd.tdc.hjp-usenet2@hrunkner.hjp.at>

On 2010-01-18 16:03, ccc31807 <cartercc@gmail.com> wrote:
> Could someone give a simple and short explanation of what it means to
> say that mod_perl embeds a Perl process in Apache?
>
> I understand that classical CGI passes requests for a script to Perl,
> which fires up a Perl process, runs the script, returns a value, and
> shuts down.

That's at least ambiguously formulated.

> I also understand that FastCGI does the same thing except
> that the Perl process doesn't start and stop with each request but
> remains running in the background.
>
> However, I don't understand the mechanism of an embedded Perl process in
> mod_perl, and the question seems so elementary that the documentation
> doesn't address it.


In CGI, the web server 

 * sets some environment variables from the request header (see 
   RFC 3875, section 4.1 for details)
 * starts the CGI program as a new process, with stdin and stdout
   connected to the server (usually via pipes).
 * If there is a request body (e.g. in a POST or PUT request) it is sent 
   to the stdin of the newly created process.
 * The response of the CGI program is read from the pipe connected to the
   program's stdout and (with minor changes) passed on to the client.

The CGI program may or may not be written in Perl. The server doesn't
know or care - it does not invoke a perl interpreter, it invokes the CGI
program (on Unix systems the kernel then determines the type of the 
executable and invokes a suitable interpreter if necessary).
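
As a rough illustration of the steps above, here is a minimal sketch of what
the CGI program's side could look like when it happens to be written in Perl.
build_response() is a made-up helper name, and the plain-text reply format is
only an assumption for the example; the environment variable names are the
ones defined by the CGI specification.

```perl
#!/usr/bin/perl
# Minimal CGI sketch: the server passes request metadata in %ENV
# and the request body (if any) on STDIN; the reply goes to STDOUT.
use strict;
use warnings;

# Build the response text from the CGI environment and the body.
# build_response() is a hypothetical helper, not part of any library.
sub build_response {
    my ($env, $body) = @_;
    my $method = $env->{REQUEST_METHOD} // 'GET';
    my $query  = $env->{QUERY_STRING}   // '';
    return "Content-Type: text/plain\r\n\r\n"
         . "method=$method\n"
         . "query=$query\n"
         . "body_bytes=" . length($body) . "\n";
}

# Read no more than CONTENT_LENGTH bytes of request body from STDIN.
my $length = $ENV{CONTENT_LENGTH} || 0;
my $body   = '';
read(STDIN, $body, $length) if $length > 0;

# Whatever is printed here is what the server reads from the pipe.
print build_response(\%ENV, $body);
```

Run from a shell without a web server, this simply prints the header and the
default values, which is a handy way to see that the program is an ordinary
process and nothing more.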

In FastCGI:

 * The FCGI program runs as an independent server process which listens
   on a (unix or tcp) socket for requests.
 * Startup and shutdown of this process are often managed by the web
   server, but not necessarily.
 * The web server uses a special protocol to send requests to and read
   responses from the FCGI process.

Again, a FastCGI program may be written in any language, the web server
doesn't care. In fact, the FastCGI program may even run on a different
host!

In mod_perl:

 * The web server has an embedded perl interpreter, i.e., it is linked
   with libperl and knows how to call this interpreter (this is a
   (C) function call, not a program invocation).
 * The perl interpreter has access to an API, so that it can get
   at the request and response data.
 * At or before the first request, the web server/perl interpreter
   compiles the Perl script (possibly including some wrapper code),
   so we now have one or more compiled perl subs/methods.
 * For each request, the perl bytecode interpreter runs the appropriate 
   sub/method.

Here the web server does know that the code is written in Perl - it
needs to because it needs to interpret it itself.
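
To make the "compile once, run per request" idea concrete, here is a rough
simulation in plain Perl. It does not use Apache or the mod_perl API at all:
handler_for() and serve() are made-up names, and wrapping the script source
in an anonymous sub via eval is only an assumption about what such wrapper
code might look like.

```perl
#!/usr/bin/perl
# Simulation only: plain Perl, no Apache or mod_perl API involved.
use strict;
use warnings;

my %compiled;             # script name => compiled code reference
my $compile_count = 0;    # how often we actually had to compile

# Wrap the script source in an anonymous sub and compile it once.
sub handler_for {
    my ($name, $source) = @_;
    return $compiled{$name} ||= do {
        $compile_count++;
        my $code = eval "sub { $source }";
        die "compile error in $name: $@" unless $code;
        $code;
    };
}

# "Serve" a request by calling the already compiled sub.
sub serve {
    my ($name, $source, @request) = @_;
    return handler_for($name, $source)->(@request);
}

# The "script" is just a string of Perl source here.
my $script = 'my %req = @_; return "Hello, $req{user}\n";';

print serve('hello.pl', $script, user => 'alice');
print serve('hello.pl', $script, user => 'bob');
print "compiled $compile_count time(s)\n";
```

The second request reuses the code reference cached under the script's name,
so the compilation cost is paid only on the first request.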

	hp



------------------------------

Date: Mon, 18 Jan 2010 20:23:36 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: embedded Perl process in Apache using mod_perl
Message-Id: <8jse27-h1h.ln1@osiris.mauzo.dyndns.org>


Quoth ccc31807 <cartercc@gmail.com>:
> Could someone give a simple and short explanation of what it means to
> say that mod_perl embeds a Perl process in Apache?

See perlembed. In simple terms, perl is built as a library (usually
called libperl.so), which provides functions to create and manipulate
Perl interpreters. perl(1) itself is a very simple program that links
libperl, and uses it to create, run and destroy a single interpreter.
mod_perl is an Apache DSO that also links libperl, and generally creates
more than one interpreter. (One consequence of this is that if you were
to delete /usr/bin/perl, mod_perl wouldn't be affected.)

Ben



------------------------------

Date: Mon, 18 Jan 2010 12:51:27 -0600
From: "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net>
Subject: Re: FAQ 9.16 How do I decode a CGI form?
Message-Id: <wbSdnUlyluMhKcnWnZ2dnUVZ_gEAAAAA@earthlink.com>

On 01/18/2010 09:55 AM, Helmut Richter wrote:
> On Sat, 16 Jan 2010, Mumia W. wrote:
> [...]
>> The changes I made to your program were modest (no real changes). I placed a
>> copy here:
>> http://home.earthlink.net/~mumia.w.18.spam/docs/try-binary1.txt
> 
> Could you please leave it there for a while? Or should I get my own copy?
> 
> Thank you.
> 

I can leave it there.



------------------------------

Date: Mon, 18 Jan 2010 12:50:07 -0800 (PST)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: Re: perl runtime model
Message-Id: <30addf21-b36d-44f5-85ca-700fc4ab8d8d@o28g2000yqh.googlegroups.com>

On Jan 18, 1:03 pm, "Jochen Lehmeier" <OJZGSRPBZ...@spammotel.com>
wrote:
> On Mon, 18 Jan 2010 16:01:03 +0100, Peng Yu <pengyu...@gmail.com> wrote:
> > Could somebody let me know a reference on the runtime of a perl
> > script? I.e how a perl script run?
>
> If you are asking for the internal execution details, then you have to
> read the source code (of perl, not your script). It's fun!
>
> Everything else is explained to total exhaustion in "man perl" (also fun
> to read through).

'man perl' lists a lot of documents. Would you please be specific on
which one I should read?

> > What parts make a perl script be slower than an equivalent, say, C
> > programe?
>
> Google "interpreted vs compiled".

I don't see a webpage specifically listing what language features in
perl may cause a perl script to be slower than a C program doing the
same. Here, "same" means that given the same input the output is the
same. That is, I don't care what language features are used in a perl
script, as long as the perl script is the fastest among all perl scripts
that can produce the same output from the same input.

Would you please point me to a webpage?

> Plus, just for your information, your question is pure flame bait - don't
> be irritated if you get heated answers.



------------------------------

Date: Mon, 18 Jan 2010 13:03:12 -0800 (PST)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: Re: perl runtime model
Message-Id: <fbbb679e-681c-4b67-8849-1e940f078ff1@r24g2000yqd.googlegroups.com>

On Jan 18, 9:55 am, smallpond <smallp...@juno.com> wrote:
> On Jan 18, 10:01 am, Peng Yu <pengyu...@gmail.com> wrote:
>
> > Could somebody let me know a reference on the runtime of a perl
> > script? I.e how a perl script run? What parts make a perl script be
> > slower than an equivalent, say, C programe?
>
> What do you mean by equivalent?

Two programs are "equivalent" if and only if given the same input, the
output is the same. That is, what is used in the two programs does not
matter.

Now, given the same input and the same output, theoretically, we can
have a fastest perl program and a fastest C program that are
equivalent. "fastest" is measured in terms of the runtime.

It is generally understood that this fastest perl program is slower
than this fastest C program in a lot of cases. I want to understand
why it is so and what features in perl cause perl to be slow in these
cases.

> Perl scalar variables have runtime type checking and conversion.  A
> "string" in C is just a pointer to memory with no checking, so of
> course the code runs much faster.

Are you sure it is only the runtime type checking that makes perl slower?

> If you wrote a C library that supported Perl scalar data types it's
> not clear that C would be much faster.  And C doesn't have hash
> or array data types at all.

I have no interest in blaming perl for being slow. Explaining in what
aspects perl is not slow is not relevant to my OP.


------------------------------

Date: Mon, 18 Jan 2010 13:24:18 -0800 (PST)
From: smallpond <smallpond@juno.com>
Subject: Re: perl runtime model
Message-Id: <53e72801-f441-4b08-bdb0-dd07b85d264b@a15g2000yqm.googlegroups.com>

On Jan 18, 4:03 pm, Peng Yu <pengyu...@gmail.com> wrote:
> On Jan 18, 9:55 am, smallpond <smallp...@juno.com> wrote:
>
> > On Jan 18, 10:01 am, Peng Yu <pengyu...@gmail.com> wrote:
>
> > > Could somebody let me know a reference on the runtime of a perl
> > > script? I.e how a perl script run? What parts make a perl script be
> > > slower than an equivalent, say, C programe?
>
 ... snip ...
>
> I have no interest in blaming perl is slow. It is not relevant to my
> OP to explain in what aspect perl is not slow.

In that case I don't understand your question.


------------------------------

Date: Mon, 18 Jan 2010 13:32:43 -0800
From: sln@netherlands.com
Subject: Re: perl runtime model
Message-Id: <mhj9l59v8cj9ed724ig5nhfc9odl20vv9r@4ax.com>

On Mon, 18 Jan 2010 13:03:12 -0800 (PST), Peng Yu <pengyu.ut@gmail.com> wrote:

>On Jan 18, 9:55 am, smallpond <smallp...@juno.com> wrote:
>> On Jan 18, 10:01 am, Peng Yu <pengyu...@gmail.com> wrote:
>>
>> > Could somebody let me know a reference on the runtime of a perl
>> > script? I.e how a perl script run? What parts make a perl script be
>> > slower than an equivalent, say, C programe?
>>
>> What do you mean by equivalent?
>
>Two programs are "equivalent" if and only if given the same input, the
>output is the same. That is, what are used in the two programs do not
>matter.
>
>Now, given the same input and the same output, theoretically, we can
>have a fastest perl program and a fastest C program that are
>equivalent. "fastest" is measured in terms of the runtime.
>
>It is generally understand that this fastest perl program is slower
>than this fastest C program in a lot of cases. I want to understand
>why it is so and what features in perl cause perl be slow in these
>cases.
>
>> Perl scalar variables have runtime type checking and conversion.  A
>> "string" in C is just a pointer to memory with no checking, so of
>> course the code runs much faster.
>
>Are you sure it is only the runtime type check make perl slower.
>
>> If you wrote a C library that supported Perl scalar data types it's
>> not clear that C would be much faster.  And C doesn't have hash
>> or array data types at all.
>
>I have no interest in blaming perl is slow. It is not relevant to my
>OP to explain in what aspect perl is not slow.

Assembly has operations that can do assignments, addition,
subtraction, and bitwise shifts (multiplication/division), set
flags, and take the contents of one variable and move it to
another variable. It has hardwired variables and access to
memory locations, which can be variables or hold data.
It has a few real stacks, push/pop etc, and pseudo stacks.

These can all be used to efficiently process data, move it
around, etc..

There is a language below assembly, called microcode. Each assembly
instruction executes several microcode instructions. Below microcode
are flip/flop gates and electron plasma. Below plasma there are quarks
and sub-atomic particles and the quantum physics realm.

This is the language tree up to assembly. Obviously, sub-atomic
particles react faster than assembly instructions. The reason is that
assembly requires organization of the particles into electron gates (cpu),
the gates facilitate microcode, which facilitate assembly.

More and more overhead is required for each level of organization (language)
the further up you go.

Finally, you reach the point of a dynamic language called your desktop: it is
you dragging things around, scheduling tasks and doing many things in your
multi-threaded world. Every day, you dynamically program and run your
own world.

So you see that on the low level, while things are fast, not too much
can be done with limited organization. The more you want done, the more
dynamic a language is, the more organization it takes, and the more time
it consumes, as with any higher-level complexity.

End of PhD thesis!!

-sln


------------------------------

Date: Mon, 18 Jan 2010 16:57:19 -0500
From: sreservoir <sreservoir@gmail.com>
Subject: Re: perl runtime model
Message-Id: <hj2lgc$qdb$1@speranza.aioe.org>

On 1/18/2010 3:50 PM, Peng Yu wrote:
> I don't see a webpage specifically listing what language features in
> perl may cause a perl script be slower than a C program doing the
> same. Here, "same" means that given the same input the output is the
> same. That is, I don't care what language features are used in a perl
> script, as long as the perl script is fastest among all perl scripts
> that can produce the same output from the same input.

the short version: everything not directly related to your problem.

perl does a lot of stuff not directly related to your problem whenever
you run something. it handles memory, and scoping, and it takes a few
cycles at the beginning and end to compile and clean up after itself.

your c program isn't doing this stuff. your c program can just ignore
everything irrelevant.

the real problem comes when you get input you're not expecting: in c,
you might get a segfault or you might get a silent buffer overflow. in
perl, you'll get a nice warning message that tells you what went wrong.
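
a small sketch of the perl side of that last point, assuming a read past the
end of a three-element array as the "unexpected" case. read_past_end() and
warnings_from() are made-up names, and the exact warning text depends on the
perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one element past the end of a three-element array and use it.
sub read_past_end {
    my @buffer = (10, 20, 30);
    my $value  = $buffer[5];       # no crash: the result is simply undef
    return "value=" . $value;      # using undef here triggers a warning
}

# Run a code reference and return the warnings it produced.
sub warnings_from {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    $code->();
    return @caught;
}

my $result;
my @caught = warnings_from(sub { $result = read_past_end() });

print "result: '$result'\n";
print "perl warned: $_" for @caught;
```

the bookkeeping that lets perl notice the undefined value is part of the work
a c program doing the same read would simply skip.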

-- 

   "Six by nine. Forty two."
   "That's it. That's all there is."
   "I always thought something was fundamentally wrong with the universe"


------------------------------

Date: Mon, 18 Jan 2010 14:12:23 -0800 (PST)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: Re: perl runtime model
Message-Id: <867486e8-6a20-4cb1-8362-6f962952b621@36g2000yqu.googlegroups.com>

On Jan 18, 3:57 pm, sreservoir <sreserv...@gmail.com> wrote:
> On 1/18/2010 3:50 PM, Peng Yu wrote:
>
> > I don't see a webpage specifically listing what language features in
> > perl may cause a perl script be slower than a C program doing the
> > same. Here, "same" means that given the same input the output is the
> > same. That is, I don't care what language features are used in a perl
> > script, as long as the perl script is fastest among all perl scripts
> > that can produce the same output from the same input.
>
> the short version: everything not directly related to your problem.
>
> perl does a lot of stuff not directly related to your problem whenever
> you rune something. it handles memory, and scoping, and it takes a few
> cycles at the beginning and end to compiled and clean up after itself.
>
> your c program isn't doing this stuff. your c program can just ignore
> everything irrelevant.
>
> the real problem comes when you get input you're not expecting: in c,
> you might get a segfault or you might get a silent buffer overflow. in
> perl, you'll get a nice warning message that tells you what went wrong.

Your understanding of output is more limited than I meant. A warning
message is also considered 'output' in my original message. Would
you please reconsider my question?


------------------------------

Date: Mon, 18 Jan 2010 17:27:29 -0500
From: sreservoir <sreservoir@gmail.com>
Subject: Re: perl runtime model
Message-Id: <hj2n8u$u7f$1@speranza.aioe.org>

On 1/18/2010 5:12 PM, Peng Yu wrote:
> On Jan 18, 3:57 pm, sreservoir<sreserv...@gmail.com>  wrote:
>> On 1/18/2010 3:50 PM, Peng Yu wrote:
>>
>>> I don't see a webpage specifically listing what language features in
>>> perl may cause a perl script be slower than a C program doing the
>>> same. Here, "same" means that given the same input the output is the
>>> same. That is, I don't care what language features are used in a perl
>>> script, as long as the perl script is fastest among all perl scripts
>>> that can produce the same output from the same input.
>>
>> the short version: everything not directly related to your problem.
>>
>> perl does a lot of stuff not directly related to your problem whenever
>> you rune something. it handles memory, and scoping, and it takes a few
>> cycles at the beginning and end to compiled and clean up after itself.
>>
>> your c program isn't doing this stuff. your c program can just ignore
>> everything irrelevant.
>>
>> the real problem comes when you get input you're not expecting: in c,
>> you might get a segfault or you might get a silent buffer overflow. in
>> perl, you'll get a nice warning message that tells you what went wrong.
>
> Your understanding of output is more limited than I meant. Warning
> message is also considered as 'output' in my original message. Would
> you please reconsider my question?

if it were to have the same input and the same output in every
conceivable situation, it's not likely to be significantly faster.

and stderr is distinct from stdout, but that's not very relevant.

-- 

   "Six by nine. Forty two."
   "That's it. That's all there is."
   "I always thought something was fundamentally wrong with the universe"


------------------------------

Date: Mon, 18 Jan 2010 14:59:23 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: perl runtime model
Message-Id: <vmp9l5heatf637t5s1k5ij405ji5t6i46m@4ax.com>

Peng Yu <pengyu.ut@gmail.com> wrote:
>On Jan 18, 9:55 am, smallpond <smallp...@juno.com> wrote:
>> On Jan 18, 10:01 am, Peng Yu <pengyu...@gmail.com> wrote:
>>
>> > Could somebody let me know a reference on the runtime of a perl
>> > script? I.e how a perl script run? What parts make a perl script be
>> > slower than an equivalent, say, C programe?
>>
>> What do you mean by equivalent?
>
>Two programs are "equivalent" if and only if given the same input, the
>output is the same. That is, what are used in the two programs do not
>matter.
>
>Now, given the same input and the same output, theoretically, we can
>have a fastest perl program and a fastest C program that are
>equivalent. "fastest" is measured in terms of the runtime.
>
>It is generally understand that this fastest perl program is slower
>than this fastest C program in a lot of cases. I want to understand
>why it is so and what features in perl cause perl be slow in these
>cases.

C is a low-level language, much closer to native assembler code. 
Perl is a high-level language providing much more powerful features to
the programmer. Of course those features come at a price. 

>> Perl scalar variables have runtime type checking and conversion.  A
>> "string" in C is just a pointer to memory with no checking, so of
>> course the code runs much faster.
>
>Are you sure it is only the runtime type check make perl slower.

No. The main difference is obviously that Perl is interpreted while C is
compiled.
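
As a small illustration of the runtime conversion mentioned in the quoted
text above, here is a sketch showing one scalar being treated as a number or
as a string depending on the operator. as_number() and as_string() are
made-up helper names; the point is only that perl decides and converts while
the program runs, which is work a C program with fixed types does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Force numeric interpretation of a scalar. The 'numeric' warning is
# silenced because partly numeric strings are used on purpose below.
sub as_number {
    my ($v) = @_;
    no warnings 'numeric';
    return $v + 0;
}

# Force string interpretation of a scalar.
sub as_string {
    my ($v) = @_;
    return "$v";
}

my $mixed = "42abc";

print as_number($mixed) + 1, "\n";          # leading digits are used: 43
print as_string(10) . as_string(5), "\n";   # concatenation: 105
print "10" + "5", "\n";                     # addition of two strings: 15
```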

jue


------------------------------

Date: Mon, 18 Jan 2010 13:15:42 -0600
From: "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net>
Subject: Re: search and replace in Perl
Message-Id: <wbSdnUhyluMgKcnWnZ2dnUVZ_gGdnZ2d@earthlink.com>

On 01/18/2010 11:40 AM, Dominic Philsby wrote:
> Hi, I'm using Perl to do simple text search & replace within a text
> file. The Perl version, sample file, and commandline syntax I am using
> is shown below.
> 
> 
> C:\test>
> C:\test>
> C:\test>type file.txt
> the quick brown cow jumps over the lazy horse
> C:\test>
> C:\test>
> C:\test>perl  -p -e "s/cow/fox/g;s/horse/dog/g" file.txt
> the quick brown fox jumps over the lazy dog
> C:\test>
> C:\test>
> C:\test>perl -v
> 
> This is perl, v5.6.1 built for MSWin32-x86
> 
> Copyright 1987-2001, Larry Wall
> 
> Perl may be copied only under the terms of either the Artistic License
> or the
> GNU General Public License, which may be found in the Perl 5 source
> kit.
> 
> Complete documentation for Perl, including FAQ lists, should be found
> on
> this system using `man perl' or `perldoc perl'.  If you have access to
> the
> Internet, point your browser at http://www.perl.com/, the Perl Home
> Page.
> 
> 
> C:\test>
> C:\test>
> C:\test>
> 
> 
> My file.txt document contains only one line but the real files are
> several hundred thousand lines.
> The words I am changing are not just "cow" and "horse" but hundreds of
> words.
> 
> I am using Windows.
> 
> In my commandline program, my question is rather than specifying "s/
> cow/fox/g;s/horse/dog/g" on the commandline, I want to reference a
> file containing this. In otherwords, I want my commandline program to
> reference a text file, lets call it regexReplace.txt, containing the
> following
> 
> s/cow/fox/g;
> s/horse/dog/g;
> 
> Can someone help me out with the syntax or how to do this?
> Thank you
> 
> Dominic

Here is an example:

#!/usr/bin/perl
use strict;
use warnings;
use Fatal qw/open close/;

my ($fh, %regmap);

open $fh, '<', 'data/cow-regexp.txt'; # Notice Fatal above.
while (my $mapline = <$fh>) {
     next if !($mapline =~ /(\w+);(\w+)/);
     $regmap{$1} = $2;
}
close $fh;

my $regex = join('|',keys %regmap); # use alternation in regexp.

open $fh, '<', 'data/cow-file.txt';
while (my $dataline = <$fh>) {
     $dataline =~ s/($regex)/$regmap{$1}/g;
     print $dataline;
}
close $fh;

--------------------------
The file cow-regexp.txt looks like this:

cow;fox
horse;dog

I hope this helps.


------------------------------

Date: Mon, 18 Jan 2010 13:30:14 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: search and replace in Perl
Message-Id: <slrnhl9dfr.jd3.tadmc@tadbox.sbcglobal.net>

Dominic Philsby <dominicphilsby@googlemail.com> wrote:
> Hi, I'm using Perl to do simple text search & replace within a text
> file. The Perl version, sample file, and commandline syntax I am using
> is shown below.
>
>
> C:\test>
> C:\test>
> C:\test>type file.txt
> the quick brown cow jumps over the lazy horse
> C:\test>
> C:\test>
> C:\test>perl  -p -e "s/cow/fox/g;s/horse/dog/g" file.txt
> the quick brown fox jumps over the lazy dog

> In my commandline program, my question is rather than specifying "s/
> cow/fox/g;s/horse/dog/g" on the commandline, I want to reference a
                                               ^^^^^^
> file containing this. 


You might consider adjusting what you want.

If this was my job to do, I'd use a hash and a single regular 
expression (see: perldoc -q "many regular expressions"):

---------------------
#!/usr/bin/perl
use warnings;
use strict;

my %subst = (
    cow   => 'fox',
    horse => 'dog',
);

my $pattern = join '|', sort {$b cmp $a} keys %subst;

$_ = "the quick brown cow jumps over the lazy horse\n";
s/($pattern)/$subst{$1}/g;
print;
---------------------


Loading %subst from a file ought to be a trivial change.
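
A minimal sketch of that change, assuming the mapping file holds one
"from;to" pair per line (the format used elsewhere in this thread).
load_subst() is a made-up name, and an in-memory filehandle stands in for
the real file so the example is self-contained.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Parse "from;to" lines into a hash; blank or malformed lines are skipped.
sub load_subst {
    my ($fh) = @_;
    my %subst;
    while (my $line = <$fh>) {
        chomp $line;
        my ($from, $to) = split /;/, $line, 2;
        next unless defined $from && length $from && defined $to;
        $subst{$from} = $to;
    }
    return %subst;
}

# Stand-in for the mapping file. For a real file, open it by name
# instead, e.g.:  open my $fh, '<', 'regexReplace.txt' or die ...
my $mapping = "cow;fox\nhorse;dog\n";
open my $fh, '<', \$mapping or die "cannot open mapping: $!";
my %subst = load_subst($fh);
close $fh;

# quotemeta keeps keys containing regex metacharacters literal.
my $pattern = join '|', sort { $b cmp $a } map { quotemeta } keys %subst;

$_ = "the quick brown cow jumps over the lazy horse\n";
s/($pattern)/$subst{$1}/g;
print;
```

Wrapping the last three lines in a loop over the data file's lines gives the
whole-file version.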


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"


------------------------------

Date: Mon, 18 Jan 2010 15:18:53 -0600
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: search and replace in Perl
Message-Id: <slrnhl9jri.jl0.tadmc@tadbox.sbcglobal.net>

Mumia W. <paduille.4061.mumia.w+nospam@earthlink.net> wrote:

[ snip matching many patterns at once ]

> my $regex = join('|',keys %regmap); # use alternation in regexp.


I once lost about 3 hours because I did it that way, so let me help
others avoid such a fate...

If you have a pattern that is a prefix of some other pattern, say "cow"
and "cows", then you better do something to ensure that the longer
one is leftmost in your regex's alternation.

I usually just do a sort in descending order:

    my $regex = join('|', sort {$b cmp $a} keys %regmap);

This program outputs "Cows" when it should be "COWS":

--------------------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %regmap;
while (my $mapline = <DATA>) {
     next if !($mapline =~ /(\w+);(\w+)/);
     $regmap{$1} = $2;
}

my $regex = join('|',keys %regmap); # use alternation in regexp.

$_ = "the quick brown cow jumps over the lazy cows\n";
s/($regex)/$regmap{$1}/g;
print;

__DATA__
cow;Cow
cows;COWS
--------------------------------------------


The order is dependent on the order given by keys(); with my
5.10 perl, it makes

    cow|cows

you should ensure that it is instead

    cows|cow

as my code line above does.
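
An alternative sketch that states the intent more directly: sort the keys by
length, longest first, and escape them with quotemeta in case a key contains
regex metacharacters. build_pattern() is a made-up name, and this is just one
way to do it; the descending sort shown above already handles the prefix case.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Build one capturing alternation with the longest keys first,
# escaping any regex metacharacters in the keys.
sub build_pattern {
    my (%map) = @_;
    my $alt = join '|',
        map  { quotemeta }
        sort { length($b) <=> length($a) or $a cmp $b }
        keys %map;
    return qr/($alt)/;
}

my %regmap  = (cow => 'Cow', cows => 'COWS');
my $pattern = build_pattern(%regmap);

$_ = "the quick brown cow jumps over the lazy cows\n";
s/$pattern/$regmap{$1}/g;
print;
```

With "cows" ahead of "cow" in the alternation, the last word comes out
as "COWS".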


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"


------------------------------

Date: Mon, 18 Jan 2010 19:34:28 +0000 (UTC)
From: Willem <willem@stack.nl>
Subject: Re: Subroutines and $_[0]
Message-Id: <slrnhl9du4.1vv8.willem@turtle.stack.nl>

George wrote:
) Willem wrote:
)> Uri Guttman wrote:
)> ) or even this so you can declare $title and make sure it is set to
)> ) something useful
)> )
)> ) my $title = $_[0] =~ m{...}) ? $1 : '' ;
)> )
)> ) but i still can't see why he had to save the value in a lexical to make
)> ) it work. i think there is unpasted code that affects things.
)> 
)> Because otherwise $_[0] is an alias for $1, and the next regular
)> expression will change the value of $1, and therefore the value of $_[0] ?
)> 
)> This is a pretty basic perl gotcha, you know.
)> 
)> 
)> SaSW, Willem
) I am actually not doing any substitutions, simply checking whether there 
) is a pattern match or not. I cut down the subroutine to:

 <snip>
)
) But, the print statement on $_[0] prints foobar - could you please 
) explain why this is the case?

You're passing $1 as an argument.
That makes $_[0] an alias for $1.
On the next regexp match, $1 gets the value of its first paren match.
The value of $_[0], being an alias for $1, therefore also gets this value.

I hope I explained it clearly enough this time.

Code:

perl -e 'my $x = "foobarbaz"; my $y = "fefifofum";
$x =~ /(......)/; foofun($1);
sub foofun { print "$_[0]\n"; $y =~ /(......)/; print "$_[0]\n" }'

Result:

foobar
fefifo

Clear now ?


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Mon, 18 Jan 2010 11:52:14 -0800
From: sln@netherlands.com
Subject: Re: Subroutines and $_[0]
Message-Id: <10e9l55hf5cvlnvsfuuj3qgihp2p53s0or@4ax.com>

On Mon, 18 Jan 2010 18:06:03 +0000, George <me@me.com> wrote:

>Willem wrote:
>> Uri Guttman wrote:
>> ) or even this so you can declare $title and make sure it is set to
>> ) something useful
>> )
>> ) my $title = $_[0] =~ m{...}) ? $1 : '' ;
>> )
>> ) but i still can't see why he had to save the value in a lexical to make
>> ) it work. i think there is unpasted code that affects things.
>> 
>> Because otherwise $_[0] is an alias for $1, and the next regular
>> expression will change the value of $1, and therefore the value of $_[0] ?
>> 
>> This is a pretty basic perl gotcha, you know.
>> 
>> 
>> SaSW, Willem
>I am actually not doing any substitutions, simply checking whether there 
>is a pattern match or not. I cut down the subroutine to:
>
>sub check_url {
>
>print "Orginal 1: ",$_[0],"\n";
>my $test = "foobar12";
>print "Test before: ",$test,"\n";
>$test =~ m/([a-z]{1,})[0-9]{1,}/;
>print "Test 2: ",$test,"\n";
>print "Original argument: ",$_[0], "\n";
>
>}
>
>Now, if I run it the original argument (first print statement) prints as
>  <!--
>             -->
>     <div class="listing_results_logo"><a 
>href="http://www.test.com/hayles-accountants.html"><img border="0" 
>src="http://www.test.com/template/default-siva/images/noimage.gif" 
>alt="Hayles Accountants" /></a></div>
>     <div class="listing_results_listing">
>         <div class="listing_results_rating"></div>
>         <div class="listing_results_title"><a 
>href="http://www.test.com/hayles-accountants.html"><span 
>class="listing_default">Hayles Accountants</span></a>   </div>
>         <div class="listing_results_address">
>
>                 <!-- new code added by PMD GFX -->
>                 Boston Rd,<br />                Hanwell,<br /> 
>                  Middlesex    W7 3TT           <!-- end -->
>                 <br /><br />
>
>         </div>
>
>         <div class="listing_results_description">
>                      </div>
>     </div>
>
>which is correct as well as the value of $test which is foobar12. My 
>understanding is that the next line will only check if the test variable 
>matches the pattern and will not make any changes to it, so the next 
>print statement is correct as well (am I right here in thinking that it 
>will just check if there is a match and not change the original variable)?
>
>But, the print statement on $_[0] prints foobar - could you please 
>explain why this is the case?

It does NOT print 'foobar'.  In your code $_[0] does not alias the $test
variable! Even if it did alias $test, you did not do any substitution
in your regex, so it would still be foobar12 if $_[0] did alias $test,
which it doesn't.

The elements of @_ do not alias $1 unless $1 is passed in as a parameter:

myfunc($1);

sub myfunc {
  # when $1 is passed in, $_[0] becomes an alias for $1
  print $_[0];
  'asdfasdf' =~ /asd(fas)df/;   # any successful match sets $1 again
  print $_[0];                  # now shows the new $1
}

$1 aliased in myfunc() is readonly and is subject to change
upon the next successful regular expression match.

So, passing in the $<n> variables when using a sub's @_ elements
directly is not a good idea. The same holds true if passing in a
temporary like myfunc("this").

But none of this is your problem. I think in your desperation you
are typing/changing test lines so fast you are not even sure
of what you are seeing.

You can alias all you want, but it's something you should read
up on a little more.

-sln
--------------
use strict;
use warnings;

my $string = "this is 999 a string";

check_match ($string);

my $str = "howdy all";
check_substitution ($str);
check_substitution ("howdy all ");

exit 0;

###
sub check_match
{
	print "\nOrginal 1: ",$_[0],"\n";
	my $test = "foobar12";
	print "Test before: ",$test,"\n";
	$test =~  /([a-z]{1,})[0-9]{1,}/;
	print "\$test = $test  ,,  \$1 = $1\n";
	print "Original argument: ",$_[0], "\n";
	check_match2 ($1);
}

sub check_match2
{
	print "\nOrginal 1: ",$_[0],"\n";
	my ($test)  = $_[0] =~ /([a-z]{1,3})[0-9]*/;
	print "\$test = $test  ,,  \$1 = $1\n";
	print "Original argument: ",$_[0], "\n";
}

sub check_substitution
{
	print "\nOrginal 1: ",$_[0],"\n";
	$_[0] =~ s/[a-z]{1,3}//;
	print "Original argument: ",$_[0], "\n";
}
__END__

Orginal 1: this is 999 a string
Test before: foobar12
$test = foobar12  ,,  $1 = foobar
Original argument: this is 999 a string

Orginal 1: foobar
$test = foo  ,,  $1 = foo
Original argument: foo

Orginal 1: howdy all
Original argument: dy all

Orginal 1: howdy all
Modification of a read-only value attempted at hh.pl line 38.



------------------------------

Date: Mon, 18 Jan 2010 15:25:01 -0500
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Subroutines and $_[0]
Message-Id: <87r5pnkvoi.fsf@quad.sysarch.com>

>>>>> "JB" == John Bokma <john@castleamber.com> writes:

  JB> Since $_[ 0 ] is an alias for $1, you modify $_[ 0 ] if you modify $1 in
  JB> your regexp.

i see it now. i was thinking about it backwards, as in $1 is readonly and
if you pass it and then modify it, you get errors. this is the case
where you alias $1 and it gets set (not modified) by the grab and so the
alias is also set to its new value.

yes, and the solution is to not do that! :)
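
one way to "not do that", as a rough sketch: snapshot $1 into a lexical
right after the match and pass the lexical instead. the sub name
arg_after_inner_match() is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sub for illustration: it runs its own match and then
# reports what its first argument looks like afterwards.
sub arg_after_inner_match {
    "xyz" =~ /(y)/ or die "inner match failed";   # sets $1 inside the sub
    my $seen = $_[0];                             # read the argument via the alias
    return $seen;
}

"a" =~ /(.*)/ or die "outer match failed";

my $copy     = $1;                                # snapshot taken right after the match
my $via_copy = arg_after_inner_match($copy);      # alias points at $copy, still "a"

# Passing $1 itself makes $_[0] an alias for the capture variable; the
# thread reports that the inner match then shows through the alias.
my $via_alias = arg_after_inner_match($1);

print "via copy:  $via_copy\n";
print "via alias: $via_alias\n";
```

the copy costs one assignment and takes the regex engine out of the
picture for the rest of the call.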

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Mon, 18 Jan 2010 12:36:59 -0800
From: sln@netherlands.com
Subject: Re: Subroutines and $_[0]
Message-Id: <7sf9l5h0iic2t25mcq4lhvogk9ckomh18u@4ax.com>

On Mon, 18 Jan 2010 19:43:31 +0100, "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com> wrote:

>On Mon, 18 Jan 2010 18:40:42 +0100, Uri Guttman <uri@stemsystems.com>
>wrote:
>
>> [...]
>
>Look what happened when I played around with the OP's case yesterday. I  
>thought I'd ignore what happened then, but I find it interesting, actually.
>
># perl -v
>
>This is perl, v5.8.8 built for i486-linux-gnu-thread-multi
>
># perl -e '
>"a" =~ m/(.*)/;
>print "before: $1\n";
>fn($1);
>print "after: $1\n";
>
>sub fn
>{
>     print "inside 1: @_\n";
>     "b" =~ m//;                  # !!!!!
              ^^
I thought this used to be an error since it
matches nothing. Apparently, the construct of
target/regex operator/pattern parses into
a function call where @_ is loaded with the target/regex and
passed to the engine. Upon seeing m//, it doesn't seem to restore
(unwind) the caller's @_.

     "b" =~ /sdfds/;

works, even though it doesn't match.
Try this:

sub fn
{
     print "inside 1: $_[0]\n";
#     "b" =~ m//;                  # !!!!!
     my $jj = 'c';
     $jj =~ s///;                  # !!!!!
     print "inside 2: $_[0]\n";
     print "inside 3: $jj\n";
}
before: a
inside 1: a
inside 2: c
inside 3:
after: c

sub fn
{
     print "inside 1: $_[0]\n";
#     "b" =~ m//;                  # !!!!!
     my $jj = 'c';
     $jj =~ s/g//;                  # !!!!!
     print "inside 2: $_[0]\n";
     print "inside 3: $jj\n";
}
before: a
inside 1: a
inside 2: a
inside 3: c
after: a


>     print "inside 2: @_\n"
>}'
>
>before: a
>inside 1: a
>inside 2: b
>after: b               # !!!!!
>
># perl -e '
>"a" =~ m/(.*)/;
>print "before: $1\n";
>fn($1);
>print "after: $1\n";
>
>sub fn
>{
>     print "inside 1: @_\n";
>     "b" =~ m/(.*)/;                  # !!!!!
               ^^^^
Works correctly even on a non-match like
      "b" =~ /pkj/;

>     print "inside 2: @_\n"
>}'
>
>before: a
>inside 1: a
>inside 2: b
>after: a               # !!!!!
>
>First, some pertinent lines from the documentation:
>
>perlvar on $1...: "These variables are all read-only and dynamically  
>scoped to the current BLOCK."
>perlop: m// uses "the last successfully matched regular expression".
>perlsub: "The array @_ is a local array, but its elements are aliases for  
>the actual scalar parameters."
>
>So, in the first test, m// matches and uses the old regexp m/(.*)/. It  
>sets $1 from the point of the view of the sub *and* of the caller. ***This  
>contradicts the scoping to the current BLOCK*** if I'm not mistaken.
>
>In the second test, m/(.*)/ matches. It sets $1 from the point of view of  
>the sub (it's not in the code but we can assume that ;-) ) ****and also  
>$_[0]**** which is aliased to $1 (in the caller). ***But it does not set  
>$1*** from the point of view of the caller. ***How does it know that $_[0]  
>is $1?***
>
>There is something very funny going on, which I would definitely not have  
>expected.
>
>I think one could explain the first effect (that m// overwrites the  
>original $1) by assuming that the $1 is actually linked to the regexp, and  
>by re-using the old regexp, is overwritten. Though this is not documented.  
>The second escapes me.
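
The perlop rule quoted above (an empty pattern reuses the last
successfully matched regex) can be checked in isolation. This is only
a small sketch with made-up strings, not the OP's code; it compares
m// with m/(?:)/, which is a non-empty pattern that matches the
empty string.

```perl
use strict;
use warnings;

"abc123" =~ /\d+/ or die "setup match failed";   # last successful pattern: /\d+/

# Empty pattern: perlop says the last successfully matched regex is
# reused, so this is really "xyz" =~ /\d+/ and it fails.
my $reused = ("xyz" =~ m//) ? 1 : 0;

# A pattern that is not the empty string but still matches the empty
# string; this one succeeds on any input.
my $explicit = ("xyz" =~ m/(?:)/) ? 1 : 0;

print "m//     on 'xyz': $reused\n";
print "m/(?:)/ on 'xyz': $explicit\n";
```

So m// is not "match nothing"; what it does depends on whichever
pattern last matched successfully.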

-sln


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2775
***************************************

