[22319] in Perl-Users-Digest
Perl-Users Digest, Issue: 4540 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 10 19:02:06 2003
Date: Mon, 10 Feb 2003 15:56:40 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 10 Feb 2003 Volume: 10 Number: 4540
Today's topics:
Re: Re-running DBI 'prepare' on same statement handle <nobull@mail.com>
Re: Re-running DBI 'prepare' on same statement handle (John Ramsden)
Re: Re-running DBI 'prepare' on same statement handle (Andrew Perrin (CLists))
Re: Re-running DBI 'prepare' on same statement handle <nobull@mail.com>
Re: Re-running DBI 'prepare' on same statement handle <rereidy@indra.com>
Re: Re-running DBI 'prepare' on same statement handle <goldbb2@earthlink.net>
Re: Re-running DBI 'prepare' on same statement handle <goldbb2@earthlink.net>
Reference issues in Class::Struct (Flamesplash)
Re: Reference issues in Class::Struct <nobull@mail.com>
Re: Reference issues in Class::Struct <flamesplash@nospam.yahoo.com>
Re: reference to abc.pl?a=b does not display as html in <spamtrap@nowhere.com>
Regex difficulty (Allan Cady)
Re: Regex difficulty <jurgenex@hotmail.com>
Re: Regex difficulty <abigail@abigail.nl>
Re: Regex difficulty <egwong@netcom.com>
Re: Regex difficulty (Allan Cady)
Re: Regex difficulty (Allan Cady)
Re: Regex difficulty <abigail@abigail.nl>
Re: Regex difficulty (Tad McClellan)
Re: Regex difficulty (Tad McClellan)
Re: Regex difficulty (Allan Cady)
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 07 Feb 2003 17:12:13 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <u9wukcuiaa.fsf@wcl-l.bham.ac.uk>
john_ramsden@sagitta-ps.com (John Ramsden) writes:
> Subject: Re-running DBI 'prepare' on same statement handle
> What concerns me slightly though is that if the same statement
> handle is assigned the value returned by a new prepare, will the
> memory associated with the preceding prepare be freed?
The essence of your problem is much simpler and not specific to DBI or
prepare.
my $foo = something_returning_an_object_reference(1);
# ...do some stuff...
$foo = something_returning_an_object_reference(2);
You are concerned about the first object getting destroyed.
Don't worry, this is Perl not C++.
Things in Perl are reference counted. As soon as $foo no longer
points to the first object then the reference count will drop to zero
and it will get GCed. Assuming, of course, you've not got any other
references hanging around.
There are a few exceptional object classes that don't GC neatly
because they contain circular unweakened references. I'm fairly sure
that DBI statement handles are not among them.
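To make the circular-reference caveat concrete, here is a minimal sketch. The Node class and the @log array are made up for illustration; only weaken() from the core Scalar::Util module is real. It shows a cycle that is not freed when its variables go out of scope, and the same cycle being freed once one link is weakened.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @log;    # names of destroyed objects, in order

package Node;    # hypothetical class, for illustration only
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @log, $self->{name} }

package main;

{
    # Two nodes that point at each other: a reference cycle.
    my $x = Node->new('strong-x');
    my $y = Node->new('strong-y');
    $x->{peer} = $y;
    $y->{peer} = $x;
}
# Each node is still referenced by its peer, so neither DESTROY has run.
my $after_strong = scalar @log;

{
    my $x = Node->new('weak-x');
    my $y = Node->new('weak-y');
    $x->{peer} = $y;
    $y->{peer} = $x;
    weaken($y->{peer});    # the back-reference no longer counts
}
# This pair was freed as soon as the block ended.
my $after_weak = scalar @log;

print "destroyed after strong cycle: $after_strong\n";
print "destroyed after weak cycle:   $after_weak\n";
```

The strong pair is only cleaned up at program exit, which is the kind of leak the paragraph above is warning about.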
> In practice this re-prepare shouldn't happen very often during
> the lifetime of the program; but if there is some way I can
> explicitly ditch any resources associated with a prepare
> before rerunning it, I'd feel more comfortable.
>
> (I tried $sth->free(); but that function name isn't recognized.
What makes you think that it should be?
> Would either of 'undef $sth' or '$sth->destroy()' work?)
Yes, undef($sth) would work.
As, indeed, would $sth='banana' but it wouldn't be as idiomatic.
Either would move the destruction of the old object to happen before
the construction of the new one rather than after it.
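The ordering difference can be seen with a small stand-in class. This is only a sketch: Handle and @events are hypothetical names, not anything from DBI, and the class merely records when objects are built and destroyed.

```perl
use strict;
use warnings;

my @events;    # construction/destruction log

package Handle;    # hypothetical stand-in for a statement handle
sub new {
    my ($class, $id) = @_;
    push @events, "new $id";
    return bless { id => $id }, $class;
}
sub DESTROY { my ($self) = @_; push @events, "destroy $self->{id}" }

package main;

# Plain reassignment: the new object is built first, then the old one is released.
my $h = Handle->new(1);
$h = Handle->new(2);

# Explicit undef first: the old object is released before the new one is built.
undef $h;
$h = Handle->new(3);

print "$_\n" for @events;
```

Either way the old object is destroyed; the explicit undef only changes whether that happens before or after the replacement exists.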
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: 7 Feb 2003 09:37:12 -0800
From: john_ramsden@sagitta-ps.com (John Ramsden)
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <d27434e.0302070937.24b0f15d@posting.google.com>
Ron Reidy <rereidy@indra.com> wrote in message news:<3E439E18.8090301@indra.com>...
> See below ...
>
> John Ramsden wrote:
> > I have a monitoring app which needs to store statement handles
> > for DBI $dbh->prepare() calls, but on occasion the WHERE clause
> > of some SQL statements will change and when this happens the app
> > must rerun every prepare().
> >
> > [...]
> >
> > What concerns me slightly though is that if the same statement
> > handle is assigned the value returned by a new prepare, will the
> > memory associated with the preceding prepare be freed? Or will
> > it hang around, and possibly cause problems when the statement
> > is executed?
> >
> > [...]
> >
> > (I tried $sth->free(); but that function name isn't recognized.
> > Would either of 'undef $sth' or '$sth->destroy()' work?)
>
> Try $sth->finish
Thanks. Yes, I wondered about that. But isn't finish() more for
deallocating resources after the statement has been executed?
In particular, can't you do the following sequence, which implies
that finish() doesn't clear up the prepare data?:
prepare
execute
:::
finish
execute
:::
finish
:::
Cheers
John Ramsden (john_ramsden@sagitta-ps.com and jr@adslate.com)
------------------------------
Date: 07 Feb 2003 12:47:10 -0500
From: clists@perrin.socsci.unc.edu (Andrew Perrin (CLists))
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <843cn0kmox.fsf@perrin.socsci.unc.edu>
Brian McCauley <nobull@mail.com> writes:
> john_ramsden@sagitta-ps.com (John Ramsden) writes:
>
> > Subject: Re-running DBI 'prepare' on same statement handle
>
> [snip]
> You are concerned about the first object getting destroyed.
>
I understood the OP to be worried about the database statement handle
getting destroyed, not the perl object that points to it. It's true
that any method of undefining $sth will make sure the perl object is
destroyed, but I'm not sure it's guaranteed that the DBD will close
the handle correctly (is it?). If not, then an:
$sth->finish;
should do the trick.
--
----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
------------------------------
Date: 07 Feb 2003 18:11:50 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <u9el6kufix.fsf@wcl-l.bham.ac.uk>
clists@perrin.socsci.unc.edu (Andrew Perrin (CLists)) writes:
> Brian McCauley <nobull@mail.com> writes:
>
> > john_ramsden@sagitta-ps.com (John Ramsden) writes:
> >
> > > Subject: Re-running DBI 'prepare' on same statement handle
> >
> > [snip]
> > You are concerned about the first object getting destroyed.
> >
>
> I understood the OP to be worried about the database statement handle
> getting destroyed, not the perl object that points to it. It's true
> that any method of undefining $sth will make sure the perl object is
> destroyed, but I'm not sure it's guaranteed that the DBD will close
> the handle correctly (is it?).
Well "guaranteed" would be a strong word - but if it didn't then it
would definitely be a bug.
The documentation of the finish() method really labours the point that
you almost never need to call finish(). I take this to mean that the
authors of DBI intended that active statement handles could simply be
destroyed in the same way as inactive ones. Not that such evidence is
really needed - in an OO mindset the idea that destruction works OK is
the null-hypothesis - only exceptions to this rule require explicit
documentation.
> If not, then an:
>
> $sth->finish;
>
> should do the trick.
Definitely not. If there's a resource leak in the DBD then it won't
be likely to make any difference if the finish() is explicit or implicit.
Anyhow $sth->finish only frees up resources (both in Perl and the DB
server) that pertain to an incompletely processed result set. It
returns the statement handle to the inactive state. There will still
be resources (again on both) allocated to the inactive statement handle
and these will not be freed until destruction of the statement handle.
To free up all resources[1]:
undef $sth;
[1] Unless, of course, the handle came from prepare_cached().
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: Fri, 07 Feb 2003 16:16:11 -0700
From: Ron Reidy <rereidy@indra.com>
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <3E443E3B.1050708@indra.com>
John Ramsden wrote:
> Ron Reidy <rereidy@indra.com> wrote in message news:<3E439E18.8090301@indra.com>...
>
>>See below ...
>>
>>John Ramsden wrote:
>>
[ snip ]
>>
>>Try $sth->finish
>
>
> Thanks. Yes, I wondered about that. But isn't finish() more for
> deallocating resources after the statement has been executed?
Kind of. finish() is normally used to clean up a statement handle when
it (the sth) is not needed **before** all data is retrieved. Memory
resources are freed and in particular (in the case of Oracle), the SGA
and PGA memory resources are freed (not too sure about this - I read it
somewhere a long time ago).
As this may apply to your situation, <NOT_TESTED>$sth->finish would
allow you to create a new cursor and assign it to the statement handle.
A possibility exists wherein opening a new cursor on a non-freed
statement handle could cause a memory leak (there are JDBC drivers that
do this with Oracle I am told).</NOT_TESTED>
>
> In particular, can't you do the following sequence, which implies
> that finish() doesn't clear up the prepare data?:
>
> prepare
>
> execute
> :::
> finish
>
> execute
> :::
> finish
>
> :::
>
> Cheers
>
> John Ramsden (john_ramsden@sagitta-ps.com and jr@adslate.com)
------------------------------
Date: Fri, 07 Feb 2003 23:38:33 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <3E4489C9.DBADE70A@earthlink.net>
John Ramsden wrote:
> Ron Reidy <rereidy@indra.com> wrote in message [snip]
> > Try $sth->finish
>
> Thanks. Yes, I wondered about that. But isn't finish() more for
> deallocating resources after the statement has been executed?
When you've fetched all of the records from a statement handle, the
resources used by it are automatically deallocated, so that it's just
like a handle which has been prepared but not executed.
However, if you fetch part of the data from that query, but stop before
reaching the end, the resources of the partially-fetched query are not
freed, until either ->finish is called on the handle (which, like
fetching all of the data, returns it to a pre-execute()ed state), or
else until all references to the handle go out of scope (usually by
doing undef $sth, or by assigning something else to $sth).
> In particular, can't you do the following sequence, which implies
> that finish() doesn't clear up the prepare data?:
[snip]
Quite right, what you show can indeed be done. However, a
prepared-but-not-executed statement handle usually takes up relatively
little memory. Further, the "finish"es you have in your example aren't
needed if, after every execute, you fetch all the rows that the handle
has to provide for you.
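As a rough sketch of the prepare / execute / finish / execute sequence discussed above: the following assumes DBI and DBD::SQLite are installed (neither is mentioned in the thread for this purpose; an in-memory SQLite database is used only so the example is self-contained), and it skips itself if they are not. The table t and its contents are invented for illustration.

```perl
use strict;
use warnings;

# Assumption: DBI and DBD::SQLite are installed; skip quietly if they are not.
unless (eval { require DBI; require DBD::SQLite; 1 }) {
    print "DBI or DBD::SQLite not available, skipping\n";
    exit 0;
}

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE t (n INTEGER)');
$dbh->do('INSERT INTO t (n) VALUES (?)', undef, $_) for 1 .. 3;

my $sth = $dbh->prepare('SELECT n FROM t WHERE n >= ? ORDER BY n');

# First run: fetch every row, so no finish() is needed.
$sth->execute(1);
my @all;
while (my ($n) = $sth->fetchrow_array) { push @all, $n }

# Second run on the same prepared handle: stop after one row, then finish().
$sth->execute(2);
my ($first) = $sth->fetchrow_array;
$sth->finish;

# The prepared statement survives finish() and can be executed again.
$sth->execute(3);
my ($again) = $sth->fetchrow_array;
$sth->finish;

print "all: @all; first: $first; again: $again\n";

undef $sth;    # releases the statement handle itself
$dbh->disconnect;
```

Only the second and third runs stop early, so those are the only places where finish() does any work.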
--
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"
------------------------------
Date: Fri, 07 Feb 2003 23:52:15 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: Re-running DBI 'prepare' on same statement handle
Message-Id: <3E448CFF.D9CDA3A@earthlink.net>
John Ramsden wrote:
>
> I have a monitoring app which needs to store statement handles
> for DBI $dbh->prepare() calls, but on occasion the WHERE clause
> of some SQL statements will change and when this happens the app
> must rerun every prepare().
If you use prepare_cached all over, then it would be safe to run the
prepare every time, and let the handle go out of scope after each use.
DBI itself will keep the old handles for you to avoid needing to
*actually* re-prepare them, and will of course realize when your SQL has
changed and prepare new ones as needed.
And, if your SQL changes back to what it had originally been, then DBI
will still have the original statement handles cached to give back to
you.
> What concerns me slightly though is that if the same statement
> handle is assigned the value returned by a new prepare,
This doesn't happen. The new prepare creates a new statement handle.
If you happen to store it in a variable which already held a statement
handle, that's not important or relevant.
> will the memory associated with the preceding prepare be freed?
Mere prepare()d statements generally don't use much memory; statements
which have been prepared, executed, but not completely fetched use more.
Anyway, assigning something else to the variable, whether it be a
string, an integer, an undef, a reference to a (hash, array, scalar),
or another statement handle, will decrease the refcount of whatever had
been there.
When the refcount of the old handle goes to zero, its DESTROY gets
called, and when DESTROY returns, the perl-level data structures it used
get their refcounts decreased. Presumably, the DESTROY method will make
calls into the database backend, and this will free up the memory used.
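The sequence described here (DESTROY runs first, and only afterwards are the object's own members released) can be sketched with two toy classes. Outer and Inner are hypothetical names; a real DBD would do its backend cleanup where the comment indicates.

```perl
use strict;
use warnings;

my @events;    # what happened, in order

package Inner;    # hypothetical member object
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { push @events, 'Inner destroyed' }

package Outer;    # hypothetical owner, playing the role of the old handle
sub new { my ($class) = @_; return bless { inner => Inner->new }, $class }
sub DESTROY {
    my ($self) = @_;
    push @events, 'Outer DESTROY entered';
    # Members are still alive here; a driver would free backend resources now.
    push @events, 'inner still alive' if defined $self->{inner};
}

package main;

my $obj = Outer->new;
$obj = 'something else';    # refcount of the Outer object drops to zero

print "$_\n" for @events;
```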
> Or will it hang around, and possibly cause problems when the
> statement is executed?
If the refcount didn't go to zero, and the old statement wasn't
DESTROYed, this won't do any harm to the new sth stored into that
variable -- it is, after all, an entirely separate handle.
If the old sth gets DESTROYed, but the DESTROY method is buggy, and
doesn't free up the memory used by the backend, then it's possible that
eventually the Oracle server or whatever will run out of memory.
> In practice this re-prepare shouldn't happen very often during
> the lifetime of the program; but if there is some way I can
> explicitly ditch any resources associated with a prepare
> before rerunning it, I'd feel more comfortable.
If you want, you could do:
$sth = undef;
$sth = $dbh->prepare( .... );
This will force the old handle's refcount to be decreased *before* the
new handle is created.
> (I tried $sth->free(); but that function name isn't recognized.
> Would either of 'undef $sth' or '$sth->destroy()' work?)
'undef $sth' would work.
--
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"
------------------------------
Date: 7 Feb 2003 09:14:47 -0800
From: flamesplash@yahoo.com (Flamesplash)
Subject: Reference issues in Class::Struct
Message-Id: <3386196d.0302070914.38f880b9@posting.google.com>
Here's the code
--------
#!/usr/local/bin/perl
use Class::Struct;
struct foo =>
{
bar => '@',
};
$f = foo->new();
$barRef = $f->{bar};
foreach ( @$f )
{}
foreach ( @($f->{bar} ) )
{}
---------
which gives the following error
Scalar found where operator expected at ./tmp.pl line 16, at end of
line
(Missing operator before ?)
syntax error at ./tmp.pl line 16, near "@($f"
Execution of ./tmp.pl aborted due to compilation errors.
[5] - Exit 2 ./tmp.pl
It complains about the second foreach not the first. It also gives
this error if I do @$f->{bar} ( which I would consider plain
ambiguous. )
Am I using the reference incorrectly, or is this because I'm using
Perl 5.004 (which is unfortunately not my fault)?
Thanks
------------------------------
Date: 07 Feb 2003 17:24:18 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: Reference issues in Class::Struct
Message-Id: <u9smv0uhq5.fsf@wcl-l.bham.ac.uk>
flamesplash@yahoo.com (Flamesplash) writes:
> Subject: Reference issues in Class::Struct
I know nothing about Class::Struct. This doesn't matter as your
question doesn't seem to be about Class::Struct.
> foreach ( @($f->{bar} ) )
> syntax error at ./tmp.pl line 16, near "@($f"
You appear to have () where you should have {}.
> It also gives this error if I do @$f->{bar} ( which I would consider
> plain ambiguous. )
No, it's plain wrong. It is disambiguated by the rule that @ binds
tighter than the ->.
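A short sketch of the brace form, using a plain hash reference as a stand-in (an assumption for illustration; it is not a Class::Struct object, and the data is invented):

```perl
use strict;
use warnings;

# A plain hash reference standing in for the object in the question.
my $f = { bar => [ 'a', 'b', 'c' ] };

# @ binds tighter than ->, so @$f->{bar} would mean (@$f)->{bar}, which
# tries to treat $f as an array. Braces make the intended grouping explicit.
my @items = @{ $f->{bar} };

my @seen;
foreach my $item ( @{ $f->{bar} } ) {
    push @seen, uc $item;
}

print scalar(@items), " items: @seen\n";
```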
> Am I using the reference incorrectly,
Yes, correct use of references can be found in perlref/"Using References"
> or is this because I'm using Perl 5.004 ( which is unfortunately not
> my fault )
No.
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: Sat, 8 Feb 2003 22:15:00 -0800
From: "shane" <flamesplash@nospam.yahoo.com>
Subject: Re: Reference issues in Class::Struct
Message-Id: <b24h1p$qip$1@bob.news.rcn.net>
> > syntax error at ./tmp.pl line 16, near "@($f"
>
> You appear to have () where you should have {}.
You are so correct. My font made {} look like () on the Class::Struct
manual site. sigh.
Thank You.
------------------------------
Date: Fri, 07 Feb 2003 20:39:37 GMT
From: Andrew Lee <spamtrap@nowhere.com>
Subject: Re: reference to abc.pl?a=b does not display as html in N6
Message-Id: <8c684vc1v10o1rb0h5t46uplnltfb0qh08@4ax.com>
On Wed, 05 Feb 2003 12:25:24 -0700, "G.G. Campbell"
<campbell@cira.colostate.edu> wrote:
>I have a perl script which returns html text.
>But N6 interprets the following address as perl
>and asks whether to start perl.
>
>http://isccp.cira.colostate.edu/scripts/monmat.pl?yrmn=8505
>
>This works in IE6.
>
>The perl script it self begins with
>print "Content-type: text/html\n\n";
>
>please reply to campbell@cira.colostate.edu
>as I do not visit here often.
I assure you, I email you even less often than you visit here.
------------------------------
Date: 8 Feb 2003 17:09:43 -0800
From: allancady@yahoo.com (Allan Cady)
Subject: Regex difficulty
Message-Id: <d563b154.0302081709.6d2fb3ba@posting.google.com>
I posted a few days ago about wanting to do some transformations on
HTML files... since the "right" solution to my problem will require me
to master several new aspects of Perl, I've been making do with
regular expressions for now. But this bit has me stumped, and it's
something I think I should be able to do.
I want to get rid of tag pairs like this:
<A ...><IMG SRC="theimage.gif" ...></A>
In other words, all anchors that use theimage.gif as their image.
Here's what I thought would work:
$data =~ s{<A .*?theimage.*?/A>}{}sgi;
I thought that the .*? minimal match operators would limit the match
to what's between the innermost <a></a> tags. But not so... it
matches further out, both ahead and beyond, to other anchor tags.
A similar thing works if there's only a single .*? within the regex,
but with two, it works differently.
Is there a rule here that I need to learn?
As for a workaround, I can get the result I want with this:
$data =~ s{(.*)(<A .*?theimage.*?/A>)(.*)}{\1\3}sgi;
but this slows my script down by like a factor of 5.
Help please? Thanks.
-Allan
------------------------------
Date: Sun, 09 Feb 2003 02:04:29 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Regex difficulty
Message-Id: <NGi1a.6838$F25.1864@nwrddc02.gnilink.net>
Allan Cady wrote:
> I posted a few days ago about wanting to do some transformations on
> HTML files... since the "right" solution to my problem will require me
> to master several new aspects of Perl, I've been making do with
> regular expressions for now.
[...]
> Help please? Thanks.
Help you shooting yourself in the foot? I don't think so.
You will be better off using the right solution.
And people will be much more willing to help you, too.
jue
------------------------------
Date: 09 Feb 2003 03:05:51 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Regex difficulty
Message-Id: <slrnb4bhcf.sb.abigail@alexandra.abigail.nl>
Allan Cady (allancady@yahoo.com) wrote on MMMCDXLIX September MCMXCIII in
<URL:news:d563b154.0302081709.6d2fb3ba@posting.google.com>:
\\ I posted a few days ago about wanting to do some transformations on
\\ HTML files... since the "right" solution to my problem will require me
\\ to master several new aspects of Perl, I've been making do with
\\ regular expressions for now. But this bit has me stumped, and it's
\\ something I think I should be able to do.
\\
\\ I want to get rid of tag pairs like this:
\\
\\ <A ...><IMG SRC="theimage.gif" ...></A>
\\
\\ In other words, all anchors that use theimage.gif as their image.
\\
\\ Here's what I thought would work:
\\
\\ $data =~ s{<A .*?theimage.*?/A>}{}sgi;
\\
\\ I thought that the .*? minimal match operators would limit the match
\\ to what's between the innermost <a></a> tags. But not so... it
\\ matches further out, both ahead and beyond, to other anchor tags.
\\
\\ A similar thing works if there's only a single .*? within the regex,
\\ but with two, it works differently.
I'm confused. Compared to having two .*?, does it work similar, or
different if there's only one .*?
\\ Is there a rule here that I need to learn?
Yes. The most important rule is "if there is a possible match, the
regex will not fail". That's more important than minimal matching.
Also, leftmost matching is more important than minimal matching.
So, if your text is:
Text <A name = "foo"><IMG SRC = "fnord"></A> More text
<A name = "bar"><IMG SRC = "theimage.gif"></A> End
The pattern will match from the first "<A" to the last "</A>".
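The effect can be checked directly. This sketch applies the original pattern to sample text like the one above; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = qq{Text <A name = "foo"><IMG SRC = "fnord"></A> More text\n}
         . qq{<A name = "bar"><IMG SRC = "theimage.gif"></A> End};

# What the pattern actually captures: it starts at the leftmost "<A " and the
# first .*? stretches as far as it must to reach "theimage".
my ($matched) = $text =~ m{(<A .*?theimage.*?/A>)}si;

# The substitution therefore removes both anchors and the text between them.
(my $stripped = $text) =~ s{<A .*?theimage.*?/A>}{}sgi;

print "matched:  $matched\n";
print "stripped: $stripped\n";
```

The match begins at the "foo" anchor even though only the "bar" anchor contains theimage, because the leftmost starting position that can succeed wins.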
\\ As for a workaround, I can get the result I want with this:
\\
\\ $data =~ s{(.*)(<A .*?theimage.*?/A>)(.*)}{\1\3}sgi;
\\
\\ but this slows my script down by like a factor of 5.
\\
\\ Help please? Thanks.
You might want to try something like:
s{<A[^>]*>[^<]*<IMG[^>]*theimage[^>]*>[^<]*</A>}{}sgi;
but that assumes no other elements in the anchor to be present.
And it's also untested.
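A quick trial of that pattern on two invented samples, offered only as a sketch (it says nothing about real-world HTML): the first sample has one anchor to keep and one to drop, the second wraps the image in another element, which is the stated limitation.

```perl
use strict;
use warnings;

my $pattern = qr{<A[^>]*>[^<]*<IMG[^>]*theimage[^>]*>[^<]*</A>}si;

my $text = qq{Text <A name = "foo"><IMG SRC = "fnord"></A> More text\n}
         . qq{<A name = "bar"><IMG SRC = "theimage.gif"></A> End};
(my $stripped = $text) =~ s{$pattern}{}g;

# An anchor with another element around the image is left alone.
my $nested = '<A href="x"><B><IMG SRC="theimage.gif"></B></A>';
(my $nested_out = $nested) =~ s{$pattern}{}g;

print "$stripped\n";
print "$nested_out\n";
```

Because [^>]* and [^<]* cannot cross a tag boundary, the match can no longer start in one anchor and end in another.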
Abigail
--
perl -Mstrict='}); print "Just another Perl Hacker"; ({' -le1
------------------------------
Date: Sun, 09 Feb 2003 06:09:35 -0000
From: Eric Wong <egwong@netcom.com>
Subject: Re: Regex difficulty
Message-Id: <v4bs4vr5mjrg94@corp.supernews.com>
Allan Cady <allancady@yahoo.com> wrote:
> I posted a few days ago about wanting to do some transformations on
> HTML files... since the "right" solution to my problem will require me
> to master several new aspects of Perl, I've been making do with
> regular expressions for now. But this bit has me stumped, and it's
> something I think I should be able to do.
>
> I want to get rid of tag pairs like this:
>
> <A ...><IMG SRC="theimage.gif" ...></A>
>
> In other words, all anchors that use theimage.gif as their image.
>
> Here's what I thought would work:
>
> $data =~ s{<A .*?theimage.*?/A>}{}sgi;
[cut]
You can simplify the problem by making two passes over your file. First,
remove the <img> tags, next, remove the empty <a></a>'s.
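A minimal sketch of the two-pass idea, under the assumption that an anchor left with nothing but whitespace inside should go. Note that this would also delete anchors that were empty to begin with (for example named anchors used as link targets), so it is an illustration rather than a general fix.

```perl
use strict;
use warnings;

my $text = qq{Text <A name = "foo"><IMG SRC = "fnord"></A> More text\n}
         . qq{<A name = "bar"><IMG SRC = "theimage.gif"></A> End};

# Pass 1: drop the unwanted <IMG> tags.
(my $out = $text) =~ s{<IMG\b[^>]*theimage[^>]*>}{}gi;

# Pass 2: drop anchors that now contain nothing but whitespace.
$out =~ s{<A\b[^>]*>\s*</A>}{}gi;

print "$out\n";
```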
------------------------------
Date: 8 Feb 2003 23:59:03 -0800
From: allancady@yahoo.com (Allan Cady)
Subject: Re: Regex difficulty
Message-Id: <d563b154.0302082359.4465c22a@posting.google.com>
"Jürgen Exner" <jurgenex@hotmail.com> wrote...
> Help you shooting yourself in the foot? I don't think so.
> You will be better off using the right solution.
> And people will be much more willing to help you, too.
Wow, I'm sure glad somebody's looking out for me by not helping me!
:-)
(Your previous reply, referring me to HTML::Parse, was indeed helpful
and duly noted. I read the FAQ, and I will pursue it further soon.)
But I think my question is a perfectly legitimate one about how Perl
handles minimal matching. If I hadn't told you my motivation, would
you have been more willing to help me?
-Allan
------------------------------
Date: 9 Feb 2003 01:04:51 -0800
From: allancady@yahoo.com (Allan Cady)
Subject: Re: Regex difficulty
Message-Id: <d563b154.0302090104.ae20260@posting.google.com>
Abigail <abigail@abigail.nl> wrote...
> I'm confused. Compared to having two .*?, does it work similar, or
> different if there's only one .*?
It seems to work differently.
If I say this:
$data =~ s{<A .*?</A>}{}g;
and my text is this:
SomeStuff<A HREF="link1">Text1</A>MoreStuff<A
HREF="link2">Text2</A>YetMoreStuff
After the substitution, I'm left with this:
SomeStuffMoreStuffYetMoreStuff
rather than this:
SomeStuffYetMoreStuff
In other words, it substituted out the stuff inside the <A></A> pairs,
but not between them. It wasn't greedier than I wanted, which is the
problem I had when there were two .*? in the expression.
> The most important rule is "if there is a possible match, the
> regex will not fail". That's more important than minimal matching.
> Also, leftmost matching is more important than minimal matching.
I think the explanation is in those statements. I'll have to ponder
it a bit more.
> You might want to try something like:
>
> s{<A[^>]*>[^<]*<IMG[^>]*theimage[^>]*>[^<]*</A>}{}sgi;
Thanks, I'll take a look at that too.
------------------------------
Date: 09 Feb 2003 12:14:44 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Regex difficulty
Message-Id: <slrnb4chhk.1ml.abigail@alexandra.abigail.nl>
Allan Cady (allancady@yahoo.com) wrote on MMMCDXLIX September MCMXCIII in
<URL:news:d563b154.0302090104.ae20260@posting.google.com>:
;; Abigail <abigail@abigail.nl> wrote...
;; > I'm confused. Compared to having two .*?, does it work similar, or
;; > different if there's only one .*?
;;
;; It seems to work differently.
;;
;; If I say this:
;; $data =~ s{<A .*?</A>}{}g;
;;
;; and my text is this:
;; SomeStuff<A HREF="link1">Text1</A>MoreStuff<A
;; HREF="link2">Text2</A>YetMoreStuff
;;
;; After the substitution, I'm left with this:
;; SomeStuffMoreStuffYetMoreStuff
;;
;; rather than this:
;; SomeStuffYetMoreStuff
Of course. .*? does minimal matching.
;; In other words, it substituted out the stuff inside the <A></A> pairs,
;; but not between them. It wasn't greedier than I wanted, which is the
;; problem I had when there were two .*? in the expression.
What do you mean by that?
Abigail
--
sub camel (^#87=i@J&&&#]u'^^s]#'#={123{#}7890t[0.9]9@+*`"'***}A&&&}n2o}00}t324i;
h[{e **###{r{+P={**{e^^^#'#i@{r'^=^{l+{#}H***i[0.9]&@a5`"':&^;&^,*&^$43##@@####;
c}^^^&&&k}&&&}#=e*****[]}'r####'`=437*{#};::'1[0.9]2@43`"'*#==[[.{{],,,1278@#@);
print+((($llama=prototype'camel')=~y|+{#}$=^*&[0-9]i@:;`"',.| |d)&&$llama."\n");
------------------------------
Date: Sun, 9 Feb 2003 07:17:18 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Regex difficulty
Message-Id: <slrnb4cl6u.u3s.tadmc@magna.augustmail.com>
Allan Cady <allancady@yahoo.com> wrote:
> If I say this:
> $data =~ s{<A .*?</A>}{}g;
>
> and my text is this:
> SomeStuff<A HREF="link1">Text1</A>MoreStuff<A
> HREF="link2">Text2</A>YetMoreStuff
>
> After the substitution, I'm left with this:
> SomeStuffMoreStuffYetMoreStuff
No you aren't. You are left with:
SomeStuff<A HREF="link1">Text1</A>MoreStuff<A
HREF="link2">Text2</A>YetMoreStuff
Since the pattern does not match (it has no s///s option).
You should take care that your newsreader does not "helpfully"
wrap long lines for you.
Or, even better, you should say it in Perl, as suggested in
the Posting Guidelines:
$data = 'SomeStuff<A HREF="link1">Text1</A>MoreStuff<A'
. ' HREF="link2">Text2</A>YetMoreStuff';
It is crucial to have the *exact* string and pattern when
evaluating a match. Your newsreader deleted the space before
the 2nd HREF, as well as line wrapping for you.
Such problems will not occur if you adhere to the suggested guidelines.
>> The most important rule is "if there is a possible match, the
>> regex will not fail". That's more important than minimal matching.
>> Also, leftmost matching is more important than minimal matching.
>
> I think the explanation is in those statements.
You are exactly right.
Another way of phrasing the rule:
minimal matching matches as little as possible,
*while still allowing the overall match to succeed*
One more thing to keep in mind:
minimal vs greedy matching *never* affects whether the match
will succeed or fail. It only affects _how_ the match will
succeed (or fail).
If it matches with greedy, it will match with nongreedy. If it fails
to match with greedy, it will fail to match with nongreedy.
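A small sketch of that rule on two inputs (the second string is invented): both quantifier styles agree on whether there is a match and differ only in how much is matched.

```perl
use strict;
use warnings;

my $data = 'SomeStuff<A HREF="link1">Text1</A>MoreStuff<A'
         . ' HREF="link2">Text2</A>YetMoreStuff';

my ($greedy) = $data =~ m{(<A .*</A>)};     # runs to the last </A>
my ($lazy)   = $data =~ m{(<A .*?</A>)};    # stops at the first </A>

# A string with no anchor fails with both forms.
my $plain        = 'no anchors here';
my $greedy_fails = $plain =~ m{<A .*</A>}  ? 0 : 1;
my $lazy_fails   = $plain =~ m{<A .*?</A>} ? 0 : 1;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "both fail on plain text: ", ($greedy_fails && $lazy_fails ? 'yes' : 'no'), "\n";
```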
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Sun, 9 Feb 2003 07:32:23 -0600
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: Regex difficulty
Message-Id: <slrnb4cm37.u3s.tadmc@magna.augustmail.com>
Allan Cady <allancady@yahoo.com> wrote:
> I posted a few days ago about wanting to do some transformations on
> HTML files... since the "right" solution to my problem will require me
> to master several new aspects of Perl, I've been making do with
> regular expressions for now.
The "right" solution is for the general case (ie. where the HTML
is arbitrary).
You can often get along with regexes in specific special cases,
such as when you control the generation of the HTML being
"parsed" so that you _know_ none of the "gotchas" will getcha,
or where you are willing to have your program fail horribly
in response to legal-yet-uncommon HTML foibles.
Something as simple as </A > will break all of the code I've
seen in this thread for instance (including mine below).
> I want to get rid of tag pairs like this:
>
><A ...><IMG SRC="theimage.gif" ...></A>
>
> In other words, all anchors that use theimage.gif as their image.
This is not likely to be very fast, but at least it does the
right thing in many (but not all) situations:
-----------------------------
#!/usr/bin/perl
use strict;
use warnings;
$_ = 'Text <A name = "foo"><IMG SRC = "fnord">Fnord pic</A> More text
<A name = "bar"><IMG SRC = "theimage.gif">TheImage pic</A> End
<A name = "bar"><IMG SRC = "theimage.gif">Yet another TheImage pic</A> End
';
print;
print "-----\n";
s#(<A.*?</A>)# (index($1, '"theimage.gif"') >= 0) ? '' : $1 #sige;
print;
-----------------------------
As further proof of the fragility of not doing a Real Parse, the
above will not do The Right Thing with:
$_ = 'Before <A name = "bar"><IMG SRC = "altimage.gif">'
. 'Use this link instead of "theimage.gif"</A> End';
> As for a workaround, I can get the result I want with this:
                                 ^^^^^^^^^^^^^^^^^
                                 ^^^^^^^^^^^^^^^^^
That does not give you the result you want, you just haven't
tested it with data that will expose its problems. Try it
with the data as in my program above...
> $data =~ s{(.*)(<A .*?theimage.*?/A>)(.*)}{\1\3}sgi;
                                              ^ ^
                                              ^ ^
You should always enable warnings when developing Perl code, as
also suggested in the Posting Guidelines.
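To see what warnings have to say about the marked \1 and \3, this sketch compiles the substitution at run time and collects whatever perl reports. The sample string is invented, and the string eval is used only so the compile-time diagnostics can be captured.

```perl
use strict;
use warnings;

my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # Compile the statement at run time so its compile-time warnings are caught.
    my $code = <<'CODE';
my $data = 'x<A HREF="y">theimage</A>z';
$data =~ s{(.*)(<A .*?theimage.*?/A>)(.*)}{\1\3}sgi;
1;
CODE
    eval $code or die $@;
}

print @warnings;
```

Writing the replacement as $1$3 is the form perl prefers on the right-hand side of a substitution.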
Please read the Posting Guidelines:
http://mail.augustmail.com/~tadmc/clpmisc.shtml
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 9 Feb 2003 11:16:42 -0800
From: allancady@yahoo.com (Allan Cady)
Subject: Re: Regex difficulty
Message-Id: <d563b154.0302091116.60af7977@posting.google.com>
Thanks everyone for being tolerant of my newbie blunders and lack of
clarity... it's not always easy to know how to ask questions when you
can't yet see the forest for the trees. I'll definitely check out the
posting guidelines.
> > As for a workaround, I can get the result I want with this:
>                                  ^^^^^^^^^^^^^^^^^
>                                  ^^^^^^^^^^^^^^^^^
>
> That does not give you the result you want, you just haven't
> tested it with data that will expose its problems. Try it
> with the data as in my program above...
Right now, the result I want is VERY narrow in scope... I'm not
writing code that anyone else will even see (except for here), or see
the results of, let alone production code. In fact, the task is
already accomplished -- I'm just trying to learn a few things along
the way. So I appreciate the guidance, and I'll do my best to
understand what y'all have offered. :)
-Allan
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 4540
***************************************