[28255] in Perl-Users-Digest

Perl-Users Digest, Issue: 9619 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 17 14:25:43 2006

Date: Thu, 17 Aug 2006 11:25:31 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 17 Aug 2006     Volume: 10 Number: 9619

Today's topics:
        regular expression strangeness <wibble436-groups@yahoo.co.uk>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <wibble436-groups@yahoo.co.uk>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <1usa@llenroc.ude.invalid>
    Re: regular expression strangeness <hjp-usenet2@hjp.at>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <1usa@llenroc.ude.invalid>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <1usa@llenroc.ude.invalid>
    Re: regular expression strangeness <mgarrish@gmail.com>
    Re: regular expression strangeness <nospam-abuse@ilyaz.org>
    Re: regular expression strangeness <wibble436-groups@yahoo.co.uk>
    Re: regular expression strangeness anno4000@radom.zrz.tu-berlin.de
    Re: regular expression strangeness <syscjm@gwu.edu>
    Re: regular expression strangeness anno4000@radom.zrz.tu-berlin.de
    Re: regular expression strangeness <benmorrow@tiscali.co.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 16 Aug 2006 15:05:01 -0700
From: "greendogday" <wibble436-groups@yahoo.co.uk>
Subject: regular expression strangeness
Message-Id: <1155765901.664968.213590@i3g2000cwc.googlegroups.com>

How come the following:

my $s = 'here "is" some "text" stuff';
if ($s =~ /(")([^"]*)"/) { print "$2\n" }
if ($s =~ /(")([^\1]*)"/) { print "$2\n" }

outputs this:

is
is" some "text

Why doesn't the 2nd one work the same as the first? How did it skip
over the quotes in the middle when it is meant to match with
non-quotes?

Thanks,



------------------------------

Date: 16 Aug 2006 15:31:55 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155767515.886864.35950@i3g2000cwc.googlegroups.com>


greendogday wrote:

> How come the following:
>
> my $s = 'here "is" some "text" stuff';
> if ($s =~ /(")([^"]*)"/) { print "$2\n" }
> if ($s =~ /(")([^\1]*)"/) { print "$2\n" }

Out of curiosity, why are you capturing a character that doesn't change?
You also should assume the first match worked when performing the
second, and you should use the proper $1 when referring to matches (\1 is
for backreferencing a match).

> Why doesn't the 2nd one work the same as the first?

Because you're greedily matching anything that isn't a 1 after the
first double quote up to the last.

Matt



------------------------------

Date: 16 Aug 2006 15:33:29 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155767609.695077.163200@m79g2000cwm.googlegroups.com>


Matt Garrish wrote:

> greendogday wrote:
>
> > How come the following:
> >
> > my $s = 'here "is" some "text" stuff';
> > if ($s =~ /(")([^"]*)"/) { print "$2\n" }
> > if ($s =~ /(")([^\1]*)"/) { print "$2\n" }
>
> Out of curiosity why are you capturing a character that doesn't change?
> You also should assume the first match worked when performing the
> second

"shouldn't" of course...

Matt



------------------------------

Date: 16 Aug 2006 15:35:41 -0700
From: "greendogday" <wibble436-groups@yahoo.co.uk>
Subject: Re: regular expression strangeness
Message-Id: <1155767741.592855.55910@i3g2000cwc.googlegroups.com>


> Out of curiosity why are you capturing a character that doesn't change?

It's just a cut down version of a problem, to show where the problem
lies.
The original had a match for either a double or single quote.

> You also should assume the first match worked when performing the
> second, and you should use the proper $1 when referring to matches (\1 is
> for backreferencing a match).

I don't understand that. I am backreferencing a match - the double
quote.

> > Why doesn't the 2nd one work the same as the first?
>
> Because you're greedily matching anything that isn't a 1 after the
> first double quote up to the last.

But it should be "not a double quote", shouldn't it? Not a 1.



------------------------------

Date: 16 Aug 2006 15:51:50 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155768710.868619.194580@b28g2000cwb.googlegroups.com>


Matt Garrish wrote:

> greendogday wrote:
>
> > How come the following:
> >
> > my $s = 'here "is" some "text" stuff';
> > if ($s =~ /(")([^"]*)"/) { print "$2\n" }
> > if ($s =~ /(")([^\1]*)"/) { print "$2\n" }
>
> Out of curiosity why are you capturing a character that doesn't change?
> You also should assume the first match worked when performing the
> second, and you should use the proper $1 when referring to matches (\1 is
> for backreferencing a match).
>

Sorry, guilty of skimming. I thought you were trying to reference the
match in the first from the second. You can't use a backreference
inside a character class because character classes are meant to contain
literal characters, so you can't build them dynamically during
evaluation. My point still stands: you're telling perl to find anything
that is not a 1 [^\1]. You could do [^$1] (which is what I thought you
were trying to do) because $1 is set in the preceding match, so it gets
interpolated when the regular expression is compiled, but that's
obviously not possible from a single match and not what you
need. Try a negative lookahead assertion instead.

Matt



------------------------------

Date: Wed, 16 Aug 2006 23:05:25 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: regular expression strangeness
Message-Id: <Xns9821C24CCA3Easu1cornelledu@127.0.0.1>

"greendogday" <wibble436-groups@yahoo.co.uk> wrote in 
news:1155767741.592855.55910@i3g2000cwc.googlegroups.com:

[ Please do not snip attributions when you reply ]

> 
>> Out of curiosity why are you capturing a character that doesn't
>> change?
> 
> It's just a cut down version of a problem, to show where the problem
> lies. The original had a match for either a double or single quote.

I think you are asking a FAQ in disguise:

perldoc -q inside

>> > Why doesn't the 2nd one work the same as the first?
>>
>> Because you're greedily matching anything that isn't a 1 after the
>> first double quote up to the last.
> 
> But it should be "not a double quote", shouldn't it? Not a 1.

See

perldoc perlreref

See the paragraph starting with "The following sequences work within or 
without a character class."

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'here "is" some "text" stuff';

if ( $s =~ /(")([^"]*)"/ ) { 
    print "$2\n";
}

if ( $s =~ /(")([^$1]*)"/ ) { 
    print "$2\n";
}

my $rx1 = qr/(")([^"]*)"/;
my $rx2 = qr/(")([^$1]*)"/;
my $rx3 =  qr/(")([^\1]*)"/;


use YAPE::Regex::Explain;

for my $r ( $rx1, $rx2, $rx3 ) {
    print YAPE::Regex::Explain->new($r)->explain, "\n";
}




-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: Thu, 17 Aug 2006 01:00:27 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: regular expression strangeness
Message-Id: <pan.2006.08.16.23.00.27.30236@hjp.at>

On Wed, 16 Aug 2006 15:05:01 -0700, greendogday wrote:

> How come the following:
> 
> my $s = 'here "is" some "text" stuff';
> if ($s =~ /(")([^"]*)"/) { print "$2\n" }
> if ($s =~ /(")([^\1]*)"/) { print "$2\n" }
> 
> outputs this:
> 
> is
> is" some "text
> 
> Why doesn't the 2nd one work the same as the first? How did it skip
> over the quotes in the middle when it is meant to match with
> non-quotes?

I don't think \1 is supposed to be a backreference inside a character
class (what if the first () matched more than one character?).


if ($s =~ /(")(.*?)\1/) { print "$2\n" }

works as expected.
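
[Editorial sketch] The suggestion above can be tried on the original
poster's full problem (a string delimited by either quote character).
This is a minimal sketch, not code from the thread; the helper name
first_quoted is made up for illustration.

```perl
use strict;
use warnings;

# first_quoted is a hypothetical helper name, not from the thread.
# (["']) captures whichever quote opens the string. Outside a
# character class, \1 is a real backreference, so it must match the
# same quote again. The lazy .*? stops at the first such quote.
sub first_quoted {
    my ($s) = @_;
    return $s =~ /(["'])(.*?)\1/ ? $2 : undef;
}

print first_quoted('here "is" some "text" stuff'), "\n";   # is
print first_quoted(q{say 'it is "so"' now}), "\n";         # it is "so"
```

Note that this is only safe when the closing quote is the last thing
in the pattern, as pointed out later in the thread.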

	hp

-- 
   _  | Peter J. Holzer    | > Why should one invent something that
|_|_) | Sysadmin WSR       | > doesn't exist?
| |   | hjp@hjp.at         | What else would be the point of inventing?
__/   | http://www.hjp.at/ |	-- P. Einstein and V. Gringmuth in desd



------------------------------

Date: 16 Aug 2006 16:22:19 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155770539.584975.54070@h48g2000cwc.googlegroups.com>


A. Sinan Unur wrote:

> "greendogday" <wibble436-groups@yahoo.co.uk> wrote in
> news:1155767741.592855.55910@i3g2000cwc.googlegroups.com:
>
> [ Please do not snip attributions when you reply ]
>
> >
> >> Out of curiosity why are you capturing a character that doesn't
> >> change?
> >
> > It's just a cut down version of a problem, to show where the problem
> > lies. The original had a match for either a double or single quote.
>
> I think you are asking a FAQ in disguise:
>
> perldoc -q inside
>
> >> > Why doesn't the 2nd one work the same as the first?
> >>
> >> Because you're greedily matching anything that isn't a 1 after the
> >> first double quote up to the last.
> >
> > But it should be "not a double quote", shouldn't it? Not a 1.
>
> See
>
> perldoc perlreref
>
> See the paragraph starting with "The following sequences work within or
> without a character class."
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $s = 'here "is" some "text" stuff';
>
> if ( $s =~ /(")([^"]*)"/ ) {
>     print "$2\n";
> }
>
> if ( $s =~ /(")([^$1]*)"/ ) {
>     print "$2\n";
> }
>

I'm learning something new today. I didn't think you could dynamically
interpolate into a character class, but it appears you can, but only if
$1 has been set before you try and do it. If you take your example
above and remove the first pattern match, you'll get a compilation
error. I assumed perl would only accept $1 from the first expression
when compiling the second, but so long as $1 has been set it doesn't
matter what it has been set to as the first set of parens will
override:

my $s = 'here "is" some "text" stuff';
if ($s =~ /(")([^$1]*)"/) { print "$2\n" }

outputs:
Unmatched [ in regex; marked by <-- HERE in m/(")([ <-- HERE ^]*)"/ at

however:

my $s = 'here "is" some "text" stuff';
if ($s =~ /( )([^"]*)"/) { print "$2\n" }
if ($s =~ /(")([^$1]*)"/) { print "$2\n" }

outputs:

is

Is this a bug in perl or a feature?

Matt



------------------------------

Date: 16 Aug 2006 16:29:17 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155770957.144205.112720@m79g2000cwm.googlegroups.com>


Matt Garrish wrote:

> A. Sinan Unur wrote:
>
> > "greendogday" <wibble436-groups@yahoo.co.uk> wrote in
> > news:1155767741.592855.55910@i3g2000cwc.googlegroups.com:
> >
> > [ Please do not snip attributions when you reply ]
> >
> > >
> > >> Out of curiosity why are you capturing a character that doesn't
> > >> change?
> > >
> > > It's just a cut down version of a problem, to show where the problem
> > > lies. The original had a match for either a double or single quote.
> >
> > I think you are asking a FAQ in disguise:
> >
> > perldoc -q inside
> >
> > >> > Why doesn't the 2nd one work the same as the first?
> > >>
> > >> Because you're greedily matching anything that isn't a 1 after the
> > >> first double quote up to the last.
> > >
> > > But it should be "not a double quote", shouldn't it? Not a 1.
> >
> > See
> >
> > perldoc perlreref
> >
> > See the paragraph starting with "The following sequences work within or
> > without a character class."
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > my $s = 'here "is" some "text" stuff';
> >
> > if ( $s =~ /(")([^"]*)"/ ) {
> >     print "$2\n";
> > }
> >
> > if ( $s =~ /(")([^$1]*)"/ ) {
> >     print "$2\n";
> > }
> >
>
> I'm learning something new today. I didn't think you could dynamically
> interpolate into a character class, but it appears you can, but only if
> $1 has been set before you try and do it. If you take your example
> above and remove the first pattern match, you'll get a compilation
> error. I assumed perl would only accept $1 from the first expression
> when compiling the second, but so long as $1 has been set it doesn't
> matter what it has been set to as the first set of parens will
> override:
>

Hmm, didn't learn anything but I've been away from perl too long. A bad
assumption on my part. I was right the first time, but used a bad test
case and thought I was wrong. Switching my example to capture (i)
proved that it is picking up from the first pattern match and you can't
create dynamically as I thought. I'd still be interested in hearing why
the pattern only compiles if $1 has been set.

Matt



------------------------------

Date: Wed, 16 Aug 2006 23:35:27 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: regular expression strangeness
Message-Id: <Xns9821C764246Casu1cornelledu@127.0.0.1>

"Matt Garrish" <mgarrish@gmail.com> wrote in 
news:1155770539.584975.54070@h48g2000cwc.googlegroups.com:

> I'm learning something new today. I didn't think you could dynamically
> interpolate into a character class, 

Sure you can. Regexen behave as double-quoted strings.

> but it appears you can, but only if $1 has been set before you try and 
> do it. If you take your example above and remove the first pattern 
> match, you'll get a compilation error.

I think that's a runtime error caused by the fact that $1 is not 
defined:

D:\UseNet\clpmisc> cat t59.pl
#!/usr/bin/perl

use strict;
use warnings;

my $s = 'here "is" some "text" stuff';
if ($s =~ /(")([^$1]*)"/) { print "$2\n" }

__END__


D:\UseNet\clpmisc> t59
Use of uninitialized value in concatenation (.) or string at D:\UseNet\clpmisc\t59.pl line 7.
Unmatched [ in regex; marked by <-- HERE in m/(")([ <-- HERE ^]*)"/ at D:\UseNet\clpmisc\t59.pl line 7.

The situation would be the same with any other variable:

D:\UseNet\clpmisc> cat t59.pl
#!/usr/bin/perl

use strict;
use warnings;

my $s = 'here "is" some "text" stuff';

my $t;
if ($s =~ /(")([^$t]*)"/) { print "$2\n" }

__END__

D:\UseNet\clpmisc> t59
Use of uninitialized value in concatenation (.) or string at D:\UseNet\clpmisc\t59.pl line 9.
Unmatched [ in regex; marked by <-- HERE in m/(")([ <-- HERE ^]*)"/ at D:\UseNet\clpmisc\t59.pl line 9.

Because $t is undef, you end up with [^] which starts out life as a 
character class not containing the character ']' but perl can't find the 
closing ] for the class.
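
[Editorial sketch] The failure described above can be reproduced
without the "uninitialized" warning by interpolating an empty string,
and trapped with eval because the pattern is compiled at run time.
This is a minimal sketch; compile_error is a hypothetical helper name,
and the exact wording of the message may vary between perl versions.

```perl
use strict;
use warnings;

# compile_error is a hypothetical helper, not from the thread.
# It builds the pattern with $inside interpolated into the class and
# returns the error text if compilation dies, or '' if it succeeds.
sub compile_error {
    my ($inside) = @_;
    my $ok = eval { my $re = qr/(")([^$inside]*)"/; 1 };
    return $ok ? '' : $@;
}

# With an empty string the class becomes [^]*)"... and never closes.
print "empty: ", (compile_error('')  ? "compile error" : "ok"), "\n";
# With a real character the class is [^"] and compiles fine.
print "quote: ", (compile_error('"') ? "compile error" : "ok"), "\n";
```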

> I assumed perl would only accept $1 from the first expression
> when compiling the second, but so long as $1 has been set it doesn't
> matter what it has been set to as the first set of parens will
> override:

No, the capture variables always exist. However, without a successful 
match, you cannot be certain what value they hold.

> Is this a bug in perl or a feature?

Just another reason not to use capture variables before ensuring a 
successful match.
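
[Editorial sketch] To make the point concrete: the $1 inside the class
is whatever the previous successful match left behind, and it is
interpolated before the new match starts. A minimal sketch, assuming
the same test string as above; second_match_after is a hypothetical
helper name.

```perl
use strict;
use warnings;

# second_match_after is a hypothetical helper, not from the thread.
sub second_match_after {
    my ($s, $first_re) = @_;
    # Only continue if the first match succeeded, so $1 is known.
    $s =~ $first_re or return;
    # $1 is interpolated here, before this match starts. The new
    # capture group (") does not change the already-built class.
    return $s =~ /(")([^$1]*)"/ ? $2 : undef;
}

my $s = 'here "is" some "text" stuff';
print second_match_after($s, qr/(")/), "\n";   # prints: is
print second_match_after($s, qr/(i)/), "\n";   # prints: ' some "text' (unquoted)
```

In the second call the class is [^i], so the match cannot start at the
first quote (an "i" follows it) and instead starts at the second one.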

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: 16 Aug 2006 17:08:46 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155773326.168604.181240@i3g2000cwc.googlegroups.com>


A. Sinan Unur wrote:

> "Matt Garrish" <mgarrish@gmail.com> wrote in
> news:1155770539.584975.54070@h48g2000cwc.googlegroups.com:
>
> > I'm learning something new today. I didn't think you could dynamically
> > interpolate into a character class,
>
> Sure you can. Regexen behave as double-quoted strings.
>
> > but it appears you can, but only if $1 has been set before you try and
> > do it. If you take your example above and remove the first pattern
> > match, you'll get a compilation error.
>
> I think that's a runtime error caused by the fact that $1 is not
> defined:
>

Bad wording on my part, I think. I meant that you can't create the
regex from captured groups in the same regex, and I believe I'm right.

I don't deny that you can interpolate when the regex is compiled, but
that's not what the OP was asking. He wants to reference the first set
of parens in the match, not the set from the preceding match, which is
what your example takes advantage of.

You're right about the regex class not being defined. I was thinking
that was happening during compilation (too much compiling code of
late!). But at least I don't feel so dumb now about being incredulous
that you could reference a match in a character class during execution
of the regex. I still like being awed if anyone has a way... : )

Matt



------------------------------

Date: Thu, 17 Aug 2006 00:16:35 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: regular expression strangeness
Message-Id: <Xns9821CE5E05109asu1cornelledu@127.0.0.1>

"Matt Garrish" <mgarrish@gmail.com> wrote in 
news:1155773326.168604.181240@i3g2000cwc.googlegroups.com:

> 
> A. Sinan Unur wrote:
> 
>> "Matt Garrish" <mgarrish@gmail.com> wrote in
>> news:1155770539.584975.54070@h48g2000cwc.googlegroups.com:
>>
>> > I'm learning something new today. I didn't think you could 
>> > dynamically interpolate into a character class,
>>
>> Sure you can. Regexen behave as double-quoted strings.
>>
>> > but it appears you can, but only if $1 has been set before you try 
>> > and do it. If you take your example above and remove the first 
>> > pattern match, you'll get a compilation error.
>>
>> I think that's a runtime error caused by the fact that $1 is not
>> defined:
>>
> 
> Bad wording on my part, I think. I meant that you can't create the
> regex from captured groups in the same regex, and I believe I'm right.

You are, and I now understand what you meant.

> You're right about the regex class not being defined. I was thinking
> that was happening during compilation (too much compiling code of
> late!). But at least I don't feel so dumb now about being incredulous
> that you could reference a match in a character class during execution
> of the regex. I still like being awed if anyone has a way... : )

I don't think so, but then I really don't know much.

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: 16 Aug 2006 17:22:09 -0700
From: "Matt Garrish" <mgarrish@gmail.com>
Subject: Re: regular expression strangeness
Message-Id: <1155774129.149032.259340@75g2000cwc.googlegroups.com>


A. Sinan Unur wrote:

> "Matt Garrish" <mgarrish@gmail.com> wrote in
> news:1155773326.168604.181240@i3g2000cwc.googlegroups.com:
>
> > Bad wording on my part, I think. I meant that you can't create the
> > regex from captured groups in the same regex, and I believe I'm right.
>
> You are, and I now understand what you meant.
>
> > You're right about the regex class not being defined. I was thinking
> > that was happening during compilation (too much compiling code of
> > late!). But at least I don't feel so dumb now about being incredulous
> > that you could reference a match in a character class during execution
> > of the regex. I still like being awed if anyone has a way... : )
>
> I don't think so, but then I really don't know much.
>

Hey, I've been all over the map on this one proving I've forgotten
everything I thought I once knew. A couple of bad test cases and I was
thinking I had a whole new power over regexes. Some days just aren't as
good as others... :P

Matt



------------------------------

Date: Thu, 17 Aug 2006 08:29:47 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: regular expression strangeness
Message-Id: <ec19dr$1ue5$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to
A. Sinan Unur
<1usa@llenroc.ude.invalid>], who wrote in article <Xns9821C764246Casu1cornelledu@127.0.0.1>:
> > I'm learning something new today. I didn't think you could dynamically
> > interpolate into a character class, 
> 
> Sure you can. Regexen behave as double-quoted strings.

<pedantic>
  ...  as far as variable interpolation goes.
</pedantic>

In other respects (e.g., backslash interpolation) the behaviour is
quite different (and not fully documented yet, AFAIK.  I tried to do
it in "gory details", but made some goofs, which were not fixed yet -
at least several years ago.)

Hope this helps,
Ilya




------------------------------

Date: 17 Aug 2006 02:49:16 -0700
From: "greendogday" <wibble436-groups@yahoo.co.uk>
Subject: Re: regular expression strangeness
Message-Id: <1155808156.289470.103640@i3g2000cwc.googlegroups.com>


> My point still stands, you're telling perl to find anything
> that is not a 1 [^\1].

But if I put a 1 in the string, like so:
$s = 'here "is" 1 some "text" stuff';

then I get:

is" 1 some "text

from the 2nd expression. So it doesn't seem to be searching
for not 1's.

> You can't use a backreference
> inside a character class

So, as a matter of interest, what does the [^\1] end up as?
What is it "not looking for" when it gets to this bit?



------------------------------

Date: 17 Aug 2006 10:07:30 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: regular expression strangeness
Message-Id: <4kitf2Fcb51mU1@news.dfncis.de>

greendogday <wibble436-groups@yahoo.co.uk> wrote in comp.lang.perl.misc:
> 
> > My point still stands, you're telling perl to find anything
> > that is not a 1 [^\1].
> 
> But if I put a 1 in the string, like so:
> $s = 'here "is" 1 some "text" stuff';
> 
> then I get:
> 
> is" 1 some "text
> 
> from the 2nd expression. So it doesn't seem to be searching
> for not 1's.
> 
> > You can't use a backreference
> > inside a character class
> 
> So, as a matter of interest, what does the [^\1] end up as?
> What is it "not looking for" when it gets to this bit?

It's looking for "\1".  That is the character whose ord() is (octal) 1.
Set

    $s = qq(here "is" \1 some "text" stuff);

to see that.
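
[Editorial sketch] The three cases discussed so far can be compared
side by side. This is a minimal sketch based on the explanation above;
inner_text is a hypothetical helper name.

```perl
use strict;
use warnings;

# inner_text is a hypothetical helper, not from the thread.
# Inside a bracketed class, \1 is treated as an octal escape, i.e. the
# single character chr(1), not as a backreference to group 1.
sub inner_text {
    my ($s) = @_;
    return $s =~ /(")([^\1]*)"/ ? $2 : undef;
}

# No chr(1) anywhere: the greedy class runs to the last quote.
print inner_text('here "is" some "text" stuff'), "\n";      # is" some "text
# A literal digit 1 does not stop it either.
print inner_text('here "is" 1 some "text" stuff'), "\n";    # is" 1 some "text
# A real chr(1) (written \1 in a double-quoted string) does.
print inner_text(qq(here "is" \1 some "text" stuff)), "\n"; # is
```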

Anno


------------------------------

Date: Thu, 17 Aug 2006 09:43:08 -0400
From: Chris Mattern <syscjm@gwu.edu>
Subject: Re: regular expression strangeness
Message-Id: <12e8sjd4cmum059@corp.supernews.com>

greendogday wrote:
>>Out of curiosity why are you capturing a character that doesn't change?
> 
> 
> It's just a cut down version of a problem, to show where the problem
> lies.
> The original had a match for either a double or single quote.
> 
> 
>>You also should assume the first match worked when performing the
>>second, and you should use the proper $1 when referring to matches (\1 is
>>for backreferencing a match).
> 
> 
> I don't understand that. I am backreferencing a match - the double
> quote.
> 
> 

No, you aren't.  You only think you are.  \[1-9] is only "special"
in the second part of a search and replace, and can only refer to
the first part of the same S&R.  In a simple match, it doesn't mean
anything.  It is a literal "1", which you escaped with a backslash,
making it still a literal "1".


------------------------------

Date: 17 Aug 2006 14:43:36 GMT
From: anno4000@radom.zrz.tu-berlin.de
Subject: Re: regular expression strangeness
Message-Id: <4kjdkoFchec3U1@news.dfncis.de>

Chris Mattern  <syscjm@gwu.edu> wrote in comp.lang.perl.misc:
> greendogday wrote:
> >>Out of curiosity why are you capturing a character that doesn't change?
> > 
> > 
> > It's just a cut down version of a problem, to show where the problem
> > lies.
> > The original had a match for either a double or single quote.
> > 
> > 
> >>You also should assume the first match worked when performing the
> >>second, and you should use the proper $1 when referring to matches (\1 is
> >>for backreferencing a match).
> > 
> > 
> > I don't understand that. I am backreferencing a match - the double
> > quote.
> > 
> > 
> 
> No, you aren't.  You only think you are.  \[1-9] is only "special"
> in the second part of a search and replace, and can only refer to

No, that's wrong too.

The one-digit backreferences *are* a replacement for $1 .. $9 in a
regex (and the regex part of a s///) where $1 .. $9 can't be used.
Using them on the replacement side of a s/// works, but is considered
bad style.

They are match-time interpolated in the regex, but only in the parts
that are literal matching text.  They are not interpolated in character
classes, and neither in {,}-quantifiers and probably a lot more places
that aren't matched literally.  The non-interpolation in character
classes was the source of the confusion.

BTW, the existence of backreferences is what makes computer-style
regexes fundamentally different from their mathematical model.  In
mathematics a "regular expression" disallows backreferences.  That
limits the set of languages they describe in (mathematically)
interesting ways.

> the first part of the same S&R.  In a simple match, it doesn't mean
> anything.  It is a literal "1", which you escaped with a backslash,
> making it still a literal "1".

Well, no.  Here is a regex

    /^(.)\1*$/

that uses a backreference to match all strings that are a repetition
of a single character, no matter which. This is something mathematicians
like to prove regexes cannot do.

    print "'$_': ", /^(.)\1*$/ ? 'yes' : 'no', "\n" for
        '', qw( a ab aaaa aaab XXX ;;;;; ;;;:;;);

Anno


------------------------------

Date: Thu, 17 Aug 2006 17:47:20 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: regular expression strangeness
Message-Id: <o5jer3-k3r.ln1@osiris.mauzo.dyndns.org>


Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
> On Wed, 16 Aug 2006 15:05:01 -0700, greendogday wrote:
> 
> > How come the following:
> > 
> > my $s = 'here "is" some "text" stuff';
> > if ($s =~ /(")([^"]*)"/) { print "$2\n" }
> > if ($s =~ /(")([^\1]*)"/) { print "$2\n" }
> > 
> > outputs this:
> > 
> > is
> > is" some "text
> > 
> > Why doesn't the 2nd one work the same as the first? How did it skip
> > over the quotes in the middle when it is meant to match with
> > non-quotes?
> 
> I don't think \1 is supposed to be a backreference inside a character
> class (what if the first () matched more than one character?).
> 
> 
> if ($s =~ /(")(.*?)\1/) { print "$2\n" }
> 
> works as expected.

 ...but only if that is the whole regex; e.g.

    /(") (.*?) \1 foo/x

does *not* match "a double-quoted string followed by 'foo'". Applied to
the string

    "xxx"bar "yyy"foo

$2 will be 'xxx"bar "yyy', which is (probably) not what was meant. In
the general case you need a negative look-ahead:

    m{ (['"]) ( (?:(?! \1).)* ) \1 foo }x

For matching actual quoted strings you really want to use
Text::Balanced: go read the FAQ.
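
[Editorial sketch] The two patterns above can be run against the
sample string to see the difference. A minimal sketch, not a
replacement for Text::Balanced; quoted_before_foo and its flag
argument are hypothetical names.

```perl
use strict;
use warnings;

# quoted_before_foo is a hypothetical helper, not from the thread.
# With $use_lookahead true, each character of the body is checked with
# (?! \1) so the body can never contain the opening quote character.
sub quoted_before_foo {
    my ($s, $use_lookahead) = @_;
    my $re = $use_lookahead
        ? qr{ (['"]) ( (?:(?! \1).)* ) \1 foo }x
        : qr{ (['"]) (.*?) \1 foo }x;
    return $s =~ $re ? $2 : undef;
}

my $s = '"xxx"bar "yyy"foo';
print quoted_before_foo($s, 0), "\n";   # xxx"bar "yyy
print quoted_before_foo($s, 1), "\n";   # yyy
```

The lazy version stretches across quotes until it finds one followed
by "foo"; the lookahead version has to restart at a later quote.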

Ben

-- 
'Deserve [death]? I daresay he did. Many live that deserve death. And some die
that deserve life. Can you give it to them? Then do not be too eager to deal
out death in judgement. For even the very wise cannot see all ends.'
                                                        benmorrow@tiscali.co.uk


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9619
***************************************

