[30194] in Perl-Users-Digest


Perl-Users Digest, Issue: 1437 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 15 14:44:55 2008

Date: Tue, 15 Apr 2008 11:44:42 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 15 Apr 2008     Volume: 11 Number: 1437

Today's topics:
        Can I use a look-ahead and a look-behind at the same ti dan.j.weber@gmail.com
    Re: Can I use a look-ahead and a look-behind at the sam <joost@zeekat.nl>
    Re: Can I use a look-ahead and a look-behind at the sam <1usa@llenroc.ude.invalid>
    Re: Can I use a look-ahead and a look-behind at the sam xhoster@gmail.com
    Re: Can I use a look-ahead and a look-behind at the sam dan.j.weber@gmail.com
    Re: Can I use a look-ahead and a look-behind at the sam <jimsgibson@gmail.com>
        Can someone 'splain why this regex won't work both ways spydox@gmail.com
    Re: Can someone 'splain why this regex won't work both  <ben@morrow.me.uk>
    Re: Can someone 'splain why this regex won't work both  <1usa@llenroc.ude.invalid>
    Re: Can someone 'splain why this regex won't work both  spydox@gmail.com
    Re: Can someone 'splain why this regex won't work both  spydox@gmail.com
    Re: Can someone 'splain why this regex won't work both  <willem@stack.nl>
    Re: Can someone 'splain why this regex won't work both  <ben@morrow.me.uk>
    Re: Can someone 'splain why this regex won't work both  (J.D. Baldwin)
    Re: Can someone 'splain why this regex won't work both  <1usa@llenroc.ude.invalid>
    Re: Can someone 'splain why this regex won't work both  xhoster@gmail.com
    Re: Can someone 'splain why this regex won't work both  <hjp-usenet2@hjp.at>
    Re: Can someone 'splain why this regex won't work both  xhoster@gmail.com
    Re: Can someone 'splain why this regex won't work both  <nospam-abuse@ilyaz.org>
    Re: Can someone 'splain why this regex won't work both  <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 9 Apr 2008 15:01:51 -0700 (PDT)
From: dan.j.weber@gmail.com
Subject: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <1421afb9-6b66-45d8-ba6f-60aad330f718@p39g2000prm.googlegroups.com>

How would I match the text that's after "#ab cd ef#" and before "#qr
st uv#" in the following string? I want to use a regular expression
that has both a look-behind and a look-ahead together. Is this
possible?

#ab cd ef#gh ij kl#qr st uv#


------------------------------

Date: Thu, 10 Apr 2008 00:07:35 +0200
From: Joost Diepenmaat <joost@zeekat.nl>
Subject: Re: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <87y77m65so.fsf@zeekat.nl>

dan.j.weber@gmail.com writes:

> I want to use a regular expression that has both a look-behind and a
> look-ahead together. Is this possible?

AFAIK, yes. Just try it.

-- 
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/


------------------------------

Date: Wed, 09 Apr 2008 22:16:40 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <Xns9A7BB9EBF4AD6asu1cornelledu@127.0.0.1>

dan.j.weber@gmail.com wrote in
news:1421afb9-6b66-45d8-ba6f-60aad330f718@p39g2000prm.googlegroups.com:

> How would I match the text that's after "#ab cd ef#" and before
> "#qr st uv#" in the following string? I want to use a regular
> expression that has both a look-behind and a look-ahead together.
> Is this possible?
> 
> #ab cd ef#gh ij kl#qr st uv#

I may be misunderstanding the question, but I am not sure why you
think you need look-ahead or look-behind here. 

#!/usr/bin/perl

use strict;
use warnings;

my $x = q{#ab cd ef#gh ij kl#qr st uv#};

if ( $x =~ /#ab cd ef#(.+)#qr st uv#/ ) {
    print "$1\n";
}

# You could also use split:
print( (grep length, split /#/, $x)[1], "\n" );

__END__

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: 09 Apr 2008 22:18:43 GMT
From: xhoster@gmail.com
Subject: Re: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <20080409181845.835$p6@newsreader.com>

dan.j.weber@gmail.com wrote:
> How would I match the text that's after "#ab cd ef#" and before "#qr
> st uv#" in the following string? I want to use a regular expression
> that has both a look-behind and a look-ahead together. Is this
> possible?
>
> #ab cd ef#gh ij kl#qr st uv#

I don't know what problems you are anticipating, so I'll just try doing it
in a straightforward manner:

use strict;
"#ab cd ef#gh ij kl#qr st uv#" =~
    /(?<=#ab cd ef#)(.*?)(?=#qr st uv#)/ or die;
print $1
__END__
gh ij kl

Yep, seems to work.  Which is what I expected, because the parts of Perl's
regex language are supposed to work when used together--if they didn't
there wouldn't be much point in having such a language.  Neither look ahead
nor look behind claims to be an experimental feature, so I'd just
storm ahead and use them with confidence.
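
A minimal sketch of the same idea without a capture group, assuming the
delimiters are the fixed strings from the question. Because both assertions
are zero-width, the whole match ($&) is already the text in between. The
helper name between_markers is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text that sits
# between the two fixed markers, or undef if the markers are not found.
sub between_markers {
    my ($text) = @_;
    # Perl needs a fixed-length look-behind; a literal string qualifies.
    return $text =~ /(?<=#ab cd ef#).*?(?=#qr st uv#)/ ? $& : undef;
}

print between_markers('#ab cd ef#gh ij kl#qr st uv#'), "\n";   # gh ij kl
```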

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Wed, 9 Apr 2008 15:25:39 -0700 (PDT)
From: dan.j.weber@gmail.com
Subject: Re: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <d15b8a1e-f04c-4000-bcf2-e7d929c1ff79@v26g2000prm.googlegroups.com>

On Apr 9, 3:18 pm, xhos...@gmail.com wrote:
> dan.j.we...@gmail.com wrote:
> > How would I match the text that's after "#ab cd ef#" and before "#qr
> > st uv#" in the following string? I want to use a regular expression
> > that has both a look-behind and a look-ahead together. Is this
> > possible?
>
> > #ab cd ef#gh ij kl#qr st uv#
>
> I don't know what problems you are anticipating, so I'll just try doing it
> in a straightforward manner:
>
> use strict;
> "#ab cd ef#gh ij kl#qr st uv#" =~
>     /(?<=#ab cd ef#)(.*?)(?=#qr st uv#)/ or die;
> print $1
> __END__
> gh ij kl
>
> Yep, seems to work.  Which is what I expected, because the parts of Perl's
> regex language are supposed to work when used together--if they didn't
> there wouldn't be much point in having such a language.  Neither look ahead
> nor look behind claims to be an experimental feature, so I'd just
> storm ahead and use them with confidence.
>
> Xho
>
> --
> -------------------- http://NewsReader.Com/ --------------------
> The costs of publication of this article were defrayed in part by the
> payment of page charges. This article must therefore be hereby marked
> advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
> this fact.

Thanks for your responses. The example I gave was a simplification.
The problem was that I was using (.*) instead of (.*?) and I'm not
100% sure why, but it doesn't work like that. Thanks.


------------------------------

Date: Wed, 09 Apr 2008 16:32:52 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: Can I use a look-ahead and a look-behind at the same time?
Message-Id: <090420081632525111%jimsgibson@gmail.com>

In article
<d15b8a1e-f04c-4000-bcf2-e7d929c1ff79@v26g2000prm.googlegroups.com>,
<dan.j.weber@gmail.com> wrote:

> On Apr 9, 3:18 pm, xhos...@gmail.com wrote:
> > dan.j.we...@gmail.com wrote:
> > > How would I match the text that's after "#ab cd ef#" and before "#qr
> > > st uv#" in the following string? I want to use a regular expression
> > > that has both a look-behind and a look-ahead together. Is this
> > > possible?
> >
> > > #ab cd ef#gh ij kl#qr st uv#
> >
> > I don't know what problems you are anticipating, so I'll just try doing it
> > in a straightforward manner:
> >
> > use strict;
> > "#ab cd ef#gh ij kl#qr st uv#" =~
> >     /(?<=#ab cd ef#)(.*?)(?=#qr st uv#)/ or die;
> > print $1
> > __END__
> > gh ij kl
> >
> > Yep, seems to work.  Which is what I expected, because the parts of Perl's
> > regex language are supposed to work when used together--if they didn't
> > there wouldn't be much point in having such a language.  Neither look ahead
> > nor look behind claims to be an experimental feature, so I'd just
> > storm ahead and use them with confidence.
> 
> Thanks for your responses. The example I gave was a simplification.
> The problem was that I was using (.*) instead of (.*?) and I'm not
> 100% sure why, but it doesn't work like that. Thanks.

Xho's example works either with (.*?) or (.*), so your problem may lie
elsewhere. The only difference would be if your string included two
'#qr st uv#' substrings after the initial '#ab cd ef#'. In that case
(.*?) will match the shortest possible string, while (.*) will match
the longest.
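
A small sketch of that difference, using an input string invented for this
illustration (the closing marker appears twice) and two made-up helper names:

```perl
use strict;
use warnings;

# Illustrative input, not from the thread: two '#qr st uv#' markers.
my $text = '#ab cd ef#gh ij kl#qr st uv#mn op#qr st uv#';

# Non-greedy: stop at the first place where the closing marker follows.
sub lazy_middle {
    my ($s) = @_;
    return $s =~ /(?<=#ab cd ef#)(.*?)(?=#qr st uv#)/ ? $1 : undef;
}

# Greedy: grab everything, then give characters back until the closing
# marker follows, which lands on the last marker.
sub greedy_middle {
    my ($s) = @_;
    return $s =~ /(?<=#ab cd ef#)(.*)(?=#qr st uv#)/ ? $1 : undef;
}

print lazy_middle($text),   "\n";   # gh ij kl
print greedy_middle($text), "\n";   # gh ij kl#qr st uv#mn op
```

With only one closing marker in the string, both helpers return the same
text, which matches the observation above.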

-- 
Jim Gibson

 Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
    ** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------        
                http://www.usenet.com


------------------------------

Date: Mon, 14 Apr 2008 10:08:51 -0700 (PDT)
From: spydox@gmail.com
Subject: Can someone 'splain why this regex won't work both ways?
Message-Id: <093bf887-729d-4400-8750-6c91b21b478e@w4g2000prd.googlegroups.com>


I'm trying to find a repeated number in a string, like 122345 finds
22.

This works:

/(\d)\1/

This doesn't:

 /\1(\d)/

I guess LLR parsing is to blame, but shouldn't the second example
first try to FIND a $1 then check to see if there is a \1, and repeat
that process moving L to R?

I thought Perl sort of went to and fro trying to do matching. To me,
there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
preceding it.

I was a little surprised this didn't work although I can sort of see
why in a way too. In some ways it seems to me that regexes should be
*disconnected* from parsing - just answer the question does this
match?






------------------------------

Date: Mon, 14 Apr 2008 19:31:47 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <jhkcd5-it81.ln1@osiris.mauzo.dyndns.org>


Quoth spydox@gmail.com:
> 
> I'm trying to find a repeated number in a string, like 122345 finds
> 22.
> 
> This works:
> 
> /(\d)\1/
> 
> This doesn't:
> 
>  /\1(\d)/
> 
> I guess LLR parsing is to blame, but shouldn't the second example
> first try to FIND a $1 then check to see if there is a \1, and repeat
> that process moving L to R?
> 
> I thought Perl sort of went to and fro trying to do matching. To me,
> there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
> preceding it.

There are two separate operations here which you are confusing. First
perl parses the regex itself, and compiles it into an internal form.
Then it matches that regex against the string you provide. The second
will backtrack, under some circumstances; the first won't.

Ben



------------------------------

Date: Mon, 14 Apr 2008 18:37:12 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <Xns9A8094B89BA3Basu1cornelledu@127.0.0.1>

spydox@gmail.com wrote in
news:093bf887-729d-4400-8750-6c91b21b478e@w4g2000prd.googlegroups.com:

> I'm trying to find a repeated number in a string, like 122345
> finds 22.
> 
> This works:
> 
> /(\d)\1/
> 
> This doesn't:
> 
>  /\1(\d)/
> 
> I guess LLR parsing is to blame, 

 ...

> I was a little surprised this didn't work although I can sort of
> see why in a way too. In some ways it seems to me that regexes
> should be *disconnected* from parsing - just answer the question
> does this match?

I don't look at this as a parsing issue. Rather, it is a "the 
universe must make sense" kind of issue: The first match does not 
exist before the first match. That makes sense to me. It may not 
make sense to you.

Sinan
-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: Mon, 14 Apr 2008 11:51:21 -0700 (PDT)
From: spydox@gmail.com
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <2a37197b-968f-4c57-8b74-25d8843ca336@u3g2000hsc.googlegroups.com>

On Apr 14, 2:31 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth spy...@gmail.com:
>
>
>
>
>
> > I'm trying to find a repeated number in a string, like 122345 finds
> > 22.
>
> > This works:
>
> > /(\d)\1/
>
> > This doesn't:
>
> >  /\1(\d)/
>
> > I guess LLR parsing is to blame, but shouldn't the second example
> > first try to FIND a $1 then check to see if there is a \1, and repeat
> > that process moving L to R?
>
> > I thought Perl sort of went to and fro trying to do matching. To me,
> > there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
> > preceding it.
>
> There are two separate operations here which you are confusing. First
> perl parses the regex itself, and compiles it into an internal form.
> Then it matches that regex against the string you provide. The second
> will backtrack, under some circumstances; the first won't.
>
> Ben

Understood, and I appreciate the insight. It makes sense.
Yet, when all else apparently *fails*, in my experience, and I've
heard MJD and others say this, Perl will "do its best" to match. To
me, unless it *also* tried backtracking, it gave up too soon..





------------------------------

Date: Mon, 14 Apr 2008 11:57:46 -0700 (PDT)
From: spydox@gmail.com
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <e6278092-e663-4ea6-8f07-40d65faeb551@f63g2000hsf.googlegroups.com>

 .
 .
 .
>
> > I guess LLR parsing is to blame,
>
 .
 .
>
> I don't look at this as a parsing issue. Rather, it is a "the
> universe must make sense" kind of issue: The first match does not
> exist before the first match. That makes sense to me. It may not
> make sense to you.
>

To me, like conventional pattern-recognition, of say two tanks next to
each other, the system should accept it whether the match is described
either way:

find a tank with another identical tank to its left

 *or*

find a tank with another identical tank to its right


The system should have no *context-sensitivity* where only one of the
two matches. Sure, internally an algorithm may be scanning L to R or R
to L or whatever, but the user should not even be concerned with that,
at least in this case. I still think it gave up too soon- it should
have tried R to L (backtracking) when L to R failed.

Just IMHO, thank-you for your thoughts. This area seems just a bit
gray to me I'd be very interested in Damain or Mark's thoughts.






------------------------------

Date: Mon, 14 Apr 2008 19:11:45 +0000 (UTC)
From: Willem <willem@stack.nl>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <slrng07b3h.1gav.willem@snail.stack.nl>

spydox@gmail.com wrote:
) Understood, and I appreciate the insight. It makes sense.
) Yet, when all else apparently *fails*, in my experience, and I've
) heard MJD and others say this, Perl will "do its best" to match. To
) me, unless it *also* tried backtracking, it gave up too soon..

That's not what backtracking means.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Mon, 14 Apr 2008 20:06:28 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <kimcd5-ua91.ln1@osiris.mauzo.dyndns.org>


Quoth spydox@gmail.com:
> On Apr 14, 2:31 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > Quoth spy...@gmail.com:
> > >
> > > I'm trying to find a repeated number in a string, like 122345 finds
> > > 22.
> >
> > > This works:
> >
> > > /(\d)\1/
> >
> > > This doesn't:
> >
> > >  /\1(\d)/
> >
> > > I guess LLR parsing is to blame, but shouldn't the second example
> > > first try to FIND a $1 then check to see if there is a \1, and repeat
> > > that process moving L to R?
> >
> > > I thought Perl sort of went to and fro trying to do matching. To me,
> > > there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1 = 2
> > > preceding it.
> >
> > There are two separate operations here which you are confusing. First
> > perl parses the regex itself, and compiles it into an internal form.
> > Then it matches that regex against the string you provide. The second
> > will backtrack, under some circumstances; the first won't.
> 
> Understood, and I appreciate the insight. It makes sense.
> Yet, when all else apparently *fails*, in my experience, and I've
> heard MJD and others say this, Perl will "do its best" to match. To
> me, unless it *also* tried backtracking, it gave up too soon..

No, you're still not understanding. Perl will only backtrack *while
trying to match*. Compiling the regex comes long before that.

Ben



------------------------------

Date: Mon, 14 Apr 2008 19:51:13 +0000 (UTC)
From: INVALID_SEE_SIG@example.com.invalid (J.D. Baldwin)
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <fu0cjh$fv3$1@reader2.panix.com>


In the previous article,  <spydox@gmail.com> wrote:
> > > I guess LLR parsing is to blame,
> >
> .
> .
> >
> > I don't look at this as a parsing issue. Rather, it is a "the
> > universe must make sense" kind of issue: The first match does not
> > exist before the first match. That makes sense to me. It may not
> > make sense to you.
> >
> 
> To me, like conventional pattern-recognition, of say two tanks next to
> each other, the system should accept it whether the match is described
> either way:
> 
> find a tank with another identical tank to its left
> 
>  *or*
> 
> find a tank with another identical tank to its right

A better phrasing:

    find a tank, then find another one to its right
 
      *or*
 
    find another one to its left, then find a tank

One of these phrasings makes sense; the other does not.  Or, rather,
the other doesn't and one of the phrasings makes sense.

If you want a more formal justification, here's what the Camel Book
says about these.  Note the two instances of the word "later":

    1.7.4. Backreferences

    [...]  A pair of parentheses around a part of a regular
    expression causes whatever was matched by that part to be
    remembered for later use. It doesn't change what the part
    matches, so /\d+/ and /(\d+)/ will still match as many digits
    as possible, but in the latter case they will be remembered in
    a special variable to be backreferenced later.
-- 
  _+_ From the catapult of |If anyone disagrees with any statement I make, I
_|70|___:)=}- J.D. Baldwin |am quite prepared not only to retract it, but also
\      /  baldwin@panix.com|to deny under oath that I ever made it. -T. Lehrer
***~~~~-----------------------------------------------------------------------


------------------------------

Date: Mon, 14 Apr 2008 21:43:38 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <Xns9A80B454746E8asu1cornelledu@127.0.0.1>

spydox@gmail.com wrote in
news:e6278092-e663-4ea6-8f07-40d65faeb551@f63g2000hsf.googlegroups.com:

[ please do not snip attributions ]

>> > I guess LLR parsing is to blame,
>>
>> I don't look at this as a parsing issue. Rather, it is a "the
>> universe must make sense" kind of issue: The first match does not
>> exist before the first match. That makes sense to me. It may not
>> make sense to you.
>>
> 
> To me, like conventional pattern-recognition, of say two tanks
> next to each other, the system should accept it whether the match
> is described either way:
> 
> find a tank with another identical tank to its left
> 
>  *or*
> 
> find a tank with another identical tank to its right
>
> 
> The system should have no *context-sensitivity* where only one of
> the two matches. Sure, internally an algorithm may be scanning L
> to R or R to L or whatever, but the user should not even be
> concerned with that, at least in this case. I still think it gave
> up too soon- it should have tried R to L (backtracking) when L to
> R failed. 

What you seem to want is a "match two identical characters" 
operator. For this particular case, you can achieve that by using:

=for example

my @strings = qw( 1222345 1233345 );

s/00|11|22|33|44|55|66|77|88|99// for @strings;

print "$_\n" for @strings;

=cut

When you use a character class, every element of that class is 
considered equivalent to every other one. So, for example, when you 
write

/\d{2}/

that does find two characters that are in the same equivalence 
class.

The tank analogy works perfectly here because there are no two 
identical tanks in the world. Instead, there are equivalence classes 
of tanks. Tanks that are the same model, tanks in the same unit etc.

If what you want is to say, 

    find a tank, then find another tank that is the same 
    model as the one you just found

well, that is equivalent to /(\d)\1/

J. D. Baldwin gives perfect examples of why /\1(\d)/ does not make 
sense: Finding another tank in the same equivalence class as the one 
you first found comes after first finding a tank.

> Just IMHO, thank-you for your thoughts. This area seems just a bit
> gray to me I'd be very interested in Damain or Mark's thoughts.

s/Damain/Damian/

My feeble mind looks at the following:

#!/usr/bin/perl

use strict;
use warnings;

use 5.010;

for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\K\k<tank>// and print "$_\n";
}

for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\K\k<tank>+// and print "$_\n";
}

for ( my @a = qw( 1222345 1233345 ) ) {
    s/(?<tank>\d)\k<tank>// and print "$_\n";
}

__END__

and thinks that the third one is more natural (that is, find a tank, 
then find another tank in the same equivalence class) than the other 
ones.
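
For reference, a sketch of what the three substitutions do to the two sample
strings. It assumes perl 5.10 or later (for \K and named captures); the
helper names are invented for this illustration.

```perl
use strict;
use warnings;
use 5.010;   # \K, named captures and say need perl 5.10 or later

# Hypothetical helper names, one per substitution shown above.
# \K keeps the first digit and deletes only what follows it.
sub drop_second_of_pair { my ($s) = @_; $s =~ s/(?<tank>\d)\K\k<tank>//;  return $s }
sub drop_all_repeats    { my ($s) = @_; $s =~ s/(?<tank>\d)\K\k<tank>+//; return $s }
# Without \K the whole pair is deleted.
sub drop_whole_pair     { my ($s) = @_; $s =~ s/(?<tank>\d)\k<tank>//;    return $s }

for my $n (qw( 1222345 1233345 )) {
    say join ' ', $n, drop_second_of_pair($n), drop_all_repeats($n), drop_whole_pair($n);
}
# 1222345 122345 12345 12345
# 1233345 123345 12345 12345
```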

Sinan

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


------------------------------

Date: 14 Apr 2008 21:52:28 GMT
From: xhoster@gmail.com
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <20080414175231.297$HZ@newsreader.com>

Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth spydox@gmail.com:
> > On Apr 14, 2:31 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> > > Quoth spy...@gmail.com:
> > > >
> > > > I'm trying to find a repeated number in a string, like 122345 finds
> > > > 22.
> > >
> > > > This works:
> > >
> > > > /(\d)\1/
> > >
> > > > This doesn't:
> > >
> > > >  /\1(\d)/
> > >
> > > > I guess LLR parsing is to blame, but shouldn't the second example
> > > > first try to FIND a $1 then check to see if there is a \1, and
> > > > repeat that process moving L to R?
> > >
> > > > I thought Perl sort of went to and fro trying to do matching. To me,
> > > > there IS a /\1(\d)/ in the string since $1 is 2, and there is a \1
> > > > = 2 preceding it.
> > >
> > > There are two separate operations here which you are confusing. First
> > > perl parses the regex itself, and compiles it into an internal form.
> > > Then it matches that regex against the string you provide. The second
> > > will backtrack, under some circumstances; the first won't.
> >
> > Understood, and I appreciate the insight. It makes sense.
> > Yet, when all else apparently *fails*, in my experience, and I've
> > heard MJD and others say this, Perl will "do its best" to match. To
> > me, unless it *also* tried backtracking, it gave up too soon..
>
> No, you're still not understanding. Perl will only backtrack *while
> trying to match*. Compiling the regex comes long before that.

I think that that is what he is talking about: the "when trying to match"
part. His use of "parsing" in the original question was ill-advised, but I
think you latched onto the bad phrasing rather than the intended
question, and now won't let him correct his poor phrasing.  First perl
parses and compiles the regular expression, then it uses that compiled
expression to match (or loosely speaking "parse") the target string.

Perl parses and compiles /\1(.)/ without error or warning (which surprised
me). But then what does it do with it?

Conceptually, it could temporarily treat the \1 as ".*", and then when/if
the capture is matched go back and verify that it is the same as the thing
previously matched by the tentative .* cum \1.  I don't know that I would
call this backtracking (as the OP seems to be doing), but I can't think of
anything obviously better to call it.  Or it could reorder things to give
an identical compiled regex as /(.)\1/.  I don't know if these two things
would give the same answer as each other in all cases (if so, the latter
would surely be faster).

I think that that is what the OP thought it should do.  Obviously, Perl
doesn't do either of those things.  I can't figure out what it does do.  I
thought it would treat \1 preceding any capture as the empty string, but
apparently it doesn't do that, either.  It seems to act as something
unmatchable.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Mon, 14 Apr 2008 23:53:32 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <slrng07kiu.hj5.hjp-usenet2@hrunkner.hjp.at>

On 2008-04-14 18:57, spydox@gmail.com <spydox@gmail.com> wrote:
>> I don't look at this as a parsing issue. Rather, it is a "the
>> universe must make sense" kind of issue: The first match does not
>> exist before the first match. That makes sense to me. It may not
>> make sense to you.
>>
>
> To me, like conventional pattern-recognition, of say two tanks next to
> each other, the system should accept it whether the match is described
> either way:
>
> find a tank with another identical tank to its left
>
>  *or*
>
> find a tank with another identical tank to its right
>
>
> The system should have no *context-sensitivity* where only one of the
> two matches. Sure, internally an algorithm may be scanning L to R or R
> to L or whatever, but the user should not even be concerned with that,
> at least in this case. I still think it gave up too soon- it should
> have tried R to L (backtracking) when L to R failed.

Backtracking doesn't mean scanning right to left. Backtracking means to
go back to the last point where you had a choice and try the other
alternative(s). 

So, for example if you have a pattern /foo(bar|baz)/, after matching
"foo", you have a choice between trying to match "bar" or "baz". The
regex engine will try to match "bar" first, and if that fails, it will
backtrack to the point before it tried that and then try to match "baz"
instead. 

But in a pattern like /\1(a)/ there is no choice: It needs to start by
matching the string in the first capture group, but that hasn't been
defined yet, so it must fail. (Well, it could try all possible strings,
but that would be extremely inefficient).
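
A minimal sketch of the alternation case described above. The helper name
which_branch and the test strings are invented for this illustration.

```perl
use strict;
use warnings;

# After matching "foo" the engine has a choice point. On "foobaz" it tries
# "bar" first, fails at the third letter, backtracks to the choice point,
# and then succeeds with "baz". The scan itself never runs right to left.
sub which_branch {
    my ($s) = @_;
    return $s =~ /foo(bar|baz)/ ? $1 : undef;
}

print which_branch('foobaz'), "\n";                                   # baz
print defined which_branch('fooqux') ? "match" : "no match", "\n";    # no match
```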

	hp



------------------------------

Date: 14 Apr 2008 21:58:04 GMT
From: xhoster@gmail.com
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <20080414175806.268$wd@newsreader.com>

news@baldwin.users.panix.com wrote:
> In the previous article,  <spydox@gmail.com> wrote:
> >
> > find a tank with another identical tank to its left
> >
> >  *or*
> >
> > find a tank with another identical tank to its right
>
> A better phrasing:
>
>     find a tank, then find another one to its right
>
>       *or*
>
>     find another one to its left, then find a tank
>
> One of these phrasings makes sense; the other does not.  Or, rather,
> the other doesn't and one of the phrasings makes sense.
>
> If you want a more formal justification, here's what the Camel Book
> says about these.  Note the two instances of the word "later":

I think that that is his point, an objection to the notion that
left and right typographically equate to "earlier" and "later"
chronologically.

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


------------------------------

Date: Mon, 14 Apr 2008 22:03:16 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <fu0kb4$23dp$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to

<spydox@gmail.com>], who wrote in article <093bf887-729d-4400-8750-6c91b21b478e@w4g2000prd.googlegroups.com>:
> 
> I'm trying to find a repeated number in a string, like 122345 finds
> 22.
> 
> This works:
> 
> /(\d)\1/
> 
> This doesn't:
> 
>  /\1(\d)/

This depends on what you mean by "works".  It works in the sense that
it does not match (as it should not).  I do not find it documented in
perlre, but \3 will fail to match if group 3 did not match "yet".
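
A small sketch that puts the two patterns from the question side by side,
assuming the sample input 122345. The helper names are invented for this
illustration.

```perl
use strict;
use warnings;

# Group first, backreference second: \1 refers to a digit already captured.
sub pair_group_first {
    my ($s) = @_;
    return $s =~ /(\d)\1/ ? $& : undef;
}

# Backreference first: group 1 has not matched yet when \1 is tried,
# so the match fails at every starting position.
sub pair_backref_first {
    my ($s) = @_;
    return $s =~ /\1(\d)/ ? $& : undef;
}

print pair_group_first('122345') // 'no match', "\n";     # 22
print pair_backref_first('122345') // 'no match', "\n";   # no match
```

The // operator needs perl 5.10 or later; on older perls a defined() test
does the same job.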

Hope this helps,
Ilya

P.S.  perl -Mre=debugcolor -wle "q(aa) =~ /\1(a)/"


------------------------------

Date: Tue, 15 Apr 2008 00:28:48 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Can someone 'splain why this regex won't work both ways?
Message-Id: <gu5dd5-luf1.ln1@osiris.mauzo.dyndns.org>


Quoth xhoster@gmail.com:
> Ben Morrow <ben@morrow.me.uk> wrote:
> >
> > No, you're still not understanding. Perl will only backtrack *while
> > trying to match*. Compiling the regex comes long before that.
> 
> I think that that is what he is talking about: the "when trying to match"
> part. His use of "parsing" in the original question was ill-advised, but I
> think you latched onto the bad phrasing rather than the intended
> question, and now won't let him correct his poor phrasing.  First perl
> parses and compiles the regular expression, then it uses that compiled
> expression to match (or loosely speaking "parse") the target string.

You're right, I was misunderstanding the OP's misunderstanding. :)

> Perl parses and compiles /\1(.)/ without error or warning (which surprised
> me). But then what does it do with it?

I was assuming (without having tried it) that the regex was failing at
the compile stage. It seems I was wrong... :(

> Conceptually, it could temporarily treat the \1 as ".*", and then when/if
> the capture is matched go back and verify that it is the same as the thing
> previously matched by the tentative .* cum \1. 

Something like this can be done with

    /(.*)(a)(??{ $1 eq $2 ? "(?:)" : "(?!)" })/

using a code assertion to insert either a 'succeed' or a 'fail and
backtrack' item into the regex at runtime. Not that I'd recommend this,
of course... :)
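
A hedged sketch of that pattern in use, with test strings invented for this
illustration. The (.*) plays the role of the tentative "forward" \1, and the
postponed subexpression accepts or rejects it once (a) has matched. The
helper name is made up.

```perl
use strict;
use warnings;

# Returns 1 if some text immediately before an "a" is equal to that "a",
# which here can only mean the string contains "aa".
sub tentative_then_verify {
    my ($s) = @_;
    return $s =~ /(.*)(a)(??{ $1 eq $2 ? "(?:)" : "(?!)" })/ ? 1 : 0;
}

print tentative_then_verify('aa'), "\n";   # 1
print tentative_then_verify('ba'), "\n";   # 0
```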

> I don't know that I would
> call this backtracking (as the OP seems to be doing), but I can't think of
> anything obviously better to call it.  Or it could reorder things to give
> an identical compiled regex as /(.)\1/.  I don't know if these two things
> would give the same answer as each other in all cases (if so, the latter
> would surely be faster).
> 
> I think that that is what the OP thought it should do.  Obviously, Perl
> doesn't do either of those things.  I can't figure out what it does do.  I
> thought it would treat \1 preceding any capture as the empty string, but
> apparently it doesn't do that, either.  It seems to act as something
> unmatchable.

All the $N start as undef, which is unmatchable as you found (-Mre=debug
is useful for finding out what is going on). Whenever perl backtracks to
retry part of a match, it clears any $N set by the part of the match it
is backtracking over, so /\1(.)/ couldn't match even if it did
backtrack, as $1 would be undef again by the time it got to retry the
\1. (Perl doesn't in fact backtrack, as it knows nothing has changed so
this would be an infinite loop.)

It is, however, possible to get \1 to match when it appears earlier in
the expression than the first brackets (which is why it's not a syntax
error); you just have to make sure it gets set first. For instance,

    "abac" =~ /^(?:\1c|(a)b)+$/

matches. The first time through the +, $1 is undef so the \1c part fails;
but the (a)b part succeeds so $1 gets set. Then it goes round the + loop
again, and this time $1 is 'a' so the first branch can succeed.
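
A minimal sketch that exercises this pattern on a few inputs. Only "abac" is
from the text above; the other strings and the helper name are invented for
this illustration.

```perl
use strict;
use warnings;

# \1c can only succeed on a later trip round the + loop, after an earlier
# trip has matched (a)b and thereby set group 1.
sub forward_ref_in_loop {
    my ($s) = @_;
    return $s =~ /^(?:\1c|(a)b)+$/ ? 1 : 0;
}

print forward_ref_in_loop('abac'), "\n";   # 1: "ab" sets group 1, then "ac" matches \1c
print forward_ref_in_loop('ac'),   "\n";   # 0: group 1 is never set
```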

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1437
***************************************

