

Perl-Users Digest, Issue: 3794 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 3 14:55:52 2013

Date: Sat, 13 Oct 2012 02:17:11 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 13 Oct 2012     Volume: 11 Number: 3794

Today's topics:
    Re: Differential pattern match <ben@morrow.me.uk>
    Re: Differential pattern match <graham.stow@stowassocs.co.uk>
    Re: Differential pattern match <justin.1210@purestblue.com>
    Re: Differential pattern match <rweikusat@mssgmbh.com>
    Re: Differential pattern match <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 11 Oct 2012 22:24:39 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Differential pattern match
Message-Id: <n1qjk9-s091.ln1@anubis.morrow.me.uk>


Quoth "Graham" <graham.stow@stowassocs.co.uk>:
> "Ben Morrow" <ben@morrow.me.uk> wrote in message 
> news:ubi1k9-0u7.ln1@anubis.morrow.me.uk...
> >
> > If you want to pull out just the string numbers (ignoring the spaces)
> > you unfortunately can't use the {6} notation, because capturing
> > (bracketed) groups inside repetitions still only capture once. That
> > means it's probably easiest to build up the pattern programmatically,
> > like this:
> >
> >    my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';
> >    my $chord   = $string x 6;
> >
> >    my $text    = "999x 10  x";
> >    my @strings = $text =~ /$chord/x;
>
> Don't know if you're still about Ben, but pulling out the string numbers is 
> proving tricky. The following code (predominately yours)
> 
> $string = '(\b[12][0-9]\b|[0-9Xx])[]*';

That should be '[ ]*' at the end. [ ] is one of the ways of matching a
literal space when using /x; normally /x causes spaces to be ignored, so
you have to do something special to cause them to be matched. Of the
available alternatives:

    [ ]  \   \40  \x20  \N{SPACE}

I find [ ] easiest to read and remember.
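As a quick check of the list above, here is a minimal sketch that tries each spelling against the string "a b" under /x, with a bare space for contrast. The test string and the variable names are illustrative only. The `use charnames` line matters only on perls older than 5.16, which do not load \N{...} names automatically.

```perl
use strict;
use warnings;
use charnames ':full';   # only needed for \N{SPACE} on perls before 5.16

# The five spellings of a literal space that survive /x.
my @ways = (
    qr/a[ ]b/x,
    qr/a\ b/x,
    qr/a\40b/x,
    qr/a\x20b/x,
    qr/a\N{SPACE}b/x,
);

# grep in scalar context counts how many patterns matched.
my $hits = grep { "a b" =~ $_ } @ways;
print "$hits of ", scalar(@ways), " patterns match 'a b'\n";

# A bare space is ignored under /x, so this is really /ab/.
my $plain = ("a b" =~ /a b/x) ? 1 : 0;
print "bare space: ", ($plain ? "match" : "no match"), "\n";
```

All five escaped forms should match, while the bare-space pattern should not.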

Where is the 'my'? Are you using 'strict' and 'warnings'? If not, you
should be.

> $chord = $string*6;

That should be 'x', not '*': repetition, not multiplication.
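That also explains the 'Chord = 0' in the output: with '*' Perl treats $string as a number, and since it does not start with digits it numifies to 0, so 0 * 6 is 0. The later match is then just /0/x, which succeeds on the "0" of "10" without capturing anything. A minimal sketch of both operators follows; names other than $string and $chord are illustrative. It also shows that 'use warnings' would have flagged the bug, by catching the warning in a local __WARN__ handler instead of letting it print.

```perl
use strict;
use warnings;

my $string = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';

# 'x' is the repetition operator: six copies of $string, end to end.
my $chord = $string x 6;
print "chord is ", length($chord), " chars (6 x ", length($string), ")\n";

# '*' is numeric multiplication: $string numifies to 0, so the product is 0.
# 'use warnings' reports the non-numeric argument; the handler below
# collects that warning so it can be counted.
my @warned;
{
    local $SIG{__WARN__} = sub { push @warned, $_[0] };
    my $oops = $string * 6;
    print "oops = $oops\n";
}
print "warnings caught: ", scalar(@warned), "\n";
```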

> $text = "999x 10 x";
> @strings=$text=~/$chord/x;
> print "String = $string<br>\n";
> print "Chord = $chord<br>\n";
> print "Text = $text<br>\n";
> foreach $line(@strings) {
>  print "String = $line<br>\n";
> }
> 
> produces this output
> 
> String = (\b[12][0-9]\b|[0-9Xx])[]*
> Chord = 0
> Text = 999x 10 x
> String = 1

When printing out variables for debugging like this, I find it easier to
put [] around the value (print "Text = [$text]";) so you can easily see
if there are extraneous spaces on the end.

Ben
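Putting Ben's corrections together ('my', strict and warnings, 'x' instead of '*', '[ ]' instead of '[]', and brackets around printed values), Graham's test script might look like the sketch below. It uses Ben's original per-string pattern from the top of this message and plain "\n" instead of the "<br>" tags; the "Fret" label is just an illustrative choice.

```perl
use strict;
use warnings;

# One string of the chord: a two-digit fret (10-29) standing alone as a word,
# or a single character 1-9, X or x, then any number of literal spaces.
my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';
my $chord   = $string x 6;          # 'x' repeats, giving six capture groups

my $text    = "999x 10  x";
my @strings = $text =~ /$chord/x;   # list context returns the six captures

print "String = [$string]\n";       # [ ] around each value shows stray spaces
print "Text   = [$text]\n";
print "Fret   = [$_]\n" for @strings;
```

With this input the six captures should come out as 9, 9, 9, x, 10 and x, one per guitar string.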



------------------------------

Date: Fri, 12 Oct 2012 09:32:08 +0100
From: "Graham" <graham.stow@stowassocs.co.uk>
Subject: Re: Differential pattern match
Message-Id: <882dnYxtdtANSOrNnZ2dnUVZ7t-dnZ2d@bt.com>


"Ben Morrow" <ben@morrow.me.uk> wrote in message 
news:n1qjk9-s091.ln1@anubis.morrow.me.uk...
>
> Quoth "Graham" <graham.stow@stowassocs.co.uk>:
>> "Ben Morrow" <ben@morrow.me.uk> wrote in message
>> news:ubi1k9-0u7.ln1@anubis.morrow.me.uk...
>> >
>> > If you want to pull out just the string numbers (ignoring the spaces)
>> > you unfortunately can't use the {6} notation, because capturing
>> > (bracketed) groups inside repetitions still only capture once. That
>> > means it's probably easiest to build up the pattern programmatically,
>> > like this:
>> >
>> >    my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';
>> >    my $chord   = $string x 6;
>> >
>> >    my $text    = "999x 10  x";
>> >    my @strings = $text =~ /$chord/x;
>>
>> Don't know if you're still about Ben, but pulling out the string numbers 
>> is
>> proving tricky. The following code (predominately yours)
>>
>> $string = '(\b[12][0-9]\b|[0-9Xx])[]*';
>
> That should be '[ ]*' at the end. [ ] is one of the ways of matching a
> literal space when using /x; normally /x causes spaces to be ignored, so
> you have to do something special to cause them to be matched. Of the
> available alternatives:
>
>    [ ]  \   \40  \x20  \N{SPACE}
>
> I find [ ] easiest to read and remember.
>
> Where is the 'my'? Are you using 'strict' and 'warnings'? If not, you
> should be.
>
>> $chord = $string*6;
>
> That should be 'x', not '*': repetition, not multiplication.
>
>> $text = "999x 10 x";
>> @strings=$text=~/$chord/x;
>> print "String = $string<br>\n";
>> print "Chord = $chord<br>\n";
>> print "Text = $text<br>\n";
>> foreach $line(@strings) {
>>  print "String = $line<br>\n";
>> }
>>
>> produces this output
>>
>> String = (\b[12][0-9]\b|[0-9Xx])[]*
>> Chord = 0
>> Text = 999x 10 x
>> String = 1
>
> When printing out variables for debugging like this, I find it easier to
> put [] around the value (print "Text = [$text]";) so you can easily see
> if there are extraneous spaces on the end.
>
> Ben
>
Yes, I've gotten lazy and not been using 'strict' and 'warnings'. Apologies. 
Will backtrack and amend just as soon as I've sorted this problem (promise!)

OK, I now understand the repetition operator, but I need to modify the line
my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';
to incorporate an improved pattern match that Ben helped me develop. That
final pattern match was

/(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) [ ]{0,2} ){6}/x

I would have thought that I just remove the {6} and the cluster-only
parentheses from this and therefore the line becomes
$string = '(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}';

and the whole sequence to pull out the fret numbers on all 6 strings becomes

$string = '(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}';
$chord = $string x 6;
$text = "garbage 999x 10x garbage";
@strings=$text=~/$chord/x;
print "String = [$string]<br>\n";
print "Chord = [$chord]<br>\n";
print "Text = [$text]<br>\n";
foreach $line(@strings) {
 print "String = $line<br>\n";
}

However, this produces the output

String = [(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}]
Chord = [(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}]
Text = [garbage 999x 10x garbage]
String = 1

If I retain the cluster parentheses, I get a similar result.

String = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )]
Chord = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
[ ]{0,2} )]
Text = [garbage 999x 10x garbage]
String = 1

Any thoughts on where I'm going wrong?




------------------------------

Date: Fri, 12 Oct 2012 11:46:34 +0100
From: Justin C <justin.1210@purestblue.com>
Subject: Re: Differential pattern match
Message-Id: <a19lk9-usk.ln1@zem.masonsmusic.co.uk>

On 2012-10-12, Graham <graham.stow@stowassocs.co.uk> wrote:
> Yes, I've gotten lazy and not been using 'strict' and 'warnings'. Apologies. 
> Will backtrack and amend just as soon as I've sorted this problem (promise!)

No, do it now. Get valid code first and then work out
why it doesn't do what you want. It's much easier
that way than to bend invalid code to fit your
purpose and then try and make it valid.

   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Fri, 12 Oct 2012 12:11:08 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Differential pattern match
Message-Id: <874nm0oshf.fsf@sapphire.mobileactivedefense.com>

"Graham" <graham.stow@stowassocs.co.uk> writes:

[...]

>>> >    my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';

[...]

>>> String = (\b[12][0-9]\b|[0-9Xx])[]*

[...]

> my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';

[...]

> /(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) [ ]{0,2} ){6}/x

[...]

> $string = '(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}';

[...]

> String = [(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}]
> Chord = [(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}]

[...]

> String = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )]
> Chord = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )]

[...]

> Any thoughts on where I'm going wrong?

You're wrongly assuming that this sequence of expanding line noise
will 'magically' turn into sensible code which happens to produce the
desired result if you just keep adding more {@!!"~££$&&&!!```Honk!
Honk! Honk! --$%& to it. In the meantime, you could have implemented
a parser which actually solves your problem three times over, if you
weren't so hell-bent on "five hours of typing to save ten minutes of
thinking".




------------------------------

Date: Fri, 12 Oct 2012 16:55:23 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Differential pattern match
Message-Id: <b4rlk9-hci1.ln1@anubis.morrow.me.uk>


Quoth "Graham" <graham.stow@stowassocs.co.uk>:
> 
> OK, I now understand the repetition operator, but I need to modify the line
> my $string  = '(\b [12][0-9] \b | [1-9Xx]) [ ]*';

This pattern captures something.

> to incorporate an improved pattern match that Ben helped me develop. That
> final pattern match was
> 
> /(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) [ ]{0,2} ){6}/x
> 
> I would have thought that I just remove the {6} and the cluster only 
> parentheses from this and therefore the line becomes
> $string = '(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}';

This pattern doesn't. It also doesn't match anything like the same thing
as the pattern just above it. Why did you take out (all) the grouping
parens? They were there for a reason.

> and the whole sequence to pull out the fret numbers on all 6 strings becomes
<snip>
> 
> Chord = [(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}]

Look at that pattern. If you had made any attempt to write it readably
you might have written something like

    (?<=[Xx])
    | \b[12][0-9](?=[Xx])
    | \b
    | [xX0-9][ ]{0,2}(?<=[Xx])
    | \b[12][0-9](?=[Xx])
    | ...

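that is, the whole thing is one long sequence of alternations (because
you took the groups out), and one of those alternatives is just /\b/.
That matches at any word boundary, including the very start of your
text, so it will (almost) always match, immediately and with zero
length, and the pattern is telling you precisely nothing.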
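To see this concretely, here is a minimal sketch that runs the flattened pattern against Graham's test string and reports where the match landed, using Perl's @- and @+ match-offset arrays ($start, $end and @result are illustrative names).

```perl
use strict;
use warnings;

my $text = "garbage 999x 10x garbage";

# The pattern with all grouping removed. '|' binds more loosely than
# anything else, so six copies form one flat list of alternatives,
# one of which is a bare \b.
my $string = '(?<=[Xx])|\b[12][0-9](?=[Xx])|\b|[xX0-9][ ]{0,2}';
my $chord  = $string x 6;

my @result = $text =~ /$chord/x;
my ($start, $end) = ($-[0], $+[0]);   # where the overall match began and ended

print "result = (@result)\n";
print "match offsets: $start to $end\n";
```

The match should succeed at offset 0 with zero length: the first two alternatives fail before the "g" of "garbage", and the bare \b then matches there. With no capture groups, list context reports only the (1) that Graham kept seeing.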

> If I retain the cluster parentheses, I get a similar result.
> 
> String = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )]
> Chord = [(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )(?: (?: (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) 
> [ ]{0,2} )]

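That's a bit more like it. This will at least match the same text as the
(?:...){6} pattern above. However, it's still not capturing anything:
you need to add some plain () capturing parens around the bit of the
pattern you care about. (A successful list-context match with no
capturing groups returns the list (1), which is where your 'String = 1'
comes from.)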
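One way to do that, as a sketch: turn the inner (?: ... | ... ) of the per-string pattern into a plain capturing ( ... ), leave the outer (?: ... ) grouping alone, and repeat with 'x'. The second match uses {6} instead, to illustrate why the pattern has to be built programmatically: a quantified group is still a single group and keeps only its last match. Variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "garbage 999x 10x garbage";

# One string: a 10-29 fret bounded by a word boundary or an adjacent X,
# or a single fret/mute character. The plain ( ) now captures the fret;
# the outer (?: ) only groups it with the optional trailing spaces.
my $string = '(?: ( (?:(?<=[Xx])|\b) [12][0-9] (?:(?=[Xx])|\b) | [xX0-9] ) [ ]{0,2} )';

# Six copies give six separate capture groups, $1 .. $6.
my $chord   = $string x 6;
my @strings = $text =~ /$chord/x;
print "String = [$_]\n" for @strings;

# For comparison: the quantified group is one group, so it keeps
# only whatever it captured on its last repetition.
my @last = $text =~ /(?:$string){6}/x;
print "With {6}: (@last)\n";
```

On Graham's test string the first match should give 9, 9, 9, x, 10 and x, and the {6} version just x.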

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3794
***************************************

