[32151] in Perl-Users-Digest


Perl-Users Digest, Issue: 3416 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jun 15 18:09:22 2011

Date: Wed, 15 Jun 2011 15:09:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 15 Jun 2011     Volume: 11 Number: 3416

Today's topics:
    Re: Catastrophic regexp performance with grouping. <uri@StemSystems.com>
    Re: Catastrophic regexp performance with grouping. <alessandro.forghieri@gmail.com>
    Re: Catastrophic regexp performance with grouping. sln@netherlands.com
    Re: Catastrophic regexp performance with grouping. sln@netherlands.com
    Re: Catastrophic regexp performance with grouping. <uri@StemSystems.com>
    Re: FAQ 2.6 What modules and extensions are available f <brian.d.foy@gmail.com>
    Re: FAQ 2.6 What modules and extensions are available f <ralph@happydays.com>
    Re: How initialize an array to 0's? <tzz@lifelogs.com>
    Re: Regex Matching <dnlchen@gmail.com>
    Re: Regex Matching <uri@StemSystems.com>
    Re: Regex Matching <dnlchen@gmail.com>
    Re: Regexp help to understand <ken@swiss-soccer.net>
    Re: Regexp help to understand sln@netherlands.com
    Re: Regexp help to understand <nospampleasebutthisisvalid3@gmx.net>
    Re: Regexp help to understand <keithdlee2000@gmail.com>
    Re: Regexp help to understand <jimsgibson@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 15 Jun 2011 11:23:04 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Catastrophic regexp performance with grouping.
Message-Id: <871uyvf7af.fsf@quad.sysarch.com>

>>>>> "a" == alf  <alessandro.forghieri@gmail.com> writes:

  a> Greetings.
  a> Consider the following code (5.8.8, perl -V follows):
  a> my $strf;
  a> {
  a> local $/=undef;
  a> $strf=<$ARGV[0]>; 
  a> }
  a> $strf =~ s/(?:\s*\r?\n?)*\cL/\cL/gx;

there is no need for the /x there as you don't do any extended regex stuff.

  a> # $strf =~ s/(\s*\r?\n?)*\cL/\cL/g;

  a> When the second (commented) form of the regexp is used, it becomes
  a> about 300 times slower (less than 1 sec vs. 4 minutes on a 36894
  a> lines file). Now I know that grouping (which is not even needed in
  a> the above) pays a price, but this seems pathological...ridiculous
  a> even.

well your regex has some pathological problems that are exacerbated when
you grab instead of just group. the problem is the (\s*\r?\n?)*
part. note that each part of the inside is optional and so is the whole
group. this means perl will likely try massive numbers of combinations
to try those matches and it all matches against a null string. and for
each of those matches, it will copy its match to a buffer because it is
grabbed. this happens even if the later parts of the regex fail. that is
a hell of a lot of copying for no reason.

the (?: version doesn't do the copying so it is much faster but it still
has the bad regex which is nested optional stuff. that is a known
pathological problem (not just perl, but regexes in general). you should
never do nested null matches like that. just changing some of the * to +
may fix it. i don't know the data you are trying to match so i can't be
more specific.

the clause is looking for whitespace followed by optional \r\n endings.
but that whole part can be matched infinite numbers of times. the
question is will you really ever see a null string before your \cL
parts? if not, fix the regex so you make a better match (which could be
optional) and the pathological problem will go away. even 1 second for
that match is slow unless you are scanning a ton of data which means it
is burning cpu there as well.
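
A minimal sketch of the "change * to +" idea, assuming the goal is simply to drop any run of whitespace or line endings in front of each form feed. The sample string is made up for illustration. Note that \s also matches the form feed itself, so consecutive form feeds collapse into one, which the original pattern would also do.

```perl
use strict;
use warnings;

# Hypothetical sample data: two invoices, each followed by blank lines and a form feed.
my $text = "invoice 1\r\n  \r\n\r\n\cLinvoice 2\n\n\t\n\cL";

# A single \s+ has no optional pieces nested inside another quantifier,
# so there is only one way to split up a given run of whitespace.
(my $clean = $text) =~ s/\s+\cL/\cL/g;

print $clean eq "invoice 1\cLinvoice 2\cL" ? "ok\n" : "not ok\n";
```

The flat quantifier still backtracks when a whitespace run is not followed by a form feed, but it no longer has many equivalent ways to divide that run between nested optional parts.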

  a> Version details follow:

you generally don't need to post perl -V. -v will do fine. in this case
it isn't relevant.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 15 Jun 2011 09:30:42 -0700 (PDT)
From: alf <alessandro.forghieri@gmail.com>
Subject: Re: Catastrophic regexp performance with grouping.
Message-Id: <5cde4b3f-401f-42f7-9021-f1b6c46bd6fb@glegroupsg2000goo.googlegroups.com>

Greetings.
On Wednesday, June 15, 2011 5:23:04 PM UTC+2, Uri Guttman wrote:

> there is no need for the /x there as you don't do any extended regex stuff.

Uh. I thought that (?: constructs were in the /x province.


[...]
> try massive numbers of combinations
> to try those matches and it all matches against a null string. and for
> each of those matches, it will copy its match to a buffer because it is
> grabbed. this happens even if the later parts of the regex fail. that is
> a hell of a lot of copying for no reason.

I see what you are saying. The data itself are \cL (<FF>) separated invoices
(the whole shebang is part of a custom cups filter) and each of them has an
unspecified number of intervening blank lines. Sometimes there is whitespace,
sometimes it ain't there - go figure. Hence the laziness in the regexp.
But, I thought that the final \cL was enough to anchor it - obviously, I
was mistaken.

Cheers,
alf


------------------------------

Date: Wed, 15 Jun 2011 09:57:50 -0700
From: sln@netherlands.com
Subject: Re: Catastrophic regexp performance with grouping.
Message-Id: <bsohv6pugirqdsli1e2ae65gkm3181jsma@4ax.com>

On Wed, 15 Jun 2011 09:30:42 -0700 (PDT), alf <alessandro.forghieri@gmail.com> wrote:

>Greetings.
>On Wednesday, June 15, 2011 5:23:04 PM UTC+2, Uri Guttman wrote:
>
>> there is no need for the /x there as you don't do any extended regex stuff.
>
>Uh. I thought that (?: constructs were in the /x province.
>
>
>[...]
>> try massive numbers of combinations
>> to try those matches and it all matches against a null string. and for
>> each of those matches, it will copy its match to a buffer because it is
>> grabbed. this happens even if the later parts of the regex fail. that is
>> a hell of a lot of copying for no reason. 
>
>I see what you are saying. The data itself are \cL (<FF>) separated invoices
> (the whole shebang is part of a custom cups filter) and each of them has an
> unspecified number of intervening blank lines. Sometimes there is whitespace,
> sometimes it ain't there - go figure. Hence the laziness in the regexp.
> But, I thought that the final \cL was enough to anchor it - obviously, I was mistaken. 
>

Apart from (*)* , the problem is that this   (?:\s*\r?\n?)*  actually specifies
an order of appearance, a sequence. The engine is trying to match that sequence
in stages of backtracking.

If the order is unspecified, just use a character class instead:
[\s\r\n]*

-sln


------------------------------

Date: Wed, 15 Jun 2011 10:00:45 -0700
From: sln@netherlands.com
Subject: Re: Catastrophic regexp performance with grouping.
Message-Id: <18phv65cs1mlb3ath0h3rg3rhdddsug63p@4ax.com>

On Wed, 15 Jun 2011 09:57:50 -0700, sln@netherlands.com wrote:

>On Wed, 15 Jun 2011 09:30:42 -0700 (PDT), alf <alessandro.forghieri@gmail.com> wrote:
>
>>Greetings.
>>On Wednesday, June 15, 2011 5:23:04 PM UTC+2, Uri Guttman wrote:
>>
>>> there is no need for the /x there as you don't do any extended regex stuff.
>>
>>Uh. I thought that (?: constructs were in the /x province.
>>
>>
>>[...]
>>> try massive numbers of combinations
>>> to try those matches and it all matches against a null string. and for
>>> each of those matches, it will copy its match to a buffer because it is
>>> grabbed. this happens even if the later parts of the regex fail. that is
>>> a hell of a lot of copying for no reason. 
>>
>>I see what you are saying. The data itself are \cL (<FF>) separated invoices
>> (the whole shebang is part of a custom cups filter) and each of them has an
>> unspecified number of intervening blank lines. Sometimes there is whitespace,
>> sometimes it ain't there - go figure. Hence the laziness in the regexp.
>> But, I thought that the final \cL was enough to anchor it - obviously, I was mistaken. 
>>
>
>Apart from (*)* , the problem is that this   (?:\s*\r?\n?)*  actually specifies
>an order of appearance, a sequence. The engine is trying to match that sequence
>in stages of backtracking.
>
>If the order is unspecified, just use a character class instead:
>[\s\r\n]*

Doh, but \s* would do instead wouldn't it?
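
A quick way to check that point is to ask which characters \s matches on its own. This is only a sketch; the %name table is a made-up helper for readable output.

```perl
use strict;
use warnings;

# Hypothetical lookup table so the result prints as names rather than control characters.
my %name = ("\r" => 'CR', "\n" => 'LF', "\t" => 'TAB', " " => 'SPACE', "\f" => 'FF');

# Keep only the characters that \s matches by itself, then list their names.
my @matched = sort map { $name{$_} } grep { /\A\s\z/ } keys %name;

print "@matched\n";
```

All five names are printed, so \r and \n add nothing to a class that already contains \s. The form feed is in the list too, which matters when \s sits directly in front of \cL.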

-sln


------------------------------

Date: Wed, 15 Jun 2011 14:10:57 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Catastrophic regexp performance with grouping.
Message-Id: <87ei2vc6dq.fsf@quad.sysarch.com>

>>>>> "a" == alf  <alessandro.forghieri@gmail.com> writes:

  a> Greetings.
  a> On Wednesday, June 15, 2011 5:23:04 PM UTC+2, Uri Guttman wrote:

  >> there is no need for the /x there as you don't do any extended regex stuff.

  a> Uh. I thought that (?: constructs were in the /x province.

nope. /x allows whitespace, comments on lines with # and some other
things. (?:) is in basic regexes.
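
A small sketch of the difference, using a made-up input string: the same non-capturing group written once without /x and once with it.

```perl
use strict;
use warnings;

my $s = "abab-cd";   # hypothetical input

# (?:...) groups without capturing, with or without /x.
my ($plain) = $s =~ /((?:ab)+)-/;

# /x only changes how the pattern is read: whitespace is ignored
# and # starts a comment that runs to the end of the line.
my ($extended) = $s =~ /
    ( (?: ab )+ )   # one or more "ab", captured as a whole
    -               # followed by a literal hyphen
/x;

print "$plain $extended\n";
```

Both captures hold the same text, so /x is a layout choice here and nothing more.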


  a> [...]
  >> try massive numbers of combinations
  >> to try those matches and it all matches against a null string. and for
  >> each of those matches, it will copy its match to a buffer because it is
  >> grabbed. this happens even if the later parts of the regex fail. that is
  >> a hell of a lot of copying for no reason. 

  a> I see what you are saying. The data itself are \cL (<FF>) separated
  a> invoices (the whole shebang is part of a custom cups filter) and
  a> each of them has an unspecified number of intervening blank
  a> lines. Sometimes there is whitespace, sometimes it ain't there - go
  a> figure. Hence the laziness in the regexp. But, I thought that the
  a> final \cL was enough to anchor it - obviously, I was mistaken.

and \f is formfeed iirc. much easier to read than \cL as i don't know
the control code of that anymore (obviously now i know it is L).

the issue isn't anchoring, it is repeated nesting of a null match. how
can perl decide where and when to stop the match?

if you have blank lines, then you have real \r\n there so by dropping
the ? you will make the regex work well. it would be rare to expect
either \r or \n or both so use the correct one and make them expected
and then make that part optional with the * quantifier. the leading
whitespace can then stay with *.
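
One way to read that advice, as a sketch only: require a real line ending in every pass through the group so the group can never match the empty string. Two assumptions here are not from the thread. [ \t]* stands in for the leading whitespace so it cannot swallow the line ending, and + is used instead of * so that a bare form feed is left untouched. The sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical sample: blank lines (some holding spaces or tabs) before each form feed.
my $text = "invoice 1\r\n  \r\n\r\n\finvoice 2\n\t\n\f";

# Every iteration must consume a \n (optionally preceded by \r),
# so no iteration can succeed on a null string.
(my $clean = $text) =~ s/(?:[ \t]*\r?\n)+\f/\f/g;

print $clean eq "invoice 1\finvoice 2\f" ? "ok\n" : "not ok\n";
```

If the files only ever use one kind of line ending, the \r? could be replaced by the exact sequence expected.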

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 15 Jun 2011 19:23:16 +0200
From: brian d foy <brian.d.foy@gmail.com>
Subject: Re: FAQ 2.6 What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/... mean?
Message-Id: <150620111923165639%brian.d.foy@gmail.com>

In article <c24b0$4df8e909$ce534406$18875@news.eurofeeds.com>, Ralph
Malph <ralph@happydays.com> wrote:

> Putting dates in the faq like this invariably makes the faq look dated
> unless the date and related info is kept updated.

True, but not putting the dates in there makes it look up-to-date when
it is not.


------------------------------

Date: Wed, 15 Jun 2011 13:16:57 -0400
From: Ralph Malph <ralph@happydays.com>
Subject: Re: FAQ 2.6 What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/... mean?
Message-Id: <c24b0$4df8e909$ce534406$18875@news.eurofeeds.com>

Putting dates in the faq like this invariably makes the faq look dated
unless the date and related info is kept updated.
5 years is kind of a long time.
Maybe just give an estimate to the nearest order of magnitude like McDonalds
does ("billions and billions") and be done with it.

>      Considering that, as of 2006, there are over ten thousand existing
>      modules in the archive, one probably exists to do nearly anything you
>      can think of. Current categories under "CPAN/modules/by-category/"
>      include Perl core modules; development support; operating system
>      interfaces; networking, devices, and interprocess communication; data
>      type utilities; database interfaces; user interfaces; interfaces to
>      other languages; filenames, file systems, and file locking;
>      internationalization and locale; world wide web support; server and
>      daemon utilities; archiving and compression; image manipulation; mail
>      and news; control flow utilities; filehandle and I/O; Microsoft Windows
>      modules; and miscellaneous modules.




------------------------------

Date: Wed, 15 Jun 2011 09:22:41 -0500
From: Ted Zlatanov <tzz@lifelogs.com>
Subject: Re: How initialize an array to 0's?
Message-Id: <87aadjuqby.fsf@lifelogs.com>

On Wed, 15 Jun 2011 07:53:19 +0000 (UTC) Willem <willem@toad.stack.nl> wrote: 

W> sln@netherlands.com wrote:
W> ) On Tue, 14 Jun 2011 14:47:55 -0500, Ted Zlatanov <tzz@lifelogs.com> wrote:
W> )
W> )>On Tue, 14 Jun 2011 12:03:52 -0700 sln@netherlands.com wrote: 
W> )>
W> )>s> @array[0 .. $jornadas] = eval "[(0) x ($partidos+1)]," x ($jornadas+1);
W> )>
W> )>Using eval() is almost always a bad idea (unless you're writing Tcl, the
W> )>ultimate eval() abuser).  It indicates all other options have failed.
W> )>
W> )>map() is the right solution here.  For once :)
W> )>
W> )
W> ) Definitely agree. 'eval' should have been named evil()

W> And the block-version of 'eval' ?  I use that all the time.

Sorry, I should have said "string eval" but I really don't think of the
other one as an eval.  It's more of a try/finally exception handler.
IMO it's badly named.
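
A sketch of both points, with made-up sizes standing in for $jornadas and $partidos from the quoted line: map builds the zero-filled rows without any string eval, and block eval is used purely to trap a runtime error.

```perl
use strict;
use warnings;

# Hypothetical sizes; the real values are not given in the thread.
my ($jornadas, $partidos) = (2, 3);

# map creates a fresh anonymous array of zeros for every row.
my @array = map { [ (0) x ($partidos + 1) ] } 0 .. $jornadas;

# Block eval is compiled with the rest of the program and only traps
# runtime errors; the message ends up in $@.
my $result = eval { 10 / $array[0][0] };
my $error  = $@;

print scalar(@array), " rows of ", scalar(@{ $array[0] }), "\n";
print "caught: $error" unless defined $result;
```

Because each row comes from its own [ ... ], changing one row does not affect the others.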

Ted


------------------------------

Date: Wed, 15 Jun 2011 13:17:54 -0700 (PDT)
From: DanielC <dnlchen@gmail.com>
Subject: Re: Regex Matching
Message-Id: <89c5feb8-a249-460a-9959-dba9f9e5adb9@35g2000prp.googlegroups.com>

On Jun 13, 5:22 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> DanielC <dnlc...@gmail.com> wrote:
> > On Jun 12, 3:39 pm, s...@netherlands.com wrote:
> >>   my $field_sep = '(?:\b.*\b)';
> > Is ?: a ternary operator?
>
> Yes.
>
> But that is not ?:
>
> That is (?:  ...  )
>
> Which, as with all regex constructs, is documented in perlre.pod
>
>     =item C<(?:pattern)>
>
>     This is for clustering, not capturing; it groups subexpressions like
>     "()", but doesn't make backreferences as "()" does.
>
> > How come <print @found. " lines:\n"> gives the number of elements in
> > an array?
>
> Because the name of an array gives the number of elements
> when used in a scalar context like that.
>
> http://stackoverflow.com/questions/4802541/scalar-and-list-context-in...
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
> The above message is a Usenet post.
> I don't recall having given anyone permission to use it on a Web site.

Is this being used often by Perl devs? I asked some of my colleagues who
are Perl devs and they don't know this and even they don't use
reference very much.


------------------------------

Date: Wed, 15 Jun 2011 16:24:22 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Regex Matching
Message-Id: <87fwnaalmx.fsf@quad.sysarch.com>

>>>>> "D" == DanielC  <dnlchen@gmail.com> writes:

  D> On Jun 13, 5:22 pm, Tad McClellan <ta...@seesig.invalid> wrote:
  >> DanielC <dnlc...@gmail.com> wrote:
  >> > On Jun 12, 3:39 pm, s...@netherlands.com wrote:
  >> >>   my $field_sep = '(?:\b.*\b)';
  >> > Is ?: a ternary operator?
  >> 
  >> Yes.
  >> 
  >> But that is not ?:
  >> 
  >> That is (?:  ...  )
  >> 
  >> Which, as with all regex constructs, is documented in perlre.pod
  >> 
  >>     =item C<(?:pattern)>
  >> 
  >>     This is for clustering, not capturing; it groups subexpressions like
  >>     "()", but doesn't make backreferences as "()" does.
  >> 
  >> > How come <print @found. " lines:\n"> gives the number of elements in
  >> > an array?
  >> 
  >> Because the name of an array gives the number of elements
  >> when used in a scalar context like that.
  >> 
  >> http://stackoverflow.com/questions/4802541/scalar-and-list-context-in...
  >> 
  >> --
  >> Tad McClellan
  >> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
  >> The above message is a Usenet post.
  >> I don't recall having given anyone permission to use it on a Web site.

  D> Is this being used often by Perl devs? I asked some of my colleagues who
  D> are Perl devs and they don't know this and even they don't use
  D> reference very much.

it helps if you are explicit about 'this'. do you mean ternary operator?
do you mean (?:) in regexes? or was it @array in scalar context? what do
you mean by "don't use reference very much"? the word "reference" wasn't
mentioned in the quoted post (backreferences was).

and if they are perl devs and don't use any of those features, they
aren't very good perl devs. those are basic useful things that should be
in the toolbox of any decent perl developer.
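
For reference, a short sketch of the three features in question, using invented sample data: an array in scalar context, the ternary operator, and a non-capturing group.

```perl
use strict;
use warnings;

my @found = ('alpha', 'beta', 'gamma');   # hypothetical sample data

# The . operator imposes scalar context, so @found yields its element count.
my $header = @found . " lines:\n";

# ?: outside a regex is the ternary (conditional) operator.
my $size = @found > 2 ? 'many' : 'few';

# (?:...) inside a regex groups without capturing, so (\w+) is the only capture.
my ($word) = "foofoo bar" =~ /(?:foo)+\s(\w+)/;

print $header, "$size $word\n";
```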

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Wed, 15 Jun 2011 13:56:50 -0700 (PDT)
From: DanielC <dnlchen@gmail.com>
Subject: Re: Regex Matching
Message-Id: <e0228b20-fbf5-4ab2-96ed-eabfff772aa4@x38g2000pri.googlegroups.com>

On Jun 15, 1:24 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
> >>>>> "D" == DanielC  <dnlc...@gmail.com> writes:
>
>   D> On Jun 13, 5:22 pm, Tad McClellan <ta...@seesig.invalid> wrote:
>   >> DanielC <dnlc...@gmail.com> wrote:
>   >> > On Jun 12, 3:39 pm, s...@netherlands.com wrote:
>   >> >>   my $field_sep = '(?:\b.*\b)';
>   >> > Is ?: a ternary operator?
>   >>
>   >> Yes.
>   >>
>   >> But that is not ?:
>   >>
>   >> That is (?:  ...  )
>   >>
>   >> Which, as with all regex constructs, is documented in perlre.pod
>   >>
>   >>     =item C<(?:pattern)>
>   >>
>   >>     This is for clustering, not capturing; it groups subexpressions like
>   >>     "()", but doesn't make backreferences as "()" does.
>   >>
>   >> > How come <print @found. " lines:\n"> gives the number of elements in
>   >> > an array?
>   >>
>   >> Because the name of an array gives the number of elements
>   >> when used in a scalar context like that.
>   >>
>   >> http://stackoverflow.com/questions/4802541/scalar-and-list-context-in...
>   >>
>   >> --
>   >> Tad McClellan
>   >> email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
>   >> The above message is a Usenet post.
>   >> I don't recall having given anyone permission to use it on a Web site.
>
>   D> Is this being used often by Perl devs? I asked some of my colleagues who
>   D> are Perl devs and they don't know this and even they don't use
>   D> reference very much.
>
> it helps if you are explicit about 'this'. do you mean ternary operator?
> do you mean (?:) in regexes? or was it @array in scalar context? what do
> you mean by "don't use reference very much"? the word "reference" wasn't
> mentioned in the quoted post (backreferences was).
>
> and if they are perl devs and don't use any of those features, they
> aren't very good perl devs. those are basic useful things that should be
> in the toolbox of any decent perl developer.
>
> uri
>
> --
> Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
> -----  Perl Code Review , Architecture, Development, Training, Support ------
> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

Sorry, I'm not a Perl Dev and I just use Perl like C. I meant "(?:) in
regexes" and "reference".

Thanks all of you!


------------------------------

Date: Wed, 15 Jun 2011 11:55:26 -0400
From: Ken Butler <ken@swiss-soccer.net>
Subject: Re: Regexp help to understand
Message-Id: <alpine.DEB.2.00.1106151142440.16805@ken-laptop>



On Wed, 15 Jun 2011, mike wrote:

> I am trying to use regexp and grouping to extract numbers.
>
> I have the following string:
>
> X-ADF_BASE_Z32B_APP_NO4.20.1_SE
>
> And I am trying the following regexp:
>
> match types like  4.1, 4.11, 4.1.1, 4.11.1
> 	my ($firstpos, $secondpos, $thirdpos) = /(\d)\.(\d+)\.*(\d*)/;

> 		if ( scalar(@highest) == 0 ) {
>
> 			print $firstpos;
> 			print $secondpos;
> 			print $thirdpos;

First confusing thing: "print" doesn't come with a trailing newline, so 
the values for first, second and third all get smooshed together.

print "$firstpos $secondpos $thirdpos\n";

would print them all on one line with spaces between, with (explicit) 
newline at the end.

> The print becomes:
>
> 42014
> 20
> 1
>
> Can anyone be nice to help me and explain what I am doing wrong.
>
> I was expecting:
>
> 4
> 20
> 1

Because your final dot and last digit(s) are optional, there is a 
potential confusion: something like 4.20 can get matched, not as 4 20 as 
you expect, but as 4 2 0: first the 4 gets matched, then the dot, and then 
what gets matched is the 2 as "1 or more digits", nothing as the "zero or 
more dots", and then the 0 as "0 or more digits". You want to ensure that 
if the second dot doesn't match, you don't go looking for another group of 
digits.

Cheers,
Ken.



------------------------------

Date: Wed, 15 Jun 2011 09:17:30 -0700
From: sln@netherlands.com
Subject: Re: Regexp help to understand
Message-Id: <fgmhv6tbhpaon3536mnj3ca7k8q4kjjekk@4ax.com>

On Wed, 15 Jun 2011 11:55:26 -0400, Ken Butler <ken@swiss-soccer.net> wrote:

>
>
>On Wed, 15 Jun 2011, mike wrote:
>
>> I am trying to use regexp and grouping to extract numbers.
>>
>> I have the following string:
>>
>> X-ADF_BASE_Z32B_APP_NO4.20.1_SE
>>
>> And I am trying the following regexp:
>>
>> match types like  4.1, 4.11, 4.1.1, 4.11.1
>> 	my ($firstpos, $secondpos, $thirdpos) = /(\d)\.(\d+)\.*(\d*)/;
>
>> 		if ( scalar(@highest) == 0 ) {
>>
>> 			print $firstpos;
>> 			print $secondpos;
>> 			print $thirdpos;
>
[snip]
>
>Because your final dot and last digit(s) are optional, there is a 
>potential confusion: something like 4.20 can get matched, not as 4 20 as 
>you expect, but as 4 2 0: first the 4 gets matched, then the dot, and then 
>what gets matched is the 2 as "1 or more digits", nothing as the "zero or 
>more dots", and then the 0 as "0 or more digits". You want to ensure that 
>if the second dot doesn't match, you don't go looking for another group of 
>digits.
>

I think (\d+) will greedily match all the digits since
everything after it is optional. It's the engine's shortest path to success.
You would be right if it had been (\d+?)
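
A small sketch comparing the two quantifiers on the string from the original question. The only change between the patterns is (\d+) versus (\d+?).

```perl
use strict;
use warnings;

my $s = "X-ADF_BASE_Z32B_APP_NO4.20.1_SE";

# Greedy: (\d+) takes every digit it can before the optional parts are tried.
my @greedy = $s =~ /(\d)\.(\d+)\.*(\d*)/;

# Non-greedy: (\d+?) stops after one digit because the rest can match anyway.
my @lazy = $s =~ /(\d)\.(\d+?)\.*(\d*)/;

print "greedy: @greedy\n";   # greedy: 4 20 1
print "lazy:   @lazy\n";     # lazy:   4 2 0
```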

-sln


------------------------------

Date: Wed, 15 Jun 2011 20:56:37 +0200
From: Wolf Behrenhoff <nospampleasebutthisisvalid3@gmx.net>
Subject: Re: Regexp help to understand
Message-Id: <4df90065$0$6557$9b4e6d93@newsspool4.arcor-online.net>

On 15.06.2011 09:29, mike wrote:
> Hi,
> 
> I am trying to use regexp and grouping to extract numbers.
> 
> I have the following string:
> X-ADF_BASE_Z32B_APP_NO4.20.1_SE
> 
> And I am trying the following regexp:
> 	my ($firstpos, $secondpos, $thirdpos) = /(\d)\.(\d+)\.*(\d*)/;
> 
> Can anyone be nice to help me and explain what I am doing wrong.
> 
> I was expecting:
> 
> 4
> 20
> 1

Please post a full _minimal_ program so that we can reproduce the problem.

Running this program
$_="X-ADF_BASE_Z32B_APP_NO4.20.1_SE";
/(\d)\.(\d+)\.*(\d*)/ and print "$1, $2, $3\n";

outputs
4, 20, 1

just as expected. (Are you sure you also want to match  "4.5.....666"
and get out 4, 5, 666?)
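
If runs of dots should not be accepted, one possible tightening is to make the dot and the third number optional together. This is a sketch, not the original poster's requirement; parse_version is a hypothetical helper name.

```perl
use strict;
use warnings;

# The third group only participates when exactly one dot introduces it.
my $version_re = qr/(\d)\.(\d+)(?:\.(\d+))?/;

# Hypothetical helper: returns the three captures, with undef for a missing third part.
sub parse_version {
    my ($s) = @_;
    return $s =~ $version_re;
}

for my $s ("X-ADF_BASE_Z32B_APP_NO4.20.1_SE", "4.11", "4.5.....666") {
    my @parts = parse_version($s);
    print join(", ", map { defined $_ ? $_ : "(none)" } @parts), "\n";
}
```

With this pattern the last input yields 4 and 5 only, and the 666 after the run of dots is ignored.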

- Wolf


------------------------------

Date: Wed, 15 Jun 2011 19:32:29 +0000 (UTC)
From: Keith <keithdlee2000@gmail.com>
Subject: Re: Regexp help to understand
Message-Id: <itb1cd$icd$1@dont-email.me>

Wolf:
 Just curious; but, how would one also read the 32 in Z32B?

Keith


------------------------------

Date: Wed, 15 Jun 2011 14:54:19 -0700
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: Regexp help to understand
Message-Id: <150620111454198594%jimsgibson@gmail.com>

In article <itb1cd$icd$1@dont-email.me>, Keith
<keithdlee2000@gmail.com> wrote:

> Wolf:
>  Just curious; but, how would one also read the 32 in Z32B?

You can use the regular expression /(\d+)/ to get the first set of
consecutive numerical digits in a string.

You can use /([\d.]+)/ to get the first set of digits that may or may
not include a period.

You can use /(\d+\.\d+)/ to get the first set of digits that contain
exactly one embedded period with at least one digit on either side.

You can use /([\d.]+)/g to extract all sets of digits that may contain
a period, including at the beginning or end of the substring, then test
each extracted substring to see if it matches your desired format.
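
A sketch of that last approach applied to the string from the thread. The filter pattern used to decide what counts as a version number is an assumption for illustration.

```perl
use strict;
use warnings;

my $s = "X-ADF_BASE_Z32B_APP_NO4.20.1_SE";

# First run of consecutive digits: this is the 32 in Z32B.
my ($first_digits) = $s =~ /(\d+)/;

# Every run of digits and periods, in order of appearance.
my @candidates = $s =~ /([\d.]+)/g;

# Keep only candidates shaped like dotted numbers (assumed format: N.N or N.N.N ...).
my @versions = grep { /\A\d+(?:\.\d+)+\z/ } @candidates;

print "first digits: $first_digits\n";
print "candidates:   @candidates\n";
print "versions:     @versions\n";
```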

-- 
Jim Gibson


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3416
***************************************

