[28871] in Perl-Users-Digest


Perl-Users Digest, Issue: 115 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 9 23:01:35 2007

Date: Fri, 9 Feb 2007 20:00:42 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 9 Feb 2007     Volume: 11 Number: 115

Today's topics:
        Non-greedy matching problem jason.yfho@gmail.com
    Re: Non-greedy matching problem <1usa@llenroc.ude.invalid>
    Re: Non-greedy matching problem <dave.slayton@gmail.com>
    Re: Non-greedy matching problem <kenslaterpa@hotmail.com>
    Re: Non-greedy matching problem <rvtol+news@isolution.nl>
    Re: Non-greedy matching problem <evad.notyals@liamg.moc>
    Re: Non-greedy matching problem <nobull67@gmail.com>
    Re: Non-greedy matching problem jason.yfho@gmail.com
    Re: Non-greedy matching problem jason.yfho@gmail.com
    Re: Non-greedy matching problem <nobull67@gmail.com>
    Re: Non-greedy matching problem <hjp-usenet2@hjp.at>
    Re: Non-greedy matching problem <1usa@llenroc.ude.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 3 Feb 2007 07:36:53 -0800
From: jason.yfho@gmail.com
Subject: Non-greedy matching problem
Message-Id: <1170517013.074401.141580@v45g2000cwv.googlegroups.com>

Hello!

I want to get the shortest string that starts with "i" and ends with "s"
from the string "iiiidssss" using a regular expression, that is, "ids".
Any idea? My result is not non-greedy enough.

$text = "iiiidssss";
$text =~ m/(i.+?s)/;
$1 is "iiiids", but I want to get "ids". How?

Thank you!

Rgds,
Jason



------------------------------

Date: Sat, 03 Feb 2007 16:29:10 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Non-greedy matching problem
Message-Id: <Xns98CC751BFB399asu1cornelledu@127.0.0.1>

jason.yfho@gmail.com wrote in news:1170517013.074401.141580
@v45g2000cwv.googlegroups.com:

> I want to get the shortest length of string that starts with "i" and
> ends with "s" from string "iiiidssss" using regular expression, that
> is "id"s". Any idea? Mine result is not non-greddy enough.
> 
> $text = "iiiidssss";
> $text =~ m/(i.+?s)/;
> $1 is "iiiids", but I want to get "ids". How?
> 

It seems a perfect match for index and rindex:

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'iiiidssss';

my $start = rindex $s, 'i';
my $end   = index  $s, 's';

if ( $start > -1 and $start < $end ) {
    print substr( $s, $start, $end - $start + 1), "\n";
}

__END__

A possible regex solution:

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'iiiidssss';

if ( $s =~ /i*(i[^s]*s)s*/ ) {
    print "$1\n";
}

__END__

-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: Sat, 3 Feb 2007 09:43:14 -0700
From: "Dave Slayton" <dave.slayton@gmail.com>
Subject: Re: Non-greedy matching problem
Message-Id: <8f-dnXeR7uQAJlnYnZ2dnUVZ_qOpnZ2d@comcast.com>

how about this instead?
$text =~ /(i[^is]*?s)/;

now you're getting an 'i', followed by the minimum possible number of 
anything that's not an 'i' or 's', followed by an 's'....


<jason.yfho@gmail.com> wrote in message 
news:1170517013.074401.141580@v45g2000cwv.googlegroups.com...
> Hello!
>
> I want to get the shortest length of string that starts with "i" and
> ends with "s" from string "iiiidssss" using regular expression, that
> is "id"s". Any idea? Mine result is not non-greddy enough.
>
> $text = "iiiidssss";
> $text =~ m/(i.+?s)/;
> $1 is "iiiids", but I want to get "ids". How?
>
> Thank you!
>
> Rgds,
> Jason
> 




------------------------------

Date: 3 Feb 2007 10:07:56 -0800
From: "kens" <kenslaterpa@hotmail.com>
Subject: Re: Non-greedy matching problem
Message-Id: <1170526076.781972.32350@p10g2000cwp.googlegroups.com>

On Feb 3, 11:43 am, "Dave Slayton" <dave.slay...@gmail.com> wrote:
> how about this instead?
> $text =~ /(i[^is]*?s)/;
>
> now you're getting an 'i', followed by the minimum possible number of
> anything that's not an 'i' or 's', followed by an 's'....
>
> <jason.y...@gmail.com> wrote in message
>
> news:1170517013.074401.141580@v45g2000cwv.googlegroups.com...
>
> > Hello!
>
> > I want to get the shortest length of string that starts with "i" and
> > ends with "s" from string "iiiidssss" using regular expression, that
> > is "id"s". Any idea? Mine result is not non-greddy enough.
>
> > $text = "iiiidssss";
> > $text =~ m/(i.+?s)/;
> > $1 is "iiiids", but I want to get "ids". How?
>
> > Thank you!
>
> > Rgds,
> > Jason

Please do not top-post - makes it harder to follow the conversation.

A minor point - the question mark (?) is not necessary in your regular
expression
>  $text =~ /(i[^is]*?s)/;
If 's' is not included in the character class, it is needed:
 $text =~ /(i[^i]*?s)/;
Ken
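
A small sketch of Ken's point, run against the OP's string. The variable
names are only for illustration; nothing here comes from the posts beyond
the patterns themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'iiiidssss';

# 's' is excluded by the class, so both forms have to stop at the first 's'.
my ($lazy_is)   = $text =~ /(i[^is]*?s)/;
my ($greedy_is) = $text =~ /(i[^is]*s)/;

# 's' is allowed by the class, so the quantifier decides where the match ends.
my ($lazy_i)    = $text =~ /(i[^i]*?s)/;
my ($greedy_i)  = $text =~ /(i[^i]*s)/;

print "$lazy_is $greedy_is $lazy_i $greedy_i\n";   # ids ids ids idssss
```

Only the last pattern changes its result when the question mark is dropped.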




------------------------------

Date: Sat, 3 Feb 2007 19:03:56 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Non-greedy matching problem
Message-Id: <eq2meo.io.1@news.isolution.nl>

Dave Slayton schreef:

> jason.yfho@gmail.com:

>> I want to get the shortest length of string that starts with "i" and
>> ends with "s" from string "iiiidssss" using regular expression, that
>> is "id"s".
>
> how about this instead?
> $text =~ /(i[^is]*?s)/;
>
> now you're getting an 'i', followed by the minimum possible number of
> anything that's not an 'i' or 's', followed by an 's'....

The greediness is of no importance here, so /(i[^is]*s)/ is better, or
/(i[^is]+s)/ if at least one character should be between i and s.

If more than one of such string can be in the main string, you'll need
to capture them all, then sort on length and take the shortest.

perl -Mstrict -wle'
  print +(sort {length $a <=> length $b}
      /i[^is]+s/g)[0]
          for @ARGV;
' iiiiabcdssssiiiiabssssiiiiabcssss isiasiabs
iabs
ias

-- 
Affijn, Ruud

"Gewoon is een tijger."



------------------------------

Date: Sat, 3 Feb 2007 11:19:35 -0700
From: "Dave Slayton" <evad.notyals@liamg.moc>
Subject: Re: Non-greedy matching problem
Message-Id: <GumdnXs1LNGvT1nYnZ2dnUVZ_vamnZ2d@comcast.com>

"kens" <kenslaterpa@hotmail.com> wrote in message 
news:1170526076.781972.32350@p10g2000cwp.googlegroups.com...
> On Feb 3, 11:43 am, "Dave Slayton" <dave.slay...@gmail.com> wrote:
>> how about this instead?
>> $text =~ /(i[^is]*?s)/;
>>
>> now you're getting an 'i', followed by the minimum possible number of
>> anything that's not an 'i' or 's', followed by an 's'....
>>
>> <jason.y...@gmail.com> wrote in message
>>
>> news:1170517013.074401.141580@v45g2000cwv.googlegroups.com...
>>
>> > Hello!
>>
>> > I want to get the shortest length of string that starts with "i" and
>> > ends with "s" from string "iiiidssss" using regular expression, that
>> > is "id"s". Any idea? Mine result is not non-greddy enough.
>>
>> > $text = "iiiidssss";
>> > $text =~ m/(i.+?s)/;
>> > $1 is "iiiids", but I want to get "ids". How?
>>
>> > Thank you!
>>
>> > Rgds,
>> > Jason
>
> Please do not top-post - makes it harder to follow the conversation.
>
> A minor point - the question mark (?) is not necessary in your regular
> expression
>>  $text =~ /(i[^is]*?s)/;
> If 's' is not included in the character class, it is needed:
> $text =~ /(i[^i]*?s)/;
> Ken

Sorry.  Won't happen again.

Also, maybe my solution isn't optimal. He said he wanted 'the shortest
length of string that starts with "i" and ends with "s" from string
"iiiidssss" using regular expression'. Well, it works on *that* string,
but not if he wanted the shortest possible such substring from *any*
string. For the string "iiiiiiiiunderstandingssssssssss" my regex gets
"iunders" and not "ings", because "iunders" is the leftmost valid match, so
it wins even though it's longer. I'm not sure how to make it get "ings".
The solution offered by A. Sinan Unur has the same "problem".




------------------------------

Date: 3 Feb 2007 10:19:53 -0800
From: "Brian McCauley" <nobull67@gmail.com>
Subject: Re: Non-greedy matching problem
Message-Id: <1170526793.082583.54240@l53g2000cwa.googlegroups.com>

On Feb 3, 3:36 pm, jason.y...@gmail.com wrote:

> I want to get the shortest length of string that starts with "i" and
> ends with "s" from string "iiiidssss" using regular expression, that
> is "id"s". Any idea? Mine result is not non-greddy enough.
>
> $text = "iiiidssss";
> $text =~ m/(i.+?s)/;
> $1 is "iiiids", but I want to get "ids". How?

Greediness only affects where a match ends, not where it starts.

You can put something greedy in front, but this will always find the
last match.

 $text =~ m/.*(i.+?s)/;

It still won't find the globally shortest match. In "iiassiibbss" it
will find "ibbs" rather than "ias".

To find the globally shortest match you need to find all the matches
and then find the shortest. I could show you how (later) if you want
but I'm just on my way out.
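
One possible way to do the "find all the matches, then pick the shortest"
step is sketched below. It is not the poster's code: the helper name
shortest_between is hypothetical, and it assumes, like the OP's .+?, that
at least one character must sit between the start and end strings. The
pattern is wrapped in a lookahead so that overlapping candidates are not
skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the shortest substring of $str that starts
# with $start and ends with $end (at least one character in between),
# or undef if there is no such substring.
sub shortest_between {
    my ($str, $start, $end) = @_;
    my $best;
    # The lookahead consumes nothing, so the pattern is retried at every
    # position. From each position the lazy .+? gives the shortest
    # candidate that starts there.
    while ( $str =~ /(?=(\Q$start\E.+?\Q$end\E))/gs ) {
        $best = $1 if !defined $best or length $1 < length $best;
    }
    return $best;
}

print shortest_between('iiassiibbss', 'i', 's'), "\n";   # ias
print shortest_between('iiiidssss',   'i', 's'), "\n";   # ids
```

If several candidates share the minimum length, this sketch keeps the
leftmost one.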



------------------------------

Date: 3 Feb 2007 18:00:25 -0800
From: jason.yfho@gmail.com
Subject: Re: Non-greedy matching problem
Message-Id: <1170554424.158159.40370@a75g2000cwd.googlegroups.com>

Thank you very much for all replies.

How about this case? a similar problem, but this time not just to
match one single character as start or end in a string.

$text = '<script language="javascript">functionA( );</script>'
      . '<script language="javascript">functionB( );</script>'
      . '<script language="javascript">functionC( );</script>';

Want to extract the shortest string with '<script' as the start and
'</script>' as the end, with functionB in between.

So what I want to get is the shortest match '<script
language="javascript">functionB( );</script>' from the $text.

Code:
$text =~ /(<script.+?functionB.+?<\/script>)/;
But $1 will be the longest match

Thank you!

Rgds,
Jason



------------------------------

Date: 4 Feb 2007 04:56:33 -0800
From: "Brian McCauley" <nobull67@gmail.com>
Subject: Re: Non-greedy matching problem
Message-Id: <1170593793.065821.167500@k78g2000cwa.googlegroups.com>

On Feb 4, 2:00 am, jason.y...@gmail.com wrote:
> How about this case? a similar problem, but this time not just to
> match one single character as start or end in a string.

Yes, I'd guessed your real problem might be of this nature, which is
why I didn't provide a character class based solution.

> $text = '<script language="javascript">functionA( );</script><script
> language="javascript">functionB( );</script><script
> language="javascript">functionC( );</script>';
>
> Want to extract the shortest string with '<script' as the start and
> '</script>' as the end, with functionB in between.

Again, to get the globally shortest you need to find all candidates and
select the shortest.

> So what I want to get is the shortest match '<script
> language="javascript">functionB( );</script>' from the $text.
>
> Code:
> $text =~ /(<script.+?functionB.+?<\/script>)/;
> But $1 will be the longest match

Not necessarily.

Consider

$text='<script>functionB( );</script><script>longer! functionB( );</script>';

Your regex does _not_ find the _longest_ match. It finds the match
that starts in the leftmost position.

I suspect you are not thinking hard enough about what you want. By a
literal interpretation of your description of what you want, the following
would be an OK match: '<script></script>functionB<script></script>'.
Somehow I suspect (based on domain knowledge) that you wouldn't want
this to be a match, but unfortunately computers don't have domain
knowledge and tend to be a bit literal.
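
If the unstated rule is "functionB must sit inside a single script
element", one way to say that to the regex engine is sketched below. This
is an assumed refinement, not something proposed in the thread: every '.'
is guarded by a negative lookahead so the match cannot step over another
'<script' or '</script'. The name $inside is hypothetical, and the pattern
remains fragile on real-world HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = '<script language="javascript">functionA( );</script>'
         . '<script language="javascript">functionB( );</script>'
         . '<script language="javascript">functionC( );</script>';

# The OP's pattern starts at the leftmost '<script', so it also swallows
# the functionA element.
my ($leftmost) = $text =~ /(<script.+?functionB.+?<\/script>)/s;

# Guarded dot: match any character, but never at a position where an
# opening or closing script tag begins.
my $inside = qr/(?:(?!<\/?script).)*/s;
my ($single) = $text =~ /(<script\b${inside}functionB${inside}<\/script>)/;

print length($leftmost), "\n";   # length of two script elements
print "$single\n";               # only the functionB element
```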

For parsing HTML you really should consider using an HTML parser. Any
simple pattern match will fail sooner or later.



------------------------------

Date: Sun, 4 Feb 2007 19:40:15 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Non-greedy matching problem
Message-Id: <slrnesca4f.30u.hjp-usenet2@yoyo.hjp.at>

On 2007-02-03 16:29, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
> jason.yfho@gmail.com wrote in news:1170517013.074401.141580
> @v45g2000cwv.googlegroups.com:
>
>> I want to get the shortest length of string that starts with "i" and
>> ends with "s" from string "iiiidssss" using regular expression, that
>> is "id"s". Any idea? Mine result is not non-greddy enough.
>> 
>> $text = "iiiidssss";
>> $text =~ m/(i.+?s)/;
>> $1 is "iiiids", but I want to get "ids". How?
>> 
>
> It seems a perfect match for index and rindex:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $s = 'iiiidssss';

my $s = 'iiiidssssi';


>
> my $start = rindex $s, 'i';
> my $end   = index  $s, 's';
>
> if ( $start > -1 and $start < $end ) {
>     print substr( $s, $start, $end - $start + 1), "\n";
> }
>
> __END__
>

Hint to the OP: If you provide only one example string and no rules how
it was constructed we can only provide solutions which work for this
string, but not for similar strings because we don't know what "similar"
means.
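
As an illustration only, here is one guess at a rule that covers both
'iiiidssss' and 'iiiidssssi': take the first 's', then the last 'i' at or
before it. The helper name nearest_pair is hypothetical, and the rule
itself is an assumption about what the OP means. It relies on the optional
third argument of rindex, which limits the search to positions at or
before the given offset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: substring from the last 'i' that precedes the
# first 's' up to and including that 's', or undef if there is none.
sub nearest_pair {
    my ($s) = @_;
    my $end = index $s, 's';              # first 's'
    return undef if $end < 0;
    my $start = rindex $s, 'i', $end;     # last 'i' at or before it
    return undef if $start < 0;
    return substr $s, $start, $end - $start + 1;
}

print nearest_pair('iiiidssss'),  "\n";   # ids
print nearest_pair('iiiidssssi'), "\n";   # ids
```

This is still not a globally shortest search: for 'ixxxxsis' it returns
'ixxxxs', not 'is'.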

	hp

-- 
   _  | Peter J. Holzer    | Es ist ganz einfach ihn zu verstehen, wenn
|_|_) | Sysadmin WSR       | man nur alle wichtigen Worte im Satz durch
| |   | hjp@hjp.at         | andere ersetzt.
__/   | http://www.hjp.at/ |	-- Nils Ketelsen in danr


------------------------------

Date: Sun, 04 Feb 2007 19:00:29 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Non-greedy matching problem
Message-Id: <Xns98CD8EC41FCA2asu1cornelledu@127.0.0.1>

"Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in
news:slrnesca4f.30u.hjp-usenet2@yoyo.hjp.at: 

> On 2007-02-03 16:29, A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>> jason.yfho@gmail.com wrote in news:1170517013.074401.141580
>> @v45g2000cwv.googlegroups.com:
>>
>>> I want to get the shortest length of string that starts with "i" and
>>> ends with "s" from string "iiiidssss" using regular expression, that
>>> is "id"s". Any idea? Mine result is not non-greddy enough.
>>> 
>>> $text = "iiiidssss";
>>> $text =~ m/(i.+?s)/;
>>> $1 is "iiiids", but I want to get "ids". How?
>>> 
>>
>> It seems a perfect match for index and rindex:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> my $s = 'iiiidssss';
> 
> my $s = 'iiiidssssi';


Thank you for the counter-example. However, please note that my code was
specifically constructed to address *only* the case given by the OP.
Nothing more, nothing less.

> Hint to the OP: If you provide only one example string and no rules
> how it was constructed we can only provide solutions which work for
> this string, but not for similar strings because we don't know what
> "similar" means.

Ditto.

Sinan
-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 115
**************************************

