[32363] in Perl-Users-Digest
Perl-Users Digest, Issue: 3630 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Mar 3 14:09:25 2012
Date: Sat, 3 Mar 2012 11:09:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 3 Mar 2012 Volume: 11 Number: 3630
Today's topics:
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: $var = do { ... }? (Tim McDaniel)
Re: $var = do { ... }? (Randal L. Schwartz)
Re: $var = do { ... }? <ben@morrow.me.uk>
Re: $var = do { ... }? <ben@morrow.me.uk>
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: $var = do { ... }? <m@rtij.nl.invlalid>
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: $var = do { ... }? <rweikusat@mssgmbh.com>
Re: any lnux distro that uses cpanm for linux pacakge <jurgenex@hotmail.com>
any lnux distro that uses cpanm for linux pacakge mana <gavcomedy@gmail.com>
Re: any lnux distro that uses cpanm for linux pacakge <gavcomedy@gmail.com>
Re: Best way to search for a string which has N% in a c <glex_no-spam@qwest-spam-no.invalid>
Re: Best way to search for a string which has N% in a c <pengyu.ut@gmail.com>
Re: Best way to search for a string which has N% in a c <kiuhnm03.4t.yahoo.it>
Re: Best way to search for a string which has N% in a c <kiuhnm03.4t.yahoo.it>
Re: Best way to search for a string which has N% in a c <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 02 Mar 2012 19:18:10 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <877gz2errh.fsf@sapphire.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> I just had a case where there's a block of code, I split up the work
> into assignments to intermediate variables, but I only wanted the
> final value. I very much like to restrict the scope of variables,
> because it's then obvious what's a temporary and what has more
> significance
[...]
> my $permanent_variable = do {
> my $this = ...;
> my $that = ... $this ...;
> yadda yadda;
> ... final computation ...;
> };
>
> It does do the scope encapsulation that I like, and it makes it
> vividly obvious that the block has one purpose, to set
> $permanent_variable.
[...]
> $var = do { several statement; }
>
> is just an odd construct?
I decidedly do consider this an odd construct: If you have a
self-contained piece of code whose purpose is to perform some
operation independently of the surrounding code and to return some
value, and which possibly has a small number of well-defined inputs,
this should be put into a subroutine with a sufficiently descriptive
name that someone who reads through the outer code knows what the
purpose of the subroutine happens to be but without having to bother
with the details of its implementation and that someone who cares
about the implementation of this particular subroutine doesn't have to
go hunting for it in a large block of otherwise unrelated code.
A real-world example of that:
sub validate_customer_key($)
{
    my ($bin_ckey, $ckey, $skey, $skey_version);
    eval {
        ($bin_ckey, $skey_version) = decode_bin_ckey($_[0]);
        $skey = get_server_key($skey_version);
        die("no version $skey_version server key") unless $skey;
        dec($bin_ckey, $skey->key());
        $ckey = MECSUPD::CustomerKey->new_from_string($bin_ckey);
        die("not a valid customer key") unless $ckey;
        check_cookie($ckey, $skey);
        check_expiry($ckey);
        check_ckey_cid_version($ckey);
    };
    $@ && do {
        syslog('ERR', "$@");
        return;
    };
    syslog('INFO', 'customer %08x authenticated successfully',
           $ckey->cid());
    return 1;
}
['dec' means 'decrypt' here]
This is (not counting external libraries) 'a meeting point' of 80
lines of Perl code and 20 lines of C code, and the algorithm this
subroutine is supposed to perform would be totally obliterated if the
100 lines of code implementing all the details had been used in place
of these 22 lines (sloccount) of high-level description.
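For comparison, here is a minimal sketch of the two styles under discussion, applied to the same small computation. The names (@samples, mean) are made up for illustration and do not come from anyone's real code.

```perl
use strict;
use warnings;

my @samples = (3, 5, 10);

# Inline form: the temporary $sum exists only inside the do block,
# and the block's last expression becomes the assigned value.
my $mean_inline = do {
    my $sum = 0;
    $sum += $_ for @samples;
    @samples ? $sum / @samples : 0;
};

# Subroutine form: the same computation behind a descriptive name,
# with its inputs passed in explicitly.
sub mean {
    my @values = @_;
    return 0 unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

my $mean_sub = mean(@samples);
```

Both forms keep the temporaries out of the surrounding scope; the subroutine additionally makes the inputs explicit and gives the computation a name.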
------------------------------
Date: Fri, 2 Mar 2012 19:24:56 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: $var = do { ... }?
Message-Id: <jir6q8$4i1$1@reader1.panix.com>
In article <877gz2errh.fsf@sapphire.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> [an eval here]
> $@ && do {
> syslog('ERR', "$@");
> return;
> };
Is there a reason to prefer that over
    if ($@) {
        syslog('ERR', "$@");
        return;
    }
? I don't see a reason, so I prefer the "if" version.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Fri, 02 Mar 2012 11:45:05 -0800
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: $var = do { ... }?
Message-Id: <864nu6wzwe.fsf@red.stonehenge.com>
>>>>> "Tim" == Tim McDaniel <tmcd@panix.com> writes:
Tim> Yeah, I tend to do too much inline code and end up with (for example)
Tim> 300-line blocks of code. You know, though, that it can be annoying to
Tim> subify something when it has lots of external dependencies, whether
Tim> inputs or outputs.
Actually, that'll make your code cleaner. If you have a lot of
undelimited cohesion (I think that's the word), bugs are much harder to
find, and the code is more fragile.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
------------------------------
Date: Fri, 2 Mar 2012 20:10:42 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: $var = do { ... }?
Message-Id: <23m729-u9f1.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <877gz2errh.fsf@sapphire.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> > [an eval here]
> > $@ && do {
> > syslog('ERR', "$@");
> > return;
> > };
>
> Is there a reason to prefer that over
> if ($@) {
> syslog('ERR', "$@");
> return;
> }
> ? I don't see a reason, so I prefer the "if" version.
You should always prefer Try::Tiny over an explicit eval: there are,
unfortunately, rather a lot of nasty corner cases in perl's handling of
$@, and Try::Tiny deals with as many as it can.
If you insist on doing the eval yourself, you should test the truth of
the eval
    eval {
        ...
    } or do {
        ...
    };
rather than relying on $@, since there are cases (destructors, for one,
depending on your version of perl) where $@ can be cleared even though
the eval failed. You still lose the error, but at least you know there
was one.
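A minimal sketch of that pattern follows. It assumes a hypothetical helper, risky(), that returns a false value on success; the trailing "1;" is what makes the eval's return value a reliable success flag in that case.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: dies on request, otherwise
# returns a false value (0) to show why the trailing "1;" matters.
sub risky {
    my ($fail) = @_;
    die "gruesome error!\n" if $fail;
    return 0;
}

sub attempt {
    my ($fail) = @_;
    my $status = 'ok';
    eval {
        risky($fail);
        1;                      # reached only if nothing died
    } or do {
        # $@ may have been cleared in the meantime, so fall back
        # to a generic message rather than reporting success.
        my $err = $@ || "unknown error\n";
        chomp $err;
        $status = "failed: $err";
    };
    return $status;
}
```

Calling attempt(0) takes the success path even though risky() returned 0, while attempt(1) runs the "or do" block.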
Ben
------------------------------
Date: Fri, 2 Mar 2012 20:06:45 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: $var = do { ... }?
Message-Id: <lrl729-u9f1.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <h0h729-r9e1.ln1@anubis.morrow.me.uk>,
> Ben Morrow <ben@morrow.me.uk> wrote:
> >I agree with Randal on both counts: I use do like this all the time,
> >and you should seriously consider making the block a sub instead.
>
> Yeah, I tend to do too much inline code and end up with (for example)
> 300-line blocks of code.
I do too, unless I make a conscious effort to split them up. IMHO a sub
which is longer than a screenful (24 lines) should be split. It's
*always* worth it, and often surprisingly quickly.
> You know, though, that it can be annoying to
> subify something when it has lots of external dependencies, whether
> inputs or outputs.
It can sometimes require a bit of thought, yes, but in doing so you will
gain a much better understanding of the problem you're trying to solve.
> >> I looked thru the codebase at work and found a few instances of it.
> >> But mostly "do" was used to implement a slurp function,
> >
> >If you mean the
> >
> > my $txt = do {
> > open my $F, "<", ...;
> > local $/;
> > <$F>;
> > };
> >
> >construction then this is exactly the same situation, isn't it?
>
> If the *only* use of it is as
> ... = do {local $/; <HANDLE>};
> or the vast majority of use, then it can just be viewed as an idiom
> for one special task.
I suppose.
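A self-contained sketch of the slurp idiom, with error checks added. The temporary file exists only so the example can run on its own; File::Temp is a core module, and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small file to read back (illustration only).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line one\nline two\n";
close $out or die "close failed: $!";

# The idiom: the handle and the localized $/ both go out of scope
# at the end of the do block, and only the contents escape.
my $txt = do {
    open my $in, '<', $path or die "cannot open $path: $!";
    local $/;       # undefined record separator: read everything at once
    <$in>;
};
```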
> E.g., about the only time I use hash slices is
> my %table;
> @table{@array} = (1) x @array;
> (though I'm reconsidering using instead
> my %table = map { $_ => 1 } @array;
> ).
Well, I would write that
my %table = map +($_, 1), @array;
since I tend to avoid block-map if I can (I'm not entirely sure why...),
but maybe I'm happier with unary-+ than some people.
I have seen people say (here) that what you ought to do is
my %table;
@table{@array} = ();
and then test with 'exists'. Apparently the memory savings are not
inconsiderable.
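The variants mentioned above can be put side by side as follows. This is only a sketch with a made-up @array; it makes no claim about how large the memory difference is in practice.

```perl
use strict;
use warnings;

my @array = qw(apple pear plum);

# Block map: every key gets the value 1.
my %by_map = map { $_ => 1 } @array;

# Expression map with unary plus, as written above.
my %by_expr = map +($_, 1), @array;

# Hash slice assigning a list of 1s.
my %by_slice;
@by_slice{@array} = (1) x @array;

# Hash slice assigning nothing: the keys exist, the values are undef,
# so membership has to be tested with exists rather than by truth.
my %by_exists;
@by_exists{@array} = ();

my $has_pear = exists $by_exists{pear} ? 1 : 0;
my $has_kiwi = exists $by_exists{kiwi} ? 1 : 0;
```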
Ben
------------------------------
Date: Fri, 02 Mar 2012 20:43:17 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <87ipimd996.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth tmcd@panix.com:
>> In article <877gz2errh.fsf@sapphire.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>> > [an eval here]
>> > $@ && do {
>> > syslog('ERR', "$@");
>> > return;
>> > };
>>
>> Is there a reason to prefer that over
>> if ($@) {
>> syslog('ERR', "$@");
>> return;
>> }
>> ? I don't see a reason, so I prefer the "if" version.
>
> You should always prefer Try::Tiny over an explicit eval: there are,
> unfortunately, rather a lot of nasty corner cases in perl's handling of
> $@, and Try::Tiny deals with as many as it can.
It is a lot better to state what these corner-cases are than to make
nebulous scare-mongering statements about them. I'm aware of one: Code
which is executed as part of a destructor and which doesn't localize
$@ properly may cause it to be cleared or set to a different value. In
particular, this will happen when the destructor calls syslog (that's
where I encountered it). This is, however, not applicable here and in
any case, the solution is to fix the destructors. I'll happily learn
about other such cases. But according to the Try::Tiny documentation,
there are none: The only other thing it mentions is that the 'eval'
might accidentally clear $@ if it is running inside another eval and
$@ wasn't properly localized ... but didn't we have this already?
> If you insist on doing the eval yourself, you should test the truth of
> the eval
>
> eval {
> ...
> } or do {
> ...
> };
>
> rather than relying on $@,
No, I shouldn't "always do this" because I usually know what the code
will be doing when executed, ie, in this case, that there are neither
outer nor inner evals. I understand that this is more difficult for
someone whose preferred solution to any technical problem is "download
150,000 lines of unknown code from the internet" (in order to save
writing 5 lines of code). Regarding this, I'm a follower of the theory
that blind use of complex devices with unknown properties (third-party
written maxi-mega-modules intended to solve 15,000 different trivial
problems with a huge amount of "heavily optimized general-purpose
code") is bound to cause accidental deaths and other nuisances and
therefore, I avoid such situations (fixing the inevitable bugs in a
large body of unknown code is going to take more time than writing a
small amount of new code).
------------------------------
Date: Fri, 02 Mar 2012 20:44:11 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <87ehtad97o.fsf@sapphire.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <877gz2errh.fsf@sapphire.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>> [an eval here]
>> $@ && do {
>> syslog('ERR', "$@");
>> return;
>> };
>
> Is there a reason to prefer that over
> if ($@) {
> syslog('ERR', "$@");
> return;
> }
> ? I don't see a reason, so I prefer the "if" version.
I'm not aware of any, that's IMHO just a matter of personal
preference.
------------------------------
Date: Fri, 02 Mar 2012 22:47:10 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <87399qei35.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Ben Morrow <ben@morrow.me.uk> writes:
>> Quoth tmcd@panix.com:
>>> In article <877gz2errh.fsf@sapphire.mobileactivedefense.com>,
>>> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>>> > [an eval here]
>>> > $@ && do {
>>> > syslog('ERR', "$@");
>>> > return;
>>> > };
>>>
>>> Is there a reason to prefer that over
>>> if ($@) {
>>> syslog('ERR', "$@");
>>> return;
>>> }
>>> ? I don't see a reason, so I prefer the "if" version.
>>
>> You should always prefer Try::Tiny over an explicit eval: there are,
>> unfortunately, rather a lot of nasty corner cases in perl's handling of
>> $@, and Try::Tiny deals with as many as it can.
>
> It is a lot better to state what these corner-cases are than to make
> nebulous scare-mongering statements about them.
To state this in a clearer way: The two cases this module is supposed
to deal with are
-------------
package a;
sub DESTROY
{
    eval { 3 + 1; };
}
package main;
eval {
    my $a;
    bless(\$a, 'a');
    die("gruesome error!");
};
$@ and print("$@\n");
-------------
Upon exiting the eval scope, the a::DESTROY routine will be executed
automatically and the eval in there causes $@ to be cleared. The
solution to this problem is to add a
local $@ if $@
to the beginning of any destructor which does something non-trivial
which might either invoke die or eval. Since a destructor can be
executed automatically after some other code died but before the
unknowing caller had a chance to look at $@, it certainly shouldn't
change an existing value of $@.
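A sketch of a destructor written that way. The package name Guard is made up, and the sketch uses an unconditional "local $@" rather than the conditional form above; that is an assumption about what is simplest, not a claim about which form is preferable.

```perl
use strict;
use warnings;

package Guard;

sub new { return bless {}, shift }

sub DESTROY {
    local $@;                       # keep the caller's $@ intact
    eval { my $x = 3 + 1; 1 };      # would otherwise reset $@
}

package main;

eval {
    my $guard = Guard->new;         # destroyed while the eval unwinds
    die "gruesome error!\n";
};
my $seen = $@;                      # the error should still be visible
```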
The other is
--------------
sub complex_task
{
    eval { 3 + 1; };
}
eval {
    die("gruesome error!");
};
complex_task();
$@ and print("$@\n");
---------------
This time, the caller is at fault: If some non-trivial action needs to
be performed before looking at $@, the value of $@ immediately after
the eval needs to be saved in a 'non-global' variable until it is going
to be used. This is especially true because there is, as the
'BACKGROUND' text of the Try::Tiny documentation aptly explains, no
way the called routine can easily do this.
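A sketch of the caller-side fix, reusing the shape of the example above. The lexical $saved is an illustrative name.

```perl
use strict;
use warnings;

sub complex_task {
    eval { my $x = 3 + 1; 1 };      # a successful eval resets $@
    return;
}

eval { die "gruesome error!\n" };
my $saved = $@;                     # copy right away, before anything else runs

complex_task();                     # the global $@ is now empty

my $global_after = $@;
```

After complex_task() returns, the error is available only through $saved.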
IMO, both of these 'nasty corner-cases' are actually fairly trivial
programming errors and the general solution to these is to teach
people how to avoid them, not to try to write code which enables them
to remain blissfully unaware of the problem.
------------------------------
Date: Sat, 3 Mar 2012 15:58:07 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: $var = do { ... }?
Message-Id: <v4o929-1u.ln1@news.rtij.nl>
On Fri, 02 Mar 2012 20:43:17 +0000, Rainer Weikusat wrote:
> Ben Morrow <ben@morrow.me.uk> writes:
>> If you insist on doing the eval yourself, you should test the truth of
>> the eval
>>
>> eval {
>> ...
>> } or do {
>> ...
>> };
>>
>> rather than relying on $@,
>
> No, I shouldn't "always do this" because I usually know what the code
> will be doing when executed, ie, in this case, that there are neither
If there is a clear way that is always correct and another equally clear
way that may break on refactoring, I always would choose the first. This
seems like one of those cases (which I didn't know about yet btw).
M4
------------------------------
Date: Sat, 03 Mar 2012 17:11:56 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <87ipil1ueb.fsf@sapphire.mobileactivedefense.com>
Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Fri, 02 Mar 2012 20:43:17 +0000, Rainer Weikusat wrote:
>
>> Ben Morrow <ben@morrow.me.uk> writes:
>
>>> If you insist on doing the eval yourself, you should test the truth of
>>> the eval
>>>
>>> eval {
>>> ...
>>> } or do {
>>> ...
>>> };
>>>
>>> rather than relying on $@,
>>
>> No, I shouldn't "always do this" because I usually know what the code
>> will be doing when executed, ie, in this case, that there are neither
>
> If there is a clear way that is always correct and another equally clear
> way that may break on refactoring, I always would choose the first. This
> seems like one of those cases (which I didn't know about yet btw).
Testing $@ is 'always correct' while the return value of eval might be
'false' for any number of reasons because it is just the return value
of the last thing executed in the scope of the eval.
----------
my $bc;
my $rc = eval {
    $bc = 3;
    die('huh?') if $bc != 3;
} or do {
    print("Ben Morrow error occurred!\n");
    $@ or print("Phew ... that was close ... \n");
};
----------
Also, there is no way to write code such that yet unknown future
changes made to this code are guaranteed to be correct.
------------------------------
Date: Sat, 03 Mar 2012 17:19:57 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: $var = do { ... }?
Message-Id: <87eht91u0y.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>> On Fri, 02 Mar 2012 20:43:17 +0000, Rainer Weikusat wrote:
>>
>>> Ben Morrow <ben@morrow.me.uk> writes:
>>
>>>> If you insist on doing the eval yourself, you should test the truth of
>>>> the eval
>>>>
>>>> eval {
>>>> ...
>>>> } or do {
>>>> ...
>>>> };
>>>>
>>>> rather than relying on $@,
>>>
>>> No, I shouldn't "always do this" because I usually know what the code
>>> will be doing when executed, ie, in this case, that there are neither
>>
>> If there is a clear way that is always correct and another equally clear
>> way that may break on refactoring, I always would choose the first. This
>> seems like one of those cases (which I didn't know about yet btw).
>
> Testing $@ is 'always correct' while the return value of eval might be
> 'false' for any number of reasons because it is just the return value
> of the last thing executed in the scope of the eval.
Additional remark: One of the purposes of using exceptions to signal
errors is to avoid the so-called semipredicate problem where a
technically legitimate return value must be used to signal an
exceptional condition and the caller cannot generally tell the
difference. Eg, assume the eval returns a file descriptor number. The
usual convention for signalling errors via return value for
subroutines doing this would be to use the number -1 which cannot be a
valid file descriptor. But this won't work with the code above because
-1 is logically true. OTOH, 0 is a valid file descriptor number
(usually used for 'the standard input file descriptor') but logically
false.
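The descriptor case can be sketched as below. open_descriptor() is a hypothetical stand-in that dies on failure and otherwise returns the requested number; the second routine shows one way to keep the success flag separate from the value, which is a choice made for this sketch rather than something prescribed in the thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in: dies on failure, otherwise returns the
# requested descriptor number, which may legitimately be 0.
sub open_descriptor {
    my ($want) = @_;
    die "cannot open\n" if $want < 0;
    return $want;
}

# Treats the eval's value as the success flag, so descriptor 0
# is misreported as an error.
sub naive_open {
    my ($want) = @_;
    my $fd = eval { open_descriptor($want) } or return 'error';
    return "fd $fd";
}

# Keeps the value and the success flag apart.
sub careful_open {
    my ($want) = @_;
    my $fd;
    my $ok = eval { $fd = open_descriptor($want); 1 };
    return $ok ? "fd $fd" : 'error';
}
```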
------------------------------
Date: Fri, 02 Mar 2012 16:41:06 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: any lnux distro that uses cpanm for linux pacakge management? cpan seems to well vetter and rpm n debs so pailful
Message-Id: <t2q2l79dffp389pbu783l6udro5djcr6aj@4ax.com>
[fullquote]
gavino <gavcomedy@gmail.com> wrote:
>wondering
That is good, it is the first step to enlightenment.
Do you do anything else besides "wondering"?
jue
------------------------------
Date: Fri, 2 Mar 2012 10:54:33 -0800 (PST)
From: gavino <gavcomedy@gmail.com>
Subject: any lnux distro that uses cpanm for linux pacakge management? cpan seems to well vetter and rpm n debs so pailful
Message-Id: <9452092.725.1330714473787.JavaMail.geo-discussion-forums@pbqn3>
wondering
------------------------------
Date: Sat, 3 Mar 2012 09:20:52 -0800 (PST)
From: gavino <gavcomedy@gmail.com>
Subject: Re: any lnux distro that uses cpanm for linux pacakge management? cpan seems to well vetter and rpm n debs so pailful
Message-Id: <25421933.998.1330795252305.JavaMail.geo-discussion-forums@ynkz21>
On Friday, March 2, 2012 4:41:06 PM UTC-8, Jürgen Exner wrote:
> [fullquote]
> gavino <> wrote:
> >wondering
>
> That is good, it is the first step to enlightenment.
> Do you do anything else besides "wondering"?
>
> jue
honestly not much, although pacman in archlinux is very nice and most linux
distro should adopt it
On Friday, March 2, 2012 4:41:06 PM UTC-8, Jürgen Exner wrote:
> [fullquote]
> gavino wrote:
> >wondering
>
> That is good, it is the first step to enlightenment.
> Do you do anything else besides "wondering"?
>
> jue
Not much really, although pacman in archlinux is nice and most linux distro
should adopt it and rolling release...
------------------------------
Date: Fri, 02 Mar 2012 13:18:04 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: Best way to search for a string which has N% in a character class?
Message-Id: <4f511cec$0$9074$815e3792@news.qwest.net>
On 03/02/12 13:06, Tim McDaniel wrote:
> In article<4f510c5c$0$75670$815e3792@news.qwest.net>,
> J. Gleixner<glex_no-spam@qwest-spam-no.invalid> wrote:
>> On 03/02/12 10:29, Peng Yu wrote:
>>> Suppose that I want to search for a substring which has say 50%
>>> letters are in a letter class say [A-D]. Note that there is some
>>> ambiguity at the two ends of the substring. But other than that,
>>> this problem is well defined.
>>>
>>> It seems that this problem can not (or can not easily, please let
>>> me know if there is a way) be formulated in regex. Since perl is
>>> strong in processing string, I think that there might be a good way
>>> to search for such strings in perl. Does anybody have some good way
>>> in search this type of substring?
>>
>> What have you tried?????????????????
>>
>> Using 'tr' and 'length' would probably help you.
[...]
> So if you really want a range of characters like A thru D,
> tr/A-D//
> works. If you want all digits, or all alphabetics, or some other
> character class, you need to use s/// instead.
>
Thanks for the correction.
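A small sketch of the counting step. Square brackets have no special meaning inside tr, so the range is written bare; the sample string is made up, and counting a regex class with a list-context match is one option chosen for this sketch, not the only one.

```perl
use strict;
use warnings;

my $str = 'ABxDy12z';

# tr with an empty replacement list counts matching characters
# without modifying the string.
my $in_range = ($str =~ tr/A-D//);

# For a regex character class such as \d, count the matches of a
# global match in list context.
my $digits = () = $str =~ /\d/g;

# Share of characters in A-D, as a percentage of the whole string.
my $percent = 100 * $in_range / length($str);
```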
------------------------------
Date: Fri, 2 Mar 2012 12:53:18 -0800 (PST)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: Re: Best way to search for a string which has N% in a character class?
Message-Id: <762e3fdb-c136-47f4-a6cf-dfdfc56d80c1@k24g2000yqe.googlegroups.com>
On Mar 2, 12:07 pm, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
wrote:
> On 03/02/12 10:29, Peng Yu wrote:
>
> > Hi,
>
> > Suppose that I want to search for a substring which has say 50%
> > letters are in a letter class say [A-D]. Note that there is some
> > ambiguity at the two ends of the substring. But other than that, this
> > problem is well defined.
>
> > It seems that this problem can not (or can not easily, please let me
> > know if there is a way) be formulated in regex. Since perl is strong
> > in processing string, I think that there might be a good way to search
> > for such strings in perl. Does anybody have some good way in search
> > this type of substring?
>
> What have you tried?????????????????
>
> Using 'tr' and 'length' would probably help you.
>
> From perldoc perlop:
>
>   y/SEARCHLIST/REPLACEMENTLIST/cds
>      [...]Transliterates all occurrences of the characters found in the
> search list with the corresponding character in the replacement list.
> It returns the number of characters replaced or deleted.
>
> Using that you can get the number of characters in the class.
> e.g. $cnt = tr/[A-D]/[A-D]/;
>
> Using 'length' you can find how many characters are in the string.
>
> perldoc -f length
>
> Divide one by the other, multiply by 100 and you have the percent.
I don't think that you understand my question.
Suppose that I have a string $str which is the concatenation of $str1,
$str2 and $str3, where both $str1 and $str3 have less than 50% of
[A-D] and $str2 has more than 50% of [A-D].
I need to discover from $str where $str2 starts and ends. I don't
see how tr and length alone can address this question.
------------------------------
Date: Fri, 02 Mar 2012 23:34:36 +0100
From: Kiuhnm <kiuhnm03.4t.yahoo.it>
Subject: Re: Best way to search for a string which has N% in a character class?
Message-Id: <4f514afa$0$1382$4fafbaef@reader1.news.tin.it>
On 3/2/2012 21:53, Peng Yu wrote:
> I don't think that you understand my question.
>
> Suppose that I have a string $str which is the concatenation of $str1,
> $str2 and $str3, where both $str1 and $str3 have less than 50% of
> [A-D] and $str2 has more than 50% of [A-D].
That's not clear at all.
Kiuhnm
------------------------------
Date: Fri, 02 Mar 2012 23:37:29 +0100
From: Kiuhnm <kiuhnm03.4t.yahoo.it>
Subject: Re: Best way to search for a string which has N% in a character class?
Message-Id: <4f514ba7$0$1382$4fafbaef@reader1.news.tin.it>
On 3/2/2012 18:25, Kiuhnm wrote:
> On 3/2/2012 17:29, Peng Yu wrote:
>> Suppose that I want to search for a substring which has say 50%
>> letters are in a letter class say [A-D]. Note that there is some
>> ambiguity at the two ends of the substring. But other than that, this
>> problem is well defined.
>>
>> It seems that this problem can not (or can not easily, please let me
>> know if there is a way) be formulated in regex.
>
> /[A-D][^A-D]|[^A-D][A-D]/
A more interesting problem is to find the longest substring with that
property (mine was the shortest).
Let's simplify the problem and consider binary strings:
X = 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 1
Let
Y = (0) 1 0 -1 -2 -3 -2 -1 0 -1 0 1 0 -1 0 1 0
be the array where the i-th element is
(the number of 0s seen so far) -
(the number of 1s seen so far)
If Y[m] = Y[n] then from m to n we must have run through a sequence with
an equal number of 0s and 1s.
Finding the longest substring boils down to finding m and n such that
Y[m] = Y[n]
and n-m is maximized (taking m < n).
In our example, the longest substring is the entire string (indeed we
have (0).....0 in Y).
If you don't want 50/50 but 33.(3)/66.(6), use
2 * (the number of 0s seen so far) -
(the number of 1s seen so far)
or vice versa:
X = 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 1
Y = (0) 2 1 0 -1 -2 0 2 4 3 5 7 6 5 7 9 8
The best we can do is from 2 (excluded) to 2:
1 1 1 1 0 0
--->
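#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

say(my $str = 'aaaaaaaaabcdefabcdefabcdefaaaaaaaa');
my @v = split //, $str;

# Running balance: -1 for a character in [a-c], +1 for any other.
# $y[$i] is the balance after the first $i characters.
my $sum = 0;
my @y = (0, map { $sum += (/[a-c]/ ? -1 : 1) } @v);

# For each balance value, remember the first index where it occurs
# and the distance from there to its latest occurrence.
my %m;
my ($best, $bestLen) = (0, 0);
for my $i (0 .. $#y) {
    my $val = $y[$i];
    $m{$val}[0] //= $i;
    $m{$val}[1] = $i - $m{$val}[0];
    ($best, $bestLen) = ($val, $m{$val}[1]) if $m{$val}[1] > $bestLen;
}

my $from = $m{$best}[0];
my $to   = $from + $bestLen - 1;
say "The longest substring goes from $from to $to:\n ", @v[$from .. $to];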
<---
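The weighted variant described above (2 * zeros - ones) can be sketched with the same first-occurrence bookkeeping. The function name longest_weighted and its parameters are made up for this illustration; it returns the start offset and length of the longest substring in which $w_in times the number of matching characters equals $w_out times the number of other characters.

```perl
use strict;
use warnings;

sub longest_weighted {
    my ($str, $class, $w_in, $w_out) = @_;
    my %first = (0 => 0);           # balance value => first prefix length
    my ($sum, $best_start, $best_len) = (0, 0, 0);
    my $i = 0;
    for my $ch (split //, $str) {
        $i++;
        $sum += ($ch =~ $class) ? $w_in : -$w_out;
        if (exists $first{$sum}) {
            # Same balance seen before: the stretch in between is balanced.
            my $len = $i - $first{$sum};
            ($best_start, $best_len) = ($first{$sum}, $len)
                if $len > $best_len;
        }
        else {
            $first{$sum} = $i;
        }
    }
    return ($best_start, $best_len);
}

# The binary string X from the explanation above.
my $x = '0111100010011001';
my ($start_even, $len_even) = longest_weighted($x, qr/0/, 1, 1);
my ($start_w,    $len_w)    = longest_weighted($x, qr/0/, 2, 1);
```

With weights (1, 1) the whole 16-character string qualifies; with weights (2, 1) the longest qualifying stretch has length 6, and when several stretches tie the sketch reports the first one it finds.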
Kiuhnm
------------------------------
Date: Sat, 3 Mar 2012 00:12:23 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Best way to search for a string which has N% in a character class?
Message-Id: <784829-tdi1.ln1@anubis.morrow.me.uk>
Quoth Peng Yu <pengyu.ut@gmail.com>:
>
> Suppose that I want to search for a substring which has say 50%
> letters are in a letter class say [A-D]. Note that there is some
> ambiguity at the two ends of the substring. But other than that, this
> problem is well defined.
>
> It seems that this problem can not (or can not easily, please let me
> know if there is a way) be formulated in regex. Since perl is strong
> in processing string, I think that there might be a good way to search
> for such strings in perl. Does anybody have some good way in search
> this type of substring?
ISTM you are right the regex engine is a good match for this sort of
thing: it's already got the run-through-the-string and the backtracking
logic you need.
How about
    our $d;
    m{ (
        (?>
            (?{ $d = 0 })
            (?:
                [A-D] (?{ $d++ })
              | .     (?{ $d-- })
            )*
        )
        (?(?{ $d }) (?!))
    ) }sx;
(It's necessary for $d to be 'our' due to a longstanding bug in the
implementation of (?{}).)
I think that works correctly?
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3630
***************************************