[30749] in Perl-Users-Digest
Perl-Users Digest, Issue: 1994 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Nov 20 21:09:46 2008
Date: Thu, 20 Nov 2008 18:09:11 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 20 Nov 2008 Volume: 11 Number: 1994
Today's topics:
Re: debug cgi programs with post forms xhoster@gmail.com
Re: debug cgi programs with post forms <michaelgang@gmail.com>
Hash keys with range of values -- How? <edMbj@aes-intl.com>
Re: Hash keys with range of values -- How? xhoster@gmail.com
Re: Hash keys with range of values -- How? <jurgenex@hotmail.com>
Re: Hash keys with range of values -- How? <edMbj@aes-intl.com>
Re: Hash keys with range of values -- How? <nobull67@gmail.com>
Re: Hash keys with range of values -- How? <rvtol+news@isolution.nl>
Re: How to find all the strings in a long that are at m <1usa@llenroc.ude.invalid>
Re: How to find all the strings in a long that are at m <PengYu.UT@gmail.com>
Re: How to find all the strings in a long that are at m sln@netherlands.com
Re: How to find all the strings in a long that are at m <rvtol+news@isolution.nl>
Re: How to find all the strings in a long that are at m sln@netherlands.com
Re: How to find all the strings in a long that are at m <rvtol+news@isolution.nl>
Re: How to find all the strings in a long that are at m <someone@example.com>
Re: subroutine local variable with initialization <nospam-abuse@ilyaz.org>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 20 Nov 2008 16:27:10 GMT
From: xhoster@gmail.com
Subject: Re: debug cgi programs with post forms
Message-Id: <20081120112627.514$br@newsreader.com>
david <michaelgang@gmail.com> wrote:
> Hi all,
>
> How do i debug cgi programs where the html contains a post form ?
> If it contains a get form it is easy because i can run perl -d from
> the command line.
> In this case i have a very large form with many variables and it is a
> post form.
Does it have file upload fields?
> How can i debug it ?
What kind of bugs are you looking for?
> I made a google search and found answers like printing the environment
> variables and then run it like a get command with perl -d but this is
> a bit tedious.
> Is there a "best practice" for this type of problem ?
I find the best practice is not making bugs in the first place. :)
I also often add code to the CGI to process secret debugging parameter,
which turns on strategically placed extra print or warn statements.
warn "Now I'm doing $foo" if $cgi->url_param('debug');
If you really want to debug under -d, then you can run the program using
$cgi->save($file_handle) to save the query and then perl -d script.cgi <
saved_query_file
(this won't work with file upload fields, as the file contents are not
saved.)
Where the script use CGI has been changed to something like:
use CGI qw(-debug);
Xho
--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
------------------------------
Date: Thu, 20 Nov 2008 08:31:54 -0800 (PST)
From: david <michaelgang@gmail.com>
Subject: Re: debug cgi programs with post forms
Message-Id: <42c5b9a0-d465-4d2a-96da-b98e15b19b8e@3g2000yqs.googlegroups.com>
I am not using file upload fields, so the option of $cgi->save
($file_handle) to save the query and then perl -d script.cgi <
saved_query_file to run the query will be great
Thank you very much
On Nov 20, 6:27=A0pm, xhos...@gmail.com wrote:
> david <michaelg...@gmail.com> wrote:
> > Hi all,
>
> > How do i debug cgi programs where the html contains a post form ?
> > If it contains a get form it is easy because i can run perl -d from
> > the command line.
> > In this case i have a very large form with many variables and it is a
> > post form.
>
> Does it have file upload fields?
>
> > How can i debug it ?
>
> What kind of bugs are you looking for?
>
> > I made a google search and found answers like printing the environment
> > variables and then run it like a get command with perl -d but this is
> > a bit tedious.
> > Is there a "best practice" for this type of problem ?
>
> I find the best practice is not making bugs in the first place. :)
>
> I also often add code to the CGI to process secret debugging parameter,
> which turns on strategically placed extra print or warn statements.
>
> warn "Now I'm doing $foo" if $cgi->url_param('debug');
>
> If you really want to debug under -d, then you can run the program using
> $cgi->save($file_handle) to save the query and then perl -d script.cgi <
> saved_query_file
>
> (this won't work with file upload fields, as the file contents are not
> saved.)
>
> Where the script use CGI has been changed to something like:
> use CGI qw(-debug);
>
> Xho
>
> --
> --------------------http://NewsReader.Com/--------------------
> The costs of publication of this article were defrayed in part by the
> payment of page charges. This article must therefore be hereby marked
> advertisement in accordance with 18 U.S.C. Section 1734 solely to indicat=
e
> this fact.
------------------------------
Date: Thu, 20 Nov 2008 10:12:32 -0800
From: Ed Jay <edMbj@aes-intl.com>
Subject: Hash keys with range of values -- How?
Message-Id: <uk9bi4lj24r60bb4u5oshq5bnopft9eopg@4ax.com>
I'm trying to use a hash to replace:
if ($scoreR >= 0) {$th_right = TH1;}
if ($scoreR >= 30) {$th_right = TH2;}
if ($scoreR >= 75) {$th_right = TH3;}
if ($scoreR >= 120) {$th_right = TH4;}
if ($scoreR >= 150) {$th_right = TH5;}
My hash:
%thVal = ("x1","TH1","x2","TH2","x3","TH3","x4","TH4","x5","TH5");
I want to use: $th_right = $thVal{$scoreR};
What is the syntax for x1, x2,...x5? I've tried
%thVal = (">=0","TH1",">=30","TH2",etc), but it doesn't work.
--
Ed Jay (remove 'M' to reply by email)
Win the War Against Breast Cancer.
Knowing the facts could save your life.
http://www.breastthermography.info
------------------------------
Date: 20 Nov 2008 19:01:15 GMT
From: xhoster@gmail.com
Subject: Re: Hash keys with range of values -- How?
Message-Id: <20081120140025.844$Su@newsreader.com>
Ed Jay <edMbj@aes-intl.com> wrote:
> I'm trying to use a hash to replace:
>
> if ($scoreR >= 0) {$th_right = TH1;}
> if ($scoreR >= 30) {$th_right = TH2;}
> if ($scoreR >= 75) {$th_right = TH3;}
> if ($scoreR >= 120) {$th_right = TH4;}
> if ($scoreR >= 150) {$th_right = TH5;}
>
> My hash:
> %thVal = ("x1","TH1","x2","TH2","x3","TH3","x4","TH4","x5","TH5");
>
> I want to use: $th_right = $thVal{$scoreR};
>
> What is the syntax for x1, x2,...x5? I've tried
> %thVal = (">=0","TH1",">=30","TH2",etc), but it doesn't work.
There isn't a syntax to do what you want directly. This is not what
hashes do. You might be able to create a "tied" hash module that would do
this, but I would say it is no longer a hash, just something hooking into
the hash hooks.
Maybe something like this (untested):
use List::Util qw(first);
my @thVal = [0,"TH1"],[30,"TH2"],[75,"TH3"]; # etc.
@thVal = sort {$b->[0] <=> $a->[0]} @thVal;
my $val = first {$scoreR < $_->[0]} @thVal;
# should test for $val being undefined;
$th_right = $val->[1];
If you are concerned about performance and @thVal is large, a binary search
would be better.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
------------------------------
Date: Thu, 20 Nov 2008 12:44:00 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Hash keys with range of values -- How?
Message-Id: <9cdbi45lkg63ijsmkfqqi1olvipcvtjja4@4ax.com>
Ed Jay <edMbj@aes-intl.com> wrote:
>I'm trying to use a hash to replace:
>
>if ($scoreR >= 0) {$th_right = TH1;}
>if ($scoreR >= 30) {$th_right = TH2;}
>if ($scoreR >= 75) {$th_right = TH3;}
>if ($scoreR >= 120) {$th_right = TH4;}
>if ($scoreR >= 150) {$th_right = TH5;}
>
>My hash:
>%thVal = ("x1","TH1","x2","TH2","x3","TH3","x4","TH4","x5","TH5");
>
>I want to use: $th_right = $thVal{$scoreR};
>
>What is the syntax for x1, x2,...x5? I've tried
>%thVal = (">=0","TH1",">=30","TH2",etc), but it doesn't work.
You seem to have trouble with what a hash is. Mathematically it is a
(partial) mapping from string to scalar.
So,
%thVal = (">=0","TH1",">=30","TH2",etc)
is valid, but it does something totally different you seem to think. It
simply assign the value "TH1" to the key ">=0". I could just as well
assign "TH1" to "foobar". For the hash there is no difference wrt what
string you use as the key.
Your initial plan with the multi-level "If" isn't that bad at all. If
the list of ranges is large, you could use the hash similar to what you
got
%thVal = ("0","TH1","30","TH2",75, 'TH3', 120, 'TH4', etc)
and then loop through a sorted list of the keys in order to find the
boundaries.
(sketch only, untested, probably has a one-off error):
for my $this(sort keys(%thVal)) {
next if $scoreA > $thVal{$this};
$th_right = $thVal{$this}; last;
}
jue
------------------------------
Date: Thu, 20 Nov 2008 12:55:17 -0800
From: Ed Jay <edMbj@aes-intl.com>
Subject: Re: Hash keys with range of values -- How?
Message-Id: <8kjbi4pkr63sogivpvdi7ojb6ald3rjop4@4ax.com>
Jürgen Exner wrote:
>Ed Jay <edMbj@aes-intl.com> wrote:
>>I'm trying to use a hash to replace:
>>
>>if ($scoreR >= 0) {$th_right = TH1;}
>>if ($scoreR >= 30) {$th_right = TH2;}
>>if ($scoreR >= 75) {$th_right = TH3;}
>>if ($scoreR >= 120) {$th_right = TH4;}
>>if ($scoreR >= 150) {$th_right = TH5;}
>>
>>My hash:
>>%thVal = ("x1","TH1","x2","TH2","x3","TH3","x4","TH4","x5","TH5");
>>
>>I want to use: $th_right = $thVal{$scoreR};
>>
>>What is the syntax for x1, x2,...x5? I've tried
>>%thVal = (">=0","TH1",">=30","TH2",etc), but it doesn't work.
>
>You seem to have trouble with what a hash is. Mathematically it is a
>(partial) mapping from string to scalar.
>So,
> %thVal = (">=0","TH1",">=30","TH2",etc)
>is valid, but it does something totally different you seem to think. It
>simply assign the value "TH1" to the key ">=0". I could just as well
>assign "TH1" to "foobar". For the hash there is no difference wrt what
>string you use as the key.
>
>Your initial plan with the multi-level "If" isn't that bad at all. If
>the list of ranges is large, you could use the hash similar to what you
>got
> %thVal = ("0","TH1","30","TH2",75, 'TH3', 120, 'TH4', etc)
>
>and then loop through a sorted list of the keys in order to find the
>boundaries.
>
>(sketch only, untested, probably has a one-off error):
>
>for my $this(sort keys(%thVal)) {
> next if $scoreA > $thVal{$this};
> $th_right = $thVal{$this}; last;
>}
>
>jue
Thanks xposter and Jue. I think it's wise to stick to the multiple
conditionals.
--
Ed Jay (remove 'M' to reply by email)
Win the War Against Breast Cancer.
Knowing the facts could save your life.
http://www.breastthermography.info
------------------------------
Date: Thu, 20 Nov 2008 12:59:49 -0800 (PST)
From: Brian McCauley <nobull67@gmail.com>
Subject: Re: Hash keys with range of values -- How?
Message-Id: <ffeb0ceb-8c6c-405f-b562-5abd979eebd7@j39g2000yqn.googlegroups.com>
On 20 Nov, 19:01, xhos...@gmail.com wrote:
> This is not what hashes do. =A0You might be able to create a "tied" hash =
module [...]
...or simply download one (Tie::RangeHash) from CPAN.
------------------------------
Date: Thu, 20 Nov 2008 23:56:40 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Hash keys with range of values -- How?
Message-Id: <gg4toe.og.1@news.isolution.nl>
Ed Jay schreef:
> if ($scoreR >= 0) {$th_right = TH1;}
> if ($scoreR >= 30) {$th_right = TH2;}
> if ($scoreR >= 75) {$th_right = TH3;}
> if ($scoreR >= 120) {$th_right = TH4;}
> if ($scoreR >= 150) {$th_right = TH5;}
These ifs are in the wrong order. When $scoreR is 200, $th_right will
change 5 times.
Are you maybe looking for log()?
--
Affijn, Ruud
"Gewoon is een tijger."
------------------------------
Date: Thu, 20 Nov 2008 21:46:27 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <Xns9B5CAAA22A43Casu1cornelledu@127.0.0.1>
Peng Yu <PengYu.UT@gmail.com> wrote in
news:8f31f2be-e332-4d6d-a5ad-337e18208809@o2g2000yqd.googlegroups.com:
> On Nov 20, 12:19 am, "John W. Krahn" <some...@example.com> wrote:
>> Peng Yu wrote:
>> > On Nov 19, 7:55 pm, s...@netherlands.com wrote:
>>
...
>> > I can not find what '$-[$i]' means in your code on online perl
>> > documentation. Would you please let me know its meaning? Where is
>> > its documentation?
>>
>> You can find out about the @- array in:
>>
>> perldoc perlvar
>
> It seems that the perldoc is not easy to understand by a newbie. While
> it explains @-, the perldoc keeps on referring to others things that I
> don't understand. Can somebody help explain @- for me with more
> examples?
I looked at the perldoc perlvar page to remind myself what it said. I am
unable to figure out how one could make it clearer:
<blockquote>
This array holds the offsets of the beginnings of the last
successful submatches in the currently active dynamic scope.
$-[0] is the offset into the string of the beginning of the
entire match. The *n*th element of this array holds the offset
of the *n*th submatch, so $-[1] is the offset where $1 begins,
$-[2] the offset where $2 begins, and so on.
</blockquote>
Consider the example below:
C:\DOCUME~1\asu1\LOCALS~1\Temp> cat t.pl
#!/usr/bin/perl
use strict;
use warnings;
my $s = '123abcABC';
if ( $s =~ /^([0-9]+([0-9]))[a-z]+([A-Z]([A-Z]+))$/ ) {
print "@-\n";
print "$_\n" for $1, $2, $3, $4;
}
__END__
In this case, a successful match would produce
$1 = 123 (offset = 0)
$2 = 3 (offset = 2)
$3 = ABC (offset = 6)
$4 = BC (offset = 7)
The entire match starts at offset 0, so we have @- = (0, 0, 2, 6, 7).
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
------------------------------
Date: Thu, 20 Nov 2008 08:23:24 -0800 (PST)
From: Peng Yu <PengYu.UT@gmail.com>
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <b6c9b18d-5031-4778-addb-351dd3a98845@v42g2000yqv.googlegroups.com>
On Nov 20, 8:52=A0am, Charlton Wilbur <cwil...@chromatico.net> wrote:
> >>>>> "PY" =3D=3D Peng Yu <PengYu...@gmail.com> writes:
>
> =A0 =A0 PY> BTW, I'd think that the perldoc be made more readable for
> =A0 =A0 PY> newbies. I'm not sure whether other newbies would have the sa=
me
> =A0 =A0 PY> feeling.
>
> There's a tradeoff between making the perldocs more readable for newbies
> and making them more information-dense for experts. =A0There are other
> sources of documentation for newbies, but precious few for experts, and
> so I think the perldocs are at a very sweet spot.
>
> Further, you can learn more about Perl by following up on everything in
> the documentation that you don't understand. =A0Indeed, that's how many o=
f
> us learned as much as we did.
I total understand what you mean.
At this moment, I only be interested in understanding how @- works. I
tried to read that perldoc page, but I feel it is hard to follow.
Since that section of perldocs is not very readable to newbies, I'm
wondering if there is any more introductory material that explains it
with more examples.
I tried to look the more introductory materials on @- online but I
don't find any.
I think as I understand perl more, I shall be able to read perldoc
more easily.
Thanks,
Peng
------------------------------
Date: Fri, 21 Nov 2008 00:33:41 GMT
From: sln@netherlands.com
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <lgubi49ostf13tfepn8t3u2vk3r8u9kctf@4ax.com>
On Thu, 20 Nov 2008 01:55:07 GMT, sln@netherlands.com wrote:
>On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <PengYu.UT@gmail.com> wrote:
>
>>Hi,
>>
>>Suppose I have a long string $a, and a test string $b.
>>
>>I want to fine all the substrings in $a, whose length is the same as
>>$b with at most n mismatches.
>>
>>For example, string 'abcdef' and string 'aacdxf' have two mismatches
>>at the 2nd character and the 5th character.
>>
>>I'm wondering if this can be done easily in perl. Can I use regular
>>expression to solve this problem?
>>
>>Thanks,
>>Peng
>
>Not easily, and probably fairly slow.
>You need some heuristic algorithym.
>
>sln
>
>----------------------------
>
>use strict;
>use warnings;
>
>
>my $str = 'aacdxfo sdfbsabcrxfodfbdfb';
>my $pattern = 'abcdef';
>my $misses = 2;
>
>my @tmp = split '',$pattern;
>my $pstr;
>for (@tmp) {
> $pstr .= "(?:$_|(.))";
>}
>$pstr = '('.$pstr.')';
>my $rxpattern = qr/$pstr/i;
>
>print @tmp,"\n",$rxpattern,"\n\n";
>
>while ($str =~ /$rxpattern/g )
>{
> my $cnt = 0;
> for my $i (2..(@tmp+1)) {
> last if (($cnt += defined( $-[$i])) > $misses);
> }
> if ($cnt > $misses) {
> pos($str) = $-[0]+1;
> } else {
> print "$cnt bad chars, but found a close match: '$1'\n";
> }
>}
>
>__END__
>
>
>
>abcdef
>(?i-xsm:((?:a|(.))(?:b|(.))(?:c|(.))(?:d|(.))(?:e|(.))(?:f|(.))))
>
>2 bad chars, but found a close match: 'aacdxf'
>2 bad chars, but found a close match: 'abcrxf'
>
Follow-up:
Below is a modified version that chops the time in half.
The fastest solution conforming to the original problem statement
is from John Krahn. Because you can't fight math when posing
1-dimensional problems like this.
However, I'm posting this modified version incase the problem set
expands to multi-character heuristics.
The modifications include staying away from the @- array altogether.
In my opinion, it is just a eval{} in disguise anyway, don't know
don't care. Position is set every time. The pos() function is free,
its virtually no overhead. Reading @-, the patter/sub-pattern matching
position array, on the otherhand, incurrs 5-10 times the overhead.
Its to be avoided at all costs on performance related regexp issues.
Staying away from the @- array but keeping the algorithym means doing
something thats not supposed to be done, but it works:
$i = 1; # is not a reference
$$i # $1
I didn't see any side affects except you have to manage the pos() each time.
See: FAQ 7.29: How can I use a variable as a variable name?
That being said Expanding notation to multi-character substrings
and to add some intelligence (possibly) to dynamically generated regexp's,
might be a help in the future.
For instance, you could expand the definition of matching items like so:
@terms = (
# string - required - generated rxterm
# ----------------------------------------
'Merry ' , 1, # (?:Merry )
'C', , 0, # (?:C|(.))
'h', , 0, # (?:h|(.))
'r', , 0, # (?:r|(.))
'i', , 0, # (?:i|(.))
's', , 0, # (?:s|(.))
't', , 0, # (?:t|(.))
'mas', , 0, # (?:mas|(.{3}))
);
But then your getting into heuristics.
Good luck!
sln
--------------------------------
use strict;
use warnings;
use Benchmark ':hireswallclock';
my ($t0,$t1,$tdif);
my $pattern = 'aacdxf';
my $misses = 2;
my @tmp = split '',$pattern;
my $pstr;
for (@tmp) {
$pstr .= "(?:$_|(.))";
}
$pstr = '('.$pstr.')';
my $rxpattern = qr/$pstr/i;
my $extent = @tmp+1;
my ($i,$cnt,$lastpos) = (0,0,0);
print @tmp,"\n",$rxpattern,"\n\n";
my $fname = 'c:\temp\5MEG_FILE.txt';
open my $fh, $fname or die "can't open $fname...";
$t0 = new Benchmark;
while (<$fh>)
{
chomp;
$lastpos = 0;
while ( /$rxpattern/g )
{
$cnt = 0;
for $i (2..$extent) {
last if (($cnt += defined( $$i)) > $misses);
}
if ($cnt <= $misses) {
print "$cnt bad chars, but found a close match: '$1'\n";
}
pos() = ++$lastpos;
}
}
$t1 = new Benchmark;
close $fh;
$tdif = timediff($t1, $t0);
print "the code took:",timestr($tdif),"\n";
__END__
------------------------------
Date: Fri, 21 Nov 2008 00:14:52 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <gg4uj0.1f8.1@news.isolution.nl>
Peng Yu schreef:
> Suppose I have a long string $a, and a test string $b.
>
> I want to fine all the substrings in $a, whose length is the same as
> $b with at most n mismatches.
Use substr() and the "^" operator and y/\0//c.
> For example, string 'abcdef' and string 'aacdxf' have two mismatches
> at the 2nd character and the 5th character.
$ perl -wle'
$r = "abcdef" ^ "aacdxf";
print $r =~ y/\0//c
'
2
--
Affijn, Ruud
"Gewoon is een tijger."
------------------------------
Date: Thu, 20 Nov 2008 23:52:15 GMT
From: sln@netherlands.com
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <8psbi4hbak99lghp63govh9df66d9ceajp@4ax.com>
On Thu, 20 Nov 2008 10:06:33 -0800, "John W. Krahn" <someone@example.com> wrote:
>Pilcrow wrote:
>> On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <PengYu.UT@gmail.com>
>> wrote:
>>>
>>> Suppose I have a long string $a, and a test string $b.
>>>
>>> I want to fine all the substrings in $a, whose length is the same as
>>> $b with at most n mismatches.
>>>
>>> For example, string 'abcdef' and string 'aacdxf' have two mismatches
>>> at the 2nd character and the 5th character.
>>>
>>> I'm wondering if this can be done easily in perl. Can I use regular
>>> expression to solve this problem?
>>
>> Not too hard the simple way.
>> -------------------------------------------------------------------
>> #!/usr/bin/perl
>> use strict; use warnings;
>>
>> my $a =
>> "abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx";
>> my $b = "aacdxf";
>>
>> my @a = split //,$a;
>> my @b = split //,$b;
>> my $limit = 2;
>> my $cnt;
>> my $lenb = length $b;
>> my @substrings = ();
>>
>> print "\nmatching '$a' against '$b'\n";
>>
>> OUTER:
>> for (my $i = 0; $i <= $#a-$lenb+1; $i++) {
>> $cnt = 0;
>> for (my $j = 0; $j <= $#b; $j++) {
>> $cnt++ unless $a[$i+$j] eq $b[$j];
>> next OUTER if $cnt > $limit;
>> }
>> my $sub = substr($a,$i,$lenb);
>> # push @substrings, $sub; # alternate output
>> print "match '$sub' at offset $i\n";
>> }
>
>Or more simpler as:
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>
>my $a =
>'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx';
>my $b = 'aacdxf';
>
>my $limit = 2;
>my $lenb = length $b;
>
>print "\nmatching '$a' against '$b'\n";
>
>while ( $a =~ /(?=(.{$lenb}))/sog ) {
> next if ( $1 ^ $b ) =~ tr/\0//c > $limit;
> print "match '$1' at offset $-[0]\n";
> }
>
>__END__
>
>
>
>
>John
Yes, this is by far the fastest way. It benches 1/3 of the time
of using arrays and substr. Although, traversing with substr's and
using the bitwise ^ strings with tr///c might be faster, if only
by a little, its worth investigating. I personally think using the
regex engine for this is faster.
This is a prime example where bit-wise operators rule over string
comparisons... pure math.
If the problem were to spill over to multiple character comparisons
this won't work but the OP didn't say that.
Fast indeed!
I looked over this pretty good. I got a question about that look-ahead
zero-width assertion.
I noticed the capture group is in the assertion and it only advances the
regexp ordinal position by one.
Example:
/(.{$lenb})/ advances 6
/(?=(.{$lenb}))/ advances 1
I looked in perlre,perlretut and there is only one mention of capture within
a "non-capture zero-width assertion" extended expression elements.
That was just a side from some example on (?>..), and that just mentioned,
the ordinal position is adjusted with a (?=(regex)) analogy.
Do you know why this is?
sln
------------------------------
Date: Fri, 21 Nov 2008 01:27:35 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <gg52qp.1dc.1@news.isolution.nl>
Peng Yu schreef:
> Suppose I have a long string $a, and a test string $b.
>
> I want to fine all the substrings in $a, whose length is the same as
> $b with at most n mismatches.
>
> For example, string 'abcdef' and string 'aacdxf' have two mismatches
> at the 2nd character and the 5th character.
>
> I'm wondering if this can be done easily in perl. Can I use regular
> expression to solve this problem?
Yes, a regular expression is also possible. First write code to create
the regex string, for your example it can be done like this:
perl -wle'
my ($b, @b) = "aacdxf";
for my $i (0 .. 4) {
substr(my $bi = $b, $i, 1, ".");
for my $j ($i+1 .. 5) {
substr(my $bj = $bi, $j, 1, ".");
push @b, $bj;
}
}
my $re = join("\n | ", @b);
$re = qr/
( $re
)
/x;
print $re;
'
(?x-ism:
( ..cdxf
| .a.dxf
| .ac.xf
| .acd.f
| .acdx.
| a..dxf
| a.c.xf
| a.cd.f
| a.cdx.
| aa..xf
| aa.d.f
| aa.dx.
| aac..f
| aac.x.
| aacd..
)
)
--
Affijn, Ruud
"Gewoon is een tijger."
------------------------------
Date: Thu, 20 Nov 2008 10:06:33 -0800
From: "John W. Krahn" <someone@example.com>
Subject: Re: How to find all the strings in a long that are at most of n different characters of a test string?
Message-Id: <IGhVk.151$X11.50@newsfe08.iad>
Pilcrow wrote:
> On Wed, 19 Nov 2008 15:34:17 -0800 (PST), Peng Yu <PengYu.UT@gmail.com>
> wrote:
>>
>> Suppose I have a long string $a, and a test string $b.
>>
>> I want to fine all the substrings in $a, whose length is the same as
>> $b with at most n mismatches.
>>
>> For example, string 'abcdef' and string 'aacdxf' have two mismatches
>> at the 2nd character and the 5th character.
>>
>> I'm wondering if this can be done easily in perl. Can I use regular
>> expression to solve this problem?
>
> Not too hard the simple way.
> -------------------------------------------------------------------
> #!/usr/bin/perl
> use strict; use warnings;
>
> my $a =
> "abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx";
> my $b = "aacdxf";
>
> my @a = split //,$a;
> my @b = split //,$b;
> my $limit = 2;
> my $cnt;
> my $lenb = length $b;
> my @substrings = ();
>
> print "\nmatching '$a' against '$b'\n";
>
> OUTER:
> for (my $i = 0; $i <= $#a-$lenb+1; $i++) {
> $cnt = 0;
> for (my $j = 0; $j <= $#b; $j++) {
> $cnt++ unless $a[$i+$j] eq $b[$j];
> next OUTER if $cnt > $limit;
> }
> my $sub = substr($a,$i,$lenb);
> # push @substrings, $sub; # alternate output
> print "match '$sub' at offset $i\n";
> }
Or more simpler as:
#!/usr/bin/perl
use strict;
use warnings;
my $a =
'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx';
my $b = 'aacdxf';
my $limit = 2;
my $lenb = length $b;
print "\nmatching '$a' against '$b'\n";
while ( $a =~ /(?=(.{$lenb}))/sog ) {
next if ( $1 ^ $b ) =~ tr/\0//c > $limit;
print "match '$1' at offset $-[0]\n";
}
__END__
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
------------------------------
Date: Fri, 21 Nov 2008 00:26:05 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: subroutine local variable with initialization
Message-Id: <gg4v6t$30pn$1@agate.berkeley.edu>
[A complimentary Cc of this posting was NOT [per weedlist] sent to
Brian McCauley
<nobull67@gmail.com>], who wrote in article <c98a6733-0841-4706-b245-756c13ef5cf3@f20g2000yqg.googlegroups.com>:
> > BEGIN {
> >
> > =A0 =A0 my $count =3D 1 ;
> >
> > =A0 =A0 sub doit {
> > =A0 =A0 =A0 =A0 print "count=3D$count\n";
> > =A0 =A0 =A0 =A0 $count++;
> > =A0 =A0 }
> >
> > }
> I would argue that an INIT block would be more idiomatically correct
> rather than a BEGIN one.
> But it does not really make any difference.
... Well, if you forget that the version with BEGIN works, and the
version with INIT works only in some situations, then yes...
Hope this helps,
Ilya
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1994
***************************************