[18413] in Perl-Users-Digest
Perl-Users Digest, Issue: 581 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 28 14:21:56 2001
Date: Wed, 28 Mar 2001 11:21:26 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <985807285-v10-i581@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Wed, 28 Mar 2001 Volume: 10 Number: 581
Today's topics:
printing numbers (Fulko van Westrenen)
Re: printing numbers (Tad McClellan)
Re: printing numbers (Fulko van Westrenen)
Re: printing numbers (Gary O'Keefe)
Re: printing numbers <djberge@uswest.com>
Re: printing numbers (Tad McClellan)
Re: printing numbers <bmb@ginger.libs.uga.edu>
Re: printing numbers brianr@liffe.com
Re: printing numbers (Rafael Garcia-Suarez)
Provo Perl Mongers Meeting <alany@idiglobal.com>
Question about File::Find module? <shah@typhoon.xnet.com>
Re: Question about File::Find module? nobull@mail.com
Re: Question about File::Find module? (Abigail)
Re: Question: Addressing Strings? (Anno Siegel)
Re: Read and write a hash of arrays ? <c_clarkson@hotmail.com>
Re: Regex question (Ave Wrigley)
Re: Regex question (Ave Wrigley)
Re: Regex question (Ave Wrigley)
Re: Regex question <uri@sysarch.com>
Re: Regex question <Ave.Wrigley@itn.co.uk>
Re: Regex question <joe+usenet@sunstarsys.com>
Re: regex-qr// for search and replace <rick.delaney@home.com>
regexp with multiple \n? <m.grimshaw@salford.ac.uk>
Re: regexp with multiple \n? <iltzu@sci.invalid>
Re: regexp with multiple \n? <abc@def.com>
Digest Administrivia (Last modified: 16 Sep 99) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 28 Mar 2001 11:32:52 GMT
From: fulko@dizzy.debian.org (Fulko van Westrenen)
Subject: printing numbers
Message-Id: <99si54$3p5d$1@simian.nlr.nl>
Hello,
I would like to print the number 3 as 003 (to include in a name).
Is there a simple way to do this?
Thanks,
Fulko
--
Fulko van Westrenen
westre@nlr.nl
Human Factors Department, NLR
------------------------------
Date: Wed, 28 Mar 2001 06:47:03 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: printing numbers
Message-Id: <slrn9c3jpn.i4v.tadmc@tadmc26.august.net>
Fulko van Westrenen <fulko@dizzy.debian.org> wrote:
>
>I would like to print the number 3 as 003 (to include in a name).
printf "%03d", 3;
>Is there a simple way to do this?
perldoc -f printf
perldoc -f sprintf
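
Since the padded number is meant to go into a name, sprintf is probably the more useful of the two. A minimal sketch, assuming a hypothetical helper called numbered_name and a made-up "frame" prefix; only the %03d idea comes from the answer above:

```perl
use strict;
use warnings;

# Build a name such as "frame003" from a prefix and a number.
# numbered_name() and its arguments are illustrative, not from the thread.
sub numbered_name {
    my ($prefix, $n, $width) = @_;
    $width = 3 unless defined $width;
    # %0*d takes the field width from the argument list and pads with zeros
    return sprintf '%s%0*d', $prefix, $width, $n;
}
```

Numbers wider than the field are not truncated, so 1234 with a width of 3 still comes out as 1234.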
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: 28 Mar 2001 13:15:26 GMT
From: fulko@dizzy.debian.org (Fulko van Westrenen)
Subject: Re: printing numbers
Message-Id: <99so5e$3rdf$1@simian.nlr.nl>
On Wed, 28 Mar 2001 06:47:03 -0500, Tad McClellan <tadmc@augustmail.com> wrote:
>Fulko van Westrenen <fulko@dizzy.debian.org> wrote:
>>
>>I would like to print the number 3 as 003 (to include in a name).
>
> printf "%03d", 3;
Thanks for this hint (the manpages are somewhat cryptic).
Fulko
------------------------------
Date: Wed, 28 Mar 2001 13:15:37 GMT
From: gary@onegoodidea.com (Gary O'Keefe)
Subject: Re: printing numbers
Message-Id: <3ac1ddbc.188524854@news.gssec.bt.co.uk>
On 28 Mar 2001 11:32:52 GMT, fulko@dizzy.debian.org (Fulko van
Westrenen) wrote:
>Hello,
>
>I would like to print the number 3 as 003 (to include in a name).
>Is there a simple way to do this?
printf ( "%03d", $your_number_here );
or
$your_string = sprintf( "%03d", $your_number_here );
It's in the perlfunc docs that should have come with your
distribution. To access the documentation, type 'perldoc perl' for
command line options and listings of the (exceptionally extensive)
available documentation.
It is a good idea to search through the documentation as fully as
possible before posting a question like this to clpm as some of the
other posters take exception to people posting questions where the
answers are readily available in the docs. They'll killfile you as
soon as look at you and you'll never be able to get their help in the
future. As it tends to be the more experienced members of the group
that do this, you'll have trouble getting definitive answers to your
more advanced questions.
So, in the interests of harmony, your future self-interest, and the
reduction of noise in clpm: learn what RTFM means (it's a statement of
exasperation, not abuse).
Gary
--
Gary O'Keefe
gary@onegoodidea.com
+44 (0) 7976 614 336
------------------------------
Date: Wed, 28 Mar 2001 07:46:09 -0600
From: Daniel Berger <djberge@uswest.com>
Subject: Re: printing numbers
Message-Id: <3AC1EB21.A302C731@uswest.com>
Fulko van Westrenen wrote:
> Hello,
>
> I would like to print the number 3 as 003 (to include in a name).
> Is there a simple way to do this?
>
> Thanks,
> Fulko
>
> --
> Fulko van Westrenen
> westre@nlr.nl
> Human Factors Department, NLR
my $n = 3;
if($n < 10){ $n = "00" . $n }
# and later...
if($n == 007){ print "Bond. James Bond" || die "Live and Let"}
--
"Evil will always triumph because Good is *dumb*"
- Dark Helmet, Spaceballs
------------------------------
Date: Wed, 28 Mar 2001 08:52:26 -0500
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: printing numbers
Message-Id: <slrn9c3r4q.iku.tadmc@tadmc26.august.net>
Daniel Berger <djberge@uswest.com> wrote:
>Fulko van Westrenen wrote:
>
>> I would like to print the number 3 as 003 (to include in a name).
>> Is there a simple way to do this?
[ snip quoted .sig Please don't quote those. ]
>my $n = 3;
>if($n < 10){ $n = "00" . $n }
Please put a smiley in when you post joke code, else somebody
might end up actually using it.
That breaks for ($n >=10 and $n < 100).
># and later...
>if($n == 007){ print "Bond. James Bond" || die "Live and Let"}
^^^
You lucked out that your octal number above happened to convert
to the correct decimal value...
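
The octal point can be checked with a small sketch. The function name leading_zero_facts is made up for illustration; it only shows how a leading zero is read in a numeric literal, in a string used as a number, and by oct():

```perl
use strict;
use warnings;

# Returns three numbers that show how Perl reads a leading zero.
sub leading_zero_facts {
    my $literal  = 010;          # numeric literal with a leading zero: octal, so 8
    my $string   = "010" + 0;    # string in numeric context: read as decimal, so 10
    my $explicit = oct("010");   # oct() reads the string as octal: 8
    return ($literal, $string, $explicit);
}
```

007 happens to be 7 in both bases, which is why the comparison above works; a padded string such as "010" compared against the literal 010 would compare 10 with 8.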
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Wed, 28 Mar 2001 10:10:37 -0500
From: Brad Baxter <bmb@ginger.libs.uga.edu>
Subject: Re: printing numbers
Message-Id: <Pine.A41.4.21.0103281006370.19002-100000@ginger.libs.uga.edu>
On Wed, 28 Mar 2001, Daniel Berger wrote:
> Fulko van Westrenen wrote:
> > I would like to print the number 3 as 003 (to include in a name).
> > Is there a simple way to do this?
>
> my $n = 3;
> if($n < 10){ $n = "00" . $n }
>
> # and later...
> if($n == 007){ print "Bond. James Bond" || die "Live and Let"}
Intentional obfuscation? :-)
my $n = 10;
$n = "0" . $n;
print "$n\n";
if( $n == 010 ){ print "Oh, James." }
else { die "yuppy scum\n" }
Brad
------------------------------
Date: 28 Mar 2001 13:32:57 +0100
From: brianr@liffe.com
Subject: Re: printing numbers
Message-Id: <vtpuf2f4zq.fsf@liffe.com>
fulko@dizzy.debian.org (Fulko van Westrenen) writes:
> Hello,
>
> I would like to print the number 3 as 003 (to include in a name).
> Is there a simple way to do this?
Sure, see 'perldoc -f sprintf'
HTH
--
Brian Raven
And don't tell me there isn't one bit of difference between null and space,
because that's exactly how much difference there is. :-)
-- Larry Wall in <10209@jpl-devvax.JPL.NASA.GOV>
------------------------------
Date: Wed, 28 Mar 2001 12:26:44 GMT
From: rgarciasuarez@free.fr (Rafael Garcia-Suarez)
Subject: Re: printing numbers
Message-Id: <slrn9c3m4i.b70.rgarciasuarez@rafael.kazibao.net>
Fulko van Westrenen wrote in comp.lang.perl.misc:
> Hello,
>
> I would like to print the number 3 as 003 (to include in a name).
> Is there a simple way to do this?
perldoc -f printf
perldoc -f sprintf
--
Rafael Garcia-Suarez / http://rgarciasuarez.free.fr/
------------------------------
Date: Wed, 28 Mar 2001 11:31:20 -0700
From: Alan Young <alany@idiglobal.com>
Subject: Provo Perl Mongers Meeting
Message-Id: <3AC22DF8.DE00669A@idiglobal.com>
If this is OT I'm sure y'all'll let me know.
We are having a meeting of the Provo Perl Mongers tonight at the Orem
Public Library at 7pm for anyone who is interested.
Alan Young
------------------------------
Date: Wed, 28 Mar 2001 16:08:49 +0000 (UTC)
From: Hemant Shah <shah@typhoon.xnet.com>
Subject: Question about File::Find module?
Message-Id: <99t2ah$7v2$1@flood.xnet.com>
Folks,
I am in the middle of rewriting a ksh script as a Perl script.
The script cleans up temporary files every night.
Among other things, it does the following:
1) Search all the local filesystems for core files and delete them.
2) Delete files older than 4 hours from /tmp, /usr/tmp, /var/tmp.
3) Delete files older than 3 days from $HOME/tmp for all users.
I wrote a few functions and found that File::Find ran much slower
than the find command. I looked closer at the Perl script and found that
the find function is pretty simple: it takes the starting directories
and calls my function (wanted) for every entry it finds. So even if
I want to search only local filesystems, the find function will look at
all the filesystems. My function (wanted) is the one that decides whether
to process the entry.
Example:
find2perl / -type f -name core ! -fstype nfs -print
Will generate following code:
---------cut-------------cut-------------cut-------------cut----
File::Find::find({wanted => \&wanted}, '/');
sub wanted {
my ($dev,$ino,$mode,$nlink,$uid,$gid);
(($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
-f _ &&
/^core\z/s &&
! ($dev >= 0) &&
print("$name\n");
}
---------cut-------------cut-------------cut-------------cut----
I see 2 problems with the above code:
1) File::Find::find will traverse all the filesystems; the "wanted" function
decides what to do with each entry. This does not look like a good solution
for me, because I have a 60GB filesystem (users' home directories) NFS
mounted across 20 systems, and I do not want 20 systems traversing that
filesystem every night.
2) Someone else mentioned in another post that on Solaris 8 the
($dev < 0) test does not work for NFS filesystems.
I have to run the script on Linux, AIX 4.2.1, AIX 4.3.3, HP-UX 10.2,
HP-UX 11.x, Solaris 8.
Is there a better solution in Perl? Otherwise I am back to using the find
command and ksh.
Thanks in advance.
--
Hemant Shah /"\ ASCII ribbon campaign
E-mail: NoJunkMailshah@xnet.com \ / ---------------------
X against HTML mail
TO REPLY, REMOVE NoJunkMail / \ and postings
FROM MY E-MAIL ADDRESS.
-----------------[DO NOT SEND UNSOLICITED BULK E-MAIL]------------------
I haven't lost my mind, Above opinions are mine only.
it's backed up on tape somewhere. Others can have their own.
------------------------------
Date: 28 Mar 2001 18:03:07 +0100
From: nobull@mail.com
Subject: Re: Question about File::Find module?
Message-Id: <u9itkt6d2s.fsf@wcl-l.bham.ac.uk>
Hemant Shah <shah@typhoon.xnet.com> writes:
> find2perl / -type f -name core ! -fstype nfs -print
>
> Will generate following code:
>
> ---------cut-------------cut-------------cut-------------cut----
> File::Find::find({wanted => \&wanted}, '/');
>
> sub wanted {
> my ($dev,$ino,$mode,$nlink,$uid,$gid);
>
> (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) &&
> -f _ &&
> /^core\z/s &&
> ! ($dev >= 0) &&
> print("$name\n");
> }
> ---------cut-------------cut-------------cut-------------cut----
>
> I see 2 problems with the above code:
>
> 1) File::Find::find will traverse all the filesystems, the "wanted" function
> decides what to do with the entry. This does not look like a good solution
> for me, because I have a 60GB file system (users home directory) NFS
> mounted across 20 systems, and I do not want 20 systems traversing the file
> system every night.
So "prune" when you hit an NFS filesystem.
> 2) Someone else mentioned it in another post that on Solaris 8 the
> ($dev < 0) test does not work for NFS file systems.
>
>
> I have to run the script on Linux, AIX 4.2.1, AIX 4.3.3, HP-UX 10.2,
> HP-UX 11.x, Solaris 8.
Sorry can't help there.
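
A rough sketch of what pruning could look like. It does not detect NFS as such; it assumes that "stay on the filesystem the search started on" (like find's -xdev) is good enough, and compares device numbers from lstat. The name find_core_files is made up for illustration:

```perl
use strict;
use warnings;
use File::Find ();

# Collect plain files named "core" under $top without leaving its filesystem.
# Assumption: a directory whose device number differs from $top's is a mount
# point that should be skipped, whether it is NFS or not.
sub find_core_files {
    my ($top) = @_;
    my $top_dev = (lstat $top)[0];
    return unless defined $top_dev;
    my @found;
    File::Find::find(sub {
        my @st = lstat $_;
        return unless @st;                    # entry vanished or is unreadable
        if (-d _ && $st[0] != $top_dev) {
            $File::Find::prune = 1;           # do not descend into this directory
            return;
        }
        push @found, $File::Find::name if -f _ && $_ eq 'core';
    }, $top);
    return @found;
}
```

To cover every local filesystem you would call this once per local mount point. The list of mount points has to come from somewhere else, and that part is platform specific.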
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
------------------------------
Date: Wed, 28 Mar 2001 18:30:54 +0000 (UTC)
From: abigail@foad.org (Abigail)
Subject: Re: Question about File::Find module?
Message-Id: <slrn9c4beu.9ac.abigail@tsathoggua.rlyeh.net>
Hemant Shah (shah@typhoon.xnet.com) wrote on MMDCCLXVI September MCMXCIII
in <URL:news:99t2ah$7v2$1@flood.xnet.com>:
[]
[] I see 2 problems with the above code:
[]
[] 1) File::Find::find will traverse all the filesystems, the "wanted" function
[] decides what to do with the entry. This does not look like a good solution
[] for me, because I have a 60GB file system (users home directory) NFS
[] mounted across 20 systems, and I do not want 20 systems traversing the file
[] system every night.
[]
[] 2) Someone else mentioned it in another post that on Solaris 8 the
[] ($dev < 0) test does not work for NFS file systems.
[]
[]
[] I have to run the script on Linux, AIX 4.2.1, AIX 4.3.3, HP-UX 10.2,
[] HP-UX 11.x, Solaris 8.
[]
[] Is there a better solution in perl? Otherwise I am back to using find
[] command and ksh.
Well, it's fairly trivial to find out which filesystems are NFS mounted -
"mount" will tell you.
However, having said that, there's no reason to assume one has to
use File::Find to traverse the file system. I've never used File::Find
myself, but I often use find. Which works fine from within Perl:
open my $find => "find / -name 'core' \! -fstype nfs -print0 |" or die;
$/ = "\x00";
while (<$find>) {...}
close $find or die;
Of course, if you just want to delete the files, no need for the open.
system q {find / -name 'core' \! -fstype nfs -exec rm {} \;};
If you want to, you could do all the deletes in a single find.
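
For the pipe variant, the reading side can be wrapped in a small helper. A sketch only: the name read_nul_records is made up, and the commented usage assumes a find that supports -print0 and -fstype, plus a perl new enough for the list form of a piped open:

```perl
use strict;
use warnings;

# Read NUL-terminated records (as written by "find ... -print0") from a
# filehandle and return them without the trailing NUL.
sub read_nul_records {
    my ($fh) = @_;
    local $/ = "\0";          # record separator is NUL only inside this sub
    my @records = <$fh>;
    chomp @records;           # chomp removes the current $/, i.e. the NUL
    return @records;
}

# Hypothetical usage, with the same find arguments as in the post:
#   open(my $find, '-|', 'find', '/', '-name', 'core',
#        '!', '-fstype', 'nfs', '-print0') or die "cannot run find: $!";
#   my @cores = read_nul_records($find);
#   close $find or die "find failed";
```

Using local keeps the changed $/ from leaking into the rest of the program, and the list form of open avoids having to quote the ! for the shell.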
Abigail
--
perl -we '$@="\145\143\150\157\040\042\112\165\163\164\040\141\156\157\164".
"\150\145\162\040\120\145\162\154\040\110\141\143\153\145\162".
"\042\040\076\040\057\144\145\166\057\164\164\171";`$@`'
------------------------------
Date: 28 Mar 2001 11:02:37 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Question: Addressing Strings?
Message-Id: <99sgcd$jpp$1@mamenchi.zrz.TU-Berlin.DE>
According to Benjamin Goldberg <goldbb2@earthlink.net>:
[...]
> Again, // is magical. It means match with the last nonempty pattern.
> If you've never had any nonempty patterns in your file, then it may
> work, but it is a bad idea to always expect this to be so.
Not quite. m// uses whatever regex last matched successfully, not
just the last regex compiled.
This quirk makes the otherwise attractive m// feature nearly useless.
It is attractive because it gives you a way to re-use a compiled
regex without the /o modifier. This means you can re-compile a regex
occasionally during the run of a program. But you would want to do
that only when parts of the regex are interpolated at run time. This
again means that you (generally) don't know what the regex will match
at a given time. Remember there are regexes that don't match anything,
so there may not even be such a string. So you don't have a chance
to reliably initialize m//.
Of course, these days we have qr// for that, so the point is moot.
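
A small sketch of the qr// approach: the pattern is built from a run-time value, compiled once, and then reused for every line. The function name count_matches and the whole-word, case-insensitive matching are choices made for this example only:

```perl
use strict;
use warnings;

# Count the lines that contain $term as a whole word, ignoring case.
sub count_matches {
    my ($term, @lines) = @_;
    my $re = qr/\b\Q$term\E\b/i;    # compiled here, once per call
    return scalar grep { $_ =~ $re } @lines;
}
```

When the search term changes, a new qr// object is built and the old one is simply dropped, so there is no dependence on which pattern last matched.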
Anno
------------------------------
Date: Wed, 28 Mar 2001 05:25:14 -0600
From: "Charles K. Clarkson" <c_clarkson@hotmail.com>
Subject: Re: Read and write a hash of arrays ?
Message-Id: <7449E7C0B5F358E7.7666F9C1D739B7D2.07A7728B73B4F8B0@lp.airnews.net>
Rich <bigrich318@yahoo.com> wrote:
:
: <u665313720@spawnkill.ip-mobilphone.net> wrote:
: > Hi,
: <snip>
:
: use strict;
:
: my %enzymes_hash;
: my $keyword;
:
: while (<DATA>) {
: if (/^(\w+)::$/) { $keyword = $1; next;};
: my ($clone, $best_hit_id, $e_value, $percent_ident, $aln_length, $species,
:     $desc) =
: (/^(\w+)\s+(\w+)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*?\s+.*?)\s+(.*)/);
: push(@{$enzymes_hash{$best_hit_id}}, join(" ", ($e_value, $clone,
:     $best_hit_id, $percent_ident, $aln_length, $species, $desc)));
: }
There isn't a need for all those values anymore:
while (<DATA>) {
if (/^(\w+)::$/) { $keyword = $1; next;};
my @arr =
(/^(\w+)\s+(\w+)\s+(.*?)\s+(.*?)\s+(.*?)\s+(.*?\s+.*?)\s+(.*)/);
push @{$enzymes_hash{$arr[1]}}, join ' ', @arr[2, 0, 1, 3 .. 6];
}
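
The slice is what does the reordering. As a sketch with placeholder field names (reorder_fields is a made-up name, and the input is assumed to be the seven captures in regex order):

```perl
use strict;
use warnings;

# Reorder the seven captured fields the way the slice above does:
# e-value first, then clone and best-hit id, then the remaining four as-is.
sub reorder_fields {
    my @arr = @_;    # (clone, best_hit_id, e_value, percent, length, species, desc)
    return join ' ', @arr[2, 0, 1, 3 .. 6];
}
```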
HTH,
Charles K. Clarkson
:
: # print hash of arrays
: print "Keyword is $keyword\n\n";
: foreach my $key (sort keys %enzymes_hash) {
: print "Contents of ${key}:\n";
: foreach (sort @{$enzymes_hash{$key}}) {
: print;
: print "\n";
: }
: print "\n";
: }
: __DATA__
: ACYLASE::
: HVSMEb0011A11f ACY1_PIG 1.10e-06 38,32% 107 Sus scrofa AMINOACYLASE-1 (EC 3.5.1.14) (N-ACYL-L-AMINO-ACID
: HVSMEb0011A21f ACY1_PIG 9.40e-35 51,15% 217 Sus scrofa AMINOACYLASE-1 (EC 3.5.1.14) (N-ACYL-L-AMINO-ACID
: HVSMEb0014B21f FABD_BACSU 2.00e-12 47,14% 140 Bacillus subtilis MALONYL COA-ACYL CARRIER PROTEIN TRANSACYLASE (EC
: HVSMEc0009D01f ACY1_HUMAN 2.60e-24 52,07% 169 Homo sapiens AMINOACYLASE-1 (EC 3.5.1.14) (N-ACYL-L-AMINO-ACID
: HVSMEf0010D05f ACY1_HUMAN 1.70e-14 52,38% 84 Homo sapiens AMINOACYLASE-1 (EC 3.5.1.14) (N-ACYL-L-AMINO-ACID
: HVSMEf0023M23f FABD_BACSU 4.00e-13 38,38% 99 Bacillus subtilis MALONYL COA-ACYL CARRIER PROTEIN TRANSACYLASE (EC
: HVSMEg0013L23f ACY1_PIG 1.20e-23 51.46% 171 Sus scrofa AMINOACYLASE-1 (EC 3.5.1.14) (N-ACYL-L-AMINO-
: HVSMEk0024P21f FABD_BACSU 6.60e-29 39,50% 200 Bacillus subtilis MALONYL COA-ACYL CARRIER PROTEIN TRANSACYLASE (EC
:
:
:
------------------------------
Date: Wed, 28 Mar 2001 10:54:27 GMT
From: Ave.Wrigley@itn.co.uk (Ave Wrigley)
Subject: Re: Regex question
Message-Id: <3ac1c16d.676016370@news.lhr.globix.net>
On Sun, 25 Mar 2001 19:27:28 GMT, Benjamin Goldberg
<goldbb2@earthlink.net> wrote:
>Ave Wrigley wrote:
>>
>> Anyone know a neat regex for the following: capture from a string a
>> substring which contains any given 2 (3,4, ...) words, in any order.
>> Something along the lines of:
>>
>> $string =~ /(?=.*foo)(?=.*bar)(.*?)(?<=foo.*)(?<=bar.*)/;
>>
>> except that variable-length lookbehind is not implemented.
>>
>> Ave Wrigley <Ave.Wrigley@itn.co.uk>
>
>return true if( /(.*)\b(foo|bar|baz)\b(.*)/ and "$1$3" =~ // );
>
>In other words, match one word, remove it, match again.
OK, I'm not sure if I a) expressed the problem correctly, and b)
understand your response, so please help me out! What I was after was
to capture from a given string, the smallest substring that contains
all of a given set of words, in any order. For example, suppose
$string = "some text here baz foo blah foo bar more text here";
and
@words = ( "foo", "bar", "baz" );
What I want is something that will extract "baz foo blah foo bar".
Ave Wrigley <Ave.Wrigley@itn.co.uk>
------------------------------
Date: Wed, 28 Mar 2001 11:17:06 GMT
From: Ave.Wrigley@itn.co.uk (Ave Wrigley)
Subject: Re: Regex question
Message-Id: <3ac1c50c.676943603@news.lhr.globix.net>
>>A solution that does not use regexps :
>>
>>$searched = "There are a bar, a foo and a baz here.";
>>@words = qw/foo bar baz/;
>>my $beg = (sort { $a <=> $b } map { index $searched, $_ } @words)[0];
>>my $end = (sort { $b <=> $a } map { (length) + index $searched, $_ } @words)[0];
>>print substr($searched,$beg,$end-$beg),"\n";
>>
>>Note that it's possible to enhance this solution by optimizing the way
>>of calculating the max and the min of a list of integers.
>>
>>Also, handling cases where $searched does not match all words is left as
>>an exercise for the reader.
Is it true that what this does is:
1) find the index of the beginning of the first instance of any of @words in
$searched
2) find the index of the end of the last instance of ditto
3) extract the substring between these
OK, this is a cool solution; does it return the longest substring that
contains all (or as many as possible) of @words? Is there a solution
that returns the smallest?
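
For reference, the quoted solution can be written with min and max from List::Util, with the missing-word case handled. This is a sketch under a few assumptions: the function name is made up, index() finds plain substrings rather than whole words, and only the first occurrence of each word is considered, as in the quoted code:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Span from the earliest first occurrence of any word to the end of the
# latest first occurrence. Returns nothing if any word is absent.
sub span_of_first_occurrences {
    my ($searched, @words) = @_;
    my @starts = map { index $searched, $_ } @words;
    return if grep { $_ < 0 } @starts;        # index() gives -1 for a missing word
    my $beg = min @starts;
    my $end = max map { $starts[$_] + length $words[$_] } 0 .. $#words;
    return substr $searched, $beg, $end - $beg;
}
```

Because only first occurrences are used, the result contains all the words but is not guaranteed to be the shortest such substring when a word occurs more than once.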
Ave Wrigley <Ave.Wrigley@itn.co.uk>
------------------------------
Date: Wed, 28 Mar 2001 11:33:39 GMT
From: Ave.Wrigley@itn.co.uk (Ave Wrigley)
Subject: Re: Regex question
Message-Id: <3ac1c848.677771273@news.lhr.globix.net>
On Mon, 26 Mar 2001 20:12:18 GMT, ced@bcstec.ca.boeing.com (Charles
DeRykus) wrote:
>Here's one using a regex... uglier though:
>
>my $regex;
>$regex .= "(?=^(.*?$_))" for @words;
>my @sort = sort { length $a <=> length $b } $searched =~ /$regex/s;
>my ($trim) = $sort[0] =~/(.*?)(?:@{[join "|", @words]})/;
>my $match = substr $sort[$#sort], length $trim;
So ...
1) $regex is a series of lookahead assertions, one for each of @words, each
capturing all the text from the beginning of the string through the first
instance of that word
2) @sort contains these captured substrings, sorted by length
3) $trim takes the shortest of these, and does a non-greedy match up
until the first of any of @words (could this have been achieved by not
capturing $_ in the $regex regex?)
4) $match is the substring of the longest of @sort, starting where $trim
ends and running to the end of that string
So $match is (kind of!) the substring from the first first instance
of any of @words to the last first instance of any of @words (if you
get what I mean!). Is this correct?
Ave Wrigley <Ave.Wrigley@itn.co.uk>
------------------------------
Date: Wed, 28 Mar 2001 16:44:21 GMT
From: Uri Guttman <uri@sysarch.com>
Subject: Re: Regex question
Message-Id: <x73dbxzvve.fsf@home.sysarch.com>
>>>>> "AW" == Ave Wrigley <Ave.Wrigley@itn.co.uk> writes:
AW> On Sun, 25 Mar 2001 19:27:28 GMT, Benjamin Goldberg
AW> <goldbb2@earthlink.net> wrote:
>> Ave Wrigley wrote:
>>>
>>> Anyone know a neat regex for the following: capture from a string a
>>> substring which contains any given 2 (3,4, ...) words, in any order.
>>> Something along the lines of:
>>>
>>> $string =~ /(?=.*foo)(?=.*bar)(.*?)(?<=foo.*)(?<=bar.*)/;
>>>
>>> except that variable-length lookbehind is not implemented.
>>>
>>> Ave Wrigley <Ave.Wrigley@itn.co.uk>
>>
>> return true if( /(.*)\b(foo|bar|baz)\b(.*)/ and "$1$3" =~ // );
>>
>> In other words, match one word, remove it, match again.
AW> OK, I'm not sure if I a) expressed the problem correctly, and b)
AW> understand your response, so please help me out! What I was after was
AW> to capture from a given string, the smallest substring that contains
AW> all of a given set of words, in any order. For example, suppose
AW> $string = "some text here baz foo blah foo bar more text here";
AW> and
AW> @words = ( "foo", "bar", "baz" );
AW> What I want is something that will extract "baz foo blah foo bar".
this seems to work fine. you need all the nesting to make sure they are
real words and can have spaces between them and to grab the whole sub
string.
#!/usr/local/bin/perl -ln
BEGIN{ @words = qw( foo bar baz ) ; $" = "|" ; }
print $1 if /((?:\s*(?:\b(?:@words)\b)\s*)+)/ ;
you can make the use of $" go away by using a join '|' on the list and
stuffing that into the regex.
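
A sketch of the join variant, so that $" is left alone. The name word_run_regex is made up, and quotemeta is added on the assumption that the words should be matched literally:

```perl
use strict;
use warnings;

# Build a regex that captures a run of the given words separated only by
# whitespace, in the same shape as the one-liner above.
sub word_run_regex {
    my @words = @_;
    my $alternation = join '|', map { quotemeta } @words;
    return qr/((?:\s*\b(?:$alternation)\b\s*)+)/;
}
```

Note that this only captures a run made up of the listed words and whitespace, so any other word in between ends the match.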
uri
--
Uri Guttman --------- uri@sysarch.com ---------- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page ----------- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net ---------- http://www.northernlight.com
------------------------------
Date: Wed, 28 Mar 2001 18:54:53 +0100
From: "Ave Wrigley" <Ave.Wrigley@itn.co.uk>
Subject: Re: Regex question
Message-Id: <Fopw6.3$XC5.1800@news.lhr.globix.net>
> -----Original Message-----
> From: Uri Guttman [mailto:uri@sysarch.com]
> Sent: 28 March 2001 17:44
> To: Ave.Wrigley@itn.co.uk
> Subject: Re: Regex question
>
>
> The following message is a courtesy copy of an article
> that has been posted to comp.lang.perl.misc as well.
>
> >>>>> "AW" == Ave Wrigley <Ave.Wrigley@itn.co.uk> writes:
>
> AW> On Sun, 25 Mar 2001 19:27:28 GMT, Benjamin Goldberg
> AW> <goldbb2@earthlink.net> wrote:
>
> >> Ave Wrigley wrote:
> >>>
> >>> Anyone know a neat regex for the following: capture from
> a string a
> >>> substring which contains any given 2 (3,4, ...) words,
> in any order.
> >>> Something along the lines of:
> >>>
> >>> $string =~ /(?=.*foo)(?=.*bar)(.*?)(?<=foo.*)(?<=bar.*)/;
> >>>
> >>> except that variable-length lookbehind is not implemented.
> >>>
> >>> Ave Wrigley <Ave.Wrigley@itn.co.uk>
> >>
> >> return true if( /(.*)\b(foo|bar|baz)\b(.*)/ and "$1$3" =~ // );
> >>
> >> In other words, match one word, remove it, match again.
>
> AW> OK, I'm not sure if I a) expressed the problem correctly, and b)
> AW> understand your response, so please help me out! What I
> was after was
> AW> to capture from a given string, the smallest substring
> that contains
> AW> all of a given set of words, in any order. For example, suppose
>
> AW> $string = "some text here baz foo blah foo bar more text here";
>
> AW> and
>
> AW> @words = ( "foo", "bar", "baz" );
>
> AW> What I want is something that will extract "baz foo
> blah foo bar".
>
> this seems to work fine. you need all the nesting to make
> sure they are
> real words and can have spaces between them and to grab the whole sub
> string.
>
> #!/usr/local/bin/perl -ln
>
> BEGIN{ @words = qw( foo bar baz ) ; $" = "|" ; }
>
> print $1 if /((?:\s*(?:\b(?:@words)\b)\s*)+)/ ;
>
>
> you can make the use of $" go away by using a join '|' on the list and
> stuffing that into the regex.
>
> uri
Surely this only matches if the substring only contains @words? I.e. for:
"some text here baz foo blah foo bar more text here"
it will match:
" baz foo "
Ave.
--
Ave Wrigley <Ave.Wrigley@itn.co.uk>
------------------------------
Date: 28 Mar 2001 13:29:11 -0500
From: Joe Schaefer <joe+usenet@sunstarsys.com>
Subject: Re: Regex question
Message-Id: <m3n1a59288.fsf@mumonkan.sunstarsys.com>
"Ave Wrigley" <Ave.Wrigley@itn.co.uk> writes:
> > >>>>> "AW" == Ave Wrigley <Ave.Wrigley@itn.co.uk> writes:
> > AW> to capture from a given string, the smallest substring
> > that contains
> > AW> all of a given set of words, in any order. For example, suppose
That sounds more like an optimization problem than a
regexp problem. Here's a vague outline of one way to solve it:
1) for each word, make an array of indices that mark where each
word begins in the string. You can use regexps or index() for this.
2) make a function that you want to optimize: f(@) = max(@) - min(@)
where @ is a list of indices composed of one element from each
array in (1).
3) identify a list that optimizes (2), and use the corresponding
indices to pull out the substring.
4) do something clever in (3) if your algorithm for it is too slow.
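
One possible way to fill in the outline above, as a sketch rather than a definitive answer. Instead of trying every combination of indices, it sorts all the hits by position and slides a window over them. The name smallest_span is made up, and the words are assumed to consist of \w characters only, so that whole-word matches never overlap:

```perl
use strict;
use warnings;

# Return the shortest substring of $string that contains every word in
# @words as a whole word, or nothing if some word never occurs.
sub smallest_span {
    my ($string, @words) = @_;
    my %wanted = map { $_ => 1 } @words;
    my $distinct = keys %wanted;

    # Step 1: record [start, end, word] for every whole-word occurrence.
    my @hits;
    for my $word (keys %wanted) {
        while ($string =~ /\b\Q$word\E\b/g) {
            push @hits, [ $-[0], $+[0], $word ];
        }
    }
    @hits = sort { $a->[0] <=> $b->[0] } @hits;

    # Steps 2 and 3: grow the window to the right until it holds every word,
    # then shrink it from the left for as long as it still does.
    my (%count, $best);
    my ($have, $left) = (0, 0);
    for my $right (0 .. $#hits) {
        $have++ if ++$count{ $hits[$right][2] } == 1;
        while ($have == $distinct) {
            my ($start, $end) = ($hits[$left][0], $hits[$right][1]);
            $best = [ $start, $end ]
                if !$best || $end - $start < $best->[1] - $best->[0];
            $have-- if --$count{ $hits[$left][2] } == 0;
            $left++;
        }
    }
    return unless $best;
    return substr $string, $best->[0], $best->[1] - $best->[0];
}
```

Each hit enters and leaves the window at most once, so after the sort the scan is linear in the number of hits, which may already be clever enough for (4).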
HTH
--
Joe Schaefer "There is something fascinating about science. One gets such
wholesale returns of conjecture out of such a trifling
investment of fact."
--Mark Twain
------------------------------
Date: Wed, 28 Mar 2001 03:57:02 GMT
From: Rick Delaney <rick.delaney@home.com>
Subject: Re: regex-qr// for search and replace
Message-Id: <3AC1644F.357045C9@home.com>
Bart Lateur wrote:
>
> sub regexsub {
> my $re = shift;
> return sub {
> shift =~ /$re/o;
> }
> }
> $sub[0] = regexsub('fo+');
> $sub[1] = regexsub('ba+r');
>
> foreach (qw'foo baaar fooooo bar bbbbbbbbar') {
> foreach my $i(0, 1) {
> $sub[$i]->($_) and print "Match for '$_' with regex $i\n";
> }
> }
> -->
> Match for 'foo' with regex 0
> Match for 'baaar' with regex 1
> Match for 'fooooo' with regex 0
> Match for 'bar' with regex 1
> Match for 'bbbbbbbbar' with regex 1
>
> In pre-5.6.0, //o and closures don't work too well together.
They still don't. I get
Match for 'foo' with regex 0
Match for 'foo' with regex 1
Match for 'fooooo' with regex 0
Match for 'fooooo' with regex 1
$ perl -v
This is perl, v5.6.0 built for i686-linux
I also get this for a couple of 5.6.1-TRIAL versions and a 5.7 track
perl. I have seen it work properly with a particular ActiveState
version but I don't have it anymore and can't remember which.
Can you post what version of perl gives you the results you got? I'd
like to track this down and get it fixed in the main perl distribution
once and for all.
I feel like we've had this discussion before. If so, sorry I didn't do
anything about it last time.
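
In the meantime, one way to sidestep //o altogether is to compile the pattern with qr// when the closure is created. A sketch, assuming a perl that has qr// (5.005 or later); it reuses the regexsub name from the quoted code:

```perl
use strict;
use warnings;

# Each closure carries its own compiled regex, so no //o is needed.
sub regexsub {
    my ($pattern) = @_;
    my $re = qr/$pattern/;            # compiled once, when the closure is made
    return sub { $_[0] =~ $re };
}
```

With this version each closure should match only its own pattern, since $re is a separate compiled object per call to regexsub.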
--
Rick Delaney
rick.delaney@home.com
------------------------------
Date: Wed, 28 Mar 2001 19:03:01 +0100
From: Mark Grimshaw <m.grimshaw@salford.ac.uk>
Subject: regexp with multiple \n?
Message-Id: <3AC22755.97092577@salford.ac.uk>
Hi,
I'm capturing input from a FORMs-based bulletin board. If $message is
my input from the form, I use the following to remove all occurrences of
\n and replace them with <BR> for return to the browser:
$message =~ s/\n/<BR>/g;
This works fine.
However, to stop users hitting their return key multiple times and
thereby wasting a lot of space on the webpage, I tried the following
hoping that it would replace multiple occurrences of \n with <P> and
then single occurrences of \n with <BR>:
$message =~ s/(\n){2,}/<P>/g;
$message =~ s/\n/<BR>/g;
However, the first code produces no result. Any solutions?
------------------------------
Date: 28 Mar 2001 18:51:44 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: regexp with multiple \n?
Message-Id: <985805172.15743@itz.pp.sci.fi>
In article <3AC22755.97092577@salford.ac.uk>, Mark Grimshaw wrote:
>
>However, to stop users hitting their return key multiple times and
>thereby wasting a lot of space on the webpage, I tried the following
>hoping that it would replace multiple occurrences of \n with <P> and
>then single occurrences of \n with <BR>:
>
>$message =~ s/(\n){2,}/<P>/g;
>$message =~ s/\n/<BR>/g;
>
>However, the first code produces no result. Any solutions?
You've got something invisible -- probably carriage returns -- between
the line feeds. Get rid of them first:
$message =~ s/[^\S\n]+\n/\n/g;
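
Put together with the two substitutions from the question, that could look like the sketch below. The function name newlines_to_html is made up, and it assumes the stray characters are carriage returns or other whitespace sitting right before each line feed:

```perl
use strict;
use warnings;

# Turn runs of blank lines into <P> and single line breaks into <BR>.
sub newlines_to_html {
    my ($message) = @_;
    $message =~ s/[^\S\n]+\n/\n/g;    # strip CRs and trailing blanks before each \n
    $message =~ s/\n{2,}/<P>/g;       # two or more newlines: one paragraph break
    $message =~ s/\n/<BR>/g;          # any remaining single newline
    return $message;
}
```

The order matters: the cleanup has to run first, otherwise the \n characters are never adjacent and the {2,} pattern has nothing to match.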
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.
------------------------------
Date: Wed, 28 Mar 2001 20:58:11 +0200
From: HP <abc@def.com>
Subject: Re: regexp with multiple \n?
Message-Id: <xrqw6.1339$e04.4229@nntpserver.swip.net>
Mark Grimshaw wrote:
> Hi,
>
> I'm capturing input from a FORMs-based bulletin board. If $message is
> my input from the form, I use the following to remove all occurrences of
> \n and replace them with <BR> for return to the browser:
>
> $message =~ s/\n/<BR>/g;
>
> This works fine.
>
> However, to stop users hitting their return key multiple times and
> thereby wasting a lot of space on the webpage, I tried the following
> hoping that it would replace multiple occurrences of \n with <P> and
> then single occurrences of \n with <BR>:
>
> $message =~ s/(\n){2,}/<P>/g;
> $message =~ s/\n/<BR>/g;
>
> However, the first code produces no result. Any solutions?
Wouldn't multiple \n's be read as multiple lines?
/HP
------------------------------
Date: 16 Sep 99 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 16 Sep 99)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
| NOTE: The mail to news gateway, and thus the ability to submit articles
| through this service to the newsgroup, has been removed. I do not have
| time to individually vet each article to make sure that someone isn't
| abusing the service, and I no longer have any desire to waste my time
| dealing with the campus admins when some fool complains to them about an
| article that has come through the gateway instead of complaining
| to the source.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending Perl questions to the -request address; I don't have time to
answer them, even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 581
**************************************