[30424] in Perl-Users-Digest
Perl-Users Digest, Issue: 1667 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 23 09:09:46 2008
Date: Mon, 23 Jun 2008 06:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 23 Jun 2008 Volume: 11 Number: 1667
Today's topics:
Re: [OT] How to make list of all htm file... <rvtol+news@isolution.nl>
Re: [OT] How to make list of all htm file... <szrRE@szromanMO.comVE>
Re: Difference of * and + in regular expression <ben@morrow.me.uk>
Re: Difference of * and + in regular expression <tadmc@seesig.invalid>
Re: Difference of * and + in regular expression <ced@blv-sam-01.ca.boeing.com>
Re: FAQ 8.26 Why doesn't open() return an error when a <szrRE@szromanMO.comVE>
Re: Filtering a string <zen13097@zen.co.uk>
Re: Filtering a string <daveb@addr.invalid>
Re: How to make list of all htm file... <szrRE@szromanMO.comVE>
Re: How to make list of all htm file... <szrRE@szromanMO.comVE>
How to match string end for a multiline string? <PengYu.UT@gmail.com>
Re: How to match string end for a multiline string? <damian@tvk.rwth-aachen.de>
new CPAN modules on Mon Jun 23 2008 (Randal Schwartz)
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 22 Jun 2008 14:53:05 +0200
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: [OT] How to make list of all htm file...
Message-Id: <g3lp6s.19s.1@news.isolution.nl>
szr schreef:
> $ find . | grep -P 'html?$'
That is quite wasteful, even if the current directory doesn't contain
millions of subdirectories and files.
And it would erroneously return ./test_html and such.
$ find . -type f \( -name "*.htm" -or -name "*.html" \)
$ find . -type f -regex ".*\.html?"
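
Since the original question in this thread was about doing this from a Perl script, here is a minimal sketch using the core File::Find module. The sub name list_html_files is made up for illustration, and the starting directory defaults to "." purely as an assumption.

```perl
use strict;
use warnings;
use File::Find;

# Collect regular files whose names end in .htm or .html under $top.
# (list_html_files is a hypothetical helper name, not from the thread.)
sub list_html_files {
    my ($top) = @_;
    my @found;
    find(
        sub {
            # $_ is the base name; $File::Find::name is the full path.
            push @found, $File::Find::name if -f $_ && /\.html?\z/;
        },
        $top,
    );
    my @sorted = sort @found;
    return @sorted;
}

print "$_\n" for list_html_files(@ARGV ? $ARGV[0] : '.');
```

Anchoring the pattern with \. and \z keeps names such as test_html out of the list, which is the same point made about the grep version above.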
--
Affijn, Ruud
"Gewoon is een tijger."
------------------------------
Date: Sun, 22 Jun 2008 14:59:41 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: [OT] How to make list of all htm file...
Message-Id: <g3mi0d01tpc@news4.newsguy.com>
Dr.Ruud wrote:
> szr schreef:
>
>> $ find . | grep -P 'html?$'
>
> That is quite wasteful, even if the current directory doesn't contain
> millions of subdirectories and files.
Aside from forgetting the *. which should have been at the beginning of my
patterns, is it really more wasteful? Doesn't find also have to check
each file it comes across? Or is it just the overhead of piping the
final output from find over to grep? Other than that, I don't see why it
would be more wasteful. On both my dual-core Linux system and
an old P2 400 also running Linux, I see no difference in speed, even on
a large sprawling directory. find does its thing, grep prunes its
results.
--
szr
------------------------------
Date: Sun, 22 Jun 2008 04:04:57 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Difference of * and + in regular expression
Message-Id: <p3s0j5-tqg1.ln1@osiris.mauzo.dyndns.org>
Quoth Peng Yu <PengYu.UT@gmail.com>:
>
> If I used the uncommented if-statement, I would get no match. If I
> used the commented if-statement instead, I would have the following
> string as the output. I'm wondering why the regular expression with *
> does not match anything?
>
> namespace a { namespace b { namespace c {
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
'Match earlier in the string' beats 'match longest', even with greedy
matching, and since your regex will match the empty string the first
match is right before the first 'a'.
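
A small sketch can make this visible by printing where each pattern matched. It reuses the string and the two patterns from the question; the helper name where_matched is invented for the example.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# Report where a pattern matched and how much text it consumed.
# (where_matched is a hypothetical helper, not from the original post.)
sub where_matched {
    my ($re) = @_;
    return "no match" unless $string =~ $re;
    return sprintf "offset %d, length %d", $-[0], $+[0] - $-[0];
}

my $star = qr/\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/;
my $plus = qr/\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/;

print "star: ", where_matched($star), "\n";   # empty match at offset 0
print "plus: ", where_matched($plus), "\n";   # must move on to reach "namespace"
```

The * version succeeds immediately at offset 0 with nothing captured, so the engine never looks further; the + version cannot succeed there and only matches once it reaches the first "namespace".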
Ben
--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [ben@morrow.me.uk]
------------------------------
Date: Sun, 22 Jun 2008 10:00:28 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Difference of * and + in regular expression
Message-Id: <slrng5sq8c.1ej.tadmc@tadmc30.sbcglobal.net>
Peng Yu <PengYu.UT@gmail.com> wrote:
> On Jun 21, 9:39 pm, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
>> Peng Yu wrote:
>> > If I used the uncommented if-statement, I would get no match.
>>
>> Not true. $1 is defined, so the regex does match.
>>
>> > $string="a namespace a { namespace b { namespace c { ";
>>
>> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
>> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
>> > print "$1\$\n";
>> > }
>>
>> With the * quantifier, the regex seems to behave non-greedy, though.
>
> According to the manual, *? is non-greedy.
> Why is * also non-greedy?
Greediness is not involved here.
(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.
)
There are 2 primary issues with this OP's problem: writing a pattern
where everything is optional, and that regexes match as early as possible
from left to right.
If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it would match *every* string
you can think of.
The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:
    The above recipes describe the ordering of matches I<at a given position>.
    One more rule is needed to understand how a match is determined for the
    whole regular expression: a match at an earlier position is always better
    than a match at a later position.
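
To illustrate the "tie breaker" point, here is a minimal sketch with an invented test string: the greedy and non-greedy forms both succeed and both start at the same place, and differ only in how far the match extends.

```perl
use strict;
use warnings;

# Made-up sample text for illustration.
my $text = "xx<a><b>";

# Both patterns succeed and both start at the first "<";
# greediness only decides how far the match runs from there.
my ($greedy) = $text =~ /(<.*>)/;
my ($lazy)   = $text =~ /(<.*?>)/;

print "greedy: $greedy\n";   # <a><b>
print "lazy:   $lazy\n";     # <a>
```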
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Sun, 22 Jun 2008 20:41:02 -0700 (PDT)
From: "comp.llang.perl.moderated" <ced@blv-sam-01.ca.boeing.com>
Subject: Re: Difference of * and + in regular expression
Message-Id: <3441f862-0c21-41d1-81df-acddb59a06b6@u6g2000prc.googlegroups.com>
On Jun 22, 8:00 am, Tad J McClellan <ta...@seesig.invalid> wrote:
> Peng Yu <PengYu...@gmail.com> wrote:
> > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> >> Peng Yu wrote:
> >> > If I used the uncommented if-statement, I would get no match.
>
> >> Not true. $1 is defined, so the regex does match.
>
> >> > $string="a namespace a { namespace b { namespace c { ";
>
> >> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> >> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> >> > print "$1\$\n";
> >> > }
>
> >> With the * quantifier, the regex seems to behave non-greedy, though.
>
> > According to the manual, *? is non-greedy.
> > Why is * also non-greedy?
>
> Greediness is not involved here.
>
> (Greedy vs. non-greedy never changes whether a match will succeed or fail.
> It is simply a "tie breaker" used when the regex engine can match more
> than one way at the current pos()ition.
> )
>
> There are 2 primary issues with this OP's problem: writing a pattern
> where everything is optional, and that regexes match as early as possible
> from left to right.
>
> If you write a pattern where everything is optional, then it will match
> the empty string, which in turn means that it would match *every* string
> you can think of.
>
> The left-to-right evaluation of the pattern seems to be buried
> a bit in perlre.pod:
>
> The above recipes describe the ordering of matches I<at a given position>.
> One more rule is needed to understand how a match is determined for the
> whole regular expression: a match at an earlier position is always better
> than a match at a later position.
>
I still prefer to think of this as another aspect of greediness: * can
be greedy, but only as greedy as needed to get the earliest match. Thus,
even greed embraces the cardinal Perl virtue of laziness....
--
Charles DeRykus
------------------------------
Date: Sun, 22 Jun 2008 15:11:20 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: FAQ 8.26 Why doesn't open() return an error when a pipe open fails?
Message-Id: <g3mim801uek@news4.newsguy.com>
PerlFAQ Server wrote:
[...]
> 8.26: Why doesn't open() return an error when a pipe open fails?
>
> If the second argument to a piped open() contains shell
> metacharacters, perl fork()s, then exec()s a shell to decode the
> metacharacters and eventually run the desired program. If the
> program couldn't be run, it's the shell that gets the message, not
> Perl. All your Perl program can find out is whether the shell
> itself could be successfully started.
When it invokes a shell, why doesn't it just return the value of $? (or
an equivalent exit status variable) so that the open returns the correct
exit status?
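
One reason open() cannot simply hand back $? is timing: open() returns as soon as the shell has been started, before the command has had a chance to fail. The status only becomes available once the child has exited, which close() waits for. A minimal sketch, assuming a Unix-like system and a deliberately made-up command name that does not exist:

```perl
use strict;
use warnings;

# Hypothetical command name, assumed not to exist on this system.
# The redirection is a shell metacharacter, so perl runs it via a shell.
my $cmd = 'no_such_command_for_this_demo 2>/dev/null';

my $pid = open(my $pipe, '-|', $cmd) or die "could not fork: $!";
print "open succeeded, shell pid $pid\n";

my @output = <$pipe>;        # nothing to read; the command never ran
my $closed = close($pipe);   # close waits for the child and sets $?
my $status = $? >> 8;        # exit status reported by the shell

print "close ", ($closed ? "ok" : "failed"), ", exit status $status\n";
```

On most shells the status printed here is 127 ("command not found"), so checking the return value of close() is the usual way to notice that the program could not be run.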
--
szr
------------------------------
Date: 23 Jun 2008 07:50:26 GMT
From: Dave Weaver <zen13097@zen.co.uk>
Subject: Re: Filtering a string
Message-Id: <485f55c2$0$10645$fa0fcedb@news.zen.co.uk>
On Fri, 20 Jun 2008 19:44:37 +0200, Dave B <daveb@addr.invalid> wrote:
> Bill H wrote:
>
> > Can someone point me to some docs on how I would do this without
> > iterating over the whole string (pattern matching?):
> >
> > $original = "a malformed%string/containi\"ng characters I don'~t
> > want! ...";
> >
> > $filter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-_";
> >
> > $new = &fix($original);
> >
> > $new would now equal:
> >
> > amalformedstringcontainingcharactersidontwant
>
> Is the following acceptable for you?
>
> $new = lc(join("",grep(m/[$filter]/i,split(//,$original))));
>
That won't do what the OP wants - try it with "=", "[" or "<" in
$original, for example.
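
The cause can be shown in a few lines: interpolated as-is into a character class, the "0-_" in $filter becomes a range. This sketch compares that with a \Q...\E-quoted version; the variable names $loose and $exact are invented for the example.

```perl
use strict;
use warnings;

my $filter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-_";

# Interpolated as-is, "0-_" inside [...] is a range from "0" to "_",
# which covers ":" through "^" as well as the capital letters.
my $loose = qr/[$filter]/i;

# \Q...\E backslash-escapes the "-", so each character stands for itself.
my $exact = qr/[\Q$filter\E]/i;

for my $char ('=', '[', '<', '-', 'q') {
    printf "%s  loose=%s  exact=%s\n", $char,
        ($char =~ $loose ? "kept" : "dropped"),
        ($char =~ $exact ? "kept" : "dropped");
}
```

Note that the unquoted class not only lets "=", "[" and "<" through, it also drops the literal "-" that the filter was meant to allow.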
------------------------------
Date: Mon, 23 Jun 2008 09:58:31 +0200
From: Dave B <daveb@addr.invalid>
Subject: Re: Filtering a string
Message-Id: <g3nl97$snk$1@registered.motzarella.org>
Dave Weaver wrote:
> On Fri, 20 Jun 2008 19:44:37 +0200, Dave B <daveb@addr.invalid> wrote:
>> Bill H wrote:
>>
>>> Can someone point me to some docs on how I would do this without
>>> iterating over the whole string (pattern matching?):
>>>
>>> $original = "a malformed%string/containi\"ng characters I don'~t
>>> want! ...";
>>>
>>> $filter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-_";
>>>
>>> $new = &fix($original);
>>>
>>> $new would now equal:
>>>
>>> amalformedstringcontainingcharactersidontwant
>> Is the following acceptable for you?
>>
>> $new = lc(join("",grep(m/[$filter]/i,split(//,$original))));
>>
>
> That won't do what the OP wants - try it with "=", "[" or "<" in
> $original, for example.
Yes, the "-" in the filter should be at the very beginning or end, I
overlooked that. However, after seeing the solutions based on tr///, I
realized that this is not by any means a good solution. I'm still learning...
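
For reference, a tr///-based version along the lines mentioned above might look like the following sketch. It hard-codes the allowed set rather than reading it from $filter, which is an assumption made to keep the example short.

```perl
use strict;
use warnings;

my $original = "a malformed%string/containi\"ng characters I don'~t want! ...";

# Lowercase first, then delete (d) everything in the complement (c) of the
# allowed set. A trailing "-" in a tr list is a literal hyphen, not a range.
(my $new = lc $original) =~ tr/a-z0-9_-//cd;

print "$new\n";   # amalformedstringcontainingcharactersidontwant
```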
--
D.
------------------------------
Date: Sun, 22 Jun 2008 14:55:15 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: How to make list of all htm file...
Message-Id: <g3mho301tjt@news4.newsguy.com>
Dr.Ruud wrote:
> szr schreef:
>
>> $ find . | grep -P 'html?$'
>
> That is quite wasteful, even if the current directory doesn't contain
> millions of subdirectories and files.
>
> And it would erroneously return ./test_html and such.
>
> $ find . -type f \( -name "*.htm" -or -name "*.html" \)
>
> $ find . -type f -regex ".*\.html?"
Ah, yes, I forgot the *. in my examples. And I forgot you could use
regex with find.
--
szr
------------------------------
Date: Sun, 22 Jun 2008 15:02:09 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: How to make list of all htm file...
Message-Id: <g3mi5301tvf@news4.newsguy.com>
Andrew DeFaria wrote:
> David Filmer wrote:
>> Pero wrote:
>>
>>> I want to write search script in perl.
>>> How to make list of all htm file on Linux - Apache web server?
>>
>> Perl is a big hammer for such a small nail.
>>
>> How about just typing this at your commandline:
>>
>> find . -name "*.htm"
>>
>> (that recurses down from your current directory. cd to \ if you want
>> to find ALL such files anywhere they may exist. But you probably
>> want to start at your Apache DocumentRoot).
>
> "find" doesn't do this on Windows. On Unix there is no "\" to cd to.
> So which OS are you speaking of?
Actually, on Windows you can cd to "\", which takes you to the root of the
current drive. If you want a true Unix-style root, have a look at
Cygwin.
--
szr
------------------------------
Date: Sun, 22 Jun 2008 09:28:59 -0700 (PDT)
From: Peng Yu <PengYu.UT@gmail.com>
Subject: How to match string end for a multiline string?
Message-Id: <69e38e64-fc3e-4391-be47-9c0813fb120e@2g2000hsn.googlegroups.com>
Hi,
$ matches the end of a line. When a string has multiple lines, how do I
match the end of the last line?
Thanks,
Peng
------------------------------
Date: Sun, 22 Jun 2008 18:33:27 +0200
From: Damian Lukowski <damian@tvk.rwth-aachen.de>
Subject: Re: How to match string end for a multiline string?
Message-Id: <6c7d6oF3ear8iU1@mid.dfncis.de>
> The "\A" and "\Z" are just like "^" and "$",
> except that they won't match multiple times when the "/m" modifier is
> used, while "^" and "$" will match at every internal line boundary. To
> match the actual end of the string and not ignore an optional trailing
> newline, use "\z".
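
A short sketch of the difference, using an invented three-line string that ends in a newline; the labels and the %result hash are only there for the demonstration.

```perl
use strict;
use warnings;

# Made-up sample text: three lines, with a trailing newline.
my $text = "first\nsecond\nlast\n";

my @checks = (
    [ 'second$ without /m' => qr/second$/   ],  # $ only near the end of the string
    [ 'second$ with /m'    => qr/second$/m  ],  # $ at every line boundary
    [ 'second\Z with /m'   => qr/second\Z/m ],  # \Z is unaffected by /m
    [ 'last\Z'             => qr/last\Z/    ],  # \Z ignores one trailing newline
    [ 'last\z'             => qr/last\z/    ],  # \z is the absolute end
    [ 'last\n\z'           => qr/last\n\z/  ],
);

my %result;
for my $check (@checks) {
    my ($label, $re) = @$check;
    $result{$label} = $text =~ $re ? "match" : "no match";
    printf "%-20s %s\n", $label, $result{$label};
}
```

So for the end of the last line, \Z does the job when a trailing newline should be ignored, and \z when it should not.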
------------------------------
Date: Mon, 23 Jun 2008 04:42:20 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Mon Jun 23 2008
Message-Id: <K2wFqK.Gr4@zorch.sf-bay.org>
The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN). You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.
Algorithm-Evolutionary-0.5.7
http://search.cpan.org/~jmerelo/Algorithm-Evolutionary-0.5.7/
Perl extension for performing paradigm-free evolutionary algorithms.
----
Algorithm-MedianSelect-XS-0.21
http://search.cpan.org/~schubiger/Algorithm-MedianSelect-XS-0.21/
Median finding algorithm
----
AnyEvent-4.152
http://search.cpan.org/~mlehmann/AnyEvent-4.152/
provide framework for multiple event loops
----
CPU-Z80-Assembler-1.02
http://search.cpan.org/~dcantrell/CPU-Z80-Assembler-1.02/
a Z80 assembler
----
Class-XSAccessor-Array-0.03
http://search.cpan.org/~smueller/Class-XSAccessor-Array-0.03/
Generate fast XS accessors without runtime compilation
----
Crypt-Rijndael-1.06_03
http://search.cpan.org/~bdfoy/Crypt-Rijndael-1.06_03/
Crypt::CBC compliant Rijndael encryption module
----
Devel-Events-0.06
http://search.cpan.org/~nuffin/Devel-Events-0.06/
Extensible instrumentation framework.
----
Fey-DBIManager-0.06
http://search.cpan.org/~drolsky/Fey-DBIManager-0.06/
Manage a set of DBI handles
----
Fey-ORM-0.07
http://search.cpan.org/~drolsky/Fey-ORM-0.07/
A Fey-based ORM
----
GRID-Machine-0.096
http://search.cpan.org/~casiano/GRID-Machine-0.096/
Remote Procedure Calls over a SSH link
----
IO-Plumbing-0.03
http://search.cpan.org/~samv/IO-Plumbing-0.03/
pluggable, lazy access to system commands
----
Net-Amazon-S3-Tools-0.07
http://search.cpan.org/~mra/Net-Amazon-S3-Tools-0.07/
command line tools for Amazon S3
----
Net-SNMP-Mixin-0.11
http://search.cpan.org/~gaissmai/Net-SNMP-Mixin-0.11/
mixin framework for Net::SNMP
----
Net-SNMP-Mixin-Dot1abLldp-0.10
http://search.cpan.org/~gaissmai/Net-SNMP-Mixin-Dot1abLldp-0.10/
mixin class for the Link Layer Discovery Protocol
----
Parse-Marpa-0.211_009
http://search.cpan.org/~jkegl/Parse-Marpa-0.211_009/
Earley's algorithm with LR(0) precomputation
----
Perl-Critic-1.087
http://search.cpan.org/~elliotjs/Perl-Critic-1.087/
Critique Perl source code for best-practices.
----
PerlIO-Util-0.49_03
http://search.cpan.org/~gfuji/PerlIO-Util-0.49_03/
A selection of general PerlIO utilities
----
Ruby-0.03
http://search.cpan.org/~gfuji/Ruby-0.03/
Perl interface to Ruby interpreter
----
Test-Harness-3.12
http://search.cpan.org/~andya/Test-Harness-3.12/
Run Perl standard test scripts with statistics
----
Test-HexString-0.01
http://search.cpan.org/~pevans/Test-HexString-0.01/
test binary strings with hex dump diagnostics
----
Test-Pod-Content-0.0.5
http://search.cpan.org/~mkutter/Test-Pod-Content-0.0.5/
Test a Pod's content
----
Test-TAP-Model-0.10
http://search.cpan.org/~nuffin/Test-TAP-Model-0.10/
DEPRECATED Use TAP::Harness, TAP::Formatter::HTML
----
UML-Class-Simple-0.11
http://search.cpan.org/~agent/UML-Class-Simple-0.11/
Render simple UML class diagrams, by loading the code
----
UUID-Random-0.01
http://search.cpan.org/~perler/UUID-Random-0.01/
Generate random uuid strings
----
UUID-Random-0.02
http://search.cpan.org/~perler/UUID-Random-0.02/
Generate random uuid strings
----
XUL-App-0.01
http://search.cpan.org/~agent/XUL-App-0.01/
Nifty XUL apps in a XUL::App
----
autodie-1.10_06
http://search.cpan.org/~pjf/autodie-1.10_06/
Replace functions with ones that succeed or die with lexical scope
If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.
This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
http://www.stonehenge.com/merlyn/LinuxMag/col82.html
print "Just another Perl hacker," # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1667
***************************************