[30004] in Perl-Users-Digest
Perl-Users Digest, Issue: 1247 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 1 03:09:38 2008
Date: Fri, 1 Feb 2008 00:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 1 Feb 2008 Volume: 11 Number: 1247
Today's topics:
Re: "negative" regexp <tadmc@seesig.invalid>
Re: "negative" regexp <tadmc@seesig.invalid>
Re: "negative" regexp <uri@stemsystems.com>
Re: "negative" regexp <stoupa@practisoft.cz>
Re: "negative" regexp <stoupa@practisoft.cz>
Re: "negative" regexp <john@castleamber.com>
Re: "negative" regexp <jurgenex@hotmail.com>
Re: how to remove blocks between nested brackets <sopan.shewale@gmail.com>
new CPAN modules on Fri Feb 1 2008 (Randal Schwartz)
Re: Obscure baffling "module not exported" error: can s <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 01 Feb 2008 00:10:33 GMT
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: "negative" regexp
Message-Id: <slrnfq4nud.8f2.tadmc@tadmc30.sbcglobal.net>
Petr Vileta <stoupa@practisoft.cz> wrote:
> Tad J McClellan wrote:
>> Petr Vileta <stoupa@practisoft.cz> wrote:
>>
>>
>>> $html=~s/<\!\-\-.+?\-\->//sig;
>>
>>
>> Unnecessary backslashes make your code much harder to read
>> and understand. You should backslash only when you actually
>> need to.
>>
>> There is not much point in ignoring case when your pattern
>> does not contain any letters...
>>
>>
>> $html =~ s/<!--.+?-->//sg;
> Yes, you are right, but the O'Reilly book "Programming Perl" says "... any other
> escaped character is the character itself".
... and in the regular expression language, many UNescaped characters
also are the character itself!
> Maybe this is not an exact quote; I have the
> Czech version.
You have not given enough information to be able to find it in the
Camel book. What chapter/section? What edition?
> In other words, the character - is sometimes a "range operator"
... and it is sometimes a "subtraction operator".
> for example,
> in [a-z]
That is not the regular expression language (grammar); that is
the character class language.
In the Perl language, the character - is subtraction.
In the regular expression language, the character - is not special,
it matches a - character.
In the character class language, the character - forms a range.
You have to know which language you are in before you can properly
discern what all those funny characters mean.
> and the character ! sometimes means "not".
... in the Perl language.
It is not special in either the regular expression language or in
the character class language.
> So to be sure that a
> character is a character and not an operator
then all you need to know is which language you are currently writing in.
> I'm used to escaping all possibly
> ambiguous characters ;-)
If you code from ignorance, you end up with ignorant code.
Simply learn Perl and its "mini languages", then you will be really sure,
and your code won't look so embarrassingly amateurish (as well as be
much easier to maintain).
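To make the "which language are you in" point concrete, here is a small sketch (the sample strings are made up for illustration) that uses - and ! in each of the three languages Tad describes, then runs the backslash-free comment substitution quoted above:

```perl
use strict;
use warnings;

# Perl language: - is subtraction and ! is logical "not".
my $diff = 10 - 3;                             # 7
my $not  = !0 ? 1 : 0;                         # 1

# Regex language: an unescaped - or ! simply matches itself.
my $literal = ('<!--' =~ /^<!--$/) ? 1 : 0;    # 1

# Character class language: - between two characters forms a range ...
my $range = ('q' =~ /^[a-z]$/) ? 1 : 0;        # 1
# ... but a - placed last in the class is just a literal dash.
my $dash  = ('-' =~ /^[a-z-]$/) ? 1 : 0;       # 1

# So stripping simple comments needs no backslashes at all.
my $html = "a<!-- one -->b<!--\ntwo\n-->c";
(my $clean = $html) =~ s/<!--.+?-->//sg;

print "$diff $not $literal $range $dash $clean\n";   # prints "7 1 1 1 1 abc"
```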
For a bit more on this, see:
http://groups.google.com/group/comp.lang.perl.misc/msg/a218a97e390c892a
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Fri, 01 Feb 2008 00:10:33 GMT
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: "negative" regexp
Message-Id: <slrnfq4o86.8f2.tadmc@tadmc30.sbcglobal.net>
Uri Guttman <uri@stemsystems.com> wrote:
>>>>>> "PV" == Petr Vileta <stoupa@practisoft.cz> writes:
[ snip "parsing" HTML with a regex ]
> PV> HTML::Parser and WWW::Mechanize are good modules but in many cases these
> PV> are "too big a gun" :-)
>
> better a big accurate gun than a tiny pistol with no accuracy. you might
> even shoot your eye out!
Or your foot!
http://groups.google.com/group/comp.lang.perl.misc/msg/a97f4d7d02afa8ff
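One illustration of the "accuracy" problem, using a made-up input: the comment-stripping substitution from earlier in this thread cannot tell that a "<!--" sits inside a quoted attribute value, so it deletes far more than the comment.

```perl
use strict;
use warnings;

# Hypothetical input: "<!--" appears inside a quoted attribute value.
my $html = '<img alt="<!--"> keep me <!-- drop me -->';

# The regex has no notion of tags or quoting, so it treats the "<!--"
# in the attribute as a comment start and deletes up to the next "-->".
(my $out = $html) =~ s/<!--.+?-->//sg;

print "$out\n";    # prints: <img alt="
```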
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Fri, 01 Feb 2008 00:34:48 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: "negative" regexp
Message-Id: <x71w7xy0nq.fsf@mail.sysarch.com>
>>>>> "TJM" == Tad J McClellan <tadmc@seesig.invalid> writes:
TJM> Uri Guttman <uri@stemsystems.com> wrote:
>>>>>>> "PV" == Petr Vileta <stoupa@practisoft.cz> writes:
TJM> [ snip "parsing" HTML with a regex ]
PV> HTML::Parser and WWW::Mechanize are good modules but in many cases these
PV> are "too big a gun" :-)
>>
>> better a big accurate gun than a tiny pistol with no accuracy. you might
>> even shoot your eye out!
TJM> Or your foot!
TJM> http://groups.google.com/group/comp.lang.perl.misc/msg/a97f4d7d02afa8ff
ha!
i was referring to jean shepherd's a christmas story. :)
use Red::Ryder::BB ;
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Architecture, Development, Training, Support, Code Review ------
----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Fri, 1 Feb 2008 03:41:57 +0100
From: "Petr Vileta" <stoupa@practisoft.cz>
Subject: Re: "negative" regexp
Message-Id: <fnu12l$18be$2@ns.felk.cvut.cz>
Tad J McClellan wrote:
>> Yes, you are right, but the O'Reilly book "Programming Perl" says "... any
>> other escaped character is the character itself".
>
>
> ... and in the regular expression language, many UNescaped characters
> also are the character itself!
>
Of course, sir ;-) But if I'm not sure whether a character could be an operator,
I escape it; if I'm not sure about precedence in a calculation, I add
"needless" parentheses ;-)
> You have not given enough information to be able to find it in the
> Camel book. What chapter/section? What edition?
>
I'll try, but I have the Czech edition, so my translation may not be
accurate and my edition may have a different page count.
Larry Wall, Tom Christiansen & Randal L. Schwartz
Programming Perl
Original copyright: 1996 O'Reilly and Associates Inc.
Translation: 1997 Computer Press, Prague, Czech Republic
Chapter "2. Basic program parts", page 69 "comparing by patterns"
<cite>
Code Meaning
-----------------------
\a alarm (bell)
\n new line
....
\S non-whitespace character
....
The character "c" preceded by a backslash and followed by a single character,
for example \cD, is identical to the corresponding
control character.
Any other character preceded by a backslash is identical to the character itself.
</cite>
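The book's rule is easy to check directly. In this sketch (sample string invented), escaping non-alphanumeric characters such as ! and - changes nothing, while \cD and \a produce the control characters the table lists. The rule only covers non-alphanumerics: a backslash before a letter (\d, \S, \n) has a special meaning, which is why escaping "just in case" is not a safe habit in general.

```perl
use strict;
use warnings;

my $text = '<!-- note -->';

# Backslashing a non-alphanumeric character is legal but redundant:
# both patterns match exactly the same strings.
my $escaped   = ($text =~ /^<\!\-\-.+?\-\->$/) ? 1 : 0;
my $unescaped = ($text =~ /^<!--.+?-->$/)       ? 1 : 0;

# The control-character escapes from the table.
my $ctrl_d = ("\cD" eq chr(4)) ? 1 : 0;    # \cD is control-D
my $bell   = ("\a"  eq chr(7)) ? 1 : 0;    # \a is the alarm (BEL) character

print "$escaped $unescaped $ctrl_d $bell\n";    # prints "1 1 1 1"
```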
> Simply learn Perl and its "mini languages", then you will be really
> sure, and your code won't look so embarrassingly amateurish (as well
> as be much easier to maintain).
>
I endeavour to learn Perl and all its parts, nuances and tricks, but I'm from
the "lost post-communist generation". Now I'm 50+ and I only started learning
English a few years ago, which is too late for a man. We had the "iron curtain"
here for 40 years and had no chance to get information from the "free world".
When you are 20 or 30, you can learn 2-3 human languages and many programming
languages because your memory is able to absorb information. But as you get
older and older, your memory fails more and more to absorb new
information.
But enough lamenting: I'm happy that we don't have communism here now :-) and
I can communicate with people anywhere in the world.
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)
Please reply to <petr AT practisoft DOT cz>
------------------------------
Date: Fri, 1 Feb 2008 02:14:16 +0100
From: "Petr Vileta" <stoupa@practisoft.cz>
Subject: Re: "negative" regexp
Message-Id: <fnu12l$18be$1@ns.felk.cvut.cz>
Michele Dondi wrote:
> On Thu, 31 Jan 2008 15:05:35 +0100, "Petr Vileta"
> <stoupa@practisoft.cz> wrote:
>
> I was *just* commenting on you claim that HTML parsing modules "build
> large hashes" which IMHO is not (necessarily) the case. And I'm still
> asking you for some evidence.
>
Michele, I have no time to prepare a concrete example, but please do the
math with me:
when I load an HTML page of, say, 100kB into a string variable, the script
occupies 100kB + 4 (?) bytes for the variable pointer. When I parse it with
HTML::Parser into a hash, I get a hash with 100, 200, 1000 (?) items. Each of
these items must occupy space for its own name (as text) and for pointers to
its parent and child items. Maybe this is not an exact description of how a
hash is laid out in memory, but it may be close ;-) In other words, if you use
my way and dump all the memory occupied by the Perl script into a file, the
file may be, say, about 200kB. If you use the parser and dump it to a file,
the file may be anywhere from 300 up to 500kB, depending on the complexity of
the HTML. I'm an old-fashioned programmer who began with assembler on 8-bit
computers, so I still tend to save memory, disk space and CPU time whenever
possible :-) We have a saying in Czech: "You can't teach an old dog new tricks."
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)
Please reply to <petr AT practisoft DOT cz>
------------------------------
Date: 1 Feb 2008 03:40:44 GMT
From: John Bokma <john@castleamber.com>
Subject: Re: "negative" regexp
Message-Id: <Xns9A36DC87DD489castleamber@130.133.1.4>
"Petr Vileta" <stoupa@practisoft.cz> wrote:
> Michele Dondi wrote:
>> On Thu, 31 Jan 2008 15:05:35 +0100, "Petr Vileta"
>> <stoupa@practisoft.cz> wrote:
>>
>> I was *just* commenting on you claim that HTML parsing modules "build
>> large hashes" which IMHO is not (necessarily) the case. And I'm still
>> asking you for some evidence.
>>
> Michele, I have no time to prepare a concrete example, but please do
> the math with me:
>
> when I load an HTML page of, say, 100kB into a string variable, the script
> occupies 100kB + 4 (?) bytes for the variable pointer. When I parse it with
> HTML::Parser into a hash, I get a hash with 100, 200, 1000 (?) items. Each
> of these items must occupy space for its own name (as text) and for
> pointers to its parent and child items. Maybe this is not an exact
> description of how a hash is laid out in memory, but it may be close ;-)
Not all HTML parsers create an entire tree in memory. I have more
experience with XML parsers, but with XML you have parsers that generate
events for each element encountered (to be more exact, the start of an
element, character data, the end of an element, and possibly some more). If
you don't store it yourself, nothing is stored. Those are great if you
want to get some information from a huge file, for example.
I just did a quick check, and HTML::PullParser does sound to me like it
works along those lines:
"repeatedly call $parser->get_token to obtain the tags and text found in
the parsed document."
And I have the feeling that even HTML::Parser works (or can work) that
way.
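A minimal pull-style sketch, assuming the HTML-Parser distribution (which provides HTML::PullParser) is installed; the HTML string and the choice to collect link targets are invented for illustration. Only the href values end up stored; every other token is looked at once and then discarded.

```perl
use strict;
use warnings;
use HTML::PullParser ();

my $html = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>';

# Each token is an array ref whose elements follow the argspec given
# for that event type ('event' yields the event name, e.g. "start").
my $p = HTML::PullParser->new(
    doc   => \$html,
    start => 'event, tagname, attr',
    text  => 'event, dtext',
) or die "Cannot create parser: $!";

my @hrefs;    # the only data we keep
while (my $token = $p->get_token) {
    my ($event, $tag, $attr) = @$token;
    next unless $event eq 'start' && $tag eq 'a';
    push @hrefs, $attr->{href} if defined $attr->{href};
}

print "@hrefs\n";    # prints "/a /b"
```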
> memory, disk space and CPU time whenever possible :-) We
> have a saying in Czech: "You can't teach an old dog new tricks."
But you're not a dog ;-) The problem is that an old dog has often become a
part of the family and knows it gets its food and walks anyway. There is
no need to learn new tricks; the old ones will work.
--
John
Arachnids near Coyolillo - part 1
http://johnbokma.com/mexit/2006/05/04/arachnids-coyolillo-1.html
------------------------------
Date: Fri, 01 Feb 2008 05:25:00 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: "negative" regexp
Message-Id: <h4b5q310acv5706654fk7m4t0uks2jkrca@4ax.com>
"Petr Vileta" <stoupa@practisoft.cz> wrote:
>Michele Dondi wrote:
>> On Thu, 31 Jan 2008 15:05:35 +0100, "Petr Vileta"
>> <stoupa@practisoft.cz> wrote:
>>
>> I was *just* commenting on you claim that HTML parsing modules "build
>> large hashes" which IMHO is not (necessarily) the case. And I'm still
>> asking you for some evidence.
>>
> Michele, I have no time to prepare a concrete example, but please do the math
>with me:
>
>when I load an HTML page of, say, 100kB into a string variable, the script
>occupies 100kB + 4 (?) bytes for the variable pointer. When I parse it with
>HTML::Parser into a hash, I get a hash with 100, 200, 1000 (?) items. Each of
>these items must occupy space for its own name (as text) and for pointers to
>its parent and child items.
Well, no. Or at least not necessarily. Just check the documentation for e.g.
HTML::Parser. It clearly says:
Objects of the "HTML::Parser" class will recognize markup and separate
it from plain text (alias data content) in HTML documents. As different
kinds of markup and text are recognized, the corresponding event
handlers are invoked.
In other words, unless _YOU_ define a callback that stores those elements,
nothing will be stored. This way you can extract exactly _what_ you want and
store it in the _way_ you want it.
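Jürgen's point, sketched with HTML::Parser's version 3 API (assuming the module is installed; the sample HTML is invented): the handlers decide what is kept, here just link targets and the visible text, and comments are dropped by a handler that does nothing with them. No document tree is built.

```perl
use strict;
use warnings;
use HTML::Parser ();

my $html = '<p>Hi <a href="/one">one</a><!-- skip me --> and '
         . '<a href="/two">two</a></p>';

my @links;        # link targets we decided to keep
my $text = '';    # visible text we decided to keep

my $p = HTML::Parser->new(
    api_version => 3,
    # Start tags: keep only the href of <a> elements.
    start_h => [
        sub {
            my ($tag, $attr) = @_;
            push @links, $attr->{href} if $tag eq 'a' && defined $attr->{href};
        },
        'tagname, attr',
    ],
    # Text: append the entity-decoded text.
    text_h    => [ sub { $text .= shift }, 'dtext' ],
    # Comments: a do-nothing handler, so they are simply not stored.
    comment_h => [ sub { }, 'text' ],
);
$p->parse($html);
$p->eof;    # signal end of document and flush any buffered text

print "@links\n";    # prints "/one /two"
print "$text\n";     # prints "Hi one and two"
```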
------------------------------
Date: Thu, 31 Jan 2008 15:14:52 -0800 (PST)
From: "sopan.shewale@gmail.com" <sopan.shewale@gmail.com>
Subject: Re: how to remove blocks between nested brackets
Message-Id: <38d61121-1887-40b9-a7d5-7a4da6ac76f8@h11g2000prf.googlegroups.com>
Hi,
People have already answered your question; I just thought I would add
some value to the discussion.
Have a look at these articles:
[1]. http://blog.stevenlevithan.com/archives/regex-recursion
[2]. http://perl.plover.com/yak/regex/samples/slide083.html
These articles will help you understand the solutions posted in this
discussion and give you a high-level understanding of the problem.
Regards,
--sopan shewale
On Jan 31, 8:20 am, Abigail <abig...@abigail.be> wrote:
> _
> Si (silicium_a...@rmony-p.ath.cx) wrote on VCCLXVI September MCMXCIII in
> <URL:news:47a1c3a1$0$902$ba4acef3@news.orange.fr>:
> :) Being new to perl and not wanting to learn to use it like assembler, I
> :) need to delete parts of a single line (possibly long) that are enclosed
> :) between nested brackets:
> :)
> :) abcd efgh [ij: 123-[456]-klm; nop-789]; qrst; uvw [xyz: 98-76-[ef]; gh;
> :) ijkl] (mnop)
> :)
> :) The comments between [] must be discarded to split with ';' delimiter.
> :) The expected result is:
> :) array[0] == "abcd efgh"
> :) array[1] == "qrst"
> :) array[2] == "uvw (mnop)"
> :)
> :) Thanks for a magic formula.
>
> The magic formula is: \s*(\[[^][]*+(?:(?1)[^][]*+)*\])
>
> my $_ = "abcd efgh [ij: 123-[456]-klm; nop-789]; qrst; uvw " .
> "[xyz: 98-76-[ef]; gh;ijkl] (mnop)";
>
> s {\s*(\[[^][]*+(?:(?1)[^][]*+)*\])} {}g;
>
> my @array = split /;\s*/;
>
> say qq {array[$_] == "}, $array [$_], qq {"} for 0 .. $#array;
>
> __END__
> array[0] == "abcd efgh"
> array[1] == "qrst"
> array[2] == "uvw (mnop)"
>
> Abigail
> --
> perl -we 'print split /(?=(.*))/s => "Just another Perl Hacker\n";'
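Abigail's snippet relies on Perl 5.10-era features: `my $_` (lexical $_, an experimental feature that was removed in Perl 5.24) and `say`, which needs `use feature 'say'` or `use 5.010`. The sketch below applies the same recursive pattern with a named variable instead, so it should run on current perls; the variable names are illustrative.

```perl
use strict;
use warnings;
use feature 'say';

my $line = 'abcd efgh [ij: 123-[456]-klm; nop-789]; qrst; uvw '
         . '[xyz: 98-76-[ef]; gh;ijkl] (mnop)';

# Group 1 matches one balanced [...] block: "[", a run of non-bracket
# characters, any number of nested blocks (the recursion (?1)) each
# followed by more non-bracket characters, then "]". The leading \s*
# also removes the blank before the block. *+ is possessive (5.10+).
(my $stripped = $line) =~ s/\s*(\[[^][]*+(?:(?1)[^][]*+)*\])//g;

my @array = split /;\s*/, $stripped;
say qq{array[$_] == "$array[$_]"} for 0 .. $#array;
```

This prints the same three lines as Abigail's `__END__` section.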
------------------------------
Date: Fri, 1 Feb 2008 05:42:16 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Fri Feb 1 2008
Message-Id: <JvJp6G.1qGH@zorch.sf-bay.org>
The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN). You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.
AIIA-GMT-0.01
http://search.cpan.org/~cjukuo/AIIA-GMT-0.01/
an XML-RPC client for the AIIA gene mention tagger, a web service that recognizes named entities in biomedical articles
----
Acme-POE-Acronym-Generator-1.06
http://search.cpan.org/~bingos/Acme-POE-Acronym-Generator-1.06/
Generate random POE acronyms.
----
Algorithm-LBFGS-0.12
http://search.cpan.org/~laye/Algorithm-LBFGS-0.12/
Perl extension for L-BFGS
----
Apache-AuthCASSimple-0.0.1
http://search.cpan.org/~yvesago/Apache-AuthCASSimple-0.0.1/
Apache module to authenticate through a CAS server
----
Apache2-AuthCASSimple-0.0.1
http://search.cpan.org/~yvesago/Apache2-AuthCASSimple-0.0.1/
Apache2 module to authenticate through a CAS server
----
CGI-Widget-DBI-Search-0.12
http://search.cpan.org/~adiraj/CGI-Widget-DBI-Search-0.12/
Database search widget
----
Catalyst-Plugin-Assets-0.021
http://search.cpan.org/~rkrimen/Catalyst-Plugin-Assets-0.021/
Manage and minify .css and .js assets in a Catalyst application
----
DBIx-Class-EncodedColumn-0.00001_02
http://search.cpan.org/~groditi/DBIx-Class-EncodedColumn-0.00001_02/
Automatically encode columns
----
DBIx-Class-EncodedColumn-0.00001_03
http://search.cpan.org/~groditi/DBIx-Class-EncodedColumn-0.00001_03/
Automatically encode columns
----
JavaScript-Dumper-0.001
http://search.cpan.org/~perler/JavaScript-Dumper-0.001/
Dump JavaScript data structures from Perl objects. Allows unquoted strings and numbers.
----
MP3-Tag-0.9710
http://search.cpan.org/~ilyaz/MP3-Tag-0.9710/
Module for reading tags of MP3 audio files
----
Mac-Pasteboard-0.000_01
http://search.cpan.org/~wyant/Mac-Pasteboard-0.000_01/
Manipulate Mac OS X clipboards/pasteboards.
----
Module-Build-PM_Filter-v1.2
http://search.cpan.org/~vmoral/Module-Build-PM_Filter-v1.2/
Add a PM_Filter feature to Module::Build
----
Module-Build-PM_Filter-v1.2.1
http://search.cpan.org/~vmoral/Module-Build-PM_Filter-v1.2.1/
Add a PM_Filter feature to Module::Build
----
POE-Component-CPAN-SQLite-Info-0.01
http://search.cpan.org/~zoffix/POE-Component-CPAN-SQLite-Info-0.01/
non-blocking wrapper around CPAN::SQLite::Info with file fetching abilities.
----
POE-Component-CPAN-SQLite-Info-0.02
http://search.cpan.org/~zoffix/POE-Component-CPAN-SQLite-Info-0.02/
non-blocking wrapper around CPAN::SQLite::Info with file fetching abilities.
----
POE-Component-CPAN-SQLite-Info-0.03
http://search.cpan.org/~zoffix/POE-Component-CPAN-SQLite-Info-0.03/
non-blocking wrapper around CPAN::SQLite::Info with file fetching abilities.
----
POE-Component-Client-Ident-1.10
http://search.cpan.org/~bingos/POE-Component-Client-Ident-1.10/
A component that provides non-blocking ident lookups to your sessions.
----
POE-Component-IRC-5.56
http://search.cpan.org/~bingos/POE-Component-IRC-5.56/
a fully event-driven IRC client module.
----
Parse-Eyapp-1.107
http://search.cpan.org/~casiano/Parse-Eyapp-1.107/
Extensions for Parse::Yapp
----
Parse-IASLog-1.02
http://search.cpan.org/~bingos/Parse-IASLog-1.02/
A parser for Microsoft IAS-formatted log entries.
----
SMS-Send-IS-Vit-0.03
http://search.cpan.org/~avar/SMS-Send-IS-Vit-0.03/
SMS::Send driver for vit.is
----
SMS-Send-IS-Vodafone-0.03
http://search.cpan.org/~avar/SMS-Send-IS-Vodafone-0.03/
SMS::Send driver for vodafone.is
----
WWW-Search-AltaVista-2.151
http://search.cpan.org/~mthurn/WWW-Search-AltaVista-2.151/
class for searching www.altavista.com
----
WWW-Search-Backends-1.073
http://search.cpan.org/~mthurn/WWW-Search-Backends-1.073/
----
WWW-Search-Jobs-2.027
http://search.cpan.org/~mthurn/WWW-Search-Jobs-2.027/
----
Win32-GUITaskAutomate-0.03
http://search.cpan.org/~zoffix/Win32-GUITaskAutomate-0.03/
A module for automating GUI tasks.
----
XML-Compile-0.66
http://search.cpan.org/~markov/XML-Compile-0.66/
Compilation based XML processing
----
XML-Compile-SOAP-0.66
http://search.cpan.org/~markov/XML-Compile-SOAP-0.66/
base-class for SOAP implementations
----
XML-Validator-Schema-1.10
http://search.cpan.org/~samtregar/XML-Validator-Schema-1.10/
validate XML against a subset of W3C XML Schema
----
pfacter-1.9-3
http://search.cpan.org/~sschneid/pfacter-1.9-3/
Collect and display facts about the system
If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.
This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
http://www.stonehenge.com/merlyn/LinuxMag/col82.html
print "Just another Perl hacker," # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: Thu, 31 Jan 2008 23:39:03 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Obscure baffling "module not exported" error: can someone help me find the cause?
Message-Id: <np2a75-s06.ln1@osiris.mauzo.dyndns.org>
Quoth Mark Clements <mark.clementsREMOVETHIS@wanadoo.fr>:
> Henry Law wrote:
> > I have a bizarre problem with packages and I'm hoping that someone can
> > help me find out what I'm doing wrong because I'm utterly stumped.
> >
> > The error is "not exported" for something that quite clearly is exported
> > (details follow). The error disappears when one of several particular
> <snip>
>
> > There are three modules: NFBT::ServerLib, NFBT::Utilities::Common and
> > NFBT::Utilities::Server. There is some requirement in them for
> > subroutines out of one or more of the others.
> You appear to have circular dependencies between these modules, and I'd
> guess that the mutual importing is confusing the import/export mechanism.
This *shouldn't* be a problem, providing Perl knows the exports early
enough. Put the 'require Exporter; @ISA=...; @EXPORT=...;' stuff in a
BEGIN block, *before* you use any modules that might recursively use
this one.
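Ben's suggestion can be sketched as a self-contained script. It writes two hypothetical modules, Circ::A and Circ::B (invented names, not Henry's NFBT modules), into a temporary directory. Each one sets up Exporter inside a BEGIN block before it `use`s the other, so by the time the recursive `use` runs the import, the export list is already known. Without the BEGIN blocks those assignments would not have run yet at that point.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Source for two hypothetical modules that import from each other.
my %source = (
    'A.pm' => <<'END_A',
package Circ::A;
use strict;
use warnings;
# Exporter setup runs at compile time, before the 'use' below recurses.
BEGIN {
    require Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT_OK = qw(a_func a_calls_b);
}
use Circ::B qw(b_func);
sub a_func    { return 'A' }
sub a_calls_b { return b_func() }
1;
END_A
    'B.pm' => <<'END_B',
package Circ::B;
use strict;
use warnings;
BEGIN {
    require Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT_OK = qw(b_func b_calls_a);
}
use Circ::A qw(a_func);
sub b_func    { return 'B' }
sub b_calls_a { return a_func() }
1;
END_B
);

# Write the modules to a throwaway library directory.
my $lib = tempdir(CLEANUP => 1);
my $dir = File::Spec->catdir($lib, 'Circ');
make_path($dir);
for my $file (sort keys %source) {
    my $path = File::Spec->catfile($dir, $file);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $source{$file};
    close $fh or die "Cannot close $path: $!";
}

unshift @INC, $lib;
require Circ::A;    # loads Circ::A, which loads Circ::B, which uses Circ::A

my $result = Circ::A::a_calls_b() . Circ::B::b_calls_a();
print "$result\n";    # prints "BA"
```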
> This is an unrelated issue, but "use"ing happens at compile-time.
> Putting the use statement inside the subroutine does not limit its scope
> or control when it is executed.
...unless it's a lexically-scoped pragma like strict or warnings. The
use still happens at compile time, but some of the effects of that are
restricted to the lexical scope currently being compiled. This obviously
doesn't apply to simply importing subs, which is an operation with
global effect.
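A short sketch of the distinction Ben draws, using the core List::Util module and `strict refs` (the variable names are invented): the `use` inside the sub still imports `max` into the package at compile time, while `no strict 'refs'` only affects the block it is compiled in.

```perl
use strict;
use warnings;

sub biggest {
    # 'use' runs at compile time even when written inside a sub, and the
    # imported max() goes into package main for the whole file.
    use List::Util qw(max);
    return max(@_);
}

# max() is callable here although biggest() has never been called.
my $m = max(1, 5, 3);

# A lexical pragma, by contrast, only affects the block it is compiled in.
our $counter = 7;
my $name = 'counter';
my $inside = do {
    no strict 'refs';    # symbolic references allowed in this block only
    ${"main::$name"};
};
# Out here 'strict refs' applies again, so the same lookup dies.
my $outside_ok = eval { my $v = ${"main::$name"}; 1 } ? 1 : 0;

print "$m $inside $outside_ok\n";    # prints "5 7 0"
```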
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 1247
***************************************