[24262] in Perl-Users-Digest
Perl-Users Digest, Issue: 6453 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Apr 23 14:05:47 2004
Date: Fri, 23 Apr 2004 11:05:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 23 Apr 2004 Volume: 10 Number: 6453
Today's topics:
Re: Asking for comments on this script <rvf_lists@fastmail.fm>
Authen::NTLM and MS04-011 (Kevin Collins)
Ben and Tassilo: about calling subs with & (Rex Gustavus Adolphus)
Re: Ben and Tassilo: about calling subs with & <tore@aursand.no>
Re: Ben and Tassilo: about calling subs with & <tassilo.parseval@rwth-aachen.de>
Re: indexing <jwillmore@remove.adelphia.net>
Re: indexing (Malcolm Dew-Jones)
Re: inefficient regex (clarified) <tore@aursand.no>
Re: inefficient regex (clarified) <mothra@mothra.pub>
Re: inefficient regex (clarified) <uri.guttman@fmr.com>
Re: inefficient regex (clarified) <mothra@mothra.pub>
Re: inefficient regex (clarified) <uri.guttman@fmr.com>
Re: inefficient regex (clarified) <jwillmore@remove.adelphia.net>
Re: inefficient regex (clarified) <uri.guttman@fmr.com>
Re: inefficient regex - please help! <jwillmore@remove.adelphia.net>
Re: inefficient regex - please help! (Gary E. Ansok)
Re: operator (Malcolm Dew-Jones)
Re: perl <jwillmore@remove.adelphia.net>
Re: perl (Malcolm Dew-Jones)
Proper way to use an imported constant under 'use stric <minter@lunenburg.org>
Re: Proper way to use an imported constant under 'use s <remorse@partners.org>
Re: RFC: Text similarity <jwillmore@remove.adelphia.net>
Re: slurp not working? ideas please! <geoffacox@dontspamblueyonder.co.uk>
Re: slurp not working? ideas please! <geoffacox@dontspamblueyonder.co.uk>
Re: What kind of RegEx should I use for <TEXTAREA>? <jwillmore@remove.adelphia.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 23 Apr 2004 10:49:04 -0500
From: Rafael Villarroel <rvf_lists@fastmail.fm>
Subject: Re: Asking for comments on this script
Message-Id: <w0ar7uer8y7.fsfhello@somewhere.com>
Jon Ericson <Jon.Ericson@jpl.nasa.gov> writes:
> Rafael Villarroel <rvf_lists@fastmail.fm> writes:
>
>> \makeatletter
> <snip>
>> \makeatother
>
> Other people have given you perl style advice, but I wanted to mention
> a couple of LaTeX hints. Using \makeatletter in a document is a good
> sign that you should write a document class or package.
>
Yes I know, in fact, I had that code in a separate package, but I
decided to include it in the script because it seemed easier for me to
maintain and distribute only one file.
> This seems like an excellent opportunity to use PerlTeX[2]. From the
> abstract:
>
> PerlTeX is a combination Perl script (perltex) and LaTeX2e style
> file (perlmacros) that, together, give the user the ability to
> define LaTeX macros in terms of Perl code. Once defined, a Perl
> macro becomes indistinguishable from any other LaTeX macro. PerlTeX
> thereby combines LaTeX's typesetting power with Perl's
> programmability.
>
[snip example of PerlTeX use]
This seems very interesting. I knew that PerlTeX existed, but I did not
have a use for it at the time. However, chess fans who can use both
Perl and TeX (to the extent of installing a nonstandard and complex
package like PerlTeX) seem to be rare, so I would rather not depend
on it.
Thanks for taking the time to answer my post, and thanks to everyone
else too!
Rafael
------------------------------
Date: Fri, 23 Apr 2004 16:57:32 GMT
From: spamtotrash@toomuchfiction.com (Kevin Collins)
Subject: Authen::NTLM and MS04-011
Message-Id: <slrnc8iins.c53.spamtotrash@doom.unix-guy.com>
Hi,
We have just started installing Microsoft critical patch MS04-011
(http://www.microsoft.com/technet/security/bulletin/ms04-011.mspx) on our Win2k
servers. We have a CGI script that makes use of LWP and LWP::Authen::Ntlm, which
requires Authen::NTLM. This script uses NTLM authentication to check the status
of various critical web servers.
When we apply this patch, the authentication breaks and in the Security Event
Log, we see a failed authentication but the domain shows up as a non-printable
character and the "Logon Type" is listed as "NtLmSsp". Part of the patch was an
update to LSASS (which handles RPC authentication) to perform bounds checking.
Additionally, the patch includes an SSP update (used by IIS, also appears to be
bounds checking). We can uninstall the patch and everything works fine.
My suspicion (based on the origins of Authen::NTLM) is that the code is a
reverse-engineered implementation of the NTLM protocol, which has now changed
slightly, causing the Perl module to break. The patch has been out 3 or 4 days now.
I've sent basically this same info to Mark Bush (the author of Authen::NTLM),
but have not yet heard anything from him. If anyone else is seeing this or has
any ideas, I would appreciate suggestions.
Thanks in advance for any help you can offer.
Kevin
------------------------------
Date: 23 Apr 2004 08:44:15 -0700
From: uffesterner@spamhole.com (Rex Gustavus Adolphus)
Subject: Ben and Tassilo: about calling subs with &
Message-Id: <c70a85ff.0404230744.503c23c9@posting.google.com>
Hi!
A couple of months ago I participated in a thread "What with this open
file..."
I published parts of my code in it and that led to a discussion
about not calling subs with & (started by Ben Morrow):
"Don't call subs with & unless you need to (here you don't)."
Well I just looked into one of the books I used to learn Perl (aptly
named "Learning Perl"(!))
As it happens, the chapter about subroutines is published on the web,
see http://www.oreilly.com/catalog/lperl3/chapter/ch04.html
Quoting from the beginning of Chapter 4 Subroutines:
"The name of a subroutine is another Perl identifier (letters, digits,
and underscores, but can't start with a digit) with a
sometimes-optional ampersand (&) in front. There's a rule about when
you can omit the ampersand and when you cannot; we'll see that rule by
the end of the chapter. For now, we'll just use it every time that
it's not forbidden, which is always a safe rule"
And another quote from the end of the same chapter:
"So, the real rule to use is this one: until you know the names of all
of Perl's builtin functions, always use the ampersand on function
calls."
So now, Ben and Tassilo, maybe you understand why programmers new to
Perl use & in subroutine calls?
Have a good weekend : )
------------------------------
Date: Fri, 23 Apr 2004 17:57:01 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Ben and Tassilo: about calling subs with &
Message-Id: <pan.2004.04.23.15.56.48.128604@aursand.no>
On Fri, 23 Apr 2004 08:44:15 -0700, Rex Gustavus Adolphus wrote:
> Well I just looked into one of the books I used to learn Perl (aptly
> named "Learning Perl"(!))
> [...]
I hope it has been updated:
perldoc -q calling
perldoc perlsub
--
Tore Aursand <tore@aursand.no>
"First, God created idiots. That was just for practice. Then He created
school boards." (Mark Twain)
------------------------------
Date: 23 Apr 2004 17:45:32 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: Ben and Tassilo: about calling subs with &
Message-Id: <c6bkns$a94hg$1@ID-231055.news.uni-berlin.de>
Also sprach Rex Gustavus Adolphus:
> A couple of months ago I participated in a thread "What with this open
> file..."
>
> I published parts of my code in it and that led to a discussion
> about not calling subs with & (started by Ben Morrow):
> "Don't call subs with & unless you need to (here you don't)."
>
> Well I just looked into one of the books I used to learn Perl (aptly
> named "Learning Perl"(!))
>
> As it happens, the chapter about subroutines is published on the web,
> see http://www.oreilly.com/catalog/lperl3/chapter/ch04.html
>
> Quoting from the beginning of Chapter 4 Subroutines:
> "The name of a subroutine is another Perl identifier (letters, digits,
> and underscores, but can't start with a digit) with a
> sometimes-optional ampersand (&) in front. There's a rule about when
> you can omit the ampersand and when you cannot; we'll see that rule by
> the end of the chapter. For now, we'll just use it every time that
> it's not forbidden, which is always a safe rule"
>
> And another quote from the end of the same chapter:
> "So, the real rule to use is this one: until you know the names of all
> of Perl's builtin functions, always use the ampersand on function
> calls."
>
>
> So now, Ben and Tassilo, maybe you understand why programmers new to
> Perl use & in subroutine calls?
Quite. If you re-read the thread you quoted, you will notice that I did
not condemn the ampersand on function calls at all. In essence I said:
if a programmer sees advantages in it (even if it is just for the sake
of feeling more comfortable), there is nothing wrong with it. The
side effects it has are almost negligible, and they are further cut
down by 50% when parens are used.
Tassilo
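For readers wondering which side effects are meant: presumably they are the two documented in perlsub, namely that the & form ignores prototypes and that, without parens, it passes the caller's current @_ along. The following is only a minimal sketch of both; count_args, caller_demo and first_only are names made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

sub caller_demo {
    my @results;
    push @results, count_args();     # explicit empty list: 0 arguments
    push @results, &count_args();    # & with parens: also 0 arguments
    push @results, &count_args;      # & without parens: reuses the caller's @_
    return @results;
}

# A ($) prototype imposes scalar context on the single argument.
sub first_only ($) { return $_[0] }

my @list = (10, 20, 30);
my @seen = caller_demo('a', 'b', 'c');

my $with_proto    = first_only(@list);     # @list in scalar context: its length
my $without_proto = &first_only(@list);    # prototype ignored: first element

print "args seen: @seen\n";
print "with prototype: $with_proto, with &: $without_proto\n";
```

With parens only the prototype bypass remains, which is one plausible reading of the "cut down by 50%" remark.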
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Fri, 23 Apr 2004 11:09:21 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: indexing
Message-Id: <pan.2004.04.23.15.09.14.610793@remove.adelphia.net>
On Fri, 23 Apr 2004 01:23:06 -0400, kums wrote:
> A word has many no. of subdivisions of words.That subdivisions has many
> no.of subdivisions.and etc.
> how to provide number indexing to that words.
> consider the foll.example
> i want to prefix indexing like..
> 1.word
> 1.1 alphabets.
> 1.1.1 small
> 1.1.2 caps
> 1.2 numbers
> 2.color
This is one of those things where *you* need to put forth the effort and
write some code - because there are many ways to do what it is you're
asking about :-)
Your first stop should be http://search.cpan.org to see if someone has put
together a module to do this. I'd try searching the Lingua modules first.
*If* you're just interested in some application that will do this, visit
http://freshmeat.net and search for "index". There are several
applications that do this.
HTH
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
"All snakes who wish to remain in Ireland will please raise
their right hands." -- Saint Patrick
------------------------------
Date: 23 Apr 2004 09:49:14 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: indexing
Message-Id: <4089490a@news.victoria.tc.ca>
kums (bckumari@yahoo.com) wrote:
: A word has many no. of subdivisions of words.That subdivisions has many
: no.of subdivisions.and etc.
: how to provide number indexing to that words.
: consider the foll.example
: i want to prefix indexing like..
: 1.word
: 1.1 alphabets.
: 1.1.1 small
: 1.1.2 caps
: 1.2 numbers
: 2.color
use runoff
------------------------------
Date: Fri, 23 Apr 2004 17:22:12 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: inefficient regex (clarified)
Message-Id: <pan.2004.04.23.15.18.03.373092@aursand.no>
On Fri, 23 Apr 2004 15:46:28 +0100, Mothra wrote:
> OK, I just posted an earlier request, but wasn't clear enough about what I
> wanted - my bad :-$
Please don't start a new thread. Continue the old thread instead.
> I need a regex string that performs the same function as this:
>
> /(.*Lost connection.*(\n.*)*){8}/
>
> but that runs more quickly when scanning a text file.
>
> It needs to be a regex solution, as the program I need it for only accepts
> perl regular expression syntax (rather than perl itself) to scan log files.
> The above example is crude and slow and I reckon there must be a better way
> to do it.
Try this one:
my $count = () = $string =~ /Lost connection/g;
Or, if you need to evaluate it right away:
if ( (my $count = () = $string =~ /Lost connection/g) >= 8 ) {
    print "There are at least 8 occurrences of the string 'Lost connection'\n";
}
else {
    print "There are fewer than 8 occurrences of the string 'Lost connection'\n";
}
I haven't tested anything of the above, and I'm a bit unsure about the
latter example. This is Perl, though, so it _should_ work. :)
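For anyone who wants to try the counting idiom above in isolation, here is a small self-contained sketch. The log text is invented sample data, and the threshold of 8 is taken from the original question.

```perl
use strict;
use warnings;

# Sample log text; the contents are made up for illustration.
my $string = join "\n",
    "10:01 Lost connection to host a",
    "10:02 ok",
    "10:03 Lost connection to host b",
    "10:04 Lost connection to host a";

# The empty list puts the /g match in list context; assigning that list
# assignment to a scalar yields the number of matches found.
my $count = () = $string =~ /Lost connection/g;

print "found $count occurrences\n";
print $count >= 8 ? "at least 8\n" : "fewer than 8\n";
```

Note that the pattern is case-sensitive, so "Lost Connection" would not be counted unless /i is added.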
--
Tore Aursand <tore@aursand.no>
"A teacher is never a giver of truth - he is a guide, a pointer to the
truth that each student must find for himself. A good teacher is
merely a catalyst." (Bruce Lee)
------------------------------
Date: Fri, 23 Apr 2004 16:37:26 +0100
From: "Mothra" <mothra@mothra.pub>
Subject: Re: inefficient regex (clarified)
Message-Id: <c6bd5r$mds$1@news6.svr.pol.co.uk>
"Tore Aursand" <tore@aursand.no> wrote in message
news:pan.2004.04.23.15.18.03.373092@aursand.no...
> On Fri, 23 Apr 2004 15:46:28 +0100, Mothra wrote:
> > OK, I just posted an earlier request, but wasn't clear enough about
> > what I wanted - my bad :-$
>
> Please don't start a new thread. Continue the old thread instead.
>
But the old thread was going in the wrong direction...
> > I need a regex string that performs the same function as this:
> >
> > /(.*Lost connection.*(\n.*)*){8}/
> >
> > but that runs more quickly when scanning a text file.
> >
> > It needs to be a regex solution, as the program I need it for only
> > accepts perl regular expression syntax (rather than perl itself) to
> > scan log files. The above example is crude and slow and I reckon
> > there must be a better way to do it.
>
> Try this one:
>
<snip>
> I haven't tested anything of the above, and I'm a bit unsure about the
> latter example. This is Perl, though, so it _should_ work. :)
>
None of those will work - the program I'm using will only accept a regular
expression, not Perl code. Therefore all of those if statements, variable
declarations will be useless. :-(
------------------------------
Date: 23 Apr 2004 11:03:13 -0400
From: Uri Guttman <uri.guttman@fmr.com>
Subject: Re: inefficient regex (clarified)
Message-Id: <siscbrli3ff2.fsf@tripoli.fmr.com>
>>>>> "M" == Mothra <mothra@mothra.pub> writes:
M> I need a regex string that performs the same function as this:
M> /(.*Lost connection.*(\n.*)*){8}/
M> It needs to be a regex solution, as the program I need it for only
M> accepts perl regular expression syntax (rather than perl itself) to
M> scan log files. The above example is crude and slow and I reckon
M> there must be a better way to do it.
clarify your spec for that regex. why do you need all those .*s? having
* wrapping * can cause exponential explosions of cpu usage as all
combinations of the two are checked. so write your spec in clear english
and that will make it easier to code up.
read MRE for more on the problems of (.*)* stuff.
you can speed it up by converting some/all of the *'s to +'s as that
will lower the possible combinations.
if you used the /s modifier, you could drop the whole \n stuff.
so something like this could work (given my assumptions above):
/(Lost connection.+){8}/s
or this:
/(.+Lost connection){8}/s
both assume the string is never next to itself which seems like a good
restriction
and if you can't use /s (this not being perl, bad boy!), then you could
use the internal form of the modifier or [\S\s] instead of .
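The three ways of letting the pattern cross newlines mentioned above can be compared in a short sketch. The log text is invented and the repeat count is reduced to 2 to keep the sample small; whether a given monitoring tool accepts the inline (?s) form is an assumption that would need checking against that tool.

```perl
use strict;
use warnings;

# Three made-up log lines, two of which contain the phrase.
my $log = "a Lost connection b\nc ok d\ne Lost connection f\n";

# Trailing /s modifier: dot also matches newline.
my $with_flag   = $log =~ /(Lost connection.+){2}/s     ? 1 : 0;
# Inline (?s): the same effect when only the pattern text can be supplied.
my $inline_flag = $log =~ /(?s)(Lost connection.+){2}/  ? 1 : 0;
# [\S\s] matches any character, newline included, without any modifier.
my $char_class  = $log =~ /(Lost connection[\S\s]+){2}/ ? 1 : 0;
# Without /s the dot stops at the newline, so the second copy is out of reach.
my $no_flag     = $log =~ /(Lost connection.+){2}/      ? 1 : 0;

print "/s: $with_flag, (?s): $inline_flag, [\\S\\s]: $char_class, plain: $no_flag\n";
```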
uri
------------------------------
Date: Fri, 23 Apr 2004 16:59:20 +0100
From: "Mothra" <mothra@mothra.pub>
Subject: Re: inefficient regex (clarified)
Message-Id: <c6beeu$qu0$1@newsg3.svr.pol.co.uk>
"Uri Guttman" <uri.guttman@fmr.com> wrote in message
news:siscbrli3ff2.fsf@tripoli.fmr.com...
> >>>>> "M" == Mothra <mothra@mothra.pub> writes:
>
> M> I need a regex string that performs the same function as this:
>
> M> /(.*Lost connection.*(\n.*)*){8}/
>
> clarify your spec for that regex. why do you need all those .*s?
Well, I probably don't, hence the post! ;-)
> * wrapping * can cause exponential explosions of cpu usage as all
> combinations of the two are checked. so write your spec in clear english
> and that will make it easier to code up.
>
Each line of the text file will contain a long sequence of words which may
or may not contain the phrase "Lost connection". In a whole text file, I
need to be able to ascertain with a single regex whether or not there are
more than n occurrences in the file. It's one of those system monitoring
tools that allows you to put perl regular expressions in a text box to
generate alarms based on the content of log files. I wouldn't normally
touch that sort of thing, but I seem to be the only person in the company
who knows *any* regex.
> read MRE for more on the problems of (.*)* stuff.
>
<newbie question>errr MRE??</newbie question>
> you can speed it up by converting some/all of the *'s to +'s as that
> will lower the possible combinations.
>
> if you used the /s modifier, you could drop the whole \n stuff.
>
> so something like this could work (given my assumptions above):
>
> /(Lost connection.+){8}/s
>
> or this:
>
> /(.+Lost connection){8}/s
>
> both assume the string is never next to itself which seems like a good
> restriction
>
> and if you can't use /s (this not being perl, bad boy!), then you could
> use the internal form of the modifier or [\S\s] instead of .
>
Thanks - that's set me on the right track. :-)
------------------------------
Date: 23 Apr 2004 12:14:33 -0400
From: Uri Guttman <uri.guttman@fmr.com>
Subject: Re: inefficient regex (clarified)
Message-Id: <siscwu461xjq.fsf@tripoli.fmr.com>
>>>>> "M" == Mothra <mothra@mothra.pub> writes:
M> "Uri Guttman" <uri.guttman@fmr.com> wrote in message
M> news:siscbrli3ff2.fsf@tripoli.fmr.com...
>>
M> Each line of the text file will contain a long sequence of words
M> which may or may not contain the phrase "Lost connection". In a
M> whole text file, I need to be able to ascertain with a single regex
M> whether or not there are more than n occurrences in the file. It's
M> one of those system monitoring tools that allows you to put perl
M> regular expressions in a text box to generate alarms based on the
M> content of log files. I wouldn't normally touch that sort of
M> thing, but I seem to be the only person in the company who knows
M> *any* regex.
they need to learn them. :)
>> read MRE for more on the problems of (.*)* stuff.
>>
M> <newbie question>errr MRE??</newbie question>
mastering regular expressions 2nd ed. by jeff friedl.
i haven't gotten 2nd ed. but i hear it is very good. the 1st ed. (which
you should be able to get for cheap) is excellent and still useful for
stuff like this.
>> /(Lost connection.+){8}/s
>>
>> or this:
>>
>> /(.+Lost connection){8}/s
M> Thanks - that's set me on the right track. :-)
and what are your results? a speedup number would be nice to hear. :)
uri
------------------------------
Date: Fri, 23 Apr 2004 12:56:49 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: inefficient regex (clarified)
Message-Id: <pan.2004.04.23.16.45.38.883750@remove.adelphia.net>
On Fri, 23 Apr 2004 16:37:26 +0100, Mothra wrote:
> None of those will work - the program I'm using will only accept a regular
> expression, not Perl code. Therefore all of those if statements, variable
> declarations will be useless. :-(
Huh? I was under the impression you were writing Perl code. What's with
the "the program I'm using will only accept a regular expression, not Perl
code" statement? If you're trying to develop *just* a regular expression,
then this isn't the group to post in - this is a *Perl* newsgroup and the
answers you'll get are going to be .... *Perl* answers :-)
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
Endless Loop: n., see Loop, Endless. Loop, Endless: n., see
Endless Loop. -- Random Shack Data Processing Dictionary
------------------------------
Date: 23 Apr 2004 11:07:47 -0400
From: Uri Guttman <uri.guttman@fmr.com>
Subject: Re: inefficient regex (clarified)
Message-Id: <sisc4qra3f7g.fsf@tripoli.fmr.com>
>>>>> "M" == Mothra <mothra@mothra.pub> writes:
M> "Mothra" <mothra@mothra.pub> wrote in message
M> news:c6ba68$k5d$1@news6.svr.pol.co.uk...
>> OK, I just posted an earlier request, but wasn't clear enough about what I
>> wanted - my bad :-$
>>
>> I need a regex string that performs the same function as this:
>>
>> /(.*Lost connection.*(\n.*)*){8}/
>>
>> but that runs more quickly when scanning a text file.
>>
M> OK I managed to make it less ugly with:
M> /(Lost connection.*){19}/s
M> when I discovered that the 's' allows dot to match a newline
M> character, but it's still quite slow when you want to match say 40
M> occurrences in a file that only actually contains 4.
so you figured out the /s thing which i just posted (from a very SLOW
uploading feed at work. hours of delay, hence the email cc).
M> Is it because the regex is having to backtrack through the entire
M> string each time it tries to ascertain whether or not the limit has
M> been reached?
yes. so use + and here is another trick that may help:
/(Lost connection[^L]+.*?){19}/s
again, read MRE for tips on how to control backtracking which is what is
killing you. that is why the while loops are so much faster. they never
backtrack as they match only once per loop.
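Where real Perl code is available (which is not the case for the monitoring tool in question), the while-loop approach referred to above might look like this minimal sketch. The helper name has_at_least is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text holds at least $limit copies of $phrase.
sub has_at_least {
    my ($text, $phrase, $limit) = @_;
    my $seen = 0;
    # In scalar context each /g match resumes where the previous one ended,
    # so the text is scanned once and earlier matches are never retried.
    while ($text =~ /\Q$phrase\E/g) {
        return 1 if ++$seen >= $limit;    # stop as soon as the limit is hit
    }
    return 0;
}

my $log = "x Lost connection y\n" x 4;
print has_at_least($log, 'Lost connection', 4)  ? "4 or more\n"  : "fewer than 4\n";
print has_at_least($log, 'Lost connection', 40) ? "40 or more\n" : "fewer than 40\n";
```

The failing case (asking for 40 when only 4 exist) costs one pass over the text here, whereas the single-regex form has to try many ways of dividing the text before giving up.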
uri
------------------------------
Date: Fri, 23 Apr 2004 12:56:51 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: inefficient regex - please help!
Message-Id: <pan.2004.04.23.16.40.38.952163@remove.adelphia.net>
On Fri, 23 Apr 2004 14:01:58 +0100, Mothra wrote:
> Am trying to match n occurrences of a phrase ("Lost connection") in a text
> file.
[ ... ]
> Here's the code I've got so far (I'm reading the whole file into the scalar
> $lines)...
>
> if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
^^^ ???
Are you sure you have the right regular expression here? It looks like
you're missing a paren. Plus, you're trying to *extract* the newlines
(think "(\n)" ). Maybe you forgot to escape the metacharacters?
[ ... ]
> How could I rewrite this regex more efficiently?
If you want to just *count* "Lost connection", consider using 'index'. If
you're trying to *extract* information from the line that contains "Lost
connection", read the file one line at a time, *then* use a regular
expression to extract the information from the line. From what you
posted, it looks like you're trying to *extract* information from each "Lost
connection" occurrence, but you state that you want to "count" the occurrences.
HTH
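The 'index' suggestion above could be sketched roughly as follows. The helper name count_phrase and the sample text are made up; only the built-in index function is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: counts non-overlapping copies of $phrase in $text.
sub count_phrase {
    my ($text, $phrase) = @_;
    return 0 unless length $phrase;    # an empty phrase would loop forever
    my ($count, $pos) = (0, 0);
    # index() returns -1 when the phrase is not found at or after $pos.
    while (($pos = index($text, $phrase, $pos)) != -1) {
        $count++;
        $pos += length $phrase;        # continue just past this occurrence
    }
    return $count;
}

my $lines = "a Lost connection b\nc\nd Lost connection e\n";
my $found = count_phrase($lines, 'Lost connection');
print "found $found\n";
```

Because index does plain substring search, no regex metacharacters need escaping and there is no backtracking to worry about.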
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
Succumb to natural tendencies. Be hateful and boring.
------------------------------
Date: Fri, 23 Apr 2004 17:50:16 +0000 (UTC)
From: ansok@alumni.caltech.edu (Gary E. Ansok)
Subject: Re: inefficient regex - please help!
Message-Id: <c6bl0o$1uc$1@naig.caltech.edu>
In article <c6b42a$f7r$1@news6.svr.pol.co.uk>,
Mothra <mothra@mothra.pub> wrote:
>Am trying to match n occurrences of a phrase ("Lost connection") in a text
>file.
>
>Here's the code I've got so far (I'm reading the whole file into the scalar
>$lines)...
>
>if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
> print "yep\n";
>} else {
> print "nope\n";
>}
Well, if you absolutely must do it in one regex, how about
if ($lines =~ /(?:Lost connection.*?){4}/s) {
I'm not sure whether there's any benefit to using .*? instead of .*,
you might want to try it both ways and see.
Gary Ansok
--
Never, never, say "moot" to an English person. It gives
them the wrong idea.
-- Lars Eighner
------------------------------
Date: 23 Apr 2004 09:51:10 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: operator
Message-Id: <4089497e@news.victoria.tc.ca>
kums (bckumari@yahoo.com) wrote:
: tell me the difference between the substitution operator(s) and
: transliteration operator(tr)
the use of back references
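To spell that out a little: tr/// maps individual characters and knows nothing about patterns, while s/// matches a regular expression and can reuse captured groups. A minimal sketch with a made-up date string:

```perl
use strict;
use warnings;

my $text = "2004-04-23";

# tr/// maps single characters one-for-one; no patterns, no captures.
(my $slashes = $text) =~ tr{-}{/};

# s/// matches a regex and can refer back to captured groups.
(my $swapped = $text) =~ s/(\d{4})-(\d\d)-(\d\d)/$3.$2.$1/;

# tr/// returns the number of characters matched; with an empty
# replacement list and no /d flag the string is left unchanged.
my $digits = ($text =~ tr/0-9//);

print "$slashes\n$swapped\n$digits digits\n";
```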
------------------------------
Date: Fri, 23 Apr 2004 11:13:02 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: perl
Message-Id: <pan.2004.04.23.15.12.57.333218@remove.adelphia.net>
On Fri, 23 Apr 2004 08:44:23 +0200, Vetle Roeim wrote:
> * bckumari@yahoo.com
>> why we are using perl?
>
> [ ... ] What is the meaning of life?
42 :-)
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
In Corning, Iowa, it's a misdemeanor for a man to ask his wife
to ride in any motor vehicle.
------------------------------
Date: 23 Apr 2004 09:53:55 -0800
From: yf110@vtn1.victoria.tc.ca (Malcolm Dew-Jones)
Subject: Re: perl
Message-Id: <40894a23@news.victoria.tc.ca>
kums (bckumari@yahoo.com) wrote:
: why we are using perl?
I think that we are both using news readers, not perl.
: what is the perl's usage and advantages?
perl
------------------------------
Date: Fri, 23 Apr 2004 16:50:29 GMT
From: "H. Wade Minter" <minter@lunenburg.org>
Subject: Proper way to use an imported constant under 'use strict'?
Message-Id: <pPbic.34812$6m4.1546444@twister.southeast.rr.com>
I'm taking time and cleaning up my Perl/Tk application to run under
"use strict", which has been fun and productive, and also revealed some
bad code that I've rewritten. So, cool.
I have a question on one thing, though, and didn't see an answer by
Googling. My code is designed on Linux but also runs on Windows via
checks of $^O. So, to import Windows-specific modules, I do this:
if ( "$^O" eq "MSWin32" )
{
$rcfile = "C:\\mrvoice.cfg";
BEGIN
{
if ( $^O eq "MSWin32" )
{
require LWP::UserAgent;
LWP::UserAgent->import();
require HTTP::Request;
HTTP::Request->import();
require Win32::Process;
Win32::Process->import();
require Tk::Radiobutton;
Tk::Radiobutton->import();
require Win32::FileOp;
Win32::FileOp->import();
require Audio::WMA;
Audio::WMA->import();
}
}
}
Then, later in the code, I use an imported constant like:
if ( "$^O" eq "MSWin32" )
{
# Start the MP3 player on a Windows system
my $object;
Win32::Process::Create( $object, $config{'mp3player'}, '', 1,
NORMAL_PRIORITY_CLASS, "." );
$mp3_pid = $object->GetProcessID();
sleep(1);
}
However, when strict subs are enabled, I get an error about barewords:
[minter@localhost mrvoice]$ ./mrvoice.pl
Bareword "NORMAL_PRIORITY_CLASS" not allowed while "strict subs" in use
at ./mrvoice.pl line 3513.
Execution of ./mrvoice.pl aborted due to compilation errors.
My question is - what's the proper way to use this constant when strictures
are enabled?
Thanks,
Wade
------------------------------
Date: Fri, 23 Apr 2004 13:50:31 -0400
From: Richard Morse <remorse@partners.org>
Subject: Re: Proper way to use an imported constant under 'use strict'?
Message-Id: <remorse-E1D3AB.13503123042004@plato.harvard.edu>
In article <pPbic.34812$6m4.1546444@twister.southeast.rr.com>,
"H. Wade Minter" <minter@lunenburg.org> wrote:
> [minter@localhost mrvoice]$ ./mrvoice.pl
> Bareword "NORMAL_PRIORITY_CLASS" not allowed while "strict subs" in use
> at ./mrvoice.pl line 3513.
> Execution of ./mrvoice.pl aborted due to compilation errors.
>
> My question is - what's the proper way to use this constant when strictures
> are enabled?
Completely untested, as I don't right now have access to a windows box,
but perhaps accessing it as Win32::Process::NORMAL_PRIORITY_CLASS?
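One likely explanation for the error: on a non-Windows machine the BEGIN block never imports the constant, so when the later line is compiled the name is an unknown bareword. Writing it as a call with parentheses defers the lookup to run time. The sketch below simulates this with an invented package (My::Fake::Process) and an invented value, since Win32::Process itself is not available everywhere; whether Win32::Process::NORMAL_PRIORITY_CLASS() behaves the same way with the real module is an untested assumption.

```perl
use strict;
use warnings;

# Simulate a platform-specific module whose constant does not exist until
# run time. The package name and the value 32 are made up for this sketch.
eval q{
    package My::Fake::Process;
    sub NORMAL_PRIORITY_CLASS () { 32 }
    1;
} or die $@;

# A bare My::Fake::Process::NORMAL_PRIORITY_CLASS here would not compile
# under strict subs, because no such sub exists yet when this line is
# parsed. The parentheses make it an ordinary sub call resolved at run time.
my $priority = My::Fake::Process::NORMAL_PRIORITY_CLASS();

print "priority class: $priority\n";
```

In the posted script the equivalent change would be NORMAL_PRIORITY_CLASS() in the Create call, which only executes on Windows but has to compile everywhere.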
Ricky
--
Pukku
------------------------------
Date: Fri, 23 Apr 2004 11:46:44 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: RFC: Text similarity
Message-Id: <pan.2004.04.23.15.46.39.446317@remove.adelphia.net>
On Fri, 23 Apr 2004 14:16:53 +0200, Tore Aursand wrote:
> First of all: Converting the documents to a more sensible format (text in
> my case) is not the problem. The problem is the indexing and how to store
> the data which represents the similarity between the documents.
Just an insight or two ...
I'd use a database to store information about each document. This way,
you can use SQL to do things like count the word occurrences and create
stats on each document. Plus, you're comparing apples with apples: raw word
count with raw word count. It doesn't have to be a "real" database (like
MySQL or PostgreSQL) - it could be a Sprite or SQLite database. The
advantages to this approach are 1) you can try different options out
without having to re-parse 3000 documents; 2) if you have more documents to
add or some to remove, a simple SQL statement or two is easier to perform
than a whole lot of re-coding or re-thinking the parsing part of your
code. In fact, you can split up the various parts of your logic into
different scripts that act as filters - one to parse the documents, one to
populate the database, and maybe a few to determine similarities. All too
often we think in terms of "once and done" when a few scripts might be a
better solution.
I'd also look over one (or more) of the Lingua modules to establish
criteria for what to put into the database. I doubt if you want to put a
whole lot of "the" and "a" entries into the database. This would inflate
the data source to about 5 times what it needs to be. So, using something
like Lingua::StopWord(?) might help.
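As a rough illustration of the stop-word idea, here is a sketch that uses a tiny hand-written stop list rather than any CPAN module, so nothing about a particular Lingua API is assumed. The helper name word_counts and the sample sentence are made up.

```perl
use strict;
use warnings;

# A tiny, hand-picked stop list for illustration; a real one would be longer.
my %stop = map { $_ => 1 } qw(the a an and of to in is);

# Hypothetical helper: returns a hash ref of word => count for one document.
sub word_counts {
    my ($text) = @_;
    my %count;
    for my $word ($text =~ /([A-Za-z']+)/g) {
        my $key = lc $word;           # fold case so "The" and "the" agree
        next if $stop{$key};          # skip words that carry little meaning
        $count{$key}++;
    }
    return \%count;
}

my $counts = word_counts("The cat and the hat. A cat is in the hat box.");
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

The resulting word/count pairs are the kind of rows that could then be loaded into a database table keyed by document.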
There are Statistics modules as well. You could perform tests against
two documents and get a statistical correlation between the documents to
see *how* similar they are. I'm rusty on Statistics 101, but my thinking
is maybe using a t-test between the two documents might be the way to go.
This may be overkill for what you want, but worth thinking about (for
maybe a minute or two :-) ). There may even be something easier to do.
[ ... ]
Just my $0.02 :-)
HTH
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
The rhino is a homely beast, For human eyes he's not a feast.
Farewell, farewell, you old rhinoceros, I'll stare at something
less prepoceros. -- Ogden Nash
------------------------------
Date: Fri, 23 Apr 2004 17:27:59 GMT
From: Geoff Cox <geoffacox@dontspamblueyonder.co.uk>
Subject: Re: slurp not working? ideas please!
Message-Id: <qeki80tetscit13bmlme4hqlabp3vnjogl@4ax.com>
On Thu, 22 Apr 2004 15:23:07 -0400, Richard Morse
<remorse@partners.org> wrote:
>I'm not sure exactly how you would get the directory name here (I've not
>had the pleasure of using File::Find yet), but wouldn't it be better to
>do something like:
>
> find sub {
> return if -d (some_function_to_get_cwd() . $_);
> ...
> }
Richard,
Thanks for the idea - will give it a try.
Cheers
Geoff
>
>? This would handle any other miscellaneous directories that appear...
>
>Ricky
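On the directory check suggested above: File::Find changes into each directory it visits by default, so inside the callback a plain -d test on $_ should be enough, and $File::Find::name holds the full path. A minimal sketch, using a throwaway directory tree created just for the example:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway tree so the sketch is self-contained.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $path ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} "data\n";
    close $fh or die "close $path: $!";
}

my @files;
find(sub {
    # find() has already chdir'ed into the directory being visited, so the
    # bare name in $_ can be tested directly.
    return if -d $_;
    push @files, $File::Find::name;    # full path of the current entry
}, $root);

print "$_\n" for sort @files;
```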
------------------------------
Date: Fri, 23 Apr 2004 17:30:28 GMT
From: Geoff Cox <geoffacox@dontspamblueyonder.co.uk>
Subject: Re: slurp not working? ideas please!
Message-Id: <ciki805u201urtk6lask1t6kkk11bajlib@4ax.com>
On 22 Apr 2004 11:42:17 GMT, "Tassilo v. Parseval"
<tassilo.parseval@rwth-aachen.de> wrote:
>> sections, rather than the order above...?
>
>Which order above? The order above is <h2>, <p> and finally <option>.
>
>Even if I add a second or third set of data, the order remains intact
>for me.
>
>I still cannot reproduce this. :-)
Tassilo,
Just to say - I had to go away overnight so will now try and come up
with something you can test!
Cheers
Geoff
------------------------------
Date: Fri, 23 Apr 2004 11:19:04 -0400
From: James Willmore <jwillmore@remove.adelphia.net>
Subject: Re: What kind of RegEx should I use for <TEXTAREA>?
Message-Id: <pan.2004.04.23.15.18.58.939652@remove.adelphia.net>
On Fri, 23 Apr 2004 02:21:15 -0500, W. D. wrote:
> If a form has a <TEXTAREA> box to input all sorts of free-form
> text, is it necessary to do a regular expressions validation?
You should at least remove "harmful" characters - *if* you can. This
isn't always possible, depending on what this form field is supposed to
contain. Visit http://www.cert.org/tech_tips/cgi_metacharacters.html for
some direction on how to accomplish this.
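One common approach is to define the set of characters you are willing to accept and delete everything else, rather than trying to list every harmful character. The following is only a sketch under that assumption; the allowed set and the helper name keep_safe are invented for illustration and would need adjusting to what the form field is meant to hold.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only characters from an allowed set.
sub keep_safe {
    my ($input) = @_;
    # /c complements the list and /d deletes, so every character NOT listed
    # here is removed. The trailing "-" is a literal hyphen.
    (my $clean = $input) =~ tr/a-zA-Z0-9 \t\n.,!?'-//cd;
    return $clean;
}

my $cleaned = keep_safe("hello; rm -rf / `id`");
print "[$cleaned]\n";
```

Filtering like this does not replace proper quoting when the text is later passed to a shell, a database or an HTML page.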
> That is, are there any dangerous characters that shouldn't be
> allowed?
[ ... ]
Again, see the above URL. And, you could visit
http://www.w3.org/Security/Faq/www-security-faq.html
for some direction on safe scripting in general.
And ... you could use Google to search for further direction :-)
HTH
--
Jim
Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.
a fortune quote ...
Too clever is dumb. -- Ogden Nash
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6453
***************************************