[18774] in Perl-Users-Digest
Perl-Users Digest, Issue: 942 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat May 19 21:10:25 2001
Date: Sat, 19 May 2001 18:10:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <990321009-v10-i942@ruby.oce.orst.edu>
Content-Type: text
Perl-Users Digest Sat, 19 May 2001 Volume: 10 Number: 942
Today's topics:
Re: Pronouncing ISA (Abigail)
Re: regexp: counting words ? <ap@andre-probst.de>
Re: regexp: counting words ? <boqichi0@earthlink.net>
Re: regexp: counting words ? (Tad McClellan)
Re: regexp: counting words ? <bart.lateur@skynet.be>
Re: regexp: delete everything between <? ... ?> <ap@andre-probst.de>
Re: Sorry, I solved it... Re: $myexp =~ s/\%/[aeiou]/; <iltzu@sci.invalid>
Re: Stubborn regex won't work <wellhaven@worldnet.att.net>
Re: What's wrong with my scope? <iltzu@sci.invalid>
Re: Why can't I localize a lexical variable? <bart.lateur@skynet.be>
Re: word doc to txt <iltzu@sci.invalid>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 19 May 2001 23:20:39 +0000 (UTC)
From: abigail@foad.org (Abigail)
Subject: Re: Pronouncing ISA
Message-Id: <slrn9gdvu7.vtl.abigail@tsathoggua.rlyeh.net>
Mark Jason Dominus (mjd@plover.com) wrote on MMDCCCXVIII September
MCMXCIII in <URL:news:3b069d0c.2b5c$239@news.op.net>:
:} In article <m1g0e1wfsr.fsf@halfdome.holdit.com>,
:} Randal L. Schwartz <merlyn@stonehenge.com> wrote:
:} >Well, I put that in there because I had someone come up to me in class
:} >one day and started asking about the "ice-uh" variable, kinda like
:} >"ISO standard". And he didn't grok the meaning until I said that it
:} >was "iz uh", as in "this is a that". Bing! His eyes lit up.
:}
:} When I was teaching in Japan not too long ago, one of the students
:} asked why the variable was named @ISA, and when I explained the
:} reason, it turned out that several people in the class hadn't gotten it.
Actually, it took me 3 years of working with Perl to recognize the
"IS-A" meaning. I blame not only not being a native English speaker,
but also having a somewhat different view about OO.
Abigail
--
perl5.004 -wMMath::BigInt -e'$^V=Math::BigInt->new(qq]$^F$^W783$[$%9889$^F47]
.qq]$|88768$^W596577669$%$^W5$^F3364$[$^W$^F$|838747$[8889739$%$|$^F673$%$^W]
.qq]98$^F76777$=56]);$^U=substr($]=>$|=>5)*(q.25..($^W=@^V))=>do{print+chr$^V
%$^U;$^V/=$^U}while$^V!=$^W'
------------------------------
Date: Sat, 19 May 2001 20:01:39 +0200
From: "Andre Probst" <ap@andre-probst.de>
Subject: Re: regexp: counting words ?
Message-Id: <9e6ceg$nkm$05$1@news.t-online.com>
It works fine, but only when the case of the word matches exactly.
If I search for "batterie" , "Batterie" doesn't count.
Can I make the regexp case-independent ?
bye, Andre
But if want to match
"Garry Williams" <garry@ifr.zvolve.net> wrote in message
news:slrn9gd406.1gn.garry@zfw.zvolve.net...
> On Sat, 19 May 2001 17:12:36 +0200, Andre Probst <ap@andre-probst.de> wrote:
> > For a fulltext search I have stripped HTML in a database field.
> >
> > I read the content and want to know how often the word "$searchword"
> > appears in the text.
> >
> > Is it possible to count matches and if so, how ?
>
> $i++ while /$searchword/g;
>
> --
> Garry Williams
------------------------------
Date: Sat, 19 May 2001 18:09:50 GMT
From: Franco Luissi <boqichi0@earthlink.net>
Subject: Re: regexp: counting words ?
Message-Id: <3B06E2C7.F705C891@earthlink.net>
$i++ while /$searchword/gi;
(note the i next to the g)
Andre Probst wrote:
> It works fine, but only when the case of the word matches exactly.
>
> If I search for "batterie" , "Batterie" doesn't count.
>
> Can I make the regexp case-independent ?
>
> bye, Andre
>
> But if want to match
> "Garry Williams" <garry@ifr.zvolve.net> wrote in message
> news:slrn9gd406.1gn.garry@zfw.zvolve.net...
> > On Sat, 19 May 2001 17:12:36 +0200, Andre Probst <ap@andre-probst.de> wrote:
> > > For a fulltext search I have stripped HTML in a database field.
> > >
> > > I read the content and want to know how often the word "$searchword"
> > > appears in the text.
> > >
> > > Is it possible to count matches and if so, how ?
> >
> > $i++ while /$searchword/g;
> >
> > --
> > Garry Williams
------------------------------
Date: Sat, 19 May 2001 13:39:15 -0400
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: regexp: counting words ?
Message-Id: <slrn9gdbu3.uq9.tadmc@tadmc26.august.net>
Andre Probst <ap@andre-probst.de> wrote:
>
>It works fine, but only when the case of the word matches exactly.
>
>Can I make the regexp case-independent ?
Yes.
perldoc perlop
describes how to use Perl's pattern match operator.
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
------------------------------
Date: Sat, 19 May 2001 20:05:27 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: regexp: counting words ?
Message-Id: <5fkdgt0ovrceok5t745kgdm93vl5lqn1o0@4ax.com>
Andre Probst wrote:
>It works fine, but only when the case of the word matches exactly.
>
>If I search for "batterie" , "Batterie" doesn't count.
>
>Can I make the regexp case-independent ?
Yes. Add the /i modifier to your regex. See perlre.
Note that speed may drop by half.
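A minimal sketch of the counting idiom from this thread with the /i modifier added. The helper name count_word and the sample text are made up for illustration. The \Q...\E quoting and the \b word boundaries are assumptions that go beyond what was posted: drop the \b anchors to count substrings as well (for example "Batterien").

```perl
use strict;
use warnings;

# count_word is a hypothetical helper, not something from the thread.
# It counts case-insensitive, whole-word occurrences of a literal word.
sub count_word {
    my ($text, $searchword) = @_;
    my $count = 0;
    # \Q...\E treats regex metacharacters in $searchword literally;
    # /g steps through successive matches, /i ignores case.
    $count++ while $text =~ /\b\Q$searchword\E\b/gi;
    return $count;
}

my $text = "Batterie leer, neue batterie kaufen, BATTERIE laden";
print count_word($text, "batterie"), "\n";
```

The while loop relies on /g in scalar context, which resumes each match where the previous one ended.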
--
Bart.
------------------------------
Date: Sat, 19 May 2001 20:21:11 +0200
From: "Andre Probst" <ap@andre-probst.de>
Subject: Re: regexp: delete everything between <? ... ?>
Message-Id: <9e6dj4$ip7$06$1@news.t-online.com>
Thank you very much, that works fine and is much simpler than long regexps.
Andre
"Godzilla!" <godzilla@stomp.stomp.tokyo> wrote in message
news:3B06B192.E4C772E6@stomp.stomp.tokyo...
> Andre Probst wrote:
>
> > I want to delete PHP Code in a file, so that in result only
> > normal text is left:
>
> > My regexp doesn't work because it stops with the first ">", so
> > I don't get the desired result " textstart text between endtext".
>
> Your stated parameters for output are incorrect based upon
> this code you provide. Please be careful about stating very
> precise and exact expected results.
>
> To produce your stated parameters requires extra coding
> beyond what you display.
>
>
> > What do I have to change, that the regexp takes the last "?>" as the end of
> > the string to delete and deletes the following PHP code as well ?
>
> A complicated regex is not needed for an easy task as is yours.
>
> Godzilla!
> --
>
> TEST SCRIPT:
> ____________
>
>
> #!perl
>
> print "Content-type: text/plain\n\n";
>
> $line = '
> textstart
> <?
> // php code start
>
> if ($a > 0) { $allowed ="TRUE";}
> if ($b > 0) { $allowed ="FALSE";}
> // php code end
> ?>
>
> text between
>
> <?
> // php code start
>
> if ($a > 0) { $allowed ="TRUE";}
> if ($b > 0) { $allowed ="FALSE";}
> // php code end
> ?>
> ...
> ...
>
> endtext';
>
> do
> {
> $start = index ($line, "<?");
> $stop = index ($line, "?>", $start) + 2;
> substr ($line, $start, $stop - $start, "");
> }
> until (index ($line, "<?") == -1);
>
> print $line;
>
> exit;
>
>
>
> PRINTED RESULTS:
> ________________
>
>
>
> textstart
>
>
> text between
>
>
> ...
> ...
>
> endtext
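A hedged variation on the quoted index/substr loop, for anyone reusing it. The quoted do/until version assumes the text contains at least one "<?" and that every block is closed. The sketch below checks both cases. The helper name strip_php is hypothetical, and treating an unterminated block as running to the end of the text is an assumption, not something stated in the thread.

```perl
use strict;
use warnings;

# strip_php is a hypothetical helper name. It removes every <? ... ?>
# block with index/substr and leaves text without PHP code untouched.
sub strip_php {
    my ($text) = @_;
    my $start;
    while (($start = index($text, '<?')) != -1) {
        my $stop = index($text, '?>', $start + 2);
        # Assumption: a block with no closing "?>" extends to the end.
        my $length = $stop == -1
            ? length($text) - $start
            : $stop + 2 - $start;
        substr($text, $start, $length, '');   # 4-arg substr replaces in place
    }
    return $text;
}

print strip_php("textstart <? // php code ?> text between <? // more ?> endtext"), "\n";
```

A non-greedy substitution such as s/<\?.*?\?>//gs should also handle the closed-block case, which addresses the original "stops at the wrong delimiter" question directly.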
------------------------------
Date: 19 May 2001 22:02:16 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: Sorry, I solved it... Re: $myexp =~ s/\%/[aeiou]/; ????
Message-Id: <990308936.17465@itz.pp.sci.fi>
In article <3B039CB5.B187F3F0@fpg.de>, Gerhard Schwarz wrote:
>Damn, better read twice...
>
>> $match is a String from 0 to n length, and does only contain
>> letters.
>
>Wrong, $match can contain "+" or "%". I looked at every single line
>of that array from where $match is taken, and found a handful of entries
>that contain either one of them.
>
>> $match =~ s/\+/[bcdfghjklmnpqrstvwxyz]/;
>> $match =~ s/\%/[aeiou\344\366\374]/;
Note that if $match is supposed to be a regex with the shorthand
notation defined above then a) '+' is a singularly bad choice, and b)
you probably ought to check that it's not already backslashed.
If instead, as I suspect, it's meant to be a simple string that only
allows these two metacharacters, then you should probably quotemeta it
first. Note that quotemeta() backslashes all non-alphanumerics:
$match = quotemeta($match);
$match =~ s/\\\+/[bcdfghjklmnpqrstvwxyz]/;
$match =~ s/\\\%/[aeiou\344\366\374]/;
Also note that you've only included the 8-bit characters 'ä', 'ö' and
'ü' in lowercase. Unless you're using an appropriate locale, perl will
not recognise those as letters and will therefore *not* match their
uppercase equivalents even with the /i modifier.
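A small sketch of the quotemeta approach above, wrapped in a hypothetical helper (shorthand_to_regex is not a name from the thread). Two assumptions differ from the posted lines: the /g modifier is added so that every '+' and '%' is expanded, not just the first, and the 8-bit vowels are left out to keep the example independent of encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: '+' stands for any consonant, '%' for any vowel,
# and every other character in $match is taken literally.
sub shorthand_to_regex {
    my ($match) = @_;
    $match = quotemeta($match);   # '+' becomes '\+', '%' becomes '\%'
    $match =~ s/\\\+/[bcdfghjklmnpqrstvwxyz]/g;
    $match =~ s/\\\%/[aeiou]/g;
    return $match;
}

my $re = shorthand_to_regex('h%+.');
print "$re\n";
print "hat. matches\n"        if 'hat.' =~ /\A$re\z/;
print "hatx does not match\n" if 'hatx' !~ /\A$re\z/;
```

Because quotemeta() runs first, the literal '.' in the input stays a literal dot instead of becoming a wildcard.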
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.
------------------------------
Date: Sat, 19 May 2001 23:45:53 GMT
From: "Will Cardwell" <wellhaven@worldnet.att.net>
Subject: Re: Stubborn regex won't work
Message-Id: <RADN6.31153$t12.2361709@bgtnsc05-news.ops.worldnet.att.net>
Thanks for all the ideas. I like the reverse ones best because...I failed
to mention it ... but I'd like to cover the case: 'ERI000001.INO' also; that
is, not rely on any particular non-word char delimiting on the left.
Regards,
Will Cardwell
------------------------------
Date: 19 May 2001 22:24:48 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: What's wrong with my scope?
Message-Id: <990309984.18709@itz.pp.sci.fi>
In article <3b0408a2.448e$6a@news.op.net>, Mark Jason Dominus wrote:
>In article <jkovgn01aed.fsf@myrtle.ukc.ac.uk>,
>J.C.Posey <jcp@myrtle.ukc.ac.uk> wrote:
>>> > $urls{$fields[1]} += [$fields[2], $fields[3], $fields[4], $fields[5]];
[snip]
>
>But += means to do addition of numbers. An array is not a number.
Of course, just to confuse matters, perl does let you use a reference as
a number. Specifically, a reference used as a number equals the address
of whatever the reference points to. That's pretty much only useful for
comparing two references to see if they're equal.
The more I think about that, the more I feel the whole idea of allowing
references to be numbers was a misguided optimization. It might have
been better to just optimize eq/ne to compare references directly.
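To illustrate the point about references in numeric context, here is a minimal sketch. The variable names are made up for the example. refaddr comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $first = [1, 2, 3];
my $alias = $first;       # second reference to the same array
my $copy  = [@$first];    # a different array with equal contents

# In numeric context a reference yields the address of its referent,
# so == tests identity, not equality of contents.
print "alias is the same array\n"   if $first == $alias;
print "copy is a different array\n" if $first != $copy;
print "refaddr agrees\n"            if refaddr($first) == refaddr($alias);

# The += from the quoted code therefore stores a plain number,
# not an array reference.
my %urls;
$urls{k} += [1, 2];
print "after +=, the value is ",
      (ref($urls{k}) ? "a reference" : "a plain number"), "\n";
```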
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.
------------------------------
Date: Sat, 19 May 2001 19:49:42 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: Why can't I localize a lexical variable?
Message-Id: <5djdgt8jdtmlgbrtb8in3k31vhirdgf3ae@4ax.com>
Clinton A. Pierce wrote:
>Because it hurts my head. Example:
>
> # Does not compile:
> # Can't localize lexical...
> my $foo=66;
> sub cartman {
> print "Foo is $foo\n";
> }
>
> sub kenny {
> local $foo=67;
> cartman();
> }
> cartman();
>
>I should be able to determine the scope of a lexical by visual
>examination -- I don't need to know the program's flow to know what's
>in scope where.
>Yuck. This is why I always code:
...
>To avoid this nonsense.
BS. That is precisely the *purpose* of being able to do that.
Look at this modification of your code:
my $foo=66;
sub cartman {
print "Foo is $foo\n";
}
sub kenny {
$foo=67;
cartman();
}
That's the general problem with "global" variables anyway: they can be
changed from *anywhere* (within the scope). If you want predictability,
limit the scope of $foo to the sub, or to a block surrounding it.
local(), at least, restores the old value when the scope ends.
If you take care of what you do, dynamic scoping is a neat feature. Very
neat.
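A minimal sketch of the dynamic scoping described above, reusing the sub names from the quoted example. The one assumed change is that $foo is declared with "our" so that it is a package variable, which is what local() operates on.

```perl
use strict;
use warnings;

our $foo = 66;   # package variable: local() works on these, not on "my"

sub cartman { return "Foo is $foo" }

sub kenny {
    local $foo = 67;   # temporary value, seen by everything kenny calls
    return cartman();
}

print cartman(), "\n";   # Foo is 66
print kenny(), "\n";     # Foo is 67
print cartman(), "\n";   # Foo is 66 again: local restored the old value
```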
--
Bart.
------------------------------
Date: 19 May 2001 23:50:30 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: word doc to txt
Message-Id: <990315102.24011@itz.pp.sci.fi>
In article <slrn9g7o6f.q33.tadmc@tadmc26.august.net>, Tad McClellan wrote:
>sven <huhusven@xs4all.nl> wrote:
>>
>>I want to extract all ascii strings in a microsoft word document.
>
>If you are running on *nix, then it gets even easier:
>
> man strings
I'd like to note, however, that I've repeatedly ended up reinventing
that particular wheel in perl because the text in question has contained
8-bit characters that strings(1) doesn't consider printable but I do.
The real fun begins when the text is in some odd character set like
MacRoman, so that I first need to figure out which character is which
and then to translate them before matching. Still nothing that a perl
one-liner can't handle, though.
Actually, for M$ Word documents just removing control chars is usually
sufficient in my experience:
perl -lpe 'tr/\0-\037//d' <file.doc >file.txt
Cleaning up the result is easily done in any decent text editor.
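A rough sketch of the kind of strings(1) substitute described above, under stated assumptions: the helper name printable_runs is hypothetical, bytes 0xA0-0xFF are treated as printable (which suits Latin-1 text but not every 8-bit character set), and the default minimum run length of 4 is picked only for illustration. The sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of at least $min printable bytes,
# counting the 8-bit range 0xA0-0xFF as printable.
sub printable_runs {
    my ($data, $min) = @_;
    $min = 4 unless defined $min;
    # In list context, /g returns the captured text of every match.
    return $data =~ /([\x20-\x7E\xA0-\xFF]{$min,})/g;
}

my $data = "\x00\x01Hello w\xF6rld\x00\x02ab\x00caf\xE9 au lait\x1F";
print "$_\n" for printable_runs($data);
```

For a real file, the data would need to be read in binary mode (binmode) before matching.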
--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 942
**************************************