[18774] in Perl-Users-Digest


Perl-Users Digest, Issue: 942 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat May 19 21:10:25 2001

Date: Sat, 19 May 2001 18:10:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Message-Id: <990321009-v10-i942@ruby.oce.orst.edu>
Content-Type: text

Perl-Users Digest           Sat, 19 May 2001     Volume: 10 Number: 942

Today's topics:
    Re: Pronouncing ISA (Abigail)
    Re: regexp: counting words ? <ap@andre-probst.de>
    Re: regexp: counting words ? <boqichi0@earthlink.net>
    Re: regexp: counting words ? (Tad McClellan)
    Re: regexp: counting words ? <bart.lateur@skynet.be>
    Re: regexp: delete everything between <?  ... ?> <ap@andre-probst.de>
    Re: Sorry, I solved it... Re: $myexp =~ s/\%/[aeiou]/;  <iltzu@sci.invalid>
    Re: Stubborn regex won't work <wellhaven@worldnet.att.net>
    Re: What's wrong with my scope? <iltzu@sci.invalid>
    Re: Why can't I localize a lexical variable? <bart.lateur@skynet.be>
    Re: word doc to txt <iltzu@sci.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 19 May 2001 23:20:39 +0000 (UTC)
From: abigail@foad.org (Abigail)
Subject: Re: Pronouncing ISA
Message-Id: <slrn9gdvu7.vtl.abigail@tsathoggua.rlyeh.net>

Mark Jason Dominus (mjd@plover.com) wrote on MMDCCCXVIII September
MCMXCIII in <URL:news:3b069d0c.2b5c$239@news.op.net>:
:}  In article <m1g0e1wfsr.fsf@halfdome.holdit.com>,
:}  Randal L. Schwartz <merlyn@stonehenge.com> wrote:
:} >Well, I put that in there because I had someone come up to me in class
:} >one day and start asking about the "ice-uh" variable, kinda like
:} >"ISO standard".  And he didn't grok the meaning until I said that it
:} >was "iz uh", as in "this is a that".  Bing!  His eyes lit up.  
:}  
:}  When I was teaching in Japan not too long ago, one of the students
:}  asked why the variable was named @ISA, and when I explained the
:}  reason, it turned out that several people in the class hadn't gotten it.


Actually, it took me 3 years of working with Perl to recognize the
"IS-A" meaning. I blame not only not being a native English speaker,
but also having a somewhat different view about OO.
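
For readers who have not met the variable yet, here is a minimal sketch of
the "is a" reading. The package names Animal and Dog and the speak method
are made up for illustration; only @ISA itself comes from the thread.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub speak { my ($self) = @_; return ref($self) . " makes a sound" }

package Dog;
our @ISA = ('Animal');   # read it aloud: a Dog "is a" Animal

package main;
my $dog = Dog->new(name => 'Rex');
print $dog->speak, "\n";                           # speak() is found via @ISA
print "Dog is-a Animal\n" if $dog->isa('Animal');
```

Dog defines no methods of its own, so both new() and speak() are looked up
in the packages listed in @ISA.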


Abigail
-- 
perl5.004 -wMMath::BigInt -e'$^V=Math::BigInt->new(qq]$^F$^W783$[$%9889$^F47]
 .qq]$|88768$^W596577669$%$^W5$^F3364$[$^W$^F$|838747$[8889739$%$|$^F673$%$^W]
 .qq]98$^F76777$=56]);$^U=substr($]=>$|=>5)*(q.25..($^W=@^V))=>do{print+chr$^V
%$^U;$^V/=$^U}while$^V!=$^W'


------------------------------

Date: Sat, 19 May 2001 20:01:39 +0200
From: "Andre Probst" <ap@andre-probst.de>
Subject: Re: regexp: counting words ?
Message-Id: <9e6ceg$nkm$05$1@news.t-online.com>


It works fine, but only when the case of the word matches exactly.

If  I search for "batterie" , "Batterie" doesn't count.

Can I make the regexp case-independent ?

bye, Andre

"Garry Williams" <garry@ifr.zvolve.net> wrote in message
news:slrn9gd406.1gn.garry@zfw.zvolve.net...
> On Sat, 19 May 2001 17:12:36 +0200, Andre Probst <ap@andre-probst.de> wrote:
> > For a fulltext search I have stripped HTML in a database field.
> >
> > I read the content and want to know how often the word "$searchword"
> > appears in the text.
> >
> > Is it possible to count matches and if so, how ?
>
>   $i++ while /$searchword/g;
>
> --
> Garry Williams








------------------------------

Date: Sat, 19 May 2001 18:09:50 GMT
From: Franco Luissi <boqichi0@earthlink.net>
Subject: Re: regexp: counting words ?
Message-Id: <3B06E2C7.F705C891@earthlink.net>

$i++ while /$searchword/gi;

(note the i next to the g)
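
A slightly more defensive sketch of the same idea, in case $searchword can
contain regex metacharacters or should only count as a whole word. The
subroutine name count_word and the sample text are assumptions made for
illustration, not part of the original question.

```perl
use strict;
use warnings;

# Count case-insensitive, whole-word occurrences of a literal search word.
sub count_word {
    my ($text, $searchword) = @_;
    # \Q...\E quotes regex metacharacters in the word; \b anchors at word
    # boundaries; /g finds every match; /i ignores case.
    # The "= () =" idiom forces list context and yields the match count.
    my $count = () = $text =~ /\b\Q$searchword\E\b/gi;
    return $count;
}

my $text = "Batterie leer, neue batterie kaufen, Batterien";
print count_word($text, "batterie"), "\n";   # prints 2
```

Drop the two \b anchors if substrings such as "Batterien" should count as
well. The anchors also assume the search word starts and ends with a word
character.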


Andre Probst wrote:

> It works fine, but only when the case of the word matches exactly.
>
> If  I search for "batterie" , "Batterie" doesn't count.
>
> Can I make the regexp case-independent ?
>
> bye, Andre
>
> "Garry Williams" <garry@ifr.zvolve.net> wrote in message
> news:slrn9gd406.1gn.garry@zfw.zvolve.net...
> > On Sat, 19 May 2001 17:12:36 +0200, Andre Probst <ap@andre-probst.de> wrote:
> > > For a fulltext search I have stripped HTML in a database field.
> > >
> > > I read the content and want to know how often the word "$searchword"
> > > appears in the text.
> > >
> > > Is it possible to count matches and if so, how ?
> >
> >   $i++ while /$searchword/g;
> >
> > --
> > Garry Williams



------------------------------

Date: Sat, 19 May 2001 13:39:15 -0400
From: tadmc@augustmail.com (Tad McClellan)
Subject: Re: regexp: counting words ?
Message-Id: <slrn9gdbu3.uq9.tadmc@tadmc26.august.net>

Andre Probst <ap@andre-probst.de> wrote:
>
>It works fine, but only when the case of the word matches exactly.
>
>Can I make the regexp case-independent ?


Yes.


   perldoc perlop

describes how to use Perl's pattern match operator.


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sat, 19 May 2001 20:05:27 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: regexp: counting words ?
Message-Id: <5fkdgt0ovrceok5t745kgdm93vl5lqn1o0@4ax.com>

Andre Probst wrote:

>It works fine, but only when the case of the word matches exactly.
>
>If  I search for "batterie" , "Batterie" doesn't count.
>
>Can I make the regexp case-independent ?

Yes. Add the /i modifier to your regex. See perlre.

Note that speed may drop by half.

-- 
	Bart.


------------------------------

Date: Sat, 19 May 2001 20:21:11 +0200
From: "Andre Probst" <ap@andre-probst.de>
Subject: Re: regexp: delete everything between <?  ... ?>
Message-Id: <9e6dj4$ip7$06$1@news.t-online.com>

thank you very much, that works fine and is much simpler than long regexps.

Andre


"Godzilla!" <godzilla@stomp.stomp.tokyo> wrote in message
news:3B06B192.E4C772E6@stomp.stomp.tokyo...
> Andre Probst wrote:
>
> > I want to delete PHP Code in a file, so that in result only
> > normal text is left:
>
> > My regexp doesn't work because it stops with the first ">", so
> > I don't get the desired result " textstart text between endtext".
>
> Your stated parameters for output are incorrect based upon
> this code you provide. Please be careful about stating very
> precise and exact expected results.
>
> To produce your stated parameters requires extra coding
> beyond what you display.
>
>
> > What do I have to change, that the regexp takes the last "?>" as the end of
> > the string to delete and deletes the following PHP code as well ?
>
> A complicated regex is not needed for an easy task as is yours.
>
> Godzilla!
> --
>
> TEST SCRIPT:
> ____________
>
>
> #!perl
>
> print "Content-type: text/plain\n\n";
>
> $line = '
> textstart
> <?
>  // php code start
>
>   if ($a > 0) { $allowed ="TRUE";}
>   if ($b > 0) { $allowed ="FALSE";}
>     // php code end
> ?>
>
> text between
>
> <?
>  // php code start
>
>   if ($a > 0) { $allowed ="TRUE";}
>   if ($b > 0) { $allowed ="FALSE";}
>     // php code end
> ?>
> ...
> ...
>
> endtext';
>
> do
>  {
>    $start = index ($line, "<?");
>    $stop = index ($line, "?>", $start) + 2;
>    substr ($line, $start, $stop - $start, "");
>  }
> until (index ($line, "<?") == -1);
>
> print $line;
>
> exit;
>
>
>
> PRINTED RESULTS:
> ________________
>
>
>
> textstart
>
>
> text between
>
>
> ...
> ...
>
> endtext




------------------------------

Date: 19 May 2001 22:02:16 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: Sorry, I solved it... Re: $myexp =~ s/\%/[aeiou]/; ????
Message-Id: <990308936.17465@itz.pp.sci.fi>

In article <3B039CB5.B187F3F0@fpg.de>, Gerhard Schwarz wrote:
>Damn, better read twice...
>
>> $match is a String from 0 to n length, and does only contain
>> letters.
>
>Wrong, $match can contain "+" or "%". I looked at every single line 
>of that array from where $match is taken, and found a handful of entries
>that contain either one of them.
>
>>           $match =~ s/\+/[bcdfghjklmnpqrstvwxyz]/;
>> <>        $match =~ s/\%/[aeiou\344\366\374]/;

Note that if $match is supposed to be a regex with the shorthand
notation defined above then a) '+' is a singularly bad choice, and b)
you probably ought to check that it's not already backslashed.

If instead, as I suspect, it's meant to be a simple string that only
allows these two metacharacters, then you should probably quotemeta it
first.  Note that quotemeta() backslashes all non-alphanumerics:

  $match = quotemeta($match);
  $match =~ s/\\\+/[bcdfghjklmnpqrstvwxyz]/;
  $match =~ s/\\\%/[aeiou\344\366\374]/;

Also note that you've only included the 8-bit characters 'ä', 'ö' and
'ü' in lowercase.  Unless you're using an appropriate locale, perl will
not recognise those as letters and will therefore *not* match their
uppercase equivalents even with the /i modifier.
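
To make the quotemeta approach concrete, here is a small self-contained
sketch. The subroutine name shorthand_to_regex and the sample strings are
made up for illustration, and the /g modifier is an addition on the
assumption that every '+' and '%' in the string should be expanded, not
only the first.

```perl
use strict;
use warnings;

# Turn a plain string with the '+' (consonant) and '%' (vowel) shorthand
# into a regex source string; everything else is matched literally.
sub shorthand_to_regex {
    my ($match) = @_;
    $match = quotemeta($match);                   # '+' becomes '\+', '%' becomes '\%'
    $match =~ s/\\\+/[bcdfghjklmnpqrstvwxyz]/g;   # backslashed '+' -> consonant class
    $match =~ s/\\%/[aeiou\344\366\374]/g;        # backslashed '%' -> vowel class
    return $match;
}

my $re = shorthand_to_regex('h%+.');
print "matches\n"        if "hat."  =~ /^$re\z/;
print "does not match\n" if "hatx"  !~ /^$re\z/;   # the '.' stays literal
```

Because the string went through quotemeta first, the trailing '.' matches
only a literal dot rather than any character.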

-- 
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.


------------------------------

Date: Sat, 19 May 2001 23:45:53 GMT
From: "Will Cardwell" <wellhaven@worldnet.att.net>
Subject: Re: Stubborn regex won't work
Message-Id: <RADN6.31153$t12.2361709@bgtnsc05-news.ops.worldnet.att.net>

Thanks for all the ideas. I like the reverse ones best because (I failed to
mention this) I'd also like to cover the case 'ERI000001.INO', that is, not
rely on any particular non-word char delimiting on the left.

Regards,

Will Cardwell




------------------------------

Date: 19 May 2001 22:24:48 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: What's wrong with my scope?
Message-Id: <990309984.18709@itz.pp.sci.fi>

In article <3b0408a2.448e$6a@news.op.net>, Mark Jason Dominus wrote:
>In article <jkovgn01aed.fsf@myrtle.ukc.ac.uk>,
>J.C.Posey <jcp@myrtle.ukc.ac.uk> wrote:
>>> > 	$urls{$fields[1]} += [$fields[2], $fields[3], $fields[4], $fields[5]];
 [snip]
>
>But += means to do addition of numbers.  An array is not a number.

Of course, just to confuse matters, perl does let you use a reference as
a number.  Specifically, a reference used as a number equals the address
of whatever the reference points to.  That's pretty much only useful for
comparing two references to see if they're equal.
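
A short sketch of what that means in practice. The variable names are made
up, and the push line at the end is only a guess at what the original
poster intended with +=.

```perl
use strict;
use warnings;

my @list  = (1, 2, 3);
my $ref_a = \@list;
my $ref_b = \@list;       # second reference to the same array
my $ref_c = [1, 2, 3];    # different array with equal contents

print "a and b point to the same array\n" if $ref_a == $ref_b;
print "a and c point to different arrays\n" if $ref_a != $ref_c;

# Using += with a reference adds its address as a number, so the hash
# slot ends up holding a plain number, not a reference.
my %urls;
$urls{x} += [1, 2];
print ref($urls{x}) ? "still a reference\n" : "just a number now\n";

# Probably closer to the intent: append the fields to an array per key.
push @{ $urls{y} }, 'a', 'b';
print scalar @{ $urls{y} }, " fields stored\n";
```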

The more I think about that, the more I feel the whole idea of allowing
references to be numbers was a misguided optimization.  It might have
been better to just optimize eq/ne to compare references directly.

-- 
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.


------------------------------

Date: Sat, 19 May 2001 19:49:42 GMT
From: Bart Lateur <bart.lateur@skynet.be>
Subject: Re: Why can't I localize a lexical variable?
Message-Id: <5djdgt8jdtmlgbrtb8in3k31vhirdgf3ae@4ax.com>

Clinton A. Pierce wrote:

>Because it hurts my head.    Example:
>
>	# Does not compile:
>	# Can't localize lexical...
>	my $foo=66;
>	sub cartman {
>		print "Foo is $foo\n";
>	}
>
>	sub kenny {
>		local $foo=67;
>		cartman();
>	}
>	cartman();
>
>I should be able to determine the scope of a lexical by visual 
>examination -- I don't need to know the program's flow to know what's
>in scope where.

>Yuck.  This is why I always code:
 ...
>To avoid this nonsense.


BS. That is precisely the *purpose* of being able to do that.

Look at this modification of your code:

	my $foo=66;
	sub cartman {
		print "Foo is $foo\n";
	}

	sub kenny {
		$foo=67;
		cartman();
	}

That's the general problem with "global" variables anyway: they can be
changed from *anywhere* (within the scope). If you want predictability,
limit the scope of $foo to the sub, or to a block surrounding it.

local(), at least, restores the old value when the scope ends.

If you take care of what you do, dynamic scoping is a neat feature. Very
neat.
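
A minimal sketch of that restoring behaviour, assuming $foo is declared as
a package variable with "our" instead of "my", since local() cannot be
applied to a lexical:

```perl
use strict;
use warnings;

our $foo = 66;            # package variable, so local() is allowed

sub cartman { return "Foo is $foo" }

sub kenny {
    local $foo = 67;      # temporary value, seen by everything kenny() calls
    return cartman();
}

print kenny(), "\n";      # Foo is 67
print cartman(), "\n";    # Foo is 66 (old value restored when kenny() returned)
```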

-- 
	Bart.


------------------------------

Date: 19 May 2001 23:50:30 GMT
From: Ilmari Karonen <iltzu@sci.invalid>
Subject: Re: word doc to txt
Message-Id: <990315102.24011@itz.pp.sci.fi>

In article <slrn9g7o6f.q33.tadmc@tadmc26.august.net>, Tad McClellan wrote:
>sven <huhusven@xs4all.nl> wrote:
>>
>>I want to extract all ascii strings in a microsoft word document. 
>
>If you are running on *nix, then it gets even easier:
>
>   man strings

I'd like to note, however, that I've repeatedly ended up reinventing
that particular wheel in perl because the text in question has contained
8-bit characters that strings(1) doesn't consider printable but I do.

The real fun begins when the text is in some odd character set like
MacRoman, so that I first need to figure out which character is which
and then to translate them before matching.  Still nothing that a perl
one-liner can't handle, though.

Actually, for M$ Word documents just removing control chars is usually
sufficient in my experience:

  perl -lpe 'tr/\0-\037//d' <file.doc >file.txt

Cleaning up the result is easily done in any decent text editor.
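
For the strings(1)-style approach mentioned at the top, a rough sketch
might look like the following. The subroutine name extract_strings, the
default minimum length of 4 and the decision to treat bytes 0xA0-0xFF as
printable are all assumptions, so adjust the character class to the
character set at hand.

```perl
use strict;
use warnings;

# Return all runs of at least $min "printable" bytes found in $data.
# Printable here means ASCII 0x20-0x7E plus the 8-bit range 0xA0-0xFF.
sub extract_strings {
    my ($data, $min) = @_;
    $min = 4 unless defined $min;
    return $data =~ /([\x20-\x7E\xA0-\xFF]{$min,})/g;
}

my $blob = "\0\1Hello w\xF6rld\0ab\0\2test\3";
print "$_\n" for extract_strings($blob);
```

When reading a real file, open it with binmode so that the bytes arrive
unchanged.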

-- 
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla / Kira -- do not feed the troll.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc.  For subscription or unsubscription requests, send
the single line:

	subscribe perl-users
or:
	unsubscribe perl-users

to almanac@ruby.oce.orst.edu.  

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address; I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 942
**************************************

