[23852] in Perl-Users-Digest


Perl-Users Digest, Issue: 6055 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jan 30 14:10:40 2004

Date: Fri, 30 Jan 2004 11:10:09 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 30 Jan 2004     Volume: 10 Number: 6055

Today's topics:
        regexp <jayme@dcc.ufmg.br>
    Re: regexp <noreply@gunnar.cc>
    Re: regexp <dwall@fastmail.fm>
    Re: regexp <noreply@gunnar.cc>
    Re: regexp <dwall@fastmail.fm>
        Spam Filter Pattern Matching (mossoft)
    Re: Spam Filter Pattern Matching <dwilga-MUNGE@mtholyoke.edu>
        Unexpected initial null in return from split() <craig@lucent.com>
        unused lexicals <no_spam@no_spam.com>
    Re: unused lexicals <dwall@fastmail.fm>
    Re: use of stat and argument isn't numeric message <mbroida@fake.domain>
    Re: use of stat and argument isn't numeric message <mbroida@fake.domain>
    Re: use of stat and argument isn't numeric message <mbroida@fake.domain>
    Re: {ipc - windows} -| no such command? <usenet@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 30 Jan 2004 14:09:32 -0200
From: Jayme Assuncao Casimiro <jayme@dcc.ufmg.br>
Subject: regexp
Message-Id: <Pine.GSO.4.58.0401301404560.29708@turmalina.dcc.ufmg.br>


I have this piece of html text from Amazon.com

<dt><b><a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804">1
Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies</a></b>
 ~ <NOBR><font color=#990033>Usually ships in 2-3 days</font></NOBR><dd>
Ron Gielgun / Hardcover / Published 1998
<br>
Our Price: $13.97 ~ <NOBR><font color =#990033>You Save: $5.98
(30%)</font></NOBR>
<br>
<a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804"><i>Read
more about this title...</i></a>
<p>

And I would like to use only one regexp to extract the title, the price,
and the discount in percent.

In the above example it would be:
title = 1 Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies
Price = $13.97
Discount = 30%

I have used:
	 ($title) = $_ =~ m{<a.*?>(.*?)</a>};
	 ($price) = $_ =~ m{.*Our Price:\s(\$?[\d\,.]+)};
	 ($descount) = $_ =~ m{.*You Save:.*?[\d\,.]+.*?([\d\,.]+)};

But I would like to use only one regexp.

Thanks
+---------------------------------------------+
| Jayme Assuncao Casimiro                     |
| Graduado em Ciência da Computação           |
| Estudante de Mestrado em  Computação        |
| Universidade Federal de Minas Gerais - UFMG |
+---------------------------------------------+


------------------------------

Date: Fri, 30 Jan 2004 17:28:14 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: regexp
Message-Id: <bve0jf$qg6u7$1@ID-184292.news.uni-berlin.de>

Jayme Assuncao Casimiro wrote:
> I have used:
> 	 ($title) = $_ =~ m{<a.*?>(.*?)</a>};
> 	 ($price) = $_ =~ m{.*Our Price:\s(\$?[\d\,.]+)};
> 	 ($descount) = $_ =~ m{.*You Save:.*?[\d\,.]+.*?([\d\,.]+)};
> 
> But I would like to use only one regexp.

So, what stops you?

     ($title, $price, $discount) = m{...};
------------------------------------^^^
(to be filled with the regex)

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
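
For illustration, here is one way the "..." above might be filled in.
This is only a sketch that assumes the markup looks exactly like the
sample in the original post; the heredoc text and the final whitespace
cleanup are additions for the example, not something proposed in the
thread.

```perl
use strict;
use warnings;

# Sample record, taken from the HTML quoted in the original post.
my $html = <<'HTML';
<dt><b><a
href="/exec/obidos/ASIN/0965761762/qid=917872216/sr=1-1/002-1496444-0064804">1
Business, 2 Approaches : How to Succeed in Internet Business by Employing
Real-World Strategies</a></b>
 ~ <NOBR><font color=#990033>Usually ships in 2-3 days</font></NOBR><dd>
Ron Gielgun / Hardcover / Published 1998
<br>
Our Price: $13.97 ~ <NOBR><font color =#990033>You Save: $5.98
(30%)</font></NOBR>
HTML

# One match, three captures. /s lets "." cross newlines, /x allows layout.
my ($title, $price, $discount) = $html =~ m{
    <a\b[^>]*> (.*?) </a>                 # text of the first link
    .*? Our \s Price: \s* (\$[\d,.]+)     # amount after the price label
    .*? You \s Save: [^(]* \( (\d+%) \)   # percentage inside the parentheses
}xs;

die "pattern did not match\n" unless defined $title;

# The title spans several lines in the source, so collapse the whitespace.
$title =~ s/\s+/ /g;

print "title = $title\n";
print "Price = $price\n";
print "Discount = $discount\n";
```

If Amazon changes the layout even slightly the match simply fails, which
is why the sketch checks for that case before using the captures.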



------------------------------

Date: Fri, 30 Jan 2004 16:56:53 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: regexp
Message-Id: <Xns9480798B41E9Cdkwwashere@216.168.3.30>

Jayme Assuncao Casimiro <jayme@dcc.ufmg.br> wrote:

> I have this piece of html text from Amazon.com
> 
[snip HTML]
> 
> And I would like to use only one regexp to extract the title, the price,
> and the discount in percent.

Don't do that.  Use one of the modules designed for parsing HTML.  Using REs 
to parse HTML is painful and produces easily-broken code.

-- 
David Wall


------------------------------

Date: Fri, 30 Jan 2004 18:30:52 +0100
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: regexp
Message-Id: <bve48r$rbc8q$1@ID-184292.news.uni-berlin.de>

David K. Wall wrote:
> Jayme Assuncao Casimiro <jayme@dcc.ufmg.br> wrote:
>> I have this piece of html text from Amazon.com
>> 
>> [snip HTML]
>> 
>> And I would like to use only one regexp to extract the title, the
>> price, and the discount in percent.
> 
> Don't do that.  Use one of the modules designed for parsing HTML.
> Using REs to parse HTML is painful and produces easily-broken code.

For extracting the first link and two other parts that are not
identified by help of HTML markup? Please, David, there are more
colours in this world than black and white. ;-)

perlfaq9 is less rigid:

http://www.perldoc.com/perl5.8.0/pod/perlfaq9.html#How-do-I-remove-HTML-from-a-string%2D

http://www.perldoc.com/perl5.8.0/pod/perlfaq9.html#How-do-I-extract-URLs%2D

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl



------------------------------

Date: Fri, 30 Jan 2004 19:01:20 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: regexp
Message-Id: <Xns94808EA47E00Cdkwwashere@216.168.3.30>

Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:

> David K. Wall wrote:
>> Jayme Assuncao Casimiro <jayme@dcc.ufmg.br> wrote:
>>> I have this piece of html text from Amazon.com
>>> 
>>> [snip HTML]
>>> 
>>> And I would like to use only one regexp to extract the title, the
>>> price, and the discount in percent.
>> 
>> Don't do that.  Use one of the modules designed for parsing HTML.
>> Using REs to parse HTML is painful and produces easily-broken code.
> 
> For extracting the first link and two other parts that are not
> identified by help of HTML markup? Please, David, there are more
> colours in this world than black and white. ;-)

Yeah, you're right.  <insert standard excuses>.  Thanks for the reality 
check.

-- 
David Wall


------------------------------

Date: 30 Jan 2004 05:42:09 -0800
From: mmoss26360@aol.com (mossoft)
Subject: Spam Filter Pattern Matching
Message-Id: <a6ceacab.0401300542.5c9f77d5@posting.google.com>

I use SpamAssassin as a SPAM detector, the rules for the Bayes filter
appear to be Perl based.
I need a rule which detects a string in the subject like "Re: ABCDE,
random three words", where the ABCDE bit can be between 2 and 8 upper
case characters, and I came up with:

/Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i

Does this look about right to all you experts?

Ta.

M.


------------------------------

Date: Fri, 30 Jan 2004 11:06:01 -0500
From: Dan Wilga <dwilga-MUNGE@mtholyoke.edu>
Subject: Re: Spam Filter Pattern Matching
Message-Id: <dwilga-MUNGE-1E3A1E.11060130012004@nap.mtholyoke.edu>

In article <a6ceacab.0401300542.5c9f77d5@posting.google.com>,
 mmoss26360@aol.com (mossoft) wrote:

> I use SpamAssassin as a SPAM detector, the rules for the Bayes filter
> appear to be Perl based.
> I need a rule which detects a string in the subject like "Re: ABCDE,
> random three words", where the ABCDE bit can be between 2 and 8 upper
> case characters, and I came up with:
> 
> /Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i

The one I wrote yesterday (but haven't tested yet) is:

  ^Re:\s[A-Z][A-Z]+,(\s[a-z]+){3}

I'd rather not assume the CAPS part will be from 2-8 chars, or that any 
of the individual words will be from 1-20 chars.

In my experience, these subjects always have all lowercase alphas in the 
three words after the comma, so using "." here is overkill, IMHO.

I've also found when writing regexps that \s is your friend. It's almost 
always preferable to use \s (or even \s+), rather than assume the 
character will be a real space. It might be a tab or a carriage return. 
Granted, it's not too likely in an email subject, but as a general rule 
it's very often true, and costs next to nothing.

-- 
Dan Wilga          dwilga-MUNGE@mtholyoke.edu
** Remove the -MUNGE in my address to reply **
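
The two patterns in this thread can be compared on a few subject lines.
The sketch below is an illustration only: the subject strings are made
up, and the group in the second pattern has been made non-capturing.
Note that the /i flag on the first pattern lets [A-Z] match lower-case
letters as well, and that its last lazy .{1,20}? is satisfied by a
single character.

```perl
use strict;
use warnings;

# The original poster's pattern, including its /i flag.
my $loose  = qr/Re: [A-Z]{2,8}, .{1,20}? .{1,20}? .{1,20}?/i;
# Dan Wilga's pattern (case-sensitive, anchored, \s instead of a space).
my $strict = qr/^Re:\s[A-Z][A-Z]+,(?:\s[a-z]+){3}/;

# Made-up subject lines, for illustration only.
my @subjects = (
    'Re: ABCDE, random three words',
    'Re: hello, how are you today',
    'Re: QX, only two',
);

my (%loose_hit, %strict_hit);
for my $s (@subjects) {
    $loose_hit{$s}  = $s =~ $loose  ? 1 : 0;
    $strict_hit{$s} = $s =~ $strict ? 1 : 0;
    printf "%-32s loose=%d strict=%d\n", $s, $loose_hit{$s}, $strict_hit{$s};
}
```

Only the first subject satisfies both patterns; the second is caught by
the /i version alone, which is probably not what a spam rule wants.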


------------------------------

Date: Fri, 30 Jan 2004 12:57:05 -0600
From: "Craig M. Votava" <craig@lucent.com>
Subject: Unexpected initial null in return from split()
Message-Id: <401AA901.2A889631@lucent.com>

Folks-

I'm trying to do something that should be
fairly easy and obvious:
------------------------------------------------
my %info = split(/magic/, `pkginfo -l SUNWarc`);
print $info{VERSION};
------------------------------------------------
My problem is: what magic will make this work?

Here's my closest solution (using arrays for debugging):
---------------------------------------------------------
use Data::Dumper;
my @info = split(/^\s+(\S+:)\s+/m, `pkginfo -l SUNWarc`);
print STDERR Dumper(\@info);
---------------------------------------------------------

In my environment, $info[0] is null; all the rest of the
array is exactly what I would expect. WHERE IS THIS INITIAL
NULL COMING FROM!!! Grrrr, I'm frustrated.

Any help is very much appreciated!

Thanks

-Craig Votava
Lucent Technologies
craig@lucent.com
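
A likely explanation, per the documentation for split: when the
separator matches with positive width at the very start of the string,
split returns an empty leading field. Indented output such as that of
pkginfo -l would begin with a match for /^\s+(\S+:)\s+/m, so the first
element is empty. The sketch below assumes a made-up stand-in for the
pkginfo output (no real output appears in the post) and moves the colon
outside the capture so that the keys are bare names.

```perl
use strict;
use warnings;

# Made-up stand-in for `pkginfo -l SUNWarc` output (an assumption).
my $out = "   PKGINST:  SUNWarc\n"
        . "      NAME:  Archive Libraries\n"
        . "   VERSION:  11.9.0\n";

# The separator matches at offset 0, so the first field is empty.
my @raw = split /^\s+(\S+):\s+/m, $out;

my @fields = @raw;
shift @fields if @fields && $fields[0] eq '';   # drop the empty leader
chomp @fields;                                  # values keep their newline

my %info = @fields;
print "$info{VERSION}\n";
```

An alternative that avoids the empty field altogether is a global match
in list context that captures key and value pairs directly.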


------------------------------

Date: Fri, 30 Jan 2004 13:30:36 +0000 (UTC)
From: bill <no_spam@no_spam.com>
Subject: unused lexicals
Message-Id: <bvdm9s$2ro$1@reader2.panix.com>




Is there a simple way to find the unused lexicals in Perl code?
By "unused lexicals" I mean lexicals that are mentioned only once
in their scope.

Thanks!

bill



------------------------------

Date: Fri, 30 Jan 2004 15:58:40 -0000
From: "David K. Wall" <dwall@fastmail.fm>
Subject: Re: unused lexicals
Message-Id: <Xns94806FAC09692dkwwashere@216.168.3.30>

bill <no_spam@no_spam.com> wrote:

> Is there a simple way to find the unused lexicals in Perl code?
> By "unused lexicals" I mean lexicals that are mentioned only once
> in their scope.

    use warnings;


------------------------------

Date: Fri, 30 Jan 2004 17:34:05 GMT
From: MPBroida <mbroida@fake.domain>
Subject: Re: use of stat and argument isn't numeric message
Message-Id: <401A958D.509BFCCD@fake.domain>

Eric Schwartz wrote:
> 
> MPBroida <mbroida@fake.domain> writes:
> >       Will:
> >               for (1..$a)
> >       recalculate the end of the range on EACH pass???
> 
> I don't suppose you tried it yourself to see?
> 
> $a=4;
> for(1..$a) {
>   print "$a\n";
>   $a++
> }
> __END__
> 4
> 5
> 6
> 7
> 
> This pretty clearly indicates that Perl does not recalculate the end
> value if $a changes in the middle of the loop; if it did, the loop
> would never terminate.

	Good example.  Thanks!

		Mike


------------------------------

Date: Fri, 30 Jan 2004 17:35:12 GMT
From: MPBroida <mbroida@fake.domain>
Subject: Re: use of stat and argument isn't numeric message
Message-Id: <401A95D0.900A2CA4@fake.domain>

gnari wrote:
> 
> "MPBroida" <mbroida@fake.domain> wrote in message
> news:40198E1B.5B03CEFE@fake.domain...
> > Anno Siegel wrote:
> > >
> > > gnari <gnari@simnet.is> wrote in comp.lang.perl.misc:
> > > > > > OK, so in cases where the end of the range may vary within
> > > > > > the loop, it would be BAD to use the "start..end" kind of
> > > > > > for loop.  I'll try to remember that.  :)  In other cases,
> > > > > > it is definitely much cleaner/simpler to use "..".
> > > >
> > > > just remember that .. is a list constructor
> > > >   for my $i (1..100000) {...}
> > > > iterates through a (long) list of elements
> > >
> > > But the list isn't expanded.  The compiler is clever enough to generate
> > > the elements one by one at run time.
> > >
> > > This hasn't always been so, but the feature was introduced early in
> > > Perl 5.x.
> >
> > Sounds like you disagree with gnari (unless I'm missing something).
> 
> no, not at all. he was expanding on what i said. I never said the list
> was expanded. I just said it was a list (or behaved like one)
> 
> > Alright, at the bell, come out fighting.  Need an authoritative
> > answer:
> >
> > Will:
> > for (1..$a)
> > recalculate the end of the range on EACH pass???
> > This is important if $a changes inside the loop.
> no. this is easily tested:
>   perl -le "my $a=5;for (1..$a) {print $_,$a++}"
>   15
>   26
>   37
>   48
>   59

	OK, now I see what is happening.  :)
	Thanks to all for clarifying this stuff for me.
	I'll make sure I do NOT use 1..$a and expect it to
	adjust to changes in $a.  :)

		Thanks!
			Mike
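
For contrast with the range examples quoted above, here is a minimal
sketch of the case where the bound does need to track changes: a C-style
for loop re-tests its condition on every pass. The variable names and
the cap of 6 are made up for the example.

```perl
use strict;
use warnings;

# Range form: the end of 1..$n is fixed when the loop starts.
my $n = 3;
my @range_seen;
for my $i (1 .. $n) {
    push @range_seen, $i;
    $n++ if $n < 6;          # changing $n does not extend the range
}

# C-style form: "$i <= $n" is evaluated again before every pass.
$n = 3;
my @cstyle_seen;
for (my $i = 1; $i <= $n; $i++) {
    push @cstyle_seen, $i;
    $n++ if $n < 6;          # the loop sees each new value of $n
}

print "range:   @range_seen\n";
print "c-style: @cstyle_seen\n";
```

The cap keeps the second loop finite; without it, incrementing $n on
every pass would never let the condition fail.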


------------------------------

Date: Fri, 30 Jan 2004 17:33:25 GMT
From: MPBroida <mbroida@fake.domain>
Subject: Re: use of stat and argument isn't numeric message
Message-Id: <401A9565.EFEFCAC7@fake.domain>

Ben Morrow wrote:
> 
> MPBroida <mbroida@fake.domain> wrote:
> > Anno Siegel wrote:
> > >
> > > gnari <gnari@simnet.is> wrote in comp.lang.perl.misc:
> > > > just remember that .. is a list constructor
> > > >   for my $i (1..100000) {...}
> > > > iterates through a (long) list of elements
> > >
> > > But the list isn't expanded.  The compiler is clever enough to generate
> > > the elements one by one at run time.
> > >
> > > This hasn't always been so, but the feature was introduced early in
> > > Perl 5.x.
> >
> >       Sounds like you disagree with gnari (unless I'm missing something).
> 
> Yup, you are. What Anno is saying is that
> 
>    for my $i (1..100_000) {...}
> 
> doesn't actually create a list of 100_000 elements in memory, it just
> makes a note that there are 100_000 elements to iterate over. If you
> were to write

	Ok, that is what I was missing: the fact that it (in a way)
	creates a COUNT of how many times it will loop.  And it must
	also remember the "start" (or "current") value, of course.  :)

		Mike


------------------------------

Date: Fri, 30 Jan 2004 11:28:10 +0000 (UTC)
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: {ipc - windows} -| no such command?
Message-Id: <bvdf4a$k0n$1@wisteria.csv.warwick.ac.uk>


Steven Mocking <ufo.removethisspamnote@quicknet.nl> wrote:
> Ben Morrow wrote:
> >         open STDOUT, ">&=", $TMP or die "dup2 failed: $!";
>
> That gives me:
> 
> Unknown open() mode '>&='

Sorry, stupid mistake. I meant just ">&".

Ben

-- 
"The Earth is degenerating these days. Bribery and corruption abound.
Children no longer mind their parents, every man wants to write a book,
and it is evident that the end of the world is fast approaching."
     -Assyrian stone tablet, c.2800 BC                         ben@morrow.me.uk
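
A minimal sketch of the corrected ">&" form in context, assuming the
goal is to send STDOUT to a temporary file and restore it afterwards.
The temporary file created with File::Temp stands in for $TMP, and the
save-and-restore steps are additions for the example rather than part
of the code discussed in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for $TMP from the thread (hypothetical setup).
my ($tmp, $path) = tempfile(UNLINK => 1);

# Keep a duplicate of the real STDOUT so it can be restored later.
open my $saved, '>&', \*STDOUT or die "cannot save STDOUT: $!";

# ">&" duplicates the descriptor behind $tmp onto STDOUT.
open STDOUT, '>&', $tmp or die "dup failed: $!";
print "captured line\n";

# Closing flushes the buffered output; then restore the saved copy.
close STDOUT or die "close failed: $!";
open STDOUT, '>&', $saved or die "cannot restore STDOUT: $!";

# Read back what was written while STDOUT was redirected.
open my $in, '<', $path or die "cannot read $path: $!";
my $got = do { local $/; <$in> };
close $in;

print "temp file holds: $got";
```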


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6055
***************************************
