[23703] in Perl-Users-Digest
Perl-Users Digest, Issue: 5909 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Dec 7 18:10:39 2003
Date: Sun, 7 Dec 2003 15:10:12 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 7 Dec 2003 Volume: 10 Number: 5909
Today's topics:
Re: Idiom for array index that I'm foreach'ing over? <abigail@abigail.nl>
Re: Idiom for array index that I'm foreach'ing over? <abigail@abigail.nl>
Re: newbie's question on the text file processing? <bacchantecn@yahoo.com.cn>
Re: newbie's question on the text file processing? <jurgenex@hotmail.com>
Re: newbie's question on the text file processing? <krahnj@acm.org>
Re: Overloading <no_spam_for_jkeen@verizon.net>
Re: Perlcc and converting scripts to bytecode <bobx@linuxmail.org>
Re: Perlcc and converting scripts to bytecode <schmidt.2002@gmx.de>
Re: Perlcc and converting scripts to bytecode <bobx@linuxmail.org>
Re: read file with while and then scan lines into array <no_spam_for_jkeen@verizon.net>
Re: Why can't I parse google search results? (Kevin Shay)
Re: Why can't I parse google search results? <kkeller-usenet@wombat.san-francisco.ca.us>
Re: Why can't I parse google search results? <gisle@activestate.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 07 Dec 2003 21:56:41 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Idiom for array index that I'm foreach'ing over?
Message-Id: <slrnbt78gp.eu.abigail@alexandra.abigail.nl>
Michele Dondi (bik.mido@tiscalinet.it) wrote on MMMDCCXLVIII September
MCMXCIII in <URL:news:36evsvsojg2b2riqrs42hrjdeg0qbq23dd@4ax.com>:
}} On Thu, 4 Dec 2003 19:27:00 +0000 (UTC), Ben Morrow
}} <usenet@morrow.me.uk> wrote:
}}
}} >> $i ++; # hi Abigail
}} ^
}}
}} >I haven't been here long enough to get the 'hi Abigail', but surely
}} >that should be in a continue block so you can 'next'?
}}
}} I *think* it is because of that space. And I can't believe it: it...
}} it really works!
Well, yes, of course. This is after all Perl5, which like every other
mainstream language I know about allows optional whitespace between
tokens. The whitespace isn't significant.
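As a small illustration of the point about optional whitespace (a sketch, not code from the thread; the sub name indexed and the sample list are made up for the example), the space in "$i ++" changes nothing:

```perl
use strict;
use warnings;

# Hypothetical helper: pair each element with its index while
# foreach'ing over the list, keeping the counter by hand.
sub indexed {
    my @items = @_;
    my @pairs;
    my $i = 0;
    foreach my $item (@items) {
        push @pairs, "$i:$item";
        $i ++;    # whitespace between the tokens is allowed; same as $i++
    }
    return @pairs;
}

print join(",", indexed(qw(a b c))), "\n";    # prints 0:a,1:b,2:c
```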
Unlike Perl6, which just throws 50 years of programming language
design out of the window, passes Python on the wrong side and makes
whitespace significant in new painful and revolting ways.
Abigail
--
perl -wle'print"Κυστ αξοτθες Πεςμ Θαγλες"^"\x80"x24'
------------------------------
Date: 07 Dec 2003 21:58:57 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Idiom for array index that I'm foreach'ing over?
Message-Id: <slrnbt78l1.eu.abigail@alexandra.abigail.nl>
Tassilo v. Parseval (tassilo.parseval@rwth-aachen.de) wrote on MMMDCCXLIX
September MCMXCIII in <URL:news:bqsmhg$lso$1@nets3.rz.RWTH-Aachen.DE>:
&& Also sprach Ben Morrow:
&&
&& > Michele Dondi <bik.mido@tiscalinet.it> wrote:
&& >> >What's so unbelievable about it?
&& >>
&& >> Hmmm... maybe that I didn't think it was possible?!?
&& >
&& > As a general rule, Perl *completely* ignores whitespace. The only
&& > exception I know of is when you have two sequences of \w next to each other
&& > which should be interpreted as separate identifiers: ws is required
&& > then.
&&
&& It's a little more complicated in fact. Here's one that Abigail
&& simply adores:
&&
&& ethan@ethan:~$ perl -lw
&& print(1+2)*3;
&& Useless use of multiplication (*) in void context at - line 1.
&& 3
&& ethan@ethan:~$ perl -lw
&& print (1+2)*3;
&& print (...) interpreted as function at - line 1.
&& Useless use of multiplication (*) in void context at - line 1.
&& 3
&&
&& The additional whitespace in the second example will trigger an
&& additional warning.
Not in my perl!
$ perl -V
Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
...
Locally applied patches:
defined-or
no print (...) warning
stacked file ops
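For readers on an unpatched perl, here is a rough sketch of what the quoted example is about. The helper name captured is made up for the illustration; it only collects what a block prints so the two spellings can be compared. Parentheses directly after print are taken as its argument list, so the multiplication applies to print's return value.

```perl
use strict;
use warnings;

# Hypothetical helper: collect whatever a code block prints,
# using an in-memory filehandle (needs perl 5.8 or later).
sub captured {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    my $old = select $fh;    # make $fh the default output handle
    $code->();
    select $old;             # restore the previous default handle
    close $fh;
    return $buffer;
}

# print(1+2) * 3 parses as (print(1+2)) * 3: only the sum is printed,
# and the product of print's return value and 3 is what gets assigned.
my $surprising = captured(sub { my $unused = print(1+2) * 3 });

# An extra pair of parentheses makes the whole expression the argument.
my $intended = captured(sub { print((1+2) * 3) });

print "surprising: $surprising, intended: $intended\n";
```

Writing print +(1+2)*3 is another common way to tell perl that the parentheses are not the argument list.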
Abigail
--
sub _ {$_ = shift and y/b-yB-Y/a-yB-Y/ xor !@ _?
exit print :
print and push @_ => shift and goto &{(caller (0)) [3]}}
split // => "KsvQtbuf fbsodpmu\ni flsI " xor & _
------------------------------
Date: Mon, 8 Dec 2003 03:04:46 +0800
From: "Jim" <bacchantecn@yahoo.com.cn>
Subject: Re: newbie's question on the text file processing?
Message-Id: <bqvtl2$m1n$1@mail.cn99.com>
while(my $line = <FILE>) {
    $line =~ s/[\+\-\']/_/g;
    $line = lc $line;
    my @array = ($line =~ /\b\w+\b/g);
    foreach(@array) {
        $wordFreq{$_}++;
    }
}
Is this correct? I am not sure whether the code fulfills the requirement.
Jim
------------------------------
Date: Sun, 07 Dec 2003 19:13:33 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: newbie's question on the text file processing?
Message-Id: <xZKAb.1953$nz.1055@nwrddc01.gnilink.net>
Jim wrote:
> while(my $line = <FILE>) {
> $line =~ s/[\+\-\']/_/g;
> $line = lc $line;
> my @array = ($line =~ /\b\w+\b/g);
> foreach(@array) {
> $wordFreq{$_}++;
> }
> }
>
> Is this correct? I am not sure whether the code fulfills the
> requirement.
How can we say? You don't tell us what the code is supposed to do (i.e. what
are those ominous requirements you are referring to without actually telling
us), what kind of problems you have with that code, or why you believe it
is not correct. Just "question on text file processing" is a bit vague,
don't you think?
Posting your code is good, but it is not sufficient.
Please
- specify the requirement
- explain what the code is supposed to do (or what you think the code is
  doing)
- explain what the code is actually doing and how this differs from
  what you expect it to do
- quote literally any warning or error message you are getting
Then we may be able to help you more.
jue
------------------------------
Date: Sun, 07 Dec 2003 20:10:18 GMT
From: "John W. Krahn" <krahnj@acm.org>
Subject: Re: newbie's question on the text file processing?
Message-Id: <3FD388FD.E37555D9@acm.org>
Jim wrote:
>
> I am learning Perl and I have come across something. I would like to process
> a text file and calculate the word frequency in it. All analysis is case
> insensitive, and all punctuation marks other than hyphens, apostrophes, and
> plus and minus signs are replaced by a space. As I am a newbie, I have
> no idea how to write a complex regular expression to extract the correct
> words one by one from the file. Can anyone help me finish the script?
my %words;
while ( <> ) {
    s/[^[:alnum:]'+-]/ /g;
    $words{ lc() }++ for /\S+/g;
}
print "$_\t$words{$_}\n" for sort keys %words;
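To try the same approach without a file, here is a sketch that wraps it in a sub and feeds it a string. The sub name word_freq and the sample sentence are invented for the illustration and are not part of the original request.

```perl
use strict;
use warnings;

# Hypothetical wrapper: count case-insensitive word frequencies,
# keeping hyphens, apostrophes, and plus and minus signs inside words.
sub word_freq {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        $line =~ s/[^[:alnum:]'+-]/ /g;            # other punctuation becomes a space
        $count{ lc $_ }++ for $line =~ /\S+/g;     # every run of non-space is a word
    }
    return \%count;
}

my $freq = word_freq("Don't stop. don't STOP, well-known +5 -3!\n");
print "$_\t$freq->{$_}\n" for sort keys %$freq;
```

With this sample, "don't" and "stop" are each counted twice, and "well-known", "+5" and "-3" once each.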
John
--
use Perl;
program
fulfillment
------------------------------
Date: Sun, 07 Dec 2003 21:34:08 GMT
From: "Jim Keenan" <no_spam_for_jkeen@verizon.net>
Subject: Re: Overloading
Message-Id: <k1NAb.1396$kz2.1233@nwrdny01.gnilink.net>
"Ben Morrow" <usenet@morrow.me.uk> wrote in message
news:bqvoj3$3ih$2@wisteria.csv.warwick.ac.uk...
> "Jim Keenan" <no_spam_for_jkeen@verizon.net> wrote:
> > In attempting to reproduce your problem, I rearranged the code so as to put
> > package I at the top of the file,
>
> Why?
>
I find it makes the code more readable in cases where I'm including >1
package in a file rather than pulling one in via 'use'. I developed this
practice while working thru code examples in Damian Conway's "Object
Oriented Perl."
> > then explicitly called package main.
>
> package is lexical, so the braces make sure we go back to main::.
>
> > I also threw in some newlines for readability.
>
> ...which is why I used '#!/usr/bin/perl -l', which puts in all those
> newlines and several more.
>
Okay.
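The point quoted above, that a package declaration is lexically scoped and ends at the closing brace, can be checked with a few lines. This is a minimal sketch that is not taken from either poster's code; the array name @seen is made up.

```perl
use strict;
use warnings;

my @seen;
{
    package I;                  # in effect only until the closing brace
    push @seen, __PACKAGE__;
}
push @seen, __PACKAGE__;        # outside the block we are back in main

print "@seen\n";                # prints "I main"
```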
> > The result: package main ran without warnings.
>
> ...because you didn't turn them on early enough.
Okay ... so then I went back to your original posting and copied-and-pasted
your code exactly as you typed it. When I ran it, I was unable to reproduce
the first of the two problems you cited:
> 1. Why is 'numify' called twice for int($i)?
> numify
> Argument "*" isn't numeric in addition (+) at ./op line 15.
> numify
> Argument "*" isn't numeric in addition (+) at ./op line 15.
> 0
I did not get the first 2 lines -- only the last 3.
and Well done.
>
> Ben
>
> --
> If you put all the prophets, | You'd have so much more reason
> Mystics and saints | Than ever was born
> In one room together, | Out of all of the conflicts of time.
> ben@morrow.me.uk |----------------+---------------| The Levellers, 'Believers'
------------------------------
Date: Sun, 07 Dec 2003 20:16:32 GMT
From: Robert <bobx@linuxmail.org>
Subject: Re: Perlcc and converting scripts to bytecode
Message-Id: <AULAb.732$Zq2.670245@news2.news.adelphia.net>
Warren Bell wrote:
> I'm running Perl 5.8.2 on linux. I've heard a little about perlcc so I
> decided to try it on one of my scripts (perlcc -o index -B index.cgi)
> and I have a few questions:
>
> Will the script in bytecode run faster?
>
> Can I distribute the script in bytecode and will it work on most
> linux/unix systems with perl?
>
> Is it easy for someone to turn that bytecode back into my original source?
If you're only trying to compile a standalone executable, try PAR.
Bob
------------------------------
Date: Sun, 07 Dec 2003 17:00:57 -0500
From: Andreas Schmidt <schmidt.2002@gmx.de>
Subject: Re: Perlcc and converting scripts to bytecode
Message-Id: <oprztzfvhlk05e3a@News.CIS.DFN.DE>
On Sun, 07 Dec 2003 20:16:32 GMT, Robert <bobx@linuxmail.org> wrote:
> Warren Bell wrote:
>> I'm running Perl 5.8.2 on linux. I've heard a little about perlcc so I
>> decided to try it on one of my scripts (perlcc -o index -B index.cgi)
>> and I have a few questions:
>>
>> Will the script in bytecode run faster?
>>
>> Can I distribute the script in bytecode and will it work on most
>> linux/unix systems with perl?
>>
>> Is it easy for someone to turn that bytecode back into my original
>> source?
> If you're only trying to compile a standalone executable, try PAR.
Wow, I have never heard of Par before, but it definitely sounds cool.
What's the impact so far? Will it be as successful as JAR for java?
------------------------------
Date: Sun, 07 Dec 2003 22:23:10 GMT
From: Robert <bobx@linuxmail.org>
Subject: Re: Perlcc and converting scripts to bytecode
Message-Id: <iLNAb.762$Zq2.694037@news2.news.adelphia.net>
Andreas Schmidt wrote:
> On Sun, 07 Dec 2003 20:16:32 GMT, Robert <bobx@linuxmail.org> wrote:
>
>> Warren Bell wrote:
>>
>>> I'm running Perl 5.8.2 on linux. I've heard a little about perlcc so
>>> I decided to try it on one of my scripts (perlcc -o index -B
>>> index.cgi) and I have a few questions:
>>>
>>> Will the script in bytecode run faster?
>>>
>>> Can I distribute the script in bytecode and will it work on most
>>> linux/unix systems with perl?
>>>
>>> Is it easy for someone to turn that bytecode back into my original
>>> source?
>>
>> If you're only trying to compile a standalone executable, try PAR.
>
>
> Wow, I have never heard of Par before, but it definitely sounds cool.
> What's the impact so far? Will it be as successful as JAR for java?
It is pretty cool IMHO. I have only used it a couple of times on Windows
to produce a standalone EXE.
As far as it taking off, only the Perl gods know that one...
------------------------------
Date: Sun, 07 Dec 2003 20:44:13 GMT
From: "Jim Keenan" <no_spam_for_jkeen@verizon.net>
Subject: Re: read file with while and then scan lines into array
Message-Id: <xiMAb.2786$Ji.2697@nwrdny02.gnilink.net>
"Martin Foster" <mdfoster44@netscape.net> wrote in message
news:6a20f90a.0312070847.55fba893@posting.google.com...
> Here's the data
> .....skipping top part of file
> loop_
> _iza_sc_CoordinationSequence
> 1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
> 1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
> 1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417
>
> loop_
> _iza_sc_VertexSymbols
> 4.6.4.6.4.6
> 4.4.6.6.6.8_{3}
> 4.4.4.6.8.12
> ......skipping bottom part of file.
>
> I want to scan in the number sequences after
> _iza_sc_CoordinationSequence
> into an array and then into MySQL.
>
Here is a solution which (a) assumes that the target lines all follow a
pattern of "unsigned integers separated by a single whitespace" and (b)
stores the results in a hash of arrays of arrays. I leave to you the task
of feeding this into MySQL.
jimk
##### START CODE BLOCK #################
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my (@chunks, %results);
{
    local $/ = "\n\n";    # slurp data in by 'paragraphs'
    while (<DATA>) {
        # ignore all chunks except ones that contain this string
        next unless /_iza_sc_CoordinationSequence/;
        push (@chunks, $_);
    }
}

for (my $i = 0; $i <= $#chunks; $i++) {
    my (@lines, @sequences);
    @lines = split(/\n/, $chunks[$i]);
    foreach my $line (@lines) {
        if ($line =~ /^(\d+\s)+\d+\s*$/) {
            push(@sequences, [ split(/\s/, $line) ]);
        }
    }
    $results{$i} = [@sequences];
}
print Dumper(\%results);

__DATA__
loop_
_iza_sc_CoordinationSequence
1 4 9 17 28 42 60 82 111 149 191 229 262 297 336 384
1 4 10 19 30 44 63 89 121 155 188 221 258 302 355 415
1 4 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_VertexSymbols
4.6.4.6.4.6
4.4.6.6.6.8_{3}
4.4.4.6.8.12

loop_
_iza_sc_SomethingElse
3 7 9 17 28 42 60 82 111 149 191 229 262 297 336 384
3 7 10 19 30 44 63 89 121 155 188 221 258 302 355 415
3 7 9 18 32 49 68 89 114 144 179 221 267 314 364 417

loop_
_iza_sc_CoordinationSequence
5 8 9 17 28 42 60 82 111 149 191 229 262 297 336 384
5 8 10 19 30 44 63 89 121 155 188 221 258 302 355 415
5 8 9 18 32 49 68 89 114 144 179 221 267 314 364 417
##### END CODE BLOCK #################
If we were playing Perl Golf and wanted to trade off readability for
brevity, we could re-write the 'for' loop as:
for (my $i = 0; $i <= $#chunks; $i++) {
    my (@sequences);
    foreach (split(/\n/, $chunks[$i])) {
        push(@sequences, [ split(/\s/) ]) if (/^(\d+\s)+\d+\s*$/);
    }
    $results{$i} = [@sequences];
}
------------------------------
Date: 7 Dec 2003 11:34:54 -0800
From: kevin_shay@yahoo.com (Kevin Shay)
Subject: Re: Why can't I parse google search results?
Message-Id: <5550ef1e.0312071134.5827276@posting.google.com>
utsuxs@hotmail.com (bob) wrote in message
news:<51c3a5d3.0312070801.5093c8cf@posting.google.com>...
> I'm trying to extract data from the results page of search engines
> with these two
> modules use LWP::Simple and HTML::Parse, and the get command.
>
> I can extract from yahoo and altavista but google is not cooperating.
>
> I get this error message
>
> Can't fetch HTML from http://www.google.com/search?q=smeghead at
> parsing.pl line 13.
It appears Google won't give you a page unless you send a User-Agent
header with the request, which LWP::Simple doesn't do. Try using
LWP::UserAgent instead.
http://www.perldoc.com/perl5.8.0/lib/LWP/UserAgent.html
Note that fetching Google results programmatically is most likely a
violation of Google's Terms of Service. Not that there would be any
consequences, but I thought I'd point this out. If you wanted to be
above-board about it, you could use the Google API:
http://www.google.com/apis/
--Kevin
------------------------------
Date: Sun, 7 Dec 2003 12:10:24 -0800
From: Keith Keller <kkeller-usenet@wombat.san-francisco.ca.us>
Subject: Re: Why can't I parse google search results?
Message-Id: <gf10rb.s2v.ln@goaway.wombat.san-francisco.ca.us>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2003-12-07, Kevin Shay <kevin_shay@yahoo.com> wrote:
>
> Note that fetching Google results programmatically is most likely a
> violation of Google's Terms of Service.
I'm not so sure--the closest violation would be for an ''offline''
search of Google, but since they don't define what that's supposed to
mean, I'd bet that running a script that performs a Google search would
be fine. Putting said script into a cron job might not be fine, but who
knows?
> If you wanted to be
> above-board about it, you could use the Google API:
>
> http://www.google.com/apis/
The Google API has restrictions as well--IIRC you're limited to 100
searches a day. :)
- --keith
- --
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
iD8DBQE/04krhVcNCxZ5ID8RAqx1AKCWeKroZ7F01g+39gSy4cGQwYRxPwCePnhl
gfINSpNyZx2zIbuWZqtqTbM=
=8Nv/
-----END PGP SIGNATURE-----
------------------------------
Date: 07 Dec 2003 14:41:53 -0800
From: Gisle Aas <gisle@activestate.com>
Subject: Re: Why can't I parse google search results?
Message-Id: <m3smjwp7wu.fsf@eik.i-did-not-set--mail-host-address--so-shoot-me>
kevin_shay@yahoo.com (Kevin Shay) writes:
> It appears Google won't give you a page unless you send a User-Agent
> header with the request, which LWP::Simple doesn't do.
This is not true. LWP::Simple does send a User-Agent header. The problem
here is that Google blocks requests with the default LWP User-Agent
header.
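A minimal sketch of replacing the default User-Agent string with LWP::UserAgent follows. The agent name ExampleFetcher/0.1 and the sub names are placeholders invented for this illustration, and nothing here says which agent strings Google accepts or what its Terms of Service allow. The script only performs a request when a URL is given on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that identifies itself with
# $name instead of the default "libwww-perl/x.y" string.
sub make_agent {
    my ($name) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent($name);
    return $ua;
}

# Hypothetical helper: fetch a page or die with the HTTP status line.
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "Can't fetch $url: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

my $ua = make_agent('ExampleFetcher/0.1');    # placeholder name
print "User-Agent: ", $ua->agent, "\n";
print fetch($ua, $ARGV[0]) if @ARGV;          # no request without a URL argument
```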
--
Gisle Aas
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5909
***************************************