[31625] in Perl-Users-Digest
Perl-Users Digest, Issue: 2884 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Mar 24 18:09:46 2010
Date: Wed, 24 Mar 2010 15:09:26 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 24 Mar 2010 Volume: 11 Number: 2884
Today's topics:
Re: case study: want to print redirects when LWP::UserA <hjp-usenet2@hjp.at>
Does $^N only refer to capturing groups? (Seymour J.)
Re: Does $^N only refer to capturing groups? <derykus@gmail.com>
Re: Does $^N only refer to capturing groups? <ben@morrow.me.uk>
Re: FAQ 5.17 Is there a leak/bug in glob()? <brian.d.foy@gmail.com>
Inline::Python <eric@fruitcom.com>
Re: Inline::Python <ben@morrow.me.uk>
Re: Inline::Python <eric@fruitcom.com>
Perl & Get web content Perl-Function [Expert] <normancougloff@gmail.com>
Re: Perl & Get web content Perl-Function [Expert] <tadmc@seesig.invalid>
Re: Perl & Get web content Perl-Function [Expert] <jurgenex@hotmail.com>
Re: Perl & Get web content Perl-Function [Expert] <normancougloff@gmail.com>
Re: Perl & Get web content Perl-Function [Expert] (Randal L. Schwartz)
Re: Perl & Get web content Perl-Function [Expert] <ben@morrow.me.uk>
Re: Perl & Get web content Perl-Function [Expert] <jurgenex@hotmail.com>
Re: Perl HTML searching <KBfoMe@realdomain.net>
Re: Perl HTML searching <jurgenex@hotmail.com>
Re: Perl HTML searching <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 23 Mar 2010 19:18:42 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: case study: want to print redirects when LWP::UserAgent makes a request
Message-Id: <slrnhqi1g2.f41.hjp-usenet2@hrunkner.hjp.at>
On 2010-03-22 20:16, Bennett Haselton <bennett@peacefire.org> wrote:
> Suppose you're an experienced Perl programmer but you don't know the
> answer to this particular question. I look first in the documentation
> page UserAgent.html. I see it says "The difference from request() is
> that simple_request() will not try to handle redirects or
> authentication responses. The request() method will in fact invoke
> this method for each simple request it sends.", so I know about the
> difference between those two.
>
> The next thing that stands out is:
> "$ua->redirect_ok( $request )
> This method is called by request() before it tries to follow a
> redirection to the request in $request. This should return a TRUE
> value if this redirection is permissible."
> It's not clear whether LWP::UserAgent calls this for *every* redirect,
> or just the first one. (Taken literally, the documentation says it
> will only do it for the first one -- "before it tries to follow a
> redirection to the request in $request".)
I agree that this isn't crystal-clear, but I don't think your "literal"
interpretation is reasonable: Note that it says "redirection TO the
request in $request". So if you have a redirect cascade
    http://foo.example.com
    -> http://www.example.com/foo
    -> http://www.example.com/show.pl?doc=foo
then it will first be called with $request "GET http://www.example.com/foo",
and, if that was allowed and succeeded, again with the new $request
"GET http://www.example.com/show.pl?doc=foo".
For current versions of LWP::UserAgent, redirect_ok takes two arguments,
btw - the new request and the response which caused the redirect.
> But still, that would mean
> I'd have to create a subclass of LWP::UserAgent to override behavior
> of this function.
Yes. If the default redirect_ok isn't sufficient (AFAIK you can turn on
and off redirects for each request type (GET, POST, ...), but that's it)
then subclassing and overriding this method is the cleanest (and
probably simplest) way.
> I'm hoping there's something easier.
It sounds like just turning off redirects (see requests_redirectable) or
using simple_request is sufficient.
> So that's where I give up and ask someone. What would you do? Is
> there something in the documentation that would make it obvious to you
> what to try next? What do I need to train myself to look for?
What I generally do when the documentation isn't sufficiently clear is
to write small test scripts which use the function I want to explore. I
may even use the debugger to step through this function and see what it
does.
I'm rather impatient though (as a Perl programmer I can admit that) and
if I can't figure out how something works within a short time I write it
myself. So supposing I couldn't figure out how redirect_ok was supposed
to work, I'd just fall back to using simple_request and write an extra
loop to handle the redirects.
hp
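A minimal sketch of the subclassing route discussed above, assuming a version of LWP::UserAgent in which redirect_ok receives the new request and the response that caused the redirect. The package name PrintingUA is made up for illustration. The override only prints each hop and then defers to the stock policy, so it should not change which redirects are followed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical subclass; the name PrintingUA is not part of LWP.
    package PrintingUA;
    use parent 'LWP::UserAgent';

    # Per the documentation quoted above, request() calls redirect_ok
    # before it follows a redirection.
    sub redirect_ok {
        my ($self, $new_request, $response) = @_;
        my $from = $response->request ? $response->request->uri : '(unknown)';
        print "redirect: $from -> ", $new_request->uri, "\n";
        # Leave the actual yes/no decision to the default implementation.
        return $self->SUPER::redirect_ok($new_request, $response);
    }
}

# Usage: perl script.pl http://foo.example.com
if (my $url = shift @ARGV) {
    my $response = PrintingUA->new->get($url);
    print "final: ", $response->status_line, "\n";
}
```

Printing inside redirect_ok keeps the redirect handling in request() itself, which is less code than a hand-written loop around simple_request.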
------------------------------
Date: Wed, 24 Mar 2010 11:46:05 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Does $^N only refer to capturing groups?
Message-Id: <4baa33bd$1$fuzhry+tra$mr2ice@news.patriot.net>
In the Perl 5.10 documentation of $^N, does "group" refer only to
capturing groups? The context is that I'd like to write something like
 qr/(?:\d\d)
     (?({$^N > 24})
       (*FAIL)
     )
   /x
and have no need to capture the \d\d other than the range check.
Thanks.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Wed, 24 Mar 2010 13:41:46 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Does $^N only refer to capturing groups?
Message-Id: <a6953f6b-3031-4cfd-ac29-ea695839e670@g1g2000pre.googlegroups.com>
On Mar 24, 8:46 am, Shmuel (Seymour J.) Metz
<spamt...@library.lspace.org.invalid> wrote:
> In the Perl 5.10 documentation of $^N, does "group" refer only to
> capturing groups? The context is that I'd like to write something like
>
>  qr/(?:\d\d)
>      (?({$^N > 24})
>        (*FAIL)
>      )
>    /x
>
> and have no need to capture the \d\d other than the range check.
No, only capturing groups.
perl -wle '"12" =~ /(?:\d)(?{print $^N}) (\d)(?{print $^N})/x'
Use of uninitialized value in print ... line 1.
2
perldoc perlvar has an example but with a
non-capturing submatch within that's a bit
distracting.
--
Charles DeRykus
------------------------------
Date: Wed, 24 Mar 2010 21:03:43 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Does $^N only refer to capturing groups?
Message-Id: <fabq77-5a2.ln1@osiris.mauzo.dyndns.org>
Quoth Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>:
> In the Perl 5.10 documentation of $^N, does "group" refer only to
> capturing groups?
Yes, though it refers to both ordinary () capture groups and the new
(?<name>...) named capture groups.
> The context is that I'd like to write something like
>
>  qr/(?:\d\d)
>      (?({$^N > 24})
>        (*FAIL)
>      )
>    /x
>
> and have no need to capture the \d\d other than the range check.
I don't think that's possible directly, though if your (?:\d\d) is
fixed-length you can use something like
    qr/(?:\d\d)
        (?(?{ substr(${^MATCH}, -2) > 24 })
            (*FAIL)
        )
    /xp
since during a match ${^MATCH} contains only the portion matched so far.
(I used ${^MATCH} and /p rather than $& since you're already depending
on 5.10.) Just don't try to run another match inside a (?{}) block, as
the regex engine isn't properly reentrant and may segfault.
Ben
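A small sketch of the two points made in this thread, using core Perl only (5.10 or later for (*FAIL)). The first match records what $^N holds after a non-capturing and then a capturing group. The sub name hour_ok is made up for illustration and shows the range check written with a capturing group, which is an assumption about what would be acceptable here, not the poster's requirement.

```perl
use strict;
use warnings;

# Record what $^N holds at two points in one match.
my @seen;
"12" =~ / (?:\d) (?{ push @seen, defined $^N ? $^N : 'undef' })
          (\d)   (?{ push @seen, $^N })
        /x;
print "seen: @seen\n";

# Range check using a capturing group, so that $^N has something to refer to.
# (?(?{ COND }) YES ) runs COND as Perl code and applies YES only if it is true.
sub hour_ok {
    my ($text) = @_;
    return $text =~ / \A (\d\d) (?(?{ $^N > 24 }) (*FAIL) ) \z /x ? 1 : 0;
}

print "$_: ", (hour_ok($_) ? "ok" : "rejected"), "\n" for qw(07 24 25);
```

After the non-capturing group $^N is still undefined; it is only set once the capturing (\d) has closed.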
------------------------------
Date: Tue, 23 Mar 2010 13:22:43 -0500
From: brian d foy <brian.d.foy@gmail.com>
Subject: Re: FAQ 5.17 Is there a leak/bug in glob()?
Message-Id: <230320101322433621%brian.d.foy@gmail.com>
In article <220320101434074137%brian.d.foy@gmail.com>, brian d foy
<brian.d.foy@gmail.com> wrote:
> In article <lnhbob53qf.fsf@nuthaus.mib.org>, Keith Thompson
> <kst-u@mib.org> wrote:
>
> > PerlFAQ Server <brian@theperlreview.com> writes:
> > > 5.17: Is there a leak/bug in glob()?
> > >
> > > Due to the current implementation on some operating systems, when you
> > > use the glob() function or its angle-bracket alias in a scalar
> > > context,
> > > you may cause a memory leak and/or unpredictable behavior. It's best
> > > therefore to use glob() only in list context.
> >
> > How old is this FAQ? Is the leak still there?
>
> The FAQ is pretty old, but I don't know for sure if the leak is still
> there. I'll see what I can find out.
Actually, this isn't even the current answer in the FAQ. I thought that
was really odd so I don't know how this snuck in there.
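Whether or not the leak described in the quoted FAQ text still exists, list-context use of glob() looks roughly like this. A minimal sketch using a temporary directory so that it is self-contained; the file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch directory with a few empty files to match against.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.log)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

# List context: glob() returns all matches in one call, so there is no
# iterator left half-consumed between calls as there can be in scalar context.
my @txt = glob("$dir/*.txt");
print scalar(@txt), " .txt files\n";
```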
------------------------------
Date: 23 Mar 2010 22:19:53 GMT
From: Eric smith <eric@fruitcom.com>
Subject: Inline::Python
Message-Id: <slrnhqifk9.ct.eric@pepper.fruitcom.com>
In order to use the google data API without using (much) python,
I want to use Inline::Python.
The following code works, however, I do not know how to call the
function CreateEntry() from the ContactsSample class in
the perl code.
Thanks for any help with this.
Eric Smith
#!/usr/bin/perl
print "Hello from perl\n";
use Inline Python => <<'END_OF_PYTHON_CODE';
import sys
import getopt
import getpass
import atom
import gdata.contacts
import gdata.contacts.service
class ContactsSample(object):
    """ContactsSample object demonstrates operations with the Contacts feed."""
    def __init__(self, email, password):
        """Constructor for the ContactsSample object."""
        self.gd_client = gdata.contacts.service.ContactsService()
        self.gd_client.email = email
        self.gd_client.password = password
        self.gd_client.source = 'GoogleInc-ContactsPythonSample-1'
        self.gd_client.ProgrammaticLogin()
    def CreateEntry(self):
        name = 'John Doe'
        notes = "Something about him"
        primary_email = 'email@domain.com'
        new_contact = gdata.contacts.ContactEntry(title=atom.Title(text=name))
        new_contact.content = atom.Content(text=notes)
        new_contact.email.append(gdata.contacts.Email(address=primary_email,
            primary='true', rel=gdata.contacts.REL_WORK))
        entry = self.gd_client.CreateContact(new_contact)
        if entry:
            print 'Creation successful!'
    def Run(self):
        self.CreateEntry()
def main():
    user = 'XXXXXXXXX'
    pw = 'XXXXXXXX'
    try:
        sample = ContactsSample(user, pw)
    except gdata.service.BadAuthentication:
        print 'Invalid user credentials given.'
        return
    sample.Run()
if __name__ == '__main__':
    main()
END_OF_PYTHON_CODE
------------------------------
Date: Tue, 23 Mar 2010 22:39:44 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Inline::Python
Message-Id: <gisn77-fvl2.ln1@osiris.mauzo.dyndns.org>
Quoth eric@fruitcom.com:
> In order to use the google data API without using (much) python,
> I want to use Inline::Python.
>
> The following code works, however, I do not know how to call the
> function CreateEntry() from the ContactsSample class in
> the perl code.
<snip>
>
> #!/usr/bin/perl
>
> print "Hello from perl\n";
>
> use Inline Python => <<'END_OF_PYTHON_CODE';
<snip>
>
> class ContactsSample(object):
> """ContactsSample object demonstrates operations with the Contacts feed."""
>
> def __init__(self, email, password):
<snip>
>
> def CreateEntry(self):
<snip>
I've never used Inline::Python, but did you try (in Perl)
my $cs = ContactsSample->new;
$cs->CreateEntry;
That's usually how Inline modules work: they just import the appropriate
symbols into your Perl namespace.
Ben
------------------------------
Date: 24 Mar 2010 20:43:20 GMT
From: Eric smith <eric@fruitcom.com>
Subject: Re: Inline::Python
Message-Id: <slrnhqkub8.blc.eric@pepper.fruitcom.com>
Thanks Ben, but no dice alas.
I also tried this from the manual:
py_bind_class("Inline::Python", "__main__", "ContactsSample",\
"set_data", "get_data");
my $o = new ContactsSample;
Both respond with:
Can't locate object method "new" via package "ContactsSample"
Any suggestions / help?
Thanks
Eric
On 2010-03-23, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth eric@fruitcom.com:
>> In order to use the google data API without using (much) python,
>> I want to use Inline::Python.
>>
>> The following code works, however, I do not know how to call the
>> function CreateEntry() from the ContactsSample class in
>> the perl code.
><snip>
>>
>> #!/usr/bin/perl
>>
>> print "Hello from perl\n";
>>
>> use Inline Python => <<'END_OF_PYTHON_CODE';
><snip>
>>
>> class ContactsSample(object):
>> """ContactsSample object demonstrates operations with the Contacts feed."""
>>
>> def __init__(self, email, password):
><snip>
>>
>> def CreateEntry(self):
><snip>
>
> I've never used Inline::Python, but did you try (in Perl)
>
> my $cs = ContactsSample->new;
> $cs->CreateEntry;
>
> That's usually how Inline modules work: they just import the appropriate
> symbols into your Perl namespace.
>
> Ben
>
--
Eric Smith
------------------------------
Date: Wed, 24 Mar 2010 03:10:22 -0700 (PDT)
From: Pseudonyme <normancougloff@gmail.com>
Subject: Perl & Get web content Perl-Function [Expert]
Message-Id: <0f711170-728e-4862-a587-1c97313be721@g10g2000yqh.googlegroups.com>
Hi all !
I have some questions to you, experts, that relate to the PERL
LANGUAGE
1. Is that possible to recognize the platform of a website ?
From example : If I take this website : http://www.latimes.com
How do I know if it is a PERL, a PHP or ASP ?
Is there a trick ?
2. The Get-Web-Content PERL function.
How to get the content using PERL script ?
In PHP : it is like this :
$homepage = file_get_contents('http://andersen.times.com/adveruser/adverpay.php?country=1270331545&time=1236928998');
echo $homepage;
What about in PERL ?
Thank you very much dear Madams and Sirs, for your answer.
Norman Cougloff
------------------------------
Date: Wed, 24 Mar 2010 07:58:56 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <slrnhqk2sv.slh.tadmc@tadbox.sbcglobal.net>
[ alt.perl trimmed from newsgroups, I don't participate in alt.* groups ]
Pseudonyme <normancougloff@gmail.com> wrote:
> Subject: Re: Perl & Get web content Perl-Function [Expert]
Please see the Posting Guidelines that are posted here frequently.
> I have some questions to you, experts, that relate to the PERL
> LANGUAGE
I don't know of any PERL language, though I know a bit about the Perl
language.
perldoc -q difference
What's the difference between "perl" and "Perl"?
> 1. Is that possible to recognize the platform of a website ?
You can *sometimes* glean the OS or web server being used by
examining the HTTP headers.
use WWW::Mechanize;
my $agent = WWW::Mechanize->new();
my $response = $agent->head( 'http://www.latimes.com' );
print $response->as_string();
> From example : If I take this website : http://www.latimes.com
> How do I know if it is a PERL, a PHP or ASP ?
> Is there a trick ?
The programming language used is not a "platform".
In fact, a website may be implemented using ALL of those languages.
Why do you think that you need to know the "platform" of a website?
> 2. The Get-Web-Content PERL function.
> How to get the content using PERL script ?
You should check the Perl FAQ before asking in the Perl newsgroup.
perldoc -q HTML
How do I fetch an HTML file?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
------------------------------
Date: Wed, 24 Mar 2010 08:54:53 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <pickq5dnhme0thbd528a9v30fvhr0t88ae@4ax.com>
Pseudonyme <normancougloff@gmail.com> wrote:
>Hi all !
Dear Pseudonyme
>I have some questions to you, experts, that relate to the PERL
>LANGUAGE
I think you meant "Perl language"?
>1. Is that possible to recognize the platform of a website ?
The OS can be found in $^O.
>From example : If I take this website : http://www.latimes.com
>How do I know if it is a PERL, a PHP or ASP ?
>Is there a trick ?
Oh, you didn't mean OS when you said platform. No, there isn't and there
can't be because
- the HyperTextTransferProtocol doesn't provide this information
- the web server can map any URI into whatever the user has configured
and can reply with whatever it feels appropriate to use
- any web site can use none or any combination of those and others
BTW: this question has nothing to do with Perl but is about how HTTP and
web servers work.
>2. The Get-Web-Content PERL function.
>How to get the content using PERL script ?
You use get() from WWW::Mechanize.
jue
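A rough Perl counterpart to the PHP file_get_contents() call in the question. The replies point to LWP and WWW::Mechanize; this sketch instead uses HTTP::Tiny, which ships with Perl 5.14 and later, purely so that it runs without installing anything from CPAN. The sub name fetch_page is made up for illustration. It also returns the Server header, which is the kind of hint that can sometimes be gleaned from the HTTP headers.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch a URL and return (body, Server header).
# Both are undef if the request did not succeed.
sub fetch_page {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->get($url);
    return (undef, undef) unless $response->{success};
    # HTTP::Tiny stores header names in lower case.
    return ($response->{content}, $response->{headers}{server});
}

# Usage: perl fetch.pl http://www.example.com/
if (my $url = shift @ARGV) {
    my ($html, $server) = fetch_page($url);
    if (defined $html) {
        printf "Fetched %d bytes; Server header: %s\n",
            length $html, defined $server ? $server : '(not sent)';
    }
    else {
        print "Fetch failed\n";
    }
}
```

A Server header, when present, describes the web server software and says nothing reliable about the language the site is written in.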
------------------------------
Date: Wed, 24 Mar 2010 10:14:13 -0700 (PDT)
From: Pseudonyme <normancougloff@gmail.com>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <9e6bc745-9e48-47a3-af86-30ab299169ef@z3g2000yqz.googlegroups.com>
Thank you Madams and Sirs,
get() from WWW::Mechanize.
That one sounds a nice function. Cool ! I will search the options that
relate to that function ?
1 - PHP versus PERL
The Perl language seems more powerful than the PHP Language. What kind
of music sounds to your ears about this sentence ?
2 - PHP into PERL ?
An automatic tool to convert ?
A translator like translating italian into english ... some tools work
so well in that matter !
3 - PERL is also a community like PHP ?
perldoc -q and http://perldoc.perl.org/
MSFT offers for example a fantastic and exhaustive documentation for C++
Development (MSDN). How is the documentation with Perl ?
Again, the answers we have from you are very important, and thank you
for that
Norman Cougloff.
How
------------------------------
Date: Wed, 24 Mar 2010 10:25:27 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
To: Pseudonyme <normancougloff@gmail.com>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <867hp14ooo.fsf@blue.stonehenge.com>
>>>>> "Pseudonyme" == Pseudonyme <normancougloff@gmail.com> writes:
Pseudonyme> 1 - PHP versus PERL
Pseudonyme> The Perl language seems more powerful than the PHP Language. What
Pseudonyme> kind of music sounds to your ears about this sentence ?
cha-ching.
:-)
Pseudonyme> 2 - PHP into PERL ?
Pseudonyme> An automatic tool to convert ?
Nothing I know of.
Pseudonyme> 3 - PERL is also a community like PHP ?
Pseudonyme> perldoc -q and http://perldoc.perl.org/
Pseudonyme> MSFT offers for example a fantastic and exhaustive documentation for C++
Pseudonyme> Development (MSDN). How is the documentation with Perl ?
Better than many projects, but spotty coverage in places.
print "Just another Perl hacker,"; # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
------------------------------
Date: Wed, 24 Mar 2010 18:15:29 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <1f1q77-3k.ln1@osiris.mauzo.dyndns.org>
Quoth Pseudonyme <normancougloff@gmail.com>:
>
> get() from WWW::Mechanize.
> That one sounds a nice function. Cool ! I will search the options that
> relate to that function ?
>
> 1 - PHP versus PERL
>
> The Perl language seems more powerful than the PHP Language. What kind
> of music sounds to your ears about this sentence ?
I'm not interested in language advocacy debates. Anyone who will
willingly use PHP is welcome to it.
> 2 - PHP into PERL ?
> An automatic tool to convert ?
There isn't any such thing. It would be fairly difficult to write, and
there isn't really much point. (Unless of course you count
| : I've heard that there is a shell (bourne or csh) to perl filter, does
| : anyone know of this or where I can get it?
|
| Yeah, you filter it through Tom Christiansen. :-)
-- Larry Wall
:).)
> 3 - PERL is also a community like PHP ?
>
> perldoc -q and http://perldoc.perl.org/
Perl is certainly a community. See http://perl.org/. perldoc -q searches
the Perl FAQ, which is maintained by brian d foy based on suggestions
from all of us.
> MSFT offers for example a fantastic and exhaustive documentation for C++
> Development (MSDN). How is the documentation with Perl ?
Perl documentation is generally very complete, but not always terribly
easy to follow. For instance, it can be quite hard to find the answer to
'Which module should I use for X?': searching for 'X' on search.cpan.org
might turn up several modules, and it can be difficult to find out which
is generally considered best. Asking here, or elsewhere
(http://perlmonks.org/ is popular) is often a good strategy.
Ben
------------------------------
Date: Wed, 24 Mar 2010 11:22:48 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Perl & Get web content Perl-Function [Expert]
Message-Id: <jalkq55fcdc7a4t5cbng6ccjggkauv6qig@4ax.com>
Pseudonyme <normancougloff@gmail.com> wrote:
>1 - PHP versus PERL
>
>The Perl language seems more powerful than the PHP Language.
Different design, different purpose, different application area,
partially overlapping usage.
It's like saying a pickup truck is more useful than a suburban. For a
contractor that is certainly true, for a soccer mum certainly not.
>2 - PHP into PERL ?
>An automatic tool to convert ?
Not that I know of and given that they are rather different languages
with rather different design and different application areas I doubt
that anyone has bothered writing such a tool.
>3 - PERL is also a community like PHP ?
Perl is _not_ an acronym. If you are talking about the programming
language, then it is properly spelled as "Perl", if you are talking
about the interpreter it is properly spelled as "perl".
PERL in all caps is something different, I don't know what it is, but it
is not related to what this NG is all about.
>perldoc -q and http://perldoc.perl.org/
>
>MSFT offers for example a fantastic and exhaustive documentation for C++
>Development (MSDN). How is the documentation with Perl ?
Perl's documentation is extensive and it is automatically installed with
every Perl installation to give you instant access as well as to match
the Perl version you are using, see 'perldoc perl' for a top-level
overview of what topics are available.
And there are also numerous good books on various topics ranging from
basic introduction into Perl to major software projects like managing
databases or large web sites or complex IT situations with Perl.
Unfortunately there are also quite a few not so good books as well as
many horrible web sites about Perl. See "perldoc -q books" for a list of
recommended titles.
jue
------------------------------
Date: Wed, 24 Mar 2010 13:54:34 -0500
From: "Kyle T. Jones" <KBfoMe@realdomain.net>
Subject: Re: Perl HTML searching
Message-Id: <hodn5b$53d$1@news.eternal-september.org>
Tad McClellan wrote:
> Kyle T. Jones <KBfoMe@realdomain.net> wrote:
>> Steve wrote:
>
>>> like lets say I searched a site
>>> that had 15 news links and 3 of them said "Hello" in the title. I
>>> would want to extract only the links that said hello in the title.
>> Read up on perl regular expressions.
>
>
> While reading up on regular expressions is certainly a good idea,
> it is a horrid idea for the purposes of parsing HTML.
>
Ummm. Could you expand on that?
My initial reaction would be something like - I'm pretty sure *any*
method, including the use of HTML::LinkExtor, or XML transform (both
outlined upthread), involves using regular expressions "for the purposes
of parsing HTML".
At best, you're just abstracting the regex work back to the includes.
AFAIK, and feel free to correct me (I'll go take a look at some of the
relevant module code in a bit), every CPAN module that is involved with
parsing HTML uses fairly straightforward regex matching somewhere within
that module's methods.
I think there's an argument that, considering you can do this so easily
(in under 15 lines of code) without the overhead of unnecessary
includes, my way would be more efficient. We can run some benchmarks if
you want (see further down for working code).
> Have you read the FAQ answers that mention HTML?
>
> perldoc -q HTML
>
>
>> for instance, taking the above, you might first split it into a
>> "one-line per" array -
>>
>> @stuff=split(/\n/, $content);
>>
>> then parse each line for hello -
>>
>> foreach(@stuff){
>> if($_=~/Hello/){
>> do whatever;}
>> }
>
>
> The code below prints "do whatever" 3 times, but there is only one link
> containing "Hello"...
>
I should have been clearer - the above wasn't a "solution", meant to be
copied, pasted, and put into use - it was just meant to illustrate the
basic operation.
I think this works fine:
#!/usr/bin/perl -w
use strict;
use warnings;
use LWP::Simple;
my $targeturl="http://www.google.com";
my $searchstring="google";
my $contents=get($targeturl);
my @semiparsed=split(/href/i, $contents);
foreach(@semiparsed){
    if($_=~/^\s*=\s*('|")(.*?)('|")/){
        my $link=$2;
        if($link=~/$searchstring/i){
            print "Link: $link\n";
        }
    }
}
OUTPUT:
Link: http://images.google.com/imghp?hl=en&tab=wi
Link: http://video.google.com/?hl=en&tab=wv
Link: http://maps.google.com/maps?hl=en&tab=wl
Link: http://news.google.com/nwshp?hl=en&tab=wn
Link: http://www.google.com/prdhp?hl=en&tab=wf
Link: http://mail.google.com/mail/?hl=en&tab=wm
Link: http://www.google.com/intl/en/options/
Link: /url?sa=p&pref=ig&pval=3&q=http://www.google.com/ig%3Fhl%3Den%26source%3Diglk&usg=AFQjCNFA18XPfgb7dKnXfKz7x7g1GDH1tg
Link: https://www.google.com/accounts/Login?hl=en&continue=http://www.google.com/
Link: /aclk?sa=L&ai=CbpBLOFeqS_gX3ZmVB_SbuZINs_2WoQHf44OSEMHZnNkTEAEgwVRQpuf5xAJgPaoEhQFP0M0ypnTnQAI3b4WYFAHIvHiLv4iZWVehmiie-78BOdRJQOj6QayRkYYHH4cKXyaNmAp2rmQiiPSHxtEyaVD5OZo41Kxvy6SAeAAF6CIw-SQAFsLT-9iHRfJUcoYh4qlpGqGbC080ZVCWlUUipS404rornNJFmeGlP89sgXehqOfpe8uL&num=1&sig=AGiWqtw95aIEfk5F25oGM2i6eMwkBBuj6Q&q=http://www.google.com/doodle4google/
Or, if you're only interested in the http/https links, you can do this:
#!/usr/bin/perl -w
use strict;
use warnings;
use LWP::Simple;
my $targeturl="http://www.google.com";
my $searchstring="google";
my $contents=get($targeturl);
my @semiparsed=split(/href/i, $contents);
foreach(@semiparsed){
    if($_=~/^\s*=\s*('|")(http.*?)('|")/i){
        my $link=$2;
        if($link=~/$searchstring/i){
            print "Link: $link\n";
        }
    }
}
OUTPUT:
Link: http://images.google.com/imghp?hl=en&tab=wi
Link: http://video.google.com/?hl=en&tab=wv
Link: http://maps.google.com/maps?hl=en&tab=wl
Link: http://news.google.com/nwshp?hl=en&tab=wn
Link: http://www.google.com/prdhp?hl=en&tab=wf
Link: http://mail.google.com/mail/?hl=en&tab=wm
Link: http://www.google.com/intl/en/options/
Link: https://www.google.com/accounts/Login?hl=en&continue=http://www.google.com/
Like I said, if you want to present a different method where you push
all the regex work off to an include like HTML::LinkExtor, please post
it, and I can run both using a benchmark module to determine which
method is more efficient. I could be way off, here - maybe using one or
more of the modules mentioned in this thread somehow improves
efficiency. If so, please let me know.
By the way - I can think of wrenches to throw into this solution, too -
addressing the use of ' or " inside a link, for instance - but, then, I
could throw "you prolly won't ever see this but it's theoretically
possible" wrenches into most of the HTML parsing CPAN modules, too, so...
Cheers.
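For reference, a sketch of a few such wrenches fed to the same split-on-href approach, using core Perl only. The sub name regex_links and the sample HTML are made up for illustration; the matching logic is the one from the first script above, minus the search-string filter.

```perl
use strict;
use warnings;

# Same split/regex logic as the script above, wrapped in a sub so it can be
# fed a string instead of a live page.
sub regex_links {
    my ($contents) = @_;
    my @links;
    foreach my $chunk (split /href/i, $contents) {
        push @links, $2 if $chunk =~ /^\s*=\s*('|")(.*?)('|")/;
    }
    return @links;
}

my $html = <<'END_HTML';
<a href="http://www.example.com/real">real link</a>
<!-- <a href="http://www.example.com/commented-out">old link</a> -->
<a href=http://www.example.com/unquoted>unquoted attribute</a>
<a href="http://www.example.com/it's">apostrophe inside the URL</a>
END_HTML

print "Link: $_\n" for regex_links($html);
```

On this input the commented-out link is reported, the unquoted link is missed, and the URL containing an apostrophe is cut short at the apostrophe.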
------------------------------
Date: Wed, 24 Mar 2010 12:15:53 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Perl HTML searching
Message-Id: <gcokq59ni6jka0q63tba3nkt5i0ofnma9g@4ax.com>
"Kyle T. Jones" <KBfoMe@realdomain.net> wrote:
>Tad McClellan wrote:
>> Kyle T. Jones <KBfoMe@realdomain.net> wrote:
>>> Steve wrote:
>>
>>>> like lets say I searched a site
>>>> that had 15 news links and 3 of them said "Hello" in the title. I
>>>> would want to extract only the links that said hello in the title.
>>> Read up on perl regular expressions.
>>
>>
>> While reading up on regular expressions is certainly a good idea,
>> it is a horrid idea for the purposes of parsing HTML.
>>
>
>Ummm. Could you expand on that?
>
>My initial reaction would be something like - I'm pretty sure *any*
>method, including the use of HTML::LinkExtor, or XML transform (both
>outlined upthread), involves using regular expressions "for the purposes
>of parsing HTML".
Regular expressions recognize regular languages. But HTML is a
context-free language and therefore cannot be recognized solely by a
regular parser.
Having said that, Perl's extended regular expressions are indeed more
powerful than regular ones, but it is still a bad idea because the
expressions become way too complex.
>At best, you're just abstracting the regex work back to the includes.
>AFAIK, and feel free to correct me (I'll go take a look at some of the
>relevant module code in a bit), every CPAN module that is involved with
>parsing HTML uses fairly straightforward regex matching somewhere within
>that module's methods.
Using REs to do _part_ of the work of parsing any language is a
no-brainer, of course everyone does it e.g. in the tokenizer.
But unless your language is a regular language (and there aren't many
useful regular languages because regular is just too restrictive) you
need additional algorithms that cannot be expressed as REs to actually
parse a context-free or context-sensitive language.
>I think there's an argument that, considering you can do this so easily
>(in under 15 lines of code) without the overhead of unnecessary
>includes, my way would be more efficient. We can run some benchmarks if
>you want (see further down for working code).
But you cannot! Ever heard of the Chomsky Hierarchy? No recollection of
Theory of Computer Languages or Basics of Compiler Construction?
What do people learn in Computer Science today?
jue
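For comparison, a minimal sketch of the original task (links whose text contains a given word) done with a tokenizing parser rather than a hand-written regex. It assumes the CPAN distribution HTML-Parser is installed, which provides HTML::TokeParser; the sub name links_with_text and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the href of every <a> element whose link text contains $wanted.
sub links_with_text {
    my ($html, $wanted) = @_;
    my $parser = HTML::TokeParser->new(\$html);
    my @hits;
    # get_tag('a') skips ahead to the next <a> start tag; comments are not tags.
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};          # element 1 is the attribute hash
        next unless defined $href;
        my $text = $parser->get_trimmed_text('/a');   # text up to the closing </a>
        push @hits, $href if $text =~ /\Q$wanted\E/i;
    }
    return @hits;
}

my $html = <<'END_HTML';
<a href="http://www.example.com/1">Hello world</a>
<!-- <a href="http://www.example.com/2">Hello (commented out)</a> -->
<a href=http://www.example.com/3>hello again</a>
<a href="http://www.example.com/4">Goodbye</a>
END_HTML

print "Link: $_\n" for links_with_text($html, 'hello');
```

The regex here only filters the already-extracted link text; deciding what is a tag, an attribute or a comment is left to the parser.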
------------------------------
Date: Wed, 24 Mar 2010 21:08:40 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Perl HTML searching
Message-Id: <ojbq77-5a2.ln1@osiris.mauzo.dyndns.org>
Quoth Jürgen Exner <jurgenex@hotmail.com>:
>
> But you cannot! Ever heard of the Chomsky Hierarchy? No recollection of
> Theory of Computer Languages or Basics of Compiler Construction?
> What do people learn in Computer Science today?
I suspect that most people writing Perl have never formally studied
Computer Science. I certainly haven't, though I've picked up a fair bit
of the theory along the way because I'm interested.
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2884
***************************************