[32601] in Perl-Users-Digest


Perl-Users Digest, Issue: 3874 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Feb 2 18:09:22 2013

Date: Sat, 2 Feb 2013 15:09:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 2 Feb 2013     Volume: 11 Number: 3874

Today's topics:
    Re: capturing, computing the ephemeris and passing it t <cal@example.invalid>
    Re: capturing, computing the ephemeris and passing it t <derykus@gmail.com>
    Re: capturing, computing the ephemeris and passing it t <ben@morrow.me.uk>
    Re: capturing, computing the ephemeris and passing it t <cal@example.invalid>
    Re: capturing, computing the ephemeris and passing it t <cal@example.invalid>
        Learning to write modules by example. <justin.1211@purestblue.com>
    Re: Learning to write modules by example. <ben@morrow.me.uk>
    Re: Learning to write modules by example. <rweikusat@mssgmbh.com>
        OT: Any IT related events in the NYC/Boston area this y <r.mariotti@fdcx.net>
        RFC - File::Util 4.x Series Pre-Release <nomail@server.invalid>
        taking a look at some examples of web automation script <cal@example.invalid>
    Re: The definitive statement on parsing HTML with regul <brian.d.foy@gmail.com>
    Re: The definitive statement on parsing HTML with regul <*@eli.users.panix.com>
    Re: The definitive statement on parsing HTML with regul <cwilbur@chromatico.net>
    Re: The definitive statement on parsing HTML with regul <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 31 Jan 2013 23:35:07 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <vq6dnR2Jke2G_5bMnZ2dnUVZ_hSdnZ2d@supernews.com>

On 01/31/2013 07:56 AM, Charlton Wilbur wrote:
>>>>>> "CD" == Cal Dershowitz <cal@example.invalid> writes:
>
>      CD> The documentation
>      CD> there though is pretty thin.
>
> Perhaps you are not looking in the right place:
>
>          perldoc perlref
>          perldoc perlreftut
>
> Charlton
>
>

Thx, Charlton, the perlreftut was just the ointment.  Unfortunately, I'm 
on my 4th errant notion of how I'm gonna dereference these things.  In 
the reading, it mentions the alpaca book, which is one of the few books 
I've ever had that I was really digging, and it disappeared before I was 
finished reading it through once.

I remember it had a great section on this, but without it as a reference 
at hand, I seem to be struggling.

Tried a couple things that looked like good ideas from here:

http://www.perlmeme.org/howtos/using_perl/dereferencing.html

I'll place the order to replace the book tomorrow.  My gf has amazon 
prime; it's awesome.

Anyways, this was my latest idea:

$ ./capture4.pl
Not a SCALAR reference at ./capture4.pl line 12.
$ perltidy -b capture4.pl
$ cat capture4.pl
#!/usr/bin/perl -w
use strict;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TokeParser;

my $url    = 'http://www.fourmilab.ch/yoursky/';
my $mech   = WWW::Mechanize->new;
my $result = $mech->get($url);
die "GET failed\n" unless $result->is_success;
my $res = $$result;
print "res is" . $res . "\n";
my $filename = 'content1.txt';
$mech->save_content($filename);
my $links = $mech->links();

for my $element ( @{$links} ) {

     my $res2 = $$element;
     print "res2 is " . $res2 . "\n";
}

$
-- 
Cal


------------------------------

Date: Fri, 1 Feb 2013 02:45:17 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <a0d755cd-21c3-4f02-b626-0236b34e133b@googlegroups.com>

On Thursday, January 31, 2013 10:35:07 PM UTC-8, Cal Dershowitz wrote:
> ...
> $ ./capture4.pl
> Not a SCALAR reference at ./capture4.pl line 12.
> $ perltidy -b capture4.pl
> $ cat capture4.pl
> #!/usr/bin/perl -w
> use strict;
> use autodie;
> use utf8;
> use WWW::Mechanize;
> use HTML::TokeParser;
>
> my $url    = 'http://www.fourmilab.ch/yoursky/';
> my $mech   = WWW::Mechanize->new;
> my $result = $mech->get($url);
> die "GET failed\n" unless $result->is_success;
> my $res = $$result;
            ^^^^^^^^^

$result is an HTTP::Response object so you'll
want something like this, eg,

 my $res = $result->as_string;

or:

 my $res = $result->dump(...);

See: perldoc HTTP::Message for more about dump()
     options

> print "res is" . $res . "\n"; 
> ...

You'll need to research and modify appropriately
the following as well:

> my $res2 = $$element; 

-- 
Charles DeRykus 


------------------------------

Date: Fri, 1 Feb 2013 15:17:31 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <bt2tt9-hrr2.ln1@anubis.morrow.me.uk>


Quoth Cal Dershowitz <cal@example.invalid>:
> 
> $ ./capture4.pl
> Not a SCALAR reference at ./capture4.pl line 12.
> $ perltidy -b capture4.pl
> $ cat capture4.pl
> #!/usr/bin/perl -w
> use strict;
> use autodie;
> use utf8;
> use WWW::Mechanize;
> use HTML::TokeParser;
> 
> my $url    = 'http://www.fourmilab.ch/yoursky/';
> my $mech   = WWW::Mechanize->new;
> my $result = $mech->get($url);
> die "GET failed\n" unless $result->is_success;
> my $res = $$result;
> print "res is" . $res . "\n";
> my $filename = 'content1.txt';
> $mech->save_content($filename);
> my $links = $mech->links();
> 
> for my $element ( @{$links} ) {
> 
>      my $res2 = $$element;

I'm curious: what made you think this might work? What does $element
contain at this point?

Does this help? (It may not.)

    $links ==> SCALAR --> ARRAY
                          elem 0 --> WWW::Mechanize::Link object
             $element ==> elem 1 --> WWW::Mechanize::Link object

==> is a 'naming' or 'aliasing' relationship, --> is a 'referencing'
relationship. So: $links names a scalar. That scalar holds a ref which
points to an array. Each element of that array holds a ref which points
to a WWW::Mechanize::Link object. $element is currently an alternative
name for one of those elements.
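
The picture above can be tried offline. This is only a minimal sketch: Demo::Link is a hypothetical stand-in for WWW::Mechanize::Link, and the link texts and paths are made up for illustration. It shows that each element is an object reference, so a method call works where a scalar dereference does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Mechanize::Link, so the sketch runs offline.
package Demo::Link;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub text { return $_[0]{text} }
sub url  { return $_[0]{url} }

package main;

# $links names a scalar; that scalar holds a reference to an array of objects.
# The texts and paths are invented sample data.
my $links = [
    Demo::Link->new( text => 'Set for nearby city', url => '/example/cities' ),
    Demo::Link->new( text => 'Explain symbols',     url => '/example/help' ),
];

# Collect the link texts by calling a method on each element.
sub link_texts {
    my ($aref) = @_;
    return map { $_->text } @{$aref};    # @{...} follows the array reference
}

for my $element ( @{$links} ) {

    # $element is an alias for one array element, which holds an object ref.
    print ref($element), ': ', $element->text, "\n";

    # A scalar dereference of a hash-based object dies under strict refs.
    my $ok = eval { my $copy = $$element; 1 };
    print "  scalar dereference failed: $@" unless $ok;
}
```

With a real WWW::Mechanize::Link the same loop would call its documented accessors such as text() or url() instead of trying $$element.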

Ben



------------------------------

Date: Fri, 01 Feb 2013 19:52:26 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <qPSdnZ6C2rr34pHMnZ2dnUVZ_tGdnZ2d@supernews.com>

On 02/01/2013 03:45 AM, C.DeRykus wrote:
> On Thursday, January 31, 2013 10:35:07 PM UTC-8, Cal Dershowitz wrote:
>> ...
>> $ ./capture4.pl
>> Not a SCALAR reference at ./capture4.pl line 12.
>> $ perltidy -b capture4.pl
>> $ cat capture4.pl
>> #!/usr/bin/perl -w
>> use strict;
>> use autodie;
>> use utf8;
>> use WWW::Mechanize;
>> use HTML::TokeParser;
>>
>> my $url    = 'http://www.fourmilab.ch/yoursky/';
>> my $mech   = WWW::Mechanize->new;
>> my $result = $mech->get($url);
>> die "GET failed\n" unless $result->is_success;
>> my $res = $$result;
>              ^^^^^^^^^
>
> $result is an HTTP::Response object so you'll
> want something like this, eg,
>
>   my $res = $result->as_string;
>
> or:
>
>   my $res = $result->dump(...);
>
> See: perldoc HTTP::Message for more about dump()
>       options

Thanks C, your response was enough for me to re-think what I had been 
trying, and I think I've made significant progress.  It turns out that 
this result matters very little when it just works every time.
>
>> print "res is" . $res . "\n";
>> ...
>
> You'll need to research and modify appropriately
> the following as well:
>
>> my $res2 = $$element;
>

This new script shows me why I could never figure out the attributes of 
the sky map image.  It turns out that it's cobbled together:

$ ./capture5.pl
tags
area
area
area
area
area
 ...
guess is Explain symbols in the map.
guess is View horizon at this observing site.
guess is Explain controls in the following panel.
guess is Date and Time
guess is Now
guess is Universal time:
guess is Julian day:
guess is Observing Site
 ...
$ cat capture5.pl
#!/usr/bin/perl -w
use strict;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TokeParser;

my $url    = 'http://www.fourmilab.ch/yoursky/';
my $mech   = WWW::Mechanize->new;
my $result = $mech->get($url);
die "GET failed\n" unless $result->is_success;

$mech->follow_link( text => 'Set for nearby city', n => 1 );
$mech->follow_link( text => 'San Francisco CA',    n => 1 );
my @links = $mech->links();
print "tags\n";
print $_->tag() . "\n" foreach @links;
my $filename = 'content1.txt';
$mech->save_content($filename);

for my $element (@links) {
     my $guess = $element->text();
     print "guess is " . $guess . "\n";
}

$

Alright, so far so good.  The next step is to change the time.  I'll try 
to make it 5:45 p.m. Pacific Standard Time on the 21st of January.  In 
the form that makes it the 22nd, because GMT is already a day ahead then.
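
The time-zone arithmetic can be checked offline with core modules. This is a minimal sketch, assuming Pacific Standard Time is a fixed UTC-8 offset in January; pacific_to_utc_string is a hypothetical helper name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Assumption: Pacific Standard Time is a fixed UTC-8 offset in January.
my $PST_OFFSET_HOURS = -8;

# Convert Pacific wall-clock fields to a 'YYYY-MM-DD HH:MM:SS' UTC string.
sub pacific_to_utc_string {
    my ( $year, $month, $day, $hour, $min, $sec ) = @_;

    # Treat the local fields as if they were UTC, then remove the offset.
    my $epoch = timegm( $sec, $min, $hour, $day, $month - 1, $year )
      - $PST_OFFSET_HOURS * 3600;
    return strftime( '%Y-%m-%d %H:%M:%S', gmtime($epoch) );
}

# 5:45 p.m. on 21 January 2013, Pacific time
print pacific_to_utc_string( 2013, 1, 21, 17, 45, 0 ), "\n";
```

This prints 2013-01-22 01:45:00. The hour is zero-padded here, so the exact format the site's form expects may still need adjusting.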
-- 
Cal


------------------------------

Date: Fri, 01 Feb 2013 22:08:39 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <QKKdnfaWfYvFApHMnZ2dnUVZ_iydnZ2d@supernews.com>

On 02/01/2013 08:17 AM, Ben Morrow wrote:
>
> Quoth Cal Dershowitz <cal@example.invalid>:

>> for my $element ( @{$links} ) {
>>
>>       my $res2 = $$element;
>
> I'm curious: what made you think this might work? What does $element
> contain at this point?

This was just my last desperate try on a given night where nothing was 
working.
>
> Does this help? (It may not.)
>
>      $links ==> SCALAR --> ARRAY
>                            elem 0 --> WWW::Mechanize::Link object
>               $element ==> elem 1 --> WWW::Mechanize::Link object
>
> ==> is a 'naming' or 'aliasing' relationship, --> is a 'referencing'
> relationship. So: $links names a scalar. That scalar holds a ref which
> points to an array. Each element of that array holds a ref which points
> to a WWW::Mechanize::Link object. $element is currently an alternative
> name for one of those elements.
>
> Ben
>

Gosh, Ben, are you saying that --> is the same as ->?

Anyways, I think forms are what's gonna work best for this radio button.

http://search.cpan.org/~gaas/HTML-Form-6.03/lib/HTML/Form.pm

I do have partial results:

$ ./capture6.pl
guess is POST http://www.fourmilab.ch/cgi-bin/Yoursky [request]
   <NONAME>=Update                (submit)
   date=0                         (radio)    [*0/Now|1/Universal time:|2/Julian day:]
   utc=2013-02-02 4:47:38         (text)
   jd=2456325.69975               (text)
   lat=37°37'5"                   (text)
 ...
$ cat capture6.pl
#!/usr/bin/perl -w
use strict;
use autodie;
use utf8;
use WWW::Mechanize;
use HTML::TokeParser;

my $url    = 'http://www.fourmilab.ch/yoursky/';
my $mech   = WWW::Mechanize->new;
my $result = $mech->get($url);
die "GET failed\n" unless $result->is_success;

$mech->follow_link( text => 'Set for nearby city', n => 1 );
$mech->follow_link( text => 'San Francisco CA',    n => 1 );
my $filename = 'content1.txt';
$mech->save_content($filename);

my @forms = $mech->forms();

for my $element (@forms) {
     my $guess = $element->dump();
     print "guess is " . $guess . "\n";
}

$

I can see that my value is gonna have to be changed to
   utc=2013-01-22 1:45:00         (text)

but beyond that I'm just gonna keep after it by the method of 
successive guessing.
-- 
Cal


------------------------------

Date: Fri, 1 Feb 2013 12:51:34 +0000
From: Justin C <justin.1211@purestblue.com>
Subject: Learning to write modules by example.
Message-Id: <mbqst9-evq.ln1@zem.masonsmusic.co.uk>


I've written a few in-house modules for use within the business here
and they're ugly. I recently had reason to write a few new ones to
replace some procedural stuff that needed to be updated anyway, so I
thought I'd do it properly and write some sensible OO modules - a
module makes more sense for them anyway.

I started off OK, but it soon got ugly again (though much less so).
I've been reading perlmodstyle, and I've also read José's Guide for
creating Perl modules (though it is a little old). What I think
would be useful would be to read the source of a great existing
module, one that's been written with 'best practice' in mind. Now I
can look at any of the hundreds of modules I have installed but I
don't know which ones are considered to be well (or very well)
written, so, can people recommend good examples of 'the best way to
do it'? I don't want to pick one and find it's the worst one from
which to learn by example.

As always, thank you for your suggestions.


   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Fri, 1 Feb 2013 16:29:03 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Learning to write modules by example.
Message-Id: <f37tt9-ghs2.ln1@anubis.morrow.me.uk>


Quoth Justin C <justin.1211@purestblue.com>:
> 
> I've written a few in-house modules for use within the business here
> and they're ugly. I recently had reason to write a few new ones to
> replace some procedural stuff that needed to be updated anyway, so I
> thought I'd do it properly and write some sensible OO modules - a
> module makes more sense for them anyway.
> 
> I started off OK, but it soon got ugly again (though much less so).
> I've been reading perlmodstyle, and I've also read José's Guide for
> creating Perl modules (though it is a little old). What I think
> would be useful would be to read the source of a great existing
> module, one that's been written with 'best practice' in mind. Now I
> can look at any of the hundreds of modules I have installed but I
> don't know which ones are considered to be well (or very well)
> written, so, can people recommend good examples of 'the best way to
> do it'? I don't want to pick one and find it's the worst one from
> which to learn by example.

As far as non-Moose OO goes: the TAP::* modules are fairly newly written
by people who know what they're doing; the CPANPLUS code is also pretty
clean, as are the various modules making up LWP. However, current best
practice is to use Moose for OO code, at least for large systems;
Catalyst would be a good example of a system which was cleaned up a lot
by switching to Moose.

All OO code that will run on 5.10 or above should at least

    use mro "c3";

at the top of the class hierarchy and use the next::method facility
(documented in perldoc mro) rather than SUPER::.
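
A minimal sketch of what that looks like, assuming a made-up diamond hierarchy (Animal, Walker, Swimmer and Duck are hypothetical class names). The pragma is repeated in every class here because the mro setting applies per class; that is a choice made for the sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diamond: Duck inherits from Walker and Swimmer,
# which both inherit from Animal.
package Animal;
use mro 'c3';
sub new      { my ($class) = @_; return bless {}, $class }
sub describe { return ('Animal') }

package Walker;
use mro 'c3';
our @ISA = ('Animal');
sub describe { my ($self) = @_; return ( 'Walker', $self->next::method ) }

package Swimmer;
use mro 'c3';
our @ISA = ('Animal');
sub describe { my ($self) = @_; return ( 'Swimmer', $self->next::method ) }

package Duck;
use mro 'c3';
our @ISA = ( 'Walker', 'Swimmer' );
sub describe { my ($self) = @_; return ( 'Duck', $self->next::method ) }

package main;

# Each describe() hands off to the next class in Duck's C3 order.
print join( ' -> ', Duck->new->describe ), "\n";

# The linearised order that next::method walks.
print join( ', ', @{ mro::get_linear_isa('Duck') } ), "\n";
```

The first line printed is Duck -> Walker -> Swimmer -> Animal. With SUPER::describe in place of next::method, Walker would dispatch straight to Animal and Swimmer would be skipped.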

Ben



------------------------------

Date: Fri, 01 Feb 2013 21:51:08 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Learning to write modules by example.
Message-Id: <87y5f71z5f.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:

[...]

> As far as non-Moose OO goes: the TAP::* modules are fairly newly written
> by people who know what they're doing; the CPANPLUS code is also pretty
> clean, as are the various modules making up LWP. However, current best
> practice is to use Moose for OO code, at least for large systems;

Everybody who ever implemented "Jonathan's very own version of an OO
programming system" was presumably convinced that HIS idea of how this
should work was Very Much Superior[tm] to the ideas of all other
people. AFAIK, Moose is more-or-less the Perl6 OO system reimplemented
atop the perl5 interpreter in Perl. I can't comment on the merits
of Perl6 OO because I have never used it. But the idea of implementing
support for fundamental programming paradigms in form of 'interpreted'
extension libraries is a very bad one. People who want to use Perl6
OO should use Perl6. And people who want to use Perl5 for some reason
would be well-advised to stick to the features provided by the perl
implementation itself if they want to write reasonably efficient
'production quality' code (*except* insofar they happen to be
conslutants whose involvement with any project ends long before
'making it work in the real world' becomes an issue -- the 'download
whatever you can' approach is perfect for 'optimizing' the income/work
ratio in such a setting: Get paid for selling other people's
code. Move on before the problems start to hit you).

The perl5 OO system is perfectly usable. Like all systems, it has its
advantages and its drawbacks.


------------------------------

Date: Fri, 01 Feb 2013 17:24:58 -0500
From: BobMCT <r.mariotti@fdcx.net>
Subject: OT: Any IT related events in the NYC/Boston area this year???
Message-Id: <uvfog890lkt60b43b1nv38ib9jt7uiilgq@4ax.com>

It's been quite a while since the good old LinuxWorld events as well
as other IT related ones.  I've been searching for events this year
being held between the Philly, NYC and Boston areas and find virtually
none.

I see that the boston.pm group is active here so I ask...  is anyone
aware of any such events and willing to share the info?

Thanks


------------------------------

Date: Thu, 31 Jan 2013 17:43:53 -0600
From: Tommy <nomail@server.invalid>
Subject: RFC - File::Util 4.x Series Pre-Release
Message-Id: <CJudnRhcYLgmnJbMnZ2dnUVZ8tudnZ2d@giganews.com>

This was posted at perlmonks a few hours ago, but I'm trying to cast a
wide net, as it were, and reach as many folks as possible.  I've applied
the proper amount of flame retardant, so here goes...

What's Up

File::Util has undergone some major changes in v4.x, some of which have
been discussed here since late December. I've preserved complete
backward compatibility while performing the overhaul.

The 4.x series is both a response to community complaints/requests,
and a big push to bring it into step with "modern" best practices and
interface styles.

I'm looking for people to kindly let me know what they
think...good/bad/otherwise. Why? I'd like to get as much community
feedback as possible in the way of "social review" of the new interface
before publishing this distribution of major changes, features, bug
fixes. I value what you have to say.

	The git repository is here:
	https://github.com/tommybutler/file-util

	A packaged dist is available here:
	http://www.atrixnet.com/File-Util-4.130300.tar.gz

What's New

Other than a slew of bug fixes and feature additions, a quick look at
some key differences in the interface is succinctly presented in the
"SYNTAX" section of the manual, here:
	https://github.com/tommybutler/file-util/blob/master/File_Util/lib/File/Util/Manual.pod#SYNTAX

See also the NEWS file in the dist.

What's Left

Things left before actual 4.x release would be to correct any
grammar/spelling issues in the docs that I haven't already caught, to
add more to the cookbook (and revisit recipes in the cookbook that are
old and could be improved), and to add even more to the test suite
(which currently runs over 500 tests in developer release test mode).
See also the TODO file in the dist.

My gratitude goes out to those who provide feedback, even if all you do
is read over the Manual (
https://github.com/tommybutler/file-util/blob/master/File_Util/lib/File/Util/Manual.pod
) on github and point out anything you find good/bad/otherwise. For
those who try out the dist itself (maybe with perlbrew?) and play with
File::Util a bit, I thank you in the most emphatic terms possible. It's
so important to me to put forth the best code I can for the community,
for those who use the module commercially, for the CPAN, and for Perl.

My thanks already goes out to MST and RJBS who have provided valuable
help via IRC and CPAN RT. Also to SirSpammenot and Nick Perez who helped
via email and Google+, and to anyone who ever filed a bug report or
smoked File::Util.

-- Tommy Butler $ perl -MMIME::Base64 -e 'print
decode_base64("dGJ1dGxlci5ubnRwQGludGVybmV0YWxpYXMubmV0Cg==")'



------------------------------

Date: Sat, 02 Feb 2013 00:16:34 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: taking a look at some examples of web automation scripts
Message-Id: <bYOdnYTQ-JHPIJHMnZ2dnUVZ_s2dnZ2d@supernews.com>

I'd like to take a look at examples of web automation that anyone might 
have.  I found this one today and was able to adapt it (slightly) to 
actually make it work.  I think of how long it takes me to do this with 
GUI events, and this is so much better.

$ ./cpan1.pl HTML::Form
$ cat cpan1.pl
#!/usr/bin/perl

# turn on perl's safety features
use strict;
use warnings;

# work out the name of the module we're looking for
my $module_name = $ARGV[0]
or die "Must specify module name on command line";

# create a new browser
use WWW::Mechanize;
my $browser = WWW::Mechanize->new();

# tell it to get the main page
$browser->get("http://search.cpan.org/");

# okay, fill in the box with the name of the
# module we want to look up
$browser->form_number(1);
$browser->field( "query", $module_name );
$browser->click();

# click on the link that matches the module name
$browser->follow_link( text => $module_name );

my $url = $browser->uri;

# launch a browser...
system( 'firefox', $url );

exit(0);
$

That this fires up firefox at the end makes it a great tool.  I'm 
struggling now with the referencing in the object model, but I'm wearing 
it down with repetition.

I'd love to see any examples others may have.
-- 
Cal


------------------------------

Date: Thu, 31 Jan 2013 13:31:00 -0600
From: brian d foy <brian.d.foy@gmail.com>
Subject: Re: The definitive statement on parsing HTML with regular expressions
Message-Id: <310120131331001225%brian.d.foy@gmail.com>

In article <ke9gk0$9vd$1@reader1.panix.com>, Tim McDaniel
<tmcd@panix.com> wrote:

> I'd have to say that at
>
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> the first answer is definitive. 

It's certainly funny, and was dogma until tchrist actually solved it
with a recursive regex in a different Stackoverflow answer:

http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491


------------------------------

Date: Thu, 31 Jan 2013 20:35:43 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: The definitive statement on parsing HTML with regular expressions
Message-Id: <eli$1301311520@qz.little-neck.ny.us>

In comp.lang.perl.misc, brian d foy  <brian.d.foy@gmail.com> wrote:
> It's certainly funny, and was dogma until tchrist actually solved it
> with a recursive regex in a different Stackoverflow answer:
> 
> http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491

I really didn't think parsing HTML would be impossible if you allowed
yourself an arbitrary number of regular expressions and glued them
together with other code, which is exactly what that is. It's a very
cool application of sophisticated regexes, but not miracle work.

Elijah
------
in ten minutes of playing did not find edge cases to break that code


------------------------------

Date: Thu, 31 Jan 2013 17:18:47 -0500
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: The definitive statement on parsing HTML with regular expressions
Message-Id: <87lib9vvw8.fsf@new.chromatico.net>

>>>>> "bdf" == brian d foy <brian.d.foy@gmail.com> writes:

    bdf> In article <ke9gk0$9vd$1@reader1.panix.com>, Tim McDaniel
    bdf> <tmcd@panix.com> wrote:

    >> I'd have to say that at
    >> 
    >> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
    >> the first answer is definitive.

    bdf> It's certainly funny, and was dogma until tchrist actually
    bdf> solved it with a recursive regex in a different Stackoverflow
    bdf> answer:

    bdf> http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491

To be honest, before tchrist's answer it was dogma that was known to be
false by those of us who either understand the theory of computation
(since Perl's regular expressions stopped being strictly regular some
time ago) or who had to update or maintain a dog's breakfast of HTML
"parsing" using regular expressions. 
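
A small illustration of the "no longer strictly regular" point: since perl 5.10 a pattern can recurse into one of its own groups. The sketch below is not tchrist's code and does not parse HTML; it only matches balanced parentheses, a language no classical regular expression can recognise. The name is_balanced is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One fully balanced group of parentheses, anchored at both ends.
# (?1) recurses into capture group 1. Requires perl 5.10 or later.
my $balanced = qr{
    \A
    (               # group 1: one balanced group
        \(
        (?:
            [^()]++ # a run of other characters, matched possessively
          |
            (?1)    # or a nested balanced group
        )*
        \)
    )
    \z
}x;

# Return 1 if the whole string is a single balanced group, else 0.
sub is_balanced {
    my ($text) = @_;
    return $text =~ $balanced ? 1 : 0;
}

for my $sample ( '(a(b)c)', '((()))', '(a(b)c', '(a))' ) {
    printf "%-8s %s\n", $sample,
      is_balanced($sample) ? 'balanced' : 'not balanced';
}
```

The first two samples are reported as balanced and the last two as not balanced. Real HTML adds attributes, comments, CDATA and optional end tags on top of the nesting, which is where the dog's breakfast starts.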

tchrist does continue to say that even though you CAN parse HTML with
Perl regular expressions, you probably SHOULDN'T, because the larger and
more sophisticated the problem, the better it is to use a real parser.
Which is wisdom, and I am not just saying that because I have been
saying it for 10+ years at this point.

Charlton


-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: Fri, 1 Feb 2013 00:24:32 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: The definitive statement on parsing HTML with regular expressions
Message-Id: <0jert9-6v82.ln1@anubis.morrow.me.uk>


Quoth Charlton Wilbur <cwilbur@chromatico.net>:
> 
> tchrist does continue to say that even though you CAN parse HTML with
> Perl regular expressions, you probably SHOULDN'T, because the larger and
> more sophisticated the problem, the better it is to use a real parser.

The point is that Perl 6's grammars are, and Perl 5's regexes are slowly
becoming, a better tool for writing 'real parsers' than plain Perl, even
with the assistance of generators like Parse::RecDescent. I've no idea
how stable Regexp::Grammars is (dconway modules have a tendency to do
amazing things until you push them too hard, at which point they explode
spectacularly) but the general principle of driving a parse with the
regex engine, with its optimiser and builtin support for backtracking,
seems sound to me.

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3874
***************************************
