[32677] in Perl-Users-Digest


Perl-Users Digest, Issue: 3953 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri May 24 14:09:27 2013

Date: Fri, 24 May 2013 11:09:14 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 24 May 2013     Volume: 11 Number: 3953

Today's topics:
    Re: I'd like to try Perl... <peterxpercival@hotmail.com>
        iCalendar module? (hymie!)
    Re: iCalendar module? <news@lawshouse.org>
    Re: iCalendar module? <ben@morrow.me.uk>
        Need more info about problem resolving entity reference <davidmichaelkarr@gmail.com>
    Re: Need more info about problem resolving entity refer <ben@morrow.me.uk>
    Re: Need more info about problem resolving entity refer <davidmichaelkarr@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 24 May 2013 18:52:48 +0100
From: Peter Percival <peterxpercival@hotmail.com>
Subject: Re: I'd like to try Perl...
Message-Id: <kno9dh$dvf$2@news.albasani.net>

Peter Percival wrote:
> I'd like to try Perl on Win 7 and according to this:
> http://www.perl.org/get.html, it's a choice between ActiveState,
> Strawberry and DWIM.  Any advice on choosing between them would be welcome.

I have "Learning Perl on Win32 Systems" by Schwartz, Olson and 
Christiansen.  It's the right level for me, but I need something for 
Windows 7 specifically.  Any suggestions?

-- 
I think I am an Elephant,
Behind another Elephant
Behind /another/ Elephant who isn't really there....
				A.A. Milne


------------------------------

Date: 23 May 2013 19:50:03 GMT
From: hymie@lactose.homelinux.net (hymie!)
Subject: iCalendar module?
Message-Id: <519e72eb$0$32317$862e30e2@ngroups.net>

Greetings.

I found this ruby script
    cals = Icalendar.parse($<)
    cals.each do |cal|
      cal.events.each do |event|
        puts "Organizer: #{event.organizer}"
        puts "Event:     #{event.summary}"
        puts "Starts:    #{event.dtstart.myformat} local time"
        puts "Ends:      #{event.dtend.myformat}"
        puts "Location:  #{event.location}"
        puts "Contact:   #{event.contacts}"
        puts "Description:\n#{event.description}"
        puts ""
      end
   end

and I was hoping to write something similar in perl.

The question is, can somebody recommend to me a module that I can use
to read iCal files?  All of the modules that I see appear to handle
creating iCal files, but not reading them and outputting something
human-readable.  Although it's likely that I missed it.

--hymie!    http://lactose.homelinux.net/~hymie    hymie@lactose.homelinux.net
-------------------------------------------------------------------------------


------------------------------

Date: Thu, 23 May 2013 22:25:47 +0100
From: Henry Law <news@lawshouse.org>
Subject: Re: iCalendar module?
Message-Id: <0vudnZU9ZfLBFAPMnZ2dnUVZ7sCdnZ2d@giganews.com>

On 23/05/13 20:50, hymie! wrote:
> The question is, can somebody recommend to me a module that I can use
> to read iCal files?

Tie::iCal?  iCal::Parser?

Their interfaces aren't quite as simple as that ruby program, but at a 
casual glance (I've never used iCal or either module) you should be able 
to do what you want.

-- 

Henry Law            Manchester, England


------------------------------

Date: Fri, 24 May 2013 02:51:56 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: iCalendar module?
Message-Id: <smt27a-50c2.ln1@anubis.morrow.me.uk>


Quoth hymie@lactose.homelinux.net (hymie!):
> Greetings.
> 
> I found this ruby script
>     cals = Icalendar.parse($<)
>     cals.each do |cal|
>       cal.events.each do |event|
>         puts "Organizer: #{event.organizer}"
>         puts "Event:     #{event.summary}"
>         puts "Starts:    #{event.dtstart.myformat} local time"
>         puts "Ends:      #{event.dtend.myformat}"
>         puts "Location:  #{event.location}"
>         puts "Contact:   #{event.contacts}"
>         puts "Description:\n#{event.description}"
>         puts ""
>       end
>    end
> 
> and I was hoping to write something similar in perl.
> 
> The question is, can somebody recommend to me a module that I can use
> to read iCal files?  All of the modules that I see appear to handle
> creating iCal files, but not reading them and outputting something
> human-readable.  Although it's likely that I missed it.

Data::ICal looks convincing to me, though I've not used it.
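
For comparison with the Ruby script, here is a minimal sketch of how the
same report might look with Data::ICal. It assumes the interface described
in the module's documentation (new(filename => ...), entries,
ical_entry_type, property); the helper names first_value and
describe_events and the input file name are made up for illustration, and
the date values are printed raw rather than reformatted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::ICal;

# Return the first value of a named property on an entry, or '' if absent.
# (Hypothetical helper; property() returns an array ref of property objects.)
sub first_value {
    my ($entry, $name) = @_;
    my $props = $entry->property($name) or return '';
    return @$props ? $props->[0]->value : '';
}

# Build the printable lines for every VEVENT in a parsed calendar.
sub describe_events {
    my ($calendar) = @_;
    my @lines;
    for my $entry (@{ $calendar->entries }) {
        next unless $entry->ical_entry_type eq 'VEVENT';
        push @lines,
            'Organizer: ' . first_value($entry, 'organizer'),
            'Event:     ' . first_value($entry, 'summary'),
            'Starts:    ' . first_value($entry, 'dtstart'),
            'Ends:      ' . first_value($entry, 'dtend'),
            'Location:  ' . first_value($entry, 'location'),
            'Contact:   ' . first_value($entry, 'contact'),
            'Description:',
            first_value($entry, 'description'),
            '';
    }
    return @lines;
}

# Usage: perl events.pl calendar.ics
if (@ARGV) {
    my $calendar = Data::ICal->new(filename => $ARGV[0])
        or die "Could not parse $ARGV[0]\n";
    print "$_\n" for describe_events($calendar);
}
```

Only the first value of each property is shown; an event with several
CONTACT lines would need a loop over the array that property() returns.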

Ben



------------------------------

Date: Thu, 23 May 2013 15:32:02 -0700 (PDT)
From: David Karr <davidmichaelkarr@gmail.com>
Subject: Need more info about problem resolving entity reference
Message-Id: <2a3c81b9-494f-4aa7-89d3-821725e0a3d3@googlegroups.com>

I have a Cygwin Perl script that makes numerous REST API calls to a local service, parses the results from those, and makes other calls with that data.  It also runs some of these calls in multiple threads, using LWP::UserAgent.

It mostly works, but I sometimes get errors like this:

-----------------------
caught error: 
500 Can't connect to www.w3.org:80 (Operation now in progress) http://www.w3.org/TR/html4/strict.dtd
Handler couldn't resolve external entity at line 1, column 90, byte 92
error in processing external entity reference at line 1, column 90, byte 92:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
=========================================================================================^
<html>
<head>
 at /usr/lib/perl5/vendor_perl/5.14/i686-cygwin-threads-64int/XML/Parser.pm line 187 thread 2
----------------------

That's the entire error message.  I have no idea where in the script this gets called from, and I'm not really sure what this error is telling me.


------------------------------

Date: Fri, 24 May 2013 03:34:04 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Need more info about problem resolving entity reference
Message-Id: <s5037a-2rf2.ln1@anubis.morrow.me.uk>


Quoth David Karr <davidmichaelkarr@gmail.com>:
> I have a Cygwin Perl script makes numerous REST api calls to a local
> service, parses the results from those, and makes other calls with that
> data.  It also runs some of these calls in multiple threads, using
> LWP::UserAgent.
> 
> It mostly works, but I sometimes get errors like this:
> 
> -----------------------
> caught error: 
> 500 Can't connect to www.w3.org:80 (Operation now in progress)
> http://www.w3.org/TR/html4/strict.dtd
> Handler couldn't resolve external entity at line 1, column 90, byte 92
> error in processing external entity reference at line 1, column 90, byte 92:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> ======================================================================
> ===================^
> <html>
> <head>
>  at
> /usr/lib/perl5/vendor_perl/5.14/i686-cygwin-threads-64int/XML/Parser.pm
> line 187 thread 2
> ----------------------
> 
> That's the entire error message.  I have no idea where in the script
> this gets called from, and I'm not really sure what this error is
> telling me.

This error comes from XML::Parser. I assume you are invoking that
directly, to parse the REST response? What's happening is that
XML::Parser sees a DOCTYPE declaration like

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">

and, like a good little SGML-derived XML parser, tries to fetch the DTD
(using LWP) so it can validate the rest of the file. For some reason,
when it tries to connect to www.w3.org to download the DTD file, the
connection is failing with EINPROGRESS. Since LWP isn't expecting that
error code, it throws an error.

So, what's the real problem? Well, first, that's an HTML doctype. You
can't, in general, parse HTML with an XML parser, so are you sure you're
getting the responses you expect? REST services are usually pretty good
about getting their Content-types right, so you ought to be able to
check for an XML Content-type before passing the data to XML::Parser.
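
A minimal sketch of such a check follows. The function name
is_xml_content_type is made up for illustration; the media type it is
given would typically come from the content_type method of the
HTTP::Response object that LWP::UserAgent returns.

```perl
use strict;
use warnings;

# True if a media type names an XML format under text/ or application/:
# text/xml, application/xml, or a "+xml" type such as application/atom+xml.
sub is_xml_content_type {
    my ($type) = @_;
    return 0 unless defined $type;
    $type =~ s/;.*//s;           # drop parameters such as "; charset=UTF-8"
    $type =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
    return lc($type) =~ m{^(?:text|application)/(?:xml|[\w.+-]+\+xml)$} ? 1 : 0;
}

# Possible use (assuming $response is an HTTP::Response):
#   die "Unexpected content type\n"
#       unless is_xml_content_type(scalar $response->content_type);
```

An HTML error page from a proxy or load balancer would be rejected here
before it ever reaches the XML parser.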

Second, you really don't want to keep fetching the DTDs like that. Does
the XML you're actually trying to parse use external DTDs? If not, then
you want to pass the NoLWP option to XML::Parser, so that it doesn't
even try to fetch DTDs from the network. In the case of a public DTD
like HTML the attempt to load it as a local file will fail, of course,
but the parsing wasn't going to succeed anyway, because it wasn't XML.
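
As a minimal sketch, the option is passed to the constructor. The wrapper
name parse_without_network is made up for illustration, and the Tree style
is just one possible choice; ErrorContext is an existing XML::Parser option
that adds surrounding lines to error messages.

```perl
use strict;
use warnings;
use XML::Parser;

# NoLWP => 1 makes XML::Parser use its file-based external entity
# handler, so it does not go through LWP to fetch a DTD.
sub parse_without_network {
    my ($xml) = @_;
    my $parser = XML::Parser->new(
        Style        => 'Tree',
        NoLWP        => 1,
        ErrorContext => 2,
    );
    return $parser->parse($xml);    # dies if the input is not well-formed
}
```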

However, I'm slightly confused here, because the XML::Parser
documentation seems to say it doesn't parse external DTDs by default.
It's possible I'm misunderstanding; I don't think I've used XML::Parser
myself. Are you passing ParseParamEnt, and if so, why?

Third, you probably don't want to be using XML::Parser at all. As you
can see, it's old and rather cronky, and while it's extremely solid code
it also takes a rather SGMLish approach to parsing XML. Most of the
time, with modern XML use, DTDs are not used, and instead the XML just
needs to be well-formed and properly namespaced. For this sort of thing
(small documents) I would use XML::LibXML (which, incidentally, also
includes a reasonable HTML parser); if a streaming model is more
appropriate, either because your documents may be ridiculously large or
simply because your program is structured that way, I would use one of
the SAX modules.
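
A minimal sketch of the XML::LibXML route, assuming small documents held
in a string. The function name element_names and the element names in the
usage line are made up for illustration; no_network and load_ext_dtd are
XML::LibXML parser options that keep it from fetching anything remotely.

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse a string without touching the network or loading an external DTD,
# then return the names of the root element's child elements.
sub element_names {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(
        string       => $xml,
        no_network   => 1,
        load_ext_dtd => 0,
    );
    return map { $_->nodeName } $doc->documentElement->findnodes('./*');
}

# Example: element_names('<order><id>1</id><item/></order>')
```

load_xml dies on input that is not well-formed, so the call would normally
sit inside an eval block.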

Finally, fourth, I have no idea where that EINPROGRESS is coming from.
That error is supposed to be returned if a socket is connected while in
non-blocking mode, and the connection cannot be completed without
blocking; it's basically the equivalent of EAGAIN for connect(). This
means it shouldn't be possible to get that error without having asked
for it by setting nonblocking mode on the socket, which LWP does not
(normally) do. 

Are you doing something peculiar which might cause this to happen?
Alternatively, it's possible this is some sort of Cygwin peculiarity,
which unfortunately may be difficult to track down; if you can isolate
the conditions where the error occurs it would be useful. (For instance,
does it tend to occur when the network goes down? When the network is
overloaded? When the DNS doesn't respond promptly?)

Ben



------------------------------

Date: Fri, 24 May 2013 10:57:26 -0700 (PDT)
From: David Karr <davidmichaelkarr@gmail.com>
Subject: Re: Need more info about problem resolving entity reference
Message-Id: <b80ea31d-88fe-4e2c-81a9-06e3e4d32624@googlegroups.com>

On Thursday, May 23, 2013 7:34:04 PM UTC-7, Ben Morrow wrote:
> Quoth David Karr <davidmichaelkarr@gmail.com>:
> > I have a Cygwin Perl script makes numerous REST api calls to a local
> > service, parses the results from those, and makes other calls with that
> > data.  It also runs some of these calls in multiple threads, using
> > LWP::UserAgent.
> >
> > It mostly works, but I sometimes get errors like this:
> >
> > -----------------------
> > caught error:
> > 500 Can't connect to www.w3.org:80 (Operation now in progress)
> > http://www.w3.org/TR/html4/strict.dtd
> > Handler couldn't resolve external entity at line 1, column 90, byte 92
> > error in processing external entity reference at line 1, column 90, byte 92:
> > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> > "http://www.w3.org/TR/html4/strict.dtd">
> > ======================================================================
> > ===================^
> > <html>
> > <head>
> >  at
> > /usr/lib/perl5/vendor_perl/5.14/i686-cygwin-threads-64int/XML/Parser.pm
> > line 187 thread 2
> > ----------------------
> >
> > That's the entire error message.  I have no idea where in the script
> > this gets called from, and I'm not really sure what this error is
> > telling me.
>
> This error comes from XML::Parser. I assume you are invoking that
> directly, to parse the REST response? What's happening is that
> XML::Parser sees a DOCTYPE declaration like
>
>     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
>         "http://www.w3.org/TR/html4/strict.dtd">
>
> and, like a good little SGML-derived XML parser, tries to fetch the DTD
> (using LWP) so it can validate the rest of the file. For some reason,
> when it tries to connect to www.w3.org to download the DTD file, the
> connection is failing with EINPROGRESS. Since LWP isn't expecting that
> error code, it throws an error.
>
> So, what's the real problem? Well, first, that's an HTML doctype. You
> can't, in general, parse HTML with an XML parser, so are you sure you're
> getting the responses you expect? REST services are usually pretty good
> about getting their Content-types right, so you ought to be able to
> check for an XML Content-type before passing the data to XML::Parser.

I'm completely certain that in these anomalous cases, I'm definitely not getting the response I expect.  The problem with this error message is that it gives me absolutely no clue where in the script this is happening.  I'm guessing that our back-end server gets confused in some cases, but it's hard to diagnose when I don't know what URL was being attempted, or where in the script it was done.
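
One way to get that missing context is to wrap the parse call so that any
failure is reported together with the URL and a stack trace. This is only
a sketch: parse_with_context is a made-up name, and $parse stands for any
code reference that takes the response body and dies on failure (for
instance a closure around the XML::Parser object).

```perl
use strict;
use warnings;
use Carp qw(confess);

# Run $parse on $body; on failure, rethrow with the URL, the start of the
# response, and a stack trace showing where in the script the call came from.
sub parse_with_context {
    my ($url, $body, $parse) = @_;
    my $result;
    my $ok = eval { $result = $parse->($body); 1 };
    return $result if $ok;
    my $error   = $@ || 'unknown error';
    my $excerpt = substr($body // '', 0, 200);
    confess "Parse failed for $url: $error\nResponse began with:\n$excerpt\n";
}
```

confess is used instead of die because it appends the call stack, which
shows which part of the script issued the request.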

> Second, you really don't want to keep fetching the DTDs like that. Does
> the XML you're actually trying to parse use external DTDs? If not, then
> you want to pass the NoLWP option to XML::Parser, so that it doesn't
> even try to fetch DTDs from the network. In the case of a public DTD
> like HTML the attempt to load it as a local file will fail, of course,
> but the parsing wasn't going to succeed anyway, because it wasn't XML.

That "NoLWP" option sounds useful, but it's somewhat moot here.

> However, I'm slightly confused here, because the XML::Parser
> documentation seems to say it doesn't parse external DTDs by default.
> It's possible I'm misunderstanding; I don't think I've used XML::Parser
> myself. Are you passing ParseParamEnt, and if so, why?

I don't know what "ParseParamEnt" is, so I imagine I'm not.

> Third, you probably don't want to be using XML::Parser at all. As you
> can see, it's old and rather cronky, and while it's extremely solid code
> it also takes a rather SGMLish approach to parsing XML. Most of the
> time, with modern XML use, DTDs are not used, and instead the XML just
> needs to be well-formed and properly namespaced. For this sort of thing
> (small documents) I would use XML::LibXML (which, incidentally, also
> includes a reasonable HTML parser); if a streaming model is more
> appropriate, either because your documents may be ridiculously large or
> simply because your program is structured that way, I would use one of
> the SAX modules.

The funny thing about searching in CPAN is that there are no packages (I'm guessing) that say "do not use this, use something better".  I'll take a look at XML::LibXML to see what it does for me.

> Finally, fourth, I have no idea where that EINPROGRESS is coming from.
> That error is supposed to be returned if a socket is connected while in
> non-blocking mode, and the connection cannot be completed without
> blocking; it's basically the equivalent of EAGAIN for connect(). This
> means it shouldn't be possible to get that error without having asked
> for it by setting nonblocking mode on the socket, which LWP does not
> (normally) do.
>
> Are you doing something peculiar which might cause this to happen?
> Alternatively, it's possible this is some sort of Cygwin peculiarity,
> which unfortunately may be difficult to track down; if you can isolate
> the conditions where the error occurs it would be useful. (For instance,
> does it tend to occur when the network goes down? When the network is
> overloaded? When the DNS doesn't respond promptly?)

The script runs for perhaps 30-40 minutes, basically walking the entire data model of a REST api.  It sends hundreds of requests to the (load-balanced) service, some from multiple threads.  This kind of error happens several times during the run of the script, which means that the vast majority work well enough.  I ended up putting a hack into my "sendGet" sub that just checks for "DOCTYPE HTML" in the output and simply tries again, with a reasonable limit of retries.  Almost all of the calls that detect this once or twice eventually get good data.
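
The real "sendGet" sub is not shown here, so the following is only a
sketch of that kind of bounded retry. The name get_with_retry is made up,
$fetch stands for whatever code reference performs the actual
LWP::UserAgent request and returns the body, and the default of three
attempts is an arbitrary assumption.

```perl
use strict;
use warnings;

# Call $fetch->($url) until the body no longer looks like an HTML page,
# giving up after $max_tries attempts.
sub get_with_retry {
    my ($fetch, $url, $max_tries) = @_;
    $max_tries ||= 3;
    for my $attempt (1 .. $max_tries) {
        my $body = $fetch->($url);
        return $body if defined $body && $body !~ /<!DOCTYPE\s+HTML/i;
        warn "attempt $attempt for $url returned an HTML page\n";
    }
    die "Giving up on $url after $max_tries attempts\n";
}
```

Logging the URL on each failed attempt also gives a record of which
requests the back end is answering with HTML.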


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3953
***************************************

