[32641] in Perl-Users-Digest
Perl-Users Digest, Issue: 3917 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 7 05:17:27 2013
Date: Sun, 7 Apr 2013 02:17:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 7 Apr 2013 Volume: 11 Number: 3917
Today's topics:
"walk over," and XPath-based substitutions? <oneingray@gmail.com>
Re: "walk over," and XPath-based substitutions? <droesler@comcast.net>
Re: "walk over," and XPath-based substitutions? <keshlam.cat.nospam@verizon.net>
Re: reporting bugs <oneingray@gmail.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 06 Apr 2013 11:32:11 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: "walk over," and XPath-based substitutions?
Message-Id: <871uanx5c4.fsf@violet.siamics.net>
[Cross-posting to news:comp.text.xml, yet omitting it from
Followup-To:, for I'm primarily interested in Perl-based
solutions.]
Is there an easy way to invoke a particular piece of code for each XML
node that satisfies an XPath expression from a given list?
A simple-minded approach (based on XML::LibXML) could be like:
    require XML::LibXML;
    ## $context is assumed to be an already parsed
    ## XML::LibXML document (or node)
    my %xpath_sub = (
        q {//node ()[@foo = "bar"]} => \&foo_bar,
        q {//node ()[@baz = "qux"]} => sub { baz ("qux", @_); }
    );
    foreach my $xpath (keys (%xpath_sub)) {
        my $sub
            = $xpath_sub{$xpath};
        foreach my $node ($context->findnodes ($xpath)) {
            $sub->($node);
        }
    }
However, AIUI, the code above implies that the XML tree is to be
traversed multiple times. Which could probably be avoided by
traversing the tree explicitly, as in:
    sub traverse {
        my ($node, $xsubs) = @_;
        foreach my $xpath (keys (%$xsubs)) {
            next
                unless ($node->find ($xpath));
            ## FIXME: check if the result is a boolean?
            $xsubs->{$xpath}->($node);
            ## FIXME: there, one may wish for a recursion; or not
        }
        ## recurse over the children
        foreach my $child ($node->childNodes ()) {
            traverse ($child, $xsubs);
        }
        ## .
    }
Still, it may repeatedly traverse the children of $node while
computing ->find () for each of the XPath expressions. (Unlike
the way an "optimized," or "compiled," regular expression would
be handled, IIUC.)
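A minimal sketch of how the explicit traversal might keep each test
local to the node at hand: the tests are phrased on the self:: axis, so
they never descend into the children, and they are compiled once with
XML::LibXML::XPathExpression. The sample document, the handlers, and
the @log array are hypothetical, and whether pre-compiling gives a
measurable speed-up here has not been measured.

```perl
use strict;
use warnings;
use XML::LibXML;

my @log;

## Each rule pairs a pre-compiled, node-relative test with a handler.
## (Hypothetical handlers; they only record which node was seen.)
my @rules = (
    [ XML::LibXML::XPathExpression->new('self::*[@foo = "bar"]'),
      sub { push @log, 'foo_bar:' . $_[0]->nodeName; } ],
    [ XML::LibXML::XPathExpression->new('self::*[@baz = "qux"]'),
      sub { push @log, 'baz_qux:' . $_[0]->nodeName; } ],
);

sub traverse {
    my ($node, $rules) = @_;
    foreach my $rule (@$rules) {
        my ($test, $handler) = @$rule;
        ## self::*[...] yields either $node itself or nothing
        my @self = $node->findnodes($test);
        $handler->($node) if @self;
    }
    ## recurse over the children, visiting every node exactly once
    foreach my $child ($node->childNodes) {
        traverse($child, $rules);
    }
}

my $doc = XML::LibXML->load_xml(
    string => '<root><a foo="bar"/><b baz="qux"><c foo="bar"/></b></root>',
);
traverse($doc->documentElement, \@rules);

print "$_\n" for @log;
```

The tree is walked once, but every rule is still evaluated at every
node, so the cost grows with the number of rules.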
The question is: does LibXML (or some other library) provide a
way to make such a task both simpler to code and more efficient
on execution?
... Or do I "optimize" all the XPath expressions themselves into
a single one somehow?
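One way the "single expression" idea could look, as a sketch under
stated assumptions: the per-node predicates are joined with "or" into
one expression, the document is searched once, and each hit is then
re-tested locally to decide which handlers apply. It uses //* instead
of //node (), since only elements carry attributes. The sample
document, the handlers, and the @log array are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<root>
  <a foo="bar"/>
  <b baz="qux"><c foo="bar" baz="qux"/></b>
  <d/>
</root>
XML

my @log;

## Each rule pairs a node-relative predicate with a handler.
my @rules = (
    [ '@foo = "bar"', sub { push @log, 'foo_bar:' . $_[0]->nodeName; } ],
    [ '@baz = "qux"', sub { push @log, 'baz_qux:' . $_[0]->nodeName; } ],
);

## Builds: //*[(@foo = "bar") or (@baz = "qux")]
my $combined
    = '//*[' . join(' or ', map { "($_->[0])" } @rules) . ']';

foreach my $node ($doc->findnodes($combined)) {
    foreach my $rule (@rules) {
        my ($test, $handler) = @$rule;
        ## Re-test the predicate against this node only.
        my @self = $node->findnodes("self::*[$test]");
        $handler->($node) if @self;
    }
}

print "$_\n" for @log;
```

This only works when every expression can be reduced to a predicate on
the candidate node; expressions with different location paths would
need a union (|) and a different way of telling the hits apart.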
TIA.
--
FSF associate member #7257 http://hfday.org/
------------------------------
Date: Sat, 06 Apr 2013 09:01:55 -0600
From: Dennis <droesler@comcast.net>
Subject: Re: "walk over," and XPath-based substitutions?
Message-Id: <kjpdcf$fuj$1@speranza.aioe.org>
On 4/6/2013 5:32 AM, Ivan Shmakov wrote:
> [Cross-posting to news:comp.text.xml, yet omitting it from
> Followup-To:, for I'm primarily interested in Perl-based
> solutions.]
>
> Is there an easy way to invoke a particular piece of code for each XML
> node that satisfies an XPath expression from a given list?
>
> A simple-minded approach (based on XML::LibXML) could be like:
>
> require XML::LibXML;
XML::Twig http://search.cpan.org/~mirod/XML-Twig-3.42/Twig.pm
may be useful.
Dennis
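A minimal sketch of what the XML::Twig suggestion might look like,
assuming the XPath-like handler conditions are sufficient for the task.
XML::Twig parses the input as a stream and calls each matching handler
as soon as an element has been completely parsed. The element name
"item", the "id" attribute, and the @log array are hypothetical; only
the two attribute tests come from the original question.

```perl
use strict;
use warnings;
use XML::Twig;

my @log;

my $twig = XML::Twig->new(
    twig_handlers => {
        ## Handlers are called as ($twig, $element).
        'item[@foo="bar"]' =>
            sub { push @log, 'foo_bar:' . $_[1]->att('id'); },
        'item[@baz="qux"]' =>
            sub { push @log, 'baz_qux:' . $_[1]->att('id'); },
    },
);

$twig->parse(<<'XML');
<root>
  <item id="1" foo="bar"/>
  <item id="2" baz="qux"/>
  <item id="3"/>
</root>
XML

print "$_\n" for @log;
```

XML::Twig supports only a subset of XPath in handler conditions, so the
more general expressions would have to be rewritten or checked inside
the handlers.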
------------------------------
Date: Sat, 06 Apr 2013 12:11:40 -0400
From: Joe Kesselman <keshlam.cat.nospam@verizon.net>
Subject: Re: "walk over," and XPath-based substitutions?
Message-Id: <kjphb3$4uo$1@dont-email.me>
On 4/6/2013 7:32 AM, Ivan Shmakov wrote:
...
> However, AIUI, the code above implies that the XML tree is to be
> traversed multiple times.
First off, I'd suggest that you consider XSLT or XQuery, which are
specifically designed for this kind of find-and-process operation.
What you're looking for is a "streaming processor" -- one which rewrites
the complete set of operations into a state machine which can produce
its results in a single pass over the nodes. There are XPath/XSLT/XQuery
systems which attempt to do this for a subset of the query language -- I
think Xerces and the IBM XML parser have streaming-subset XPath
evaluators, and I know the DataPower "xml appliance" machines have some
limited XSLT streaming capability -- but even as subsets, those are
fairly rare, and while they may be able to reduce storage by not keeping
the entire document model in memory they may not reduce computational
load. If you're looking for something off-the-shelf, that's where I'd start.
A _good_ general solution for matching multiple paths in a single pass
over the document is NOT easy to create. You need to create a state
machine which tracks what has been seen so far and detects which nodes
match which expression, and at the same time you want to constrain the
tree walk so you don't waste time exploring trees which provably can't
contribute nodes to those results. Getting all those details right even
for the subset approach can be complicated. Reassembling the individual
results in the correct order to produce the intended result document
further complicates the process.
(I'm one of the authors of a patent on that topic, actually -- US
8,120,789 B2 -- but unfortunately our group didn't get the funding to
finish a product-quality implementation of that logic so it isn't
available for use. If someone wants to license the patent, I'm sure IBM
would be delighted to talk to you...)
--
Joe Kesselman,
http://www.love-song-productions.com/people/keshlam/index.html
{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
------------------------------
Date: Sat, 06 Apr 2013 13:50:27 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: reporting bugs
Message-Id: <8761zzvkd8.fsf@violet.siamics.net>
>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
> [alt.barcodes removed, since this is about Perl process]
(Not necessarily so.)
[...]
>> BTW, there's a longstanding bug filed at the CPAN RT [2] (along with
>> a patch.) However, it appears to be filed against libwww-perl,
>> while it actually belongs to Net-HTTP.
>> The question is: how do I reassign it?
>> [2] https://rt.cpan.org/Public/Bug/Display.html?id=29468
> You can't; in fact, it looks like the way rt.cpan.org is set up no one
> can move a ticket from one queue to another. The best you could do
> is file a separate bug against Net-HTTP, referencing the LWP bug; but
> since both dists are maintained by Gisle Aas I'm not sure there'd be
> much point.
... Which only makes it more surprising that it wasn't already
dealt with. (Especially given the simplicity of the patch.)
[...]
>> * the issue may indeed be specific to the distribution's build;
>> (naturally, building from the upstream sources for every bug I
>> report just to check that it wasn't introduced by the packagers is
>> hardly an option.)
> Obviously you have a different approach from me. I would consider
> building the latest upstream release from source, and probably the
> latest upstream equivalent of CVS HEAD, a basic prerequisite for
> reporting a bug. After all, it's almost certainly the first thing
> you'll be asked to do in any case, and a patch which doesn't apply to
> HEAD is probably nearly worthless.
Depending on the goals, it may or may not make sense to ever get
involved with the latest development version.
For instance, I'm occasionally employed by a local university,
to teach certain computer-related courses (mostly
short-term.) Should I discover an issue while preparing for
them, I'm most likely to report it to the developers. However,
distracting myself to write a patch -- which is unlikely to be
incorporated into the distribution I'll use (and recommend to
the students) by the time the courses will start -- may bring no
good to the courses themselves. In this case, clearly
documenting the issue and providing a work-around for the
students to use may constitute a better solution.
Similarly, while maintaining a few hosts under my
responsibility, I'd try to stick to the distribution-provided
software whenever possible, preferably the "stable" branch.
Given that patches other than security fixes won't generally be
accepted into Debian "stable," and that there're typically a
couple of years between releases...
Yet, indeed, I've made a few contributions to some Git HEADs.
(Most recently libtasn1, IIRC.)
> I suppose that in principle 'I'm using a distro; I'm paying them (or
> not) to sort out whose bug it is and get it fixed upstream' ought to
> be a reasonable argument, but in practice distros tend to be
> extremely unreliable about sending bugs upstream, probably because
> they have had their own share of flaky upstreams to deal with.
The best thing about Debian is that it's a community-based
project. (Which was the reason for me to choose it in the first
place.) Basically, the only privileges that the Debian
Developer status conveys are: to upload, and to vote.
Essentially, anyone (careful enough not to disrupt the
established order) is welcome to do this (or any other, for that
matter) part of the job. Why, (taking a glance over the latest
upstream stable releases) I've just forwarded Debian Bug#700617
and #700618 to CPAN RT#84467 and #84468, respectively.
(Hopefully, I did the thing right; this time.)
>> Alas, even for the Perl modules, the CPAN RT is not always the
>> preferred bug tracker. Consider, e. g.:
>> --cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --
>> Please report issues via github at
>> https://github.com/gbarr/perl-Convert-ASN1/issues
>> --cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --
> There are fields in META.{yml,json} which let a CPAN dist indicate
> where its preferred bugtracker is.
Indeed, these are set correctly in the current META.json.
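For illustration, a sketch of reading that field with the core JSON::PP
module. In version 2 of the CPAN::Meta::Spec, the bugtracker entry under
"resources" is a map that may carry "web" and "mailto" keys. The
META.json fragment below is a trimmed, hypothetical example, not the
actual Convert-ASN1 file, and the fallback text is an assumption.

```perl
use strict;
use warnings;
use JSON::PP;

## Hypothetical, heavily trimmed META.json content.
my $meta_json = <<'JSON';
{
   "name" : "Example-Dist",
   "resources" : {
      "bugtracker" : {
         "web" : "https://example.org/example-dist/issues"
      }
   }
}
JSON

my $meta    = decode_json($meta_json);
my $tracker = $meta->{resources}{bugtracker} || {};

## Prefer the web tracker, then the mail address, then a fallback.
my $where = $tracker->{web}
    // $tracker->{mailto}
    // 'rt.cpan.org (fallback)';

print "Report bugs at: $where\n";
```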
> search.cpan.org will honour these fields if they are present, so the
> 'View/Report Bugs' link on the page for Convert-ASN1 will take you to
> that github bugtracker. I don't believe there is currently any
> support for forwarding the bug-*@rt.cpan.org emails, though; this is
> at least in part because modules often outlive their original
> authors, and having somewhere to track bugs once the author has
> disappeared is useful.
My point is that GitHubs come and go, but the code remains.
Certainly, I'd prefer a service that could be easily "cloned,"
such as a Usenet newsgroup, a Git archive, or similar.
The Perl-based App::SD was intended to be just such a system.
Alas, it has seen virtually no development from mid-2011 to
late-2012. The situation seems to be slowly improving, though.
--
FSF associate member #7257 http://hfday.org/
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3917
***************************************