[32641] in Perl-Users-Digest


Perl-Users Digest, Issue: 3917 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 7 05:17:27 2013

Date: Sun, 7 Apr 2013 02:17:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 7 Apr 2013     Volume: 11 Number: 3917

Today's topics:
        "walk over," and XPath-based substitutions? <oneingray@gmail.com>
    Re: "walk over," and XPath-based substitutions? <droesler@comcast.net>
    Re: "walk over," and XPath-based substitutions? <keshlam.cat.nospam@verizon.net>
    Re: reporting bugs <oneingray@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 06 Apr 2013 11:32:11 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: "walk over," and XPath-based substitutions?
Message-Id: <871uanx5c4.fsf@violet.siamics.net>

	[Cross-posting to news:comp.text.xml, yet omitting it from
	Followup-To:, for I'm primarily interested in Perl-based
	solutions.]

	Is there an easy way to invoke particular code for each XML
	node that satisfies an XPath expression from a given list?

	A simple-minded approach (based on XML::LibXML) could be like:

require XML::LibXML;

## NB: a hash is initialized from a list, hence the parentheses
my %xpath_sub = (
    q {//node ()[@foo = "bar"]} => \&foo_bar,
    q {//node ()[@baz = "qux"]} => sub { baz ("qux", @_); }
);

foreach my $xpath (keys (%xpath_sub)) {
    my $sub
        = $xpath_sub{$xpath};
    foreach my $node ($context->findnodes ($xpath)) {
        $sub->($node);
    }
}

	However, AIUI, the code above implies that the XML tree is to be
	traversed multiple times.  Which could probably be avoided by
	traversing the tree explicitly, as in:

sub traverse {
    my ($node, $xsubs) = @_;
    foreach my $xpath (keys (%$xsubs)) {
        next
            unless ($node->find ($xpath));
        ## FIXME: check if the result is a boolean?
        $xsubs->{$xpath}->($node);
        ## FIXME: there, one may wish for a recursion; or not
    }
    ## recurse over the children
    foreach my $child ($node->childNodes ()) {
        traverse ($child, $xsubs);
    }
    ## .
}

	Still, it may repeatedly traverse the children of $node while
	computing ->find () for each of the XPath expressions.  (Unlike
	the way an "optimized," or "compiled," regular expression would
	be handled, IIUC.)
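
	A minimal sketch of the explicit traversal, under two
	assumptions that are not in the code above: the expressions are
	rewritten to test the visited node itself (self::) instead of
	searching from the root (//), and each one is compiled once via
	XML::LibXML::XPathExpression.  The sample document and the
	handlers are hypothetical.  Every expression is still evaluated
	once per node, but each evaluation only inspects that node.

```perl
use strict;
use warnings;
use XML::LibXML;

## hypothetical sample document, for illustration only
my $doc = XML::LibXML->load_xml (string => <<'XML');
<root>
  <a foo="bar"/>
  <b baz="qux"><c foo="bar"/></b>
</root>
XML

my @seen;
## each test is relative to the visited node (self::), and is
## compiled only once
my @rules = (
    [ XML::LibXML::XPathExpression->new ('self::node ()[@foo = "bar"]'),
      sub { push (@seen, "foo_bar:" . $_[0]->nodeName ()); } ],
    [ XML::LibXML::XPathExpression->new ('self::node ()[@baz = "qux"]'),
      sub { push (@seen, "baz_qux:" . $_[0]->nodeName ()); } ],
);

sub traverse {
    my ($node, $rules) = @_;
    foreach my $rule (@$rules) {
        my ($expr, $sub) = @$rule;
        ## ->exists () is true if the node set is non-empty
        $sub->($node)
            if ($node->exists ($expr));
    }
    ## recurse over the children (text nodes included)
    foreach my $child ($node->childNodes ()) {
        traverse ($child, $rules);
    }
}

traverse ($doc->documentElement (), \@rules);
print (join ("\n", @seen), "\n");
```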

	The question is: does LibXML (or some other library) provide a
	way to make such a task both simpler to code and more efficient
	on execution?

	... Or do I "optimize" all the XPath expressions themselves into
	a single one somehow?
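
	One possible reading of "a single one," as a sketch only: keep
	bare predicates (rather than full paths) as the hash keys, join
	them with "or" into one query, and then dispatch per matching
	node.  The %pred_sub table and the sample document are
	hypothetical, and whether libxml2 evaluates the combined
	expression any faster is not something this sketch shows.

```perl
use strict;
use warnings;
use XML::LibXML;

## hypothetical sample document, for illustration only
my $doc = XML::LibXML->load_xml (string =>
    '<root><a foo="bar"/><b baz="qux"><c foo="bar"/></b><d/></root>');

my @log;
## hypothetical predicate => handler table
my %pred_sub = (
    '@foo = "bar"' => sub { push (@log, "foo:" . $_[0]->nodeName ()); },
    '@baz = "qux"' => sub { push (@log, "baz:" . $_[0]->nodeName ()); },
);
my @preds = sort (keys (%pred_sub));

## one query selecting every element satisfying any predicate
my $any = "//*[" . join (" or ", map { "($_)" } @preds) . "]";

foreach my $node ($doc->findnodes ($any)) {
    ## find out which of the predicates hold for this very node
    foreach my $pred (@preds) {
        $pred_sub{$pred}->($node)
            if ($node->exists ("self::*[$pred]"));
    }
}
print (join ("\n", @log), "\n");
```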

	TIA.

-- 
FSF associate member #7257	http://hfday.org/


------------------------------

Date: Sat, 06 Apr 2013 09:01:55 -0600
From: Dennis <droesler@comcast.net>
Subject: Re: "walk over," and XPath-based substitutions?
Message-Id: <kjpdcf$fuj$1@speranza.aioe.org>

On 4/6/2013 5:32 AM, Ivan Shmakov wrote:
> 	[Cross-posting to news:comp.text.xml, yet omitting it from
> 	Followup-To:, for I'm primarily interested in Perl-based
> 	solutions.]
>
> 	Is there an easy way to invoke particular code for each XML
> 	node that satisfies an XPath expression from a given list?
>
> 	A simple-minded approach (based on XML::LibXML) could be like:
>
> require XML::LibXML;

XML::Twig http://search.cpan.org/~mirod/XML-Twig-3.42/Twig.pm
may be useful.
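
A rough sketch of how that might look with XML::Twig, which calls the
registered handlers while it parses the document once.  The element
name "item", the "id" attribute and the sample input are made up for
the example; only the twig_handlers mechanism itself comes from the
module.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; element and attribute names are illustrative.
my $xml = '<root><item id="1" foo="bar"/>'
    . '<item id="2" baz="qux"/><item id="3"/></root>';

my @log;
my $twig = XML::Twig->new (
    twig_handlers => {
        # Each handler is called with the twig and the matched element
        # once that element has been completely parsed.
        'item[@foo="bar"]' => sub {
            my ($t, $elt) = @_;
            push (@log, "foo:" . $elt->att ("id"));
        },
        'item[@baz="qux"]' => sub {
            my ($t, $elt) = @_;
            push (@log, "baz:" . $elt->att ("id"));
        },
    },
);
$twig->parse ($xml);
print (join ("\n", @log), "\n");
```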

Dennis



------------------------------

Date: Sat, 06 Apr 2013 12:11:40 -0400
From: Joe Kesselman <keshlam.cat.nospam@verizon.net>
Subject: Re: "walk over," and XPath-based substitutions?
Message-Id: <kjphb3$4uo$1@dont-email.me>

On 4/6/2013 7:32 AM, Ivan Shmakov wrote:
 ...
> 	However, AIUI, the code above implies that the XML tree is to be
> 	traversed multiple times.

First off, I'd suggest that you consider XSLT or XQuery, which are 
specifically designed for this kind of find-and-process operation.

What you're looking for is a "streaming processor" -- one which rewrites 
the complete set of operations into a state machine which can produce 
its results in a single pass over the nodes. There are XPath/XSLT/XQuery 
systems which attempt to do this for a subset of the query language -- I 
think Xerces and the IBM XML parser have streaming-subset XPath 
evaluators, and I know the DataPower "xml appliance" machines have some 
limited XSLT streaming capability -- but even as subsets, those are 
fairly rare, and while they may be able to reduce storage by not keeping 
the entire document model in memory they may not reduce computational 
load. If you're looking for something off-the-shelf, that's where I'd start.

A _good_ general solution for matching multiple paths in a single pass 
over the document is NOT easy to create. You need to create a state 
machine which tracks what has been seen so far and detects which nodes 
match which expression, and at the same time you want to constrain the 
tree walk so you don't waste time exploring trees which provably can't 
contribute nodes to those results. Getting all those details right even 
for the subset approach can be complicated. Reassembling the individual 
results in the correct order to produce the intended result document 
further complicates the process.
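
To make the single-pass idea concrete for the simplest possible subset
(tests on an element's own attributes only), here is a sketch using the
forward-only XML::LibXML::Reader interface.  It is nowhere near the
general state machine described above; the %attr_sub table and the
sample input are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical input and attribute tests, for illustration only.
my $xml = '<root><a foo="bar"/><b baz="qux"><c foo="bar"/></b><d/></root>';

my @log;
# attribute name => [ wanted value, handler ]
my %attr_sub = (
    foo => [ "bar", sub { push (@log, "foo:$_[0]"); } ],
    baz => [ "qux", sub { push (@log, "baz:$_[0]"); } ],
);

my $reader = XML::LibXML::Reader->new (string => $xml)
    or die ("cannot create the reader\n");

# The reader only moves forward, so each node is visited exactly once.
while ($reader->read ()) {
    next
        unless ($reader->nodeType () == XML_READER_TYPE_ELEMENT);
    foreach my $attr (sort (keys (%attr_sub))) {
        my ($want, $sub) = @{$attr_sub{$attr}};
        my $have = $reader->getAttribute ($attr);
        $sub->($reader->name ())
            if (defined ($have) && $have eq $want);
    }
}
print (join ("\n", @log), "\n");
```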

(I'm one of the authors of a patent on that topic, actually -- US 
8,120,789 B2 -- but unfortunately our group didn't get the funding to 
finish a product-quality implementation of that logic so it isn't 
available for use. If someone wants to license the patent, I'm sure IBM 
would be delighted to talk to you...)





-- 
Joe Kesselman, 
http://www.love-song-productions.com/people/keshlam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail!  | "Put down the squeezebox & nobody gets hurt."


------------------------------

Date: Sat, 06 Apr 2013 13:50:27 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: reporting bugs
Message-Id: <8761zzvkd8.fsf@violet.siamics.net>

>>>>> Ben Morrow <ben@morrow.me.uk> writes:
>>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:

 > [alt.barcodes removed, since this is about Perl process]

	(Not necessarily so.)

[...]

 >> BTW, there's a longstanding bug filed at the CPAN RT [2] (along with
 >> a patch.)  However, it appears to be filed against libwww-perl,
 >> while it actually belongs to Net-HTTP.

 >> The question is: how do I reassign it?

 >> [2] https://rt.cpan.org/Public/Bug/Display.html?id=29468

 > You can't; in fact, it looks like the way rt.cpan.org is set up no one
 > can move a ticket from one queue to another.  The best you could do
 > is file a separate bug against Net-HTTP, referencing the LWP bug; but
 > since both dists are maintained by Gisle Aas I'm not sure there'd be
 > much point.

	... Which only makes it more surprising that it wasn't already
	dealt with.  (Especially given the simplicity of the patch.)

[...]

 >> * the issue may indeed be specific to the distribution's build;
 >> (naturally, building from the upstream sources for every bug I
 >> report just to check that it wasn't introduced by the packagers is
 >> hardly an option.)

 > Obviously you have a different approach from me.  I would consider
 > building the latest upstream release from source, and probably the
 > latest upstream equivalent of CVS HEAD, a basic prerequisite for
 > reporting a bug.  After all, it's almost certainly the first thing
 > you'll be asked to do in any case, and a patch which doesn't apply to
 > HEAD is probably nearly worthless.

	Depending on the goals, it may or may not make sense to ever get
	involved with the latest development version.

	For instance, I'm occasionally employed by a local university,
	to carry out certain computer-related courses (mostly
	short-term.)  Should I discover an issue while preparing for
	them, I'm most likely to report it to the developers.  However,
	distracting myself to write a patch -- which is unlikely to be
	incorporated into the distribution I'll use (and recommend to
	the students) by the time the courses start -- may bring no
	good to the courses themselves.  In this case, clearly
	documenting the issue and providing a work-around for the
	students to use may constitute a better solution.

	Similarly, while maintaining a few hosts under my
	responsibility, I'd try to stick to the distribution-provided
	software whenever possible, preferably the "stable" branch.
	Given that patches other than security fixes won't generally be
	accepted into Debian "stable," and that there're typically a
	couple of years between releases...

	Yet, indeed, I've made a few contributions to some Git HEADs.
	(Most recently libtasn1, IIRC.)

 > I suppose that in principle 'I'm using a distro; I'm paying them (or
 > not) to sort out whose bug it is and get it fixed upstream' ought to
 > be a reasonable argument, but in practice distros tend to be
 > extremely unreliable about sending bugs upstream, probably because
 > they have had their own share of flaky upstreams to deal with.

	The best thing about Debian is that it's a community-based
	project.  (Which was the reason for me to choose it in the first
	place.)  Basically, the only privileges that the Debian
	Developer status conveys are: to upload, and to vote.

	Essentially, anyone (careful enough not to disrupt the
	established order) is welcome to do this (or any other, for that
	matter) part of the job.  Why, (taking a glance over the latest
	upstream stable releases) I've just forwarded Debian Bug#700617
	and #700618 to CPAN RT#84467 and #84468, respectively.

	(Hopefully, I did the thing right; this time.)

 >> Alas, even for the Perl modules, the CPAN RT is not always the
 >> preferred bug tracker.  Consider, e. g.:

 >> --cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --

 >> Please report issues via github at
 >> https://github.com/gbarr/perl-Convert-ASN1/issues

 >> --cut: https://rt.cpan.org/Public/Bug/Display.html?id=79999 --

 > There are fields in META.{yml,json} which let a CPAN dist indicate
 > where its preferred bugtracker is.

	Indeed, these are set correctly in the current META.json.
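
	For reference, a small sketch of reading that field.  The
	"resources" / "bugtracker" / "web" layout follows the CPAN META
	specification, version 2; the JSON fragment itself is abridged
	and hypothetical, not a copy of the actual Convert-ASN1
	META.json.

```perl
use strict;
use warnings;
use JSON::PP;

## abridged, hypothetical META.json fragment
my $meta_json = <<'JSON';
{
   "name" : "Convert-ASN1",
   "resources" : {
      "bugtracker" : {
         "web" : "https://github.com/gbarr/perl-Convert-ASN1/issues"
      }
   }
}
JSON

my $meta = decode_json ($meta_json);
## the preferred bug tracker, if the distribution declares one
my $tracker = $meta->{resources}{bugtracker}{web};
print ((defined ($tracker) ? $tracker : "(none declared)"), "\n");
```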

 > search.cpan.org will honour these fields if they are present, so the
 > 'View/Report Bugs' link on the page for Convert-ASN1 will take you to
 > that github bugtracker.  I don't believe there is currently any
 > support for forwarding the bug-*@rt.cpan.org emails, though; this is
 > at least in part because modules often outlive their original
 > authors, and having somewhere to track bugs once the author has
 > disappeared is useful.

	My point is that GitHubs come and go, but the code remains.
	Certainly, I'd prefer a service that could be easily "cloned,"
	such as a Usenet newsgroup, a Git archive, or similar.

	The Perl-based App::SD was intended to be just such a system.
	Alas, it has seen virtually no development from mid-2011 to
	late-2012.  The situation seems to be slowly improving, though.

-- 
FSF associate member #7257	http://hfday.org/


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3917
***************************************
