[32471] in Perl-Users-Digest
Perl-Users Digest, Issue: 3736 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jul 15 06:09:10 2012
Date: Sun, 15 Jul 2012 03:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 15 Jul 2012 Volume: 11 Number: 3736
Today's topics:
Re: LibXML element->toString vs document->toString (Fergus McMenemie)
Re: LibXML element->toString vs document->toString (Fergus McMenemie)
Re: LibXML element->toString vs document->toString <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 14 Jul 2012 13:22:16 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <1kn84wg.cfkah5td411cN%fergus@twig-me-uk.not.here>
Ben Morrow <ben@morrow.me.uk> wrote:
> > > What are you actually trying to find out?
> > I have to pass references to DOM objects around all over the
> > place. I find I am having to make use of either documentElement()
> > or ownerDocument() depending on what I am doing. I would like to have
> > a consistent "pattern" for doing this. I would like to settle on
> > passing the document object around, but it is annoying that I can't
> > then use toString.
>
> I'm afraid I don't understand. When I run the original program I get the
> results I would have expected: the first prints the XML without the
> <?xml?>, the second prints it with it. What is going wrong for you?
Thanks for the tip. My code now reads:
use strict;
use warnings;
use Encode;
use XML::LibXML;
binmode(STDOUT, ":utf8");
# Rebuild the raw UTF-8 octets from the \xNN escapes in __DATA__,
# then decode them into a character string.
my $src= join("",<DATA>);
$src =~ s/\\x([0-9a-f][0-9a-f])/chr hex $1/egi;
$src = Encode::decode "utf8", $src;
print "LibXML VERSION=$XML::LibXML::VERSION\n";
print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
my $parser = XML::LibXML->new();
# Case 1: serialize just the root element (no <?xml?> declaration).
my $x = $parser->parse_string($src)->documentElement();
my $str=$x->toString(1);
print "$str\n";
print "string 1 is invalid \n" unless ( Encode::is_utf8($str,1) );
# Case 2: serialize the whole document (with the <?xml?> declaration).
$x = $parser->parse_string($src);
$str=$x->toString(1);
print "$str\n";
print "string 2 is invalid \n" unless ( Encode::is_utf8($str,1) );
__DATA__
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<plugin
name="\xef\xbd\xb1\xef\xbd\xb2\xef\xbd\xb3\xef\xbd\xb4\xef\xbd\xb5"></plugin>
It fails on my Mac running OS X Snow Leopard. The 'real' version
runs under perl 5.12 on CentOS and fails there too. I'm not sure
about the version of LibXML.
Does it work for you?
------------------------------
Date: Sat, 14 Jul 2012 14:10:59 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <1kn85n0.cb2fw5uewmyoN%fergus@twig-me-uk.not.here>
Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > Ben Morrow <ben@morrow.me.uk> wrote:
> > > Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > > > Hi, I have been driven mad by the following, which took ages to track
> > > > down. What is going on? It appears that using toString on the
> > > > document object is invalid.
> > > >
> > > >
> > > > #! /usr/local/bin/perl -w
> > > > use strict;
> > > > use warnings;
> > > > use utf8;
> > > > use Encode;
> > > > use XML::LibXML;
> > > > binmode(STDOUT, ":utf8");
> > > >
> > > > my $src= join("",<DATA>);
> > > > print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
> > >
> > > Don't do that. Encode::is_utf8 checks the state of the SvUTF8 flag,
> > > which is internal to perl and none of your business. (The Encode
> > > documentation is not as clear about this as it might be, because it only
> > > became clear through experience that this is the only approach which
> > > works.)
> >
> > Agreed, the warnings are there. However, it did appear to make the
> > issue clearer. This example is rather goofy, and posting it to Usenet
> > added a few more wrinkles. My original code and the real program
> > contained the actual characters, but my Usenet reader would not
> > let me post them. Hence the octets.
>
> It can certainly be difficult, given that Usenet officially doesn't
> support anything but ASCII. Unofficially, if you can get your newsreader
> to produce it, articles in UTF-8 with 'Content-type: text/plain;
> charset=UTF-8' seem to work perfectly well.
>
> Another thing you can do is explicitly decode the data in the program
> you post; possibly something like
>
> my $str = <DATA>;
> $str =~ s/%([0-9a-f][0-9a-f])/chr hex $1/egi;
> $str = Encode::decode "utf8", $str;
>
> This uses URL-encoding rather than backslashes; you can pick whatever is
> convenient for the data you are trying to post.
>
> > My issue is that document->toString does not appear to work. Please
> > ignore the use of is_utf8.
>
> OK.
>
> > > What are you actually trying to find out?
> > I have to pass references to DOM objects around all over the
> > place. I find I am having to make use of either documentElement()
> > or ownerDocument() depending on what I am doing. I would like to have
> > a consistent "pattern" for doing this. I would like to settle on
> > passing the document object around, but it is annoying that I can't
> > then use toString.
>
> I'm afraid I don't understand. When I run the original program I get the
> results I would have expected: the first prints the XML without the
> <?xml?>, the second prints it with it. What is going wrong for you?
>
> Ben
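
Ben's advice above is to stop asking Encode::is_utf8 about validity, because it only reports perl's internal SvUTF8 flag. If the real question is "are these bytes well-formed UTF-8?", one way to answer it is a strict decode that dies on malformed input. The following is a minimal sketch assuming the input is a byte string; the helper name is_valid_utf8_bytes is hypothetical and not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
# It answers the question directly by attempting a strict decode, instead of
# inspecting perl's internal SvUTF8 flag the way Encode::is_utf8 does.
sub is_valid_utf8_bytes {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops it
    # from modifying the string it was given. eval turns the die into undef.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# The first string is the three UTF-8 bytes of U+FF71 (the first character
# in the sample data); the second starts with \xFF, which never occurs in UTF-8.
for my $bytes ("\xEF\xBD\xB1", "\xFF\xFE") {
    printf "%s => %s\n", unpack('H*', $bytes),
        is_valid_utf8_bytes($bytes) ? 'valid' : 'invalid';
}
```

This prints "efbdb1 => valid" and "fffe => invalid". A strict decode is used because it checks the data itself, which is what the is_utf8($src, 1) lines in the program were presumably meant to do.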
------------------------------
Date: Sat, 14 Jul 2012 17:20:47 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <vri8d9-vup.ln1@anubis.morrow.me.uk>
Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> Ben Morrow <ben@morrow.me.uk> wrote:
>
> > > > What are you actually trying to find out?
> > > I have to pass references to DOM objects around all over the
> > > place. I find I am having to make use of either documentElement()
> > > or ownerDocument() depending on what I am doing. I would like to have
> > > a consistent "pattern" for doing this. I would like to settle on
> > > passing the document object around, but it is annoying that I can't
> > > then use toString.
> >
> > I'm afraid I don't understand. When I run the original program I get the
> > results I would have expected: the first prints the XML without the
> > <?xml?>, the second prints it with it. What is going wrong for you?
<snip>
>
> It fails on my Mac running OS X Snow Leopard. The 'real' version
> runs under perl 5.12 on CentOS and fails there too. I'm not sure
> about the version of LibXML.
>
> Does it work for you?
Yes, it works as documented for me. Are you getting confused by the fact
that ->toString produces a byte string for whole documents, but a
character string for just an element? Read the 'ENCODINGS SUPPORT'
section in perldoc XML::LibXML: you don't want a :utf8 layer if you're
printing a whole document, because its toString output is already
encoded bytes (in the document's own encoding, which isn't necessarily
UTF-8), and a :utf8 layer would encode them a second time.
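
To see why the output layer matters, here is a sketch that uses only the core Encode module (no XML::LibXML needed) to mimic the two cases with one character, U+FF71 (HALFWIDTH KATAKANA LETTER A), the first character in the sample data. It assumes that what a :utf8 layer does to a string can be modelled as encode('UTF-8', ...): every character, including each byte 0x80-0xFF of a byte string, is written out as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{FF71}";              # character string, like element->toString
my $bytes = encode('UTF-8', $chars); # byte string, like document->toString

# Model of printing each string through a :utf8 layer.
my $from_chars = encode('UTF-8', $chars); # EF BD B1: correct
my $from_bytes = encode('UTF-8', $bytes); # each byte re-encoded: mojibake

printf "chars through :utf8 => %s\n", unpack('H*', $from_chars);
printf "bytes through :utf8 => %s\n", unpack('H*', $from_bytes);

# Decoding the bytes back to characters first restores the correct output.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));
printf "decoded then layered => %s\n", unpack('H*', $fixed);
```

The first and last lines print efbdb1, while the byte string comes out as c3afc2bdc2b1, six bytes for one character. In the program above that corresponds to printing the whole document's toString result to a handle without the :utf8 layer, or decoding it (from the document's encoding, UTF-8 in this sample) before printing it through the layer.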
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3736
***************************************