[32471] in Perl-Users-Digest

Perl-Users Digest, Issue: 3736 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jul 15 06:09:10 2012

Date: Sun, 15 Jul 2012 03:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 15 Jul 2012     Volume: 11 Number: 3736

Today's topics:
    Re: LibXML element->toString vs document->toString (Fergus McMenemie)
    Re: LibXML element->toString vs document->toString (Fergus McMenemie)
    Re: LibXML element->toString vs document->toString <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 14 Jul 2012 13:22:16 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <1kn84wg.cfkah5td411cN%fergus@twig-me-uk.not.here>

Ben Morrow <ben@morrow.me.uk> wrote:

> > > What are you actually trying to find out?
> > I have to pass references to DOM objects around all over the
> > place. I find I am having to make use of either documentElement()
> > or ownerDocument() depending on what I am doing. I would like to have
> > a consistent "pattern" for doing this. I would like to settle on
> > passing the document object around but it is annoying that I can't then
> > use toString.
> 
> I'm afraid I don't understand. When I run the original program I get the
> results I would have expected: the first prints the XML without the
> <?xml?>, the second prints it with it. What is going wrong for you?

Thanks for the tip. My code now reads:-

use strict;
use warnings;
use Encode;
use XML::LibXML;
binmode(STDOUT, ":utf8");

 my $src= join("",<DATA>);
    $src =~ s/\\x([0-9a-f][0-9a-f])/chr hex $1/egi;
    $src = Encode::decode "utf8", $src;
 print "LibXML VERSION=$XML::LibXML::VERSION\n";
 print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
 my $parser = XML::LibXML->new();
 my $x = $parser->parse_string($src)->documentElement();
 my $str=$x->toString(1);
 print "$str\n";
 print "string 1 is invalid \n" unless ( Encode::is_utf8($str,1) );

 $x = $parser->parse_string($src);
 $str=$x->toString(1);
 print "$str\n";
 print "string 2 is invalid \n" unless ( Encode::is_utf8($str,1) );

__DATA__
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<plugin name="\xef\xbd\xb1\xef\xbd\xb2\xef\xbd\xb3\xef\xbd\xb4\xef\xbd\xb5"></plugin>


It fails on my Mac running OS X Snow Leopard. The 'real' version runs
under perl 5.12 on CentOS and fails there too. Not sure about the
version of LibXML.

Does it work for you?



------------------------------

Date: Sat, 14 Jul 2012 14:10:59 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <1kn85n0.cb2fw5uewmyoN%fergus@twig-me-uk.not.here>

Ben Morrow <ben@morrow.me.uk> wrote:

> Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > Ben Morrow <ben@morrow.me.uk> wrote:
> > > Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > > > Hi, I have been driven mad by the following, which took ages to track
> > > > down. What is going on? It appears it is invalid to use toString on the
> > > > document object.
> > > > 
> > > > 
> > > > #! /usr/local/bin/perl -w
> > > > use strict;
> > > > use warnings;
> > > > use utf8;
> > > > use Encode;
> > > > use XML::LibXML;
> > > > binmode(STDOUT, ":utf8");
> > > > 
> > > >  my $src= join("",<DATA>);
> > > >  print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
> > > 
> > > Don't do that. Encode::is_utf8 checks the state of the SvUTF8 flag,
> > > which is internal to perl and none of your business. (The Encode
> > > documentation is not as clear about this as it might be, because it only
> > > became clear through experience that this is the only approach which
> > > works.)
> > 
> > Agreed, the warnings are there. However, it did appear to make the
> > issue clearer. This example is rather goofy and posting it to Usenet
> > added a few more wrinkles. My original code and the real program
> > contained the actual characters. However, my Usenet reader would not
> > let me post the real chars. Hence the octets.
> 
> It can certainly be difficult, given that Usenet officially doesn't
> support anything but ASCII. Unofficially, if you can get your newsreader
> to produce it, articles in UTF-8 with 'Content-type: text/plain;
> charset=UTF-8' seem to work perfectly well.
> 
> Another thing you can do is explicitly decode the data in the program
> you post; possibly something like
> 
>     my $str = <DATA>;
>     $str =~ s/%([0-9a-f][0-9a-f])/chr hex $1/egi;
>     $str = Encode::decode "utf8", $str;
> 
> This uses URL-encoding rather than backslashes; you can pick whatever is
> convenient for the data you are trying to post.
> 
> > My issue is that document->toString does not appear to work. Please
> > ignore the use of is_utf8.
> 
> OK.
> 
> > > What are you actually trying to find out?
> > I have to pass references to DOM objects around all over the
> > place. I find I am having to make use of either documentElement()
> > or ownerDocument() depending on what I am doing. I would like to have
> > a consistent "pattern" for doing this. I would like to settle on
> > passing the document object around but it is annoying that I can't then
> > use toString.
> 
> I'm afraid I don't understand. When I run the original program I get the
> results I would have expected: the first prints the XML without the
> <?xml?>, the second prints it with it. What is going wrong for you?
> 
> Ben
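
The point about Encode::is_utf8 quoted above can be seen with a short
sketch. It assumes only the core Encode module and the same five
characters used as test data in this thread; the helper is_valid_utf8
is a hypothetical name introduced here for illustration. The flag
reports how perl happens to store a string internally, not whether the
string holds valid text, so a strict decode is one way to test
validity instead.

```perl
use strict;
use warnings;
use Encode ();

# The five characters from the thread's test data, as raw UTF-8 octets.
my $octets = "\xef\xbd\xb1\xef\xbd\xb2\xef\xbd\xb3\xef\xbd\xb4\xef\xbd\xb5";

# Decoding turns the 15 octets into 5 characters.
my $chars = Encode::decode('UTF-8', $octets);

printf "octets: flag=%d length=%d\n",
    Encode::is_utf8($octets) ? 1 : 0, length $octets;   # flag=0 length=15
printf "chars:  flag=%d length=%d\n",
    Encode::is_utf8($chars) ? 1 : 0, length $chars;     # flag=1 length=5

# The flag is an internal storage detail: these two strings compare
# equal, yet only the upgraded copy has the flag set.
my $plain    = "caf\xe9";
my $upgraded = $plain;
utf8::upgrade($upgraded);
print "equal strings, different flags\n"
    if $plain eq $upgraded
    && !Encode::is_utf8($plain)
    && Encode::is_utf8($upgraded);

# Hypothetical helper: check validity by attempting a strict decode
# on a copy of the octets (decode with a CHECK value may modify its input).
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

print "valid\n"   if is_valid_utf8($octets);
print "invalid\n" if !is_valid_utf8("\xff\xfe");
```

Because is_valid_utf8 works on its own copy of the argument, the
caller's string is left untouched.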


------------------------------

Date: Sat, 14 Jul 2012 17:20:47 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <vri8d9-vup.ln1@anubis.morrow.me.uk>


Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> Ben Morrow <ben@morrow.me.uk> wrote:
> 
> > > > What are you actually trying to find out?
> > > I have to pass references to DOM objects around all over the
> > > place. I find I am having to make use of either documentElement()
> > > or ownerDocument() depending on what I am doing. I would like to have
> > > a consistent "pattern" for doing this. I would like to settle on
> > > passing the document object around but it is annoying that I can't then
> > > use toString.
> > 
> > I'm afraid I don't understand. When I run the original program I get the
> > results I would have expected: the first prints the XML without the
> > <?xml?>, the second prints it with it. What is going wrong for you?
<snip>
> 
> It fails on my Mac running OS X Snow Leopard. The 'real' version runs
> under perl 5.12 on CentOS and fails there too. Not sure about the
> version of LibXML.
> 
> Does it work for you?

Yes, it works as documented for me. Are you getting confused by the fact
that ->toString produces a byte string for whole documents, but a
character string for just an element? Read the 'ENCODINGS SUPPORT'
section in perldoc XML::LibXML: you don't want a :utf8 layer if you're
printing a whole document, because the document isn't necessarily in
UTF-8.

Ben
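
The distinction described above can be sketched as follows. This is a
minimal illustration, assuming XML::LibXML is installed and behaves as
the reply describes (bytes from the document's toString, characters
from an element's toString). The variable names are made up for the
example and the test data is shortened to three characters. The idea
is to leave STDOUT without a :utf8 layer, print the document's bytes
unchanged, and encode the element's character string explicitly.

```perl
use strict;
use warnings;
use Encode ();
use XML::LibXML;

# Raw octets, as they would arrive from a file: declaration plus UTF-8 data.
my $xml = qq{<?xml version="1.0" encoding="utf-8" standalone="no"?>\n}
        . qq{<plugin name="\xef\xbd\xb1\xef\xbd\xb2\xef\xbd\xb3"></plugin>\n};

my $doc = XML::LibXML->new->parse_string($xml);

# Whole document: a byte string, already in the document's own encoding.
my $doc_bytes = $doc->toString(1);

# Single element: a Perl character string, not yet encoded.
my $elem_chars = $doc->documentElement->toString(1);

# No :utf8 layer here, so the document's bytes pass through untouched.
binmode STDOUT, ':raw';
print $doc_bytes;

# The element's characters must be encoded before they reach a raw handle.
print Encode::encode('UTF-8', $elem_chars), "\n";
```

Printing $doc_bytes through a :utf8 layer would encode bytes that are
already encoded, which matches the doubled-up output one would expect
from the program earlier in the thread.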



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3736
***************************************

