[32470] in Perl-Users-Digest


Perl-Users Digest, Issue: 3735 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jul 14 06:09:24 2012

Date: Sat, 14 Jul 2012 03:09:13 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 14 Jul 2012     Volume: 11 Number: 3735

Today's topics:
        LibXML element->toString vs document->toString (Fergus McMenemie)
    Re: LibXML element->toString vs document->toString <ben@morrow.me.uk>
    Re: LibXML element->toString vs document->toString (Fergus McMenemie)
    Re: LibXML element->toString vs document->toString <ben@morrow.me.uk>
    Re: Output buffering on IIS7 brillisoft@gmail.com
        Regex: match double OR single quote <jwcarlton@gmail.com>
    Re: Regex: match double OR single quote <ben@morrow.me.uk>
    Re: Regex: match double OR single quote <jwcarlton@gmail.com>
    Re: Regex: match double OR single quote <ben@morrow.me.uk>
    Re: Traversing folders <hansmu@xs4all.nl>
    Re: Traversing folders <rweikusat@mssgmbh.com>
    Re: Traversing folders <news@lawshouse.org>
    Re: Traversing folders <dirk.devos@usa.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 12 Jul 2012 05:46:54 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: LibXML element->toString vs document->toString
Message-Id: <1kn3bmi.1oxspf51518dzeN%fergus@twig-me-uk.not.here>

Hi, I have been driven mad by the following, which took ages to track
down. What is going on? It appears it is invalid to use toString on the
document object.


#! /usr/local/bin/perl -w
use strict;
use warnings;
use utf8;
use Encode;
use XML::LibXML;
binmode(STDOUT, ":utf8");

 my $src= join("",<DATA>);
 print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
 my $parser = XML::LibXML->new();
 my $x = $parser->parse_string($src)->documentElement();
 my $str=$x->toString(1);
 print "$str\n";
 print "string 1 is invalid \n" unless ( Encode::is_utf8($str,1) );

 $x = $parser->parse_string($src);
 $str=$x->toString(1);
 print "$str\n";
 print "string 2 is invalid \n" unless ( Encode::is_utf8($str,1) );

__DATA__
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<plugin name="\xc5\x81"></plugin>


------------------------------

Date: Thu, 12 Jul 2012 07:29:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <6f72d9-jtr1.ln1@anubis.morrow.me.uk>


Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> Hi, I have been driven mad by the following, which took ages to track
> down. What is going on? It appears it is invalid to use toString on the
> document object.
> 
> 
> #! /usr/local/bin/perl -w
> use strict;
> use warnings;
> use utf8;
> use Encode;
> use XML::LibXML;
> binmode(STDOUT, ":utf8");
> 
>  my $src= join("",<DATA>);
>  print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );

Don't do that. Encode::is_utf8 checks the state of the SvUTF8 flag,
which is internal to perl and none of your business. (The Encode
documentation is not as clear about this as it might be, because it only
became clear through experience that this is the only approach which
works.)
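
If the underlying question is whether some octets are well-formed UTF-8, one way to ask it without looking at the internal flag is to attempt a strict decode and see whether it fails. This is only a sketch; the helper name is_valid_utf8_octets is made up for illustration and is not part of Encode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): true if $octets decodes
# cleanly as strict UTF-8, false otherwise.
sub is_valid_utf8_octets {
    my ($octets) = @_;
    my $copy = $octets;    # decode() may modify its argument when CHECK is set
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Two-byte UTF-8 sequence for U+0141.
print is_valid_utf8_octets("\xc5\x81") ? "well-formed\n" : "malformed\n";

# Lead byte followed by something that is not a continuation byte.
print is_valid_utf8_octets("\xc5A") ? "well-formed\n" : "malformed\n";
```

The check works on the data itself, so the answer does not depend on how perl happens to be storing the string.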

What are you actually trying to find out?

Ben



------------------------------

Date: Fri, 13 Jul 2012 16:59:03 +0100
From: fergus@twig-me-uk.not.here (Fergus McMenemie)
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <1kn5vzn.1ek9bkp1kx6liyN%fergus@twig-me-uk.not.here>

Ben Morrow <ben@morrow.me.uk> wrote:

> Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > Hi, I have been driven mad by the following, which took ages to track
> > down. What is going on? It appears it is invalid to use toString on the
> > document object.
> > 
> > 
> > #! /usr/local/bin/perl -w
> > use strict;
> > use warnings;
> > use utf8;
> > use Encode;
> > use XML::LibXML;
> > binmode(STDOUT, ":utf8");
> > 
> >  my $src= join("",<DATA>);
> >  print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
> 
> Don't do that. Encode::is_utf8 checks the state of the SvUTF8 flag,
> which is internal to perl and none of your business. (The Encode
> documentation is not as clear about this as it might be, because it only
> became clear through experience that this is the only approach which
> works.)

Agreed, the warnings are there. However it did appear to make the
issue clearer. This example is rather goofy and posting it to USEnet
added a few more wrinkles. My original code and the real program
contained the actual characters. However my USEnet reader would not
let me post the real chars. Hence the octets.

My issue is that document->toString does not appear to work. Please
ignore the use of is_utf8.

> What are you actually trying to find out?
I have to pass references to DOM objects around all over the
place. I find I am having to make use of either documentElement()
or ownerDocument() depending on what I am doing. I would like to have
a consistent "pattern" for doing this. I would like to settle on
passing the document object around but it is annoying that I can't then
use toString.


------------------------------

Date: Fri, 13 Jul 2012 17:51:33 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: LibXML element->toString vs document->toString
Message-Id: <l906d9-niu2.ln1@anubis.morrow.me.uk>


Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> Ben Morrow <ben@morrow.me.uk> wrote:
> > Quoth fergus@twig-me-uk.not.here (Fergus McMenemie):
> > > Hi, I have been driven mad by the following, which took ages to track
> > > down. What is going on? It appears it is invalid to use toString on the
> > > document object.
> > > 
> > > 
> > > #! /usr/local/bin/perl -w
> > > use strict;
> > > use warnings;
> > > use utf8;
> > > use Encode;
> > > use XML::LibXML;
> > > binmode(STDOUT, ":utf8");
> > > 
> > >  my $src= join("",<DATA>);
> > >  print "string \$src is invalid \n" unless ( Encode::is_utf8($src,1) );
> > 
> > Don't do that. Encode::is_utf8 checks the state of the SvUTF8 flag,
> > which is internal to perl and none of your business. (The Encode
> > documentation is not as clear about this as it might be, because it only
> > became clear through experience that this is the only approach which
> > works.)
> 
> Agreed, the warnings are there. However it did appear to make the
> issue clearer. This example is rather goofy and posting it to USEnet
> added a few more wrinkles. My original code and the real program
> contained the actual characters. However my USEnet reader would not
> let me post the real chars. Hence the octets.

It can certainly be difficult, given that Usenet officially doesn't
support anything but ASCII. Unofficially, if you can get your newsreader
to produce it, articles in UTF-8 with 'Content-type: text/plain;
charset=UTF-8' seem to work perfectly well.

Another thing you can do is explicitly decode the data in the program
you post; possibly something like

    my $str = <DATA>;
    $str =~ s/%([0-9a-f][0-9a-f])/chr hex $1/egi;
    $str = Encode::decode "utf8", $str;

This uses URL-encoding rather than backslashes; you can pick whatever is
convenient for the data you are trying to post.
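
Spelled out as a complete script, that idea might look like the sketch below. The helper name decode_percent_utf8 and the sample string are illustrative only, and the strict 'UTF-8' encoding name is used here in place of "utf8".

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn %XX escapes into octets, then decode the
# octets as UTF-8 to get a Perl character string.
sub decode_percent_utf8 {
    my ($str) = @_;
    $str =~ s/%([0-9a-f][0-9a-f])/chr hex $1/egi;
    return Encode::decode('UTF-8', $str);
}

# The two escaped octets stand for the single character U+0141.
my $xml = decode_percent_utf8('<plugin name="%C5%81"></plugin>');
my ($name) = $xml =~ /name="([^"]*)"/;

binmode(STDOUT, ':encoding(UTF-8)');
print $xml, "\n";
printf "name is %d character(s) long\n", length $name;
```

Everything in the posted source stays ASCII, and the program rebuilds the real characters before using them.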

> My issue is that document->toString does not appear to work. Please
> ignore the use of is_utf8.

OK.

> > What are you actually trying to find out?
> I have to pass references to DOM objects around all over the
> place. I find I am having to make use of either documentElement()
> or ownerDocument() depending on what I am doing. I would like to have
> a consistent "pattern" for doing this. I would like to settle on
> passing the document object around but it is annoying that I can't then
> use toString.

I'm afraid I don't understand. When I run the original program I get the
results I would have expected: the first prints the XML without the
<?xml?>, the second prints it with it. What is going wrong for you?
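
One documented difference may be relevant when comparing the two results: according to the XML::LibXML documentation, toString on a document returns a byte string in the document's encoding, while toString on any other node returns a character string by default. A small sketch under that assumption follows; the sample document is adapted from the original post, with the octets written as Perl escapes, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode ();
use XML::LibXML;

# Source as octets: \xc5\x81 is the UTF-8 encoding of U+0141.
my $src = qq{<?xml version="1.0" encoding="utf-8"?>\n}
        . qq{<plugin name="\xc5\x81"></plugin>\n};

my $doc = XML::LibXML->new->parse_string($src);

# Element serialisation: a character string, no XML declaration.
my $elem_str = $doc->documentElement->toString(1);

# Document serialisation: octets in the document's encoding, with the
# XML declaration. Decode before mixing with character strings.
my $doc_str   = $doc->toString(1);
my $doc_chars = Encode::decode('UTF-8', $doc_str);

binmode(STDOUT, ':encoding(UTF-8)');
print $elem_str, "\n";
print $doc_chars;
```

If the document string is printed through a :utf8 layer without decoding it first, the octets get encoded a second time, which can look as if toString had produced garbage.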

Ben



------------------------------

Date: Thu, 12 Jul 2012 00:32:10 -0700 (PDT)
From: brillisoft@gmail.com
Subject: Re: Output buffering on IIS7
Message-Id: <90043e6a-b3b4-41fc-8a6b-f32321d9a94f@googlegroups.com>

Yes Ben, that's a popular and rather justified opinion about IIS.
Yet, one has to work with some corporate restrictions and live together with other developers on the same team, plus the inertia of things that worked for a long time before - thus IIS.

Meanwhile, I found and fixed the problem.
Things changed from IIS 5.x to IIS 7.x, and now there is a web.config file collocated with the scripts there .. popped up there by some magic.

One needs to find that web.config file inside the folder where the script sits, edit it, and at the end of the tag containing the handler for *.pl or *.cgi, add the attribute responseBufferLimit="0", to achieve something like this: 

<add name="Perl CGI for .cgi" path="*.cgi" verb="*" modules="CgiModule" scriptProcessor="C:\Perl64\bin\perl.exe &quot;%s&quot; %s" resourceType="Unspecified" requireAccess="Script" preCondition="bitness64" responseBufferLimit="0"/> 

<add name="Perl CGI for .pl" path="*.pl" verb="*" modules="CgiModule" scriptProcessor="C:\Perl64\bin\perl.exe &quot;%s&quot; %s" resourceType="Unspecified" requireAccess="Script" responseBufferLimit="0"/> 


PHP has a different solution, which I have not found yet. 

That's how toy Web Servers work.


------------------------------

Date: Thu, 12 Jul 2012 15:12:33 -0700 (PDT)
From: Jason C <jwcarlton@gmail.com>
Subject: Regex: match double OR single quote
Message-Id: <82fbf8b1-e7f3-4825-8f91-b1d4ab2f96ce@googlegroups.com>

I'm struggling with what I thought was a simple thing, and I'm hoping you guys can help.

I have a string that may contain a ", ', or neither. So, I wrote this in the regex:

["|']*

But this doesn't match anything.

Here's the complete code:

# $text comes from a form, so this is just a sample
$text = <<EOF;
<img src="<a href='http://www.example.com/whatever.jpg'
target='_new'>
http://www.example.com/whatever.jpg</a>"
width="300" height="300" border="0">
EOF

# Regex; line breaks added here for the sake of reading
$text =~ s/<img(.*?)src=
["|']*\s*<a.*? href=
["|']*\s*(.*?)
["|']*.*?>(.*?)<\/a>
["|']*(.*?)>
/<img src="$2"$1$4>/gsi;

If I change ["|']* to whatever I have hard coded, then it works fine, so I know the issue is with that pattern. So how do I correctly match them?


------------------------------

Date: Fri, 13 Jul 2012 00:46:04 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Regex: match double OR single quote
Message-Id: <s644d9-d2c2.ln1@anubis.morrow.me.uk>


Quoth Jason C <jwcarlton@gmail.com>:
> I'm struggling with what I thought was a simple thing, and I'm hoping
> you guys can help.
> 
> I have a string that may contain a ", ', or neither. So, I wrote this in
> the regex:
> 
> ["|']*

You don't use | like that inside a character class. None of the normal
regex special characters have their special meanings, and the class
matches any one of the characters listed, so that class will match any
of ", ', or |.

You also probably don't want that *. AFAICS you want to match exactly
one quote, of either type, so you just want ["']. 

(If you wanted to get fancy you could insist on matching quotes using \1
backreferences, but you may not think it's worth it.)
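
Both points can be checked with a few lines. This is just an illustration; the variable names and the sample strings are made up.

```perl
use strict;
use warnings;

# Inside [...] the | is an ordinary character, so this class has three
# members: double quote, pipe and single quote.
my $loose = qr/["|']/;
print "a lone | matches the class\n" if '|' =~ $loose;

# Matching quotes with a backreference: \1 must repeat whichever quote
# the first group captured.
my $paired = qr/href=(["'])([^"']*)\1/;

for my $attr (q{href='http://www.example.com/'},
              q{href='http://www.example.com/"}) {
    if ($attr =~ $paired) {
        print "matched, value is $2\n";
    }
    else {
        print "no match for $attr\n";
    }
}
```

The second string opens with ' and closes with ", so the backreference version refuses it, whereas a plain ["'] at each end would accept it.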

> But this doesn't match anything.
> 
> Here's the complete code:
> 
> # $text comes from a form, so this is just a sample
> $text = <<EOF;
> <img src="<a href='http://www.example.com/whatever.jpg'
> target='_new'>
> http://www.example.com/whatever.jpg</a>"
> width="300" height="300" border="0">
> EOF
> 
> # Regex; line breaks added here for the sake of reading

If you use /x you can do this in your real source too, though you will
need to remember to escape spaces when you do want them to match
literally.

> $text =~ s/<img(.*?)src=
> ["|']*\s*<a.*? href=
> ["|']*\s*(.*?)
> ["|']*.*?>(.*?)<\/a>
> ["|']*(.*?)>
> /<img src="$2"$1$4>/gsi;
> 
> If I change ["|']* to whatever I have hard coded, then it works fine, so
> I know the issue is with that pattern. So how do I correctly match them?

When I try this (after having removed the line breaks) it does match
*something*, just not what you wanted it to match. $text ends up as

    <img src="" 
    width="300" height="300" border="0">

which is happening because the second uncaptured .*? is picking up all
the text you wanted to get in $2. Everything between the 'href=' and the
'>' is *ed, so it can all match nothing if it wants to. The .*? in $2
wants to match as little as possible, and so does the one before the >,
and when two sections of the pattern are 'fighting' over what to match
the one earlier in the pattern wins.

In general, .*? is not a panacea in situations like this. You would
probably be better off using negated character classes, something like

    $text =~ s{
        <img ([^>]*) src=["'] \s* 
        <a [^>]* [ ] href=["'] \s* ([^'"]*) ["'] [^>]* >
        ([^<]*) </a>
        ['"] ([^>]*) >
    }{<img src="$2"$1$4>}gsix;

(I've used /x to format it decently, which means the literal space needs
to be escaped somehow. I usually prefer putting it in a character class
to backslashing it, though either would work.)

Here each negated character class stops the match running off past the
next thing, so for instance $2 can't run past the end of the quotes.
This isn't perfect: it will not match at all if there are other tags
inside the <a>, and it's not terribly easy to modify it so it will.
(While it is possible to correctly match arbitrary HTML with Perl
regexes, it isn't entirely straightforward.)
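
The difference between the two approaches can be seen on a cut-down version of the sample. The patterns below are simplified from the ones in this thread and the variable names are illustrative.

```perl
use strict;
use warnings;

my $tag = q{<a href='http://www.example.com/whatever.jpg' target='_new'>};

# Lazy version: (.*?) is allowed to match nothing, and the later .*?
# is then free to swallow the URL on the way to the >.
my ($lazy) = $tag =~ /href=["']*\s*(.*?)["']*.*?>/;

# Negated-class version: the capture cannot cross a quote or a >, so
# it has to stop at the closing quote.
my ($bounded) = $tag =~ /href=["']?\s*([^"'>]*)["']?[^>]*>/;

printf "lazy:    '%s'\n", $lazy;
printf "bounded: '%s'\n", $bounded;
```

Both patterns match the tag; only the second one puts the URL in the capture group.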

Ben



------------------------------

Date: Thu, 12 Jul 2012 18:21:06 -0700 (PDT)
From: Jason C <jwcarlton@gmail.com>
Subject: Re: Regex: match double OR single quote
Message-Id: <0228bcf2-c33c-4f57-bf2b-597e414e47a0@googlegroups.com>

On Thursday, July 12, 2012 7:46:04 PM UTC-4, Ben Morrow wrote:
  <snip>
> In general, .*? is not a panacea in situations like this. You would
> probably be better off using negated character classes, something like
> 
>     $text =~ s{
>         &lt;img ([^&gt;]*) src=[&quot;&#39;] \s* 
>         &lt;a [^&gt;]* [ ] href=[&quot;&#39;] \s* ([^&#39;&quot;]*) [&quot;&#39;] [^&gt;]* &gt;
>         ([^&lt;]*) &lt;/a&gt;
>         [&#39;&quot;] ([^&gt;]*) &gt;
>     }{&lt;img src=&quot;$2&quot;$1$4&gt;}gsix;
> 
> (I&#39;ve used /x to format it decently, which means the literal space needs
> to be escaped somehow. I usually prefer putting it in a character class
> to backslashing it, though either would work.)
> 
> Here each negated character class stops the match running off past the
> next thing, so for instance $2 can&#39;t run past the end of the quotes.
> This isn&#39;t perfect: it will not match at all if there are other tags
> inside the &lt;a&gt;, and it&#39;s not terribly easy to modify it so it will.
> (While it is possible to correctly match arbitrary HTML with Perl
> regexes, it isn&#39;t entirely straightforward.)
> 
> Ben

Perfect! I actually did mean for the " or ' to be optional, though (it's possible to have references without a quote), so I had to add the * back in, but the idea of negated characters was exactly what I needed.

For the sake of my own knowledge, does the pattern:

/img([^>])src/

translate to "img, not followed by a >, and followed by src", or "img, followed by anything except a >, and followed by src"?


------------------------------

Date: Fri, 13 Jul 2012 10:49:03 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Regex: match double OR single quote
Message-Id: <fh75d9-4sm2.ln1@anubis.morrow.me.uk>


Quoth Jason C <jwcarlton@gmail.com>:
> On Thursday, July 12, 2012 7:46:04 PM UTC-4, Ben Morrow wrote:
>   <snip>
> > In general, .*? is not a panacea in situations like this. You would
> > probably be better off using negated character classes, something like
> > 
> >     $text =~ s{
> >         &lt;img ([^&gt;]*) src=[&quot;&#39;] \s* 
            ^^^^       ^^^^         ^^^^^^^^^^^
If you're going to be posting to programming newsgroups you need to find
a way to stop that from happening. Dropping Google in favour of a real
newsreader might be a good start.

> Perfect! I actually did mean for the " or ' to be optional, though (it's
> possible to have references without a quote), so I had to add the * back
> in, but the idea of negated characters was exactly what I needed.

'Optional' is ?, not *. Presumably you don't want to allow

    <a href=""""""one two three"""""">
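
A quick comparison of the two quantifiers, as a sketch with made-up patterns and test strings:

```perl
use strict;
use warnings;

my $optional = qr/^<a href=["']?([^"'>]*)["']?>$/;   # at most one quote each side
my $starred  = qr/^<a href=["']*([^"'>]*)["']*>$/;   # any number of quotes

for my $tag (q{<a href="one two three">},
             q{<a href=one>},
             q{<a href=""""""one two three"""""">}) {
    printf "%-40s ?: %-3s *: %-3s\n", $tag,
        ($tag =~ $optional ? 'yes' : 'no'),
        ($tag =~ $starred  ? 'yes' : 'no');
}
```

Both versions accept the quoted and the unquoted form; only the * version also accepts the run of repeated quotes.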

> For the sake of my own knowledge, does the pattern:
> 
> /img([^>])src/
> 
> translate to "img, not followed by a >, and followed by src", or "img,
> followed by anything except a >, and followed by src"?

The latter.
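
That is, [^>] without a quantifier consumes exactly one character, which must not be a >. A small check, with sample strings chosen for illustration:

```perl
use strict;
use warnings;

# One space, a >, two spaces, and nothing at all between img and src.
for my $s ('img src', 'img>src', 'img  src', 'imgsrc') {
    if ($s =~ /img([^>])src/) {
        print "'$s' matches, captured '$1'\n";
    }
    else {
        print "'$s' does not match\n";
    }
}
```

Only the first string matches: the class needs one character, so neither zero nor two characters between img and src will do. To allow any number, add a * after the class.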

Ben



------------------------------

Date: Thu, 12 Jul 2012 01:05:20 +0200
From: Hans Mulder <hansmu@xs4all.nl>
Subject: Re: Traversing folders
Message-Id: <4ffe06b0$0$6888$e4fe514c@news2.news.xs4all.nl>

On 11/07/12 17:40:59, Dirk wrote:
> Hi,
> 
> I am new to Perl and I am trying to list all the folders within a given
> folder and I have the following code. I know it is simple but for some
> reason the program just stops and I am not seeing any error code being
> returned.
> 
> #!/usr/bin/perl
> #
> 
> use warnings;
> use strict;
> use File::Find;
> 
> my $path_name;
> $path_name = '/test';
> 
> find sub {
> 	return unless -d;
> 	print "$File::Find::name\n";
> },$path_name;
> 
> exit;

Does '/test' exist on your system?

What happens if you try $path_name = '..'; ?

-- HansM



------------------------------

Date: Thu, 12 Jul 2012 17:01:09 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Traversing folders
Message-Id: <87sjcx7yuy.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Dirk <dirk.devos@usa.net>:

[...]

>> #!/usr/bin/perl
>> #
>> 
>> use warnings;
>> use strict;
>> use File::Find;
>> 
>> my $path_name;
>> $path_name = '/test';
>> 
>> find sub {
>> 	return unless -d;
>> 	print "$File::Find::name\n";
>> },$path_name;
>> 
>> exit;
>
> You don't need to call 'exit' unless you want to exit early, or with an
> error code. Falling off the end of a Perl program is the usual way to
> exit successfully.

JFTR: This is not necessarily always true. For instance, the embedded
perl interpreter Nagios may use for executing plugins written in Perl
complains about plugins which didn't exit 'properly' if there is no
explicit exit statement at the end of the code.


------------------------------

Date: Thu, 12 Jul 2012 19:24:46 +0100
From: Henry Law <news@lawshouse.org>
Subject: Re: Traversing folders
Message-Id: <y9-dnQ3xbvDzi2LSnZ2dnUVZ8jadnZ2d@giganews.com>

On 11/07/12 16:40, Dirk wrote:
>
> I am new to Perl and I am trying to list all the folders within a given folder and I have the following code.
> I know it is simple but for some reason the program just stops and I am not seeing any error code being returned.

> #!/usr/bin/perl
> #
>
> use warnings;
> use strict;
> use File::Find;
>
> my $path_name;
> $path_name = '/test';
>
> find sub {
> 	return unless -d;
> 	print "$File::Find::name\n";
> },$path_name;
>
> exit;

I'm home now and I went as far as making "/test" and a couple of 
subdirectories on my personal machine, and ran your code exactly as it 
is, superfluous "exit" statement and all.  It runs perfectly.  So you 
have another problem; if you'd reply to some of the posts we'll try to 
help you with it.

henry@eris:~/Perl/tryout$ sudo mkdir /test
henry@eris:~/Perl/tryout$ sudo chown henry:henry /test
henry@eris:~/Perl/tryout$ mkdir /test/one
henry@eris:~/Perl/tryout$ mkdir /test/two
henry@eris:~/Perl/tryout$ ./clpm
/test
/test/two
/test/one



-- 

Henry Law            Manchester, England




------------------------------

Date: Fri, 13 Jul 2012 10:43:15 -0700 (PDT)
From: Dirk <dirk.devos@usa.net>
Subject: Re: Traversing folders
Message-Id: <b7810b23-811c-44e4-8a07-a2e7aa3ee25d@googlegroups.com>

Thanks to everybody for their input. When I first ran the code I was running it for a folder that did not have a lot of sub-folders. When I ran it against a very large folder I could not verify all folders and it appeared that I had some missing folders. After being able to verify all the folders, it appears that the code did return all sub-folders.

However, I did notice that I am getting a permission error. Something about not being able to change directory.
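
For reference, File::Find has to chdir into each directory it visits, so a directory the current user cannot enter produces a warning of that kind. One possible way to avoid it is to filter such directories out with the preprocess option before find() descends into them. The sketch below assumes that skipping them silently is acceptable; the helper name list_dirs and the temporary test tree are made up for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: list directories under $root, leaving out any
# subdirectory the current user cannot read and enter.
sub list_dirs {
    my ($root) = @_;
    my @dirs;
    find({
        wanted     => sub { push @dirs, $File::Find::name if -d },
        # Called with the entries of each directory before they are
        # visited; a name dropped here is neither reported nor entered.
        preprocess => sub { grep { !-d $_ || (-r _ && -x _) } @_ },
    }, $root);
    return @dirs;
}

# Small throwaway tree to try it on.
my $tmp = tempdir(CLEANUP => 1);
for my $name (qw(one two)) {
    mkdir "$tmp/$name" or die "mkdir $tmp/$name: $!";
}

my @found = list_dirs($tmp);
print "$_\n" for sort @found;
```

Skipped directories are not listed at all, so this trades the warning for silence. If the directories matter, fixing the permissions or running as a user who can read them is the alternative.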

Thanks again for all the replies. 


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3735
***************************************

