[31088] in Perl-Users-Digest


Perl-Users Digest, Issue: 2333 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 12 03:09:42 2009

Date: Sun, 12 Apr 2009 00:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 12 Apr 2009     Volume: 11 Number: 2333

Today's topics:
    Re: calculate CDF google@edcallahan.com
    Re: calculate CDF <jurgenex@hotmail.com>
    Re: calculate CDF <tadmc@seesig.invalid>
    Re: calculate CDF <uri@stemsystems.com>
    Re: Capture only first match in regular expression <http://joecosby.com/code/mail.pl>
    Re: Capture only first match in regular expression <tadmc@seesig.invalid>
    Re: Capture only first match in regular expression <noreply@gunnar.cc>
        new CPAN modules on Sun Apr 12 2009 (Randal Schwartz)
    Re: Pipe Between Programs <edgrsprj@ix.netcom.com>
    Re: Pipe Between Programs <edgrsprj@ix.netcom.com>
    Re: Pipe Between Programs <cwilbur@chromatico.net>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 11 Apr 2009 15:58:56 -0700 (PDT)
From: google@edcallahan.com
Subject: Re: calculate CDF
Message-Id: <faf906ab-eb57-4740-8fe1-33eef90cea4e@q2g2000vbr.googlegroups.com>

On Apr 11, 11:08 am, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "g" == google  <goo...@edcallahan.com> writes:
>
>   g> On Apr 10, 11:30 am, Uri Guttman <u...@stemsystems.com> wrote:
>   >> >>>>> "g" == google  <goo...@edcallahan.com> writes:
>   >>
>   >>   g> Honestly, if the manual at http://search.cpan.org/dist/Math-CDF/CDF.pm
>   >>   g> doesn't make sense to you this is probably not the module you need to
>   >>   g> be using for your problem.
>   >>
>   >> and if that isn't a case of RTFM, then i don't know what RTFM means.
>   >>
>   >> pot meet kettle.
>
> <quoted signature snipped - learn how to edit quotes. maybe rtfm the
> group guidelines?>
>
>   g> Not at all. The point is that it is a math stat module and if you are
>   g> not already familiar with those stat concepts you are probably looking
>   g> for something else, like a data analysis tool.
>
> you still referred him to the manual and didn't hand hold him as you
> admonished tad. considering that tad has been helping here for many
> years, maintains and posts the guideline, methinks you have little
> basis to bitch about. build up some credentials here before you accuse
> anyone of anything.
>
> uri
>
> --
> Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
> -----  Perl Code Review , Architecture, Development, Training, Support ------
> --------- Free Perl Training --- http://perlhunter.com/college.html ---------
> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

Thanks for the advice, but I'll post as I please Uri.


------------------------------

Date: Sat, 11 Apr 2009 17:05:23 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: calculate CDF
Message-Id: <lsb2u452q8vii27cjfvqfci9kuoo2qguvi@4ax.com>

google@edcallahan.com wrote:
>On Apr 11, 11:08 am, Uri Guttman <u...@stemsystems.com> wrote:
>> >>>>> "g" == google  <goo...@edcallahan.com> writes:
>> <quoted signature snipped - learn how to edit quotes. maybe rtfm the
>> group guidelines?>
>>
>> --
>> Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com--
>> -----  Perl Code Review , Architecture, Development, Training, Support ------
>> --------- Free Perl Training ---http://perlhunter.com/college.html---------
>> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com---------
>
>Thanks for the advice, but I'll post as I please Uri.

Of course you have the right to do as you please, it's just another
empirical confirmation of the quality of contributions from Google
Groups.

And I have the right not to read posters with poor manners. 

So long then Mr. google.

jue


------------------------------

Date: Sat, 11 Apr 2009 23:35:26 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: calculate CDF
Message-Id: <slrngu2rse.3k9.tadmc@tadmc30.sbcglobal.net>

google@edcallahan.com <google@edcallahan.com> wrote:
> On Apr 11, 11:08 am, Uri Guttman <u...@stemsystems.com> wrote:

>> <quoted signature snipped - learn how to edit quotes.


It has been socially accepted for 20 years that trimming .sigs in 
followups is good manners.


>> --
>> Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com--
>> -----  Perl Code Review , Architecture, Development, Training, Support ------
>> --------- Free Perl Training ---http://perlhunter.com/college.html---------
>> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com---------
>
> Thanks for the advice, but I'll post as I please Uri.


Then you are likely to be ostracized for your disregard toward others.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sun, 12 Apr 2009 00:56:42 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: calculate CDF
Message-Id: <x77i1qpj05.fsf@mail.sysarch.com>

>>>>> "g" == google  <google@edcallahan.com> writes:

  g> Thanks for the advice, but I'll post as I please Uri.

and your posts will get comments that you won't like as we please. see
who wins in the long run. even moronzilla gave up the ghost! :)

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sat, 11 Apr 2009 19:54:09 -0700
From: Zapanaz <http://joecosby.com/code/mail.pl>
Subject: Re: Capture only first match in regular expression
Message-Id: <bsl2u4p32s3dgdngliv98ig7a99hfrko0j@4ax.com>

Excuse the cross-post, my server doesn't carry comp.lang.perl.misc but
it looks like there is more activity there.


The answer to this is probably staring me in the face ...

I am parsing/page scraping some HTML.  I know the first anchor tag <a>
contains information I want.  

So I do this:

      if($content =~ /.*(<a.*<\/a>).*/i){
        $anchorContent = $1;

This basically works the way I want, it matches an anchor tag and
captures the content of it.

But there are multiple anchor tags in the HTML.  What I want is the
first one, but what I get is the last one.

I think I should be using one of these

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

To be honest, I really don't know how (n) is actually supposed to
look.  Would I actually use /a(1)/ to match "a" only one time? 



-- 
Zapanaz
International Satanic Conspiracy
Customer Support Specialist
http://joecosby.com/ 
Despite the strange appearance of the scooters, the Chinese ant-terror police are lethal in action.

:: Currently listening to No 21 in C major K467 Allegro maestoso, 1785, by Mozart, from "Piano Concertos - Vladimir Ashkenazy"


------------------------------

Date: Sat, 11 Apr 2009 23:30:46 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Capture only first match in regular expression
Message-Id: <slrngu2rjm.3k9.tadmc@tadmc30.sbcglobal.net>

Zapanaz <http> wrote:

> Excuse the cross-post, 


It *decreases* the number of people that will see your post though.


> my server doesn't carry comp.lang.perl.misc but
> it looks like there is more activity there.


Your server's configuration has not been updated for over a decade?

comp.lang.perl was rmgroup'd over 10 years ago when
comp.lang.perl.misc was created.


> The answer to this is probably staring me in the face ...


The answer is: don't try and use regex for parsing context free languages.


> I am parsing/page scraping some HTML.  I know the first anchor tag <a>
> contains information I want.  
>
> So I do this:
>
>       if($content =~ /.*(<a.*<\/a>).*/i){
>         $anchorContent = $1;
>
> This basically works the way I want, it matches an anchor tag and
> captures the content of it.


You should use a module that understands HTML data for
processing HTML data.
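A sketch of that approach, assuming the HTML::TokeParser module from the HTML-Parser distribution on CPAN is installed (it is not part of core perl); `$content` is made-up sample data. `get_tag('a')` skips ahead to the first `<a>` start tag regardless of letter case, and `get_trimmed_text('/a')` collects the text up to the closing tag with whitespace collapsed.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution (assumed installed)

# Hypothetical page content; the first anchor is upper-case and spans two lines.
my $content = qq{<p>intro</p>\n<A HREF="/one">First\nlink</A> <a href="/two">Second</a>};

# The parser accepts a reference to a string holding the document.
my $p = HTML::TokeParser->new(\$content) or die "Cannot create parser: $!";

my ($href, $text);
if (my $tag = $p->get_tag('a')) {          # first <a ...> start tag only
    # $tag is [ $tagname, \%attr, \@attrseq, $original_text ];
    # tag and attribute names are lower-cased by the parser.
    $href = $tag->[1]{href};
    $text = $p->get_trimmed_text('/a');    # text up to the closing </a>
}

print "href: $href\n";
print "text: $text\n";
```

Because the parser works on tags rather than raw characters, line breaks, attribute order and upper-case tag names need no special handling in a pattern.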


> But there are multiple anchor tags in the HTML.  What I want is the
> first one, but what I get is the last one.
>
> I think I should be using one of these
>
> *      Match 0 or more times
> +      Match 1 or more times
> ?      Match 1 or 0 times
> {n}    Match exactly n times
> {n,}   Match at least n times
> {n,m}  Match at least n but not more than m times


None of those is what you're looking for.

I expect you are looking for the ? non-greedy metacharacter, which
is not the same as the ? quantifier you've found above.

Look about 2 paragraphs further down in perlre.pod:

    if ($content =~ /(<a.*?<\/a>)/i) {


> To be honest, I really don't know how (n) is actually supposed to
> look.  Would I actually use /a(1)/ to match "a" only one time? 


That will match an "a" followed by a "1", and store the "1" in $1.

    /a{1}/

matches an "a" exactly 1 time.
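To make the difference between grouping parentheses and the `{n}` quantifiers concrete, a small sketch on made-up strings:

```perl
use strict;
use warnings;

# /a(1)/ means "a" followed by "1"; the parentheses capture the "1" in $1.
my $captured = "xa1y" =~ /a(1)/ ? $1 : undef;

# /a{1}/ means "a" exactly once, which is the same as plain /a/.
# Unanchored, it still matches inside "aaa": any single "a" will do.
my $in_aaa = "aaa" =~ /a{1}/ ? 1 : 0;

# {n,m} bounds a repetition; anchored, /^a{2,3}$/ accepts only "aa" and "aaa".
my @bounded = grep { /^a{2,3}$/ } qw(a aa aaa aaaa);

print "captured: $captured\n";    # captured: 1
print "in_aaa:   $in_aaa\n";      # in_aaa:   1
print "bounded:  @bounded\n";     # bounded:  aa aaa
```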

But none of the quantifiers will help with the problem you are having.
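A minimal sketch of why the leading `.*` also matters, on a hypothetical two-anchor string: a greedy `.*` in front of the capture backtracks from the end of the string, so the capture starts at the last `<a` no matter which quantifier is used inside the group. Dropping the leading `.*` lets the match start at the first `<a`, and the non-greedy `.*?` then stops at the first `</a>`.

```perl
use strict;
use warnings;

# Hypothetical sample input with two anchors on one line.
my $html = 'x <a href="1">one</a> y <a href="2">two</a> z';

# Greedy leading .* and greedy inner .*: captures the last anchor.
my ($greedy) = $html =~ /.*(<a.*<\/a>).*/i;

# Non-greedy inside, but the greedy leading .* still moves the
# start of the capture to the last "<a": still the last anchor.
my ($lead) = $html =~ /.*(<a.*?<\/a>).*/i;

# No leading .*: the match starts at the first "<a", and .*? stops
# at the first "</a>" after it: the first anchor.
my ($first) = $html =~ /(<a.*?<\/a>)/i;

print "greedy: $greedy\n";   # greedy: <a href="2">two</a>
print "lead:   $lead\n";     # lead:   <a href="2">two</a>
print "first:  $first\n";    # first:  <a href="1">one</a>
```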


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sun, 12 Apr 2009 08:37:30 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Capture only first match in regular expression
Message-Id: <74dghcF130u7bU1@mid.individual.net>

Zapanaz wrote:
> Excuse the cross-post, my server doesn't carry comp.lang.perl.misc but
> it looks like there is more activity there.

How would you know??

> I am parsing/page scraping some HTML.  I know the first anchor tag <a>
> contains information I want.  
> 
> So I do this:
> 
>       if($content =~ /.*(<a.*<\/a>).*/i){
>         $anchorContent = $1;
> 
> This basically works the way I want, it matches an anchor tag and
> captures the content of it.
> 
> But there are multiple anchor tags in the HTML.  What I want is the
> first one, but what I get is the last one.

     /(<a.+?<\/a>)/is
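The `/s` modifier in that pattern matters when an anchor spans a line break, because without it `.` does not match a newline. A sketch on made-up input:

```perl
use strict;
use warnings;

# Hypothetical input: the text of the first anchor contains a newline.
my $content = qq{<a href="/one">First\nlink</a> <a href="/two">Second</a>};

# Without /s, .+? cannot cross the newline, so the first anchor
# cannot match and the regex falls through to the second one.
my ($no_s) = $content =~ /(<a.+?<\/a>)/i;

# With /s, . also matches "\n", so the first anchor is captured.
my ($with_s) = $content =~ /(<a.+?<\/a>)/is;

print "without /s: $no_s\n";
print "with /s:    $with_s\n";
```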

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Sun, 12 Apr 2009 04:42:26 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Sun Apr 12 2009
Message-Id: <KHz12q.217r@zorch.sf-bay.org>

The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN).  You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.

Alien-WiX-0.305210
http://search.cpan.org/~csjewell/Alien-WiX-0.305210/
Installing and finding Windows Installer XML (WiX) 
----
AnyEvent-4.351
http://search.cpan.org/~mlehmann/AnyEvent-4.351/
provide framework for multiple event loops 
----
Audio-Scan-0.03
http://search.cpan.org/~agrundma/Audio-Scan-0.03/
Fast C parser for MP3, Ogg Vorbis, FLAC, ASF 
----
CGI-Application-Demo-Ajax-1.00
http://search.cpan.org/~rsavage/CGI-Application-Demo-Ajax-1.00/
A search engine using CGI::Application, AJAX and JSON 
----
CGI-Application-Demo-Ajax-1.01
http://search.cpan.org/~rsavage/CGI-Application-Demo-Ajax-1.01/
A search engine using CGI::Application, AJAX and JSON 
----
CGI-DataObjectMapper-0.0101
http://search.cpan.org/~kimoto/CGI-DataObjectMapper-0.0101/
Data-Object Mapper for CGI form data 
----
CGI-DataObjectMapper-0.0102
http://search.cpan.org/~kimoto/CGI-DataObjectMapper-0.0102/
Data-Object Mapper for CGI form data 
----
CGI-DataObjectMapper-0.0103
http://search.cpan.org/~kimoto/CGI-DataObjectMapper-0.0103/
Data-Object Mapper for CGI form data 
----
CPAN-FindDependencies-2.1
http://search.cpan.org/~dcantrell/CPAN-FindDependencies-2.1/
find dependencies for modules on the CPAN 
----
Data-Inspect-0.03
http://search.cpan.org/~owl/Data-Inspect-0.03/
human-readable object representations 
----
Data-Visitor-0.24
http://search.cpan.org/~nuffin/Data-Visitor-0.24/
Visitor style traversal of Perl data structures 
----
Google-Adwords-v1.13
http://search.cpan.org/~rohan/Google-Adwords-v1.13/
an interface which abstracts the Google Adwords SOAP API 
----
Guard-1.02
http://search.cpan.org/~mlehmann/Guard-1.02/
safe cleanup blocks 
----
HTML-RelExtor-0.02
http://search.cpan.org/~miyagawa/HTML-RelExtor-0.02/
Extract "rel" and "rev" information from LINK and A tags. 
----
HTML-RelExtor-0.03
http://search.cpan.org/~miyagawa/HTML-RelExtor-0.03/
Extract "rel" and "rev" information from LINK and A tags. 
----
HTTP-Server-Simple-0.38_03
http://search.cpan.org/~jesse/HTTP-Server-Simple-0.38_03/
Lightweight HTTP server 
----
IO-Journal-0.1
http://search.cpan.org/~frequency/IO-Journal-0.1/
Perl interface for journalled file operations 
----
Mail-Chimp-0.12
http://search.cpan.org/~dpirotte/Mail-Chimp-0.12/
Perl wrapper around the Mailchimp v1.1 API 
----
Module-Changes-ADAMK-0.03
http://search.cpan.org/~adamk/Module-Changes-ADAMK-0.03/
Parse a traditional Changes file (as ADAMK interprets it) 
----
Module-Changes-ADAMK-0.04
http://search.cpan.org/~adamk/Module-Changes-ADAMK-0.04/
Parse a traditional Changes file (as ADAMK interprets it) 
----
MojoX-Session-0.10
http://search.cpan.org/~vti/MojoX-Session-0.10/
Session management for Mojo 
----
Mouse-0.21
http://search.cpan.org/~sartak/Mouse-0.21/
Moose minus the antlers 
----
MySQL-Sandbox-2.0.99
http://search.cpan.org/~gmax/MySQL-Sandbox-2.0.99/
Quickly installs MySQL side server, either standalone or in groups 
----
Net-SNMP-XS-0.02
http://search.cpan.org/~mlehmann/Net-SNMP-XS-0.02/
speed up Net::SNMP by decoding in XS, with limitations 
----
Net-SNMP-XS-0.03
http://search.cpan.org/~mlehmann/Net-SNMP-XS-0.03/
speed up Net::SNMP by decoding in XS, with limitations 
----
POE-Component-IRC-6.05_01
http://search.cpan.org/~hinrik/POE-Component-IRC-6.05_01/
A fully event-driven IRC client module 
----
POE-Component-Pluggable-1.18
http://search.cpan.org/~bingos/POE-Component-Pluggable-1.18/
A base class for creating plugin-enabled POE Components. 
----
Perl-Dist-WiX-0.170
http://search.cpan.org/~csjewell/Perl-Dist-WiX-0.170/
Experimental 4th generation Win32 Perl distribution builder 
----
Pod-Tree-1.16
http://search.cpan.org/~swmcd/Pod-Tree-1.16/
Create a static syntax tree for a POD 
----
Proc-Exists-0.99_01
http://search.cpan.org/~brianski/Proc-Exists-0.99_01/
quickly and portably check for process existence 
----
RDF-Simple-0.411
http://search.cpan.org/~mthurn/RDF-Simple-0.411/
read and write RDF without complication 
----
Simo-0.1008
http://search.cpan.org/~kimoto/Simo-0.1008/
Very simple framework for Object Oriented Perl. 
----
Simo-Util-0.0205
http://search.cpan.org/~kimoto/Simo-Util-0.0205/
Utility Class for Simo 
----
Test-Simple-0.87_02
http://search.cpan.org/~mschwern/Test-Simple-0.87_02/
Basic utilities for writing tests. 
----
Text_Editor_Easy-0.45
http://search.cpan.org/~grommier/Text_Editor_Easy-0.45/
----
VMS-Monitor-0_07
http://search.cpan.org/~cberry/VMS-Monitor-0_07/
Access system performance information on OpenVMS systems 
----
VMS-Priv-1_32
http://search.cpan.org/~cberry/VMS-Priv-1_32/
Get and set privileges for OpenVMS processes 
----
VMS-Process-1_07
http://search.cpan.org/~cberry/VMS-Process-1_07/
Manage processes and retrieve process information on OpenVMS systems 
----
VMS-Process-1_08
http://search.cpan.org/~cberry/VMS-Process-1_08/
Manage processes and retrieve process information on OpenVMS systems 
----
WWW-Shorten-Qurl-2.00
http://search.cpan.org/~davecross/WWW-Shorten-Qurl-2.00/
Perl interface to qurl.com 
----
WWW-Shorten-RevCanonical-0.01
http://search.cpan.org/~miyagawa/WWW-Shorten-RevCanonical-0.01/
Shorten URL using rev="canonical" 
----
WWW-Shorten-RevCanonical-0.02
http://search.cpan.org/~miyagawa/WWW-Shorten-RevCanonical-0.02/
Shorten URL using rev="canonical" 
----
WWW-Shorten-Simple-0.01
http://search.cpan.org/~miyagawa/WWW-Shorten-Simple-0.01/
Factory wrapper around WWW::Shorten to avoid imports 
----
WWW-Translate-Apertium-0.10
http://search.cpan.org/~enell/WWW-Translate-Apertium-0.10/
Open source machine translation 
----
Xacobeo-0.08_01
http://search.cpan.org/~potyl/Xacobeo-0.08_01/
XPath (XML Path Language) visualizer. 
----
kurila-1.19_0
http://search.cpan.org/~tty/kurila-1.19_0/
Perl Kurila 


If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.

This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
  http://www.stonehenge.com/merlyn/LinuxMag/col82.html

print "Just another Perl hacker," # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion


------------------------------

Date: Sat, 11 Apr 2009 13:37:21 -0500
From: "E.D.G." <edgrsprj@ix.netcom.com>
Subject: Re: Pipe Between Programs
Message-Id: <E6GdneB_j-HPfn3UnZ2dnUVZ_tCdnZ2d@earthlink.com>

<sln@netherlands.com> wrote in message 
news:m4vut412s0vv3n7ueg3s6d7fcvm179bdoh@4ax.com...
> On Thu, 9 Apr 2009 08:55:53 -0500, "E.D.G." <edgrsprj@ix.netcom.com> 
> wrote:

> It's just lipstick on a pig. Eventually the underlying motive becomes 
> obvious,

It is hardly necessary to go searching for motives here.  They have been 
clearly stated in quite a few of these posts.

This is part of a humanitarian effort. People around the world work on 
projects like this all the time.  Governments and their scientists need 
certain types of assistance with a variety of technical projects that can 
have an impact on people's lives.  And if you do this type of work for very 
long you will probably eventually discover that you can get the most done 
with the least amount of trouble if you simply provide them with the 
technology that they need plus a simple instruction manual that says, "Go to 
Step 1, then Step 2."

This particular project involves a tremendous amount of computer 
programming.  That is the reason for all the questions. Perl and Gnuplot 
both looked like good programs to use, in part because they are actively 
supported.  Once governments etc. start using the technology they can hire 
their own computer programmers and rewrite the code to their liking.  But 
they need demo versions to get started that show how everything is supposed 
to work.  And with the download program that is presently available for one 
of these projects the demo version can also be used as a working program.

In my opinion, at least some of the people posting notes here should be 
interested in these types of efforts.  The success or failure of these types 
of projects can have an impact on the international economy and then 
indirectly on people's lives and their jobs.  When the economy starts going 
into the tank, high quality jobs including computer programming and 
servicing jobs can start to get scarce.  One of these projects involves 
earthquake forecasting.  And one of the fastest ways for some country's 
economy to suffer a major setback is for a powerful and totally unexpected 
earthquake to devastate one of its large cities and business centers.  If 
you can accurately predict the earthquake you can save some lives and jobs.

My posts generally stick to technical questions that are the focus of this 
Newsgroup. Let's make this the last discussion regarding motives etc.  These 
are personal opinions.



------------------------------

Date: Sat, 11 Apr 2009 14:11:46 -0500
From: "E.D.G." <edgrsprj@ix.netcom.com>
Subject: Re: Pipe Between Programs
Message-Id: <fradnbBzsJ__dn3UnZ2dnUVZ_jednZ2d@earthlink.com>

"Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message 
news:748obtF123l2lU1@mid.individual.net...

> Besides reading up on pipes, I suggest that you also read the posting 
> guidelines for this group:
> http://www.rehabitation.com/clpmisc/clpmisc_guidelines.html

Perhaps there needs to be another Perl Newsgroup.  It would be one where 
people could discuss what they wished without tying up the time and energy 
of the Perl experts.  The experts could continue to answer questions posted 
to this Newsgroup.

People often just need the code that will enable them to do this or that. 
For example, with this latest round I simply needed a group of statements 
that I could put in a Perl program to get it to create a data pipe to another 
program.  And I needed the same type of code to store in the second program. 
It is not necessary to understand what it is doing as long as it works. 
Probably any experienced Perl user could answer questions like that or 
develop the code.  If the question could not be answered in the other 
Newsgroup then people could repost their question to this one.
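For readers looking for the pipe code itself, a minimal sketch using Perl's built-in piped `open`; the two child commands are hypothetical one-liners run with the current perl (`$^X`) standing in for the real programs. The list form of piped `open` used here avoids the shell, but it may not be available on older Windows perls, where a single command string is the usual fallback.

```perl
use strict;
use warnings;

# Read side: start another program and read its STDOUT through a pipe.
sub read_from_program {
    my @cmd = @_;
    open(my $from, '-|', @cmd) or die "Cannot start @cmd: $!";
    my @lines = <$from>;
    close($from) or die "@cmd exited with status $?";
    chomp @lines;
    return @lines;
}

# Write side: start another program and print lines to its STDIN.
sub write_to_program {
    my ($lines, @cmd) = @_;
    open(my $to, '|-', @cmd) or die "Cannot start @cmd: $!";
    print {$to} "$_\n" for @$lines;
    close($to) or die "@cmd exited with status $?";
    return;
}

# Hypothetical producer: prints three lines.
my @got = read_from_program($^X, '-e', 'print "value $_\n" for 1 .. 3');
print "read: $_\n" for @got;

# Hypothetical consumer: upper-cases each line it receives on STDIN.
write_to_program(\@got, $^X, '-ne', 'print uc');
```

Checking the return value of `close` matters for pipes: it is where a failing child program shows up, with its exit status in `$?`.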




------------------------------

Date: Sat, 11 Apr 2009 15:43:25 -0400
From: Charlton Wilbur <cwilbur@chromatico.net>
Subject: Re: Pipe Between Programs
Message-Id: <86myanm0wy.fsf@mithril.chromatico.net>

>>>>> "EDG" == E D G <edgrsprj@ix.netcom.com> writes:

    EDG> Perhaps there needs to be another Perl Newsgroup.  It would be
    EDG> one where people could discuss what they wished without tying
    EDG> up the time and energy of the Perl experts.  The experts could
    EDG> continue to answer questions posted to this Newsgroup.

The value of the newsgroup is directly proportional to the participation
of experts.  If you let non-experts discuss whatever they like with
non-experts, you will get the blind leading the blind, in a great
assortment of anti-help.

    EDG> People often just need the code that will enable them to do
    EDG> this or that. For example, with this latest round I simply
    EDG> needed a group of statements that I could put in a Perl program
    EDG> to get it to create a data pipe to another program.  And I needed
    EDG> the same type of code to store in the second program. It is not
    EDG> necessary to understand what it is doing as long as it
    EDG> works.

Bzzzzt.  If *you* are the programmer, *you* need to understand what it
is doing and why it works.  If you don't understand, when you change
something and it stops working, you will be back here asking for someone
to fix your code instead.  

Think first, code second.

    EDG> Probably any experienced Perl user could answer questions like
    EDG> that or develop the code.  If the question could not be
    EDG> answered in the other Newsgroup then people could repost their
    EDG> question to this one.

Indeed, and they'd get the same response you're getting now.  You can
learn and understand, or you can hire a programmer.  Having code written
for you for free is simply not an option.

Charlton


-- 
Charlton Wilbur
cwilbur@chromatico.net


------------------------------

Date: Sat, 11 Apr 2009 20:34:56 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu1omh.ae6.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-11 11:59, Eric Pozharski <whynot@pozharski.name> wrote:
> On 2009-04-10, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> On 2009-04-10 21:24, sln@netherlands.com <sln@netherlands.com> wrote:
>>> So there is a base 'code' line. Isn't that stupid to interpret the
>>> rest of the code in an encoding interpreted with another code? The
>>> code is then broken!
>>
>> No. Almost all encodings today are supersets of US-ASCII.
>>
>> Consider these two programs:
>>
>> #!/usr/bin/perl
>> use utf8;
>> use warnings;
>> use strict;
>>
>> my $greeting = "Καλημέρα κόσμε";
>> print "$greeting\n";
>> __END__
>
> Show your code, don't master it
>
> 	$ perl -Mutf8 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
> 	Wide character in print at -e line 1.
> 	фыва

Yes, there should be a

binmode STDOUT, ":encoding(whatever)";

before the print. But I was only talking about compile time, not run
time, so this is irrelevant.

In fact, that you *do* get this warning shows my point: $greeting now
contains not a byte string (which can be sent directly to the
byte-oriented world outside) but a character string, which needs to be
encoded first.
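A small sketch of that distinction, using the Encode module bundled with perl; the word is written with `\x{...}` escapes so the example stays ASCII-only (under `use utf8`, the literal Cyrillic word would compile to the same four characters).

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The Cyrillic word from the thread: U+0444 U+044B U+0432 U+0430.
my $chars = "\x{0444}\x{044B}\x{0432}\x{0430}";

# Encoding turns the character string into UTF-8 octets.
my $bytes      = encode('UTF-8', $chars);
my $round_trip = decode('UTF-8', $bytes);   # back to the same 4 characters

printf "characters: %d\n", length $chars;           # characters: 4
printf "bytes:      %d\n", length $bytes;           # bytes:      8
printf "hex:        %s\n", unpack('H*', $bytes);    # hex:        d184d18bd0b2d0b0

# Printing $chars to a raw STDOUT would warn "Wide character in print";
# an :encoding layer encodes each character on the way out instead.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```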


> 	ï¿½
> 	$ echo $LC_ALL 
> 	en_US.UTF-8
>
>> #!/usr/bin/perl
>> use encoding "iso-8859-7";
>> use warnings;
>> use strict;
>>
>> my $greeting = "Καλημέρα κόσμε";
>> print "$greeting\n";
>> __END__
>
> Show your $ENV{LC_ALL}, please
>
> 	{2775:24} [0:0]$ perl -Mencoding=latin1 -wle 'print "Ñ„Ñ‹Ð²Ð°"; print "\x{C0}\x{B0}"'
> 	Ñ„Ñ‹Ð²Ð°
> 	ï¿½

use encoding also sets the binmode for STDOUT and STDERR, so you won't
get a warning here. Again, I was talking only about compile time
effects, not run time, so I didn't mention that (you can read the manual
yourself).

>> But you can't do something like that:
>>
>> #!/usr/bin/perl
>> use Greeting "Καλημέρα κόσμε";
>> use encoding "iso-8859-7";
>> use warnings;
>> use strict;
>>
>> hello();
>> __END__
>>
>> because now the use encoding comes too late: The compiler would have to
>> go back to the start to parse "Καλημέρα κόσμε" correctly.
>
> You've messed everything up.  Since the compiler wasn't told about the encoding
> of C<use Greeting>'s argument, it's treated as latin1,

Wrong: It is treated as an unspecified superset of US-ASCII.

> then F<Greeting.pm>
> is fed with that *byte* string,

Right,

> and it's F<Greeting.pm>'s problem what
> to do with that stuff.

Which is irrelevant for the example. The point is that in this case the
use encoding directive comes too late: at the point the string is
compiled, the compiler still expects some unspecified superset of
US-ASCII and produces byte strings. If you want to tell the compiler
that your source code is in iso-8859-7 (and that is the purpose of the
use encoding directive) then you have to do it *before* the first
element which requires that knowledge. The compiler won't go back and
start over.
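The same ordering rule can be seen with the `use utf8` pragma (used here instead of `use encoding`, which later perl releases deprecated): a pragma only affects code compiled after it, so a literal compiled before it stays a byte string. A minimal sketch, assuming the file is saved as UTF-8:

```perl
use strict;
use warnings;

# Compiled before "use utf8": the literal keeps the raw UTF-8 bytes,
# so the string holds 8 byte-valued characters.
my $before = "фыва";

use utf8;

# Compiled after "use utf8": the same source bytes now decode
# to 4 characters (U+0444 U+044B U+0432 U+0430).
my $after = "фыва";

print length($before), " ", length($after), "\n";   # prints "8 4"
```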

> If there were C<use utf8> or C<use encoding 'utf8'>,

then the compiler would complain about a malformed UTF-8 character if
the source file was actually in ISO-8859-7.

The use encoding or use utf8 *must* match the encoding of the source
file. (And don't think about mixing several encodings in the same file
unless you want to enter your program in an obfu contest).


> then the *utf8* flag would be set, and then it would be
> F<Greeting.pm>'s problem what to do with a *character* string.

The assumption was of course that Greeting.pm would expect a character
string.


> You missed one important thing -- I dislike this feature,

which feature?

> I hate this already.  Hopefully, since c.l.p.m. isn't that public, that
> dangerous fact will stay unnoticed; see this:
>
> 	{4579:37} [0:0]$ perl -wle '$фыва++; print $фыва'
> 	Unrecognized character \x84 in column 3 at -e line 1.
> 	{4601:39} [0:2]$ perl -Mutf8 -wle '$фыва++; print $фыва'
> 	1
> 	{4605:40} [0:0]$ perl -Mencoding=utf8 -wle '$фыва++; print $фыва'
> 	Unrecognized character \x84 in column 3 at -e line 1.

Yes, you can't use "use encoding" for non-ASCII variable names. "use
encoding" was intended as a cheap way to get pre-5.8 programs with
hard-coded non-ASCII strings into the new character string semantics, not
as a general purpose "write your code in any character encoding" tool.

I would *not* advise anyone to use "use encoding" in new code, and if
you use it for porting old code, you *must* read the manual. Thoroughly.
Several times. There are dragons here.


> That's what C<use utf8> is fscking for.

What is it for?

>
> I agree, 'UTF-8 flag' is somewhat misleading since it's about
> characters, not about utf8 by itself (I hope).
>
> But,..  here be dragons...
>
> 	{3335:27} [0:0]$ echo 'фыва' | xxd
> 	0000000: d184 d18b d0b2 d0b0 0a                   .........
> 	{3356:28} [0:0]$ echo 'фыва' | recode utf8..ucs-2-internal |xxd
> 	0000000: 4404 4b04 3204 3004 0a00                 D.K.2.0...
> 	{3414:29} [0:1]$ perl -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'

You've mixed up the endianness. 'ф' is U+0444, not U+4404.

% echo 'фыва' | iconv -t UTF-16BE | xxd
0000000: 0444 044b 0432 0430 000a                 .D.K.2.0..
% perl -CO -wle 'print "\x{0444}\x{044b}\x{0432}\x{0430}"'
фыва

(And another word of warning: -CO only works on the command line in
5.10.0 - in real code always use binmode)
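The byte-order mix-up can be reproduced with Encode: a sketch that encodes the same word as UTF-16LE and UTF-16BE (both without a byte order mark) and then deliberately decodes the little-endian bytes as big-endian.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The Cyrillic word from the thread: U+0444 U+044B U+0432 U+0430.
my $str = "\x{0444}\x{044B}\x{0432}\x{0430}";

# UTF-16 without a BOM, in each byte order.
my $le = encode('UTF-16LE', $str);
my $be = encode('UTF-16BE', $str);
print 'LE: ', unpack('H*', $le), "\n";   # LE: 44044b0432043004
print 'BE: ', unpack('H*', $be), "\n";   # BE: 0444044b04320430

# Reading little-endian bytes as if they were big-endian swaps each pair,
# which is where code points like U+4404 come from.
my $misread = decode('UTF-16BE', $le);
my @codes = map { sprintf 'U+%04X', ord } split //, $misread;
print "@codes\n";   # U+4404 U+4B04 U+3204 U+3004
```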

	hp



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2333
***************************************
