[31087] in Perl-Users-Digest


Perl-Users Digest, Issue: 2332 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Apr 11 14:09:47 2009

Date: Sat, 11 Apr 2009 11:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 11 Apr 2009     Volume: 11 Number: 2332

Today's topics:
    Re: calculate CDF google@edcallahan.com
    Re: calculate CDF <uri@stemsystems.com>
        multicore cpu QoS@invalid.net
    Re: multicore cpu <smallpond@juno.com>
    Re: multicore cpu <spamtrap@dot-app.org>
        My server script <no@email.please>
    Re: My server script <spamtrap@dot-app.org>
        new CPAN modules on Sat Apr 11 2009 (Randal Schwartz)
    Re: perl values for batch script to use <rkb@i.frys.com>
    Re: The Logic of Beautiful Code (Doug Miller)
    Re: The Logic of Beautiful Code <smallpond@juno.com>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <whynot@pozharski.name>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 11 Apr 2009 06:31:18 -0700 (PDT)
From: google@edcallahan.com
Subject: Re: calculate CDF
Message-Id: <4810b739-47ea-4911-bffc-ece4d86ea306@o30g2000vbc.googlegroups.com>

On Apr 10, 11:30 am, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "g" == google <goo...@edcallahan.com> writes:
>
>   g> Honestly, if the manual at http://search.cpan.org/dist/Math-CDF/CDF.pm
>   g> doesn't make sense to you this is probably not the module you need to
>   g> be using for your problem.
>
> and if that isn't a case of RTFM, then i don't know what RTFM means.
>
> pot meet kettle.
>
> uri
>
> --
> Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
> -----  Perl Code Review , Architecture, Development, Training, Support ------
> --------- Free Perl Training --- http://perlhunter.com/college.html ---------
> ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

Not at all. The point is that it is a math stat module and if you are
not already familiar with those stat concepts you are probably looking
for something else, like a data analysis tool.


------------------------------

Date: Sat, 11 Apr 2009 12:08:37 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: calculate CDF
Message-Id: <x7ocv3qika.fsf@mail.sysarch.com>

>>>>> "g" == google  <google@edcallahan.com> writes:

  g> On Apr 10, 11:30am, Uri Guttman <u...@stemsystems.com> wrote:
  >> >>>>> "g" == google <goo...@edcallahan.com> writes:
  >> 
  >>  g> Honestly, if the manual at http://search.cpan.org/dist/Math-CDF/CDF.pm
  >>  g> doesn't make sense to you this is probably not the module you need to
  >>  g> be using for your problem.
  >> 
  >> and if that isn't a case of RTFM, then i don't know what RTFM means.
  >> 
  >> pot meet kettle.

<quoted signature snipped - learn how to edit quotes. maybe rtfm the
group guidelines?>

  g> Not at all. The point is that it is a math stat module and if you are
  g> not already familiar with those stat concepts you are probably looking
  g> for something else, like a data analysis tool.

you still referred him to the manual and didn't hand-hold him, which is
just what you admonished tad for. considering that tad has been helping
here for many years and maintains and posts the guidelines, methinks you
have little basis to bitch. build up some credentials here before you
accuse anyone of anything.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sat, 11 Apr 2009 14:40:09 GMT
From: QoS@invalid.net
Subject: multicore cpu
Message-Id: <dZ1El.232$b11.99@nwrddc02.gnilink.net>


It seems my programs written in Perl only see one core on a dual-core CPU.

Every time the software has a lot of work to do, the CPU utilization goes
up to exactly 50%.  Is there something wrong with my Perl installation?



------------------------------

Date: Sat, 11 Apr 2009 08:05:56 -0700 (PDT)
From: smallpond <smallpond@juno.com>
Subject: Re: multicore cpu
Message-Id: <0bcdbc3b-22e7-4ee1-b06d-8b8487263d85@s21g2000vbb.googlegroups.com>

On Apr 11, 10:40 am, Q...@invalid.net wrote:
> It seems my programs written in Perl only see one core on a dual-core CPU.
>
> Every time the software has a lot of work to do, the CPU utilization goes
> up to exactly 50%.  Is there something wrong with my Perl installation?

How many threads are you launching?


------------------------------

Date: Sat, 11 Apr 2009 12:32:10 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: multicore cpu
Message-Id: <m1d4bj6tit.fsf@dot-app.org>

QoS@invalid.net writes:

> It seems my programs written in Perl only see one core on a dual-core CPU.
>
> Every time the software has a lot of work to do, the CPU utilization goes
> up to exactly 50%.  Is there something wrong with my Perl installation?

Are you *asking* Perl to use the additional cores, by writing multi-threaded
code? There's been some talk of auto-threading in Perl 6, but that's not
soup yet; in the current release you have to do it yourself.
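As a rough illustration of "doing it yourself", here is a minimal sketch that splits a CPU-bound job across two child processes so the OS can schedule one per core. It uses fork() and pipe() rather than threads, because it works on perls built without ithreads (on Windows, fork is emulated). The work() function and the two ranges are hypothetical stand-ins for the poster's real workload.

```perl
use strict;
use warnings;

# Hypothetical CPU-bound job; real work would take far longer than this.
sub work {
    my ($from, $to) = @_;
    my $sum = 0;
    $sum += $_ * $_ for $from .. $to;
    return $sum;
}

# One slice of the work per core (two cores assumed here).
my @ranges = ([1, 500], [501, 1000]);
my %reader_for;    # child pid => read end of that child's pipe

for my $range (@ranges) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: compute one slice, send the result to the parent, exit.
        close $reader;
        print {$writer} work(@$range), "\n";
        close $writer;
        exit 0;
    }
    # Parent: keep only the read end and start the next child right away.
    close $writer;
    $reader_for{$pid} = $reader;
}

# Collect the partial results; the children ran concurrently.
my $total = 0;
for my $pid (keys %reader_for) {
    my $fh = $reader_for{$pid};
    chomp(my $part = <$fh>);
    $total += $part;
    waitpid($pid, 0);
}
print "total: $total\n";
```

A single-process loop can only ever keep one core busy, which matches the 50% ceiling described above; with one process per slice the load can spread across both cores. If your perl is built with ithreads, the threads module is the in-process alternative.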

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: Sat, 11 Apr 2009 20:34:06 +0300
From: jife <no@email.please>
Subject: My server script
Message-Id: <49e0d48f$0$24751$9b536df3@news.fv.fi>

Hi, I have a tiny Perl script listening on a port. Is there a way to
read browser form GET parameters? I mean the ones available as
$ENV{'QUERY_STRING'} in a usual web server CGI.


------------------------------

Date: Sat, 11 Apr 2009 13:42:36 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: My server script
Message-Id: <m1r5zzaxyr.fsf@dot-app.org>

jife <no@email.please> writes:

> Hi, I have a tiny Perl script listening on a port. Is there a way to
> read browser form GET parameters? I mean the ones available as
> $ENV{'QUERY_STRING'} in a usual web server CGI.

You might want to have a look at HTTP::Server::Simple on CPAN - that wheel
has already been invented. :-)

    <http://search.cpan.org/perldoc?HTTP::Server::Simple>
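For a sense of what such a server does with a GET request, here is a core-Perl-only sketch: the query string is simply the part of the request target after "?", URL-decoded into name/value pairs. The helper names (url_decode, parse_request_line, parse_query) are hypothetical, and the request line is hard-coded; in a real listener it would be the first line read from the client socket.

```perl
use strict;
use warnings;

# Decode one application/x-www-form-urlencoded component ("+" means space).
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split an HTTP request line into method, path and query string; the query
# string is what a CGI-style server would put in $ENV{QUERY_STRING}.
sub parse_request_line {
    my ($line) = @_;
    my ($method, $target) = $line =~ m{^(\S+)\s+(\S+)} or return;
    my ($path, $query) = split /\?/, $target, 2;
    return ($method, $path, defined $query ? $query : '');
}

# Turn "a=1&b=2&a=3" into { a => [1, 3], b => [2] }.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $param{ url_decode($key) } }, url_decode($value);
    }
    return \%param;
}

# In the real script the line would come from the client socket, e.g. <$client>.
my ($method, $path, $query) =
    parse_request_line("GET /form?name=Jane+Doe%21&lang=perl HTTP/1.1\r\n");
my $param = parse_query($query);
print "$method $path name=$param->{name}[0] lang=$param->{lang}[0]\n";
```

Hand-rolled parsing like this ignores many corners of HTTP, which is the argument for letting a module such as HTTP::Server::Simple do it.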

sherm--

-- 
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: Sat, 11 Apr 2009 04:42:27 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Sat Apr 11 2009
Message-Id: <KHx6Er.o0G@zorch.sf-bay.org>

The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN).  You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.

AnyEvent-SNMP-0.11
http://search.cpan.org/~mlehmann/AnyEvent-SNMP-0.11/
adaptor to integrate Net::SNMP into Anyevent. 
----
App-Rad-0.10
http://search.cpan.org/~garu/App-Rad-0.10/
Rapid (and easy!) creation of command line applications 
----
App-Rad-1.00
http://search.cpan.org/~garu/App-Rad-1.00/
Rapid (and easy!) creation of command line applications 
----
Audio-Scan-0.02
http://search.cpan.org/~agrundma/Audio-Scan-0.02/
Fast C scanning of audio file metadata 
----
B-Foreach-Iterator-0.03
http://search.cpan.org/~gfuji/B-Foreach-Iterator-0.03/
Manipulates foreach iterators 
----
Catalyst-App-RoleApplicator-0.002
http://search.cpan.org/~hdp/Catalyst-App-RoleApplicator-0.002/
apply roles to your Catalyst application-related classes 
----
Class-Implant-0.02_01
http://search.cpan.org/~shelling/Class-Implant-0.02_01/
Manipulating mixin and inheritance out of packages 
----
DBD-SQLite-1.22_04
http://search.cpan.org/~adamk/DBD-SQLite-1.22_04/
Self-contained RDBMS in a DBI Driver 
----
Data-GUID-Any-0.002
http://search.cpan.org/~dagolden/Data-GUID-Any-0.002/
Generic interface for GUID creation 
----
Devel-REPL-1.003006
http://search.cpan.org/~oliver/Devel-REPL-1.003006/
a modern perl interactive shell 
----
Email-MIME-Kit-2.004
http://search.cpan.org/~rjbs/Email-MIME-Kit-2.004/
build messages from templates 
----
ExtUtils-MakeMaker-6.51_01
http://search.cpan.org/~mschwern/ExtUtils-MakeMaker-6.51_01/
Create a module Makefile 
----
File-Overwrite-1.1
http://search.cpan.org/~dcantrell/File-Overwrite-1.1/
overwrite the contents of a file and optionally unlink it 
----
Geography-JapanesePrefectures-0.07
http://search.cpan.org/~tokuhirom/Geography-JapanesePrefectures-0.07/
Japanese Prefectures Data. 
----
HTML-CTPP2-2.4.10
http://search.cpan.org/~stellar/HTML-CTPP2-2.4.10/
Perl interface for CTPP2 library 
----
HTML-Chunks-1.55
http://search.cpan.org/~mblythe/HTML-Chunks-1.55/
A simple nested template engine for HTML, XML and XHTML 
----
HTML-Template-Pro-0.74
http://search.cpan.org/~viy/HTML-Template-Pro-0.74/
Perl/XS module to use HTML Templates from CGI scripts 
----
HTTP-Engine-0.1.6
http://search.cpan.org/~yappo/HTTP-Engine-0.1.6/
Web Server Gateway Interface and HTTP Server Engine Drivers (Yet Another Catalyst::Engine) 
----
HTTP-Engine-Middleware-0.11
http://search.cpan.org/~yappo/HTTP-Engine-Middleware-0.11/
middlewares distribution 
----
HTTP-Server-Simple-0.38_02
http://search.cpan.org/~jesse/HTTP-Server-Simple-0.38_02/
Lightweight HTTP server 
----
JSON-DWIW-0.30
http://search.cpan.org/~dowens/JSON-DWIW-0.30/
JSON converter that Does What I Want 
----
Jaipo-0.21
http://search.cpan.org/~bluet/Jaipo-0.21/
Micro-blogging Client 
----
JavaScript-Packer-0.02
http://search.cpan.org/~nevesenin/JavaScript-Packer-0.02/
Perl version of Dean Edwards' Packer.js 
----
Log-Dispatch-FogBugz-0.1
http://search.cpan.org/~dimartino/Log-Dispatch-FogBugz-0.1/
Log::Dispatch appender for sending log messages to the FogBugz bug tracking system 
----
Macro-Micro-0.053
http://search.cpan.org/~rjbs/Macro-Micro-0.053/
really simple templating for really simple templates 
----
Marpa-0.001_008
http://search.cpan.org/~jkegl/Marpa-0.001_008/
General BNF Parsing (Experimental version) 
----
Math-Polynomial-Solve-2.50
http://search.cpan.org/~jgamble/Math-Polynomial-Solve-2.50/
Find the roots of polynomial equations. 
----
MediaWiki-Bot-Plugin-ImageTester-0.2.6
http://search.cpan.org/~dcollins/MediaWiki-Bot-Plugin-ImageTester-0.2.6/
a plugin for MediaWiki::Bot which contains image copyright checking and analysis for the english wikipedia 
----
MediaWiki-Bot-Plugin-ImageTester-0.2.7
http://search.cpan.org/~dcollins/MediaWiki-Bot-Plugin-ImageTester-0.2.7/
a plugin for MediaWiki::Bot which contains image copyright checking and analysis for the english wikipedia 
----
MooseX-RelatedClassRoles-0.003
http://search.cpan.org/~hdp/MooseX-RelatedClassRoles-0.003/
Apply roles to a class related to yours 
----
MouseX-Getopt-0.06
http://search.cpan.org/~masaki/MouseX-Getopt-0.06/
A Mouse role for processing command line options 
----
MySQL-Sandbox-2.0.98i
http://search.cpan.org/~gmax/MySQL-Sandbox-2.0.98i/
Quickly installs MySQL side server, either standalone or in groups 
----
NEXT-0.63
http://search.cpan.org/~flora/NEXT-0.63/
Provide a pseudo-class NEXT (et al) that allows method redispatch 
----
Net-LastFM-Submission-0.5
http://search.cpan.org/~sharifuln/Net-LastFM-Submission-0.5/
Perl interface to the Last.fm Submissions Protocol 
----
Net-LastFM-Submission-0.6
http://search.cpan.org/~sharifuln/Net-LastFM-Submission-0.6/
Perl interface to the Last.fm Submissions Protocol 
----
Net-LimeLight-Purge-0.03
http://search.cpan.org/~gphat/Net-LimeLight-Purge-0.03/
LimeLight Purge Service API 
----
Net-SNMP-EV-0.11
http://search.cpan.org/~mlehmann/Net-SNMP-EV-0.11/
adaptor to integrate Net::SNMP into the EV event loop. 
----
Net-SNMP-EV-0.12
http://search.cpan.org/~mlehmann/Net-SNMP-EV-0.12/
adaptor to integrate Net::SNMP into the EV event loop. 
----
Net-SSH2-0.19
http://search.cpan.org/~rkitover/Net-SSH2-0.19/
Support for the SSH 2 protocol via libssh2. 
----
Palm-Treo680MessagesDB-1.01
http://search.cpan.org/~dcantrell/Palm-Treo680MessagesDB-1.01/
Handler for Treo 680 SMS message databases 
----
Parallel-Fork-BossWorkerAsync-0.03
http://search.cpan.org/~jvannucci/Parallel-Fork-BossWorkerAsync-0.03/
Perl extension for creating asynchronous forking queue processing applications. 
----
Parse-IASLog-1.08
http://search.cpan.org/~bingos/Parse-IASLog-1.08/
A parser for Microsoft IAS-formatted log entries. 
----
REST-Client-88
http://search.cpan.org/~mcrawfor/REST-Client-88/
A simple client for interacting with RESTful http/https resources 
----
Regexp-Assemble-Compressed-0.01
http://search.cpan.org/~taniguchi/Regexp-Assemble-Compressed-0.01/
Assemble more compressed Regular Expression 
----
SQL-Tokenizer-0.19
http://search.cpan.org/~izut/SQL-Tokenizer-0.19/
A simple SQL tokenizer. 
----
Socket-Class-2.10
http://search.cpan.org/~chrmue/Socket-Class-2.10/
A class to communicate with sockets 
----
Sphinx-Search-0.19
http://search.cpan.org/~jjschutz/Sphinx-Search-0.19/
Sphinx search engine API Perl client 
----
Test-PPPort-0.01
http://search.cpan.org/~yappo/Test-PPPort-0.01/
test for ppport.h warnings 
----
Test-PPPort-0.02
http://search.cpan.org/~yappo/Test-PPPort-0.02/
test for ppport.h warnings 
----
Test-Regexp-2009041001
http://search.cpan.org/~abigail/Test-Regexp-2009041001/
Test your regular expressions 
----
Test-Regexp-2009041002
http://search.cpan.org/~abigail/Test-Regexp-2009041002/
Test your regular expressions 
----
Test-Snapshots-0.02
http://search.cpan.org/~szabgab/Test-Snapshots-0.02/
for testing stand alone scripts and executables 
----
Text-Template-Simple-0.62_12
http://search.cpan.org/~burak/Text-Template-Simple-0.62_12/
Simple text template engine 
----
Text-Template-Simple-0.62_13
http://search.cpan.org/~burak/Text-Template-Simple-0.62_13/
Simple text template engine 
----
Tk-ForDummies-Graph-1.05
http://search.cpan.org/~djibel/Tk-ForDummies-Graph-1.05/
Extension of Canvas widget to create a graph like GDGraph. 
----
XML-Parser-Lite-Tree-XPath-0.21
http://search.cpan.org/~iamcal/XML-Parser-Lite-Tree-XPath-0.21/
XPath access to XML::Parser::Lite::Tree structures 
----
namespace-autoclean-0.01
http://search.cpan.org/~flora/namespace-autoclean-0.01/
Keep imports out of your namespace 


If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.

This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
  http://www.stonehenge.com/merlyn/LinuxMag/col82.html

print "Just another Perl hacker," # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion


------------------------------

Date: Sat, 11 Apr 2009 08:27:52 -0700 (PDT)
From: Ron Bergin <rkb@i.frys.com>
Subject: Re: perl values for batch script to use
Message-Id: <8f82771f-cbed-4bff-87f1-19570e457daa@y34g2000prb.googlegroups.com>

On Apr 10, 9:39 am, s...@netherlands.com wrote:
> On Wed, 8 Apr 2009 06:52:25 -0700 (PDT), Ron Bergin <r...@i.frys.com> wrote:
> >On Apr 7, 3:50 pm, s...@netherlands.com wrote:
> >> On Tue, 7 Apr 2009 14:09:10 -0700 (PDT), Ron Bergin <r...@i.frys.com> wrote:
> >> >On Apr 7, 10:58 am, Slickuser <slick.us...@gmail.com> wrote:
> >> >> I have a filename (file.txt)
>
> >> >> file.txt contains:
> >> >> Sample4.1.2009_US
> >> >> Sample4.2.2009_ASIA
>
> >> >> I can parse this file in Perl fine. Now I want this value to be
> >> >> available to use in a batch script.
> >> >> I tried using "set" but the info gets cleared once I exit the perl script.
>
> >> >> perl_script.pl
> >> >> open file.txt
> >> >> parse info
> >> >> use system to execute command ("set xxyz_US=Sample4.1.2009_US")
> >> >> ("set xxyz_ASIA=Sample4.2.2009_ASIA")
>
> >> >Use setenv instead of the set command.
>
> >> >http://barnyard.syr.edu/~vefatica/#SETENV
>
> >> >Or, you could use the standard set command and then use Win32::API to
> >> >call 2 C functions (RegFlushKey and BroadcastSystemMessage) to force
> >> >that setting to be retained after the perl script ends, which is
> >> >basically what setenv does.
>
> >> I don't understand posters taken verbatim. The dumb schmucks here
> >> think a perl script can realistically shine shoes if asked. Literally.
>
> >> -sln
>
> >I don't understand how your comment applies to mine.  What are you
> >trying to convey?
>
> Well Ron, I guess I'm saying "Some little programs for Windows NT/Intel" hasn't
> been relevant for a long time now. No longer are apps cluttering up
> the environment. Maybe you're just a little behind the times.
>
> -sln

Well, "Some little programs for Windows NT/Intel" is relevant to the
OP and his environment.

The solution I suggested is a perfectly viable option for the OP's
problem.  However, personally I'd use a different approach, but since
the OP didn't provide enough info on what he needs to accomplish, none
of us can say what that better approach should be.
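For context on why the OP's attempt failed: system("set ...") runs in a child cmd.exe, and a child process cannot change its parent's environment. One common workaround, sketched here as an assumption and not as Ron's unstated "different approach", is to have Perl write the set commands to a helper batch file that the calling batch script then runs with call. The file name setvars.cmd is hypothetical, and the values are hard-coded in place of the parsing of file.txt.

```perl
use strict;
use warnings;
use File::Spec;

# Values the OP parsed from file.txt (hard-coded here for illustration).
my %vars = (
    xxyz_US   => 'Sample4.1.2009_US',
    xxyz_ASIA => 'Sample4.2.2009_ASIA',
);

# Write one "set" line per variable into a helper batch file instead of
# running "set" in a child process via system().
my $helper = File::Spec->catfile(File::Spec->tmpdir, 'setvars.cmd');
open my $out, '>', $helper or die "Can't write $helper: $!";
print {$out} "\@echo off\n";
print {$out} "set $_=$vars{$_}\n" for sort keys %vars;
close $out or die "Can't close $helper: $!";
print "$helper\n";
```

The batch script would then run the Perl script and call the generated file, so the set commands execute in the batch script's own cmd.exe process and the variables remain visible afterwards.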


------------------------------

Date: Sat, 11 Apr 2009 11:18:59 GMT
From: spambait@milmac.com (Doug Miller)
Subject: Re: The Logic of Beautiful Code
Message-Id: <B0%Dl.28893$ZP4.21670@nlpi067.nbdc.sbc.com>

In article <n5cvt4di8fcskoihaika6fdn5bitr7gdmp@4ax.com>, sln@netherlands.com wrote:
[load of incredible nonsense snipped]

Please wait to post until you sober up.


------------------------------

Date: Sat, 11 Apr 2009 08:15:21 -0700 (PDT)
From: smallpond <smallpond@juno.com>
Subject: Re: The Logic of Beautiful Code
Message-Id: <f6cdd531-931e-4c3e-a82a-c51a5d94c2c6@v9g2000vbb.googlegroups.com>

On Apr 10, 5:08 pm, s...@netherlands.com wrote:

> Perl is sing-song reading and writing,

COBOL is more sing-song: one steady rhythm.  There's
only one way to do it.

I think Perl is more like Jazz.  You leave out notes
here and there that are implicit.  You sometimes put
the if in front and sometimes at the end - whichever
sounds better.  Dizzy Gillespie would have been a
Perl programmer.


------------------------------

Date: Sat, 11 Apr 2009 00:43:54 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngtvitc.vfj.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 21:24, sln@netherlands.com <sln@netherlands.com> wrote:
> On Fri, 10 Apr 2009 22:50:51 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>On 2009-04-10 19:27, sln@netherlands.com <sln@netherlands.com> wrote:
>>> On Fri, 10 Apr 2009 21:04:30 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>>>use encoding "X" has a similar effect: You tell the compiler that the
>>>>source code is in encoding X, 
>>>
>>> How do you tell the compiler what coding to use if the encoding '' can't be
>>> decoded?
[...]
>>Or do you mean what happens if the compiler doesn't even get to the "use
>>encoding 'X'" line because that line itself is encoded? This is only a problem if
>>you use an encoding which isn't a superset of US-ASCII (or EBCDIC on some
>>platforms). So you can't use UTF-16, because the extra 0x00 octets would confuse
>>the parser which is expecting US-ASCII, and you can't use EBCDIC on a
>>US-ASCII platform, but you can use UTF-8, ISO-8859-X, BIG5, euc-jp, as
>>long as you use only ASCII characters before the use directive (which is
>>easy since that should be the first line after the shebang anyway).
>
> So there is a base 'code' line. Isn't that stupid to interpret the rest of
> the code in an encoding interpreted with another code? The code is then broken!

No. Almost all encodings today are supersets of US-ASCII.

Consider these two programs:

#!/usr/bin/perl
use utf8;
use warnings;
use strict;

my $greeting = "Καλημέρα κόσμε";
print "$greeting\n";
__END__

#!/usr/bin/perl
use encoding "iso-8859-7";
use warnings;
use strict;

my $greeting = "Καλημέρα κόσμε";
print "$greeting\n";
__END__

where the first is encoded in UTF-8 and the second is encoded in
ISO-8859-7.

When the compiler starts to parse each program it doesn't know which
encoding is used. But it doesn't have to, because all the octets in the
first two lines are from the common subset of both these encodings: 0x65
is an "e" in both UTF-8 and ISO-8859-7, 0x22 is a double quote in both,
etc. So it can parse those two lines just fine assuming US-ASCII. And
after it has parsed those lines, it knows that the real encoding is not
just US-ASCII, but a specific superset: UTF-8 or ISO-8859-7,
respectively.

But you can't do something like that:

#!/usr/bin/perl
use Greeting "Καλημέρα κόσμε";
use encoding "iso-8859-7";
use warnings;
use strict;

hello();
__END__

because now the use encoding comes too late: The compiler would have to
go back to the start to parse "Καλημέρα κόσμε" correctly.
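The claim that the opening lines look the same to the parser in both encodings can be checked directly with the core Encode module. This sketch encodes the ASCII-only header of the second program in UTF-8 and ISO-8859-7 and compares the octets, then does the same for the Greek word, which is where the encodings diverge.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The ASCII-only opening lines of the second example program.
my $header = qq{#!/usr/bin/perl\nuse encoding "iso-8859-7";\n};

# Pure ASCII text encodes to the same octets in both encodings...
my $as_utf8  = encode('UTF-8',      $header);
my $as_greek = encode('iso-8859-7', $header);
my $same = $as_utf8 eq $as_greek ? 'identical' : 'different';
print "header: $same octets\n";

# ...but the Greek word does not: 2 octets per letter in UTF-8, 1 in ISO-8859-7.
my $word = "\x{039A}\x{03B1}\x{03BB}\x{03B7}\x{03BC}\x{03AD}\x{03C1}\x{03B1}";  # Kalimera
my $word_utf8  = encode('UTF-8',      $word);
my $word_greek = encode('iso-8859-7', $word);
printf "word: %d octets in UTF-8, %d in ISO-8859-7\n",
    length $word_utf8, length $word_greek;
```

Because the header octets are identical, the parser can read them before it knows which encoding applies; the word's octets differ, so anything non-ASCII must come after the directive.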

	hp


------------------------------

Date: Sat, 11 Apr 2009 11:59:55 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu0qgs.67e.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 21:20, sln@netherlands.com <sln@netherlands.com> wrote:
> On Fri, 10 Apr 2009 22:59:40 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>On 2009-04-10 19:44, sln@netherlands.com <sln@netherlands.com> wrote:
>>> Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.
>>
>>Actually, for all practical purposes, Perl character strings *are*
>>UTF-32. Each character is a 32-bit value.
>>
>>Both UTF-16 and UTF-32 are supported for I/O, of course.
>>
>>> Imagine Perl doing utf-32.
>>
>>I don't have to imagine that, it does.
>>
>>> Why then you could do Regular Expressions on
>>> a binary stream.
>>
>>You can't do Regexps on streams, whether binary or not (would be nice if
>>we could).
>>
>>You can do Regexps on *strings*, whether they are binary or text.
>>
>>I don't know what that has to do with UTF-32. Binary strings consist of
>>octets. Treating them as UTF-32 is almost always a mistake.
>>
>>	hp
>
> If you can't do regexes on streams, then you can't parse XML.

You don't need regexps at all to parse XML (or any other language).
And you certainly don't need to do them on streams, since you can always
read the next block or line from the stream and append it to your
buffer.
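The read-and-append approach looks roughly like this. It is a toy sketch: an in-memory filehandle stands in for test.xml, blocks are deliberately tiny so matches straddle block boundaries, and a regexp over a made-up item format is no substitute for a real XML parser such as XML::LibXML. The point is only that the regexp runs on a string buffer, never on the stream itself.

```perl
use strict;
use warnings;

# Demo input: an in-memory filehandle stands in for test.xml.
my $xml = '<doc><item>one</item><item>two</item><item>three</item></doc>';
open my $fh, '<', \$xml or die "Can't open in-memory file: $!";

my $buffer = '';
my @items;
# Read small fixed-size blocks so that matches span block boundaries.
while (read($fh, my $block, 8)) {
    $buffer .= $block;
    # Consume every complete <item>...</item> now present in the buffer;
    # an incomplete one stays in the buffer until more data arrives.
    while ($buffer =~ s{\A.*?<item>(.*?)</item>}{}s) {
        push @items, $1;
    }
}
close $fh;
print join(',', @items), "\n";
```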

> I think you're missing what Unicode is.

I know quite well what Unicode is - I have found character set issues
fascinating ever since I turned on an Apple ][ in 1984 and it identified
itself as "Apple ". I read Rob Pike's paper in the early 90s and
the full Unicode standard (version 2.0) in the late 90s. And I've
discussed character encoding matters (including Unicode) a lot on
various newsgroups and mailing lists over the years and fixed a few
encoding-related problems in various pieces of software.

On the other hand, I think you don't know what a stream is:

open(my $fh, '<', 'test.xml') or die "Can't open test.xml: $!";

Now $fh refers to a stream. Please show me how you can apply a regexp to
this stream. Solutions which don't count:

 * reading chunks from the stream into a scalar variable and then
   applying the regexp to this variable (because then you apply it to a
   string (as I wrote), not a stream);
 * writing your own regexp engine (since Perl is a general-purpose
   programming language, you can of course write that, but we were
   talking about Perl's built-in regexp engine).


> I have already posted sometime back pack/unpack on regex streams.

pack and unpack are Perl functions. They can only be applied to strings,
not streams. If you don't mean these functions but something else, be
more specific. And I have no idea what a "regex stream" might be. A
stream composed of regexps? A stream with special support for regexps? A
stream split into records with a regexp?

> I can repost the code if you need.

Code is always nice because it is unambiguous (unlike the English
language). However, keep in mind that this is a discussion group, not a
code repository. Any code example longer than 50 lines or so is unlikely
to be read.

> Or you can read a few docs on it.
> perlunicode.html and some others.

I've read that several times (and criticized it here, too).

> I doubt you'll capitulate no matter what.

If you think this is a fight where one of us has to win and the other to
capitulate, I'll stop now.

	hp



------------------------------

Date: Sat, 11 Apr 2009 12:13:55 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu0rb4.67e.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 21:32, sln@netherlands.com <sln@netherlands.com> wrote:
> Btw, just try to pack or un-pack UTF-16 or UTF-32.

Wrong tool. Use encode/decode for that.

> Hey or even UTF-8 that is out of range.

What is "UTF-8 that is out of range"? A UTF-8 sequence which would
be decoded to a Unicode value > 0xFFFF_FFFF? That wasn't well-formed
UTF-8 to begin with, since Unicode/ISO 10646 is by definition only 32 bits
(and it is unlikely that characters beyond 0x10FFFF will ever be
defined, since that would break UTF-16).
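Why 0x10FFFF is the ceiling: UTF-16 reaches code points above 0xFFFF with a surrogate pair, which carries 20 bits on top of an offset of 0x10000. This sketch does the arithmetic by hand (surrogates() is a hypothetical helper, not a library function) and cross-checks one ordinary supplementary character against the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Split a supplementary-plane code point into a UTF-16 surrogate pair.
sub surrogates {
    my ($cp) = @_;
    die sprintf("U+%04X needs no surrogates\n", $cp) if $cp < 0x10000;
    die sprintf("U+%X is beyond UTF-16's reach\n", $cp) if $cp > 0x10FFFF;
    my $v = $cp - 0x10000;              # at most 20 bits remain
    return (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
}

# The largest code point uses up every bit of both surrogates.
printf "U+10FFFF -> %04X %04X\n", surrogates(0x10FFFF);

# Cross-check against Encode for an ordinary supplementary character.
my @pair  = surrogates(0x1F600);
my $bytes = encode('UTF-16BE', chr(0x1F600));
printf "U+1F600 -> %04X %04X (Encode: %s)\n", @pair, unpack('H*', $bytes);
```

U+10FFFF maps to DBFF DFFF, the highest high surrogate followed by the highest low surrogate, so there is no room left in UTF-16 for anything larger.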

> Try to do regex on them next.

You can do that, but it would be stupid. You decode them first and use
regexps on the result.
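"Decode first, then match" looks like this with the core Encode module. The sample text and field names are made up for illustration; the octets stand for what a UTF-16LE file read in :raw mode would contain.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-16LE octets, as they might arrive from a file opened in :raw mode.
my $octets = encode('UTF-16LE', "id=42; name=\x{0444}\x{044B}\x{0432}\x{0430}");

# Decode to a Perl character string first...
my $text = decode('UTF-16LE', $octets);

# ...then apply the regexp to characters, not to pairs of octets.
my ($name) = $text =~ /name=(\w+)/;
printf "%d octets, %d characters, name has %d characters\n",
    length $octets, length $text, length $name;

# Re-encode only when writing out, e.g. as UTF-32BE.
my $utf32 = encode('UTF-32BE', $name);
print length($utf32), " octets in UTF-32BE\n";
```

Once decoded, \w matches the Cyrillic letters as characters; neither pack nor unpack is needed at any step.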

> I did.

Why am I not surprised?

> I didn't pack/unpack utf16 or utf32.
> Let me know if you can do that.

I could. But since there's a better way, I wouldn't.

	hp


------------------------------

Date: Sat, 11 Apr 2009 14:59:47 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu11id.bnl.whynot@orphan.zombinet>

On 2009-04-10, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2009-04-10 21:24, sln@netherlands.com <sln@netherlands.com> wrote:
*SKIP*
>> So there is a base 'code' line. Isn't that stupid to interpret the
>> rest of the code in an encoding interpreted with another code? The
>> code is then broken!
>
> No. Almost all encodings today are supersets of US-ASCII.
>
> Consider these two programs:
>
> #!/usr/bin/perl
> use utf8;
> use warnings;
> use strict;
>
> my $greeting = "Καλημέρα κόσμε";
> print "$greeting\n";
> __END__

Show your code, don't master it

	$ perl -Mutf8 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
	Wide character in print at -e line 1.
	фыва
	�
	$ echo $LC_ALL 
	en_US.UTF-8

> #!/usr/bin/perl
> use encoding "iso-8859-7";
> use warnings;
> use strict;
>
> my $greeting = "Καλημέρα κόσμε";
> print "$greeting\n";
> __END__

Show your $ENV{LC_ALL}, please

	{2775:24} [0:0]$ perl -Mencoding=latin1 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
	фыва
	�

> where the first is encoded in UTF-8 and the second is encoded in
> ISO-8859-7.
>
> When the compiler starts to parse each program it doesn't know which
> encoding is used. But it doesn't have to, because all the octets in the
> first two lines are from the common subset of both these encodings: 0x65
> is an "e" in both UTF-8 and ISO-8859-7, 0x22 is a double quote in both,
> etc. So it can parse those two lines just fine assuming US-ASCII. And
> after it has parsed those lines, it knows that the real encoding is not
> just US-ASCII, but a specific superset: UTF-8 or ISO-8859-7,
> respectively.
>
> But you can't do something like that:
>
> #!/usr/bin/perl
> use Greeting "Καλημέρα κόσμε";
> use encoding "iso-8859-7";
> use warnings;
> use strict;
>
> hello();
> __END__
>
> because now the use encoding comes too late: The compiler would have to
> go back to the start to parse "Καλημέρα κόσμε" correctly.

You've messed everything up.  Since the compiler wasn't told about the
encoding of C<use Greeting>'s argument, it's treated as latin1, then
F<Greeting.pm> is fed that *byte* string, and it's F<Greeting.pm>'s
problem what to do with that stuff.

If there were a C<use utf8> or C<use encoding 'utf8'>, then the
*utf8* flag would be set, and then it would be F<Greeting.pm>'s problem
what to do with a *character* string.
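The byte-string versus character-string distinction can be seen without any source-encoding pragma. This sketch builds the UTF-8 octets of "фыва" from escapes (the same octets as in the xxd dump in this post) and then decodes them with the built-in utf8::decode, which is roughly what C<use utf8> does for string literals in the source.

```perl
use strict;
use warnings;

# The octets of "фыва" in UTF-8 (compare the xxd dump in this post).
my $bytes = "\xd1\x84\xd1\x8b\xd0\xb2\xd0\xb0";
printf "byte string: length %d, UTF-8 flag %s\n",
    length $bytes, utf8::is_utf8($bytes) ? 'on' : 'off';

# Decoding in place turns the octets into a character string.
my $chars = $bytes;
utf8::decode($chars) or die "not valid UTF-8";
printf "character string: length %d, UTF-8 flag %s\n",
    length $chars, utf8::is_utf8($chars) ? 'on' : 'off';
```

The byte string has length 8 and no flag; the decoded string has length 4 with the flag on, which is the difference a module receiving either kind of argument has to cope with.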

You missed one important thing -- I dislike this feature; I hate it
already.  Hopefully, since c.l.p.m. isn't that public, that dangerous
fact will go unnoticed. See this:

	{4579:37} [0:0]$ perl -wle '$фыва++; print $фыва'
	Unrecognized character \x84 in column 3 at -e line 1.
	{4601:39} [0:2]$ perl -Mutf8 -wle '$фыва++; print $фыва'
	1
	{4605:40} [0:0]$ perl -Mencoding=utf8 -wle '$фыва++; print $фыва'
	Unrecognized character \x84 in column 3 at -e line 1.

That's what C<use utf8> is fscking for.

I have to agree: 'UTF-8 flag' is somewhat misleading, since it's about
characters, not about utf8 itself (I hope).

But,..  here be dragons...

	{3335:27} [0:0]$ echo 'фыва' | xxd
	0000000: d184 d18b d0b2 d0b0 0a                   .........
	{3356:28} [0:0]$ echo 'фыва' | recode utf8..ucs-2-internal |xxd
	0000000: 4404 4b04 3204 3004 0a00                 D.K.2.0...
	{3414:29} [0:1]$ perl -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'
	Wide character in print at -e line 1.
	䐄䬄㈄〄
	{3415:30} [0:0]$ perl -Mencoding=ucs2 -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'
	Can't locate object method "cat_decode" via package "Encode::Unicode" at
	-e line 1.
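It looks as though the \x{....} escapes above read the ucs-2-internal dump, which is little-endian on this machine, as if it were big-endian. Decoding the same octets with the core Encode module shows both readings; code_points() is a hypothetical helper for display only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octets from the ucs-2-internal dump above, without the trailing newline.
my $octets = "\x44\x04\x4b\x04\x32\x04\x30\x04";

# Show each character of a string as its code point.
sub code_points { join ' ', map { sprintf 'U+%04X', ord($_) } split //, shift }

# Read as little-endian 16-bit units: U+0444 U+044B U+0432 U+0430, i.e. the
# original Cyrillic word.
my $chars = decode('UTF-16LE', $octets);
print code_points($chars), "\n";

# Read as big-endian: the byte-swapped values used in the \x{....} escapes above.
my $swapped = decode('UTF-16BE', $octets);
print code_points($swapped), "\n";
```

Printing $chars through a handle with an :encoding(UTF-8) layer would show "фыва" without the "Wide character" warning.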



-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2332
***************************************

