[31089] in Perl-Users-Digest
Perl-Users Digest, Issue: 2334 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Apr 12 16:09:48 2009
Date: Sun, 12 Apr 2009 13:09:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 12 Apr 2009 Volume: 11 Number: 2334
Today's topics:
Re: calculate CDF google@edcallahan.com
Re: calculate CDF <tadmc@seesig.invalid>
Re: Capture only first match in regular expression sln@netherlands.com
Re: Finding a Perl job <m@rtij.nl.invlalid>
How do I start and restart a program via a perl script? <cdalten@gmail.com>
Re: How do I start and restart a program via a perl scr <tadmc@seesig.invalid>
Re: How do I start and restart a program via a perl scr <cdalten@gmail.com>
Re: How do I start and restart a program via a perl scr <jurgenex@hotmail.com>
Re: How do I start and restart a program via a perl scr <m@rtij.nl.invlalid>
Re: How do I start and restart a program via a perl scr <tadmc@seesig.invalid>
Re: How do I start and restart a program via a perl scr <xhoster@gmail.com>
Re: multicore cpu QoS@invalid.net
Re: multicore cpu <spamtrap@dot-app.org>
Re: multicore cpu sln@netherlands.com
Re: multicore cpu <xhoster@gmail.com>
Simple line-drawing graphics <bernie@fantasyfarm.com>
Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <whynot@pozharski.name>
Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 12 Apr 2009 09:51:24 -0700 (PDT)
From: google@edcallahan.com
Subject: Re: calculate CDF
Message-Id: <d42aab67-f746-4323-9fab-bbe9f877df9d@c36g2000yqn.googlegroups.com>
On Apr 11, 11:35 pm, Tad J McClellan <ta...@seesig.invalid> wrote:
>
> It has been socially accepted for 20 years that trimming .sigs in
> followups is good manners.
You're right Tad, actually posting the signature was just an
oversight. I actually hadn't noticed that part of Uri's lecture. I've
not used the usenet for many years now and this google message editor
doesn't have many of the niceties. A post about my Math::CDF module
brought me out of hibernation, and I think there is little reason not
to return to it.
My "post as I please" was actually in reference to the concept that I
should build up "credibility" before posting criticism, which I
reject. Usenet is still an unregulated frontier and I have the same
right as anyone else, as everyone else has the right to post as they
like in response.
Singling you out for the RTFM thing wasn't meant to raise so much ire
and certainly was unnecessary, but your post just brought back floods
of memories of searching Google for help, finding exactly my question,
but in response for answers just "search CPAN" or "search the groups"
messages. It does get boring, and I'm never sure why the posters of
those messages bother to post them if not simply to say "I know, but
won't tell". Maybe you *really* didn't think the OP knew what CPAN
was, in which case my sincerest apologies. But I suspect there is
thick skin around here.
It is good to know that in 20 years not everything changes, and the
Perl groups are still as they were, warts and all. Maybe this thread will
degenerate to complaints of each others grammer and spelling, a
comparison to Hitler and an invocation of Godwin's law. Then I will
know I can depend on the stability of this world.
Enjoy the day, hope your Easter basket was full!
------------------------------
Date: Sun, 12 Apr 2009 12:43:26 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: calculate CDF
Message-Id: <slrngu4a1u.9ps.tadmc@tadmc30.sbcglobal.net>
google@edcallahan.com <google@edcallahan.com> wrote:
> My "post as I please" was actually in reference to the concept that I
> should build up "credibility" before posting criticism, which I
> reject.
Now *that* is certainly rejectable.
> Maybe you *really* didn't think the OP knew what CPAN
> was,
I really thought that the OP made no effort whatsoever to find
a solution to his problem before posting.
My response was colored in accordance with that thinking.
> degenerate to complaints of each others grammer and spelling, a
^^^^^^^
grammar.
nyuk.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Sun, 12 Apr 2009 11:49:01 -0700
From: sln@netherlands.com
Subject: Re: Capture only first match in regular expression
Message-Id: <fgd4u41ghej3enk17grc4gdp9d7rhv8g9v@4ax.com>
On Sat, 11 Apr 2009 23:30:46 -0500, Tad J McClellan <tadmc@seesig.invalid> wrote:
>Zapanaz <http> wrote:
>
>> Excuse the cross-post,
>
>
>It *decreases* the number of people that will see your post though.
>
>
>> my server doesn't carry comp.lang.perl.misc but
>> it looks like there is more activity there.
>
>
>Your server's configuration has not been updated for over a decade?
>
>comp.lang.perl was rmgroup'd over 10 years ago when
>comp.lang.perl.misc was created.
>
>
>> The answer to this is probably staring me in the face ...
>
>
>The answer is: don't try and use regex for parsing context free languages.
But below you tell him how.
The truth is nobody is parsing the context of HTML, only the mark-up.
If regular expressions can't parse '>' then they are not as good as strcmp(),
because that's what SAX-compliant XML stream parsers (and Expat, which is not compliant)
do. HTML modules parse the same thing.
Unless you think strcmp() has a magical gift, you better make the distinction.
-sln
------------------------------
Date: Sun, 12 Apr 2009 19:44:25 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Finding a Perl job
Message-Id: <pan.2009.04.12.17.44.25@rtij.nl.invlalid>
On Fri, 10 Apr 2009 14:59:38 -0700, sln wrote:
> No. Sorry I won't do that.
> You know, the strangest thing, I don't enable scripts or auto-loading of
> Active-X executables/dll's on my machine. I would like to disable crap
> that you may want to load on my machine.
>
> Am I wrong? Is there any other way I can communicate to employers
> without having your crap executing on my Operating System?
Depends on how desperately you need the job. I would pass. But then, I
don't use our time registration system as it doesn't work with any
browser at home or work. I can get away with it, not everybody can.
M4
------------------------------
Date: Sun, 12 Apr 2009 07:20:10 -0700 (PDT)
From: grocery_stocker <cdalten@gmail.com>
Subject: How do I start and restart a program via a perl script?
Message-Id: <f3bea4e6-4fec-48f6-9db9-fb444d84a547@d19g2000prh.googlegroups.com>
The following script is supposed to continuously scan the *nix who list
to see if a particular person enters the party chatline. If they
enter, then the script is supposed to trigger the nope program. When
that person leaves, the nope program is supposed to be killed and the
script goes back to scanning to see if that person enters the party
chatline again.
However, I can't seem to get it to work correctly. Ideas?
#!/usr/bin/perl
my $pid;
while (True)
{
    if(`w | grep cdalten | grep party`) {
        $pid = open(FH, "/home/guest/cdalten/nope2 &|");
        #print $pid;
        kill $pid;
    }
    kill $pid;
    sleep(1);
}
------------------------------
Date: Sun, 12 Apr 2009 10:20:21 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <slrngu41ll.8i4.tadmc@tadmc30.sbcglobal.net>
grocery_stocker <cdalten@gmail.com> wrote:
> #!/usr/bin/perl
>
> my $pid;
>
> while (True)
You should always enable warnings when developing Perl code.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Sun, 12 Apr 2009 08:41:49 -0700 (PDT)
From: grocery_stocker <cdalten@gmail.com>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <a6839adf-2f2e-4a64-85e4-95f36d4ea76d@x29g2000prf.googlegroups.com>
On Apr 12, 8:20 am, Tad J McClellan <ta...@seesig.invalid> wrote:
> grocery_stocker <cdal...@gmail.com> wrote:
> > #!/usr/bin/perl
>
> > my $pid;
>
> > while (True)
>
> You should always enable warnings when developing Perl code.
>
> --
I enabled warnings, but it still hasn't really shed any light on the
problem. Here is what I get:
#!/usr/bin/perl
use warnings;
my $pid = -1;
while (True)
{
    local *FH;
    if(`w | grep nambla | grep party`) {
        $pid = open(FH, "/home/guest/cdalten/nope2 &|") or die "$!";
        #print $pid;
        kill $pid;
    }
    else {
        kill $pid;
    }
    sleep(1);
}
% ./scan.pl
Bareword found in conditional at ./scan.pl line 20.
^C
------------------------------
Date: Sun, 12 Apr 2009 10:26:34 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <c384u4lik05es0frllt5r8j2fcsubhvuo3@4ax.com>
Tad J McClellan <tadmc@seesig.invalid> wrote:
>grocery_stocker <cdalten@gmail.com> wrote:
>
>> #!/usr/bin/perl
>>
>> my $pid;
>>
>> while (True)
>
>
>You should always enable warnings when developing Perl code.
Yeah, but even that ugly bare word is still a true value, so the loop
will loop as expected.
To the OP: much worse are:
- open() without testing for success
- therefore potentially undefined $pid, causing spurious errors when
using this undefined value in the kill()
- and the intended logic is just beyond me:
    if(`w | grep cdalten | grep party`) {
If something is found then
        $pid = open(FH, "/home/guest/cdalten/nope2 &|");
start some program
        kill $pid;
and kill it immediately (why start it in the first place if you are
killing it immediately?).
    }
and regardless if something is found or not
    kill $pid;
kill the last program, no matter that it had been killed anyway already
To answer the question in the Subject: you use system() (or maybe exec()
or backticks or ...) for both tasks
To answer the question in the body (which interestingly enough has
little to do with the Subject):
Your design is missing vital parts and will never work that way.
When starting that external program store that person and the associated
PID in a hash. Then in every loop check if that person is still online.
And only if he is not online any longer then call the kill with the PID
you stored earlier.
However, that is a poor design. There really should be some better way
to stop your system than to kill a process.
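
The bookkeeping described above (remember the PID per person, and only act when the person's online state changes) could be sketched roughly as follows. This is only an illustration: next_action() and %pid_for are made-up names, the who-list checks are simulated with a fixed list, and 12345 is a placeholder where a real script would store the PID of the program it started.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what the watcher should do on one pass.
#   $online - true if the person currently shows up in the who list
#   $pid    - PID recorded when the program was started, or undef
sub next_action {
    my ($online, $pid) = @_;
    return 'start' if  $online && !defined $pid;   # just arrived
    return 'stop'  if !$online &&  defined $pid;   # just left
    return 'wait';                                 # nothing changed
}

my %pid_for;              # person => PID of the program started for them
my $user = 'cdalten';

# Simulated results of five successive who-list checks.
for my $online (0, 1, 1, 0, 0) {
    my $action = next_action($online, $pid_for{$user});
    if ($action eq 'start') {
        $pid_for{$user} = 12345;    # placeholder for the real child PID
    }
    elsif ($action eq 'stop') {
        delete $pid_for{$user};     # a real script would kill the PID first
    }
    print "online=$online action=$action\n";
}
```

With this shape the program is started once when the person appears and stopped once when they leave, instead of being started and killed on every pass of the loop.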
jue
------------------------------
Date: Sun, 12 Apr 2009 19:42:25 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <pan.2009.04.12.17.42.23@rtij.nl.invlalid>
On Sun, 12 Apr 2009 08:41:49 -0700, grocery_stocker wrote:
> On Apr 12, 8:20 am, Tad J McClellan <ta...@seesig.invalid> wrote:
>> grocery_stocker <cdal...@gmail.com> wrote:
>> > #!/usr/bin/perl
>>
>> > my $pid;
>>
>> > while (True)
>>
>> You should always enable warnings when developing Perl code.
>>
>> --
>
>
> I enabled warnings, but it still hasn't really shed any light on the
> problem. Here is what I get:
>
> #!/usr/bin/perl
> use warnings;
>
> my $pid = -1;
>
> while (True)
> {
> local *FH;
>
> if(`w | grep nambla | grep party`) {
> $pid = open(FH, "/home/guest/cdalten/nope2 &|") or die
> "$!";
> #print $pid;
> kill $pid;
> }
> else {
> kill $pid;
> }
> sleep(1);
> }
>
> % ./scan.pl
> Bareword found in conditional at ./scan.pl line 20. ^C
The error does not match the code you posted.....
M4
------------------------------
Date: Sun, 12 Apr 2009 12:37:26 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <slrngu49mm.9ps.tadmc@tadmc30.sbcglobal.net>
grocery_stocker <cdalten@gmail.com> wrote:
> On Apr 12, 8:20 am, Tad J McClellan <ta...@seesig.invalid> wrote:
>> grocery_stocker <cdal...@gmail.com> wrote:
>> > #!/usr/bin/perl
>>
>> > my $pid;
>>
>> > while (True)
>>
>> You should always enable warnings when developing Perl code.
>>
>> --
>
>
> I enabled warnings,
I didn't think it was necessary to point out the natural corollary
to enabling warnings, but I guess it is...
You should always enable warnings when developing Perl code, and then
modify your code so that it does not generate any warning messages.
> Bareword found in conditional at ./scan.pl line 20.
So fix it already!
while ('True')
or
while (1)
or
while ('infinite loop')
or
while ('forever')
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Sun, 12 Apr 2009 12:25:18 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <49e241be$2$21983$ed362ca5@nr5-q3a.newsreader.com>
grocery_stocker wrote:
> The following script is supposed to continuously scan the *nix who list
> to see if a particular person enters the party chatline. If they
> enter, then the script is supposed to trigger the nope program. When
> that person leaves, the nope program is supposed to be killed
Killed by whom?
> and the
> script goes back to scanning to see if that person enters the party
> chatline again.
>
> However, I can't seem to get it to work correctly. Ideas?
What is it doing instead of working correctly?
>
> #!/usr/bin/perl
>
> my $pid;
>
> while (True)
Not only should you turn on strict and warnings, you should also take
care of the problems they indicate.
> {
> if(`w | grep cdalten | grep party`) {
> $pid = open(FH, "/home/guest/cdalten/nope2 &|");
Because of the &, perl will open a shell and give the shell your command
(minus the pipe, I think) to run. The return will be the pid of this
shell. The shell will then start nope2. On unix-like systems, the
shell will then exit, because & tells it not to wait for nope2.
> #print $pid;
> kill $pid;
You are killing the shell, not the nope2 that the shell started.
Depending on the vagaries of the scheduler, you might be attempting to
kill it before it spawned nope2, after it spawned nope2 but before it
exited, or after it finished its job and exited.
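
One way around the intermediate shell is to fork and exec the program directly, so that the PID you keep is the PID of the program itself. The sketch below assumes a Unix-like system; start_program() and stop_program() are made-up names, and 'sleep 30' merely stands in for nope2. Note also that kill() takes the signal first and the PIDs after it, so a bare "kill $pid" treats $pid as the signal and signals nothing.

```perl
use strict;
use warnings;
use POSIX ();

# Start a command directly (no shell in between) and return the child's PID.
sub start_program {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace this process with the command; leave quietly if exec fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    return $pid;    # parent: this is the PID of the program itself
}

# Ask the program to terminate, then reap it and return its wait status.
sub stop_program {
    my ($pid) = @_;
    kill 'TERM', $pid;     # signal first, then the PID
    waitpid $pid, 0;
    return $?;
}

my $pid = start_program('sleep', '30');    # 'sleep 30' stands in for nope2
print "started child $pid\n";
my $status = stop_program($pid);
printf "child was ended by signal %d\n", $status & 127;
```

Reaping the child with waitpid() also avoids leaving a zombie behind each time the program is stopped.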
Xho
------------------------------
Date: Sun, 12 Apr 2009 14:42:39 GMT
From: QoS@invalid.net
Subject: Re: multicore cpu
Message-Id: <z5nEl.424$WK5.281@nwrddc01.gnilink.net>
Sherm Pendley <spamtrap@dot-app.org> wrote in message-id: <m1d4bj6tit.fsf@dot-app.org>
>
> QoS@invalid.net writes:
>
> > It seems my programs written in Perl only see one core on a dual core cpu.
> >
> > Every time the software has a lot of work to do the cpu utilization goes
> > up to exactly 50%. Is there something wrong with my Perl installation?
>
> Are you *asking* Perl to use the additional cores, by writing multi-threaded
> code? There's been some talk of auto-threading in Perl 6, but that's not
> soup yet; in the current release you have to do it yourself.
>
> sherm--
>
Thank you for your replies,
Yes this particular program does utilize threads and threads::shared.
When the main code signals the worker thread to decode some large files
by setting a shared variable, the worker performs enough work to bring
cpu usage up to 50%, so it doesn't seem to utilize the additional core.
Reading up on the threads docs, it seems there is no way to explicitly
assign an affinity to a particular thread when it is launched.
I am thinking perhaps my Perl installation might have been installed
incorrectly for utilizing multi-core CPUs?
C:\Documents and Settings\Admin>perl -v
This is perl, v5.8.8 built for MSWin32-x86-multi-thread
Binary build 822 [280952] provided by ActiveState http://www.ActiveState.com
Built Jul 31 2007 19:34:48
C:\Documents and Settings\Admin>ver
Microsoft Windows XP [Version 5.1.2600]
Thanks,
Jason
------------------------------
Date: Sun, 12 Apr 2009 11:19:04 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: multicore cpu
Message-Id: <m1ljq56gt3.fsf@dot-app.org>
QoS@invalid.net writes:
> When the main code signals the worker thread to decode some large files
> by setting a shared variable, the worker performs enough work to bring
> cpu usage up to 50%, so it doesn't seem to utilize the additional core.
"The" worker thread? There's only one? And the main thread is not doing
anything while it waits for the worker thread to finish?
If you've only got one thread doing work, what would you expect Perl to
be doing on the other core?
> I am thinking perhaps my Perl installation might have been installed
> incorrectly for utilizing multi-core CPUs?
Why on earth would you think that, when your own "perl -v" shows that it
*has* been built with threading support?
sherm--
--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net
------------------------------
Date: Sun, 12 Apr 2009 11:57:40 -0700
From: sln@netherlands.com
Subject: Re: multicore cpu
Message-Id: <c6e4u4p85oi4gi2edkjm2aabqltiadq8o7@4ax.com>
On Sun, 12 Apr 2009 14:42:39 GMT, QoS@invalid.net wrote:
>
>Sherm Pendley <spamtrap@dot-app.org> wrote in message-id: <m1d4bj6tit.fsf@dot-app.org>
>
>>
>> QoS@invalid.net writes:
>>
>> > It seems my programs written in Perl only see one core on a dual core cpu.
>> >
>> > Every time the software has a lot of work to do the cpu utilization goes
>> > up to exactly 50%. Is there something wrong with my Perl installation?
>>
>> Are you *asking* Perl to use the additional cores, by writing multi-threaded
>> code? There's been some talk of auto-threading in Perl 6, but that's not
>> soup yet; in the current release you have to do it yourself.
>>
>> sherm--
>>
>
>Thank you for your replies,
>
>Yes this particular program does utilize threads and threads::shared.
>
>When the main code signals the worker thread to decode some large files
>by setting a shared variable, the worker performs enough work to bring
>cpu usage up to 50%, so it doesn't seem to utilize the additional core.
>
>Reading up on the threads docs, it seems there is no way to explicitly
>assign an affinity to a particular thread when it is launched.
>
>I am thinking perhaps my Perl installation might have been installed
>incorrectly for utilizing multi-core CPUs?
>
>C:\Documents and Settings\Admin>perl -v
>
>This is perl, v5.8.8 built for MSWin32-x86-multi-thread
>Binary build 822 [280952] provided by ActiveState http://www.ActiveState.com
>Built Jul 31 2007 19:34:48
>
>C:\Documents and Settings\Admin>ver
>
>Microsoft Windows XP [Version 5.1.2600]
>
>Thanks,
>Jason
>
>
Looks like a Microsoft OS. There is no guarantee that multiple threads
will be spread across both cores. The first level is multiple processes,
and even then there is no guarantee.
Read up on affinity programming in the Visual C docs.
My bet is that Perl lacks the parameters to set it, so everything runs on
the default processor only, no matter what.
-sln
------------------------------
Date: Sun, 12 Apr 2009 12:28:54 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: multicore cpu
Message-Id: <49e241bf$0$21983$ed362ca5@nr5-q3a.newsreader.com>
QoS@invalid.net wrote:
> Sherm Pendley <spamtrap@dot-app.org> wrote in message-id: <m1d4bj6tit.fsf@dot-app.org>
>
>> QoS@invalid.net writes:
>>
>>> It seems my programs written in Perl only see one core on a dual core cpu.
>>>
>>> Every time the software has a lot of work to do the cpu utilization goes
>>> up to exactly 50%. Is there something wrong with my Perl installation?
>> Are you *asking* Perl to use the additional cores, by writing multi-threaded
>> code? There's been some talk of auto-threading in Perl 6, but that's not
>> soup yet; in the current release you have to do it yourself.
>>
>> sherm--
>>
>
> Thank you for your replies,
>
> Yes this particular program does utilize threads and threads::shared.
>
> When the main code signals the worker thread to decode some large files
> by setting a shared variable, the worker performs enough work to bring
> cpu usage up to 50%, so it doesn't seem to utilize the additional core.
You start *one* worker thread. It uses *one* CPU.
What is the point of having one worker thread? If there is only to be
one, why not just have the main thread do the work itself?
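
To keep more than one core busy, the work has to be split across several threads. A minimal sketch, assuming the files can be decoded independently of each other: decode_file() is a made-up stand-in for the real decoding step, decode_all() is a made-up name, and the file names are invented. Workers pull jobs from a Thread::Queue and stop when they dequeue an undef marker. On a perl built without ithreads the sketch simply runs the jobs one after another.

```perl
use strict;
use warnings;
use Config;

# Hypothetical stand-in for the real CPU-heavy decoding step.
sub decode_file {
    my ($name) = @_;
    my $sum = 0;
    $sum += ord for split //, $name;
    return $sum;
}

# Run decode_file() over @jobs with $n_workers threads sharing one queue.
# Returns a flat list of (job => result) pairs.
sub decode_all {
    my ($n_workers, @jobs) = @_;

    # Without ithreads, fall back to doing the work sequentially.
    unless ($Config{useithreads}) {
        return map { ($_ => decode_file($_)) } @jobs;
    }

    require threads;
    require Thread::Queue;

    # One undef per worker acts as its "no more work" marker.
    my $queue = Thread::Queue->new(@jobs, (undef) x $n_workers);

    my @workers = map {
        threads->create(sub {
            my @pairs;
            while (defined(my $job = $queue->dequeue)) {
                push @pairs, $job, decode_file($job);
            }
            return @pairs;
        });
    } 1 .. $n_workers;

    return map { $_->join } @workers;   # collect each worker's pairs
}

my %result = decode_all(2, qw(a.dat b.dat c.dat d.dat));
print "$_ => $result{$_}\n" for sort keys %result;
```

With two workers and enough jobs in the queue, both threads have something to do at the same time; whether they actually land on different cores is still up to the operating system's scheduler.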
>
> Reading up on the threads docs, it seems there is no way to explicitly
> assign an affinity to a particular thread when it is launched.
An affinity for what?
Xho
------------------------------
Date: Sun, 12 Apr 2009 15:03:17 -0400
From: Bernie Cosell <bernie@fantasyfarm.com>
Subject: Simple line-drawing graphics
Message-Id: <ihe4u4h9g8rvu11h10aahe6h949mvdncjk@library.airnews.net>
Any recommendations on some simple package/module/technique for doing
line-drawing in Perl? I *don't* need [likely don't want, actually]
interactive graphics. Generating a .jpg or the like would be perfectly
adequate. Or generating output in some sort of line-drawing meta language
that I could post-process into an image would be fine, too. THANKS!
/Bernie\
--
Bernie Cosell Fantasy Farm Fibers
bernie@fantasyfarm.com Pearisburg, VA
--> Too many people, too few sheep <--
------------------------------
Date: Sun, 12 Apr 2009 17:14:09 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu3tqh.c35.whynot@orphan.zombinet>
Before anything else, I beg your pardon and everyone else's. For some
weird reason, I'd called "tokens" "literals". Now I feel much better.
On 2009-04-11, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2009-04-11 11:59, Eric Pozharski <whynot@pozharski.name> wrote:
>> On 2009-04-10, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>> No. Almost all encodings today are supersets of US-ASCII.
>>>
>>> Consider these two programs:
*SKIP*
>> $ perl -Mutf8 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
>> Wide character in print at -e line 1.
>> фыва
>> �
*SKIP*
>> {2775:24} [0:0]$ perl -Mencoding=latin1 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
>> фыва
>> �
>
> use encoding also sets the binmode for STDOUT and STDERR, so you won't
No, it doesn't (s/STDERR/STDIN/)
{5665:37} [0:0]$ perl -Mencoding=utf8 -wle 'print STDERR "фыва"'
Wide character in print at -e line 1.
фыва
> get a warning here. Again, I was talking only about compile time
> effects, not run time, so I didn't mention that (you can read the manual
> yourself).
I fail to see any compile time effects -- either in those two above or
this one below
{2259:8} [0:0]$ perl -Mstrict -wle 'my $x = "фыва"; $x = "\x{C0}\x{B0}"'
{2264:9} [0:0]$
>>> But you can't do something like that:
>>>
>>> #!/usr/bin/perl
>>> use Greeting "Καλημέρα κόσμε";
>>> use encoding "iso-8859-7";
>>> use warnings;
>>> use strict;
>>>
>>> hello();
>>> __END__
>>>
>>> because now the use encoding comes too late: The compiler would have to
>>> go back to the start to parse "Καλημέρα κόσμε" correctly.
>>
>> You've messed everything up. Since compiler wasn't told about encoding
>> of C<use Greeting>'s argument, it's treated as latin1,
>
> Wrong: It is treated as an unspecified superset of US-ASCII.
My understanding is based on this -- C<perldoc perlunicode>
"use encoding" needed to upgrade non-Latin-1 byte strings
By default, there is a fundamental asymmetry in Perl's Unicode
model: implicit upgrading from byte strings to Unicode strings
assumes that they were encoded in ISO 8859-1 (Latin-1), but
Unicode strings are downgraded with UTF-8 encoding. This happens
because the first 256 codepoints in Unicode happens to agree
with Latin-1.
If encoding is unknown, it's treated as latin1, even if it's not.
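
A small sketch of the asymmetry the quoted documentation describes, using only core Encode: the same two octets end up as different characters depending on whether Perl upgrades them implicitly (as Latin-1) or they are decoded explicitly as UTF-8. The variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets = "\xC3\xA9";    # the two UTF-8 octets of U+00E9, held as a byte string
my $wide   = "\x{263A}";    # a character above 255 forces an upgrade on concatenation

my $implicit = $octets . $wide;                    # octets are read as Latin-1
my $explicit = decode('UTF-8', $octets) . $wide;   # octets are decoded as UTF-8 first

printf "implicit: length %d, first U+%04X\n", length $implicit, ord $implicit;
printf "explicit: length %d, first U+%04X\n", length $explicit, ord $explicit;
```

The implicit version has three characters and starts with U+00C3; the explicit one has two and starts with U+00E9.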
*SKIP*
>> In case there would be C<use utf8> or C<use encoding 'utf8'>,
>
> then the compiler would complain about a malformed UTF-8 character if
> the source file was actually in ISO-8859-7.
>
> The use encoding or use utf8 *must* match the encoding of the source
> file. (And don't think about mixing several encodings in the same file
> unless you want to enter your program in an obfu contest).
But it didn't. Do you want to say C<"\x{C0}\x{B0}"> is well-formed UTF-8?
Even though it's not well-formed UTF-8, the compiler ignores it. However,
I've made a file with real bytes with the high bit set -- it compiles OK.
The warnings are delayed to run-time.
It's not the compiler that complains, it's C<use warnings;>
*SKIP*
>> You missed one important thing -- I dislike this feature,
>
> which feature?
Have you ever seen a program text where tokens are a mix of ASCII and
non-ASCII characters? I have.
*SKIP*
>> That's what C<use utf8> is fscking for.
>
> What is it for?
Quoting C<perldoc utf8>
Do not use this pragma for anything else than telling Perl that your
script is written in UTF-8. The utility functions described below
are directly usable without "use utf8;".
My understanding of "script" is a program text outside of any quotes in
it.
>> I should agree, 'UTF-8 flag' is somewhat misleading since it's about
>> characters but utf8 by itself (I hope).
>>
>> But,.. here be dragons...
>>
>> {3335:27} [0:0]$ echo 'фыва' | xxd
>> 0000000: d184 d18b d0b2 d0b0 0a .........
>> {3356:28} [0:0]$ echo 'фыва' | recode utf8..ucs-2-internal |xxd
>> 0000000: 4404 4b04 3204 3004 0a00 D.K.2.0...
>> {3414:29} [0:1]$ perl -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'
>
> You've mixed up the endianness. 'ф' is U+0444, not U+4404.
Yes, my fault. And why did you skip the next line? It behaves the same
way with the endianness fixed.
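
For reference, the mix-up is easy to reproduce with core Encode and unpack: the same two octets give U+4404 when read big-endian and U+0444 when read little-endian. A minimal sketch:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0444 encoded little-endian: the octets 0x44 0x04, as in the xxd output above.
my $octets = encode('UTF-16LE', "\x{0444}");

printf "octets:             %s\n",     unpack 'H*', $octets;
printf "read as big-endian: U+%04X\n", unpack 'n',  $octets;
printf "read as little-end: U+%04X\n", unpack 'v',  $octets;
```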
*CUT*
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
------------------------------
Date: Sun, 12 Apr 2009 11:19:49 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <v244u4pnrsn4847icfgiiqrieohp5qkh4h@4ax.com>
On Sat, 11 Apr 2009 11:59:55 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>On 2009-04-10 21:20, sln@netherlands.com <sln@netherlands.com> wrote:
>> On Fri, 10 Apr 2009 22:59:40 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>>On 2009-04-10 19:44, sln@netherlands.com <sln@netherlands.com> wrote:
>>>> Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.
>>>
>>>Actually, for all practical purposes, Perl character strings *are*
>>>UTF-32. Each character is a 32-bit value.
>>>
>>>Both UTF-16 and UTF-32 are supported for I/O, of course.
>>>
>>>> Imagine Perl doing utf-32.
>>>
>>>I don't have to imagine that, it does.
>>>
>>>> Why then you could do Regular Expressions on
>>>> a binary stream.
>>>
>>>You can't do Regexps on streams, whether binary or not (would be nice if
>>>we could).
>>>
>>>You can do Regexps on *strings*, whether they are binary or text.
>>>
>>>I don't know what that has to do with UTF-32. Binary strings consist of
>>>octets. Treating them as UTF-32 is almost always a mistake.
>>>
>>> hp
>>
-----------------------------------------------------
Hey, first off, I really appreciate your responses, especially the Unicode.
I am newly interested in this and am learning, but understand a good portion.
Below, I'm just going to briefly clarify some of my previous statements.
Thanks!
-sln
-----------------------------------------------------
>> If you can't do Regexes on streams, then you can't parse XML.
I guess this phrasing skipped a few things. 'Streams' is not really a
standalone definition of anything, but shorthand for doing operations
on file descriptors in the kernel, via an API (POSIX). Certainly there
is no regular expression engine, or anything like that, in the kernel.
>
>You don't need regexps at all to parse XML (or any other language).
>And you certainly don't need to do them on streams, since you can always
>read the next block or line from the stream and append it to your
>buffer.
>
You certainly don't need regexps to parse XML, and you certainly don't need
regexps to do string comparisons on XML. 'Stream processing', however, has a
more abstract meaning. Basically it means processing locally disposable data
while traversing a buffer of a kernel file descriptor, without waiting for the
end of the file, low-level I/O, device, pipe, or whatever the descriptor refers to.
You certainly can't do that in the kernel. The key is that a small user buffer
is populated as the 'stream' passes through it. The buffer is either fixed size
or expands and contracts slightly as necessary to process events as they are
parsed, in computer time, not necessarily real time.
The machinations of 'buffering', as it seems to indicate some delineation in
your mind, have nothing to do with 'stream' parsing or processing, only the notion
of incremental processing.
Some foolish people obfuscate XML parsing and regular expressions in some highly
abstract language, which totally misses the point.
Using regular expressions to parse XML is no different than simple string comparison
of token punctuation. It is for that reason I made my statement.
There are many examples of push/pull stream-oriented processors.
Some references on stream-oriented processing of XML (SAX or near-SAX compliant):
http://en.wikipedia.org/wiki/Expat_(XML)
http://en.wikipedia.org/wiki/Streaming_Transformations_for_XML
[paragraph moved]
>On the other hand, I think you don't know what a stream is:
>
>open(my $fh, '<', 'test.xml');
>
>Now $fh refers a stream.
No, not really, it refers to a file descriptor.
> Please show me how you can apply a regexp to
>this stream. Solutions which don't count:
>
As I said, there is 'no formal definition' of a stream. By all accounts
a 'stream' is an abstract concept akin to a tree watching water flow by,
a near static observer of fluidic motion.
> * reading chunks from the stream into a scalar variable and then
> applying the regexp to this variable (because then you apply it to a
> string (as I wrote), not a stream.
Again, what is a stream? In this use, it's an abstraction consisting of
buffering and processing layers in fluidic motion, in a continuous manner.
A 'string' has nothing to do with anything.
> * writing your own regexp engine (since Perl is a general purpose
> programming language, you can of course write that but we were
> talking about Perl' builtin regexp).
But regex has nothing to do with streams per se, there is only a limited
fixed API (soon to be expanded) that deals with file descriptors
(or Microsoft's FILE *). So, you can skip this process.
>
>
>> I have already posted sometime back pack/unpack on regex streams.
>
>pack and unpack are Perl functions. They can only be applied to strings,
>not streams. If you don't mean these functions but something else, be
>more specific. And I have no idea what a "regex stream" might be. A
>stream composed of regexps? A stream with special support for regexps?
>A stream split into records with a regexp?
Remember, 'stream' is an abstract concept, and so is a 'record'.
For the record, stream parsing/processing is grabbing from 1 to a user-defined
number of characters/data, using an API that works on the file descriptor's kernel data,
to match a pattern on which to process. This requires user-space buffering.
The concept of 'stream' processing is the antithesis of processing a complete data set.
Stream-parsing XML can be as simple as reading 1 character at a time, buffering until
a key character is found that may close a statement,
processing that possibility, then either clearing the buffer or continuing to buffer. It can also
depend on the parsing state of the XML processor. The result is the same:
cars are taken off the track and processed. Most XML 'state' processors will stop at
(or near) the first syntax error (MSXML does this). Regular expressions offer
a distinct advantage in this regard: they can continue processing to report other errors
and advance the stream, but they do not enjoy the speed of, say, Expat. Stream processing
XML has unique advantages over trees (although trees are now windowed) and enables
multi-level filters.
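
As a rough illustration of the buffering described above, here is a sketch that reads a filehandle in small chunks, appends each chunk to a user buffer, and hands off every tag or text run as soon as it is complete. scan_stream() is a made-up name, the in-memory filehandle stands in for a file or pipe, and the pattern is deliberately naive: it assumes no '>' inside attribute values and ignores comments, CDATA and entities, so it is not a real XML parser.

```perl
use strict;
use warnings;

# Read $fh in chunks of $chunk_size and call $on_token for each complete
# tag or text run. Incomplete input stays in the buffer until more arrives.
sub scan_stream {
    my ($fh, $chunk_size, $on_token) = @_;
    my $buf = '';
    while (read($fh, my $chunk, $chunk_size)) {
        $buf .= $chunk;
        # A tag is complete once its '>' has arrived; a text run is
        # complete once the '<' that follows it has arrived.
        while ($buf =~ s/\A(<[^>]*>|[^<]+(?=<))//) {
            $on_token->($1);
        }
    }
    $on_token->($buf) if length $buf;   # trailing text or an unterminated tag
    return;
}

my $xml = '<doc><item id="1">hello</item><item id="2">world</item></doc>';
open my $fh, '<', \$xml or die "cannot open in-memory file: $!";

my @tokens;
scan_stream($fh, 5, sub { push @tokens, $_[0] });
print "$_\n" for @tokens;
```

The buffer never holds more than one incomplete token plus one chunk, which is the point of the approach: the whole document is never in memory at once.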
>
>> I ah think your missing what Unicode is.
>
>I know quite well what Unicode is - I found characterset issues
>fascinating ever since I turned on an Apple ][ in 1984 and it identified
>itself as "Apple ". I've read Rob Pike's paper in the early 90s and
>the full unicode standard (version 2.0) in the late 90s. And I've
>discussed character encoding matters (including Unicode) a lot on
>various newsgroups and mailinglists over the years and fixed a few
>encoding related problems in various pieces of software.
>
Ah, back to my original argument, Unicode!
It was not a beef with Unicode, not at all, but it got me very interested in it.
I didn't want to use pack/unpack templates that had no variability.
I needed to do pattern searches on 32-bit integers, plain and simple.
It had nothing to do with Unicode at all. For instance, if I found a numeric
256 (32-bit integer) in a stream of 32-bit integers, I wanted to grab
the 5th following 32-bit integer in the stream no matter what its value was.
This is the simple explanation; the real one involved complex variability.
So I looked at Unicode and Perl's utf-8 as the internal default,
as character representations of 32-bit integers, to be used in regular expressions.
I didn't start with 'encodings'. In other words, encoding had nothing to do with
what I wanted to do. I understand there are encodings that translate to the code points,
in the particular Unicode you want 8/16/32, endian and byte order mark.
The octets are the 1-6 bytes (8-bit) result of the encoding.
The code points run in ranges of 0-(2**32 - 1), but they run in ranges (utf-32 has no code points).
Between those ranges you run into Unicode internal control, reserved attributes (BOM, endianness etc.).
I guess I don't care about encoding if I could internalize (Perl's utf-8) the full range
of 32-bit integers to characters to be used in regular expressions, then extracted back to 32-bit
integers to be used elsewhere.
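
The simple case described above (find a 256, then grab the 5th integer after it) might be sketched like this: unpack the 32-bit integers, map each one to a single character with chr(), run an ordinary regex over the resulting string, and turn the captured character back into an integer with ord(). ints_to_string() and fifth_after() are made-up names, and the sample data is invented. The sketch assumes a perl where chr() accepts the full 32-bit range (a 64-bit build), and it turns off the 'utf8' warnings because such values need not be valid Unicode.

```perl
use strict;
use warnings;

# Map each integer to one character so the regex engine sees a plain string.
sub ints_to_string {
    no warnings 'utf8';    # values may fall outside the Unicode range
    return join '', map { chr } @_;
}

# Return the 5th integer following the first occurrence of $marker, or undef.
sub fifth_after {
    my ($marker, @ints) = @_;
    my $str = ints_to_string(@ints);
    my $m   = chr $marker;
    # Skip four characters after the marker, capture the fifth.
    return $str =~ /\Q$m\E.{4}(.)/s ? ord $1 : undef;
}

my $packed = pack 'N*', 7, 256, 1, 2, 3, 4, 99, 5;   # big-endian 32-bit integers
my @ints   = unpack 'N*', $packed;

my $found = fifth_after(256, @ints);
print defined $found ? "found $found\n" : "no match\n";
```

No encoding step is involved anywhere: the characters only ever exist inside Perl, so nothing has to be valid Unicode as long as the string is never printed.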
.....
>> I can repost the code if you need.
I thought I had posted some code when I responded to this one ^^^^. Guess I didn't.
I will post a clipped follow-up code sample.
>
>Code is always nice because it is unambiguous (unlike the English
>language). However, keep in mind that this is a discussion group, not a
>code repository. Any code example longer than 50 lines or so is unlikely
>to be read.
>
>> Or you can read a few docs on it.
>> perlunicode.html and some others.
>
>I've read that several times (and criticized it here, too).
>
>> I doubt you'll capitulate no matter what.
>
>If you think this is a fight where one of us has to win and the other to
>capitulate, I'll stop now.
>
> hp
I hope you understand what my meaning is now, 'capitulate' is just a word.
Thank you!
-sln
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2334
***************************************