[31090] in Perl-Users-Digest


Perl-Users Digest, Issue: 2335 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Apr 13 03:09:38 2009

Date: Mon, 13 Apr 2009 00:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 13 Apr 2009     Volume: 11 Number: 2335

Today's topics:
        (newbie) need help understanding a few lines of code <jbenjam@gmail.com>
    Re: (newbie) need help understanding a few lines of cod <noreply@gunnar.cc>
        foreach performance <shurikgefter@gmail.com>
    Re: foreach performance <xhoster@gmail.com>
    Re: foreach performance <tadmc@seesig.invalid>
    Re: How do I start and restart a program via a perl scr <cdalten@gmail.com>
    Re: How do I start and restart a program via a perl scr <xhoster@gmail.com>
    Re: multicore cpu sln@netherlands.com
        new CPAN modules on Mon Apr 13 2009 (Randal Schwartz)
    Re: Simple line-drawing graphics <willem@snail.stack.nl>
    Re: Simple line-drawing graphics <hjp-usenet2@hjp.at>
    Re: Simple line-drawing graphics <bernie@fantasyfarm.com>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 12 Apr 2009 13:59:08 -0700 (PDT)
From: Ben <jbenjam@gmail.com>
Subject: (newbie) need help understanding a few lines of code
Message-Id: <6e552098-c7d7-4b30-b71a-d956156ec36c@c36g2000yqn.googlegroups.com>

I'm still learning some of the basic elements of Perl.  In the
following segment,

  #!/usr/bin/perl -w
  use strict;
  use OpenGL qw/ :all /;
  use Math::Trig;
  eval 'use Time::HiRes qw( gettimeofday )';
  my $hasHires = !$@;
  $|++;

What is happening in the last two lines above?  I don't understand the
purpose of "!$@"; is it related to error handling for the preceding eval?
What is "$|++;" doing?  I understand that $| has something to do with
flushing piped output.

Later on in the script:
   my $now = $hasHires ? gettimeofday() : time();

What's the function of  "?" in this context?

Thanks,

-Ben




------------------------------

Date: Mon, 13 Apr 2009 00:02:20 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: (newbie) need help understanding a few lines of code
Message-Id: <74f6nfF11ceb1U1@mid.individual.net>

Ben wrote:
> I'm still learning some of the basic elements of Perl.  In the
> following segment,
> 
>   #!/usr/bin/perl -w
>   use strict;
>   use OpenGL qw/ :all /;
>   use Math::Trig;
>   eval 'use Time::HiRes qw( gettimeofday )';

Loads the Time::HiRes module at run time; eval() keeps the program from
dying if Perl fails to find Time::HiRes.

>   my $hasHires = !$@;

If Time::HiRes was found, the $@ variable contains a null string, i.e. a 
false value. !$@ means that a true value is assigned to $hasHires.

     perldoc -f eval

>   $|++;

Sets $| to a true value (adds 1), which turns on autoflush for the
currently selected output filehandle. See "perldoc perlvar" about the
meaning of $|.

> Later on in the script:
>    my $now = $hasHires ? gettimeofday() : time();
> 
> What's the function of  "?" in this context?

See "perldoc perlop", the "Conditional Operator" section.

Personally I find the coding style somewhat obfuscated. I would probably 
have said something like:

     my $noHires = $@;

and later:

     my $now = $noHires ? time() : gettimeofday();

But that's probably just a matter of taste. :)
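
Putting the pieces together, here is a minimal sketch of the idiom under
discussion. It is not the original script: the OpenGL and Math::Trig lines
are left out, and the final print is only there for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# String eval: the "use" is compiled and run at run time, so a missing
# Time::HiRes leaves an error message in $@ instead of killing the program.
eval 'use Time::HiRes qw( gettimeofday )';

# $@ is the empty string (false) on success, so !$@ is true on success.
my $hasHires = !$@;

# Any true value in $| turns on autoflush for the selected output handle.
$|++;

# Conditional operator: CONDITION ? VALUE_IF_TRUE : VALUE_IF_FALSE
my $now = $hasHires ? gettimeofday() : time();

print "high-resolution timer: ", ( $hasHires ? "yes" : "no" ), ", now = $now\n";
```

Because the result is assigned to a scalar, gettimeofday() is called in
scalar context and returns floating-point seconds, comparable to time().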

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


------------------------------

Date: Sun, 12 Apr 2009 14:23:41 -0700 (PDT)
From: "shurikgefter@gmail.com" <shurikgefter@gmail.com>
Subject: foreach performance
Message-Id: <5c9974b1-7cac-429f-8b90-428bb33f481a@c36g2000yqn.googlegroups.com>

Hi,

Which is better for performance:

1)
foreach my $key ( split ( ',' , $temp ) )
{
 .....
}

or

2)

@tmp = split ( ',' , $temp );
foreach my $key ( @tmp )
{
 ....
}

Is the split command executed on every iteration of the foreach loop, or
only the first time?



------------------------------

Date: Sun, 12 Apr 2009 14:41:22 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: foreach performance
Message-Id: <49e26019$0$16790$ed362ca5@nr5-q7.newsreader.com>

shurikgefter@gmail.com wrote:
> Hi,
> 
> Which is better for performance:

The Catholic Church recently apologized for persecuting Galileo.
Why are you still afraid of the experimental method?

> 
> 1)
> foreach my $key ( split ( ',' , $temp ) )
> {
> ......
> }
> 
> or
> 
> 2)
> 
> @tmp = split ( ',' , $temp );
> foreach my $key ( @tmp )
> {
> .....
> }

In my hands the first is faster.  With your computer and your version of 
Perl and your details of the construction of $temp, the results may be 
different.
> 
> Is the split command executed on every iteration of the foreach loop, or
> only the first time?

If it executed each time, you would likely have an infinite loop,
wouldn't you?
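
One way to check this experimentally is to count how often the list
expression runs. A minimal sketch follows; fields() is a hypothetical
wrapper around split, introduced only so the calls can be counted.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical wrapper around split that counts how often it is invoked.
sub fields {
    $calls++;
    return split /,/, $_[0];
}

my @seen;
foreach my $key ( fields('a,b,c') ) {
    push @seen, $key;
}

print "fields() called $calls time(s) for ", scalar(@seen), " iterations\n";
```

The list in the parentheses is built once, before the first iteration,
so the counter stays at 1 however many elements the loop visits.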


Xho


------------------------------

Date: Sun, 12 Apr 2009 17:46:14 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: foreach performance
Message-Id: <slrngu4rpm.cl4.tadmc@tadmc30.sbcglobal.net>

shurikgefter@gmail.com <shurikgefter@gmail.com> wrote:


> Which is better for performance:


    perldoc Benchmark
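
A minimal sketch of such a benchmark, assuming a made-up $temp of 1000
comma-separated numbers (the original poster's data is not shown, so the
labels and the test string here are illustrative only):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data; substitute a realistic $temp for real measurements.
my $temp = join ',', 1 .. 1000;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    inline => sub {
        my $n = 0;
        foreach my $key ( split /,/, $temp ) { $n++ }
    },
    via_array => sub {
        my $n   = 0;
        my @tmp = split /,/, $temp;
        foreach my $key (@tmp) { $n++ }
    },
} );
```

The numbers depend on the machine, the Perl version and the shape of
$temp, so no expected output is given.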


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sun, 12 Apr 2009 14:51:44 -0700 (PDT)
From: grocery_stocker <cdalten@gmail.com>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <b638a4ac-0821-4ad4-8f91-419332f7ca95@d2g2000pra.googlegroups.com>

On Apr 12, 12:25 pm, Xho Jingleheimerschmidt <xhos...@gmail.com>
wrote:
> grocery_stocker wrote:
> > The following script is supposed to continuously scan the *nix who list
> > to see if a particular person enters the party chatline. If they
> > enter, then the script is supposed to trigger the nope program. When
> > that person leaves, the nope program is supposed to be killed
>
> Killed by whom?
>
> > and the
> > script goes back to scanning to see if that person enters the party
> > chatline again.
>
> > However, I can't seem to get it to work correctly. Ideas?
>
> What is it doing instead of working correctly?
>
>
>
> > #!/usr/bin/perl
>
> > my $pid;
>
> > while (True)
>
> Not only should you turn on strict and warnings, you should also take
> care of the problems they indicate.
>
> > {
> >     if(`w | grep cdalten | grep party`) {
> >         $pid = open(FH, "/home/guest/cdalten/nope2 &|");
>
> Because of the &, perl will open a shell and give the shell your command
> (minus the pipe, i think) to run.  The return will be the pid of this
> shell.  The shell will then start nope2.  On unix-like systems, the
> shell will then exit, because & tells it not to wait for nope2.
>
> >         #print $pid;
> >         kill $pid;
>
> You are killing the shell, not the nope2 that the shell started.
> Depending on the vagaries of the scheduler, you might be attempting to
> kill it before it spawned nope2, after it spawned nope2 but before it
> exited, or after it finished its job and exited.
>

Okay, so how would I kill nope2 ?


------------------------------

Date: Sun, 12 Apr 2009 15:05:11 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: How do I start and restart a program via a perl script?
Message-Id: <49e264c5$0$16127$ed362ca5@nr5-q3a.newsreader.com>

grocery_stocker wrote:
>>>         #print $pid;
>>>         kill $pid;
>> You are killing the shell, not the nope2 that the shell started.
>> Depending on the vagaries of the scheduler, you might be attempting to
>> kill it before it spawned nope2, after it spawned nope2 but before it
>> exited, or after it finished its job and exited.
>>
> 
> Okay, so how would I kill nope2 ?

I don't know what nope2 does, or how it does it, or why it does it, so I 
don't know how one would go about deciding when it is done doing it.
I would think nope2 would decide for itself when it was done doing whatever
it is doing, and would exit.

If you omitted the & from your forked open code, then the pid you got 
back would be that of nope2, and so your kill statement would be killing 
nope2.  Whether it would be doing it at the right time is doubtful, but 
only you can answer that.
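
A minimal sketch of that approach on a Unix-like system. Since nope2 is
not available here, "sleep 60" stands in for it (an assumption made only
for illustration). Note also that kill expects the signal first and the
pid(s) after it; "kill $pid;" on its own passes $pid as the signal and
names no process at all.

```perl
use strict;
use warnings;

# 'sleep 60' is a stand-in for the nope2 program. The list form of open
# runs the command directly, without a shell and without '&', so the pid
# returned belongs to the command itself.
my $pid = open( my $fh, '-|', 'sleep', '60' )
    or die "cannot fork: $!";

# Signal name first, then the process id(s). Returns the number of
# processes successfully signalled.
my $signalled = kill 'TERM', $pid;

# close() waits for the child and leaves its exit status in $?.
close $fh;

print "signalled $signalled process(es), pid $pid\n";
```

When to send the signal is a separate question that depends on what the
surrounding loop is watching for.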

Xho


------------------------------

Date: Sun, 12 Apr 2009 13:49:51 -0700
From: sln@netherlands.com
Subject: Re: multicore cpu
Message-Id: <knk4u45gl0rrnhbn9bu4b09p6nsmeph3it@4ax.com>

On Sun, 12 Apr 2009 12:28:54 -0700, Xho Jingleheimerschmidt <xhoster@gmail.com> wrote:

[snip]
>> 
>> Reading up on the threads docs, it seems there is no way to explicitly
>> assign an affinity to a particular thread when it is launched.
>
>An affinity for what?
>
>Xho

On Microsoft operating systems you can assign an affinity (a tendency to use
a particular core, or a single core, in a dual/quad-core environment) when
you start a process. It can also be assigned via the registry, set from the
Explorer shell, or proxied on your behalf.

-sln



------------------------------

Date: Mon, 13 Apr 2009 04:42:28 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Mon Apr 13 2009
Message-Id: <KI0vqs.183p@zorch.sf-bay.org>

The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN).  You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.

Acme-24-0.02
http://search.cpan.org/~cosimo/Acme-24-0.02/
Your favourite TV-show Acme module 
----
Acme-24-0.03
http://search.cpan.org/~cosimo/Acme-24-0.03/
Your favourite TV-show Acme module 
----
Acme-CPANAuthors-Chinese-0.07
http://search.cpan.org/~fayland/Acme-CPANAuthors-Chinese-0.07/
We are chinese CPAN authors 
----
Apache-SWIT-0.44
http://search.cpan.org/~bosu/Apache-SWIT-0.44/
mod_perl based application server with integrated testing. 
----
App-Munchies-0.1.678
http://search.cpan.org/~pjfl/App-Munchies-0.1.678/
Catalyst example application using food recipes as a data set 
----
App-Munchies-0.1.679
http://search.cpan.org/~pjfl/App-Munchies-0.1.679/
Catalyst example application using food recipes as a data set 
----
App-Rad-Plugin-TT-0.1
http://search.cpan.org/~fco/App-Rad-Plugin-TT-0.1/
Template Toolkit extension for the App::Rad framework 
----
Audio-Scan-0.04
http://search.cpan.org/~agrundma/Audio-Scan-0.04/
Fast C parser for MP3, Ogg Vorbis, FLAC, ASF 
----
CPAN-Mini-Growl-0.02
http://search.cpan.org/~miyagawa/CPAN-Mini-Growl-0.02/
Growls updates from CPAN::Mini 
----
CPANPLUS-Dist-Arch-0.07
http://search.cpan.org/~juster/CPANPLUS-Dist-Arch-0.07/
CPANPLUS backend for building Archlinux pacman packages 
----
Catalyst-App-RoleApplicator-0.003
http://search.cpan.org/~hdp/Catalyst-App-RoleApplicator-0.003/
apply roles to your Catalyst application-related classes 
----
CatalystX-Usul-0.1.450
http://search.cpan.org/~pjfl/CatalystX-Usul-0.1.450/
A base class for Catalyst MVC components 
----
Chef-0.01
http://search.cpan.org/~holoway/Chef-0.01/
Write Chef recipes in Perl instead of Ruby. 
----
Convert-zBase32-0.0200
http://search.cpan.org/~gwyn/Convert-zBase32-0.0200/
Convert human-oriented base-32 encoded strings 
----
Document-Stembolt-0.012
http://search.cpan.org/~rkrimen/Document-Stembolt-0.012/
Read & edit a document with YAML-ish meta-data 
----
File-Map-0.13
http://search.cpan.org/~leont/File-Map-0.13/
Memory mapping made simple and safe. 
----
Gtk2-Ex-Xor-6
http://search.cpan.org/~kryde/Gtk2-Ex-Xor-6/
shared support for drawing with XOR 
----
HTML-Accessors-0.1.61
http://search.cpan.org/~pjfl/HTML-Accessors-0.1.61/
Generate HTML elements 
----
HTML-Obliterate-0.3
http://search.cpan.org/~dmuey/HTML-Obliterate-0.3/
Perl extension to remove HTML from a string or arrayref of strings. 
----
IO-Vec-0.02
http://search.cpan.org/~leont/IO-Vec-0.02/
writev and readv in perl 
----
Image-OCR-Tesseract-1.20
http://search.cpan.org/~leocharre/Image-OCR-Tesseract-1.20/
----
Log-Dispatch-Configurator-Any-1.0004
http://search.cpan.org/~oliver/Log-Dispatch-Configurator-Any-1.0004/
Configurator implementation with Config::Any 
----
Module-Changes-ADAMK-0.05
http://search.cpan.org/~adamk/Module-Changes-ADAMK-0.05/
Parse a traditional Changes file (as ADAMK interpretes it) 
----
MooseX-Meta-TypeConstraint-ForceCoercion-0.01
http://search.cpan.org/~flora/MooseX-Meta-TypeConstraint-ForceCoercion-0.01/
Force coercion when validating type constraints 
----
MooseX-Method-Signatures-0.15
http://search.cpan.org/~flora/MooseX-Method-Signatures-0.15/
Method declarations with type constraints and no source filter 
----
MooseX-RelatedClassRoles-0.004
http://search.cpan.org/~hdp/MooseX-RelatedClassRoles-0.004/
Apply roles to a class related to yours 
----
Net-Ifconfig-Wrapper-0.10
http://search.cpan.org/~tpaba/Net-Ifconfig-Wrapper-0.10/
provides a unified way to configure network interfaces on FreeBSD, OpenBSD, Solaris, Linux, OS X, and WinNT (from Win2K). 
----
Net-Ifconfig-Wrapper-0.10-no-world-writable
http://search.cpan.org/~tpaba/Net-Ifconfig-Wrapper-0.10-no-world-writable/
provides a unified way to configure network interfaces on FreeBSD, OpenBSD, Solaris, Linux, OS X, and WinNT (from Win2K). 
----
Numeric-LL_Array-0.03
http://search.cpan.org/~ilyaz/Numeric-LL_Array-0.03/
Perl extension for low level operations over numeric arrays. 
----
POE-Session-Multiplex-0.0400
http://search.cpan.org/~gwyn/POE-Session-Multiplex-0.0400/
POE session with object multiplexing 
----
POE-Session-PlainCall-0.0200
http://search.cpan.org/~gwyn/POE-Session-PlainCall-0.0200/
POE sessions with plain perl calls 
----
Package-Rename-0.01
http://search.cpan.org/~leont/Package-Rename-0.01/
Rename or copy package 
----
Package-Subroutine-0.14
http://search.cpan.org/~sknpp/Package-Subroutine-0.14/
minimalistic import/export and other util methods 
----
Perl-Critic-Pulp-16
http://search.cpan.org/~kryde/Perl-Critic-Pulp-16/
some add-on perlcritic policies 
----
PostScript-PPD-0.0200
http://search.cpan.org/~gwyn/PostScript-PPD-0.0200/
Read PostScript Printer Definition files 
----
Proc-Exists-0.99_02
http://search.cpan.org/~brianski/Proc-Exists-0.99_02/
quickly and portably check for process existence 
----
Proc-Exists-1.00
http://search.cpan.org/~brianski/Proc-Exists-1.00/
quickly and portably check for process existence 
----
RiveScript-1.19
http://search.cpan.org/~kirsle/RiveScript-1.19/
Rendering Intelligence Very Easily 
----
SVN-Hooks-0.16.48
http://search.cpan.org/~gnustavo/SVN-Hooks-0.16.48/
A framework for implementing Subversion hooks. 
----
SVN-Hooks-0.16.52
http://search.cpan.org/~gnustavo/SVN-Hooks-0.16.52/
A framework for implementing Subversion hooks. 
----
Test-Valgrind-1.00
http://search.cpan.org/~vpit/Test-Valgrind-1.00/
Test Perl code through valgrind. 
----
Text-Diff-Parser-0.0900
http://search.cpan.org/~gwyn/Text-Diff-Parser-0.0900/
Parse patch files containing unified and standard diffs 
----
VMS-Mail-0_06
http://search.cpan.org/~cberry/VMS-Mail-0_06/
VMS callable mail interface 
----
ZConf-1.1.0
http://search.cpan.org/~vvelox/ZConf-1.1.0/
A configuration system allowing for either file or LDAP backed storage. 


If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.

This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
  http://www.stonehenge.com/merlyn/LinuxMag/col82.html

print "Just another Perl hacker," # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion


------------------------------

Date: Sun, 12 Apr 2009 20:18:19 +0000 (UTC)
From: Willem <willem@snail.stack.nl>
Subject: Re: Simple line-drawing graphics
Message-Id: <slrngu4j4b.1onb.willem@snail.stack.nl>

Bernie Cosell wrote:
) Any recommendations on some simple package/module/technique for doing
) line-drawing in Perl?   I *don't* need [likely don't want, actually]
) interactive graphics.  Generating a .jpg or the like would be perfectly
) adequate.  Or generating output in some sort of line-drawing meta language
) that I could post-process into an image would be fine, too.  THANKS!

"Some sort of line-drawing meta language" ?
Hmm, how about Postscript ?

A quick search on CPAN gives PostScript::Simple as a first hit.
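
For a feel of what the PostScript route involves, here is a minimal
sketch that writes raw PostScript with plain print statements. It does
not use PostScript::Simple, and the line coordinates are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical line segments: [x1, y1, x2, y2] in PostScript points.
my @lines = (
    [ 72,  72,  300, 72 ],
    [ 300, 72,  300, 300 ],
    [ 300, 300, 72,  72 ],
);

my $ps = "%!PS\n1 setlinewidth\n";
for my $l (@lines) {
    # moveto sets the start point, lineto adds the segment, stroke draws it.
    $ps .= sprintf "newpath %d %d moveto %d %d lineto stroke\n", @$l;
}
$ps .= "showpage\n";

print $ps;
```

Redirect the output to a file and convert it to an image with whatever
PostScript interpreter is at hand.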


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Sun, 12 Apr 2009 23:27:16 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Simple line-drawing graphics
Message-Id: <slrngu4n5k.p6l.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-12 19:03, Bernie Cosell <bernie@fantasyfarm.com> wrote:
> Any recommendations on some simple package/module/technique for doing
> line-drawing in Perl?   I *don't* need [likely don't want, actually]
> interactive graphics.  Generating a .jpg or the like would be perfectly
> adequate.

You may want to look at the GD module and the modules which build on it.

	hp


------------------------------

Date: Sun, 12 Apr 2009 20:52:47 -0400
From: Bernie Cosell <bernie@fantasyfarm.com>
Subject: Re: Simple line-drawing graphics
Message-Id: <m535u49t1cq1gi2cbj2bhd72mrtrqm72fm@library.airnews.net>

Bernie Cosell <bernie@fantasyfarm.com> wrote:

} Any recommendations on some simple package/module/technique for doing
} line-drawing in Perl?

Thanks for the recommendations -- both GD:: and Postscript:: look like
they're just the sort of thing I was looking for. THANKS!  /bernie\
-- 
Bernie Cosell                     Fantasy Farm Fibers
bernie@fantasyfarm.com            Pearisburg, VA
    -->  Too many people, too few sheep  <--          


------------------------------

Date: Sun, 12 Apr 2009 23:14:39 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngu4me2.p6l.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-12 14:14, Eric Pozharski <whynot@pozharski.name> wrote:
> Before anything else, I beg your and everyone else's pardon.  For some
> weird reason, I'd called "tokens" "literals".  Now I feel much better.
>
> On 2009-04-11, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> On 2009-04-11 11:59, Eric Pozharski <whynot@pozharski.name> wrote:
>>> On 2009-04-10, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>>> No. Almost all encodings today are supersets of US-ASCII.
>>>>
>>>> Consider these two programs:
> *SKIP*
>>> 	$ perl -Mutf8 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
>>> 	Wide character in print at -e line 1.
>>> 	фыва
>>> 	�
> *SKIP*
>>> 	{2775:24} [0:0]$ perl -Mencoding=latin1 -wle 'print "фыва"; print "\x{C0}\x{B0}"'
>>> 	фыва
>>> 	�
>>
>> use encoding also sets the binmode for STDOUT and STDERR, so you won't
>
> No, it doesn't (s/STDERR/STDIN/)

Yes, that was a typo. Sorry.


> 	{5665:37} [0:0]$ perl -Mencoding=utf8 -wle 'print STDERR "фыва"'
> 	Wide character in print at -e line 1.
> 	фыва
>
>> get a warning here. Again, I was talking only about compile time
>> effects, not run time, so I didn't mention that (you can read the manual
>> yourself).
>
> I fail to see any compile time effects -- either in those two above or
> this one below

Well, you aren't looking for any compile time effects, so you won't see
any :-).

Let's compare 4 programs, which are all essentially the same:

#!/usr/bin/perl
use XXX ###
use warnings;
use strict;

my $greeting = "Καλημέρα κόσμε";
dumpstr($greeting);

sub dumpstr {
    my ($s) = @_;

    print utf8::is_utf8($s) ? "char" : "byte";
    print "[", length($s), "]";
    print ":";
    for (split //, $s) {
	printf " %#02x", ord($_);
    }
    print "\n";
}
__END__

The differences are in the encoding of the source file (UTF-8 vs.
ISO-8859-7) and the line marked "use XXX ###" above.

1) encoded in UTF-8, contains "use utf8;"
   prints:

   char[14]: 0x39a 0x3b1 0x3bb 0x3b7 0x3bc 0x3ad 0x3c1 0x3b1 0x20 0x3ba
   0x3cc 0x3c3 0x3bc 0x3b5

2) encoded in UTF-8, no "use utf8;"
   prints:

   byte[27]: 0xce 0x9a 0xce 0xb1 0xce 0xbb 0xce 0xb7 0xce 0xbc 0xce 0xad
   0xcf 0x81 0xce 0xb1 0x20 0xce 0xba 0xcf 0x8c 0xcf 0x83 0xce 0xbc 0xce
   0xb5

3) encoded in ISO-8859-7, contains "use encoding 'ISO-8859-7';"
   prints:

   char[14]: 0x39a 0x3b1 0x3bb 0x3b7 0x3bc 0x3ad 0x3c1 0x3b1 0x20 0x3ba
   0x3cc 0x3c3 0x3bc 0x3b5

4) encoded in ISO-8859-7, no "use encoding 'ISO-8859-7';"
   prints:

   byte[14]: 0xca 0xe1 0xeb 0xe7 0xec 0xdd 0xf1 0xe1 0x20 0xea 0xfc 0xf3
   0xec 0xe5

As you can see, in the two cases where "use utf8" resp. "use encoding"
was used, the string constant was converted to a character string: The
so-called utf8 flag is on, the first character ("Κ") is U+039A ("GREEK
CAPITAL LETTER KAPPA"). In the other two cases the string is left as an
uninterpreted byte string: (0xCE 0x9A) is the UTF-8 encoding of a Kappa,
(0xCA) is the ISO-8859-7 encoding of a Kappa.
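
The two byte sequences can also be checked independently of any source
file encoding. A minimal sketch using the core Encode module, with the
bytes written as escapes so the script itself is pure ASCII:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same letter as raw bytes in two different encodings.
my $from_utf8  = decode( 'UTF-8',      "\xCE\x9A" );
my $from_greek = decode( 'ISO-8859-7', "\xCA" );

# Both decode to the single character U+039A.
printf "U+%04X U+%04X\n", ord($from_utf8), ord($from_greek);
```

This is the run-time counterpart of what "use utf8" and "use encoding"
do to string constants at compile time.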

You can verify that the compiler really converts the string constant
(and doesn't insert a call to decode which is evaluated at run-time)
with -MO=Concise.



>>>> But you can't do something like that:
>>>>
>>>> #!/usr/bin/perl
>>>> use Greeting "Καλημέρα κόσμε";
>>>> use encoding "iso-8859-7";
>>>> use warnings;
>>>> use strict;
>>>>
>>>> hello();
>>>> __END__
>>>>
>>>> because now the use encoding comes too late: The compiler would have to
>>>> go back to the start to parse "Καλημέρα κόσμε" correctly.
>>>
>>> You've messed everything up.  Since compiler wasn't told about encoding
>>> of C<use Greeting>'s argument, it's treated as latin1,
>>
>> Wrong: It is treated as an unspecified superset of US-ASCII.
>
> My understanding is based on this -- C<perldoc perlunicode>
>
>     "use encoding" needed to upgrade non-Latin-1 byte strings
> 	By default, there is a fundamental asymmetry in Perl's Unicode
> 	model: implicit upgrading from byte strings to Unicode strings
> 	assumes that they were encoded in ISO 8859-1 (Latin-1), but
> 	Unicode strings are downgraded with UTF-8 encoding.

This paragraph is confusing. I have a vague idea what the author wanted
to say but even then it's not quite correct. I doubt somebody can
understand this paragraph unless they already exactly understood the
problems before.


>       This happens because the first 256 codepoints in Unicode happens
>       to agree with Latin-1.
>
> If encoding is unknown, it's treated as latin1, even if it's not.

This has nothing to do with "use utf8" and "use encoding". The
"implicit upgrading" which is mentioned here happens (for example) when
you concatenate a byte string to a character string. But then the result
*is* a character string, not a byte string.

Byte strings are *not* implicitly assumed to be ISO-8859-1, as you can
easily check by matching against a character class:

% perl -le '$_ = "\x{FC}"; print /\w/ ? "yes" : "no"'
no
% perl -le '$_ = "\x{FC}"; utf8::upgrade($_); print /\w/ ? "yes" : "no"'
yes

So, in a byte string the code point 0xFC does not count as a word
character, but in a character string it does. If byte strings were
assumed to be ISO-8859-1, then 0xFC would be a word character, so
obviously it isn't. Instead, byte strings are assumed to be some
superset of US-ASCII:

% perl -le '$_ = "\x{6C}"; print /\w/ ? "yes" : "no"'
yes

0x6C is a letter ("l") in ASCII, but 0xFC isn't (ASCII defines only
0x00-0x7F).

(I hear that somebody's working to change this to reduce the differences
in behaviour between byte and character strings)


>>> In case there would be C<use utf8> or C<use encoding 'utf8'>,
>>
>> then the compiler would complain about a malformed UTF-8 character if
>> the source file was actually in ISO-8859-7.
>
> But it didn't.

It does for me. If I change "use encoding 'ISO-8859-7'" to "use utf8"
in my ISO-8859-7 encoded file, I get a lot of warnings.

> You want to say C<"\x{C0}\x{B0}"> is a well-formed UTF-8?

Sort of: It decodes cleanly to U+0030. But the canonical (shortest)
encoding of U+0030 is "\x{30}", and UTF-8 generating programs MUST
always produce the canonical encoding. UTF-8 consuming programs should
complain if they encounter a non-canonical encoding. Perl behaves a bit
weirdly here: It doesn't complain when it reads the string, but it does
complain on some operations on it, e.g. ord(). I consider that a bug.
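
The arithmetic behind that can be sketched by hand. This only applies
the two-byte UTF-8 bit layout to the two bytes in question; it is not a
general decoder.

```perl
use strict;
use warnings;

my @bytes = ( 0xC0, 0xB0 );

# Two-byte UTF-8 layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy
my $cp = ( ( $bytes[0] & 0x1F ) << 6 ) | ( $bytes[1] & 0x3F );

# Code points below 0x80 fit in a single byte, so a two-byte form is overlong.
my $overlong = $cp < 0x80 ? "yes" : "no";

printf "U+%04X overlong=%s\n", $cp, $overlong;   # U+0030 overlong=yes
```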


>>> You missed one important thing -- I dislike this feature,
>>
>> which feature?
>
> Have you ever seen a program text where tokens are a mix of ASCII and
> non-ASCII characters?  I have.

I usually stick to using English names for my subs and variables. But if
I was using German names I might as well use umlauts. Mathematical
symbols might also be handy. I would have a problem if my colleague used
Chinese, though ;-).

(I already wanted to use € in a variable name (it contained a monetary
amount in Euro), but € isn't a word character. OTOH, $ isn't either, so
I guess that's fair)



>>> That's what C<use utf8> is fscking for.
>>
>> What is it for?
>
> Quoting C<perldoc utf8>
>
>     Do not use this pragma for anything else than telling Perl that your
>     script is written in UTF-8. The utility functions described below
>     are directly usable without "use utf8;".

I believe I already said that once or twice in this thread.


> My understanding of "script" is a program text outside of any quotes in
> it.

Bullshit. A script is the complete program text, including any string
constants, numeric constants, comments, the __DATA__ stream, if any.
Why would a string constant in a script not be part of it?


>>> But,..  here be dragons...
>>>
>>> 	{3335:27} [0:0]$ echo 'фыва' | xxd
>>> 	0000000: d184 d18b d0b2 d0b0 0a                   .........
>>> 	{3356:28} [0:0]$ echo 'фыва' | recode utf8..ucs-2-internal |xxd
>>> 	0000000: 4404 4b04 3204 3004 0a00                 D.K.2.0...
>>> 	{3414:29} [0:1]$ perl -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'
>>
>> You've mixed up the endianness. 'ф' is U+0444, not U+4404.
>
> Yes, my fault.  And why did you skip the next line?  It behaves the same
> way with the endianness fixed.

You mean:

        {3415:30} [0:0]$ perl -Mencoding=ucs2 -wle 'print "\x{4404}\x{4b04}\x{3204}\x{3004}"'
        Can't locate object method "cat_decode" via package "Encode::Unicode" at
        -e line 1.

That doesn't fix the endianness, and it behaves completely differently. 
"perl -Mencoding=ucs2" can't work, as I already explained to sln.

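The byte-order mix-up discussed above is easy to reproduce with unpack.
A minimal sketch, taking the first two bytes from the little-endian xxd
dump (44 04) and reading them both ways:

```perl
use strict;
use warnings;

# First two bytes of the UCS-2 dump, as stored on a little-endian machine.
my $bytes = "\x44\x04";

my $le = unpack 'v', $bytes;   # 16-bit unsigned, little-endian
my $be = unpack 'n', $bytes;   # 16-bit unsigned, big-endian

printf "little-endian: U+%04X, big-endian: U+%04X\n", $le, $be;
# prints: little-endian: U+0444, big-endian: U+4404
```

Reading the bytes in dump order gives U+4404; honouring the byte order
gives U+0444, the code point stated above for the first letter.
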
	hp


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2335
***************************************
