[24611] in Perl-Users-Digest
Perl-Users Digest, Issue: 6787 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jul 10 09:05:40 2004
Date: Sat, 10 Jul 2004 06:05:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 10 Jul 2004 Volume: 10 Number: 6787
Today's topics:
Re: $var = <LINE> ?? <Joe.Smith@inwap.com>
Re: Code Style -- here-docs -- How do you make them loo <notvalid@email.com>
Re: double quotes vs. single quotes (was Re: hash as ar <nilram@hotpop.com>
Re: double quotes vs. single quotes (was Re: hash as ar <nilram@hotpop.com>
Re: double quotes vs. single quotes (was Re: hash as ar <abigail@abigail.nl>
Re: hash as argument (Anno Siegel)
Re: hash as argument <abigail@abigail.nl>
Re: hash as argument <abigail@abigail.nl>
Re: hash as argument <tassilo.parseval@rwth-aachen.de>
Re: hash as argument <trammell+usenet@hypersloth.invalid>
how perl set envirment variable <wangtg@web.de>
Re: how perl set envirment variable <noreply@gunnar.cc>
Re: how perl set envirment variable <Joe.Smith@inwap.com>
how the vector is created, how to pass vector to webser (Rushikesh Joshi)
Installing seperate version of Perl. <olczyk2002@yahoo.com>
Re: Installing seperate version of Perl. <spamtrap@dot-app.org>
Re: what do you call funct ( funct()) <Joe.Smith@inwap.com>
why utf8::upgrade is needed? <pajas@ufal.ms.mff.cuni.cz>
Re: why utf8::upgrade is needed? <tassilo.parseval@rwth-aachen.de>
Re: why utf8::upgrade is needed? <pajas@ufal.ms.mff.cuni.cz>
Re: why utf8::upgrade is needed? <tassilo.parseval@rwth-aachen.de>
Re: why utf8::upgrade is needed? <flavell@ph.gla.ac.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 10 Jul 2004 07:57:22 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: $var = <LINE> ??
Message-Id: <CjNHc.44081$JR4.668@attbi_s54>
Eric Enright wrote:
>>>foreach $outerline (<FIN>)
>>
>>HERE is your real problem! foreach() will evaluate <FIN> in
>>list context, which means that the entire file will be read
>>into memory, and *then* $outerline will be assigned one line at
>>a time.
>
> I was under the impression that this was an efficient way to run
> through a file line by line, without a lot of buffering. I read
> this somewhere for a method of running through a million line
> file without buffering it all first.
The way to go through a file line by line without a lot of buffering is
while(<FIN>) {...}
or
while(defined($outerline=<FIN>)) {...}
and not use foreach().
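A minimal sketch of that advice, assuming a lexical filehandle and a throwaway temp file (both are illustrative and not from the original script). The loop reads one line per iteration instead of slurping the whole file the way foreach (<FIN>) does:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative input: five lines in a temp file (not from the original post).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp or die "Cannot close $path: $!";

open my $fin, '<', $path or die "Cannot open $path: $!";
my $count = 0;
# Scalar-context readline: only the current line is held in memory.
while (defined(my $outerline = <$fin>)) {
    chomp $outerline;
    $count++;
}
close $fin;
print "read $count lines\n";
```

The parentheses around the assignment matter: defined binds more tightly than =, so the unparenthesized form does not compile.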
-Joe
------------------------------
Date: Sat, 10 Jul 2004 06:49:05 GMT
From: Ala Qumsieh <notvalid@email.com>
Subject: Re: Code Style -- here-docs -- How do you make them look good?
Message-Id: <BjMHc.8577$4g3.7894@newssvr25.news.prodigy.com>
Paul Lalli wrote:
> On Fri, 9 Jul 2004, MST wrote:
>
>>if($test) {
>> print <<HTML
>> <p>Some html $Junk!</p>
>> <p>$More junk</p>
>>HTML
>>}
>>
>>and that isn't so bad. What irks me is if the here-doc isn't at the
>>end of a block I need to throw a ; on a line all by its lonesome.
>
>
> No you don't. You need to put a semi-colon after the first heredoc
> marker:
>
> if ($test) {
> print <<HTML;
> <p>Some html here</p>
> HTML
> print "this prints!\n";
> }
That is of course correct. But, as an Emacs user, I know for a fact that
heredocs confuse its cperl mode, and the auto indentation breaks after
the end marker. Putting a lone semicolon on a line by itself after the
end marker fixes the problem.
To the OP, I don't see how this affects readability.
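For reference, a small sketch of the semicolon placement Paul describes; the variable names are illustrative. The statement ends on the line that opens the here-doc, so nothing extra is needed after the end marker:

```perl
use strict;
use warnings;

my $test = 1;
my $out  = '';
if ($test) {
    # The semicolon follows the opening marker, not the closing one.
    $out .= <<HTML;
<p>Some html here</p>
HTML
    $out .= "this prints!\n";
}
print $out;
```

The end marker has to sit alone at the start of its line, which is what tends to upset auto-indentation.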
--Ala
------------------------------
Date: 09 Jul 2004 23:04:38 -0500
From: Dale Henderson <nilram@hotpop.com>
Subject: Re: double quotes vs. single quotes (was Re: hash as argument)
Message-Id: <871xjkqzhl.fsf@camel.tamu-commerce.edu>
>>>>> "Abigail" == Abigail <abigail@abigail.nl> writes:
Abigail> There's no interpreter.
Now I'm confused. If there is no interpreter, what executes the
output (for lack of a better word) of the compiler?
--
Dale Henderson
"Imaginary universes are so much more beautiful than this stupidly-
constructed 'real' one..." -- G. H. Hardy
------------------------------
Date: 09 Jul 2004 23:09:18 -0500
From: Dale Henderson <nilram@hotpop.com>
Subject: Re: double quotes vs. single quotes (was Re: hash as argument)
Message-Id: <87wu1cpkpd.fsf@camel.tamu-commerce.edu>
>>>>> "DH" == Dale Henderson <nilram@hotpop.com> writes:
>>>>> "TM" == Tad McClellan <tadmc@augustmail.com> writes:
TM> Joe Smith <Joe.Smith@inwap.com> wrote:
TM> The programmer's choice of quotes is a note to (him|her)self:
TM> "interpolation or backslash escapes are here!" or 'nothing
TM> special going on here'
DH> This makes me wonder if perhaps 'This is a string' is
DH> faster than "This is a string" because in the first example
DH> the interpreter can just use the string as is. But in the
DH> second the interpreter must scan the string looking for
DH> interpolation or backslash and construct a new string to use.
Forget this. I'm being an idiot. First of all backslashes are a
non-issue since they are taken care of by the compiler.
For some reason I was thinking that if you assigned $string="Var is
$var"; then $string would be re-interpolated every time it is
used. This is patently false!
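A short sketch of that point, with made-up values: the interpolation happens once, when the assignment runs, and later changes to $var leave $string alone.

```perl
use strict;
use warnings;

my $var    = 'first';
my $string = "Var is $var";   # interpolated here, exactly once
$var = 'second';              # does not affect the string built above
print "$string\n";
```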
--
Dale Henderson
"Imaginary universes are so much more beautiful than this stupidly-
constructed 'real' one..." -- G. H. Hardy
------------------------------
Date: 10 Jul 2004 11:32:27 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: double quotes vs. single quotes (was Re: hash as argument)
Message-Id: <slrncevkua.4g9.abigail@alexandra.abigail.nl>
Dale Henderson (nilram@hotpop.com) wrote on MMMCMLXVI September MCMXCIII
in <URL:news:871xjkqzhl.fsf@camel.tamu-commerce.edu>:
~~ >>>>> "Abigail" == Abigail <abigail@abigail.nl> writes:
~~
~~ Abigail> There's no interpreter.
~~
~~ Now I'm confused. If there is no interpreter, what executes the
~~ output (for lack of a better word) of the compiler?
You've got to be kidding me. You wrote:
This makes me wonder if perhaps 'This is a string' is faster than
"This is a string" because in the first example the interpreter
can just use the string as is. But in the second the interpreter
must scan the string looking for interpolation or backslash
and construct a new string to use.
Now, clearly you weren't talking about an interpreter that executes
output of the compiler. You were talking about an interpreter in the
classical sense - one that takes a unit of code, interprets and
executes it.
If you want to call whatever is executing the compiled code an "interpreter",
that's fine with me. Just don't confuse matters by suggesting that same
thing will actually compile the code as well.
Abigail
--
@;=split//=>"Joel, Preach sartre knuth\n";$;=chr 65;%;=map{$;++=>$_}
0,22,13,16,5,14,21,1,23,11,2,7,12,6,8,15,3,19,24,14,10,20,18,17,4,25
;print@;[@;{A..Z}];
------------------------------
Date: 10 Jul 2004 11:21:53 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: hash as argument
Message-Id: <ccojgh$pbc$1@mamenchi.zrz.TU-Berlin.DE>
Abigail <abigail@abigail.nl> wrote in comp.lang.perl.misc:
> Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMCMLXV
> September MCMXCIII in <URL:news:ccm3tf$9ra$1@mamenchi.zrz.TU-Berlin.DE>:
> %% Daedalus <daedalus@videotron.ca> wrote in comp.lang.perl.misc:
[...]
> %% > that there's no interpolation. To me their uses are a matter of choice. But
> %% > saying "misuse of double-quotes" or ask to "fix" the thing... Do it because
> %% > the majority is doing it, thats not a rule of perl.
> %%
> %% It is a more general rule of human interaction. It is often useful
> %% to do things in one, agreed-upon way, even if there is no rational
> %% reason why that particular way should be preferred. Traffic regulations
> %% (beginning with the side of the road you drive on) are often quite
> %% arbitrary, but the advantages of adhering to them are obvious.
>
> I get the traffic story. Traffic rules are good because there are many
> people on the road at once, each driving a potentially lethal weapon.
> I don't come across hundreds of other programmers whose editors could
> harm me when I write a program.
The consequences of violating a traffic rule are obviously more dire than
disregarding a programming convention, but that isn't the point of the
comparison. The point is that the rules are basically arbitrary, but
it is still preferable to have a rule to having none. Folklore about
the sword-hand side aside, there is no intrinsic advantage in driving
on the right side over driving on the left. There is still a distinct
advantage to agreeing on one or the other. Knowing what to expect
makes assessment of situations easier. That goes for single/double
quoting as well.
> %% Similarly, for a programming community, there are advantages in having
> %% conventional preferences for one style over another when technically
> %% two (or more) ways would give the same result. By using the conventional
> %% style, the author tells the reader "Nothing to see here, move along...".
> %% A deviation from the standard tells the reader to look for the reason.
> %% That makes such code a lot easier to read.
>
> The "conventional style"? Are you now claiming that whatever style you
> are defending is "conventional" and that the others are "deviating"?
> That's quite presumptuous.
Presumptuous or not, that's the way a new convention propagates.
Deliberately or unconsciously, some people publicly act as if it were
already in place. If enough people follow suit, it becomes so.
Clpm is one of the places where this process happens.
> %% Thus, a set of stylistic conventions gives a language a dimension of
> %% expressiveness it wouldn't have without it. That is a Good Thing.
> %% The preference of '' over "" and of sub() over &sub belong in this
> %% category.
>
> I think they aren't equivalent. 'sub()' is used far more often than
> '&sub' - that preference has been settled. I highly doubt that the
> preference of '' vs "" has been settled.
Then it's time to settle it :)
Anno
------------------------------
Date: 10 Jul 2004 11:33:56 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: hash as argument
Message-Id: <slrncevl14.4g9.abigail@alexandra.abigail.nl>
Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMCMLXVI
September MCMXCIII in <URL:news:ccojgh$pbc$1@mamenchi.zrz.TU-Berlin.DE>:
:) Abigail <abigail@abigail.nl> wrote in comp.lang.perl.misc:
:) >
:) > I think they aren't equivalent. 'sub()' is used far more often than
:) > '&sub' - that preference has been settled. I highly doubt that the
:) > preference of '' vs "" has been settled.
:)
:) Then it's time to settle it :)
Good. Settle on whatever I'm doing. ;-)
Abigail
--
srand 123456;$-=rand$_--=>@[[$-,$_]=@[[$_,$-]for(reverse+1..(@[=split
//=>"IGrACVGQ\x02GJCWVhP\x02PL\x02jNMP"));print+(map{$_^q^"^}@[),"\n"
------------------------------
Date: 10 Jul 2004 11:36:27 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: hash as argument
Message-Id: <slrncevl5q.4g9.abigail@alexandra.abigail.nl>
Tad McClellan (tadmc@augustmail.com) wrote on MMMCMLXV September MCMXCIII
in <URL:news:slrncetbgo.93h.tadmc@magna.augustmail.com>:
## Anno Siegel <anno4000@lublin.zrz.tu-berlin.de> wrote:
##
## > By using the conventional
## > style, the author tells the reader "Nothing to see here, move along...".
## > A deviation from the standard tells the reader to look for the reason.
## > That makes such code a lot easier to read.
##
##
## Some more examples of saying that "something special" is going
## on when nothing special is going on:
##
## m/RE/s; # when RE does not contain a dot
##
## m/RE/m; # when RE does not contain ^ or $ anchors
##
## printf "%s\n", 'Hello World'; # printf() w/same formatting as print()
Interesting. Damian was arguing at the last YAPC that one should always
use /msx, regardless of whether you actually use . or ^ and $.
I really appreciated the idea.
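For readers unsure what the flags change, a small sketch with an invented two-line string: /s lets . match a newline, /m lets ^ and $ match at embedded line boundaries, and /x ignores literal whitespace in the pattern.

```perl
use strict;
use warnings;

my $text = "foo\nbar";

# Without /s the dot refuses to match the newline; with /s it matches.
my $dot_plain = ($text =~ /foo.bar/)  ? 1 : 0;
my $dot_s     = ($text =~ /foo.bar/s) ? 1 : 0;

# Without /m, ^ anchors only at the start of the string; with /m it also
# anchors after each embedded newline.
my $bol_plain = ($text =~ /^bar/)  ? 1 : 0;
my $bol_m     = ($text =~ /^bar/m) ? 1 : 0;

# /x only ignores literal whitespace in the pattern, so the spaces are harmless.
my $all_flags = ($text =~ / ^ bar /msx) ? 1 : 0;

print "dot: $dot_plain/$dot_s  anchor: $bol_plain/$bol_m  msx: $all_flags\n";
```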
Abigail
--
echo "==== ======= ==== ======"|perl -pes/=/J/|perl -pes/==/us/|perl -pes/=/t/\
|perl -pes/=/A/|perl -pes/=/n/|perl -pes/=/o/|perl -pes/==/th/|perl -pes/=/e/\
|perl -pes/=/r/|perl -pes/=/P/|perl -pes/=/e/|perl -pes/==/rl/|perl -pes/=/H/\
|perl -pes/=/a/|perl -pes/=/c/|perl -pes/=/k/|perl -pes/==/er/|perl -pes/=/./;
------------------------------
Date: Sat, 10 Jul 2004 14:01:11 +0200
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: hash as argument
Message-Id: <2la448Fa9gh5U1@uni-berlin.de>
Also sprach Anno Siegel:
> Abigail <abigail@abigail.nl> wrote in comp.lang.perl.misc:
>> Anno Siegel (anno4000@lublin.zrz.tu-berlin.de) wrote on MMMCMLXV
>> September MCMXCIII in <URL:news:ccm3tf$9ra$1@mamenchi.zrz.TU-Berlin.DE>:
[ quoting conventions ]
>> %% Similarly, for a programming community, there are advantages in having
>> %% conventional preferences for one style over another when technically
>> %% two (or more) ways would give the same result. By using the conventional
>> %% style, the author tells the reader "Nothing to see here, move along...".
>> %% A deviation from the standard tells the reader to look for the reason.
>> %% That makes such code a lot easier to read.
>>
>> The "conventional style"? Are you now claiming that whatever style you
>> are defending is "conventional" and that the others are "deviating"?
>> That's quite presumptuous.
>
> Presumptuous or not, that's the way a new convention propagates.
> Deliberately or unconsciously, some people publicly act as if it were
> already in place. If enough people follow suit, it becomes so.
>
> Clpm is one of the places where this process happens.
Just for the record, I'd like to note that this process hasn't yet
started to happen for me. On the quote issue, I decide from case to
case. Usually, when I have characters that require escaping in a
double-quotish context, I use single quotes. When I have something
to interpolate, I use double ones, even if that requires one or the
other backslash. Once the amount of backslashing gets annoying, I start
using heredocs or maybe 'qq'.
I'd say that this quoting issue is too minor to require a convention on
it. I do agree that conventions and rules are invaluable. That, however,
won't hold true for just any convention. Now, getting slightly political, when
I think about the vast amount of bills we have in Germany, I'd really
say that having fewer of those would be a relief for anyone involved.
This group already has quite a few rules, such as using
strictures, warnings, checking the success of system-calls, lexical
scoping where applicable etc. All of those serve a very good purpose.
It's much less obvious (for me anyway) what purpose quoting-rules could
have.
>> %% Thus, a set of stylistic conventions gives a language a dimension of
>> %% expressiveness it wouldn't have without it. That is a Good Thing.
>> %% The preference of '' over "" and of sub() over &sub belong in this
>> %% category.
>>
>> I think they aren't equivalent. 'sub()' is used far more often than
>> '&sub' - that preference has been settled. I highly doubt that the
>> preference of '' vs "" has been settled.
>
> Then it's time to settle it :)
Just keep in mind how certain people react towards rules they don't
agree with. I, for instance, tend to do the exact opposite of such
rules, just for the sake of contradiction and for expressing my grudge.
I am totally aware that this is the mindset of a five-year-old, but
sometimes I really don't mind acting like a child. :-)
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Sat, 10 Jul 2004 11:58:55 +0000 (UTC)
From: "John J. Trammell" <trammell+usenet@hypersloth.invalid>
Subject: Re: hash as argument
Message-Id: <slrncevmfv.uu6.trammell+usenet@hypersloth.el-swifto.com.invalid>
On Sat, 10 Jul 2004 12:36:51 +1000, Andrew Hamm <ahamm@mail.com> wrote:
> AOL?
http://catb.org/~esr/jargon/html/A/AOL-.html
------------------------------
Date: 10 Jul 2004 09:47:05 +0200
From: Ting Wang <wangtg@web.de>
Subject: how perl set envirment variable
Message-Id: <bcabriopame.fsf@marvin.informatik.uni-stuttgart.de>
I want to set an environment variable, e.g. PATH,
with Perl. How can I do it?
Thanks
------------------------------
Date: Sat, 10 Jul 2004 09:49:45 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: how perl set envirment variable
Message-Id: <2l9lo3Fa9au3U1@uni-berlin.de>
Ting Wang wrote:
> I want to set an environment variable, e.g. PATH, with Perl. How can I do it?
Read about the %ENV variable in "perldoc perlvar".
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sat, 10 Jul 2004 08:18:44 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: how perl set envirment variable
Message-Id: <EDNHc.61193$Oq2.46177@attbi_s52>
Ting Wang wrote:
> I want to set an environment variable, e.g. PATH,
> with Perl. How can I do it?
That's a FAQ.
perldoc -q environment
Note: Win32 behaves differently from Unix/Linux in this regard.
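A minimal sketch of the %ENV approach, assuming a Unix-like system; DEMO_VAR and /opt/demo/bin are made-up names. Changes to %ENV are visible to the running script and to any child processes it starts, but not to the shell that launched the script:

```perl
use strict;
use warnings;
use Config;

# DEMO_VAR and /opt/demo/bin are made-up names for illustration.
$ENV{DEMO_VAR} = 'hello';
my @path = ('/opt/demo/bin', defined $ENV{PATH} ? $ENV{PATH} : ());
$ENV{PATH} = join $Config{path_sep}, @path;

# A child process started from here inherits the modified environment.
# (List-form pipe open; assumes a Unix-like system.)
open my $child, '-|', $^X, '-e', 'print $ENV{DEMO_VAR}'
    or die "Cannot run child perl: $!";
my $seen = do { local $/; <$child> };
close $child;

print "child saw DEMO_VAR=$seen\n";
```

Using $Config{path_sep} rather than a hard-coded ':' keeps the PATH manipulation portable, although the pipe open itself is the Unix-flavoured part.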
-Joe
------------------------------
Date: 10 Jul 2004 06:04:51 -0700
From: rushi_asi@yahoo.com (Rushikesh Joshi)
Subject: how the vector is created, how to pass vector to webservices method apachesoap:Vector
Message-Id: <a2d4901c.0407100504.76be8cb2@posting.google.com>
Below are the input parameters of my Web Services method vectorTest:
- <wsdl:message name="vectorTestRequest">
<wsdl:part name="userName" type="xsd:string" />
<wsdl:part name="password" type="xsd:string" />
<wsdl:part name="role" type="xsd:string" />
<wsdl:part name="langpref" type="xsd:string" />
<wsdl:part name="parentid" type="xsd:int" />
<wsdl:part name="vectorParam" type="apachesoap:Vector" />
</wsdl:message>
Now just look at it: the 6th parameter is a Vector (apachesoap:Vector).
How can I pass a vector (vec) from Perl to my RPC server?
If I want to pass 'a','b','c','d' in a vector, how can I create the vector
in Perl? Below is my calling code:
my $vectorTest = $service->vectorTest("rushi_asi\@yahoo.com","rrrrr","india","en",1,$vector);
If I pass $vector in different ways (using a Map, Array, or Hash), I
receive the following error, as shown by my XML debugger:
<!-- XML Dump -->
POST /anacreon/servlet/rpcrouter HTTP/1.0
Accept: text/xml
Accept: multipart/*
Host: MySite:ServicePORT
User-Agent: SOAP::Lite/Perl/0.50
Content-Length: 967
Content-Type: text/xml; charset=utf-8
SOAPAction: ""
<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body><namesp1:vectorTest
xmlns:namesp1="com.logicboxes.foundation.sfnb.user.Test">
<userName xsi:type="xsd:string">rushi_asi\@yahoo.com</userName>
<password xsi:type="xsd:string">rrrrr</password>
<role xsi:type="xsd:string">india</role>
<langpref xsi:type="xsd:string">en</langpref>
<parentid xsi:type="xsd:int">1</parentid>
<vectorParam SOAP-ENC:arrayType="xsd:string[4]"
xsi:type="apachesoap:Vector">
<item xsi:type="xsd:string">a</item>
<item xsi:type="xsd:string">b</item>
<item xsi:type="xsd:string">c</item>
<item xsi:type="xsd:string">d</item>
</vectorParam>
</namesp1:vectorTest></SOAP-ENV:Body></SOAP-ENV:Envelope>HTTP/1.0 500
Internal Server Error
Server: Resin/2.1.11
Content-Type: text/xml; charset=utf-8
Date: Fri, 09 Jul 2004 10:09:30 GMT
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<soapenv:Fault>
<faultcode>soapenv:Server.userException</faultcode>
<faultstring>org.xml.sax.SAXException: No deserializer defined for
array type {http://www.w3.org/1999/XMLSchema}string</faultstring>
<detail/>
</soapenv:Fault>
</soapenv:Body>
</soapenv:Envelope>
<!-- END OF XML Dump -->
Your help will be greatly appreciated.
Thanks in advance.
Rushikesh
------------------------------
Date: Sat, 10 Jul 2004 07:38:37 -0500
From: TLOlczyk <olczyk2002@yahoo.com>
Subject: Installing seperate version of Perl.
Message-Id: <47ove0lllmu8odpfqcs7gb6fgfp2dau4g3@4ax.com>
I am using Linux and want to debug some code written in a slightly
older version of Perl. So I want to set up a user who uses that old
version. How do I install it, without mucking up any of the present
perl stuff?
The reply-to email address is olczyk2002@yahoo.com.
This is an address I ignore.
To reply via email, remove 2002 and change yahoo to
interaccess,
**
Thaddeus L. Olczyk, PhD
There is a difference between
*thinking* you know something,
and *knowing* you know something.
------------------------------
Date: Sat, 10 Jul 2004 08:56:02 -0400
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Installing seperate version of Perl.
Message-Id: <kqCdnSWWrNz-enLdRVn-vA@adelphia.com>
TLOlczyk wrote:
> I am using Linux and want to debug some code written in a slightly
> older version of Perl. So I want to set up a user who uses that old
> version. How do I install it, without mucking up any of the present
> perl stuff?
That's described in the standard installation docs. The key word to look for
there is "prefix".
Let's say you used a prefix of /usr/local/oldperl. The Perl binary would
then be in /usr/local/oldperl/bin, so add that to your user's PATH. Or,
begin scripts that use the old perl with #!/usr/local/oldperl/bin/perl.
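Once the second perl is installed, a quick way to confirm which interpreter a script is really running under is to print $^X and the configured prefix. This sketch relies only on the core Config module; run it with the old perl and the prefix should match whatever was chosen at build time (for example /usr/local/oldperl, as above).

```perl
use strict;
use warnings;
use Config;

# $^X is the path of the running perl binary; $Config{prefix} is the
# installation prefix that this perl was configured with.
my $report = sprintf "perl %vd at %s (prefix %s)", $^V, $^X, $Config{prefix};
print "$report\n";
```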
sherm--
--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
------------------------------
Date: Sat, 10 Jul 2004 08:01:38 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: what do you call funct ( funct())
Message-Id: <CnNHc.26574$WX.19746@attbi_s51>
Dale Henderson wrote:
> JM> Yes, in mathematics I'd read it as being "at right angles" or
> JM> whatever linear algebra makes out of that.
>
> "at right angles" is a fair description. The most technical ...
Dale,
Whenever I see one of your postings, it takes more brain power to
parse, since you don't put your text at the left margin like everybody
else does. Is there a particular reason why you use this non-standard
style?
-Joe
------------------------------
Date: Sat, 10 Jul 2004 11:39:26 +0200
From: Petr Pajas <pajas@ufal.ms.mff.cuni.cz>
Subject: why utf8::upgrade is needed?
Message-Id: <ccodbl$1oon$1@news.vol.cz>
Hi,
I'm using Perl 5.8.3 and want it to be 100% UTF-8. I'm however having
troubles with latin-1 characters in strings, since they seem to remain
byte encoded, unless I explicitly call utf8::upgrade, which is very
annoying.
In the example below, \x{e1} is latin1 small aacute,
\x{168} is non-latin1 Utilde. The code shows that \x{e1}
remains non-UTF8 as long as it meets a non-latin1 character, or
utf8::upgrade is called. Can anyone explain why (and possibly
how to avoid that)?
$ perl -e '
use utf8;
use Devel::Peek;
$a="\x{e1}";
$b="\x{e1}\x{168}";
Dump($a);
Dump($b);
utf8::upgrade($a);
Dump($a)'
SV = PV(0x8150000) at 0x816a488
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x8163af8 "\341"\0
CUR = 1
LEN = 2
SV = PV(0x8150090) at 0x816a4c4
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x8162530 "\303\241\305\250"\0 [UTF8 "\x{e1}\x{168}"]
CUR = 4
LEN = 5
SV = PV(0x8150000) at 0x816a488
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x81701a8 "\303\241"\0 [UTF8 "\x{e1}"]
CUR = 2
LEN = 3
Thanks,
-- Petr
------------------------------
Date: Sat, 10 Jul 2004 11:56:15 +0200
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: why utf8::upgrade is needed?
Message-Id: <2l9sq2F9vo5tU1@uni-berlin.de>
Also sprach Petr Pajas:
> Hi,
> I'm using Perl 5.8.3 and want it to be 100% UTF-8. I'm however having
> troubles with latin-1 characters in strings, since they seem to remain
> byte encoded, unless I explicitly call utf8::upgrade, which is very
> annoying.
>
> In the example below, \x{e1} is latin1 small aacute,
> \x{168} is non-latin1 Utilde. The code shows that \x{e1}
> remains non-UTF8 as long as it meets a non-latin1 character, or
> utf8::upgrade is called.
As long as the numerical value of each character in the string fits into
one byte, actually. Latin1 is such a one-byte encoding and so perl will
not yet utf8ify the string.
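The same observation can be made without Devel::Peek, using the always-available utf8::is_utf8 to look at the internal flag. This is a sketch for illustration; note that upgrading changes only the storage, not the string's value or length:

```perl
use strict;
use warnings;

my $latin = "\x{e1}";          # code point 225 fits in one byte
my $wide  = "\x{e1}\x{168}";   # \x{168} is above 255, forcing UTF-8 storage

my $latin_before = utf8::is_utf8($latin) ? 1 : 0;
my $wide_flag    = utf8::is_utf8($wide)  ? 1 : 0;

utf8::upgrade($latin);         # changes the internal representation only
my $latin_after  = utf8::is_utf8($latin) ? 1 : 0;

# The string still compares equal and still has one character.
my $same_string  = ($latin eq "\x{e1}" && length($latin) == 1) ? 1 : 0;
print "flags: $latin_before $wide_flag $latin_after; unchanged: $same_string\n";
```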
>Can anyone explain why (and possibly how to avoid that)?
Turn that around. Why do you want everything to be unicode? In all but
the most pathological cases you can trust perl to do the right thing
with your strings, upgrading when necessary etc.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Sat, 10 Jul 2004 12:22:16 +0200
From: Petr Pajas <pajas@ufal.ms.mff.cuni.cz>
Subject: Re: why utf8::upgrade is needed?
Message-Id: <ccofrv$1q3m$1@news.vol.cz>
Tassilo v. Parseval wrote:
> Also sprach Petr Pajas:
>
>> Hi,
>> I'm using Perl 5.8.3 and want it to be 100% UTF-8. I'm however having
>> troubles with latin-1 characters in strings, since they seem to remain
>> byte encoded, unless I explicitly call utf8::upgrade, which is very
>> annoying.
>>
>> In the example below, \x{e1} is latin1 small aacute,
>> \x{168} is non-latin1 Utilde. The code shows that \x{e1}
>> remains non-UTF8 as long as it meets a non-latin1 character, or
>> utf8::upgrade is called.
>
> As long as the numerical value of each character in the string fits into
> one byte, actually. Latin1 is such a one-byte encoding and so perl will
> not yet utf8ify the string.
>
>>Can anyone explain why (and possibly how to avoid that)?
>
> Turn that around. Why do you want everything to be unicode? In all but
> the most pathological cases you can trust perl to do the right thing
> with your strings, upgrading when necessary etc.
>
> Tassilo
Well, I'm passing the strings to some XS module for XML.
If this module finds UTF8 flag on the string, it knows what to do.
If not, it assumes I'm passing it a string in the encoding of the
XML document (not necessarily Latin1) and that causes problems,
since "\x{e1}" isn't UTF8 flagged and while Perl keeps it Latin1,
the XML module may interpret it quite differently. So I have to do
utf8::upgrade to make sure the string gets converted to utf8 and is
UTF8 flagged.
-- Petr
------------------------------
Date: Sat, 10 Jul 2004 12:58:32 +0200
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: why utf8::upgrade is needed?
Message-Id: <2la0erFaaj5jU1@uni-berlin.de>
Also sprach Petr Pajas:
> Tassilo v. Parseval wrote:
>
>> Also sprach Petr Pajas:
>>
>>> Hi,
>>> I'm using Perl 5.8.3 and want it to be 100% UTF-8. I'm however having
>>> troubles with latin-1 characters in strings, since they seem to remain
>>> byte encoded, unless I explicitly call utf8::upgrade, which is very
>>> annoying.
>>>
>>> In the example below, \x{e1} is latin1 small aacute,
>>> \x{168} is non-latin1 Utilde. The code shows that \x{e1}
>>> remains non-UTF8 as long as it meets a non-latin1 character, or
>>> utf8::upgrade is called.
>>
>> As long as the numerical value of each character in the string fits into
>> one byte, actually. Latin1 is such a one-byte encoding and so perl will
>> not yet utf8ify the string.
>>
>>>Can anyone explain why (and possibly how to avoid that)?
>>
>> Turn that around. Why do you want everything to be unicode? In all but
>> the most pathological cases you can trust perl to do the right thing
>> with your strings, upgrading when necessary etc.
> Well, I'm passing the strings to some XS module for XML.
> If this module finds UTF8 flag on the string, it knows what to do.
> If not, it assumes I'm passing it a string in the encoding of the
> XML document (not necessarily Latin1) and that causes problems,
> since "\x{e1}" isn't UTF8 flagged and while Perl keeps it Latin1,
> the XML module may interpret it quite differently. So I have to do
> utf8::upgrade to make sure the string gets converted to utf8 and is
> UTF8 flagged.
Ah, that's indeed a legitimate reason. This module you're talking about,
is that under your control? In this case, you could have the module do a
sv_utf8_upgrade() on its arguments which might already be enough to make
it all work.
Otherwise, maybe contacting the author would be in order.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Sat, 10 Jul 2004 11:57:44 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: why utf8::upgrade is needed?
Message-Id: <Pine.LNX.4.53.0407101140320.20765@ppepc56.ph.gla.ac.uk>
On Sat, 10 Jul 2004, Petr Pajas wrote:
> \x{168} is non-latin1 Utilde. The code shows that \x{e1}
> remains non-UTF8 as long as it meets a non-latin1 character, or
> utf8::upgrade is called. Can anyone explain why (and possibly
> how to avoid that)?
To try to answer the question "why", the documentation explains this
in terms of transparent compatibility with older 8-bit handling.
http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Byte-and-Character-Semantics
For how to deal with that in practice,
http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Forcing-Unicode-in-Perl-(Or-Unforcing-Unicode-in-Perl)
(and the following heading) seem to be particularly relevant.
Maybe I misunderstood what you were saying, but you can't just mark an
iso-8859-1 string as utf8; it's necessary to cause Perl to genuinely
create the utf8 version from the 8-bit-coded version. As I understand
it, once the utf8 version has been created it won't be quietly
destroyed; so if a character > 255 is appended to a string (causing
upgrade to utf8) and then taken off again, the string will still be
held in utf8 form, unless one explicitly down-converts it. I'd
suggest
http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Interaction-with-Extensions
in relation to your specific interest.
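A small sketch of the behaviour described above, using made-up data: a character above 255 is appended and then removed, and the string stays in its utf8 form until it is explicitly down-converted with utf8::downgrade.

```perl
use strict;
use warnings;

my $s = "\x{e1}";
$s .= "\x{168}";                       # appending a character > 255 upgrades $s
chop $s;                               # take the wide character off again
my $still_utf8 = utf8::is_utf8($s) ? 1 : 0;

my $ok = utf8::downgrade($s) ? 1 : 0;  # explicit down-conversion
my $after_down = utf8::is_utf8($s) ? 1 : 0;

print "after chop: $still_utf8, after downgrade: $after_down\n";
```

utf8::downgrade can only succeed when every character fits in one byte, which is the case here once the wide character has been removed.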
hope this helps
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6787
***************************************