[31086] in Perl-Users-Digest


Perl-Users Digest, Issue: 2331 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Apr 10 18:10:34 2009

Date: Fri, 10 Apr 2009 15:10:00 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 10 Apr 2009     Volume: 11 Number: 2331

Today's topics:
    Re: Berkeley DB 4.7.25 "pthread_self", and feature requ <smallpond@juno.com>
    Re: Berkeley DB 4.7.25 "pthread_self", and feature requ <liarafan@xs4all.nl>
    Re: Bioinformatics Suite sln@netherlands.com
        FAQ on FAQ 1.000 How do I find out who writes/maintains sln@netherlands.com
        Finding a Perl job sln@netherlands.com
    Re: Match Whole word only <rvtol+usenet@xs4all.nl>
    Re: Match Whole word only sln@netherlands.com
        The Logic of Beautiful Code sln@netherlands.com
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() <hjp-usenet2@hjp.at>
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
    Re: XML::LibXML UTF-8 toString() -vs- nodeValue() sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 10 Apr 2009 13:31:33 -0700 (PDT)
From: smallpond <smallpond@juno.com>
Subject: Re: Berkeley DB 4.7.25 "pthread_self", and feature request
Message-Id: <5b4247db-4c17-43cc-ac08-4d791916de51@j8g2000yql.googlegroups.com>

On Apr 10, 1:59 pm, "Mark" <liara...@xs4all.nl> wrote:
> Hello all (and hoping Paul Marquess catches this too),
>
> Yesterday I upgraded my entire server to Berkeley DB 4.7.25 -- a major
> operation (what with recompiling everything against the new libs and all).
> It seems very stable, so far, and Berkeley, the Perl module, is lightning
> fast!
>
> One thing, though, whilst compiling, I got an error about: "pthread_self"
> being an undefined symbol. I subsequently removed the
> --with-enable-pthread_api option from the Makefile, as I don't seem to
> have the required libraries (this is a slightly older gcc 2.95.4 FreeBSD
> compiler). Everything seems totally fine, now. So, do I really need the
> "pthread_self" stuff? My guess is, I can do without. I ran several very
> severe concurrency tests on the new DB 4.7, and they didn't give an inch,
> which leads me to believe I'm probably fine. Am I?
>
> As a feature request, for the BerkeleyDB module, when the shared ENV __db*
> files get corrupted (this can happen; for instance, on a bad reboot of the
> server, or hung process killed with -9), the BerkeleyDB module basically
> just hangs when trying to reopen the environment, until such time I kill
> the process trying to use it, and manually delete the __db* files first.
> I wonder, could perhaps some sanity checks be done against the shared env?
>
> Thanks,
>
> - Mark


You should be on at least perl 5.8.0 for any module that uses threads.
You may need an old version of the module if you are on 5.6.x.

perl -V will tell you what compiler version was used to build perl.
You should build modules with the same version.  grep for 'cc'

Libraries are independent of compiler version.  A module will use the
same libraries as perl except for the unlikely case that perl is
statically linked or you do something funky with the library path.
perl -V will also tell you what libraries perl is using.
grep for 'perllibs'
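
The same values can also be read from inside Perl through the core Config
module, which avoids grepping the perl -V output. This is only a minimal
sketch: the list of keys and the output format are illustrative choices, not
something from the original post.

```perl
use strict;
use warnings;
use Config;   # core module; %Config holds the build settings shown by perl -V

# Keys picked for illustration: compiler, its version, linked libraries, threading.
for my $key (qw(cc ccversion gccversion perllibs libs useithreads usethreads)) {
    my $value = $Config{$key};
    my $shown = (defined($value) && length($value)) ? $value : '(not set)';
    printf "%-12s %s\n", $key, $shown;
}
```

An empty or undefined useithreads value indicates a perl built without ithreads.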


------------------------------

Date: Fri, 10 Apr 2009 23:25:38 +0200
From: "Mark" <liarafan@xs4all.nl>
Subject: Re: Berkeley DB 4.7.25 "pthread_self", and feature request
Message-Id: <Jbidnc3kf9XPJELUnZ2dnUVZ8uCdnZ2d@giganews.com>

"smallpond" <smallpond@juno.com> wrote in message 
news:5b4247db-4c17-43cc-ac08-4d791916de51@j8g2000yql.googlegroups.com...

> On Apr 10, 1:59 pm, "Mark" <liara...@xs4all.nl> wrote:
>> Hello all (and hoping Paul Marquess catches this too),

> You should be on at least perl 5.8.0 for any module that uses threads.
> You may need an old version of the module if you are on 5.6.x.
>
> perl -V will tell you what compiler version was used to build perl.
> You should build modules with the same version. grep for 'cc'

I didn't compile Perl with ithreads at the time. And I don't use threads
(when needed, I just 'use forks::BerkeleyDB;'). It's Perl 5.8.8, btw, and
also compiled with the same gcc 2.95.4 compiler.

The "pthread_self" error, for the record, occurs when compiling the
Berkeley DB 4.7.25 *database* itself, not BerkeleyDB, the Perl package. I
guess I didn't make that sufficiently clear. Since BerkeleyDB, the
package, uses Berkeley DB 4.7, the database, I was just wondering whether
having compiled the database without --with-enable-pthread_api would
adversely affect BerkeleyDB, the package.

- Mark 



------------------------------

Date: Fri, 10 Apr 2009 13:31:23 -0700
From: sln@netherlands.com
Subject: Re: Bioinformatics Suite
Message-Id: <ojavt4to01jk8nk7lvqt9bgt9eas46jt47@4ax.com>

On Fri, 10 Apr 2009 10:47:09 -0700, sln@netherlands.com wrote:

>On Fri, 10 Apr 2009 10:39:22 -0700 (PDT), Raghava <raghavagps@gmail.com> wrote:
>
>>Dear Colleagues
>>As you know, our group has developed a number of web servers over the
>>years. I get a lot of requests related to i) availability of source
>>code; ii) Perl scripts used to build software; iii) standalone
>>versions of methods. In order to help our bioinformatics users,
>>particularly young developers who wish to develop bioinformatics
>>programs, we are releasing source code (written in Perl) to the
>>public for the first time. I hope these Perl scripts will be useful
>>to the bioinformatics community.
>>
>>Codes are available from
>>http://www.imtech.res.in/raghava/gpsr/
>>https://sourceforge.net/projects/gpsraghava/
>>
>>This is just the start; we will release all software to the public within
>>the next year. I would appreciate your comments, suggestions, and feedback
>>on this package, GPSR.
>>
>>Regards
>>
>>Raghava
>>======================================================
>># Dr G P S Raghava, Scientist and Head Bioinformatics Centre       #
>># Institute of Microbial Technology, Sector-39A, Chandigarh, India  #
>># Phone: +91-172-2690557, Fax: +91-172-2690632
>>#
>># Eadd: http://www.imtech.res.in/raghava/   raghava@imtech.res.in #
>>#=====================================================
>
>Sure, my comment is you're a pig researcher sucking down grants and getting
>everybody to earn it for free.
>
>-sln

But ya know. I hate to tell ya, but, you can write all the diffy-q, derivations,
analytical hocus-pocus you want, 99% of which are flawed. Oh, but why is it flawed?
Because, in my life, I have never seen such a generation to generation to generation,
of the most grant driven PhD physicists in all history I've studied in the last 100
years, except Einstein. Statistically, humans would have another Einstein or 2 since
him given the exponential birth rate and technology.

But no. I'm thinking a PhD is about as easy to get nowadays as a Mac hamburger.

Gigantic data gathering will not reveal anything without inspiration. Didn't they
ever tell you that?

Don't make me laugh!

-sln



------------------------------

Date: Fri, 10 Apr 2009 13:01:49 -0700
From: sln@netherlands.com
Subject: FAQ on FAQ 1.000 How do I find out who writes/maintains this trash?
Message-Id: <tq8vt49vsji4fsnejh0up15j2n9e86t2h6@4ax.com>


Possibly by asking.

There are other ways, like checking IPs and making a Regular Expression
that works on a dump, like the dump this site is, and catching names, that
theoretically are immune to prosecution.

This FAQ is dedicated to the uniquely talented individuals who make up the
minority of this forum, that have the balls to ask questions.

------------------------------------
Thread maintained by outcast genius workers who don't get paid shit but are the
backbone of this forum. <insert Latin phrase here> <- Ah, E Pluribus Unum

-sln


------------------------------

Date: Fri, 10 Apr 2009 14:59:38 -0700
From: sln@netherlands.com
Subject: Finding a Perl job
Message-Id: <d6gvt4pbc7cj6o61hprh2eou3ga5ghdck2@4ax.com>

No. Sorry I won't do that.
You know, the strangest thing, I don't enable scripts or auto-loading of ActiveX executables/DLLs on my machine.
I would like to disable crap that you may want to load on my machine.
 
Am I wrong? Is there any other way I can communicate to employers without having your crap executing on my Operating System?
 
Me
----- Original Message ----- 
From: Customer Support 
To: 'Me' 
Sent: Friday, April 10, 2009 12:49 PM
Subject: RE: Profile Wizard Inquiry


Hello Me,
 
    Thank you for letting me know about the error you received when trying to apply for a job on our site. I see that you are using IE6 and there are known glitches between IE6 and our site. Would you
be able to download IE7 or Mozilla Firefox and see if you still receive the error? If there is anything else I can assist you with please let me know.
 
Thank you for choosing Dice!



------------------------------

Date: Fri, 10 Apr 2009 22:25:29 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Match Whole word only
Message-Id: <49dfab39$0$192$e4fe514c@news.xs4all.nl>

John W. Krahn wrote:

> my @aval = "@lines" =~ /\ba[0-9]+\b/g;

Alternative:

   my @aval = map /\ba[0-9]+\b/g, @lines;
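
Both forms should return the same list here, because joining the lines with
spaces does not create or remove any word boundaries around the tokens. A
small sketch to compare them; the contents of @lines are made-up sample data,
loosely modelled on the input discussed in this thread:

```perl
use strict;
use warnings;

# Hypothetical sample input, not the original poster's file.
my @lines = ('c=c(a0) or c(a10)', 'kod=a05;axm=q4a4', 'pod=a06 data1');

# Variant 1: interpolate all lines into one string, then match globally.
my @joined = "@lines" =~ /\ba[0-9]+\b/g;

# Variant 2: match each line separately and flatten the results with map.
my @mapped = map { /\ba[0-9]+\b/g } @lines;

print "@joined\n";   # a0 a10 a05 a06
print "@mapped\n";   # a0 a10 a05 a06
```

Note that "q4a4" and "data1" contribute nothing: the "a" is preceded by a word
character, so \b does not match there.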

-- 
Ruud


------------------------------

Date: Fri, 10 Apr 2009 13:40:08 -0700
From: sln@netherlands.com
Subject: Re: Match Whole word only
Message-Id: <lkbvt4t9ui6iia2scn921can846jhll27e@4ax.com>

On Fri, 10 Apr 2009 22:25:29 +0200, "Dr.Ruud" <rvtol+usenet@xs4all.nl> wrote:

> map /\ba[0-9]+\b/g, @lines

print
 map /\ba[0-9]+\b/g, "l qa1;c=c(a0)\0471\047.or.c(a10)\0472\047;&sort;&nz
n01 blah ... ;c=c(a9)\0471\047.and.c(a07)\0471\047
n01 blah ... ;c=c(a08)\0471\047.and.c(a11)\0471\047
*include q1.qin;kod=a05;axm=q4a4
*include q2.qin;pod=a06
";

-sln


------------------------------

Date: Fri, 10 Apr 2009 14:08:37 -0700
From: sln@netherlands.com
Subject: The Logic of Beautiful Code
Message-Id: <n5cvt4di8fcskoihaika6fdn5bitr7gdmp@4ax.com>

Is it possible that the beauty of the written sentence, the formulation of
adjective invectives, interferes with code logic?

Yeah, Perl is a prime example. If you have ever used an assembler you would know
this.

As your eye pleasingly passes over flowery invectives of logic, does it stop at
the right time? Does it distinguish the array name from the scalar name in a flowing
manner?

Perl is sing-song reading and writing, with little obfuscated punctuation, and all
the air taken out of it to hide its authors' shortcomings in/or invections of such logic.

The air-heads here take it to incredible lengths to pronounce a particular thing abhorrent
in style. In fact, most of the replies here from these air-heads are all about style.
Sometimes, they completely miss the conceptual logic errors. They can't see the forest for
the trees. These are not gurus whatsoever. I'm not a guru. Although...

In the short examples and scopes posted here, what does 'maintainable' mean anyway?
When should something be considered maintainable?

I suggest deeper logic and structure needs to have maintenance as a factor in the design.
But only then and nothing else. Even then, it may be just a few names. Otherwise, like
Perl, a sick, typeless language, take the pain! You want it you got it. So suck it up!

Otherwise, it all assembles into the same code. Don't act like Perl is anything special,
it is far from it.

More '1-liners', more fun! Don't try to legitimize logic after the fact!

-sln


------------------------------

Date: Fri, 10 Apr 2009 21:04:30 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngtv620.t3b.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 01:48, Ben Bullock <benkasminbullock@gmail.com> wrote:
> On Thu, 09 Apr 2009 20:53:32 +0200, Peter J. Holzer wrote:
>> On 2009-04-09 00:24, Ben Bullock <benkasminbullock@gmail.com> wrote:
>>> If you want to know whether Perl has marked a string as being UTF-8, in
>>> other words whether Perl thinks that a string is UTF-8 or not, the only
>>> way of doing this is to look at Perl's flag, using utf8::is_utf8 or
>>> the similar routine in Encode.
>> 
>> The (stupidly named) utf8 flag doesn't indicate whether perl thinks that
>> a string is UTF-8 or not.
>
> Yes, it does. If I have a string
>
> my $p = "XYZ"; 
>
> where "XYZ" are three bytes of a single character encoded as UTF-8 in the 
> text of the program (ie I have asked my text editor to save in the UTF-8 
> form), then if I have the line
>
> use utf8;
>
> at the top of my program, Perl will set the "utf8 flag" on $p

Right. If you put "use utf8" at the top of your program you tell the perl
compiler that the *source code* of your program is written in UTF-8.
This affects variable names (you can now use a variable like "$käse" or
$κόσμε or $こんにちは) and string constants. The latter are
converted from UTF-8 to Perl's internal character string format. Since
these are now character strings, the (still stupidly named) utf8 flag is
set. 

use encoding "X" has a similar effect: You tell the compiler that the
source code is in encoding X, and it will convert any string constant
from encoding X to Perl's internal character string format. Again, the
fact that the string is now a character string (and not a byte string)
will be signified by the utf8 flag, even though the string in the source
file was not UTF-8.

> and will act as if this multibyte UTF-8 character is a single item,
> for example length ($p) == 1.

And this shows that the string is *not* UTF-8. UTF-8 is by definition a
serialization format for unicode. The Unicode character U+20AC, for
example, is serialized into 3 octets: e2 82 ac. So if the string "€" was
a UTF-8 string, then

    length("€") == 3,
    ord(substr("€", 0, 1)) == 0xE2,
    ord(substr("€", 1, 1)) == 0x82,
    ord(substr("€", 2, 1)) == 0xAC

would all be true. However, if the string was parsed from a source with use
utf8 or use encoding in effect, or read from a file with an encoding
layer, or by some other method which yields Perl character strings, all
of these are false. Instead,

    length("€") == 1,
    ord(substr("€", 0, 1)) == 0x20AC,

are true. So the string is a Unicode string, but it is not a UTF-8
string.

> If I do not have the use utf8; line in the program, Perl will not set
> the utf8 flag on $p and I will get length ($p) == 3 instead of 1.

And then you really have a UTF-8 string.

> Thus the string may actually be UTF-8 even when Perl's flag 
> is not set.

It can *only* be UTF-8 if the flag is *not* set, unless your program is
broken or you are dealing with double-encoded input.
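
The distinction can be checked without depending on the source file's encoding
by building the character with an escape and serializing it explicitly. This
is a sketch using the core Encode module; the variable names are chosen for
illustration only:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{20AC}";                # character string: one character, U+20AC
my $bytes = encode('UTF-8', $chars);   # byte string: its UTF-8 serialization

printf "chars: length=%d first=0x%X flag=%d\n",
    length($chars), ord($chars), utf8::is_utf8($chars) ? 1 : 0;
# chars: length=1 first=0x20AC flag=1

printf "bytes: length=%d first=0x%X flag=%d\n",
    length($bytes), ord($bytes), utf8::is_utf8($bytes) ? 1 : 0;
# bytes: length=3 first=0xE2 flag=0

# Decoding the octets gives back the original character string.
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok\n" if $round_trip eq $chars;
```

The string that actually holds UTF-8 octets is the one without the flag, which
is the point made above.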

	hp


------------------------------

Date: Fri, 10 Apr 2009 12:27:29 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <o87vt4969t885a42ja0h9a3rgevlkvbjmb@4ax.com>

On Fri, 10 Apr 2009 21:04:30 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:

>On 2009-04-10 01:48, Ben Bullock <benkasminbullock@gmail.com> wrote:
>> On Thu, 09 Apr 2009 20:53:32 +0200, Peter J. Holzer wrote:
>>> On 2009-04-09 00:24, Ben Bullock <benkasminbullock@gmail.com> wrote:
>>>> If you want to know whether Perl has marked a string as being UTF-8, in
>>>> other words whether Perl thinks that a string is UTF-8 or not, the only
>>>> way of doing this is to look at Perl's flag, using utf8::is_utf8 or
>>>> the similar routine in Encode.
>>> 
>>> The (stupidly named) utf8 flag doesn't indicate whether perl thinks that
>>> a string is UTF-8 or not.
>>
>> Yes, it does. If I have a string
>>
>> my $p = "XYZ"; 
>>
>> where "XYZ" are three bytes of a single character encoded as UTF-8 in the 
>> text of the program (ie I have asked my text editor to save in the UTF-8 
>> form), then if I have the line
>>
>> use utf8;
>>
>> at the top of my program, Perl will set the "utf8 flag" on $p
>
>Right. If you put "use utf8" at the top of your program you tell the perl
>compiler that the *source code* of your program is written in UTF-8.
>This affects variable names (you can now use a variable like "$käse" or
>$κόσμε or $こんにちは) and string constants. The latter are
>converted from UTF-8 to Perl's internal character string format. Since
>these are now character strings, the (still stupidly named) utf8 flag is
>set. 
>
>use encoding "X" has a similar effect: You tell the compiler that the
>source code is in encoding X, 

How do you tell the compiler what coding to use if the encoding '' can't be
decoded? Do you have to encode the encoding. Where does it stop? I mean,
where does it begin?

-sln


------------------------------

Date: Fri, 10 Apr 2009 12:44:01 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <u38vt4969t885a42ja0h9a3rgevlkvbj9l@4ax.com>

On Fri, 10 Apr 2009 12:27:29 -0700, sln@netherlands.com wrote:

>On Fri, 10 Apr 2009 21:04:30 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>
>>On 2009-04-10 01:48, Ben Bullock <benkasminbullock@gmail.com> wrote:
>>> On Thu, 09 Apr 2009 20:53:32 +0200, Peter J. Holzer wrote:
>>>> On 2009-04-09 00:24, Ben Bullock <benkasminbullock@gmail.com> wrote:
>>>>> If you want to know whether Perl has marked a string as being UTF-8, in
>>>>> other words whether Perl thinks that a string is UTF-8 or not, the only
>>>>> way of doing this is to look at Perl's flag, using utf8::is_utf8 or
>>>>> the similar routine in Encode.
>>>> 
>>>> The (stupidly named) utf8 flag doesn't indicate whether perl thinks that
>>>> a string is UTF-8 or not.
>>>
>>> Yes, it does. If I have a string
>>>
>>> my $p = "XYZ"; 
>>>
>>> where "XYZ" are three bytes of a single character encoded as UTF-8 in the 
>>> text of the program (ie I have asked my text editor to save in the UTF-8 
>>> form), then if I have the line
>>>
>>> use utf8;
>>>
>>> at the top of my program, Perl will set the "utf8 flag" on $p
>>
>>Right. If you put "use utf8" at the top of your program you tell the perl
>>compiler that the *source code* of your program is written in UTF-8.
>>This affects variable names (you can now use a variable like "$käse" or
>>$κόσμε or $こんにちは) and string constants. The latter are
>>converted from UTF-8 to Perl's internal character string format. Since
>>these are now character strings, the (still stupidly named) utf8 flag is
>>set. 
>>
>>use encoding "X" has a similar effect: You tell the compiler that the
>>source code is in encoding X, 
>
>How do you tell the compiler what coding to use if the encoding '' can't be
>decoded? Do you have to encode the encoding. Where does it stop? I mean,
>where does it begin?
>
>-sln

Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.
Imagine Perl doing utf-32. Why then you could do Regular Expressions on
a binary stream. Byte is ok, int is slower, but rx on binary has merits.

Oh no, couldn't have that, no no...
UTF-8 it is then, can't have other choices.

-sln


------------------------------

Date: Fri, 10 Apr 2009 22:50:51 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngtvc9c.ubt.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 19:27, sln@netherlands.com <sln@netherlands.com> wrote:
> On Fri, 10 Apr 2009 21:04:30 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>use encoding "X" has a similar effect: You tell the compiler that the
>>source code is in encoding X, 
>
> How do you tell the compiler what coding to use if the encoding '' can't be
> decoded?

If the encoding cannot be decoded, then the compiler will complain and
stop:

    encoding: Unknown encoding 'X' at foo2 line 1
    BEGIN failed--compilation aborted at foo2 line 1.

Naturally, you can only use encodings which are known to the compiler.
There are quite a lot of them, so I don't think this is a serious
problem.

Or do you mean what happens if the compiler doesn't even get to the "use
encoding 'X'" line because that is encoded? This is only a problem if
you use an encoding which isn't a superset of US-ASCII (or EBCDIC on some
platforms). So you can't use UTF-16, because the extra 0x00 octets would confuse
the parser which is expecting US-ASCII, and you can't use EBCDIC on a
US-ASCII platform, but you can use UTF-8, ISO-8859-X, BIG5, euc-jp, as
long as you use only ASCII characters before the use directive (which is
easy since that should be the first line (after the shebang) anyway).
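
The difference is easy to see by serializing a directive line both ways. A
sketch with the core Encode module; the sample line "use utf8;" and the
variable names are illustrative assumptions:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $line = "use utf8;\n";               # pure ASCII source line, 10 characters

my $as_utf8  = encode('UTF-8',    $line);
my $as_utf16 = encode('UTF-16LE', $line);

# UTF-8 is a superset of US-ASCII: the octets are unchanged.
print "UTF-8 octets identical to ASCII\n" if $as_utf8 eq $line;

# UTF-16 inserts a 0x00 octet for every ASCII character.
my $nul_count = ($as_utf16 =~ tr/\0//);
printf "UTF-16LE: %d octets, %d of them NUL\n", length($as_utf16), $nul_count;
# UTF-16LE: 20 octets, 10 of them NUL
```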

	hp


------------------------------

Date: Fri, 10 Apr 2009 22:59:40 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <slrngtvcpt.ubt.hjp-usenet2@hrunkner.hjp.at>

On 2009-04-10 19:44, sln@netherlands.com <sln@netherlands.com> wrote:
> Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.

Actually, for all practical purposes, Perl character strings *are*
UTF-32. Each character is a 32-bit value.

Both UTF-16 and UTF-32 are supported for I/O, of course.

> Imagine Perl doing utf-32.

I don't have to imagine that, it does.

> Why then you could do Regular Expressions on
> a binary stream.

You can't do Regexps on streams, whether binary or not (would be nice if
we could).

You can do Regexps on *strings*, whether they are binary or text.

I don't know what that has to do with UTF-32. Binary strings consist of
octets. Treating them as UTF-32 is almost always a mistake.
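
To illustrate the two cases, here is a sketch that encodes a short text as
UTF-32 and UTF-16 with the core Encode module and then runs one regexp on the
character string and one on the octet string. The sample text and variable
names are assumptions for the example:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "a\x{20AC}";                  # two characters: 'a' and U+20AC
my $utf32 = encode('UTF-32BE', $text);    # 4 octets per character
my $utf16 = encode('UTF-16BE', $text);    # 2 octets per character here

# Regexp on a text string: matches the character U+20AC.
my $char_hit = $text =~ /\x{20AC}/ ? 1 : 0;

# Regexp on a binary string: matches the four octets 00 00 20 AC.
my $byte_hit = $utf32 =~ /\x00\x00\x20\xAC/ ? 1 : 0;

# Decoding the UTF-32 octets restores the character string.
my $round_trip_ok = decode('UTF-32BE', $utf32) eq $text ? 1 : 0;

printf "utf32=%d octets, utf16=%d octets, char_hit=%d, byte_hit=%d, round_trip=%d\n",
    length($utf32), length($utf16), $char_hit, $byte_hit, $round_trip_ok;
# utf32=8 octets, utf16=4 octets, char_hit=1, byte_hit=1, round_trip=1
```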

	hp


------------------------------

Date: Fri, 10 Apr 2009 14:20:44 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <kedvt4tvmp5baos35qpb62u7pg0p3cp8v7@4ax.com>

On Fri, 10 Apr 2009 22:59:40 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:

>On 2009-04-10 19:44, sln@netherlands.com <sln@netherlands.com> wrote:
>> Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.
>
>Actually, for all practical purposes, Perl character strings *are*
>UTF-32. Each character is a 32-bit value.
>
>Both UTF-16 and UTF-32 are supported for I/O, of course.
>
>> Imagine Perl doing utf-32.
>
>I don't have to imagine that, it does.
>
>> Why then you could do Regular Expressions on
>> a binary stream.
>
>You can't do Regexps on streams, whether binary or not (would be nice if
>we could).
>
>You can do Regexps on *strings*, whether they are binary or text.
>
>I don't know what that has to do with UTF-32. Binary strings consist of
>octets. Treating them as UTF-32 is almost always a mistake.
>
>	hp

If you can't do regexes on streams, then you can't parse XML.
I, ah, think you're missing what Unicode is.
I have already posted sometime back pack/unpack on regex streams.
I can repost the code if you need. Or you can read a few docs on it.
I doubt you'll capitulate no matter what.

perlunicode.html and some others.

-sln


------------------------------

Date: Fri, 10 Apr 2009 14:24:33 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <85evt4tffln11ct27bubb9dn8r2o48mhc4@4ax.com>

On Fri, 10 Apr 2009 22:50:51 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:

>On 2009-04-10 19:27, sln@netherlands.com <sln@netherlands.com> wrote:
>> On Fri, 10 Apr 2009 21:04:30 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>>>use encoding "X" has a similar effect: You tell the compiler that the
>>>source code is in encoding X, 
>>
>> How do you tell the compiler what coding to use if the encoding '' can't be
>> decoded?
>
>If the encoding cannot be decoded, then the compiler will complain and
>stop:
>
>    encoding: Unknown encoding 'X' at foo2 line 1
>    BEGIN failed--compilation aborted at foo2 line 1.
>
>Naturally, you can only use encodings which are known to the compiler.
>There are quite a lot of them, so I don't think this is a serious
>problem.
>
>Or do you mean what happens if the compiler doesn't even get to the "use
>encoding 'X'" line because that is encoded? This is only a problem if
>you use an encoding which isn't a superset of US-ASCII (or EBCDIC on some
>platforms). So you can't use UTF-16, because the extra 0x00 octets would confuse
>the parser which is expecting US-ASCII, and you can't use EBCDIC on a
>US-ASCII platform, but you can use UTF-8, ISO-8859-X, BIG5, euc-jp, as
>long as you use only ASCII characters before the use directive (which is
>easy since that should be the first line (after the shebang) anyway).
>
>	hp

So there is a base 'code' line. Isn't that stupid to interpret the rest of
the code in an encoding interpreted with another code? The code is then broken!

-sln


------------------------------

Date: Fri, 10 Apr 2009 14:32:57 -0700
From: sln@netherlands.com
Subject: Re: XML::LibXML UTF-8 toString() -vs- nodeValue()
Message-Id: <0jevt45a9957d0cvru7edccm91e0to1o1g@4ax.com>

On Fri, 10 Apr 2009 14:20:44 -0700, sln@netherlands.com wrote:

>On Fri, 10 Apr 2009 22:59:40 +0200, "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote:
>
>>On 2009-04-10 19:44, sln@netherlands.com <sln@netherlands.com> wrote:
>>> Utf-16 and utf-32 have merits. Unfortunately, Perl won't do that.
>>
>>Actually, for all practical purposes, Perl character strings *are*
>>UTF-32. Each character is a 32-bit value.
>>
>>Both UTF-16 and UTF-32 are supported for I/O, of course.
>>
>>> Imagine Perl doing utf-32.
>>
>>I don't have to imagine that, it does.
>>
>>> Why then you could do Regular Expressions on
>>> a binary stream.
>>
>>You can't do Regexps on streams, whether binary or not (would be nice if
>>we could).
>>
>>You can do Regexps on *strings*, whether they are binary or text.
>>
>>I don't know what that has to do with UTF-32. Binary strings consist of
>>octets. Treating them as UTF-32 is almost always a mistake.
>>
>>	hp
>
>If you can't do regexes on streams, then you can't parse XML.
>I, ah, think you're missing what Unicode is.
>I have already posted sometime back pack/unpack on regex streams.
>I can repost the code if you need. Or you can read a few docs on it.
>I doubt you'll capitulate no matter what.
>
>perlunicode.html and some others.
>
>-sln

Btw, just try to pack or un-pack UTF-16 or UTF-32.
Hey or even UTF-8 that is out of range.
Try to do regex on them next.
I did. I didn't pack/unpack utf16 or utf32.
Let me know if you can do that.
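
For reference, one way this could be attempted is sketched below, assuming the
core Encode module does the encoding and pack/unpack only handles the fixed
width code units ('N' for 32-bit and 'n' for 16-bit big-endian values). The
sample characters are arbitrary choices:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Unpack UTF-32BE octets into code points.
my @units32 = unpack 'N*', encode('UTF-32BE', "a\x{20AC}");

# Unpack UTF-16BE octets into 16-bit units; U+1D11E needs a surrogate pair.
my @units16 = unpack 'n*', encode('UTF-16BE', "\x{1D11E}");

# Pack code points into UTF-32BE octets, decode them, then run a regexp.
my $packed  = pack 'N*', 0x61, 0x20AC;
my $decoded = decode('UTF-32BE', $packed);
my $matched = $decoded =~ /^a\x{20AC}$/ ? 1 : 0;

print join(' ', map { sprintf '0x%X', $_ } @units32), "\n";   # 0x61 0x20AC
print join(' ', map { sprintf '0x%X', $_ } @units16), "\n";   # 0xD834 0xDD1E
print "matched=$matched\n";                                   # matched=1
```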

-sln


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2331
***************************************

