[24845] in Perl-Users-Digest


Perl-Users Digest, Issue: 6996 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Sep 12 14:06:33 2004

Date: Sun, 12 Sep 2004 11:05:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 12 Sep 2004     Volume: 10 Number: 6996

Today's topics:
    Re: "RFC": re [un]pack() (Anno Siegel)
    Re: Perl opting for double-byte chars? <shawn.corey@sympatico.ca>
    Re: Perl opting for double-byte chars? <flavell@ph.gla.ac.uk>
    Re: Perl opting for double-byte chars? $_@_.%_
    Re: Perl opting for double-byte chars? <http://joecosby.com/code/mail.pl>
    Re: Perl opting for double-byte chars? <http://joecosby.com/code/mail.pl>
    Re: Perl opting for double-byte chars? (J. Romano)
    Re: Perl opting for double-byte chars? <flavell@ph.gla.ac.uk>
    Re: Perl opting for double-byte chars? <flavell@ph.gla.ac.uk>
    Re: Perl opting for double-byte chars? <notvalid@email.com>
    Re: Xah Lee's Unixism jmfbahciv@aol.com
    Re: Xah Lee's Unixism jmfbahciv@aol.com
    Re: Xah Lee's Unixism <bm@acm.org>
    Re: Xah Lee's Unixism jmfbahciv@aol.com
    Re: Xah Lee's Unixism <steveo@eircom.net>
    Re: Xah Lee's Unixism <bm@acm.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 12 Sep 2004 15:11:17 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: "RFC": re [un]pack()
Message-Id: <ci1oul$ra6$1@mamenchi.zrz.TU-Berlin.DE>

Michele Dondi  <bik.mido@tiscalinet.it> wrote in comp.lang.perl.misc:
> On 10 Sep 2004 22:36:31 GMT, anno4000@lublin.zrz.tu-berlin.de (Anno
> Siegel) wrote:
> 
> >pack and unpack suffer from obscurity, not from a lack of flexibility.  
> 
> I totally agree with you. I find that these two functions are
> incredibly powerful and useful, but every time I need them I have to
> read the docs two or three times and despite this generally I still
> have to find the correct template by trial and error!

I think that's how most people work with them.  It is good to know
they're around, and to have a general idea what you can do with them.
If you actually need something, look it up.

[...]

> However, taking into account your, and Brian McCauley's, knowledgeable
> opinion, I think I won't bother anymore!

Then again, as Paul Graham said in his essay _Great Hackers_, "The
key to being a good hacker may be to work on what you like."

From a Perl perspective, I'd like to add "...and the wisdom to choose
what to publish on the CPAN" :)

Anno


------------------------------

Date: Sun, 12 Sep 2004 09:16:11 -0400
From: Shawn Corey <shawn.corey@sympatico.ca>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <gYX0d.93$lb5.22016@news20.bellglobal.com>

Hi,

I got caught on this one too. See perldoc perluniintro and perldoc 
perlunicode. Perl v5.8+ has a feature that automatically and silently 
converts its standard (pre-v5.8) strings into UTF-8 strings if it 
encounters a Unicode character. I haven't figured a reliable way around 
this yet but you could try:

$s = pack( 'C*', unpack( 'U*', $s ));

Alan J. Flavell wrote:
> On Sun, 12 Sep 2004, it was written:
> 
> [snip]
> 
> 
>>At some point, Perl does seem to be making the decision to alter the
>>data which I am pulling from the database, changing the particular
>>character
> 
> 
> So write and instrument a small test case, small enough to be posted 
> here (minus the database itself, OK) with some sample printouts of the 
> data at the various points in the processing, preferably in 
> hexadecimal (any attempt to splatter 8-bit characters into a Usenet 
> posting usually turns into a failure to communicate, in my 
> experience).
> 
> 
>>from an 8-bit value to a 16-bit value.
> 
> 
> This may seem like hair splitting, but what you exhibited so far 
> appeared to be a utf-8 character.  Which in this case consisted of two 
> octets (bytes), but that's not the same thing as "a 16-bit value".
> 
> 
>>The job at hand for me is to make it stop doing this.
> 
> 
> Possibly.  That depends on what range of characters you hope to be 
> able to handle in your system.  But let's try to understand where 
> we're at, before discussing where to go from there.
> 
> 
>>As you and the preceding person have pointed out, I don't know
>>everything there is to know about character encodings.  I apologize if
>>I have caused any confusion in describing character encoding
>>incorrectly.
> 
> 
> Oh, it's quite normal...  Naturally I'd urge you to take time to learn 
> a bit more about it, believing - as I do - that it'll save you effort 
> later; but as it's one of my specialist subjects, "I would say that, 
> wouldn't I?"...
> 
> 
>>I would appreciate any pointers you might have on where would be a
>>good place to start looking at system variables to find the relevant
>>environment variables, 
> 
> 
> man printenv
> man locale
> 
> (assuming unix-family OS),
> 
> 
>>but it does seem clear enough, assuming I am
>>understanding the code I am looking at, that Perl is changing a text
>>value 
> 
> 
> Perl doesn't magically "change text values": it handles text in the 
> way that it thinks it's been asked to handle it.
> 
> My feeling is that, sooner rather than later, you're going to need 
> this stuff anyway, so I'd start on perldoc perluniintro and then
> perldoc perlunicode (or the links near the foot of the index page 
> http://www.perldoc.com/perl5.8.0/pod.html or whichever version you are 
> using).
> 
> But if you're determined that you just want to get utf8 out of the way 
> for the moment, and you're sure you'll never be showing Perl a 
> character outside of the iso-8859-1 range, then look for discussions 
> on apparent incompatibilities between RedHat 9 and Perl 5.8, which 
> discuss how RedHat's introduction of utf8 into the locale caused Perl 
> to switch into its Unicode mode, and how to take it out again (I don't 
> have the details at my fingertips right now, sorry).
> 
> 
>>It seems, as far as I can tell, as if that is something I will need to
>>solve within Perl.  Maybe I am mistaken, but I don't see how the
>>operating system is going to make a decision to force data inside a
>>Perl application to alter based on its active character encoding
>>setup.
> 
> 
> Oh, but it does.  At least in 5.8.0.  Google for "redhat perl 5.8.0 
> utf8 locale" (without the quotes) and read the first few links, I 
> think they'll help.
> 
> 
>>If how Perl makes the decision to change the 8-bit value to a 16-bit 
>>value
> 
> 
> Please stop saying "16 bit value"; it's sure to cause confusion 
> somewhere down the line.  What you're talking about here is a 
> character stored in Perl's native unicode format, which is utf-8: this 
> particular character happens to occupy two bytes in storage, but it's 
> not useful to talk about it as a "16-bit value", and it risks 
> confusing it with utf-16 format (which is the OS's native storage 
> format on Windows NT-based systems, by the way, and commonly used also 
> for storing unicode characters in databases).
> 
> good luck



------------------------------

Date: Sun, 12 Sep 2004 16:01:45 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <Pine.LNX.4.61.0409121525180.4476@ppepc56.ph.gla.ac.uk>


A: No!

On Sun, 12 Sep 2004, Shawn Corey blurted out atop a fullquote:

> I got caught on this one too. 

Are you sure it was the same?

> See perldoc perluniintro and perldoc perlunicode.

Yup, good advice, already offered.

> Perl v5.8+ has a feature that automatically and silently converts 
> its standard (pre-v5.8) strings into UTF-8 strings if it encounters 
> a Unicode character.

If by "a Unicode character" you mean one whose code value is greater 
than 255, then you're right; but we've been given no evidence here 
that such a character has been involved.  The only "interesting" 
character under discussion has been one which fell into the range 
occupied by printable characters in iso-8859-1, namely 160-255 
decimal.

Perl 5.8 would only have "upgraded" that to utf8 if it had been
given cause to do so.  In 5.8.0, one such cause is the presence
of utf-8 in the locale.  See also the discussion in
http://use.perl.org/articles/03/09/26/2231256.shtml?tid=6 , or
http://twiki.org/cgi-bin/view/Codev/UsingPerl58OnRedHat8 , or
the various other articles that pop up when one tries the search that 
I had suggested.
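
A minimal sketch of the upgrade behaviour described above, assuming Perl 5.8.1 or later (where the diagnostic function utf8::is_utf8 is available). The sample strings are illustrative assumptions, not data from the original poster's database. It shows that a string holding only characters below 256 stays in byte storage until it is combined with a character above 255, and that the upgrade changes the storage format but not the character values.

```perl
use strict;
use warnings;

my $latin1 = "m\xE1s";       # three characters, all below 256
my $wide   = "\x{263A}";     # one character above 255
my $joined = $latin1 . $wide;

# utf8::is_utf8 reports the internal storage flag; it is a diagnostic,
# not something application logic should normally depend on.
printf "latin1 flagged: %s\n", utf8::is_utf8($latin1) ? "yes" : "no";   # no
printf "joined flagged: %s\n", utf8::is_utf8($joined) ? "yes" : "no";   # yes

# The a-acute is still character 225 after the upgrade.
printf "length %d, second char %d\n",
    length($joined), ord(substr($joined, 1, 1));    # length 4, second char 225
```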

My hunch is that's what happened.  Maybe I'll be proved wrong; we'll 
see.

> I haven't figured a reliable way around this yet 

(which suggests you haven't read the relevant perldocs closely enough)

There are various approaches, depending on what your problem field is
and what you're trying to achieve.

If you force the old behaviour, then you can get what you'd have been 
accustomed to before, and you won't suffer the overhead of Perl 
processing Unicode; but you'll cut yourself off from the ability to 
process a fuller range of characters, writing systems etc.

If you learn how to work with Unicode - and your database /also/ knows 
how to work with it - then you can write software that can handle 
writing systems which are way outside of mere Latin 1; but you may 
incur some processing overhead due to the extra work of Perl handling 
Unicode characters.

With care, code can be written such that the overhead only cuts in 
when characters outside of the iso-8859-1 repertoire are used. Thus 
getting the best of both worlds - without having to write messy 
dual-path code, because Perl takes care of it for you (if you're 
asking it right).

In general I'd say (except perhaps for diagnostic purposes), if you're 
messing around with packing and unpacking characters, then you're 
doing it wrong.  The key is to grasp Perl's character representation 
model, and to work *with* it, not to fight it with hand-packed and 
-unpacked representations.

This assumes that your code only needs to run on >= 5.8.0.  If you're 
writing code meant to be runnable on older Perls, then you have to put 
quite a lot more care into the task of producing something compatible.

ttfn

Q: Should I put my Usenet response on the top of a quote of the entire 
previous posting?

http://www.faqs.org/docs/jargon/T/top-post.html



------------------------------

Date: Sun, 12 Sep 2004 16:06:10 GMT
From: $_@_.%_
Subject: Re: Perl opting for double-byte chars?
Message-Id: <St_0d.1608$_53.575@trndny02>


Bëelphazoar <http://joecosby.com/code/mail.pl> wrote in message-id:
<ofr7k0h53p61baobrh4ubfncba5g9rdk2s@4ax.com>
>
>On Sun, 12 Sep 2004 03:00:11 GMT, $_@_.%_ wrote:
>
>>
>>Bëelphazoar <http://joecosby.com/code/mail.pl> wrote in message-id:
>><9i57k0hfs4ov5orh4cji217f55icn6lnrq@4ax.com>
>>>
>>>
>>>I am working on a problem, I have text in a database which includes
>>>the word "más".  The "á" is ASCII value 225/E1 .
>>>
>>>It is definitely this inside the database.
>>>
>>>The code pulls the text out of the database and assigns it to a
>>>variable, but when I print the variable it is now "mÃ¡s", the "á" has
>>>been replaced by C3A1 .
>>>
>>>I am PRETTY sure that this is not happening within the code I am
>>>working on, if I am following the code flow correctly it looks like it
>>>does nothing but pull the text from the database and pass it back.
>>>
>>>Digging around in various Perl docs, I found some references which say
>>>that Perl will decide whether to use double-byte for chars > 127, it
>>>looks like that is what's happening here.
>>>
>>>I tried doing this:
>>>
>>>use bytes;
>>>$myVar = pullTextFromDb();
>>>no bytes;
>>>
>>>but I still got the double-byte translation.
>>>
>>>Does anybody have any pointers about how to proceed further debugging
>>>this?
>>>
>>
>>Try perldoc encode
>>
>>
>>
>
>No documentation found for "encode".
>

It may be worth your looking for this doc on the web then.
Here are a couple examples copied from perldoc encode:

For example, to convert a string from Perl's internal format to
iso-8859-1 (also known as Latin1),

  $octets = encode("iso-8859-1", $string);

For example, to convert ISO-8859-1 data to a string in Perl's internal
format:

  $string = decode("iso-8859-1", $octets);

Using PerlIO

open my $in,  "<:encoding(shiftjis)", $infile  or die;
open my $out, ">:encoding(euc-jp)",   $outfile or die;
while(<$in>){ print $out $_; }
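
A hedged round-trip sketch of the encode and decode calls above, using the thread's own sample word. The variable names are illustrative assumptions and the octets are typed in by hand rather than fetched from a database.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "más" as it would sit in an iso-8859-1 column: three octets.
my $latin1_octets = "m\xE1s";

# Octets -> characters: say how the incoming bytes are encoded.
my $string = decode("iso-8859-1", $latin1_octets);

# Characters -> octets: choose the outgoing encoding explicitly.
my $utf8_octets = encode("UTF-8", $string);
my $back_again  = encode("iso-8859-1", $string);

printf "characters: %d\n", length($string);               # 3
printf "as utf-8:   %s\n", unpack("H*", $utf8_octets);    # 6dc3a173
printf "as latin-1: %s\n", unpack("H*", $back_again);     # 6de173
```

The same three characters come out as four octets in utf-8 and three in iso-8859-1, which matches the C3A1 versus E1 difference reported in the original question.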

HtH




------------------------------

Date: Sun, 12 Sep 2004 09:28:13 -0700
From: Bëelphazoar <http://joecosby.com/code/mail.pl>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <h7u8k05u2ke24ngua79o8gm0b5p9b66vtl@4ax.com>

On Sun, 12 Sep 2004 16:06:10 GMT, $_@_.%_ wrote:

>
>Bëelphazoar <http://joecosby.com/code/mail.pl> wrote in message-id:
><ofr7k0h53p61baobrh4ubfncba5g9rdk2s@4ax.com>
>>
>>On Sun, 12 Sep 2004 03:00:11 GMT, $_@_.%_ wrote:
>>
>>>
>>>Bëelphazoar <http://joecosby.com/code/mail.pl> wrote in message-id:
>>><9i57k0hfs4ov5orh4cji217f55icn6lnrq@4ax.com>
>>>>
>>>>
>>>>I am working on a problem, I have text in a database which includes
>>>>the word "más".  The "á" is ASCII value 225/E1 .
>>>>
>>>>It is definitely this inside the database.
>>>>
>>>>The code pulls the text out of the database and assigns it to a
>>>>variable, but when I print the variable it is now "mÃ¡s", the "á" has
>>>>been replaced by C3A1 .
>>>>
>>>>I am PRETTY sure that this is not happening within the code I am
>>>>working on, if I am following the code flow correctly it looks like it
>>>>does nothing but pull the text from the database and pass it back.
>>>>
>>>>Digging around in various Perl docs, I found some references which say
>>>>that Perl will decide whether to use double-byte for chars > 127, it
>>>>looks like that is what's happening here.
>>>>
>>>>I tried doing this:
>>>>
>>>>use bytes;
>>>>$myVar = pullTextFromDb();
>>>>no bytes;
>>>>
>>>>but I still got the double-byte translation.
>>>>
>>>>Does anybody have any pointers about how to proceed further debugging
>>>>this?
>>>>
>>>
>>>Try perldoc encode
>>>
>>>
>>>
>>
>>No documentation found for "encode".
>>
>
>It may be worth your looking for this doc on the web then.
>Here are a couple examples copied from perldoc encode:
>
>For example, to convert a string from Perl's internal format to
>iso-8859-1 (also known as Latin1),
>
>  $octets = encode("iso-8859-1", $string);
>
>For example, to convert ISO-8859-1 data to a string in Perl's internal
>format:
>
>  $string = decode("iso-8859-1", $octets);
>
>Using PerlIO
>
>open my $in,  "<:encoding(shiftjis)", $infile  or die;
>open my $out, ">:encoding(euc-jp)",   $outfile or die;
>while(<$in>){ print $out $_; }
>
>HtH
>

Thanks very much, do you know what package/module encode is in, off
the top of your head?

It may not be in the installation I'm using

-- 
Bëelphazoar
International Satanic Conspiracy
Customer Support Specialist
http://joecosby.com/ 
       Always do sober what you said you'd do drunk.
          That will teach you to keep your mouth shut.
                - Ernest Hemingway 
 


------------------------------

Date: Sun, 12 Sep 2004 09:33:25 -0700
From: Bëelphazoar <http://joecosby.com/code/mail.pl>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <ugu8k0d49hn6i2kd07399mnodn2mf69l04@4ax.com>


Thanks for your help Alan and Shawn, I think you have given me enough
to work with, I will post back if the leads you've presented don't
resolve the issue I'm getting.

-- 
Bëelphazoar
International Satanic Conspiracy
Customer Support Specialist
http://joecosby.com/ 
You mystics are a sorry lot, always whimping about so-and-so's "ego"
getting in the way of their "detachment." Take it to alt.zen.ego-death,
for the love of pete! This is alt.MAGICK. 
 


------------------------------

Date: 12 Sep 2004 10:11:56 -0700
From: jl_post@hotmail.com (J. Romano)
Subject: Re: Perl opting for double-byte chars?
Message-Id: <b893f5d4.0409120911.794b5261@posting.google.com>

Bëelphazoar <http://joecosby.com/code/mail.pl> wrote in message news:<9i57k0hfs4ov5orh4cji217f55icn6lnrq@4ax.com>...
>
> I am working on a problem, I have text in a database which
> includes the word "más".  The "á" is ASCII value 225/E1 .


Dear Joe,

   It will help a lot if you give us the output of "perl -v".  I'm
sure Unicode has something to do with your problem, but Unicode
support has been changing (updating) in recent versions of Perl. 
Without knowing the version of Perl you're using and the platform
you're using it on, we can only guess as to what the problem is.

  By the way, are you SURE that "á" is the extended ASCII value 225? 
According to one source I have, it is extended ASCII value 160.  Maybe
we're using different code pages, but it's worth checking.

> ASCII only defines the low 7 bits, which are the same
> character representations in most english-based code
> pages.
>
> In addition to ASCII there is unicode, which is 16-bit,
> and which, somewhere in my application, is apparently
> being used when the "á" is used because it is greater
> than 127.

   You're wrong about Unicode being 16-bit.  That's a myth.  It CAN be
encoded in two bytes (16 bits), but it can also be encoded using a
different method called UTF-8 (which is what Perl normally uses
internally).  The UTF-8 encoding uses variable-length character
encoding, which means that a character can be encoded in one to six
bytes.  In your case, the character whose value is greater than 127 is
being encoded in two bytes, whereas the other characters (< 128) are
being encoded in one byte.

   Understand?  If you don't, here's a great link to an FAQ I used to
understand more about how Unicode is encoded:

   http://www.cl.cam.ac.uk/~mgk25/unicode.html

You may also want to check the following perldocs (which, depending on
your version of Perl, you may or may not have all of):

   perldoc Encode
   perldoc perluniintro
   perldoc Unicode::String

> The code pulls the text out of the database and
> assigns it to a variable, but when I print the
> variable it is now "mÃ¡s", the "á" has been
> replaced by C3A1 .

   This certainly looks to me like UTF-8 Unicode encoding, but let's
check just to make sure:

According to the FAQ (whose link I mentioned above), a Unicode
character value can be UTF-8 encoded using one to six bytes:

1: 0xxxxxxx
2: 110xxxxx 10xxxxxx
3: 1110xxxx 10xxxxxx 10xxxxxx
4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
5: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
6: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

where "x" is a bit that stands for the Unicode value.

0xC3A1 is two bytes long.  Its bit representation is:

   11000011 10100001

So when you apply the 2-byte bit pattern to it:

   110xxxxx 10xxxxxx

the "x"s represent the bits:   00011 100001

Put them together and you get 11100001 which is the binary
representation of 225.  Therefore, we now know that character number
225, when encoded into UTF-8 encoding, results in the two bytes 0xc3
and 0xa1, which is exactly what you're seeing.
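
The bit arithmetic above can be checked mechanically. This is a small sketch that handles only the two-octet pattern; the function name decode_two_octets is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Decode one two-octet UTF-8 sequence by hand (pattern 110xxxxx 10xxxxxx).
sub decode_two_octets {
    my ($lead, $trail) = @_;
    die "not a two-octet lead byte\n" unless ($lead  & 0xE0) == 0xC0;
    die "not a continuation byte\n"   unless ($trail & 0xC0) == 0x80;
    # Five payload bits from the lead octet, six from the trailing octet.
    return (($lead & 0x1F) << 6) | ($trail & 0x3F);
}

my $code_point = decode_two_octets(0xC3, 0xA1);
printf "0xC3 0xA1 -> %d (0x%02X)\n", $code_point, $code_point;   # 225 (0xE1)
```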
   
> I am PRETTY sure that this is not happening
> within the code I am working on, if I am following
> the code flow correctly it looks like it does
> nothing but pull the text from the database and
> pass it back.

   SOMEWHERE in the code the characters greater than 127 are being
converted from extended-ASCII to UTF-8 encoding, but it's hard to say
exactly where unless I have access to the code.  Therefore, I'll leave
it up to you to figure out where it's happening.

   But even if you do find where this is happening, you will still
have to deal with the problem of converting the two-byte UTF-8
representation (of characters greater than 127) to their one-byte
extended-ASCII equivalent.  ¿Comprende?

   I'm not sure how to do this, but here are three things you can try.
 Whether or not each one works may depend on the version of Perl you
are using, so letting me know your "perl -v" output may help me out.

----------------------------------------
# Method 1:  Convince Perl that your string
#            is UTF-8 encoded:
use Encode;
$string = pullTextFromDb();
# Convince Perl that $string is in UTF-8 format:
$string = decode_utf8($string);
# Convert UTF-8 string to extended-ASCII:
$string = encode("iso-8859-1", $string);
----------------------------------------
# Method 2:  Tell Perl that $string is UTF-8
#            encoded and that you want its
#            latin1 equivalent:
use Unicode::String qw(utf8 latin1);
$string = pullTextFromDb();
$string = utf8($string)->latin1();
----------------------------------------
# Method 3:  Tell Perl to pack each character's
#            Unicode value into just one byte
#            of a larger string:
$string = pullTextFromDb();
$string = pack "C*", map ord, split //, $string;
----------------------------------------
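
To see whether a conversion actually did anything, it helps to dump the octets in hexadecimal before and after. The sketch below applies the approach of Method 1 to a hand-written sample; the literal standing in for the result of pullTextFromDb() is an assumption, since the real database call is not available here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for pullTextFromDb(): "más" already in utf-8 octets.
my $from_db = "m\xC3\xA1s";

# Interpret the octets as utf-8, then re-encode the characters as iso-8859-1.
my $chars  = decode("UTF-8", $from_db);
my $latin1 = encode("iso-8859-1", $chars);

printf "before: %s\n", unpack("H*", $from_db);   # 6dc3a173
printf "after:  %s\n", unpack("H*", $latin1);    # 6de173
```

If the "before" dump already shows 6de173, the data was never utf-8 encoded and the conversion is happening somewhere later, for example on output.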

   Try all these and see if any of them work.  Again, what works and
what doesn't work might very well depend on the version of Perl that
you're using.  Also, even if one of them does work, some other part of
your code might be converting it back to UTF-8 encoding, undo-ing the
conversion you just made.

   But it's still worth a shot to try them out.  Hopefully one of the
above three methods will work for you, and your problem will be "no
más."

   I hope this helps, Joe.

   -- Jean-Luc


------------------------------

Date: Sun, 12 Sep 2004 18:30:53 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <Pine.LNX.4.61.0409121807400.4476@ppepc56.ph.gla.ac.uk>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--616733697-1523645840-1095010253=:4476
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8BIT

On Sun, 12 Sep 2004, Tassilo v. Parseval wrote:

> Also sprach Bëelphazoar:
> 
> > On Sun, 12 Sep 2004 02:12:26 +0100, "Alan J. Flavell"
> ><flavell@ph.gla.ac.uk> wrote:

[...nothing that was quoted here...]

> > In addition to ASCII there is unicode, which is 16-bit, 
> 
> No, that's not Unicode. Unicode is foremost just a mapping between
> numbers and characters. Each character thusly has a unique number. 

Agreed.  And those numbers no longer fit into 16 bits, in general.
As you indeed imply later.

> When you talk about bits, you are really talking about encodings.

Right; and in fact Unicode has now specialised the terminology even 
further.  See chapter 2, 
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf

The abstract code point values are embodied in an "Encoding Form"
consisting of "code units" of a particular size (may be 8, 16 or 32 
bits), and then the "Encoding form" is transmitted by an "Encoding 
Scheme" which represents those units as a sequence of octets (bytes)
on a transmission channel.

Fortunately, for utf-8 that final step is one-to-one.  But the
distinction becomes important for utf-16-based and utf-32-based
encoding schemes.

> Unicode defines three encodings: UTF-(8|16|32).

Right.  Those are "Encoding Forms" in the new terminology, and they 
become the "Seven Encoding Schemes" (one utf-8, three utf-16 and
three utf-32), as shown in Table 2-3 in chapter 2.
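
A short sketch of why the scheme matters for utf-16 but not for utf-8, assuming the Encode module shipped with Perl 5.8 and later. It encodes the a-acute from this thread under four of the seven schemes and prints the resulting octets in hex; the hash name is illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\xE1";    # U+00E1, small a-acute

my %octets;
for my $scheme ("UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16") {
    $octets{$scheme} = unpack("H*", encode($scheme, $char));
    printf "%-8s %s\n", $scheme, $octets{$scheme};
}
# UTF-8    c3a1
# UTF-16BE 00e1
# UTF-16LE e100
# UTF-16   feff00e1   (Encode writes a byte order mark, then big-endian)
```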

> Perl internally uses UTF-8 which is a variable width encoding 
> meaning a character can have anything between one and four bytes.

Indeed.  The original algorithm which defined utf-8 could have 
represented code point values up to 7fff ffff (which needs 6 octets in 
encoded form); but Unicode has stated that no characters will be 
defined beyond 0010 ffff, and thus 4 octets are now sufficient.  
rfc3629 obsoletes 2279 ("film at 11").
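
The four-octet limit can be illustrated with a small sketch. The helper utf8_octet_count is hypothetical, it ignores the surrogate range, and the sample code points are arbitrary choices; the loop cross-checks the prediction against what the Encode module produces.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Octets needed for a code point under the 0010 ffff ceiling of RFC 3629.
sub utf8_octet_count {
    my ($cp) = @_;
    die "outside the Unicode range\n" if $cp < 0 || $cp > 0x10FFFF;
    return $cp < 0x80 ? 1 : $cp < 0x800 ? 2 : $cp < 0x10000 ? 3 : 4;
}

my @mismatches;
for my $cp (0x41, 0xE1, 0x20AC, 0x1D11E) {
    my $actual = length(encode("UTF-8", chr($cp)));
    push @mismatches, $cp unless $actual == utf8_octet_count($cp);
    printf "U+%04X predicted %d, Encode produced %d\n",
        $cp, utf8_octet_count($cp), $actual;
}
printf "U+10FFFF needs %d octets\n", utf8_octet_count(0x10FFFF);   # 4
```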

> This distinction sounds like hairsplitting, but it's crucial if you ever
> want to understand what Unicode is about and how to use it properly.

Agreed.  The hardest part is un-learning things which used to seem 
obvious!

all the best

(No offence meant - just trying to build on what you had already
posted.)
--616733697-1523645840-1095010253=:4476--


------------------------------

Date: Sun, 12 Sep 2004 18:43:11 +0100
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <Pine.LNX.4.61.0409121831550.4476@ppepc56.ph.gla.ac.uk>

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--616733697-959082906-1095010991=:4476
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8BIT

On Sun, 12 Sep 2004, J. Romano wrote:

>   By the way, are you SURE that "á" is the extended ASCII value 225? 

There is no such thing as "extended ASCII", so the question is moot.

There are large numbers of 8-bit character codings which have
ASCII as their low half.  The one that's used in polite Latin-1 
circles is iso-8859-1, in which 225 decimal is small a-acute.

> According to one source I have, it is extended ASCII value 160.

That would be the old MS-DOS encodings, such as CP-437 (for US 
residents) or CP-850 (for the Latin-1 locale).  Dinosaurs.
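
The two byte values can be reconciled with a quick check, assuming the Encode module and its bundled cp437 and cp850 tables. The sketch decodes byte 225 as iso-8859-1 and byte 160 as each of the DOS code pages; the hash names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octet that each coding uses for small a-acute, as claimed in the thread.
my %byte_for = (
    "iso-8859-1" => "\xE1",    # 225 decimal
    "cp437"      => "\xA0",    # 160 decimal
    "cp850"      => "\xA0",    # 160 decimal
);

my %code_point;
for my $encoding (sort keys %byte_for) {
    my $octet = $byte_for{$encoding};
    $code_point{$encoding} = ord(decode($encoding, $octet));
    printf "%-10s byte %3d -> U+%04X\n",
        $encoding, ord($octet), $code_point{$encoding};
}
```

All three lines should report U+00E1: the character is the same, only the byte assigned to it differs between codings.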

> so letting me know your "perl -v" output may help me out.

Good advice, indeed! 

[some useful diagnostic suggestions snipped]

[but please, let's hear no more of this mythical "extended ASCII"
character code.]
--616733697-959082906-1095010991=:4476--


------------------------------

Date: Sun, 12 Sep 2004 17:54:41 GMT
From: Ala Qumsieh <notvalid@email.com>
Subject: Re: Perl opting for double-byte chars?
Message-Id: <B301d.19385$qY6.11485@newssvr29.news.prodigy.com>

Bëelphazoar wrote:
> Thanks very much, do you know what package/module encode is in, off
> the top of your head?
> 
> It may not be in the installation I'm using

	perldoc Encode

case matters except on broken OSes.
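
To check whether the module itself is present in a given installation, something like the following sketch may help. It only assumes that Encode, if installed, can be loaded by name; the variable name is illustrative.

```perl
use strict;
use warnings;

# Try to load Encode at run time instead of failing at compile time.
my $have_encode = eval { require Encode; 1 } ? 1 : 0;

if ($have_encode) {
    printf "Encode %s is installed\n", Encode->VERSION;
}
else {
    print "Encode is not installed in this Perl\n";
}
```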

--Ala



------------------------------

Date: Sun, 12 Sep 04 11:49:48 GMT
From: jmfbahciv@aol.com
Subject: Re: Xah Lee's Unixism
Message-Id: <41444b11$0$6932$61fed72c@news.rcn.com>

In article <1549.748T655T9283520@kltpzyxm.invalid>,
   "Charlie Gibbs" <cgibbs@kltpzyxm.invalid> wrote:
>In article <opsd2vlvy7pqzri1@mjolner.upc.no>, john.thingstad@chello.no
>(John Thingstad) writes:
>
>>On Thu, 09 Sep 04 13:12:17 GMT, <jmfbahciv@aol.com> wrote:
>>
>>> I really want to know.  People keep saying this but never say which
>>> freedoms have been lost.
>>
>>Since this is somewhat related to computer programming and AI I will
>>reply.
>>
>>The US has started an initiative to integrate all information about
>>people in the USA into a central database.
>
>Not just people in the USA.
>
>>This includes confidential information like your medical files.
>
><snip>
>
>>The main challenge in computing is sieving through the amount of data.
>>Politically it is to pressure the foreign governments to waive their
>>privacy protection acts and allow unlimited access to information to
>>a foreign power.
>
>It's been revealed that here in British Columbia (that part of
>Canada on the Pacific coast for those of you who are geographically
>challenged), management of medical information has been farmed out
>to a subsidiary of a U.S. corporation.

I'll bet one of your pennies that the subsidiary has farmed it
back out of the country.

> ..  According to the Patriot Act,
>the U.S. government is entitled to access these files, and anyone -
>American or Canadian - who so much as mentions that they're doing it
>can be thrown into a U.S. jail.

[emoticon daydreams about certain talking heads getting caught]

>
>>Don't know what you think of this but it scares the hell out of me!
>
>Me too.

Sure.  But the whole thing becomes moot if western civ is gone.
There are other things getting put into law and custom by 
politicians that are even scarier but there won't be any chance
of rectifying rabid Republican brain damage if there isn't
any civ left.

/BAH


Subtract a hundred and four for e-mail.


------------------------------

Date: Sun, 12 Sep 04 11:51:17 GMT
From: jmfbahciv@aol.com
Subject: Re: Xah Lee's Unixism
Message-Id: <41444b6a$0$6932$61fed72c@news.rcn.com>

In article <4141c830$0$65574$a1866201@newsreader.visi.com>,
   Grant Edwards <grante@visi.com> wrote:
>On 2004-09-10, Alan Balmer <albalmer@att.net> wrote:
>
>>>It's been revealed that here in British Columbia (that part of
>>>Canada on the Pacific coast for those of you who are geographically
>>>challenged), management of medical information has been farmed out
>>>to a subsidiary of a U.S. corporation.  According to the Patriot Act,
>>>the U.S. government is entitled to access these files, and anyone -
>>>American or Canadian - who so much as mentions that they're doing it
>>>can be thrown into a U.S. jail.
>>
>> Can you point to the relevant section(s) of the Act?
>>
>> Can you point to the international agreement which allows Canadian
>> citizens to be thrown into US jails for the stated offense?
>
>I know I shouldn't reply to threads like this, but I just can't
>help it...
>
>What makes you think that the current US government gives a
>shit about international agreements?  Bush thinks he's entitled
>to declare anybody and everybody an "enemy combatant" and lock
>them up in secret forever.  

Would you rather he do like Italy?  They are letting them go.
Then these released people go blow up something else.

/BAH

Subtract a hundred and four for e-mail.


------------------------------

Date: Sun, 12 Sep 2004 17:21:55 +0300
From: Bulent Murtezaoglu <bm@acm.org>
Subject: Re: Xah Lee's Unixism
Message-Id: <87zn3v7ekc.fsf@p4.internal>

>>>>> "jmf" == jmfbahciv  <jmfbahciv@aol.com> writes:
    jmf>    Grant Edwards <grante@visi.com> wrote:
    >> ...  Bush thinks he's entitled
    >> to declare anybody and everybody an "enemy combatant" and lock
    >> them up in secret forever.

    jmf> Would you rather he do like Italy?  They are letting them go.
    jmf> Then these released people go blow up something else. [...]

Why are those the only two choices?  Do you think people turn into 
bomb-wielding terrorists by feat of mere suspicion?

I don't think the US abuses the 'enemy combatant' device as much as we
fear, yet.   But if the people in the US are convinced that the choice is
between getting blown up and secret detentions w/o judicial oversight
then it will get far worse than we fear.  

I am beginning to think the US gov't and populace alike might be
believing the "they hate us for our freedoms" line and trying to get rid 
of the said freedoms in the hope that it will appease the terrorists.

Look, what is to prevent your government from putting cuffs on me and 
shipping me off to a dungeon the next time I am in the US because of 
the sentence above?  Would I see a judge?  Lawyer?  Would anybody even 
know?  Are you guys truly scared enough to sanction this kind of behaviour 
from your gov't?  

cheers,

BM



------------------------------

Date: Sun, 12 Sep 04 13:32:50 GMT
From: jmfbahciv@aol.com
Subject: Re: Xah Lee's Unixism
Message-Id: <41446336$0$6925$61fed72c@news.rcn.com>

In article <87zn3v7ekc.fsf@p4.internal>,
   Bulent Murtezaoglu <bm@acm.org> wrote:
>>>>>> "jmf" == jmfbahciv  <jmfbahciv@aol.com> writes:
>    jmf>    Grant Edwards <grante@visi.com> wrote:
>    >> ...  Bush thinks he's entitled
>    >> to declare anybody and everybody an "enemy combatant" and lock
>    >> them up in secret forever.
>
>    jmf> Would you rather he do like Italy?  They are letting them go.
>    jmf> Then these released people go blow up something else. [...]
>
>Why are those the only two choices?  Do you think people turn into 
>bomb-wielding terrorists by feat of mere suspicion?

Oh, sigh!  [emoticon begins to hit head against wall because
it feels better]


>
>I don't think the US abuses the 'enemy combatant' device as much as we
>fear, yet. 

Hint..the US isn't abusing enemy combatants.

> ...  But if the people in the US are convinced that the choice is
>between getting blown up and secret detentions w/o judicial oversight
>then it will get far worse than we fear. 

WHAT SECRET DETENTIONS?
 
>
>I am beginning to think the US gov't and populace alike might be
>believing the "they hate us for our freedoms" line and trying to get rid 
>of the said freedoms in the hope that it will appease the terrorists.

Now there you actually made a point, but not the one you think you
did.  
>
>Look, what is to prevent your government from putting cuffs on me and 
>shipping me off to a dungeon the next time I am in the US because of 
>the sentence above? 

Too many people coming in.  As long as you don't stand up and
shout bomb or make a fool of yourself going through customs
and fill out the paperwork without trying to be a smartass,
I don't see people who are already overworked and stretched
thin bothering with you.


> .. Would I see a judge?  Lawyer?

I don't know.  I had understood that, if you didn't get
through customs, you were put back on a plane out of the 
country.

> ...  Would anybody even 
>know?  

Yes.  Lots of people.

> ..Are you guys truly scared enough to sanction this kind of behaviour 
>from your gov't?  

If you are a terrorist with the intent to wreak death and
destruction in this country, I sure as hell hope somebody
doesn't let you in.  

/BAH

Subtract a hundred and four for e-mail.


------------------------------

Date: Sun, 12 Sep 2004 13:48:00 +0100
From: Steve O'Hara-Smith <steveo@eircom.net>
Subject: Re: Xah Lee's Unixism
Message-Id: <20040912134800.10ab1a64.steveo@eircom.net>

On Sat, 11 Sep 2004 23:30:28 +0200
lin8080 <lin8080@freenet.de> wrote:

> 
> 
> Steve O'Hara-Smith schrieb:
>  
> >         One thing I always found amusing is the amount of science *fiction*
> > written in the first half of this period about what would happen if the
> > worlds computers became linked together.
> 
> Hi,
> as long as nothing new goes in, nothing. Maybe we read yesterdays
> papers?

	Most of the SF carried the assumption that once a certain level of
complexity was reached the network would "wake up" as some kind of self-aware
intelligence.

-- 
C:>WIN                                      |   Directable Mirror Arrays
The computer obeys and wins.                | A better way to focus the sun
You lose and Bill collects.                 |    licences available see
                                            |    http://www.sohara.org/


------------------------------

Date: Sun, 12 Sep 2004 20:04:19 +0300
From: Bulent Murtezaoglu <bm@acm.org>
Subject: Re: Xah Lee's Unixism
Message-Id: <87pt4r771o.fsf@p4.internal>

>>>>> "jmf" == jmfbahciv  <jmfbahciv@aol.com> writes:
[...]
    jmf> Would rather he do like Italy?  They are letting them go.
    jmf> Then these released people go blow up something else. [...]
    bm> Why are those the only two choices?  Do you think people turn
    bm> into bomb-wielding terrorists by feat of mere suspicion?

    jmf> Oh, sigh!  [emoticon begins to hit head against wall because
    jmf> it feels better]

I didn't mean to upset you.  But sigh indeed.  Offtopic in all groups 
too.  Maybe we should get jailed?  Who knows _what else_ we might be 
up to?  Can't be too cautious these days.  What color was that alert 
now?  Better call the authorities.

    bm> I don't think the US abuses the 'enemy combatant' device as
    bm> much as we fear, yet.

    jmf> Hint..the US isn't abusing enemy combatants.

Um, I said 'the enemy combatant device' not the people themselves.
There's no doubt that the people themselves are being abused.  That's
the whole point of a separate status, no?  I thought the 'enemy
combatant' designation was devised to go around both the US law, and
the Geneva Convention pertaining to POWs.  As for the _US_ doing it, 
yes you are correct, the nation itself isn't doing it.  Indeed the 
whole reason for the invention of this odd locution was the thought 
that the nation would have expected its gov't to at least appear 
to stay within certain boundaries.  Maybe they needn't have bothered?  
 
    >> ...  But if the people in the US are convinced that the choice
    >> is between getting blown up and secret detentions w/o judicial
    >> oversight then it will get far worse than we fear. [...]

    jmf> WHAT SECRET DETENTIONS?
 
Responding in "hints" and ALL CAPS brings us to the ludicrous situation
where a Turk gets to give a pointer to the ACLU to an American:

http://www.aclu.org/SafeandFree/SafeandFree.cfm?ID=13079&c=207

;) 

cheers,

BM


    >> I am beginning to think the US gov't and populace alike might
    >> be believing the "they hate us for our freedoms" line and
    >> trying to get rid of the said freedoms in the hope that it will
    >> appease the terrorists.

    jmf> Now there you actually made a point, but not the one you
    jmf> think you did.

Let's hear it.  

    >> Look, what is to prevent your government from putting cuffs on
    >> me and shipping me off to a dungeon the next time I am in the
    >> US because of the sentence above?

    jmf> Too many people coming in.  As long as you don't stand up and
    jmf> shout bomb or make a fool of yourself going through customs
    jmf> and fill out the paperwork without trying to be a smartass, I
    jmf> don't see people who are already overworked and stretched
    jmf> thin bothering with you.


    >> .. Would I see a judge?  Lawyer?

    jmf> I don't know.  I had understood that, if you didn't get
    jmf> through customs, you were put back on a plane out of the
    jmf> country.

    >> ...  Would anybody even know?

    jmf> Yes.  Lots of people.

    >> ..Are you guys truly scared enough to sanction this kind of
    >> behaviour from your gov't?

    jmf> If you are a terrorist with the intent to wreak death and
    jmf> destruction in this country, I sure as hell hope somebody
    jmf> doesn't let you in.

    jmf> /BAH

    jmf> Subtract a hundred and four for e-mail.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6996
***************************************
