[32542] in Perl-Users-Digest


Perl-Users Digest, Issue: 3807 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Nov 1 09:09:25 2012

Date: Thu, 1 Nov 2012 06:09:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 1 Nov 2012     Volume: 11 Number: 3807

Today's topics:
    Re: array <kst-u@mib.org>
    Re: array <rweikusat@mssgmbh.com>
    Re: array (Seymour J.)
    Re: lerning perl <nospam@nspam.invalid>
    Re: lerning perl <nospam@nspam.invalid>
    Re: lerning perl <jurgenex@hotmail.com>
    Re: lerning perl <justin.1210@purestblue.com>
    Re: lerning perl <hjp-usenet2@hjp.at>
        Mime::Lite module generating an error <dn.perl@gmail.com>
    Re: Mime::Lite module generating an error <rvtol+usenet@xs4all.nl>
    Re: perl and indent <uri@stemsystems.com>
    Re: perl and indent (Seymour J.)
    Re: perl and indent (Seymour J.)
    Re: perl and indent <uri@stemsystems.com>
    Re: Simple (Rookie) Question <hansmu@xs4all.nl>
    Re: Why "Wide character in print"? <hhr-m@web.de>
    Re: Why "Wide character in print"? <whynot@pozharski.name>
    Re: Why "Wide character in print"? <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 30 Oct 2012 19:59:41 -0700
From: Keith Thompson <kst-u@mib.org>
Subject: Re: array
Message-Id: <lnr4ofpcrm.fsf@nuthaus.mib.org>

"Bill Cunningham" <nospam@nspam.invalid> writes:
> Eli the Bearded wrote:
>> In comp.lang.perl.misc, Bill Cunningham <nospam@nspam.invalid> wrote:
>>>     What's wrong in this code?
>>>
>>> use strict;
>>> use warnings;
>>>
>>> my @cats=["striper","snowball"];
>>           ^                    ^
>>
>> You probably want parens there:
>>
>>  my @cats=("striper","snowball");
>
>     Oh. OK left over from C where arrays are in []. Gotta learn something 
> new.

Perl does use [] for array accesses.

Don't base your attempt to learn Perl on your knowledge of C.
It's a very different language (with some similar syntax) with a
very different memory model.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
    Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


------------------------------

Date: Wed, 31 Oct 2012 12:02:48 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: array
Message-Id: <87390uj1cn.fsf@sapphire.mobileactivedefense.com>

"Bill Cunningham" <nospam@nspam.invalid> writes:
> Eli the Bearded wrote:
>> In comp.lang.perl.misc, Bill Cunningham <nospam@nspam.invalid> wrote:
>>>     What's wrong in this code?
>>>
>>> use strict;
>>> use warnings;
>>>
>>> my @cats=["striper","snowball"];
>>           ^                    ^
>>
>> You probably want parens there:
>>
>>  my @cats=("striper","snowball");
>
>     Oh. OK left over from C where arrays are in []. Gotta learn something 
> new.

The difference is that (1, 2, 3) in list context is a so-called list
value constructor: it evaluates to a list of the values in the
parentheses. That's what you have to use when you want to initialize an
array:

my @a = ('striper', 'snowball');

Assigning a list of values to an array has the obvious semantics. In
contrast to this, [1, 2, 3] creates a reference to an anonymous array
holding the values inside the brackets. Assigning that to an array, as
in

my @a = [1, 2, 3];

assigns this reference to an anonymous array to $a[0].
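A minimal sketch of the two initializations side by side. The variable names follow the question; @nested is an illustrative name, not from the original code:

```perl
use strict;
use warnings;

# Parentheses: a list of values, so @cats gets two elements.
my @cats = ('striper', 'snowball');

# Square brackets: ONE reference to an anonymous array, so @nested
# gets a single element (that reference).
my @nested = ['striper', 'snowball'];

print scalar(@cats), "\n";      # 2
print scalar(@nested), "\n";    # 1
print ref($nested[0]), "\n";    # ARRAY

# [] is also the element-access syntax, for plain and nested arrays.
print $cats[1], "\n";           # snowball
print $nested[0][1], "\n";      # snowball
```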

[...]

>> With your code, striper and snowball are in:
>>
>>  print $cats[0][0];
>>  print $cats[0][1];
>
>     Goodness when I see this I think char **. I have to change my thinking 
> here. There are no pointers in perl are there?

There are 'references' which perform a similar function for perl. The
perlref, perlreftut, perldsc and perllol manpages explain most of
the non-OO-relevant details[*].

	[*] Since the people who presently control perl consider it
	sexy to remove technically correct documentation for political
	reasons, you might or might not have them.
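A short sketch of how references stand in for C pointers, using only core Perl. It is an illustration under those assumptions, not a full account of perlref:

```perl
use strict;
use warnings;

my @cats = ('striper', 'snowball');

# A backslash takes a reference to an existing variable.
my $ref = \@cats;

# Dereference with -> or @{...}; there is no pointer arithmetic.
push @{$ref}, 'tiger';          # modifies @cats through the reference
print $ref->[0], "\n";          # striper
print scalar(@cats), "\n";      # 3

# Perl manages the memory: an anonymous array stays alive as long as
# something still refers to it.
my $anon = ['a', 'b'];
print scalar(@$anon), "\n";     # 2
```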
        


------------------------------

Date: Mon, 29 Oct 2012 19:58:02 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: array
Message-Id: <508f261a$22$fuzhry+tra$mr2ice@news.patriot.net>

In <k6mour$c0p$1@dont-email.me>, on 10/29/2012
   at 04:29 PM, "Bill Cunningham" <nospam@nspam.invalid> said:

>There are no pointers in perl are there?

There are references.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Tue, 30 Oct 2012 11:42:29 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: Re: lerning perl
Message-Id: <k6osgu$p8h$1@dont-email.me>

Henry Law wrote:

> Done, thank you; forewarned.  Now let's see; who knows, things might
> be different.

    I've had a lot of trouble with C over the years, but I am still trying
to master it. I'm not a professional programmer, but I've read about Perl
and I'm still checking things out. I have invested so much time in C that
I'm not going to leave it. Even knowing how to program things in C I don't
know algorithms and that's needed for proper programming. I do some
programming with the Unix API; for example, I am studying sockets. Python
has been suggested to me by others over the years, and I looked at it, but
Perl just seemed better. What can I say? I try.

Bill




------------------------------

Date: Tue, 30 Oct 2012 14:54:21 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: Re: lerning perl
Message-Id: <k6p7ol$f63$1@dont-email.me>

Jürgen Exner wrote:

> Get yourself a book. If you are an experienced programmer in some
> other language then "Programming Perl" is a very good start. If you
> are new to programming itself then "Learning Perl" is often highly
> recommended.
>
>> Where can I get the meanings of the
>> functions? man open works with the unix API and man fopen with C,
>> but what about perl functions? man key for example doesn't work.
>
> Use perldoc:
> perldoc -f open
> perldoc -f keys

    Ok, my Linux distribution has perldoc in its own package. I have it
installed now.

> To learn more about perldoc use perldoc
> perldoc perldoc
>
> And for frequently asked question, e.g. about what books there are use
> perldoc -q books
>
> jue 




------------------------------

Date: Tue, 30 Oct 2012 16:55:50 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: lerning perl
Message-Id: <e2q0985tbe8tovfkdk08fss958ii1vb04e@4ax.com>

"Bill Cunningham" <nospam@nspam.invalid> wrote:
>[...] Even knowing how to program things in C I don't know 
>algorithms and that's needed for proper programming. 

In other words: you are and have been putting the cart before the horse.
Maybe you should look into learning programming first. Once you
understand that then switching to a different programming language is
usually the easy part (yes, there are exceptions). 

jue


------------------------------

Date: Wed, 31 Oct 2012 15:10:37 +0000
From: Justin C <justin.1210@purestblue.com>
Subject: Re: lerning perl
Message-Id: <dkr7m9-k7d.ln1@zem.masonsmusic.co.uk>

On 2012-10-30, Jürgen Exner <jurgenex@hotmail.com> wrote:
> "Bill Cunningham" <nospam@nspam.invalid> wrote:
>>[...] Even knowing how to program things in C I don't know 
>>algorithms and that's needed for proper programming. 
>
> In other words: you are and have been putting the cart before the horse.
> Maybe you should look into learning programming first. Once you
> understand that then switching to a different programming language is
> usually the easy part (yes, there are exceptions). 
>
> jue


I feel a certain affinity with the OP. My first
programming was punched cards (at school), after that
I taught myself BASIC on a ZX Spectrum. I went on an
'Introduction to C course' intending to take the
follow-on course but the college dropped it saying
there was no demand. And there my education stopped
until I bought Learning Perl.

I've only ever learnt programming as part of learning
a programming language. I'd never considered it
possible to learn programming without a language
being associated. Could you point me (and the OP if
he's interested) in the direction of suitable /
relevant material?


   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Thu, 1 Nov 2012 12:42:31 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: lerning perl
Message-Id: <slrnk94o17.5vl.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-31 15:10, Justin C <justin.1210@purestblue.com> wrote:
> On 2012-10-30, Jürgen Exner <jurgenex@hotmail.com> wrote:
>> "Bill Cunningham" <nospam@nspam.invalid> wrote:
>>>[...] Even knowing how to program things in C I don't know 
>>>algorithms and that's needed for proper programming. 
>>
>> In other words: you are and have been putting the cart before the horse.
>> Maybe you should look into learning programming first. Once you
>> understand that then switching to a different programming language is
>> usually the easy part (yes, there are exceptions). 
>
>
> I feel a certain affinity with the OP. My first
> programming was punched cards (at school), after that
> I taught myself BASIC on a ZX Spectrum. I went on an
> 'Introduction to C course' intending to take the
> follow-on course but the college dropped it saying
> there was no demand. And there my education stopped
> until I bought Learning Perl.
>
> I've only ever learnt programming as part of learning
> a programming language. I'd never considered it
> possible to learn programming without a language
> being associated.

I don't think you can learn programming without a language (that would
be like learning to write novels without a language).

But you can't learn programming without algorithms either.

Programming is the art of finding algorithms and expressing them in a
formal language.

If you only learn what the elements of a programming language mean but
not how to put them together, you will never be able to program. If you
aren't able to analyse a problem, to find a repeatable way to solve the
problem (= an algorithm), you won't be able to program.

Actually writing down the algorithm in a specific language is the easiest
part (although the devil can certainly be in the details), and it is
also mostly interchangeable. If you understand a problem and its solution
and you can write it down in one language, you can also write it in any
other language with a little effort. (This is also apparent in the
existence of compilers: Compilers are programs which translate from one
programming language to another. They have existed for a long time, so
that's obviously a simple problem. But there are no programs which can
really program. So that's a hard problem which needs human creativity.)

(I'm not even sure if everybody can learn to program: some people just
don't seem to have the knack.)

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Wed, 31 Oct 2012 22:43:52 -0700 (PDT)
From: "dn.perl@gmail.com" <dn.perl@gmail.com>
Subject: Mime::Lite module generating an error
Message-Id: <c6d43597-b122-4df0-b346-5aa2fa63a078@vy11g2000pbb.googlegroups.com>


I can send email from my Linux server with the 'mailx' command. Until
recently I could also send email from it using the MIME::Lite module.
Today the same, previously working script has started failing with the
error: Illegal seek.
What could be happening?

The mailx command runs fine: mailx -s "subject1 sub 2" myname@mycompany.com <aaa

cmd: ./local-send-mail-test.pl
The error:
rv = 1, Illegal seek
An email, with subject: (sample subject), has been sent to myname-etc-etc


Code as below. (You will need to edit the line marked 'edit-this-line'
before running it.)
#!/usr/local/bin/perl

use strict ;
use warnings ;

use MIME::Lite;

my $my_email = 'myname@mycompany.com' ;  ## edit-this-line
my $subject = 'sample subject' ;
my $message = "line 1, line 1" ;
my $msg_body_type = 'text' ;
my $msg = MIME::Lite->new(
        From=>$my_email,
        To => "$my_email",
        Subject => $subject,
        Type => 'multipart/related',
);
$msg->attach(
    Type => $msg_body_type,   #-# text/html or text
    Data => $message
);
my $rv = $msg->send() ;
print "rv = $rv, $! \n\nAn email, with subject: ($subject), has been sent to $my_email\n\n" ;

##=============


------------------------------

Date: Thu, 01 Nov 2012 08:28:25 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Mime::Lite module generating an error
Message-Id: <50922490$0$6844$e4fe514c@news2.news.xs4all.nl>

On 2012-11-01 06:43, dn.perl@gmail.com wrote:

> my $rv = $msg->send() ;
> print "rv = $rv, $! \n\nAn email, with subject: ($subject), has been sent to $my_email\n\n" ;

You are printing "$!" for no good reason.

It is a special global variable that can hold basically any value; it is
only meaningful immediately after a system or library call has reported
failure. You should only print it if you know that there was an error.
Here it probably just reflects something that send() tried along the way.
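A minimal sketch of that pattern, using open() on a deliberately nonexistent (hypothetical) path instead of MIME::Lite so that it runs anywhere without extra modules:

```perl
use strict;
use warnings;

# Report $! only on the failure path: its value is meaningful only
# immediately after a call that has signalled failure.
sub try_open {
    my ($path) = @_;
    if (open my $fh, '<', $path) {
        close $fh;
        return "opened $path";       # success: $! is not consulted
    }
    return "cannot open $path: $!";  # failure: now $! is relevant
}

print try_open('/nonexistent/hypothetical.txt'), "\n";
```

Applied to the script in the question, the same idea means testing $rv (the output above shows send() returning 1, which this sketch assumes indicates success) and mentioning $! only in the failure branch.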

-- 
Ruud



------------------------------

Date: Wed, 31 Oct 2012 00:27:10 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: perl and indent
Message-Id: <87625rl10h.fsf@stemsystems.com>

>>>>> "JE" == Jürgen Exner <jurgenex@hotmail.com> writes:

  >> An even more perlish alternative;
  >> process($_) for @list;

  JE> This is where _I_ would ask what is the difference to 
  JE> 	foreach (@list) {process ($_)};

no block entry on each loop iteration, which is overhead. also i would
almost never format a foreach loop on one line, so the for modifier saves
lines, pixels, {} chars and thousands of lost souls. actually fewer lines
mean fewer bugs, given the old saw of 1 bug per 100 lines on average in
any language.

i like foreach modifier a great deal and use it when i can.

uri


------------------------------

Date: Wed, 31 Oct 2012 08:22:24 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: perl and indent
Message-Id: <50912610$7$fuzhry+tra$mr2ice@news.patriot.net>

In <508f8178$0$6862$e4fe514c@news2.news.xs4all.nl>, on 10/30/2012
   at 08:27 AM, "Dr.Ruud" <rvtol+usenet@xs4all.nl> said:

>I don't see what is 'far better' here. A while on an iterator 
>would be very similar in C.

C doesn't have iterators. It does have a loop construct in which you
give initialization, increment and test clauses, but that is far more
verbose than a foreach loop in Perl.

   for (my $i = 0; $i <= $#list; $i++) {
       process($list[$i]);
   }

   foreach my $elem (@list) {
       process($elem);
   }
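A runnable sketch of the three loop forms discussed in this thread; process() is a hypothetical stand-in that just records its argument. Note the bound in the C-style version: $#list is the index of the last element, so the test must be <= $#list (or, equivalently, $i < @list).

```perl
use strict;
use warnings;

my @list = qw(a b c);
my @seen;

# Hypothetical stand-in for the real work: just record the argument.
sub process { push @seen, $_[0] }

# C-style loop: $#list is the LAST index, so the bound is <= $#list.
for (my $i = 0; $i <= $#list; $i++) {
    process($list[$i]);
}

# foreach with a named loop variable.
foreach my $elem (@list) {
    process($elem);
}

# Statement-modifier form: $_ is aliased to each element in turn.
process($_) for @list;

print "@seen\n";    # a b c a b c a b c
```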

>An even more perlish alternative;
>     process($_) for @list;

That is a foreach without an explicit variable name; it's a minor
variation on what Jürgen wrote. It's clearer when the processing is a
single line, but is problematic when the processing is a block that
might affect $_.
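A sketch of the $_ problem, assuming a hypothetical helper count_lines() that reads with a bare readline loop. Because the for modifier aliases $_ to each array element, the helper's implicit assignments to $_ overwrite the caller's data:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines in a string. The bare
# "while <$fh>" assigns to the GLOBAL $_, which the caller's loop
# has aliased to an element of @docs.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "open: $!";
    my $n = 0;
    $n++ while <$fh>;
    close $fh;
    return $n;
}

my @docs = ("x\ny\n", "z\n");
my @counts;
push @counts, count_lines($_) for @docs;

print "@counts\n";                                     # 2 1
print defined $docs[0] ? "intact\n" : "clobbered\n";   # clobbered
```

Adding local $_; at the top of the helper, or reading into a lexical variable in a while block, avoids the clobbering.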

>That comment is more 'what' than 'why',
>and that is already clear from the code.

Indeed; that's a hard point to make to students, who routinely write
comments that merely echo the code. )-:

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Wed, 31 Oct 2012 08:26:43 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: perl and indent
Message-Id: <50912713$8$fuzhry+tra$mr2ice@news.patriot.net>

In <87625rl10h.fsf@stemsystems.com>, on 10/31/2012
   at 12:27 AM, Uri Guttman <uri@stemsystems.com> said:

>actually fewer lines mean fewer bugs

I've dealt with code that had no white space not required by the
language; it wasn't pretty. I'll take my code prettyprinted, TYVM.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Wed, 31 Oct 2012 16:28:50 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: perl and indent
Message-Id: <871ugel725.fsf@stemsystems.com>

>>>>> "S(J)M" == Shmuel (Seymour J ) Metz <spamtrap@library.lspace.org.invalid> writes:

  S(J)M> In <87625rl10h.fsf@stemsystems.com>, on 10/31/2012
  S(J)M>    at 12:27 AM, Uri Guttman <uri@stemsystems.com> said:

  >> actually fewer lines mean fewer bugs

  S(J)M> I've dealt with code that had no white space not required by the
  S(J)M> language; it wasn't pretty. I'll take my code prettyprinted, TYVM.

i never said no white space. i said fewer lines when you can. foreach
modifier is one of those times. there is a balance to be found. foreach
modifier is a win for that and other reasons. 

uri


------------------------------

Date: Wed, 31 Oct 2012 10:57:32 +0100
From: Hans Mulder <hansmu@xs4all.nl>
Subject: Re: Simple (Rookie) Question
Message-Id: <5090f60c$0$6854$e4fe514c@news2.news.xs4all.nl>

On 29/10/12 03:52:10, William Humpboys wrote:
> On Sun, 28 Oct 2012 17:48:55 -0700 (PDT), Jason C
> <jwcarlton@gmail.com> wrote:
> 
>> From your description, I think you are using SSI. It's been a LONG time since I've worked with that, but I think you're looking for:
>>
>> <!--#exec cgi="cgi-bin/script_name.cgi"-->
>>
>> The path "cgi-bin/script_name.cgi" would vary based on the name and location of the script.
> 
> In my case, that becomes:
> 
> <!--#exec cgi="/find_header"-->
> 
> and generates an apache error 
> 
> "[an error occurred while processing this directive]"
> 
> http://nizkor.org/test.html

In that case, you'll want to look at the error message.
You can find it in the Apache error log.


Hope this helps,

-- HansM





------------------------------

Date: Tue, 30 Oct 2012 17:40:50 +0100
From: Helmut Richter <hhr-m@web.de>
Subject: Re: Why "Wide character in print"?
Message-Id: <alpine.LNX.2.00.1210301723510.5398@badwlrz-clhri01.ws.lrz.de>

On Mon, 29 Oct 2012, Rainer Weikusat wrote:

> But independently of that, inventing the 'Perl is an
> island!' character encoding - no matter how hypothetical - remains a
> stupid idea.

Every program is an "island" within its code. No matter what I use, I do not
normally know the internals, and if I happen to know them I should not use my
knowledge because the internals may change at any time.

Perl is not an island as far as interaction with other programs is
concerned. It is documented how to read and write byte data, and how to read
and write character data whose code and encoding is known. If desired, it is
also not really difficult to write code that tries to guess an unknown code --
with all the pitfalls such a behaviour entails.
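A minimal sketch of that documented interface using the core Encode module: bytes in a known encoding are decoded into characters, and characters are encoded back into bytes. The input bytes are the KOI8-U sample that appears later in this thread; UTF-8 as the output encoding is just an example:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Four bytes in KOI8-U (Cyrillic U+0444 U+044B U+0432 U+0430).
my $bytes = "\xC6\xD9\xD7\xC1";

my $chars = decode('koi8-u', $bytes);   # bytes -> characters
my $utf8  = encode('UTF-8', $chars);    # characters -> bytes

printf "%d bytes -> %d characters -> %d UTF-8 bytes\n",
    length($bytes), length($chars), length($utf8);
# 4 bytes -> 4 characters -> 8 UTF-8 bytes
```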

There is one interface decision perl has made: it does not by default use the
locale settings to determine the default code and encoding, rather it requires
that these be specified in the script. Opinions may be divided; I like this
decision because my experience is that often the locale settings appear to be
randomly uncorrelated to the codes actually used.

The implementation decisions that are not part of the interface, in particular
the internal representation of values of different types including strings,
concern future developers but not users.  If perl decides to store characters
internally as a 37-bit EBCDIC enhancement, it does not really bother me as
long as the program still interacts correctly with the outside world in
standardised codes.

--
Helmut Richter


------------------------------

Date: Wed, 31 Oct 2012 20:37:14 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Why "Wide character in print"?
Message-Id: <slrnk92ruq.l6a.whynot@orphan.zombinet>

with <slrnk8vb5d.g94.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2012-10-29 12:52, Eric Pozharski <whynot@pozharski.name> wrote:
>> with <slrnk8r42s.2s7.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:

*SKIP*
>> Please rewind the thread. That's exactly what happened couple of
>> posts ago (specifically: <eli$1210251546@qz.little-neck.ny.us> and
>> <vi7pl9-ui71.ln1@anubis.morrow.me.uk>).
> I've read these postings but I don't know what you are referring to.
> If you are referring to other postings (especially long ones), please
> cite the relevant part.

[quoting <eli$1210251546@qz.little-neck.ny.us> on]

  $ echo 'a' | perl -Mutf8 -wne 's/a/å/;print' | od -xc
  0000000    0ae5
          345  \n
  0000002

[quote off]

*SKIP*
>>> In UTF-8, latin-1 characters >= 0x80 are 2 bytes, the same as
>>> cyrillic characters. Your example shows this: "à" (LATIN SMALL
>>> LETTER A WITH GRAVE) is "\303\240" and "а" (CYRILLIC SMALL LETTER A)
>>> is "\320\260". 
>> No.  Because it's not UTF-8, it's utf8. 
> I presume that by "utf8" you mean a string with the UTF8 bit set
> (testable with the utf8::is_utf8() function).

If "you" above refers to me then you're wrong.

> But as I've written repeatedly, this is completely irrelevant for I/O.
> A string will be treated completely identically, whether it has this bit
> set or not. It is only the value of the string which is important, not
> its internal type and representation.

Try to read it again.  Slowly.

> (Also, I find it very confusing that you post the output of
> Devel::Peek::Dump, but then apparently don't refer to it but talk
> about something else. Please try to organize your postings in a way
> that one can understand what you are talking about.

Indeed, only FLAGS and PV are relevant.  Sadly, Devel::Peek::Dump
doesn't provide a means to filter arbitrary parts of its output (however,
that's not the purpose of D::P).  And I consider editing copy-pastes
bad taste.

*SKIP*
> Yes. We've been through that. Ben explained it in excruciating detail.
> What don't you understand here?

It's not about understanding.  I'm trying to make a point that latin1 is
special.

>>> However, for real programs, I think tying the encoding of the source
>>> code to the encoding of I/O-streams the script is supposed to handle
>>> is foolish. My scripts are always encoded in UTF-8, but they
>>> frequently have to handle files in CP-1252.
>> Mine are us-ascii, I have open.pm for rest.
> US-ASCII is a subset of UTF-8, so your files are UTF-8, too ;-). (Most
> of mine don't contain non-ASCII characters either) What I meant is that
> I don't use any other encoding (like ISO-8859-1 or ISO-8859-15) to
> encode non-ASCII characters, so I don't have any need for "use
> encoding". If your scripts are all in ASCII and you use open.pm for
> "rest", what do you need "use encoding" for?

Many years ago to get operations to work on characters instead of bytes
some strings must have been pulled.  encoding.pm pulled right strings.
utf8.pm pulled irrelevant strings.  Those days text related operations
worked for you because they fitted in latin1 script or you didn't hit
edge cases.  However I did (more years ago, in 5.6.0, B<lcfirst()>
worked *only* on bytes, no matter what).

Guess what?  I've just figured out I don't need either any more:

	{40710:255} [0:0]% xxd foo.koi8-u 
	0000000: c6d9 d7c1 0a                             .....
	{40731:262} [0:0]% perl -wle '              
	open $fh, "<:encoding(koi8-u)", "foo.koi8-u";
	read $fh, $fh, -s $fh;
	$fh =~ m{(\w\w)};
	print $1
	'                       
	Wide character in print at -e line 5.
	фы

> Remember, this subthread started when you berated Ben for discouraging
> the use "use encoding".

It is clear to me now what made you both (you and Ben) believe in the
bugginess of F<encoding.pm>.  I'm fine with that.

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Thu, 1 Nov 2012 12:16:06 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Why "Wide character in print"?
Message-Id: <slrnk94mfm.5vl.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-31 18:37, Eric Pozharski <whynot@pozharski.name> wrote:
> with <slrnk8vb5d.g94.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:
>> On 2012-10-29 12:52, Eric Pozharski <whynot@pozharski.name> wrote:
>>> with <slrnk8r42s.2s7.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:
>
> *SKIP*
>>> Please rewind the thread. That's exactly what happened couple of
                              ^^^^
>>> posts ago (specifically: <eli$1210251546@qz.little-neck.ny.us> and
>>> <vi7pl9-ui71.ln1@anubis.morrow.me.uk>).
>> I've read these postings but I don't know what you are referring to.
>> If you are referring to other postings (especially long ones), please
>> cite the relevant part.
>
> [quoting <eli$1210251546@qz.little-neck.ny.us> on]
>
>   $ echo 'a' | perl -Mutf8 -wne 's/a/å/;print' | od -xc
>   0000000    0ae5
>           345  \n
>   0000002

Then I don't understand what you meant by "that" in the quoted
paragraph, since that seemed to refer to something else.


>>>> In UTF-8, latin-1 characters >= 0x80 are 2 bytes, the same as
>>>> cyrillic characters. Your example shows this: "à" (LATIN SMALL
>>>> LETTER A WITH GRAVE) is "\303\240" and "а" (CYRILLIC SMALL LETTER A)
>>>> is "\320\260". 
>>> No.  Because it's not UTF-8, it's utf8. 
>> I presume that by "utf8" you mean a string with the UTF8 bit set
>> (testable with the utf8::is_utf8() function).
>
> If "you" above refers to me

Yes, of course. You used the term "utf8", so I was wondering what you
meant by it.

> then you're wrong.

Then I don't know what you meant by "utf8". Care to explain?


>> But as I've written repeatedly, this is completely irrelevant for I/O.
>> A string will be treated completely identical, whether is has this bit
>> set or not. It is only the value of the string which is important, not
>> its internal type and representation.
>
> Try to read it again.  Slowly.

Read *what* again? The paragraph you quoted is correct and explains the
behaviour you are seeing.


>> (Also, I find it very confusing that you post the output of
>> Devel::Peek::Dump, but then apparently don't refer to it but talk
>> about something else. Please try to organize your postings in a way
>> that one can understand what you are talking about.
>
> Indeed, only FLAGS and PV are relevant.  Sadly, Devel::Peek::Dump
> doesn't provide a means to filter arbitrary parts of its output (however,
> that's not the purpose of D::P).  And I consider editing copy-pastes
> bad taste.

That's not the problem. The problem is that you gave the output of
Devel::Peek::Dump which clearly showed a latin-1 character occupying
*two* bytes and then claimed that it was only one byte long. Which it
clearly wasn't. What you probably meant was that the latin1 character
would be only 1 byte long if written to an output stream without an
encoding layer. But you didn't write that. You just made an assertion
which clearly contradicted the example you had just given and didn't
give any indication that you had even noticed the contradiction.


>> Yes. We've been through that. Ben explained it in excruciating detail.
>> What don't you understand here?
>
> It's not about understanding.  I'm trying to make a point that latin1 is
> special.

It is only special in the sense that all its codepoints have a value <=
255. So if you are writing to a byte stream, it can be directly
interpreted as a string of bytes and written to the stream without
modification.

The point that *I* am trying to make is that an I/O stream without an
:encoding() layer isn't for I/O of *characters*, it is for I/O of
*bytes*. 

Thus, when you write the string "Käse" to such a stream, you aren't
writing upper case K, lower case umlaut a, etc. You are writing 4 bytes
with the values 0x4B, 0xE4, 0x73, 0x65. The I/O code doesn't care
whether the string is a character string (with the UTF8 bit set) or a
byte string; it just interprets every element of the string as a byte.
Those four bytes could be pixels in an image, for all the Perl I/O code
knows.
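A sketch of this behaviour, using an in-memory filehandle with no layer (an assumption made so the example is self-contained; a plain STDOUT without a layer behaves the same way). The same value is printed once as is and once after utf8::upgrade() has set the UTF8 flag:

```perl
use strict;
use warnings;

# Print $str to an in-memory handle with no :encoding layer and
# return the bytes that came out, as hex.
sub raw_bytes {
    my ($str) = @_;
    open my $fh, '>', \my $buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

my $byte_str = "K\x{E4}se";    # code points all <= 255
my $char_str = "K\x{E4}se";
utf8::upgrade($char_str);      # same value, UTF8 flag now set

print raw_bytes($byte_str), "\n";   # 4B E4 73 65
print raw_bytes($char_str), "\n";   # 4B E4 73 65
```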

OTOH, if there is an :encoding() layer, the string is taken to be
composed of (Unicode) characters. If there is an element with the
codepoint \x{E4} in the string, it is interpreted as a lower case
umlaut a, and converted to the proper encoding (e.g. one byte 0x84 for
CP850, two bytes 0xC3 0xA4 for UTF-8 and one byte 0xE4 for latin-1). But
again, this happens *always*. The Perl I/O layer doesn't care whether
the string is a character string (with the UTF8 bit set) or not.
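The same kind of sketch with an :encoding() layer on an in-memory handle (again an assumption for self-containment), printing "K\x{E4}se" in the three encodings named above:

```perl
use strict;
use warnings;

# Print $str through an :encoding() layer on an in-memory handle and
# return the resulting bytes as hex.
sub encoded_bytes {
    my ($str, $encoding) = @_;
    open my $fh, ">:encoding($encoding)", \my $buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

my $str = "K\x{E4}se";
for my $enc ('UTF-8', 'cp850', 'iso-8859-1') {
    printf "%-10s %s\n", $enc, encoded_bytes($str, $enc);
}
# UTF-8      4B C3 A4 73 65
# cp850      4B 84 73 65
# iso-8859-1 4B E4 73 65
```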


>> If your scripts are all in ASCII and you use open.pm for "rest", what
>> do you need "use encoding" for?
>
> Many years ago to get operations to work on characters instead of bytes
> some strings must have been pulled.  encoding.pm pulled right strings.
> utf8.pm pulled irrelevant strings.  Those days text related operations
> worked for you because they fitted in latin1 script or you didn't hit
> edge cases.  However I did (more years ago, in 5.6.0, B<lcfirst()>
> worked *only* on bytes, no matter what).

Perl acquired unicode support in its current form only in 5.8.0. 5.6.0
did have some experimental support for UTF-8-encoded strings, but it was
different and widely regarded as broken (that's why it was changed for
5.8.0). So what Perl 5.6.0 did or didn't do is irrelevant for this
discussion.

With some luck I managed to skip the 5.6 days and went directly from the
<=5.005 "bytestrings only" era to the modern >=5.8.0  "character
strings" era. However, in the early days of 5.8.x, the documentation was
quite bad and it took a lot of reading, experimenting and thinking to
arrive at a consistent understanding of the Perl string model.

But once you have this understanding, it is really quite simple and
consistent.

> Guess what?  I've just figured out I don't need either any more:
>
> 	{40710:255} [0:0]% xxd foo.koi8-u 
> 	0000000: c6d9 d7c1 0a                             .....
> 	{40731:262} [0:0]% perl -wle '              
> 	open $fh, "<:encoding(koi8-u)", "foo.koi8-u";
> 	read $fh, $fh, -s $fh;
> 	$fh =~ m{(\w\w)};
> 	print $1
> 	'                       
> 	Wide character in print at -e line 5.
> 	фы

This example doesn't have any non-ASCII characters in the source code,
so of course it doesn't need 'use utf8'. The only effect of 'use utf8' is
to tell the perl compiler that the source code is encoded in UTF-8.

But you *do* need some indication of the encoding of STDOUT (did you
notice the warning "Wide character in print at -e line 5."? As long as
you get this warning, your code is wrong).
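A sketch that reproduces the warning and its fix, with in-memory handles standing in for STDOUT (an assumption for self-containment). $SIG{__WARN__} is used only to count the warnings:

```perl
use strict;
use warnings;

# Print $str to an in-memory handle opened with $mode and return
# (hex bytes written, number of warnings raised).
sub print_and_count_warnings {
    my ($str, $mode) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, $mode, \my $buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $buf;
    return ($hex, scalar @warnings);
}

my $str = "\x{0444}\x{044B}";   # the two Cyrillic letters from the example

my ($raw, $raw_warn) = print_and_count_warnings($str, '>');
my ($enc, $enc_warn) = print_and_count_warnings($str, '>:encoding(UTF-8)');

print "no layer:        $raw, warnings: $raw_warn\n";   # D1 84 D1 8B, 1
print "encoding(UTF-8): $enc, warnings: $enc_warn\n";   # D1 84 D1 8B, 0
```

With the layer in place the bytes are identical, but they are now produced deliberately rather than as a fallback accompanied by a warning.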

You could use "use encoding 'utf-8'":

% perl -wle '
use encoding "UTF-8";
open $fh, "<:encoding(koi8-u)", "foo.koi8-u";
read $fh, $fh, -s $fh;
$fh =~ m{(\w\w)};
print $1
'
фы

Or you could use -C on the command line:

% perl -CS -wle '
open $fh, "<:encoding(koi8-u)", "foo.koi8-u";
read $fh, $fh, -s $fh;
$fh =~ m{(\w\w)};
print $1
'
фы


Or you could use "use open":

% perl -wle '
use open ":locale";  
open $fh, "<:encoding(koi8-u)", "foo.koi8-u";
read $fh, $fh, -s $fh;
$fh =~ m{(\w\w)};
print $1
'
фы


Note: No warning in all three cases. The last one takes the encoding from
the environment, which hopefully matches your terminal settings. So it
works on a UTF-8 or ISO-8859-5 or KOI8 terminal. But of course it
doesn't work on a latin-1 terminal, and you get an appropriate warning:

"\x{0444}" does not map to iso-8859-1 at -e line 6.
"\x{044b}" does not map to iso-8859-1 at -e line 6.
\x{0444}\x{044b}



>> Remember, this subthread started when you berated Ben for discouraging
>> the use "use encoding".
>
> It comes clear to me now what made you both (you and Ben) believe in
> bugginess of F<encoding.pm>.  I'm fine with that.

I don't know whether encoding.pm is broken in the sense that it doesn't
do what it is documented to do (it was, but it is possible that all of
those bugs have been fixed). I do think that it is "broken as designed",
because it conflates two different things:

 * The encoding of the source code of the script
 * The default encoding of some I/O streams

and it even does so in an inconsistent manner (e.g. the encoding is
applied to STDOUT, but not to STDERR), and finally because it is so
complex that it will lead to surprising results.

	hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3807
***************************************
