[32541] in Perl-Users-Digest


Perl-Users Digest, Issue: 3806 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Oct 30 09:09:24 2012

Date: Tue, 30 Oct 2012 06:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 30 Oct 2012     Volume: 11 Number: 3806

Today's topics:
    Re: array <*@eli.users.panix.com>
    Re: array <glex_no-spam@qwest-spam-no.invalid>
    Re: array <nospam@nspam.invalid>
    Re: array <news@lawshouse.org>
    Re: array <nospam@nspam.invalid>
    Re: array <news@lawshouse.org>
    Re: array <nospam@nspam.invalid>
    Re: array <jurgenex@hotmail.com>
    Re: array <hjp-usenet2@hjp.at>
    Re: array <hjp-usenet2@hjp.at>
    Re: basic perl question <news@lawshouse.org>
    Re: basic perl question <jurgenex@hotmail.com>
    Re: Get the decimal separator from Windows <hjp-usenet2@hjp.at>
        lerning perl <nospam@nspam.invalid>
    Re: lerning perl <jurgenex@hotmail.com>
    Re: lerning perl (Jens Thoms Toerring)
    Re: lerning perl <news@lawshouse.org>
    Re: perl and indent <jurgenex@hotmail.com>
    Re: perl and indent <rvtol+usenet@xs4all.nl>
    Re: perl and indent <jurgenex@hotmail.com>
    Re: Why "Wide character in print"? <hjp-usenet2@hjp.at>
    Re: Why was suid support dropped in perl? <shrike@cyberspace.org>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 29 Oct 2012 20:14:42 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: array
Message-Id: <eli$1210291610@qz.little-neck.ny.us>

In comp.lang.perl.misc, Bill Cunningham <nospam@nspam.invalid> wrote:
>     What's wrong in this code?
> 
> use strict;
> use warnings;
> 
> my @cats=["striper","snowball"];
           ^                    ^

You probably want parens there:

  my @cats=("striper","snowball");

What you have is a one-element array whose only element is a reference
to an anonymous array.

> print $cats[0];
> print $cats[1];

With your code, striper and snowball are in:

  print $cats[0][0];
  print $cats[0][1];

The square bracket notation is very handy, I find, for making
arrays of arrays.
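
A minimal runnable sketch of the difference, assuming a stock perl 5; the
@pets array and its contents are hypothetical, added only to show an array
of arrays:

```perl
use strict;
use warnings;

# Parentheses: a list, so each string becomes its own element.
my @flat = ("striper", "snowball");

# Square brackets: a single element holding a reference to an anonymous array.
my @nested = ["striper", "snowball"];

print scalar(@flat), "\n";       # 2
print scalar(@nested), "\n";     # 1
print $nested[0][1], "\n";       # snowball

# Arrays of arrays are where the bracket notation pays off (hypothetical data).
my @pets = (["striper", "snowball"], ["rex", "fido"]);
print $pets[1][0], "\n";         # rex
```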

Elijah
------
can see this as a very easy new-to-perl mistake


------------------------------

Date: Mon, 29 Oct 2012 15:27:05 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: array
Message-Id: <508ee699$0$52259$815e3792@news.qwest.net>

On 10/29/12 15:56, Bill Cunningham wrote:
>      What's wrong in this code?
>
> use strict;
> use warnings;
>
> my @cats=["striper","snowball"];
            ^(                   ^)
>
> print $cats[0];
> print $cats[1];


[] creates a reference to an anonymous array.
() creates a list.

See: perldoc -q "What is the difference between a list and an array?"

To store a list as an array:

my @cats = ( 'striper', 'snowball' );
print $cats[0];
push( @cats, 'tiger' );

To use an anonymous array reference, you have to dereference the variable.

my $cats = [ 'striper', 'snowball' ];
print $cats->[0];
push( @$cats, 'tiger' );

Learning how to use references is very important in many languages. 
After you learn about arrays and hashes, then take a look at references.

See: perldoc perlreftut



You can also take a look at using qw() to help clean up things when you 
create a list. e.g.

my @cats = qw( striper snowball );
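
The fragments above can be combined into one runnable sketch (assuming
perl 5 with strict and warnings). Note that @cats and $cats are two
unrelated variables: $cats[2] indexes the array, while $cats->[2] goes
through the reference.

```perl
use strict;
use warnings;

# A real array, initialised from a qw() list: no quotes or commas needed.
my @cats = qw( striper snowball );
push @cats, 'tiger';

# The same data held through an anonymous array reference.
my $cats = [ 'striper', 'snowball' ];
push @$cats, 'tiger';             # @$cats dereferences to the underlying array

print "$cats[2]\n";               # tiger  (element of @cats)
print "$cats->[2]\n";             # tiger  (element reached through $cats)
print scalar(@$cats), "\n";       # 3
```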


------------------------------

Date: Mon, 29 Oct 2012 16:29:20 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: Re: array
Message-Id: <k6mour$c0p$1@dont-email.me>

Eli the Bearded wrote:
> In comp.lang.perl.misc, Bill Cunningham <nospam@nspam.invalid> wrote:
>>     What's wrong in this code?
>>
>> use strict;
>> use warnings;
>>
>> my @cats=["striper","snowball"];
>           ^                    ^
>
> You probably want parens there:
>
>  my @cats=("striper","snowball");

    Oh. OK left over from C where arrays are in []. Gotta learn something 
new.

> What you have is an array presented as a single value.
>
>> print $cats[0];
>> print $cats[1];
>
> With your code, striper and snowball are in:
>
>  print $cats[0][0];
>  print $cats[0][1];

    Goodness when I see this I think char **. I have to change my thinking 
here. There are no pointers in perl are there?

> The square bracket notation is very handy, I find, for making
> arrays of arrays.
>
> Elijah
> ------
> can see this as a very easy new-to-perl mistake 




------------------------------

Date: Mon, 29 Oct 2012 20:36:30 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: array
Message-Id: <a8idnTxl__1JdRPNnZ2dnUVZ8u2dnZ2d@giganews.com>

On 29/10/12 21:29, Bill Cunningham wrote:
>> With your code, striper and snowball are in:
>>
>>   print $cats[0][0];
>>   print $cats[0][1];
>
>      Goodness when I see this I think char **. I have to change my thinking
> here. There are no pointers in perl are there?

Oh yes ... but don't you dare call them that!  All sorts of contumely 
will be your portion. :-)

Look up "references".  "perldoc perlreftut" on a command line will get 
you started, or you may have some other way of getting at the built-in 
documentation.

-- 

Henry Law            Manchester, England


------------------------------

Date: Mon, 29 Oct 2012 16:46:08 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: Re: array
Message-Id: <k6mpub$iqj$1@dont-email.me>

Henry Law wrote:

> Oh yes ... but don't you dare call them that!  All sorts of contumely
> will be your portion. :-)
>
> Look up "references".  "perldoc perlreftut" on a command line will get
> you started, or you may have some other way of getting at the built-in
> documentation.

    I have to use man perlintro on my linux. I'm not even through reading 
it. I must've forgot how to use lists. 




------------------------------

Date: Mon, 29 Oct 2012 21:43:24 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: array
Message-Id: <XeOdnfj6odYfZRPNnZ2dnUVZ8oqdnZ2d@giganews.com>

On 29/10/12 21:46, Bill Cunningham wrote:
>
>      I have to use man perlintro on my linux. I'm not even through reading
> it. I must've forgot how to use lists.

I've been through some of what you're going through and I'm happy to 
help.  Mail me direct (address is valid) if you like.


-- 

Henry Law            Manchester, England


------------------------------

Date: Mon, 29 Oct 2012 19:38:24 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: Re: array
Message-Id: <k6n41b$fvi$1@dont-email.me>

Henry Law wrote:
> On 29/10/12 21:46, Bill Cunningham wrote:
>>
>>      I have to use man perlintro on my linux. I'm not even through
>> reading it. I must've forgot how to use lists.
>
> I've been through some of what you're going through and I'm happy to
> help.  Mail me direct (address is valid) if you like.

Will do.

B




------------------------------

Date: Mon, 29 Oct 2012 17:05:09 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: array
Message-Id: <586u881dvvm7icmsp00ce2kv3cs314uj96@4ax.com>

"Bill Cunningham" <nospam@nspam.invalid> wrote:
>    Goodness when I see this I think char **. I have to change my thinking 
>here. There are no pointers in perl are there?

There are many in perl, but none in Perl.

perl is the interpreter and of course its implementation uses plenty of
pointers. 
However the programming language Perl uses references to create dynamic
data structures, which are _MUCH_ more programmer-friendly than
primitive pointers.

jue


------------------------------

Date: Tue, 30 Oct 2012 11:42:20 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: array
Message-Id: <slrnk8vboc.g94.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-29 21:29, Bill Cunningham <nospam@nspam.invalid> wrote:
> Eli the Bearded wrote:
>> In comp.lang.perl.misc, Bill Cunningham <nospam@nspam.invalid> wrote:
>>>     What's wrong in this code?
[...]
>>> my @cats=["striper","snowball"];
>>           ^                    ^
>>
>> You probably want parens there:
>>
>>  my @cats=("striper","snowball");
>
>     Oh. OK left over from C where arrays are in [].

No. Initializers in C are in {}:

int a[] = {1, 2, 3};

This would be a syntax error:

int a[] = [1, 2, 3];

Lists are denoted by [] in Python.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Tue, 30 Oct 2012 12:09:41 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: array
Message-Id: <slrnk8vdbl.g94.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-30 00:05, Jürgen Exner <jurgenex@hotmail.com> wrote:
> "Bill Cunningham" <nospam@nspam.invalid> wrote:
>>    Goodness when I see this I think char **. I have to change my thinking 
>>here. There are no pointers in perl are there?
>
> There are many in perl, but none in Perl.
>
> perl is the interpreter and of course its implementation uses plenty of
> pointers. 
> However the programming language Perl uses references to create dynamic
> data structures,

There is no universally agreed upon definition of the terms "reference"
and "pointer" that would allow one to distinguish between them.

A "pointer" in Pascal has different properties than a "pointer" in C or
C++.  C++ distinguishes between "references" and "pointers". Java
doesn't have anything called "pointers" or "references", but clearly
"object variables" don't contain an object, they "reference" it or "point
to" it. Other languages have different conventions.

In Perl, the term is "reference", but as for all other languages, this
is language-specific jargon, not a conceptual difference.

A Perl "reference" is very similar to a Pascal "pointer" or a Java
"object variable". It is quite different from both a C++ "pointer" and a
C++ "reference", although it is closer to the "pointer" than the
"reference" (A C++ "reference" is similar to a Perl "alias").

So yes, there are "pointers" in the generic sense in Perl and they are
called "references". There is no equivalent to "C or C++ pointers" in
Perl.

> which are _MUCH_ more programmer-friendly than primitive pointers.

I would argue that a C pointer is *less* primitive than a Perl
reference. A Perl reference can only be dereferenced, assigned, and
compared for equality. A C pointer has a much richer set of operations
defined on it: In addition to the operations possible on a Perl
reference (or a Pascal pointer), you can add or subtract an integer, you
can subtract one pointer from another and you can order them.
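
A small sketch of the operations listed above, assuming perl 5. It also
shows that adding a number to a reference yields a plain number rather
than a new reference, and that a foreach loop variable behaves as an
alias (the C++-reference-like case mentioned earlier). Variable names are
illustrative only.

```perl
use strict;
use warnings;

my @a  = (1, 2, 3);
my $r1 = \@a;                     # take a reference
my $r2 = $r1;                     # assign it

my $same = ($r1 == $r2);          # equality: numeric comparison of the addresses
my $elem = $r1->[1];              # dereference: 2

# Arithmetic is not pointer arithmetic: the result is a plain number.
my $n      = $r1 + 1;
my $is_ref = ref($n) ? 1 : 0;     # 0

# A foreach loop variable is an alias, closer to a C++ reference.
$_ *= 10 for @a;                  # modifies @a in place: 10 20 30

print "same=", ($same ? 1 : 0), " elem=$elem is_ref=$is_ref a=@a\n";
```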

C doesn't do garbage collection, but that's a property of the language
as a whole, not of C-type pointers. A language with C-type pointers and
garbage collection is entirely possible. (Indeed, I think the C standard
even allows this, but it doesn't mandate it).

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Mon, 29 Oct 2012 20:30:34 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: basic perl question
Message-Id: <M4KdnTWCJIPtehPNnZ2dnUVZ8vWdnZ2d@giganews.com>

On 29/10/12 20:01, vis29 wrote:
> I am going through a perl program, i have following lines which I dont understand, please explain what those lines does and why.
>
> if (/^$hash{server}/) {
>     $name="not accessible";
>    }

You'll find this behaviour documented in "perldoc perlretut" (or however 
you get at the built-in documentation on your machine); you'll find full 
details there.

But in summary the stuff between // is a regular expression (regex), and 
if no variable is specified to be matched against the regex then the 
built-in variable $_ is assumed. It's equivalent to this:

   if ($_ =~ /^$hash{server}/) {

Presumably you know what $hash{server} is since it's a fundamental part 
of the Perl language, but bear in mind that the hash key in the curly 
brackets doesn't need to be in quotes if it's just alphanumeric.  In 
this case it's equivalent to $hash{'server'}.
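
A runnable sketch of the snippet, assuming a hypothetical %hash whose
'server' entry is 'db01' and a hypothetical input string in $_:

```perl
use strict;
use warnings;

my %hash = (server => 'db01');    # hypothetical server name
my $name = 'accessible';

$_ = 'db01.example.com';          # the string the bare /.../ will test
if (/^$hash{server}/) {           # same as: if ($_ =~ /^$hash{server}/)
    $name = "not accessible";
}
print "$name\n";                  # not accessible
```

Because $hash{server} is interpolated into the pattern, any regex
metacharacters in it (a "." in a host name, say) keep their special
meaning; writing /^\Q$hash{server}\E/ would match the value literally.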

-- 

Henry Law            Manchester, England


------------------------------

Date: Mon, 29 Oct 2012 17:09:53 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: basic perl question
Message-Id: <pg6u881h1m43mp13h5ee2dfrnhdkiandfe@4ax.com>

vis29 <dviswa@gmail.com> wrote:
>I am going through a perl program, i have following lines which I dont understand, please explain what those lines does and why.
>
>if (/^$hash{server}/) {
>   $name="not accessible";
>  }

It tests if the value in $_ (which is the default if no other value is
specified) matches the regular expression between the /.../, which in
turn specifies that the string should begin with whatever value the
element "server" in the hash %hash currently has.
If a match is found, then it assigns the text "not accessible" to the
variable $name.

jue


------------------------------

Date: Tue, 30 Oct 2012 09:42:20 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Get the decimal separator from Windows
Message-Id: <slrnk8v4nc.g94.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-29 13:45, mathieu.hedard@gmail.com <mathieu.hedard@gmail.com> wrote:
> i write a script to process files on different computer with various language.
> And i wonder if there is a way to know which is the decimal separator
> for the current user in my script.

POSIX::localeconv
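
A minimal sketch of using it, assuming the standard POSIX module and its
locale_h export tag: setlocale(LC_NUMERIC, "") asks for the locale
configured for the current user, and localeconv() returns a hash
reference whose decimal_point entry holds the separator. Whether the
empty-string locale reflects the Windows regional settings depends on the
C runtime perl was built against, so treat that as an assumption to check
on the target machines.

```perl
use strict;
use warnings;
use POSIX qw(locale_h);           # exports setlocale(), localeconv(), LC_NUMERIC

# Switch numeric formatting to the locale the environment/system asks for.
setlocale(LC_NUMERIC, "");

my $conv = localeconv();          # hash reference describing the current locale
my $sep  = $conv->{decimal_point};
print "Decimal separator: $sep\n";
```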

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Mon, 29 Oct 2012 21:19:43 -0500
From: "Bill Cunningham" <nospam@nspam.invalid>
Subject: lerning perl
Message-Id: <k6n9v9$h3e$1@dont-email.me>

    Is the best way to learn perl to start with perlintro and then move in a 
sequence through the tutorials? Where can I get the meanings of the 
functions? man open works with the unix API and man fopen with C, but what 
about perl functions?

B
man key for example doesn't work.




------------------------------

Date: Mon, 29 Oct 2012 19:44:52 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: lerning perl
Message-Id: <tffu88teb2lltpf7a8vkp3dsb5tbvbkguh@4ax.com>

"Bill Cunningham" <nospam@nspam.invalid> wrote:
>    Is the best way to learn perl to start with perlintro and then move in a 
>sequence through the tutorials? 

Get yourself a book. If you are an experienced programmer in some other
language then "Programming Perl" is a very good start. If you are new to
programming itself then "Learning Perl" is often highly recommended.

>Where can I get the meanings of the 
>functions? man open works with the unix API and man fopen with C, but what 
>about perl functions? man key for example doesn't work.

Use perldoc:
	perldoc -f open
	perldoc -f keys

To learn more about perldoc use perldoc
	perldoc perldoc

And for frequently asked questions, e.g. about which books there are, use
	perldoc -q books

jue


------------------------------

Date: 30 Oct 2012 10:47:41 GMT
From: jt@toerring.de (Jens Thoms Toerring)
Subject: Re: lerning perl
Message-Id: <af9pidFm4amU1@mid.uni-berlin.de>

Jürgen Exner <jurgenex@hotmail.com> wrote:
> "Bill Cunningham" <nospam@nspam.invalid> wrote:
> >    Is the best way to learn perl to start with perlintro and then move in a 
> >sequence through the tutorials? 

> Get yourself a book. If you are an experienced programmer in some other
> language then "Programming Perl" is a very good start. If you are new to
> programming itself then "Learning Perl" is often highly recommended.

Before you start spending lots of time trying to "help" this
Bill Cunningham character I'd recommend that you all spend a
short bit of time on checking his posting history over the
years in other newsgroups (like, for example, comp.lang.c or
comp.unix.programmer) and see for yourself if you can detect
a certain pattern. I think I can predict with high certainty
that he will never master the most basic elements of Perl -
same as he never seems to manage to write a working C program
after years of coaching - it looks as if he actually
gets worse at it over the years...

                              Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de


------------------------------

Date: Tue, 30 Oct 2012 11:16:38 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: lerning perl
Message-Id: <xOSdnZ0zcL-DKhLNnZ2dnUVZ7sudnZ2d@giganews.com>

On 30/10/12 10:47, Jens Thoms Toerring wrote:

> Before you start spending lots of time trying to "help" this
> Bill Cunningham character I'd recommend that you all spend a
> short bit of time on checking his posting history over the
> years in other newsgroups

Done, thank you; forewarned.  Now let's see; who knows, things might be 
different.

-- 

Henry Law            Manchester, England


------------------------------

Date: Mon, 29 Oct 2012 17:16:35 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: perl and indent
Message-Id: <io6u88tkot13qv3uf3qd0p2ckn136sfno7@4ax.com>

Henry Law <news@lawshouse.org> wrote:
>On 29/10/12 17:35, Bill Cunningham wrote:
>
>> I use nano. I'll check more into it. I'm new to Perl. I'm just now looking
>> at it as a second language along with C. So far it looks C like.
>
>Bill, nice to have you aboard.  Yes, it's C-like, and many of the 
>differences are (IMO) improvements. But there are people who say "Don't 
>write C in Perl; learn to write Perl" and if you read people's code 
>you'll see -- at least partly -- what they mean.
>
>For example (and I'm only an amateur at this), the C structure for an 
>"if" statement translates to Perl as this:

A far better example would be something, where there is no analog
construct in C, e.g. a foreach loop.
	foreach my $elem (@list) {
		process($elem);
	}

Or filtering a list:
	 @foo = grep {!/^#/} @bar;    # weed out lines that begin with #
Try writing this as concisely in C.
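
A runnable version of both snippets, as a sketch assuming hypothetical
input lines in @bar; uc() stands in for the undefined process() so the
result can be printed:

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @bar = ("# a comment", "alpha", "#another", "beta");

# Ignore comment lines.
my @foo = grep { !/^#/ } @bar;

# No index variable: $elem is each element in turn.
my @processed;
foreach my $elem (@foo) {
    push @processed, uc $elem;    # stand-in for the hypothetical process($elem)
}
print "@processed\n";             # ALPHA BETA
```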

jue


------------------------------

Date: Tue, 30 Oct 2012 08:27:52 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: perl and indent
Message-Id: <508f8178$0$6862$e4fe514c@news2.news.xs4all.nl>

On 2012-10-30 01:16, Jürgen Exner wrote:
> Henry Law <news@lawshouse.org> wrote:

>> For example (and I'm only an amateur at this), the C structure for an
>> "if" statement translates to Perl as this:
>> [... The code was removed, but why? ...]
>
> A far better example would be something, where there is no analog
> construct in C, e.g. a foreach loop.
> 	foreach my $elem (@list) {
> 		process($elem);
> 	}

I don't see what is 'far better' here. A while on an iterator would be 
very similar in C.


An even more Perlish alternative:

     process($_) for @list;


> Or filtering a list:
> 	 @foo = grep {!/^#/} @bar;    # weed out lines that begin with #

That comment is more 'what' than 'why',
and that is already clear from the code.
Consider: "ignore comment lines".

-- 
Ruud



------------------------------

Date: Tue, 30 Oct 2012 00:53:40 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: perl and indent
Message-Id: <a21v889j8u45s98u0h5sktug8id1p296fg@4ax.com>

"Dr.Ruud" <rvtol+usenet@xs4all.nl> wrote:
>On 2012-10-30 01:16, Jürgen Exner wrote:
>> Henry Law <news@lawshouse.org> wrote:
>
>>> For example (and I'm only an amateur at this), the C structure for an
>>> "if" statement translates to Perl as this:
>>> [... The code was removed, but why? ...]
>>
>> A far better example would be something, where there is no analog
>> construct in C, e.g. a foreach loop.
>> 	foreach my $elem (@list) {
>> 		process($elem);
>> 	}
>
>I don't see what is 'far better' here. A while on an iterator would be 
>very similar in C.

I respectfully disagree. The point is that in C you cannot access the
elements of an array or list without using an explicit index which means
you have to iterate (i.e. explicitly initialize, increment, and
terminate) over that index.

Therefore the difference is between 
	"with each element of the list do foobar()" 
and 
	"initialize $i to start index; 
	as long as $i is smaller than the end index 
		{do foobar() with element array[$i] and increment $i}"

To me it is a very major difference if I have to invent and maintain an
auxiliary index variable or not.
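
The two styles side by side, as a sketch with a hypothetical
three-element list; both loops collect the same elements, but only the
first needs $i:

```perl
use strict;
use warnings;

my @list = qw(a b c);

# C style: an auxiliary index that must be initialised, tested and incremented.
my @by_index;
for (my $i = 0; $i < @list; $i++) {
    push @by_index, $list[$i];
}

# Perl style: "with each element of the list, do ...".
my @by_element;
foreach my $elem (@list) {
    push @by_element, $elem;
}

print "@by_index | @by_element\n";   # a b c | a b c
```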

>An even more perlish alternative;
>     process($_) for @list;

This is where _I_ would ask what is the difference to 
	foreach (@list) {process ($_)};
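
A quick sketch, assuming a hypothetical record() sub in place of
process(), showing that both forms visit the same elements in the same
order. The differences are syntactic: the modifier form always uses $_,
cannot declare a lexical loop variable with my, and controls only a
single statement.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
my (@via_modifier, @via_block);

# Hypothetical stand-in for process(): appends ten times its argument.
sub record { my ($acc, $x) = @_; push @$acc, $x * 10 }

record(\@via_modifier, $_) for @list;           # statement-modifier form
foreach (@list) { record(\@via_block, $_) }     # block form

print "@via_modifier | @via_block\n";           # 10 20 30 | 10 20 30
```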

jue


------------------------------

Date: Tue, 30 Oct 2012 11:32:13 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Why "Wide character in print"?
Message-Id: <slrnk8vb5d.g94.hjp-usenet2@hrunkner.hjp.at>

On 2012-10-29 12:52, Eric Pozharski <whynot@pozharski.name> wrote:
> with <slrnk8r42s.2s7.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:
>> On 2012-10-28 11:45, Eric Pozharski <whynot@pozharski.name> wrote:
>>> with <3q7ul9-l7s.ln1@anubis.morrow.me.uk> Ben Morrow wrote:
>
>>>> In any case, the result is exactly what I said: the string contains
>>>> one (logical) character. If you apply length() to that string it
>>>> will return 1. (This character happens to be represented internally
>>>> as two bytes; that is none of your business.) What do you think I
>>>> omitted from the story?
>>> Right.  And that's closely related to your last example (the one
>>> about utf8.pm being unsafe).  I've tried to make a point that
>>> *characters* from different *ranges* happen to be of different length
>>> in bytes.
>> Then maybe you shouldn't have chosen two examples which both are same
>> length in bytes.
>
> (Last night I've reread loads of perlunicode and friends, I feel much
> better now) No, they are the same length *if* encoding of stream is set:

You posted the output of Devel::Peek::Dump, so I thought you were
talking about the *internal* representation. 

How many bytes they occupy in an I/O stream depends on the encoding.

LATIN SMALL LETTER A WITH GRAVE is one byte in ISO-8859-1, CP850, ...
LATIN SMALL LETTER A WITH GRAVE is two bytes in UTF-8, UTF-16, ...
LATIN SMALL LETTER A WITH GRAVE is four bytes in UTF-32, ...

CYRILLIC SMALL LETTER A is one byte in ISO-8859-5, KOI-8, ...
CYRILLIC SMALL LETTER A is two bytes in UTF-8, UTF-16, ...
CYRILLIC SMALL LETTER A is four bytes in UTF-32, ...

(And of course, both characters cannot be represented at all in some
encodings: There is no LATIN SMALL LETTER A WITH GRAVE in ISO-8859-5,
and no CYRILLIC SMALL LETTER A in ISO-8859-1)

> 	{7453:22} [0:0]% perl -CS -Mutf8 -wle 'print "à"' | xxd 
> 	0000000: c3a0 0a                                  ...
> 	{7459:23} [0:0]% perl -CS -Mutf8 -wle 'print "а"' | xxd
> 	0000000: d0b0 0a                                  ...
> 	{7466:24} [0:0]% 
>
> But latin1 is special (I've reread perlunicode and friends), *if*
> there's no reason (printing isn't reason) to upgrade to utf8 then
> *characters* of latin1 script (and latin1 only) stay *bytes*:

I already explained that. When writing to a file handle, perl doesn't
care whether a string is composed of bytes or characters.

If the file handle has no :encoding() layer, it will try to write each
element of the string as a single byte.

If the file has an :encoding() layer, it will interpret each element of
the string as a character and convert that to a byte sequence according
to that encoding.

So without an encoding layer "\x{E0}" will always be written as the single byte
0xE0, regardless of whether the string is a byte string or a character
string. With an ":encoding(UTF-8)" layer it will always be written as
two bytes 0xC3 0xA0; and with an ":encoding(CP850)" layer, it will
always be written as a single byte 0x85.
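
The three cases can be checked with an in-memory file handle. This is a
sketch: bytes_written() is a hypothetical helper, and the CP850 case
assumes the Encode module's standard single-byte tables are installed,
as they are in a normal perl distribution.

```perl
use strict;
use warnings;

# Write $string through the given output layer into an in-memory file
# and return the resulting bytes as hex.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open: $!";
    print {$fh} $string;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

print bytes_written('',                 "\x{E0}"), "\n";   # E0
print bytes_written(':encoding(UTF-8)', "\x{E0}"), "\n";   # C3 A0
print bytes_written(':encoding(CP850)', "\x{E0}"), "\n";   # 85
```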

What is apparently confusing you is what happens if that fails.

Obviously you can't write a single byte with the value 0x430, you can't
encode CYRILLIC SMALL LETTER A in ISO-8859-1 and you can't encode LATIN
SMALL LETTER A WITH GRAVE in ISO-8859-5.

So what does perl do? It prints a warning to STDERR and writes
a more or less reasonable approximation to the stream. The details
depend on the I/O layer:

If there is no :encoding() layer, the warning is "Wide character in
print" and the utf-8 representation is sent to the stream. And to
confuse matters further, this is done for the whole string, not just
this particular string element:

% perl -Mutf8 -E 'say "->\x{E0}\x{430}<-"' 
Wide character in say at -e line 1.
->àа<-

(one string: \x{E0} and \x{430} converted to UTF-8)

% perl -Mutf8 -E 'say "->\x{E0}<-", "->\x{430}<-"' 
Wide character in say at -e line 1.
->�<-->а<-

(two strings: \x{E0} printed as a single byte, \x{430} converted to UTF-8)

If there is an :encoding() layer, the warning is "\x{....} does not map
to $charset" and a \x{....} escape sequence is sent to the stream:

% perl -Mutf8 -E 'binmode STDOUT, ":encoding(iso-8859-5)"; say "->\x{E0}<-"' 
"\x{00e0}" does not map to iso-8859-5 at -e line 1.
->\x{00e0}<-

But these are responses to an *error* condition. You shouldn't try to
write codepoints > 255 to a byte stream (actually, you shouldn't write
any characters to a byte stream, a byte stream is for bytes), and you
shouldn't try to write latin accented characters to a cyrillic stream.
Or at least you shouldn't be terribly surprised if the result is a
little confusing - garbage in, garbage out.


> But even if encoding of stream isn't set concatenation with non-latin1
> script upgrades latin1 too:

The term "upgrade" has a rather specific meaning in Perl in context with
byte and character strings, and I don't think you are talking about
that.


> 	{7800:26} [0:0]% perl -Mutf8 -wle 'print "[à][а]"' | xxd
> 	Wide character in print at -e line 1.
> 	0000000: 5bc3 a05d 5bd0 b05d 0a                   [..][..].

You have a single string "[à][а]" here. As I wrote above, print treats
the string as a unit and in the absence of an :encoding() layer just dumps
it in UTF-8 encoding. So, yes, both the "à" and the "а" within this
single string will be UTF-8-encoded (as will be the square brackets, but
for them the UTF-8 encoding is the same as for US-ASCII, so you don't
notice that).

And I repeat it again: You are doing something which just doesn't make
sense (writing characters to a byte stream), so don't be surprised if
the result is a little surprising. Do it right and the result will make
sense.


> Please rewind the thread. That's exactly what happened couple of posts
> ago (specifically: <eli$1210251546@qz.little-neck.ny.us> and
><vi7pl9-ui71.ln1@anubis.morrow.me.uk>).

I've read these postings but I don't know what you are referring to. If
you are referring to other postings (especially long ones), please cite
the relevant part.


>>> 	{9829:45} [0:0]% perl -Mutf8 -MDevel::Peek -wle '$aa = "aàа" ; Dump $aa'
>>> 	SV = PV(0xa06f750) at 0xa08afac
>>> 	  REFCNT = 1
>>> 	  FLAGS = (POK,pPOK,UTF8)
>>> 	  PV = 0xa086a08 "a\303\240\320\260"\0 [UTF8 "a\x{e0}\x{430}"]
>>> 	  CUR = 5
>>> 	  LEN = 12
>>>
>>> *Characters* of latin1 aren't wide (even if they are characters, they
>>> are still one byte long)
>> In UTF-8, latin-1 characters >= 0x80 are 2 bytes, the same as cyrillic
>> characters. Your example shows this: "à" (LATIN SMALL LETTER A WITH
>> GRAVE) is "\303\240" and "а" (CYRILLIC SMALL LETTER A) is "\320\260". 
>
> No.  Because it's not UTF-8, it's utf8. 

I presume that by "utf8" you mean a string with the UTF8 bit set
(testable with the utf8::is_utf8() function). But as I've written
repeatedly, this is completely irrelevant for I/O. A string will be
treated completely identically, whether it has this bit set or not. It is
only the value of the string which is important, not its internal type
and representation.
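
A sketch that checks this claim, assuming perl 5.8 or later: the same
value is stored once without and once with the UTF8 flag (via
utf8::downgrade and utf8::upgrade), then both are written through a
UTF-8 layer. utf8_bytes() is a hypothetical helper.

```perl
use strict;
use warnings;

my $plain    = "\x{E0}";
my $upgraded = "\x{E0}";
utf8::downgrade($plain);          # ensure the byte representation (UTF8 flag off)
utf8::upgrade($upgraded);         # same value, UTF8 flag on

# Write a string through a UTF-8 output layer into memory; return hex bytes.
sub utf8_bytes {
    my ($s) = @_;
    my $buf = '';
    open my $fh, '>:encoding(UTF-8)', \$buf or die "open: $!";
    print {$fh} $s;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

my @flags = map { utf8::is_utf8($_) ? 1 : 0 } $plain, $upgraded;
print "flags: @flags\n";                                        # flags: 0 1
print "same value\n" if $plain eq $upgraded;                    # same value
print utf8_bytes($plain), " / ", utf8_bytes($upgraded), "\n";   # C3 A0 / C3 A0
```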

(Also, I find it very confusing that you post the output of
Devel::Peek::Dump, but then apparently don't refer to it but talk about
something else. Please try to organize your postings in a way that one
can understand what you are talking about. It is very likely that this
exercise will also clear up the confusion in your mind)


> As long as utf8 semantics isn't set, anything scalar stays plain
> bytes:
>
> 	{2786:10} [0:0]% perl -MDevel::Peek -wle 'Dump "à"'           
> 	SV = PV(0x9d0e878) at 0x9d29f28
> 	  REFCNT = 1
> 	  FLAGS = (PADTMP,POK,READONLY,pPOK)
> 	  PV = 0x9d2ddc8 "\303\240"\0
> 	  CUR = 2
> 	  LEN = 12
>
> However, when utf8 semantics is set, then those codepoints that fit
> latin1 script become special Perl-latin1:
>
> 	{5930:11} [0:0]% perl -MDevel::Peek -Mutf8 -wle 'Dump "à"'  
> 	SV = PV(0x9b92880) at 0x9badf10
> 	  REFCNT = 1
> 	  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
> 	  PV = 0x9bb1eb0 "\303\240"\0 [UTF8 "\x{e0}"]
> 	  CUR = 2
> 	  LEN = 12

Yes. We've been through that. Ben explained it in excruciating detail.
What don't you understand here?


>> However, for real programs, I think tying the encoding of the source
>> code to the encoding of I/O-streams the script is supposed to handle
>> is foolish. My scripts are always encoded in UTF-8, but they
>> frequently have to handle files in CP-1252.
>
> Mine are us-ascii, I have open.pm for rest.

US-ASCII is a subset of UTF-8, so your files are UTF-8, too ;-). (Most
of mine don't contain non-ASCII characters either) What I meant is that
I don't use any other encoding (like ISO-8859-1 or ISO-8859-15) to
encode non-ASCII characters, so I don't have any need for "use
encoding". If your scripts are all in ASCII and you use open.pm for
"rest", what do you need "use encoding" for? Remember, this subthread
started when you berated Ben for discouraging the use of "use encoding".
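
For comparison, a sketch of the ASCII-source-plus-open.pm approach,
assuming UTF-8 is the desired output encoding: the open pragma's :std
option puts the layer on STDIN, STDOUT and STDERR as well, so printing
non-ASCII characters does not trigger the "Wide character" warning.

```perl
use strict;
use warnings;
use open qw(:std :encoding(UTF-8));   # UTF-8 layers on STD* and default for open()

# The source stays pure ASCII; non-ASCII characters are written as escapes.
print "\x{E0}\x{430}\n";              # encoded as UTF-8, no "Wide character" warning

# Inspect which I/O layers STDOUT now carries.
my @layers = PerlIO::get_layers(*STDOUT);
```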

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR       | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Mon, 29 Oct 2012 14:23:30 -0700 (PDT)
From: "shrike@cyberspace.org" <shrike@cyberspace.org>
Subject: Re: Why was suid support dropped in perl?
Message-Id: <907aaabf-b44e-4088-8405-da903991b1a6@googlegroups.com>

On Tuesday, October 23, 2012 7:29:35 PM UTC-4, Andrew Gideon wrote:
> On Sat, 20 Oct 2012 09:59:11 -0700, shrike@cyberspace.org wrote:
>
> > It turns out I have run into such a problem: running a driver written in
> > perl on a remote host via SSH.
>
> Why not permit SSH to root using a key pair with a command restriction?
> Since the command runs as root, there's no su-ing required.
>
> This does have the risks associated with the program itself running as
> root, but you'd have those anyway, right?
>
> 	- Andrewq

Because then I would have to support public key rhost based authentication
for sshd, which is an even worse proposition than supporting sudo. If I
touch _anything_ else, I own it. All I can reasonably expect to secure or
support is _my_ code. This is the basic reality of software support.

My concern is not whether _I_ can use it. My concern is whether somebody
else can use it by following a short set of instructions. "chmod +s" works.
sudo or rhost+sshd is a 3 hour support call. And I'm not going to tell
somebody to turn on remote access for the root account for sshd, when I
have no reasonable expectation that they understand the consequences of
doing so.

------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3806
***************************************

