[32690] in Perl-Users-Digest


Perl-Users Digest, Issue: 3814 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 3 14:55:56 2013

Date: Thu, 8 Nov 2012 02:20:57 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 8 Nov 2012     Volume: 11 Number: 3814

Today's topics:
    Re: Problems with tabulations <luca.francesca01@gmail.com>
    Re: Problems with tabulations <luca.francesca01@gmail.com>
    Re: Trampoline sub <rweikusat@mssgmbh.com>
    Re: Why "Wide character in print"? <whynot@pozharski.name>
    Re: Why "Wide character in print"? <rweikusat@mssgmbh.com>
    Re: Why "Wide character in print"? (Seymour J.)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 6 Nov 2012 22:12:52 -0800 (PST)
From: Luca Francesca <luca.francesca01@gmail.com>
Subject: Re: Problems with tabulations
Message-Id: <11e0a98d-d85b-48b5-a518-7b249cadb8c0@googlegroups.com>

On Tuesday, 6 November 2012 22:47:35 UTC+1, J. Gleixner wrote:
> On 11/06/12 15:30, Luca Francesca wrote:
> > Hello.
> > I've a program to extract some data from a file. ( http://pastebin.com/f3M6LuQw ).
> > The output of the program is not nice as i want (I used \t but isn't working well)
> >
> > Any idea about a fix??
>
> Pretty poor code.. but to answer your question.. modify the output
> produced by 'print' as needed.. e.g. use printf instead of relying on
> tab.
>
> e.g. this line:
>
> print "Service \t $service \n";
>
> might produce nicer output as
>
> printf "%-15s %s\n", 'Service', $service;
>
> For other options and to explain what the '-' does, see:
>
> perldoc -f sprintf

Nice fix.
Thanks.
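
As a minimal sketch of the printf suggestion quoted above: the labels and
values below are made up for illustration (the pastebin script's data is
not shown here), and format_row is a hypothetical helper name. A
left-justified, fixed-width first column keeps the second column aligned
regardless of label length.

```perl
use strict;
use warnings;

# Hypothetical label/value pairs; not taken from the original script.
my @rows = (
    [ 'Service',  'httpd' ],
    [ 'Port',     80      ],
    [ 'Protocol', 'tcp'   ],
);

# "%-15s" pads the label with spaces on the right to 15 columns;
# the '-' flag selects left justification.
sub format_row {
    my ($label, $value) = @_;
    return sprintf "%-15s %s\n", $label, $value;
}

print format_row(@$_) for @rows;
```

The width of 15 is taken from the quoted example; a script with longer
labels would need a larger value.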


------------------------------

Date: Tue, 6 Nov 2012 22:13:48 -0800 (PST)
From: Luca Francesca <luca.francesca01@gmail.com>
Subject: Re: Problems with tabulations
Message-Id: <2add9344-eb49-4c30-929e-4ae3c356047b@googlegroups.com>

On Wednesday, 7 November 2012 00:33:04 UTC+1, Ben Morrow wrote:
> Quoth "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>:
> > On 11/06/12 15:30, Luca Francesca wrote:
> > > Hello.
> > > I've a program to extract some data from a file. (
> > > http://pastebin.com/f3M6LuQw ).
> > > The output of the program is not nice as i want (I used \t but isn't
> > > working well)
> > >
> > > Any idea about a fix??
> >
> > Pretty poor code.. but to answer your question.. modify the output
> > produced by 'print' as needed.. e.g. use printf instead of relying on
> > tab.
> >
> > e.g. this line:
> >
> > print "Service \t $service \n";
> >
> > might produce nicer output as
> >
> > printf "%-15s %s\n", 'Service', $service;
>
> If you're producing formatted ASCII output, you might consider using
> formats.
>
> <duck>
>
> (Or Perl6::Form, which is nicer and saner, and not really Perl-6-related
> at all.)
>
> Ben

I'll give it a shot ;)


------------------------------

Date: Wed, 07 Nov 2012 17:37:42 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Trampoline sub
Message-Id: <87bof9pb4p.fsf@sapphire.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:
> On Monday, November 5, 2012 6:45:45 AM UTC-8, Rainer Weikusat wrote:
>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

>> When building state machines in C, I usually use a function pointer
>> as 'state variable' because this implies there's no need for
>> explicitly written, state-dependent control transfer code. The
>> first time I did this in Perl, it occurred to me that it should be
>> possible to simplify the 'obvious' implementation, using a
>> module-global scalar variable holding a reference to the
>> subroutine to be executed next, to one which just invokes the
>> subroutine by name

[...]

>> ------------
>> 
>> package FlipFlop;
>>
>> require Exporter;
>>
>> our @ISA = qw(Exporter);
>>
>> our @EXPORT = qw(v);
>                 -----
>                 qw(*v);   # <---------- 
>
> Ran into this by accident.  Exporting the actual glob 
> makes your flipflop work:
>
>    our @EXPORT = qw( *v );
>
> $ perl -MFlipFlop -le 'while(1) { print v();sleep 1}'
> 0
> 1
> ...
>
> Here's the relevant bit from perlmod:
>
>    What makes all of this important is that the 
>    Exporter module uses glob aliasing as the import/
>    export mechanism. Whether or not you can properly
>    localize a variable that has been exported from a
>    module depends on how it was exported:
>
>     @EXPORT = qw($FOO); # Usual form, can't be 
>                         #   localized
>     @EXPORT = qw(*FOO); # Can be localized

[...]

> So, evidently, Perl has to be able to localize the
> glob... to do the glob twiddle on the fly.

This is going to become somewhat lengthy ...

What the text you quoted refers to as 'can be localized' is another
side effect of the glob export. I think I should first show the
difference between this "can't be localized" and "can be
localized". Assuming that a file named Localized.pm with the following
content

-----------
package Localized;

require Exporter;

our @ISA =	qw(Exporter);
our @EXPORT =	qw($a_var a_sub);

our $a_var = 3;

sub a_sub {
    return $a_var + 1;
}

1;
------------

is available in the perl module search path, the program included
below

------------
use Localized;

print a_sub(), "\n";

{
    local $a_var = 55;
    
    print a_sub(), ' ', $a_var, "\n";
}
------------

will print

,----
| 4
| 4 55
`----

The reason for this can be seen by modifying it as follows:

------------
use Devel::Peek;
use Localized;

Dump(*a_var);
Dump(*Localized::a_var);

{
    local $a_var = 55;
    Dump(*a_var);
}
------------

The output of that is

,----
| SV = PVGV(0x6a3740) at 0x68b558
|   REFCNT = 4
|   FLAGS = (MULTI,IN_PAD,IMPORT( SV ))
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x605bb0    "main"
|   GP = 0x68e320
|     SV = 0x660390
|     REFCNT = 1
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 193
|     FILE = "/usr/share/perl/5.10/Exporter/Heavy.pm"
|     FLAGS = 0x1a
|     EGV = 0x68b558      "a_var"
| SV = PVGV(0x65b490) at 0x660378
|   REFCNT = 3
|   FLAGS = (MULTI,IN_PAD)
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x63cfa8    "Localized"
|   GP = 0x62f1d0
|     SV = 0x660390
|     REFCNT = 1
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 8
|     FILE = "Localized.pm"
|     FLAGS = 0xa
|     EGV = 0x660378      "a_var"
| SV = PVGV(0x6a3740) at 0x68b558
|   REFCNT = 5
|   FLAGS = (MULTI,IN_PAD,IMPORT( SV ))
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x605bb0    "main"
|   GP = 0x68e320
|     SV = 0x605d48
|     REFCNT = 1
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 193
|     FILE = "/usr/share/perl/5.10/Exporter/Heavy.pm"
|     FLAGS = 0x1a
|     EGV = 0x68b558      "a_var"
`----

After the initial import, the GPs of both a_var point to different
objects but the SV slot of each GP points to the same scalar. The
later local changes the binding of the SV slot of the a_var GP in main
but doesn't affect the SV slot of Localized::a_var. When the *a_var
glob is exported instead, this changes to

,----
| SV = PVGV(0x6a3740) at 0x660258
|   REFCNT = 4
|   FLAGS = (MULTI,IN_PAD,IMPORTALL)
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x605bb0    "main"
|   GP = 0x62f1d0
|     SV = 0x660390
|     REFCNT = 2
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 8
|     FILE = "Localized.pm"
|     FLAGS = 0xfa
|     EGV = 0x660378      "a_var"
| SV = PVGV(0x65b490) at 0x660378
|   REFCNT = 3
|   FLAGS = (MULTI,IN_PAD)
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x63cfa8    "Localized"
|   GP = 0x62f1d0
|     SV = 0x660390
|     REFCNT = 2
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 8
|     FILE = "Localized.pm"
|     FLAGS = 0xa
|     EGV = 0x660378      "a_var"
| SV = PVGV(0x6a3740) at 0x660258
|   REFCNT = 5
|   FLAGS = (MULTI,IN_PAD,IMPORTALL)
|   NAME = "a_var"
|   NAMELEN = 5
|   GvSTASH = 0x605bb0    "main"
|   GP = 0x62f1d0
|     SV = 0x605d48
|     REFCNT = 2
|     IO = 0x0
|     FORM = 0x0  
|     AV = 0x0
|     HV = 0x0
|     CV = 0x0
|     CVGEN = 0x0
|     LINE = 8
|     FILE = "Localized.pm"
|     FLAGS = 0xfa
|     EGV = 0x660378      "a_var"
`----

Here, the GP associated with both names is identical and rebinding the
SV slot of this GP thus causes a change visible to the a_sub
subroutine. With this change, the output of the original program
becomes

,----
| 4
| 56 55
`----

This 'GP' export is also what causes the subroutine switching in the
FlipFlop example to work as intended: Since both FlipFlop::v and
main::v share a GP, changing the CV slot of this GP in FlipFlop is
also visible in main. The downsides of this for a pure subroutine
export/ import are that changes in the importing module may now effect
changes in the exporting module, as demonstrated in the a_var example,
which is not exactly obvious and usually not intended, and that this
still doesn't guarantee that modifications to the FlipFlop symbol
table will remain visible in main: This only works for as long as both
continue to share the GP and it is possible to cause a new GP to be
assigned to either v, eg, by assigning a different glob to *v.
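
The difference between the two kinds of export can also be reproduced in
a single file, without Exporter. The sketch below is an illustration
under stated assumptions: it imitates what the two export forms do by
hand (assigning a scalar reference to a glob versus assigning the glob
itself), and the names by_slot, by_glob and demo are hypothetical.

```perl
use strict;
use warnings;

package Localized;

our $a_var = 3;

sub a_sub {
    return $a_var + 1;
}

package main;

our ($by_slot, $by_glob);

BEGIN {
    # Like exporting '$a_var': only the SV slot is aliased,
    # main::by_slot keeps its own GP.
    *by_slot = \$Localized::a_var;

    # Like exporting '*a_var': main::by_glob shares the GP
    # of Localized::a_var.
    *by_glob = *Localized::a_var;
}

sub demo {
    my @seen;
    {
        local $by_slot = 55;    # rebinds the SV slot of main's own GP
        push @seen, Localized::a_sub();
    }
    {
        local $by_glob = 55;    # rebinds the SV slot of the shared GP
        push @seen, Localized::a_sub();
    }
    push @seen, Localized::a_sub();    # both locals have been undone
    return @seen;
}

print join(' ', demo()), "\n";    # prints "4 56 4"
```

Only the localization through the shared GP is visible to a_sub, which
matches the "4 55" versus "56 55" outputs shown above.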


------------------------------

Date: Tue, 06 Nov 2012 21:19:34 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Why "Wide character in print"?
Message-Id: <slrnk9iom6.la3.whynot@orphan.zombinet>

with <slrnk99ugj.r8t.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2012-11-02 15:49, Eric Pozharski <whynot@pozharski.name> wrote:
>> with <slrnk94mfm.5vl.hjp-usenet2@hrunkner.hjp.at> Peter J. Holzer wrote:

*SKIP*
> If you use “use encoding 'KOI8-U';”, you can use KOI8 sequences
> (either literally or via escape sequences) in your source code. For
> example, if you store this program in KOI8-U encoding:
>
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use 5.010;
> use encoding 'KOI8-U';
>
> my $s1 = "Б";
> say ord($s1);
> my $s2 = "\x{E2}";
> say ord($s2);
> __END__
>
> (i.e. the string literal on line 7 is stored as the byte sequence 0x22
> 0xE2 0x22), the program will print 1041 twice, because:
>
> * The perl compiler knows that the source code is in KOI8-U, so a
> single byte 0xE2 in the source code represents the character “U+0411
> CYRILLIC CAPITAL LETTER BE”. Similarly, escape sequences of the form
> \ooo and \xXX are taken to denote bytes in the source character set
> and translated to unicode. So both the literal Б on line 7 and the
> \x{E2} on line 9 are translated to U+0411.
>
> * At run time, the bytecode interpreter sees a string with the single
> unicode character U+0411. How this character was represented in the
> source code is irrelevant (and indeed, unknowable) to the byte code
> interpreter at this stage. It just prints the decimal representation
> of 0x0411, which happens to be 1041.

Indeed, that renders perl somewhat lame. "They" could have invented some
property, attachable at will to any scalar, that would record the byte
encoding associated with that scalar, and then made every operation pay
attention to that property. However, that hasn't been done, because on
the way to an all-utf8 Perl sacrifices have to be made. Now, if that
source were saved as UTF-8, the output wouldn't be any different.
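
The byte-to-character mapping described in the quoted explanation can be
checked without "use encoding" at all. This is a sketch that assumes the
Encode module's 'koi8-u' encoding is available (it ships with the
standard Encode distribution); it decodes the single byte explicitly
instead of relying on the pragma.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte as it would appear in a KOI8-U encoded source file.
my $koi8_bytes = "\xE2";

# Decoding maps the byte to the character it denotes in KOI8-U.
my $char = decode('koi8-u', $koi8_bytes);

printf "byte 0x%02X, character U+%04X, ord %d\n",
    ord($koi8_bytes), ord($char), ord($char);
```

On the byte string ord() gives 226; on the decoded string it gives 1041,
the value the quoted program prints.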

I have had no use for ord() (and still have none), but it wouldn't
surprise me if at some point in perl's development ord() (in this
script) had returned 208, and the only way to make it work had been to
upgrade sometime later.

Look, *literals* are converted to utf8 with the UTF8 flag on. Maybe
that's what made (and makes) qr// work as expected:

	{41393:56} [0:0]% perl -wlE '"фыва" =~ m{(\w)}; print $1' 

	{42187:57} [0:0]% perl -Mutf8 -wle '"фыва" =~ m{(\w)}; print $1' 
	Wide character in print at -e line 1.
	ф
	{42203:58} [0:0]% perl -Mencoding=utf8 -wle '"фыва" =~ m{(\w)}; print $1' 
	ф

For an explanation of what happens in the 1st example, see below. I may
be wrong here, but I think that in the 2nd and 3rd examples it all
revolves around $^H anyway.

>> In pre-all-utf8 times qr// was working on bytes without being told to
>> behave otherwise. That's different now.
> Yes, I think I wrote that before. I don't know what this has to do
> with the behaviour of “use encoding”, except that historically, “use
> encoding” was intended to convert old byte-oriented scripts to the
> brave new unicode-centered world with minimal effort. (I don't think
> it met that goal: Over the years I have encountered a lot of people
> who had problems with “use encoding”, but I don't remember ever
> reading from someone who successfully converted their scripts by
> slapping “use encoding '...'” at the beginning.)

I didn't convert anything, so I don't pretend you can count me in.
Just now I've come to the conclusion that C<use encoding 'utf8';> (the
only form I've ever used) has the effects of C<use utf8;> plus binmode()
on streams, minus the possibility to make non-US-ASCII literals. I've
always been told that I *must* C<use utf8;> and then do the binmode()s
manually myself. Nobody ever explained why I can't do that with C<use
encoding 'utf8';>.
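
A minimal sketch of the "character string plus output layer" route
mentioned here. Two things are assumptions made for the demonstration:
the literal is written with \x{...} escapes so the result does not depend
on how the file is saved, and the output goes to an in-memory handle
opened with an :encoding(UTF-8) layer, standing in for
binmode(STDOUT, ':encoding(UTF-8)').

```perl
use strict;
use warnings;

# "фыва" written with escapes, so the source file encoding does not matter.
my $text = "\x{444}\x{44B}\x{432}\x{430}";

# On a character string, \w matches the first Cyrillic letter.
my ($first) = $text =~ m{(\w)};

# Print through an encoding layer, which converts characters
# to UTF-8 bytes on output.
my $out = '';
open(my $fh, '>:encoding(UTF-8)', \$out) or die "open: $!";
print {$fh} $first, "\n";
close($fh) or die "close: $!";

print unpack('H*', $out), "\n";    # hex dump of the bytes written
```

The hex dump is d1840a: the two-byte UTF-8 form of U+0444 followed by a
newline.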

Now, C<use encoding 'binary-enc';> behaves as above (you get a fully
functional UTF-8 script, limited only by how far perl has advanced
toward all-utf8), except that the actual source isn't UTF-8. I can
imagine reasons why that could be necessary, though such circumstances
would be rare. I am in approximately full control of my environment, so
it's not a problem for me.

As for the 'lot of people', I'll tell you who I've met. I've seen loads of
13-year-old boys (those are called snowflakes these days) who don't know
how to deal with shit. For those, who don't know how to deal with shit,
jobs.perl.org is the way.

*SKIP*
> (but you do have to call it explicitly for STDERR, which IMNSHO is
> inconsistent).

Think about it. What a terminal presents (in fonts) is locale-dependent.
That locale could be 'POSIX'. There's no 'POSIX.UTF-8'.  And see below.

*SKIP*
>> Except the middle one (what I should think about), I think
>> encoding.pm wins again.
> You didn't understand why the middle one produced this particular
> result. So you were surprised by the way “use encoding” translates
> string literals. I wasn't surprised. I knew how it works and explained
> it to you in my followup. 

That's nice you brought that back. I've already figured it all out.

----
	{0:1} [0:0]% perl -Mutf8 -wle 'print "à"' 
	�
	{23:2} [0:0]% perl -Mutf8 -wle 'print "à "'
	� 
----
	{36271:17} [0:0]% perl -Mutf8 -wle 'print "à"' 

	{36280:18} [0:0]% perl -Mutf8 -wle 'print "à "'
	à 
----

What the two pairs have in common: the string is in Perl's special
latin1, with the UTF8 flag off, and no utf8-related layer is set on
output. What differs: the former is xterm, the latter is urxvt. In
either case, this is what is actually output:

	{36831:20} [0:1]% perl -Mutf8 -wle 'print "à"' | xxd
	0000000: e00a ..
	{37121:21} [0:0]% perl -Mutf8 -wle 'print "à "' | xxd
	0000000: e020 0a . .

So a lone 0xe0 has no business in utf-8 output. xterm replaces it with
the replacement character (which makes sense). In contrast, urxvt
applies some weird heuristic (and it's really weird):

	{37657:28} [0:0]% perl -Mutf8 -wle 'print "àá"' 
	à
	{37663:29} [0:0]% perl -Mutf8 -wle 'print "àáâ"' 
	àá
	{37666:30} [0:0]% perl -Mutf8 -wle 'print "àáâã"'
	àáâ

*If* it's xterm vs. urxvt then, I think, it's religious (that means it's
not going to change). However, it doesn't look configurable, or at least
documented, although it obviously could be useful (if it were
configurable). Then again, it may be some weird interaction with
fontconfig, or xft, or some unnamed perl extension, or whatever else. If
I don't forget, I'll investigate it later, after upgrades.

As for your explanation: it's not precise.  encoding.pm does what it
always does. It doesn't mangle scalars itself; it *hints* Encode.pm
(and friends) to decode from the specified encoding to utf8. (How
Encode.pm comes into play is beyond my understanding for now.) In the
case of C<use encoding 'utf8';> that happens to be decoding from utf-8
to utf8. Encode.pm tries to decode a byte with a value above 0x7F and
falls back to the replacement character.
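
The fallback described here can be observed by calling Encode directly.
This sketch assumes the default CHECK behaviour of decode() and says
nothing about how encoding.pm itself invokes Encode; the input is the
lone Latin-1 byte for "à" followed by a space, which is not valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The lone byte 0xE0 followed by a space; malformed as UTF-8.
my $bytes = "\xE0 ";

# With the default CHECK value, decode() does not die on malformed
# input; it substitutes U+FFFD REPLACEMENT CHARACTER.
my $chars = decode('UTF-8', $bytes);

printf "U+%04X ", ord($_) for split //, $chars;
print "\n";
```

The result is U+FFFD followed by U+0020, consistent with the replacement
glyph shown in the xterm example.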

That may be undesired. And considering this:

	encoding - allows you to write your script in non-ascii or non-utf8

C<use encoding 'utf8';> may constitute abuse. What can I say?  I'm
abusing it. Maybe that's why it works.

*CUT*

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Wed, 07 Nov 2012 10:51:40 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Why "Wide character in print"?
Message-Id: <87625h1y9v.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:

[...]

>> Unicode was originally intended to be a 16 bit code, and Unicode 1.0
>> reflected this: It was 16 bit only and there was no intention to expand
>> it. That was only added in 2.0, about 4 years later (and at that time it
>> was theoretical: The first characters outside of the BMP were defined in
>> Unicode 3.1 in 2001, 9 years after the first release).
>> 
>> So of course anybody who implemented Unicode between 1992 and 1996
>> implemented it as a 16 bit code, because that was what the standard
>> said. Those early adopters include Plan 9, Windows NT, and Java.
>
> Yeah, fair enough, I suppose. It seems obvious in hindsight that 16 bits
> weren't going to be enough, but maybe that isn't fair.

It should have been obvious 'in foresight' that the '16 bit code' of
today will turn into a 22 bit code tomorrow, a 56 bit code a fortnight
from now and then slip back to 18.5 bit two weeks later[*] (the 0.5 bit
introduced by some guy who used to work with MPEG who transferred to the
Unicode consortium), much in the same way the W3C keeps changing the
name of HTML 4.01 strict to give the impression of development beyond
aimlessly moving in circles in the hope that - some day - someone might
choose to adopt it (web developers have shown a remarkable common sense
in this respect).

BTW, there's another aspect of the "all the world is external to perl
and doesn't matter [to us]" nonsense: perl can be embedded. Eg, I
spent a sizable part of my day yesterday writing some Perl code
supposed to run inside of postgres, as part of a UTF-8 based
database. In practice, it is possible to choose a database encoding
which can represent everything which needs to be represented in this
database which is also compatible with Perl, making it feasible to use
it for data manipulation. In theory, that's another "Thing which must
not be done" which - in this case - simply means that avoiding Perl
for such code in favour of a language which gives its users less
gratuitous headaches is preferable.

[*] I keep wondering why the letter T isn't defined as 'vertical
bar' + 'combining overline' (or why A isn't 'greek delta' + 'combining
hyphen' ...)


------------------------------

Date: Tue, 06 Nov 2012 20:52:54 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Why "Wide character in print"?
Message-Id: <5099bef6$50$fuzhry+tra$mr2ice@news.patriot.net>

In <slrnk9gg63.84i.hjp-usenet2@hrunkner.hjp.at>, on 11/05/2012
   at 11:42 PM, "Peter J. Holzer" <hjp-usenet2@hjp.at> said:

>Who is "we"? Before 5.12, you had to make the distinction. Strings
>without the SvUTF8 flag simply didn't have Unicode semantics. Now
>there is the unicode_strings feature, but

 3. 5.8.7 is the last Perl release available on IBM's EBCDIC
    operating systems, e.g., z/OS. I don't know whether there
    is a similar issue with Unisys.

-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3814
***************************************

