[33168] in Perl-Users-Digest


Perl-Users Digest, Issue: 4447 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 8 05:17:23 2015

Date: Mon, 8 Jun 2015 02:17:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 8 Jun 2015     Volume: 11 Number: 4447

Today's topics:
        JSON and Unicode, am I missing something? <*@eli.users.panix.com>
    Re: JSON and Unicode, am I missing something? <rweikusat@mobileactivedefense.com>
    Re: JSON and Unicode, am I missing something? <whynot@pozharski.name>
    Re: JSON and Unicode, am I missing something? <*@eli.users.panix.com>
    Re: JSON and Unicode, am I missing something? <rweikusat@mobileactivedefense.com>
    Re: JSON and Unicode, am I missing something? <rweikusat@mobileactivedefense.com>
    Re: push/shift/keys/... on refs (was: A hash of referen <whynot@pozharski.name>
    Re: push/shift/keys/... on refs <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 5 Jun 2015 23:38:16 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: JSON and Unicode, am I missing something?
Message-Id: <eli$1506051909@qz.little-neck.ny.us>

The JSON module claims to expect UTF-8, but it doesn't seem to like it.
I get the same "Wide character in subroutine entry" error for external
UTF-8 data files read { open(FH, '<:encoding(UTF-8)', $jsonfile) },
UTF-8 included in strings in "use utf8;" source, and \x{} escapes for
Unicode characters.

  :r! cat /tmp/test-json
  #!/usr/bin/perl -w
  use JSON;
  use strict;
  use vars qw( $json $data );

  # begin flailing for a fix to "Wide character in subroutine entry" {
  use diagnostics;
  use feature 'unicode_strings';
  use utf8;
  binmode STDIN, ':utf8';
  binmode STDERR, ':utf8';
  binmode STDOUT, ':utf8';
  no warnings 'utf8';
  # } end flailing

  $json = qq![
    {
	"unicode": "U+2512",
	"highbit": "\x{2512}"
    }
  ]!;

  $data = decode_json $json;
  __END__

  :r! env -i /usr/local/bin/perl5.14.1 /tmp/test-json
  Wide character in subroutine entry at /tmp/test-json line 23 (#1)
      (S utf8) Perl met a wide character (>255) when it wasn't expecting
      one.  This warning is by default on for I/O (like print).  The easiest
      way to quiet this warning is simply to add the :utf8 layer to the
      output, e.g. binmode STDOUT, ':utf8'.  Another way to turn off the
      warning is to add no warnings 'utf8'; but that is often closer to
      cheating.  In general, you are supposed to explicitly mark the
      filehandle with an encoding, see open and "binmode" in perlfunc.
      
  Uncaught exception from user code:
	  Wide character in subroutine entry at /tmp/test-json line 23.
   at /tmp/test-json line 23

  :r! env -i /usr/local/bin/perl5.20.2 /tmp/test-json
  Use of uninitialized value $^WARNING_BITS in bitwise xor (^) at /usr/local/lib/perl5/site_perl/5.14.1/common/sense.pm line 237.
  Use of uninitialized value $^WARNING_BITS in bitwise xor (^) at /usr/local/lib/perl5/site_perl/5.14.1/common/sense.pm line 237.
  Wide character in subroutine entry at /tmp/test-json line 23 (#1)
      (S utf8) Perl met a wide character (>255) when it wasn't expecting
      one.  This warning is by default on for I/O (like print).  The easiest
      way to quiet this warning is simply to add the :utf8 layer to the
      output, e.g. binmode STDOUT, ':utf8'.  Another way to turn off the
      warning is to add no warnings 'utf8'; but that is often closer to
      cheating.  In general, you are supposed to explicitly mark the
      filehandle with an encoding, see open and "binmode" in perlfunc.
      
  Uncaught exception from user code:
	  Wide character in subroutine entry at /tmp/test-json line 23.

  :r! /usr/local/bin/perl5.20.2 -v

  This is perl 5, version 20, subversion 2 (v5.20.2) built for i386-netbsd-thread-multi

  Copyright 1987-2015, Larry Wall

  Perl may be copied only under the terms of either the Artistic License or the
  GNU General Public License, which may be found in the Perl 5 source kit.

  Complete documentation for Perl, including FAQ lists, should be found on
  this system using "man perl" or "perldoc perl".  If you have access to the
  Internet, point your browser at http://www.perl.org/, the Perl Home Page.


That's running on the Panix hosts, where I have my personal webspace. Panix
keeps multiple versions of perl around, currently nine between 5.00403
and 5.20.2. I get the same results on Ubuntu (12.04.4) with the packaged
perl:

This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 57 registered patches, see perl -V for more detail)

  $ /usr/bin/perl test-json
  Wide character in subroutine entry at test-json line 21 (#1)
      (S utf8) Perl met a wide character (>255) when it wasn't expecting
      one.  This warning is by default on for I/O (like print).  The easiest
      way to quiet this warning is simply to add the :utf8 layer to the
      output, e.g. binmode STDOUT, ':utf8'.  Another way to turn off the
      warning is to add no warnings 'utf8'; but that is often closer to
      cheating.  In general, you are supposed to explicitly mark the
      filehandle with an encoding, see open and "binmode" in perlfunc.
      
  Uncaught exception from user code:
	  Wide character in subroutine entry at test-json line 21.
   at test-json line 21
  $

(That test-json doesn't have the two "flailing" comments, so different
line numbers.)

What am I missing here?

Elijah
------
has other code using the JSON module that just seems to work


------------------------------

Date: Sat, 06 Jun 2015 15:50:32 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: JSON and Unicode, am I missing something?
Message-Id: <87oaksevrb.fsf@doppelsaurus.mobileactivedefense.com>

Eli the Bearded <*@eli.users.panix.com> writes:
> The JSON module claims to expect UTF-8, but it doesn't seem to like it.
> I get the same "Wide character in subroutine entry" error for external
> UTF-8 data files read { open(FH, '<:encoding(UTF-8)', $jsonfile) },
> UTF-8 included in strings in "use utf8;" source, and \x{} escapes for
> Unicode characters.
>
>   :r! cat /tmp/test-json
>   #!/usr/bin/perl -w
>   use JSON;
>   use strict;
>   use vars qw( $json $data );
>
>   # begin flailing for a fix to "Wide character in subroutine entry" {
>   use diagnostics;
>   use feature 'unicode_strings';
>   use utf8;
>   binmode STDIN, ':utf8';
>   binmode STDERR, ':utf8';
>   binmode STDOUT, ':utf8';
>   no warnings 'utf8';
>   # } end flailing
>
>   $json = qq![
>     {
> 	"unicode": "U+2512",
> 	"highbit": "\x{2512}"
>     }
>   ]!;
>
>   $data = decode_json $json;
>   __END__

You aren't passing 'utf-8' into the function but a Perl string
containing wide characters. For this example, you'd either need to use
the interface which accepts unicode or 'encode' your data into UTF-8
which basically means turning off the 'utf8' flag. Example with
everything not serving any purpose removed (tested with 5.14.2)

--------
  #!/usr/bin/perl -w
  use JSON;
  use strict;
  use vars qw( $json $data );
  use Encode;

  # begin flailing for a fix to "Wide character in subroutine entry" {
  use diagnostics;
  use feature 'unicode_strings';
  # } end flailing

  $json = qq![
    {
	"unicode": "U+2512",
	"highbit": "\x{2512}"
    }
  ]!;

  $data = from_json($json);
  $data = decode_json(encode('utf-8', $json));
---------

There is no UTF-8 in your source code and none of the STD*-streams is
used for anything (in addition to being a weird idea, the virtual
top-secret internal Perl encoding also seems to be a tad bit too
complicated to be easily understood ...).
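The distinction between a character string and UTF-8 bytes can be shown in a few lines. This is only a minimal sketch: it assumes the core JSON::PP module in place of JSON so that it runs without anything from CPAN, and it uses `JSON::PP->new->decode` as the character-string interface where the example above uses `from_json`. The `highbit` key is taken from the original test data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use JSON::PP;

# A Perl character string holding one wide character, U+2512.
my $chars = qq!{"highbit": "\x{2512}"}!;

# decode_json expects UTF-8 encoded bytes, so handing it the
# character string croaks; capture the message instead of dying.
my $err = eval { decode_json($chars); 1 } ? '' : $@;

# Option 1: encode the characters to UTF-8 bytes first.
my $from_bytes = decode_json(encode('UTF-8', $chars));

# Option 2: use a decoder without the utf8 option, which takes
# character strings as they are.
my $from_chars = JSON::PP->new->decode($chars);

print  "error: $err";
printf "bytes: U+%04X\n", ord $from_bytes->{highbit};
printf "chars: U+%04X\n", ord $from_chars->{highbit};
```

Both options should yield the same decoded character; which one fits depends on whether the data arrives as bytes or has already been decoded.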




------------------------------

Date: Sat, 06 Jun 2015 14:11:10 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: JSON and Unicode, am I missing something?
Message-Id: <slrnmn5lae.is0.whynot@orphan.zombinet>

with <eli$1506051909@qz.little-neck.ny.us> Eli the Bearded wrote:

*SKIP*
>   Uncaught exception from user code:
> 	  Wide character in subroutine entry at /tmp/test-json line 23.
>    at /tmp/test-json line 23
*SKIP*
> What am I missing here?

Evidence, of course.  I don't understand why your site's perl hides this:

	Uncaught exception from user code:
		Wide character in subroutine entry at /home/whynot/foo.hgT42R.pl line 23
	 at /usr/share/perl5/JSON/backportPP.pm line 654
		JSON::PP::PP_decode_json('JSON::PP=HASH(0x88b493c)', '[
	    {
		"unicode": "U+2512",
		"highbit": "┒"
	 ...', 0) called at /usr/share/perl5/JSON/backportPP.pm line 149
		JSON::PP::decode('JSON::PP=HASH(0x88b493c)', '[
	    {
		"unicode": "U+2512",
		"highbit": "┒"
	 ...') called at /usr/share/perl5/JSON/backportPP.pm line 111
		JSON::PP::decode_json('[
	    {
		"unicode": "U+2512",
		"highbit": "┒"
	 ...') called at /home/whynot/foo.hgT42R.pl line 23

For me /usr/share/perl5/JSON/backportPP.pm around line#654 looks like
this:

	651:	        ($utf8, $relaxed, $loose, $allow_bigint, $allow_barekey, $singlequote)
	652:	            = @{$idx}[P_UTF8, P_RELAXED, P_LOOSE .. P_ALLOW_SINGLEQUOTE];
	653 	
	654:	        if ( $utf8 ) {
	655:	            utf8::downgrade( $text, 1 ) or Carp::croak("Wide character in subroutine entry");
	656 	        }
	657 	        else {
	658:	            utf8::upgrade( $text );
	659 	        }
	660 	

HTH?
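
The check on line 655 can be tried on its own. A minimal sketch, assuming nothing beyond core Perl; the variable names are made up for illustration. utf8::downgrade with a true second argument returns false instead of dying when a string holds a character above 255, and that false value is what triggers the croak.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $octets = "\xE2\x94\x92";   # the UTF-8 encoding of U+2512, as three bytes
my $wide   = "\x{2512}";       # the same character as a single code point

# A true second argument makes utf8::downgrade return false on
# failure rather than die.
my $octets_ok = utf8::downgrade($octets, 1) ? 1 : 0;
my $wide_ok   = utf8::downgrade($wide,   1) ? 1 : 0;

print "octets: $octets_ok, wide: $wide_ok\n";   # octets: 1, wide: 0
```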

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sun, 7 Jun 2015 06:47:49 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: JSON and Unicode, am I missing something?
Message-Id: <eli$1506070229@qz.little-neck.ny.us>

In comp.lang.perl.misc,
Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
> You aren't passing 'utf-8' into the function but a Perl string
> containing wide characters. For this example, you'd either need to use
> the interface which accepts unicode or 'encode' your data into UTF-8
> which basically means turning off the 'utf8' flag. Example with
> everything not serving any purpose removed (tested with 5.14.2)

I've lost you at "accepts unicode". UTF-8 is Unicode. It is not the
only Unicode encoding, but it is one of the more common ones. \x{2512}
is a reference to a defined Unicode code point. It is Unicode.

    :r! cat mktest-json
    #!/usr/bin/perl -w
    use strict;
    use vars qw( $file $json );
    my $file = '/tmp/json-data';
    my $json = qq![
	{
	    "unicode": "U+2512",
	    "highbit": "\x{2512}"
	}
    ]!;

    if(!open(JSON, '>:encoding(UTF-8)', $file)) {
      die "$0: oops: $file $!\n";
    }

    print JSON $json;
    close JSON;
    __END__

    :r! perl5.14.2 mktest-json; file /tmp/json-data
    /tmp/json-data: UTF-8 Unicode text

    :r! cat test-json
    #!/usr/bin/perl -w
    use strict;
    use JSON;
    use vars qw( $json $data $file );
    $file = '/tmp/json-data';
    $json = '';

    if(!open(JSON, '<:encoding(UTF-8)', $file)) {
      die "$0: oops: $file $!\n";
    }
    while (<JSON>) { $json .= $_; }
    close JSON;

    $data = decode_json $json;
    __END__

    :r! perl5.14.2 test-json
    Wide character in subroutine entry at test-json line 14.


But rereading this sentence:
> You aren't passing 'utf-8' into the function but a Perl string
> containing wide characters.

I think you are trying to say that because I've informed Perl of the
file encoding, I'm running into issues since Perl is decoding from UTF-8
to its internal encoding, and then that internal encoding is breaking
the 'this is uninterpreted UTF-8' requirement of the decode_json()
function.

That's a subtlety that yes, I can see myself overlooking. And indeed,
that does seem to fix it:

    :r! cat test-json-noencode
    #!/usr/bin/perl -w
    use strict;
    use JSON;
    use vars qw( $json $data $file );
    $file = '/tmp/json-data';
    $json = '';

    if(!open(JSON, '<', $file)) {
      die "$0: oops: $file $!\n";
    }
    while (<JSON>) { $json .= $_; }
    close JSON;

    $data = decode_json $json;
    __END__

    :r! perl5.14.2 test-json-noencode


(No output at all, of course, being the expected result of that test.)
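
The two ways of reading the file can be put side by side. This is a sketch under assumptions: it uses the core JSON::PP module rather than JSON, writes its own temporary file instead of /tmp/json-data, and `slurp` is a hypothetical helper, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP;

# Write a small UTF-8 encoded JSON file to test against.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} qq!{"highbit": "\x{2512}"}!;
close $out or die "close: $!";

# Hypothetical helper: read a whole file through the given layer.
sub slurp {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path or die "$path: $!";
    local $/;                      # no record separator: read everything
    return scalar <$fh>;
}

# Undecoded bytes go to decode_json ...
my $raw = decode_json(slurp('<:raw', $file));

# ... while text already decoded by the I/O layer goes to a decoder
# without the utf8 option.
my $cooked = JSON::PP->new->decode(slurp('<:encoding(UTF-8)', $file));

printf "raw: U+%04X, cooked: U+%04X\n",
    ord $raw->{highbit}, ord $cooked->{highbit};
```

Either pairing works; the error only appears when decoded text is handed to the bytes-only interface.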

Elijah
------
can now proceed to fix the actual code


------------------------------

Date: Sun, 07 Jun 2015 12:21:47 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: JSON and Unicode, am I missing something?
Message-Id: <874mmjojas.fsf@doppelsaurus.mobileactivedefense.com>

Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.perl.misc,
> Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
>> You aren't passing 'utf-8' into the function but a Perl string
>> containing wide characters. For this example, you'd either need to use
>> the interface which accepts unicode or 'encode' your data into UTF-8
>> which basically means turning off the 'utf8' flag. Example with
>> everything not serving any purpose removed (tested with 5.14.2)
>
> I've lost you at "accepts unicode".

You've lost your understanding of the module documentation I was quoting
at that point: 'Accepts unicode' means 'accepts a string marked as
UTF-8' which is not the same as 'a UTF-8 string' because the latter is a
sequence of bytes without encoding information (NB: This is again an
almost verbatim quote from the documentation)

> UTF-8 is Unicode.

UTF-8 is a scheme for encoding numbers whose values can't be
represented with only 7 value bits.
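
For the code point from the test data, the encoding step looks like this. A minimal sketch using the core Encode module; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char  = "\x{2512}";               # one character, code point 0x2512
my $bytes = encode('UTF-8', $char);   # its UTF-8 byte sequence

# Show each byte as two hex digits.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;

printf "%d character, %d bytes: %s\n", length $char, length $bytes, $hex;
# 1 character, 3 bytes: e2 94 92

# Decoding the bytes gives back the original character.
my $round_trip = decode('UTF-8', $bytes);
```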


------------------------------

Date: Sun, 07 Jun 2015 12:40:36 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: JSON and Unicode, am I missing something?
Message-Id: <87oakrvj9n.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> In comp.lang.perl.misc,
>> Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
>>> You aren't passing 'utf-8' into the function but a Perl string
>>> containing wide characters. For this example, you'd either need to use
>>> the interface which accepts unicode or 'encode' your data into UTF-8
>>> which basically means turning off the 'utf8' flag. Example with
>>> everything not serving any purpose removed (tested with 5.14.2)
>>
>> I've lost you at "accepts unicode".
>
> You've lost your understanding of the module documentation I was quoting
> at that point:

This is sort-of silly but "gebranntes Kind scheut den Ofen". The
corresponding passage is

    # option-acceptable interfaces (expect/generate UNICODE by default)

    $json_text   = to_json( $perl_scalar, { ascii => 1, pretty => 1 } );
    $perl_scalar = from_json( $json_text, { utf8  => 1 });
    
            


------------------------------

Date: Sat, 06 Jun 2015 14:34:35 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: push/shift/keys/... on refs (was: A hash of references to arrays of references to hashes... is there a better way?)
Message-Id: <slrnmn5mmb.is0.whynot@orphan.zombinet>

with <slrnm760f4.5p3.whynot@orphan.zombinet> Eric Pozharski wrote:
> with <slrnm73qmm.pgj.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:

*SKIP*
>> What? That's one of the most useful additions to the Perl language in
>> recent years. It makes my code a lot less cluttered. I really hope you
>> are wrong.  Source?

	http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209170.html

There's no consensus between developers and users though.  Developers
are SHOCKED -- users don't care about consistency!

*CUT*

p.s.  And I didn't do much fact checking.  Googling "auto deref" proved
useless.  At least now I have a name.

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sun, 07 Jun 2015 16:50:13 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: push/shift/keys/... on refs
Message-Id: <87zj4bo6ve.fsf@doppelsaurus.mobileactivedefense.com>

Eric Pozharski <whynot@pozharski.name> writes:
> with <slrnm760f4.5p3.whynot@orphan.zombinet> Eric Pozharski wrote:
>> with <slrnm73qmm.pgj.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
>
> *SKIP*
>>> What? That's one of the most useful additions to the Perl language in
>>> recent years. It makes my code a lot less cluttered. I really hope you
>>> are wrong.  Source?
>
> 	http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209170.html
>
> There's no consensus between developers and users though.  Developers
> are SHOCKED -- users don't care about consistency!

People generally don't care about problems which don't affect
themselves. Eg, C coders often shun preprocessor directives for
conditional compilation, presumably, because what the thing they
call a text editor calls C language support doesn't work well in concert
with these. If this results in users having to pay for features they're
not interested in then, well, that's someone else's problem. Similarly,
people who managed to get all worked up over having to type

push(@{$_[0]},

"I know that I mean that, why doesn't the stupid computer, too!!1"
probably couldn't care less about '@{} overloading' and other strange
stuff they'd never want to use, anyway.

Nice post in this thread:

http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209444.html

NB: 'auto-deref' (like 'say') is one of these new features I'd never
consider using because breaking compatibility with older Perls in order
to avoid typing three characters is IMHO syntax fixation gone totally
mad ---- I have much more difficult problems to cope with than being
haunted by @{}s ...
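
For comparison, the explicit form in full. A minimal sketch; `append_to` is a made-up name for illustration. The explicit dereference should run unchanged on any Perl 5, whereas `push $ref, ...` was only accepted as an experimental feature from 5.14 and was removed again in 5.24.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append values to the array behind a reference.
sub append_to {
    my $list = shift;
    push @{$list}, @_;        # explicit dereference, no auto-deref
    return scalar @{$list};   # new element count
}

my @queue = (1, 2);
my $count = append_to(\@queue, 3, 4);
print "$count elements: @queue\n";   # 4 elements: 1 2 3 4
```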




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4447
***************************************

