[32846] in Perl-Users-Digest

Perl-Users Digest, Issue: 4112 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jan 8 21:14:33 2014

Date: Wed, 8 Jan 2014 18:14:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 8 Jan 2014     Volume: 11 Number: 4112

Today's topics:
    Re: Question about language setting <dave@invalid.invalid>
    Re: Question about language setting <ben@morrow.me.uk>
    Re: Question about language setting <dave@invalid.invalid>
    Re: Question about language setting <rweikusat@mobileactivedefense.com>
    Re: Question about language setting <ben@morrow.me.uk>
    Re: Question about language setting <ben@morrow.me.uk>
    Re: Question about language setting <rweikusat@mobileactivedefense.com>
    Re: Question about language setting (Tim McDaniel)
    Re: Question about language setting (Tim McDaniel)
    Re: Question about language setting <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 8 Jan 2014 14:03:34 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Re: Question about language setting
Message-Id: <fV45K0OBJxbE-pn2-dgcdC6ExUh8V@paddington.bear.den>

On Sun, 29 Dec 2013 14:20:24 UTC, "Peter J. Holzer" 
<hjp-usenet3@hjp.at> wrote:

> On 2013-12-29 13:00, Dave Saville <dave@invalid.invalid> wrote:
> > Interestingly:
> >
> > use strict;
> > use warnings;
> > use Carp;
> >
> > printf("%f\n", 2.5);
> >
> > Using perl 5.16.0
> >
> > [T:\tmp]try.pl
> > Invalid version format (non-numeric data) at 
> > u:/perl5/lib/5.16.0/Carp.pm line 3.
> >
> > BEGIN failed--compilation aborted at u:/perl5/lib/5.16.0/Carp.pm line 
> > 3.
> > Compilation failed in require at try.pl line 3.
> > BEGIN failed--compilation aborted at try.pl line 3.
> >
> > Carp line 3 says { use 5.006; }
> 
> Yup. I think somebody (Rainer?) had already pointed out this line as the
> likely culprit.
> 
> > Using perl 5.8.2
> >
> > [T:\tmp]try.pl
> > perl: warning: Setting locale failed.
> > perl: warning: Please check that your locale settings:
> >         LC_ALL = (unset),
> >         LANG = "de_DE_EURO"
> >     are supported and installed on your system.
> > perl: warning: Falling back to the standard locale ("C").
> > 2.500000
> >
> > Which is a darn sight more useful IMHO.
> >
> > As the OS/2 setlocale() seems to suffer the same problem as the other 
> > OS above is there a perlish way around this one?
> 
> You could try 
> 
>     BEGIN { $ENV{LANG} = 'C' }
> 
> 
> > The problem I have is that I am comparing strings from two sources. 
> > One where the string is in the local code page and the other in utf8. 
> > I solved this by using Encode
> 
> Yes, Encode contains the necessary functions (But depending on the
> source it may be more convenient to use an I/O filter or something
> similar instead of calling Encode::decode() explicitly).
> 
> 
> > and friends but that introduces the 
> > requirement for Carp and the above error :-(
> 
> I'm not sure why using Encode requires the use of Carp, but in any case 
> 
>     { use 5.006; }
> 
> is valid Perl and must compile successfully, regardless of any locale
> settings.
> 
> Is OS/2 still officially supported by Perl? It isn't on
> http://www.cpan.org/ports/ anymore.
> 
> A viable workaround might be to compile perl on OS/2 without locale
> support (since it doesn't seem to work correctly anyway). But that would
> also mean that your users need to use this locale-less perl binary,
> which may break some other scripts they use.

Well, we have found the problem (conflicting .h file definitions) and 
rebuilt 5.16.3, which appears to be OK. But I would like to program 
round it if possible.

The problem occurs when the decimal separator is a comma. So, going on
your previous suggestion,

use strict;
use warnings;
BEGIN:
{
  if ( sprintf("%f", 2.5) =~ m{\,} )
  {
    print "oh dear\n";
    $ENV{LANG} = 'C';
  } 
}
use Encode;
print "Hello world\n";

But that fails too :-(

What is needed is for the use Encode, which triggers the error, not 
to be processed before I have had a chance to fix things by setting 
the locale to C. Or is that not possible? 
-- 
Regards
Dave Saville


------------------------------

Date: Wed, 8 Jan 2014 16:00:32 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Question about language setting
Message-Id: <0a90qa-0jg1.ln1@anubis.morrow.me.uk>


Quoth "Dave Saville" <dave@invalid.invalid>:
> 
> The problem occurs when the decimal separator is a comma. So, going on
> your previous suggestion,
> 
> use strict;
> use warnings;
> BEGIN:

This is not a BEGIN block. This is a label called BEGIN, which is not
the same thing at all. Leave off the colon.
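
To make the difference concrete, here is a minimal sketch (the @order
array is only an illustrative name, not part of the original script).
The block labelled BEGIN: is an ordinary bare block that runs at run
time, while the real BEGIN block runs as soon as the compiler has
finished parsing it:

```perl
use strict;
use warnings;

our @order;

# A bare block carrying a label that happens to be spelled BEGIN.
# It runs at run time, in normal statement order.
BEGIN: {
    push @order, 'labelled block';
}

# A real BEGIN block. It runs during compilation, so it finishes
# before any run-time statement in this file.
BEGIN {
    push @order, 'BEGIN block';
}

print join(', ', @order), "\n";    # BEGIN block, labelled block
```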

> {
>   if ( sprintf("%f", 2.5) =~ m{\,} )

I suspect you don't need to go as far as sprintf; something like

    if (2.5 ne "2.5") {

should be sufficient.

>   {
>     print "oh dear\n";
>     $ENV{LANG} = 'C';
>   } 
> }
> use Encode;
> print "Hello world\n";
> 
> But that fails too :-(
> 
> What is needed is not to process the use Encode - which triggers the 
> error, before I have a chance to fix it by setting to C. Or is that 
> not possible? 

If that code above (as corrected) doesn't work then I suspect it isn't
possible from within Perl: the setlocale call that picks up the bad
locale must happen too soon. (This would not surprise me, since perl is
*trying* to get a clean locale for doing number->string conversions.)

If your perl is built with usesitecustomize (perl -V:usesitecustomize)
then you could try using that, since that code is run extremely early. I
think the only documentation for this feature is in perlrun under -f
(which is the switch that disables it).
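
The same check can be made from inside a script via the Config
module. This is only a sketch of the lookup; the variable names are
illustrative, and whether a sitecustomize.pl hook runs early enough
to help with the locale problem is an untested assumption:

```perl
use strict;
use warnings;
use Config;

# A disabled option shows up as undef in %Config (perl -V prints it
# as the string 'undef'), so compare against the literal 'define'.
my $enabled = (($Config{usesitecustomize} // '') eq 'define') ? 1 : 0;

# perlrun (under -f) names the hook file as $Config{sitelib}/sitecustomize.pl.
my $hook = "$Config{sitelib}/sitecustomize.pl";

print "usesitecustomize: ", ($enabled ? "enabled" : "not enabled"), "\n";
print "hook file would be: $hook\n";
```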

Other than that, you'll just have to arrange for LC_ALL=C to be set in
the external environment. If you use the 'extproc' #! substitute you
might be able to modify that line to set an environment variable.

Ben



------------------------------

Date: Wed, 8 Jan 2014 17:00:06 +0000 (UTC)
From: "Dave Saville" <dave@invalid.invalid>
Subject: Re: Question about language setting
Message-Id: <fV45K0OBJxbE-pn2-deCzdUwrgShs@paddington.bear.den>

On Wed, 8 Jan 2014 16:00:32 UTC, Ben Morrow <ben@morrow.me.uk> wrote:

> 
> Quoth "Dave Saville" <dave@invalid.invalid>:
> > 
> > The problem occurs when the decimal separator is a comma. So, going on
> > your previous suggestion,
> > 
> > use strict;
> > use warnings;
> > BEGIN:
> 
> This is not a BEGIN block. This is a label called BEGIN, which is not
> the same thing at all. Leave off the colon.

Thanks Ben, not used BEGIN before and I guess I automatically typed a 
colon after a "label" :-) 


> 
> > {
> >   if ( sprintf("%f", 2.5) =~ m{\,} )
> 
> I suspect you don't need to go as far as sprintf; something like
> 
>     if (2.5 ne "2.5") {
> 
> should be sufficient.
> 
> >   {
> >     print "oh dear\n";
> >     $ENV{LANG} = 'C';
> >   } 
> > }
> > use Encode;
> > print "Hello world\n";
> > 
> > But that fails too :-(
> > 
> > What is needed is not to process the use Encode - which triggers the 
> > error, before I have a chance to fix it by setting to C. Or is that 
> > not possible? 
> 
> If that code above (as corrected) doesn't work then I suspect it isn't
> possible from within Perl: the setlocale call that picks up the bad
> locale must happen too soon. (This would not surprise me, since perl is
> *trying* to get a clean locale for doing number->string conversions.)
> 

I think it will now - a quick test works here. But I have a problem 
setting a test case environment to match that of the guy who first hit
the problem so I am mailing him test scripts to try.

> If your perl is built with usesitecustomize (perl -V:usesitecustomize)
> then you could try using that, since that code is run extremely early. I
> think the only documentation for this feature is in perlrun under -f
> (which is the switch that disables it).
> 

It's not.

> Other than that, you'll just have to arrange for LC_ALL=C to be set in
> the external environment. If you use the 'extproc' #! substitute you
> might be able to modify that line to set an environment variable.
> 

Last resort :-)

-- 
Regards
Dave Saville


------------------------------

Date: Wed, 08 Jan 2014 17:43:00 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Question about language setting
Message-Id: <8738kywesb.fsf@sable.mobileactivedefense.com>

"Dave Saville" <dave@invalid.invalid> writes:

[...]

> The problem occurs when the decimal separator is a comma. So, going on
> your previous suggestion,
>
> use strict;
> use warnings;
> BEGIN:
> {
>   if ( sprintf("%f", 2.5) =~ m{\,} )
>   {
>     print "oh dear\n";
>     $ENV{LANG} = 'C';
>   } 
> }
> use Encode;
> print "Hello world\n";

Considering

	It is exactly equivalent to

        BEGIN { require Module; Module->import( LIST ); }
	[perldoc -f use]

making that

BEGIN {
	if ( sprintf("%f", 2.5) =~ m{\,} )
	{
		print "It's an invasion!\n";
		$ENV{LANG} = 'C';
	}

        require Encode;
        Encode->import();
}

might make sense. Or possibly (untested)

BEGIN {
	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};
        
        require Encode;
        Encode->import();
}

as this would restrict the modified environment to this block.
This could itself be put into a module, eg

package SafeEncode;

BEGIN {
	.
        .
        .
}

or

package SafeEncode;

sub import
{
	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};

        require Encode;

        splice(@_, 0, 1, 'Encode');
        goto &Encode::import;
}

1;

to have a full-featured replacement module which could even be used to
import arbitrary 'not exported by default' symbols from Encode.
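
The block-scoped behaviour of local on an %ENV entry can be sketched
as follows. This only illustrates the scoping; it does not show
whether the override cures the Carp failure, and the sample LANG
value is simply the one from the warning quoted earlier in the
thread:

```perl
use strict;
use warnings;

$ENV{LANG} = 'de_DE_EURO';    # sample value, for illustration only

my $seen_inside;
{
    # local saves the old value and restores it when the block exits.
    local $ENV{LANG} = 'C';
    $seen_inside = $ENV{LANG};
}

print "inside the block: $seen_inside\n";    # C
print "after the block:  $ENV{LANG}\n";      # de_DE_EURO
```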



------------------------------

Date: Wed, 8 Jan 2014 19:48:07 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Question about language setting
Message-Id: <nkm0qa-muh1.ln1@anubis.morrow.me.uk>


Quoth "Dave Saville" <dave@invalid.invalid>:
> On Wed, 8 Jan 2014 16:00:32 UTC, Ben Morrow <ben@morrow.me.uk> wrote:
> > Quoth "Dave Saville" <dave@invalid.invalid>:
> > > 
> > > BEGIN:
> > 
> > This is not a BEGIN block. This is a label called BEGIN, which is not
> > the same thing at all. Leave off the colon.
> 
> Thanks Ben, not used BEGIN before and I guess I automatically typed a 
> colon after a "label" :-) 

This may or may not help...

The magic blocks (BEGIN, END, CHECK, INIT, UNITCHECK) are actually subs.
Whenever Perl compiles a sub called BEGIN, instead of installing it in
the symbol table as usual it runs it immediately. (The others are pushed
onto internal lists to be run at the appropriate time.) So the 'real'
syntax is

    sub BEGIN { ... }

However, as a special hack (to make Perl look more like awk), you can
leave off the 'sub' for these special subs only. It's usual to do so,
since having multiple 'sub BEGIN's in a program would be rather
confusing.
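
A small sketch of the two spellings side by side (the @trace array is
just an illustrative name). Each block runs at compile time in the
order it is seen, and repeating 'sub BEGIN' does not trigger a
redefinition warning:

```perl
use strict;
use warnings;

our @trace;

sub BEGIN { push @trace, 'sub BEGIN' }          # explicit "sub" keyword
BEGIN     { push @trace, 'plain BEGIN' }        # the usual spelling
sub BEGIN { push @trace, 'second sub BEGIN' }   # a further block, not a replacement

push @trace, 'run time';

print "$_\n" for @trace;
# sub BEGIN
# plain BEGIN
# second sub BEGIN
# run time
```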

Ben



------------------------------

Date: Wed, 8 Jan 2014 19:52:59 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Question about language setting
Message-Id: <rtm0qa-muh1.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
> "Dave Saville" <dave@invalid.invalid> writes:
> 
> [...]
> 
> > The problem occurs when the decimal separator is a comma. So, going on
> > your previous suggestion,
> >
> > use strict;
> > use warnings;
> > BEGIN:
> > {
> >   if ( sprintf("%f", 2.5) =~ m{\,} )
> >   {
> >     print "oh dear\n";
> >     $ENV{LANG} = 'C';
> >   } 
> > }
> > use Encode;
> > print "Hello world\n";
> 
> Considering
> 
> 	It is exactly equivalent to
> 
>         BEGIN { require Module; Module->import( LIST ); }
> 	[perldoc -f use]
> 
> making that
> 
> BEGIN {
> 	if ( sprintf("%f", 2.5) =~ m{\,} )
> 	{
> 		print "It's an invasion!\n";
> 		$ENV{LANG} = 'C';
> 	}
> 
>         require Encode;
>         Encode->import();
> }

Given that BEGINs run in sequence this is exactly equivalent to the
'use'.

> might make sense. Or possibly (untested)
> 
> BEGIN {
> 	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};
>         
>         require Encode;
>         Encode->import();
> }

This would be sensible if Encode were the only module affected. But the
evidence is that all number-to-string conversions are affected, so the
environment variable should be set as early as possible and remain set.

> sub import
> {
> 	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};
> 
>         require Encode;
> 
>         splice(@_, 0, 1, 'Encode');
>         goto &Encode::import;

Encode uses Exporter, so there's no need for that nastiness:

    Encode->Exporter::export(scalar caller);
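
For illustration, a wrapper import built around that call might look
like the sketch below. SafeEncode is the hypothetical package name
from the quoted post, the locale workaround itself is left out, and
the wrapper is kept in the same file only so that the example runs on
its own:

```perl
use strict;
use warnings;

{
    package SafeEncode;    # hypothetical wrapper module

    sub import {
        my ($class, @symbols) = @_;
        my $target = caller;    # package that asked for the import

        require Encode;

        # Let Exporter copy the requested symbols from Encode into
        # $target. With an empty list, Encode's defaults are used.
        Encode->Exporter::export($target, @symbols);
    }
}

# In a real program this would be "use SafeEncode qw(encode decode);".
BEGIN { SafeEncode->import(qw(encode decode)) }

my $bytes = encode('UTF-8', "\x{263A}");
print length($bytes), "\n";    # 3
```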

Ben



------------------------------

Date: Wed, 08 Jan 2014 20:21:42 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Question about language setting
Message-Id: <87y52qusvd.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:

[...]

>> BEGIN {
>> 	if ( sprintf("%f", 2.5) =~ m{\,} )
>> 	{
>> 		print "It's an invasion!\n";
>> 		$ENV{LANG} = 'C';
>> 	}
>> 
>>         require Encode;
>>         Encode->import();
>> }
>
> Given that BEGINs run in sequence this is exactly equivalent to the
> 'use'.

Semantically, yes. But in this case, all the code which logically
belongs together is contained in the begin block.

>> might make sense. Or possibly (untested)
>> 
>> BEGIN {
>> 	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};
>>         
>>         require Encode;
>>         Encode->import();
>> }
>
> This would be sensible if Encode were the only module affected. But the
> evidence is that all number-to-string conversions are affected, so the
> environment variable should be set as early as possible and remain
> set.

It's the purpose of the locale setting to affect numerical
formatting. Hence, if it has to be disabled/ overridden somewhere in
order to avoid a bug, this override should affect the codepath
triggering the bug, not any other, perfectly harmless one which happens
to format a number (or do something else which is influenced by the
locale).

>> sub import
>> {
>> 	local $ENV{LANG} = 'C' if sprintf("%f", 2.5) =~ m{\,};
>> 
>>         require Encode;
>> 
>>         splice(@_, 0, 1, 'Encode');
>>         goto &Encode::import;
>
> Encode uses Exporter, so there's no need for that nastiness:
>
>     Encode->Exporter::export(scalar caller);

This is a perfectly normal and documented way to invoke a subroutine
after some other processing has been performed without the subroutine
being able to notice that an intermediate subroutine ran, cf

	The "goto-&NAME" form is quite different from the other forms of
        "goto".  In fact, it isn't a goto in the normal sense at all,
        and doesn't have the stigma associated with other gotos.
        Instead, it exits the current subroutine (losing any changes set
        by local()) and immediately calls in its place the named
        subroutine using the current value of @_.  This is used by
        "AUTOLOAD" subroutines that wish to load another subroutine and
        then pretend that the other subroutine had been called in the
        first place (except that any modifications to @_ in the current
        subroutine are propagated to the other subroutine.)

But this will kill the local (I didn't think about that), hence, it
won't work in this case. Apart from that, you're absolutely free to
cultivate a philosophical dislike for any particular Perl feature (and
to argue against it) and everyone else is as perfectly free to
consider your opinion misguided and the arguments in favor of it
unconvincing.
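
The effect of goto &NAME on local can be seen in a small sketch (all
names here are illustrative). The sub reached via goto sees the outer
value again, while an ordinary call still sees the localised one:

```perl
use strict;
use warnings;

our $setting = 'outer';

sub target { return $setting }

sub via_goto {
    local $setting = 'inner';
    goto &target;       # leaves via_goto first, so the local is undone
}

sub via_call {
    local $setting = 'inner';
    return target();    # still inside via_call, so the local is in effect
}

print "goto &target: ", via_goto(), "\n";    # outer
print "plain call:   ", via_call(), "\n";    # inner
```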


------------------------------

Date: Wed, 8 Jan 2014 22:59:48 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Question about language setting
Message-Id: <lakl94$ane$1@reader1.panix.com>

In article <nkm0qa-muh1.ln1@anubis.morrow.me.uk>,
Ben Morrow  <ben@morrow.me.uk> wrote:
>The magic blocks (BEGIN, END, CHECK, INIT, UNITCHECK) are actually subs.
>Whenever Perl compiles a sub called BEGIN, instead of installing it in
>the symbol table as usual it runs it immediately. (The others are pushed
>onto internal lists to be run at the appropriate time.)

I just ran

$ perl -w -e 'use strict;BEGIN {print "hi\n";}  print "real\n"; BEGIN();print "end\n"'
hi
real
end

"sub" before "BEGIN" does not change the behavior.  The same happens
for CHECK, INIT, and UNITCHECK.  For END, the END() call similarly
does nothing, so it's real, end, hi.

So it appears to me that they're far from real subs:
- they do allow the "sub" keyword
- they have code blocks, but that's not unique to subs
- they are invoked automatically
- you can define them multiple times without "Subroutine ___
  redefined", but unlike subs, the code blocks are concatenated rather
  than replaced
- calling them neither causes "Undefined subroutine" nor causes code
  to run
- you can stringize \&BEGIN and get "CODE(0xbb9455c0)" or whatever,
  but if you try to call any of them, you get "Undefined subroutine
  &main::BEGIN called" vel sim.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Wed, 8 Jan 2014 23:25:54 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Question about language setting
Message-Id: <lakmq2$a23$1@reader1.panix.com>

To clarify,

In article <lakl94$ane$1@reader1.panix.com>,
Tim McDaniel <tmcd@panix.com> wrote:
>- calling them neither causes "Undefined subroutine" nor causes code
>  to run

I meant "calling them directly like 'BEGIN();'".

>- you can stringize \&BEGIN and get "CODE(0xbb9455c0)" or whatever,
>  but if you try to call any of them, you get "Undefined subroutine
>  &main::BEGIN called" vel sim.

I meant "calling them via a reference like 'my $x = \&BEGIN; $x->();'".

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Thu, 9 Jan 2014 01:59:43 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Question about language setting
Message-Id: <fdc1qa-ecl1.ln1@anubis.morrow.me.uk>


Quoth tmcd@panix.com:
> In article <nkm0qa-muh1.ln1@anubis.morrow.me.uk>,
> Ben Morrow  <ben@morrow.me.uk> wrote:
> >The magic blocks (BEGIN, END, CHECK, INIT, UNITCHECK) are actually subs.
> >Whenever Perl compiles a sub called BEGIN, instead of installing it in
> >the symbol table as usual it runs it immediately. (The others are pushed
> >onto internal lists to be run at the appropriate time.)
> 
> I just ran
> 
> $ perl -w -e 'use strict;BEGIN {print "hi\n";}  print "real\n"; BEGIN();print "end\n"'
> hi
> real
> end
> 
> "sub" before "BEGIN" does not change the behavior.  The same happens
> for CHECK, INIT, and UNITCHECK.  For END, the END() call similarly
> does nothing, so it's real, end, hi.
> 
> So it appears to me that they're far from real subs:
> - they do allow the "sub" keyword
> - they have code blocks, but that's not unique to subs

    - when invoked they get a stack frame visible to caller(), with the
      right sub name (and this stack frame is internally marked as a
      'subroutine' frame, though this is only indirectly visible from
      Perl)

(There is also an outer eval {} frame; this is added as part of the
invoke-a-special-block logic, so that perl can throw the 'FOO failed--
compilation/call queue aborted' error if one of them fails.)

    - when inside one of these blocks the __SUB__ magic token returns a
      reference to the block, not the surrounding sub or file

    - if the debugger is running an (implicit) call to one of these
      blocks will invoke &DB::sub, just like an ordinary sub

    - exiting one of these blocks with 'last' &c. gives an 'Exiting
      subroutine via last' warning

> - they are invoked automatically
> - you can define them multiple times without "Subroutine ___
>   redefined", but unlike subs, the code blocks are concatenated rather
>   than replaced

These are implied by 'they are not installed into the symbol table but
are pushed onto internal lists to be invoked at the proper time'. If you
define an ordinary named sub, remove it from the symbol table (with
undef &sub or by deleting the symbol and replacing it), and then define
it again, you don't get the warning either.

> - calling them neither causes "Undefined subroutine" nor causes code
>   to run

This surprised me, actually; it turns out it's a (probably unexpected)
side-effect of being allowed to omit the 'sub' keyword. This statement:

    BEGIN();

is converted by the lexer into this:

    sub BEGIN();

which is a forward-declaration of a sub called 'BEGIN' with a ()
prototype. It isn't a sub call at all, and in fact if you force it to be
a sub call with
    
    &BEGIN();

you do get the 'undefined subroutine' error. Also, if you then create an
ordinary BEGIN block, like this:

    BEGIN();
    BEGIN { 1; }

you get a 'prototype mismatch' error (so don't do that :) ).
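
The forced call can be tried safely by wrapping it in eval. This is
only a sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

# The leading & forces a real subroutine call. It fails at run time
# because no sub named main::BEGIN is installed in the symbol table.
my $ok    = eval { &BEGIN(); 1 };
my $error = $ok ? '' : $@;

print $ok ? "call succeeded\n" : "call failed: $error";
```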

> - you can stringize \&BEGIN and get "CODE(0xbb9455c0)" or whatever,
>   but if you try to call any of them, you get "Undefined subroutine
>   &main::BEGIN called" vel sim.

You can take a ref to any not-defined sub; the sub is autovivified as
though a forward-declaration but no definition had been seen. You can in
fact add a definition, as long as it doesn't look like a named sub
definition:

    use feature 'say';    # say is not enabled by default

    my $x = \&BEGIN;
    *BEGIN = sub { say "foo" };
    $x->();

(Note that this sub isn't a BEGIN block, and will never be invoked as
one, not even if you call it BEGIN using Sub::Name.)

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4112
***************************************

