[32627] in Perl-Users-Digest

Perl-Users Digest, Issue: 3902 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Mar 15 09:09:15 2013

Date: Fri, 15 Mar 2013 06:09:02 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 15 Mar 2013     Volume: 11 Number: 3902

Today's topics:
    Re: bytes, English, and prototypes <ben@morrow.me.uk>
    Re: Imager::QRCode-ing octet sequences vs. zbarimg(1) <oneingray@gmail.com>
    Re: prototypes? <ben@morrow.me.uk>
    Re: prototypes? <rweikusat@mssgmbh.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 14 Mar 2013 20:59:25 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: bytes, English, and prototypes
Message-Id: <daq91a-m1c2.ln1@anubis.morrow.me.uk>


Quoth Ivan Shmakov <oneingray@gmail.com>:
> >>>>> Ben Morrow <ben@morrow.me.uk> writes:
> >>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
> >>>>> Ben Morrow <ben@morrow.me.uk> writes:
> >>>>> Quoth Ivan Shmakov <oneingray@gmail.com>:
> 
>  >>> You should not use 'bytes'.  It doesn't ever do anything useful and
>  >>> sometimes lets you look at parts of the perl internals you
>  >>> shouldn't be looking at.
> 
>  >> Indeed, I've read the documentation.  It was my understanding that,
>  >> in a nutshell, the "bytes" pragma makes Perl operate strictly on
>  >> octet sequences for its strings, instead of allowing either strings
>  >> of octets /or/ strings of Unicode characters.
> 
>  > That was the original idea, in the 5.6 days, but it never worked like
>  > that in practice because it turns out to be basically impossible to
>  > prevent character strings from leaking in to your 'use bytes'
>  > sections.
> 
>  > If you don't want strings containing characters above 0xff, don't
>  > create them.  'bytes' doesn't gain you anything in that case.
> 
> 	ACK.  I've got three more questions, however:
> 
> 	* how do I ensure that a value passed to my function is an octet
> 	  sequence? (IOW, doesn't contain a code over \xFF);

    $value =~ tr/\0-\xff//dc;
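
The tr/// above silently deletes any character above 0xff. Where a
caller passing a character string should count as an error instead, a
check along these lines may be closer to what was asked. This is only a
sketch; is_octet_string and assert_octets are hypothetical names, not
anything from the thread or from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character in $value fits in one octet.
sub is_octet_string {
    my ($value) = @_;
    return $value !~ /[^\x00-\xff]/;
}

# Hypothetical helper: refuse wide characters instead of deleting them.
sub assert_octets {
    my ($value) = @_;
    die "wide character in octet string\n" unless is_octet_string($value);
    return $value;
}

my $ok  = is_octet_string("caf\xe9");       # all code points <= 0xff
my $bad = is_octet_string("caf\x{20ac}");   # U+20AC is above 0xff
```

The check looks at code points only, so it gives the same answer
whether or not the string happens to be stored internally as UTF-8.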

> 	* how do I ensure that a non-ASCII octet is never considered to
> 	  be a member of, say, the [[:alpha:]] set? as in the following
> 	  code (although, perhaps, of questionable value):

For this you need 5.14, which has a /a regex modifier to do exactly
that. Alternatively you can just use [a-zA-Z] instead; most of the POSIX
character classes are not exactly complicated to specify explicitly.
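
A small sketch of the difference, assuming perl 5.14 or later. The /u
modifier is used here only to force Unicode rules for comparison; the
variable names are illustrative.

```perl
use strict;
use warnings;
use 5.014;   # the /a and /u regex modifiers appeared in 5.14

my $octet = "\xe9";   # e-acute in ISO-8859-1, but just an octet here

# Under Unicode rules, code point 0xE9 counts as alphabetic.
my $unicode_rules = $octet =~ /\A[[:alpha:]]\z/u ? 1 : 0;

# Under /a, POSIX classes match only ASCII characters.
my $ascii_rules   = $octet =~ /\A[[:alpha:]]\z/a ? 1 : 0;

# Spelling the class out works on any perl version.
my $explicit      = $octet =~ /\A[a-zA-Z]\z/ ? 1 : 0;
```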

> 	* is the "It breaks encapsulation" comment in bytes(3perl)
> 	  really justified? if the function in question was designed to
> 	  operate on octet sequences, and not character strings, then
> 	  it's an error for the caller to supply it a character string
> 	  in the first place.

Perl strings have an internal flag called SvUTF8. If this flag is set,
the string is stored internally in UTF-8; in particular, characters
between 0x80 and 0xff are stored as two bytes. Perl will change the
representation of a string as it sees fit; just because a string
currently happens to contain only characters up to 0xff, that doesn't
mean SvUTF8 is off.

In the context of 'use bytes', the string operators all operate on the
internal representation, so you will get different results depending on
whether the string happens to have SvUTF8 set or not. So if a function
happens to return a string with SvUTF8 set, even if that string
currently only contains bytes, you may get the wrong results:

    # in some other module
    sub foo {
        my $x = "\x80\x{100}";
        chop $x;
        return $x;
    }

    use bytes;
    warn length foo();

Also, it is possible to corrupt the internal representation:

    use Devel::Peek;
    use bytes;

    my $x = foo();  # as above
    $x =~ s/.$//;
    Dump $x;

If you could be sure your *entire* program ran under 'bytes', this
probably wouldn't be a problem, but as soon as you call any other code
you can't be sure of that.
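
The effect can be observed without enabling the pragma at all. The
sketch below builds the same string as foo() and compares the character
length with the byte length that 'use bytes' would report. It uses
bytes::length (available after 'use bytes ()') and the core
utf8::is_utf8 and utf8::downgrade functions; the variable names are
illustrative only.

```perl
use strict;
use warnings;
use bytes ();   # makes bytes::length available without enabling the pragma

# Same construction as foo(): the result holds the single character 0x80,
# but SvUTF8 stays on because the string once held U+0100.
my $x = "\x80\x{100}";
chop $x;

my $flag_before  = utf8::is_utf8($x) ? 1 : 0;
my $chars        = length $x;           # character semantics
my $bytes_before = bytes::length($x);   # what code under 'use bytes' sees

# utf8::downgrade() switches to the one-byte-per-character representation.
# The true second argument makes it return false, rather than die, if the
# string holds a character above 0xff.
my $downgraded  = utf8::downgrade($x, 1) ? 1 : 0;
my $bytes_after = bytes::length($x);
```

The string compares equal before and after the downgrade; only the
internal representation, and therefore the byte length, changes.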

>  >>>> use English;
> 
>  >>> You should not use English, it makes your code harder to read for
>  >>> anyone who knows Perl, and teaches you bad habits.
[snippety snip]
> 
> 	I've seen "solutions" to this kind of "problem," such as those
> 	implemented by the designers of Python and Go.  And the only
> 	thing that comes to my mind is the old saying (paraphrased):
> 	"If programmers are so smart, why aren't they walking in
> 	formation?"

OK, whatever. If you post code here which uses English people are likely
to find it harder to read than if it didn't, so you are less likely to
get useful help.

>  >> Is there a practical reason to forgo the compile-time arguments'
>  >> type checking they offer?
> 
>  > They do more than that: they change the context of the parameters to
>  > the call, which is (usually) entirely unexpected:
> 
>  > sub foo (;$$) { say $_[0], $_[1]; }
> 
>  > my @x = ("a", "b");
>  > foo(@x);
> 
> 	Impressive!  (Although I've had to "use feature qw (say);".)

Oh, yeah, sorry; most of what I post is under 'use 5.012;'.
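
For anyone who wants to see the context change without relying on
'say', here is a sketch that returns what each sub received.
with_proto and without_proto are made-up names for illustration.

```perl
use strict;
use warnings;

# Both subs report how many arguments arrived and what they were.
sub with_proto (;$$) { return scalar(@_) . ": " . join(", ", @_) }
sub without_proto    { return scalar(@_) . ": " . join(", ", @_) }

my @x = ("a", "b");

my $proto_result = with_proto(@x);      # @x is evaluated in scalar context
my $plain_result = without_proto(@x);   # @x is flattened into the list
my $bypassed     = &with_proto(@x);     # calling with & ignores the prototype
```

With the prototype in force the sub receives one argument, the element
count of @x, rather than the two elements themselves.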

Ben



------------------------------

Date: Thu, 14 Mar 2013 20:25:52 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: Imager::QRCode-ing octet sequences vs. zbarimg(1)
Message-Id: <87mwu5k9tb.fsf@violet.siamics.net>

>>>>> Ben Morrow <ben@morrow.me.uk> writes:

[...]

 > There is a Perl decoder based on zbar (Barcode::ZBar), though
 > presumably it would behave the same as zbarimg.

	... Or it may not.  It's definitely worth checking out.

[...]

 > So you have a UTF-8 problem somewhere.  (c2 and c3 (or Â and Ã)
 > showing up unexpectedly is the giveaway here.)  Looking at the code,
 > I think it's zbar which is converting 8859-1 to UTF-8; one way to
 > test this is to create a QR code containing 17 0xffs at ECC level L;
 > this is the maximum number of characters that will fit into a 21x21
 > QR code, so if the code comes out bigger than that you know there are
 > extra bytes in there somewhere.

	ACK, thanks!  With qw (level L  margin 0  size 2) being added to
	the parameters, the code now gives (also using $ zbarimg --raw):

Blob:     ffffffffffffffffffffffffffffffffff
Image:    42 by 42
Decoded:  c3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bfc3bf0a
scanned 1 barcode symbols from 1 images in 0.05 seconds

	Thus, unless there's some magic in the resulting QR code saying
	that it's an ISO-8859-1-encoded string (I'm not familiar with QR
	encoding, so can't tell if it's a sensible guess), zbarimg(1)
	is indeed to blame, and perhaps the underlying library, too.
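
	The decoded bytes are consistent with an ISO-8859-1 to UTF-8
	conversion, which can be reproduced with the core Encode module.
	This is only a sketch of the suspected conversion, not zbar's
	actual code; it also shows that the conversion can be undone on
	the receiving side.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $blob = "\xff" x 17;   # the 17-octet payload from the test above

# Treat each octet as an ISO-8859-1 character, then serialise as UTF-8.
my $converted = encode("UTF-8", decode("ISO-8859-1", $blob));
my $hex       = unpack("H*", $converted);

# The conversion is reversible as long as every character is <= 0xff.
my $restored = encode("ISO-8859-1", decode("UTF-8", $converted));
```

	Here $hex holds the same c3bf pairs as the "Decoded" line, minus
	the trailing 0a, which is the newline zbarimg prints.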

 > However, it's not unlikely that other QR code readers will do similar
 > conversions to UTF-8, or other stupid things.  Depending on what
 > you're doing it might be safer to explicitly UTF-8-encode your data
 > (all 8-bit data can be represented in UTF-8) and then decode it on
 > the other end.  Of course, this will make the codes a little larger
 > than they need to be.

	In this case, there'd indeed be some benefit from using the
	smallest-possible image.  OTOH, I do not expect the problem of
	interoperability to arise anytime soon.

[...]

-- 
FSF associate member #7257


------------------------------

Date: Thu, 14 Mar 2013 20:23:31 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: prototypes?
Message-Id: <37o91a-nfb2.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> >> 
> >> I searched for this a while ago, after stumbling
> >> over all kinds of other 'strange modifications' of texts I remembered,
> >> ultimately triggered by the decision to remove the various 'OO
> >> tutorial texts' from the Perl distribution in favor of strongly
> >> suggesting that no one should want to learn about Perl OO, that
> >> people who use it are not exactly sane (something like 'You can find
> >> the reference documentation in ..., in case you have to maintain code
> >> written in this style') and that everybody should just download this
> >> or that (or maybe another) CPAN module and didn't find
> >> it. Consequently, I assumed that it had been removed because of
> >> 'political incorrectness' as well.
> >
> > Again, this is manifest nonsense: nothing has been removed.
> 
> This is, as I already posted here some time ago, what the perl 5.16.0
> changes document has been claiming since 2012, the corresponding text
> is
> 
> 	Removed Documentation
> 	Old OO Documentation
> 
> 	The old OO tutorials, perltoot, perltooc, and perlboot, have
> 	been removed. The perlbot (bag of object tricks) document has
> 	been removed as well.

Hmm. Interesting; I see what's happened: the file pod/perltoot.pod is
still there, but it is now just a stub. Sorry, I hadn't realised that.

> > Please stop spreading falsehoods without making at least a cursory
> > attempt to check your facts.
> 
> Ditto.

I made a cursory attempt: I checked the file was still there in blead,
and while doing so I found the new perlootut, which I hadn't seen
before. I didn't think to check if it had been changed.

Ben



------------------------------

Date: Thu, 14 Mar 2013 21:31:12 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: prototypes?
Message-Id: <87d2v17jof.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:

[...]

>>>> the decision to remove the various 'OO
>>>> tutorial texts' from the Perl distribution

[...]

>>> Again, this is manifest nonsense: nothing has been removed.
>> 
>> This is, as I already posted here some time ago, what the perl 5.16.0
>> changes document has been claiming since 2012, the corresponding text
>> is
>> 
>> 	Removed Documentation
>> 	Old OO Documentation
>> 
>> 	The old OO tutorials, perltoot, perltooc, and perlboot, have
>> 	been removed. The perlbot (bag of object tricks) document has
>> 	been removed as well.
>
> Hmm. Interesting; I see what's happened: the file pod/perltoot.pod is
> still there, but it is now just a stub. Sorry, I hadn't realised that.
>
>> > Please stop spreading falsehoods without making at least a cursory
>> > attempt to check your facts.
>> 
>> Ditto.
>
> I made a cursory attempt: I checked the file was still there in
> blead,

What I was referring to was something like this,

http://philosophy.lander.edu/oriental/charity.html

or this

http://en.wikipedia.org/wiki/Principle_of_humanity

which could be pulled together more succinctly as 'Someone being
convinced of $something which seems totally wrong could mean that a
piece of information necessary for understanding $something correctly
is missing'.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3902
***************************************

