[31304] in Perl-Users-Digest


Perl-Users Digest, Issue: 2549 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 13 21:09:43 2009

Date: Thu, 13 Aug 2009 18:09:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 13 Aug 2009     Volume: 11 Number: 2549

Today's topics:
    Re: end-of-line conventions <ben@morrow.me.uk>
    Re: end-of-line conventions <tadmc@seesig.invalid>
    Re: end-of-line conventions <no.email@please.post>
    Re: end-of-line conventions <ben@morrow.me.uk>
    Re: end-of-line conventions <heiko@hexco.de>
    Re: end-of-line conventions <smallpond@juno.com>
    Re: end-of-line conventions <ben@morrow.me.uk>
    Re: end-of-line conventions <nat.k@gm.ml>
    Re: Function prototype (Tim McDaniel)
    Re: Function prototype <ben@morrow.me.uk>
    Re: more than one statement in a post perlish condition <nat.k@gm.ml>
    Re: overridden method in perltoot <cmic@live.fr>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 13 Aug 2009 21:03:39 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: end-of-line conventions
Message-Id: <r58el6-283.ln1@osiris.mauzo.dyndns.org>


Quoth kj <no.email@please.post>:
> 
> There are three major conventions for the end-of-line marker:
> "\n", "\r\n", and "\r".
> 
> In a variety of situations, Perl must split strings into "lines",
> and must therefore follow a particular convention to identify line
> boundaries.  There are three situations that interest me in
> particular:  1. the splitting into lines that happens when one
> iterates over a file using the <> operator; 2. the meaning of the
> operation performed by chomp; and 3. the meaning of the $ anchor
> in regular expressions.
> 
> These three issues are tested by the following simple script:
> 
> my $lines = my $matches = 0;
> while (<>) {
>   $lines++;
>   if (/z$/) {
>     $matches++;
>     chomp;
>     print ">$_<";
>   }
> }
> 
> print "$/$matches matches out of $lines lines$/";
> __END__
> 
> I have three files, unix.txt, dos.txt, and mac.txt, each containing
> four lines.  Disregarding the end-of-line character(s) these lines
> are "foo", "bar", "baz", "frobozz".
> 
> The file unix.txt uses "\n" to separate the lines.  The output that
> I get when I pass it as the argument to the script is this:
> 
> % demo.pl unix.txt
> >baz<>frobozz<
> 2 matches out of 4 lines
> 
> The file dos.txt uses "\r\n" to separate lines, and the file mac.txt
> uses "\r".  Here's the output I get when I pass these files to the
> script:
> 
> % demo.pl dos.txt
> 
> 0 matches out of 4 lines
> % demo.pl mac.txt
> 
> 0 matches out of 1 lines
> 
> How can I change the script so that the output for unix.txt, dos.txt,
> and mac.txt will be the same as the one shown above for unix.txt?

I would use PerlIO::eol, but I'm not sure how to integrate that into a
script using magic <>. It's possible that something like

    BEGIN { binmode ARGV, ":raw:eol" }

will work; if not, you will need to loop over @ARGV and open the files
with the :eol layer yourself. (You could, I suppose, use

    use open ":std", ":raw:eol";

but that will affect all filehandles in your program.)
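
A minimal sketch of the "loop over the files and open them yourself"
approach. The name each_line and its arguments are made up for
illustration. It assumes that a layer string such as ":raw:eol(LF)" is
what you would pass once the CPAN module PerlIO::eol is installed; the
default used here is plain ":raw", so the sketch runs with core Perl only
and does no newline translation by itself.

```perl
use strict;
use warnings;

# Open each named file with an explicit PerlIO layer string and hand
# every line to a callback. each_line is a hypothetical helper.
# Assumption: with PerlIO::eol installed you would pass something like
# ":raw:eol(LF)" as the third argument; ":raw" is only a stand-in default.
sub each_line {
    my ($files, $callback, $layers) = @_;
    $layers = ':raw' unless defined $layers;
    for my $file (@$files) {
        open my $fh, "<$layers", $file
            or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            $callback->($line, $file);
        }
        close $fh;
    }
    return;
}
```

A script would call it as each_line(\@ARGV, sub { ... }) in place of the
magic while (<>) loop.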

Ben



------------------------------

Date: Thu, 13 Aug 2009 15:27:30 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: end-of-line conventions
Message-Id: <slrnh88t9r.nog.tadmc@tadmc30.sbcglobal.net>

kj <no.email@please.post> wrote:
>
>
> Subject: end-of-line conventions


Have you read the "Newlines" section in

    perldoc perlport

??


> There are three major conventions for the end-of-line marker:
> "\n", "\r\n", and "\r".
>
> In a variety of situations, Perl must split strings into "lines",
> and must therefore follow a particular convention to identify line
> boundaries.  


perl detects its platform when it is *compiled*.

That is, perl decides what line ending to use when it is built.


> The file dos.txt uses "\r\n" to separate lines, and the file mac.txt
> uses "\r". 

> How can I change the script so that the output for unix.txt, dos.txt,
> and mac.txt will be the same as the one shown above for unix.txt?


You can't.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Thu, 13 Aug 2009 20:53:43 +0000 (UTC)
From: kj <no.email@please.post>
Subject: Re: end-of-line conventions
Message-Id: <h61ugm$fqn$1@reader1.panix.com>

In <slrnh88t9r.nog.tadmc@tadmc30.sbcglobal.net> Tad J McClellan <tadmc@seesig.invalid> writes:

>kj <no.email@please.post> wrote:
>>
>>
>> Subject: end-of-line conventions


>Have you read the "Newlines" section in

>    perldoc perlport

>??


>> There are three major conventions for the end-of-line marker:
>> "\n", "\r\n", and "\r".
>>
>> In a variety of situations, Perl must split strings into "lines",
>> and must therefore follow a particular convention to identify line
>> boundaries.  


>perl detects its platform when it is *compiled*.

>That is, perl decides what line ending to use when it is built.


>> The file dos.txt uses "\r\n" to separate lines, and the file mac.txt
>> uses "\r". 

>> How can I change the script so that the output for unix.txt, dos.txt,
>> and mac.txt will be the same as the one shown above for unix.txt?


>You can't.


Mind-blowing, to say the least...

Oh, well.  Live and lurn.  Thanks.  And to Ben too.

kynn


------------------------------

Date: Thu, 13 Aug 2009 21:59:50 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: end-of-line conventions
Message-Id: <6fbel6-7541.ln1@osiris.mauzo.dyndns.org>


Quoth Tad J McClellan <tadmc@seesig.invalid>:
> kj <no.email@please.post> wrote:
> >
> > There are three major conventions for the end-of-line marker:
> > "\n", "\r\n", and "\r".
> >
> > In a variety of situations, Perl must split strings into "lines",
> > and must therefore follow a particular convention to identify line
> > boundaries.  
> 
> 
> perl detects its platform when it is *compiled*.
> 
> That is, perl decides what line ending to use when it is built.

This isn't strictly true. The C compiler used determines what numeric
values to associate with the characters "\n" and "\r"; on all non-EBCDIC
non-Mac-OS-Classic systems (including Win32 and Mac OS X) they are 10
and 13 respectively. 
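
A quick way to check these values on a given build (a sketch; the helper
name newline_codes is made up for illustration):

```perl
use strict;
use warnings;

# Return the numeric codes this perl associates with "\n" and "\r".
sub newline_codes {
    return (ord("\n"), ord("\r"));
}

# On an ASCII-based platform this is expected to report 10 and 13.
printf "LF=%d CR=%d\n", newline_codes();
```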

With modern perls (certainly since 5.8.0; I'm not sure what happened
with 5.6) perl decides at build time what default PerlIO layers to use;
on Win32 (and some other systems) this will include :crlf, which
translates "\r\n" newlines into "\n" on input and vice-versa on output.
Internally perl always considers a newline to be whatever the C compiler
calls "\n". (Presumably this means Perl on OS X can't read Mac-native
"\r"-separated files without help.) 

This default can be changed, in several ways. It can be changed for
individual filehandles with binmode; for a given lexical scope with the
'open' pragma; and for the whole process by running perl with the PERLIO
environment variable set. I would *always* recommend that anyone wanting
to read text files either sets all filehandles to :raw mode and handles
newlines manually or uses the :eol PerlIO layer. IMHO the :crlf layer is
not useful, and trying to do anything clever with it can have very
surprising results.
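
A sketch of the "set :raw and handle newlines manually" option, assuming
the file is small enough to read in one go. The helper names
normalize_eol and slurp_normalized are made up for illustration.

```perl
use strict;
use warnings;

# Turn "\r\n" and lone "\r" line endings into "\n" (octal 012).
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015\012?/\012/g;
    return $text;
}

# Read a whole file with no layer translation, then normalise it.
sub slurp_normalized {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return normalize_eol(defined $text ? $text : '');
}
```

After this step the usual "\n"-based tools (split, chomp, the $ anchor)
see the same data whichever convention the file used.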

Ben



------------------------------

Date: 13 Aug 2009 21:13:17 GMT
From: Heiko Eißfeldt <heiko@hexco.de>
Subject: Re: end-of-line conventions
Message-Id: <4a8481ec$0$30221$9b4e6d93@newsspool1.arcor-online.net>

kj wrote:

> There are three major conventions for the end-of-line marker:
> "\n", "\r\n", and "\r".

These notations are not unambiguous! See the "Newlines" section of the
perlport documentation for details.

> In a variety of situations, Perl must split strings into "lines",
> and must therefore follow a particular convention to identify line
> boundaries.  There are three situations that interest me in
> particular:  1. the splitting into lines that happens when one
> iterates over a file using the <> operator; 2. the meaning of the
> operation performed by chomp; and 3. the meaning of the $ anchor
> in regular expressions.

<> and chomp use the $/ variable for line endings. Since $/ does not
support regular expressions, you cannot use this mechanism for all
types of line endings.

The $ anchor normally matches only at the end of the string, or just
before a "\n" that ends the string.
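
A small sketch of what that means for the test strings in question, with
$/ pinned to "\n" (the helper name probe is made up for illustration):

```perl
use strict;
use warnings;

# Report whether /z$/ matches a string and what chomp leaves behind.
sub probe {
    my ($string) = @_;
    local $/ = "\n";
    my $matched = $string =~ /z$/ ? 1 : 0;
    chomp(my $copy = $string);      # removes a trailing $/ only
    return ($matched, $copy);
}

for my $string ("baz", "baz\n", "baz\r\n", "baz\r") {
    my ($matched, $copy) = probe($string);
    (my $shown = $copy) =~ s/\r/\\r/g;   # make a leftover CR visible
    print "match=$matched chomped=$shown\n";
}
```

The "\r\n" and "\r" cases do not match, because the character after the
"z" is "\r", and chomp leaves that "\r" in place.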

> How can I change the script so that the output for unix.txt, dos.txt,
> and mac.txt will be the same as the one shown above for unix.txt?

use strict;
use warnings;

my $lines = my $matches = 0;
{
  local $/ = undef;
  # (?!\z) keeps the pattern from matching one extra empty "line" at end of input
  for (<> =~ m{\G(?!\z)([^\012\015]*) \015?\012?}xmsg) {
    $lines++;
    if (/z$/) {
      $matches++;
      print ">$_<";
    }
  }
}
print "\n$matches matches out of $lines lines\n";
__END__

This uses <> with no line end definition, and iterates with a regular
expression suitable for three types of line endings. The line ending is
not included in $_, so chomp is omitted.
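
A related sketch that works on a string already held in memory, using
split instead of a \G loop. The name count_matches is made up for
illustration; note that split drops trailing empty fields, so blank lines
at the very end of the input are not counted.

```perl
use strict;
use warnings;

# Split text on any of the three line-ending conventions and count the
# lines that end in "z". Returns (matches, lines).
sub count_matches {
    my ($text) = @_;
    my @lines   = split /\015\012|\015|\012/, $text;
    my $matches = grep { /z$/ } @lines;
    return ($matches, scalar @lines);
}

my ($matches, $lines) = count_matches("foo\015\012bar\015\012baz\015\012frobozz\015\012");
print "$matches matches out of $lines lines\n";
```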

If you need the line endings in $_ use the following lines.
  # (?!\z) keeps the pattern from matching one extra empty "line" at end of input
  for (<> =~ m{\G(?!\z)([^\012\015]* \015?\012?)}xmsg) {
    $lines++;
    if (/z\s*$/) {
      $matches++;
      s{[\015\012][\015\012]?}{}xms;	# chomp replacement

Hope that helps, heiko


------------------------------

Date: Thu, 13 Aug 2009 18:24:22 -0400
From: Steve C <smallpond@juno.com>
Subject: Re: end-of-line conventions
Message-Id: <h624b5$lrm$1@news.eternal-september.org>

kj wrote:
> There are three major conventions for the end-of-line marker:
> "\n", "\r\n", and "\r".
> 
> In a variety of situations, Perl must split strings into "lines",
> and must therefore follow a particular convention to identify line
> boundaries.  There are three situations that interest me in
> particular:  1. the splitting into lines that happens when one
> iterates over a file using the <> operator; 2. the meaning of the
> operation performed by chomp; and 3. the meaning of the $ anchor
> in regular expressions.
> 
> These three issues are tested by the following simple script:
> 
> my $lines = my $matches = 0;
> while (<>) {
>   $lines++;
>   if (/z$/) {
>     $matches++;
>     chomp;
>     print ">$_<";
>   }
> }
> 
> print "$/$matches matches out of $lines lines$/";
> __END__
> 
> I have three files, unix.txt, dos.txt, and mac.txt, each containing
> four lines.  Disregarding the end-of-line character(s) these lines
> are "foo", "bar", "baz", "frobozz".
> 
> The file unix.txt uses "\n" to separate the lines.  The output that
> I get when I pass it as the argument to the script is this:
> 
> % demo.pl unix.txt
> >baz<>frobozz<
> 2 matches out of 4 lines
> 
> The file dos.txt uses "\r\n" to separate lines, and the file mac.txt
> uses "\r".  Here's the output I get when I pass these files to the
> script:
> 
> % demo.pl dos.txt
> 
> 0 matches out of 4 lines
> % demo.pl mac.txt
> 
> 0 matches out of 1 lines
> 
> How can I change the script so that the output for unix.txt, dos.txt,
> and mac.txt will be the same as the one shown above for unix.txt?
> 

Since "\n" eq "\012" on unix, you ought to be able to
do something like this to be the same on all platforms:

my $lines = my $matches = 0;

$/ = "\012";
binmode STDIN;
binmode STDOUT;

while (<>) {
   $lines++;
   if (/z\012/) {
     $matches++;
     s/\012//g;
     print ">$_<";
   }
}

print "$/$matches matches out of $lines lines$/";
__END__


------------------------------

Date: Fri, 14 Aug 2009 00:19:12 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: end-of-line conventions
Message-Id: <gkjel6-9u41.ln1@osiris.mauzo.dyndns.org>


Quoth Steve C <smallpond@juno.com>:
> 
> Since "\n" eq "\012" on unix, you ought to be able to
> do something like this to be the same on all platforms:
> 
> my $lines = my $matches = 0;
> 
> $/ = "\012";
> binmode STDIN;
> binmode STDOUT;
> 
> while (<>) {
>    $lines++;
>    if (/z\012/) {
>      $matches++;
>      s/\012//g;
>      print ">$_<";
>    }
> }
> 
> print "$/$matches matches out of $lines lines$/";
> __END__

Did you try it? This completely fails with "\r"-separated files, and
fails to match any lines with "\r\n"-separated files.
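
A sketch that reproduces both failures without needing the three files,
using in-memory filehandles. It assumes a Unix-like perl 5.8 or later;
the helper name count_with_lf_separator is made up for illustration.

```perl
use strict;
use warnings;

# Run the quoted loop's logic over a string: $/ fixed to "\012" and a
# match against /z\012/. Returns (lines, matches).
sub count_with_lf_separator {
    my ($data) = @_;
    local $/ = "\012";
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    my ($lines, $matches) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $matches++ if $line =~ /z\012/;
    }
    close $fh;
    return ($lines, $matches);
}

for my $eol ("\012", "\015\012", "\015") {
    my $data = join '', map { $_ . $eol } qw(foo bar baz frobozz);
    my ($lines, $matches) = count_with_lf_separator($data);
    print "$matches matches out of $lines lines\n";
}
```

With "\r\n" data every "z" is followed by "\r", so /z\012/ never matches;
with "\r" data there is no "\012" at all, so the whole input is one record.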

Ben



------------------------------

Date: Thu, 13 Aug 2009 16:33:26 -0700
From: Nathan Keel <nat.k@gm.ml>
Subject: Re: end-of-line conventions
Message-Id: <bp1hm.143047$zq1.75521@newsfe22.iad>

kj wrote:

> 
> Mind-blowing, to say the least...
> 
> Oh, well.  Live and lurn.  Thanks.  And to Ben too.
> 
> kynn

Don't worry, use a real OS (not Windows) and you'll not have to think
about these things, though they are easily dealt with, and you'll have
a lot more benefits as well.


------------------------------

Date: Thu, 13 Aug 2009 21:48:43 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Function prototype
Message-Id: <h621nr$9te$1@reader1.panix.com>

As for "using Perl 5.005 in production": that's the default Perl on
Panix (my ISP) in its shell accounts.  It's not production, and I
really don't have control.  I can't even convince Panix to
install Perl 5.010.

In article <c3tbl6-iq3.ln1@osiris.mauzo.dyndns.org>,
Ben Morrow  <ben@morrow.me.uk> wrote:
>
>Quoth tmcd@panix.com:
>> In article <luvuk6-m9n2.ln1@osiris.mauzo.dyndns.org>,
>> Ben Morrow  <ben@morrow.me.uk> wrote:
>> >There's no need for the '\': $_[0] is passed by reference.
>> 
>> Not if the actual argument is an expression.  Then Perl creates a
>> temporary variable and passes a reference to that, so any assignments
>> to it are silently discarded after the sub ends.
>
>Not true. Perl passes an alias to the result of the expression, so you
>can perfectly well assign to $_[0] and change the variable passed:

I didn't SAY "variable"!  I said "if ... expression", by which I meant
the general case of an expression, not a variable specifically.
(Yes, a variable is an expression, but if variables like $v were treated
like expressions like $v+2, then Perl would not have pass-by-reference
semantics in any such situation, and I would not have written "if".)

>    ~% perl -E'sub foo ($) { $_[0] = 2 } my $x; foo($x); say $x'
>    2

Of course.  I showed much the same thing as an example.

>    ~% perl -E'sub foo ($) { $_[0] = 2 } foo(1); say $x'
>    Modification of read-only value attempted at -e line 1.

And to repeat my previous example,

$ perl -e 'sub t2($) { $_[0] = 1; print $_[0], "\n";} my $v = -1; t2($v + 2); print $v, "\n"; '
1
-1
$ perl -e 'print $], "\n"'
5.010000

The argument is the expression "$v + 2", and like I wrote, Perl
silently discards the assignment after the sub ends.  (That is, it's
assigning to a temporary variable.)  The statement in "man perlsub",

    In particular, if an element $_[0] is updated, the corresponding
    argument is updated (or an error occurs if it is not updatable).

is false.

>(the ($) prototype has no effect in any of these cases).

Right.  It's my reflex.

>> By contrast, a prototype of \$ errors out if the argument is not a
>> scalar variable (expressions, even array elements, are forbidden)
>> and if prototypes are being checked.
>
>Array and hash elements *are* allowed,

I swear I tested it ... Ah.  It depends on *where* I test it.  5.005
says

$ perl -e 'sub t2(\$) { ${$_[0]} = 1; print ${$_[0]}, "\n";} my @v = (-1); t2($v[0]); print $v[0], "\n"; '
Type of arg 1 to main::t2 must be scalar (not array element) at -e line 1, near "])"
Execution of -e aborted due to compilation errors.

But it works in Perl 5.008 and later.
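
A script-sized sketch of the \$ case, assuming Perl 5.8 or later. The
names set_via_ref and prototype_demo are made up for illustration.

```perl
use strict;
use warnings;

# With a \$ prototype Perl passes a reference to the scalar argument.
# The sub must be declared before the calls are compiled, otherwise the
# prototype is not applied.
sub set_via_ref (\$) {
    my ($ref) = @_;
    ${$ref} = 1;
    return;
}

sub prototype_demo {
    my $scalar = -1;
    my @array  = (-1);
    my %hash   = (key => -1);
    set_via_ref($scalar);
    set_via_ref($array[0]);     # array element
    set_via_ref($hash{key});    # hash element
    return ($scalar, $array[0], $hash{key});
}

print join(' ', prototype_demo()), "\n";
```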

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Thu, 13 Aug 2009 23:31:20 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Function prototype
Message-Id: <oqgel6-2p41.ln1@osiris.mauzo.dyndns.org>


Quoth tmcd@panix.com:
> As for "using Perl 5.005 in production": that's the default Perl on
> Panix (my ISP) in its shell accounts.  It's not production, and I
> really don't have control.  I can't even convince Panix to
> install Perl 5.010.

Get a new ISP. There's really nothing else for it if you want to use
Perl.

> In article <c3tbl6-iq3.ln1@osiris.mauzo.dyndns.org>,
> Ben Morrow  <ben@morrow.me.uk> wrote:
> >
> >Quoth tmcd@panix.com:
> >> In article <luvuk6-m9n2.ln1@osiris.mauzo.dyndns.org>,
> >> Ben Morrow  <ben@morrow.me.uk> wrote:
> >> >There's no need for the '\': $_[0] is passed by reference.
> >> 
> >> Not if the actual argument is an expression.  Then Perl creates a
> >> temporary variable and passes a reference to that, so any assignments
> >> to it are silently discarded after the sub ends.
> >
> >Not true. Perl passes an alias to the result of the expression, so you
> >can perfectly well assign to $_[0] and change the variable passed:
> 
> I didn't SAY "variable"!  I said "if ... expression", by which I meant
> the general case of an expression, not a variable specifically.
> (Yes, a variable is an expression, but if variables like $v were treated
> like expressions like $v+2, then Perl would not have pass-by-reference
> semantics in any such situation, and I would not have written "if".)

If you pass an expression that is normally assignable-to, the assignment
sticks:

    ~% perl -E'sub foo { $_[0] = "x" } my $x = "abc"; 
        foo substr $x, 0, 1; say $x'
    xbc
    ~%

Such expressions are disallowed by a \$ prototype, which is why I don't
like them.

> >    ~% perl -E'sub foo ($) { $_[0] = 2 } foo(1); say $x'
> >    Modification of read-only value attempted at -e line 1.
> 
> And to repeat my previous example,
> 
> $ perl -e 'sub t2($) { $_[0] = 1; print $_[0], "\n";} my $v = -1; t2($v + 2); print $v, "\n"; '
> 1
> -1
> $ perl -e 'print $], "\n"'
> 5.010000

Odd. I would call this a bug. It seems that the 'Can't modify addition
(+) in list assignment' error you would get from a plain

    ($v + 2) = 1;

fires at compile time, and the actual result of the '$v + 2' is not
marked readonly. IMHO it should be (except in odd cases such as $v
having an overloaded '+' that does, in fact, return an lvalue).

It's possible to observe this effect without using subs:

    ~% perl -E'my $x = 1; ${ \( $x + 2 ) } = 3; say $x'
    1
    ~%

so it's not so much a bug in pass-by-reference as it is in general
expression semantics.
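
The cases discussed so far, put side by side as a sketch. The names
set_to_one and alias_demo are made up for illustration, and the literal
case is wrapped in eval because it is expected to die.

```perl
use strict;
use warnings;

# Assign through the alias in @_.
sub set_to_one { $_[0] = 1; return; }

sub alias_demo {
    my $direct = -1;
    set_to_one($direct);        # @_ aliases $direct, so it changes

    my $in_expr = -1;
    set_to_one($in_expr + 2);   # the sum is a temporary; $in_expr is untouched

    my @array = (-1);
    set_to_one($array[0]);      # array elements are aliased too

    # A literal is read-only; trap the error instead of letting it propagate.
    my $literal_ok = eval { set_to_one(1); 1 } ? 1 : 0;

    return ($direct, $in_expr, $array[0], $literal_ok);
}

print join(' ', alias_demo()), "\n";
```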

> The argument is the expression "$v + 2", and like I wrote, Perl
> silently discards the assignment after the sub ends.  (That is, it's
> assigning to a temporary variable.)  The statement in "man perlsub",
> 
>     In particular, if an element $_[0] is updated, the corresponding
>     argument is updated (or an error occurs if it is not updatable).
> 
> is false.

The section in brackets is arguably false, yes. Strictly speaking it
should say '(or an error occurs if it is SvREADONLY)', but that's too
much implementation detail for user documentation.

> >> By contrast, a prototype of \$ errors out if the argument is not a
> >> scalar variable (expressions, even array elements, are forbidden)
> >> and if prototypes are being checked.
> >
> >Array and hash elements *are* allowed,
> 
> I swear I tested it ... Ah.  It depends on *where* I test it.  5.005
> says

Well, yeah. 5.005 is deeply buggy, which is why you shouldn't be using
it.

Ben



------------------------------

Date: Thu, 13 Aug 2009 16:31:14 -0700
From: Nathan Keel <nat.k@gm.ml>
Subject: Re: more than one statement in a post perlish condition
Message-Id: <6n1hm.143046$zq1.8220@newsfe22.iad>

Peter Makholm wrote:

> As Uri has explained this is very prone to mis-readings

I think it's more that Uri is prone to mis-reading.  Anyway, that was one
example by one person.  If you prefer it to read different (nothing
wrong with that, maybe there's everything right with that), then feel
free to make the alternative suggestion.  Saying someone's wrong
because you're too excited about arguing with people on the usenet
group (which is often what uri does), is pointless.


------------------------------

Date: Thu, 13 Aug 2009 13:40:53 -0700 (PDT)
From: cmic <cmic@live.fr>
Subject: Re: overridden method in perltoot
Message-Id: <c7e06950-84c1-49a5-b012-e348253a6b5f@w6g2000yqw.googlegroups.com>

Hello

On 13 August, 16:56, Tad J McClellan <ta...@seesig.invalid> wrote:
> cmic <c...@live.fr> wrote:
> > I don't understand why this excerpt from perltoot doesn't work as
> > expected.
 ...
>     bless($self, $class);
>
> >     return $self;
> > }
>
> See the "Planning for the Future: Better Constructors" section in perltoot,
> and note what the single-arg form of bless() does:
>
>     perldoc -f bless
>
>     ... Always use the two-argument version if a derived class might
>     inherit the function doing the blessing...

OK. I dig it. (In short, this is another form of RTFM 8-) )
Thanks to Tad and Dave.
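
For the record, a minimal sketch of the difference. All package and sub
names here are made up for illustration and are not the ones from
perltoot.

```perl
use strict;
use warnings;

package Base::TwoArg;
sub new {
    my $class = shift;
    my $self  = {};
    return bless $self, $class;   # blesses into whatever class was invoked
}

package Base::OneArg;
sub new {
    my $self = {};
    return bless $self;           # always blesses into Base::OneArg
}

package Derived::TwoArg;
our @ISA = ('Base::TwoArg');

package Derived::OneArg;
our @ISA = ('Base::OneArg');

package main;

# Return the class each inherited constructor actually produced.
sub blessed_classes {
    return (ref(Derived::TwoArg->new), ref(Derived::OneArg->new));
}

print join(' ', blessed_classes()), "\n";
```

The object built through the one-argument form ends up in the base class,
so methods overridden in the derived class are never found.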

--
michel marcon aka cmic



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2549
***************************************
