[32650] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 3926 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 16 00:14:28 2013

Date: Mon, 15 Apr 2013 21:14:10 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 15 Apr 2013     Volume: 11 Number: 3926

Today's topics:
        More idiomatic <source@netcom.com>
    Re: More idiomatic <glex_no-spam@qwest-spam-no.invalid>
    Re: More idiomatic <rweikusat@mssgmbh.com>
    Re: More idiomatic <rweikusat@mssgmbh.com>
    Re: More idiomatic <rweikusat@mssgmbh.com>
    Re: More idiomatic (Tim McDaniel)
    Re: More idiomatic <source@netcom.com>
    Re: More idiomatic <source@netcom.com>
    Re: More idiomatic <derykus@gmail.com>
    Re: More idiomatic <rweikusat@mssgmbh.com>
    Re: More idiomatic <derykus@gmail.com>
    Re: More idiomatic <rweikusat@mssgmbh.com>
    Re: More idiomatic <glex_no-spam@qwest-spam-no.invalid>
    Re: More idiomatic <source@netcom.com>
    Re: More idiomatic <derykus@gmail.com>
    Re: More idiomatic <ben@morrow.me.uk>
    Re: More idiomatic <ben@morrow.me.uk>
    Re: More idiomatic <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 15 Apr 2013 11:17:52 -0700
From: David Harmon <source@netcom.com>
Subject: More idiomatic
Message-Id: <bfmdneSZqe8Y2fHMnZ2dnUVZ_vWdnZ2d@earthlink.com>

Here is some code I wrote in another newsgroup.  It is probably
obvious that it was written by an old C hacker.  If anybody would
like to suggest how it could be better perl or more idiomatic, 
go ahead.

>>>

use strict; use warnings;
open my $fin, 't1.txt';
my $fromline = '';
my $useragent = '';
my $hascont = 'n';
my $crosspost = 0;
my %statstab;

while (<$fin>) {
    chomp;
    if (/Newsgroups: .*,/i) {
        $crosspost = 1;
        next;
    }
    if (/^Content-Type:/i) {
        $hascont = 'y';
        next;
    }
    if (/^From: /i) {
        $fromline = $_;
        $fromline =~ s/<.*>/<...>/;
        next;
    }
    if (/^User-Agent:|^X-News-Software:|^X-Newsreader:/i) {
        $useragent = $_;
        $useragent =~ s/ \(.*//;
        $useragent =~ s/^.*:\s*//;
        next;
    }
    if (/^$/) {
        if ($fromline ne '' and not $crosspost) {
            ++$statstab{$fromline.'##'.$useragent.'##'.$hascont};
        }
        $fromline = $useragent = '';
        $crosspost = 0;  $hascont = 'n';
    }
}

foreach (sort keys %statstab) {
    if ( $statstab{$_} > 1) {
        ($fromline,$useragent,$hascont) = split '##';
        printf("%s %-24s %3d %s\n",
             $hascont, $useragent, $statstab{$_}, $fromline);
    }
}


------------------------------

Date: Mon, 15 Apr 2013 14:04:32 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: More idiomatic
Message-Id: <516c4f40$0$73617$815e3792@news.qwest.net>

On 04/15/13 13:17, David Harmon wrote:
> Here is some code I wrote in another newsgroup.  It is probably
> obvious that it was written by an old C hacker.  If anybody would
> like to suggest how it could be better perl or more idiomatic,
> go ahead.
>
>>>>
>
> use strict; use warnings;
> open my $fin, 't1.txt';

Use 3-argument open handle failure (die) if it can't open.
perldoc -f open

> my $fromline = '';
> my $useragent = '';
> my $hascont = 'n';
> my $crosspost = 0;
> my %statstab;


>
> while (<$fin>) {
>      chomp;
>      if (/Newsgroups: .*,/i) {
>          $crosspost = 1;
>          next;
>      }

>      if (/^Content-Type:/i) {
>          $hascont = 'y';
>          next;
>      }

Even a C Hacker should have learned about using else...else if...

Use elsif on the rest of these, since it can't start with more than one 
of these.. and avoids the need of 'next;' in every if.  If none of these
could also have 'Newsgroups: ' in it, then all of these, except for the 
first if, should be elsif's.

perldoc perlintro

>      if (/^From: /i) {
>          $fromline = $_;
>          $fromline =~ s/<.*>/<...>/;
>          next;
>      }
>      if (/^User-Agent:|^X-News-Software:|^X-Newsreader:/i) {
/^(?:User-Agent|X-News-Software|X-Newsreader):/i
Tiny bit shorter..

>          $useragent = $_;
>          $useragent =~ s/ \(.*//;
>          $useragent =~ s/^.*:\s*//;
>          next;
>      }
>      if (/^$/) {
>          if ($fromline ne '' and not $crosspost) {
>              ++$statstab{$fromline.'##'.$useragent.'##'.$hascont};
>          }
>          $fromline = $useragent = '';
>          $crosspost = 0;  $hascont = 'n';
>      }
> }
close( $fin );
>
> foreach (sort keys %statstab) {
        next unless statstab{$_} > 1;

>      if ( $statstab{$_}>  1) {
>          ($fromline,$useragent,$hascont) = split '##';

Since they're used specifically as a result of split, in
this loop, then define them ('my') in this scope instead
of re-using the global variables.  It works, in this case,
however it's better to narrow the scope of the variables.

>          printf("%s %-24s %3d %s\n",
>               $hascont, $useragent, $statstab{$_}, $fromline);
>      }
> }



------------------------------

Date: Mon, 15 Apr 2013 20:24:25 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: More idiomatic
Message-Id: <878v4jwqae.fsf@sapphire.mobileactivedefense.com>

"J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> writes:
> On 04/15/13 13:17, David Harmon wrote:

[...]

>> while (<$fin>) {
>>      chomp;
>>      if (/Newsgroups: .*,/i) {
>>          $crosspost = 1;
>>          next;
>>      }
>
>>      if (/^Content-Type:/i) {
>>          $hascont = 'y';
>>          next;
>>      }
>
> Even a C Hacker should have learned about using else...else if...

This doesn't make much sense here because there is no common code
after the sequence of if-statements. The 'next' communicates this
clearly. Using an if - elsif - elsif - else cascade instead would
server to obfuscate this fact instead.

[...]

>> foreach (sort keys %statstab) {
>        next unless statstab{$_} > 1;
>
>>      if ( $statstab{$_}>  1) {
>>          ($fromline,$useragent,$hascont) = split '##';
>
> Since they're used specifically as a result of split, in
> this loop, then define them ('my') in this scope instead
> of re-using the global variables.  It works, in this case,
> however it's better to narrow the scope of the variables.

It is not 'better' to declare identically named variables in bazillion
inner scopes except if one if optimizing for 'maximal outsider
confusion' aka 'It was hard to write. It should be hard to read."



------------------------------

Date: Mon, 15 Apr 2013 20:32:16 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: More idiomatic
Message-Id: <874nf7wpxb.fsf@sapphire.mobileactivedefense.com>

David Harmon <source@netcom.com> writes:

[...]


> while (<$fin>) {
>     chomp;
>     if (/Newsgroups: .*,/i) {
>         $crosspost = 1;
>         next;
>     }

The if (...) { } could be avoided here in favour of something a la

/Newsgroups: .*,i/ and $crosspost = 1, next;

[...]

>     if (/^From: /i) {
>         $fromline = $_;
>         $fromline =~ s/<.*>/<...>/;
>         next;
>     }

For a multiline-group, this could become

/^From: /i && do {
	.
        .
        next;
};

Another idea would be to use given/ when (=> perlsyn, 'switch
statement'). In this way, all the next; could go away.

>     if (/^$/) {
>         if ($fromline ne '' and not $crosspost) {
>             ++$statstab{$fromline.'##'.$useragent.'##'.$hascont};
>         }

This could use a statement modifier instead of the single-line block:

++$statstab{...} if $fromline ne '' and !$crosspost;

also, a non-empty string is 'true' so this could be written as

++$statstab{...} if $fromline and !$crosspost;

[...]

> foreach (sort keys %statstab) {
>     if ( $statstab{$_} > 1) {
>         ($fromline,$useragent,$hascont) = split '##';
>         printf("%s %-24s %3d %s\n",
>              $hascont, $useragent, $statstab{$_}, $fromline);
>     }
> }

People usually use for instead of foreach for such loops.


------------------------------

Date: Mon, 15 Apr 2013 20:45:34 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: More idiomatic
Message-Id: <87zjwzvaqp.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> David Harmon <source@netcom.com> writes:
>
> [...]
>
>
>> while (<$fin>) {
>>     chomp;
>>     if (/Newsgroups: .*,/i) {
>>         $crosspost = 1;
>>         next;
>>     }
>
> The if (...) { } could be avoided here in favour of something a la
>
> /Newsgroups: .*,i/ and $crosspost = 1, next;

Disclaimer: My opinions are usually not considered 'comme il faut' by
the people who define themselves as 'the Perl community' and their
opinions as 'The Perl Way of XYZ'.


------------------------------

Date: Mon, 15 Apr 2013 19:50:50 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: More idiomatic
Message-Id: <kkhlmq$i1i$1@reader1.panix.com>

In article <bfmdneSZqe8Y2fHMnZ2dnUVZ_vWdnZ2d@earthlink.com>,
David Harmon  <bad@example.invalid> wrote:
>Here is some code I wrote in another newsgroup.  It is probably
>obvious that it was written by an old C hacker.  If anybody would
>like to suggest how it could be better perl or more idiomatic, 
>go ahead.
>
>>>>
>
>use strict; use warnings;

I prefer one per line.

>open my $fin, 't1.txt';

I concur with the other suggestion of

    open(my $fin, '<', 't1.txt') or die 'cannot open "t1.txt"';

>    if (/Newsgroups: .*,/i) {

Missing caret.
    if (/^Newsgroups: .*,/i) {
I don't know header standards: there always going to be a space after
the "Newsgroups:" keyword?  In this list, sometimes you require one,
sometimes you don't.  I'd be inclined to not try to match the space
just to be more accepting.

>    if (/^Content-Type:/i) {

I agree with "elsif".

>    if (/^$/) {

While it's an appealing symmetry to use a regexp, I'd still probably
do
    if ($_ eq '') {

>        ($fromline,$useragent,$hascont) = split '##';

DEFINITELY concur in "my"ing them here.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 15 Apr 2013 12:55:37 -0700
From: David Harmon <source@netcom.com>
Subject: Re: More idiomatic
Message-Id: <Xr2dnfk5PNXwxvHMnZ2dnUVZ_oydnZ2d@earthlink.com>

On Mon, 15 Apr 2013 14:04:32 -0500 in comp.lang.perl.misc, "J.
Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote,
>On 04/15/13 13:17, David Harmon wrote:
>> open my $fin, 't1.txt';
>
>Use 3-argument open handle failure (die) if it can't open.
>perldoc -f open

Yeah, I sleazed out on that because I would have had to look up
which magic variable holds the error message.

>>      if (/Newsgroups: .*,/i) {

Should have been
      if (/^Newsgroups: .*,/i) {


>>      if (/^Content-Type:/i) {
>>          $hascont = 'y';
>>          next;
>>      }
>
>Use elsif on the rest of these, since it can't start with more than one 
>of these.. and avoids the need of 'next;' in every if.  

In my opinion 'next;' is better in this case because 'else if'
requires the reader to look below to see if anything else may happen
before the end of the loop.

>>      if (/^User-Agent:|^X-News-Software:|^X-Newsreader:/i) {
>/^(?:User-Agent|X-News-Software|X-Newsreader):/i
>Tiny bit shorter..

Actually, it is *exactly* the same number of characters.  
And requires remembering (?:

What with the added grouping, does it make the regex any more
efficient?

>> }
>close( $fin );

Does that matter?

>> foreach (sort keys %statstab) {
>        next unless statstab{$_} > 1;

Here you give me the opposite advice regarding 'next;' as you did
above.


------------------------------

Date: Mon, 15 Apr 2013 13:31:03 -0700
From: David Harmon <source@netcom.com>
Subject: Re: More idiomatic
Message-Id: <_vWdndziSoIg_vHMnZ2dnUVZ_qoAAAAA@earthlink.com>

On Mon, 15 Apr 2013 19:50:50 +0000 (UTC) in comp.lang.perl.misc,
tmcd@panix.com (Tim McDaniel) wrote,
>In article <bfmdneSZqe8Y2fHMnZ2dnUVZ_vWdnZ2d@earthlink.com>,
>David Harmon  <bad@example.invalid> wrote:
>>use strict; use warnings;
>
>I prefer one per line.

I say those in one breath, so...

>I concur with the other suggestion of
>
>    open(my $fin, '<', 't1.txt') or die 'cannot open "t1.txt"';

I see little point in doing all that without the $!

>>    if (/Newsgroups: .*,/i) {
>
>Missing caret.

Indeed.

>I agree with "elsif".

I agree with Ranier on that one.

>>    if (/^$/) {
>
>While it's an appealing symmetry to use a regexp, I'd still probably
>do
>    if ($_ eq '') {

Concur.  But please, can I write it
     if (eq '') {
if not, why not?


------------------------------

Date: Mon, 15 Apr 2013 13:40:53 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: More idiomatic
Message-Id: <ace5ee37-41ae-436a-9890-2ce698160913@googlegroups.com>

On Monday, April 15, 2013 12:24:25 PM UTC-7, Rainer Weikusat wrote:
> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> writes:
> 
> > On 04/15/13 13:17, David Harmon wrote:
> 
> 
> 
> [...]
> 
> 
> 
> >> while (<$fin>) {
> 
> >>      chomp;
> 
> >>      if (/Newsgroups: .*,/i) {
> 
> >>          $crosspost = 1;
> 
> >>          next;
> 
> >>      }
> 
> >
> 
> >>      if (/^Content-Type:/i) {
> 
> >>          $hascont = 'y';
> 
> >>          next;
> 
> >>      }
> 
> >
> 
> > Even a C Hacker should have learned about using else...else if...
> 
> 
> 
> This doesn't make much sense here because there is no common code
> 
> after the sequence of if-statements. The 'next' communicates this
> 
> clearly. Using an if - elsif - elsif - else cascade instead would
> 
> server to obfuscate this fact instead.
> 
>

But similarly you could argue "next" is confusing
because it suggests that you need to bypass common 
code at the end of the conditionals. Since there 
isn't any common code, "next" is unnecessary and
adds line noise. Of course, if it's a huge sequence
of conditionals, you could argue that the added
clarity trumps line noise. 

> ...

-- 
Charles DeRykus


------------------------------

Date: Mon, 15 Apr 2013 21:59:30 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: More idiomatic
Message-Id: <87ppxvv7bh.fsf@sapphire.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:
> On Monday, April 15, 2013 12:24:25 PM UTC-7, Rainer Weikusat wrote:
>> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> writes:
>> > On 04/15/13 13:17, David Harmon wrote:

[...]


>> >>      if (/Newsgroups: .*,/i) {
>> >>          $crosspost = 1;
>> >>          next;
>> >>      }

[...]

>> > Even a C Hacker should have learned about using else...else if...
>> 
>> This doesn't make much sense here because there is no common code
>> after the sequence of if-statements. The 'next' communicates this
>> clearly. Using an if - elsif - elsif - else cascade instead would
>> server to obfuscate this fact instead.
>
> But similarly you could argue "next" is confusing
> because it suggests that you need to bypass common 
> code at the end of the conditionals.

There is 'common code' after each if block, namely, all the remaining if
conditions which have no special relation to each other and 'next' has
no special relation to them.


------------------------------

Date: Mon, 15 Apr 2013 14:11:46 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: More idiomatic
Message-Id: <9dc8a870-4645-48e0-9886-ae1c191ea7fd@googlegroups.com>

On Monday, April 15, 2013 1:59:30 PM UTC-7, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
> 
> > On Monday, April 15, 2013 12:24:25 PM UTC-7, Rainer Weikusat wrote:
> 
> >> "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid> writes:
> 
> >> > On 04/15/13 13:17, David Harmon wrote:
> 
> 
> 
> [...]
> 
> 
> 
> 
> 
> >> >>      if (/Newsgroups: .*,/i) {
> 
> >> >>          $crosspost = 1;
> 
> >> >>          next;
> 
> >> >>      }
> 
> 
> 
> [...]
> 
> 
> 
> >> > Even a C Hacker should have learned about using else...else if...
> 
> >> 
> 
> >> This doesn't make much sense here because there is no common code
> 
> >> after the sequence of if-statements. The 'next' communicates this
> 
> >> clearly. Using an if - elsif - elsif - else cascade instead would
> 
> >> server to obfuscate this fact instead.
> 
> >
> 
> > But similarly you could argue "next" is confusing
> 
> > because it suggests that you need to bypass common 
> 
> > code at the end of the conditionals.
> 
> 
> 
> There is 'common code' after each if block, namely, all the remaining if
> 
> conditions which have no special relation to each other and 'next' has
> 
> no special relation to them.


But, there really didn't need to be any common code 
since each "if" clause was accompanied with a "next".

So there's no compelling reason to not code this as
"if-elsif..." and avoid the noisy "next" insertions.

-- 
Charles DeRykus


------------------------------

Date: Mon, 15 Apr 2013 22:24:17 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: More idiomatic
Message-Id: <87li8jv666.fsf@sapphire.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:
> On Monday, April 15, 2013 1:59:30 PM UTC-7, Rainer Weikusat wrote:

[...]

>> There is 'common code' after each if block, namely, all the remaining if
>> conditions which have no special relation to each other and 'next' has
>> no special relation to them.
>
> But, there really didn't need to be any common code 
> since each "if" clause was accompanied with a "next".
>
> So there's no compelling reason to not code this as
> "if-elsif..." and avoid the noisy "next" insertions.

The compelling reason is that execution continues with the next
statement after the if-elsif... cascade in case of such a cascade. But
when there is not such 'next statement', the whole construct is just
a blind: It forces someone who is not familiar with the code to go
looking for such common code only to find none.


------------------------------

Date: Mon, 15 Apr 2013 16:42:56 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: More idiomatic
Message-Id: <516c7460$0$52263$815e3792@news.qwest.net>

On 04/15/13 14:55, David Harmon wrote:
> On Mon, 15 Apr 2013 14:04:32 -0500 in comp.lang.perl.misc, "J.
> Gleixner"<glex_no-spam@qwest-spam-no.invalid>  wrote,
[...]
>> Use elsif on the rest of these, since it can't start with more than one
>> of these.. and avoids the need of 'next;' in every if.
>
> In my opinion 'next;' is better in this case because 'else if'
> requires the reader to look below to see if anything else may happen
> before the end of the loop.

You asked for "idiomatic" suggestions. I rarely see code that follows
your "opinion", but it's not code I'll have to fix, so do whatever
you want.

There are times to use if/next, however using it in every
if block is not idiomatic.

>
>>>       if (/^User-Agent:|^X-News-Software:|^X-Newsreader:/i) {
>> /^(?:User-Agent|X-News-Software|X-Newsreader):/i
>> Tiny bit shorter..
>
> Actually, it is *exactly* the same number of characters.
> And requires remembering (?:

Knowing (?:) is idiomatic to Perl.

It was more for showing how you could avoid repetition... Three
carats, three colons... If you need to add another one, it won't
require those common characters, and it's clear that whatever it is
you are looking for starts from the beginning..  If the match
can be from the beginning, or at any place, then it might be
coded differently, but it this case they all start from the
beginning, so do it once.

>
> What with the added grouping, does it make the regex any more
> efficient?

No grouping.

>
>>> }
>> close( $fin );
>
> Does that matter?

Again, it's idiomatic.. For ever open, there should be a close.
That's common in most languages. Or 'use autodie;'.

>
>>> foreach (sort keys %statstab) {
>>         next unless statstab{$_}>  1;
>
> Here you give me the opposite advice regarding 'next;' as you did
> above.

Not at all related.

There are a lot of times to use 'next', and this is one of many.

You'd probably see this from a person who is well versed in Perl
and you'd see the if() with folks who aren't as comfortable, which
again would be idiomatic.


------------------------------

Date: Mon, 15 Apr 2013 14:50:28 -0700
From: David Harmon <source@netcom.com>
Subject: Re: More idiomatic
Message-Id: <-7idnbyNn-vF6_HMnZ2dnUVZ_rKdnZ2d@earthlink.com>

On Mon, 15 Apr 2013 16:42:56 -0500 in comp.lang.perl.misc, "J.
Gleixner" <glex_no-spam@qwest-spam-no.invalid> wrote,
>You asked for "idiomatic" suggestions. I rarely see code that follows
>your "opinion", but it's not code I'll have to fix, so do whatever
>you want.

I did indeed ask, and thanks for giving me your suggestions and for
bothering to look at my code.  Whether I agree with any particular
part or not.



------------------------------

Date: Mon, 15 Apr 2013 14:57:46 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: More idiomatic
Message-Id: <a912647f-1a73-45a3-8f9a-ae36f484fb21@googlegroups.com>

On Monday, April 15, 2013 2:24:17 PM UTC-7, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
> 
> > On Monday, April 15, 2013 1:59:30 PM UTC-7, Rainer Weikusat wrote:
> 
> 
> 
> [...]
> 
> 
> 
> >> There is 'common code' after each if block, namely, all the remaining if
> 
> >> conditions which have no special relation to each other and 'next' has
> 
> >> no special relation to them.
> 
> >
> 
> > But, there really didn't need to be any common code 
> 
> > since each "if" clause was accompanied with a "next".
> 
> >
> 
> > So there's no compelling reason to not code this as
> 
> > "if-elsif..." and avoid the noisy "next" insertions.
> 
> 
> 
> The compelling reason is that execution continues with the next
> 
> statement after the if-elsif... cascade in case of such a cascade. But
> 
> when there is not such 'next statement', the whole construct is just
> 
> a blind: It forces someone who is not familiar with the code to go
> 
> looking for such common code only to find none.

And the multiple "if...next" code forces you to read
through them all anyway to determine if there's common 
code as well. After all, at some point, an "if..." may 
not have a "next".  You'll be schleppen through lots of 
noisy "next" statements to ensure that's the case.

There's no "One True Way".  Stylistically, I'd prefer
the "if...elsif" as the more idiomatic way but TIMTODI
is the "next" best option.

-- 
Charles DeRykus 




------------------------------

Date: Tue, 16 Apr 2013 03:07:50 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: More idiomatic
Message-Id: <mcou3a-abi.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> 
> >     if (/^$/) {
> >         if ($fromline ne '' and not $crosspost) {
> >             ++$statstab{$fromline.'##'.$useragent.'##'.$hascont};
> >         }
> 
> This could use a statement modifier instead of the single-line block:
> 
> ++$statstab{...} if $fromline ne '' and !$crosspost;
> 
> also, a non-empty string is 'true' so this could be written as
> 
> ++$statstab{...} if $fromline and !$crosspost;

Not all nonempty strings are true; in particular, "0" is false. I
usually use length() as a boolean to test for an empty string:

    length $fromline and not $crosspost
        and $statstab{...}++;

Ben



------------------------------

Date: Tue, 16 Apr 2013 03:03:33 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: More idiomatic
Message-Id: <l4ou3a-abi.ln1@anubis.morrow.me.uk>


Quoth "Newsgroup only please, address is no longer replyable."  <bad@example.invalid>:
> Here is some code I wrote in another newsgroup.  It is probably
> obvious that it was written by an old C hacker.  If anybody would
> like to suggest how it could be better perl or more idiomatic, 
> go ahead.
> 
> >>>
> 
> use strict; use warnings;

Good.

> open my $fin, 't1.txt';

With a constant filename, it's not so important, but it's safer to get
into the habit of using 3-arg open:

    open my $fin, "<", "t1.txt";

You need to check the open succeeded. A simple way to do this is to add

    use autodie;

to your 'use strict; use warnings;' invocation.

It's usual (though not universal) to use variable names in all caps for
filehandles.

> my $fromline = '';
> my $useragent = '';
> my $hascont = 'n';
> my $crosspost = 0;

This is definitely unidiomatic. None of these variables except $hascont
need initialising: Perl variables are initialised to undef, which is the
empty string in string context, 0 in numeric context and false in
boolean context. Even $hascont is actually a boolean; the standard
boolean values in perl are 1 and either the empty string or undef, not
"y" and "n".

> while (<$fin>) {
>     chomp;
>     if (/Newsgroups: .*,/i) {
>         $crosspost = 1;
>         next;
>     }
>     if (/^Content-Type:/i) {
>         $hascont = 'y';
>         next;
>     }
>     if (/^From: /i) {
>         $fromline = $_;
>         $fromline =~ s/<.*>/<...>/;

The usual idiom here is

    ($fromline = $_) =~ s/.../.../;

though as of 5.14 you can use s///r instead:

    $fromline = s/<.*>/<...>/r;

Note: that is '=', not '=~'. The subsitution is performed on $_
(implicitly), but because of /r the substituted value is returned rather
than altering $_.

>         next;
>     }
>     if (/^User-Agent:|^X-News-Software:|^X-Newsreader:/i) {
>         $useragent = $_;
>         $useragent =~ s/ \(.*//;
>         $useragent =~ s/^.*:\s*//;
>         next;
>     }
>     if (/^$/) {
>         if ($fromline ne '' and not $crosspost) {

I believe the style police say you shouldn't use 'and' and 'not' in an
expression; instead they should be reserved for flow control. So either

    $fromline ne '' and not $crosspost
        and ++$statstab{...};

or

    if ($fromline ne '' && !$crosspost) {
        ++$statstab{...};
    }

>             ++$statstab{$fromline.'##'.$useragent.'##'.$hascont};

Faking structures using strings is a shell trick; Perl has real
structures you should use instead. In this case I'd use more layers of
hash:

    ++$statstab{$fromline}{$useragent}{$hascont};

Also, it's more idiomatic to use interpolation than to use '.'
explicitly, so that string would usually be written

    "$fromline##$useragent##$hascont"

possibly with some ${}s if you thought it would make things clearer.

>         }
>         $fromline = $useragent = '';
>         $crosspost = 0;  $hascont = 'n';

I really don't like this; it's messy, and if you add more variables the
chance of forgetting to reset them is too high. In fact, it's indicative
of a problem I have with your whole approach here: flag variables are
generally a bad sign. Sometimes they are the only straightforward way to
make things work, but it's worth looking for alternatives if you can.

In this case, my first thought would be that these variables are all
connected (they all refer to a single article header), so they should be
members of a single data structure. Perhaps something like this:

    my %article;

    while (<$fin>) {
        if (...) {
            $article{crosspost} = 1;
        }
        ...;

        if (/^$/) {
            ...;
            undef %article;
        }

This at least lets you clear all the flags at once. However, I think if
I were writing this I might do something more like this:

    my %StatTab;

    local $/ = "";
    while (<$fin>) {
        my $xpost   = /^Newsgroups: .*,/im;
        my $ctype   = /^Content-type:/im;
        my ($from)  = /^From:(.*)/im;
        $from and $from =~ s/<.*>/<...>/;
        my ($ua)    = m!
            ^ (?: User-Agent
                | X-News (?:reader|-Software)
            ): \s*
            ([^(]*)
        !imx;

        $from and not $xpost and
            $StatTab{$from}{$ua}{$ctype}++;
    }

The important changes here are the '$/ = ""' and the /m flags. The first
makes <> read by paragraphs instead of lines: that is, each iteration of
the loop reads all the way to the first blank line in one go. The second
changes /^/ and /$/ to match the start or end of any line in the string,
rather than the start or end of the whole string, so we can match
against the whole header to find individual lines.

(And yes, we all know $/ is a rather nasty interface; it's actually
rather a shame IO::Handle->getline doesn't support an optional
record-separator argument. It is, however, extremely idiomatic, so you
want to learn how to use it.)

>     }
> }
> 
> foreach (sort keys %statstab) {
>     if ( $statstab{$_} > 1) {
>         ($fromline,$useragent,$hascont) = split '##';
>         printf("%s %-24s %3d %s\n",
>              $hascont, $useragent, $statstab{$_}, $fromline);
>     }
> }

With properly-structured data you need to work a little harder to get it
out, unfortunately:

    for my $from (sort keys %statstab) {
        my $uas = $statstab{$from};
        for my $ua (sort keys %$ua) {
            my $ctypes = $$uas{$ua};
            for my $ctype (sort keys %$ctypes) {
                my $count = $$ctypes{$ctype}
                    or next;
                printf "%s %-24s %3d %s\n",
                    ($ctype ? "y" : "n"), $ua, $count, $from;
            }
        }
    }

though with a little thought you could write something (or find
something on CPAN) which would turn that data structure inside-out into
an array of hashrefs; let's see, perhaps

    sub invert;
    sub invert {
        my ($f, $t, $l, @levels) = @_;

        return map {
            @levels 
                ? invert $$f{$_}, { %$t, $l => $_ }, @levels
                : { %$t, $l => $$f{$_} }
        } sort keys %$f;
    }

so you could just write

    for my $art (
        invert \%statstab, {}, qw/from ua ctype count/
    ) {

Ben



------------------------------

Date: Tue, 16 Apr 2013 03:15:06 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: More idiomatic
Message-Id: <aqou3a-abi.ln1@anubis.morrow.me.uk>


Quoth "Newsgroup only please, address is no longer replyable."
<bad@example.invalid>:
> On Mon, 15 Apr 2013 19:50:50 +0000 (UTC) in comp.lang.perl.misc,
> tmcd@panix.com (Tim McDaniel) wrote,
> >In article <bfmdneSZqe8Y2fHMnZ2dnUVZ_vWdnZ2d@earthlink.com>,
> >David Harmon  <bad@example.invalid> wrote:
> >>use strict; use warnings;
> >
> >I prefer one per line.
> 
> I say those in one breath, so...
> 
> >I concur with the other suggestion of
> >
> >    open(my $fin, '<', 't1.txt') or die 'cannot open "t1.txt"';
> 
> I see little point in doing all that without the $!

Well, the point is that if you let the program continue it either
produces incorrect results or fails later with an even more confusing
error message. I agree that the message would be better if it included
$!; autodie will do this for you.

> >While it's an appealing symmetry to use a regexp, I'd still probably
> >do
> >    if ($_ eq '') {
> 
> Concur.  But please, can I write it
>      if (eq '') {
> if not, why not?

No, you can't. As for why not: well, you just can't :). Some operators
have optional arguments that default to $_, some do not; none of the
binary operators are in the set that do, because that would be really
confusing. (I'm not even sure they would parse unambiguously in the
general case.)

However, you can write

    unless (length) {

if you prefer.

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3926
***************************************


home help back first fref pref prev next nref lref last post