[32776] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 4040 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Sep 23 14:09:41 2013

Date: Mon, 23 Sep 2013 11:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 23 Sep 2013     Volume: 11 Number: 4040

Today's topics:
    Re: mixed cmp operator for sorting <marc.girod@gmail.com>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
    Re: mixed cmp operator for sorting <marc.girod@gmail.com>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
    Re: mixed cmp operator for sorting <uri@stemsystems.com>
    Re: mixed cmp operator for sorting <bill@todbe.com>
    Re: mixed cmp operator for sorting (Tim McDaniel)
    Re: mixed cmp operator for sorting <jwkrahn@example.com>
    Re: mixed cmp operator for sorting <derykus@gmail.com>
    Re: mixed cmp operator for sorting <rweikusat@mobileactivedefense.com>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 22 Sep 2013 13:55:16 -0700 (PDT)
From: Marc Girod <marc.girod@gmail.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <010fb1df-8c7a-4704-909b-e24ca786c173@googlegroups.com>

On Sunday, 22 September 2013 20:07:34 UTC+1, Ben Morrow  wrote:

> You would be better off prototyping ucmp ($$), which makes sort pass the
> values to be sorted in @_. This is slightly slower than using $a and $b,
> which is why it isn't the default, but using eval will be *much* slower,
> as well as rather less safe.

OK. I knew of this option, but I wanted to learn about the pros and cons...

> If you must access your caller's variables, you should use symrefs in
> preference to eval:
> 
>     my ($a, $b) = do {
>         no strict "refs";
>         my $pkg = caller;
>         ${"$pkg\::a"}, ${"$pkg\::b"};
>     };

Thank you! You remind me of an advice you already gave me previously in another context!
Now, how would this compare with prototyping the function?

> >   my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
> >   my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;
> 
> Why the pops?

I found that since my regexp matches an empty string, the 'g' flag will stop after matching one (which is very nice of it), leaving me with one pair of empty strings in every case...
I am just removing them.

> >   my $l = ((@t <= @s)? @t : @s) / 2;
> >   my $ret;
> >   while ($l--) {
> >     $ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s
> >     or 0));

> You try string comparison first; this means numerical sections will be
> sorted in string order and the numerical comparison is pointless.

No? I try it on the string token, possibly empty.

> You need to consider what to do if the sections you are comparing are of
> different types (a numeric vs a non-numeric section).

My regexp matching has taken care of this...

> You should use || for or-as-an-expression; 'or' is for flow control.

OK. The only result there would be to change the precedence, allowing me to drop one set of parentheses...?

> >     $l = 0 if $ret;
> >   }
> >   return ($ret or @t <=> @s);

> I don't understand what $l is doing for you here. AFAICS these two
> strings

>     a1b2z9
>     a1c3z9

> will compare equal; is that what you want?

$l is initialised to the length of the shorter string (in token pairs).
Your two strings do sort right...

$ perl -MUCmp -lE 'print for sort ucmp qw(a1b2z9 a1c3z9)'
a1b2z9
a1c3z9

The first pass of the loop will handle the two (a, 1); they compare doubly equal, but since $l is not down to 0 yet, the next pass will handle (b, 2) and (c, 3), and there the comparison of b and c will suffice:
$l will be forcibly set to 0, and since $ret is non null, the original lengths will not need to be compared.

Marc


------------------------------

Date: Sun, 22 Sep 2013 22:17:34 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <ec34ha-ss4.ln1@anubis.morrow.me.uk>


Quoth Marc Girod <marc.girod@gmail.com>:
> On Sunday, 22 September 2013 20:07:34 UTC+1, Ben Morrow  wrote:
> 
> > You would be better off prototyping ucmp ($$), which makes sort pass the
> > values to be sorted in @_. This is slightly slower than using $a and $b,
> > which is why it isn't the default, but using eval will be *much* slower,
> > as well as rather less safe.
> 
> OK. I knew of this option, but I wanted to learn about the pros and cons...
> 
> > If you must access your caller's variables, you should use symrefs in
> > preference to eval:
> > 
> >     my ($a, $b) = do {
> >         no strict "refs";
> >         my $pkg = caller;
> >         ${"$pkg\::a"}, ${"$pkg\::b"};
> >     };
> 
> Thank you! You remind me of an advice you already gave me previously in
> another context!
> Now, how would this compare with prototyping the function?

You'd have to benchmark it to see, but I would expect that all using $a
and $b saves is the cost of pushing the values on the stack and popping
them off again, which is exactly what that 'do' does. So I would expect
this to be (slightly) slower than the prototype.

> > >   my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
> > >   my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;
> > 
> > Why the pops?
> 
> I found that since my regexp matches an empty string, the 'g' flag will
> stop after matching one (which is very nice of it), leaving me with one
> pair of empty strings in every case...
> I am just removing them.

Ah yes; I missed that this could match the empty string. I might rather
modify the pattern so it doesn't, though that's not terribly
straightforward; something like

    /(?=.)(\D*)(\d*)/gs

is probably as simple as it will go.

> > >   my $l = ((@t <= @s)? @t : @s) / 2;
> > >   my $ret;
> > >   while ($l--) {
> > >     $ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s
> > >     or 0));
> 
> > You try string comparison first; this means numerical sections will be
> > sorted in string order and the numerical comparison is pointless.
> 
> No? I try it on the string token, possibly empty.
> 
> > You need to consider what to do if the sections you are comparing are of
> > different types (a numeric vs a non-numeric section).
> 
> My regexp matching has taken care of this...

You're right; the inline shifts confused me into thinking that the same
token was being compared both times.

> > You should use || for or-as-an-expression; 'or' is for flow control.
> 
> OK. The only result there would be to change the precedence, allowing me
> to drop one set of parentheses...?

Yes.

> > >     $l = 0 if $ret;
> > >   }
> > >   return ($ret or @t <=> @s);
> 
> > I don't understand what $l is doing for you here. AFAICS these two
> > strings
> 
> >     a1b2z9
> >     a1c3z9
> 
> > will compare equal; is that what you want?
> 
> $l is initialised to the length of the shorter string (in token pairs).
> Your two strings do sort right...
> 
> $ perl -MUCmp -lE 'print for sort ucmp qw(a1b2z9 a1c3z9)'
> a1b2z9
> a1c3z9

Yes; as I said, I missed the fact you were shifting off pairs of values.

In that case: there's no need to count the loop, since Perl arrays know
how long they are:

    while (@s && @t) {
        $ret = ...
            and last;
    }

or

        my $ret = ...;
        $ret and return $ret;

(once again I ask myself why 'my' variables aren't introduced until the
end of the statement...)

Ben



------------------------------

Date: Sun, 22 Sep 2013 14:49:23 -0700 (PDT)
From: Marc Girod <marc.girod@gmail.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <68cfab11-7c60-45f2-92c3-f495eb061a5b@googlegroups.com>

On Sunday, 22 September 2013 22:17:34 UTC+1, Ben Morrow  wrote:

> You'd have to benchmark it to see, but I would expect that all using $a
> and $b saves is the cost of pushing the values on the stack and popping
> them off again, which is exactly what that 'do' does. So I would expect
> this to be (slightly) slower than the prototype.

Fair enough, and understood.

> Ah yes; I missed that this could match the empty string. I might rather
> modify the pattern so it doesn't, though that's not terribly
> straightforward; something like
> 
>     /(?=.)(\D*)(\d*)/gs
> 
> is probably as simple as it will go.

OK. I see this is better.
Why do you use the 's' modifier?
To sort multiline items?...

> In that case: there's no need to count the loop, since Perl arrays know
> how long they are:
> 
>     while (@s && @t) {
>         $ret = ...
>             and last;
>     }

OK. This one I get.

> or

>         my $ret = ...;
>         $ret and return $ret;

But this one I don't...
I have now:

  return $ret || @t <=> @s;

> (once again I ask myself why 'my' variables aren't introduced until the
> end of the statement...)

And neither this comment...
Thanks!
Marc


------------------------------

Date: Sun, 22 Sep 2013 23:38:35 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <b484ha-0l5.ln1@anubis.morrow.me.uk>


Quoth Marc Girod <marc.girod@gmail.com>:
> On Sunday, 22 September 2013 22:17:34 UTC+1, Ben Morrow  wrote:
> 
> > You'd have to benchmark it to see, but I would expect that all using $a
> > and $b saves is the cost of pushing the values on the stack and popping
> > them off again, which is exactly what that 'do' does. So I would expect
> > this to be (slightly) slower than the prototype.
> 
> Fair enough, and understood.
> 
> > Ah yes; I missed that this could match the empty string. I might rather
> > modify the pattern so it doesn't, though that's not terribly
> > straightforward; something like
> > 
> >     /(?=.)(\D*)(\d*)/gs
> > 
> > is probably as simple as it will go.
> 
> OK. I see this is better.
> Why do you use the 's' modifier?
> To sort multiline items?...

/s makes . match newline. Without it the pattern will fail to match if a
non-digit section starts with a newline.

> > In that case: there's no need to count the loop, since Perl arrays know
> > how long they are:
> > 
> >     while (@s && @t) {
> >         $ret = ...
> >             and last;
> >     }
> 
> OK. This one I get.
> 
> > or
> 
> >         my $ret = ...;
> >         $ret and return $ret;
> 
> But this one I don't...
> I have now:
> 
>   return $ret || @t <=> @s;

Maybe I should write the whole thing out:

    while (@s && @t) {
        my $ret = ...;
        $ret and return $ret;
    }
    return @t <=> @s;

Rather than using 'last' to get out of the loop and using $ret to carry
the return value out of the loop, simply return immediately from inside
the loop. Then, if the loop ends and the sub hasn't returned yet, do the
final comparison. This avoids having to have $ret's scope extend over
every iteration of the loop and beyond, which I consider a Good Thing.
(Limiting the scope of variables is always a Good Thing.)

> > (once again I ask myself why 'my' variables aren't introduced until the
> > end of the statement...)
> 
> And neither this comment...

I would like to be able to write

    my $ret = ...
        and return $ret;

but I can't, because Perl does not let you refer to a variable in the
same statement as it was declared. This restriction is entirely
artificial (it would be easier for perl to bring $ret into scope
immediately) and I consider it unnecessary and unhelpful, for reasons
I've explained before and so won't bore people by explaining again.

Ben



------------------------------

Date: Sun, 22 Sep 2013 21:16:29 -0400
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <87r4cguyky.fsf@stemsystems.com>

>>>>> "MG" == Marc Girod <marc.girod@gmail.com> writes:

  MG> Hi, I often have to sort strings with embedded numbers, which gets
  MG> into the problem that neither cmp nor <=> is appropriate to deal
  MG> with them.  Making ad-hoc sort functions is nearly trivial, but
  MG> cumbersome e.g. for one-liners.

have you looked at Sort::Maker on cpan? you can make the sort elsewhere
and call the single code ref to sort stuff. you can even run it offline
and get the source for the sorter to use in your code.

  MG> sub ucmp {
  MG>   my $a = eval '$' . caller . '::a';
  MG>   my $b = eval '$' . caller . '::b';
  MG>   my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
  MG>   my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;

that is redundant and prone to error. that is one area sort::maker
helps. you only need to describe how to extract each key from the
elements to be sorted. the module does the rest.

uri


------------------------------

Date: Sun, 22 Sep 2013 19:27:19 -0700
From: "$Bill" <bill@todbe.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <l1o8u8$oeu$1@dont-email.me>

On 9/22/2013 11:15, Marc Girod wrote:
> Hi,
>
> I often have to sort strings with embedded numbers, which gets into the problem that neither cmp nor <=> is appropriate to deal with them.
> Making ad-hoc sort functions is nearly trivial, but cumbersome e.g. for one-liners.
>
> I tried thus to write a sort function which would parse out the string parts from the numeric ones.
> Here is what I came up with.
 ...
> sub ucmp {

>    my $a = eval '$' . caller . '::a';
>    my $b = eval '$' . caller . '::b';
>    my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
>    my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;
>    my $l = ((@t <= @s)? @t : @s) / 2;

>    my $ret;
>    while ($l--) {
>      $ret = ((shift @t) cmp (shift @s) or (shift @t or 0) <=> (shift @s or 0));

I might use:
	$ret = ((shift @t || '') cmp (shift @s || '') or (shift @t || 0) <=> (shift @s || 0));
or:
	$ret = (shift @t || '') cmp (shift @s || '');
	$ret = (shift @t || 0) <=> (shift @s || 0) if not $ret;

>      $l = 0 if $ret;

or:
	last if $ret;

>    }
>    return ($ret or @t <=> @s);
> }

Interesting sub - not sure where it would come in handy though
with that specific kind of data string(s) [mixed alpha/numeric].








------------------------------

Date: Mon, 23 Sep 2013 04:23:05 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: mixed cmp operator for sorting
Message-Id: <l1ofn9$i38$2@reader1.panix.com>

In article <9ff61f8b-ea0a-4d16-83a8-7fb010d765f4@googlegroups.com>,
Marc Girod  <marc.girod@gmail.com> wrote:
>say for sort ucmp qw(                                                                                 
>                        a12b34                                                                        
>                        a2c                                                                           
>                        b23                                                                           
>                        a7                                                                            
>                        a7b                                                                           
>                        23                                                                            
>                     );
>$ ./bar
>23
>a2c
>a7
>a7b
>a12b34
>b23
>
>Would you care to criticize it?

You don't document what the result should be.  I can make a guess from
that sample data, but I'd rather not guess; I'd rather have a spec
to verify results.

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 23 Sep 2013 00:30:46 -0700
From: "John W. Krahn" <jwkrahn@example.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <H6S%t.125833$La6.92325@fx04.iad>

Marc Girod wrote:
> Hi,
>
> I often have to sort strings with embedded numbers, which gets into the problem that neither cmp nor<=>  is appropriate to deal with them.
> Making ad-hoc sort functions is nearly trivial, but cumbersome e.g. for one-liners.
>
> I tried thus to write a sort function which would parse out the string parts from the numeric ones.
> Here is what I came up with.
>
> -8<---------------
> package UCmp;
>
> require Exporter;
> @ISA = qw(Exporter);
> @EXPORT = qw(ucmp);
>
> sub ucmp {
>    my $a = eval '$' . caller . '::a';
>    my $b = eval '$' . caller . '::b';
>    my @t = $a =~ /(\D*)(\d*)/g; pop @t; pop @t;
>    my @s = $b =~ /(\D*)(\d*)/g; pop @s; pop @s;

Better as:

     my @t = $a =~ /\D+|\d+/g;
     my @s = $b =~ /\D+|\d+/g;

Avoids zero length strings.



John
-- 
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein


------------------------------

Date: Mon, 23 Sep 2013 02:17:56 -0700
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <l1p103$js9$1@speranza.aioe.org>

On 9/22/2013 3:38 PM, Ben Morrow wrote:
> ...
>
> I would like to be able to write
>
>      my $ret = ...
>          and return $ret;
>
> but I can't, because Perl does not let you refer to a variable in the
> same statement as it was declared. This restriction is entirely
> artificial (it would be easier for perl to bring $ret into scope
> immediately) and I consider it unnecessary and unhelpful, for reasons
> I've explained before and so won't bore people by explaining again.
>

Of course it currently works fine with a previous $ret in scope.
There's no error but the sub returns "foo" instead of "bar":

   { my $ret="foo"; sub {my $ret="bar" and return $ret}  }

A side-benefit would be this scope surprise would be eliminated because 
the later $ret would be seen.

But, there's always some catch: you'd still need:  no warnings 'syntax' 
though in order to quiet: "Found = in conditional, should be ==".

-- 
Charles DeRykus

p.s.   A global would eliminate the problem but has a considerable yuck 
factor..



------------------------------

Date: Mon, 23 Sep 2013 11:57:41 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <877ge7955m.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:

[...]

> Maybe I should write the whole thing out:
>
>     while (@s && @t) {
>         my $ret = ...;
>         $ret and return $ret;
>     }
>     return @t <=> @s;
>
> Rather than using 'last' to get out of the loop and using $ret to carry
> the return value out of the loop, simply return immediately from inside
> the loop. Then, if the loop ends and the sub hasn't returned yet, do the
> final comparison. This avoids having to have $ret's scope extend over
> every iteration of the loop and beyond, which I consider a Good Thing.
> (Limiting the scope of variables is always a Good Thing.)

Subroutines which are so complicated that people start to loose track of
which variables were being used in what places are 'always a bad thing'
and this solution is not to create more variables but the split the
sequential tapeworm into multiple subroutines (with individual sets of
variables)[*].

[*] I keep being amazed at this idea that the solution to "It is too
complicated already!" must be "Make it more complicated!" (and maybe,
add some neat warnings to help the poor guy who can stop out-clevering
himself in code ...).

>> > (once again I ask myself why 'my' variables aren't introduced until the
>> > end of the statement...)
>> 
>> And neither this comment...
>
> I would like to be able to write
>
>     my $ret = ...
>         and return $ret;
>
> but I can't, because Perl does not let you refer to a variable in the
> same statement as it was declared. This restriction is entirely
> artificial (it would be easier for perl to bring $ret into scope
> immediately) and I consider it unnecessary and unhelpful, for reasons
> I've explained before and so won't bore people by explaining again.

The reason for this is that code like this

my $v = 12;
{
    my $v = $v;
    print("$v\n");
}

works as intended, ie, the inner $v gets initialized to the value of the
outer $v. Granted, this is a silly example, but I've already used
similar constructs in real code.


------------------------------

Date: Mon, 23 Sep 2013 17:41:14 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <ai76ha-u2n1.ln1@anubis.morrow.me.uk>


Quoth Charles DeRykus <derykus@gmail.com>:
> On 9/22/2013 3:38 PM, Ben Morrow wrote:
> >
> > I would like to be able to write
> >
> >      my $ret = ...
> >          and return $ret;
> >
> > but I can't, because Perl does not let you refer to a variable in the
> > same statement as it was declared. This restriction is entirely
> > artificial (it would be easier for perl to bring $ret into scope
> > immediately) and I consider it unnecessary and unhelpful, for reasons
> > I've explained before and so won't bore people by explaining again.
> 
> Of course it currently works fine with a previous $ret in scope.
> There's no error but the sub returns "foo" instead of "bar":
> 
>    { my $ret="foo"; sub {my $ret="bar" and return $ret}  }

That's the 'feature' this restriction is intended to allow. It made
sense for local, when there was always an outer variable present which
may have been needed, but IMHO makes no sense for my.

> A side-benefit would be this scope surprise would be eliminated because 
> the later $ret would be seen.

Yes. And with the block you don't even get a 'my variable masks earlier
declaration' warning, which would actually be useful.

> But, there's always some catch: you'd still need:  no warnings 'syntax' 
> though in order to quiet: "Found = in conditional, should be ==".

That's only the case if the RHS is constant. Perl's warnings,
fortunately, aren't quite as bad as gcc's yet.

Ben



------------------------------

Date: Mon, 23 Sep 2013 17:43:22 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <am76ha-u2n1.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mobileactivedefense.com>:
> 
> The reason for this is that code like this
> 
> my $v = 12;
> {
>     my $v = $v;
>     print("$v\n");
> }
> 
> works as intended, ie, the inner $v gets initialized to the value of the
> outer $v. Granted, this is a silly example, but I've already used
> similar constructs in real code.

As you say, this is a silly example; have you got a real one? I can see
the point of that construction with local, but not with my.

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4040
***************************************


home help back first fref pref prev next nref lref last post