[32615] in Perl-Users-Digest

home help back first fref pref prev next nref lref last post

Perl-Users Digest, Issue: 3889 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Mar 2 03:09:22 2013

Date: Sat, 2 Mar 2013 00:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 2 Mar 2013     Volume: 11 Number: 3889

Today's topics:
        domain name ordering <oneingray@gmail.com>
    Re: domain name ordering <rvtol+usenet@xs4all.nl>
    Re: domain name ordering <blgl@stacken.kth.se>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: domain name ordering <ben@morrow.me.uk>
    Re: domain name ordering <oneingray@gmail.com>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: domain name ordering <jurgenex@hotmail.com>
    Re: domain name ordering <rweikusat@mssgmbh.com>
    Re: Rehabilitating bad Perl code <rweikusat@mssgmbh.com>
    Re: Rehabilitating bad Perl code <cartercc@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Thu, 28 Feb 2013 08:36:51 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: domain name ordering
Message-Id: <87d2vk7s2k.fsf@violet.siamics.net>

	One more "domain name" problem: write an "inequality" predicate
	to compare domain names "right to left".  IOW:

my @ordered
    = qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

	A trivial solution is to split /\./ each of the names, reverse
	the resulting lists, and then compare them elementwise, until
	either one of the lists ends (which is then the lesser one), or
	an inequality is found.  Hence:

sub list_cmp (&$$) {
    ## BTW, how do I imitate sort's own signature here?
    ## (i. e., make $cmp truly optional.)
    my ($cmp, $a, $b) = @_;
    $cmp
        //= sub { $_[0] cmp $_[1]; };
    for (my $i = 0; $i <= $#$a; $i++) {
        ## .
        return 1
            if ($i > $#$b);
        my $v
            = &$cmp ($a->[$i], $b->[$i]);
        ## .
        return $v
            if ($v != 0);
    }

    ## .
    return ($#$a < $#$b ? -1 : 0);
}

sub dns_name_cmp ($$) {
    my ($a, $b) = @_;

    ## .
    list_cmp (undef,
              [reverse (split (/\./, $a, -1))],
              [reverse (split (/\./, $b, -1))]);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

	However, doing a split on every sort iteration doesn't seem all
	that sensible (or does it?)  Let's try with rindex () instead:

sub dns_name_cmp ($$) {
    my ($a, $b) = @_;

    my ($ta, $tb)
        = (-1 + length ($a), -1 + length ($b));
    while ($ta >= 0 && $tb >= 0) {
        ## NB: $[ is deprecated, thus rindex () >= -1
        my ($i, $j)
            = (rindex ($a, ".", $ta),
               rindex ($b, ".", $tb));
        ## .
        my $v
            = (substr     ($a, 1 + $i, $ta - $i)
               cmp substr ($b, 1 + $j, $tb - $j));
        return $v
            if ($v != 0);
        ($ta, $tb)
            = (-1 + $i, -1 + $j);
    }

    ## .
    return ($ta   > 0 ? +1
            : $tb > 0 ? -1
            : 0);
}

## print in reverse
print (join ("\n", sort { - dns_name_cmp ($a, $b); } (@ordered)), "\n");
# foo.example.org
# foo.bar.example.org
# bar.example.org
# qux.example.net

	Hopefully I didn't miss any corner case with these.

	Now, it makes me wonder if there's an easier way to write this
	function...

-- 
FSF associate member #7257


------------------------------

Date: Thu, 28 Feb 2013 10:24:12 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
To: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: domain name ordering
Message-Id: <512F223C.8030600@xs4all.nl>

On 2013-02-28 09:36, Ivan Shmakov wrote:
> 	One more "domain name" problem: write an "inequality" predicate
> 	to compare domain names "right to left".  IOW:
>
> my @ordered
>      = qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);

You can build a hash from that, for example:

   my %dname = (
       net => {
           example => { qux => { '.' => -1 } },
       },
       org => {
           example => {
               bar => {
                   foo => { '.' => -1 },
                   '.' => -1,
               },
               foo => { '.' => -1 },
           },
       },
   );



Or do
   my @dnames= sort { length($b) <=> length($a) or $a cmp $b } @ordered;
to make it:
   qw(foo.bar.x.org bar.x.org foo.x.org qux.x.net);
and use a 'for my $dname (@dnames) { ... } loop
that stops at the first match on
   /(?:^|\.)\Q$dname$/

-- 
Ruud



------------------------------

Date: Thu, 28 Feb 2013 12:48:00 +0100
From: Bo Lindbergh <blgl@stacken.kth.se>
Subject: Re: domain name ordering
Message-Id: <kgng38$g8b$1@dont-email.me>

Transform each FQDN into something that will sort correctly
using the default string comparison; sort; reverse the transformation.

@ordered = map {
    join ".", reverse split / /;
} sort map {
    join " ", reverse split /\./;
} @unordered;


/Bo Lindbergh


------------------------------

Date: Thu, 28 Feb 2013 11:57:06 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87k3ps4pnx.fsf@sapphire.mobileactivedefense.com>

Ivan Shmakov <oneingray@gmail.com> writes:
> 	One more "domain name" problem: write an "inequality" predicate
> 	to compare domain names "right to left".  IOW:
>
> my @ordered
>     = qw (qux.example.net bar.example.org foo.bar.example.org foo.example.org);
>
> 	A trivial solution is to split /\./ each of the names, reverse
> 	the resulting lists, and then compare them elementwise, until
> 	either one of the lists ends (which is then the lesser one), or
> 	an inequality is found.  Hence:
>
> sub list_cmp (&$$) {

[...]

> 	However, doing a split on every sort iteration doesn't seem all
> 	that sensible (or does it?)  Let's try with rindex () instead:
>
>
> sub dns_name_cmp ($$) {
>     my ($a, $b) = @_;
>
>     my ($ta, $tb)
>         = (-1 + length ($a), -1 + length ($b));

-1 + length($a) == length($a) - 1

>     while ($ta >= 0 && $tb >= 0) {
>         ## NB: $[ is deprecated, thus rindex () >= -1
>         my ($i, $j)
>             = (rindex ($a, ".", $ta),
>                rindex ($b, ".", $tb));
>         ## .
>         my $v
>             = (substr     ($a, 1 + $i, $ta - $i)
>                cmp substr ($b, 1 + $j, $tb - $j));

Have you benchmarked that? You're still doing the structure analysis
and component extraction for ever comparison and out of my head, I
wouldn't be absolutely sure that doing it this way instead of with
split is actually faster.

The obvious other idea would be to split the domain names once and
leave them this way until the list is sorted. If you could afford to
lose some speed at the expense of code clarity, this could be put in a
class overloading cmp which either constructs the 'non-splitted,
non-reversed' representation 'on demand' (=> overloaded "") or keeps
both around or constructs the 'exploded' one on demand.

Instead of reversing the splitted lists, one could also compare from
the end, possibly using negative indices ($a[$#a] is the same as $a[-1]).


------------------------------

Date: Thu, 28 Feb 2013 12:06:32 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87ehg04p87.fsf@sapphire.mobileactivedefense.com>

Bo Lindbergh <blgl@stacken.kth.se> writes:
> Transform each FQDN into something that will sort correctly
> using the default string comparison; sort; reverse the transformation.
>
> @ordered = map {
>     join ".", reverse split / /;
> } sort map {
>     join " ", reverse split /\./;
> } @unordered;

This is a neat idea. It could be made slightly more general by using
\000 instead of the space as alternate label separator.


------------------------------

Date: Thu, 28 Feb 2013 15:32:11 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87vc9co3no.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Bo Lindbergh <blgl@stacken.kth.se> writes:
>> Transform each FQDN into something that will sort correctly
>> using the default string comparison; sort; reverse the transformation.
>>
>> @ordered = map {
>>     join ".", reverse split / /;
>> } sort map {
>>     join " ", reverse split /\./;
>> } @unordered;
>
> This is a neat idea. It could be made slightly more general by using
> \000 instead of the space as alternate label separator.

A 'better' ('faster') idea which needs even less Perl code would be to
build a hash of sort keys and sort the domain name array based on
comparing these instead of destroying and recreating the dataset which
is supposed to be sorted:

------------------
use Benchmark;

my @in = reverse(qw(qux.example.net bar.example.org foo.bar.example.org foo.example.org));

timethese(-5, 
	  {
	   sort_x => sub {
	       return map {
		   join '.', reverse split /\000/
	       } sort map {
		   join "\000", reverse split /\./
	       } @in;
	   },
	   
	   sort_y => sub {
	       my %keys;

	       $keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
	       return sort { $keys{$a} cmp $keys{$b} } @in;
	   }});



------------------------------

Date: Thu, 28 Feb 2013 18:36:08 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: domain name ordering
Message-Id: <olk40a-vga.ln1@anubis.morrow.me.uk>


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> > Bo Lindbergh <blgl@stacken.kth.se> writes:
> >> Transform each FQDN into something that will sort correctly
> >> using the default string comparison; sort; reverse the transformation.
> >>
> >> @ordered = map {
> >>     join ".", reverse split / /;
> >> } sort map {
> >>     join " ", reverse split /\./;
> >> } @unordered;
> >
> > This is a neat idea. It could be made slightly more general by using
> > \000 instead of the space as alternate label separator.
> 
> A 'better' ('faster') idea which needs even less Perl code would be to
> build a hash of sort keys and sort the domain name array based on
> comparing these instead of destroying and recreating the dataset which
> is supposed to be sorted:
[...]
> 	       my %keys;
> 
> 	       $keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
> 	       return sort { $keys{$a} cmp $keys{$b} } @in;

Or just use a standard ST

    map $$_[0], 
        sort { $$a[1] cmp $$b[1] } 
        map [ $_, join "\0", reverse split /\./ ],
        @in;

or use Sort::Maker or Sort::Key or...

Actually, now that I think about it, there's another way of doing this I
haven't seen anywhere else: use dualvars.

    use Scalar::Util "dualvar";

    sub ksort (&@) {
        my $cb = shift;

        map $_[$_], 
            sort
            map dualvar($_, do {
                local $_ = $_[$_];
                $cb->();
            }),
            0..$#_;
    }

    my @out = ksort { join "\0", reverse split /\./ } @in;

This has the advantage of using perl's argumentless sort, which is much
faster than calling any kind of callback.

Ben



------------------------------

Date: Thu, 28 Feb 2013 18:54:24 +0000
From: Ivan Shmakov <oneingray@gmail.com>
Subject: Re: domain name ordering
Message-Id: <874ngw6zhb.fsf@violet.siamics.net>

>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

 > A 'better' ('faster') idea which needs even less Perl code would be
 > to build a hash of sort keys and sort the domain name array based on
 > comparing these instead of destroying and recreating the dataset
 > which is supposed to be sorted:

[...]

	Unless I be mistaken, sort { } () with the second variant of
	dns_name_cmp I've originally posted is still faster.

$ cat < 3zytr64ehn1aqckj5mk93da6zg.perl 
use common::sense;

require Benchmark;

## Courtesy of /usr/share/dict/words
my @n1 = qw {
    igloo.burgs.tuft leads.polyp.lakes doily.rude.corks rains.cuds.bilks
    aces.mumps.colts pink.bebop.prune coeds.troop.sees tour.hulks.rungs
    vent.dial.whelk logos.loped.nor mated.bran.cited pies.loaf.rune
    sperm.yelp.harms how.scaly.ante dogma.lawns.plods dell.yaw.piano
    caged.daub.gores weird.sofas.arias rave.gouty.tamp sics.avows.today
    crass.execs.smuts dale.real.stack boots.dew.pool demur.fancy.stoop
    gruff.aspen.memo gaze.homey.sorts new.slot.queue wiry.share.gulf
    phase.canny.vogue grace.eddy.beaux juts.wands.doc helm.was.ladle
};
my @n2 = qw {
    bums.example.com fools.example.com sic.example.org feel.example.com
    wound.example.net stop.example.com diced.example.org
    purse.example.com sip.example.net poll.example.com whole.example.net
    kinky.example.net homer.example.org ulnas.example.com
    won.example.org chase.example.com prong.example.net
    hacks.example.net link.example.com monk.example.com sexy.example.net
    bogs.example.net crest.example.org rays.example.com
    solar.example.com click.example.net mixer.example.com
    rib.example.org pinup.example.com gees.example.net fizzy.example.com
    aisle.example.org
};

## sub dns_name_cmp { ... as per news:87d2vk7s2k.fsf@violet.siamics.net ... }

sub rw_sort (@) {
    my %keys;

    $keys{$_}
        = join ("\000", reverse (split (/\./, $_)))
        for @_;
    ## .
    return
        sort { $keys{$a} cmp $keys{$b} } @_;
}

Benchmark::timethese (1024 * 1024, {
    "is-1" => sub { sort dns_name_cmp (@n1); },
    "is-2" => sub { sort dns_name_cmp (@n2); },
    "rw-1" => sub { rw_sort (@n1); },
    "rw-2" => sub { rw_sort (@n1); }
});
$ perl  3zytr64ehn1aqckj5mk93da6zg.perl 
Benchmark: timing 1048576 iterations of is-1, is-2, rw-1, rw-2...
      is-1: -1 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU) @ 34952533.33/s (n=1048576)
            (warning: too few iterations for a reliable count)
      is-2: -1 wallclock secs ( 0.16 usr +  0.00 sys =  0.16 CPU) @ 6553600.00/s (n=1048576)
            (warning: too few iterations for a reliable count)
      rw-1: 69 wallclock secs (69.14 usr +  0.03 sys = 69.17 CPU) @ 15159.40/s (n=1048576)
      rw-2: 69 wallclock secs (68.61 usr +  0.02 sys = 68.63 CPU) @ 15278.68/s (n=1048576)
$ 

-- 
FSF associate member #7257	np. Lord of the Rings -- Blind Guardian


------------------------------

Date: Thu, 28 Feb 2013 19:00:41 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87ip5cnu06.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>> > Bo Lindbergh <blgl@stacken.kth.se> writes:
>> >> Transform each FQDN into something that will sort correctly
>> >> using the default string comparison; sort; reverse the transformation.
>> >>
>> >> @ordered = map {
>> >>     join ".", reverse split / /;
>> >> } sort map {
>> >>     join " ", reverse split /\./;
>> >> } @unordered;
>> >
>> > This is a neat idea. It could be made slightly more general by using
>> > \000 instead of the space as alternate label separator.
>> 
>> A 'better' ('faster') idea which needs even less Perl code would be to
>> build a hash of sort keys and sort the domain name array based on
>> comparing these instead of destroying and recreating the dataset which
>> is supposed to be sorted:
> [...]
>> 	       my %keys;
>> 
>> 	       $keys{$_} = join("\000", reverse(split(/\./, $_))) for @in;
>> 	       return sort { $keys{$a} cmp $keys{$b} } @in;
>
> Or just use a standard ST
>
>     map $$_[0], 
>         sort { $$a[1] cmp $$b[1] } 
>         map [ $_, join "\0", reverse split /\./ ],
>         @in;

I tried that. It is/ was slower for the data I tested with than the
algorithm originally posted by Bo Lindbergh.


------------------------------

Date: Thu, 28 Feb 2013 19:57:13 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87ehg0nrdy.fsf@sapphire.mobileactivedefense.com>

Ivan Shmakov <oneingray@gmail.com> writes:
>>>>>> Rainer Weikusat <rweikusat@mssgmbh.com> writes:
>
> [...]
>
>  > A 'better' ('faster') idea which needs even less Perl code would be
>  > to build a hash of sort keys and sort the domain name array based on
>  > comparing these instead of destroying and recreating the dataset
>  > which is supposed to be sorted:
>
> [...]
>
> 	Unless I be mistaken, sort { } () with the second variant of
> 	dns_name_cmp I've originally posted is still faster.

[...]

> sub rw_sort (@) {
>     my %keys;
>
>     $keys{$_}
>         = join ("\000", reverse (split (/\./, $_)))
>         for @_;
>     ## .
>     return
>         sort { $keys{$a} cmp $keys{$b} } @_;
> }
>
> Benchmark::timethese (1024 * 1024, {
>     "is-1" => sub { sort dns_name_cmp (@n1); },
>     "is-2" => sub { sort dns_name_cmp (@n2); },
>     "rw-1" => sub { rw_sort (@n1); },
>     "rw-2" => sub { rw_sort (@n1); }
> });

The obvious issue with this would be that you're adding a gratuitious
subroutine call per iteration for the 3rd and 4th test. But that
doesn't matter that much: Comparing the labels via substring and
stopping at the first mismatch is several orders of magnitude faster
then processing the complete data once per sort and sorting based on
the 'preprocessed' data.


------------------------------

Date: Thu, 28 Feb 2013 12:52:06 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: domain name ordering
Message-Id: <9ngvi85te2nmaomhta3guodk5qtqch4f6l@4ax.com>

Ivan Shmakov <oneingray@gmail.com> wrote:
[...]
>	However, doing a split on every sort iteration doesn't seem all
>	that sensible (or does it?)  
[...]
>	Now, it makes me wonder if there's an easier way to write this
>	function...

Standard answer to this whole class of problem is to use a Schwartzian
Transformation: http://en.wikipedia.org/wiki/Schwartzian_transform

jue


------------------------------

Date: Fri, 01 Mar 2013 14:47:19 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: domain name ordering
Message-Id: <87lia718js.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:

[...]

> Actually, now that I think about it, there's another way of doing this I
> haven't seen anywhere else: use dualvars.
>
>     use Scalar::Util "dualvar";
>
>     sub ksort (&@) {
>         my $cb = shift;
>
>         map $_[$_], 
>             sort
>             map dualvar($_, do {
>                 local $_ = $_[$_];
>                 $cb->();
>             }),
>             0..$#_;
>     }
>
>     my @out = ksort { join "\0", reverse split /\./ } @in;

This is a neat idea as well. I thought about attaching the indices of
the actual data items to the corresponding sort keys and generating
the sorted sequence by pulling data items from the input array based
on the index permutation in the sorted, intermediate array 'somehow'
but abandoned the idea in favor of using anonymous arrays to tack sort
key and data item together (I knew that something called 'a
Schwartzian transform' existed but had not real idea what it actually
was) because I wasn't aware of a sensible way to accomplish the other.



------------------------------

Date: Thu, 28 Feb 2013 11:40:19 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <87obf44qfw.fsf@sapphire.mobileactivedefense.com>

dkcombs@panix.com (David Combs) writes:
> In article <40368ab6-b6b0-4401-9c06-92390a2153c3@googlegroups.com>,
> ccc31807  <cartercc@gmail.com> wrote:
>>On Tuesday, January 8, 2013 2:37:57 PM UTC-5, Henry Law wrote:
>
>>3. I intentionally attempted to rewrite the code in a functional style, which has paid for itself in decreased maintenance time over the years.
>
> [very late followup, this]
>
> For education of us all, please say a bit about how you
> wrote perl code in a functional style.

Judgeing from the text, I think what he meant is structuring the code
into named subroutines solving subproblems instead of the usual "5000
lines down the road, there comes the final bracket of 'the loop'" (cf
"Real programmers don't use Pascal").


------------------------------

Date: Thu, 28 Feb 2013 16:31:15 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Rehabilitating bad Perl code
Message-Id: <4b9971c3-385c-422e-b977-c1b58796c179@googlegroups.com>

On Wednesday, February 27, 2013 9:19:34 PM UTC-5, David Combs wrote:

> >3. I intentionally attempted to rewrite the code in a functional style, which has paid for itself in decreased maintenance time over the years.
> 
> [very late followup, this]
> 
> For education of us all, please say a bit about how you
> wrote perl code in a functional style.

I should have said 'procedural style', by which I mean named blocks of code that accept arguments and return results.

I would recommend 'Higher Order Perl' which is available as a free web version, but by the book to encourage the order. This is a good book. You probably won't write Perl in a (true) functional style, but the techniques are slick.

CC


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3889
***************************************


home help back first fref pref prev next nref lref last post