[32961] in Perl-Users-Digest


Perl-Users Digest, Issue: 4237 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 16 09:09:16 2014

Date: Mon, 16 Jun 2014 06:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 16 Jun 2014     Volume: 11 Number: 4237

Today's topics:
    Re: Can be this be optimized? <Wasell@example.invalid>
    Re: Can be this be optimized? <gamo@telecable.es>
    Re: Can be this be optimized? <hjp-usenet3@hjp.at>
    Re: Can be this be optimized? <jurgenex@hotmail.com>
    Re: Can be this be optimized? <hjp-usenet3@hjp.at>
    Re: Can be this be optimized? <gravitalsun@hotmail.foo>
    Re: Can be this be optimized? <gamo@telecable.es>
    Re: Can be this be optimized? <rweikusat@mobileactivedefense.com>
    Re: Can be this be optimized? <gamo@telecable.es>
    Re: Can be this be optimized? <hjp-usenet3@hjp.at>
    Re: Can be this be optimized? <gamo@telecable.es>
    Re: Can be this be optimized? <rweikusat@mobileactivedefense.com>
    Re: Can be this be optimized? <gamo@telecable.es>
    Re: sorting file according to a unicode column ehabaziz2001@gmail.com
    Re: sorting file according to a unicode column <jurgenex@hotmail.com>
    Re: sorting file according to a unicode column <gravitalsun@hotmail.foo>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 14 Jun 2014 11:09:08 +0200
From: Wasell <Wasell@example.invalid>
Subject: Re: Can be this be optimized?
Message-Id: <MPG.2e062f8cfc23fa80989698@news.eternal-september.org>

On Sat, 14 Jun 2014 07:40:58 +0200, in article
<lngn9a$k48$1@speranza.aioe.org>, gamo wrote:
> 
> sub count{
>      my ($char, $string) = @_;
>      my $c = 0;
>      ++$c while ($string =~ /$char/g);
>      return $c;
> }
> 
> TIA

Maybe this:

    sub count {
      my ($char, $string) = @_;
      return scalar( () = $string =~ /$char/g );
    }

Also, "perldoc -q count" will give you useful information.
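
A minimal sketch of why the one-liner works, assuming the same
two-argument interface (the name count_matches is made up for this
illustration): a /g match in list context returns every match, and a
list assignment evaluated in scalar context yields the number of
elements on its right-hand side.

```perl
use strict;
use warnings;

sub count_matches {
    my ($char, $string) = @_;
    # /g in list context returns all matches; assigning them to the
    # empty list () and reading that assignment as a scalar gives
    # the number of matches.
    my $count = () = $string =~ /$char/g;
    return $count;
}

print count_matches('l', 'Hello World'), "\n";   # prints 3
```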


------------------------------

Date: Sat, 14 Jun 2014 12:26:15 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Can be this be optimized?
Message-Id: <lnh806$sin$1@speranza.aioe.org>

On 14/06/14 11:09, Wasell wrote:
>        return scalar( () = $string =~ /$char/g );

Thank you. It works and must be a lot faster.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sat, 14 Jun 2014 12:51:38 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Can be this be optimized?
Message-Id: <slrnlpoa9q.rnm.hjp-usenet3@hrunkner.hjp.at>

On 2014-06-14 09:09, Wasell <Wasell@example.invalid> wrote:
> On Sat, 14 Jun 2014 07:40:58 +0200, in article
> <lngn9a$k48$1@speranza.aioe.org>, gamo wrote:
>> 
>> sub count{
>>      my ($char, $string) = @_;
>>      my $c = 0;
>>      ++$c while ($string =~ /$char/g);
>>      return $c;
>> }
>> 
>> TIA
>
> Maybe this:
>
>     sub count {
>       my ($char, $string) = @_;
>       return scalar( () = $string =~ /$char/g );
>     }

Surprisingly, gamo's version is a bit faster on my systems.

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sat, 14 Jun 2014 04:28:52 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Can be this be optimized?
Message-Id: <8fcop9951vq6bcofc5vt2qpeqgsi6pdc0s@4ax.com>

gamo <gamo@telecable.es> wrote:
>sub count{

The usual way is to use the tr/// operator.

jue


------------------------------

Date: Sat, 14 Jun 2014 14:37:51 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Can be this be optimized?
Message-Id: <slrnlpoggv.iuv.hjp-usenet3@hrunkner.hjp.at>

On 2014-06-14 11:28, Jürgen Exner <jurgenex@hotmail.com> wrote:
> gamo <gamo@telecable.es> wrote:
>>sub count{
>
> The usual way is to use the tr/// operator.

This works only for constant characters. tr/// doesn't do double-quote
interpolation, so you would have to use string eval to implement a count
subroutine with tr (which I did, and which has, unsurprisingly, quite a
high overhead; it is faster for long strings, over ~2000 characters,
though).

In all cases you have to be careful with the $char argument, because the
character may have a special meaning. In the regexp variants, using
/\Q$char/ instead of /$char/ should work, though.
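
A small sketch of the difference, assuming the character to count is
'.' (the names count_pattern and count_literal are hypothetical):
without \Q the dot is a regexp metacharacter and matches every
character; with \Q it is taken literally.

```perl
use strict;
use warnings;

# $char is interpreted as a regexp, so '.' matches any character.
sub count_pattern {
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while $string =~ /$char/g;
    return $c;
}

# \Q quotes metacharacters, so $char is matched literally.
sub count_literal {
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while $string =~ /\Q$char/g;
    return $c;
}

print count_pattern('.', 'a.b.c'), "\n";   # prints 5
print count_literal('.', 'a.b.c'), "\n";   # prints 2
```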

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sat, 14 Jun 2014 16:06:51 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Can be this be optimized?
Message-Id: <lnhhda$2dsj$1@news.ntua.gr>

#!/usr/bin/perl
# This is about 10 times faster

use strict;
use warnings;
use Benchmark;

my $char     = 'l';
my $string   = 'Hello Worldll';
my $veryfast = fast_iterator($char, $string);


Benchmark::cmpthese(100_000, {
    orig => sub { count($char, $string) },
    fast => $veryfast,
});


sub count {
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while ($string =~ /$char/g);
    return $c;
}

# Returns a compiled sub with both the string and the character baked
# in as literals; tr/// does not interpolate, hence the string eval.
sub fast_iterator {
    my ($char, $string) = @_;
    eval "sub { '$string' =~ tr/$char/$char/ }"
}



------------------------------

Date: Sat, 14 Jun 2014 19:45:24 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Can be this be optimized?
Message-Id: <lni1ni$r1f$1@speranza.aioe.org>

On 14/06/14 15:06, George Mpouras wrote:
> #!/usr/bin/perl
> # This is about 10 times faster
>
> use strict;
> use warnings;
> use Benchmark;
>
> my $char     = 'l';
> my $string   = 'Hello Worldll';
> my $veryfast = fast_iterator($char, $string);
>
>
> Benchmark::cmpthese(100_000, {
> orig => sub {count($char, $string)},
> fast => $veryfast
> });
>
>
> sub count{
> my ($char, $string) = @_;
> my $c = 0;
> ++$c while ($string =~ /$char/g);
> return $c;
> }
>
> sub fast_iterator {
> my ($char, $string, $c) = (@_,0);
> eval "sub { '$string'=~tr/$char/$char/ }"
> }
>

Well, thanks. At first sight, it doesn't look good.
At second sight, it seems to run fast_iterator()
only once and store the result in the scalar $veryfast.
I haven't checked the results; I'm sure they are impressive.
It has happened to me too: a great result caused
by not doing the work rather than by doing it.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 15 Jun 2014 14:47:03 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Can be this be optimized?
Message-Id: <87ioo270a0.fsf@sable.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> sub count{
>     my ($char, $string) = @_;
>     my $c = 0;
>     ++$c while ($string =~ /$char/g);
>     return $c;
> }

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

----------------------
use Benchmark;

use constant LEN => 50;

my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
my $char = chr(rand(26) + 65);
    

print STDERR ("$string, $char\n");
    
sub count{
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while ($string =~ /$char/g);
    return $c;
}

sub count2 {
    my ($char, $string) = @_;
    return scalar( () = $string =~ /$char/g );
}

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
    &{$counters[ord(shift) - 65]};
}


print STDERR (count3($char, $string), "\n");


timethese(-4,
	  {
	   count => sub { count($char, $string) },
	   count2 => sub { count2($char, $string)},
	   count3 => sub { count3($char, $string)}
	  });


------------------------------

Date: Sun, 15 Jun 2014 16:48:48 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Can be this be optimized?
Message-Id: <lnkbog$rl$1@speranza.aioe.org>

On 15/06/14 15:47, Rainer Weikusat wrote:
> gamo <gamo@telecable.es> writes:
>> sub count{
>>      my ($char, $string) = @_;
>>      my $c = 0;
>>      ++$c while ($string =~ /$char/g);
>>      return $c;
>> }
>
> Depending on how long your strings are and how often you'll be calling
> this function vs how often the program is going to be compiled, using a
> set of "pre-compiled character-counting routines" might make sense:
>
> ----------------------
> use Benchmark;
>
> use constant LEN => 50;
>
> my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
> my $char = chr(rand(26) + 65);
>
>
> print STDERR ("$string, $char\n");
>
> sub count{
>      my ($char, $string) = @_;
>      my $c = 0;
>      ++$c while ($string =~ /$char/g);
>      return $c;
> }
>
> sub count2 {
>      my ($char, $string) = @_;
>      return scalar( () = $string =~ /$char/g );
> }
>
> my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
> sub count3 {
>      &{$counters[ord(shift) - 65]};
> }
>
>
> print STDERR (count3($char, $string), "\n");
>
>
> timethese(-4,
> 	  {
> 	   count => sub { count($char, $string) },
> 	   count2 => sub { count2($char, $string)},
> 	   count3 => sub { count3($char, $string)}
> 	  });
>

Thank you. 'count3' is faster, but I think it does the counting
beforehand. It's confusing to me.

Is there any chance that a solution with substr could be
efficient?

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 15 Jun 2014 17:13:28 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Can be this be optimized?
Message-Id: <slrnlpre0o.49d.hjp-usenet3@hrunkner.hjp.at>

On 2014-06-15 14:48, gamo <gamo@telecable.es> wrote:
> On 15/06/14 15:47, Rainer Weikusat wrote:
>> gamo <gamo@telecable.es> writes:
>>> sub count{
>>>      my ($char, $string) = @_;
>>>      my $c = 0;
>>>      ++$c while ($string =~ /$char/g);
>>>      return $c;
>>> }
>>
>> Depending on how long your strings are and how often you'll be calling
>> this function vs how often the program is going to be compiled, using a
>> set of "pre-compiled character-counting routines" might make sense:
>>
>> ----------------------
[...]
>>
>> my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
>> sub count3 {
>>      &{$counters[ord(shift) - 65]};
>> }
>>
>
> Thank you. 'count3' is faster, but I think it does the counting
> beforehand. It's confusing to me.

No, it uses the same trick as Mpouras demonstrated before: It constructs
and compiles a custom subroutine counting the occurrences of a specific
character using tr and then runs that custom sub. The difference is
that Rainer's version creates subs for the characters 'A' to 'Z'
beforehand and then just calls the appropriate one behind the scenes,
while with Mpouras' version you had to call the generator function
explicitly.

I think I would just create each custom function on first call and use
a hash to cache them instead of precreating them for a set of
characters, though. Just in case somebody wants to search for
"\x{1F4A9}" ...

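A rough sketch of that caching idea, under a few assumptions: the names
%counter_for and count_cached are made up here, the character is
written into the generated code as a \x{...} escape so that '/', '-'
or a backslash need no further quoting, and the //= operator requires
perl 5.10 or later.

```perl
use strict;
use warnings;

my %counter_for;    # cache: character => compiled counting sub

sub count_cached {
    my ($char, $string) = @_;
    # Build the tr-based counter the first time a character is seen,
    # then reuse the compiled sub on every later call.
    my $counter = $counter_for{$char} //= do {
        eval sprintf('sub { $_[0] =~ tr/\\x{%X}// }', ord $char)
            or die "cannot compile counter: $@";
    };
    return $counter->($string);
}

print count_cached('l', 'Hello Worldll'), "\n";   # prints 5
print count_cached('/', 'a/b/c'), "\n";           # prints 2
```

The string eval is paid once per distinct character rather than once
per call, which is the point of the cache.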

> Is there any chance that a solution with substr could be
> efficient?

Why don't you try it? I doubt it, but then I was surprised that your
explicit loop was faster than using scalar to count the matches. 

Using index() is about as fast as your original version.
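
For illustration only, an index()-based counter might look like the
sketch below. It is not the code that was timed for the statement
above, and the name count_index is made up.

```perl
use strict;
use warnings;

sub count_index {
    my ($char, $string) = @_;
    return 0 unless length $char;    # an empty needle would loop forever
    my ($count, $pos) = (0, 0);
    # index() returns -1 once no further occurrence is found.
    while (($pos = index($string, $char, $pos)) >= 0) {
        ++$count;
        $pos += length $char;        # continue after this occurrence
    }
    return $count;
}

print count_index('l', 'Hello Worldll'), "\n";   # prints 5
```

Since index() does a plain substring search, no quoting of special
characters is needed here.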

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 15 Jun 2014 19:20:16 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Can be this be optimized?
Message-Id: <lnkkkf$nm3$1@speranza.aioe.org>

On 15/06/14 17:13, Peter J. Holzer wrote:
>> Is there any chance that a solution with substr could be
>> efficient?
> Why don't you try it? I doubt it, but then I was surprised that your
> explicit loop was faster than using scalar to count the matches.
>
> Using index() is about as fast as your original version.
>
>

Tried with substr, it's a lot slower.

Thanks

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 15 Jun 2014 19:41:29 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Can be this be optimized?
Message-Id: <87r42q2exy.fsf@sable.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> On 15/06/14 15:47, Rainer Weikusat wrote:
>> gamo <gamo@telecable.es> writes:
>>> sub count{
>>>      my ($char, $string) = @_;
>>>      my $c = 0;
>>>      ++$c while ($string =~ /$char/g);
>>>      return $c;
>>> }

[...]

>> my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
>> sub count3 {
>>      &{$counters[ord(shift) - 65]};
>> }
>>
>>
>> print STDERR (count3($char, $string), "\n");
>>
>>
>> timethese(-4,
>> 	  {
>> 	   count => sub { count($char, $string) },
>> 	   count2 => sub { count2($char, $string)},
>> 	   count3 => sub { count3($char, $string)}
>> 	  });
>>
>
> Thank you. 'count3' is faster, but I think it does the counting
> beforehand. It's confusing to me.

Some explanations: For the purpose of this example, 'characters' are
restricted to the uppercase letters A - Z. Further, ASCII encoding is
assumed.

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

This builds an array of 26 subroutines, each counting occurrences of one
of the possible input characters in its first argument. In order to
avoid copying this argument (and because the subroutine code is really
simple), it accesses that via @_ as $_[0].

sub count3 {
      &{$counters[ord(shift) - 65]};
}

This is a subroutine which expects the character supposed to be counted
as first argument and the string as second. It shifts the first argument
off its @_ and converts that into an ASCII codepoint via ord. Subtracting
65 (the ASCII code of A) from this number results in the array index
for the counting subroutine counting the correct character. The result
of the expression

$counters[ord(shift) - 65]

is a reference to this subroutine. It is then invoked via & without
arguments, which means the invoked subroutine uses the @_ of the invoking
subroutine. Since the original first argument was shifted away, the
string to be searched is now the new $_[0], which is what the called
subroutine expects.

This is also a nice example where the argument passing mechanism used by
Perl really shines: All the outer subroutine has to know about the
passed @_ is 'the first argument is mine'. Whatever remains is simply
passed on to the next subroutine and considering that Perl is strictly
'the caller decides whatever it likes to pass' (when prototypes are not
being used), calling a subroutine with more than one argument indirectly
in this way can be accomplished by passing more arguments to the
directly invoked subroutine which will pass them on.
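
The behaviour of the bare & call can be seen in isolation in the
following sketch. All names in it ($inner, pass_rest, pass_nothing) are
hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Inner routine: reports whatever arguments it can see.
my $inner = sub { return join(',', @_) };

sub pass_rest {
    shift;        # consume the first argument ('mine')
    &$inner;      # no parentheses: $inner sees the caller's remaining @_
}

sub pass_nothing {
    shift;
    &$inner();    # explicit empty list: $inner gets its own empty @_
}

print pass_rest('mine', 'a', 'b'), "\n";            # prints a,b
print '[', pass_nothing('mine', 'a', 'b'), "]\n";   # prints []
```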


------------------------------

Date: Sun, 15 Jun 2014 21:53:56 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Can be this be optimized?
Message-Id: <lnktkj$f06$2@speranza.aioe.org>

On 15/06/14 20:41, Rainer Weikusat wrote:
>> Thank you. 'count3' is faster, but I think it does the counting
>> beforehand. It's confusing to me.
>
> Some explanations: For the purpose of this example, 'characters' are
> restricted to the uppercase letters A - Z. Further, ASCII encoding is
> assumed.
>
> my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
>
> This builds an array of 26 subroutines, each counting occurrences of one of
> the possible input characters in its first argument. In order to avoid
> copying this argument (and because the subroutine code is really
> simple), it accesses that via @_ as $_[0].
>
> sub count3 {
>        &{$counters[ord(shift) - 65]};
> }
>
> This is a subroutine which expects the character supposed to be counted
> as first argument and the string as second. It shifts the first argument
> of its @_ and converts that into an ASCII codepoint via ord. Subtracting
> 65 (the ASCII code of A) from this number results in the array index
> for the counting subroutine counting the correct character. The result
> of the expression
>
> $counters[ord(shift) - 65]
>
> is a reference to this subroutine. It is then invoked via & without
> arguments which means the invoked subroutine uses the @_ of the invoking
> subroutine. Since the original first argument was shifted away, the
> string to be searched is now the new $_[0], which is what the called
> subroutine expects.
>
> This is also a nice example where the argument passing mechanism used by
> Perl really shines: All the outer subroutine has to know about the
> passed @_ is 'the first argument is mine'. Whatever remains is simply
> passed on to the next subroutine and considering that Perl is strictly
> 'the caller decides whatever it likes to pass' (when prototypes are not
> being used), calling a subroutine with more than one argument indirectly
> in this way can be accomplished by passing more arguments to the
> directly invoked subroutine which will pass them on.
>

Great, thanks for the explanation.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sat, 14 Jun 2014 13:11:32 -0700 (PDT)
From: ehabaziz2001@gmail.com
Subject: Re: sorting file according to a unicode column
Message-Id: <cc155c6e-e63c-462e-a583-b23172909c22@googlegroups.com>

I am sorry, I cannot find your email. Mine is: ehabaziz2001@gmail.com


------------------------------

Date: Sat, 14 Jun 2014 13:44:49 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: sorting file according to a unicode column
Message-Id: <r0dpp91fgslaphiepvq4qv5qnme66t1sja@4ax.com>

ehabaziz2001@gmail.com wrote:
>I am sorry, I cannot find your email. Mine is: ehabaziz2001@gmail.com

That's no problem, I never sent you one.

jue


------------------------------

Date: Sun, 15 Jun 2014 02:03:58 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: sorting file according to a unicode column
Message-Id: <lnikcu$1su1$1@news.ntua.gr>

On 14/6/2014 11:11 PM, ehabaziz2001@gmail.com wrote:
> I am sorry, I cannot find your email. Mine is: ehabaziz2001@gmail.com
>
Use this one:

perl -e "print pack 'h*', '76271667964716c63757e60486f647d61696c6e236f6d6'"


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4237
***************************************

