[32779] in Perl-Users-Digest


Perl-Users Digest, Issue: 4043 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Sep 25 16:09:38 2013

Date: Wed, 25 Sep 2013 13:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 25 Sep 2013     Volume: 11 Number: 4043

Today's topics:
    Re: mixed cmp operator for sorting <marc.girod@gmail.com>
    Re: mixed cmp operator for sorting <bill@todbe.com>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
    Re: mixed cmp operator for sorting <hjp-usenet3@hjp.at>
    Re: mixed cmp operator for sorting <willem@turtle.stack.nl>
    Re: mixed cmp operator for sorting <ben@morrow.me.uk>
    Re: utilities in perl <cal@example.invalid>
    Re: utilities in perl <ben@morrow.me.uk>
    Re: utilities in perl <jurgenex@hotmail.com>
    Re: utilities in perl <cal@example.invalid>
    Re: utilities in perl <cal@example.invalid>
    Re: utilities in perl <cal@example.invalid>
    Re: utilities in perl <ben@morrow.me.uk>
    Re: utilities in perl <ben@morrow.me.uk>
    Re: utilities in perl <glex_no-spam@qwest-spam-no.invalid>
    Re: utilities in perl <glex_no-spam@qwest-spam-no.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 24 Sep 2013 11:35:35 -0700 (PDT)
From: Marc Girod <marc.girod@gmail.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <0a2815a4-fc15-4b7b-940c-3142acdd0881@googlegroups.com>

On Tuesday, 24 September 2013 08:15:29 UTC+1, Ben Morrow  wrote:

> What is 20% of what? What is 5% of what, and what makes you sure it's
> too expensive? Are you benchmarking a sort sub which does actual work,
> because I would be extremely surprised if the differences were visible
> in that case.

Sorry.  You are right.
I made now 4 versions of the function:

ucmp1 is the one using 'do' to initialize my ($a, $b).
ucmp is the prototyped version, getting its arguments from the stack.
ucmp2 is an attempt to avoid 'do's usage of the stack,  skipping the intermediate variables:
  my @t = ${"${pkg}::a"} =~ /(?=.)(\D*)(\d*)/gs;
mkcmp is an attempt to use a closure to avoid evaluating 'caller' every time.

I time with the original 6 item data:

my @data = qw( a12b34 a2c b23 a7 a7b 23 );
cmpthese(100000, {
                  'ucmp' => sub {1 for sort ucmp @data},
                  'ucmp1' => sub {1 for sort ucmp1 @data},
                  'ucmp2' => sub {1 for sort ucmp2 @data},
                  'mkcmp' => sub {my $cmp = mkcmp; 1 for sort $cmp @data}
                 });

The result is:

sort> ./cmpcmp
         Rate ucmp1 mkcmp ucmp2  ucmp
ucmp1  9747/s    --   -1%   -6%  -13%
mkcmp  9881/s    1%    --   -4%  -12%
ucmp2 10331/s    6%    5%    --   -8%
ucmp  11173/s   15%   13%    8%    --

So, the version using a prototype is about 15% (I got 20 yesterday) faster than the one using 'do'.
Avoiding stack manipulation is only  6% faster.
I guessed invoking 'caller' was expensive, but using a closure to avoid it involves something even more expensive (so only 1% faster than the slowest).

Of course, using different data would impact the results.

Marc


------------------------------

Date: Wed, 25 Sep 2013 00:43:18 -0700
From: "$Bill" <bill@todbe.com>
Subject: Re: mixed cmp operator for sorting
Message-Id: <l1u46m$966$1@dont-email.me>

On 9/24/2013 11:35, Marc Girod wrote:
> On Tuesday, 24 September 2013 08:15:29 UTC+1, Ben Morrow  wrote:
>
>> What is 20% of what? What is 5% of what, and what makes you sure it's
>> too expensive? Are you benchmarking a sort sub which does actual work,
>> because I would be extremely surprised if the differences were visible
>> in that case.
>
> Sorry.  You are right.
> I made now 4 versions of the function:

What happens if you just change the package name to main and drop
the initialization of $a/$b in the do and just let them come in as
defined variables ?  Wouldn't that give you a little quicker version ?

> ucmp1 is the one using 'do' to initialize my ($a, $b).
> ucmp is the prototyped version, getting its arguments from the stack.
> ucmp2 is an attempt to avoid 'do's usage of the stack,  skipping the intermediate variables:
>    my @t = ${"${pkg}::a"} =~ /(?=.)(\D*)(\d*)/gs;
> mkcmp is an attempt to use a closure to avoid evaluating 'caller' every time.
>
> I time with the original 6 item data:
>
> my @data = qw( a12b34 a2c b23 a7 a7b 23 );
> cmpthese(100000, {
>                    'ucmp' => sub {1 for sort ucmp @data},
>                    'ucmp1' => sub {1 for sort ucmp1 @data},
>                    'ucmp2' => sub {1 for sort ucmp2 @data},
>                    'mkcmp' => sub {my $cmp = mkcmp; 1 for sort $cmp @data}
>                   });
>
> The result is:
>
> sort> ./cmpcmp
>           Rate ucmp1 mkcmp ucmp2  ucmp
> ucmp1  9747/s    --   -1%   -6%  -13%
> mkcmp  9881/s    1%    --   -4%  -12%
> ucmp2 10331/s    6%    5%    --   -8%
> ucmp  11173/s   15%   13%    8%    --
>
> So, the version using a prototype is about 15% (I got 20 yesterday) faster than the one using 'do'.
> Avoiding stack manipulation is only  6% faster.
> I guessed invoking 'caller' was expensive, but using a closure to avoid it involves something even more expensive (so only 1% faster than the slowest).
>
> Of course, using different data would impact the results.




------------------------------

Date: Wed, 25 Sep 2013 09:57:24 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <k4laha-ca13.ln1@anubis.morrow.me.uk>


Quoth "$Bill" <bill@todbe.com>:
> On 9/24/2013 11:35, Marc Girod wrote:
> > On Tuesday, 24 September 2013 08:15:29 UTC+1, Ben Morrow  wrote:
> >
> >> What is 20% of what? What is 5% of what, and what makes you sure it's
> >> too expensive? Are you benchmarking a sort sub which does actual work,
> >> because I would be extremely surprised if the differences were visible
> >> in that case.
> >
> > Sorry.  You are right.
> > I made now 4 versions of the function:
> 
> What happens if you just change the package name to main and drop
> the initialization of $a/$b in the do and just let them come in as
> defined variables ?  Wouldn't that give you a little quicker version ?

What happens if the caller isn't in main?

I did a little testing, and once you've moved away from a plain block
the differences are pretty small and almost certainly not worth worrying
about. Anything you do by way of mucking about with caller or eval will
come out slower than using the prototype and @_.

    use Benchmark "cmpthese";

    my @d = qw/2 3 1 2 5 6 4/; 

    sub ab              { $a + 1 <=> $b + 1 } 
    sub args ($$)       { $_[0] + 1 <=> $_[1] + 1 } 
    my $ab = sub        { $a + 1 <=> $b + 1 };
    my $args = sub ($$) { $_[0] + 1 <=> $_[1] + 1 };

    cmpthese -5, { 
        block =>    sub { 1 for sort { $a + 1 <=> $b + 1 } @d }, 
        ab =>       sub { 1 for sort ab @d }, 
        args =>     sub { 1 for sort args @d }, 
        anonab =>   sub { 1 for sort $ab @d },
        anonargs => sub { 1 for sort $args @d },
    };

             Rate       ab     args anonargs   anonab    block
ab       269915/s       --      -0%      -5%      -6%     -11%
args     270042/s       0%       --      -5%      -6%     -11%
anonargs 283833/s       5%       5%       --      -2%      -6%
anonab   288225/s       7%       7%       2%       --      -5%
block    302222/s      12%      12%       6%       5%       --

Note that on that run $a/$b was marginally slower than @_ for the named
subs but faster for the anonymous ones; in my tests the order of those
two tends to switch randomly.

Ben



------------------------------

Date: Wed, 25 Sep 2013 13:13:29 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: mixed cmp operator for sorting
Message-Id: <slrnl45hap.fvd.hjp-usenet3@hrunkner.hjp.at>

On 2013-09-25 08:57, Ben Morrow <ben@morrow.me.uk> wrote:
> I did a little testing, and once you've moved away from a plain block
> the differences are pretty small and almost certainly not worth worrying
> about. Anything you do by way of mucking about with caller or eval will
> come out slower than using the prototype and @_.
>
>     use Benchmark "cmpthese";
>
>     my @d = qw/2 3 1 2 5 6 4/; 
>
>     sub ab              { $a + 1 <=> $b + 1 } 
>     sub args ($$)       { $_[0] + 1 <=> $_[1] + 1 } 
>     my $ab = sub        { $a + 1 <=> $b + 1 };
>     my $args = sub ($$) { $_[0] + 1 <=> $_[1] + 1 };
>
>     cmpthese -5, { 
>         block =>    sub { 1 for sort { $a + 1 <=> $b + 1 } @d }, 
>         ab =>       sub { 1 for sort ab @d }, 
>         args =>     sub { 1 for sort args @d }, 
>         anonab =>   sub { 1 for sort $ab @d },
>         anonargs => sub { 1 for sort $args @d },
>     };
>
>              Rate       ab     args anonargs   anonab    block
> ab       269915/s       --      -0%      -5%      -6%     -11%
> args     270042/s       0%       --      -5%      -6%     -11%
> anonargs 283833/s       5%       5%       --      -2%      -6%
> anonab   288225/s       7%       7%       2%       --      -5%
> block    302222/s      12%      12%       6%       5%       --
>
> Note that on that run $a/$b was marginally slower than @_ for the named
> subs but faster for the anonymous ones; in my tests the order of those
> two tends to switch randomly.

Interesting. On my machine anonargs is consistently the fastest, ab the
slowest, with the others somewhere in between (and sometimes switching
places).

But the picture changes for larger arrays (e.g. 100 or 1000 elements):

Then args and anonargs are the slowest, ab and anonab are about 10%
faster and block is about 17% faster.

Looks like the args variants have the lowest setup cost but the highest
per call cost, so they are better for (very) small arrays, while ab and
especially block are better for larger arrays.
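
A minimal sketch of the larger-array comparison described above, assuming
1000 random integers as the data (the array contents and the -1 CPU-second
budget are illustrative choices, not what was actually run). The absolute
numbers and even the ordering may differ between machines and perl versions.

```perl
use strict;
use warnings;
use Benchmark "cmpthese";

# Assumption: 1000 random integers stand in for the "larger arrays" case.
my @d = map { int rand 1000 } 1 .. 1000;

sub ab        { $a + 1 <=> $b + 1 }
sub args ($$) { $_[0] + 1 <=> $_[1] + 1 }

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese -1, {
    block => sub { 1 for sort { $a + 1 <=> $b + 1 } @d },
    ab    => sub { 1 for sort ab @d },
    args  => sub { 1 for sort args @d },
};
```

Changing the range in the map to 1 .. 7 should bring the run back close to
the small-array case.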

I also find it interesting that the anon variants seem to be a bit
faster than their counterparts, and that this effect is more pronounced
for smaller arrays. Is there some setup cost associated with a normal
sub that an anonymous sub doesn't have?

	hp


-- 
   _  | Peter J. Holzer    | The curse of electronic word processing:
|_|_) |                    | you keep polishing your text until the
| |   | hjp@hjp.at         | parts of the sentence no longer fits
__/   | http://www.hjp.at/ | together. -- Ralph Babel


------------------------------

Date: Wed, 25 Sep 2013 12:43:44 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: mixed cmp operator for sorting
Message-Id: <slrnl45mk0.g8c.willem@turtle.stack.nl>

Marc Girod wrote:
) On Monday, 23 September 2013 08:30:46 UTC+1, John W. Krahn  wrote:
)
)>      my @t = $a =~ /\D+|\d+/g;
)>      my @s = $b =~ /\D+|\d+/g;
)> 
)> Avoids zero length strings.
)
) Indeed, but it implies other changes in the following.

      my @t = split /(\d+)/, $a;
      my @s = split /(\d+)/, $b;

Avoids both issues.
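
A minimal sketch of a comparison sub built on that split, assuming the
intended order is "non-digit runs compared as strings, digit runs compared
as numbers". The name mixed_cmp is hypothetical and this is not the ucmp
from earlier in the thread. Because the pattern is in capturing parentheses,
split returns the fields alternately: even positions hold the non-digit
runs (possibly empty at the start), odd positions the digit runs.

```perl
use strict;
use warnings;

# Hypothetical comparison sub; the ($$) prototype makes sort pass
# the two elements in @_ instead of $a and $b.
sub mixed_cmp ($$) {
    my @t = split /(\d+)/, $_[0];
    my @s = split /(\d+)/, $_[1];
    while (@t && @s) {
        # Even position: non-digit run, compare as strings.
        my $r = (shift @t) cmp (shift @s);
        return $r if $r;
        last unless @t && @s;
        # Odd position: digit run, compare as numbers.
        $r = (shift @t) <=> (shift @s);
        return $r if $r;
    }
    # All shared fields are equal: the string with fewer fields sorts first.
    return @t <=> @s;
}

my @data   = qw( a12b34 a2c b23 a7 a7b 23 );
my @sorted = sort mixed_cmp @data;
print "@sorted\n";    # prints "23 a2c a7 a7b a12b34 b23"
```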


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Wed, 25 Sep 2013 19:46:38 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: mixed cmp operator for sorting
Message-Id: <elnbha-a16.ln1@anubis.morrow.me.uk>


Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> On 2013-09-25 08:57, Ben Morrow <ben@morrow.me.uk> wrote:
> > I did a little testing, and once you've moved away from a plain block
> > the differences are pretty small and almost certainly not worth worrying
> > about. Anything you do by way of mucking about with caller or eval will
> > come out slower than using the prototype and @_.
> >
<snip>
> >              Rate       ab     args anonargs   anonab    block
> > ab       269915/s       --      -0%      -5%      -6%     -11%
> > args     270042/s       0%       --      -5%      -6%     -11%
> > anonargs 283833/s       5%       5%       --      -2%      -6%
> > anonab   288225/s       7%       7%       2%       --      -5%
> > block    302222/s      12%      12%       6%       5%       --
> >
> > Note that on that run $a/$b was marginally slower than @_ for the named
> > subs but faster for the anonymous ones; in my tests the order of those
> > two tends to switch randomly.
> 
> Interesting. On my machine anonargs is consistently the fastest, ab the
> slowest, with the others somewhere in between (and sometimes switching
> places).

Anonargs is faster than block? I am surprised by that.

> But the picture changes for larger arrays (e.g. 100 or 1000 elements):
> 
> Then args and anonargs are the slowest, ab and anonab are about 10%
> faster and block is about 17% faster.

I can confirm that.

> Looks like the args variants have the lowest setup cost but the highest
> per call cost, so they are better for (very) small arrays, while ab and
> especially block are better for larger arrays.
> 
> I also find it interesting that the anon variants seem to be a bit
> faster than their counterparts, and that this effect is more pronounced
> for smaller arrays. Is there some setup cost associated with a normal
> sub that an anonymous sub doesn't have?

Yes, there is:

    /usr/src% perl -MO=Concise -e'sub foo { 1 } sort foo qw/1 2 3/'
    9  <@> leave[1 ref] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 2 -e:1) v:{ ->3
    8     <@> sort vKS ->9
    3        <0> pushmark s ->4
    -        <1> null K/1 ->5
    4           <$> const(PV "foo") s/BARE ->5
    5        <$> const(PV "1") s ->6
    6        <$> const(PV "2") s ->7
    7        <$> const(PV "3") s ->8
    -e syntax OK

The subname is in the optree as a string (op 4), so it has to be looked
up at runtime. With an anon sub sort already has a reference, and
chasing a reference is much faster than looking up a name.
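
A small sketch of the two calling forms, assuming a hypothetical sub
by_num. Taking a reference to a named sub once and handing that to sort
gives it the reference directly; whether that recovers the difference seen
in the benchmarks is an assumption that would need measuring.

```perl
use strict;
use warnings;

sub by_num { $a <=> $b }

# Take the reference once, outside any loop.
my $by_num = \&by_num;

my @named = sort by_num  qw/10 9 2/;    # sub given by name
my @ref   = sort $by_num qw/10 9 2/;    # sub given by reference

print "@named | @ref\n";    # prints "2 9 10 | 2 9 10"
```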

Ben



------------------------------

Date: Tue, 24 Sep 2013 15:53:34 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: utilities in perl
Message-Id: <X6qdnfYRvIBzit_PnZ2dnUVZ_rSdnZ2d@supernews.com>

On 09/23/2013 09:28 AM, Ben Morrow wrote:
>
> Quoth tmcd@panix.com:
>> In article <87bo3j95p8.fsf@sable.mobileactivedefense.com>,
>> Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
>>> tmcd@panix.com (Tim McDaniel) writes:
>>>> I found your answer confusing.  When I type a command line, ...
>> ...
>>> That's probably how the shell invoked it but it need not be done in this
>>> way. Assuming execl as an example, ...
>>
>> I was restricting myself to the shell, and in particular to my
>> *perception* of the command line, in particular the "program name" and
>> "first argument".  Certainly exec.*() makes things clearer and allows
>> playing some games.
>
> It's important to be clear, though, that whether you invoke a perl
> script as
>
>      /path/to/script arg
>
> with a #! line or
>
>      perl /path/to/script arg
>
> the arguments perl sees are the same, so the variables end up set the
> same.
>
> This is quite separate from the possibility of mucking about with
> argv[0]. In the first case that argument is (I think) thrown away by the
> kernel; in the second perl will, as Rainer said, only use it for $^X if
> it hasn't got some other way of finding its own path.
>
> Ben
>

Ben,

I'm just catching up on this reading.  It seems like almost everyone had 
a turn to be wrong about some aspect of it.  Would anyone say that STDIN 
is doing anything here:

#!/bin/bash

set -e
mkdir -p "images"
cp *.JPG "images/"
cd "images/"
find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2 echo cp
find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2  cp
cd "images/"
ls -l
mogrify -resize 800x600! *
rename 'y/A-Z/a-z/' *.JPG
ls -l

-- 
Cal



------------------------------

Date: Wed, 25 Sep 2013 00:44:53 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <lok9ha-uik2.ln1@anubis.morrow.me.uk>


Quoth Cal Dershowitz <cal@example.invalid>:
> 
> I'm just catching up on this reading.  It seems like almost everyone had 
> a turn to be wrong about some aspect of it.  Would anyone say that STDIN 
> is doing anything here:
> 
> #!/bin/bash
> 
> set -e
> mkdir -p "images"
> cp *.JPG "images/"
> cd "images/"
> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2 echo cp
> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2  cp
> cd "images/"
> ls -l
> mogrify -resize 800x600! *
> rename 'y/A-Z/a-z/' *.JPG
> ls -l

There are about 11+2n processes there (depending on which commands bash
implements as builtins), each with its own stdin. Which in particular
were you talking about?

And what does this have to do with Perl?

Ben



------------------------------

Date: Tue, 24 Sep 2013 16:58:19 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: utilities in perl
Message-Id: <bf9449pmjdp7vohc3jdsdhk0es1elnq48q@4ax.com>

>Quoth Cal Dershowitz <cal@example.invalid>:
>> I'm just catching up on this reading.  It seems like almost everyone had 
>> a turn to be wrong about some aspect of it.  Would anyone say that STDIN 
>> is doing anything here:
>>
>> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2 echo cp
>> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2  cp

Yes. Whatever find writes to the STDOUT filehandle is piped into the
STDIN filehandle of xargs such that xargs can read those values.
But STDIN itself is not "doing" anything, it is just a passive data
source from where xargs can read.
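
Since the question keeps coming back to Perl, here is a rough sketch of the
reading side in Perl: a hypothetical helper read_pairs that consumes lines
two at a time from a filehandle, roughly what xargs -n2 does with the
output of find. An in-memory filehandle with made-up paths stands in for
STDIN so that the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from a filehandle two at a time.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;
    while (defined(my $src = <$fh>)) {
        my $dst = <$fh>;
        last unless defined $dst;    # odd line out: ignore it
        chomp($src, $dst);
        push @pairs, [$src, $dst];
    }
    return @pairs;
}

# Made-up input; in a real pipeline you would pass \*STDIN instead.
my $input = "./a.JPG\nout/.-a.JPG\n./b.JPG\nout/.-b.JPG\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

my @pairs = read_pairs($fh);
print "cp $_->[0] $_->[1]\n" for @pairs;
```

The filehandle only supplies data when the program reads from it; all the
work happens in the reader.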

What does this have to do with Perl?

jue


------------------------------

Date: Tue, 24 Sep 2013 17:46:17 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: utilities in perl
Message-Id: <lLKdndF6_c7Er9_PnZ2dnUVZ_tmdnZ2d@supernews.com>

On 09/21/2013 01:31 AM, George Mpouras wrote:
> # you may like it
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use feature 'say';
> my $dir = exists $ARGV[0] && -d $ARGV[0] ? $ARGV[0] : './';
> foreach(<$dir/*>){say "$_ is ", -d $_ ? 'directory':'file'}
>

Thx, george, I try to stay away from compound statements, as I tend not 
to understand them yet.  For the purposes of the template, I basically 
want to hard code the directory at this point.

$ ./get_file1.pl
files are 
/home/fred/Documents/root/pages/leprechaun//template_stuff/captions/eng_captions 
/home/fred/Documents/root/pages/leprechaun//template_stuff/captions/eng_captions~ 
/home/fred/Documents/root/pages/leprechaun//template_stuff/captions/f 
/home/fred/Documents/root/pages/leprechaun//template_stuff/captions/f~
$ cat get_file1.pl
#!/usr/bin/perl -w
use strict;
use 5.010;
use File::Basename;
use Cwd;
use HTML::FromText;
use Text::Template;

my $path_to_dir = '/template_stuff/captions/';
my $base = getcwd;
my $path = $base.'/'. $path_to_dir;
# print "path is $path\n";
my @files = <$path*>;
print "files are @files\n";

$

Q1)  Are files that end in a ~ files like any other?  I can't seem to do 
much with them, but I know that I don't want them to match the files I 
will ultimately open and read.  What is the best way to exclude files 
with an ultimate tilde?

I believe . and .. are already gone.

Q2)  What is the best way to match the unicode files that remain after 
the above exclusions?
-- 
Cal


------------------------------

Date: Tue, 24 Sep 2013 18:27:22 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: utilities in perl
Message-Id: <dsSdnZ9WsKJnpt_PnZ2dnUVZ_rednZ2d@supernews.com>

On 09/24/2013 04:58 PM, Jürgen Exner wrote:
>> Quoth Cal Dershowitz <cal@example.invalid>:
>>> I'm just catching up on this reading.  It seems like almost everyone had
>>> a turn to be wrong about some aspect of it.  Would anyone say that STDIN
>>> is doing anything here:
>>>
>>> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2 echo cp
>>> find . -name "*.JPG" -printf "%p\n$2/%h-%f\n" |  xargs -n2  cp
>
> Yes. Whatever find writes to the STDOUT filehandle is piped into the
> STDIN filehandle of xargs such that xargs can read those values.
> But STDIN itself is not "doing" anything, it is just a passive data
> source from where xargs can read.
>
> What does this have to do with Perl?
>
> jue
>

It's not uncommon that perl questions intertwine with unix questions. 
For the sake of the record, you were one of the persons who was not 
wrong at some point in this discussion, as far as I can tell.  I swear 
to God that I don't want to dwell on a protracted discussion of STDIN, 
because that doesn't do anything for my scripts right now.

#!/usr/bin/perl -w
use strict;
use 5.010;
use File::Basename;
use Cwd;
use HTML::FromText;
use Text::Template;
my $path_to_dir = '/template_stuff/captions/';
my $base = getcwd;
my $path = $base.'/'. $path_to_dir;
# print "path is $path\n";
my @files = <$path*>;
foreach @files {
next if /~$/;
next if -d;
next unless -T;
push(@matching, $_);
}
$

I want this utility to work, and I think my crayon versions of what 
files I'm looking for indicate my intent.  I'm fine with returning to 
perl.  How do I effectively use $_ instead of defining a looping variable?
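
A minimal sketch of such a loop with $_ as the implicit loop variable,
assuming a throwaway directory with made-up file names in place of the real
captions directory. foreach needs parentheses around its list, and under
strict the result array has to be declared with my.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory so the sketch runs anywhere (names are made up).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "cannot create $dir/sub: $!";
for my $name (qw(eng_captions eng_captions~)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    print {$fh} "a caption\n";
    close $fh;
}

my @matching;
foreach (glob "$dir/*") {    # no loop variable named, so each path lands in $_
    next if /~\z/;           # skip editor backup files
    next if -d;              # skip directories; -d tests $_ by default
    next unless -T;          # keep only files that look like text
    push @matching, $_;
}
print "matching: @matching\n";
```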
-- 
Cal




------------------------------

Date: Tue, 24 Sep 2013 19:37:59 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: utilities in perl
Message-Id: <2qmdnVg3Kq8a0d_PnZ2dnUVZ_uydnZ2d@supernews.com>

On 09/24/2013 04:44 PM, Ben Morrow wrote:

> There are about 11+2n processes there (depending on which commands bash
> implements as builtins), each with its own stdin. Which in particular
> were you talking about?

I agree, and now I understand.
>
> And what does this have to do with Perl?
>
> Ben
>

Unix has everything to do with perl.  It's how the abstractions of comp 
sci hit the rubber of the road.  Let's talk about perl.

#!/usr/bin/perl -w
use strict;
use File::Find;
use File::Basename;
use Cwd;
my $path_to_caps = '/template_stuff/captions/';
my $path_to_images = '/template_stuff/images/';
my $base = getcwd;
my $path = $base.'/'. $path_to_caps;
my $path2 = $base.'/'. $path_to_images;
my @filetypes = qw/jpg gif png/;
print "image types are @filetypes\n";

I believe that this is the way I want to go with it.  In my reading, I'm 
looking at p. 98 of the alpaca book:


     use File::Find;
     find(\&wanted, @directories_to_search);
     sub wanted { ... }

What is the best way to define whatever wanted is, given that I believe 
it has to be defined by the caller.  For my purposes, I could have it be 
an array.  I wonder if I don't want to "throw it into a hash," which 
seems to be common methodology in perl.

Thank you for your comment.
-- 
Cal


------------------------------

Date: Wed, 25 Sep 2013 03:35:57 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <dpu9ha-1im2.ln1@anubis.morrow.me.uk>


Quoth Cal Dershowitz <cal@example.invalid>:
> 
> Q1)  Are files that end in a ~ files like any other?  I can't seem to do 
> much with them, but I know that I don't want them to match the files I 
> will ultimately open and read.  What is the best way to exclude files 
> with an ultimate tilde?

They are usually backup files created by your editor. You should be able
to configure it not to do that; generally speaking a version-control
system is a better way of keeping old versions of a file available.

You can filter them out of the list with grep. I don't think glob syntax
allows you to exclude them directly.
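
A minimal sketch of that grep, using the names from the directory listing
earlier in the thread (with the directory part left off):

```perl
use strict;
use warnings;

my @files = qw(eng_captions eng_captions~ f f~);

# Keep every name that does not end in a tilde.
my @wanted = grep { !/~\z/ } @files;

print "@wanted\n";    # prints "eng_captions f"
```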

> I believe . and .. are already gone.

Glob doesn't match any files beginning with . unless you explicitly ask
it to. . and .. are not otherwise treated specially, so a glob like <.*>
will return them.

> Q2)  What is the best way to match the unicode files that remain after 
> the above exclusions?

You can strip out all non-ASCII filenames with

    grep !/[[:^ascii:]]/a, ...

though you may prefer [:print:] or [:graph:] instead. The /a is
important: it restricts the character classes to their C-locale ASCII
definitions.
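
A small sketch of an ASCII-only filter, assuming made-up names given as
byte strings the way glob would return them. A name is kept only if it
contains no character outside the ASCII range.

```perl
use strict;
use warnings;

# Made-up names; the second one contains the UTF-8 bytes for an e-acute.
my @names = ("plain.txt", "caf\xc3\xa9.txt", "f");

# [[:^ascii:]] matches any single non-ASCII character,
# so negating the match keeps the pure-ASCII names.
my @ascii = grep { !/[[:^ascii:]]/a } @names;

print "@ascii\n";    # prints "plain.txt f"
```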

(If you read my earlier post about Unix filename semantics and how Perl
handles them (or doesn't), you will remember that filenames as returned
from glob are always 8-bit strings in some indeterminate character set.
However, since in practice that charset will always be a strict superset
of ASCII (so not anything like UTF-16), for the purposes of removing
non-ASCII filenames the particular charset in use is not important.)

Ben



------------------------------

Date: Wed, 25 Sep 2013 05:47:16 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <kf6aha-0pu2.ln1@anubis.morrow.me.uk>


Quoth Cal Dershowitz <cal@example.invalid>:
>
> my $path_to_caps = '/template_stuff/captions/';
> my $path_to_images = '/template_stuff/images/';
> my $base = getcwd;
> my $path = $base.'/'. $path_to_caps;
> my $path2 = $base.'/'. $path_to_images;

Both these paths will end up with a double slash in. It's not important,
but it's untidy, especially since it didn't need to be there.

You seem to understand about double-quotes and interpolated variables,
so I have to wonder why you're not using them...
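
A minimal sketch of both points, assuming the same directory layout as in
the quoted code. Interpolation lets each slash be written exactly once;
File::Spec->catdir is an alternative that joins the components and tidies
the result.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

my $base = getcwd;

# Interpolation: each slash appears exactly once.
my $path  = "$base/template_stuff/captions/";
my $path2 = "$base/template_stuff/images/";

# Alternative: let File::Spec join the components.
my $tidy = File::Spec->catdir($base, 'template_stuff', 'captions');

print "$path\n$path2\n$tidy\n";
```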

> my @filetypes = qw/jpg gif png/;
> print "image types are @filetypes\n";
> 
> I believe that this is the way I want to go with it.  In my reading, I'm 
> looking at p. 98 of the alpaca book:
> 
> 
>      use File::Find;
>      find(\&wanted, @directories_to_search);
>      sub wanted { ... }
> 
> What is the best way to define whatever wanted is, given that I believe 
> it has to be defined by the caller.  For my purposes, I could have it be 
> an array.  I wonder if I don't want to "throw it into a hash," which 
> seems to be common methodology in perl.

wanted is a sub. It has to be a sub; that's the way File::Find works.
find() calls it once for every file and directory it visits and ignores
its return value, so wanted has to record the files you want itself.

Since you seem to be familiar with find(1), I would recommend using
File::Find::Rule instead. File::Find is rather old and cronky, and
writing a correct wanted routine is not entirely straightforward.
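
For comparison, a minimal sketch of a wanted routine with plain File::Find,
assuming a throwaway directory with made-up file names. find() calls the
sub once per entry with $_ set to the bare name and $File::Find::name set
to the full path; here the sub collects image files in an array.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Throwaway directory so the sketch runs anywhere (names are made up).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.jpg b.png notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

my @images;
sub wanted {
    # Record plain files whose name ends in one of the image extensions.
    push @images, $File::Find::name if -f $_ && /\.(?:jpg|gif|png)\z/i;
}
find(\&wanted, $dir);

print basename($_), "\n" for sort @images;
```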

Ben



------------------------------

Date: Wed, 25 Sep 2013 12:37:32 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: utilities in perl
Message-Id: <52431f5c$0$73612$815e3792@news.qwest.net>

On 09/24/13 19:46, Cal Dershowitz wrote:

> [...]  For the purposes of the template, I basically
> want to hard code the directory at this point.

Or.. look at using Find::Bin.  That way you could run
/home/blah/get_file1.pl .. instead of always being in
/home/blah and having to run ./get_file1.pl.. and the
templates could be relative to get_file1.pl.



------------------------------

Date: Wed, 25 Sep 2013 13:03:12 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: utilities in perl
Message-Id: <52432560$0$63201$815e3792@news.qwest.net>

On 09/25/13 12:37, J. Gleixner wrote:
> On 09/24/13 19:46, Cal Dershowitz wrote:
>
>> [...]  For the purposes of the template, I basically
>> want to hard code the directory at this point.
>
> Or.. look at using Find::Bin.  That way you could run
> /home/blah/get_file1.pl .. instead of always being in
> /home/blah and having to run ./get_file1.pl.. and the
> templates could be relative to get_file1.pl.
>
correction..  FindBin  not Find::Bin.. sorry.
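
A minimal sketch with FindBin, assuming the templates live in
template_stuff/captions next to the script (that layout is taken from the
earlier posts, not verified):

```perl
use strict;
use warnings;
use FindBin qw($Bin);

# $Bin is the directory containing the running script, so this path
# does not depend on the current working directory.
my $path = "$Bin/template_stuff/captions";

print "looking for captions in $path\n";
```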


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4043
***************************************
