
Perl-Users Digest, Issue: 4315 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Nov 23 14:09:17 2014

Date: Sun, 23 Nov 2014 11:09:05 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 23 Nov 2014     Volume: 11 Number: 4315

Today's topics:
    Re: A hash of references to arrays of references to has <rweikusat@mobileactivedefense.com>
    Re: A hash of references to arrays of references to has <rweikusat@mobileactivedefense.com>
    Re: A hash of references to arrays of references to has <whynot@pozharski.name>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <hjp-usenet3@hjp.at>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <hjp-usenet3@hjp.at>
    Re: A hash of references to arrays of references to has <hjp-usenet3@hjp.at>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <hjp-usenet3@hjp.at>
        push/shift/keys/... on refs (was: A hash of references  <hjp-usenet3@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 23 Nov 2014 14:56:04 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <87siha7ypn.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:

[...]

>
> foreach my $Size (reverse sort {$a<=>$b} keys %CurDirFiles)
> {

reverse(sort {$a <=> $b } keys(%CurDirFiles))

is equivalent to

sort {$b <=> $a } keys(%CurDirFiles)

i.e., instead of reversing a sorted list, you can sort using an inverted
test.
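
A quick sanity check (the hash contents here are made up, just for
illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical size-keyed hash, standing in for %CurDirFiles.
my %CurDirFiles = (10 => 1, 250 => 1, 3 => 1, 42 => 1);

my @reversed = reverse sort { $a <=> $b } keys %CurDirFiles;
my @inverted = sort { $b <=> $a } keys %CurDirFiles;

print "@reversed\n";   # 250 42 10 3
print "@inverted\n";   # 250 42 10 3
```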



------------------------------

Date: Sun, 23 Nov 2014 19:02:41 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <877fylkaem.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:

[...]

>    $CurDirFiles{$Size} = [] unless $CurDirFiles{$Size};
>       push @{ $CurDirFiles{$Size} },
>          {
>             "Date" => $ModDate,
>             "Time" => $ModTime,
>             "Type" => $Type,
>             "Size" => $Size,
>             "Attr" => $mode,
>             "Name" => $FileName
>          };

The first line could be expressed more succinctly as

$CurDirFiles{$Size} ||= [];

However, it can also be omitted altogether, as autovivification will
create the anonymous array as a side effect of the push.
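
A minimal demonstration (file names invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %CurDirFiles;

# No initialization needed: dereferencing a nonexistent hash value in
# an lvalue context autovivifies an anonymous array for it.
push @{ $CurDirFiles{1024} }, { Name => 'foo.txt' };
push @{ $CurDirFiles{1024} }, { Name => 'bar.txt' };

print scalar @{ $CurDirFiles{1024} }, "\n";   # 2
print $CurDirFiles{1024}[0]{Name}, "\n";      # foo.txt
```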


------------------------------

Date: Sun, 23 Nov 2014 09:55:38 +0200
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm734nq.ltt.whynot@orphan.zombinet>

with <slrnm7107n.l3u.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
> On 2014-11-22 05:20, Robbie Hatley <see.my.sig@for.my.address> wrote:

*SKIP*
> (in newer versions of perl, you can also push onto an array reference,
> so instead of 
>     push @{ $CurDirFiles{$Size} }, ...
> you can also write 
>     push $CurDirFiles{$Size}, ...
> which eliminates some line noise. I think it's still considered
> experimental, though.)

AFAIK it's deprecated alright in latest versions.

*CUT*

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sun, 23 Nov 2014 04:03:01 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <6dOdna9Z4pJrUezJnZ2dnUVZ572dnZ2d@giganews.com>

Greetings, "gamo". Regarding:

 > El 22/11/14 a las 06:20, Robbie Hatley escribió:
 > > ... I'm currently using the following 2 platforms:
 > > 1. Perl on Cygwin on Win 8.1 on notebook computer
 > > 2. Perl on Point Linux on desktop computer ...
 >
 > Since you use 2) you can't code the thing as if
 > `locate -b "$filename"` does not exist.

What's "locate"? Never heard of it. Wait, I'll look it
up. Seems to be a file-system indexing database and
lookup facility. Hmmm. Cygwin has it but Point Linux
doesn't. It's in the repository, though. Ok, I installed
it. May come in handy for finding files quickly on my
Linux installation.

Not what I'm looking for in regards to my "Dedup" program,
though. I'm looking to write a program that's as portable
as feasible, and making it dependent on locate/updatedb
databases is a step backwards from just reading directories
directly.

 > Other considerations are that after you find
 > candidates to dupes by file attributes, you must
 > really check if they are dupes, so use of
 > Digest::SHA3 is advisable.

What's Digest::SHA3? Never heard of it. Wait, I'll look it up.
Ah, seems to be some sort of checksum system. Not really
what I'm looking for, for a couple reasons. Firstly, again,
I want to be reasonably free from prerequisites. And secondly,
it would take longer to compute checksums for files and
compare those than to just compare two same-size files
byte-by-byte. So I'll go for the byte-by-byte compare, because
it's portable, simple, and straightforward.

But thanks for the info on "locate"; I'm sure I'll find
that useful, but just not for this particular program.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'



------------------------------

Date: Sun, 23 Nov 2014 13:51:07 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4sl7p$au4$1@speranza.aioe.org>

El 22/11/14 a las 15:16, gamo escribió:
> El 22/11/14 a las 06:20, Robbie Hatley escribió:
>> # Plan: Recursively descend directory tree starting from current working
>> # directory, and make a master list of all files encountered on this branch.
>> # Order the list by size.  Within each size group, compare each file, from
>> # left to right, to all the files to its right.  If a duplicate pair is
>> # found, alert user and get user input.  Give user these choices:
>> # 1. Erase left file
>> # 2. Erase right file
>> # 3. Ignore this pair of duplicate files and move to next
>> # 4. Quit
>> # If user elects to delete a file, delete it, then move to next duplicate.
>
> This is O(n²), which usually means wrong. You want to compare all with
> all and expect that dupes appear in pairs. First, you must collect the
> info that makes the file unique: the filename and the file content.
> Then you have $filename from readdir and `sha3sum` of that file's content.
> You can add 1 for each appearance with $hash{ $filename.'+'.$hash_sha3 }++;
> then look at the hash values, and you are almost done.
>

Here is my code, without warranty. Be warned: there are a lot of dupes
out there!

#!/usr/bin/perl -w

use Digest::SHA3;
use strict;

my $home = $ARGV[0] // '.';
my %SIGN;

verdup($home);

sub verdup {
     my $dir = shift;
     $dir .= '/' if ($dir !~ /\/$/);
     print "DIR-> $dir\n";
     opendir (my $dh, $dir) or warn "Cannot opendir $dir";
     my @files = readdir ($dh);
     closedir ($dh);
     for my $i (@files){
         next if ($i=~/^\.$/ || $i=~/^\.\.$/);
         my $real = $dir.$i;
#       print "$real\n";
         if (-d $real) {
#           if ( chdir $real ){
                 verdup ($real);
#           }else{ warn "No hay privilegios y no se accede a $i" };
         }elsif (-f $real) {
#       $sha3->addfile(*F);
             my $sha3 = Digest::SHA3->new(256);
             $sha3->addfile($real,'b');
             my $digest = $sha3->digest;        # compute digest
#       $digest = $sha3->hexdigest;
#       $digest = $sha3->b64digest;
#           $real =~ /\/(.*)$/;
#           my $base = $1;
             $SIGN{ $i.$digest }++;
             if ($SIGN{ $i.$digest } > 1 ){
                 print "$real is a real dupe type $i [D]elete/[O]mit/[Q]uit?\n";
                 my $r = <STDIN>;
                 if ($r =~ /d/i) {
                     unlink $real;
                 }elsif ($r =~ /q/i){
                     exit 1;
                 }else{
                     # do nothing
                 }
             }else{
                 # since the file is unique you must add its SIGN to a DB
                 # for security purposes
             }
         }else{
#           print "Not a file or a directory\n";
         }
     }
}

__END__


-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 15:28:33 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4sque$oj8$1@speranza.aioe.org>

El 23/11/14 a las 13:03, Robbie Hatley escribió:
>  > Other considerations are that after you find
>  > candidates to dupes by file attributes, you must
>  > really check if they are dupes, so use of
>  > Digest::SHA3 is advisable.
>
> What's Digest:SHA3? Never heard of it. Wait, I'll look it up.
> Ah, seems to be some sort of checksum system. Not really
> what I'm looking for, for a couple reasons. Firstly, again,
> I want to be reasonably free from prerequisites. And secondly,
> it would take longer to compute checksums for files and
> compare those than to just compare two same-size files
> byte-by-byte. So I'll go for the byte-by-byte compare, because
> it's portable, simple, and straightforward.

You really don't want what you said. A byte-by-byte comparison
could take an unreasonable time. Think of gigabytes.
Digest::SHA3 isn't bullet-fast, but it's portable, and with it
you avoid the combinatorial problem of multiple dupes.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 15:32:37 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm73s05.pgj.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 12:03, Robbie Hatley <see.my.sig@for.my.address> wrote:
> Greetings, "gamo". Regarding:
> > Since you use 2) you can't code the thing as if
> > `locate -b "$filename"` does not exists.
>
> What's "locate"? Never heard of it. Wait, I'll look it
> up. Seems to be a file-system indexing database and
> lookup facility. Hmmm. Cygwin has it but Point Linux
> doesn't. It's in the repository, though. Ok, I installed
> it. May come in handy for finding files quickly on my
> Linux installation.
>
> Not what I'm looking for in regards to my "Dedup" program,
> though.

Right. Locate doesn't help you to find duplicate content at all.

Also I disagree with gamo here on principle: If a tool is only available
on one of your target platforms, you not only can ignore it - you have
to (or otherwise provide an implementation for your other platforms).


> I'm looking to write a program that's as portable
> as feasible, and making it dependent on locate/updatedb
> databases is a step backwards from just reading directories
> directly.
>
> > Other considerations are that after you find
> > candidates to dupes by file attributes, you must
> > really check if they are dupes, so use of
> > Digest::SHA3 is advisable.
>
> What's Digest:SHA3? Never heard of it. Wait, I'll look it up.
> Ah, seems to be some sort of checksum system. Not really
> what I'm looking for, for a couple reasons.

Oh yes, you are looking for that, or you should be.

> Firstly, again, I want to be reasonably free from prerequisites.

Ignoring obvious building blocks isn't being "reasonably" free of
prerequisites, it's being unreasonably free of prerequisites. Use
CPAN! As somebody once remarked, Perl is CPAN - the rest is just
syntactic sugar.

> And secondly, it would take longer to compute checksums for files and
> compare those than to just compare two same-size files byte-by-byte.

Not if you have more than two files of any given size. 

If you have 100 files of the same size and you compare each to every
other byte-by-byte, you have to read ~5000 files. If you compute a hash
of each file and compare the hashes, you have to read each file only
once - a speedup of 50!

One of the very first dedup programs I saw (back in ~1990 or so) used
CRC-32 as a checksumming algorithm. It then proceeded to do a
byte-by-byte comparison for files with the same checksum. For SHA-3 the
chances of a collision are probably lower than the chances of cmp
failing, so that's probably not necessary.

If your files are very large, often of the same size and typically
different in the first few blocks, you might want to compute the hash
of the first MB (or so) of each file as an intermediate step.
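
The two-stage idea might be sketched like this (a rough illustration, not
anyone's posted code; I use the core Digest::SHA here, but Digest::SHA3
from CPAN has the same interface):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA;

# Digest of the first $len bytes of a file, as a cheap pre-filter.
sub partial_digest {
    my ($file, $len) = @_;
    open my $fh, '<:raw', $file or return;
    read $fh, my $buf, $len;
    return Digest::SHA->new(256)->add($buf)->hexdigest;
}

# Digest of the whole file, read in binary mode.
sub full_digest {
    my ($file) = @_;
    return Digest::SHA->new(256)->addfile($_[0], 'b')->hexdigest;
}

# Given a list of same-size file names, return sets of identical files:
# group by head digest first, then confirm with a full-content digest.
sub find_dupes {
    my @candidates = @_;
    my (%by_head, @dupes);
    push @{ $by_head{ partial_digest($_, 1024*1024) // '' } }, $_
        for @candidates;
    for my $group (grep { @$_ > 1 } values %by_head) {
        my %by_full;
        push @{ $by_full{ full_digest($_) } }, $_ for @$group;
        push @dupes, grep { @$_ > 1 } values %by_full;
    }
    return @dupes;   # list of arrayrefs, each a set of identical files
}
```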

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 23 Nov 2014 07:01:06 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <DImdnYBcK48ua-zJnZ2dnUVZ57ydnZ2d@giganews.com>

On 11/22/2014 04:26 AM, Peter J. Holzer wrote:

> ... In object oriented programming you would define a class for each
> of these to formalize this abstraction....

Or just use classes written by others. :-) That's what the C++ STL
and the Perl CPAN are about.

> > Seems to me there's got to be an easier way.  In C++ I just use
> > "multimaps" from the C++ Standard Template Library. Maybe there's
> > something like that in CPAN but I haven't looked yet.
>
> I don't really think that helps here. On the contrary, it breaks the
> nice abstractions and obscures what the program is supposed to do.

For my current purposes I agree: I'm mostly interested in learning
to do the abstractions manually, so automating it would be
counterproductive.

> ... The idea is fine. That's usually the first step in finding
> duplicates since it is fast...

Yep, no sense comparing 2 files byte-by-byte if they're not even
the same size; you already know they're not duplicates.

> Don't copy the whole array each time, use push.
>
> So instead of
>
>>      if ($CurDirFiles{$Size})
>>      {
>>         $CurDirFiles{$Size} =
>> 	  [
>> 	     @{$CurDirFiles{$Size}},
>> 	     {
>>                   "Date" => $ModDate,
>>                   "Time" => $ModTime,
>>                   "Type" => $Type,
>>                   "Size" => $Size,
>>                   "Attr" => $mode,
>>                   "Name" => $FileName
>>                }
>> 	  ];
>>      }
>>      else
>>      {
>>         $CurDirFiles{$Size} =
>> 	  [
>> 	     {
>>                   "Date" => $ModDate,
>>                   "Time" => $ModTime,
>>                   "Type" => $Type,
>>                   "Size" => $Size,
>>                   "Attr" => $mode,
>>                   "Name" => $FileName
>>                }
>> 	  ];
>>      }
>
> write
>
>        $CurDirFiles{$Size} = [] unless $CurDirFiles{$Size};
>        push @{ $CurDirFiles{$Size} },
>    	   {
>                 "Date" => $ModDate,
>                 "Time" => $ModTime,
>                 "Type" => $Type,
>                 "Size" => $Size,
>                 "Attr" => $mode,
>                 "Name" => $FileName
>             };

Fascinating. So, for each file, initialize its size slot in the
outer hash to a reference to a blank list, unless such a list
already exists. Then "push" the file record onto the end of that
list. Yep, that's much simpler. Thanks!

> (in newer versions of perl, you can also push onto an array reference,
> so instead of
>      push @{ $CurDirFiles{$Size} }, ...
> you can also write
>      push $CurDirFiles{$Size}, ...
> which eliminates some line noise. I think it's still considered
> experimental, though.)

That feature looks like it requires Perl to violate its usual
"Perl does no implicit dereferencing" rule. I'm not sure that
it simplifies things, ultimately.

> ... (Note that I have also chosen descriptive variable names:
> "$HashRef" is a terrible name: It tells you nothing about what the
> variable represents, only about the implementation...

I was in a hurry, so I called it "$HashRef" because it was fast to
type. $FileRecordRef would be more to the point of what it's doing,
though.  Or perhaps even $file_record_ref .

> ... Also, I've stuck with your capitalization scheme for consistency,
> but starting a variable name with an upper case letter is very unusual...

I've been in the habit of naming most user-named things in computer
programs (in any language) in WikiCase for some years now, because
in most languages the keywords are all in lower case. So if, unknown
to me, a language has a keyword "melon", naming a variable "melon"
would likely create a nasty problem: code that might compile but give
unexpected results when run and be hard to troubleshoot.

But it occurs to me that in Perl one isn't going to have that problem
%because $variables @all $have &those %funny $characters
-- "sigils", I think they're called -- in front of them.
So let's try the following:

#!/usr/bin/perl
use v5.14;
use strict;
use warnings;
my $while = 3;
my $if = 5;
print($while+$if,"\n");

Yep, runs without error or warning, and prints "8", even though it
severely misuses 2 Perl keywords. Interesting.

> >      my $Type;
> >
> >      if ( -d _ )
> >      {
> >         $Type = "Dir";
> >      }
> >      else
> >      {
> >         $Type = "File";
> >      }
>
> These 10 lines can be replaced by one:
>
>      my $Type = -d _ ? "Dir" : "File";

Cool! I never thought of that. Very C.

> >      foreach my $HashRef (@{$CurDirFiles{$Size}})
> >       {
> >         print($$HashRef{Date}, "  ");
> >         print($$HashRef{Time}, "  ");
> >         print($$HashRef{Type}, "  ");
> >         print($$HashRef{Size}, "  ");
> >         print($$HashRef{Attr}, "  ");
> >         print($$HashRef{Name}, "\n");
> >      }
>
> ... don't write $$HashRef{$key}, write $HashRef->{$key}.
> It's one character more to type but much clearer.

Ah, yes. Again, that never occurred to me. I'd forgotten
(if I ever knew) that that syntax could be used with refs in Perl.
Much cleaner, yes.

> Then, either use a loop on the keys:
>
>        foreach my $HashRef (@{$CurDirFiles{$Size}})
>        {
>           for my $Field (qw(Date Time Type Size Attr Name))
>           {
>               print($HashRef->{$Field}, "  ");
>           }
>           print "\n";
>        }
>
> (this leaves an extra blank at the end of the line, but I consider this
> acceptable)

I haven't studied the q() qq() qw() qr() constructs in years. I remember
using qr() for regular expressions back in 2005, but usually I use
"double quotes" or 'single quotes' to quote stuff. I should learn the
other way, though.

I like how your approach makes the code more maintainable because to
add or remove or re-order fields you just change what's in the qw().

> or, even better, use a hash slice and join
>
>        foreach my $HashRef (@{$CurDirFiles{$Size}})
>        {
>           print join(" ",
>                      @{ $HashRef }{qw(Date Time Type Size Attr Name)}),
>                 "\n";
>        }

That one's going way over my head. I'm not understanding what the braces 
around $HashRef are doing, or what the braces around the qw()
are doing.

But thanks for all the tips! They've been very enlightening. I already
stripped away about 30 lines of code bloat by applying them, and I
haven't even looped the prints yet.


Current state of program, for reference:


#!/usr/bin/perl

use v5.14;
use strict;
use warnings;

use Cwd;

sub time_from_mtime;
sub date_from_mtime;

my $CurDir;
my %CurDirFiles;

$CurDir = getcwd();
print "CWD = ", $CurDir, "\n";
opendir(my $Dot, ".") or die "Can\'t open directory. $!";

while (my $FileName=readdir($Dot))
{
    my ($dev,     $ino,     $mode,    $nlink,   $uid,
        $gid,     $rdev,    $size,    $atime,   $mtime,
        $ctime,   $blksize, $blocks)
       = stat($FileName);

    my $ModDate = date_from_mtime($mtime);
    my $ModTime = time_from_mtime($mtime);
    my $Size = -s _ ;
    my $Type = -d _ ? "Dir " : "File";

    $CurDirFiles{$Size} = [] unless $CurDirFiles{$Size};
       push @{ $CurDirFiles{$Size} },
          {
             "Date" => $ModDate,
             "Time" => $ModTime,
             "Type" => $Type,
             "Size" => $Size,
             "Attr" => $mode,
             "Name" => $FileName
          };

};

closedir($Dot);

foreach my $Size (reverse sort {$a<=>$b} keys %CurDirFiles)
{
    foreach my $file_record_ref (@{$CurDirFiles{$Size}})
    {
       print($file_record_ref->{Date}, "  ");
       print($file_record_ref->{Time}, "  ");
       print($file_record_ref->{Type}, "  ");
       print($file_record_ref->{Size}, "  ");
       print($file_record_ref->{Attr}, "  ");
       print($file_record_ref->{Name}, "\n");
    }
}

sub date_from_mtime
{
    my $TimeDate = scalar localtime shift @_;
    my $Date = substr ($TimeDate, 0, 10);
    $Date .= ", ";
    $Date .= substr ($TimeDate, 20, 4);
    return $Date;
}

sub time_from_mtime
{
    my $TimeDate = scalar localtime shift @_;
    my $Time = substr ($TimeDate, 11, 8);
    return $Time;
}



-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Sun, 23 Nov 2014 16:39:05 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4sv2m$3ho$1@speranza.aioe.org>

El 23/11/14 a las 15:32, Peter J. Holzer escribió:
> Right. Locate doesn't help you to find duplicate content at all.
>

WRONG! locate -c filename

        -c, --count
               Instead of writing file names on standard output,
               write the number of matching entries only.

So, locate could answer your question if you have a filename.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 17:09:17 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm741ld.ftn.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 15:39, gamo <gamo@telecable.es> wrote:
> El 23/11/14 a las 15:32, Peter J. Holzer escribió:
>> Right. Locate doesn't help you to find duplicate content at all.
>
> WRONG! locate -c filename
>
>         -c, --count
>                Instead of writing file names on standard output, write 
> the number of matching entries only.

Er, I wrote "content", not "count".


> So, locate could answer your question if you have a filename.

No, it won't. If I have a file house.jpg, it won't tell me that it's a
duplicate of P100123.JPG.

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 23 Nov 2014 17:34:08 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm74340.ftn.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 15:01, Robbie Hatley <see.my.sig@for.my.address> wrote:
> On 11/22/2014 04:26 AM, Peter J. Holzer wrote:
>> ... In object oriented programming you would define a class for each
>> of these to formalize this abstraction....
>
> Or just use classes written by others. :-) That's what the C++ STL
> and the Perl CPAN are about.

I really hope that the STL doesn't contain specialized classes for "all
information about a file you need to find duplicates" and "a list of
files of the same size". Much too specialized. Classes like this belong
in an application, not in a standard library. (Something like this
might be on CPAN, though. CPAN contains lots of quite specialized stuff,
and it actually even contains a File::Find::Duplicates module which
might contain classes like this.)


>> or, even better, use a hash slice and join
>>
>>        foreach my $HashRef (@{$CurDirFiles{$Size}})
>>        {
>>           print join(" ",
>>                      @{ $HashRef }{qw(Date Time Type Size Attr Name)}),
>>                 "\n";
>>        }
>
> That one's going way over my head. I'm not understanding what the braces 
> around $HashRef are doing, or what the braces around the qw()
> are doing.

The braces around $HashRef are just for grouping: We want to dereference
$HashRef, not access %HashRef. The braces around qw are the braces used
for selecting members in a hash, just like in «$hash{member}». You can
select several of them by writing «@hash{member1, member2, ...}»: The
«@» sigil tells perl that the result is a list (not a scalar) and the
braces «{}» tell perl that we are selecting members from a hash, not an
array.

So, basically, «@{ $HashRef }{qw(Date Time Type Size Attr Name)}»
is the same as

    do {
        my %tmp_hash = %$HashRef;
        map { $tmp_hash{$_} } qw(Date Time Type Size Attr Name);
    }
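
Or, as a self-contained runnable example (the field values are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $HashRef = {
    Date => 'Sun Nov 23, 2014', Time => '15:32:37', Type => 'File',
    Size => 1024, Attr => 33188, Name => 'example.txt',
};

# Hash slice through a reference: @{ $ref }{LIST} yields the values
# for the listed keys, in list order.
my @fields = @{ $HashRef }{qw(Date Time Type Size Attr Name)};
print join("  ", @fields), "\n";
# Sun Nov 23, 2014  15:32:37  File  1024  33188  example.txt
```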

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 23 Nov 2014 17:55:00 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4t3h1$emj$1@speranza.aioe.org>

El 23/11/14 a las 17:09, Peter J. Holzer escribió:
>>          -c, --count
>> >                Instead of writing file names on standard output, write
>> >the number of matching entries only.

> Er, I wrote "content", not "count".

And you are partly right. A dupe must have NAME AND CONTENT identical.
So if either the name or the content differs, it is not a dupe.

>
>
>> >So, locate could answer your question if you have a filename.

> No, it won't. If I have a file house.jpg, it won't tell me that it's a
> duplicate of P100123.JPG.
>

But it can tell you they are NOT, because the names are different.

Think of this budget rollover:

$ cp budget2014.pdf budget2015.pdf

The content is identical, but the names are not, and you could go
to jail if you erase one of the files.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 19:41:30 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm74aiq.9k3.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 16:55, gamo <gamo@telecable.es> wrote:
> El 23/11/14 a las 17:09, Peter J. Holzer escribió:
>>>          -c, --count
>>> >                Instead of writing file names on standard output, write
>>> >the number of matching entries only.
>
>> Er, I wrote "content", not "count".
>
> And you are partly right. A dupe must have NAME AND CONTENT identical.
> So if name or content differs, there is not a dupe.

As far as I (and the OP, and just about anybody who has ever written a
"finddup" program) am concerned, when I'm searching for duplicates, the
name is irrelevant. If the name was the same, it would be easy. But the
very reason for these programs is to find files with the same content
regardless of name (or owner, permissions, ...).


>>> >So, locate could answer your question if you have a filename.
>
>> No, it won't. If I have a file house.jpg, it won't tell me that it's a
>> duplicate of P100123.JPG.
>>
>
> But it can tell you they are NOT, because name are different.
>
> Think in this prorrogation of budget:
>
> $ cp budget2014.pdf budget2015.pdf
>
> The content is identical, but the name no, and you could go
> to jail if you erase one of the files.
>

But it's ok if you did
    cp budget2014/overview.pdf budget2015/overview.pdf
?

I suspect your chances for going to jail are actually higher if you have
a budget2014.pdf and budget2015.pdf which are completely identical.

(Now, if that's budget2014.xlsx and budget2015.xlsx, they could be
identical - for about 5 minutes)

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 23 Nov 2014 15:10:30 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: push/shift/keys/... on refs (was: A hash of references to    arrays of references to hashes... is there a better way?)
Message-Id: <slrnm73qmm.pgj.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 07:55, Eric Pozharski <whynot@pozharski.name> wrote:
> with <slrnm7107n.l3u.hjp-usenet3@hrunkner.hjp.at> Peter J. Holzer wrote:
>> (in newer versions of perl, you can also push onto an array reference,
>> so instead of 
>>     push @{ $CurDirFiles{$Size} }, ...
>> you can also write 
>>     push $CurDirFiles{$Size}, ...
>> which eliminates some line noise. I think it's still considered
>> experimental, though.)
>
> AFAIK it's deprecated alright in latest versions.

What? That's one of the most useful additions to the Perl language in
recent years. It makes my code a lot less cluttered. I really hope you
are wrong.  Source?

(FWIW, perl 5.20.0 (the latest version I have installed) warns me that
it's experimental, not deprecated.)

Did you perhaps confuse this with the ability to omit the sigil on push
and some other builtin operators? I.e., you can write

    push x, "hello";

and it's equivalent to 

    push @x, "hello";

That is indeed heavily deprecated - it warns even without «use warnings»
or «use strict» on 5.14, and it hasn't even compiled under «use strict»
since at least 5.8.
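
Side by side, the three forms under discussion (with hindsight: the
autoderef form was removed again in perl 5.24, so the explicit
dereference is the safe spelling):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h;

# Always-valid form: explicitly dereference the array reference.
push @{ $h{list} }, 'a';

# Experimental autoderef form (worked circa 5.14..5.22, warned as
# experimental, removed in 5.24) - shown commented out:
# push $h{list}, 'b';

# Bareword form "push list, 'c';" (no sigil at all) is the one that
# has long been rejected under «use strict».

print scalar @{ $h{list} }, "\n";   # 1
```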

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4315
***************************************

