
Perl-Users Digest, Issue: 4316 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Nov 24 05:17:23 2014

Date: Mon, 24 Nov 2014 02:17:07 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 24 Nov 2014     Volume: 11 Number: 4316

Today's topics:
    Re: A hash of references to arrays of references to has <rweikusat@mobileactivedefense.com>
    Re: A hash of references to arrays of references to has <jurgenex@hotmail.com>
    Re: A hash of references to arrays of references to has <rweikusat@mobileactivedefense.com>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <see.my.sig@for.my.address>
    Re: A hash of references to arrays of references to has <gamo@telecable.es>
    Re: A hash of references to arrays of references to has <hjp-usenet3@hjp.at>
        How do I compare two files byte-by-byte? <see.my.sig@for.my.address>
    Re: How do I compare two files byte-by-byte? <gravitalsun@hotmail.foo>
    Re: How do I compare two files byte-by-byte? <m@rtij.nl.invlalid>
    Re: How do I compare two files byte-by-byte? <see.my.sig@for.my.address>
    Re: How do I compare two files byte-by-byte? <see.my.sig@for.my.address>
        open and pipes <gandalf23@mail.com>
    Re: open and pipes <gravitalsun@hotmail.foo>
    Re: open and pipes <gamo@telecable.es>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 23 Nov 2014 19:09:45 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <873899ka2u.fsf@doppelsaurus.mobileactivedefense.com>

"Peter J. Holzer" <hjp-usenet3@hjp.at> writes:

[...]

> As somebody once remarked, Perl is CPAN - the rest is just
> syntactic sugar.

Perl is a programming language. CPAN is a web archive of freely
downloadable software. IOW, Perl is Perl and CPAN is CPAN and the
statement above makes no sense.


------------------------------

Date: Sun, 23 Nov 2014 12:47:33 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <kph47a9h85d8ukoarholsejbrdauit8i2g@4ax.com>

gamo <gamo@telecable.es> wrote:
>$ cp src.c src.bak
>
>You make a copy as a backup to protect against losing a file through
>disk malfunction or simply through changing its contents.
>Do you want those backup copies to be flagged as dupes?

Absolutely yes. That is the express purpose of a duplicate remover: to
get rid of all those old junk copies that I created in some distant
past for whatever reason.
Now, if that is a legitimate backup then it will be on a separate
volume or in a backup archive, and of course you would exclude those
from the duplicate remover anyway.

jue


------------------------------

Date: Sun, 23 Nov 2014 21:04:45 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <87ppcdiq6q.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> On 11/23/2014 11:02 AM, Rainer Weikusat wrote:
>> Robbie Hatley <see.my.sig@for.my.address> writes:
>>
>> [...]
>>
>>>     $CurDirFiles{$Size} = [] unless $CurDirFiles{$Size};

[...]

>> The first line

[...]

>> it can also be omitted altogether as autovivification will
>> create the anonymous array as side effect of the push.
>
> I've never seen the word "autovivification" before. Sounds like
> something Dr Frankenstein would have liked. :-)

That's the term used in the Perl references documentation
('perldoc perlref'): using a scalar with an undefined value in an
lvalue context that expects a reference to a certain kind of thing,
e.g. as in

push(@{$CurDirFiles{$Size}}, ...)

will cause a suitable 'thing' to be created and a reference to it to be
assigned to the scalar. That's also the mechanism exploited by

opendir(my $Dot, ".") or die "Can\'t open directory. $!";

which implicitly creates an anonymous glob and assigns a reference to
it to $Dot.
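
A minimal, self-contained sketch of both effects (the hash and the
directory here are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %files_by_size;

    # No entry for key 1024 exists yet; dereferencing it as an array
    # in the push autovivifies an empty anonymous array first.
    push @{ $files_by_size{1024} }, 'a.txt';
    push @{ $files_by_size{1024} }, 'b.txt';
    print scalar @{ $files_by_size{1024} }, "\n";   # prints 2

    # Likewise, opendir() autovivifies a fresh glob when it is handed
    # an undefined lexical scalar.
    opendir(my $dh, '.') or die "Can't open directory: $!";
    my @entries = readdir($dh);
    closedir($dh);
    print scalar(@entries), " directory entries\n";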


------------------------------

Date: Sun, 23 Nov 2014 20:09:45 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4tbdm$cfq$1@speranza.aioe.org>

On 23/11/14 at 19:41, Peter J. Holzer wrote:
>> Think of this carry-over of a budget:
>>
>> $ cp budget2014.pdf budget2015.pdf
>>
>> The content is identical, but the names are not, and you could go
>> to jail if you erase one of the files.
>>
> But it's ok if you did
>      cp budget2014/overview.pdf budget2015/overview.pdf
> ?

No, that's not OK. Those files must differ in content.

>
> I suspect your chances for going to jail are actually higher if you have
> a budget2014.pdf and budget2015.pdf which are completely identical.
>
> (Now, if that's budget2014.xlsx and budget2015.xlsx, they could be
> identical - for about 5 minutes)

Another example:

$ cp src.c src.bak

You make a copy as a backup to protect against losing a file through
disk malfunction or simply through changing its contents.
Do you want those backup copies to be flagged as dupes?

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 12:26:03 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <K_CdnSck36FB3-_JnZ2dnUVZ57ydnZ2d@giganews.com>

On 11/23/2014 06:56 AM, Rainer Weikusat wrote:

> reverse(sort {$a <=> $b } keys(%CurDirFiles))
>
> is equivalent to
>
> sort {$b <=> $a } keys(%CurDirFiles)
>
> ie, instead of reversing a sorted list, you can sort using
> an inverted test.

Cool! Thanks for the tip. Another thing I didn't think of.
I'd never actually used the "sort {block} list" syntax before.
The only reason I used it this time is because otherwise "sort"
was giving me a reverse ASCII sort:

395
377884
3574
33594

When what I wanted was a reverse numerical sort:

377884
33594
3574
395

The syntax does lend itself to switching the test around
so that "reverse" becomes unnecessary, but at the time
I wrote that line (about 3 days ago) that never occurred to me.
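
For the record, a tiny sketch of the two equivalent spellings, using
sample numbers like the ones above:

    my @sizes = (395, 377884, 3574, 33594);
    my @desc1 = reverse sort { $a <=> $b } @sizes;
    my @desc2 = sort { $b <=> $a } @sizes;
    print "@desc1\n";   # 377884 33594 3574 395
    print "@desc2\n";   # 377884 33594 3574 395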


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Sun, 23 Nov 2014 12:34:42 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <kbWdnfzjTqN-2e_JnZ2dnUVZ572dnZ2d@giganews.com>

On 11/23/2014 11:02 AM, Rainer Weikusat wrote:
> Robbie Hatley <see.my.sig@for.my.address> writes:
>
> [...]
>
>>     $CurDirFiles{$Size} = [] unless $CurDirFiles{$Size};
>>        push @{ $CurDirFiles{$Size} },
>>           {
>>              "Date" => $ModDate,
>>              "Time" => $ModTime,
>>              "Type" => $Type,
>>              "Size" => $Size,
>>              "Attr" => $mode,
>>              "Name" => $FileName
>>           };
>
> The first line could be expressed more succinctly as
>
> $CurDirFiles{$Size} ||= [];
>
> However, it can also be omitted altogether as autovivification will
> create the anonymous array as side effect of the push.

I've never seen the word "autovivification" before. Sounds like
something Dr Frankenstein would have liked. :-) But I imagine you're
saying that the arrays will make themselves. Hmmm. Let's try that...
:::comments-out line:::
:::runs script:::
Yep, you're right, it creates the arrays automagically. Cool.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Sun, 23 Nov 2014 13:10:18 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <CYydnXgFReKn0O_JnZ2dnUVZ57ydnZ2d@giganews.com>

On 11/23/2014 08:34 AM, Peter J. Holzer wrote:

> On 2014-11-23 15:01, Robbie Hatley <see.my.sig@for.my.address> wrote:
> > On 11/22/2014 04:26 AM, Peter J. Holzer wrote:
> > > ... In object oriented programming you would define a class for each
> > > of these to formalize this abstraction....
> >
> > Or just use classes written by others. :-) That's what the C++ STL
> > and the Perl CPAN are about.
>
> I really hope that the STL doesn't contain specialized classes for "all
> information about a file you need to find duplicates" and "a list of
> files of the same size". Much too specialized.

No, but it contains "maps" and "multimaps".

"Maps" are the same as Perl's "hashes" but even more automated, with
an object-oriented syntax.

Multimaps are like hashes of arrays: any one key can correspond to
multiple values. That makes multimaps the choice for doing things
like ordering file records into size groups.
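
In Perl, the closest equivalent to a multimap is the hash of array
references Rainer and Peter have been discussing. A minimal sketch,
with made-up file data, of grouping names by size:

    use strict;
    use warnings;

    my %by_size;    # size => reference to array of names ("multimap")
    my %file_sizes = ('a.txt' => 100, 'b.txt' => 100, 'c.txt' => 250);

    while (my ($name, $size) = each %file_sizes) {
        push @{ $by_size{$size} }, $name;   # autovivifies the array
    }

    for my $size (sort { $b <=> $a } keys %by_size) {
        print "$size: @{ $by_size{$size} }\n";
    }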

> Classes like this belong into an application, not into a standard
> library. (Something like this might be on CPAN, though. CPAN contains
> lots of quite specialized stuff, and it actually even contains a
> File::Find::Duplicates module which might contain classes like this.

Ah, but that's cheating. :-) I haven't looked in CPAN for modules
related to this program, and I'm not going to do so until I've finished
writing it the "manual" way.

> @{$HashRef}{qw(Date Time Type Size Attr Name)}

That's going to take some study. I need to learn more about dereferencing,
and about qw(). I'm setting that aside for right now, concentrating on
finishing the rest of the program. I'll get back to this one later.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'


------------------------------

Date: Sun, 23 Nov 2014 23:19:11 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4tmgr$ckf$1@speranza.aioe.org>

On 23/11/14 at 21:47, jurgenex@hotmail.com wrote:
> gamo <gamo@telecable.es> wrote:
>> $ cp src.c src.bak
>>
>> You make a copy as a backup to protect against losing a file through
>> disk malfunction or simply through changing its contents.
>> Do you want those backup copies to be flagged as dupes?
>
> Absolutely yes. That is the express purpose of a duplicate remover: to
> get rid of all those old junk copies that I created in some distant
> past for whatever reason.
> Now, if that is a legitimate backup then it will be on a separate
> volume or in a backup archive, and of course you would exclude those
> from the duplicate remover anyway.
>
> jue

Then that's definitely not what I want. I have various types of backups:
1. The copy from scr1.pl to scr2.pl
2. The numbered backups that jed journals automatically after every
editing session
3. The rsync copy on a USB drive or external HD

If I'm particularly stuck on a bug, I may go forward and
backward more than 20 times until I get the proper output.
All recorded, and all wanted.

If I search for files equal in name AND content, lots of dupes appear.
Why would I want more suspects?

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Sun, 23 Nov 2014 14:40:25 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <cq-dnYfdW8zH_-_JnZ2dnUVZ57ydnZ2d@giganews.com>


On 11/23/2014 08:55 AM, gamo wrote:

> A dupe must have NAME AND CONTENT identical.

What my program will do is define "duplicates" to mean
"two files in same directory with different names but
identical contents".

As for two files with the same name in the same directory,
that's not possible in any file system I know of.

I wrote the original version of this program in C++, using
proprietary libraries from djgpp which are now becoming
increasingly obsolete and unsupported, to help me weed out
duplicate garbage from my burgeoning collection of "stuff
I've downloaded from the Internet over the years", now
approaching 1TB and including sounds, images, documents,
executables, source code, and even a few "I don't know what
the heck this is" files that have no file-name extension
and look like random gibberish in a hex editor.

Most of these duplicates have different file names. But I
have no need to keep, say, four 17MB recordings of
Jack Sprat playing a ukelele under these names:

"Jack Sprat Playing A Ukelele.flac"   #using spaces
"Jack-Sprat-Playing-A-Ukelele.flac"   #using hyphens
"jack on uke.flac"                    #shorter name
"jack_playing_the_uke.jpg"            #wrong file extension

A good dup-detector will find each pair of dups and give
the user some options, for example (a rough code sketch of
such a prompt follows the list):

File "jack_playing_the_uke.jpg" is a duplicate of file
"jack on uke.flac".  What would you like to do?
1. Erase "jack on uke.flac"?
2. Erase "jack_playing_the_uke.jpg"?
3. Ignore these duplicates?
4. Erase all newer dups without prompting?
5. Exit program?
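
Roughly like this, as a hypothetical sketch (prompt_user and its
caller are not part of the real program yet):

    # Hypothetical sketch; $name1 and $name2 would come from the
    # duplicate scan.
    sub prompt_user
    {
        my ($name1, $name2) = @_;
        print "File \"$name2\" is a duplicate of file \"$name1\".\n",
              "1. Erase \"$name1\"\n",
              "2. Erase \"$name2\"\n",
              "3. Ignore these duplicates\n",
              "4. Erase all newer dups without prompting\n",
              "5. Exit program\n",
              "Your choice: ";
        chomp(my $choice = <STDIN>);
        return $choice;
    }

    # A caller might then do something like:
    #   my $choice = prompt_user($filename1, $filename2);
    #   if ($choice == 1) {unlink $filename1 or warn "Can't erase: $!";}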

> Think of this carry-over of a budget:
> $ cp budget2014.pdf budget2015.pdf
> The content is identical, but the names are not, and you could go
> to jail if you erase one of the files.

I'm of the opinion that just because guns are capable of being
used to murder people, law-abiding citizens should not be
prohibited from owning guns.

Likewise, just because *sometimes* it's not a good idea to
delete a file just because it has identical contents to
another file, doesn't mean that my file-duplicate-deletion
tools should be lobotomized to prevent that. My program will
let the user decide what to delete and what not to delete.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Mon, 24 Nov 2014 00:07:26 +0100
From: gamo <gamo@telecable.es>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <m4tpbe$j03$1@speranza.aioe.org>

On 23/11/14 at 23:40, Robbie Hatley wrote:
> Likewise, just because *sometimes* it's not a good idea to
> delete a file just because it has identical contents to
> another file, doesn't mean that my file-duplicate-deletion
> tools should be lobotomized to prevent that. My program will
> let the user decide what to delete and what not to delete.

If you look at my suggested script, just deleting '$i.' in two
lines of the code makes it behave as a 'content only' dupe detector.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: Mon, 24 Nov 2014 00:41:07 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: A hash of references to arrays of references to hashes... is there a better way?
Message-Id: <slrnm74s4j.n89.hjp-usenet3@hrunkner.hjp.at>

On 2014-11-23 21:10, Robbie Hatley <see.my.sig@for.my.address> wrote:
> On 11/23/2014 08:34 AM, Peter J. Holzer wrote:
>
>> On 2014-11-23 15:01, Robbie Hatley <see.my.sig@for.my.address> wrote:
>> > On 11/22/2014 04:26 AM, Peter J. Holzer wrote:
>> > > ... In object oriented programming you would define a class for each
>> > > of these to formalize this abstraction....
>> >
>> > Or just use classes written by others. :-) That's what the C++ STL
>> > and the Perl CPAN are about.
>>
>> I really hope that the STL doesn't contain specialized classes for "all
>> information about a file you need to find duplicates" and "a list of
>> files of the same size". Much too specialized.
>
> No, but it contains "maps" and "multimaps".

Yes, but these were *not* the classes I was talking about. They don't
formalize the abstraction - on the contrary, they emphasize the
implementation.


>> @{$HashRef}{qw(Date Time Type Size Attr Name)}
>
> That's going to take some study. I need to learn more about dereferencing,

perlreftut is the place to start.

> and about qw().

qw() is just a shorthand for writing lists of strings which don't
contain spaces. 
    qw(Date Time Type Size Attr Name) 
is exactly equivalent to 
    ('Date', 'Time', 'Type', 'Size', 'Attr', 'Name')
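
Putting the two together, the expression above is just a hash slice
through a reference: it pulls those six values out of %$HashRef in
one go. A small sketch, with made-up values, using the keys from
Robbie's program:

    my $HashRef = { Date => 'Sun Nov 23, 2014', Time => '19:09:45',
                    Type => 'File', Size => 395, Attr => 33188,
                    Name => 'src.c' };

    # Hash slice through the reference: the values for the six keys,
    # in the order the keys were listed.
    my @fields = @{$HashRef}{qw(Date Time Type Size Attr Name)};
    print join('  ', @fields), "\n";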

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Sun, 23 Nov 2014 14:49:41 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: How do I compare two files byte-by-byte?
Message-Id: <qvCdncRHI6Mb-e_JnZ2dnUVZ572dnZ2d@giganews.com>


Ok, so I'm getting to the part in my "duplicate file detection"
program where I've found groups of files with same size and
I want to open files in pairs and compare them byte-for-byte.
How do I do that?

I tried the following, but it gives me "syntax error"
(and probably wouldn't actually put just 1 byte in $byte1
and $byte2 anyway):

# At this point in the program, I have two files open for
# input, on file handles $filehandle1 and $filehandle2.
my $diff = 0;
# NEXT LINE GIVES "SYNTAX ERROR":
while (my $byte1 <$filehandle1> and my $byte2 <$filehandle2>)
{
    if ($byte1 == $byte2)
    {
       next;
    }
    else
    {
       $diff = 1;
       last;
    }
}
if ($diff) { do some  stuff } # files are different
else       { do other stuff } # files are identical

So clearly, that's not how to read two open files,
synchronously, byte by byte. How would I go about
doing that?



-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Mon, 24 Nov 2014 01:02:00 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <m4tp1b$263b$1@news.ntua.gr>

On 24/11/2014 00:49, Robbie Hatley wrote:
>
> Ok, so I'm getting to the part in my "duplicate file detection"
> program where I've found groups of files with same size and
> I want to open files in pairs and compare them byte-for-byte.
> How do I do that?
>
> I tried the following, but it gives me "syntax error"
> (and probably wouldn't actually put just 1 byte in $byte1
> and $byte2 anyway):
>
> # At this point in the program, I have two files open for
> # input, on file handles $filehandle1 and $filehandle2.
> my $diff = 0;
> # NEXT LINE GIVES "SYNTAX ERROR":
> while (my $byte1 <$filehandle1> and my $byte2 <$filehandle2>)
> {
>     if ($byte1 == $byte2)
>     {
>        next;
>     }
>     else
>     {
>        $diff = 1;
>        last;
>     }
> }
> if ($diff) { do some  stuff } # files are different
> else       { do other stuff } # files are identical
>
> So clearly, that's not how to read two open files,
> synchronously, byte by byte. How would I go about
> doing that?
>
>
>


Did you read my answer in your previous thread?

By the way, if you want to do that, you have to open both files in
binary mode and seek/read/compare buffers in parallel until their first
difference or their end, but it is something completely useless.


------------------------------

Date: Mon, 24 Nov 2014 08:39:27 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <eu3bkb-asc.ln1@news.rtij.nl>

On Sun, 23 Nov 2014 14:49:41 -0800, Robbie Hatley wrote:

> So clearly, that's not how to read two open files, synchronously, byte
> by byte. How would I go about doing that?

You don't, as it is slow as molasses. Open the files in binary mode
and use sysread to read large chunks, then compare the chunks.

But as others have already stated, this may not be the best way to do
it when there are more than two files to compare byte-wise.
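
Something along these lines, as an untested sketch (the 64KB chunk
size is just a placeholder to tune):

    sub files_differ
    {
        my ($name1, $name2) = @_;
        open(my $fh1, '<:raw', $name1) or die "Can't open $name1: $!";
        open(my $fh2, '<:raw', $name2) or die "Can't open $name2: $!";
        my $chunk = 64 * 1024;
        while (1)
        {
            my $n1 = sysread($fh1, my $buf1, $chunk);
            my $n2 = sysread($fh2, my $buf2, $chunk);
            die "read error: $!" unless defined $n1 and defined $n2;
            return 1 if $n1 != $n2 or $buf1 ne $buf2;  # files differ
            last if $n1 == 0;                          # both at EOF
        }
        return 0;                                      # files identical
    }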

M4


------------------------------

Date: Mon, 24 Nov 2014 00:48:41 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <MuOdnQhAN6t3be_JnZ2dnUVZ57ydnZ2d@giganews.com>


On 11/23/2014 03:02 PM, George Mpouras wrote:

> did you read my answer at your previous thread ?

Yes. I thought I answered this one already. Ah, I see, I
clicked "reply" instead of "followup" so you alone got it.
(Short recap for everyone else: I basically said "no" to
databases, checksums, and CPAN modules, saying I preferred
to write this program "manually" as a "learning Perl"
exercise.)

> By the way, if you want to do that, you have to open both
> files in binary mode and seek/read/compare buffers in parallel
> until their first difference or their end.

Thanks for your mention of "read"; that's the function
I didn't know I needed to get this program up and running!
Turns out I didn't need "seek"; just open (in "raw" mode),
read to end, compare bytes (had to use "ord()" to get the
numerical value of each byte), and close.

> ... but it is something completely useless ...

I find it quite useful.

I've now got my "dedup" program working correctly up to
the point of alerting the user as to whether each pair
of same-size files are "identical" or "different".

For reference, here is the current state of my program:

#!/usr/bin/perl

################################################################################
# dedup.perl                                                                   #
# Duplicate file finding/erasing program.                                      #
# Written by Robbie Hatley, starting 2005-06-21, as a "learn Perl" exercise.   #
# Plan: Recursively descend directory tree starting from current working       #
# directory, and make a master list of all files encountered on this branch.   #
# Order the list by size.  Within each size group, compare each file, from     #
# left to right, to all the files to its right.  If a duplicate pair is found, #
# alert user and get user input.  Give user these choices:                     #
# 1. Erase left file                                                           #
# 2. Erase right file                                                          #
# 3. Ignore this pair of duplicate files and move to next                      #
# 4. Quit                                                                      #
# If user elects to delete a file, delete it, then move to next duplicate.     #
# Edit history:                                                                #
#    Tue Jun 21, 2005 - Started writing it.                                    #
#    Thu Nov 20, 2014 - Getting back to this exercise after 9-year hiatus.     #
#    Mon Nov 24, 2014 - Got it working up to the point of alerting user as to  #
#                       whether each pair of same-size files are identical or  #
#                       different.                                             #
################################################################################

use v5.14;
use strict;
use warnings;

use Cwd;

sub time_from_mtime;
sub date_from_mtime;

my $CurDir;
my %CurDirFiles;

$CurDir = getcwd();
print "CWD = ", $CurDir, "\n";
opendir(my $Dot, ".") or die "Can\'t open directory. $!";

while (my $FileName=readdir($Dot))
{
    my ($dev,     $ino,     $mode,    $nlink,   $uid,
        $gid,     $rdev,    $size,    $atime,   $mtime,
        $ctime,   $blksize, $blocks)
       = stat($FileName);

    my $ModDate = date_from_mtime($mtime);
    my $ModTime = time_from_mtime($mtime);
    my $Size = -s _ ;
    my $Type = -d _ ? "Dir " : "File";

    push @{ $CurDirFiles{$Size} },
       {
          "Date" => $ModDate,
          "Time" => $ModTime,
          "Type" => $Type,
          "Size" => $Size,
          "Attr" => $mode,
          "Name" => $FileName
       };

};

closedir($Dot);

foreach my $Size (sort {$b<=>$a} keys %CurDirFiles)
{
    foreach my $file_record_ref ( @{$CurDirFiles{$Size}} )
    {
       print($file_record_ref->{Date}, "  ");
       print($file_record_ref->{Time}, "  ");
       print($file_record_ref->{Type}, "  ");
       print($file_record_ref->{Size}, "  ");
       print($file_record_ref->{Attr}, "  ");
       print($file_record_ref->{Name}, "\n");
    }
    my $count = scalar(@{$CurDirFiles{$Size}});
    print("$count files in this size group.\n");

    # If fewer than two files exist of this size, go to next size group:
    if ($count < 2) {next;}

    # If we get to here, compare each file to the files to its right:
    for (my $i = 0 ; $i < ($count - 1) ; ++$i)
    {
       for (my $j = $i + 1 ; $j < $count ; ++$j) # only files to the right of file $i
       {
          my $filename1 = ${$CurDirFiles{$Size}}[$i]->{Name};
          my $filename2 = ${$CurDirFiles{$Size}}[$j]->{Name};
          print("Comparing ", $filename1, " to ", $filename2, "\n");

          my $filehandle1;
          my $filehandle2;
          open ($filehandle1, "< :raw", $filename1)
             or die "Can't open $filename1 for reading. $!\n";
          open ($filehandle2, "< :raw", $filename2)
             or die "Can't open $filename2 for reading. $!\n";

          my $diff  = 0;
          my $byte1;
          my $byte2;
          while ( read($filehandle1, $byte1, 1) and read($filehandle2, $byte2, 1) )
          {
             if ( ord($byte1) == ord($byte2) )
             {
                next;
             }
             else
             {
                $diff = 1;
                last;
             }
          }
          close($filehandle1);
          close($filehandle2);

          if ($diff)
          {
             print($filename1, " is different from ", $filename2, "\n");
          }
          else
          {
             print($filename1, " is identical to ", $filename2, "\n");
          }
       }
    }
}

sub date_from_mtime
{
    my $TimeDate = scalar localtime shift @_;
    my $Date = substr ($TimeDate, 0, 10);
    $Date .= ", ";
    $Date .= substr ($TimeDate, 20, 4);
    return $Date;
}

sub time_from_mtime
{
    my $TimeDate = scalar localtime shift @_;
    my $Time = substr ($TimeDate, 11, 8);
    return $Time;
}


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/


------------------------------

Date: Mon, 24 Nov 2014 01:07:35 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <EtudnUelCqnKaO_JnZ2dnUVZ572dnZ2d@giganews.com>



On 11/23/2014 11:39 PM, Martijn Lievaart wrote:

 > On Sun, 23 Nov 2014 14:49:41 -0800, Robbie Hatley wrote:
 >
 > > ... how to read two open files, synchronously, byte
 > > by byte. How would I go about doing that?
 >
 > You don't, as it is slow as molasses. Open the files in binary
 > mode and use sysread to read large chunks, then compare the chunks.

According to "Programming Perl", Perl's "read()" uses a buffer
between itself and the operating system's "read()", so I don't
think it actually sends 1-byte requests to the OS. More likely
4096 bytes or something like that. So if a program compares
two 15,000-byte files, Perl likely sends only a few reads per file
to the OS, not 15,000 reads.

But if I run into trouble comparing mid-sized files (such as
7MB mp3 files) or large files (such as 4000MB Linux install
DVD images), I'll try switching over to sysread and try to
tinker with buffer sizes.

 > But as others have already stated, this may not be the best
 > way to do it when there are more than two files to compare
 > byte-wise.

I think it's the best approach for a general-purpose script that
should not depend on databases or checksums (which may not exist),
and that should run unmodified on multiple platforms (Windows and
Linux).


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/



------------------------------

Date: Sun, 23 Nov 2014 17:53:31 -0800 (PST)
From: Kiuhnm Mnhuik <gandalf23@mail.com>
Subject: open and pipes
Message-Id: <f1ce5ac5-af48-4acb-a2e3-ba0e52d142a0@googlegroups.com>

I'm not a Perl programmer, but I need to understand a command injection example. I looked up the open function in the documentation. AFAIK, a pipe can be put before or after a command, but what does the pipe in the middle mean here?
  open($myhandle, "/a/b/|ls|")


------------------------------

Date: Mon, 24 Nov 2014 09:40:30 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: open and pipes
Message-Id: <m4uncf$26ds$1@news.ntua.gr>

On 24/11/2014 03:53, Kiuhnm Mnhuik wrote:
> I'm not a Perl programmer, but I need to understand a command injection example. I looked up the open function in the documentation. AFAIK, a pipe can be put before or after a command, but what does the pipe in the middle mean here?
>    open($myhandle, "/a/b/|ls|")
>

# two examples.
# Read from an external program
my $pid = open SHELL, '-|', "somecommand 2>&1" or die "Oups $?\n";
while (<SHELL>) { print "$_" }
close SHELL;

# The opposite, write to an external program some data
open  SHELL, '|-',  'somecommand' or die "outs $^E\n";
print SHELL  "$_\n"  foreach 'a1', 'a2', 'a3';
close SHELL;
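
As for the string in the original question: if I read 'perldoc -f open'
correctly, the two-argument form only looks at a leading or trailing
'|'. A trailing '|' means "run everything before it as a shell command
and read its output", so the '|' in the middle is ordinary shell
pipeline syntax, and that is exactly where the injection happens. A
small sketch of the difference (the injected command here is just a
harmless ls):

# Two-argument open: because the string ends with '|', Perl hands
# "/a/b/|ls" to the shell and reads its output.  The shell runs
# "/a/b/" (which fails) piped into "ls", so ls really executes.
open(my $fh, "/a/b/|ls|") or die "open failed: $!";
print while <$fh>;
close($fh);

# Three-argument open never involves a shell; the same string is
# treated as a literal (and rather odd) file name, so nothing runs.
open(my $lit, '<', "/a/b/|ls|") or warn "not a file: $!";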



------------------------------

Date: Mon, 24 Nov 2014 08:57:53 +0100
From: gamo <gamo@telecable.es>
Subject: Re: open and pipes
Message-Id: <m4uodv$c27$1@speranza.aioe.org>

On 24/11/14 at 02:53, Kiuhnm Mnhuik wrote:
> I'm not a Perl programmer, but I need to understand a command
> injection example. I looked up the open function in the documentation.
> AFAIK, a pipe can be put before or after a command, but what does the
> pipe in the middle mean here?
>    open($myhandle, "/a/b/|ls|")
>

Try it with the real example.

-- 
http://www.telecable.es/personales/gamo/


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4316
***************************************

