[32694] in Perl-Users-Digest
Perl-Users Digest, Issue: 3867 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 3 14:55:58 2013
Date: Thu, 24 Jan 2013 02:17:17 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 24 Jan 2013 Volume: 11 Number: 3867
Today's topics:
capturing, computing the ephemeris and passing it to gf <cal@example.invalid>
Re: capturing, computing the ephemeris and passing it t <news@lawshouse.org>
Re: capturing, computing the ephemeris and passing it t <cal@example.invalid>
including a Perl script in a C executable <cartercc@gmail.com>
Re: including a Perl script in a C executable <rweikusat@mssgmbh.com>
Re: including a Perl script in a C executable <rweikusat@mssgmbh.com>
Re: including a Perl script in a C executable <ben@morrow.me.uk>
Re: including a Perl script in a C executable <jurgenex@hotmail.com>
Re: including a Perl script in a C executable <derykus@gmail.com>
Re: including a Perl script in a C executable <cartercc@gmail.com>
Re: Trouble with embedded whitespace in filenames using <rweikusat@mssgmbh.com>
Re: Trouble with embedded whitespace in filenames using <usenet.14@scottsonline.org.uk.invalid>
Re: Trouble with embedded whitespace in filenames using <rweikusat@mssgmbh.com>
Re: Trouble with embedded whitespace in filenames using <usenet.14@scottsonline.org.uk.invalid>
Re: Trouble with embedded whitespace in filenames using <rweikusat@mssgmbh.com>
Re: Trouble with embedded whitespace in filenames using <jimsgibson@gmail.com>
Re: Trouble with embedded whitespace in filenames using <willem@turtle.stack.nl>
Re: Trouble with embedded whitespace in filenames using <ben@morrow.me.uk>
Re: Trouble with embedded whitespace in filenames using <rweikusat@mssgmbh.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 24 Jan 2013 02:15:41 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <lPCdndHDCIAgZp3MnZ2dnUVZ_gqdnZ2d@supernews.com>
Hello newsgroup,
I return many moons after I tried this once before as I don't know who.
I always remember Wade Ward, who might be a textbook example of
writing perl in fortran, all hideous 300 instances of it.
Wade's a good guy who always tries hard, and we call ourselves brothers,
but he, like fortran, can't do I/O for shit.
That's the nominal reason for me establishing a new appropriate
tool-chain for what I'm going to be doing.
I've been thinking about the fortran part for a while, so assume that
that part won't embarrass you.
$ pwd
/home/fred/Desktop/27.5
$ ls
Screenshot from 2013-01-24 00:19:49.png
Screenshot from 2013-01-24 00:20:14.png
$
How does one use perl to upload images with the meta-data?
thx for your comment,
--
cal
------------------------------
Date: Thu, 24 Jan 2013 09:51:22 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <upSdndQkqviHmZzMnZ2dnUVZ8hGdnZ2d@giganews.com>
On 24/01/13 09:15, Cal Dershowitz wrote:
> That's the nominal reason for me establishing a new appropriate
> tool-chain for what I'm going to be doing.
>
> I've been thinking about the fortran part for a while, so assume that
> that part won't embarrass you.
>
> $ pwd
> /home/fred/Desktop/27.5
> $ ls
> Screenshot from 2013-01-24 00:19:49.png
> Screenshot from 2013-01-24 00:20:14.png
> $
>
>
> How does one use perl to upload images with the meta-data?
Is this a coded message to a sleeper group? It makes no sense at all.
--
Henry Law Manchester, England
------------------------------
Date: Thu, 24 Jan 2013 03:08:20 -0700
From: Cal Dershowitz <cal@example.invalid>
Subject: Re: capturing, computing the ephemeris and passing it to gfortran
Message-Id: <7O-dneKRyIaOlZzMnZ2dnUVZ_hSdnZ2d@supernews.com>
On 01/24/2013 02:51 AM, Henry Law wrote:
> On 24/01/13 09:15, Cal Dershowitz wrote:
>> That's the nominal reason for me establishing a new appropriate
>> tool-chain for what I'm going to be doing.
>>
>> I've been thinking about the fortran part for a while, so assume that
>> that part won't embarrass you.
>>
>> $ pwd
>> /home/fred/Desktop/27.5
>> $ ls
>> Screenshot from 2013-01-24 00:19:49.png
>> Screenshot from 2013-01-24 00:20:14.png
>> $
>>
>>
>> How does one use perl to upload images with the meta-data?
>
> Is this a coded message to a sleeper group? It makes no sense at all.
>
>
go to sleep, grandpa.
--
cal
------------------------------
Date: Tue, 22 Jan 2013 07:11:52 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: including a Perl script in a C executable
Message-Id: <24c3c9c5-ddbd-40e2-881e-9d6fdbd36421@googlegroups.com>
I have a small C program that displays file attributes, essentially
running the stat command and emitting the output to the console.
This morning, I have a new requirement -- to nicely format the output,
printing the field names in CAPS and justifying the field values.
C isn't my game, and while I probably could (with some effort) read up
on reading in system output as strings, munging them, and displaying
them, if it's possible I'd rather use Perl. The program must be self
contained in one exe file. The object is to save a manager from knowing
anything about the stat command (yeah, I know.)
Is this doable? Or must I write the entire thing in C?
Thanks, CC.
------------------------------
Date: Tue, 22 Jan 2013 15:49:02 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: including a Perl script in a C executable
Message-Id: <87k3r5jk0h.fsf@sapphire.mobileactivedefense.com>
ccc31807 <cartercc@gmail.com> writes:
> I have a small C program that displays file attributes, essentially
> running the stat command and emitting the output to the console.
>
> This morning, I have a new requirement -- to nicely format the
> output, printing the field names in CAPS and justifying the field
> values.
>
> C isn't my game, and while I probably could (with some effort) read
> up on reading in system output as strings, munging them, and
> displaying them, if it's possible I'd rather use Perl. The program
> must be self contained in one exe file. The object is to save a
> manager from knowing anything about the stat command (yeah, I know.)
>
> Is this doable?
You could write a Perl script which generates a C source file from a
perl script by creating an initialized array of char *s containing the
lines of the Perl script, short mockup of that:
------------------
print("static char const *script[] = {\n");
while (<>) {
chomp;
s/(["\\])/\\$1/g;
print("\t\"$_\\n\",\n");
}
print("\t0\n};\n");
-----------------
If you want to keep things simple, you could then use popen to start
the perl interpreter with input from a pipe and (assuming the FILE *
was named perl_in) use a loop a la
char *cur;
cur = script;
while (*cur) {
fputs(*cur, perl_in); /* each entry already ends in \n */
++cur;
}
pclose(perl_in);
to feed the 'compiled in' script to perl for execution.
OTOH, why don't you use Perl to determine the necessary output
data with the help of stat/fstat (=> perldoc -f -f) and print the
'nicely formatted' output in one go?
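A minimal sketch of that last suggestion, using Perl's built-in stat
and sprintf. The choice of fields, the column widths and the helper
name format_stat are assumptions made for illustration; nothing in
the thread specifies them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: returns formatted lines for one file, or an
# empty list if stat fails. The chosen fields are an assumption.
sub format_stat {
    my ($path) = @_;
    my @st = stat($path) or return;
    my %field = (
        SIZE  => $st[7],
        MODE  => sprintf('%04o', $st[2] & 07777),
        UID   => $st[4],
        GID   => $st[5],
        MTIME => strftime('%Y-%m-%d %H:%M:%S', localtime $st[9]),
    );
    # Left-justify the label in 8 columns, right-justify the value in 20.
    return map { sprintf("%-8s%20s\n", $_, $field{$_}) }
           qw(SIZE MODE UID GID MTIME);
}

for my $file (@ARGV) {
    my @lines = format_stat($file);
    if (@lines) { print "FILE    $file\n", @lines }
    else        { warn "stat failed for $file: $!\n" }
}
```

Run it with one or more file names as arguments. This leaves the
single-exe requirement open; it only covers the formatting part.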
------------------------------
Date: Tue, 22 Jan 2013 16:51:12 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: including a Perl script in a C executable
Message-Id: <8738xtjh4v.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
[...]
> You could write a Perl script which generates a C source file from a
> perl script by creating an initialized array of char *s containing the
> lines of the Perl script, short mockup of that:
>
> ------------------
> print("static char const *script[] = {\n");
[...]
> use a loop a la
>
> char *cur;
This needs to be char const **cur ...
------------------------------
Date: Tue, 22 Jan 2013 16:58:07 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: including a Perl script in a C executable
Message-Id: <v1t2t9-2931.ln1@anubis.morrow.me.uk>
Quoth ccc31807 <cartercc@gmail.com>:
> I have a small C program that displays file attributes, essentially
> running the stat command and emitting the output to the console.
>
> This morning, I have a new requirement -- to nicely format the output,
> printing the field names in CAPS and justifying the field values.
>
> C isn't my game, and while I probably could (with some effort) read up
> on reading in system output as strings, munging them, and displaying
That would be an insane way to do this, in either C or Perl. Both C and
Perl have a stat() function, so use that.
> them, if it's possible I'd rather use Perl. The program must be self
> contained in one exe file. The object is to save a manager from knowing
> anything about the stat command (yeah, I know.)
>
> Is this doable? Or must I write the entire thing in C?
AFAIK it's still not easy. Last time I looked the nearest thing to a
turnkey solution was PAR, which will indeed build you a self-contained
executable, but that executable contains a zipfile which it needs to
unpack in a temporary directory before it can run. I have in the past
used PAR to make things easier for non-technical users, and found that
every so often it would lose its mind and do something stupid. That was
a while ago, though, so it's possible things have improved.
IMHO the Right Thing to do here is to use the perlembed facilities to
build a custom version of perl that just runs one program. With
appropriate use of PerlIO layers and @INC hooks it's entirely possible
to include a whole @INC tree in the executable image, and read it from
there without unpacking it anywhere.
Once upon a time I had the start of a system for automating that, but
IIRC I got stuck on Dynaloaded extensions; I think my plan was to link
them statically instead, but since most people don't link their perls
statically any more the facilities for doing that don't work as well as
they once did. (I don't think Module::Build supports static linking at
all, for instance.) Hmm, let's see...
http://github.com/mauzo/ExtUtils-PerlToExe. Unfinished, unreleased,
you-get-to-keep-both-pieces, and so on.
Ben
------------------------------
Date: Tue, 22 Jan 2013 10:11:18 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: including a Perl script in a C executable
Message-Id: <04ltf8l5leipgg9b902mcf76q4hlfbghi9@4ax.com>
Ben Morrow <ben@morrow.me.uk> wrote:
>
>Quoth ccc31807 <cartercc@gmail.com>:
>> I have a small C program that displays file attributes, essentially
>> running the stat command and emitting the output to the console.
>>
>> This morning, I have a new requirement -- to nicely format the output,
>> printing the field names in CAPS and justifying the field values.
>>
>> C isn't my game, and while I probably could (with some effort) read up
>> on reading in system output as strings, munging them, and displaying
>
>That would be an insane way to do this, in either C or Perl. Both C and
>Perl have a stat() function, so use that.
>
>> them, if it's possible I'd rather use Perl.
perldoc -f sprintf
>The program must be self
>> contained in one exe file. The object is to save a manager from knowing
>> anything about the stat command (yeah, I know.)
Teach your manager to select in Windows Explorer (you did mention exe,
therefore I conclude that you are on Windows) "View" -> "Details";
right-click on the header bar where it says "Name, Date Modified, Type
... " (actual headers will vary), and select "More..." from the context
menu. Then he can select to display whatever information his heart
desires.
jue
------------------------------
Date: Tue, 22 Jan 2013 16:26:24 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: including a Perl script in a C executable
Message-Id: <d388b013-a2ae-437e-868e-bd9e1de65ce9@googlegroups.com>
On Tuesday, January 22, 2013 7:11:52 AM UTC-8, ccc31807 wrote:
> I have a small C program that displays file attributes, essentially
> running the stat command and emitting the output to the console.
>
> This morning, I have a new requirement -- to nicely format the output,
> printing the field names in CAPS and justifying the field values.
>
> C isn't my game, and while I probably could (with some effort) read up
> on reading in system output as strings, munging them, and displaying
> them, if it's possible I'd rather use Perl. The program must be self
> contained in one exe file. The object is to save a manager from knowing
> anything about the stat command (yeah, I know.)
>
> Is this doable? Or must I write the entire thing in C?
>
IIUC (and I'm not at all sure I do), you could
comment out the existing code in the .exe and
hack up a replacement to shell out to a runnable
perl script

Of course, just adding some formatting to the
original would probably be easier. But, if you
want the former and, assuming .exe takes a file
as its only argument, here's a quick sketch:
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char *argv[] ) {
    // put entire perl script in string
    const char *perlsource = "#!/usr/bin/perl\n...";
    // output perl source to file
    FILE *fp;
    const char *perlfile = "/tmp/doppelgaenger.pl"; // edit
    // add error checking to fopen/fclose/system
    fp = fopen(perlfile, "w");
    fprintf(fp, "%s\n", perlsource);
    fclose(fp);
    char cmd[100];
    // snprintf truncates instead of overflowing cmd
    snprintf(cmd, sizeof cmd, "perl %s %s", perlfile, argv[1] );
    system(cmd);
    return 0;
}
--
Charles DeRykus
------------------------------
Date: Wed, 23 Jan 2013 08:25:06 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: including a Perl script in a C executable
Message-Id: <b91901ee-d42c-4ca0-a103-6d852d6d5ff2@googlegroups.com>
On Tuesday, January 22, 2013 1:11:18 PM UTC-5, Jürgen Exner wrote:
>
> therefore I conclude that you are on Windows) "View" -> "Details";
> right-click on the header bar where it says "Name, Date Modified, Type
> ... " (actual headers will vary), and select "More..." from the context
> menu. Then he can select to display whatever information his heart
> desires.
It's a 'manager problem' and you are exactly right. I can't cater to
every whim, at least not when I've got work to do otherwise. He can get
the information he wants, but not necessarily the format he wants, and
we've worked it out.
Besides, I just wondered if it could be done ... an idle thought that
I'm now a little ashamed of posting. I write very little in C, I write a
lot of Perl, and I suppose I can blame the question on a neuron misfire
thinking that I should be able to invoke a Perl script from C the same
way I can invoke an executable in Perl (by system(), e.g.).
Thanks to all who replied, CC.
------------------------------
Date: Tue, 22 Jan 2013 12:10:21 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <8738xtifki.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Clint O <clint.olsen@gmail.com>:
>> On Monday, January 21, 2013 1:24:28 PM UTC-8, Rainer Weikusat wrote:
>> >
>> > BTW: Assuming you're running this as root, someone who doesn't like
>> >
>> > you could create a file named |rm -rf `printf "\x2f"` and you probably
>> >
>> > wouldn't like the result of trying to open that.
>>
>> Ok, thanks for the tip and the heads-up. I am running the program as
>> root on a NAS, and the files are created by my family, but just as a
>> good FYI, are there ways I can protect myself against malicious code?
>> Running as root ensures I can read all the files w/o question.
[...]
> If you must do this as root, I would seriously consider using find(1),
> xargs(1) and md5(1) instead, assuming your find and xargs support the
> -print0 and -0 arguments. You're much less likely to make a serious
> mistake using preexisting utilities than trying to write your own.
Sorry to be so blunt but this is a really stupid suggestion: It's not
only that a lot of characters valid in filenames are of syntactic
relevance to the shell but it will also perform multiple passes of
textual substitution on a complete input line and happily execute
whatever the combined result happens to be, IOW, the shell does not
genuinely distinguish between 'script text from a file' and 'text
produced as result of an operation performed by the script', making it
an extremely poor choice for writing code supposed to run in a hostile
environment. perl is much better in this respect because it not only
doesn't execute data 'by default' (just when explicitly asked to) but
it can also be made to complain about a lot of potentially unsafe
'data flows', see 'Taint mode' in perlsec. These checks can be onerous
at times but they should catch a lot of accidental errors (such as the
2-arg open of a string which came from the file system).
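To make the 2-arg open point concrete, here is a small sketch. The
scratch directory, the file name and the two helper subs are made up
for the demonstration. Two-argument open strips leading and trailing
whitespace from its argument and gives <, > and | a special meaning,
so a name read from the file system may be misread or even run as a
command. The three-argument form keeps the mode separate and takes
the name literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory and file name are assumptions for the demonstration.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
my $name = ' leading space.txt';

# Three-argument open: the mode is separate and the name is literal.
open(my $out, '>', $name) or die "create '$name': $!";
print {$out} "data\n";
close $out or die "close '$name': $!";

# Each helper reports whether the open succeeded; the handle is closed
# again when $fh goes out of scope.
sub opens_with_3_args { my ($n) = @_; return open(my $fh, '<', $n) ? 1 : 0 }
sub opens_with_2_args { my ($n) = @_; return open(my $fh, $n)      ? 1 : 0 }

my $three_arg = opens_with_3_args($name);
my $two_arg   = opens_with_2_args($name);
printf "3-arg open: %s\n", $three_arg ? 'ok' : 'failed';
printf "2-arg open: %s\n", $two_arg   ? 'ok' : 'failed';

chdir $start or die "chdir $start: $!";
```

The 2-arg call looks for 'leading space.txt' without the leading
blank, which does not exist, so it should report a failure while the
3-arg call succeeds. Running a script under -T adds the taint checks
described in perlsec on top of this.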
------------------------------
Date: Wed, 23 Jan 2013 09:29:04 +0000
From: Mike Scott <usenet.14@scottsonline.org.uk.invalid>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <kdoah0$a25$1@dont-email.me>
On 21/01/13 21:39, Clint O wrote:
....
>
> Ok, thanks for the tip and the heads-up. I am running the program as
> root on a NAS, and the files are created by my family, but just as a
> good FYI, are there ways I can protect myself against malicious code?
> Running as root ensures I can read all the files w/o question. I've
> used Safe before, but I'm not sure whether it's necessary or
> appropriate for this application.
>
If I may ask a naive question.... Why are you writing a duplicate-file
finder from scratch when programs such as fdupes already exist and
presumably have such issues already resolved?
fdupes "searches the given path for duplicate files. Such files are
found by comparing file sizes and MD5 signatures, followed by a
byte-by-byte comparison". That last bit is important.
--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
------------------------------
Date: Wed, 23 Jan 2013 13:21:59 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <87sj5skpag.fsf@sapphire.mobileactivedefense.com>
Mike Scott <usenet.14@scottsonline.org.uk.invalid> writes:
> On 21/01/13 21:39, Clint O wrote:
> ....
>>
>> Ok, thanks for the tip and the heads-up. I am running the program as
>> root on a NAS, and the files are created by my family, but just as a
>> good FYI, are there ways I can protect myself against malicious code?
>> Running as root ensures I can read all the files w/o question. I've
>> used Safe before, but I'm not sure whether it's necessary or
>> appropriate for this application.
>
> If I may ask a naive question.... Why are you writing a
> duplicate-file finder from scratch when programs such as fdupes
> already exist and presumably have such issues already resolved?
May I ask you an equally naive question? Why precisely do you think
your statement is even remotely on topic for a Perl newsgroup?
> fdupes "searches the given path for duplicate files. Such files are
> found by comparing file sizes and MD5 signatures, followed by a
> byte-by-byte comparison". That last bit is important.
Indeed. It communicates that the author didn't really think straight:
Calculating a MD5 hash of a file requires an expensive processing
operation to be performed for each byte of this file. OTOH, comparing
the content of files of identical sizes (which should already be quite
rare) with each other will usually stop early if the files are not
identical.
------------------------------
Date: Wed, 23 Jan 2013 15:45:16 +0000
From: Mike Scott <usenet.14@scottsonline.org.uk.invalid>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <kdp0ic$ebf$1@dont-email.me>
On 23/01/13 13:21, Rainer Weikusat wrote:
> Mike Scott <usenet.14@scottsonline.org.uk.invalid> writes:
...
>> If I may ask a naive question.... Why are you writing a
>> duplicate-file finder from scratch when programs such as fdupes
>> already exist and presumably have such issues already resolved?
>
> May I ask you an equally naive question? Why precisely do you think
> your statement is even remotely on topic for a Perl newsgroup?
Because it's answering the issue implicit in the original post, and
may save the OP considerable effort and pain. I entirely agree a
discussion of fdupes itself would be out of place here.
--
Mike Scott (unet2 <at> [deletethis] scottsonline.org.uk)
Harlow Essex England
------------------------------
Date: Wed, 23 Jan 2013 16:00:39 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <87libjx520.fsf@sapphire.mobileactivedefense.com>
Mike Scott <usenet.14@scottsonline.org.uk.invalid> writes:
> On 23/01/13 13:21, Rainer Weikusat wrote:
>> Mike Scott <usenet.14@scottsonline.org.uk.invalid> writes:
> ...
>>> If I may ask a naive question.... Why are you writing a
>>> duplicate-file finder from scratch when programs such as fdupes
>>> already exist and presumably have such issues already resolved?
>>
>> May I ask you an equally naive question? Why precisely do you think
>> your statement is even remotely on topic for a Perl newsgroup?
>
> Because it's answering the issue implicit in the original post,
You asserted that you are convinced that a certain program wouldn't
suffer from a certain problem. The question was "Why doesn't
the perl 2-argument open work with filenames containing leading
whitespace?". Even if your beliefs about this program happen to be
correct, voicing them doesn't answer the question.
------------------------------
Date: Wed, 23 Jan 2013 09:32:22 -0800
From: Jim Gibson <jimsgibson@gmail.com>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <230120130932223013%jimsgibson@gmail.com>
In article <87sj5skpag.fsf@sapphire.mobileactivedefense.com>, Rainer
Weikusat <rweikusat@mssgmbh.com> wrote:
> Indeed. It communicates that the author didn't really think straight:
> Calculating a MD5 hash of a file requires an expensive processing
> operation to be performed for each byte of this file. OTOH, comparing
> the content of files of identical sizes (which should already be quite
> rare) with each other will usually stop early if the files are not
> identical.
True enough, but if you have N files and are looking for duplicates
among any pair, it is probably more efficient to compute a checksum for
each of the files, then look for duplicates among the checksums. If the
files are large enough, comparing checksums will be faster than
comparing the files themselves.
--
Jim Gibson
------------------------------
Date: Wed, 23 Jan 2013 17:52:59 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <slrnkg08rr.1701.willem@turtle.stack.nl>
Jim Gibson wrote:
) In article <87sj5skpag.fsf@sapphire.mobileactivedefense.com>, Rainer
) Weikusat <rweikusat@mssgmbh.com> wrote:
)
)> Indeed. It communicates that the author didn't really think straight:
)> Calculating a MD5 hash of a file requires an expensive processing
)> operation to be performed for each byte of this file. OTOH, comparing
)> the content of files of identical sizes (which should already be quite
)> rare) with each other will usually stop early if the files are not
)> identical.
)
) True enough, but if you have N files and are looking for duplicates
) among any pair, it is probably more efficient to compute a checksum for
) each of the files, then look for duplicates among the checksums. If the
) files are large enough, comparing checksums will be faster than
) comparing the files themselves.
That depends. Even assuming the files are all the same size, it's
quite probable that there will be differences in the first block.
I think a good approach is to first group by file size, but read
the first N bytes of each file as well and keep those in memory.
(To take advantage of filesystems that store the first chunk of
a file inside the inode).
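A rough sketch of that grouping step, assuming a caller-supplied
prefix length. The helper name bucket_by_size_and_prefix is
hypothetical. Files that land alone in a bucket cannot have a
duplicate; only buckets with several members need a checksum or a
full comparison afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: buckets files by "size:first N bytes" and
# returns a reference to the resulting hash of path lists.
sub bucket_by_size_and_prefix {
    my ($prefix_len, @paths) = @_;
    my %bucket;
    for my $path (@paths) {
        # Three-argument open keeps odd file names literal.
        open(my $fh, '<:raw', $path)
            or do { warn "skip $path: $!\n"; next };
        my $size = (stat $fh)[7];
        my $got  = read($fh, my $head, $prefix_len);
        close $fh;
        next unless defined $got;    # read error, skip the file
        push @{ $bucket{"$size:$head"} }, $path;
    }
    return \%bucket;
}

if (@ARGV) {
    my $bucket = bucket_by_size_and_prefix(4096, @ARGV);
    for my $paths (grep { @$_ > 1 } values %$bucket) {
        print "candidates: @$paths\n";
    }
}
```

The prefix length of 4096 in the usage lines is an arbitrary choice.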
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
------------------------------
Date: Wed, 23 Jan 2013 20:16:48 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <g2t5t9-ur42.ln1@anubis.morrow.me.uk>
Quoth Jim Gibson <jimsgibson@gmail.com>:
> In article <87sj5skpag.fsf@sapphire.mobileactivedefense.com>, Rainer
> Weikusat <rweikusat@mssgmbh.com> wrote:
>
> > Indeed. It communicates that the author didn't really think straight:
> > Calculating a MD5 hash of a file requires an expensive processing
MD5 is not expensive. It probably takes less time to MD5 two files,
reading each sequentially, than it takes to read alternating blocks from
each file, with the associated disk seeks. (This will depend greatly on
your filesystem's block allocation and caching policies.) Of course, if
you are expecting large numbers of duplicate sizes it may be worth using
a better checksum like sha512 that has a negligible chance of giving a
false positive, since then you can skip the final compare step.
> > operation to be performed for each byte of this file. OTOH, comparing
> > the content of files of identical sizes (which should already be quite
> > rare) with each other will usually stop early if the files are not
> > identical.
>
> True enough, but if you have N files and are looking for duplicates
> among any pair, it is probably more efficient to compute a checksum for
> each of the files, then look for duplicates among the checksums. If the
> files are large enough, comparing checksums will be faster than
> comparing the files themselves.
You mean 'if there are enough duplicated sizes'; the size of the data
you are checksumming is irrelevant, except that for small files (less
than a block) it may be better to just record the whole contents of the
file. It's also worth calculating the checksums lazily: that is, don't
sum a file until you've found another file the same size. That way, if a
file has a unique size you don't need to touch its data blocks at all.
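The lazy scheme could be sketched roughly as follows, assuming
File::Find for the directory walk and Digest::MD5 for the checksum.
The helper name find_duplicates is hypothetical, and the final
byte-by-byte compare is left out, so treat the result as candidates
only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: returns a list of array refs, each holding the
# paths of files under @dirs whose size and MD5 digest both match.
sub find_duplicates {
    my (@dirs) = @_;
    my %by_size;
    find({ no_chdir => 1, wanted => sub {
        # Plain files only; symlinks are skipped.
        return unless -f $_ && !-l $_;
        push @{ $by_size{ (stat $_)[7] } }, $_;
    } }, @dirs);

    my @groups;
    # Files with a unique size are never opened at all.
    for my $paths (grep { @$_ > 1 } values %by_size) {
        my %by_digest;
        for my $path (@$paths) {
            # Three-argument open keeps odd file names literal.
            open(my $fh, '<:raw', $path)
                or do { warn "skip $path: $!\n"; next };
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_digest{$digest} }, $path;
        }
        push @groups, grep { @$_ > 1 } values %by_digest;
    }
    return @groups;
}

if (@ARGV) {
    print join("\n  ", 'Possible duplicates:', sort @$_), "\n"
        for find_duplicates(@ARGV);
}
```

Pass one or more directories on the command line.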
Ben
------------------------------
Date: Wed, 23 Jan 2013 21:24:36 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Trouble with embedded whitespace in filenames using File::Find
Message-Id: <87d2wvy4mj.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Jim Gibson <jimsgibson@gmail.com>:
>> In article <87sj5skpag.fsf@sapphire.mobileactivedefense.com>, Rainer
>> Weikusat <rweikusat@mssgmbh.com> wrote:
>>
>> > Indeed. It communicates that the author didn't really think straight:
>> > Calculating a MD5 hash of a file requires an expensive processing
>
> MD5 is not expensive. It probably takes less time to MD5 two files,
> reading each sequentially, than it takes to read alternating blocks from
> each file, with the associated disk seeks.
MD5 (or any other hashing algorithm) is a lot more expensive than a
comparison and especially so if MD5 needs to process 2G of data while
the comparison would only need 8K. This means that MD5 will usually
lose if the files are different. And MD5 + byte-by-byte comparison
will usually lose if they aren't. Anything else is a pathological
situation (eg, lots of large files differing in the last few bytes).
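For the comparison side of the argument, a minimal sketch using the
core File::Compare module, whose compare() reads both files in
blocks and gives up at the first block that differs. The helper name
same_content is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: true if both paths have identical content.
sub same_content {
    my ($left, $right) = @_;
    my $left_size  = (stat $left)[7];
    my $right_size = (stat $right)[7];
    return 0 unless defined $left_size && defined $right_size;
    # Files of different size cannot match, so no data is read.
    return 0 unless $left_size == $right_size;
    # compare() returns 0 if equal, 1 if different, -1 on error.
    return compare($left, $right) == 0 ? 1 : 0;
}

if (@ARGV == 2) {
    print same_content(@ARGV) ? "identical\n" : "different\n";
}
```

Whether this beats hashing depends on how many files share a size,
which is the point under dispute in this thread.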
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3867
***************************************