[33046] in Perl-Users-Digest
Perl-Users Digest, Issue: 4322 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Dec 4 21:09:21 2014
Date: Thu, 4 Dec 2014 18:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 4 Dec 2014 Volume: 11 Number: 4322
Today's topics:
Re: Can you point me in the right direction - accessing <jblack@nospam.com>
Re: Can you point me in the right direction - accessing <jblack@nospam.com>
Re: Can you point me in the right direction - accessing <sbryce@scottbryce.com>
Re: Can you point me in the right direction - accessing <jblack@nospam.com>
Re: Can you point me in the right direction - accessing <news@lawshouse.org>
Re: Can you point me in the right direction - accessing <gamo@telecable.es>
Re: Can you point me in the right direction - accessing <news@lawshouse.org>
Re: Can you point me in the right direction - accessing <gamo@telecable.es>
Re: Can you point me in the right direction - accessing <jblack@nospam.com>
Re: Can you point me in the right direction - accessing <news@lawshouse.org>
Re: Can you point me in the right direction - accessing <lws4art@gmail.com>
Re: Can you point me in the right direction - accessing <see.my.sig@for.my.address>
Dedup script is finished and works. Critiques? <see.my.sig@for.my.address>
Hard challenge: want to form a Perl team? <gamo@telecable.es>
Re: How do I compare two files byte-by-byte? <see.my.sig@for.my.address>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 3 Dec 2014 11:46:06 -0600
From: John Black <jblack@nospam.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <MPG.2ee90858267ac9c5989802@news.eternal-september.org>
In article <m5g28d$gl0$1@news.ntua.gr>, gravitalsun@hotmail.foo says...
>
> On 30/11/2014 23:28, John Black wrote:
> > Hi,
> >
> > I am relatively good with basic Perl but don't know web stuff. I don't need to build a
> > website (right now). All I want to do is have a script that accesses a certain webpage, like
> > my bank, and logs in via a user id and pwd and then grabs some data like current balance. I
> > know how to make the program run once a day so that my computer can automatically stay on top
> > of my account and send me the daily balance or other alerts. Can someone point me in the
> > right direction? I'd bet there are existing cpan modules that do things like that. It does
> > not have to be specific to any application - just something that can pull in a page and log
> > in to it and then navigate around a little to grab info. Thanks.
> >
> > John Black
> >
>
>
> fits WWW-Mechanize to your problem
I installed Mechanize and it worked for non-https sites. But the sites I need are https.
$ perl www.pl
Error GETing https://www.yahoo.com: Protocol scheme 'https' is not supported
(LWP::Protocol::https not installed) at www.pl line 9.
So I tried to install LWP::Protocol::https, but I seem to have zero luck with cpan or cpanm. In
anything other than a trivial package, there is always some dependent module that fails to
install for some reason:
$ cpanm LWP::Protocol::https
--> Working on LWP::Protocol::https
Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/LWP-Protocol-https-6.06.tar.gz ... OK
Configuring LWP-Protocol-https-6.06 ... OK
==> Found dependencies: IO::Socket::SSL, LWP::UserAgent
--> Working on IO::Socket::SSL
Fetching http://www.cpan.org/authors/id/S/SU/SULLR/IO-Socket-SSL-2.007.tar.gz ... OK
Configuring IO-Socket-SSL-2.007 ... OK
==> Found dependencies: Net::SSLeay
--> Working on Net::SSLeay
Fetching http://www.cpan.org/authors/id/M/MI/MIKEM/Net-SSLeay-1.66.tar.gz ... OK
Configuring Net-SSLeay-1.66 ... OK
Building and testing Net-SSLeay-1.66 ... FAIL
! Installing Net::SSLeay failed. See /home/richardn/.cpanm/work/1417627934.14540/build.log
for details. Retry with --force to force install it.
! Installing the dependencies failed: Module 'Net::SSLeay' is not installed
! Bailing out the installation for IO-Socket-SSL-2.007.
--> Working on LWP::UserAgent
Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/libwww-perl-6.08.tar.gz ... OK
Configuring libwww-perl-6.08 ... OK
==> Found dependencies: Net::HTTP
--> Working on Net::HTTP
Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/Net-HTTP-6.07.tar.gz ... OK
Configuring Net-HTTP-6.07 ... OK
Building and testing Net-HTTP-6.07 ... OK
Successfully installed Net-HTTP-6.07 (upgraded from 6.06)
Building and testing libwww-perl-6.08 ... FAIL
! Installing LWP::UserAgent failed. See /home/richardn/.cpanm/work/1417627934.14540/build.log
for details. Retry with --force to force install it.
! Installing the dependencies failed: Module 'IO::Socket::SSL' is not installed, Installed
version (6.05) of LWP::UserAgent is not in range '6.06'
! Bailing out the installation for LWP-Protocol-https-6.06.
1 distribution installed
Still no https support...
$ perl www.pl
Error GETing https://www.yahoo.com: Protocol scheme 'https' is not supported
(LWP::Protocol::https not installed) at www.pl line 9.
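One way to see which pieces of the HTTPS stack are actually present is to ask Perl directly, without involving Mechanize at all. The following is only a sketch: the helper name module_version is made up for illustration, and the module list is simply the dependency chain shown in the cpanm output above.

```perl
use strict;
use warnings;

# Return the installed version of a module, 'unknown' if it loads but
# declares no $VERSION, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; $module->VERSION // 'unknown' };
}

# Modules taken from the dependency chain in the cpanm log.
for my $module (qw(LWP::UserAgent LWP::Protocol::https IO::Socket::SSL Net::SSLeay)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module, defined $version ? $version : 'NOT INSTALLED';
}
```

Any line reporting NOT INSTALLED points at the module whose build needs attention first.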
John Black
------------------------------
Date: Wed, 3 Dec 2014 11:55:31 -0600
From: John Black <jblack@nospam.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <MPG.2ee90a8d87fba8fa989803@news.eternal-september.org>
In article <MPG.2ee90858267ac9c5989802@news.eternal-september.org>, jblack@nospam.com says...
>
> In article <m5g28d$gl0$1@news.ntua.gr>, gravitalsun@hotmail.foo says...
> >
> > On 30/11/2014 23:28, John Black wrote:
> > > Hi,
> > >
> > > I am relatively good with basic Perl but don't know web stuff. I don't need to build a
> > > website (right now). All I want to do is have a script that accesses a certain webpage, like
> > > my bank, and logs in via a user id and pwd and then grabs some data like current balance. I
> > > know how to make the program run once a day so that my computer can automatically stay on top
> > > of my account and send me the daily balance or other alerts. Can someone point me in the
> > > right direction? I'd bet there are existing cpan modules that do things like that. It does
> > > not have to be specific to any application - just something that can pull in a page and log
> > > in to it and then navigate around a little to grab info. Thanks.
> > >
> > > John Black
> > >
> >
> >
> > fits WWW-Mechanize to your problem
>
> I installed Mechanize and it worked for non-https sites. But the sites I need are https.
>
> $ perl www.pl
> Error GETing https://www.yahoo.com: Protocol scheme 'https' is not supported
> (LWP::Protocol::https not installed) at www.pl line 9.
>
> So I try to install LWP::Protocol::https but I seem to have zero luck with cpan or cpanm. In
> anything other than a trivial package, there is always some dependant module that fails to
> install for some reason
>
>
> $ cpanm LWP::Protocol::https
> --> Working on LWP::Protocol::https
> Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/LWP-Protocol-https-6.06.tar.gz ... OK
> Configuring LWP-Protocol-https-6.06 ... OK
> ==> Found dependencies: IO::Socket::SSL, LWP::UserAgent
> --> Working on IO::Socket::SSL
> Fetching http://www.cpan.org/authors/id/S/SU/SULLR/IO-Socket-SSL-2.007.tar.gz ... OK
> Configuring IO-Socket-SSL-2.007 ... OK
> ==> Found dependencies: Net::SSLeay
> --> Working on Net::SSLeay
> Fetching http://www.cpan.org/authors/id/M/MI/MIKEM/Net-SSLeay-1.66.tar.gz ... OK
> Configuring Net-SSLeay-1.66 ... OK
> Building and testing Net-SSLeay-1.66 ... FAIL
> ! Installing Net::SSLeay failed. See /home/richardn/.cpanm/work/1417627934.14540/build.log
> for details. Retry with --force to force install it.
> ! Installing the dependencies failed: Module 'Net::SSLeay' is not installed
> ! Bailing out the installation for IO-Socket-SSL-2.007.
> --> Working on LWP::UserAgent
> Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/libwww-perl-6.08.tar.gz ... OK
> Configuring libwww-perl-6.08 ... OK
> ==> Found dependencies: Net::HTTP
> --> Working on Net::HTTP
> Fetching http://www.cpan.org/authors/id/M/MS/MSCHILLI/Net-HTTP-6.07.tar.gz ... OK
> Configuring Net-HTTP-6.07 ... OK
> Building and testing Net-HTTP-6.07 ... OK
> Successfully installed Net-HTTP-6.07 (upgraded from 6.06)
> Building and testing libwww-perl-6.08 ... FAIL
> ! Installing LWP::UserAgent failed. See /home/richardn/.cpanm/work/1417627934.14540/build.log
> for details. Retry with --force to force install it.
> ! Installing the dependencies failed: Module 'IO::Socket::SSL' is not installed, Installed
> version (6.05) of LWP::UserAgent is not in range '6.06'
> ! Bailing out the installation for LWP-Protocol-https-6.06.
> 1 distribution installed
>
> Still no https support...
>
> $ perl www.pl
> Error GETing https://www.yahoo.com: Protocol scheme 'https' is not supported
> (LWP::Protocol::https not installed) at www.pl line 9.
>
> John Black
Traced the problem to this: (not sure why needed include files like err.h are not found??)
chmod 644 SSLeay.bs
/usr/bin/perl.exe "-Iinc" /usr/lib/perl5/5.14/ExtUtils/xsubpp -typemap /usr/lib/perl5/5.14/ExtUtils/typemap -typemap typemap SSLeay.xs > SSLeay.xsc && mv SSLeay.xsc SSLeay.c
gcc-4 -c -DPERL_USE_SAFE_PUTENV -U__STRICT_ANSI__ -g -fno-strict-aliasing -pipe -fstack-protector -DUSEIMPORTLIB -O3 -DVERSION=\"1.66\" -DXS_VERSION=\"1.66\" "-I/usr/lib/perl5/5.14/i686-cygwin-threads-64int/CORE" SSLeay.c
SSLeay.xs:163:25: fatal error: openssl/err.h: No such file or directory
#include <openssl/err.h>
^
compilation terminated.
Makefile:352: recipe for targ
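Net::SSLeay is an XS wrapper around OpenSSL, so building it needs OpenSSL's C headers, which usually ship in a separate development package (libssl-dev on Debian/Ubuntu; the Cygwin package name is not confirmed here). A missing header is only one possible cause of the error above, but it is cheap to rule out. The sketch below assumes the two include directories listed; the compiler's real search path may differ, and dirs_with_header is a hypothetical helper.

```perl
use strict;
use warnings;

# Return every directory from @dirs that contains the given header file.
sub dirs_with_header {
    my ($header, @dirs) = @_;
    return grep { -f "$_/$header" } @dirs;
}

# Assumed search path; adjust to match what the compiler really uses.
my @include_dirs = qw(/usr/include /usr/local/include);

my @found = dirs_with_header('openssl/err.h', @include_dirs);
print @found
    ? "openssl/err.h found in: @found\n"
    : "openssl/err.h not found in: @include_dirs\n";
```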
John Black
------------------------------
Date: Wed, 03 Dec 2014 10:58:05 -0700
From: Scott Bryce <sbryce@scottbryce.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <m5niut$vmu$1@dont-email.me>
On 12/3/2014 10:46 AM, John Black wrote:
> I installed Mechanize and it worked for non-https sites. But the
> sites I need are https.
So try LWP::UserAgent. I use it on https sites.
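A minimal sketch of what that might look like follows. Note that WWW::Mechanize is a subclass of LWP::UserAgent, so both rely on LWP::Protocol::https for https URLs; this example therefore assumes that module is installed. The fetch_page helper is hypothetical, not part of LWP, and the URL is taken from the command line.

```perl
use strict;
use warnings;

# Fetch $url with the given user agent; return the decoded body or die
# with the HTTP status line.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Usage: perl fetch.pl https://www.example.com/
if (@ARGV) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 30);
    print fetch_page($ua, $ARGV[0]);
}
```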
------------------------------
Date: Wed, 3 Dec 2014 12:13:08 -0600
From: John Black <jblack@nospam.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <MPG.2ee90eac55d92f95989804@news.eternal-september.org>
In article <m5niut$vmu$1@dont-email.me>, sbryce@scottbryce.com says...
>
> On 12/3/2014 10:46 AM, John Black wrote:
> > I installed Mechanize and it worked for non-https sites. But the
> > sites I need are https.
>
>
> So try LWP::UserAgent. I use it on https sites.
This failed to install as well. I've grown to hate cpan because it just represents what I can't
have.
John Black
------------------------------
Date: Wed, 03 Dec 2014 18:40:40 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <BradnVGMO8e0xOLJnZ2dnUVZ7sUAAAAA@giganews.com>
On 03/12/14 18:13, John Black wrote:
> I've grown to hate cpan because it just represents what I can't
> have.
Risking going off at a tangent here, that's the first problem you need
to fix, if you don't mind my saying so. I use CPAN fairly often (though
I tend to go first to the Ubuntu repos) and I can't remember the last
time that it failed to install what I need. Like Perl it Does What I Mean.
--
Henry Law Manchester, England
------------------------------
Date: Wed, 03 Dec 2014 20:49:09 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <m5npfi$jo8$1@speranza.aioe.org>
El 03/12/14 a las 19:40, Henry Law escribió:
> On 03/12/14 18:13, John Black wrote:
>> I've grown to hate cpan because it just represents what I can't
>> have.
>
> Risking going off at a tangent here, that's the first problem you need
> to fix, if you don't mind my saying so. I use CPAN fairly often (though
> I tend to go first to the Ubuntu repos) and I can't remember the last
> time that it failed to install what I need. Like Perl it Does What I Mean.
>
BTW, I'm using Ubuntu. How do you install modules without cpan?
PS: Sorry John, more tangent.
--
http://www.telecable.es/personales/gamo/
------------------------------
Date: Wed, 03 Dec 2014 21:34:02 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <dKSdnTLhOM5WHOLJnZ2dnUVZ8uKdnZ2d@giganews.com>
On 03/12/14 19:49, gamo wrote:
> I'm using Ubuntu. How do you install modules without cpan?
In general the module Foo::Bar is in a package called libfoo-bar-perl,
but you can usually find it using "apt-cache search foo-bar".
Thus XML::Simple is libxml-simple-perl, DBI is libdbi-perl,
Finance::Quote is libfinance-quote-perl ... and so on.
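The naming rule can be written down as a small function. This is only a sketch of the general convention described above (debian_package_for is a made-up name), and there are exceptions: LWP::UserAgent, for instance, lives in libwww-perl, so "apt-cache search" remains the real check.

```perl
use strict;
use warnings;

# Guess the Debian/Ubuntu package name for a Perl module:
# lowercase, "::" becomes "-", wrapped in "lib...-perl".
sub debian_package_for {
    my ($module) = @_;
    (my $name = lc $module) =~ s/::/-/g;
    return "lib${name}-perl";
}

print "$_ -> ", debian_package_for($_), "\n"
    for qw(XML::Simple DBI Finance::Quote);
```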
(PS how you do this with a graphic front-end I don't know 'cos I'm old
enough to be a command-line jockey. I'm sure you'll find out how.)
--
Henry Law Manchester, England
------------------------------
Date: Wed, 03 Dec 2014 23:20:47 +0100
From: gamo <gamo@telecable.es>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <m5o2br$bt5$1@speranza.aioe.org>
El 03/12/14 a las 22:34, Henry Law escribió:
> On 03/12/14 19:49, gamo wrote:
>> I'm using Ubuntu. How do you install modules without cpan?
>
> In general the module Foo::Bar is in a package called libfoo-bar-perl,
> but you can usually find it using "apt-cache search foo-bar".
>
> Thus XML::Simple is libxml-simple-perl, DBI is libdbi-perl,
> Finance::Quote is libfinance-quote-perl ... and so on.
>
> (PS how you do this with a graphic front-end I don't know 'cos I'm old
> enough to be a command-line jockey. I'm sure you'll find out how.)
>
This is both good and bad. It's good because it makes Perl easier to use,
but it's bad because people spend time packaging things that are already
packaged, time that could be spent on, e.g., packaging a new version of jed,
or reporting to CPAN authors that their module doesn't install, or...
Thank you very much for the advice.
PS: I'm assuming that they do not automate the production of
libxyz-123-perl; if they do, I'm totally wrong.
--
http://www.telecable.es/personales/gamo/
------------------------------
Date: Wed, 3 Dec 2014 17:20:19 -0600
From: John Black <jblack@nospam.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <MPG.2ee956ab5e5892d6989805@news.eternal-september.org>
In article <BradnVGMO8e0xOLJnZ2dnUVZ7sUAAAAA@giganews.com>, news@lawshouse.org says...
>
> On 03/12/14 18:13, John Black wrote:
> > I've grown to hate cpan because it just represents what I can't
> > have.
>
> Risking going off at a tangent here, that's the first problem you need
> to fix, if you don't mind my saying so. I use CPAN fairly often (though
> I tend to go first to the Ubuntu repos) and I can't remember the last
> time that it failed to install what I need. Like Perl it Does What I Mean.
Ok, I fixed all of these types of problems by re-installing. Some wrong compiler had
gotten itself into the path somehow.
So I installed LWP::Protocol::https like it said and I can now access https pages!
I wrote this little program which I thought would take me to my account home page:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
my $url      = 'https://www.chase.com';
my $username = 'XXXXXXXX';
my $password = 'XXXXXXXX';
my $mech = WWW::Mechanize->new();
# Fetch the login page.
$mech->get( $url );
die "GET $url failed\n" unless $mech->success;
# Fill in and submit the login form.
$mech->submit_form(
    form_name => 'logonform',
    fields    => {
        username => $username,
        password => $password,
    },
);
die "Login form submission failed\n" unless $mech->success;
print $mech->content;
However, I think I may have gotten in over my head. What shows up in the content is not the
account page but a more advanced login page that contains a bunch of javascript and cookie
code. It's hard to believe it would be a simple matter to get past that.
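When a submit lands on an unexpected page, it can help to list the forms and input fields Mechanize actually sees there. The sketch below assumes WWW::Mechanize is installed and takes the URL from the command line; describe_forms is a hypothetical helper that works on the HTML::Form objects returned by $mech->forms. It will not reveal anything that JavaScript builds at run time.

```perl
use strict;
use warnings;

# Summarize HTML::Form objects (as returned by $mech->forms) as text lines.
sub describe_forms {
    my (@forms) = @_;
    my @lines;
    for my $form (@forms) {
        push @lines, sprintf('form name=%s action=%s',
            $form->attr('name') // '(none)', $form->action);
        for my $input ($form->inputs) {
            push @lines, sprintf('  %-10s %s',
                $input->type, $input->name // '(unnamed)');
        }
    }
    return @lines;
}

# Usage: perl forms.pl https://www.example.com/login
if (@ARGV) {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new();
    $mech->get($ARGV[0]);
    print "$_\n" for describe_forms($mech->forms);
}
```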
John Black
------------------------------
Date: Thu, 04 Dec 2014 07:53:06 +0000
From: Henry Law <news@lawshouse.org>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <LYqdnWMhgap4jx3JnZ2dnUVZ8nydnZ2d@giganews.com>
On 03/12/14 23:20, John Black wrote:
> However, I think I may have gotten in over my head.
You are now in a maze of twisty little passages, all alike.
Travelling along this road, somewhat in front of you, I have found that
this isn't unusual. For example, I've written a simple Mechanize
program that logs in to a router and grabs the up and down counts. Going
to http://wherever gives one page which contains a redirect to another
page, in which javascript summons up yet another page which is the one
the user sees. I have resorted to line tracing to figure it out. Screen
scraping (for that is what we're both doing) was always difficult and
painstaking.
Try turning javascript off on your browser, emptying the cache and going
to the login screen at your bank; is what you see different from normal?
If so then save the HTML of that page and look for redirects and
javascript weirdness. Code up the key parts of that in your Mechanize
program and try again.
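Hunting through a saved page for redirects can be partly automated. The following is a rough sketch only: it uses regular expressions rather than a real HTML parser, so it will miss anything unusual, and find_redirect_clues is a made-up helper name. It reports meta-refresh tags and JavaScript assignments to location.

```perl
use strict;
use warnings;

# Return redirect-like clues found in a chunk of HTML:
# meta-refresh tags and JavaScript location assignments.
sub find_redirect_clues {
    my ($html) = @_;
    my @clues;
    while ($html =~ /(<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*>)/gi) {
        push @clues, "meta refresh: $1";
    }
    while ($html =~ /((?:window\.|document\.)?location(?:\.href)?\s*=\s*["'][^"']+["'])/gi) {
        push @clues, "javascript: $1";
    }
    return @clues;
}

# Usage: perl clues.pl saved_login_page.html
if (@ARGV) {
    local $/;    # slurp the whole file in one read
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for find_redirect_clues(<$fh>);
}
```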
What worries me is that all this trial-and-error is firing repeated HTTP
requests at your bank, which may spook them (it ought to), to say
nothing of getting your password locked out.
--
Henry Law Manchester, England
------------------------------
Date: Thu, 04 Dec 2014 09:59:08 -0500
From: "Jonathan N. Little" <lws4art@gmail.com>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <m5psrg$7dm$1@dont-email.me>
Henry Law wrote:
> On 03/12/14 23:20, John Black wrote:
>> However, I think I may have gotten in over my head.
>
> You are now in a maze of twisty little passages, all alike.
>
> Travelling along this road, somewhat in front of you, I have found that
> this isn't unusual. For example, I've written a simple Mechanize
> program that logs in to a router and grabs the up and down counts. Going
> to http://wherever gives one page which contains a redirect to another
> page, in which javascript summons up yet another page which is the one
> the user sees. I have resorted to line tracing to figure it out. Screen
> scraping (for that is what we're both doing) was always difficult and
> painstaking.
>
> Try turning javascript off on your browser, emptying the cache and going
> to the login screen at your bank; is what you see different from normal?
> If so then save the HTML of that page and look for redirects and
> javascript weirdness. Code up the key parts of that in your Mechanize
> program and try again.
>
> What worries me is that all this trial-and-error is firing repeated HTTP
> requests at your bank, which may spook them (it ought to), to say
> nothing of getting your password locked out.
>
It also is just a *Bad Idea ™* from the get-go. Fraught with all kinds
of issues and a security risk. As I stated elsewhere banks typically
have automated email/text alerts that you can set up to notify you of
your current account status. Those would be PUSHed from them and not
PULLed from you and eliminate the login/security issue.
--
Take care,
Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com
------------------------------
Date: Thu, 04 Dec 2014 15:46:59 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Can you point me in the right direction - accessing a website
Message-Id: <M9CdnQs0Ud_zbx3JnZ2dnUVZ572dnZ2d@giganews.com>
On 11/30/2014 1:28 PM, John Black wrote:
> Hi,
>
> I am relatively good with basic Perl but don't know web stuff. I don't need to build a
> website (right now). All I want to do is have a script that accesses a certain webpage, like
> my bank, and logs in via a user id and pwd and then grabs some data like current balance. I
> know how to make the program run once a day so that my computer can automatically stay on top
> of my account and send me the daily balance or other alerts. Can someone point me in the
> right direction? I'd bet there are existing cpan modules that do things like that. It does
> not have to be specific to any application - just something that can pull in a page and log
> in to it and then navigate around a little to grab info. Thanks.
>
> John Black
I can tell you right now that's probably not going to work.
Many banks have anti-bot features such as "captcha" to block
just that kind of thing. And many will ask a random security
question from a set of several ("what was the name of your
childhood pet?") if they don't recognize your current
browser as one you've used before. So you'd have to write
a perl script that recognizes all of the security questions
it's likely to see, knows all the right answers, and is able
to enter the correct answers into the correct fields of the
web forms and hit "enter" or click "ok" for you.
Alternative idea: research "AutoIt3"; that might be closer
to what you need than perl. (But even an AutoIt3 script is
going to have a hard time with Captcha.)
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
------------------------------
Date: Thu, 04 Dec 2014 16:13:10 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Dedup script is finished and works. Critiques?
Message-Id: <m8-dnRjSztYPZR3JnZ2dnUVZ572dnZ2d@giganews.com>
I finished that "find duplicate files" program I was writing as
a "learn Perl" exercise, and it appears to work correctly, after
correcting a few amusing run-time bugs. Such as, it was comparing
each file to itself, saying "blat.txt is a duplicate of blat.txt;
which would you like to erase, blat.txt or blat.txt?" :-)
Obvious bugs fixed, it appears to work correctly, and I've already
used it to quickly find and remove a bunch of large duplicate files
on my system.
However, I suspect that it's full of "long and winding roads"
(to quote Paul McCartney), and that "you're a fool if you see a
winding road; there's always a straight path to the place you
seek" (to quote Akeboshi). Anyone here see places in the script
(pasted below) which make you cringe and say "yuck, that's so
clumsy, I would have done xxxxxxx xxxxxx xxxxx instead"?
Flaws I see myself include:
1. The way I'm reading unbuffered single characters from keyboard
is very NOT portable. Is there a more portable way?
2. I collect way too much info on all files; I should probably
simplify my file-records hash so that it only keeps track of
file names and sizes, as that's all I'm actually using.
3. I'm not sure the buffer size is quite optimum, or if using
sysread instead of read would speed it up; I should probably
experiment with that.
4. Perhaps I should present each cluster of several duplicates
to user all at once, instead of one pair at a time, and give
more options as to what to do?
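On flaw 1, one commonly used alternative to shelling out to stty is the Term::ReadKey module from CPAN (not a core module, so this sketch assumes it has been installed). The helper names read_key and action_for_key are made up for illustration, and the action names simply mirror the four menu choices.

```perl
use strict;
use warnings;

# Translate a keypress into an action name; undef for unrecognized keys.
sub action_for_key {
    my ($key) = @_;
    my %action = (
        1 => 'erase_first',
        2 => 'erase_second',
        3 => 'ignore',
        4 => 'quit',
    );
    return defined $key ? $action{$key} : undef;
}

# Read one unbuffered keypress via Term::ReadKey (loaded on demand),
# restoring the terminal mode afterwards even if the read fails.
sub read_key {
    require Term::ReadKey;
    Term::ReadKey::ReadMode('cbreak');
    my $key = eval { Term::ReadKey::ReadKey(0) };
    Term::ReadKey::ReadMode('restore');
    return $key;
}

# Intended use inside the menu code:
#   my $action = action_for_key(read_key());
```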
#!/usr/bin/perl
################################################################################
# dedup.perl #
# Duplicate file finding/erasing program. #
# Written by Robbie Hatley, starting 2005-06-21, as a "learn Perl" exercise. #
# Plan: Recursively descend directory tree starting from current working #
# directory, and make a master list of all files encountered on this branch. #
# Order the list by size. Within each size group, compare each file, from #
# left to right, to all the files to its right. If a duplicate pair is found, #
# alert user and get user input. Give user these choices: #
# 1. Erase first file #
# 2. Erase second file #
# 3. Ignore this pair of duplicate files and move to next #
# 4. Quit #
# If user elects to delete a file, delete it, then move to next duplicate. #
# Edit history: #
# Tue Jun 21, 2005 - Started writing it. #
# Thu Nov 20, 2014 - Getting back to this exercise after 9-year hiatus. #
# Mon Nov 24, 2014 - Got it working up to the point of alerting user as to #
# whether each pair of same-size files are identical or #
# different. #
# Thu Dec 04, 2014 - Now fully functional. #
################################################################################
use v5.14;
use strict;
use warnings;
use Cwd;
sub date_from_mtime
{
   my $TimeDate = scalar localtime shift @_;
   my $Date = substr ($TimeDate, 0, 10);
   $Date .= ", ";
   $Date .= substr ($TimeDate, 20, 4);
   return $Date;
}
sub time_from_mtime
{
   my $TimeDate = scalar localtime shift @_;
   my $Time = substr ($TimeDate, 11, 8);
   return $Time;
}
sub get_character {
   system "stty cbreak </dev/tty >/dev/tty 2>&1";
   my $char = getc;
   system "stty icanon </dev/tty >/dev/tty 2>&1";
   return $char;
}
my $CurDir;
my %CurDirFiles;
$CurDir = getcwd();
print "CWD = ", $CurDir, "\n";
opendir(my $Dot, ".") or die "Can\'t open directory. $!";
while (my $FileName=readdir($Dot))
{
   my ($dev,   $ino,     $mode, $nlink, $uid,
       $gid,   $rdev,    $size, $atime, $mtime,
       $ctime, $blksize, $blocks)
      = stat($FileName);
   my $ModDate = date_from_mtime($mtime);
   my $ModTime = time_from_mtime($mtime);
   my $Size = -s _ ;
   # No sense trying to compare directories byte-for-byte, seeing as how
   # they aren't files and don't HAVE bytes to compare, so if the current
   # "file" is actually a directory, just move on to the next file:
   if ( -d _ ) {next;}
   push @{ $CurDirFiles{$Size} },
      {
         "Date" => $ModDate,
         "Time" => $ModTime,
         "Size" => $Size,
         "Attr" => $mode,
         "Name" => $FileName
      };
};
closedir($Dot);
SIZE: foreach my $Size (sort {$b<=>$a} keys %CurDirFiles)
{
   my $count = scalar(@{$CurDirFiles{$Size}});
   # If fewer than two files exist of this size, go to next size group:
   if ($count < 2) {next SIZE;}
   # If we get to here, compare each file to the files to its right:
   my $filename1;
   my $filename2;
   my $filehandle1;
   my $filehandle2;
   # OUTER LOOP: iterate through all files except last, comparing each
   # "first file" to all files to its right in the array:
   FIRST: for (my $i = 0 ; $i < ($count - 1) ; ++$i)
   {
      $filename1 = ${$CurDirFiles{$Size}}[$i]->{Name};
      open ($filehandle1, "< :raw", $filename1) or next FIRST;
      # INNER LOOP: For each "first file", compare it to each "second file"
      # which is to its right in the sort order. (This is why $j starts at
      # $i + 1 instead of 0 or 1 as you might expect.)
      SECOND: for (my $j = $i + 1 ; $j < $count ; ++$j)
      {
         $filename2 = ${$CurDirFiles{$Size}}[$j]->{Name};
         open ($filehandle2, "< :raw", $filename2) or next SECOND;
         # Rewind the first file: a previous pass through this loop may
         # have left its read position at (or near) end-of-file, which
         # would make every later comparison start in the wrong place:
         seek($filehandle1, 0, 0);
         my $different = 0;
         my $buffer1;
         my $buffer2;
         # BUFFER LOOP: Read data from first and second files, in buffers of
         # 1 MiB, comparing first buffer to second buffer. If a difference is
         # found, mark files as "different" and exit loop; else continue
         # reading the files until out of data:
         BUFFER: while (    read($filehandle1, $buffer1, 1048576)
                        and read($filehandle2, $buffer2, 1048576) )
         {
            if ( $buffer1 eq $buffer2 )
            {
               next BUFFER;
            }
            else
            {
               $different = 1;
               last BUFFER;
            }
         }
         # We're done with second file for now, so close it:
         close($filehandle2);
         # If the two files are not different, give user choices on what to do:
         if (not $different)
         {
            say("\n\n${filename1}\nis identical to\n${filename2}");
            say("Type 1 to erase ${filename1}");
            say("Type 2 to erase ${filename2}");
            say("Type 3 to ignore these duplicates");
            say("Type 4 to abort program and return to bash");
            my $char = get_character;
            given ($char)
            {
               when (1)
               {
                  close($filehandle1); # Close first file.
                  unlink($filename1);  # Erase first file.
                  next FIRST;          # Move on to next first file.
               }
               when (2)
               {
                  unlink($filename2);  # Erase second file.
                  next SECOND;         # Move on to next second file.
               }
               when (3)
               {
                  next SECOND;         # Just move on to next second file.
               }
               when (4)
               {
                  exit;                # Abort operations and exit program.
               }
               default
               {
                  ;
               }
            }#end given $char
         }#end if not $different
      }#end SECOND
      # Done with first file, so close it:
      close($filehandle1);
   }#end FIRST
}#end foreach size group
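On flaw 4, one way to get whole clusters of duplicates at once is to group files by size and then by a content digest. This is a different technique from the byte-by-byte comparison in the script above, shown only as a sketch: it uses the core Digest::MD5 module, duplicate_clusters is a made-up name, and matching digests make files probable rather than proven duplicates (a final byte-by-byte check could confirm them).

```perl
use strict;
use warnings;
use Digest::MD5;

# Return a list of array refs, each holding the names of files whose
# size and MD5 digest both match (probable duplicates).
sub duplicate_clusters {
    my (@files) = @_;
    my %by_size;
    for my $file (@files) {
        next unless -f $file;                      # skip directories etc.
        push @{ $by_size{ (stat _)[7] } }, $file;  # reuse the stat from -f
    }
    my @clusters;
    for my $group (grep { @$_ > 1 } values %by_size) {
        my %by_digest;
        for my $file (@$group) {
            open my $fh, '<:raw', $file or next;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_digest{$digest} }, $file;
        }
        push @clusters, grep { @$_ > 1 } values %by_digest;
    }
    return @clusters;
}

# Usage: perl clusters.pl file1 file2 file3 ...
if (@ARGV) {
    print join("\n  ", 'Probable duplicates:', @$_), "\n"
        for duplicate_clusters(@ARGV);
}
```

Hashing reads each file once, whereas pairwise comparison re-reads the first file for every candidate, so this may also be faster when a size group holds many files.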
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
------------------------------
Date: Wed, 03 Dec 2014 23:55:51 +0100
From: gamo <gamo@telecable.es>
Subject: Hard challenge: want to form a Perl team?
Message-Id: <m5o4di$gq4$1@speranza.aioe.org>
Look at:
https://www.kaggle.com/c/helping-santas-helpers
The problem is job shop scheduling optimization,
modified by a nonlinear objective and constraints.
We only have to beat a commercial program and a
bunch of experts. A perfect plan for Xmas holidays.
Is anyone interested?
--
http://www.telecable.es/personales/gamo/
------------------------------
Date: Wed, 03 Dec 2014 04:19:05 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: How do I compare two files byte-by-byte?
Message-Id: <aaSdnYljh8krYuPJnZ2dnUVZ57ydnZ2d@giganews.com>
On 11/29/2014 1:13 AM, Martijn Lievaart wrote (regarding buffer
sizes for comparing two files):
> On Fri, 28 Nov 2014 06:44:10 -0800, Robbie Hatley wrote:
>
> > I think I'll start by trying a 4096-byte buffer (one HD allocation
> > unit). If that's still too slow, I'll try larger.
>
> Go for larger directly. Start at 1MB.
Ok, I tried various buffer sizes, and the results were a bit surprising,
but I think I understand why.
I compared two pairs of files:
The first set was a pair of identical 20MB pdf files.
Buffer size: 1 byte
Elapsed time: 11 seconds
Buffer size: 4096 bytes
Elapsed time: 0.5 seconds
That's about a 22x speedup just by going to a bigger buffer.
The second set was a pair of identical 1.15GB (1.06GiB) avi files.
Buffer size: 4096 bytes
Elapsed time: 104 seconds
Buffer size: 1048576 bytes
Elapsed time: 11 seconds!!!
Buffer size: 8388608 bytes
Elapsed time: 24 seconds. (?!)
Buffer size: 1048576 bytes
Elapsed time: 37 seconds. (?!)
Seems that about 1MiB is optimum. Larger buffer sizes appear to fill
all "free" RAM with cached crap, leaving RAM about 1/2 "used" and
about 1/2 "standby" (full of cached crap) with only about 1% "free"
(completely unused). This makes everything slower because then the
computer is spending too much of its time swapping cached crap
out to the swapfile to make room for new data. And the problem
isn't reversible without restarting the machine, so you take a
permanent reduced-speed hit until next system restart. (This is
Windows 8.1. Other OSs may use different memory management
techniques.)
But clearly 4KiB is too small. 1MiB is about right. Thanks for
the tip on that!
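For anyone wanting to repeat this kind of measurement, here is a minimal timing sketch. It is not the code used for the numbers above; files_identical is a hypothetical helper, and the list of buffer sizes is just the ones mentioned in this thread plus 64 KiB. Whichever size is timed first pays for reading the files from disk, and later runs may be served from the OS file cache, so it is probably worth repeating and reordering the runs before drawing conclusions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Compare two files in chunks of $buffer_size bytes; true if identical.
sub files_identical {
    my ($name1, $name2, $buffer_size) = @_;
    open my $fh1, '<:raw', $name1 or die "Cannot open $name1: $!";
    open my $fh2, '<:raw', $name2 or die "Cannot open $name2: $!";
    return 0 unless (stat $fh1)[7] == (stat $fh2)[7];   # sizes differ
    while (1) {
        my $got1 = read($fh1, my $buffer1, $buffer_size);
        my $got2 = read($fh2, my $buffer2, $buffer_size);
        die "Read error: $!" unless defined $got1 && defined $got2;
        return 0 if $buffer1 ne $buffer2;
        return 1 if $got1 == 0;   # both files exhausted, no difference seen
    }
}

# Usage: perl timing.pl file1 file2
if (@ARGV == 2) {
    for my $size (4096, 65536, 1048576, 8388608) {
        my $start = time();
        my $same  = files_identical(@ARGV, $size);
        printf "%8d bytes: %.2f s (%s)\n",
            $size, time() - $start, $same ? 'identical' : 'different';
    }
}
```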
--
Cheers,
Robbie Hatley
lonewolf [at] well [dot] com
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4322
***************************************