Perl-Users Digest, Issue: 3858 Volume: 11
Date: Fri, 11 Jan 2013 06:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 11 Jan 2013 Volume: 11 Number: 3858
Today's topics:
Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 11 Jan 2013 13:56:32 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <877gnjakmn.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> "C.DeRykus" <derykus@gmail.com> writes:
>
> [...]
>
>> Since speed isn't critical, the Tie::File suggestion
>> would simplify the code considerably.
>
> [...]
>
>> use Tie::File;
>>
>> tie my @array, 'Tie::File', 'd0' or die $!;
>>
>> open(my $fh, '<', 'd1') or die $!;
>> while (<$fh>) {
>>     chomp;
>>     my ($id, $value) = split /,/;
>>     $array[$id-1] = "$id,$value";
>> }
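[A note on the Tie::File approach above: Tie::File maps each line of a file to an array element, so assigning to $array[$id-1] rewrites line $id in place on disk. A minimal self-contained sketch of that behaviour, using a temporary scratch file instead of the d0/d1 files from the thread:

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a scratch file with three "id,value" lines.
my ($fh, $file) = tempfile();
print $fh "1,alpha\n2,beta\n3,gamma\n";
close $fh;

# Tie the file: each array element is one line (record separator stripped
# on read, re-added on write).
tie my @lines, 'Tie::File', $file or die $!;

$lines[1] = "2,BETA";    # rewrites the second line of the file in place

untie @lines;

# Read the file back to show the in-place edit took effect.
open(my $in, '<', $file) or die $!;
my @result = <$in>;
print @result;
```

Convenient, but note that Tie::File goes through the tie mechanism for every element access, which is part of why it was offered here only for the "speed isn't critical" case.]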
[...]
> Assuming that speed doesn't matter, a simple implementation could
> look like this
>
> sub small
> {
>     my ($fh, %chgs);
>
>     open($fh, '<', 'd1');
>     %chgs = map { split /,/ } <$fh>;
>
>     open($fh, '<', 'd0');
>     /(.*),(.*)/s, print($1, ',', $chgs{$1} // $2) while <$fh>;
> }
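[The quoted last line packs a match, a defined-or merge (// needs Perl 5.10+), and a statement modifier into one expression. An equivalent, more explicit sketch of the same technique, written here against tiny stand-in files since d0/d1 themselves are not shown; the "id,value" line format is assumed from the code above:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-ins for d0 (base data) and d1 (changes).
my ($fh0, $d0) = tempfile();
print $fh0 "1,one\n2,two\n3,three\n";
close $fh0;

my ($fh1, $d1) = tempfile();
print $fh1 "2,TWO\n";
close $fh1;

# Same idea as sub small: load the small change file into a hash,
# then stream the big file, substituting changed values as we go.
open(my $chg, '<', $d1) or die $!;
my %chgs = map { chomp; split /,/ } <$chg>;

my $out = '';
open(my $big, '<', $d0) or die $!;
while (<$big>) {
    chomp;
    my ($id, $value) = split /,/;
    $out .= join(',', $id, $chgs{$id} // $value) . "\n";
}
print $out;
```

The point of the defined-or: $chgs{$id} // $value yields the changed value when d1 supplied one and falls back to the original otherwise, so only the small file ever needs to fit in memory.]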
As an afterthought: instead of guessing at what takes the time when
executing the code above, I tested it. The 'small_hash' implementation
below (with data files constructed in the way I described in an earlier
posting) is either faster than big_hash or runs at a comparable speed
(tested with files up to 1004K in size). It can also process a 251M
file, which big_hash cannot do within a reasonable amount of time:
that first causes perl to eat all RAM available on the system where I
tested this, and then sends the machine into heavy thrashing because
'all available RAM' is, by far, not enough.
----------------
use strict;
use warnings;
use Benchmark;

open(my $out, '>', '/dev/null') or die $!;

timethese(-5,
    {
        # Load the big file d0 into a hash, apply the changes from d1,
        # then dump the merged result.
        big_hash => sub {
            my ($fh, %data, $k, $d);

            open($fh, '<', 'd0') or die $!;
            %data = map { split /,/ } <$fh>;

            open($fh, '<', 'd1') or die $!;
            while (<$fh>) {
                ($k, $d) = split /,/;
                $data{$k} = $d;
            }

            print $out ($_, ',', $data{$_}) for keys(%data);
        },

        # Load only the (small) change file d1 into a hash and stream
        # d0 through, substituting changed values on the way out.
        small_hash => sub {
            my ($fh, %chgs, $k, $d);

            open($fh, '<', 'd1') or die $!;
            %chgs = map { split /,/ } <$fh>;

            open($fh, '<', 'd0') or die $!;
            while (<$fh>) {
                ($k, $d) = split /,/;
                print $out ($k, ',', $chgs{$k} // $d);
            }
        },
    });
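[The d0/d1 input files are only described as "constructed in the way I
described in an earlier posting", which is not included here. A
hypothetical generator, assuming nothing beyond the "id,value" line
format the benchmark code relies on (the name make_files and the
value/changed prefixes are invented for illustration):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write $n base records to the d0-style file and a small number of
# randomly chosen changed records to the d1-style file.
sub make_files {
    my ($d0_name, $d1_name, $n, $changes) = @_;

    open(my $d0, '>', $d0_name) or die $!;
    print $d0 "$_,value$_\n" for 1 .. $n;
    close $d0;

    open(my $d1, '>', $d1_name) or die $!;
    for (1 .. $changes) {
        my $id = 1 + int(rand($n));
        print $d1 "$id,changed$id\n";
    }
    close $d1;
}

my $dir = tempdir(CLEANUP => 1);
make_files("$dir/d0", "$dir/d1", 1000, 10);
```

Scaling $n up is what produces the multi-hundred-megabyte d0 that makes
big_hash exhaust memory while small_hash, which only hashes d1, keeps
streaming.]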
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3858
***************************************