
Perl-Users Digest, Issue: 3858 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jan 11 09:09:17 2013

Date: Fri, 11 Jan 2013 06:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 11 Jan 2013     Volume: 11 Number: 3858

Today's topics:
    Re: best way to make a few changes in a large data file <rweikusat@mssgmbh.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 11 Jan 2013 13:56:32 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: best way to make a few changes in a large data file
Message-Id: <877gnjakmn.fsf@sapphire.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> "C.DeRykus" <derykus@gmail.com> writes:
>
> [...]
>
>> Since speed isn't critical, the Tie::File suggestion 
>> would simplify the code considerably.
>
> [...]
>
>>   use Tie::File;
>>
>>   tie my @array, 'Tie::File', 'd0' or die $!;
>>
>>   open(my $fh, '<', 'd1') or die $!;
>>   while (<$fh>) {
>>     chomp;
>>     my($id, $value) = split /,/;
>>     $array[$id-1] = "$id,$value";
>>   }

[...]

> Assuming that speed doesn't matter, a simple implementation could
> look like this
>
> sub small
> {
>     my ($fh, %chgs);
>
>     open($fh, '<', 'd1');
>     %chgs = map { split /,/ } <$fh>;
>
>     open($fh, '<', 'd0');
>     /(.*),(.*)/s, print ($1, ',', $chgs{$1} // $2) while <$fh>;
> }

As an afterthought: instead of guessing at what takes the time when
executing the code above, I tested it. The 'small_hash'
implementation below (with data files constructed in the way I
described in an earlier posting) is either faster than big_hash or
runs at a comparable speed (tested with files up to 1004K in size). It
can also process a 251M file, which big_hash can't do within a
reasonable amount of time: big_hash first causes perl to eat all RAM
available on the system where I tested this and then sends it into
heavy thrashing, because 'all available RAM' is - by far - not
enough.
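The construction of d0 and d1 was described in an earlier posting that
isn't included in this digest. For anyone who wants to reproduce the
benchmark, a rough sketch of a generator follows; the row counts and
the "value-N" format are made-up illustrations, not the original
construction:

```perl
#!/usr/bin/perl
# Sketch: generate a large base file d0 of "id,value" lines and a
# small changes file d1 with replacement lines for a random subset
# of the ids. Sizes here are assumptions, not the original test data.
use strict;
use warnings;

my $rows    = 50_000;   # lines in the base file (assumption)
my $changes = 500;      # lines in the changes file (assumption)

open(my $fh, '>', 'd0') or die $!;
print $fh "$_,value-$_\n" for 1 .. $rows;
close($fh) or die $!;

open($fh, '>', 'd1') or die $!;
print $fh int(rand($rows)) + 1, ",changed-$_\n" for 1 .. $changes;
close($fh) or die $!;
```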

----------------
use strict;
use warnings;
use Benchmark;

# Both variants write to /dev/null so that only the merge work,
# not terminal output, is being timed.
open(my $out, '>', '/dev/null') or die $!;

timethese(-5,
	  {
	   # big_hash: slurp the large file d0 into a hash, apply the
	   # changes from d1 on top, then write everything back out.
	   big_hash => sub {
	       my ($fh, %data, $k, $d);

	       open($fh, '<', 'd0') or die $!;
	       %data = map { split /,/ } <$fh>;

	       open($fh, '<', 'd1') or die $!;
	       while (<$fh>) {
		   ($k, $d) = split /,/;
		   $data{$k} = $d;
	       }

	       print $out ($_, ',', $data{$_}) for keys(%data);
	   },

	   # small_hash: slurp only the (small) changes file d1 into a
	   # hash, then stream the large file d0 line by line,
	   # substituting changed values on the way out. The defined-or
	   # operator ('//') needs perl >= 5.10.
	   small_hash => sub {
	      my ($fh, %chgs, $k, $d);

	      open($fh, '<', 'd1') or die $!;
	      %chgs = map { split /,/ } <$fh>;

	      open($fh, '<', 'd0') or die $!;
	      while (<$fh>) {
		  ($k, $d) = split /,/;
		  print $out ($k, ',', $chgs{$k} // $d);
	      }
	  }});


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3858
***************************************