[32660] in Perl-Users-Digest
Perl-Users Digest, Issue: 3936 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Apr 29 09:09:38 2013
Date: Mon, 29 Apr 2013 06:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 29 Apr 2013 Volume: 11 Number: 3936
Today's topics:
Re: nice parallel file reading <rweikusat@mssgmbh.com>
Re: nice parallel file reading <rweikusat@mssgmbh.com>
Re: nice parallel file reading <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
Re: nice parallel file reading <rweikusat@mssgmbh.com>
Re: nice parallel file reading <rweikusat@mssgmbh.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sat, 27 Apr 2013 19:25:15 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: nice parallel file reading
Message-Id: <87txmrn850.fsf@sapphire.mobileactivedefense.com>
"George Mpouras"
<nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
writes:
> # there was a problem with the code in my initial post
> # Here is the corrected version, showing how to read files round-robin
> # using an iterator
[...]
> sub Read_files_round_robin
> {
>     my @FH;
>     for (my $i=$#_; $i>=0; $i--) { if (open my $fh, $_[$i]) {push @FH, $fh} }
>     my $k = $#FH;
>
>     sub
>     {
>         until (0 == @FH)
>         {
>             for (my $i=$k--; $i>=0; $i--)
>             {
>                 $k = $#FH if $k == -1;
>
>                 if ( eof $FH[$i] )
>                 {
>                     close $FH[$i];
>                     splice @FH, $i, 1;
>                     $k--
>                 }
>                 else
>                 {
>                     return readline $FH[$i]
>                 }
>             }
>         }
>
>         '__ALL_FILES_HAVE_BEEN_READ__'
>     }
> }
Fun ways to waste your time:
----------------------
#!/usr/bin/perl
use strict;

my $Reader = Read_files_round_robin('file1.txt', 'wuzz', 'file2.txt', 'file3.txt');

while (my $line = $Reader->()) {
    chomp $line;
    print "*$line*\n";
}

sub Read_files_round_robin
{
    my (@F, $cur);

    open($F[0][@{$F[0]}], '<', $_) // --$#{$F[0]}
        for @_;

    return sub {
        my ($fh, $l);

        do {
            $fh = shift(@{$F[$cur]}) or return
        } until defined($l = <$fh>);

        push(@{$F[$cur ^ 1]}, $fh);
        $cur ^= 1 unless @{$F[$cur]};

        return $l;
    };
}
------------------------------
Date: Sat, 27 Apr 2013 22:34:12 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: nice parallel file reading
Message-Id: <87ehdv7j57.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
[...]
> sub Read_files_round_robin
> {
>     my (@F, $cur);
>
>     open($F[0][@{$F[0]}], '<', $_) // --$#{$F[0]}
>         for @_;
>
>     return sub {
>         my ($fh, $l);
>
>         do {
>             $fh = shift(@{$F[$cur]}) or return
>         } until defined($l = <$fh>);
>
>         push(@{$F[$cur ^ 1]}, $fh);
>         $cur ^= 1 unless @{$F[$cur]};
>
>         return $l;
>     };
> }
While this is fairly neat, it is unfortunately broken: the 'current'
array can run out of usable file handles while a usable handle still
sits in the 'next' array (e.g., when the first file contains the most
lines of text). This means the 'current' array needs to be switched
exactly once in this case, which in turn makes the control flow rather
ugly :-( (I tried a few variants but didn't find one I would want to
post).
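To make the failure mode concrete, here is a scratch-file reproduction (my own test harness wrapped around the sub quoted above, which is otherwise copied verbatim; the file contents are invented, and $cur is initialised to 0 -- which is what the undef default already behaves as -- purely to keep warnings quiet):

```perl
#!/usr/bin/perl
# Minimal reproduction: three files, the first one longest, fed through
# the two-array iterator from the posting above.
use strict;
use warnings;
use File::Temp qw(tempfile);

sub Read_files_round_robin {
    my (@F, $cur);
    $cur = 0;    # the posting leaves $cur undef, which behaves the same
    open($F[0][@{$F[0]}], '<', $_) // --$#{$F[0]} for @_;
    return sub {
        my ($fh, $l);
        do {
            $fh = shift(@{$F[$cur]}) or return
        } until defined($l = <$fh>);
        push(@{$F[$cur ^ 1]}, $fh);
        $cur ^= 1 unless @{$F[$cur]};
        return $l;
    };
}

# three scratch files: the first has 3 lines, the others 1 each
my @names;
for my $content ("a1\na2\na3\n", "b1\n", "c1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @names, $name;
}

my $it = Read_files_round_robin(@names);
my @got;
while (defined(my $l = $it->())) { push @got, $l }
# only 4 of the 5 lines come back; "a3\n" is stranded in the other array
print scalar(@got), " lines read\n";
```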
------------------------------
Date: Sun, 28 Apr 2013 02:31:27 +0300
From: "George Mpouras" <nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
Subject: Re: nice parallel file reading
Message-Id: <klhn4i$1k1k$1@news.ntua.gr>
push(@{$F[$cur ^ 1]}, $fh);
impressive, I have to study this!!
------------------------------
Date: Sun, 28 Apr 2013 13:20:40 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: nice parallel file reading
Message-Id: <87sj2alucn.fsf@sapphire.mobileactivedefense.com>
"George Mpouras"
<nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam>
writes:
> push(@{$F[$cur ^ 1]}, $fh);
>
> impressive,
Not really. The idea of using two arrays cannot work in this way, as I
already wrote in another posting. But it is still possible to do away
with the counting loops (which are, IMHO, 'rather ugly'; IOW, I never
use for (;;) for anything):
-----------------
sub Read_files_round_robin
{
    my (@FH, $cur);

    open($FH[@FH], '<', $_) // --$#FH
        for @_;

    $cur = -1;

    return sub {
        my $l;

        return unless @FH;

        $cur = ($cur + 1) % @FH;
        $cur == @FH and --$cur
            until ($l = readline($FH[$cur])) // (splice(@FH, $cur, 1), !@FH);

        return $l;
    };
}
------------------
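For what it's worth, feeding this version the same kind of scratch files (a quick harness of my own around the sub above, copied verbatim; file contents invented) shows every line coming back in round-robin order:

```perl
#!/usr/bin/perl
# Exercises the single-array iterator against three scratch files.
use strict;
use warnings;
use File::Temp qw(tempfile);

sub Read_files_round_robin {
    my (@FH, $cur);
    open($FH[@FH], '<', $_) // --$#FH for @_;
    $cur = -1;
    return sub {
        my $l;
        return unless @FH;
        $cur = ($cur + 1) % @FH;
        $cur == @FH and --$cur
            until ($l = readline($FH[$cur])) // (splice(@FH, $cur, 1), !@FH);
        return $l;
    };
}

my @names;
for my $content ("a1\na2\na3\n", "b1\n", "c1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @names, $name;
}

my $Reader = Read_files_round_robin(@names);
my @lines;
while (defined(my $l = $Reader->())) { chomp $l; push @lines, $l }
print "@lines\n";    # a1 b1 c1 a2 a3 -- nothing lost this time
```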
It is possible to replace the
$cur == @FH and --$cur
with
$cur -= $cur == @FH
This would be a good idea in C because it would avoid a branch in favor
of an arithmetic no-op. I don't really know whether the same holds
for Perl, and I'm unsure which of the two should be preferred for
clarity.
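A throwaway equivalence check (my own, not from the thread) that the two forms compute the same $cur for every relevant starting value; note that a false comparison in Perl is numerically 0, so the subtraction never warns:

```perl
#!/usr/bin/perl
# Confirms the 'and'-form and the branchless form agree.
use strict;
use warnings;

my @FH = (1, 2, 3);    # dummy handles; only the array length matters
for my $start (0 .. 4) {
    my $a = $start; $a == @FH and --$a;
    my $b = $start; $b -= $b == @FH;    # comparison yields 1 or a dualvar 0
    die "forms disagree at $start\n" unless $a == $b;
}
print "forms agree\n";
```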
------------------------------
Date: Sun, 28 Apr 2013 22:04:20 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: nice parallel file reading
Message-Id: <877gjmjrjf.fsf@sapphire.mobileactivedefense.com>
"Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
> On 2013-04-27 14:49, Jürgen Exner <jurgenex@hotmail.com> wrote:
>> "George Mpouras"
>><nospam.gravitalsun.antispam@spamno.hotmail.anispam.com.nospam> wrote:
>>># there was a problem with the code at my initial post
>>># Here is corrected, of how to read files like round-robin
>>># using an iterator
>>
>> While this might be mildly interesting as an academic exercise I wonder
>> if there is any actual non-contrived application where you would have to
>> read multiple files synchronously line-by-line and at the same time the
>> files are too large to just load them into a variable and then process
>> their content.
>
> Not exactly like George's code, but very similar: Merge sorted files.
>
> A similar technique could be used to implement comm(1).
There's also the paste(1) utility, which does round-robin merging of
lines from several input files. It would need different EOF handling,
though: it would have to return an empty line whenever it is supposed
to read from a file that has run out of data.
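Sketching that variant (my own illustration, not code from the thread; the name Read_files_paste_style and the overall structure are invented): a file that has run dry contributes an empty line until the round in which every file is exhausted.

```perl
#!/usr/bin/perl
# paste(1)-flavoured round-robin reader: exhausted files pad with "\n".
use strict;
use warnings;
use File::Temp qw(tempfile);

sub Read_files_paste_style {
    my @FH = map { open(my $fh, '<', $_) ? $fh : undef } @_;
    my ($cur, $done) = (-1, 0);
    return sub {
        return if $done || !@FH;
        $cur = ($cur + 1) % @FH;
        # at the start of each round, stop if no handle has data left
        $done = !grep { defined($_) && !eof($_) } @FH if $cur == 0;
        return if $done;
        my $l = defined($FH[$cur]) ? readline($FH[$cur]) : undef;
        return $l // "\n";    # spent (or unopenable) file: empty line
    };
}

# two scratch files of unequal length
my @names;
for my $content ("a1\na2\n", "b1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @names, $name;
}

my $it = Read_files_paste_style(@names);
my @out;
while (defined(my $l = $it->())) { push @out, $l }
print join('', @out);    # a1, b1, a2, then an empty line for the short file
```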
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3936
***************************************