[33060] in Perl-Users-Digest
Perl-Users Digest, Issue: 4336 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Dec 26 09:09:15 2014
Date: Fri, 26 Dec 2014 06:09:02 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Fri, 26 Dec 2014 Volume: 11 Number: 4336
Today's topics:
Re: fields separation <gravitalsun@hotmail.foo>
Re: fields separation <gravitalsun@hotmail.foo>
Re: fields separation <rweikusat@mobileactivedefense.com>
Re: fields separation <gravitalsun@hotmail.foo>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 25 Dec 2014 16:36:33 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: fields separation
Message-Id: <m7h7dg$nth$1@mouse.otenet.gr>
On 23/12/2014 23:11, Rainer Weikusat wrote:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> --------------------
> sub find_fields
> {
>     my ($next_block, $sep) = @_;
>     my ($in, $field, $gather, $sp, $re);
>     my @fields;
>
>     $gather = 1;
>
>     $re = substr($sep, 0, 1);
>     $re = '\\\\' if $re eq '\\';
>     $re = '^([^'.$re.'\n]+)';
>
>  OUTER:
>     while (length($in) || $next_block->($in)) {
>         if ($gather) {
>             $in =~ /$re/ and do {
>                 $field .= $1;
>
>                 substr($in, 0, length($1), '');
>                 next;
>             };
>
>             $in =~ /^(\n+)/ and do {
>                 substr($in, 0, length($1), '');
>                 next;
>             };
>
>             $sp = 0;
>             $gather = 0;
>         }
>
>         while (substr($in, 0, 1) eq substr($sep, $sp, 1)) {
>             substr($in, 0, 1, '');
>
>             if (++$sp == length($sep)) {
>                 push(@fields, $field);
>                 $field = '';
>
>                 $gather = 1;
>                 next OUTER;
>             }
>
>             next OUTER unless length($in);
>         }
>
>         $in =~ /^(\n+)/ and do {
>             substr($in, 0, length($1), '');
>             next;
>         };
>
>         $field .= substr($sep, 0, $sp);
>         $gather = 1;
>     }
>
>     $field .= substr($sep, 0, $sp) unless $gather;
>     push(@fields, $field) if length($field);
>
>     return @fields;
> }
>
Very good, it works fine and is three times faster than my version.
I use this for parsing some ancient MS-DOS files!
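
For anyone who wants to try the quoted sub: judging from the loop condition, $next_block appears to be a callback that refills its first argument in place and returns true while data remains. A minimal sketch of such a callback follows, assuming a 4-byte block size and an in-memory file handle purely for demonstration; make_block_reader is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a callback of the
# shape find_fields() appears to expect. The callback refills its first
# argument in place and returns true while there is more data.
sub make_block_reader {
    my ($fh, $block_size) = @_;
    return sub {
        my $n = read($fh, my $buf, $block_size);
        return 0 unless $n;      # end of file (0) or read error (undef)
        $_[0] = $buf;            # @_ aliases the caller's variable
        return 1;
    };
}

# Demo on an in-memory file handle so the sketch is self-contained.
my $data = "aa<*>bb\r\n<*>cc";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $next_block = make_block_reader($fh, 4);

# Pull blocks until the callback reports that the input is exhausted.
my ($in, @blocks);
push @blocks, $in while $next_block->($in);
close $fh;

print scalar(@blocks), " blocks read\n";
```

With the sub from the quoted message in scope, the call would then presumably be find_fields($next_block, '<*>').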
------------------------------
Date: Thu, 25 Dec 2014 18:07:25 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: fields separation
Message-Id: <m7hcns$q0i$1@mouse.otenet.gr>
Rainer, the following is 52% faster than your last version (it produces
the same results), and I have a feeling that it can become even faster:
sub george
{
    my $separator = '<*>';
    my $lensep    = length $separator;
    my $field;
    my $rawdata;
    open FILE, '<:raw', 'test.txt' or die $!;
    while (read FILE, $rawdata, 64)
    {
        foreach (split //, $rawdata)
        {
            next if /\v/;
            $field .= $_;
            if ($separator eq substr $field, -$lensep)
            {
                $field = substr $field, 0, -$lensep;
                #print "$field\n";
                $field = ''
            }
        }
    }
    close FILE
}
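
The sub above discards each field once it is complete (the print is commented out). For experimenting with the same per-character idea, here is a sketch of a variation that takes the file handle and separator as arguments and returns the fields. The name split_fields_per_char, the in-memory test data and the explicit length guard are assumptions of this sketch, not part of the posted code.

```perl
use strict;
use warnings;

# Variation for illustration only: same per-character scan, but reading
# from any file handle and returning the collected fields.
sub split_fields_per_char {
    my ($fh, $separator) = @_;
    my $lensep = length $separator;
    my $field  = '';
    my @fields;

    while (read $fh, my $rawdata, 64) {
        foreach my $char (split //, $rawdata) {
            next if $char =~ /\v/;               # skip vertical whitespace
            $field .= $char;
            # A field is complete once its tail equals the separator.
            if (length($field) >= $lensep
                && substr($field, -$lensep) eq $separator) {
                push @fields, substr($field, 0, -$lensep);
                $field = '';
            }
        }
    }
    push @fields, $field if length $field;       # trailing field, if any
    return @fields;
}

my $data = "aa<*>bb\n<*>cc";
open my $fh, '<:raw', \$data or die "cannot open in-memory handle: $!";
my @fields = split_fields_per_char($fh, '<*>');
close $fh;

print join('|', @fields), "\n";                  # prints "aa|bb|cc"
```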
------------------------------
Date: Thu, 25 Dec 2014 22:06:55 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: fields separation
Message-Id: <87zjabwfi8.fsf@doppelsaurus.mobileactivedefense.com>
George Mpouras <gravitalsun@hotmail.foo> writes:
> Rainer, the following is 52% faster than your last version (it produces
> the same results), and I have a feeling that it can become even faster:
[per-char processing via split(//, ...)]
This depends on the ratio of the size of the input data to the total
length of all separators. E.g., when prepending 64 capital Bs to the test
input you posted, the code which doesn't examine every character with
explicitly written Perl code becomes faster for me.
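
One way to check this kind of input dependence is the core Benchmark module. The sketch below is only a measuring harness under stated assumptions: per_char is an illustrative rewrite of the per-character idea, by_regex is a simple stand-in that lets the regex engine do the scanning (it is not the find_fields sub), and both test inputs are invented. No particular outcome is implied; the numbers will differ between machines and inputs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sep = '<*>';

# Per-character scan in the style of the posted code (illustrative rewrite).
sub per_char {
    my ($data) = @_;
    my ($field, @fields) = ('');
    foreach my $char (split //, $data) {
        next if $char =~ /\v/;
        $field .= $char;
        if (length($field) >= length($sep)
            && substr($field, -length($sep)) eq $sep) {
            push @fields, substr($field, 0, -length($sep));
            $field = '';
        }
    }
    push @fields, $field if length $field;
    return @fields;
}

# Stand-in for "let the regex engine do the scanning"; NOT Rainer's sub.
sub by_regex {
    my ($data) = @_;
    $data =~ s/\v+//g;                    # drop vertical whitespace first
    return split /\Q$sep\E/, $data;
}

# Invented inputs: short fields versus fields padded with 64 capital Bs.
my $short = "aa<*>bb\n<*>cc<*>" x 50;
my $long  = ('B' x 64 . "aa<*>bb\n<*>cc<*>") x 50;

for my $case ([short => $short], [long => $long]) {
    my ($name, $data) = @$case;
    print "--- $name fields ---\n";
    cmpthese(-1, {
        per_char => sub { my @f = per_char($data) },
        by_regex => sub { my @f = by_regex($data) },
    });
}
```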
There are also some simple modifications to make the regex-based one go
faster, namely, replace
    $in =~ /^$re/ and do {
        $field .= $1;
        substr($in, 0, length($1), '');
        next;
    };
with
    $in =~ s/^$re// and $field .= $1, next;
and
    $in =~ /^(\n+)/ and do {
        substr($in, 0, length($1), '');
        next;
    };
with
    $in =~ s/^\n+// and next;
It's still slower than yours for the posted test data, though.
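
A small self-contained illustration of the two replacement statements, using a made-up input and a made-up $re: the comma binds tighter than "and", so the append and the "next" both run only when the substitution succeeded, and $1 is still set after s///.

```perl
use strict;
use warnings;

# Made-up pattern and input for the demonstration; '<' stands in for
# the first character of the separator.
my $re    = qr/([^<\n]+)/;
my $in    = "abc\n\ndef<*>";
my $field = '';

while (length $in) {
    # Strip a run of field characters and append it to the field.
    $in =~ s/^$re// and $field .= $1, next;
    # Strip a run of newlines.
    $in =~ s/^\n+// and next;
    # Neither matched: a possible separator starts here.
    last;
}

print "field=$field rest=$in\n";   # prints "field=abcdef rest=<*>"
```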
------------------------------
Date: Fri, 26 Dec 2014 00:13:52 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: fields separation
Message-Id: <m7i26u$2s2$1@mouse.otenet.gr>
On 26/12/2014 00:06, Rainer Weikusat wrote:
> George Mpouras <gravitalsun@hotmail.foo> writes:
>> Rainer, the following is 52% faster than your last version (it produces
>> the same results), and I have a feeling that it can become even faster:
>
> [per-char processing via split(//, ...)]
>
> It's still slower than yours for the posted test data, though.
>
Yes, it depends on the data. My split // is slow.
A compiled qr/ ... / regex will help your code a little, but I am
thinking of a completely different approach using regexes and states.
I will think about it tomorrow.
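
For reference, a sketch of what precompiling the pattern with qr// could look like. field_run_re is a hypothetical helper name; using quotemeta instead of special-casing the backslash is a choice made for this sketch, not something taken from the posted sub.

```perl
use strict;
use warnings;

# Hypothetical helper: compile the "run of field characters" pattern once.
# quotemeta() escapes the separator's first character so that it is safe
# inside the character class (this covers '\', ']' and '^' alike).
sub field_run_re {
    my ($sep) = @_;
    my $first = quotemeta substr($sep, 0, 1);
    return qr/^([^$first\n]+)/;
}

my $re = field_run_re('<*>');
my $in = "abc<*>def";

if ($in =~ $re) {
    print "leading run: $1\n";     # prints "leading run: abc"
}
```

Since Perl already reuses a compiled pattern when the interpolated string has not changed, the gain from this is likely to be small.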
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4336
***************************************