[33064] in Perl-Users-Digest


Perl-Users Digest, Issue: 4340 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jan 1 09:09:18 2015

Date: Thu, 1 Jan 2015 06:09:02 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 1 Jan 2015     Volume: 11 Number: 4340

Today's topics:
    Re: fields separation <rweikusat@mobileactivedefense.com>
    Re: fields separation <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 31 Dec 2014 15:45:52 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: fields separation
Message-Id: <87vbkryg9b.fsf@doppelsaurus.mobileactivedefense.com>

蔡汝凯 <cairukai@huawei.com> writes:
>> > $buff=~s/<(\s*\n*\s*){1,}\*(\s*\n*\s*){1,}>/<*>/g;
>> 
>> This obviously won't work with separators other than <*>. Further, in a regex
>> 
>> \s*\n*\s*
>> 
>> only the first \s* will ever match anything because \s matches \n,
>> 
>> [rw@doppelsaurus]~#perl -e '$a = " \n "; $a =~ /(\s*)(\n*)(\s*)/;
>> printf("|%s|, |%s|, |%s|\n", $1, $2, $3);'
>> |
>>  |, ||, ||
> Hi Rainer, I do not think it's the important thing, matching the \n
> with \n or with \s. I just need to match the whole pattern and kill them.

Assuming you want to match all whitespace and all newlines,

\s*\n*\s*

is really the same as

\s*

because \n is a whitespace character: the first \s* will gobble up all
of them because it is a greedy match, and the following \n*\s* just makes
the complete regex a little more complicated without affecting what it
matches.
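A quick way to check this is to let Perl compare the two patterns directly. The sketch below uses made-up sample strings and a made-up buffer (not data from the thread). Both patterns are anchored, so each capture shows how much leading whitespace it consumes. It also compares the original substitution with a simplified s/<\s*\*\s*>/<*>/g, which, like the original, only handles the <*> separator.

```perl
use strict;
use warnings;

# Illustrative whitespace runs mixing spaces, tabs and newlines.
my @samples = (" \n ", "\n\n  \n", "\t \n\t", "");

my $all_same = 1;
for my $s (@samples) {
    # Both patterns are anchored at the start and always succeed, so the
    # captures show how much leading whitespace each one consumes.
    my ($long)  = $s =~ /^(\s*\n*\s*)/;
    my ($short) = $s =~ /^(\s*)/;
    $all_same &&= ($long eq $short);
}
my $verdict = $all_same ? "same text matched" : "patterns differ";
print "$verdict\n";

# Original substitution versus the simplified one on the same buffer.
my $buff = "one< \n * \n>two<*>three<\n*>four";
(my $orig  = $buff) =~ s/<(\s*\n*\s*){1,}\*(\s*\n*\s*){1,}>/<*>/g;
(my $clean = $buff) =~ s/<\s*\*\s*>/<*>/g;
print "$clean\n";    # one<*>two<*>three<*>four
printf "%s\n", $orig eq $clean ? "same result" : "different result";
```

Since \s already covers \n, the simplified pattern is also easier to read when the separator later needs to change.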


------------------------------

Date: Wed, 31 Dec 2014 20:11:28 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: fields separation
Message-Id: <87oaqjy3yn.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[...]


> Here's a hint why your latest still sucks except when making suitable
> assumptions about what the input data will be:
>
> ---------
> use Benchmark qw(cmpthese);
>
> my $t = 'A' x 16;
> my @a = ($t) x 512;
> push(@a, 'bb');
>
> cmpthese(-3,
> 	  {
> 	   half => sub {

[...]

> 	   },
> 	   
> 	   quarter => sub {

[...]

> 	   },
> 	   
> 	   all => sub {

[...]

> 	   }});
> ---------
>
> You'll have to interpret that yourself this time.

With apologies to everyone to whom this is self-evident, but since there
are probably people to whom it isn't:

	while (read $fh, my $rawdata, $buffsize)
	{

[...]
		my @fields = "$leftbehind$rawdata" =~ /(.*?)$sep/g;

		if (@fields)
		{
			$leftbehind = $';#'
			# call the Callback function
		}
		else
		{
			$leftbehind .= $rawdata;
		}
	}

This piece of 'applied ingenuity' has one glaring problem, namely, it
will scan the remainder again on the next iteration. This implies that
it will perform extremely badly on input which doesn't conform to the
assumption it has been 'designed' around, namely, that the average field
size is much smaller than the input block and that the regex will thus
spend most of its time finding fields instead of rescanning data it
already scanned during the previous iteration. This effect is also
cumulative, e.g., for a token whose size is n * $buffsize, the total
amount of work will be proportional to

	$buffsize + 2 * $buffsize + 3 * $buffsize + ... + n * $buffsize
        = $buffsize * (1 + 2 + 3 + ... + n)
        = $buffsize * ((n * (n + 1)) / 2)
        = $buffsize * ((n**2 + n) / 2)
        = ($buffsize / 2) * (n**2 + n)
        
i.e., the work grows with the square of the number of blocks the token spans.
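To make the quadratic cost concrete, the loop quoted above can be instrumented to count how many characters are handed to the regex. This is a sketch under assumptions that are not in the post: $sep is qr/<\*>/, the block size is a tiny 16 bytes, the input is an in-memory file holding one field that spans exactly 10 blocks, and pushing onto @found stands in for the elided callback.

```perl
use strict;
use warnings;

# Hypothetical parameters (not from the post): a small block size and one
# field spanning exactly $n blocks, with the separator in the last block.
my $buffsize = 16;
my $n        = 10;
my $sep      = qr/<\*>/;
my $input    = ('A' x ($buffsize * $n - 3)) . '<*>';

open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my $leftbehind = '';
my $scanned    = 0;    # characters handed to the regex, summed over iterations
my @found;

while (read $fh, my $rawdata, $buffsize) {
    my $chunk = "$leftbehind$rawdata";
    $scanned += length $chunk;          # the whole remainder is rescanned here
    my @fields = $chunk =~ /(.*?)$sep/g;
    if (@fields) {
        $leftbehind = $';               # text after the last separator
        push @found, @fields;           # stands in for the callback
    }
    else {
        $leftbehind .= $rawdata;
    }
}
close $fh;

my $predicted = ($buffsize / 2) * ($n**2 + $n);
print "scanned $scanned chars, formula predicts $predicted\n";
# scanned 880 chars, formula predicts 880
```

Doubling $n to 20 raises the count to 3360 (16 * 210), nearly four times as much work for twice the input.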

A Perl one-liner a la

perl -e '$x = 61; print("A" x $x, "<*>"), $x *= 3 for (1 .. $ARGV[0])'

(first parameter should be the number of tokens to generate)

could be used to create input conforming to George's
pseudo-specification, which will effectively perform an algorithmic
complexity attack on this code (the constants have been chosen such that
the token size, 61 * 3**k, is always odd and hence relatively prime to a
power-of-two block size, to prevent the algorithm from 'catching up').
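The attack input can also be rebuilt in memory to check the token lengths. The sketch assumes a block size of 4096 bytes (the post does not name one) and generates six tokens. Every length is 61 times a power of 3, so it is odd and shares no factor with any power-of-two block size.

```perl
use strict;
use warnings;

# Greatest common divisor via Euclid's algorithm.
sub gcd {
    my ($p, $q) = @_;
    ($p, $q) = ($q, $p % $q) while $q;
    return $p;
}

my $buffsize = 4096;    # hypothetical block size, not given in the post
my $tokens   = 6;       # plays the role of $ARGV[0] in the one-liner

# Rebuild the one-liner's output in memory; each token is 3 times longer
# than the previous one: 61, 183, 549, ...
my $x     = 61;
my $input = '';
my @lengths;
for (1 .. $tokens) {
    $input .= ('A' x $x) . '<*>';
    push @lengths, $x;
    $x *= 3;
}

# 61 * 3**k is odd, so its gcd with a power of two is always 1.
for my $len (@lengths) {
    printf "%6d chars = %6.2f blocks, gcd(%d, %d) = %d\n",
        $len, $len / $buffsize, $len, $buffsize, gcd($len, $buffsize);
}
printf "total input: %d chars\n", length $input;
```

Because the lengths grow geometrically, each additional token spans about three times as many blocks as the previous one, which is what drives the rescanning cost up so quickly.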

NB: This can obviously be avoided by reading the entire input into
memory and scanning it only once.
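A minimal sketch of the read-everything-and-scan-once approach, assuming the whole input fits in memory. The sample data is hypothetical, and using split with \Q...\E is just one way to do the single pass; it also works for separators other than <*>.

```perl
use strict;
use warnings;

# Hypothetical input; in practice this would be a real file handle.
my $data = "alpha<*>beta<*>gamma<*>trailing";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# Read the whole input at once: undefining $/ disables line splitting.
my $all = do { local $/; <$fh> };
close $fh;

# One pass over the data. \Q...\E quotes the separator, so any literal
# string works, not just "<*>". The -1 limit keeps a trailing empty field.
my $sep    = '<*>';
my @fields = split /\Q$sep\E/, $all, -1;
my $rest   = pop @fields;    # text after the last separator, if any

print "$_\n" for @fields;
```

The cost is memory proportional to the size of the input, in exchange for every byte being scanned exactly once.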


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4340
***************************************
