[32965] in Perl-Users-Digest


Perl-Users Digest, Issue: 4241 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jun 26 05:17:17 2014

Date: Thu, 26 Jun 2014 02:17:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 26 Jun 2014     Volume: 11 Number: 4241

Today's topics:
    Re: Complex regular subexpression recursion limit <derykus@gmail.com>
    Re: Complex regular subexpression recursion limit <rweikusat@mobileactivedefense.com>
    Re: Complex regular subexpression recursion limit <*@eli.users.panix.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 24 Jun 2014 22:50:43 -0700
From: Charles DeRykus <derykus@gmail.com>
Subject: Re: Complex regular subexpression recursion limit
Message-Id: <lodo0k$8o0$2@speranza.aioe.org>

On 6/24/2014 5:48 PM, Eli the Bearded wrote:
> Maybe I'm tired, but I'm not seeing why I'm triggering this warning on
> this code with this input.
>
> #!/usr/local/bin/perl5.14.2
> use warnings;
> use strict;
>
> # flush after each print so that error appears in right context
> # when run with STDIN and STDERR mixed.
> $| ++;
>
> my $hold = <<'END_BLOCK';
> ...
>
> END_BLOCK
>
> # first eat all 'strict' sentences
> while($hold =~ s/^(
>                          (?: [^.!?]+             # non-ending characters
>                          | [.!?]+[,\w-]          # ending characters in non-ending
>                                                  # context
>                          )+
>                      [.!?]+['"\(\)\[\]{}*]*      # ending characters in ending
>                      \s                          # context
>                    )//x) {
>      my $a = $1;
>      $a =~ s/^\s+//;
>      $a =~ s/\s+$//;
>      $a =~ s/\s+/ /g;
>
>      print "$a\n";
> }
>
> # whatever is left is a loose sentence.
> $hold =~ s/^\s+//;
> $hold =~ s/\s+$//;
> $hold =~ s/\s+/ /g;
>
> print "\n\n$hold\n";
> __END__
>


IIUC, you can eliminate this warning by using the 'possessive' quantifier
mentioned in perlre:

  while($hold =~ s/^(
                         (?: [^.!?]+      # non-ending characters
                         | [.!?]+[,\w-]   # ending characters in non-ending
                                          # context
                         )++              # possessively
                     [.!?]+['"\(\)\[\]{}*]*  # ending characters in ending
                     \s                      # context
                   )//x) {
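
For reference, here is a minimal self-contained sketch of the loop with
the possessive group. The sample text is invented (the original input
was elided in the post), and the @sentences array is added only for
illustration. This short sample is not claimed to trigger the warning;
it only shows that the possessive form still splits sentences as
intended. A possessive `(?: ... )++` behaves like `(?> (?: ... )+ )`,
so the engine does not backtrack into the group once it has matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the original post did not show its text.
my $hold = "This is one sentence. Is this another? "
         . "Version 1.5 is out! trailing words";

my @sentences;    # illustrative container, not in the original script

# Eat all 'strict' sentences from the front of $hold.
while ($hold =~ s/^(
                     (?: [^.!?]+        # non-ending characters
                       | [.!?]+[,\w-]   # ending characters in non-ending context
                     )++                # possessive: group keeps what it matched
                     [.!?]+['"\(\)\[\]{}*]*   # ending characters in ending context
                     \s
                   )//x) {
    my $sentence = $1;
    $sentence =~ s/^\s+//;
    $sentence =~ s/\s+$//;
    $sentence =~ s/\s+/ /g;
    push @sentences, $sentence;
}

print "$_\n" for @sentences;

# Whatever is left is a loose sentence.
print "\nleft over: $hold\n";
```

Possessive quantifiers need Perl 5.10 or later.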

-- 
Charles DeRykus


------------------------------

Date: Wed, 25 Jun 2014 13:09:45 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Complex regular subexpression recursion limit
Message-Id: <87ionptcl2.fsf@sable.mobileactivedefense.com>

Eli the Bearded <*@eli.users.panix.com> writes:
> Maybe I'm tired, but I'm not seeing why I'm triggering this warning on
> this code with this input.

[...]

> # first eat all 'strict' sentences
> while($hold =~ s/^(
>                         (?: [^.!?]+             # non-ending characters

At least for this example, you can also remove the + after the ] on the
line above, which makes the warning go away without changing the output.

>                         | [.!?]+[,\w-]          # ending characters in non-ending
>                                                 # context
>                         )+ 
>                     [.!?]+['"\(\)\[\]{}*]*      # ending characters in ending
>                     \s                          # context
>                   )//x) {

Judging from the comments, you assume that colons don't terminate sentences
which is wrong. Apart from that, this looks terribly like an
incrementally constructed "need more ++!!.?:£$!!?: ;!!"- regex and
whatever problem you're trying to solve, the end-result will be
something you don't understand anymore (sic) which still doesn't work.
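
The suggestion above to drop the + after [^.!?] can be checked on a
small sample. The sketch below is only an illustration: the sample
text, the split_sentences helper, and the variable names are all
hypothetical, and agreement on one input does not prove the two
patterns always produce the same result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: repeatedly strip a leading sentence matching $re
# from a copy of the text and return the normalised sentences.
sub split_sentences {
    my ($text, $re) = @_;
    my @out;
    while ($text =~ s/^($re)//) {
        my $s = $1;
        $s =~ s/^\s+|\s+$//g;   # trim both ends
        $s =~ s/\s+/ /g;        # collapse internal whitespace
        push @out, $s;
    }
    return @out;
}

# Pattern as posted, with the inner + on the negated class.
my $with_plus = qr/(?: [^.!?]+ | [.!?]+[,\w-] )+
                   [.!?]+['"\(\)\[\]{}*]* \s/x;

# Same pattern with that inner + removed.
my $without_plus = qr/(?: [^.!?] | [.!?]+[,\w-] )+
                      [.!?]+['"\(\)\[\]{}*]* \s/x;

# Invented sample text.
my $text = "One. Two, really? See foo.bar for 1.5 details! leftover";

my @with    = split_sentences($text, $with_plus);
my @without = split_sentences($text, $without_plus);

print "$_\n" for @with;
print "\n", ("@with" eq "@without" ? "same output" : "different output"), "\n";
```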



------------------------------

Date: Wed, 25 Jun 2014 17:52:09 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: Complex regular subexpression recursion limit
Message-Id: <eli$1406251335@qz.little-neck.ny.us>

In comp.lang.perl.misc,
Rainer Weikusat  <rweikusat@mobileactivedefense.com> wrote:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> Maybe I'm tired, but I'm not seeing why I'm triggering this warning on
>> this code with this input.
 ...
>>                         (?: [^.!?]+             # non-ending characters
> 
> At least for this example, you can also remove the + after the ] on the
> line above, which makes the warning go away without changing the output.

Okay, thanks.

>>                     [.!?]+['"\(\)\[\]{}*]*      # ending characters in ending
>>                     \s                          # context
> 
> Judging from the comments, you assume that colons don't terminate
> sentences which is wrong.

Colons don't, as a general rule, terminate sentences. When the paragraph
ends with a colon, a dash, a semicolon, etc, I'm forced to accept it,
but not before then.

> Apart from that, this looks terribly like an
> incrementally constructed "need more ++!!.?:£$!!?: ;!!"- regex and
> whatever problem you're trying to solve, the end-result will be
> something you don't understand anymore (sic) which still doesn't work.

The code is supposed to divide input text, typically formatted as blank
line delimited paragraphs such as is common in Usenet, into one sentence
per line output. When the results are imperfect, disaster does not ensue
and incremental code tweaks are acceptable.

The sentences are being further processed in a non-semantic manner:
specifically, I'm looking for existing text that naturally anagrams to
other existing text, much like the Anagramatron does with Twitter
output:

http://anagramatron.tumblr.com/

Many are dull or just okay, but some ten percent or so are *excellent*:

http://anagramatron.tumblr.com/post/89699314855/bbylonglegs-vs-johnquinn128

	I can rest when I'm dead.
	I am drenched in sweat

http://anagramatron.tumblr.com/post/89567969260/jenkinshannah-vs-i-stayy-freesh

	everything on their menu
	I'm not even hungry either

http://anagramatron.tumblr.com/post/89465094460/dom-fuck-vs-loveeeshelly

	boredom strikes late at night.
	Let me start reading this book
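
One common way to find such pairs, sketched here as an assumption
rather than as what my script or the Anagramatron actually does, is to
reduce each sentence to a key made of its sorted letters and group
sentences that share a key. The anagram_key helper and the restriction
to ASCII letters are choices made for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lowercase the sentence, drop everything that is
# not an ASCII letter, and sort the remaining letters into a key.
sub anagram_key {
    my ($sentence) = @_;
    my $letters = lc $sentence;
    $letters =~ s/[^a-z]//g;
    return join '', sort split //, $letters;
}

# Two of the pairs quoted above, plus one invented non-matching line.
my @sentences = (
    "I can rest when I'm dead.",
    "everything on their menu",
    "I am drenched in sweat",
    "This line has no partner here.",
    "I'm not even hungry either",
);

# Group sentences by key; any group of two or more is an anagram set.
my %by_key;
push @{ $by_key{ anagram_key($_) } }, $_ for @sentences;

for my $key (sort keys %by_key) {
    my @group = @{ $by_key{$key} };
    next if @group < 2;
    print "\t$_\n" for @group;
    print "\n";
}
```

Sorting the letters costs a little per sentence, but it turns the
pairwise comparison into a single hash lookup per sentence.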

So, given the context and the use, what would you suggest instead of 
my approach, which so obviously offends you?

Elijah
------
gutenberg.org has a large corpus of input text to test with


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4241
***************************************

