[33150] in Perl-Users-Digest

Perl-Users Digest, Issue: 4429 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun May 10 18:09:22 2015

Date: Sun, 10 May 2015 15:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 10 May 2015     Volume: 11 Number: 4429

Today's topics:
    Re: custom 'multigrep' <gamo@telecable.es>
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
    Re: custom 'multigrep' sharma__r@hotmail.com
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
    Re: custom 'multigrep' <fillmore_remove@hotmail.com>
    Re: custom 'multigrep' <rweikusat@mobileactivedefense.com>
        wordx....not_wordx...wordy   pattern matching. deangwilliam30@gmail.com
    Re: wordx....not_wordx...wordy   pattern matching. <derykus@gmail.com>
    Re: wordx....not_wordx...wordy   pattern matching. deangwilliam30@gmail.com
    Re: wordx....not_wordx...wordy   pattern matching. <derykus@gmail.com>
    Re: wordx....not_wordx...wordy   pattern matching. <rweikusat@mobileactivedefense.com>
    Re: wordx....not_wordx...wordy   pattern matching. <derykus@gmail.com>
    Re: wordx....not_wordx...wordy   pattern matching. <derykus@gmail.com>
    Re: wordx....not_wordx...wordy   pattern matching. <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 08 May 2015 09:28:57 +0200
From: gamo <gamo@telecable.es>
Subject: Re: custom 'multigrep'
Message-Id: <mihojp$n4j$1@speranza.aioe.org>

On 07/05/15 at 16:44, Rainer Weikusat wrote:
> Even something using an explicit file argument and an explicit counting
> loop can be implemented in a simpler way making different design
> decisions, i.e., use $_ for the second loop instead of the first, and
> doing things in a less roundabout way[*]:
>
> ----
> my ($file, $line);
>
> open($file, '<', shift) or die;
> while ($line = <$file>) {
>      my $oks;
>
>      $oks += $line =~ /$_/ for @ARGV;
>
>      print $line if $oks == @ARGV;
>      print "NONE: $line" unless $oks;
> }
> ----

Okay, though inserting an 'elsif' would be better.

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Fri, 08 May 2015 16:23:33 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <87a8xf2ive.fsf@doppelsaurus.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> On 07/05/15 at 16:44, Rainer Weikusat wrote:
>> Even something using an explicit file argument and an explicit counting
>> loop can be implemented in a simpler way making different design
>> decisions, i.e., use $_ for the second loop instead of the first, and
>> doing things in a less roundabout way[*]:
>>
>> ----
>> my ($file, $line);
>>
>> open($file, '<', shift) or die;
>> while ($line = <$file>) {
>>      my $oks;
>>
>>      $oks += $line =~ /$_/ for @ARGV;
>>
>>      print $line if $oks == @ARGV;
>>      print "NONE: $line" unless $oks;
>> }
>> ----
>
> Okay, though inserting an 'elsif' would be better.

I usually prefer using loop control over 'indenting constructs' if there
isn't really anything behind the construct. In this case, this would be
(untested)

print($line), next if $oks == @ARGV;
print "NONE: $line" unless $oks;

"And now for something completely different": It's actually not that
difficult to implement the original requirements in a reasonable way
('reasonable' here supposed to mean: push most of the work into the
regex engine), including the later demand for 'not matches'. The program
below uses - as prefix for 'not' as ! will trigger history expansion on
a (UNIX(*)) shell supporting it. It is assumed that individual
regex-arguments don't contain alternatives (|).

---------
my (@love, %love, $love, @hate, $hate);

for (@ARGV) {
    push(@hate, substr($_, 1)), next if /^-/;
	
    push(@love, $_);
    $love{$_} = 1;
}

$love = join('|', sort { length($b) <=> length($a) } @love);
$hate = join('|', @hate);

while (<STDIN>) {
    my %wanted = %love;

    delete($wanted{$1}) while /($love)/g;
    next if %wanted || ($hate && /$hate/);

    print;
}
---------


------------------------------

Date: Fri, 08 May 2015 23:01:07 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <871tiqn2zg.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> gamo <gamo@telecable.es> writes:

[...]

> ---------
> my (@love, %love, $love, @hate, $hate);
>
> for (@ARGV) {
>     push(@hate, substr($_, 1)), next if /^-/;
> 	
>     push(@love, $_);
>     $love{$_} = 1;
> }
>
> $love = join('|', sort { length($b) <=> length($a) } @love);

The @love could be omitted by using keys(%love) instead.

> $hate = join('|', @hate);
>
> while (<STDIN>) {
>     my %wanted = %love;
>
>     delete($wanted{$1}) while /($love)/g;

And this is actually wrong (or can be regarded as wrong) because the
condition won't be satisfied if the tail of a longer match is a prefix
of a shorter one and both would need to match 'in the same place', eg,
assuming the script was stored in a file a.pl,

echo fetchmail | perl a.pl fetchm mail

prints nothing while a loop trying to match both sequentially would find
them both.
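
The failure can also be reproduced without a shell. What follows is a minimal sketch, not code from the post: missing_keys and missing_keys_sequential are hypothetical helper names, and the first one merely wraps the alternation loop shown above so that the keys it never reported can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): runs the
# "delete($wanted{$1}) while /($love)/g" loop on a single line and
# returns the keys that were never reported as matched.
sub missing_keys {
    my ($line, @keys) = @_;
    my %wanted = map { $_ => 1 } @keys;
    my $love = join('|', sort { length($b) <=> length($a) } @keys);

    delete($wanted{$1}) while $line =~ /($love)/g;
    return sort keys %wanted;
}

# For comparison: one independent match per key.
sub missing_keys_sequential {
    my ($line, @keys) = @_;
    return grep { $line !~ /$_/ } @keys;
}

my @joined     = missing_keys('fetchmail', 'fetchm', 'mail');
my @sequential = missing_keys_sequential('fetchmail', 'fetchm', 'mail');

print "joined alternation misses: @joined\n";       # misses: mail
print "sequential matching misses: @sequential\n";  # misses nothing
```

The joined pattern consumes 'fetchm' first and resumes scanning at 'ail', so 'mail' is never seen.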





------------------------------

Date: Fri, 08 May 2015 23:29:17 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <87wq0iln42.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:

[multi-key matching]


>>     delete($wanted{$1}) while /($love)/g;
>
> And this is actually wrong (or can be regarded as wrong) because the
> condition won't be satisfied if the tail of a longer match is a prefix
> of a shorter one and both would need to match 'in the same place',

Ugly workaround:

----
my (%love, $love, @hate, $hate);

for (@ARGV) {
    push(@hate, substr($_, 1)), next if /^-/;
    $love{$_} = 1 if length();
}

$love = join('|', sort { length($b) <=> length($a) } keys(%love));
$hate = join('|', @hate);

while (<STDIN>) {
    my %wanted = %love;

    while (/($love)/g) {
	delete($wanted{$1});
	pos() -= length($1) - 1;
    }
    
    next if %wanted || ($hate && /$hate/);

    print;
}
----

An 'optimizing' implementation would advance the position by the length
of the longest prefix of the current match which neither contains a
later match nor has a suffix which is a prefix of one.

"This is left as an exercise for the reader" :-)


------------------------------

Date: Fri, 08 May 2015 23:32:06 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <87sib6lmzd.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>
> [multi-key matching]
>
>
>>>     delete($wanted{$1}) while /($love)/g;
>>
>> And this is actually wrong (or can be regarded as wrong) because the
>> condition won't be satisfied if the tail of a longer match is a prefix
>> of a shorter one and both would need to match 'in the same place',
>
> Ugly workaround:

[...]

>     while (/($love)/g) {
> 	delete($wanted{$1});
> 	pos() -= length($1) - 1;
>     }

Also broken in case the current match has a prefix which is a later
match ... :-(
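
A different route, sketched here as an assumption rather than as anything posted in this thread, avoids the shared scan position altogether: each key gets its own lookahead anchored at the start of the line, so overlapping or nested keys cannot hide each other. build_filter is a hypothetical name, and keys are still treated as regexes with - as the 'not' prefix.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): builds a single regex in
# which every wanted key is a positive lookahead and every "-key"
# argument is a negative lookahead. Each lookahead scans the line
# independently from the start.
sub build_filter {
    my @args = @_;
    my $re = '^';

    for my $arg (@args) {
        if ($arg =~ /^-(.+)/s) {
            $re .= "(?!.*(?:$1))";      # line must not contain this key
        } elsif (length $arg) {
            $re .= "(?=.*(?:$arg))";    # line must contain this key
        }
    }
    return qr/$re/;
}

my $filter = build_filter('fetchm', 'mail', '-spam');
for my $line ("fetchmail\n", "fetchmail spam\n", "fetch mail\n") {
    print $line if $line =~ $filter;    # prints only "fetchmail"
}
```

Whether this is faster or slower than a plain loop over the keys would have to be measured; the point of the sketch is only that it stays correct for overlapping keys.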


------------------------------

Date: Fri, 8 May 2015 20:50:05 -0700 (PDT)
From: sharma__r@hotmail.com
Subject: Re: custom 'multigrep'
Message-Id: <a6e4d854-5ed1-4f95-8272-8b6c466ac21f@googlegroups.com>

On Saturday, 9 May 2015 04:02:11 UTC+5:30, Rainer Weikusat  wrote:
> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> > Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> >> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> >
> > [multi-key matching]
> >
> >
> >>>     delete($wanted{$1}) while /($love)/g;
> >>
> >> And this is actually wrong (or can be regarded as wrong) because the
> >> condition won't be satisfied if the tail of a longer match is a prefix
> >> of a shorter one and both would need to match 'in the same place',
> >
> > Ugly workaround:
> 
> [...]
> 
> >     while (/($love)/g) {
> > 	delete($wanted{$1});
> > 	pos() -= length($1) - 1;
> >     }
> 
> Also broken in case the current match has a prefix which is a later
> match ... :-(

The List::MoreUtils module is very helpful in this case as well,
where the "pairwise" function is used to compute the `effective match'
of a line given the patterns "/key1/", "/key2/", ..., "!/key3/" ...
@ARGs ---> array storing args 1 to last, i.e., key1, key2, ..., -key3, ... keyN
@REs ---> array storing matching patterns, i.e., key1, key2, ..., key3, ..., keyN
@MASK ---> array storing the sense of the matching pattern corresponding to each key.
Wherever it holds a "1", the sense of the regular expression needs to be inverted.

perl -Mstrict -Mwarnings -MList::MoreUtils="all,pairwise" -lne '
BEGIN{
   my(@ARGs, @REs, @MASK) = splice @ARGV, 1, $#ARGV;
   while(@ARGs && length($a = shift @ARGs)) {
      push @MASK, $a =~ s/^-//;
      push @REs, $a;
   }
   sub match_res {
     my($L) = @_ ? @_ : $_;
     pairwise {
        local $_ = $L;
        0 + ($a ? !/\Q$b\E/ : /\Q$b\E/)
     } @MASK, @REs;
   }
}
print if all { $_ } match_res();
' input_file key1 key2 -key3 ... keyN


------------------------------

Date: Sat, 09 May 2015 12:20:50 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <87twvmt2st.fsf@doppelsaurus.mobileactivedefense.com>

sharma__r@hotmail.com writes:
> On Saturday, 9 May 2015 04:02:11 UTC+5:30, Rainer Weikusat  wrote:
>> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>> > Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>> >> Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
>> >
>> > [multi-key matching]
>> >
>> >
>> >>>     delete($wanted{$1}) while /($love)/g;
>> >>
>> >> And this is actually wrong (or can be regarded as wrong) because the
>> >> condition won't be satisfied if the tail of a longer match is a prefix
>> >> of a shorter one and both would need to match 'in the same place',
>> >
>> > Ugly workaround:
>> 
>> [...]
>> 
>> >     while (/($love)/g) {
>> > 	delete($wanted{$1});
>> > 	pos() -= length($1) - 1;
>> >     }
>> 
>> Also broken in case the current match has a prefix which is a later
>> match ... :-(
>
> The List::MoreUtils module is very helpful in this case as well,

[code example]

Most of the 'mechanical' parts of algorithms based on simple loops can be
abstracted out into more-or-less general subroutines but the equivalent
Perl code is usually by itself simple enough that this is not really
worth the effort as your code demonstrated nicely: It's not shorter than
an equivalent algorithm explicitly looping over the 'positive' matches,
just different.


------------------------------

Date: Sat, 09 May 2015 09:26:54 -0400
From: Fillmore <fillmore_remove@hotmail.com>
Subject: Re: custom 'multigrep'
Message-Id: <mil1ut$781$1@speranza.aioe.org>

On 05/03/2015 08:13 PM, Fillmore wrote:
> Hi,
>
> here is what I am aiming for:
>
> #./mymultigrep.pl <file> key1 key2 ....keyN
>
> where the number of keys is up to the user to decide.
>
> <file> would be a text file of ASCII lines
>
>
> What I need is that each line gets selected iff:
>
> $line =~ /$key1/ && $line =~ /$key2/ && $line =~ /$keyN/
>
> the result would be similar to shell commands
>
> #grep key1 <file> | grep key2| .... | grep keyN
>
>
> I know how to implement this easily with a fixed number of Keys, but the
> varying number of keys is confusing me.
>
> The next step is to allow the '!keyN' syntax to allow users to negate
> the presence of a substring/token/regexp (think grep -v)
>
> Is there a RegExp way to match a string that does not match a token? hmmmm

for the record, here is how I fixed it:

  if ( checkConditionList($ua, @keys) ) {
          :
  }

   :

sub checkConditionList {

     my $uaString = shift @_;
     my @conditions = @_;

     #no conditions => condition is verified
     return 1 unless @conditions;

     foreach my $i (0..$#conditions) {

	my $searchKey = $conditions[$i];
	if (length($searchKey) < 3) {
	    die ("Invalid SearchKey: $searchKey");
	}

	#implement mini-Expr Language to negate key
	if (substr($searchKey, 0, 1) eq "!") {
	    #remove initial "!"
	    $searchKey =~ s/^!//;
		
	    if ($uaString =~ /$searchKey/) {
		return 0;
	    }
	} else {
	
	    if ($uaString !~ /$searchKey/) {
		return 0;
	    }
	}	

     }

     #if no condition has failed, it means they are all verified
     return 1;
}

I know, I know....you guys did this at the age of 3...


------------------------------

Date: Sat, 09 May 2015 17:08:27 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: custom 'multigrep'
Message-Id: <87vbg13f9g.fsf@doppelsaurus.mobileactivedefense.com>

Fillmore <fillmore_remove@hotmail.com> writes:
> On 05/03/2015 08:13 PM, Fillmore wrote:
>> Hi,
>>
>> here is what I am aiming for:
>>
>> #./mymultigrep.pl <file> key1 key2 ....keyN
>>
>> where the number of keys is up to the user to decide.
>>
>> <file> would be a text file of ASCII lines

[...]

> for the record, here is how I fixed it:
>
>  if ( checkConditionList($ua, @keys) ) {
>          :
>  }
>
>   :
>
> sub checkConditionList {
>
>     my $uaString = shift @_;
>     my @conditions = @_;
>
>     #no conditions => condition is verified
>     return 1 unless @conditions;

Considering that the following loop won't run if there are no
conditions, this check can be omitted.

>
>     foreach my $i (0..$#conditions) {
>
> 	my $searchKey = $conditions[$i];
> 	if (length($searchKey) < 3) {
> 	    die ("Invalid SearchKey: $searchKey");
> 	}

Instead of looping over the numbers from 0 to $#conditions and using the
current number to select a condition, the loop can just use @conditions
as input list.
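
Taken together with the earlier remark about the empty-list check, the loop might be sketched as follows. This is an illustration, not the poster's code: check_condition_list, $ua_string and $search_key are names chosen for the sketch, while the length check and the "!" prefix are kept as in the quoted version.

```perl
use strict;
use warnings;

# Sketch of the same checks, iterating over the conditions directly.
# An empty condition list skips the loop and falls through to "return 1".
sub check_condition_list {
    my ($ua_string, @conditions) = @_;

    for my $search_key (@conditions) {
        die "Invalid SearchKey: $search_key" if length($search_key) < 3;

        if ($search_key =~ /^!(.*)/s) {
            my $negated = $1;           # key without the leading "!"
            return 0 if $ua_string =~ /$negated/;
        } else {
            return 0 if $ua_string !~ /$search_key/;
        }
    }

    # no condition has failed, so all of them are verified
    return 1;
}

print check_condition_list('Mozilla/5.0 Firefox', 'Mozilla', '!Chrome'), "\n";
```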

>
> 	#implement mini-Expr Language to negate key
> 	if (substr($searchKey, 0, 1) eq "!") {
> 	    #remove initial "!"
> 	    $searchKey =~ s/^!//;
> 		
> 	    if ($uaString =~ /$searchKey/) {
> 		return 0;
> 	    }

[...]

Another consideration: If you want to match against the same set of keys
repeatedly, it's also a good idea to do the key-interpretation step only
once and just test the matches in the loop. Also, while the idea of
using a |-joined expression to find 'all the matches' can't (easily) be
used for the positive matches, it can be used for the negative ones since
it doesn't matter which of the negated patterns matched. If this is
supposed to be part of a large program, a subroutine returning a
'matcher' closure could be used instead of coding it inline.

-----
sub make_matcher
{
    my (@want, @dont, $dont);

    for (@_) {
	push(@dont, substr($_, 1)), next if /^!/;
	push(@want,  $_);
    }

    $dont = join('|', @dont);

    return sub {
	$_[0] =~ /$_/ or return for @want;
	return if $dont && $_[0] =~ /$dont/;

	return 1;
    };
}

my $matcher = make_matcher(@ARGV);
$matcher->($_) and print while <STDIN>;
-----

I've tested this with a 31M input file (which is very small for a
'real-world log file') and two keys of either kind. The
checkConditionList subroutine needed a little more than 1s to process
that, the other around 0.3s (I've also tested with qr//, still turned
out to be slower).
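
For reference, a qr// variant of make_matcher might look like the sketch below. This is an assumption about what such a variant could be, not the code that was timed, and make_matcher_qr is a hypothetical name. It makes no claim about speed; that would have to be measured on real input.

```perl
use strict;
use warnings;

# Hypothetical qr// variant of make_matcher: every pattern is compiled
# once up front instead of being interpolated on each call.
sub make_matcher_qr {
    my (@want, @dont, $dont);

    for (@_) {
        push(@dont, substr($_, 1)), next if /^!/;
        push(@want, qr/$_/);
    }

    if (@dont) {
        my $alternation = join('|', @dont);
        $dont = qr/$alternation/;
    }

    return sub {
        $_[0] =~ $_ or return for @want;
        return if $dont && $_[0] =~ $dont;

        return 1;
    };
}

my $matcher = make_matcher_qr('foo', 'bar', '!baz');
for my $line ("foo bar\n", "foo only\n", "foo bar baz\n") {
    print $line if $matcher->($line);   # prints only "foo bar"
}
```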


------------------------------

Date: Fri, 8 May 2015 01:07:15 -0700 (PDT)
From: deangwilliam30@gmail.com
Subject: wordx....not_wordx...wordy   pattern matching.
Message-Id: <aad31f80-f4d1-49da-a43a-f22626c87d52@googlegroups.com>

I've written a program in Powerbasic that matches the closest/least greedy "procedure/function" and "begin" pairs (in a Delphi program) so that I can insert a trace statement immediately after the first "begin" in procedures and functions.

I've looked at Perl and Tcl and see that there are things called negative lookaheads (that Powerbasic doesn't have) and can't help feeling that these might make the matching trivial but so far all of my efforts have failed.

What I'd like to do is match "proc2_edf_begin" in "proc1_abc_proc2_edf_begin_ghi_begin_end".

Any advice much appreciated.

BTW I'm not aware of the 2 in proc2 ahead of time...I'm just looking for the tightest procedure/function___begin couplings i.e. there is the odd stray procedure/function and there are potentially lots of begins in each proc/fn.


------------------------------

Date: Fri, 8 May 2015 14:39:13 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <f95ab324-70a7-4998-983a-1022bdcc735e@googlegroups.com>

On Friday, May 8, 2015 at 1:07:20 AM UTC-7, deangwi...@gmail.com wrote:
> I've written a program in Powerbasic that matches the closest/least greedy "procedure/function" and "begin" pairs (in a Delphi program) so that I can insert a trace statement immediately after the first "begin" in procedures and functions.
> 
> I've looked at Perl and Tcl and see that there are things called negative lookaheads (that Powerbasic doesn't have) and can't help feeling that these might make the matching trivial but so far all of my efforts have failed.
> 
> What I'd like to do is match "proc2_edf_begin" in "proc1_abc_proc2_edf_begin_ghi_begin_end".
> 
> Any advice much appreciated.
> 
> BTW I'm not aware of the 2 in proc2 ahead of time...I'm just looking for the tightest procedure/function___begin couplings i.e. there is the odd stray procedure/function and there are potentially lots of begins in each proc/fn.

IIUC:

$num = ...

$str = "proc1_abc_proc2_edf_begin_ghi_begin_end"; 

while ( $str =~ /proc${num}_edf_begin/g ) {
    print "match starts: $-[0] and ends: ",$+[0]-1;
}

-- 
Charles DeRykus


------------------------------

Date: Sat, 9 May 2015 02:11:56 -0700 (PDT)
From: deangwilliam30@gmail.com
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <94880ba7-99ef-4e60-a537-cd18c0e3f9b6@googlegroups.com>

Thank you very much for responding and sorry if I've misled you re my question.
Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.

All I want to do is make sure that there are no "proc" (or "begin") words hiding in the middle of the returned substring, i.e. the one that starts with "proc" and ends with "begin".

I hope this clarifies my question.


------------------------------

Date: Sat, 9 May 2015 23:43:08 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <d9429edf-8e05-46c7-ac0d-aca6177a930b@googlegroups.com>

On Saturday, May 9, 2015 at 2:12:01 AM UTC-7, deangwi...@gmail.com wrote:
> Thank you very much for responding and sorry if I've misled you re my question.
> Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.
> 
> All I want to do is make sure that there is no "proc" (or "begin") words hiding in the middle of the returned substring i.e. that starts with "proc" and ends with "begin".
> 
> I hope this clarifies my question.

Hm, maybe something like:

use feature 'say';
while ( $str =~ /(proc(.*?)begin)/g )   {
      my($match, $between, $start ) = ( $1, $2, $-[0] );

      if ( $between =~ /proc/g) {
          pos($str) = $start + pos($between);
          say "failed at $start";
          next;
      }
      say "ok at $start";
}


-- 
Charles DeRykus





------------------------------

Date: Sun, 10 May 2015 14:42:34 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <87twvkpn05.fsf@doppelsaurus.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:
> On Saturday, May 9, 2015 at 2:12:01 AM UTC-7, deangwi...@gmail.com wrote:
>> Thank you very much for responding and sorry if I've misled you re my question.
>> Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.
>> 
>> All I want to do is make sure that there is no "proc" (or "begin") words hiding in the middle of the returned substring i.e. that starts with "proc" and ends with "begin".
>> 
>> I hope this clarifies my question.
>
> Hm, maybe something like:
>
> use feature 'say';
> while ( $str =~ /(proc(.*?)begin)/g )   {
>       my($match, $between, $start ) = ( $1, $2, $-[0] );
>
>       if ( $between =~ /proc/g) {
>           pos($str) = $start + pos($between);
>           say "failed at $start";
>           next;
>       }
>       say "ok at $start";
> }

A former colleague of mine used to have a printout which read "Think
simple!" (Einfach denken!) behind him on the wall...

----
my $s = 'proc1_abc_proc2_edf_begin_ghi_begin_en';
$s =~ /^.*(proc.*?begin)/ and print($1, "\n");
----

Additional explanation: ^.*proc matches the longest sequence of
characters from the start of the string which is followed by
proc. Hence, this proc must be the innermost proc. .*?begin matches the
shortest sequence of characters followed by begin so this begin must be
the next begin.

I admit that I played around with various more complicated ideas before
this occurred to me.
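
The negative lookahead the original question asked about can express the same constraint directly, and with /g it reports every such pair rather than only the last one. A sketch, under the assumption that 'proc' and 'begin' are plain substrings; tight_pairs is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring that starts with "proc",
# ends with "begin", and contains neither word in between. The group
# (?:(?!proc|begin).)* consumes one character at a time, but only while
# neither word starts at the current position.
sub tight_pairs {
    my ($text) = @_;
    my @pairs = $text =~ /(proc(?:(?!proc|begin).)*begin)/g;
    return @pairs;
}

print "$_\n" for tight_pairs('proc1_abc_proc2_edf_begin_ghi_begin_end');
# prints: proc2_edf_begin
```

The stray proc1 is skipped because the lookahead refuses to step over proc2, and the second begin is ignored because no proc precedes it without an intervening begin.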


------------------------------

Date: Sun, 10 May 2015 12:04:57 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <08ed709f-e449-416e-8fcc-d3e715d6926d@googlegroups.com>

On Sunday, May 10, 2015 at 6:42:38 AM UTC-7, Rainer Weikusat wrote:
> "C.DeRykus" <derykus@gmail.com> writes:
> > On Saturday, May 9, 2015 at 2:12:01 AM UTC-7, deangwi...@gmail.com wrote:
> >> Thank you very much for responding and sorry if I've misled you re my question.
> >> Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.
> >> 
> >> All I want to do is make sure that there is no "proc" (or "begin") words hiding in the middle of the returned substring i.e. that starts with "proc" and ends with "begin".
> >> 
> >> I hope this clarifies my question.
> >
> > Hm, maybe something like:
> >
> > use feature 'say';
> > while ( $str =~ /(proc(.*?)begin)/g )   {
> >       my($match, $between, $start ) = ( $1, $2, $-[0] );
> >
> >       if ( $between =~ /proc/g) {
> >           pos($str) = $start + pos($between);
> >           say "failed at $start";
> >           next;
> >       }
> >       say "ok at $start";
> > }
> 
> A former colleague of mine used to have a printout which read "Think
> simple!" (Einfach denken!) behind him on the wall...
> 
> ----
> my $s = 'proc1_abc_proc2_edf_begin_ghi_begin_en';
> $s =~ /^.*(proc.*?begin)/ and print($1, "\n");
> ----
> 
> Additional explanation: ^.*proc matches the longest sequence of
> characters from the start of the string which is followed by
> proc. Hence, this proc must be the innermost proc. .*?begin matches the
> shortest sequence of characters followed by begin so this begin must be
> the next begin.
> 
> I admit that I played around with various more complicated ideas before
> this occurred to me.

Nifty!  It wasn't clear to me if the idea was to identify others too.


P.S.
To quote H. v. Hofmannsthal (translated from the German):

How wonderful these beings are,
Who interpret what cannot be interpreted,
Read what was never written,
Masterfully bind what is tangled
And still find paths in the eternal dark


------------------------------

Date: Sun, 10 May 2015 13:21:28 -0700 (PDT)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <4ccb98ca-19f6-4798-980b-5e00cc13bfab@googlegroups.com>

On Sunday, May 10, 2015 at 12:05:00 PM UTC-7, C.DeRykus wrote:
> On Sunday, May 10, 2015 at 6:42:38 AM UTC-7, Rainer Weikusat wrote:
> > "C.DeRykus" <derykus@gmail.com> writes:
> > > On Saturday, May 9, 2015 at 2:12:01 AM UTC-7, deangwi...@gmail.com wrote:
> > >> Thank you very much for responding and sorry if I've misled you re my question.
> > >> Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.
> > >> 
> > >> All I want to do is make sure that there is no "proc" (or "begin") words hiding in the middle of the returned substring i.e. that starts with "proc" and ends with "begin".
> > >> 
> > >> I hope this clarifies my question.
> > >
> > > Hm, maybe something like:
> > >
> > > use feature 'say';
> > > while ( $str =~ /(proc(.*?)begin)/g )   {
> > >       my($match, $between, $start ) = ( $1, $2, $-[0] );
> > >
> > >       if ( $between =~ /proc/g) {
> > >           pos($str) = $start + pos($between);
> > >           say "failed at $start";
> > >           next;
> > >       }
> > >       say "ok at $start";
> > > }
> > 
> > A former colleague of mine used to have a printout which read "Think
> > simple!" (Einfach denken!) behind him on the wall...
> > 
> > ----
> > my $s = 'proc1_abc_proc2_edf_begin_ghi_begin_en';
> > $s =~ /^.*(proc.*?begin)/ and print($1, "\n");
> > ----
> > 
> > Additional explanation: ^.*proc matches the longest sequence of
> > characters from the start of the string which is followed by
> > proc. Hence, this proc must be the innermost proc. .*?begin matches the
> > shortest sequence of characters followed by begin so this begin must be
> > the next begin.
> > 
> > I admit that I played around with various more complicated ideas before
> > this occurred to me.
> 
> Nifty!  It wasn't clear to me if the idea was to identify others too.
> 

I suspect strongly though that was the answer. (Unless...maybe he wanted only the first of multiple. Sigh...always a "what if")

 




------------------------------

Date: Sun, 10 May 2015 21:29:19 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: wordx....not_wordx...wordy   pattern matching.
Message-Id: <87zj5crxb4.fsf@doppelsaurus.mobileactivedefense.com>

"C.DeRykus" <derykus@gmail.com> writes:
> On Sunday, May 10, 2015 at 12:05:00 PM UTC-7, C.DeRykus wrote:
>> On Sunday, May 10, 2015 at 6:42:38 AM UTC-7, Rainer Weikusat wrote:
>> > "C.DeRykus" <derykus@gmail.com> writes:
>> > > On Saturday, May 9, 2015 at 2:12:01 AM UTC-7, deangwi...@gmail.com wrote:
>> > >> Thank you very much for responding and sorry if I've misled you re my question.
>> > >> Other than "proc" and "begin"...I have no idea what's in the main string so can't mention anything specifically in the regex other than those words.
>> > >> 
>> > >> All I want to do is make sure that there is no "proc" (or "begin") words hiding in the middle of the returned substring i.e. that starts with "proc" and ends with "begin".
>> > >> 
>> > >> I hope this clarifies my question.
>> > >
>> > > Hm, maybe something like:
>> > >
>> > > use feature 'say';
>> > > while ( $str =~ /(proc(.*?)begin)/g )   {
>> > >       my($match, $between, $start ) = ( $1, $2, $-[0] );
>> > >
>> > >       if ( $between =~ /proc/g) {
>> > >           pos($str) = $start + pos($between);
>> > >           say "failed at $start";
>> > >           next;
>> > >       }
>> > >       say "ok at $start";
>> > > }
>> > 
>> > A former colleague of mine used to have a printout which read "Think
>> > simple!" (Einfach denken!) behind him on the wall...
>> > 
>> > ----
>> > my $s = 'proc1_abc_proc2_edf_begin_ghi_begin_en';
>> > $s =~ /^.*(proc.*?begin)/ and print($1, "\n");
>> > ----
>> > 
>> > Additional explanation: ^.*proc matches the longest sequence of
>> > characters from the start of the string which is followed by
>> > proc. Hence, this proc must be the innermost proc. .*?begin matches the
>> > shortest sequence of characters followed by begin so this begin must be
>> > the next begin.
>> > 
>> > I admit that I played around with various more complicated ideas before
>> > this occurred to me.
>> 
>> Nifty!  It wasn't clear to me if the idea was to identify others too.
>> 
>
> I suspect strongly though that was the answer. (Unless...maybe he
> wanted only the first of multiple. Sigh...always a "what if")

Considering his other statements, he probably wants to find the begin
matching a proc. It's easy to do this recursively. The code below
assumes that a 'start of something' is the string 'proc' plus whatever
follows up to but not including the next _. It finds the matching begin
(for all pairs) and inserts a 'trace_proc...' after that.

---
my $s = 'proc0_rrt_begin_tttty_proc1_abc_proc2_edf_begin_ghi_begin_en_proc4_begin';

sub add_traces
{
    my ($proc, $begin, $trace);

    pos($_[0]) = $_[1];
    
    while ($_[0] =~ /(proc.*?(?=_)|begin)/g) {
	if (substr($1, 0, 4) eq 'proc') {
	    $trace = "[trace_$1]";
	    $begin = add_traces($_[0], pos($_[0]));
	    substr($_[0], $begin, 0,  $trace);
	    pos($_[0]) = $begin + length($trace);
	    
	    next;
	}

	return pos($_[0]);
    }
}

add_traces($s);
print("$s\n");
---
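
The same pairing can also be sketched without recursion. The variant below is an illustration, not the code from this post: add_traces_iter is a hypothetical name, it returns a new string instead of modifying its argument, and it assumes well-nested input in which each begin closes the most recently opened proc.

```perl
use strict;
use warnings;

# Hypothetical iterative variant: a stack of open proc names replaces
# the recursion. A proc token is pushed onto the stack; a begin token
# pops the innermost open proc and gets that proc's trace marker
# appended. A begin with no open proc is left untouched.
sub add_traces_iter {
    my ($text) = @_;
    my @open;

    $text =~ s{(proc.*?(?=_)|begin)}{
        my $token  = $1;
        my $suffix = '';
        if ($token ne 'begin') {
            push @open, $token;
        } elsif (@open) {
            $suffix = '[trace_' . pop(@open) . ']';
        }
        $token . $suffix;
    }ge;

    return $text;
}

my $traced = add_traces_iter('proc0_rrt_begin_tttty_proc1_abc_proc2_edf_begin_ghi_begin_en_proc4_begin');
print "$traced\n";
```

For the sample string this pairs proc0, proc2, proc1 and proc4 with the first, second, third and fourth begin respectively.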


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4429
***************************************

