[31408] in Perl-Users-Digest
Perl-Users Digest, Issue: 2660 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Oct 31 06:09:42 2009
Date: Sat, 31 Oct 2009 03:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sat, 31 Oct 2009 Volume: 11 Number: 2660
Today's topics:
deepcopy by Data:Dumper quirk <peter@www.pjb.com.au>
Re: deepcopy by Data:Dumper quirk <uri@StemSystems.com>
Re: FAQ 6.12 Can I use Perl regular expressions to matc sln@netherlands.com
Re: FAQ 6.12 Can I use Perl regular expressions to matc sln@netherlands.com
Re: FAQ 6.12 Can I use Perl regular expressions to matc sln@netherlands.com
Re: Perl to read EMBEDDED Outlook email? <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: 30 Oct 2009 07:36:33 GMT
From: Peter Billam <peter@www.pjb.com.au>
Subject: deepcopy by Data:Dumper quirk
Message-Id: <slrnhel5s1.3o2.peter@box8.pjb.com.au>
Greetings: This must be a faq somewhere, but I haven't found it
and it's not in perldoc Data::Dumper... Trying to deepcopy @a:
use Data::Dumper;
@a = ([1,2,3], [4,5,6,], [7,8,9]);
print "a is ", Dumper(@a);
@b = eval Dumper(@a);
print "b is ", Dumper(@b);
fails. It outputs:
a is $VAR1 = [
1,
2,
3
];
$VAR2 = [
4,
5,
6
];
$VAR3 = [
7,
8,
9
];
b is $VAR1 = [
7,
8,
9
];
so @b only gets the last element in @a :-(
The only thing I could get working was ugly:
use Data::Dumper;
$Data::Dumper::Terse = 1;
@a = ([1,2,3], [4,5,6,], [7,8,9]);
print "a = ", Dumper(@a);
@b = eval '('.join(',',Dumper(@a)).')';
print "b = ", Dumper(@b);
Is Data::Dumper really the best way to deepcopy?
Should I reference first, then eval Dumper, then dereference,
so that Dumper only has to handle one argument ?
Converting to a string and back seems crazy anyway.
Apologies for what must be an entry-level question,
Regards, Peter
--
Peter Billam www.pjb.com.au www.pjb.com.au/comp/contact.html
------------------------------
Date: Fri, 30 Oct 2009 03:59:25 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: deepcopy by Data:Dumper quirk
Message-Id: <874ophwbvm.fsf@quad.sysarch.com>
>>>>> "PB" == Peter Billam <peter@www.pjb.com.au> writes:
PB> Greetings: This must be a faq somewhere, but I haven't found it
PB> and it's not in perldoc Data::Dumper... Trying to deepcopy @a:
PB> use Data::Dumper;
PB> @a = ([1,2,3], [4,5,6,], [7,8,9]);
PB> print "a is ", Dumper(@a);
Dumper( \@a ) is what you want. you are doing 3 dumps vs one dump of @a.
and you can use other modules such as Storable, which has a dclone function
for deep copies.
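A minimal sketch of both suggestions, under a few assumptions: the variable names @b, @c and $copy are only illustrative, and $Data::Dumper::Terse is set so that the dumped text is a bare expression that eval can return under "use strict". With three arguments, Dumper emits three separate assignments ($VAR1 to $VAR3) and eval returns only the value of the last statement, which matches the output shown in the question. Passing a single reference avoids that.

```perl
use strict;
use warnings;
use Data::Dumper;
use Storable qw(dclone);

my @a = ([1,2,3], [4,5,6], [7,8,9]);

# Terse drops the "$VAR1 = " prefix, so the dumped text is a plain
# expression whose value eval can return, even under "use strict".
$Data::Dumper::Terse = 1;

# One reference in, one string out, one reference back.
my $copy = eval Dumper(\@a);
die "eval of dump failed: $@" unless defined $copy;
my @b = @$copy;

# Storable's dclone makes a deep copy without the string round trip.
my @c = @{ dclone(\@a) };

# Changing the copies should leave the original untouched.
$b[0][0] = 'changed in b';
$c[1][1] = 'changed in c';
print "a is ", Dumper(\@a);
```

The dclone route is probably the better default for plain data; the eval route executes whatever text Dumper produced, so it only makes sense for data you built yourself.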
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
------------------------------
Date: Thu, 29 Oct 2009 21:22:03 -0700
From: sln@netherlands.com
Subject: Re: FAQ 6.12 Can I use Perl regular expressions to match balanced text?
Message-Id: <g1oke5toi1id6h5j178kt5pcebsefca590@4ax.com>
On Thu, 29 Oct 2009 04:00:02 GMT, PerlFAQ Server <brian@theperlreview.com> wrote:
>This is an excerpt from the latest version of perlfaq6.pod, which
>comes with the standard Perl distribution. These postings aim to
>reduce the number of repeated questions as well as allow the community
>to review and update the answers. The latest version of the complete
>perlfaq is at http://faq.perl.org .
>
>--------------------------------------------------------------------
>
>6.12: Can I use Perl regular expressions to match balanced text?
>
> (contributed by brian d foy)
>
> Your first try should probably be the "Text::Balanced" module, which is
> in the Perl standard library since Perl 5.8. It has a variety of
> functions to deal with tricky text. The "Regexp::Common" module can also
> help by providing canned patterns you can use.
>
> As of Perl 5.10, you can match balanced text with regular expressions
> using recursive patterns. Before Perl 5.10, you had to resort to various
> tricks such as using Perl code in "(??{})" sequences.
>
> Here's an example using a recursive regular expression. The goal is to
> capture all of the text within angle brackets, including the text in
> nested angle brackets. This sample text has two "major" groups: a group
> with one level of nesting and a group with two levels of nesting. There
> are five total groups in angle brackets:
>
> I have some <brackets in <nested brackets> > and
> <another group <nested once <nested twice> > >
> and that's it.
>
> The regular expression to match the balanced text uses two new (to Perl
> 5.10) regular expression features. These are covered in perlre and this
> example is a modified version of one in that documentation.
>
> First, adding the new possessive "+" to any quantifier finds the longest
> match and does not backtrack. That's important since you want to handle
> any angle brackets through the recursion, not backtracking. The group
> "[^<>]++" finds one or more non-angle brackets without backtracking.
>
> Second, the new "(?PARNO)" refers to the sub-pattern in the particular
> capture buffer given by "PARNO". In the following regex, the first
> capture buffer finds (and remembers) the balanced text, and you need
> that same pattern within the first buffer to get past the nested text.
> That's the recursive part. The "(?1)" uses the pattern in the outer
> capture buffer as an independent part of the regex.
>
> Putting it all together, you have:
>
> #!/usr/local/bin/perl5.10.0
>
> my $string =<<"HERE";
> I have some <brackets in <nested brackets> > and
> <another group <nested once <nested twice> > >
> and that's it.
> HERE
>
> my @groups = $string =~ m/
> ( # start of capture buffer 1
> < # match an opening angle bracket
> (?:
> [^<>]++ # one or more non angle brackets, non backtracking
> |
> (?1) # found < or >, so recurse to capture buffer 1
> )*
> > # match a closing angle bracket
> ) # end of capture buffer 1
> /xg;
>
> $" = "\n\t";
> print "Found:\n\t@groups\n";
>
> The output shows that Perl found the two major groups:
>
> Found:
> <brackets in <nested brackets> >
> <another group <nested once <nested twice> > >
>
> With a little extra work, you can get all of the groups in angle
> brackets even if they are in other angle brackets too. Each time you get
> a balanced match, remove its outer delimiter (that's the one you just
> matched so don't match it again) and add it to a queue of strings to
> process. Keep doing that until you get no matches:
>
> #!/usr/local/bin/perl5.10.0
>
> my @queue =<<"HERE";
> I have some <brackets in <nested brackets> > and
> <another group <nested once <nested twice> > >
> and that's it.
> HERE
>
> my $regex = qr/
> ( # start of bracket 1
> < # match an opening angle bracket
> (?:
> [^<>]++ # one or more non angle brackets, non backtracking
> |
> (?1) # recurse to bracket 1
> )*
> > # match a closing angle bracket
> ) # end of bracket 1
> /x;
>
> $" = "\n\t";
>
> while( @queue )
> {
> my $string = shift @queue;
>
> my @groups = $string =~ m/$regex/g;
> print "Found:\n\t@groups\n\n" if @groups;
>
> unshift @queue, map { s/^<//; s/>$//; $_ } @groups;
> }
>
> The output shows all of the groups. The outermost matches show up first
> and the nested matches show up later:
>
> Found:
> <brackets in <nested brackets> >
> <another group <nested once <nested twice> > >
>
> Found:
> <nested brackets>
>
> Found:
> <nested once <nested twice> >
>
> Found:
> <nested twice>
>
>
>
>--------------------------------------------------------------------
>
>The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
>are not necessarily experts in every domain where Perl might show up,
>so please include as much information as possible and relevant in any
>corrections. The perlfaq-workers also don't have access to every
>operating system or platform, so please include relevant details for
>corrections to examples that do not work on particular platforms.
>Working code is greatly appreciated.
>
>If you'd like to help maintain the perlfaq, see the details in
>perlfaq.pod.
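As a small illustration of the possessive quantifier described in the quoted FAQ, the sketch below compares a greedy "a+" with a possessive "a++" on the same input. The test string and variable names are made up for the example; it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $text = 'aaa';

# Greedy: a+ first takes all three characters, then gives one back
# so that the trailing "a" in the pattern can match.
my $greedy = $text =~ /^a+a$/ ? 1 : 0;

# Possessive: a++ takes all three characters and never gives any back,
# so the trailing "a" has nothing left to match.
my $possessive = $text =~ /^a++a$/ ? 1 : 0;

print "greedy: $greedy, possessive: $possessive\n";
# prints "greedy: 1, possessive: 0"
```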
I originally just wanted to comment on the way this grouping
affects recursion ..
(?:
[^<>]++ # one or more non angle brackets, non backtracking
|
(?1) # found < or >, so recurse to capture buffer 1
)*
This form tells the engine to *recurse* on either '<' or '>'
when in fact you don't want to recurse on '>'.
The way around this is to use this form ..
(?:
[^<>]*+ # 0 or more non angle brackets, non backtracking
|
(?1) # found <, so recurse to capture buffer 1
)+
This avoids an unnecessary recursion per <> pair, but does the same thing.
-----------------
Now that that is said, I have some problems with this new recursive
feature.
The fact that the expanded faq example (above) has to jump through
so many hoops, while/shift/unshift/map ... , all stems from the
idiom of this new feature.
And that is recursing a capture buffer expression instead of recursing a group,
capture or not, and letting the data captures take care of themselves.
Even though nodes are known, there is probably just no implementation of being
able to *name* and reference a non-capture grouping.
That is too bad since all you want to do is just recurse an expression.
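For comparison, perlre for 5.10 also documents (?R), which recurses into the whole pattern rather than into a numbered capture buffer, and (?&NAME) for named groups. The sketch below uses (?R) with no capture group at all; the sample text and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# (?R) recurses into the whole pattern, so no capture group is needed.
my $angle = qr/<(?:[^<>]++|(?R))*>/;

my $text = 'a <b <c>> d <e> f';

# With no capture groups, a list-context /g match returns each whole match.
my @found = $text =~ /$angle/g;

print "$_\n" for @found;
# prints "<b <c>>" and then "<e>"
```

This only works as written when the bracket pattern is the entire regex, since (?R) always restarts from the beginning of the whole pattern.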
Now there is the extra baggage of having a single container capture buffer possibly
storing the entire document if the begin and end delimiters are at the beginning
and end of the document.
This result would be absolutely meaningless and convoluted as the faq example above
demonstrates.
If *I* can extract nested balanced text, the engine should be able to do that for me,
but it can't because of recursion that is only offered to a *capture* group.
And none of any sub-capture nested groups are recorded.
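One way to get the nested groups recorded without a queue or code blocks is to wrap the recursive capture in a lookahead. This is a different technique from both the FAQ's loop and the (?{ }) version submitted below, shown here only as a sketch; the variable names are illustrative and the sample text is the FAQ's.

```perl
use strict;
use warnings;

my $text = "I have some <brackets in <nested brackets> > and "
         . "<another group <nested once <nested twice> > > and that's it.";

# The lookahead consumes nothing, so /g retries at every position and
# group 1 captures the balanced text that starts at each "<".
my @groups = $text =~ /(?=(<(?:[^<>]++|(?1))*>))/g;

print "Found:\n";
print "\t$_\n" for @groups;
```

On this input the list holds all five groups, ordered by the position of their opening bracket, so each outer group comes before the groups nested inside it.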
I'm sure there is some limited use for this, but right now it looks like very little.
I welcome comments/rebuttal.
An alternative to the above faq example-mangulation is submitted below.
-sln
- - - - - - - -
use strict;
use warnings;
my @queue =<<"HERE";
I have so<<<<>>>>me <brackets in <nested brackets> > and <another group <nested once <nested twice> > > and that's it.
HERE
my @enter; # position queue of each nested level of bracket
my $regex = qr/
    (
        <                   # open angle bracket
        (?{ push @enter,pos();
            print "\n* enter <",scalar(@enter)," at ",pos();
        })
        (?:
            [^<>]*+         # zero or more non angle brackets, non backtracking
          |
            (?{ print "\n recurs - ",pos();})
            (?1)            # recurse to open bracket (capture grp 1)
        )+
        >                   # close angle bracket
        (?{ print "\n* leave ",scalar(@enter),"> at ",pos()," = Found: ";
            print substr($_, $enter[$#enter]-1, 1+pos() - pop @enter);
        })
    )
/x;
$" = "\n\t";
my $string = shift @queue;
my @groups = $string =~ /$regex/g;
print "\n\n\$1 Captures:\n\t@groups\n\n" if @groups;
@enter = ();
$string =~ s/$regex//g;
print "\n\n\$string:\n\t$string";
__END__
* enter <1 at 10
recurs - 10
* enter <2 at 11
recurs - 11
* enter <3 at 12
recurs - 12
* enter <4 at 13
* leave 4> at 14 = Found: <>
* leave 3> at 15 = Found: <<>>
* leave 2> at 16 = Found: <<<>>>
* leave 1> at 17 = Found: <<<<>>>>
* enter <1 at 21
recurs - 33
* enter <2 at 34
* leave 2> at 50 = Found: <nested brackets>
* leave 1> at 52 = Found: <brackets in <nested brackets> >
* enter <1 at 58
recurs - 72
* enter <2 at 73
recurs - 85
* enter <3 at 86
* leave 3> at 99 = Found: <nested twice>
* leave 2> at 101 = Found: <nested once <nested twice> >
* leave 1> at 103 = Found: <another group <nested once <nested twice> > >
$1 Captures:
<<<<>>>>
<brackets in <nested brackets> >
<another group <nested once <nested twice> > >
* enter <1 at 10
recurs - 10
* enter <2 at 11
recurs - 11
* enter <3 at 12
recurs - 12
* enter <4 at 13
* leave 4> at 14 = Found: <>
* leave 3> at 15 = Found: <<>>
* leave 2> at 16 = Found: <<<>>>
* leave 1> at 17 = Found: <<<<>>>>
* enter <1 at 21
recurs - 33
* enter <2 at 34
* leave 2> at 50 = Found: <nested brackets>
* leave 1> at 52 = Found: <brackets in <nested brackets> >
* enter <1 at 58
recurs - 72
* enter <2 at 73
recurs - 85
* enter <3 at 86
* leave 3> at 99 = Found: <nested twice>
* leave 2> at 101 = Found: <nested once <nested twice> >
* leave 1> at 103 = Found: <another group <nested once <nested twice> > >
$string:
I have some and and that's it.
------------------------------
Date: Thu, 29 Oct 2009 21:38:48 -0700
From: sln@netherlands.com
Subject: Re: FAQ 6.12 Can I use Perl regular expressions to match balanced text?
Message-Id: <qvqke5humoun2n4i1g3doegpfkau3hhind@4ax.com>
On Thu, 29 Oct 2009 21:22:03 -0700, sln@netherlands.com wrote:
>On Thu, 29 Oct 2009 04:00:02 GMT, PerlFAQ Server <brian@theperlreview.com> wrote:
>
>>6.12: Can I use Perl regular expressions to match balanced text?
>>
>> (contributed by brian d foy)
<snip>
>I originally just wanted to comment on the way this grouping
>affects recursion ..
> (?:
> [^<>]++ # one or more non angle brackets, non backtracking
> |
> (?1) # found < or >, so recurse to capture buffer 1
> )*
>This form tells the engine to *recurse* on either '<' or '>'
>when in fact you don't want to recurse on '>'.
>The way around this is to use this form ..
(This form will cause a crash on an unbalanced text sample)
> (?:
> [^<>]*+ # 0 or more non angle brackets, non backtracking
> |
> (?1) # found <, so recurse to capture buffer 1
> )+
>This avoids an unnecessary recursion per <> pair, but does the same thing.
>
No, no.. forget the above optimization. Since I just tried this on
un-balanced text, it crashed Perl.
So the original is correct (i.e. it doesn't crash):
(?:
[^<>]++ # one or more non angle brackets, non backtracking
|
(?1) # found < or >, so recurse to capture buffer 1
)*
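A quick way to check the original form against both kinds of input is sketched below. The two test strings are made up for the example, and the results in the comments are what the pattern should produce on a 5.10 or later perl.

```perl
use strict;
use warnings;

my $regex = qr/
    (                   # capture group 1
        <               # opening angle bracket
        (?:
            [^<>]++     # one or more non angle brackets, no backtracking
          |
            (?1)        # recurse into capture group 1
        )*
        >               # closing angle bracket
    )
/x;

# Balanced input: the outer group is returned whole.
my @balanced = 'x <a <b> > y' =~ /$regex/g;

# Unbalanced input: the outer "<" never closes, so the match at that
# position fails and only the inner balanced group is found.
my @unbalanced = 'x <a <b> y' =~ /$regex/g;

print "balanced:   @balanced\n";     # <a <b> >
print "unbalanced: @unbalanced\n";   # <b>
```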
Then the corrected code lines:
>
>- - - - - - - -
>use strict;
>use warnings;
>
>my @queue =<<"HERE";
>I have so<<<<>>>>me <brackets in <nested brackets> > and <another group <nested once <nested twice> > > and that's it.
>HERE
>
>my @enter; # position queue of each nested level of bracket
>
>my $regex = qr/
>
> (
> < # open angle bracket
> (?{ push @enter,pos();
> print "\n* enter <",scalar(@enter)," at ",pos();
> })
>
> (?:
[^<>]++ # one or more non angle brackets, non backtracking
> |
> (?{ print "\n recurs - ",pos();})
>
> (?1) # recurse to open bracket (capture grp 1)
)*
> > # close angle bracket
>
> (?{ print "\n* leave ",scalar(@enter),"> at ",pos()," = Found: ";
> print substr($_, $enter[$#enter]-1, 1+pos() - pop @enter);
> })
> )
>/x;
>
>$" = "\n\t";
>my $string = shift @queue;
>my @groups = $string =~ /$regex/g;
>print "\n\n\$1 Captures:\n\t@groups\n\n" if @groups;
>
>@enter = ();
>$string =~ s/$regex//g;
>print "\n\n\$string:\n\t$string";
>
>__END__
Sorry about that.
-sln
------------------------------
Date: Fri, 30 Oct 2009 19:11:08 -0700
From: sln@netherlands.com
Subject: Re: FAQ 6.12 Can I use Perl regular expressions to match balanced text?
Message-Id: <tm6ne5he6ua4r01oo6q0jtnok9u461haiv@4ax.com>
On Thu, 29 Oct 2009 21:38:48 -0700, sln@netherlands.com wrote:
>On Thu, 29 Oct 2009 21:22:03 -0700, sln@netherlands.com wrote:
>
>>On Thu, 29 Oct 2009 04:00:02 GMT, PerlFAQ Server <brian@theperlreview.com> wrote:
>>
>>>6.12: Can I use Perl regular expressions to match balanced text?
>>>
>>> (contributed by brian d foy)
>
><snip>
>
>>I originally just wanted to comment on the way this grouping
>>affects recursion ..
>> (?:
>> [^<>]++ # one or more non angle brackets, non backtracking
>> |
>> (?1) # found < or >, so recurse to capture buffer 1
>> )*
>>This form tells the engine to *recurse* on either '<' or '>'
>>when in fact you don't want to recurse on '>'.
>>The way around this is to use this form ..
>
> (This form will cause a crash on an unbalanced text sample)
>> (?:
>> [^<>]*+ # 0 or more non angle brackets, non backtracking
>> |
>> (?1) # found <, so recurse to capture buffer 1
>> )+
>>This avoids an unnecessary recursion per <> pair, but does the same thing.
>>
>
>No, no.. forget the above optimization. Since I just tried this on
>un-balanced text, it crashed Perl.
>So the original is correct (i.e. it doesn't crash):
> (?:
> [^<>]++ # one or more non angle brackets, non backtracking
> |
> (?1) # found < or >, so recurse to capture buffer 1
> )*
>
Thinking about this a bit, this really drives home that this
5.10 recursion feature was extremely poorly written, if not
thought out.
That changing a simple quantifier can actually CRASH Perl.
No infinite loop or any other stuff, just a pure crash when
substituting + for *
It makes me think that some BOZO(s') is(are) programming Perl.
And that this form:
(?:
[^<>]++ # one or more non angle brackets, non backtracking
|
(?1) # found < or >, so recurse to capture buffer 1
)*
is almost *mandatory* when doing recursion.
Simply sad, sad, sad, sad...
Let me write the Regex engine code, I can make it stand up
and *whistle dixie*. Long live the South!
-sln
------------------------------
Date: Sat, 31 Oct 2009 09:34:19 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Perl to read EMBEDDED Outlook email?
Message-Id: <slrnhentkc.jin.hjp-usenet2@hrunkner.hjp.at>
On 2009-10-29 20:41, maylin <yudelin100@gmail.com> wrote:
> Hope some experts can help me on this -
>
> If you have an Outlook email that includes an attached email, how can
> you use Perl to read/parse that embedded email? I tried to use
> Win32::OLE, and I know you can get the mail body this way:
> $mailbody=$folder->Items->Item($i)->Body;
Can you get at the complete RFC 5322-conforming text of the email?
If you can, simply parse it with MIME::Parser and extract the parts you
need from the MIME::Entity object it returns.
hp
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 2660
***************************************