[30123] in Perl-Users-Digest


Perl-Users Digest, Issue: 1366 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Mar 16 18:09:45 2008

Date: Sun, 16 Mar 2008 15:09:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 16 Mar 2008     Volume: 11 Number: 1366

Today's topics:
    Re: comparing a 2D array <rvtol+news@isolution.nl>
    Re: FAQ 5.28 How can I read in an entire file all at on <rvtol+news@isolution.nl>
    Re: FAQ 5.28 How can I read in an entire file all at on <bik.mido@tiscalinet.it>
    Re: Fastest way to find a match? <rvtol+news@isolution.nl>
    Re: Fastest way to find a match? <bik.mido@tiscalinet.it>
    Re: help with a regex <uri@stemsystems.com>
    Re: help with a regex <tadmc@seesig.invalid>
    Re: help with a regex <someone@example.com>
    Re: HTML parsing <stoupa@practisoft.cz>
    Re: HTML parsing <joost@zeekat.nl>
    Re: Matching multiple subexpressions in a regular expre <rvtol+news@isolution.nl>
    Re: Matching multiple subexpressions in a regular expre <nospam-abuse@ilyaz.org>
        open a file generated dynamically <ela@yantai.org>
    Re: open a file generated dynamically <jurgenex@hotmail.com>
    Re: Pattern extraction <szrRE@szromanMO.comVE>
    Re: P~(ptilde) 0.9 released, new scripting language wit <ptilderegex@gmail.com>
    Re: regular expression with split goes wrong ? <abigail@abigail.be>
    Re: Variables interpolated in character classes? <abigail@abigail.be>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 16 Mar 2008 12:45:52 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: comparing a 2D array
Message-Id: <frj51q.184.1@news.isolution.nl>

Rose wrote:

>     @attr1 = split(/[\t ]+/, $line);


You could write  /[\t ]+/  as  /[[:blank:]]+/

but most often this is what you want:

     @attr1 = split " ", $line;

See perldoc -f split.
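
A minimal sketch of the difference, assuming a hypothetical input line
with leading whitespace and a trailing newline (the variable names are
made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: leading spaces, a tab, a double space, a newline.
my $line = "  alpha\tbeta  gamma\n";

# A regex pattern yields an empty leading field when the line starts
# with a separator, and the newline stays attached to the last field.
my @by_regex = split /[\t ]+/, $line;

# The special pattern " " skips leading whitespace and splits on runs
# of any whitespace, newline included.
my @by_space = split " ", $line;

print scalar(@by_regex), " fields: [", join("|", @by_regex), "]\n";
print scalar(@by_space), " fields: [", join("|", @by_space), "]\n";
```

The first form still needs a chomp and a check for the empty first
field; the second usually does not.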

-- 
Affijn, Ruud

"Gewoon is een tijger."


------------------------------

Date: Sun, 16 Mar 2008 12:53:26 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: FAQ 5.28 How can I read in an entire file all at once?
Message-Id: <frj5b9.17k.1@news.isolution.nl>

PerlFAQ Server wrote:

>             $var = do { local $/; <INPUT> };


This variant uses less memory:

          my $data;
          do { local $/; $data = <INPUT> };
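
Both forms read the same data. A small sketch that only checks the
results are identical (it does not measure memory), assuming a lexical,
in-memory filehandle in place of INPUT:

```perl
use strict;
use warnings;

# Hypothetical content, read through an in-memory filehandle for the demo.
my $text = "line one\nline two\n";

open my $in, '<', \$text or die "open: $!";
my $copied = do { local $/; <$in> };     # FAQ form: value returned from the block
close $in;

open $in, '<', \$text or die "open: $!";
my $direct;
do { local $/; $direct = <$in> };        # variant: assigned inside the block
close $in;

print "identical\n" if $copied eq $direct;
```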

-- 
Affijn, Ruud

"Gewoon is een tijger."


------------------------------

Date: Sun, 16 Mar 2008 23:05:54 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: FAQ 5.28 How can I read in an entire file all at once?
Message-Id: <ad6rt3l32qdleii6stq4kmr91nirf655a1@4ax.com>

On Sun, 16 Mar 2008 12:53:26 +0100, "Dr.Ruud"
<rvtol+news@isolution.nl> wrote:

>>             $var = do { local $/; <INPUT> };
>
>
>This variant uses less memory:
>
>          my $data;
>          do { local $/; $data = <INPUT> };

Too bad that it does, because the former is certainly more
appealing...


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sun, 16 Mar 2008 12:58:30 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Fastest way to find a match?
Message-Id: <frj5kq.184.1@news.isolution.nl>

bukzor wrote:

> I'm trying to find the fastest way in perl to see if a name 
> contains another.
> 
> I've a list of 2704 names (aka "A")
> 
> I've another name (aka "B")
> 
> I need to know if any of A is contained in B.

Considered fgrep?
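
If it has to stay inside Perl, a fixed-string search in the same spirit
can be sketched with index(). The data below is hypothetical: @names
plays the role of "A" and $target the role of "B".

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical data standing in for the 2704 names and the name to test.
my @names  = qw(ann bob carol);
my $target = "xx-carol-yy";

# index() does a plain substring search, so regex metacharacters in the
# names need no quoting; first() stops at the first hit.
my $hit = first { index($target, $_) >= 0 } @names;

print defined $hit ? "contains '$hit'\n" : "no match\n";
```

Whether this is the fastest option for a given data set would need
benchmarking.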

-- 
Affijn, Ruud

"Gewoon is een tijger."


------------------------------

Date: Sun, 16 Mar 2008 23:03:28 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Fastest way to find a match?
Message-Id: <486rt39if0r9okoaq3vnt7cpcq2v8an3oc@4ax.com>

On Sat, 15 Mar 2008 19:10:57 +0100, jm <jm@nospam.fr> wrote:

>I did not find ~~ in man perlop.
>
>Is this a perl operator?
>or a L::U operator?

A

  use 5.010;

operator. I wanted to stay 5.10ish. Of course you can use =~ instead.


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sun, 16 Mar 2008 07:10:19 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: help with a regex
Message-Id: <x78x0jb0v8.fsf@mail.sysarch.com>

>>>>> "SK" == Steve K <savagebeaste@yahoo.com> writes:

  SK> Uri Guttman wrote:
  >>>>>>> "d" == donebrowsers  <donebrowsers@yahoo.com> writes:
  >> 
  >> d> While that works and I appreciate it, I was just using the []s as
  >> d> a placeholder. I'm actually using PHP's preg_match() function
  >> d> <http://us3.php.net/manual/en/function.preg-match.php> which
  >> d> uses PERL style regular expressions. I submitted it to this group
  >> d> because PERL programmers tend to be better with regular
  >> d> expressions than anyone else.
  >> 
  >> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
  >> perl. that is why we use Perl and not php. the answer you got was
  >> valid perl and that will likely be all you will get here.

  SK> 1) What right do you have to speak for everyone? You should of
  SK> said "and that will likely be all you will get from me" as there
  SK> are plenty of people who actually offer help without the attitude
  SK> people like your self feel they must attach. You may not like PHP,
  SK> but it has a place, just as Perl does.

not in this group. that is the whole point. this group is about perl and
not php. you can find php help over there --->

  SK> 2) Yes we all know it's Perl. The world will cease to rotate
  SK> properly on its axis and we will die well in advance of 2012
  SK> (right...) if someone says PERL... one could always leave a
  SK> friendly little note about it if it really bothers you that much
  SK> rather than the rude fucktardery your types like to share.

i would rather correct it when and how i please. you can uncorrect it as
you wish.

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Architecture, Development, Training, Support, Code Review  ------
-----------  Search or Offer Perl Jobs  ----- http://jobs.perl.org  ---------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Sun, 16 Mar 2008 10:38:04 -0500
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: help with a regex
Message-Id: <slrnftqfms.79l.tadmc@tadmc30.sbcglobal.net>

Steve K. <savagebeaste@yahoo.com> wrote:
> Uri Guttman wrote:
>>>>>>> "d" == donebrowsers  <donebrowsers@yahoo.com> writes:
>>
>>  d> While that works and I appreciate it, I was just using the []s as
>>  d> a placeholder. I'm actually using PHP's preg_match() function
>>  d> <http://us3.php.net/manual/en/function.preg-match.php> which
>>  d> uses PERL style regular expressions. I submitted it to this group
>>  d> because PERL programmers tend to be better with regular
>>  d> expressions than anyone else.
>>
>> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
>> perl. that is why we use Perl and not php. the answer you got was
>> valid perl and that will likely be all you will get here.
>
> 1) What right do you have to speak for everyone? You should of said "and 
                                                       ^^^^^^^^^
                                                       ^^^^^^^^^

"Who would cross the Bridge of Death must answer me these questions three"


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sun, 16 Mar 2008 16:59:15 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: help with a regex
Message-Id: <DlcDj.85555$FO1.51862@edtnps82>

Tad J McClellan wrote:
> Steve K. <savagebeaste@yahoo.com> wrote:
>> Uri Guttman wrote:
>>>>>>>> "d" == donebrowsers  <donebrowsers@yahoo.com> writes:
>>>  d> While that works and I appreciate it, I was just using the []s as
>>>  d> a placeholder. I'm actually using PHP's preg_match() function
>>>  d> <http://us3.php.net/manual/en/function.preg-match.php> which
>>>  d> uses PERL style regular expressions. I submitted it to this group
>>>  d> because PERL programmers tend to be better with regular
>>>  d> expressions than anyone else.
>>>
>>> it is Perl, never PERL. preg is NOT perl, nor is it compatible with
>>> perl. that is why we use Perl and not php. the answer you got was
>>> valid perl and that will likely be all you will get here.
>> 1) What right do you have to speak for everyone? You should of said "and 
>                                                        ^^^^^^^^^
>                                                        ^^^^^^^^^
> 
> "Who would cross the Bridge of Death must answer me these questions three"

Blue.  No yel--  Auuuuuuuugh!


John
-- 
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall


------------------------------

Date: Sun, 16 Mar 2008 20:21:12 +0100
From: "Petr Vileta" <stoupa@practisoft.cz>
Subject: Re: HTML parsing
Message-Id: <frjt62$22dt$1@ns.felk.cvut.cz>

Jürgen Exner wrote:
> June Lee <iiuu66@yahoo.com> wrote:
>> any good way to extract the data?
>> I want to parse the following HTML page
>
> As has been mentioned many, many times in this NG: if you want to
> parse HTML then use an HTML parser.
You are right when the HTML page is valid, but on invalid pages the parser fails.
The HTML code below works in most browsers, but the parser fails on it.

--- example ---
<html><body>
<table border=1>
<tr><td>1<td>first row</td>
<tr><td>2<td>second row</td>
</table>
</body>
</html>
--- example ---

-- 
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)

Please reply to <petr AT practisoft DOT cz>



------------------------------

Date: Sun, 16 Mar 2008 22:58:40 +0100
From: Joost Diepenmaat <joost@zeekat.nl>
Subject: Re: HTML parsing
Message-Id: <87lk4is54f.fsf@zeekat.nl>

"Petr Vileta" <stoupa@practisoft.cz> writes:

> Jürgen Exner wrote:
>> June Lee <iiuu66@yahoo.com> wrote:
>>> any good way to extract the data?
>>> I want to parse the following HTML page
>>
>> As has been mentioned many, many times in this NG: if you want to
>> parse HTML then use an HTML parser.
> You are right when the HTML page is valid, but on invalid pages the parser
> fails. The HTML code below works in most browsers, but the parser fails on
> it.

HTML parsers aren't XML parsers. If they were, life would be a lot
easier for people writing browsers, and 90% of the pages on the web
would not render at all.

> --- example ---
> <html><body>
> <table border=1>
> <tr><td>1<td>first row</td>
> <tr><td>2<td>second row</td>
> </table>
> </body>
> </html>
> --- example ---

Works fine in HTML::Parser:

#!/usr/local/bin/perl
use HTML::Parser;


my $p = HTML::Parser->new( api_version => 3,
			   start_h => [\&start, "tagname, attr"],
			   end_h   => [\&end,   "tagname"],
			   marked_sections => 1,
			 );
$p->parse_file(\*DATA);

sub start {
  print "START: @_\n";
}
sub end {
  print "END: @_\n";
}

__DATA__
 <html><body>
 <table border=1>
 <tr><td>1<td>first row</td>
 <tr><td>2<td>second row</td>
 </table>
 </body>
 </html>


-- 
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/


------------------------------

Date: Sun, 16 Mar 2008 13:25:04 +0100
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <frj73u.120.1@news.isolution.nl>

ShaunJ wrote:

> As it turns out, there is a bug in
> Perl 5.8.6 (which is shipped with MacOSX 10.4.11 incidentally). Using
> either English or $& causes the memory leak. This bug is fixed in
> 5.10.0.

It is neither a leak nor a bug. Read perldoc perlre. 
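
perlre explains that once $&, $` or $' appears anywhere in a program,
perl has to save that data for every pattern match. A sketch of the
usual way around it, assuming a hypothetical input string: capture the
match explicitly and recover the prefix and suffix from @- and @+.

```perl
use strict;
use warnings;

my $text = "order 12345 shipped";   # hypothetical input

my ($before, $matched, $after);
if ($text =~ /(\d+)/) {
    $matched = $1;                          # instead of $&
    $before  = substr $text, 0, $-[0];      # instead of $`
    $after   = substr $text, $+[0];         # instead of $'
}

print join("|", $before, $matched, $after), "\n";
```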

-- 
Affijn, Ruud

"Gewoon is een tijger."


------------------------------

Date: Sun, 16 Mar 2008 21:52:25 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Matching multiple subexpressions in a regular expression
Message-Id: <frk4qp$1okm$1@agate.berkeley.edu>

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Dr.Ruud
<rvtol+news@isolution.nl>], who wrote in article <frj73u.120.1@news.isolution.nl>:
> ShaunJ wrote:
> 
> > As it turns out, there is a bug in
> > Perl 5.8.6 (which is shipped with MacOSX 10.4.11 incidentally). Using
> > either English or $& causes the memory leak. This bug is fixed in
> > 5.10.0.
> 
> It is neither a leak nor a bug. Read perldoc perlre. 

If you think it is not a bug, please explain what is the purpose of
the stored information.

Thanks,
Ilya


------------------------------

Date: Sun, 16 Mar 2008 18:57:51 +0800
From: "Ela" <ela@yantai.org>
Subject: open a file generated dynamically
Message-Id: <friufk$kdh$1@ijustice.itsc.cuhk.edu.hk>

a file named "summary.####" is generated dynamically where #### is a number 
with unknown digits (i.e. maybe 1 to 9), how can I ask perl to open this 
file for further processing? 




------------------------------

Date: Sun, 16 Mar 2008 11:51:19 GMT
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: open a file generated dynamically
Message-Id: <v72qt395g5g3tjai6al1olpik8egsketjf@4ax.com>

"Ela" <ela@yantai.org> wrote:
>a file named "summary.####" is generated dynamically where #### is a number 
>with unknown digits (i.e. maybe 1 to 9), how can I ask perl to open this 
>file for further processing? 

You will have to determine the actual file name first, e.g. by using glob()
or opendir() and readdir() and then isolating the desired name from the
list.
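
A minimal sketch of the glob() route. The scratch directory and the file
name summary.20080316 are made up so the example can run on its own; in
real use the directory would be wherever the file is generated.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo setup: a scratch directory containing one hypothetical summary file.
my $dir = tempdir(CLEANUP => 1);
open my $mk, '>', "$dir/summary.20080316" or die "create: $!";
print {$mk} "some data\n";
close $mk;

# glob() expands the wildcard; the grep keeps only names that end in
# "summary." followed by 1 to 9 digits.
my @found = grep { /summary\.\d{1,9}\z/ } glob("$dir/summary.*");
die "expected exactly one summary file, found " . @found unless @found == 1;

open my $fh, '<', $found[0] or die "open $found[0]: $!";
my $first_line = <$fh>;
close $fh;
print "read from $found[0]: $first_line";
```

If more than one file can match, the list needs a rule for picking one
(newest, highest number, and so on); the sketch simply dies in that case.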

jue


------------------------------

Date: Sun, 16 Mar 2008 00:09:19 -0700
From: "szr" <szrRE@szromanMO.comVE>
Subject: Re: Pattern extraction
Message-Id: <frih3004c3@news2.newsguy.com>

John W. Krahn wrote:
> szr wrote:
>> Abigail wrote:
>>>                                                          _
>>> Easy piecy!
>>>
>>>    $str       = "/a/b/c/";  # Or "/a/b/c/d/".
>>>    my $answer = substr $str, 5, 1;
>>>
>>>    say $answer;
>>
>> "say" ?
>>
>>    Can't call method "say" without a package or object reference
>>
>>
>> I am curious where this "say" came from?
>>
>>
>> $ perldoc say
>> No documentation found for "say".
>>
>> $ perldoc -f say
>> No documentation for perl function `say' found
>>
>> $ perldoc -q say
>> No documentation for perl FAQ keyword `say' found
>
> say() is a new feature of Perl version 5.10 so you won't see it until
> you upgrade.

Thanks. I had a feeling. I am currently building 5.10.0 right now as I 
type this :-)
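
For reference, a sketch of the quoted snippet as it can be run on 5.10
or later. say is not enabled by default; it needs the feature pragma
(or "use 5.010;", or -E on the command line).

```perl
use strict;
use warnings;
use feature 'say';   # available from Perl 5.10 on

my $str    = "/a/b/c/";
my $answer = substr $str, 5, 1;   # one character at offset 5

say $answer;                      # like print, with a newline appended
```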


-- 
szr 




------------------------------

Date: Sun, 16 Mar 2008 10:10:23 -0700 (PDT)
From: ptilderegex <ptilderegex@gmail.com>
Subject: Re: P~(ptilde) 0.9 released, new scripting language with novel regex
Message-Id: <a7248270-47ca-4bcc-8a94-be16073a6b8e@v3g2000hsc.googlegroups.com>

On Mar 15, 9:52 am, brian d  foy <brian.d....@gmail.com> wrote:
> In article
> <f8b25af9-ff95-40e3-8457-1ebec32ee...@d45g2000hsc.googlegroups.com>,
>
> ptilderegex <ptildere...@gmail.com> wrote:
> > > Perhaps you can list a couple of examples. People in this newsgroup
> > > love showing others how easy it is to get things done with regular
> > > expressions. :)
>
> > A simple example: let's say that in Perl you have a regex but you don't
> > know what it is.  It's held in a string passed by some function and
> > needs to be a parameter.  Now, you want to strip everything but what
> > matches each time.  Or better yet, output what does match to one
> > stream, and output what doesn't match to another (in one pass).
>
> It sounds like most of your problem has little to do with regular
> expressions and more to do with I/O management.
>
>    while( <$fh> )
>       {
>       if( m/$regex/ ) { print $out "$`$'"; print $out2 $& }
>       else            { print $out }
>       }

The point of Ptilde is that you can do these complex stream
transformations of any kind at all in one regex pass.  What you've got
above is a while loop, not a single regex pass.


------------------------------

Date: 16 Mar 2008 10:03:47 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: regular expression with split goes wrong ?
Message-Id: <slrnftps43.t4.abigail@alexandra.abigail.be>

                                                  _
Ben Bullock (benkasminbullock@gmail.com) wrote on VCCCXI September
MCMXCIII in <URL:news:fri5rc$9l4$1@ml.accsnet.ne.jp>:
[]  On Tue, 11 Mar 2008 21:46:38 +0000, Abigail wrote:
[]  
[] > If *both* the pattern *and* the subject (the string matched against) are
[] > not in UTF-8, then, and only then, does \D equal [^0-9].
[] > 
[] > However, if either of them is in UTF-8 format (which does not
[] > necessarily mean they contain a non-ASCII character), then \D excludes a
[] > lot more than just the digits 0 to 9.
[] > 
[] >   $ perl -wE 'chr =~ /[^0-9]/ or $c ++ for 0x00 .. 0xD7FF; say $c'
[] >   10
[] >   $ perl -wE 'chr =~ /\D/ or $c ++ for 0x00 .. 0xD7FF; say $c'
[] >   220
[]  
[]  You need to use (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0.. 0xFFFD) here, 


Nah, all I needed to show was that there were more than 10. ;-)



Abigail
-- 
perl -weprint\<\<EOT\; -eJust -eanother -ePerl -eHacker -eEOT


------------------------------

Date: 16 Mar 2008 10:07:07 GMT
From: Abigail <abigail@abigail.be>
Subject: Re: Variables interpolated in character classes?
Message-Id: <slrnftpsab.t4.abigail@alexandra.abigail.be>

                                           _
Robbie Hatley (lonewolf@well.com) wrote on VCCCXI September MCMXCIII in
<URL:news:YdudnYodkIB0AUHanZ2dnUVZ_tCrnZ2d@giganews.com>:
`'  
`'  I just wrote a Perl program to linkify text files with http URLs,
`'  by generating an html file with the same content, but with the
`'  URLs embedded in a and p elements.  Here's an edited-for-brevity
`'  version:
`'  
`'  # (snip code here for printing opening lines of HTML file)
`'  while (<>)
`'  {
`'     # regex for recognizing URLs:
`'     my $Regex = qr{(s?https?://[[:alnum:];/?:@=&#%$_.+!*'(),-]+)};
`'  
`'     # wrap URLs in "a" and "p" elements, and put them on their own lines:
`'     s{$Regex}{\n<p><a href="\1">\1</a></p>\n}g;
`'  
`'     # Print the edited line:
`'     print ($_);
`'  }
`'  # (snip code here for printing closing lines of HTML file)
`'  
`'  
`'  To my surprise, I was getting error messages like this:
`'  
`'     illegal [] range error i-b in "cgi-bin"
`'  
`'  Huh???  There's no "cgi-bin" in the regex!!!
`'  
`'  Then I realized, the regex contains "$_", which was embedding
`'  the entire line of text to be searched inside the regex!
`'  
`'  I had thought that character classes removed the special
`'  meanings of all characters, with the exception of:
`'    ^ (inverts class; but only when first char.)
`'    - (character range; but only if not first or last char.)
`'    \ (for escaping ^ and -)

It does.

However, interpolation goes first. So, first $_ is interpolated, then
any [] parsing is done. If it then finds a $, it's just a dollar sign.

`'  I got the program to work by replacing "$_" with "\$_",
`'  and by moving the declaration of $Regex to top of program
`'  to prevent having to recompile it every iteration.
`'  But I'm still puzzled as to why I have to escape the $.
`'  Don't character classes prevent variable interpolation?

Nope.
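
A small sketch of the order of events described above, with a made-up
value in $_ standing in for the line being processed:

```perl
use strict;
use warnings;

# Hypothetical line content; in the original program this is the input line.
$_ = "xyz";

my $interpolated = qr/[a$_]/;    # $_ is interpolated first: class is [axyz]
my $escaped      = qr/[a\$_]/;   # \$ stays literal: class is a, $ and _

print "interpolated class matches 'y'\n"   if     "y" =~ $interpolated;
print "escaped class matches '\$'\n"       if     '$' =~ $escaped;
print "escaped class does not match 'y'\n" unless "y" =~ $escaped;
```

With a line containing "cgi-bin" in $_, the first form turns "i-b" into
a range inside the class, which is where the error message came from.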


Abigail
-- 
perl -we '$| = 1; $_ = "Just another Perl Hacker\n";  print
          substr  $_ => 0, 1 => "" while $_ && sleep 1 => 1'


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1366
***************************************

