[28863] in Perl-Users-Digest


Perl-Users Digest, Issue: 107 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 9 22:21:06 2007

Date: Fri, 9 Feb 2007 19:20:31 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 9 Feb 2007     Volume: 11 Number: 107

Today's topics:
    Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ? sl123@netherlands.area
    Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ? <uri@stemsystems.com>
    Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ? <bik.mido@tiscalinet.it>
    Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ? <spamtrap@dot-app.org>
    Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ? <1usa@llenroc.ude.invalid>
        I need get many results from regular express <chinuy@gmail.com>
    Re: I need get many results from regular express <someone@example.com>
    Re: I need get many results from regular express <thepoet_nospam@arcor.de>
    Re: I need get many results from regular express <bik.mido@tiscalinet.it>
    Re: I need get many results from regular express <wahab-mail@gmx.de>
    Re: I need get many results from regular express <wahab-mail@gmx.de>
        indexing large collection of HTML files <woland99@gmail.com>
    Re: indexing large collection of HTML files xhoster@gmail.com
        Installation of Lingua:Stem fails -> Lingua-Stem-Snowba <g.h.vandoorn@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 04 Feb 2007 00:30:51 -0800
From: sl123@netherlands.area
Subject: Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
Message-Id: <ep1bs2l6e7k3aupj46d4s079o7g3rqqkj4@4ax.com>

On Fri, 02 Feb 2007 03:03:40 -0500, Uri Guttman <uri@stemsystems.com> wrote:

>>>>>> "s" == sl123  <sl123@netherlands.area> writes:
>
>  s> On Fri, 02 Feb 2007 00:10:46 -0500, Uri Guttman <uri@stemsystems.com> wrote:
>  >>>>>>> "s" == sl123  <sl123@netherlands.area> writes:
>  >> 
>  >> 
>  s> If you are stream processing and as you say spread over more than 1 line,
>  s> use ROBIC0's approach, buffer until you have complete comment or cdata:
>  >> 
>  >> ok, the first time you backed his code i asked you if it was a
>  >> joke. obviously you think it is real code. so i will shred this crap and
>  >> hopefully you will see why it is bad code.
>  >> 
>  s> $RxParseXP1 = qr/(?:<(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)>)/s;
>  s> #                (  <(           0   0    )|(    1       1  ) )
>  >> 
>  >> why no /x modifier? long and complex regexes always should use /x.
>
>  s> Not sure it's relevant here
>
>then you don't know what /x is for.
>
>  >> 
>  s> while (!$done)
>  >> 
>  >> loop flags like that are very silly and kiddie code. 
>
>  s> perhaps a goto is something more to your liking?
>
>no. better code is to my liking. i can't recall needing a loop flag in
>perl. that is more like basic. with all the long/deep nested if/elses
>you can't track the logic at all.
>
>  s> # stream processing (if not buffered)
>  s> if (!$BUFFERED) {
>  >> 
>  >> upper case variable names are for constants and such. notice he likes !
>  >> all over the place. better to use unless and until.
>
>  s> From looking at code BUFFERED is a very important
>
>important? one boolean test does not make something important.
>
>  >> 
>  s> if (!($_ = <$markup_file>)) {
>  >> 
>  >> that is just bad. use a named lexical instead of $_. what if the last
>  >> line of the file was just '0' with no newline? that fails.
>
>  s> I don't think this is true
>
>then you don't know perl. sorry.
>
>  >> 
>  s> # just parse what we have
>  s> $done = 1;
>  >> 
>  >> just bad. really bad to use flags like that. you don't even know what
>  >> more code might execute here because the ridiculously long if/else blocks.
>
>  s> huh?
>
>forget it. if you think loop flags are good (especially when done like
>this) then please don't code in perl. 
>
>  >> 
>  s> # boundary check for runaway
>  s> if (($complete_comment+$complete_cdata) > 0) {
>  >> 
>  >> huh??? is he testing 2 flags for either one being set? has the idiot
>  >> ever heard of a boolean or??
>
>  s> Checking this, the addition and comparison here is half the
>  s> assembly instruction cycles than to do a double comparison with two
>  s> jmp's. Could be a performance thing
>
>assembly? what does that have to do with perl? perl is compiled to an
>internal form which is interpreted. actually that code is slower in perl
>than a boolean test for several reasons. but you won't understand them
>so i won't cover it.
>
>  >> 
>  s> $ln_cnt--;
>  >> we lose lines too??
>
>  s> what is this variable? have you checked?
>
>looks like line count abbreviated. if it isn't, it is named poorly. if
>it is, decrementing a line count makes no sense. but you may understand
>it. i won't delve into the logic as i said the code is too bad to
>bother.
>
>  >> but what about the done flag?? don't mix loop flags with flow control
>  >> ops. just dumb. pick one style.
>
>  s> It looks like he uses the fall-through method, avoiding the extra
>  s> machine cycles.  I think the $done flag is set elsewhere as well
>
>you again don't understand perl and machine cycles. perl HAS NO MACHINE
>CYCLES. it has operations in an op loop. you don't optimize perl that
>way. and this code could be optimized, simplified and made much better
>with a decent design without the stupid loop flag and all those
>if/elses.
>
>would you believe i have a 10k line perl system with about 25 else
>clauses in total? and it is very clean and readable code throughout. not
>hard to do at all if you know perl and coding. eschew else is my new motto.
>
>  s> ## flag serialized comments/cdata buffering
>  s> if (/(<!--)|(<!\[CDATA\[)/)
>  >> 
>  >> more complex regexes that need /x
>
>  s> not sure it's relevant for this, could be right, though with no effect
>
>effect? what are you babbling about? /x HAS NO EFFECT on regexes. it is
>not meant to have any effect (actually it does on the syntax but not
>worth covering here). it is meant for CLARITY. but the author knows not
>of that.
>
>  >> the nesting here is getting very deep. a sign of someone who loves
>  >> if/else too much. a cleaner design would be simpler, easier to read and
>  >> understand.
>
>  s> The if/then/else construct can't be avoided. The machining can be
>  s> trimmed.  He did a good job thinning with single if's comparisons
>  s> to make a path back.  I don't see any faster constructs than what
>  s> he's got
>
>it can easily be simplified. just a poor design requires all that
>if/else stuff. read what i said above with eschew else. this code is
>done in basic style.
>
>  >> 
>  s> }
>  s> }
>  s> elsif (defined $2) { # complete cdata
>  >> 
>  >> why check those two separately but match in one regex? just test one
>  >> pattern and handle it or test the other. this is bullshit code. way
>  >> longer and more complex than it needs to be.
>
>  s> Yea its either one or the other. Did he expect more?
> 
>huh?? i was suggesting how to clean up that mess. match and test one,
>then match and test the other. the way he did it is longer, slower,
>clunkier. 
>
>  >> 
>  s> if ($$ref_parse_ln !~
>  s> /<!\[CDATA\[.*?\]\]>/s)
>  s> {
>  >> 
>  >> he uses ! all over and here he switches to !~ which is rarely used.
>
>  s> Looks like he positively wanted to know if that was case. Looks alright
>
>ok, you have blinders on. i give up.
>
>  >> can you even tell what this is the else for? it scrolled off my screen.
>
>  s> what "else" are you looking at. Apparently, />/ is like a period at
>  s> the end of a sentence. Otherwise better to not do anything at the
>  s> termination of this block or will be trouble, you are in the middle
>  s> of a sentence
>
>you don't get it. 
>
> 
>  >> 
>  s> } else {
>  s> $ln_cnt = 1;
>  s> $done = 1; 
>  >> 
>  >> now we're done? do we exit the loop here? who knows? i have to scroll up
>  >> and find the if and see if that falls through or what??
>
>  s> To me it looks like if its streamed (not BUFFERED) he is waiting
>  s> for a complete sentence with the />/ to pass through to the formal
>  s> parser below this. If it is BUFFERED, means the complete file is
>  s> passed to the formal parser. Streaming, it looks like he waits for
>  s> a complete sentence, parses, goes back up top, gets another, etc...
>
>insane. you can merge both into one flow with no troubles at all. done
>all the time in parsers. you make the input an iterator or sub that
>works on the stream or the text. reduces his code by half as there is no
>need for such a long if/else block.
>
>  s> When you have it buffered from above (or already buffered), parse it:
>  >> 
>  s> ## REGEX Parsing loop
>  s> while ($$ref_parse_ln =~ /$RxParseXP1/g)
>  s> {
>  s> ## CDATA
>  s> if (defined $0) {
>  >> 
>  >> oh that is wonderful. a true bug by the way. $0 is the name of the
>  >> program and not a grabbed match. not that he does any work in this if
>
>  s> My mistake, I transposed his variables, should have been $1, $2
>
>huh? what transpose? this is your code or his? 
>  >> 
>  s> }
>  s> ## COMMENT
>  s> elsif (defined $1) {
>  >> 
>  >> again, an empty else clause. so what was the purpose of this if/else?
>  >> showing off more bad code is my guess.
>
>  s> I just meant to condense his code for an example. The unfilled block
>  s> is left as an exercise
>
>i prefer to read real code, not empty clauses.
>
>
>  s> He's probably psychotic or genius, I don't know of him. The code
>  s> looks good though.  It's possible that the "order" of the parse
>  s> regex is important. I think it is.  Here:
>
>i will bet the house on psychotic. look at his posting history
>here. rants and drools and flames all over the place. no one here
>respects his code at all.
>
>$done = 1 ;
>
>uri
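
The quoted point about a final line of '0' with no newline can be checked with a small sketch. The sample text and the helper name count_lines are assumptions made for illustration; they are not taken from the code under discussion. The sketch reads the same in-memory "file" twice, once with a plain truth test on the line read and once with a defined test:

```perl
use strict;
use warnings;

# Hypothetical input: the last line is a bare "0" with no trailing newline.
my $text = "first\n0";

# Count the lines a read loop sees under a given "keep going?" test.
sub count_lines {
    my ($keep_going) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $count = 0;
    while (1) {
        my $line = <$fh>;                 # undef only at end of file
        last unless $keep_going->($line);
        $count++;
    }
    close $fh;
    return $count;
}

my $by_truth   = count_lines(sub { $_[0] });          # plain truth test
my $by_defined = count_lines(sub { defined $_[0] });  # end-of-file test

print "truth test saw $by_truth line(s), defined test saw $by_defined\n";
```

The string "0" is false in Perl, so the truth test stops one line early, while the defined test only stops at end of file.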

Hello again uri!

Finally able to get back and read some posts.
I just have a few comments on your reply.

First off, I'm no Perl expert so I didn't judge his code in that.
I have used it with success at work and it's not bad for me.
I had to add to his handler setup and a couple of other things.
Overall though the core parse doesn't show conceptual errors,
but it isn't always compatible with old HTML.

You mention a dislike for the "else" clause, especially when nesting,
and that you only have 25 of them in 10k lines of code. If one can avoid
an else with a return in the conditional block (or continue) it should be
done, no question. But you may want to rethink the belief that
nested code is being touched every time. No matter how the constructs are
tabularized through indirection (compiled), the end result is machine
jump code being executed, whether it's a relative (conditional) or absolute jump.
Are your code paths absent of jumps? Conditional jumps are the fastest
instruction per effectiveness a CPU can do.

I don't know of Perl "op" loops you talk about. I would have to imagine
them being the translation table into an execution processor (interpreter)
creating code pages for the processor. That again produces a binary based
on the "op" codes of the underlying processor, when executed.
To me, an "op code", from back in my disassembly days, is the character
translation of the assembly major "operation" into its binary code from the
table.

Perl provides a C like language construct. I hope its a close proximity in its
constructs in performance, otherwise it should not be.

--

Finally, Perl can't do pointers that I know of. Pointers offer more
granularity in code design when it comes to parsers, for instance, xml
is highly controlled by escape codes, albeit ascii ones. If one could
write code on that level, jump if >,!=,<,== based on the result in the
accumulator, then it's the fastest possible. But a string comparison is
really out of the question.

Apparently in C, you can parse xml/html, using a more granular approach
with pointers and bit-mapped state variables, as it pertains to control
characters. This however is not possible in Perl. As well I don't see
pointer arithmetic being available.

To me it looks like this guy did OK. You could probably make his code
more efficient. Sounds like a lot of folks here don't like him much



------------------------------

Date: Sun, 04 Feb 2007 16:02:17 -0500
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
Message-Id: <x7y7ndd4bq.fsf@mail.sysarch.com>

>>>>> "s" == sl123  <sl123@netherlands.area> writes:


  s> First off, I'm no Perl expert so I didn't judge his code in that.
  s> I have used it with success at work and it's not bad for me.
  s> I had to add to his handler setup and a couple of other things.
  s> Overall though the core parse doesn't show conceptual errors,
  s> but it isn't always compatible with old HTML.

  s> You mention a dislike for the "else" clause, especially when
  s> nesting, and that you only have 25 of them in 10k lines of code.
  s> If one can avoid an else with a return in the conditional block
  s> (or continue) it should be done, no question. But you may want to
  s> rethink the belief that nested code is being touched every time.
  s> No matter how the constructs are tabularized through indirection
  s> (compiled), the end result is machine jump code being executed,
  s> whether it's a relative (conditional) or absolute jump. Are your
  s> code paths absent of jumps? Conditional jumps are the fastest
  s> instruction per effectiveness a CPU can do.


you are very lost here. machine code and perl execution of conditionals
have ABSOLUTELY nothing in common. the layers separating perl from
machine code are deep and nasty. you obviously don't know about how
interpreters are written. please stop this illogical line you seem to
think matters. the key bottleneck in perl is the op code dispatch loop
and not any particular machine instruction. you don't seem to realize how
loops/branches are done in perl and that they are about the same speed
as most builtin simple ops. the machine jumps are not even on the radar
at that level. please study some interpreter designs and learn about
them. this machine language babble of yours is so off the mark it is not
funny.


  s> I don't know of Perl "op" loops you talk about. I would have to
  s> imagine them being the translation table into an execution
  s> processor (interpreter) creating code pages for the processor. That
  s> again produces a binary based on the "op" codes of the underlying
  s> processor, when executed.  To me, an "op code", from back in my
  s> disassembly days, is the character translation of the assembly
  s> major "operation" into its binary code from the table.

no no no no. since you don't know about perl's guts why do you insist on
talking about them at a machine level? the machine is several levels
down from perl's source and you could never tell what machine code is
being executed for any perl op. and as i keep telling you perl ops are
way larger in cpu usage than any single machine instruction. your
assembler background is useless in understanding perl optimization.

  s> Perl provides a C like language construct. I hope its a close
  s> proximity in its constructs in performance, otherwise it should not
  s> be.

it has no closeness to the metal at all. you don't understand
interpreter design at all. except for a few special cases which do some
JIT code generation, none generate any machine code directly. and even
those that do are not at the same level as hand written c.

  s> Finally, Perl can't do pointers that I know of. Pointers offer more
  s> granularity in code design when it comes to parsers, for instance, xml
  s> is highly controlled by escape codes, albeit ascii ones. If one could
  s> write code on that level, jump if >,!=,<,== based on the result in the
  s> accumulator, then it's the fastest possible. But a string comparison is
  s> really out of the question.

again, you don't know what you are talking about. perl has references
and can do most anything with them that c could except for pointer math
and that perl's references are safer and can't cause core dumps. i will
take refs over pointers any day.
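
A minimal sketch of what references cover, under assumed names (@tokens, %handler_for and the token strings are hypothetical, chosen only to echo the comment/cdata topic of this thread). It shows a reference to an array, modification through that reference, and a hash of code references used as a dispatch table:

```perl
use strict;
use warnings;

my @tokens = ('<!--', 'note', '-->');
my $aref   = \@tokens;                 # reference to an existing array
push @$aref, '<![CDATA[';              # modifies @tokens through the reference

my %handler_for = (                    # hash of code references (dispatch table)
    comment => sub { "comment: $_[0]" },
    cdata   => sub { "cdata: $_[0]" },
);

my $first  = $aref->[0];                          # element access via the reference
my $report = $handler_for{comment}->($aref->[1]); # call through a code reference
my $count  = scalar @tokens;                      # sees the element pushed via $aref

print "$first | $report | $count tokens\n";
```

There is no arithmetic on $aref itself; positions inside a string or array are handled with ordinary indexes and offsets instead.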

  s> Apparently in C, you can parse xml/html, using a more granular approach
  s> with pointers and bit-mapped state variables, as it pertains to control
  s> characters. This however is not possible in Perl. As well I don't see
  s> pointer arithmetic being available.

huh??? you are making no sense. there are many ways to parse
anything. the issue is that robic's parser is BAD and BUGGY perl. it may
work for your special cases but in the general sense it is broken in
many ways. it could be optimized in many ways, cleaned up, whatever but
it is crappy code. you are probably the only user of it besides its
psychotic author. just wait until you try to communicate with him for a
bug fix or improvement

  s> To me it looks like this guy did OK. You could probably make his code
  s> more efficient. Sounds like a lot of folks here don't like him much

your approval means little. you have not shown any understanding of good
perl code, perl optimization, interpreter design, parser design,
etc. this means you are not experienced enough to properly give a
professional opinion on that module. as for people not liking him,
please google for his past posting and you will see why. as for his
code, i am not the only one who thinks it is crap. you are the only one
who likes it. think about that.

uri


-- 
Uri Guttman  ------  uri@stemsystems.com  -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org


------------------------------

Date: Sun, 04 Feb 2007 22:23:17 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
Message-Id: <ujjcs213679u04oqpcgtb19ja14onadidt@4ax.com>

On Sun, 04 Feb 2007 00:30:51 -0800, sl123@netherlands.area wrote:

>To me it looks like this guy did OK. You could probably make his code
>more efficient. Sounds like a lot of folks here don't like him much

Nobody likes robic0! (See <http://www.snpp.com/episodes/4F01.html>.)


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Sun, 04 Feb 2007 16:54:28 -0500
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
Message-Id: <m2veih8u7f.fsf@Sherm-Pendleys-Computer.local>

sl123@netherlands.area writes:

> Hello again uri!
>
> Finally able to get back and read some posts.
> I just have a few comments on your reply.
>
> First off, I'm no Perl expert

Then why are you arguing with someone who *is* an expert?

> I have used it with success at work and it's not bad for me.

The fact that you don't understand the many, many problems in Robic0's
code doesn't make those problems disappear.

> To me it looks like this guy did OK.

That doesn't indicate that the code is OK, it indicates that you need to
learn more Perl, so you can understand why it's *not* good code.

> Sounds like a lot of folks here don't like him much

What do you expect? He kept posting crap code, irrelevant comments about
the size of his "manhood" and drinking capacity, and he had a habit of
posting thousands of lines of pure gibberish - why on Earth *would* we
like someone who behaved like that?

Anyway, so what? Even if he were everyone's best buddy, that wouldn't make
his code any better.

sherm--

-- 
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net


------------------------------

Date: Sun, 04 Feb 2007 22:51:48 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: HTML:Parser how to remove "//<![CDATA[ ... //]]>" ?
Message-Id: <Xns98CDB5FC02C50asu1cornelledu@127.0.0.1>

sl123@netherlands.area wrote in
news:ep1bs2l6e7k3aupj46d4s079o7g3rqqkj4@4ax.com: 

[ snip tribute to robic0 ]

>  First off, I'm no Perl expert

More like you don't know the first thing about programming anything.

> ...
> I have used it with success at work and it's not bad for me.

I pity your employer or the poor sod who will have to maintain the crap 
you wrote.

 ...

> To me it looks like this guy did OK.

*PLONK*
-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: 5 Feb 2007 07:14:40 -0800
From: "inuy" <chinuy@gmail.com>
Subject: I need get many results from regular express
Message-Id: <1170688480.004472.147310@p10g2000cwp.googlegroups.com>

Hello ,

I have a regular express like this:
(...)(..)\s*(.)(...)....(\d*)....

And I want to store some information from it , I can use $data[1] =
$1;$data[2]=$2; $data[3]=$3 an so on.
But if there are more than ten data I need to store and I don't like
to write them by hand , I hope to store them by for loop or what else.

So I write this "wrong" code:

/(...)(..)\s*(.)(...)....(\d*)..../;
for $i (  1..10 ){
  $data[$i] = ${$i};
}

Obviously , it doesn't work.
Could somebody knows how to deal with this kind of problem?Thank you.



------------------------------

Date: Mon, 05 Feb 2007 15:39:20 GMT
From: "John W. Krahn" <someone@example.com>
Subject: Re: I need get many results from regular express
Message-Id: <IcIxh.37057$Y6.1465@edtnps89>

inuy wrote:
> 
> I have a regular express like this:
> (...)(..)\s*(.)(...)....(\d*)....
> 
> And I want to store some information from it , I can use $data[1] =
> $1;$data[2]=$2; $data[3]=$3 an so on.
> But if there are more than ten data I need to store and I don't like
> to write them by hand , I hope to store them by for loop or what else.
> 
> So I write this "wrong" code:
> 
> /(...)(..)\s*(.)(...)....(\d*)..../;
> for $i (  1..10 ){
>   $data[$i] = ${$i};
> }
> 
> Obviously , it doesn't work.
> Could somebody knows how to deal with this kind of problem?Thank you.

my @data = /(...)(..)\s*(.)(...)....(\d*)..../;
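
A slightly fuller sketch of the same idea, with the match bound to a named variable and checked for success. The sample string in $line is a made-up record chosen to fit the pattern, not data from the original post:

```perl
use strict;
use warnings;

my $line = 'ABCDE FGHIxxxx123yyyy';   # hypothetical sample record

# In list context the match returns the captured groups ($1, $2, ...).
my @data = $line =~ /(...)(..)\s*(.)(...)....(\d*)..../;
die "no match\n" unless @data;        # a failed match returns an empty list

print join(',', @data), "\n";
```

Note that the list is 0-based, so $data[0] holds what $1 held; code that expects $data[1] to be $1, as in the original question, would need to shift the indexes.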



John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.       -- Larry Wall


------------------------------

Date: Mon, 05 Feb 2007 16:39:54 +0100
From: Christian Winter <thepoet_nospam@arcor.de>
Subject: Re: I need get many results from regular express
Message-Id: <45c74fca$0$27616$9b4e6d93@newsspool2.arcor-online.net>

inuy wrote:
> Hello ,
> 
> I have a regular express like this:
> (...)(..)\s*(.)(...)....(\d*)....
> 
> And I want to store some information from it , I can use $data[1] =
> $1;$data[2]=$2; $data[3]=$3 an so on.
> But if there are more than ten data I need to store and I don't like
> to write them by hand , I hope to store them by for loop or what else.
> 
> So I write this "wrong" code:
> 
> /(...)(..)\s*(.)(...)....(\d*)..../;
> for $i (  1..10 ){
>   $data[$i] = ${$i};
> }
> 
> Obviously , it doesn't work.
> Could somebody knows how to deal with this kind of problem?Thank you.

You can simply capture the groups of the pattern match in an array like

my $a = "this string does match, at least partly";
my @hits = $a =~ /(string)\s\S+\s(\w+)/;
print $_.$/ for( @hits );

This is documented in "perldoc perlop", under the headline of
"Regexp Quote-Like Operators":
[snip]
"m//" in list context returns a
list consisting of the subexpressions matched by the parentheses
in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc.
are also set, and that this differs from Perl 4's behavior.)
[snap]

-Chris


------------------------------

Date: Mon, 05 Feb 2007 17:33:31 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: I need get many results from regular express
Message-Id: <lsmes2ltij5e10s121nn757ievu4g541b1@4ax.com>

On 5 Feb 2007 07:14:40 -0800, "inuy" <chinuy@gmail.com> wrote:

>/(...)(..)\s*(.)(...)....(\d*)..../;
>for $i (  1..10 ){
>  $data[$i] = ${$i};
>}
>
>Obviously , it doesn't work.

Besides the fact that it's horrible, and that others rightfully
explained to you how to use the return value of a match in list
context, it *should* work, if you're not under strict 'refs'. Of
course you must *always* be under strict 'refs', except when it's
really necessary not to, which certainly is not the case here.


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Mon, 05 Feb 2007 22:26:36 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: I need get many results from regular express
Message-Id: <eq87oc$hsd$1@mlucom4.urz.uni-halle.de>

inuy wrote:
> I have a regular express like this:
> (...)(..)\s*(.)(...)....(\d*)....
> 
> And I want to store some information from it , I can use $data[1] =
> $1;$data[2]=$2; $data[3]=$3 an so on.
> But if there are more than ten data I need to store and I don't like
> to write them by hand , I hope to store them by for loop or what else.
> 
> So I write this "wrong" code:
> 
> /(...)(..)\s*(.)(...)....(\d*)..../;
> for $i (  1..10 ){
>   $data[$i] = ${$i};
> }

You got lots of correct advice
already - how to pull the results
into an array.

In the end, there's the answer
left how your wrong approach
*would* have worked at least ;-)

My shot at it:

  ...
  use strict;

  my $str=' 11 22 33 44 55 66 77 88 99 110 111 112 113 114 115 116 ';
  my $reg='(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+'
         .'(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+';

  if( $str =~ /$reg/ ) {
    for my $i (1 .. @--1) {
       no strict 'refs';
       print "\$$i", " => ", ${"$i"}, "\n"
    }
  }
  ...


But that's only for educational purposes.
Don't use things like that ...

Regards

Mirco


------------------------------

Date: Mon, 05 Feb 2007 22:21:27 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: I need get many results from regular express
Message-Id: <eq88ka$i7b$1@mlucom4.urz.uni-halle.de>

inuy wrote:
> I have a regular express like this:
> (...)(..)\s*(.)(...)....(\d*)....
> And I want to store some information from it , I can use $data[1] =
> $1;$data[2]=$2; $data[3]=$3 an so on.
> But if there are more than ten data I need to store and I don't like
> to write them by hand , I hope to store them by for loop or what else.
> 
> /(...)(..)\s*(.)(...)....(\d*)..../;
> for $i (  1..10 ){
>   $data[$i] = ${$i};
> }
> Obviously , it doesn't work.

You got lots of correct advice
already - how to pull the results
into an array.

In the end, there's the answer left
how your wrong approach *would* have
worked under 'strict' at least ;-)

My shot at it:


  ...
  use strict;

  my $str=' 11 22 33 44 55 66 77 88 99 110 111 112 113 114 115 116 ';
  my $reg='(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+'
         .'(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\D+';

  my @data;

  if( $str =~ /$reg/ ) {
    for my $i (1..@--1) {
       $data[$i] = substr $str,$-[$i],$+[$i]-$-[$i];
       print "\$$i => $data[$i]\n"
    }
  }
  ...


But that's only for educational purposes.
Don't use things like that ...

To find out what the @- and @+ ($-[..], $+[..])
things do (in substr $str,$-[$i],$+[$i]-$-[$i]),
check out LAST_MATCH_END and LAST_MATCH_START
from 'perldoc perlvar'.
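
As a smaller illustration of the same offsets, here is a sketch on an assumed two-group pattern (the string 'key=value' and the names @groups, $start and $end are hypothetical). $-[$i] is the start offset of group $i and $+[$i] is the offset just past its end, so their difference is the group's length:

```perl
use strict;
use warnings;

my $str = 'key=value';                # hypothetical input
my @groups;

if ($str =~ /(\w+)=(\w+)/) {
    # $#- is the index of the last matched group, same as @- - 1 above.
    for my $i (1 .. $#-) {
        my ($start, $end) = ($-[$i], $+[$i]);
        push @groups, substr($str, $start, $end - $start);
    }
}

print "@groups\n";
```

Element 0 of both arrays describes the whole match, which is why the loop starts at 1.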

Regards

Mirco


------------------------------

Date: 6 Feb 2007 08:47:00 -0800
From: "Woland99" <woland99@gmail.com>
Subject: indexing large collection of HTML files
Message-Id: <1170780420.481990.91740@p10g2000cwp.googlegroups.com>

Howdy - is there a module I can use to index collection of about 3000
HTML files?
Sorry if that is trivial question - I just have no experience with
creating indexes -
even some relevant keywords would be helpful.

JT



------------------------------

Date: 06 Feb 2007 17:21:25 GMT
From: xhoster@gmail.com
Subject: Re: indexing large collection of HTML files
Message-Id: <20070206122218.627$aY@newsreader.com>

"Woland99" <woland99@gmail.com> wrote:
> Howdy - is there a module I can use to index collection of about 3000
> HTML files?
> Sorry if that is trivial question - I just have no experience with
> creating indexes -
> even some relevant keywords would be helpful.

Have you looked at the HTML::Index module?
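
For the keyword side of the question, the usual term is "inverted index": a map from each word to the files that contain it. The sketch below is only an illustration of that data structure and does not use or describe the HTML::Index API. The file names, the sample markup and the helper files_for are assumptions, and the tag stripping is deliberately crude; real pages would be better read through a proper parser such as HTML::Parser.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for file contents; real code would read each file.
my %html_for = (
    'a.html' => '<html><body><h1>Perl regex</h1><p>Match in list context.</p></body></html>',
    'b.html' => '<html><body><p>Perl references and context.</p></body></html>',
);

# Inverted index: word => { file name => occurrence count }
my %index;
for my $file (sort keys %html_for) {
    (my $text = $html_for{$file}) =~ s/<[^>]*>/ /g;   # crude tag stripping, illustration only
    $index{lc $1}{$file}++ while $text =~ /(\w+)/g;   # record every word, lowercased
}

# Look up which files contain a given word (case-insensitive).
sub files_for {
    my ($word) = @_;
    return sort keys %{ $index{lc $word} || {} };
}

print join(', ', files_for('context')), "\n";
```

With about 3000 files the same structure would normally be saved to disk rather than rebuilt on every search.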

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 6 Feb 2007 03:55:28 -0800
From: "Gerwin" <g.h.vandoorn@gmail.com>
Subject: Installation of Lingua:Stem fails -> Lingua-Stem-Snowball-Da already installed...
Message-Id: <1170762928.096709.48250@v45g2000cwv.googlegroups.com>

When I try to install Lingua::Stem I get the following message (and
error):

Note that I do not have Lingua-Stem-Snowball-Da-1.01 installed and I
cannot find any of the modules in the site directory after this error
occurs. Using --force does not help. Does anybody know how to install
this?

Downloading Lingua-Stem-0.82...done
Downloading Lingua-Stem-It-0.01...done
Downloading Lingua-Stem-Ru-0.01...done
Downloading Lingua-Stem-Snowball-Da-1.01...done
Downloading Lingua-PT-Stemmer-0.01...done
Downloading Snowball-Swedish-1.01...done
Downloading Lingua-Stem-Fr-0.02...done
Downloading Text-German-0.06...done
Downloading Snowball-Norwegian-1.0...done
Unpacking Lingua-Stem-0.82...done
Unpacking Lingua-Stem-It-0.01...done
Unpacking Lingua-Stem-Ru-0.01...done
Unpacking Lingua-Stem-Snowball-Da-1.01...done
Unpacking Lingua-PT-Stemmer-0.01...done
Unpacking Snowball-Swedish-1.01...done
Unpacking Lingua-Stem-Fr-0.02...done
Unpacking Text-German-0.06...done
Unpacking Snowball-Norwegian-1.0...done
Generating HTML for Lingua-Stem-0.82...done
Generating HTML for Lingua-Stem-It-0.01...done
Generating HTML for Lingua-Stem-Ru-0.01...done
Generating HTML for Lingua-Stem-Snowball-Da-1.01...done
Generating HTML for Lingua-PT-Stemmer-0.01...done
Generating HTML for Snowball-Swedish-1.01...done
Generating HTML for Lingua-Stem-Fr-0.02...done
Generating HTML for Text-German-0.06...done
Generating HTML for Snowball-Norwegian-1.0...done
Updating files in site area...failed
ppm install failed: File conflict for 'C:/Perl/site/lib/Lingua/Stem/Snowball/stemmer.pl'.
    The package Lingua-Stem-Snowball-Da has already installed a file that package Snowball-Swedish wants to install. Uninstall Lingua-Stem-Snowball-Da, or use --force to allow files to be overwritten.



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 107
**************************************
