[30769] in Perl-Users-Digest


Perl-Users Digest, Issue: 2014 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Nov 28 21:14:46 2008

Date: Fri, 28 Nov 2008 18:14:24 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 28 Nov 2008     Volume: 11 Number: 2014

Today's topics:
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <jurgenex@hotmail.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? <tim@burlyhost.com>
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
    Re: FAQ 9.4 How do I remove HTML from a string? sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 28 Nov 2008 22:25:04 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <hin0j4dknpfsa9r7vtm985udkh75qmk8nm@4ax.com>

On Thu, 27 Nov 2008 00:03:01 -0800, PerlFAQ Server <brian@stonehenge.com> wrote:

>This is an excerpt from the latest version perlfaq9.pod, which
>comes with the standard Perl distribution. These postings aim to 
>reduce the number of repeated questions as well as allow the community
>to review and update the answers. The latest version of the complete
>perlfaq is at http://faq.perl.org .
>
>--------------------------------------------------------------------
>
>9.4: How do I remove HTML from a string?
                ^^
What does this really mean anyway? Surely, html is a string to be removed
it must be. Is HTML a whole entity? Where does it begin and end?
A misnomer at best. Please re-phrase this. Giant 'HTML' is daunting enough.
Are you trying to scare people? I say 'YES', but why is that?
HTML is apparently unformatted and wide open. What is markup then, delimiters?
If it's delimiters, then it has nothing to do with Perl, because Perl can't do
delimiters because Perl is not consistent with strings, tic's or magical stuff.
Perl is a little too afraid of HTML. It cowers in its presence, just a pussy.
Perl then, should not be considered as anything useful at all in the face of
the simplest string tasks. Just a wimp, a lightweight, it won't survive the next
5 years. Perl's first duty was to tack on an interface to C libraries, because
it is nothing without them. Man-up Perl or gtfo.. why even use it then?
At best it's a prototype test language, certainly not for production code.
>
>    The most correct way (albeit not the fastest) is to use HTML::Parser
         ^^^^^^^^^^^^
I really like those two words together. More fear tactics.

>    from CPAN. Another mostly correct way is to use HTML::FormatText which
                        ^^^^^^^^^^^^^^
Slightly less fearful expression.

>    not only removes HTML but also attempts to do a little simple formatting
>    of the resulting plain text.
>
>    Many folks attempt a simple-minded regular expression approach, like
                          ^^^^^^^^^^^^^
Not just a 'simple' approach, but a dumb-ass approach. Now it's fear and stupid-ass,
can it get any worse? I think the stupid-ass should come before the fear.
Insult the reader from the outset, don't hide it until he's quivering in fear.

>    "s/<.*?>//g", but that fails in many cases because the tags may continue
>    over line breaks,

That's good to know, gee when are regular expressions going to be able to process
past line breaks?

> they may contain quoted angle-brackets, or HTML
>    comment may be present. Plus, folks forget to convert entities--like
>    "&lt;" for example.

In 25 or so words you managed to hijack one quarter of the specification
buzz words without actually knowing anything you just said.
>
>    Here's one "simple-minded" approach, that works for most files:
                 ^^^^^^^^^^^^^
More attacks. If this is one, where are the others?
Bring out the really bad stuff.
>
>        #!/usr/bin/perl -p0777
>        s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
>
>    If you want a more complete solution,

This is lovely, who is the consumer here? Some dumb-ass?

> see the 3-stage striphtml program
>    in http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz .
>
>    Here are some tricky cases that you should think about when picking a
>    solution:

Unfortunately, you have not thought about it. What solution are you selling?
One some dumb-ass can't do with regular expressions I bet.

>
>        <IMG SRC = "foo.gif" ALT = "A > B">

Sorry, on my regular expression meter this gives me:
 IMG
    SRC = foo.gif
    ALT = A > B
>
>        <IMG SRC = "foo.gif"
>             ALT = "A > B">
>
So does this
 IMG
    SRC = foo.gif
    ALT = A > B

>        <!-- <A comment> -->

Yeah, this is easy
 <A comment> 
>
>        <script>if (a<b && a>c)</script>
         ^^^^^^^^
outside of a comment, or encapsulated <![ conditional or CDATA
this gives me a script tag.
>        <script>if (a<b && a>c)</script>
                                ^^^^^^^^^
So does this.

>        <script>if (a<b && a>c)</script>
                      ^^^^^^^^
This however, is not markup. Its dependency is on 'script'.
Not very delimiter-friendly. Which means you actually have to
act on a tag keyword. But if the closed keyword is absent?
Oops, sorry, the rest of the text is script.
It's just not squash.

>
>        <# Just data #>

Just character data.

>
>        <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

Ahh, now this is usually XML and in the DTD, it's a conditional, not CDATA.
Funny how you blast 'CDATA' in the string. Who was this line meant for?
>
>    If HTML comments include other tags, those solutions would also break on
>    text like this:
>
>        <!-- This section commented out.
>            <B>You can't see me!</B>
>        -->
Easy.
 This section commented out.
    <B>You can't see me!</B>
 
>
In conclusion, not only do I have the regular expressions to easily do all of this,
I in no way endorse your gratuitous attitude to some other bullshit agenda (that even
you don't know), nor rudeness to the reader.

Please expunge the FAQ of these types of posts. You have no idea what you are talking
about.
>
[snip]
>If you'd like to help maintain the perlfaq, see the details in 
>perlfaq.pod.


Otherwise, love the FAQ's BDF...

sln
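
For reference, the behaviour of the two substitutions quoted from the FAQ above can be checked with a short script. This is only a sketch and not part of the FAQ: the sub names strip_naive and strip_faq are made up for illustration, and the inputs are the IMG examples from the FAQ text with "text" appended so the remainder is visible.

```perl
use strict;
use warnings;

# Hypothetical helper names; the two substitutions are the ones quoted
# from perlfaq9 in the post above.
sub strip_naive {
    my ($html) = @_;
    $html =~ s/<.*?>//g;    # no /s, so "." stops at newlines; not quote-aware
    return $html;
}

sub strip_faq {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;    # steps over quoted values
    return $html;
}

my $one_line = q{<IMG SRC = "foo.gif" ALT = "A > B">text};
my $two_line = qq{<IMG SRC = "foo.gif"\n     ALT = "A > B">text};

print "naive, one line: [", strip_naive($one_line), "]\n";
print "faq,   one line: [", strip_faq($one_line),   "]\n";
print "naive, two line: [", strip_naive($two_line), "]\n";
print "faq,   two line: [", strip_faq($two_line),   "]\n";
```

The naive version stops at the ">" inside the ALT value on the one-line input and does not match the two-line input at all, while the FAQ version reduces both to "text". Neither version treats comments or script content specially.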



------------------------------

Date: Fri, 28 Nov 2008 15:04:01 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <CN_Xk.1949$yB4.1567@newsfe07.iad>

sln@netherlands.com wrote:

> Please expunge the FAQ of these types of posts. You have no idea what
> you are talking about.
>>
> [snip]
>>If you'd like to help maintain the perlfaq, see the details in
>>perlfaq.pod.
> 

Your post sounds like you don't like Perl.  Why are you here then?  FYI,
I'm pretty sure the guys at stonehenge have _some_ idea of what they
are talking about.  If you have a better suggestion for this particular
FAQ, why don't you make said suggestion?

PS: I can't tell if you're trying to be sarcastic, but Perl isn't going
anywhere any time soon.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Fri, 28 Nov 2008 23:39:27 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <d8v0j41a9fdjviru4tala061rr21ofkiv8@4ax.com>

On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com> wrote:

>sln@netherlands.com wrote:
>
>> Please expunge the FAQ of these types of posts. You have no idea what
>> you are talking about.
>>>
>> [snip]
>>>If you'd like to help maintain the perlfaq, see the details in
>>>perlfaq.pod.
>> 
>
>Your post sounds like you don't like Perl.  Why are you here then?  FYI,
>I'm pretty sure the guys at stonehenge have _some_ idea of what they
>are talking about.  If you have a better suggestion for this particular
>FAQ, why don't you make said suggestion?
>
>PS: I can't tell if you're trying to be sarcastic, but Perl isn't going
>anywhere any time soon.

It might be if the supporters' ways are not changed.
Before C, back to B  (was there A?), I did it. What comes after direct
C parsed assembly? Not very much. What will ever come? Not until no 
registers. Will Perl fall off the face of the earth? Why, yes. Yes
it will. Do regular expressions mean anything more than a prototype
to C? No, no they don't. Does that mean something you can do in Perl
is useless because it can be done much faster in C?

Then is Perl itself useless, with no advantage?

One thing for sure, you can see what is happening in Perl can't you?

  @UC_Nstart = (
    "\\x{C0}-\\x{D6}",
    "\\x{D8}-\\x{F6}",
    "\\x{F8}-\\x{2FF}",
    "\\x{370}-\\x{37D}",
    "\\x{37F}-\\x{1FFF}",
    "\\x{200C}-\\x{200D}",
    "\\x{2070}-\\x{218F}",
    "\\x{2C00}-\\x{2FEF}",
    "\\x{3001}-\\x{D7FF}",
    "\\x{F900}-\\x{FDCF}",
    "\\x{FDF0}-\\x{FFFD}",
    "\\x{10000}-\\x{EFFFF}",
  ); 
  @UC_Nchar = (
    "\\x{B7}",
    "\\x{0300}-\\x{036F}",
    "\\x{203F}-\\x{2040}",
  );
  $Nstrt = "[A-Za-z_:".join ('',@UC_Nstart)."]";
  $Nchar = "[-\\w:\\.".join ('',@UC_Nchar).join ('',@UC_Nstart)."]";
  $Name  = "(?:$Nstrt$Nchar*?)";


  ## v2 engine parse regex:
  ## -------------------------------------------------

  $RxParseXP1 =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:($Name+)(\s+(?:(?:(?:".*?")|(?:'.*?'))|(?:[^>]*?))+)\s*(\/?))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?)--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?))|(?:ELEMENT(.*?)))))>)|([^<]*)(<?)/s;
  #                (  <(  (  1   12     2   3   3)|(  4      45   (  (  (       )|(       ))|(        )) 5   6   6)|(    7   7  )|(  !(  (         8   8)|(           9   9    )|(    0   0  )|(         1   1)|(        2   2)|(         3   3))))>)|4     45  5
  $RxAttr = qr/\G\s+(?:(?:($Name)\s*=\s*("|'|))|($Name+))/;

  $RxAttr_DL1 = qr/\G(?:([^'&<]*?)|([^'<]*?))'/;
  $RxAttr_DL2 = qr/\G(?:([^"&<]*?)|([^"<]*?))"/;
  $RxAttr_DL3 = qr/\G([^"'=<\s]+)/;
  $RxAttr_RM  = qr/[^\s\n]+/;
  $RxPi = qr/^($Name)\s+(.*?)$/s;



------------------------------

Date: Fri, 28 Nov 2008 23:54:17 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <3411j4tli0tinaie7ki4l5201k0b8igpsn@4ax.com>

On Fri, 28 Nov 2008 23:39:27 GMT, sln@netherlands.com wrote:

>On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com> wrote:
>
Maybe you can't.

sln



------------------------------

Date: Sat, 29 Nov 2008 00:35:21 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <b931j4tfn3f3jfpij1f1ipmgh4prm9i6hm@4ax.com>

On Fri, 28 Nov 2008 23:39:27 GMT, sln@netherlands.com wrote:

>On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com> wrote:
>
>>sln@netherlands.com wrote:
>>
>>> Please expunge the FAQ of these types of posts. You have no idea what
>>> you are talking about.
>>>>
>>> [snip]
>>>>If you'd like to help maintain the perlfaq, see the details in
>>>>perlfaq.pod.
>>> 
>>
>>Your post sounds like you don't like Perl.  Why are you here then?  FYI,
>>I'm pretty sure the guys at stonehenge have _some_ idea of what they
>>are talking about.  If you have a better suggestion for this particular
>>FAQ, why don't you make said suggestion?
>>
>>PS: I can't tell if you're trying to be sarcastic, but Perl isn't going
>>anywhere any time soon.
>
>It might be if the supporters ways are not changed.
>Before C, back to B  (was there A?), I did it. What comes after direct
>C parsed assembly? Not very much. What will ever come? Not until no 
>registers. Will Perl fall off the face of the earth. Why, yes. Yes
>it will. Does regular expressions mean anything more than a prototype
>to C? No, no it doesen't. Does that mean something you can do in Perl
>is useless because it can be done much faster in C?
>
>Then is Perl itself useless, with no advantage?
>
>One thing for sure, you can see what is happening in Perl can't you?
>
>  @UC_Nstart = (
>    "\\x{C0}-\\x{D6}",
>    "\\x{D8}-\\x{F6}",
>    "\\x{F8}-\\x{2FF}",
>    "\\x{370}-\\x{37D}",
>    "\\x{37F}-\\x{1FFF}",
>    "\\x{200C}-\\x{200D}",
>    "\\x{2070}-\\x{218F}",
>    "\\x{2C00}-\\x{2FEF}",
>    "\\x{3001}-\\x{D7FF}",
>    "\\x{F900}-\\x{FDCF}",
>    "\\x{FDF0}-\\x{FFFD}",
>    "\\x{10000}-\\x{EFFFF}",
>  ); 
>  @UC_Nchar = (
>    "\\x{B7}",
>    "\\x{0300}-\\x{036F}",
>    "\\x{203F}-\\x{2040}",
>  );
>  $Nstrt = "[A-Za-z_:".join ('',@UC_Nstart)."]";
>  $Nchar = "[-\\w:\\.".join ('',@UC_Nchar).join ('',@UC_Nstart)."]";
>  $Name  = "(?:$Nstrt$Nchar*?)";
>
>
>  ## v2 engine parse regex:
>  ## -------------------------------------------------
>
>  $RxParseXP1 =
>qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:($Name+)(\s+(?:(?:(?:".*?")|(?:'.*?'))|(?:[^>]*?))+)\s*(\/?))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?)--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?))|(?:ELEMENT(.*?)))))>)|([^<]*)(<?)/s;
>  #                (  <(  (  1   12     2   3   3)|(  4      45   (  (  (       )|(       ))|(        )) 5   6   6)|(    7   7  )|(  !(  (         8   8)|(           9   9    )|(    0   0  )|(         1   1)|(        2   2)|(         3   3))))>)|4     45  5
>  $RxAttr = qr/\G\s+(?:(?:($Name)\s*=\s*("|'|))|($Name+))/;
>
>  $RxAttr_DL1 = qr/\G(?:([^'&<]*?)|([^'<]*?))'/;
>  $RxAttr_DL2 = qr/\G(?:([^"&<]*?)|([^"<]*?))"/;
>  $RxAttr_DL3 = qr/\G([^"'=<\s]+)/;
>  $RxAttr_RM  = qr/[^\s\n]+/;
>  $RxPi = qr/^($Name)\s+(.*?)$/s;

I haven't enumerated conditionals yet, but then again I don't translate DTDs yet.
I have a toolset to manipulate the static documents introduced. Shall I download
and insert a stream? Shall I worry about piped documents? Dunno, don't care right now.
Can I? You bet your ass I can.


sln



------------------------------

Date: Fri, 28 Nov 2008 16:35:21 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <d70Yk.404$297.349@newsfe23.iad>

sln@netherlands.com wrote:

> $RxParseXP1 =
> qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:($Name+)(\s+(?:(?:(?:".*?")|
> (?:'.*?'))|(?:[^>]*?))+)\s*(\/?))|(?:\?(.*?)\?)|(?:!(?:
> (?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?)--)|(?:ATTLIST(.*?))|
> (?:ENTITY(.*?))|(?:ELEMENT(.*?)))))>)|([^<]*)(<?)/s;

Ever heard of /x?
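
For readers unfamiliar with the modifier: /x lets a pattern be spread over several lines, with whitespace and comments inside it ignored. A minimal sketch, using the FAQ's shorter tag-stripping regex rather than the one quoted above; the variable names and the sample input are illustrative only.

```perl
use strict;
use warnings;

# Under /x, unescaped whitespace and "#" comments in the pattern are ignored.
my $tag = qr{
    <                       # opening angle bracket
    (?:
        [^>'"]*             # run of characters that are neither quotes nor ">"
      |
        (['"]) .*? \1       # or a quoted string, which may itself contain ">"
    )*
    >                       # closing angle bracket
}xs;

my $text = qq{<p class="a>b">Hello,\n<b>world</b></p>};
$text =~ s/$tag//g;
print "$text\n";
```

The pattern is the same one the FAQ gives on a single line; only the layout differs.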
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 29 Nov 2008 00:37:57 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <hl31j49r1glihvp7i12r9lblju5fq8ohd2@4ax.com>

On Fri, 28 Nov 2008 16:35:21 -0800, Tim Greer <tim@burlyhost.com> wrote:

>sln@netherlands.com wrote:
>
>> $RxParseXP1 =
>> qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:($Name+)(\s+(?:(?:(?:".*?")|
>> (?:'.*?'))|(?:[^>]*?))+)\s*(\/?))|(?:\?(.*?)\?)|(?:!(?:
>> (?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?)--)|(?:ATTLIST(.*?))|
>> (?:ENTITY(.*?))|(?:ELEMENT(.*?)))))>)|([^<]*)(<?)/s;
>
>Ever heard of /x?
It begs the question, have you ever heard of markup?

sln



------------------------------

Date: Fri, 28 Nov 2008 16:44:13 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <yf0Yk.905$st1.761@newsfe10.iad>

sln@netherlands.com wrote:

> On Fri, 28 Nov 2008 23:39:27 GMT, sln@netherlands.com wrote:
> 
>>On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com>
>>wrote:
>>
> Maybe you can't.
> 
> sln

You have a history on this newsgroup of attacking people if they don't
reply to you in 15-20 minutes.  I don't live on usenet, my friend, so
relax.

Anyway, I saw you posting this same exact code on usenet back in early
2007-ish, arguing against all of the regulars then, too.  Let's not
bother repeating history.

I'm simply saying, if you have a viable suggestion that is superior and
generalized enough to improve this particular FAQ, that you submit it.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Fri, 28 Nov 2008 16:45:55 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <7h0Yk.906$st1.621@newsfe10.iad>

sln@netherlands.com wrote:

> I haven't enumerated conditionals yet, but then again I don't
> translate dtd's yet. I have a toolset to manipulate the static
> documents introduced. Shall I downlad and insert stream. Shall I worry
> about piped documants? Dunno, don't care right now. Can I? You bet
> your ass I can.

I didn't ask.  Who are you talking to?
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Fri, 28 Nov 2008 16:46:39 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <Ph0Yk.907$st1.527@newsfe10.iad>

sln@netherlands.com wrote:

> On Fri, 28 Nov 2008 16:35:21 -0800, Tim Greer <tim@burlyhost.com>
> wrote:
> 
>>sln@netherlands.com wrote:
>>
>>> $RxParseXP1 =
>>> qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:($Name+)(\s+(?:(?:(?:".*?")|
>>> (?:'.*?'))|(?:[^>]*?))+)\s*(\/?))|(?:\?(.*?)\?)|(?:!(?:
>>> (?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?)--)|(?:ATTLIST(.*?))|
>>> (?:ENTITY(.*?))|(?:ELEMENT(.*?)))))>)|([^<]*)(<?)/s;
>>
>>Ever heard of /x?
> It begs the question, have you ever heard of markup?
> 
> sln

Yep, I've heard of and understand the use of the /x modifier *and*
markup.  I win.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 29 Nov 2008 01:01:22 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <uf41j45buub9njv6fr4rrvb5t47baca86d@4ax.com>

On Fri, 28 Nov 2008 16:44:13 -0800, Tim Greer <tim@burlyhost.com> wrote:

>sln@netherlands.com wrote:
>
>> On Fri, 28 Nov 2008 23:39:27 GMT, sln@netherlands.com wrote:
>> 
>>>On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com>
>>>wrote:
>>>
>> Maybe you can't.
>> 
>> sln
>
>You have a history on this newsgroup of attacking people if they don't
>reply to you in 15-20 minutes.  I don't live on usenet, my friend, so
>relax.
>
>Anyway, I saw you posting this same exact code on usenet back in early
>2007-ish, arguing against all of the regulars then, too.  Let's not
>bother repeating history.
>
>I'm simply saying, if you have a viable suggestion that is superior and
>generalized enough to improve this particular FAQ, that you submit it.

This is the only one worth replying to. Consider yourself lucky.
In the last 6 weeks, I have a history of solving the most complex
regular expression problems posted on this usenet group.

You saw nothing even close to Version 2 Regex Engine you just saw.
You are a dumb, stupid idiot who is recanted by intelligent posters
here all the time. Unfortunately for you, dumb ass, you have no idea
what happens behind those regexes.

Having a viable suggestion as you say that is superior enough to
improve this particular FAQ, is entirely left to the originators of
this FAQ. The superiority is obvious.

To expunge the posts is out of my hands.

But, your dip-shit, know-nothing attitude is in my hands, you know
nothing dumb-ass!

Or you got some great knowledge on this FAQ dumb-ass?

sln


------------------------------

Date: Fri, 28 Nov 2008 17:14:22 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <QH0Yk.3917$zQ3.854@newsfe12.iad>

sln@netherlands.com wrote:

> On Fri, 28 Nov 2008 16:44:13 -0800, Tim Greer <tim@burlyhost.com>
> wrote:
> 
>>sln@netherlands.com wrote:
>>
>>> On Fri, 28 Nov 2008 23:39:27 GMT, sln@netherlands.com wrote:
>>> 
>>>>On Fri, 28 Nov 2008 15:04:01 -0800, Tim Greer <tim@burlyhost.com>
>>>>wrote:
>>>>
>>> Maybe you can't.
>>> 
>>> sln
>>
>>You have a history on this newsgroup of attacking people if they don't
>>reply to you in 15-20 minutes.  I don't live on usenet, my friend, so
>>relax.
>>
>>Anyway, I saw you posting this same exact code on usenet back in early
>>2007-ish, arguing against all of the regulars then, too.  Let's not
>>bother repeating history.
>>
>>I'm simply saying, if you have a viable suggestion that is superior
>>and generalized enough to improve this particular FAQ, that you submit
>>it.
> 
> This is the only one worth replying to. Consider yourself lucky.
> In the last 6 weeks, I have a history of solving the most complex
> regular expression problems posted on this usenet group.

Or so you say.  I appreciate that you have attempted to be involved and
offer solutions, but it seems more to form a basis to argue with people
and call them names, and accuse them -- that is the history I see from
you on this group.  That said, when your solutions wrap 4-5+ lines and
you can't understand or have enough sense to use the /x modifier, it
doesn't leave one with a good impression of your intentions, what with
your attitude you convey against most users on this group daily.

> You saw nothing even close to Version 2 Regex Engine you just saw.

Oh, no?

http://groups.google.com/group/comp.lang.perl.misc/msg/edc94963125eaf57

I didn't look character for character, but I recall pretty well, and
this is from Feb 2007.

> You are a dumb, stupid idiot who is recanted by intelligent posters
> here all the time.

Oh yeah?  This coming from a guy that doesn't understand /x?

> Unfortunately for you, dumb ass, you have no idea 
> what happens behind those regexs'.

I didn't mean to hurt your feelings. There's no need for name calling.

> Having a viable suggestion as you say that is superior enough to
> improve this particualr FAQ, is entirely left to the originators of
> this FAQ. The superiorority is obvious.

Maybe if you submit a viable, superior solution, it would be included. 
How do you think the FAQs they post are formed?  It's from people here
and other places contributing (and contributing more than a poor
attitude and arrogance, by the way).

> To expunge the posts is out of my hands.

What is your point?

> But, your dip-shit, know-nothing attitute is in my hands, you know
> nothing dumb-ass!

FYI: I've replied recently to answer a couple of your questions to help
you, so if I know "nothing", you must know less than nothing.

I guess it makes sense to you, by your logic, to have a (invalid) reason
to attack and accuse someone, which explains your overall attitude on
this group.  I made a simple suggestion, and nothing condones your
response now.

If you believe the FAQ is in need of improvement, suggest a replacement
that is better suited.  Such a thing doesn't rightfully incite such a
hostile reaction, vulgarity and name calling.

> Or you got some great knowledge on this FAQ dumb-ass?
> 
> sln

What is it that I can help you with?

PS: I'm pretty good at regexes, especially in Perl (since about 1992), so
I'm serious when I ask if there's anything I can help you with.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 29 Nov 2008 01:25:29 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <bb61j4t2ta3gmf1p5rrimr8ktq25k1l1ni@4ax.com>

On Fri, 28 Nov 2008 17:14:22 -0800, Tim Greer <tim@burlyhost.com> wrote:

>sln@netherlands.com wrote:
>
[snip]
Your a fucking idiot.

Either you know and can contribute on the specific 
FAQ topic, like I did, or you can't.

Don't beat a dead horse.


sln


------------------------------

Date: Fri, 28 Nov 2008 17:39:16 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <a31Yk.7286$kn7.4752@newsfe08.iad>

sln@netherlands.com wrote:

> On Fri, 28 Nov 2008 17:14:22 -0800, Tim Greer <tim@burlyhost.com>
> wrote:
> 
>>sln@netherlands.com wrote:
>>
> [snip]
> Your a fucking idiot.
> 
> Either you know and can contribute on the specific
> FAQ topic, like I did, or you can't.
> 
> Don't beat a dead horse.
> 
> 
> sln

Enough with the name calling already.  Anyway, I'm sorry, I must have
missed when you actually contributed something useful!?

BTW, here's a little advice; If you can't practice what you preach,
don't embarrass yourself by demanding others do what you can't seem to
do yourself.

That is, you ranted on making accusations, and I clarified in response
(and in defense of them).  I proved you wrong about what you had
claimed, and now you ignore all of that and resort to this.

I wish you had managed to stay relevant and on topic to begin with,
but do everyone the favor of acting more professional and following
your own advice and you won't be exposed for what you are.

Once again, if you have something to suggest or contribute, please do
so.  Accusing the guys at stonehenge (Randal, Tom, Brian, etc.) of not
knowing what they are talking about, isn't going to convince anyone
here that you are any authority.

You will probably never know anywhere near as much as they do, and you
admit this often, but don't allow that to stop you from accusing them
of being clueless.  Just calling people vulgar names and "idiots" isn't
really helping to remedy your grievances with the FAQ in question.

I remind you, if you're going to try and make a suggestion, don't post
regular expressions in a single line that wrap 5 lines or more in a
normal news reader.  Likely, if you actually did have a valid/better
suggestion, it still wouldn't make it to the official FAQ if you didn't
bother to have enough respect to use the /x modifier.

Really, who are you trying to convince here?  I wasn't looking to argue
with you, I had only made a suggestion.  If you fly off the handle and
call people names and make false accusations, especially when you fail
to make a good point in your post, don't expect many people to agree
with you.

That said, I am very much aware exactly what the regular expressions you
had posted do, so I'm unsure what you're attempting to argue with me
about?  Don't be so sensitive.  I had only suggested that instead of
attacking people and calling people names, you submit a viable, working
and superior (and readable -- i.e., /x) suggestion for a suitable
replacement.

That is not an attack on you to have suggested, so relax.  Also,
consider that an FAQ doesn't (and shouldn't) go into great depth of
detail to explain a very long regex, as it is intended to be very
condensed and generalized (even if your "solution" was viable, which it
is not).  You seriously need to practice what you preach, but I'm done
with this (and you).  I won't cast my shadow at your door again. 
You're safe, okay?
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 29 Nov 2008 01:45:38 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <rk71j4h3ols2f1gmtjk5qmb6eeboku7k5m@4ax.com>

On Fri, 28 Nov 2008 17:39:16 -0800, Tim Greer <tim@burlyhost.com> wrote:

>sln@netherlands.com wrote:
>
>> On Fri, 28 Nov 2008 17:14:22 -0800, Tim Greer <tim@burlyhost.com>
>> wrote:
>> 
>>>sln@netherlands.com wrote:
>>>
>> [snip]
>> Your a fucking idiot.
>> 
>> Either you know and can contribute on the specific
>> FAQ topic, like I did, or you can't.
>> 
>> Don't beat a dead horse.
>> 
>> 
>> sln
>
>Enough with the name calling already.  Anyway, I'm sorry, I must have
>missed when you actually contributed something useful!?
>
>BTW, here's a little advice; If you can't practice what you preach,
>don't embarrass yourself by demanding others do what you can't seem to
>do yourself.
>
*plonk*

sln



------------------------------

Date: Fri, 28 Nov 2008 17:44:48 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <e971j4dds2lvv39kj9j0fh2qov97f23ca4@4ax.com>

Tim Greer <tim@burlyhost.com> wrote:
>sln@netherlands.com wrote:
>
>> Please expunge the FAQ of these types of posts. You have no idea what
>> you are talking about.
>
>Your post sounds like you don't like Perl.  Why are you here then?  FYI,
>I'm pretty sure the guys at stonehenge have _some_ idea of what they
>are talking about. 

Don't bother. This sln guy is identical to Robic0, both having quite
some history in this NG. I'm actually surprised that you haven't
filtered him yet.

Robic0 wrote a parser that as he claims is regular and nevertheless can
parse context-free languages like HTML. Therefore he is always on the
crusade to promote his parser instead of established, proven solutions. 
It doesn't deter him that he has no idea about Chomsky languages or
hierarchy and that he has been proven wrong many many times over.
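
The distinction can be illustrated with a small sketch: a regular expression can pick out individual tags, but checking that they nest correctly needs a stack on top of it. The sub name tags_balanced is made up for this illustration, and the tag pattern is deliberately simplistic (no comments, CDATA, void elements, or ">" inside quoted attribute values), so it is not a substitute for a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if simple open/close tags nest properly,
# 0 otherwise. The regex only finds tags; the stack tracks the nesting.
sub tags_balanced {
    my ($html) = @_;
    my @stack;
    while ($html =~ m{<(/?)(\w+)[^>]*>}g) {
        my ($closing, $name) = ($1, lc $2);
        if ($closing) {
            # A closing tag must match the most recently opened tag.
            return 0 unless @stack && pop(@stack) eq $name;
        }
        else {
            push @stack, $name;
        }
    }
    return @stack ? 0 : 1;    # leftover open tags mean unbalanced input
}

print tags_balanced('<div><b>x</b></div>'), "\n";
print tags_balanced('<div><b>x</div></b>'), "\n";
```

The first call reports properly nested input, the second reports the crossed tags as unbalanced.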

Just killfile him, he's not worth the bother.

jue



------------------------------

Date: Sat, 29 Nov 2008 01:49:21 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <pr71j4t729mel2sqpgmi7c7app2tvvmoht@4ax.com>

On Fri, 28 Nov 2008 17:44:48 -0800, Jürgen Exner <jurgenex@hotmail.com> wrote:

>Tim Greer <tim@burlyhost.com> wrote:
>>sln@netherlands.com wrote:
>>
>>> Please expunge the FAQ of these types of posts. You have no idea what
>>> you are talking about.
>>
>>Your post sounds like you don't like Perl.  Why are you here then?  FYI,
>>I'm pretty sure the guys at stonehenge have _some_ idea of what they
>>are talking about. 
>
>Don't bother. This sln guy is identical to Robic0, both having quite
>some history in this NG. I'm actually surprised that you haven't
>filtered him yet.
>
>Robic0 wrote a parser that as he claims is regular and nevertheless can
>parse context-free languages like HTML.
push, pop

*plonk*


sln



------------------------------

Date: Fri, 28 Nov 2008 17:50:38 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <Od1Yk.7289$kn7.3748@newsfe08.iad>

Jürgen Exner wrote:

> Tim Greer <tim@burlyhost.com> wrote:
>>sln@netherlands.com wrote:
>>
>>> Please expunge the FAQ of these types of posts. You have no idea
>>> what you are talking about.
>>
>>Your post sounds like you don't like Perl.  Why are you here then? 
>>FYI, I'm pretty sure the guys at stonehenge have _some_ idea of what
>>they are talking about.
> 
> Don't bother. This sln guy is identical to Robic0, both having quite
> some history in this NG. I'm actually surprised that you haven't
> filtered him yet.
> 
> Robic0 wrote a parser that as he claims is regular and nevertheless
> can parse context-free languages like HTML. Therefore he is always on
> the crusade to promote his parser instead of established, proven
> solutions. It doesn't deter him that he has no idea about Chomsky
> languages or hierarchy and that he has been proven wrong many many
> times over.
> 
> Just killfile him, he's not worth the bother.
> 
> jue

I was picking up on that as I viewed a little of his history.  I've
noticed he wants to push his 'regex engine' here a lot, even in
completely unsuitable situations (not just unworkable, but unsuitable),
and I've just ignored those posts.  I do see the connection to, and the
mentions of, the Robic0 poster in viewing the history.  I agree he's best
filtered at this point, and I'm already done with him.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 29 Nov 2008 01:52:11 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <n181j4tdjhlp1vn813olgilsrgkmsjqle8@4ax.com>

On Sat, 29 Nov 2008 11:48:42 +1000 (EST), Res <res@ausics.net> wrote:

>the only fucking idiot here is you
>
>On Sat, 29 Nov 2008, sln@netherlands.com wrote:
>
>> On Fri, 28 Nov 2008 17:14:22 -0800, Tim Greer <tim@burlyhost.com> wrote:
>>
>>> sln@netherlands.com wrote:
>>>
>> [snip]
>> You're a fucking idiot.
>>
>> Either you know and can contribute on the specific
>> FAQ topic, like I did, or you can't.
>>
>> Don't beat a dead horse.
>>
>>
>> sln
>>
>>
>
>-- 
>Res
>
>If you are not part of the solution, then you are part of the problem!
Well said!

sln



------------------------------

Date: Sat, 29 Nov 2008 01:59:55 GMT
From: sln@netherlands.com
Subject: Re: FAQ 9.4 How do I remove HTML from a string?
Message-Id: <1f81j41kalfk1tpqa3ktf69njg1o7gahl9@4ax.com>

On Sat, 29 Nov 2008 11:48:42 +1000 (EST), Res <res@ausics.net> wrote:

>the only fucking idiot here is you
>
>On Sat, 29 Nov 2008, sln@netherlands.com wrote:
>
>> On Fri, 28 Nov 2008 17:14:22 -0800, Tim Greer <tim@burlyhost.com> wrote:
>>
>>> sln@netherlands.com wrote:
>>>
>> [snip]
>> You're a fucking idiot.
>>
>> Either you know and can contribute on the specific
>> FAQ topic, like I did, or you can't.
>>
>> Don't beat a dead horse.
>>
>>
>> sln
>>
>>
>
>-- 
>Res
>
>If you are not part of the solution, then you are part of the problem!

I'm not done yet. This is just a preview:

sub getAttrsFromXML
{
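	# Scans $$xmlref from $result->{'pos'} for the next tag that passes
	# the optional tag filter ('rx_tag') and attribute filters ('rx_attval').
	# returns 0/1 (1 = match found, $result filled in)
	# -------------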
	return 0 if (scalar(@_) < 3);
	return 0 if (ref($_[1]) ne 'SCALAR');
	return 0 if (ref($_[2]) ne 'ReS');

	my ($self, $xmlref, $result) = @_;
	my ($TRxp, $aRxpAttVal);

	$result->{'lastpos'} = $result->{'pos'};
	pos($$xmlref) = $result->{'pos'};

	if (ref($result->{'rx_tag'})  eq 'RxP') {
		$TRxp = $result->{'rx_tag'};
	}
	if (ref($result->{'rx_attval'}) eq 'ARRAY' && scalar(@{$result->{'rx_attval'}})) {
		$aRxpAttVal = $result->{'rx_attval'};
	}
	my $lcbpos = 0;

	while ($$xmlref =~ /$RxParseXP1/g)
	{
		if (defined $14) {
			if (length($15) && $lcbpos != pos($$xmlref)) {
				$lcbpos = pos($$xmlref);
				pos($$xmlref) = $lcbpos - 1;
			}
			next;
		}
		if (defined $4) {
			$result->{'term'} = length($6); # <tag attrib/>
			# $result = _getAttrARRAY ($self, $5, $convert_ent, $result);
			$result = _getAttrARRAY ($self, $5, $result->{'convert_ent'}, $result);

			if (defined $TRxp) {
				my $tag = $4;
				next if (!$TRxp->apply($tag));
				$result->{'tag'} = $tag;
			} else {	
				$result->{'tag'} = $4;
			}
			if (defined $aRxpAttVal)
			{
			    my $aref = $result->{'attrsref'};
			    my ($nvmatch, $rdx, $ndx) = (0,0,0);

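			    # Walk the (name-regex, value-regex) pairs two at a time.
			    # The bound is explicit because on Perl 5.22+ a list slice
			    # past the end yields undefs, not an empty list.
			    for (; $rdx < @{$aRxpAttVal}; $rdx += 2)
			    {
				my ($ARxp, $VRxp) = @{$aRxpAttVal}[$rdx, $rdx + 1];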
				my ($lA, $lV) = (
				    ref($ARxp) eq 'RxP',
				    ref($VRxp) eq 'RxP'
				);
				$ndx  = 0;
			    	while ($aref->[$ndx])
			    	{
					my ($name, $val) = (
					    \$aref->[$ndx++],
					    \$aref->[$ndx++]
					);	
					next if ($lA && !$ARxp->apply($$name));
					next if ($lV && !$VRxp->apply($$val));
					$nvmatch |= 1;
					last;
			    	}
			    }
			    next if (!$nvmatch);
			}
			$result->{'offset'} = $-[0];
			$result->{'pos'} = pos($$xmlref);
			return 1;
		}
	}
	return 0;
}

sub _getAttrARRAY
{
	my ($self, $attrstr, $conv_ent, $hresult) = @_;
	@{$hresult->{'attrsref'}} = ();
	$hresult->{'badattrs'} = '';
	$hresult->{'dupattrs'} = '';
	$hresult->{'noquoteattrs'} = '';
	$hresult->{'errstr'} = '';
	my %hseen = ();
	my $aref = $hresult->{'attrsref'};
	my ($alt_attval, $attval, $rx, $ndx, $DL3);
	# my $tmpstr = $attrstr;
	pos($attrstr) = 0;

	while ($attrstr =~ /$RxAttr/gc)
	{
		if (defined $2)
		{
			$ndx = push @{$aref},$1;
			$DL3 = 0;

			if ($2 eq "'") {
				$rx = \$RxAttr_DL1;
			}
			elsif ($2 eq '"') {
				$rx = \$RxAttr_DL2;
			} else {
				# no quotes
				$rx = \$RxAttr_DL3;
				$DL3 = 1;
			}
			if (++$hseen{$1} == 2) {
				$hresult->{'dupattrs'} .= ", $1";
				$hresult->{'dupattrs'} =~ s/^(?:, )+//;
			}
			if ($attrstr =~ /$$rx/gc) {
				if (!$DL3)
				 {
					## normal quoted value
					if (defined $1) {
						push @{$aref},$1;
						next;
					}
					$attval = $2;
					if ($conv_ent && defined ($alt_attval = _convertEntities ($self, \$attval))) {
						push @{$aref},$$alt_attval;
						next;
					}
					push @{$aref},$attval;
					next;
				}
				## bad attrib, value is not quoted
				$attval = $1;
				if ($conv_ent && defined ($alt_attval = _convertEntities ($self, \$attval))) {
					push @{$aref},$$alt_attval;
				} else {
					push @{$aref},$attval;
				}
				$hresult->{'noquoteattrs'} .= ", ".$aref->[$ndx-1];
				$hresult->{'noquoteattrs'} =~ s/^(?:, )+//;
				next;
			}
			## bad value, it's either '<' or no ["'] closure
			$hresult->{'badattrs'} .= ", ".$aref->[$ndx-1];
			$hresult->{'badattrs'} =~ s/^(?:, )+//;
			push @{$aref},'UNDEF_ATTRVAL';
			# trim up to '<', otherwise it's reported as
			# improperly quoted or missing value
			$attrstr = substr ($attrstr, pos($attrstr));
			$attrstr =~ s/^[^<]+//;
		} else {
			## attrib with no attrib value
			## (standalone attribute only)
			$ndx = push @{$aref},$3;
			if (++$hseen{$3} == 2) {
				$hresult->{'dupattrs'} .= ", $3";
				$hresult->{'dupattrs'} =~ s/^(?:, )+//;
			}
			$hresult->{'badattrs'} .= ", ".$aref->[$ndx-1];
			$hresult->{'badattrs'} =~ s/^(?:, )+//;
			push @{$aref},'UNDEF_ATTRVAL';
			next;
		}
		# bad, return that part of string which is in error
		$hresult->{'errstr'} = $attrstr;
		return $hresult;
	}
	if (length($attrstr) > pos($attrstr) &&
	    $attrstr =~ /$RxAttr_RM/) {
		$attrstr = substr ($attrstr, pos($attrstr));
		$attrstr =~ s/^\s+//; $attrstr =~ s/\s+$//;
		# bad, return that part of string which is in error
		# print "-BAD-:$tmpstr\n";
		$hresult->{'errstr'} = $attrstr if (length($attrstr));
	}
	return $hresult;
}
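
The preview above depends on patterns that are not shown ($RxParseXP1,
$RxAttr, $RxAttr_DL1 and friends), so it cannot be run as posted. The
following self-contained sketch only illustrates the bookkeeping that
_getAttrARRAY appears to do: a flat name/value list plus the
'dupattrs', 'noquoteattrs', 'badattrs' and 'errstr' fields. The names
parse_attrs and $attr_rx, and the pattern itself, are hypothetical
stand-ins, not the poster's regexes, and entity conversion is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in pattern: accepts name="v", name='v', name=v,
# or a bare name, anchored where the previous match ended.
my $attr_rx = qr{
    \G \s* ([A-Za-z_:][\w:.-]*)        # capture 1: attribute name
    (?: \s* = \s*
        (?: "([^"<]*)"                 # capture 2: double-quoted value
          | '([^'<]*)'                 # capture 3: single-quoted value
          | ([^\s"'<>=]+)              # capture 4: unquoted value
        )
    )?
}x;

sub parse_attrs {
    my ($str) = @_;
    my (@pairs, %seen, @dups, @unquoted, @bare);

    pos($str) = 0;
    while ($str =~ /$attr_rx/gc) {
        my ($name, $dq, $sq, $raw) = ($1, $2, $3, $4);

        # Report a duplicated name once, on its second appearance.
        push @dups, $name if ++$seen{$name} == 2;

        my $value = defined $dq ? $dq : defined $sq ? $sq : $raw;
        if (!defined $value) {
            push @bare, $name;             # attribute without a value
            $value = 'UNDEF_ATTRVAL';
        } elsif (defined $raw) {
            push @unquoted, $name;         # value present but not quoted
        }
        push @pairs, $name, $value;
    }

    # Whatever could not be consumed is handed back as the error string.
    my $rest = substr($str, pos($str) || 0);
    $rest =~ s/^\s+|\s+$//g;

    return {
        attrs        => \@pairs,
        dupattrs     => join(', ', @dups),
        noquoteattrs => join(', ', @unquoted),
        badattrs     => join(', ', @bare),
        errstr       => $rest,
    };
}

my $r = parse_attrs(q{id="a" class='b' width=10 checked id="c"});
print join('|', @{ $r->{attrs} }), "\n";
print "dup=$r->{dupattrs} noquote=$r->{noquoteattrs} bad=$r->{badattrs}\n";
```

Keeping names and values in one flat array, as the preview does, preserves
attribute order and duplicates, which a plain hash would lose.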




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2014
***************************************

