[29465] in Perl-Users-Digest
Perl-Users Digest, Issue: 709 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Aug 2 03:09:42 2007
Date: Thu, 2 Aug 2007 00:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 2 Aug 2007 Volume: 11 Number: 709
Today's topics:
Re: @arts <tadmc@seesig.invalid>
Re: filehandle, read lines <robic0>
Re: if kill 9, $pid fails, is the error caught anywhere <nospam-abuse@ilyaz.org>
new CPAN modules on Thu Aug 2 2007 (Randal Schwartz)
Re: Object creation failure in perl ramesh.thangamani@gmail.com
Re: Perl and Sockets <ZevGreenblatt@gmail.com>
Re: Perl and Sockets <1usa@llenroc.ude.invalid>
Re: Perl and Sockets <1usa@llenroc.ude.invalid>
Re: Perl with DBI <jwcarlton@gmail.com>
Re: Posting to https <1usa@llenroc.ude.invalid>
Re: Q on localizing *STDOUT and fork <nobull67@gmail.com>
Stripping some HTML code, while leaving others <jwcarlton@gmail.com>
Re: Stripping some HTML code, while leaving others sln@netherlands.co
Re: Stripping some HTML code, while leaving others <paduille.4061.mumia.w+nospam@earthlink.net>
Re: Using split to count matches, but exclude certain p <tadmc@seesig.invalid>
Re: Using split to count matches, but exclude certain p surfitupdotcom@gmail.com
Re: Using split to count matches, but exclude certain p <attn.steven.kuo@gmail.com>
Re: XML Validation sln@netherlands.co
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 1 Aug 2007 18:46:17 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: @arts
Message-Id: <slrnfb26q9.pqv.tadmc@tadmc30.sbcglobal.net>
Michele Dondi <bik.mido@tiscalinet.it> wrote:
> I suppose you can't injure anyone by means of a
> usenet post. Well, except by suggesting as a solution to a given
> problem, well to e.g. take that red cable, peel it off, take that
> green cable, peel it off,
So far, so good...
> and then join them together.
... but that part was supposed to be:
and then touch them to your tongue.
(that's how they taught us to do it in Navy Electrician's school anyway.)
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Wed, 01 Aug 2007 19:05:26 -0700
From: god <robic0>
Subject: Re: filehandle, read lines
Message-Id: <nre2b3puigoh11ibcp144e73k9k1dtgl1t@4ax.com>
On Thu, 26 Jul 2007 12:50:58 -0000, roy <roy.schultheiss@googlemail.com> wrote:
>I receive a XML-File up to 1 GB full of orders every day. I have to
>split the orders and load them into a database for further processing.
>I share this job onto multiple processes. This runs properly now.
>
<snip>
You might be better off using my SAX parser, RxParse 2.0.
It does stream parsing very fast and edits well.
MIT featured it in one of their publications.
robic0
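Roy's task (splitting a daily XML file of up to 1 GB into individual orders) can also be handled without loading the whole file. One way is to set Perl's input record separator $/ to the closing tag of each record. This is a minimal sketch. It assumes each order is a flat <order>...</order> element with no nested <order> tags, and the tag name and the each_order() helper are hypothetical. It works at the text level and is not a real XML parser, so a CDATA section or comment containing "</order>" would break it.

```perl
use strict;
use warnings;

# Read one <order>...</order> chunk at a time instead of the whole file.
# Assumes a hypothetical flat structure where "</order>" only ever closes an order.
sub each_order {
    my ($fh, $callback) = @_;
    local $/ = '</order>';    # input record separator = closing tag
    while (my $chunk = <$fh>) {
        # Keep only the part starting at the opening tag (drops wrappers/whitespace).
        next unless $chunk =~ m{(<order\b.*</order>)}s;
        $callback->($1);
    }
}

# Usage with an in-memory file standing in for the real feed.
my $xml = "<orders>\n<order id=\"1\">A</order>\n<order id=\"2\">B</order>\n</orders>\n";
open my $fh, '<', \$xml or die "open: $!";
my @orders;
each_order($fh, sub { push @orders, $_[0] });
print scalar(@orders), "\n";    # 2
```

Each chunk can then be handed to a real XML parser or loaded into the database by a worker process.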
------------------------------
Date: Thu, 2 Aug 2007 04:20:52 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: if kill 9, $pid fails, is the error caught anywhere?
Message-Id: <f8rm34$oj8$1@agate.berkeley.edu>
[A complimentary Cc of this posting was sent to
it_says_BALLS_on_your forehead
<simon.chao@fmr.com>], who wrote in article <1185994130.562023.205630@l70g2000hse.googlegroups.com>:
> nvm, i gather it's in '$!'. On another note, does anyone know of any
> reason besides 'No such process' that would lead to a return of 0 from
> the kill function? trying to come up with actionable instructions for
> support team.
What OS? On most OSes there can be unkillable processes (ones that not
even root can kill). [I remember getting some on Solaris 2.6, when the
floppy disk started to malfunction....]
Hope this helps,
Ilya
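For the support-team question, here is a minimal sketch that assumes a Unix-like system. kill() returns the number of processes it successfully signalled, so 0 means failure and $! says why. Besides ESRCH ("No such process"), the other common cause is EPERM ("Operation not permitted"), which occurs when the caller may not signal that process. The helper name signal_status() is hypothetical.

```perl
use strict;
use warnings;
use Errno qw(ESRCH EPERM);

# Send $signal to $pid and return a short, human-readable status.
# kill() returns the number of processes successfully signalled,
# so 0 means failure and $! holds the reason.
sub signal_status {
    my ($signal, $pid) = @_;
    return 'ok'                if kill $signal, $pid;
    return 'no such process'   if $! == ESRCH;
    return 'permission denied' if $! == EPERM;
    return "failed: $!";
}

# Signal 0 only checks whether the process could be signalled; nothing is sent.
print signal_status(0, $$), "\n";    # our own process
```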
------------------------------
Date: Thu, 2 Aug 2007 04:42:17 GMT
From: merlyn@stonehenge.com (Randal Schwartz)
Subject: new CPAN modules on Thu Aug 2 2007
Message-Id: <JM4qEH.1DGv@zorch.sf-bay.org>
The following modules have recently been added to or updated in the
Comprehensive Perl Archive Network (CPAN). You can install them using the
instructions in the 'perlmodinstall' page included with your Perl
distribution.
Acme-Anything-0.01
http://search.cpan.org/~jjore/Acme-Anything-0.01/
Anything, even non-existent modules are loadable
----
Algorithm-NGram-0.1
http://search.cpan.org/~revmischa/Algorithm-NGram-0.1/
----
Algorithm-NGram-0.2
http://search.cpan.org/~revmischa/Algorithm-NGram-0.2/
----
Apache-Yaalr-0.02.9
http://search.cpan.org/~jeremiah/Apache-Yaalr-0.02.9/
Perl module for Yet Another Apache Log Reader
----
Audio-aKodePlayer-0.01
http://search.cpan.org/~pajas/Audio-aKodePlayer-0.01/
A simple Perl interface to the aKode audio library.
----
Bundle-Starlink-Base-0.02
http://search.cpan.org/~bradc/Bundle-Starlink-Base-0.02/
A bundle to install modules required to build Starlink Perl modules.
----
Carp-Indeed-0.10
http://search.cpan.org/~ferreira/Carp-Indeed-0.10/
DEPRECATE Warns and dies noisily with stack backtraces
----
Class-Generate-1.10
http://search.cpan.org/~swartik/Class-Generate-1.10/
Generate Perl class hierarchies
----
Data-ConveyorBelt-0.01
http://search.cpan.org/~btrott/Data-ConveyorBelt-0.01/
----
Email-Abstract-2.133_03
http://search.cpan.org/~rjbs/Email-Abstract-2.133_03/
unified interface to mail representations
----
Games-NES-ROM-0.05
http://search.cpan.org/~bricas/Games-NES-ROM-0.05/
View information about an NES game from a ROM file
----
Grid-Transform-0.04
http://search.cpan.org/~gray/Grid-Transform-0.04/
fast grid transformations
----
HTML-Dashboard-0.03
http://search.cpan.org/~janert/HTML-Dashboard-0.03/
Spreadsheet-like formatting for HTML tables, with data-dependent coloring and highlighting: formatted reports
----
Integrator-Module-Build-1.056
http://search.cpan.org/~fxfx/Integrator-Module-Build-1.056/
Gather and synchronize Test::More results in Cydone's Integrator
----
JavaScript-Minifier-1.03
http://search.cpan.org/~pmichaux/JavaScript-Minifier-1.03/
Perl extension for minifying JavaScript code
----
Lingua-YaTeA-0.3
http://search.cpan.org/~thhamon/Lingua-YaTeA-0.3/
Perl extension for extracting terms from a corpus and providing a syntactic analysis in a head-modifier format.
----
Mail-SpamAssassin-Plugin-GoogleSafeBrowsing-1.01
http://search.cpan.org/~danborn/Mail-SpamAssassin-Plugin-GoogleSafeBrowsing-1.01/
SpamAssassin plugin to score mail based on Google blocklists.
----
Mediawiki-Blame-0.0.2
http://search.cpan.org/~daxim/Mediawiki-Blame-0.0.2/
see who is responsible for each line of page content
----
Net-DNS-0.61
http://search.cpan.org/~olaf/Net-DNS-0.61/
Perl interface to the DNS resolver
----
Net-Flickr-API-1.66
http://search.cpan.org/~ascope/Net-Flickr-API-1.66/
base API class for Net::Flickr::* libraries
----
Net-Google-SafeBrowsing-Blocklist-1.01
http://search.cpan.org/~danborn/Net-Google-SafeBrowsing-Blocklist-1.01/
Query a Google SafeBrowsing table
----
Net-Google-SafeBrowsing-UpdateRequest-1.01
http://search.cpan.org/~danborn/Net-Google-SafeBrowsing-UpdateRequest-1.01/
Update a Google SafeBrowsing table
----
Net-Google-SafeBrowsing-UpdateRequest-1.02
http://search.cpan.org/~danborn/Net-Google-SafeBrowsing-UpdateRequest-1.02/
Update a Google SafeBrowsing table
----
Net-XMPP2-0.06
http://search.cpan.org/~elmex/Net-XMPP2-0.06/
An implementation of the XMPP Protocol
----
PDF-Create-0.06
http://search.cpan.org/~markusb/PDF-Create-0.06/
create PDF files
----
PDF-Create-1.0
http://search.cpan.org/~markusb/PDF-Create-1.0/
create PDF files
----
Params-Clean-0.9
http://search.cpan.org/~plato/Params-Clean-0.9/
----
Perl-Tidy-20070801
http://search.cpan.org/~shancock/Perl-Tidy-20070801/
Parses and beautifies perl source
----
SyslgScnDamn-Blacklist-0.43
http://search.cpan.org/~muir/SyslgScnDamn-Blacklist-0.43/
----
SyslogScan-Daemon-0.41
http://search.cpan.org/~muir/SyslogScan-Daemon-0.41/
Watch log files
----
Template-Provider-Encoding-0.10
http://search.cpan.org/~miyagawa/Template-Provider-Encoding-0.10/
Explicitly declare encodings of your templates
----
Variable-Magic-0.03
http://search.cpan.org/~vpit/Variable-Magic-0.03/
Associate user-defined magic to variables from Perl.
----
WWW-CDBaby-0.01
http://search.cpan.org/~grantg/WWW-CDBaby-0.01/
Automate interaction with cdbaby.com!
----
WWW-CDBaby-0.03
http://search.cpan.org/~grantg/WWW-CDBaby-0.03/
Automate interaction with cdbaby.com!
----
WWW-Mixi-0.50
http://search.cpan.org/~tsukamoto/WWW-Mixi-0.50/
Perl extension for scraping the MIXI social networking service.
----
WWW-Search-News-1.075
http://search.cpan.org/~mthurn/WWW-Search-News-1.075/
----
WWW-Sitebase-0.2
http://search.cpan.org/~grantg/WWW-Sitebase-0.2/
Base class for Perl modules
----
WWW-Sitebase-0.3
http://search.cpan.org/~grantg/WWW-Sitebase-0.3/
Base class for Perl modules
If you're an author of one of these modules, please submit a detailed
announcement to comp.lang.perl.announce, and we'll pass it along.
This message was generated by a Perl program described in my Linux
Magazine column, which can be found on-line (along with more than
200 other freely available past column articles) at
http://www.stonehenge.com/merlyn/LinuxMag/col82.html
print "Just another Perl hacker," # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: Wed, 01 Aug 2007 23:10:45 -0700
From: ramesh.thangamani@gmail.com
Subject: Re: Object creation failure in perl
Message-Id: <1186035045.347049.275380@q75g2000hsh.googlegroups.com>
On Aug 1, 2:05 pm, anno4...@radom.zrz.tu-berlin.de wrote:
> <ramesh.thangam...@gmail.com> wrote in comp.lang.perl.misc:
> > I have written a code like this in my module:
>
>                  ^
> "Code" in the sense of "computer instructions" is a mass noun and
> doesn't take an indefinite article.
>
> > my $obj = MyModule->new() or die "Failed to create object of type
> > MyModule $!" in my code.
>
> Presumably this code is part of the script that *uses* your module,
> not part of the module itself.
>
> > Strangely enough sometimes the object
> > creation is failing and I am not sure what could really be the issue.
> > This script is running on Modperl and the platform is linux.
> > Appreciate any help on this
>
> You need to give us a little more than that if you want help. What
> does your ->new do? We can't debug code we don't see, show it.
>
> Anno
I am just creating an instance of MyModule; that's it.
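The original die message reports $!. That variable only holds the OS error from the last failed system call, so it is usually unrelated to why a constructor returned false. The following is a minimal sketch. It assumes a hypothetical MyModule whose new() returns undef on failure and records the reason in a hypothetical package variable, $MyModule::ERROR. The validation step is also hypothetical and stands in for whatever the real new() does.

```perl
use strict;
use warnings;

package MyModule;

# Hypothetical: holds the reason the most recent new() call failed.
our $ERROR = '';

sub new {
    my ($class, %args) = @_;
    $ERROR = '';
    # Hypothetical validation standing in for whatever the real new() does.
    unless (defined $args{name}) {
        $ERROR = 'missing required argument "name"';
        return;    # undef in scalar context, so "or die" fires
    }
    return bless { name => $args{name} }, $class;
}

package main;

my $obj = MyModule->new(name => 'demo')
    or die "Failed to create object of type MyModule: $MyModule::ERROR\n";

my $bad = MyModule->new();
print defined $bad ? "created\n" : "failed: $MyModule::ERROR\n";
```

Under mod_perl it is also worth logging that message, since an intermittent failure usually depends on some input or state that a static code review will not reveal.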
------------------------------
Date: Thu, 02 Aug 2007 02:11:19 -0000
From: "Zev G." <ZevGreenblatt@gmail.com>
Subject: Re: Perl and Sockets
Message-Id: <1186020679.098771.244930@57g2000hsv.googlegroups.com>
I'm not sure what a program instance is. I am told that, unlike
other kinds of programs, a sockets program must make an API call to
Windows in order to listen for incoming communication. Doing this will
hog the processor and display an hourglass, and then they won't be able
to get any other work done unless you use threading. I am hoping to
find a way to avoid using threading, so I'm wondering if the Perl
Sockets wrapper takes care of the threading needs for me.
On Aug 1, 6:06 pm, xhos...@gmail.com wrote:
> "Zev G." <ZevGreenbl...@gmail.com> wrote:
> > Hi
>
> > I am working on a new project and the need to code in Sockets came up.
> > I have also been told that the nature of sockets entails multi-
> > threading. In other words, if my program is listening for a call, I
> > need to make sure the user can use the computer for other stuff.
>
> Pretty much any modern general-purpose OS these days is multi-tasking,
> so using the computer for other stuff is not a problem. Using the
> same *program instance* for other stuff could be, though. Is that what
> you meant?
>
> If so, you don't necessarily need threads to do that. You can use
> IO::Select or nonblocking IO instead.
>
> > I
> > have never done either of them. Someone refered me to Perl which I
> > have not used since college. I understand Perl is easy to brush up on
> > and pick up and it has a wrapper for sockets. Does this wrapper also
> > handle the threads?
>
> In my opinion, Perl threads are not (yet) suitable for serious work.
>
> Xho
------------------------------
Date: Thu, 02 Aug 2007 02:44:48 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Perl and Sockets
Message-Id: <Xns997FE76AD29EFasu1cornelledu@127.0.0.1>
"Zev G." <ZevGreenblatt@gmail.com> wrote in
news:1186020679.098771.244930@57g2000hsv.googlegroups.com:
[ top-posting fixed, don't do that ]
> On Aug 1, 6:06 pm, xhos...@gmail.com wrote:
>> "Zev G." <ZevGreenbl...@gmail.com> wrote:
>> > Hi
>>
>> > I am working on a new project and the need to code in Sockets came
>> > up.
>> > I have also been told that the nature of sockets entails multi-
>> > threading. In other words, if my program is listening for a call, I
>> > need to make sure the user can use the computer for other stuff.
...
>> If so, you don't necessarily need threads to do that. You can use
>> IO::Select or nonblocking IO instead.
>>
>> > I
>> > have never done either of them. Someone refered me to Perl which I
>> > have not used since college. I understand Perl is easy to brush up
>> > on and pick up and it has a wrapper for sockets. Does this wrapper
>> > also handle the threads?
>>
>> In my opinion, Perl threads are not (yet) suitable for serious work.
> not sure what a program instance is.
I think we have reached an impasse here: You do not know enough to know
what you know versus what you do not know.
> I am told that as opposed to
> other kinds of programs, a sockets program must make an API call to
> Windows in order to listen for incoming communication.
All programs running on any operating system make all sorts of API calls
all the time.
> Doing this will hog the processor
If you are referring to the accept call, I do not know of any modern
operating system where an accept will hog the processor.
If you are doing blocking IO, the program will just sit there, waiting
for the accept call to return, but it will not hog the processor. This
is no different than reading from a terminal using blocking IO.
> and display an houglass and then they won't be able
> to get any other work done unless you use threading.
Calling accept (or equivalent) in blocking mode does not 'display an
hourglass'. You can still 'get work done'.
> I am hoping to find a way to avoid using threading,
As recommended, you need non-blocking IO (using select or IO::Select).
The Perl Cookbook has examples of this (IIRC).
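Here is a minimal sketch of the select-based approach, using IO::Socket::INET and IO::Select from the standard IO distribution. The helper name poll_once() and the 127.0.0.1 listener are assumptions for illustration. can_read() waits at most the given timeout, so the program can do other work between polls instead of sitting in a blocking accept(). On Windows, select() works only on sockets, which is all this sketch uses.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Listening socket on localhost; with no LocalPort, the OS picks a free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    Proto     => 'tcp',
    Listen    => 5,
    ReuseAddr => 1,
) or die "Cannot listen: $@";

my $select = IO::Select->new($listener);

# Wait at most $timeout seconds for activity, handle whatever is ready,
# and return the number of bytes read from clients.
sub poll_once {
    my ($timeout) = @_;
    my $bytes = 0;
    for my $fh ($select->can_read($timeout)) {
        if ($fh == $listener) {
            # A connection is pending, so accept() will not block here.
            my $client = $listener->accept or next;
            $select->add($client);
        }
        else {
            my $n = sysread($fh, my $buf, 4096);
            if ($n) {
                print "got: $buf";
                $bytes += $n;
            }
            else {    # 0 = client closed, undef = read error
                $select->remove($fh);
                close $fh;
            }
        }
    }
    return $bytes;
}

# The program stays responsive: between short polls it can do other work.
for my $tick (1 .. 3) {
    poll_once(0.1);
    # ... other work goes here ...
}
```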
> so I'm wondering if the perl
> Sockets wrapper takes care of the threading needs for me.
No, IO::Socket is not thread aware (I am not sure it is thread-safe,
though).
Your issues, however, are not related to Perl. You do not know enough of
the fundamental networking, IO and programming concepts. This is evident
in your inability to formulate the problem.
There is no way we can impart all the necessary information through a
series of Usenet posts. You will have to do some studying and come to an
understanding of such underlying concepts first.
Then, you can understand and use the relevant recipes given in Chapter
18 of the Perl Cookbook.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Thu, 02 Aug 2007 02:57:28 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Perl and Sockets
Message-Id: <Xns997FE990D2711asu1cornelledu@127.0.0.1>
"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in
news:Xns997FE76AD29EFasu1cornelledu@127.0.0.1:
> Then, you can understand and use the relevant recipes given in Chapter
> 18 of the Perl Cookbook.
Correction: That should be Chapter 17.
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Wed, 01 Aug 2007 19:55:09 -0700
From: Jason <jwcarlton@gmail.com>
Subject: Re: Perl with DBI
Message-Id: <1186023309.110869.325630@57g2000hsv.googlegroups.com>
On Jul 30, 9:04 am, "Petr Vileta" <sto...@practisoft.cz> wrote:
> Jason wrote:
> > I'm tagging this onto the same thread because it's the same topic, but
> > the issue is a little different.
>
> > With the ID field, I'm wanting to create a unique ID for each new
> > submission. I was originally using auto_increment, but the problem is
> > that when I remove a row, I do not want the ID to be reused.
>
> When you use auto_increment then ID will not be reused when you delete last
> added row. Try it ;-)
> If you want to know the inserted ID then use last_insert_id() sql function
> but you must use it immediate after insert command. Take a look into mysql
> manual.
Everyone, this thread has been a HUGE help to me, and I really
appreciate all of your replies. It's all coming together very quickly,
and I'm seeing some major performance gains. I still have some work
ahead of me, but the logic is clicking now.
I really appreciate all of your help, as well as the friendly manner
with which it was delivered. Seriously, you have no idea how much I
appreciate it.
Jason
------------------------------
Date: Thu, 02 Aug 2007 02:53:25 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Posting to https
Message-Id: <Xns997FE8E168BEFasu1cornelledu@127.0.0.1>
jackson_samson@HOTMAIL.COM wrote in
news:1186004827.043797.267650@w3g2000hsg.googlegroups.com:
> On Aug 1, 5:41 pm, Stephen O'D <stephen.odonn...@gmail.com> wrote:
>> On Aug 1, 10:36 pm, jackson_sam...@HOTMAIL.COM wrote:
>>
>> > I am using CGI and have created a html form. When I submit it
>> > performs a POST to a https:// site. I keep getting a internal
>> > server error (500).
>>
>> > I need to Post to the https. If I post to a http, then it works.
>>
>> > Thanks,
>>
>> Were going to need more information. What error appears in your
>> webserver error log when the 500 response occurs? Its highly
>> probable that the https part of your server is configured differently
>> to the http side hence the error.
>>
>> Almost certainly a webserver issue and not a Perl one.
>
> Are you saying that the CGI module should let me post directly to an
> https site without any problems?
This has nothing to do with the CGI module.
Let's say you have
<form method="post" action="https://www.example.com/cgi-bin/test.pl">
The https scheme specifies how the web browser communicates with the web server.
By the time test.pl is invoked, the web server has handled its side of
the communication and test.pl is invoked just like any other CGI script.
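To check this in practice, you can use a throwaway diagnostic script as a hypothetical stand-in for test.pl. It prints the request as the script receives it. If the script runs and prints under both http and https, the script itself is being invoked fine. If it never runs under https, the error logs are the place to look. REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH and SERVER_PROTOCOL are standard CGI/1.1 variables. HTTPS is an assumption: many servers set it to "on" for SSL requests, but the CGI specification does not require it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard CGI/1.1 variables, plus HTTPS (server-specific, an assumption).
my @vars = qw(REQUEST_METHOD CONTENT_TYPE CONTENT_LENGTH SERVER_PROTOCOL HTTPS);

# Build a plain-text report from an environment hash.
sub report {
    my ($env) = @_;
    return join '', map {
        sprintf "%s=%s\n", $_, defined $env->{$_} ? $env->{$_} : '(unset)'
    } @vars;
}

print "Content-Type: text/plain\r\n\r\n", report(\%ENV);
```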
> I will check the logs... I am using IIS 6.0 and posting to another
> remote page. I can create my own html page that posts directly to the
> https page without any errors.
Huh? The combination of the words and sentences above can mean many
different things, and I am not willing to start recommending strategies
contingent on all possible speculative chains of reasoning.
Check the error logs. They should tell you:
1) If the script was successfully invoked at all
2) If it was, possibly why it failed.
Discuss web server specific issues in a group dedicated to the
particular web server.
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
------------------------------
Date: Thu, 02 Aug 2007 06:26:52 -0000
From: Brian McCauley <nobull67@gmail.com>
Subject: Re: Q on localizing *STDOUT and fork
Message-Id: <1186036012.070979.170690@19g2000hsx.googlegroups.com>
On Aug 2, 12:41 am, kj <so...@987jk.com.invalid> wrote:
> Let me preface this question by making it clear that there's no
> particular problem I'm trying to solve, but rather I'm trying to
> clarify my understanding of how Perl works, at least under Unix.
> My question is this: is there a way to avoid the bothersome saving
> and restoring of STDOUT. I naively thought that one could do so
> by localizing *STDOUT. IOW, replace the LOOK_HERE block with:
>
> {
> local *STDOUT;
> open STDOUT, '>&', $out or die $!;
>
> {
> open my $pipe, '|-', '/usr/bin/sort', '-n' or die $!;
> print $pipe int( rand( ~0 ) ), "\n" for 1..1_000_000;
> }
> }
>
> Very nice, except it doesn't work. Now the output /usr/bin/sort
> (which, incidentally, in this example happens to be pretty big)
> goes to the terminal. BTW, this same thing happens if instead of
> redirecting STDOUT by duplicating the write-handle $out, I simply
> re-open STDOUT like this:
>
> {
> local *STDOUT;
> open STDOUT, '>', 'somefile' or die $!;
>
> {
> open my $pipe, '|-', '/usr/bin/sort', '-n' or die $!;
> print $pipe int( rand( ~0 ) ), "\n" for 1..1_000_000;
> }
> }
I wrote a quite detailed explanation of this here...
http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/6d5c3d062c1e7608/fb1af6de388ee945
> Anyway, BTAIM, is there anyway to avoid the save/restore rigmarole?
I'm fairly sure I've seen modules on CPAN that wrap it up a bit, but
under the hood, AFAIK, they'd still do the same thing.
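Here is a minimal sketch of wrapping the save/restore steps in one helper. It assumes a Unix-like system, and with_stdout_to() is a hypothetical name, not a CPAN module. The key point is that reopening the existing STDOUT handle, rather than localizing *STDOUT, keeps file descriptor 1. That descriptor is what child processes such as sort inherit.

```perl
use strict;
use warnings;

# Run $code with file descriptor 1 (STDOUT) pointing at $file, then restore it.
# Child processes started inside $code inherit the redirected descriptor.
sub with_stdout_to {
    my ($file, $code) = @_;
    open my $saved, '>&', \*STDOUT or die "Cannot dup STDOUT: $!";
    # Reopening the real STDOUT handle keeps fd 1 instead of allocating a new one.
    open STDOUT, '>', $file or die "Cannot redirect STDOUT to $file: $!";
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    open STDOUT, '>&', $saved or die "Cannot restore STDOUT: $!";
    die $err unless $ok;
    return;
}

# Usage, mirroring the original example (file name is illustrative):
# with_stdout_to('somefile', sub {
#     open my $pipe, '|-', '/usr/bin/sort', '-n' or die $!;
#     print $pipe int(rand(~0)), "\n" for 1 .. 10;
#     close $pipe;
# });
```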
------------------------------
Date: Wed, 01 Aug 2007 20:39:08 -0700
From: Jason <jwcarlton@gmail.com>
Subject: Stripping some HTML code, while leaving others
Message-Id: <1186025948.242190.82670@b79g2000hse.googlegroups.com>
I have a textarea field (it's actually a contenteditable field, but it
doesn't matter to Perl), and want to allow pre-authorized HTML code to
go through, but no un-authorized code.
I'm allowing them to choose between 4 or 5 font faces, 4 font sizes,
bold, italics, underline, and a group of colors. I do NOT, however,
want them to use <H1> or CSS.
This is really only a problem (so far) when someone copies an entire
article from a website. I had this posted today:
<H1 class=headline> # not OK
<FONT face=Arial>Investigators Search for Missing Cary Boy</FONT> # OK
</H1> # not OK
<br>
<DIV style="DISPLAY: none"> # not OK
<br>
<P class=byline> # <P> would be OK, but not class=byline
The article copied didn't originate from my site, of course, so the
CSS is irrelevant. Unless, of course, they stumble upon a class that I
did name; then there could be a real problem!
So, what's the most logical way of removing the "bad" code, but
leaving the "good"?
The only thought I had was to replace "good" HTML with UBB-style code:
$post =~ s/<font face=Arial>/[font face=Arial]/gi;
$post =~ s/<font face=Verdana>/[font face=Verdana]/gi;
and so on. Then, strip all of the remaining HTML:
$post =~ s/<.*?>//gs;
Then, convert the UBB code back to HTML:
$post =~ s/\[font face=Arial\]/<font face=Arial>/gi;
$post =~ s/\[font face=Verdana\]/<font face=Verdana>/gi;
This seems TERRIBLY cumbersome, though; especially when you consider
all of the color codes that I'll have to potentially match. I know
there's a better way, I just haven't thought of it yet.
Any ideas? TIA,
Jason
------------------------------
Date: Wed, 01 Aug 2007 21:44:30 -0700
From: sln@netherlands.co
Subject: Re: Stripping some HTML code, while leaving others
Message-Id: <spn2b3lfgp2jlg63q7ltub0uruhdef4oii@4ax.com>
On Wed, 01 Aug 2007 20:39:08 -0700, Jason <jwcarlton@gmail.com> wrote:
>I have a textarea field (it's actually a contenteditable field, but it
>doesn't matter to Perl), and want to allow pre-authorized HTML code to
>go through, but no un-authorized code.
>[ ... ]
>So, what's the most logical way of removing the "bad" code, but
>leaving the "good"?
>[ ... ]
I don't know what you're calling a problem:
"removing the "bad" code, but leaving the "good"".
Do you mean you accept direct uploads of quoted HTML, to be posted on your site
for viewing (i.e., downloaded HTML sent to a renderer)?
I guess it's easier to allow the poster to just upload, but you should be
a little proactive and supply an input form, replete with only your allowable
attributes.
Just my opinion.
Sln
------------------------------
Date: Thu, 02 Aug 2007 01:45:31 -0500
From: "Mumia W." <paduille.4061.mumia.w+nospam@earthlink.net>
Subject: Re: Stripping some HTML code, while leaving others
Message-Id: <13b2vh1m21hmia3@corp.supernews.com>
On 08/01/2007 10:39 PM, Jason wrote:
> I have a textarea field (it's actually a contenteditable field, but it
> doesn't matter to Perl), and want to allow pre-authorized HTML code to
> go through, but no un-authorized code.
> [ ... ]
>
> This seems TERRIBLY cumbersome, though; especially when you consider
> all of the color codes that I'll have to potentially match. I know
> there's a better way, I just haven't thought of it yet.
>
> Any ideas? TIA,
>
> Jason
>
I think that CPAN's HTML::TagFilter would make this very easy.
Or you could use HTML::Parser and write handlers to select only the tags
you want.
Like many, you've been tempted to use regular expressions for this, but
regular expressions cannot properly parse HTML, and writing your own
HTML parser is not for the faint of heart. Use a module.
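Following the HTML::Parser suggestion, here is a minimal whitelist sketch. It assumes HTML::Parser (version 3 API) is installed. The %allowed table and the clean_html() name are hypothetical, and the listed font faces, sizes, and color format stand in for Jason's real choices. Disallowed tags and attributes are dropped, but their text content is kept. Comments are dropped, and the contents of <script> and <style> are discarded entirely.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Hypothetical whitelist: tag => { allowed attribute => pattern its value must match }.
my %allowed = (
    b    => {},
    i    => {},
    u    => {},
    p    => {},
    br   => {},
    font => {
        face  => qr/^(?:Arial|Verdana|Times New Roman|Courier New)$/i,
        size  => qr/^[1-4]$/,
        color => qr/^#[0-9a-f]{6}$/i,
    },
);

sub clean_html {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            my $rules = $allowed{$tag} or return;    # drop tags not in the whitelist
            my $s = "<$tag";
            for my $name (sort keys %$attr) {
                my $re = $rules->{$name} or next;    # drop unknown attributes
                next unless $attr->{$name} =~ $re;   # drop values that don't match
                $s .= sprintf ' %s="%s"', $name, encode_entities($attr->{$name});
            }
            $out .= "$s>";
        }, 'tagname, attr' ],
        end_h  => [ sub {
            my ($tag) = @_;
            $out .= "</$tag>" if $allowed{$tag} && $tag ne 'br';
        }, 'tagname' ],
        text_h => [ sub { $out .= shift }, 'text' ],    # keep text as-is
    );
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;
    return $out;
}

print clean_html('<H1 class=headline><FONT face=Arial>Title</FONT></H1><P class=byline>x'), "\n";
```

HTML::TagFilter packages the same idea with its own rule format. Consult its documentation for the details rather than assuming it mirrors this table.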
------------------------------
Date: Wed, 1 Aug 2007 18:42:11 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Using split to count matches, but exclude certain patterns
Message-Id: <slrnfb26ij.pqv.tadmc@tadmc30.sbcglobal.net>
surfitupdotcom@gmail.com <surfitupdotcom@gmail.com> wrote:
> I have script that recursively greps
> Attempts so far:
> # @surewords = split(/(?<!\_)${search_term}(?!\_)/im,
> $grep_out);
> # @surewords = split(/\_{0}${search_term}\_{0}/im,
> $grep_out);
> @surewords = split(/[^\_]${search_term}[^\_]/im,
> $grep_out);
> # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im,
> $grep_out);
There is no recursion anywhere in that code.
Perhaps you meant "repeatedly" instead?
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
------------------------------
Date: Wed, 01 Aug 2007 17:05:29 -0700
From: surfitupdotcom@gmail.com
Subject: Re: Using split to count matches, but exclude certain patterns
Message-Id: <1186013129.383350.31930@d30g2000prg.googlegroups.com>
On Aug 1, 4:42 pm, Tad McClellan <ta...@seesig.invalid> wrote:
> surfitupdot...@gmail.com <surfitupdot...@gmail.com> wrote:
> > I have script that recursively greps
> > [ ... ]
>
> There is no recursion anywhere in that code.
>
> Perhaps you meant "repeatedly" instead?
The recursion is elsewhere in the script. By the time it gets to this
split, each line of $grep_out has one or more hits of the search term.
------------------------------
Date: Wed, 01 Aug 2007 18:55:37 -0700
From: "attn.steven.kuo@gmail.com" <attn.steven.kuo@gmail.com>
Subject: Re: Using split to count matches, but exclude certain patterns
Message-Id: <1186019737.623651.317330@x35g2000prf.googlegroups.com>
On Aug 1, 1:02 pm, surfitupdot...@gmail.com wrote:
(snipped)
>
> You read me correctly, idea was to split on any occurrence of my
> search term that does not have an underscore before or after it.
> Counting matches using split worked fine until I tried to exclude
> certain patterns. I will look at the perldoc you suggested but here
> is more info for the thread. Thanks, John
>
> Sample input: super _super_ _super super SUPER SUPER_ blahsuper
> Desired output: super super SUPER super
How did you plan on getting rid of the 'blah' substring by
doing a split?
>
> Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
> Array contents- _super_ _super SUPER_ blah
Your description said 'an underscore before ... OR
an underscore after', so you also need an "OR" in your
regular expression. This is known as "Alternation"
(see perldoc perlre).
use Data::Dumper;
my $term = 'super';
my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';
my @fragments = split(
/_\Q$term\E_? # exclude term with underscore in front
# (optional trailing _)
| # OR
_?\Q$term\E_/xi # exclude term with underscore afterward
# (optional leading _)
, $string);
print Dumper \@fragments;
__END__
I get:
$VAR1 = [
'super ',
' ',
' super SUPER ',
' blahsuper'
];
Is that what you wanted? As Paul said, there's
probably a better way to "count" things than
using split.
--
Hope this helps,
Steven
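Split may not be the best tool for counting. As an alternative, here is a minimal sketch that matches the term directly. A global match in list context returns every hit, and the negative lookbehind and lookahead exclude occurrences that touch an underscore. With the sample input from this thread, it reproduces John's desired output, including the hit inside "blahsuper".

```perl
use strict;
use warnings;

my $term   = 'super';
my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';

# Match the term only when it is neither preceded nor followed by "_".
my @hits  = $string =~ /(?<!_)\Q$term\E(?!_)/gi;
my $count = @hits;

print "@hits\n";     # super super SUPER super
print "$count\n";    # 4
```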
------------------------------
Date: Wed, 01 Aug 2007 22:05:42 -0700
From: sln@netherlands.co
Subject: Re: XML Validation
Message-Id: <6go2b3l63632sl7abuj5g7lcorm6j3cnbr@4ax.com>
On Wed, 25 Jul 2007 14:56:36 -0700, Shiraz <shirazk@gmail.com> wrote:
>I am trying to use the XML simple to parse out some xml data. If I use
>the code below with invalid xml, i just get a warning 'not well-formed
>(invalid token) at line 1, column 16, byte 16 at /usr/local/lib/perl5/
>site_perl/5.8.7/i686-linux/XML/Parser.pm line 187'
>A test like 'unless (my $data = $xml->XMLin($msg) ) ' doesnt work
>either.
>Anyone know how to test for valid XML using just XML::Simple or would
>i have to get a XML checking library
>
>Thanks,
>
>code:
>#!/usr/bin/perl
>use strict;
>use XML::Simple;
>$|=1;
>my $xml = new XML::Simple;
>my $msg = '<xml><select app>orig_gw</select></xml>'; #this is bad xml
>my $data = $xml->XMLin($msg)
>
>result:
>not well-formed (invalid token) at line 1, column 16, byte 16 at /usr/
>local/lib/perl5/site_perl/5.8.7/i686-linux/XML/Parser.pm line 187
You have kind of a funny question. It seems like you're asking how you can get
more info on what the error is and how to fix it. Then it seems like you're
asking how you can see the error without exiting your parsing, and continue
parsing.
The fact is, XML::Simple uses a parser that does syntax checking, a check
for well-formed XML structure. Guess what: once the XML becomes "invalid",
you can only recover the remaining XML in a cursory way. I.e., anything
(not everything) you get after that point has a validity problem, do you
understand?
Most people use XML::Simple in one of two ways. One is to pre-parse the XML
up to the point where you start capturing raw XML to a buffer, then pass
the buffer to XML::Simple, which puts it into an already known hash
structure usable by your program.
The other way is to pass the whole XML file to XML::Simple to put into
a gigantic, unwieldy, unknown multi-level hash, which you later
iterate through to the point where you think you have a known structure.
Either way, you get nothing out of any parser with invalid XML.
Parsers aren't forgiving, nor is XML::Simple, which uses them.
It's an "I don't want my program creating hash garbage, making it
look bad, and blaming me" kind of attitude.
Of course, an eval() around the call could trap the error, but for the most
part you should expect nothing but a perfect XML structure if you
intend to parse it for data. You should know your source.
You may want to use some locally available XML parser that gives you
much more detail on invalid XML (what node is missing, where, why and
how), that is also tolerant of errors, and that internally self-corrects
them so it can parse the entire XML and give you an overall evaluation.
One of them is RxParse, a pure Perl XML/XHTML parser that is excellent
for that. Search the forum or Google. I think the version posted was
1.1.
good luck
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 709
**************************************