[24646] in Perl-Users-Digest

Perl-Users Digest, Issue: 6810 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Aug 3 13:21:44 2004

Date: Tue, 3 Aug 2004 10:21:00 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 3 Aug 2004     Volume: 10 Number: 6810

Today's topics:
        HTML regex challenge <memetral@hotmail.com>
    Re: HTML regex challenge <invalid-email@rochester.rr.com>
    Re: HTML regex challenge <tadmc@augustmail.com>
    Re: HTML regex challenge <memetral@hotmail.com>
    Re: HTML regex challenge <kuujinbo@hotmail.com>
    Re: HTML regex challenge <tore@aursand.no>
        HTML::Entities::encode() returning wrong(?) entities <jh@333.org>
    Re: HTML::Entities::encode() returning wrong(?) entities <jh@333.org>
    Re: HTML::Entities::encode() returning wrong(?) entities <Joe.Smith@inwap.com>
    Re: HTML::Entities::encode() returning wrong(?) entities <eric-amick@comcast.net>
        Image magick palette (kkarma)
    Re: Inconsistent behavior between SQL*Plus and Perl DBI (John)
    Re: Inconsistent behavior between SQL*Plus and Perl DBI (John)
    Re: Inconsistent behavior between SQL*Plus and Perl DBI (John)
    Re: Inconsistent behavior between SQL*Plus and Perl DBI (John)
    Re: Inconsistent behavior between SQL*Plus and Perl DBI <matrix_calling@yahoo.dot.com>
    Re: Inconsistent behavior between SQL*Plus and Perl DBI (John)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 24 Jul 2004 13:21:11 -0400
From: "Max Metral" <memetral@hotmail.com>
Subject: HTML regex challenge
Message-Id: <UL-dnTbrkbmZB5_cRVn-sw@speakeasy.net>

I'm matching some ASP.NET code with some Perl regexes to do localization.
I'm having some trouble with ASP's embedded use of <% %> and differentiating
it from the HTML tag...  So, the thing I'm matching is like:

<tag a=b c="d">stuff</tag>

My reg ex is:

<tag([^>]*?)>(.*?)</tag>

Which works fine for the first example.  But it doesn't for this:

<tag a=b c="<%foo%>">stuff</tag>

As expected, it stops after %>.  Question is, how can I modify the
expression to still get the whole "attribute section" in that single
match...  I've tried various back reference constructs, but they don't seem
to do it.  The expression fragment I want is "match everything except right
bracket, unless there was a % before the right bracket"...

Hrmph,
--Max




------------------------------

Date: Sat, 24 Jul 2004 17:49:15 GMT
From: Bob Walton <invalid-email@rochester.rr.com>
Subject: Re: HTML regex challenge
Message-Id: <4102A0A5.7090206@rochester.rr.com>

Max Metral wrote:

> I'm matching some ASP.net code with some perl regex's to do localization.
> I'm having some trouble with asp's embedded use of <% %> and differentiating
> it from the html tag...  So, the thing I'm matching is like:
> 
> <tag a=b c="d">stuff</tag>
> 
> My reg ex is:
> 
> <tag([^>]*?)>(.*?)</tag>
> 
> Which works fine for the first example.  But it doesn't for this:
> 
> <tag a=b c="<%foo%>">stuff</tag>
> 
> As expected, it stops after %>.  Question is, how can I modify the
> expression to still get the whole "attribute section" in that single
> match...  I've tried various back reference constructs, but they don't seem
> to do it.  The expression fragment I want is "match everything except right
> bracket, unless there was a % before the right bracket"...
 ...


> --Max

Well, there's really only one way to do it right:  Parse the HTML. 
There are *bunches* of other cases that can bite you besides the one you 
found, and, in general, it is most difficult to handle them all, 
particularly in a single regexp.  Actually, it is probably difficult to 
even know about them all.  See:

    perldoc HTML::Parser
    perldoc -q HTML

The latter document has a few of the possible trip-ups listed.
-- 
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl



------------------------------

Date: Sat, 24 Jul 2004 15:52:29 -0500
From: Tad McClellan <tadmc@augustmail.com>
Subject: Re: HTML regex challenge
Message-Id: <slrncg5j0d.7h7.tadmc@magna.augustmail.com>

Max Metral <memetral@hotmail.com> wrote:

> Subject: HTML regex challenge


Parsing arbitrary HTML with a regex is nearly impossible.

You need a Real Parser that knows the HTML grammar.


> The expression fragment I want is "match everything except right
> bracket, unless there was a % before the right bracket"...


Your problem description will not do the Right Thing for this HTML:

   <img src="cool.jpg" alt=">>Cool pic!<<">

After you fix the regex for that case, post it here and we
will show some other HTML that breaks it.

Then after you fix the regex for _that_ case, post the regex
and we'll do it again.

Lather, rinse, repeat.

We can keep that up longer than you can.  :-)
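
A small sketch of the point above, using only core Perl. The naive
pattern (the original regex with "img" substituted for "tag") cuts the
attribute text off at the first > inside the quoted alt value. The
quote-aware pattern next to it is an illustrative assumption, not a
recommendation: it copes with this one input and will still trip over
comments, CDATA sections and other constructs that a real parser handles.

```perl
use strict;
use warnings;

# The example from this post: the alt attribute legitimately contains '>'.
my $html = '<img src="cool.jpg" alt=">>Cool pic!<<">';

# Naive approach: everything up to the first '>' is taken as the attributes.
my ($naive) = $html =~ /<img([^>]*)>/;

# Illustrative (hypothetical) refinement: treat quoted strings as units.
my ($quote_aware) = $html =~ /<img((?:[^>"']|"[^"]*"|'[^']*')*)>/;

print "naive:       [$naive]\n";
print "quote-aware: [$quote_aware]\n";
```

The first capture stops right after alt=" while the second keeps the
whole quoted value, which is exactly the kind of case-by-case patching
described above.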


-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas


------------------------------

Date: Sat, 24 Jul 2004 22:23:02 -0400
From: "Max Metral" <memetral@hotmail.com>
Subject: Re: HTML regex challenge
Message-Id: <MZydneQVCKOZhJ7cRVn-og@speakeasy.net>

Understood.  To argue my case only slightly more, I'm not parsing arbitrary
HTML, I'm looking for a single tag called "localize" whose contents I then
replace with the contents of an XML entry from a resource file.  So
there's never a case where > appears in an attribute of that tag, UNLESS
it's inside an ASP block (<% %>).  The attributes of the localize tag are
very restricted, true/false type things, except for the fact that somebody
may need to "bind" one of these true/falses to a function call.

So my latest is:
<localize((?:[^>]*%>[^>])*[^>]*)>(.*?)</localize>

which fixes my original problem, but it's true that that won't handle

<localize visible="<%# x > 5%>">foo</localize>

but that seems fixable and "final", in that that's the only case that could
occur given the allowable values of the tag...

The problem with most HTML parsers is that (shocker) they don't handle
ASP.Net (which isn't HTML)...  So rather than modding something big I was
hoping to keep it simple, even if that means constraining the user of the
tag somewhat.
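
One way the constrained case might be handled, as a sketch only: let the
attribute section consist of either a single non-bracket character or a
whole <% ... %> block, so a > inside the ASP block no longer ends the
tag. The variable name $localize_re and the sample strings are
assumptions made for illustration. The pattern still assumes that %>
never appears inside the ASP expression itself (for example in a string
literal) and that localize tags are not nested.

```perl
use strict;
use warnings;

# Attribute section: either a character that is not an angle bracket,
# or a complete <% ... %> block (which may itself contain '>').
my $localize_re = qr{<localize((?:[^<>]|<%.*?%>)*)>(.*?)</localize>}s;

for my $sample (
    '<localize a=b c="d">stuff</localize>',
    '<localize a=b c="<%foo%>">stuff</localize>',
    '<localize visible="<%# x > 5%>">foo</localize>',
) {
    if ( my ($attrs, $body) = $sample =~ $localize_re ) {
        print "attrs=[$attrs] body=[$body]\n";
    }
    else {
        print "no match: $sample\n";
    }
}
```

Because one alternative can only start with < and the other never can,
the two branches do not compete for the same character, which keeps
backtracking modest on well-formed input.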

"Tad McClellan" <tadmc@augustmail.com> wrote in message
news:slrncg5j0d.7h7.tadmc@magna.augustmail.com...
> Max Metral <memetral@hotmail.com> wrote:
>
> > Subject: HTML regex challenge
>
>
> Parsing arbitrary HTML with a regex is nearly impossible.
>
> You need a Real Parser that knows the HTML grammar.
>
>
> > The expression fragment I want is "match everything except right
> > bracket, unless there was a % before the right bracket"...
>
>
> Your problem description will not do the Right Thing for this HTML:
>
>    <img src="cool.jpg" alt=">>Cool pic!<<">
>
> after you fix the regex for that case, post it here and we
> will show some other HTML that breaks it.
>
> Then after you fix the regex for _that_ case, post the regex
> and we'll do it again.
>
> Lather, rinse, repeat.
>
> We can keep that up longer than you can.  :-)
>
>
> -- 
>     Tad McClellan                          SGML consulting
>     tadmc@augustmail.com                   Perl programming
>     Fort Worth, Texas




------------------------------

Date: Sun, 25 Jul 2004 20:13:51 +0900
From: ko <kuujinbo@hotmail.com>
Subject: Re: HTML regex challenge
Message-Id: <2mhj0aFn8fsfU1@uni-berlin.de>

Max Metral wrote:
> Understood.  To argue my case only slightly more, I'm not parsing arbitrary
> html, I'm looking for a single tag called "localize" which I then replace the
> contents of with the contents of an XML entry from a resource file.  So
> there's never a case where > appears in an attribute of that tag, UNLESS
> it's inside an ASP block (<% %>).  The attributes of the localize tag are
> very restricted, true/false type things, except for the fact that somebody
> may need to "bind" one of these true/falses to a function call.
> 
> So my latest is:
> <localize((?:[^>]*%>[^>])*[^>]*)>(.*?)</localize>
> 
> which fixes my original problem, but it's true that that won't handle
> 
> <localize visible="<%# x > 5%>">foo</localize>
> 
> but that seems fixable and "final", in that that's the only case that could
> occur given the allowable values of the tag...

Generally, you can't argue against the advice to use an HTML module to 
parse HTML.

If you *really* want to use a regex, (now that you have elaborated 
you're looking for a single, *specific* instance) there are modules on 
CPAN that make this job easier, one being Regexp::Common:

use strict;
use warnings;
use Regexp::Common qw /balanced/;

my $text = q[<localize visible="<%# x > 5%>">foo</localize>];
(my $changed = $text) =~
     s/$RE{balanced}{-begin => '<%'}{-end => '%>'}{-keep}
     /changed text/x;
print $changed . "\n";

Also, when replying to someone please keep the content you quote at the 
top and your reply on the bottom. Your reply to Tad is an example of 
top-posting, which is covered in the group's posting guidelines 
available here:

http://mail.augustmail.com/~tadmc/clpmisc.shtml

HTH - keith


------------------------------

Date: Sun, 25 Jul 2004 16:26:45 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: HTML regex challenge
Message-Id: <pan.2004.07.25.14.26.44.214394@aursand.no>

On Sat, 24 Jul 2004 13:21:11 -0400, Max Metral wrote:
> My reg ex is:
> 
> <tag([^>]*?)>(.*?)</tag>
> 
> Which works fine for the first example.  But it doesn't for this:
> 
> <tag a=b c="<%foo%>">stuff</tag>

Hint:  Think right-left, not left-right.


-- 
Tore Aursand <tore@aursand.no>
"Life is pleasant. Death is peaceful. It's the transition that's
 troublesome." (Isaac Asimov)


------------------------------

Date: Fri, 23 Jul 2004 19:18:47 +0100
From: Jim Higson <jh@333.org>
Subject: HTML::Entities::encode() returning wrong(?) entities
Message-Id: <vtqdnXbyoeUVy5zcRVn-rw@eclipse.net.uk>

I'm calling encode_entities on some text I have read from a file, to turn it
into a webpage. According to file:

$ file text/text.en
text/text.en: UTF-8 Unicode English text, with very long lines

(although this might not matter)
Anyway, the letter ä appears in the text, and should be changed to &auml;

However, instead it is changed to:
&Atilde;&curren;

I can't see anything unusual about my code. Any ideas why I'm having this
problem?





------------------------------

Date: Fri, 23 Jul 2004 20:43:44 +0100
From: Jim Higson <jh@333.org>
Subject: Re: HTML::Entities::encode() returning wrong(?) entities
Message-Id: <lfWdnZXxbbrt95zcRVn-rQ@eclipse.net.uk>

Jim Higson wrote:

> I'm calling encode_entities on some text I have read from a file, to turn
> it into a webpage. According to file:
> 
> $ file text/text.en
> $ text/text.en: UTF-8 Unicode English text, with very long lines
> 
> (although this might not matter)
> Anyway, the letter ä appears in the text, and should be changed to &auml;
> 
> However, instead it is changed to:
> &Atilde;&curren;
> 
> I can't see anything unusual about my code. Any ideas why I'm having this
> problem?


I just found the answer myself - as I suspected, it was to do with reading
the Unicode in Perl. Adding use open ':utf8'; to the top of the source
fixed this (although I'm not quite certain exactly what this means).


------------------------------

Date: Sun, 25 Jul 2004 10:13:08 GMT
From: Joe Smith <Joe.Smith@inwap.com>
Subject: Re: HTML::Entities::encode() returning wrong(?) entities
Message-Id: <UILMc.182888$Oq2.50078@attbi_s52>

Jim Higson wrote:

> $ text/text.en: UTF-8 Unicode English text, with very long lines
> Anyway, the letter ä appears in the text, and should be changed to &auml;

In UTF-8 encoding, the single character "ä" is stored as two bytes:
"\xC3" and "\xA4".  If you allow perl to think that the file is ISO-8859-1,
it will interpret those two bytes as "Ã" and "¤", which is why you got
&Atilde;&curren;.  You need to tell perl that the file is :utf8 in order
for it to recognize those two bytes as being a single Unicode character.
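
The byte-level picture can be checked with the core Encode module. This
is a minimal sketch; the variable names are illustrative, and it only
prints code points rather than calling HTML::Entities.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $a_umlaut = "\x{E4}";                    # the character U+00E4
my $bytes    = encode('UTF-8', $a_umlaut);  # its UTF-8 byte sequence

# Show the individual bytes in hex.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
print "UTF-8 bytes: $hex\n";

# Misreading those bytes as ISO-8859-1 yields two separate characters.
my $misread = decode('ISO-8859-1', $bytes);
printf "as Latin-1: %d characters (U+%04X, U+%04X)\n",
    length($misread), map { ord($_) } split //, $misread;

# Decoding them as UTF-8 recovers the single original character.
my $correct = decode('UTF-8', $bytes);
printf "as UTF-8:   %d character (U+%04X)\n", length($correct), ord($correct);
```

U+00C3 and U+00A4 are the characters that the entity encoder turned into
&Atilde; and &curren; in the original report.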

	-Joe


------------------------------

Date: Sun, 25 Jul 2004 17:13:19 -0400
From: Eric Amick <eric-amick@comcast.net>
Subject: Re: HTML::Entities::encode() returning wrong(?) entities
Message-Id: <s588g01cdqfaq3e4bhtvp1j3jpmlcsa3po@4ax.com>

On Fri, 23 Jul 2004 20:43:44 +0100, Jim Higson <jh@333.org> wrote:

>Jim Higson wrote:
>
>> I'm calling encode_entities on some text I have read from a file, to turn
>> it into a webpage. According to file:
>> 
>> $ file text/text.en
>> $ text/text.en: UTF-8 Unicode English text, with very long lines
>> 
>> (although this might not matter)
>> Anyway, the letter ä appears in the text, and should be changed to &auml;
>> 
>> However, instead it is changed to:
>> &Atilde;&curren;
>> 
>> I can't see anything unusual about my code. Any ideas why I'm having this
>> problem?
>
>
>I just found the answer myself - as  I suspected it was to do with reading
>the unicode in perl. Adding use open ':utf8'; to the top of the source
>fixed this (although I'm not quite certain exactly what this means)

It tells Perl to open all files with UTF-8 encoding set by default. Only
you can say whether that is the right thing. If it isn't, you can
specify it for specific files by including ':utf8' in the mode (the
second argument) of a three-argument open, as in '<:utf8', or with a
binmode call on the appropriate filehandle.
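
Both per-file options can be sketched as follows. The scratch file made
with File::Temp is only there to keep the example self-contained; in
real code $path would be the text file being read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two raw UTF-8 bytes for "a with umlaut" to a scratch file.
my ($out, $path) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "\xC3\xA4\n";
close $out or die "close failed: $!";

# Option 1: name the layer in the mode of a three-argument open.
open my $in1, '<:utf8', $path or die "cannot open $path: $!";
my $line1 = <$in1>;
close $in1;
chomp $line1;

# Option 2: open normally, then set the layer with binmode.
open my $in2, '<', $path or die "cannot open $path: $!";
binmode $in2, ':utf8';
my $line2 = <$in2>;
close $in2;
chomp $line2;

printf "layer in open: %d character(s)\n", length $line1;
printf "binmode:       %d character(s)\n", length $line2;
```

Either way the two bytes come back as one character, so a later call to
encode_entities sees a single "ä" rather than two Latin-1 characters.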

-- 
Eric Amick
Columbia, MD


------------------------------

Date: 30 Jul 2004 20:16:27 -0700
From: kkarma@eudoramail.com (kkarma)
Subject: Image magick palette
Message-Id: <7770e62f.0407301916.5ee2ec22@posting.google.com>

I searched around a bit and this seems to be the right place to ask.
I use ImageMagick on Linux and also on Windows. My question is how to
convert a .bmp image to 256 colors using a palette (colormap?) that
contains the Windows palette. I need a final 256-color BMP with the
Windows color palette, because Windows defines one of its colors to be
the transparent color for its controls. In PaintShopPro this is
achievable by checking "include windows colors" in the 256-color
reduction dialog box.
Thanks!


------------------------------

Date: 23 Jul 2004 08:04:27 -0700
From: jpeter1978@yahoo.com (John)
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <545336be.0407230704.2b1b2569@posting.google.com>

Mladen Gogala <gogala@sbcglobal.net> wrote in message news:<pan.2004.07.19.21.07.36.811529@sbcglobal.net>...
> On Mon, 19 Jul 2004 13:52:40 -0700, John wrote:
> 
> > I'm trying to create a PL/SQL procedure with Perl's DBI module. I
> > first create the query and then use the "do" method to send it to
> > Oracle. However, when I run the script I get the following error:
> > 
> > DBD::Oracle::db do failed: ORA-24344: success with compilation error
> > (DBD SUCCESS_WITH_INFO: OCIStmtExecute) ...
> 
> If I am allowed to guess, there is a problem with quoting 
> in your create statement. You can, of course, take a look at
> USER_ERRORS, which will tell you at which line did the error 
> happen and what was the message. If I may ask, why are you trying to
> create procedure from DBI? That's precisely what sqlplus is all about.
> You should only be calling procedure from DBI, not try creating it.

I'm pretty sure there isn't a problem with the statement because if I
print it out after creating it and put the printed query into
SQL*Plus, the procedure is created and works as expected. It's only
when I try to create it from within Perl using $dbh->do( $query );
that things go awry.

The USER_ERRORS table is empty.

I need to ensure that the procedure is made. It makes one less
dependency that maintainers will have to know/worry about.


------------------------------

Date: 23 Jul 2004 08:09:09 -0700
From: jpeter1978@yahoo.com (John)
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <545336be.0407230709.4e17d0e2@posting.google.com>

> > *** Code Snippet ***
> > 
> > $temp_query = <<"QUERY_CREATE_PROC";
> I think this might be your problem --^

Don't think so. I've used this format frequently. It's never been an
issue.

> Perhaps oughtn't put a line terminator there.  For the assignment you 
> don't need the <<"QUERY_CREATE_PROC" either.  perl is perfectly happy to 
> read a multiline string up to the ";" terminator.

I use here-docs everywhere. They are not only good for printing
multiple lines of text, but are also an easy way to document code.


------------------------------

Date: 23 Jul 2004 08:11:43 -0700
From: jpeter1978@yahoo.com (John)
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <545336be.0407230711.49b198c5@posting.google.com>

Gregory Toomey <nospam@bigpond.com> wrote in message news:<1265282.LFvGq1lU9c@GMT-hosting-and-pickle-farming>...
> John wrote:
> 
> ...
> > Oddly enough, if I print out this command and paste it into SQL*Plus,
> > the procedure is created and works as expected.
> ...
> 
> 
> Try printing out $temp_query after you assign it to the here doc, then cut &
> paste into sql*plus. There may be some Perl variable interpolation you did
> not expect.

Been there, done that. Apparently I didn't make that clear.

> 
> gtoomey


------------------------------

Date: 23 Jul 2004 08:23:45 -0700
From: jpeter1978@yahoo.com (John)
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <545336be.0407230723.2a45f525@posting.google.com>

Mladen Gogala <gogala@sbcglobal.net> wrote in message news:<pan.2004.07.20.04.12.17.974321@sbcglobal.net>...
> On Tue, 20 Jul 2004 09:44:29 +1000, Gregory Toomey wrote:
> 
> > Try printing out $temp_query after you assign it to the here doc, then cut &
> > paste into sql*plus. There may be some Perl variable interpolation you did
> > not expect.
> 
> Oracle has a table called USER_ERRORS in which the last compilation error
> is recorded. There is also a table called USER_SOURCE, from which the
> lines of the compiled source can be obtained. A logical step would be
> to extract the source of the unit that was actually compiled and 
> compare it with the desired. Perl is a mighty tool, and I use it
> frequently, but it shouldn't be the only tool.

Thanks for cluing me in on USER_SOURCE. I've been looking for a way to
do that. Call me lazy, but it seems a little cumbersome to have to
type

select TEXT from USER_SOURCE where NAME = 'PROC_NAME';

every time I want to print out a procedure. Do you know of any
shortcut in SQL*Plus? Or should I write my own procedure to do it?


------------------------------

Date: Fri, 23 Jul 2004 21:12:19 +0530
From: Abhinav <matrix_calling@yahoo.dot.com>
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <BsaMc.29$pk.81@news.oracle.com>

John wrote:

> I'm trying to create a PL/SQL procedure with Perl's DBI module. I
> first create the query and then use the "do" method to send it to
> Oracle. However, when I run the script I get the following error:
> 
> DBD::Oracle::db do failed: ORA-24344: success with compilation error
> (DBD SUCCESS_WITH_INFO: OCIStmtExecute) ...
> 
> Oddly enough, if I print out this command and paste it into SQL*Plus,
> the procedure is created and works as expected.
> 
> I can't seem to find much on this error, so I hope someone else has
> figured this one out. BTW, I'm using Oracle 9i and SQL*Plus under
> RedHat.
> 
> *** Code Snippet ***
> 
> $temp_query = <<"QUERY_CREATE_PROC";
> create or replace procedure $pager_proc
> ( PAGER_NAME in varchar2, EXPIRATION_AGE in number ) as
>     cursor PAGER_VIEWS is (
>         select OBJECT_NAME, CREATED from ALL_OBJECTS
>         where OBJECT_NAME like 'PAGER_' || PAGER_NAME || '_%' and
>               OBJECT_TYPE = 'VIEW'
>     );
>     EXECUTE_DROP integer default dbms_sql.open_cursor;
>     DUMMY integer;
> begin
>     for ROW in PAGER_VIEWS
>     loop
>         if ( SYSDATE - ROW.CREATED ) * 24 * 60 > EXPIRATION_AGE then
		^^^^^
Not sure about this, but the error could be with SYSDATE.
I am no pl/sql user, but one of my colleagues was having a problem with the 
quoting related to SYSDATE.

I will fill in with more details when I can get in touch with him. In the 
meantime, hope this helps in some way.


>             dbms_sql.parse( EXECUTE_DROP,
>                             'drop view ' || ROW.OBJECT_NAME,
>                             dbms_sql.native );
>             DUMMY := dbms_sql.execute( EXECUTE_DROP );
>         end if;
>     end loop;
>     commit;
> end $pager_proc;
> QUERY_CREATE_PROC
> 
> $dbh->do( $temp_query );

--

Abhinav


------------------------------

Date: 23 Jul 2004 08:46:10 -0700
From: jpeter1978@yahoo.com (John)
Subject: Re: Inconsistent behavior between SQL*Plus and Perl DBI
Message-Id: <545336be.0407230746.5c32de50@posting.google.com>

Richard Morse <remorse@partners.org> wrote in message news:<remorse-66C3E2.13234820072004@plato.harvard.edu>...
> In article <545336be.0407191252.1a478239@posting.google.com>,
>  jpeter1978@yahoo.com (John) wrote:
> 
> > I'm trying to create a PL/SQL procedure with Perl's DBI module. I
> > first create the query and then use the "do" method to send it to
> > Oracle. However, when I run the script I get the following error:
> > 
> > DBD::Oracle::db do failed: ORA-24344: success with compilation error
> > (DBD SUCCESS_WITH_INFO: OCIStmtExecute) ...
> > 
> > Oddly enough, if I print out this command and paste it into SQL*Plus,
> > the procedure is created and works as expected.
> > 
> > I can't seem to find much on this error, so I hope someone else has
> > figured this one out. BTW, I'm using Oracle 9i and SQL*Plus under
> > RedHat.
> > 
> > *** Code Snippet ***
> > 
> > $temp_query = <<"QUERY_CREATE_PROC";
> > create or replace procedure $pager_proc
> > ( PAGER_NAME in varchar2, EXPIRATION_AGE in number ) as
>  [snip]
> >     commit;
> > end $pager_proc;
> 
> you need a '/' on a blank line here...
> 
> > QUERY_CREATE_PROC
> > 
> > $dbh->do( $temp_query );
> 
> SQL*Plus has special magic to handle PL/SQL.  Try adding a line with a 
> '/' at the end of your procedure -- this tells Oracle to go ahead and 
> execute your code immediately.

Oddly enough, putting a '/' after the statement actually inserts it as
part of the procedure. However, when executing it in SQL*Plus the '/'
is interpreted. This appears to be the problem. Apparently ending
things with a '/' is SQL*Plus specific. Even if I put 'commit;' after
"end $pager_proc", it is added to the procedure. I seem to have found
a solution that works however. I just make the commit call separately.
Like so:

$dbh->do( $temp_query );
$dbh->do( 'commit' );

Thanks for all the help. You guys have been great.
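
For anyone who keeps the same procedure text in a SQL*Plus script, here
is a rough sketch of stripping the trailing '/' before handing the text
to DBI. The helper name strip_sqlplus_slash and the demo_proc procedure
are made up for illustration, and no database connection is opened
here; the $dbh calls appear only in comments.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing SQL*Plus "/" line so that the
# slash is not sent to the server as part of the PL/SQL source.
sub strip_sqlplus_slash {
    my ($sql) = @_;
    $sql =~ s{\n[ \t]*/[ \t]*\n*\z}{\n};
    return $sql;
}

# Illustrative procedure text as it might appear in a SQL*Plus script.
my $script = <<'END_SQL';
create or replace procedure demo_proc as
begin
    null;
end demo_proc;
/
END_SQL

my $for_dbi = strip_sqlplus_slash($script);
print $for_dbi;

# With a connected handle (not created in this sketch) the calls from
# this post would then be:
#     $dbh->do( $for_dbi );
#     $dbh->do( 'commit' );
```

Text that has no trailing slash line passes through unchanged, so the
helper can be applied unconditionally.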

> HTH,
> Ricky


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6810
***************************************

