[31877] in Perl-Users-Digest


Perl-Users Digest, Issue: 3140 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Sep 23 11:09:34 2010

Date: Thu, 23 Sep 2010 08:09:14 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 23 Sep 2010     Volume: 11 Number: 3140

Today's topics:
    Re: Displaying 'umlaut' character <hjp-usenet2@hjp.at>
    Re: Displaying 'umlaut' character <sherm.pendley@gmail.com>
    Re: Displaying 'umlaut' character <joel-garry@home.com>
    Re: Displaying 'umlaut' character <jurgenex@hotmail.com>
    Re: Displaying 'umlaut' character <rvtol+usenet@xs4all.nl>
    Re: Displaying 'umlaut' character <ben@morrow.me.uk>
    Re: Displaying 'umlaut' character <uri@StemSystems.com>
    Re: Displaying 'umlaut' character <jurgenex@hotmail.com>
    Re: Displaying 'umlaut' character (Randal L. Schwartz)
    Re: Displaying 'umlaut' character <jurgenex@hotmail.com>
        problems assembling POST HTTP::Request <ron.eggler@gmail.com>
    Re: problems assembling POST HTTP::Request <ben@morrow.me.uk>
    Re: Removing tag + closing tag <tadmc@seesig.invalid>
    Re: Removing tag + closing tag sln@netherlands.com
    Re: Removing tag + closing tag <tcmvandenheuvel@gmail.com>
    Re: Removing tag + closing tag sln@netherlands.com
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 22 Sep 2010 15:29:16 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Displaying 'umlaut' character
Message-Id: <slrni9k15d.85t.hjp-usenet2@hrunkner.hjp.at>

On 2010-09-22 08:13, Frank van Bortel <fbortel@home.nl> wrote:
> Apart from what I replied earlier, the correct way to encode
> is of course "&ouml;" (without the quotes...)

That's not *the* correct way, just *a* correct way. Encoding it in the
charset indicated in the Content-Type header or a meta tag is equally
correct (and preferable in most circumstances, IMHO).

	hp
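To make the two options concrete, here is a minimal Perl sketch using the core Encode module. It assumes the page body is built as a Perl character string. It also uses a numeric character reference (&#xF6;) in place of the named entity &ouml;, so that no entity table is needed. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string containing U+00F6 LATIN SMALL LETTER O WITH DIAERESIS.
my $text = "K\x{F6}ln";

# Option 1: escape every non-ASCII character as a numeric character
# reference. The result is plain ASCII, so any ASCII-compatible charset works.
(my $as_refs = $text) =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;

# Option 2: keep the character and encode it in the charset that the
# Content-Type header announces (UTF-8 here).
my $content_type = 'text/html; charset=UTF-8';
my $as_utf8      = encode('UTF-8', $text);

binmode STDOUT;    # we print raw bytes from here on
print "Content-Type: $content_type\r\n\r\n";
print "<p>$as_refs</p>\n";    # option 1: <p>K&#xF6;ln</p>
print "<p>$as_utf8</p>\n";    # option 2: the bytes 4B C3 B6 6C 6E
```

A browser should render both paragraphs as "Köln". The second form only works because the header's charset matches the bytes actually sent.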



------------------------------

Date: Wed, 22 Sep 2010 09:55:13 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <m2aan9q3we.fsf@sherm.shermpendley.com>

jt@toerring.de (Jens Thoms Toerring) writes:

> And a web server normally
> sends a HTML header with the page that may contain a line
> like
>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

No, <meta http-equiv='...'> elements are only of use where there is *no*
web server in use, such as when one is browsing local files found on one's
own HD.

A web server will send a *real* HTTP header, which will override any <meta>
equivalents found in the HTML content.

sherm--

-- 
Sherm Pendley
                                   <http://camelbones.sourceforge.net>
Cocoa Developer
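The precedence rule described above (real HTTP header first, <meta http-equiv> only as a fallback) can be sketched as follows. This illustration rests on assumptions and is not how any particular browser is implemented. It ignores byte-order-mark detection and the other sniffing steps real clients perform. The meta lookup is a deliberately crude regex, and the helper names charset_of and effective_charset are hypothetical.

```perl
use strict;
use warnings;

# Extract the charset parameter from a Content-Type value, lower-cased.
sub charset_of {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*["']?([\w.:-]+)/i ? lc $1 : undef;
}

# Hypothetical helper: the charset a client should use for a page,
# giving the real HTTP header precedence over <meta http-equiv>.
sub effective_charset {
    my ($http_content_type, $html) = @_;
    my $from_header = charset_of($http_content_type);
    return $from_header if defined $from_header;
    if ($html =~ /<meta\s+http-equiv=["']?Content-Type["']?\s+content=["']([^"']*)["']/i) {
        return charset_of($1);
    }
    return undef;
}

my $page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
print effective_charset('text/html; charset=ISO-8859-1', $page), "\n";  # iso-8859-1: header wins
print effective_charset(undef, $page), "\n";                            # utf-8: no header, meta used
```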


------------------------------

Date: Wed, 22 Sep 2010 09:35:33 -0700 (PDT)
From: joel garry <joel-garry@home.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <69d63c1a-27bf-4bc2-9ac1-c5c60f378fda@x20g2000pro.googlegroups.com>

On Sep 22, 12:20 am, Frank van Bortel <fbor...@home.nl> wrote:
> On 09/22/2010 06:50 AM, dn.p...@gmail.com wrote:
>
>
>
> > My aim is to display the ‘special’ (non-ASCII) German character/
> > diacritic umlaut or diaeresis correctly on a browser. The browser calls
> > a cgi perl-script which resides on a linux server. The browser which
> > calls the perl-script displays Vietnamese characters correctly (but
> > not the umlaut) without any special setting. The script sets NLS_LANG
> > variable to AMERICAN_AMERICA.UTF8 and uses utf8 module, but that’s
> > about it.
>
> > $ENV{'NLS_LANG'}='AMERICAN_AMERICA.UTF8';
> >      Works for Vietnamese characters, but not with umlaut (ö).
>
> > But even before we get to a perl-script, perhaps the LC_CTYPE env
> > variable needs to be set correctly. From my windows laptop, if I
> > access Oracle through Oracle Query Server, I can see the umlaut. But
> > if I open a linux-window, initiate an sqlplus session, and run the
> > same SQL, I do not see the umlaut correctly. I have tried a few values
> > for the env variable LC_CTYPE (like iso_8859_1, en_US,
> > en_US.iso88591), but with no luck. The surprising thing is that
> > ‘umlaut’ is a much-known alphabet, Vietnamese alphabets are less-
> > known. Yet the Vietnamese characters are being displayed correctly.
>
> > What settings should I use in a perl-script or for a linux-window to
> > see the umlaut correctly? Please advise.
>
> Maybe this helps: (shameless self promotion) http://vanbortel.blogspot.com/2009/04/special-characters-part-i.html
> Last part is here: http://vanbortel.blogspot.com/2010/01/special-characters-part-iv.html

Thanks for that Frank, I'm always forgetting where I've seen the
excellent write-up.

It always needs to be emphasized that using the wrong database
character set creates a ticking time bomb, as Oracle is so
sophisticated about automatic conversions in various circumstances.

jg
--
@home.com is bogus.
http://www.fastcompany.com/1690122/bmw-touts-integration-with-ipads-blackberry-google


------------------------------

Date: Wed, 22 Sep 2010 18:13:54 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <2bal96ph8gsipkgiuu48suafj6ppn1dj9p@4ax.com>

Frank van Bortel <fbortel@home.nl> wrote:
>Apart from what I replied earlier, the correct way to encode
>is of course "&ouml;" (without the quotes...)

If that were true then I guess we wouldn't need Unicode and all the
gazillion other attempts to represent non-English letters.

jue


------------------------------

Date: Thu, 23 Sep 2010 13:40:26 +0200
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: Displaying 'umlaut' character
Message-Id: <4c9b3caa$0$41117$e4fe514c@news.xs4all.nl>

On 2010-09-23 03:13, Jürgen Exner wrote:
> Frank van Bortel<fbortel@home.nl>  wrote:

>> Apart from what I replied earlier, the correct way to encode
>> is of course "&ouml;" (without the quotes...)
>
> If that were true then I guess we wouldn't need Unicode and all the
> gazillion other attempts to represent non-English letters.

Non-English? The trema (diaeresis) is often used: cooperate reenact 
zoology Brontë naïve. (Umlaut diacritics are not.)

-- 
Ruud


------------------------------

Date: Thu, 23 Sep 2010 13:15:35 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Displaying 'umlaut' character
Message-Id: <70trm7-ifv2.ln1@osiris.mauzo.dyndns.org>

Quoth "Dr.Ruud" <rvtol+usenet@xs4all.nl>:
> On 2010-09-23 03:13, Jürgen Exner wrote:
> > Frank van Bortel<fbortel@home.nl>  wrote:
> 
> >> Apart from what I replied earlier, the correct way to encode
> >> is of course "&ouml;" (without the quotes...)
> >
> > If that were true then I guess we wouldn't need Unicode and all the
> > gazillion other attempts to represent non-English letters.
> 
> Non-English? The trema (diaeresis) is often used: cooperate reenact 
> zoology 

Only if you're *seriously* pretentious.

> Brontë

This is a case of historically-frozen pretension: the authors' father
was born with the surname 'Brunty', and changed it because he thought it
would make him more interesting.

> naïve.

Foreign import, of which there are many other cases: pâté, œuvre,
mediæval…

I don't think there are any native English words which need any
non-ASCII letters.

Ben



------------------------------

Date: Thu, 23 Sep 2010 08:42:41 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <87lj6s4on2.fsf@quad.sysarch.com>

>>>>> "BM" == Ben Morrow <ben@morrow.me.uk> writes:

  BM> I don't think there are any native English words which need any
  BM> non-ASCII letters.

for some definition of native english!
for other definitions, all of english is non-native.  :)

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Thu, 23 Sep 2010 06:25:47 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <o7lm969h3rm02pceglu817v45ppjnrj8kt@4ax.com>

"Dr.Ruud" <rvtol+usenet@xs4all.nl> wrote:
>On 2010-09-23 03:13, Jürgen Exner wrote:
>> Frank van Bortel<fbortel@home.nl>  wrote:
>
>>> Apart from what I replied earlier, the correct way to encode
>>> is of course "&ouml;" (without the quotes...)
>>
>> If that were true then I guess we wouldn't need Unicode and all the
>> gazillion other attempts to represent non-English letters.
>
>Non-English? The trema (diaeresis) is often used: cooperate reenact 
>zoology Brontë naïve. (Umlaut diacritics are not.)

Quite right. Nevertheless it's still not a character found in the
English alphabet.

jue


------------------------------

Date: Thu, 23 Sep 2010 06:26:57 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Displaying 'umlaut' character
Message-Id: <868w2s1tge.fsf@red.stonehenge.com>

>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:

Uri> for other definitions, all of english is non-native.  :)

Really?  What language was the origination of commonly-used "laser" and
"radar"?

Looks like native english to me.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion


------------------------------

Date: Thu, 23 Sep 2010 06:53:13 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Displaying 'umlaut' character
Message-Id: <snmm969mkmuev9oss7p7m20p9dfjc16ktq@4ax.com>

merlyn@stonehenge.com (Randal L. Schwartz) wrote:
>>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:
>
>Uri> for other definitions, all of english is non-native.  :)
>
>Really?  What language was the origination of commonly-used "laser" and
>"radar"?
>Looks like native english to me.

You are right with "radar", but "laser" is pure Americanese ;-)

jue


------------------------------

Date: Wed, 22 Sep 2010 09:55:23 -0700 (PDT)
From: cerr <ron.eggler@gmail.com>
Subject: problems assembling POST HTTP::Request
Message-Id: <e322bf68-e7c2-4f37-a0b8-da3bed87ea21@g10g2000vbc.googlegroups.com>

Hi,

I'm trying to post a filename (that generally comes out of an HTML
<input type="file"> box) to a Perl script from my own Perl script, which
should behave as if it had been done by hand on the web page...
The POST request code I came up with looks like this:

				my $post = HTTP::Request->new(POST => $PostPage,
				Content_Type => 'form-data',
				Content      => [ filename  => 'file://mnt/ENGINEERING/Docs/Tropos/pAce34-7.1.2.3-5189k-efs.bin',
                              ]);

but when I execute it I get:

Bad header argument at ./upgrdeFrmwr.pl line 50

Where line 50 is the first line of my POST request... I don't
understand: what kind of header argument may I require?

Thanks for hints and ideas!
Ron


------------------------------

Date: Wed, 22 Sep 2010 18:29:16 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: problems assembling POST HTTP::Request
Message-Id: <c0rpm7-2302.ln1@osiris.mauzo.dyndns.org>


Quoth cerr <ron.eggler@gmail.com>:
> 
> I'm trying to post a filename (that generally comes out of an HTML
> <input type="file"> box) to a Perl script from my own Perl script, which
> should behave as if it had been done by hand on the web page...
> The POST request code I came up with looks like this:
> 
> 				my $post = HTTP::Request->new(POST => $PostPage,
> 				Content_Type => 'form-data',
> 				Content      => [ filename  => 'file://mnt/ENGINEERING/Docs/Tropos/pAce34-7.1.2.3-5189k-efs.bin',
>                               ]);
> 
> but when I execute it I get:
> 
> Bad header argument at ./upgrdeFrmwr.pl line 50
> 
> Where line 50 is the first line of my POST request... I don't
> understand: what kind of header argument may I require?

HTTP::Request->new takes up to four arguments, the third of which must
be an HTTP::Headers object. You have passed six, none of which are.

I suspect you want the POST function (not method) out of
HTTP::Request::Common, or the ->post method on LWP::UserAgent.

Ben
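Following Ben's suggestion, a sketch using the POST function from HTTP::Request::Common might look like the block below. Several details are assumptions made for illustration: the URL, the temporary file standing in for the firmware image, and the field name `filename`, which comes from the original post (the real form's <input> name may differ). As for the error itself, HTTP::Request->new expects ($method, $uri, $headers, $content). The plain string 'Content_Type' therefore landed in the headers slot, which is presumably what triggers "Bad header argument".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);

# Hypothetical upload URL; the original script keeps it in $PostPage.
my $PostPage = 'http://example.com/cgi-bin/upload.pl';

# A throwaway file standing in for the firmware image. In the real script
# this would be a plain local path, not a file:// URL.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "firmware bytes\n";
close $fh or die "close: $!";

# POST is a function exported by HTTP::Request::Common, not a method.
# Wrapping the path in an array ref requests a file upload: the file's
# contents are sent in the 'filename' field, as an <input type="file"> would.
my $post = POST $PostPage,
    Content_Type => 'form-data',
    Content      => [ filename => [$path] ];

print $post->method, ' ', $post->uri, "\n";
print $post->header('Content-Type'), "\n";    # multipart/form-data; boundary=...

# To actually send it:
#   use LWP::UserAgent;
#   my $response = LWP::UserAgent->new->request($post);
```

Passing a plain string instead of the array ref would send the path itself as the field value rather than the file's contents.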



------------------------------

Date: Wed, 22 Sep 2010 08:11:13 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Removing tag + closing tag
Message-Id: <slrni9k03b.65h.tadmc@tadbox.sbcglobal.net>

jwcarlton <jwcarlton@gmail.com> wrote:
> On Sep 20, 11:25 pm, Tad McClellan <ta...@seesig.invalid> wrote:

>> You make your reputation, and then you live with it.

> Reputation? To my knowledge, I have no enemies here.


I'm quite sure you earned some in this thread:

http://groups.google.com/groups/search?as_umsgid=1158029547.380968.166590%40e63g2000cwd.googlegroups.com


> Methinks you may just be filtering people that use Google Groups. 


No, it is for you specifically:

   % ingrates, sour grapes
           From: jwcarlton@gmail.com


> so all you
> really contributed was unnecessary BS.


That was on purpose.


-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.


------------------------------

Date: Wed, 22 Sep 2010 17:24:08 -0700
From: sln@netherlands.com
Subject: Re: Removing tag + closing tag
Message-Id: <u46l9650408a3gig0ikgf8gubnqntsvo8k@4ax.com>

On Mon, 20 Sep 2010 18:01:58 -0700 (PDT), jwcarlton <jwcarlton@gmail.com> wrote:

>Let's say I have something like this:
>
>$var = "<font background='#F5F5F5'>Here is some <font color='#DADADA'>text</font>. Cool, huh?</font>";
>
>I want to remove <font background='#F5F5F5'> and its matching </font>, but not the nested font tags.

The nesting can be handled via regex recursion (Perl 5.10 and above)
if you can live with an attribute = (?:"[^<]*?"|'[^<]*?') scenario.

It can still be handled if you can't live with "[^<]*?".
This requires a different strategy: evaluating attr/val in the
loop body upon a successful match with the
   <(?:font(\s+(?:(?:".*?")|(?:'.*?')|(?:[^>]*?))+)\s*(\/?))>
expression, which is guaranteed not to overrun the next markup.
It simply stores the position or not.
See the code below.

A change scheme with regex might be faster than a tree, since all
that's being done is sparse matching with mild validation parsing.
It depends on what you are willing to live with.
If you take out the debug stuff, it's really not much code.

-sln
----------------
use strict;
use warnings;

## OP:
## "I want to remove <font background='#F5F5F5'> and it's
##  matching </font>, but not the nested font tags."

##
  my $debug = 1;    # level: 0, 1 or 2
  
  my $xml=<<EOXML;
<data>
  start
  <font background='#F5F5F5'>
     Here is some
     <font a>
        <font color='#A5A5A5' background='#BABABA'/>
        <font background='#DADADA'>
           text
           <font/>
        </font>
        Cool,
        <font color='#F5F5F5'>
           huh?
           <font b>
              italics
              <!--
                <font background='#CFCFCF'>
                   in comment
                </font>
              -->
              <font background='#EFEFEF'>
                 more
              </font>
           </font>
        </font>
     </font>
     <font/>
  </font>
  end
</data>
EOXML

##
  my $attr       = 'background';
  my $open_attr  = q{<font\s+[^>]*?(?<=\s)}.$attr.q{\s*=\s*(?:"[^<]*?"|'[^<]*?')[^>]*?(?<!\/)>};
  my $close_attr = q{<font\s+[^>]*?(?<=\s)}.$attr.q{\s*=\s*(?:"[^<]*?"|'[^<]*?')[^>]*?\s*\/>};
  my $open       = q{<font\s*[^>]*?(?<!\/)>};
  my $close      = q{<\/font\s*>};

  my $regx = qr/

     (<!(?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\])>)  #1
    |
     ($close_attr)  #2
    |
     ( #3
        (?: ($open_attr) | $open )  #4
        ( #5
           (?:
              (?>
                 (?:
                      (?:<!(?:\[CDATA\[.*?\]\]|--.*?--|\[[A-Z][A-Z\ ]*\[.*?\]\])>)
                    | (?! $open | $close ) . 
                 )+
              )
             | (?3)
           )*
        ) 
        ($close)  #6
     )
  /xs;

##
  my @cleartag;

  while ( $xml =~ /$regx/ig )
  {
    if (defined $1) {
        print "---->\$1 = '$1'\n" if $debug > 1;
        pos($xml) = $+[1];
    }
    elsif (defined $2) {
        push @cleartag, [$-[2], length $2];
        print "---->\$2 = '$2'\n" if $debug > 1;
        pos($xml) = $+[2];
    }
    else {
        if (defined $4) {
            push @cleartag, [$-[4], length $4];
            push @cleartag, [$-[6], length $6];
            print "---->\$4 = '$4'\n" if $debug > 1;
            print "---->\$6 = '$6'\n" if $debug > 1;
         }
         pos($xml) = $-[5];
    }
  }

  if (@cleartag)
  {
    print "\n--- OLD ------------\n$xml\n\n" if $debug;
    for my $ref ( sort {$b->[0]<=>$a->[0]} @cleartag )
    {
        print "offset= $ref->[0], length= $ref->[1]\n" if $debug > 1;
        substr $xml, $ref->[0], $ref->[1], ($debug > 1 ? '-' x $ref->[1] : "");
    }
    print "\n--- NEW (", (@cleartag/2),") -------\n$xml\n\n" if $debug;
  }
  else {
    print "No changes made!\n";
  }
  print "---------\nDone!\n";

__END__

Output:

--- OLD ------------
<data>
  start
  <font background='#F5F5F5'>
     Here is some
     <font a>
        <font color='#A5A5A5' background='#BABABA'/>
        <font background='#DADADA'>
           text
           <font/>
        </font>
        Cool,
        <font color='#F5F5F5'>
           huh?
           <font b>
              italics
              <!--
                <font background='#CFCFCF'>
                   in comment
                </font>
              -->
              <font background='#EFEFEF'>
                 more
              </font>
           </font>
        </font>
     </font>
     <font/>
  </font>
  end
</data>



--- NEW (3.5) -------
<data>
  start

     Here is some
     <font a>


           text
           <font/>

        Cool,
        <font color='#F5F5F5'>
           huh?
           <font b>
              italics
              <!--
                <font background='#CFCFCF'>
                   in comment
                </font>
              -->

                 more

           </font>
        </font>
     </font>
     <font/>

  end
</data>


---------
Done!
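For comparison, the core of the recursion idea can be sketched on the OP's original one-line string, without the comment/CDATA handling, self-closing tags and debug output. This sketch uses named recursion through (?(DEFINE)...) rather than the numbered (?3) call above. It assumes no '>' appears inside attribute values and no self-closing <font/> inside the removed element. It illustrates the technique and is not a drop-in replacement for the code above.

```perl
use strict;
use warnings;
use 5.010;    # recursion and (?(DEFINE)...) need Perl 5.10+

my $var = "<font background='#F5F5F5'>Here is some "
        . "<font color='#DADADA'>text</font>. Cool, huh?</font>";

# Matches a <font ...background=...> element and captures its body in $1.
# (?&body) and (?&element) call each other recursively, so nested
# <font>...</font> pairs inside the body are balanced correctly.
my $re = qr{
    <font\b[^>]*\bbackground\s*=[^>]*>     # opening tag with a background attribute
    ( (?&body) )                           # $1: content up to the matching close tag
    </font\s*>
    (?(DEFINE)
        (?<body>    (?: (?> [^<]+ ) | (?!</?font\b) < | (?&element) )* )
        (?<element> <font\b[^>]*> (?&body) </font\s*> )
    )
}xi;

# Strip the outer tag pair and keep its content. Repeat until nothing
# matches, so background fonts nested inside a removed one are handled too.
1 while $var =~ s/$re/$1/;

say $var;    # Here is some <font color='#DADADA'>text</font>. Cool, huh?
```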



------------------------------

Date: Thu, 23 Sep 2010 01:35:11 -0700 (PDT)
From: Theo van den Heuvel <tcmvandenheuvel@gmail.com>
Subject: Re: Removing tag + closing tag
Message-Id: <d1f9e0b2-ea3d-4f17-aa7f-78aef36fc0b4@m16g2000vbs.googlegroups.com>

On 23 sep, 02:24, s...@netherlands.com wrote:
> On Mon, 20 Sep 2010 18:01:58 -0700 (PDT), jwcarlton <jwcarl...@gmail.com> wrote:
> >Let's say I have something like this:
>
> >$var = "<font background='#F5F5F5'>Here is some <font color='#DADADA'>text</font>. Cool, huh?</font>";
>
> >I want to remove <font background='#F5F5F5'> and its matching </font>, but not the nested font tags.
>
> The nesting can be handled via regex recursion (Perl 5.10 and above)
 ... Loads of regex madness
> Done!

The OP is strongly recommended to follow the advice that is posted
here every week and use an existing HTML parser instead of doing
something that can be mathematically proven to be impossible except
in fairly trivial cases. Sln's approach only indicates how convoluted
and vulnerable the regex attempts need to be. They can never scale
when requirements are added.

Theo van den Heuvel
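The parser-based alternative can be sketched with HTML::TreeBuilder from CPAN's HTML-Tree distribution. The module choice is an assumption for this example; the post does not name one. Run on the OP's original one-line string, it unwraps every <font> that carries a background attribute and keeps the nested ones. Note that as_HTML re-serializes the whole document, so attribute quoting and the implied <html>/<body> wrapper will differ from the input.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution on CPAN

my $var = "<font background='#F5F5F5'>Here is some "
        . "<font color='#DADADA'>text</font>. Cool, huh?</font>";

my $tree = HTML::TreeBuilder->new_from_content($var);

# Every <font> element that has a background attribute, in document order.
for my $font ($tree->look_down(_tag => 'font', background => qr/./)) {
    # Splice the element's children into its parent, then free the
    # now-empty element itself.
    $font->replace_with_content->delete;
}

my $html = $tree->as_HTML;
$tree->delete;    # explicit cleanup of the element tree
print $html, "\n";
```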


------------------------------

Date: Thu, 23 Sep 2010 07:18:00 -0700
From: sln@netherlands.com
Subject: Re: Removing tag + closing tag
Message-Id: <m4mm969f5aubejoal0q16kcvg2les29ksn@4ax.com>

On Thu, 23 Sep 2010 01:35:11 -0700 (PDT), Theo van den Heuvel <tcmvandenheuvel@gmail.com> wrote:

>On 23 sep, 02:24, s...@netherlands.com wrote:
>> The nesting can be handled via regex recursion (Perl 5.10 and above)
>
>... use an existing HTML parser instead of doing

Uh, that would be xhtml or xml. Allowing un-closed tags requires
an inside-out nesting strategy when stripping out selected tags.
It's doable.

>something that can be mathematically proven to be impossible unless
>for fairly trivial cases.

Impossible? You obviously tried the code. Work for you?
Let me know if it doesn't.
Nothing trivial about this code. I've just shown how this can be done
with rx recursion. Prove this wrong mathematically!

> They can never scale when requirements are added.

They? There is nothing "general" about the code. It's specific.
Do you actually know what it does?

>
You speak of "parser" but don't understand the language.
This is not parsing anything other than balanced text
using the recursion engine of Perl 5.10.
Take it up with Larry if there is a problem.

Let's just say nesting can be handled quite well.

-sln


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3140
***************************************

