[31362] in Perl-Users-Digest


Perl-Users Digest, Issue: 2614 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Sep 28 16:10:12 2009

Date: Mon, 28 Sep 2009 13:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 28 Sep 2009     Volume: 11 Number: 2614

Today's topics:
        Another one joins te c.l.p.misc crowd <iain@nospam.hythe.net>
    Re: Another one joins te c.l.p.misc crowd <rabbits77@my-deja.com>
        CGI and UTF-8 <hhr-m@web.de>
    Re: CGI and UTF-8 <jurgenex@hotmail.com>
    Re: CGI and UTF-8 <hjp-usenet2@hjp.at>
    Re: CGI and UTF-8 <OJZGSRPBZVCX@spammotel.com>
    Re: decimal round off issue <hjp-usenet2@hjp.at>
    Re: decimal round off issue <nospam-abuse@ilyaz.org>
    Re: each - iterator clash <bugbear@trim_papermule.co.uk_trim>
        FAQ 5.10 How can I set up a footer format to be used wi <brian@theperlreview.com>
        FAQ 5.40 How do I traverse a directory tree? <brian@theperlreview.com>
        FAQ 6.13 What does it mean that regexes are greedy?  Ho <brian@theperlreview.com>
        FAQ 7.12 How can I tell if a variable is tainted? <brian@theperlreview.com>
        FAQ 7.4 How do I skip some return values? <brian@theperlreview.com>
        FAQ 8.39 How do I set CPU limits? <brian@theperlreview.com>
    Re: Trying to parse/match a C string literal <hjp-usenet2@hjp.at>
        utf8 length OK, or utf8 print OK :-( <peter@www.pjb.com.au>
    Re: utf8 length OK, or utf8 print OK :-( <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 27 Sep 2009 22:08:19 +0100
From: Iain Campbell <iain@nospam.hythe.net>
Subject: Another one joins te c.l.p.misc crowd
Message-Id: <jf2dncslEsXeSSLXnZ2dnUVZ8hydnZ2d@brightview.co.uk>

Thought I would start by saying "Hi!" instead of lurking. Been hacking 
Perl for a few years, but guess this is where I come to really start 
learning the craft.

Friendly hello to all!

Iain.

PS. Does anyone have any wizard tips for filtering out news spam? I'm using
Thunderbird, and it's been donkey's years since I used Usenet. Thanks.


------------------------------

Date: Sun, 27 Sep 2009 22:25:37 -0400
From: rabbits77 <rabbits77@my-deja.com>
Subject: Re: Another one joins te c.l.p.misc crowd
Message-Id: <ad2c$4ac01ea8$477ee79f$4000@news.eurofeeds.com>

Iain Campbell wrote:
> Thought I would start by saying "Hi!" instead of lurking. Been hacking 
> Perl for a few years, but guess this is where I come to really start 
> learning the craft.

Going to a local perl mongers group can help too.
As can www.perlmonks.com .

> Friendly hello to all!
> 
> Iain.
> 
> PS. Anyone got any wizard tips for filtering out news spam; I'm using 
> Thunderbird and it's been donkeys years since I used Usenet? Thanks.

Killfiling ("filters" in Thunderbird) the regular
trolls, cranks, and spammers should help. Depending
on your Usenet provider, most of the crap will
be removed upstream. I use eurofeeds and
see almost no spam in this group. It is possible that
c.l.p.misc doesn't get much spam anyway, but in any
event, I wouldn't worry too much about it.


------------------------------

Date: Mon, 28 Sep 2009 15:41:49 +0200
From: Helmut Richter <hhr-m@web.de>
Subject: CGI and UTF-8
Message-Id: <Pine.LNX.4.64.0909281501590.4453@lxhri01.lrz.lrz-muenchen.de>

I have the task of describing for authors how to prepare forms with CGI scripts
in Perl, in particular how to modify existing scripts to conform to a new
CMS. By now, the CGI-generated pages are all encoded in UTF-8.

If I have understood everything correctly, making the standard CGI module and
the Encode module work together is utterly tedious, as explained below. Perhaps
I have missed the obvious.

Dealing with UTF-8 requires that byte strings and text strings are
meticulously kept apart. Now, one of the functions of the CGI module is to
reuse the last input as the default for the next time. But the input is a byte
string, so the default value must be a byte string as well. An example:

We want to ask for a location and provide "München" (Munich's German name)
as the default answer in the form. The obvious, but wrong, way
would be

  $cgi->textfield(-name =>'ort', -value => 'München', -size => 40)

but that would interpret the string 'München' as a text string. This is always
wrong: either STDOUT is binary, in which case the wide character will cause
trouble, or STDOUT is UTF-8 (that is, binmode (STDOUT, ":utf8"); has been
done), in which case the value, if not modified by the user of the form, comes
back as something else, in this case as 'MÃ¼nchen', with the two bytes of the
one UTF-8 character interpreted as two characters. After all, there is no way
to do the equivalent of binmode for the post method of CGI.

The only work-around I have found is to use byte strings consistently:

  $Muenchen = encode ('utf8', 'München');
  $cgi->textfield(-name =>'ort', -value => $Muenchen, -size => 40)

This works but has the drawback that an extra step of decoding all input
values to text strings is required when the interaction with the user of
the form is over.
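A minimal sketch, using only the core Encode module, of the byte string / text string distinction this work-around relies on, and of how 'München' turns into 'MÃ¼nchen' when UTF-8 bytes are read back as one character per byte. The variable names are illustrative only; the script prints nothing but lengths, so it needs no output layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "M\x{fc}nchen";            # text string: 7 characters
my $bytes = encode('UTF-8', $text);    # byte string: 8 bytes (the u-umlaut takes 2)

printf "text: %d characters, bytes: %d\n", length $text, length $bytes;

# Decoding the bytes as UTF-8 gives back the original text string ...
my $back = decode('UTF-8', $bytes);

# ... while treating every byte as one Latin-1 character produces the
# 'MÃ¼nchen' effect: two characters where there should be one.
my $garbled = decode('ISO-8859-1', $bytes);

printf "round trip ok: %s, garbled length: %d\n",
    ($back eq $text ? 'yes' : 'no'), length $garbled;
```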

I suspect that I am making this too complicated and that there is a
simple -- and simple to explain -- method for dealing with CGI forms when the
encoding used is UTF-8.

-- 
Helmut Richter


------------------------------

Date: Mon, 28 Sep 2009 08:58:47 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: CGI and UTF-8
Message-Id: <q5n1c59vv1vlsh8cdqleqf2cko0f5rpvfi@4ax.com>

Helmut Richter <hhr-m@web.de> wrote:
>We want to ask for a location and provide the default answer "München"[...]
>
>  $cgi->textfield(-name =>'ort', -value => 'München', -size => 40)
>
>but that would interpret the string 'München' as a text string. This is always
>wrong: Either STDOUT is binary, then the wide character will hurt. Or else,
>STDOUT is UTF-8 (that is, binmode (STDOUT, ":utf8"); has been done), then the
>value, if not modified by the user of the form, comes back as something else,
>in this case as 'MÃ¼nchen' with the two bytes of the one UTF-8 character
>interpreted as two characters. After all, there is no way to do the equivalent
>of binmode for the post method of CGI.

I assume you did set the META charset of the HTML page to UTF-8? Or did
you let the browser guess about the encoding and then it returned the
wrong encoding in the form response?

jue


------------------------------

Date: Mon, 28 Sep 2009 20:04:55 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: CGI and UTF-8
Message-Id: <slrnhc1um8.3n9.hjp-usenet2@hrunkner.hjp.at>

On 2009-09-28 13:41, Helmut Richter <hhr-m@web.de> wrote:
[the usual problems with CGI and UTF-8]
> I have the suspicion that I am thinking to complicated and that there is a
> simple -- and simple to explain -- method for dealing with CGI forms when the
> code used is UTF-8. 
>

AFAICT no. Newer versions of CGI have some UTF-8 support, but it isn't
documented at all. In previous threads I've poked around a bit in it
and posted what I found:

* news:slrng4ln1q.h0v.hjp-usenet2@hrunkner.hjp.at
  http://groups.google.at/groups/search?as_umsgid=slrng4ln1q.h0v.hjp-usenet2%40hrunkner.hjp.at&hl=en
  
* news:slrnghu894.1qq.hjp-usenet2@hrunkner.hjp.at
  http://groups.google.at/groups/search?as_umsgid=slrnghu894.1qq.hjp-usenet2%40hrunkner.hjp.at&hl=en

Hope that gives you a starting point.

	hp



------------------------------

Date: Mon, 28 Sep 2009 21:39:11 +0200
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: CGI and UTF-8
Message-Id: <op.u0zfjlnsmk9oye@frodo>

On Mon, 28 Sep 2009 15:41:49 +0200, Helmut Richter <hhr-m@web.de> wrote:

> If I have understood everything correctly, the cooperation of the  
> standard CGI module and the Encode module is utterly tedious, as  
> explained below. Perhaps I have not seen the obvious.

Perhaps. I don't know exactly what's going on with your code. I have only
had good results when using existing CGI scripts with utf8. That is, scripts
that used to run with latin1 were deployed "as is" in a utf8 setting.

The biggest issues I ran into were with DBD::Oracle, which has some very
ugly problems in the utf8 world indeed (which, to be honest, are documented
as "features"), but that is a different story, not related to CGI.

> Dealing with UTF-8 requires that byte strings and texts strings are
> meticulously kept apart.

Uhm. What are byte strings, what are text strings? Perl does not use these
words in the context of utf8.

> else, STDOUT is UTF-8 (that is, binmode (STDOUT, ":utf8"); has been  
> done),

This should not be done. The correct line would be

   binmode STDOUT,":encoding(utf8)";

This activates error checking etc., while your version treats strings as
utf8 without checking them at all, which could lead to bad_things[tm] (some
docs even hinted at segmentation faults, though I do not know if that is
true).

> in this case as 'MÃ¼nchen' with the two bytes of the one UTF-8 character
> interpreted as two characters. After all, there is no way to do the  
> equivalent of binmode for the post method of CGI.

Sure there is:

   binmode STDIN, ":encoding(utf8)";
   my $query = CGI->new;

If for some reason you cannot run the binmode before you create the $query
object (this happened to me for reasons I won't go into), that's no problem
either: you can convert the parameters after "new CGI()" has read them from
STDIN:

   # Written from memory, so test it before relying on it
   use CGI;
   use Encode ();

   my $query = CGI->new;
   foreach my $key ($query->param) {
       # param() in list context returns every value, so this also
       # covers multi-value parameters
       my @values = map { Encode::decode("utf8", $_) } $query->param($key);
       $query->param($key, @values);

       # Treating file upload parameters is left as an exercise for
       # the reader.
   }

> I have the suspicion that I am thinking to complicated

Aye. ;-)

> and that there is a simple -- and simple to explain -- method for  
> dealing with CGI forms when the code used is UTF-8.

binmode ... ":encoding(utf8)" on both STDIN and STDOUT, plus a proper
declaration of the charset for your browser (in the HTTP header and the
HTML header, just to be sure).

Good luck!


------------------------------

Date: Sun, 27 Sep 2009 14:24:45 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: decimal round off issue
Message-Id: <slrnhbumce.g75.hjp-usenet2@hrunkner.hjp.at>

On 2009-09-27 02:46, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> On 2009-09-26, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>> (The default perl FP->string conversion is a custom implementation
>> (since the C library doesn't offer the functionality)
>
> C library definitely offers the functionality.

Really? Which standard C function gives you the shortest decimal string
representation of a floating point number which can be converted back
to an fp number with the same value?

The closest I see is gcvt, which isn't a standard C function (but
probably portable enough for the purposes of perl) and doesn't quite cut
it either: It prints
1.0000000000000000818...e-05 as 1.0000000000000001e-05 instead of
0.00001 or 1e-05 (tested with glibc 2.7).
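For comparison, a brute-force sketch of the "shortest string that converts back to the same value" requirement in plain Perl: try increasing %g precisions until the string compares equal to the original number. This is not the algorithm perl or any C library uses; it assumes IEEE 754 doubles and a correctly rounding string-to-number conversion, and shortest_repr is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the shortest %g representation of $x that round-trips.
sub shortest_repr {
    my ($x) = @_;
    for my $digits (1 .. 17) {
        my $s = sprintf '%.*g', $digits, $x;
        return $s if $s == $x;     # numeric comparison: same double again?
    }
    return sprintf '%.17g', $x;    # 17 significant digits always suffice
}

for my $x (1e-05, 0.1, 0.1 + 0.2) {
    printf "%-22s %s\n", sprintf('%.17g', $x), shortest_repr($x);
}
```

Trying up to 17 digits one by one is slow, which fits the performance concerns discussed in this thread.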

	hp



------------------------------

Date: Mon, 28 Sep 2009 00:01:24 +0000 (UTC)
From: Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: decimal round off issue
Message-Id: <slrnhbvv6k.88g.nospam-abuse@chorin.math.berkeley.edu>

On 2009-09-27, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2009-09-27 02:46, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
>> On 2009-09-26, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
>>> (The default perl FP->string conversion is a custom implementation
>>> (since the C library doesn't offer the functionality)
>>
>> C library definitely offers the functionality.
>
> Really? Which standard C function gives you the shortest decimal string
> representation of a floating point number which can be be converted back
> to an fp number with the same value?

Really?  Since when does "The default perl FP->string conversion" follow
this requirement?  See $# (sp?). [*]

   [*]  Of course, it is the ONLY sane semantic for Perl (see perldoc
	perlnumber).  However, AFAIK, it is not implemented.

Here is the history as I know it.  About '96 somebody made a patch
which implemented *this* semantic, with a very noticeable slowdown as
a side effect.

At the time I had no idea how numbers are handled in Perl.  When (2
or 3 years later) I discovered what a mess it was (I wrote an automated
testing system, and about 60% of sanity tests were failing), I started
to fix it.  Now, part of the fix was consistent caching of the
results of number-->string conversion [**].

   [**] As a recent message about print() shows, it is not THAT
	consistent now...

With *this* fix in place, switching to the slower number-->string
conversion MIGHT no longer have caused significant slowdowns.  However,
I myself never found the tuits to check this.

Yours,
Ilya


------------------------------

Date: Mon, 28 Sep 2009 08:54:14 +0100
From: bugbear <bugbear@trim_papermule.co.uk_trim>
To: =?ISO-8859-1?Q?J=FCrgen_Exner?= <jurgenex@hotmail.com>
Subject: Re: each - iterator clash
Message-Id: <4AC06BA6.9060703@trim_papermule.co.uk_trim>

Jürgen Exner wrote:
> bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
>> So - is there any convenient way to satisfy my curiosity
>> and FIND the nested keys/values/each call
>> doing the damage?
>>
>> I thought I understood my code (which *is*
>> quite large), and do not understand why such a nested
>> call would be present.
> 
> Please post a minimal sample program that exhibits the issue you
> described.

If I could get it small enough to do that,
I wouldn't need to post.

My requirement here is to find a fault
in a large and complex context.

I was hoping that there might be (and from
Ben's post, there *are*) techniques that could
help me.

   BugBear


------------------------------

Date: Mon, 28 Sep 2009 16:00:05 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 5.10 How can I set up a footer format to be used with write()?
Message-Id: <945wm.445280$Ta5.8208@newsfe15.iad>

This is an excerpt from the latest version of perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.10: How can I set up a footer format to be used with write()?

    There's no builtin way to do this, but perlform has a couple of
    techniques to make it possible for the intrepid hacker.
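    One of the strategies perlform describes is to check $- (lines left on
    the page) before each write() and print a fixed-size footer yourself.
    Below is a minimal sketch of that idea; the ROW/ROW_TOP format names,
    FOOTER_LINES and footer() are made up for the example, and the report
    is written to an in-memory file only so it is easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names: FOOTER_LINES, footer(), and the ROW/ROW_TOP formats.
use constant FOOTER_LINES => 2;

our ($name, $count);    # package variables, visible to the formats

format ROW_TOP =
Name       Count
---------- -----
.

format ROW =
@<<<<<<<<< @>>>>
$name,     $count
.

open my $out, '>', \my $report or die "Cannot open in-memory file: $!";
my $old = select $out;
$~ = 'ROW';        # body format for the selected handle
$^ = 'ROW_TOP';    # top-of-page format
$= = 10;           # page length in lines (small, to force page breaks)

# Prints exactly FOOTER_LINES lines to the selected handle.
sub footer { print '-- page ', $%, " --\n", "\n" }

for my $i (1 .. 12) {
    # When only the footer still fits, print it and set $- to 0 so the
    # next write() starts a new page (form feed plus ROW_TOP).
    if ($- && $- <= FOOTER_LINES) {
        footer();
        $- = 0;
    }
    ($name, $count) = ("item$i", $i * 3);
    write;
}
footer();    # footer for the last page

select $old;
close $out;
print $report;
```

    This only works because the footer has a known, fixed height; a
    variable-sized footer needs one of perlform's other techniques.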



--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in 
perlfaq.pod.


------------------------------

Date: Mon, 28 Sep 2009 04:00:02 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 5.40 How do I traverse a directory tree?
Message-Id: <6xWvm.22382$6f4.13170@newsfe08.iad>

This is an excerpt from the latest version of perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.40: How do I traverse a directory tree?

    (contributed by brian d foy)

    The "File::Find" module, which comes with Perl, does all of the hard
    work of traversing a directory structure. You simply call the "find"
    subroutine with a callback subroutine and the directories you want to
    traverse:

            use File::Find;

            find( \&wanted, @directories );

            sub wanted {
                    # full path in $File::Find::name
                    # just filename in $_
                    # for example, print the full path of every plain file:
                    print "$File::Find::name\n" if -f $_;
            }

    The "File::Find::Closures" module, which you can download from CPAN,
    provides many ready-to-use subroutines that you can use with
    "File::Find".

    The "File::Finder" module, which you can download from CPAN, can help
    you create the callback subroutine using something closer to the syntax
    of the "find" command-line utility:

            use File::Find;
            use File::Finder;

            my $deep_dirs = File::Finder->depth->type('d')->ls->exec('rmdir','{}');

            find( $deep_dirs->as_options, @places );

    The "File::Find::Rule" module, which you can download from CPAN, has a
    similar interface, but does the traversal for you too:

            use File::Find::Rule;

            my @files = File::Find::Rule->file()
                                        ->name( '*.pm' )
                                        ->in( @INC );





------------------------------

Date: Sun, 27 Sep 2009 10:00:02 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 6.13 What does it mean that regexes are greedy?  How can I get around it?
Message-Id: <CIGvm.270$S_4.64@newsfe23.iad>

This is an excerpt from the latest version of perlfaq6.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

6.13: What does it mean that regexes are greedy?  How can I get around it?

    Most people mean that greedy regexes match as much as they can.
    Technically speaking, it's actually the quantifiers ("?", "*", "+",
    "{}") that are greedy rather than the whole pattern; Perl prefers local
    greed and immediate gratification to overall greed. To get non-greedy
    versions of the same quantifiers, use ("??", "*?", "+?", "{}?").

    An example:

            $s1 = $s2 = "I am very very cold";
            $s1 =~ s/ve.*y //;      # I am cold
            $s2 =~ s/ve.*?y //;     # I am very cold

    Notice how the second substitution stopped matching as soon as it
    encountered "y ". The "*?" quantifier effectively tells the regular
    expression engine to find a match as quickly as possible and pass
    control on to whatever is next in line, like you would if you were
    playing hot potato.
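    A second illustration, this time with captures, so you can see exactly
    what each quantifier grabbed. The strings are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $code = 'x = "one" + "two";';

my ($greedy) = $code =~ /"(.*)"/;     # .* runs to the LAST quote
my ($lazy)   = $code =~ /"(.*?)"/;    # .*? stops at the FIRST quote
print "greedy: $greedy\n";            # greedy: one" + "two
print "lazy:   $lazy\n";              # lazy:   one

# The same applies to counted quantifiers: {2,} versus {2,}?
my ($many) = 'aaaa' =~ /(a{2,})/;     # aaaa
my ($few)  = 'aaaa' =~ /(a{2,}?)/;    # aa
print "many: $many, few: $few\n";
```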





------------------------------

Date: Sun, 27 Sep 2009 22:00:01 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 7.12 How can I tell if a variable is tainted?
Message-Id: <BfRvm.230723$0e4.202091@newsfe19.iad>

This is an excerpt from the latest version of perlfaq7.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

7.12: How can I tell if a variable is tainted?

    You can use the tainted() function of the Scalar::Util module, available
    from CPAN (or included with Perl since release 5.8.0). See also
    "Laundering and Detecting Tainted Data" in perlsec.
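    A minimal sketch of tainted() in action, assuming you run it as
    "perl -T script.pl" (without -T, nothing is ever tainted and every
    check prints "no"). Using $ENV{PATH} as the external input and
    extracting its first entry are arbitrary choices for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

printf "taint mode: %s\n", ${^TAINT} ? 'on' : 'off';

# Environment variables come from outside, so under -T they are tainted.
my $path = $ENV{PATH} // '';
printf "PATH tainted: %s\n", tainted($path) ? 'yes' : 'no';

# A regex capture is untainted ("laundered"). Real code should use a
# pattern that only accepts values it has actually checked.
my ($first_dir) = $path =~ m{\A([^:]*)};
printf "first PATH entry tainted: %s\n", tainted($first_dir) ? 'yes' : 'no';
```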





------------------------------

Date: Sun, 27 Sep 2009 16:00:02 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 7.4 How do I skip some return values?
Message-Id: <6_Lvm.22011$6f4.876@newsfe08.iad>

This is an excerpt from the latest version of perlfaq7.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

7.4: How do I skip some return values?

    One way is to treat the return values as a list and index into it:

            $dir = (getpwnam($user))[7];

    Another way is to use undef as an element on the left-hand-side:

            ($dev, $ino, undef, undef, $uid, $gid) = stat($file);

    You can also use a list slice to select only the elements that you need:

            ($dev, $ino, $uid, $gid) = ( stat($file) )[0,1,4,5];
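    The same three techniques, shown with gmtime(0) so the values are
    predictable: it is the Unix epoch, 1970-01-01 00:00:00 UTC. gmtime
    returns ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($mday, $mon, $year) = (gmtime(0))[3, 4, 5];    # list slice
my (undef, $min, $hour) = gmtime(0);               # undef skips $sec
my $yday = (gmtime(0))[7];                         # index a single element

printf "%04d-%02d-%02d %02d:%02d, day %d of the year\n",
    $year + 1900, $mon + 1, $mday, $hour, $min, $yday + 1;
# prints: 1970-01-01 00:00, day 1 of the year
```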





------------------------------

Date: Mon, 28 Sep 2009 10:00:02 GMT
From: PerlFAQ Server <brian@theperlreview.com>
Subject: FAQ 8.39 How do I set CPU limits?
Message-Id: <CO%vm.230749$0e4.184140@newsfe19.iad>

This is an excerpt from the latest version of perlfaq8.pod, which
comes with the standard Perl distribution. These postings aim to 
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

8.39: How do I set CPU limits?

    (contributed by Xho)

    Use the "BSD::Resource" module from CPAN. As an example:

            use BSD::Resource;
            setrlimit(RLIMIT_CPU,10,20) or die $!;

    This sets the soft and hard limits to 10 and 20 seconds, respectively.
    After 10 seconds of time spent running on the CPU (not "wall" time), the
    process will be sent a signal (XCPU on some systems) which, if not
    trapped, will cause the process to terminate. If that signal is trapped,
    then after 10 more seconds (20 seconds in total) the process will be
    killed with a non-trappable signal.

    See the "BSD::Resource" documentation and your system's documentation
    for the gory details.





------------------------------

Date: Sun, 27 Sep 2009 13:58:49 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Trying to parse/match a C string literal
Message-Id: <slrnhbukrr.g75.hjp-usenet2@hrunkner.hjp.at>

On 2009-09-25 16:38, Uri Guttman <uri@StemSystems.com> wrote:
>>>>>> "PJH" == Peter J Holzer <hjp-usenet2@hjp.at> writes:
>  PJH> On 2009-09-24 18:56, Uri Guttman <uri@StemSystems.com> wrote:
>  >>>>>>> "jpc" == jl post@hotmail com <jl_post@hotmail.com> writes:
>  >> 
>  jpc> I'm trying to write Perl code that scans through a C/C++ and
>  jpc> matches string literals.  I want to use a regular expression for this,
>  jpc> so that if given these inputs, it will extract these outputs:
>  >> 
>  >> that can't be done easily with a single regex so don't even try. look at
>  >> text::balanced on cpan which is designed to match c strings and similar things.
>
>  PJH> At the translation stage where string literals are recognised by
>  PJH> a C compiler there are no nesting constructs, so I don't see why
>  PJH> you would want to use Text::Balanced.
>
>  PJH> (For a real solution you would need to take comments into account, but
>  PJH> they don't nest either)
>
> but you can have a string literal inside a comment

No, you can't. You can have something that looks like a string literal,
but it's just a series of characters which happens to have two quote
characters in it:

/* aa "bb */ " cc

This is a comment with the content « aa "bb », followed by a string
literal which starts with « cc», not the beginning of a comment with the
content « aa "bb */ " cc».


Here is my take on the problem with regexps:

#!/usr/bin/perl
use warnings;
use strict;

use File::Slurp;

local $_ = read_file($ARGV[0]);   # "my $_" was removed in Perl 5.24

while (
    m{
        \G
        (?: 
            (?: /\* .*? \*/ )
            |
            [^'"]
            |
            ' (?: [^'\\] | \\ (?: [0-7]{1,3} | x [0-9A-Fa-f]{2} | . ) ) '
        )*
        (" (?: [^"\\] | \\ (?: [0-7]{1,3} | x [0-9A-Fa-f]{2} | . ) )* " )
    }sxg
) {
    print "$1\n";
}
__END__


and here my attempt at using Text::Balanced:

#!/usr/bin/perl
use warnings;
use strict;

use File::Slurp;
use Text::Balanced qw(extract_multiple extract_delimited);

local $_ = read_file($ARGV[0]);   # "my $_" was removed in Perl 5.24

while (defined (
        my $section
            = extract_multiple(
                $_,
                [
                    qr{/\* .*? \*/}sx,
                    qr{[^'"]}sx,
                    sub { extract_delimited($_[0], q{'"}) },
                ]
              )
       )
) {
    print "$section\n" if $section =~ /^"/;
}
__END__


I'm not sure whether the second one works correctly: The description of
extract_delimited doesn't quite match the definition of C string and
character literals but I think the difference doesn't matter in this
case (Except for multi-line strings, but I don't handle them correctly
in the regexp version either).

I don't think the Text::Balanced version is much more readable or
elegant or easier to maintain.

	hp


------------------------------

Date: 28 Sep 2009 00:00:40 GMT
From: Peter Billam <peter@www.pjb.com.au>
Subject: utf8 length OK, or utf8 print OK :-(
Message-Id: <slrnhbvv5b.jud.peter@box8.pjb.com.au>

Apologies for YAUTF8Q. If your newsreader isn't displaying it right,
the string is eacute t eacute, French for summer, 3 letters long.
   $s='Ã©tÃ©';
   printf "length of %s is %d\n", $s, length($s);
outputs:
   length of Ã©tÃ© is 5
where the string is printed right, but the length is wrong :-(

and if I add a "use utf8",
   use utf8;
   $s='Ã©tÃ©';
   printf "length of %s is %d\n", $s, length($s);
outputs:
   length of ï¿½ï¿½is 3
where the length is right, but the string has vanished :-(

How can I get both right ?  I'm using v5.10.0 on debian lenny,
and all my xterms and other apps use utf8 by default...

Regards,  Peter

-- 
Peter Billam       www.pjb.com.au    www.pjb.com.au/comp/contact.html


------------------------------

Date: Mon, 28 Sep 2009 02:22:51 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: utf8 length OK, or utf8 print OK :-(
Message-Id: <slrnhc00er.pqh.hjp-usenet2@hrunkner.hjp.at>

On 2009-09-28 00:00, Peter Billam <peter@www.pjb.com.au> wrote:
> Apologies for YAUTF8Q. If your newsreader isn't displaying it right,
> the string is eacute t eacute, French for summer, 3 letters long.
>    $s='Ã©tÃ©';
>    printf "length of %s is %d\n", $s, length($s);
> outputs:
>    length of Ã©tÃ© is 5
> where the string is printed right, but the length is wrong :-(
>
> and if I add a "use utf8",
>    use utf8;

binmode STDOUT, ":encoding(UTF-8)";

>    $s='Ã©tÃ©';
>    printf "length of %s is %d\n", $s, length($s);
> outputs:
>    length of ï¿½ï¿½is 3
> where the length is right, but the string has vanished :-(
>
> How can I get both right ?

See above.
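Put together, a minimal sketch of the script with both changes ("use utf8" for the source, an encoding layer on STDOUT) might look like this. It assumes the file itself is saved as UTF-8; the Encode line is only there to show where the 5 came from.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # this source file is UTF-8
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';    # encode characters on output
# (alternatively: use open qw(:std :encoding(UTF-8));)

my $s = 'été';
printf "length of %s is %d\n", $s, length $s;               # 3 characters
printf "UTF-8 bytes: %d\n", length encode('UTF-8', $s);     # 5 bytes
```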

	hp



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2614
***************************************
