[25428] in Perl-Users-Digest


Perl-Users Digest, Issue: 7673 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Jan 19 18:05:39 2005

Date: Wed, 19 Jan 2005 15:05:14 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 19 Jan 2005     Volume: 10 Number: 7673

Today's topics:
    Re: Directory listing <a.newmane.remove@eastcoastcz.com>
    Re: FAQ 4.64 How can I get the unique keys from two has <a.newmane.remove@eastcoastcz.com>
    Re: FAQ 4.64 How can I get the unique keys from two has <a.newmane.remove@eastcoastcz.com>
    Re: FAQ 4.64 How can I get the unique keys from two has (Anno Siegel)
    Re: FAQ 4.64 How can I get the unique keys from two has <a.newmane.remove@eastcoastcz.com>
    Re: FAQ 5.0 How do I pack arrays of doubles or floats f xhoster@gmail.com
    Re: Help Perlling a program <bik.mido@tiscalinet.it>
    Re: How bad is $'? (Was: "Get substring of line") <bik.mido@tiscalinet.it>
        locale problem <no@mail.org>
    Re: locale problem <news@chaos-net.de>
    Re: locale problem <no@mail.org>
    Re: locale problem <news@chaos-net.de>
    Re: locale problem <flavell@ph.gla.ac.uk>
    Re: locale problem <no@mail.org>
    Re: MAP Question <bik.mido@tiscalinet.it>
    Re: Need help with Perl and MySQL database data load <spamtrap@dot-app.org>
        Negative lookahead regex clarification needed <shifty_MyU@yahoo.com>
    Re: Negative lookahead regex clarification needed <flavell@ph.gla.ac.uk>
    Re: Negative lookahead regex clarification needed <jgibson@mail.arc.nasa.gov>
    Re: Negative lookahead regex clarification needed (Anno Siegel)
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 19 Jan 2005 09:22:21 -0800
From: "Alfred Z. Newmane" <a.newmane.remove@eastcoastcz.com>
Subject: Re: Directory listing
Message-Id: <357jaeF4jeq7eU1@individual.net>

Richard Gration wrote:
> On Tue, 18 Jan 2005 17:59:23 -0800, Crom wrote:
>
>> Abigail wrote:
>>> Frank Raz (rv01@gre.ac.uk) wrote on MMMMCLVIII September MCMXCIII in
>>> <URL:news:DhhHd.280$Z31.44@newsfe1-gui.ntli.net>:
>>> ##  Hi
>>> ##
>>> ##  I am a new comer to the world of PERL scripting.
>>>
>>> The language is called Perl. The binary is called perl. There's no
>>> such thing as PERL.
>>
>> Please excuse my bluntness, but the sky is NOT falling.
>
> I think you're missing the point. Programming is a precise endeavour.
> It is an enormous help if you are in the habit of being precise all
> the time, rather than trying to decide what context you are in and
> whether you can afford to be sloppy, both in expression and
> interpretation. That's why the objection to casing. It's about
> respecting the input filters (and therefore *time*) of the people
> whose help you are asking for.
>
> At least, that's one way of looking at it.

Agreed. I have always suspected it also has to do with a Unix/Linux
user's nature of seeing things as case sensitive, as in the OS itself:
the primary binary is `perl`, and the name, being a proper noun, is
'Perl'. It's a no-brainer that a lot of people who frequent this
newsgroup are (mostly long-time) Unix or Linux users, and even if you
aren't, it's still good to keep things right, lest there be confusion.




------------------------------

Date: Wed, 19 Jan 2005 09:35:06 -0800
From: "Alfred Z. Newmane" <a.newmane.remove@eastcoastcz.com>
Subject: Re: FAQ 4.64 How can I get the unique keys from two hashes?
Message-Id: <357k2bF4jchecU1@individual.net>

Anno Siegel wrote:
> PerlFAQ Server  <comdog@panix.com> wrote in comp.lang.perl.misc:
>> This message is one of several periodic postings to
>> comp.lang.perl.misc intended to make it easier for perl programmers
>> to find answers to common questions. The core of this message
>> represents an excerpt
>> from the documentation provided with Perl.
>>
>> --------------------------------------------------------------------
>>
>> 4.64: How can I get the unique keys from two hashes?
>
> The phrase "unique keys from two hashes" is ambiguous.  From the
> solution...
>
>>     First you extract the keys from the hashes into lists, then
>>     solve the "removing duplicates" problem described above. For
>> example:
>>
>>         %seen = ();
>>         for $element (keys(%foo), keys(%bar)) {
>>             $seen{$element}++;
>>         }
>>         @uniq = keys %seen;
>
> ...it is clear that it meant to mean "the keys from both hashes, each
> only once", but it can also be read as "the keys that appear in one
> hash but not the other".

I think you got that a little wrong, at least the 2nd part here.

   "the keys that appear in one hash but not the other".

is not the same as

   "the keys from both hashes, each only once"

For the former, if (1,2,4,5) and (1,3,4,6), you would end up with
(2,3,5,6), as those are the ones that appear in one list, but not
 the other.

For the latter, if (1,2,4,5) and (1,3,4,6), you would end up with
(1,2,3,4,5,6)

My point is the two statements you put forth are /not/ the same, but
quite different. Just thought I'd point that out. :-) 
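
To make the two readings concrete, here is a minimal sketch using the
same example keys as above. The hash names %foo and %bar come from the
FAQ snippet; @union and @sym_diff are names made up for illustration.

```perl
use strict;
use warnings;

# Example key sets from the discussion above.
my %foo = map { $_ => 1 } (1, 2, 4, 5);
my %bar = map { $_ => 1 } (1, 3, 4, 6);

# Count in how many of the two hashes each key occurs.
my %seen;
$seen{$_}++ for keys(%foo), keys(%bar);

# Reading 1: the combined keys of both hashes, each only once.
my @union = sort { $a <=> $b } keys %seen;

# Reading 2: the keys that appear in one hash but not the other.
my @sym_diff = sort { $a <=> $b } grep { $seen{$_} == 1 } keys %seen;

print "combined, no duplicates: @union\n";
print "in one hash only:        @sym_diff\n";
```

Because the keys within a single hash are already unique, a count of 1
in %seen means the key came from exactly one of the two hashes.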




------------------------------

Date: Wed, 19 Jan 2005 09:37:28 -0800
From: "Alfred Z. Newmane" <a.newmane.remove@eastcoastcz.com>
Subject: Re: FAQ 4.64 How can I get the unique keys from two hashes?
Message-Id: <357k6pF4ischbU1@individual.net>

Alfred Z. Newmane wrote:
> Anno Siegel wrote:
>> PerlFAQ Server  <comdog@panix.com> wrote in comp.lang.perl.misc:
>>> This message is one of several periodic postings to
>>> comp.lang.perl.misc intended to make it easier for perl programmers
>>> to find answers to common questions. The core of this message
>>> represents an excerpt
>>> from the documentation provided with Perl.
>>>
>>> --------------------------------------------------------------------
>>>
>>> 4.64: How can I get the unique keys from two hashes?
>>
>> The phrase "unique keys from two hashes" is ambiguous.  From the
>> solution...
>>
>>>     First you extract the keys from the hashes into lists, then
>>>     solve the "removing duplicates" problem described above. For
>>> example:
>>>
>>>         %seen = ();
>>>         for $element (keys(%foo), keys(%bar)) {
>>>             $seen{$element}++;
>>>         }
>>>         @uniq = keys %seen;
>>
>> ...it is clear that it meant to mean "the keys from both hashes, each
>> only once", but it can also be read as "the keys that appear in one
>> hash but not the other".
>
> I think you got that a little wrong, at least the 2nd part here.
>
>   "the keys that appear in one hash but not the other".
>
> is not the same as
>
>   "the keys from both hashes, each only once"
>
> For the former, if (1,2,4,5) and (1,3,4,6), you would end up with
> (2,3,5,6), as those are the ones that appear in one list, but not
> the other.
>
> For the latter, if (1,2,4,5) and (1,3,4,6), you would end up with
> (1,2,3,4,5,6)
>
> My point is the two statements you put forth are /not/ the same, but
> quite different. Just thought I'd point that out. :-)

Forgot to add: perhaps a more correct overall statement would be:

"The /combined/ keys of both hashes, each only once (no duplicates)."




------------------------------

Date: 19 Jan 2005 18:06:15 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: FAQ 4.64 How can I get the unique keys from two hashes?
Message-Id: <csm7in$coc$1@mamenchi.zrz.TU-Berlin.DE>

Alfred Z. Newmane <a.newmane.remove@eastcoastcz.com> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> > PerlFAQ Server  <comdog@panix.com> wrote in comp.lang.perl.misc:
> >> This message is one of several periodic postings to
> >> comp.lang.perl.misc intended to make it easier for perl programmers
> >> to find answers to common questions. The core of this message
> >> represents an excerpt
> >> from the documentation provided with Perl.
> >>
> >> --------------------------------------------------------------------
> >>
> >> 4.64: How can I get the unique keys from two hashes?
> >
> > The phrase "unique keys from two hashes" is ambiguous.  From the
> > solution...
> >
> >>     First you extract the keys from the hashes into lists, then
> >>     solve the "removing duplicates" problem described above. For
> >> example:
> >>
> >>         %seen = ();
> >>         for $element (keys(%foo), keys(%bar)) {
> >>             $seen{$element}++;
> >>         }
> >>         @uniq = keys %seen;
> >
> > ...it is clear that it meant to mean "the keys from both hashes, each
> > only once", but it can also be read as "the keys that appear in one
> > hash but not the other".
> 
> I think you got that a little wrong, at least the 2nd part here.
> 
>    "the keys that appear in one hash but not the other".
> 
> is not the same as
> 
>    "the keys from both hashes, each only once"

Yes.  That's my point.  Ambiguous.

Anno


------------------------------

Date: Wed, 19 Jan 2005 14:51:57 -0800
From: "Alfred Z. Newmane" <a.newmane.remove@eastcoastcz.com>
Subject: Re: FAQ 4.64 How can I get the unique keys from two hashes?
Message-Id: <3586kiF4j42elU1@individual.net>

Anno Siegel wrote:
> Alfred Z. Newmane <a.newmane.remove@eastcoastcz.com> wrote in
> comp.lang.perl.misc:
>> Anno Siegel wrote:
>>> PerlFAQ Server  <comdog@panix.com> wrote in comp.lang.perl.misc:
>>>> This message is one of several periodic postings to
>>>> comp.lang.perl.misc intended to make it easier for perl programmers
>>>> to find answers to common questions. The core of this message
>>>> represents an excerpt
>>>> from the documentation provided with Perl.
>>>>
>>>> --------------------------------------------------------------------
>>>>
>>>> 4.64: How can I get the unique keys from two hashes?
>>>
>>> The phrase "unique keys from two hashes" is ambiguous.  From the
>>> solution...
>>>
>>>>     First you extract the keys from the hashes into lists, then
>>>>     solve the "removing duplicates" problem described above. For
>>>> example:
>>>>
>>>>         %seen = ();
>>>>         for $element (keys(%foo), keys(%bar)) {
>>>>             $seen{$element}++;
>>>>         }
>>>>         @uniq = keys %seen;
>>>
>>> ...it is clear that it meant to mean "the keys from both hashes,
>>> each only once", but it can also be read as "the keys that appear
>>> in one hash but not the other".
>>
>> I think you got that a little wrong, at least the 2nd part here.
>>
>>    "the keys that appear in one hash but not the other".
>>
>> is not the same as
>>
>>    "the keys from both hashes, each only once"
>
> Yes.  That's my point.  Ambiguous.

My apologies. 




------------------------------

Date: 19 Jan 2005 19:08:15 GMT
From: xhoster@gmail.com
Subject: Re: FAQ 5.0 How do I pack arrays of doubles or floats for XS code?
Message-Id: <20050119140815.386$1U@newsreader.com>

PerlFAQ Server <comdog@panix.com> wrote:
> This message is one of several periodic postings to comp.lang.perl.misc
> intended to make it easier for perl programmers to find answers to
> common questions. The core of this message represents an excerpt
> from the documentation provided with Perl.
>
> --------------------------------------------------------------------
>
> 5.0: How do I pack arrays of doubles or floats for XS code?
>
>     The kgbpack.c code in the PGPLOT module on CPAN does just this.

I can't find a kgbpack.c file in PGPLOT (2.18)
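
For what it's worth, when the XS side only expects a contiguous buffer
of C doubles, the core pack function with the 'd' template is one common
way to build it from Perl. This is a sketch under that assumption and is
not the PGPLOT code the FAQ refers to; the variable names are
illustrative.

```perl
use strict;
use warnings;

# Illustrative data; these values are exactly representable as doubles.
my @values = (1.5, -2.25, 3);

# 'd*' packs every list element as a native double-precision float.
my $packed = pack 'd*', @values;

# The buffer length is the element count times the native double size.
my $double_size = length pack 'd', 0;
printf "%d bytes for %d doubles\n", length($packed), scalar @values;

# Unpacking with the same template recovers the list.
my @back = unpack 'd*', $packed;
print "@back\n";
```

The 'f' template does the same for single-precision floats.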


Xho



------------------------------

Date: Wed, 19 Jan 2005 23:44:50 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: Help Perlling a program
Message-Id: <72jtu0tr90f63a6ic2ildu2na0vk839hr6@4ax.com>

On Tue, 18 Jan 2005 20:15:00 -0700, "YYusenet" <yyusenet@yahoo.com>
wrote:

>Subject: Help Perlling a program

While this _may_ be yet another perlish pun/neologism, IMHO it doesn't
sound too good. I'd say "making a program more"... ehm, "perlish"!

>I am new to Perl, so I am probably making my programs (or scripts?) in a 
>non-Perl way.  I was wondering if the following script could be made with 
>less repition or lines of code, or in a more Perl like way.

Less repetition of code is desirable in _any_ language. Making it more
perlish is desirable for various reasons, whether connected with the
former or not. (Not least, you will hopefully be staring at your code
marvelling at how beautiful it is! ;-)

>#perl

I _think_ that should be (at least)

  #!perl

even under Windows, especially if you want to put switches there, that
is. Incidentally, no harm done using e.g.

  #!/usr/bin/perl

there too.

>use warnings;
>use strict;
>use File::Copy;
>
>print "--------RENAMING .jpg.bak STARTING--------\n";

This has nothing to do with your actual question, but do you really
need messages of this kind?

>my @files = grep { -f} glob "*.jpg";

Perfect, but! I'd use 

  grep -f, whatever;

instead - however I want to stress that it's just a matter of personal
preference.

More importantly, do you really need to create @files, i.e. are you
using it again later? Hmmm, let me see: no, you're not making any
_substantial_ use of it, then you may just do

  for (grep -f, glob "*.jpg") { # ...

>foreach (@files) {
> print "renaming $_...\t\tto\t$_.jbak\n";

Again out of personal preference, I'd print this message only upon
success.

> move ($_,$_ . ".jbak") or
>  die "unable to rename $_!";

Yet another side note: do you really want to die()? In cases like this
_occasionally_ I happen to prefer to warn() and go on...

Oh, and as a side note to the side note I'd indicate _what_ has been
impossible to rename to _what_.

BTW, since you're _not_ moving files across fs boundaries (not that I
can see...), you may have safely used rename() in the first place.

>print "--------RENAMING COMPLETE--------\n";

ditto as above!

>print "--------RENAMING .gif.bak STARTING--------\n";

ditto as above!

>@files = grep { -f} glob "*.gif";
[snip]

Ouch, but this is basically the same code as above! If you didn't mind
those "informing" messages, then you could do everything in one sweep:

  for (grep -f, glob "*.jpg *.gif") { # ...

If you _do_, then you may either use an external loop or a sub.


>my $counter = 0;
>@files = grep { -f} glob "*.jbak";

Ditto as above wrt factorization of code. The $counter stuff doesn't
change the issue substantially.

Now that I think of it, I see why you can't (for some sense of
"can't") do gifs and jpgs in one sweep. But this is only a weak excuse,
for with only moderate effort you could use a "unisex" backup suffix
and recover the original extension in one of many ways.

However, as a comment on your program as a whole, I don't think your
scheme for renaming in two passes is particularly efficient or
elegant. Granted: provided that you haven't backup copies lying around
(e.g. after a die() as above) you won't lose files and you don't need
to explicitly test for target existence, but if, for example, you keep
already renamed files in the same dir, then you will be doing the same
work again and again. So I'd just do that: check for existence and
increase the counter accordingly, in one pass.

>print "--------PROGRAM FINISHED--------\n";
><STDIN>

Would you regard this as useful?!?
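
Pulling the suggestions above together (a single loop over both
extensions, rename() instead of move(), warn() instead of die(), a
message only upon success, and an existence check on the target), one
possible shape is sketched below. The sub name backup_files and the
".bak" suffix are hypothetical, and the sketch does not reproduce the
counter logic of the original script, which was snipped.

```perl
use strict;
use warnings;

# Hypothetical helper: rename every plain file matching the given glob
# patterns to "<name>.bak" in one pass. Returns the number renamed.
sub backup_files {
    my @patterns = @_;
    my $renamed  = 0;

    for my $file (grep { -f $_ } map { glob($_) } @patterns) {
        my $target = "$file.bak";

        # Never clobber a backup left over from an earlier run.
        if (-e $target) {
            warn "skipping $file: $target already exists\n";
            next;
        }

        # rename() is enough while source and target share a filesystem.
        if (rename $file, $target) {
            print "renamed $file to $target\n";
            $renamed++;
        }
        else {
            warn "unable to rename $file to $target: $!\n";
        }
    }

    return $renamed;
}
```

A call such as backup_files('*.jpg', '*.gif') would then handle both
extensions in the current directory.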


Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Wed, 19 Jan 2005 23:44:29 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: How bad is $'? (Was: "Get substring of line")
Message-Id: <ndntu0lkg171r8svdgr1478tb5eudmqop7@4ax.com>

On 18 Jan 2005 07:29:11 -0800, jl_post@hotmail.com (J. Romano) wrote:

>On Wednesday, January 5, 2005, Uri Guttman said:
>>
>> ... try to actually isolate the issue in a proper
>> benchmark. spend some deep time in thought in how
>> to do it. post your benchmark for review. then talk
>> about $' with some confidence.
>
>   Okay, I'll take you up on that offer.
>
>   (But be warned!  Expect a "tome".  :)

Indeed: as a personal comment to you, I have the impression you tend to
be overly verbose. I mean verbosity is not so bad per se, but the
impression I gathered from other posts of yours is that the content in
them that may have been of some interest could have been expressed much
more concisely, thus making it easier for it to reach a wider audience.

In this particular case I couldn't go beyond the first few paragraphs
and I'm not willing to.

>The first benchmark program I came up with was this one:
>
>#!/usr/bin/perl
>use strict;
>use warnings;
>use Benchmark;
>
>my $count = 1e7;
>my $prefix = q!$a = $'  if "a" =~ m/a/;!;
>my $line = '$a = 1  if "abc=xyz" =~ m/=/;';

Why are you (running under strict and) using as a generic variable the
predefined global variable $a?

I mean, this post of yours has some flavour of a tutorial: so why
expose potential newbies to something we usually warn them against?

>my $badProgram = "$prefix $line";
>
>timethese($count, {good => $line});
>timethese($count, {bad => $badProgram, prefix => $prefix});
>__END__

What is the point of benchmarking two completely different snippets,
one of which just does something more than the other?

Also, lines of code benchmarked as strings are indeed
(string-)eval()ed, which means that they are parsed and executed as
stand-alone perl programs, but (as per the docs => see!) _in the
lexical context_ of the current program, which means that the
interpreter they're executed with is not a total stranger to the
current one.

So, if you want to compare something using any of $`, $&, $' (but $&
is said to be less expensive than the other two) and something that
doesn't, then you have to use two separate programs, e.g.:

$ perl -MBenchmark=:hireswallclock -e '\
timethis -30, sub { "aaaaaaaaaaa" =~ /\w+/ }, "all";'
       all: 32.2295 wallclock secs (32.23 usr +  0.00 sys = 32.23 CPU)
@ 1331766.21/s (n=42922825)

$ perl -MBenchmark=:hireswallclock -e '\
"aaa" =~ /\w/; $`;\
timethis -30, sub { "aaaaaaaaaaa" =~ /\w+/ }, "all";'
       all: 31.4189 wallclock secs (31.42 usr +  0.00 sys = 31.42 CPU)
@ 985777.53/s (n=30973130)


PS: FWIW I _partly_ agree with you that the general habit of frowning
upon the use of $`, $&, $' a priori is to some extent exaggerated.


HTH,
Michele
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Wed, 19 Jan 2005 16:12:11 GMT
From: osmo <no@mail.org>
Subject: locale problem
Message-Id: <vFvHd.333$vl1.6@read3.inet.fi>

I have problems with using my locale in perl. I have all my locale 
environment variables set to "fi_FI.UTF-8" (finnish).

I use the following script to test perl locale handling:
--------------------------
#!/usr/bin/perl

print "\nno locale:\n";
print +(sort grep /\w/, map { chr } 0..255), "\n";

use locale;
print "\nusing locale:\n";
print +(sort grep /\w/, map { chr } 0..255), "\n";
--------------------------

And the output looks like:
--------------------------
no locale:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz

using locale:
_aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789
--------------------------

The order of the characters is a bit different, so "use locale"
obviously does something, but it does not include the Scandinavian
letters åÅäÄöÖ, which are in the Finnish alphabet. Out of curiosity I
tried the same thing with locales "de_DE.UTF-8" (German) and
"sv_SV.UTF-8" (Swedish) and got the exact same output.

How do I get \w, [[:alpha:]] and others to match my locale characters?

I'm using perl v5.8.4 on Linux.


------------------------------

Date: 19 Jan 2005 18:04:54 GMT
From: Martin Kissner <news@chaos-net.de>
Subject: Re: locale problem
Message-Id: <slrncut8a6.rse.news@maki.homeunix.net>

osmo wrote :
> I have problems with using my locale in perl. I have all my locale 
> environment variables set to "fi_FI.UTF-8" (finnish).
>
> I use the following script to test perl locale handling:
> --------------------------
> #!/usr/bin/perl
>
> print "\nno locale:\n";
> print +(sort grep /\w/, map { chr } 0..255), "\n";
>
> use locale;
> print "\nusing locale:\n";
> print +(sort grep /\w/, map { chr } 0..255), "\n";
> --------------------------
>
> And the output looks like:
> --------------------------
> no locale:
> 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
>
> using locale:
> _aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789
> --------------------------
>
> The order of the characters is a bit different, so "use locale" 
> obviously does something, but it does not include the scandinavian 
> letters åÅäÄöÖ, which are in the finnish alphabets. Out of curiousity I 
> tried the same thing with locales "de_DE.UTF-8" (german) and 
> "sv_SV.UTF-8" (swedish) and got the exact same output.
>
> How do I get \w, [[:alpha:]] and others to match my locale characters?


	#!/usr/bin/perl
	
	use warnings;
	use strict;
	use locale;
	use POSIX qw(locale_h);
	
	print +(sort grep /\w/, map { chr } 0..255), "\n";

with 'de_DE.UTF-8' the output looks like this:
0123456789AÀÁÂÃÄÅÆBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZÐÞ_aàá
âãäåæbcçdeèéêëfghiìíîïjklmnñoòóôõöøpqrsßtuùúûüvwxyýÿzðþµªº

Also look at perldoc perllocale

HTH
Martin

-- 
Epur Si Muove (Gallileo Gallilei)


------------------------------

Date: Wed, 19 Jan 2005 19:18:46 GMT
From: osmo <no@mail.org>
Subject: Re: locale problem
Message-Id: <qoyHd.452$vl1.348@read3.inet.fi>

Martin Kissner wrote:
> osmo wrote :
> 
>>I have problems with using my locale in perl. I have all my locale 
>>environment variables set to "fi_FI.UTF-8" (finnish).
>>
>>I use the following script to test perl locale handling:
>>--------------------------
>>#!/usr/bin/perl
>>
>>print "\nno locale:\n";
>>print +(sort grep /\w/, map { chr } 0..255), "\n";
>>
>>use locale;
>>print "\nusing locale:\n";
>>print +(sort grep /\w/, map { chr } 0..255), "\n";
>>--------------------------
>>
>>And the output looks like:
>>--------------------------
>>no locale:
>>0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
>>
>>using locale:
>>_aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789
>>--------------------------
>>
>>The order of the characters is a bit different, so "use locale" 
>>obviously does something, but it does not include the scandinavian 
>>letters åÅäÄöÖ, which are in the finnish alphabets. Out of curiousity I 
>>tried the same thing with locales "de_DE.UTF-8" (german) and 
>>"sv_SV.UTF-8" (swedish) and got the exact same output.
>>
>>How do I get \w, [[:alpha:]] and others to match my locale characters?
> 
> 
> 
> 	#!/usr/bin/perl
> 	
> 	use warnings;
> 	use strict;
> 	use locale;
> 	use POSIX qw(locale_h);
> 	
> 	print +(sort grep /\w/, map { chr } 0..255), "\n";
> 
> with 'de_DE.UTF-8' the output looks like this:
> 0123456789AÀÁÂÃÄÅÆBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZÐÞ_aàá
> âãäåæbcçdeèéêëfghiìíîïjklmnñoòóôõöøpqrsßtuùúûüvwxyýÿzðþµªº
> 
> Also look at perldoc perllocale
> 
> HTH
> Martin
> 

And for me the output looks like this:
_aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789

I have read the perldocs and I know how one should get the locale
characters, but I just don't seem to get them, in any language. So I'm
wondering if there are some common problems that might be causing this.

Why do I get the characters in a different order after the "use locale;"
in the script in my first post?

Osmo


------------------------------

Date: 19 Jan 2005 19:44:56 GMT
From: Martin Kissner <news@chaos-net.de>
Subject: Re: locale problem
Message-Id: <slrncute5n.rse.news@maki.homeunix.net>

osmo wrote :
> Martin Kissner wrote:
>> 
>> 	#!/usr/bin/perl
>> 	
>> 	use warnings;
>> 	use strict;
>> 	use locale;
>> 	use POSIX qw(locale_h);
>> 	
>> 	print +(sort grep /\w/, map { chr } 0..255), "\n";
>> 
>> with 'de_DE.UTF-8' the output looks like this:
>> 0123456789AÀÁÂÃÄÅÆBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZÐÞ_aàá
>> âãäåæbcçdeèéêëfghiìíîïjklmnñoòóôõöøpqrsßtuùúûüvwxyýÿzðþµªº
>> 
>
> And for me the output looks like this:
> _aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789
>
> I have read the perdocs and i know how one should get the locale 
> characters, but i just don't seem to get them, in any language. So i'm 
> wondering if there are some common problems that might be causing this.

I'm not an expert, so use the following suggestions with care.

Make sure that perl believes the locale system is supported.
If so, "perl -V:d_setlocale" will give you "d_setlocale='define';"

Make sure the LANG variable is set and exported.
You might as well put in this line after 'use POSIX qw(locale_h);'

	setlocale(LC_CTYPE, "de_DE.ISO8859-1");

Of course you need to have the charsets which you use installed on your
computer.  On my system they are located in 'ls /usr/share/locale/'.
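
As a rough sketch of that check: setlocale() returns undef when the
requested locale cannot be set, so the return value is worth testing.
The helper name try_locale is made up for illustration, and the locale
name is simply the one from the line above; it may not exist on a given
system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: try to switch LC_CTYPE and report the outcome.
sub try_locale {
    my ($name) = @_;

    # setlocale() returns the new locale name, or undef on failure.
    my $result = setlocale(LC_CTYPE, $name);
    if (defined $result) {
        print "LC_CTYPE set to $result\n";
    }
    else {
        warn "locale '$name' is not available here\n";
    }
    return $result;
}

try_locale('de_DE.ISO8859-1');
```

If the warning appears, the locale is probably not installed or is
spelled differently on that system.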

> Why do i get the characters in different order after the "use locale;" 
> in the script in my first post?

Sorry, I can not answer this question.

HTH
Martin

-- 
Epur Si Muove (Gallileo Gallilei)


------------------------------

Date: Wed, 19 Jan 2005 20:44:12 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: locale problem
Message-Id: <Pine.LNX.4.61.0501192039530.3924@ppepc56.ph.gla.ac.uk>

On Wed, 19 Jan 2005, osmo wrote:

> I have read the perdocs and i know how one should get the locale 
> characters,

I can't give you a definitive answer, but have I missed something in 
this thread?  Take a look at

http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Interaction-with-Locales

AIUI you said you've got utf-8 in your locale.  perlunicode.html 
documents unicode interaction with locales under the heading "BUGS".

> but i just don't seem to get them, in any language. So i'm wondering 
> if there are some common problems that might be causing this.

Maybe you're provoking the very "bugs" that are warned against?

Apologies in advance if I'm missing the point.



------------------------------

Date: Wed, 19 Jan 2005 21:26:11 GMT
From: osmo <no@mail.org>
Subject: Re: locale problem
Message-Id: <TfAHd.542$vl1.473@read3.inet.fi>

Alan J. Flavell wrote:
> http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Interaction-with-Locales
> 
> AIUI you said you've got utf-8 in your locale.  perlunicode.html 
> documents unicode interaction with locales under the heading "BUGS".

I suppose this might well be the case. I have seen others
successfully using utf-8 and locales, and I didn't find a clear
statement in the vast perldoc, but now you have pointed me to the right
place: "Use of locales with Unicode is discouraged."

Hope the perl people will fix this for some newer release.
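
One way to sidestep locales entirely, sketched here under the assumption
that the input arrives as UTF-8 encoded bytes, is to decode the data
into Perl character strings and rely on Perl's Unicode semantics for \w.
The sample word is only an illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes for a four-letter Finnish word that starts with
# a-umlaut (illustrative input).
my $bytes = "\xC3\xA4iti";

# Decode into a character string; no "use locale" is involved.
my $text = decode('UTF-8', $bytes);

# On a decoded string \w follows Unicode rules, so a-umlaut counts too.
my @word_chars = $text =~ /\w/g;
print scalar(@word_chars), " word characters\n";
```

Output would still need to be encoded again, for example with a
:encoding(UTF-8) layer on the filehandle.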

Osmo


------------------------------

Date: Wed, 19 Jan 2005 23:44:33 +0100
From: Michele Dondi <bik.mido@tiscalinet.it>
Subject: Re: MAP Question
Message-Id: <q1itu0h78of27kmtk80turoe6nn0fp8e4u@4ax.com>

On 19 Jan 2005 10:27:45 +0100, Arndt Jonasson <do-not-use@invalid.net>
wrote:

>To add to this subject (as relative Perl newbie), some common ways of
>pointing out that a word (or longer construct) in running text is not
>meant as an ordinary English word, but rather as Perl code, are:

Funny, I felt the need to clarify the point too. I'm reading news
offline, so I hadn't read your reply yet, when writing mine.
-- 
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
 .'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


------------------------------

Date: Wed, 19 Jan 2005 14:35:57 -0500
From: Sherm Pendley <spamtrap@dot-app.org>
Subject: Re: Need help with Perl and MySQL database data load
Message-Id: <zeedndmN_oEDK3PcRVn-oQ@adelphia.com>

Oscar wrote:

> It's strange that using the older password scheme had to be used since
> my MySQL version was a new installation and a recent version.

Perhaps not. Connecting to a recent version of the server with an older
client version is precisely the cause of the problem.

If MySQL has been recently upgraded, you'd need to re-install DBD::mysql to
build and link it against the new library. Or, if you're using a binary
copy of the module, it might have been built against an older MySQL client
library than the one you're using. Etc.

sherm--

-- 
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org


------------------------------

Date: 19 Jan 2005 11:09:19 -0800
From: "shifty" <shifty_MyU@yahoo.com>
Subject: Negative lookahead regex clarification needed
Message-Id: <1106161759.879622.23020@f14g2000cwb.googlegroups.com>

Hi,

I'm trying to hack my way through a regex for a chunk of code I'm going
to use.  I've been using a Regex Coach to run through this and I think
I have correct syntax.

I am trying to find any one of several 'hacked' variants of the word
"microsoft" (ex: m1cr0s0ft, miçr0§0ft, etc.), but NOT match on the
actual word "microsoft".  I need the regex to be case sensitive.

This is my regex - it seems to work, but I don't know if the syntax is
honestly correct and I don't want it to break later:

(?i).*\b(?:(?!microsoft)m+[i1l\\\|!¡îíìï]+[Cç]+r+[o0öøõôóòð]+[s§]+[o0öøõôóòð]+f+[t\+]+)\b.*

This expression will:
Be case insensitive
Have a word boundary to limit only finding the word I'm looking for
Allow anything to precede this word's boundaries
Match on several variants of 'microsoft' as long as negative lookahead
doesn't find the proper spelling
Will not capture the match if one is found

Is this correct?  Any help is appreciated.  I'm going to need to knock
out several of these things.

I'm just starting with regex, and I'm totally in love - but it's really
easy to be inefficient and it's also really, really easy to miss
"false positives" caused by overlooking an aspect of your expression.
Reminds me of 'chess vs. chemistry' or something.



------------------------------

Date: Wed, 19 Jan 2005 20:37:20 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: Negative lookahead regex clarification needed
Message-Id: <Pine.LNX.4.61.0501192028130.3924@ppepc56.ph.gla.ac.uk>

On Wed, 19 Jan 2005, shifty wrote:

> I'm trying to hack my way through a regex for a chunk of code I'm going
> to use.  I've been using a Regex Coach to run through this and I think
> I have correct syntax.

I didn't know what "Regex Coach" is (I do now, courtesy of Google), 
but I find "pcretest" (part of the PCRE package from Philip Hazel) to be 
a valuable aid.

> I am trying to find any one of several 'hacked' variants of the word
> "microsoft" (ex: m1cr0s0ft, miçr0§0ft, etc.), but NOT match on the
> actual word "microsoft".  I need the regex to be case sensitive.

Off the top of my head:  Perhaps it would be better to do a character 
translation on the string, and then compare the result with the 
original.

OTOH, if you're in a context where only a regex is acceptable (you're 
not by any chance writing recipes for spamassassin?) then I might have 
to take that back.
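
A rough sketch of that idea follows, restricted to a handful of ASCII
substitutions. The sub name and the mapping table are assumptions made
for illustration; non-ASCII look-alikes such as the ones in the original
post would need decoded input and a longer table.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word only becomes "microsoft" after the
# look-alike characters have been translated back to letters.
sub is_disguised_microsoft {
    my ($word) = @_;
    my $plain = lc $word;

    # Map 1, ! and | to i, 0 to o, $ to s and + to t.
    (my $normal = $plain) =~ tr/10!|$+/ioiist/;

    # A hit means the translation changed something and the result is
    # the proper spelling.
    return $normal eq 'microsoft' && $plain ne 'microsoft';
}

for my $word ('m1cr0s0ft', 'Microsoft', 'M!cro$of+') {
    printf "%-10s %s\n", $word,
        is_disguised_microsoft($word) ? 'disguised' : 'not disguised';
}
```

Comparing the translated string with the original is what keeps the
proper spelling from being reported.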



------------------------------

Date: Wed, 19 Jan 2005 13:12:25 -0800
From: Jim Gibson <jgibson@mail.arc.nasa.gov>
Subject: Re: Negative lookahead regex clarification needed
Message-Id: <190120051312251793%jgibson@mail.arc.nasa.gov>

In article <1106161759.879622.23020@f14g2000cwb.googlegroups.com>,
shifty <shifty_MyU@yahoo.com> wrote:

> Hi,
> 
> I'm trying to hack my way through a regex for a chunk of code I'm going
> to use.  I've been using a Regex Coach to run through this and I think
> I have correct syntax.
> 
> I am trying to find any one of several 'hacked' variants of the word
> "microsoft" (ex: m1cr0s0ft, miçr0§0ft, etc.), but NOT match on the
> actual word "microsoft".  I need the regex to be case sensitive.
> 
> This is my regex - it seems to work, but I don't know if the syntax is
> honestly correct and I don't want it to break later:
> 
>
> (?i).*\b(?:(?!microsoft)m+[i1l\\\|!¡îíìï]+[Cç]+r+[o0öøõôóòð]+[s§]+[o0öøõôóòð]+
> f+[t\+]+)\b.*

Yes, it does work, but it could be simplified:

1. It is useless to have .* at the beginning and end of the regex.
2. It is useless to group with (?: ... ) in this case
3. You don't need all of the plus signs unless you expect repeated
characters.
4. A plus sign is not special inside a character class and need not be
escaped in [t+].
5. Have you tried it with 'microsof+'? The plus sign is not a word
character (\w = [a-zA-Z_0-9]) and if it is followed by another non-word
character, as you want, then the regex will not match. Try using \b|\s
at the end of the word, instead. If you have any non-word characters at
the beginning, the same principle applies.
6. You can use the //i modifier to make the regex case-insensitive
instead of the (?i) construct.
7. You can use the //x modifier to improve readability by adding
whitespace.
8. You can use the qr// operator to create and save a regular
expression for later use.
9. Don't forget $ as a replacement for s; $ needs escaping in the
double-quote context of a regular expression.
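
To make point 5 concrete, here is a small sketch comparing a plain \b
with the \b|\s alternative at the end of the word.  The variable names
and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $strict_end = qr/microsof[t+]\b/;         # word boundary only
my $loose_end  = qr/microsof[t+](?:\b|\s)/;  # word boundary or whitespace

for my $text ('microsoft rocks', 'microsof+ rocks', 'microsof+') {
    printf "%-16s  strict: %-3s  loose: %s\n",
        $text,
        ($text =~ $strict_end ? 'yes' : 'no'),
        ($text =~ $loose_end  ? 'yes' : 'no');
}
```

Only the \b|\s form matches 'microsof+ rocks'.  The last sample, where
the '+' is the final character of the string, matches neither pattern,
so a trailing '+' at end of string may need an extra alternative such
as $.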


> 
> This expression will:
> Be case insensitive
> Have a word boundary to limit only finding the word I'm looking for
> Allow anything to precede this word's boundaries
> Match on several variants of 'microsoft' as long as negative lookahead
> doesn't find the proper spelling
> Will not capture the match if one is found
> 
> Is this correct?  Any help is appreciated.  I'm going to need to knock
> out several of these things.
> 
> I'm just starting with regex, and I'm totally in love - but it's really
> easy to be inefficient and it's also really, really easy to miss
> "false positives" caused by overlooking an aspect of your expression.
> Reminds me of 'chess vs. chemistry' or something.
> 

With all of the above points in mind, I would suggest the following:

my $regex = qr(
  (?:\b|\s)
  (?!microsoft)
  m
  [i1l\\\|!¡îíìï]
  [Cç]
  r
  [o0öøõôóòð]
  [s§\$]
  [o0öøõôóòð]
  f
  [t+]
  (?:\b|\s)
)ix;

and use $regex as in:

   if( $line=~ $regex ) {
      # match
   }

Are you looking for other approximations such as 'microsloth' and
'microsquash'?




------------------------------

Date: 19 Jan 2005 21:28:23 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Negative lookahead regex clarification needed
Message-Id: <csmjdn$jce$1@mamenchi.zrz.TU-Berlin.DE>

shifty <shifty_MyU@yahoo.com> wrote in comp.lang.perl.misc:
> Hi,
> 
> I'm trying to hack my way through a regex for a chunk of code I'm going
> to use.  I've been using a Regex Coach to run through this and I think
> I have correct syntax.

If the syntax weren't correct it wouldn't compile.  What you are asking is
whether it does what you want it to do, which is about semantics.
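
A rough illustration of that distinction, assuming a hypothetical
helper compiles() and a deliberately simplified ASCII pattern: the
first check is about syntax, the second about semantics.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $pattern compiles as a Perl regex.
# A syntax error makes qr// die, which the eval block traps.
sub compiles {
    my ($pattern) = @_;
    my $ok = eval { qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

my $pattern = 'm[i1]cr[o0]s[o0]ft';    # simplified, no lookahead

print "compiles: ", (compiles($pattern) ? 'yes' : 'no'), "\n";
print "unbalanced class compiles: ", (compiles('m[i1') ? 'yes' : 'no'), "\n";

# Syntactically fine, yet it also matches the proper spelling,
# which is the semantic problem the lookahead is meant to solve.
print "matches 'microsoft': ", ('microsoft' =~ /$pattern/ ? 'yes' : 'no'), "\n";
```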

> I am trying to find any one of several 'hacked' variants of the word
> "microsoft" (ex: m1cr0s0ft, miçr0§0ft, etc.), but NOT match on the
> actual word "microsoft".  I need the regex to be case sensitive.
> 
> This is my regex - it seems to work, but I don't know if the syntax is
> honestly correct and I don't want it to break later:
> 
> (?i).*\b(?:(?!microsoft)m+[i1l\\\|!¡îíìï]+[Cç]+r+[o0öøõôóòð]+[s§]+[o0öøõôóòð]+f+[t\+]+)\b.*

That string is mangled.  It appears to contain literal backspaces or other
control characters that make it hard to analyze.  It may well not compile.

Is there any reason why you want to use lookahead to exclude unaltered
strings like "microsoft"?  Just skip those strings using an extra regex,
and concentrate on matching the altered variants.

To do this in a maintainable way, I'd first build a hash of possible
replacement characters.  For "microsoft", it might look like this:

    my %repla = (
        m => 'm',
        i => 'i1',
        c => 'cç',
        r => 'r',
        o => 'o0',
        s => 's5§',
        f => 'f',
        t => 't+',
    );
    $_ = quotemeta for values %repla; # make regex-safe

Add more characters to cover other words besides "microsoft".

Then build your regex from the replacement strings in a systematic
way:

    my $re = join '', map "[$_]", @repla{ split //, 'microsoft'};
    $re = qr/$re/i; # made case-insensitive here

To test it, run

    for ( qw( microsoft miçr0§0ft m1cros0f+ m1crosaft) ) {
        next if /^microsoft$/i;
        print "$_\n" if $_ =~ $re;
    }

It prints only the middle two examples.

If you really need to do everything in one regex (yes, it does make a slight
difference), you can introduce negative lookahead by changing the line
containing qr// to

    $re = qr/(?!microsoft)$re/i;

Working this way, there is little doubt about what the code does, and it
will be easy to modify and extend.  There is also no need for a "Regex
Coach" with dubious I/O habits.
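
As a sketch of how the same table might be extended to several words:
the helper disguised_re(), the extra table entries and the second word
'linux' are assumptions made up for this example, and the table is
kept to ASCII.

```perl
use strict;
use warnings;

# Replacement table; entries are illustrative assumptions (ASCII only).
my %repla = (
    m => 'm',  i => 'i1!', c => 'c', r => 'r',
    o => 'o0', s => 's5$', f => 'f', t => 't+',
    l => 'l1', n => 'n',   u => 'u', x => 'x',
);
$_ = quotemeta for values %repla;    # make regex-safe

# Hypothetical helper: build a regex that matches disguised spellings of
# $word while the negative lookahead rejects the proper spelling.
# Every letter of $word must have an entry in %repla.
sub disguised_re {
    my ($word) = @_;
    my $body = join '', map { "[$repla{$_}]" } split //, $word;
    return qr/(?!\Q$word\E)$body/i;
}

my %re_for = map { $_ => disguised_re($_) } qw(microsoft linux);

for my $text (qw(m1cr0s0ft microsoft l1nux linux m1crosaft)) {
    my @hits = grep { $text =~ $re_for{$_} } sort keys %re_for;
    print "$text: ", (@hits ? "looks like @hits" : 'no match'), "\n";
}
```

Adding a word then means adding any missing letters to the table and
one more entry to the qw() list; the regexes themselves are never
edited by hand.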

Anno


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending Perl questions to the -request address; I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7673
***************************************

