[23159] in Perl-Users-Digest
Perl-Users Digest, Issue: 5380 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Aug 17 18:11:28 2003
Date: Sun, 17 Aug 2003 15:10:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 17 Aug 2003 Volume: 10 Number: 5380
Today's topics:
Re: Hudson River (Randal L. Schwartz)
Re: Hudson River <REMOVEsdnCAPS@comcast.net>
Re: Hudson River <flavell@mail.cern.ch>
Re: Hudson River <noreply@gunnar.cc>
Re: Hudson River <gellyfish@gellyfish.com>
Re: Hudson River <noreply@gunnar.cc>
Re: Hudson River <noreply@gunnar.cc>
Re: Hudson River <tim@vegeta.ath.cx>
Re: Hudson River <noreply@gunnar.cc>
Re: Hudson River <tassilo.parseval@rwth-aachen.de>
Re: Hudson River <rgarciasuarez@free.fr>
Re: mod_perl 2 Setup ? <rbaba99@caramail.com>
Multithreading and sockets blocking I/O on Win32 (Dragos D)
Order of evaluation of expressions (Mark Jason Dominus)
Re: Order of evaluation of expressions <rgarciasuarez@free.fr>
Re: Problem with join and unicode <flavell@mail.cern.ch>
Regex Question <mikeflan@earthlink.net>
Re: Regex Question <uri@stemsystems.com>
Re: Regex Question <trammell+usenet@hypersloth.invalid>
Re: Regex Question <mikeflan@earthlink.net>
Re: simple perl script <bharn_S_ish@te_P_chnologi_A_st._M_com>
Testing whether a subroutine exists (symbolic ref) <apollock11@hotmail.com>
Re: Testing whether a subroutine exists (symbolic ref) <erutiurf@web.de>
Re: Testing whether a subroutine exists (symbolic ref) <uri@stemsystems.com>
Re: Testing whether a subroutine exists (symbolic ref) <apollock11@hotmail.com>
Re: <bwalton@rochester.rr.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 17 Aug 2003 05:32:51 GMT
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: Hudson River
Message-Id: <2dc316d7fc94cff23f990de35d21b0be@news.teranews.com>
>>>>> "Gunnar" == Gunnar Hjalmarsson <noreply@gunnar.cc> writes:
Gunnar> Now, let's say that I have some code that deals with form data from
Gunnar> STDIN, including:
Gunnar> for (split /&/, $data)
Gunnar> If I would post it here, somebody would most certainly claim it to be
Gunnar> broken since I don't include ';' when splitting.
Gunnar> Then, if I would explain that in *my* program, data is only submitted
Gunnar> from a form using the POST method, some people here would *not* accept
Gunnar> my explanation, but still claim that the code is "broken", refer to
Gunnar> "Joe newbie" etc.
No. You don't have control over how data is submitted to your form.
And by limiting flexibility artificially, you are breaking legitimate
users who might want to create a bookmark for a particular form
submission, for example.
Why Hand Code a limited solution, when a flexible solution is available
to you with the simple phrase "use CGI qw(param);"?
It's not a sacred cow. It's the collective voices of people WHO HAVE
BEEN BURNED BY HAND CODING.
Get it? Voice of reason. Voice of experience. Pay attention.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
------------------------------
Date: Sun, 17 Aug 2003 14:11:39 -0500
From: "Eric J. Roode" <REMOVEsdnCAPS@comcast.net>
Subject: Re: Hudson River
Message-Id: <Xns93DA9A7DF7A42sdn.comcast@206.127.4.25>
-----BEGIN xxx SIGNED MESSAGE-----
Hash: SHA1
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in
news:bhntk8$1ak8t$1@ID-184292.news.uni-berlin.de:
> Let's say that I use this regex:
>
> if ( /abc\sxyz/ )
>
> Somebody might object and say:
> "It won't match unless 'abc' and 'xyz' are separated by exactly one
> space, so you'd better do /abc\s*xyz/."
>
> But if I explain the context in *my* program, letting him/her know
> that I *know* that they are always separated by one space, my
> explanation would probably be accepted.
>
> Now, let's say that I have some code that deals with form data from
> STDIN, including:
>
> for (split /&/, $data)
>
> If I would post it here, somebody would most certainly claim it to be
> broken since I don't include ';' when splitting.
>
> Then, if I would explain that in *my* program, data is only submitted
> from a form using the POST method, some people here would *not* accept
> my explanation, but still claim that the code is "broken", refer to
> "Joe newbie" etc.
>
> So, what's the difference between these two examples? Well, there is
> absolutely no rational reason for claiming that there is a difference.
There certainly is. When you use a regular expression, you can be
reasonably certain who will use it under what circumstances. But you
have no idea who is going to link to your page and expect it to work with
semicolons as well as ampersands.
Don't believe me? A while ago, I was revising my "favorite links" page,
making it pass w3c's validation suites for strict HTML and CSS
compliance. You can't have naked ampersands in URLs in strict HTML. You
either have to code them as &amp; (annoying in a URL), or change them to
semicolons (much easier to do, and easier to read). I found a whole mess
of sites -- some of them major, public sites -- that couldn't handle
semicolons instead of ampersands. Losers.
Apparently, many people think "nobody's going to access this page except
via the form that I have written, so I can do it any damn way I please."
That attitude limits the flexibility of people like me.
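To make the point about accepting both separators concrete, here is a minimal sketch of a query-string splitter. The helper name parse_query is hypothetical, and this is not how CGI.pm is implemented; it ignores multipart bodies, character sets and size limits, which is part of why a tested module is usually the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): split a query string on
# either '&' or ';' and URL-decode each name and value.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;                  # skip empty segments
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;         # "name" without "=value"
        for ($name, $value) {
            tr/+/ /;                               # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;     # decode %XX escapes
        }
        push @{ $param{$name} }, $value;           # a name may repeat
    }
    return \%param;
}

my $p = parse_query('name=Joe+Newbie;lang=perl&lang=c%2B%2B');
print "$_ = @{ $p->{$_} }\n" for sort keys %$p;
```

A link written with semicolons and one written with ampersands then produce the same parameters.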
- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print
-----BEGIN xxx SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
iQA/AwUBPz/TT2PeouIeTNHoEQLjUACghqj28tIeNoyq1dpmzpAl3+TQWx8AoME1
/N392tLwoYFI+GG8FerwHMZd
=Wj9S
-----END PGP SIGNATURE-----
------------------------------
Date: Sun, 17 Aug 2003 21:25:28 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Hudson River
Message-Id: <Pine.LNX.4.53.0308172116071.2624@lxplus003.cern.ch>
On Sun, Aug 17, Eric J. Roode inscribed on the eternal scroll:
> You can't have naked ampersands in URLs in strict HTML.
To nit-pick over terminology: you certainly _can_ have naked
ampersands "in URLs" - indeed, actual forms submission via GET
absolutely depends on it.
What you can't have are naked ampersands in URL *references* in HTML,
of the kind which occur in the attribute values for href= or src= etc.
in HTML.
I know this will seem pedantic, but failure to observe the distinction
between the URL itself, and a reference to it in HTML, has led in the
past to all kinds of confused URL-garbling.
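A small sketch of that distinction, assuming a hand-written escaping helper (html_attr_escape is a hypothetical name, and www.example.com is a placeholder host): the URL itself keeps its naked ampersand, and only the reference written into the HTML attribute is escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a string for use inside a double-quoted
# HTML attribute value such as href="..." or src="...".
sub html_attr_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# The URL as it goes over the wire: naked '&' is correct here.
my $url = 'http://www.example.com/cgi-bin/search?q=perl&lang=en';

# The reference to that URL inside HTML: '&' must appear as '&amp;'.
my $link = sprintf '<a href="%s">search</a>', html_attr_escape($url);
print "$link\n";
```

The browser undoes the escaping when it reads the attribute, so the request that reaches the server still contains the plain ampersand.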
> You either have to code them as &amp; (annoying in a URL), or change
> them to semicolons (much easier to do, and easier to read). I found
> a whole mess of sites -- some of them major, public sites -- that
> couldn't handle semicolons instead of ampersands.
So any troll will be ready to tell us that they're doing no worse than
many successful high-profile commercial sites - so why are we
worrying?
> Losers.
I think so too, but there's always Sturgeon's Law on the side of our
detractors.
> Apparently, many people think "nobody's going to access this page except
> via the form that I have written, so I can do it any damn way I please."
Apparently, many people like the sense of power which such things give
them over their users (or at least -appear- to give them).
> That attitude limits the flexibility of people like me.
They would like that, you see.
No, I'm not trying to defend them, just to make an observation on the
perversity of human nature.
--
"If designers haven't done previous work for the web, they can come
to it with certain preconceptions." - Martin Tanton in uk.n.w.a
(a sample of the British art of understatement! -ed.)
------------------------------
Date: Sun, 17 Aug 2003 22:45:40 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Hudson River
Message-Id: <bhopi4$1kcu0$1@ID-184292.news.uni-berlin.de>
Randal L. Schwartz wrote:
>>>>>> "Gunnar" == Gunnar Hjalmarsson <noreply@gunnar.cc>
>>>>>> writes:
>
> Gunnar> for (split /&/, $data)
>
> Gunnar> If I would post it here, somebody would most certainly
> Gunnar> claim it to be broken since I don't include ';' when
> Gunnar> splitting.
>
> Gunnar> Then, if I would explain that in *my* program, data is only
> Gunnar> submitted from a form using the POST method, some people
> Gunnar> here would *not* accept my explanation, but still claim
> Gunnar> that the code is "broken", refer to "Joe newbie" etc.
>
> No. You don't have control over how data is submitted to your
> form.
The particular case I have in mind is the contact form program that
you access if you click the link in my sig, and that script explicitly
ignores submissions that are not POST.
> And by limiting flexibility artificially, you are breaking
> legitimate users who might want to create a bookmark for a
> particular form submission, for example.
Again, in this particular case, it would make absolutely no sense to
permit bookmark submissions.
> Why Hand Code a limited solution, when a flexible solution is
> available to you with the simple phrase "use CGI qw(param);"?
Again, in this particular case, I have absolutely no interest in
CGI.pm's flexibility as regards POST vs. GET. On the contrary, I have
an explicit interest *not* to allow GET submissions.
But I think we have drifted off topic now.
> It's not a sacred cow. It's the collective voices of people WHO
> HAVE BEEN BURNED BY HAND CODING.
>
> Get it? Voice of reason. Voice of experience. Pay attention.
Whether I "get it" depends on what exactly you mean by that.
I do pay attention. I listen respectfully. I take into account the
implied warnings. I take pains to be careful.
However, I reserve the right to decide, case by case, whether it's
justified to make use of a module, be it CGI.pm or some other
module. I feel that I'm entitled to do so without automatically being
accused of using "bad" or "broken" code.
You are trying to prevent the clueless, careless programmers from
spreading bad code. That's fine. But do you really believe that you
make them less clueless and careless by enforcing dogmatic programming
rules in this group? I'm convinced that they just leave, if they ever
were here.
I'm not the only one who feels this way. I noticed that you replied to
Mark's message in the "perl zombies" thread, and I believe that he
expressed something similar. But most people with similar feelings
keep quiet and/or leave.
The only Perl book I have is the "camel". I do like it, and one
important reason is that the comments are nuanced and respectful of
the reader's judgment. I'd love it if more of the messages in this group
were written in the spirit of the "camel" comments. If that were to
happen, some of the clueless and careless programmers might start
listening rather than being repelled.
Do *you* get what *I'm* trying to say, Randal?
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 17 Aug 2003 21:22:12 GMT
From: Jonathan Stowe <gellyfish@gellyfish.com>
Subject: Re: Hudson River
Message-Id: <8mS%a.9$Yx6.2759@news.dircon.co.uk>
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>
> The particular case I have in mind is the contact form program that
> you access if you click the link in my sig, and that script explicitly
> ignores submissions that are not POST.
>
No it doesn't. Otherwise one wouldn't be able to follow the link. Of course
it is entirely appropriate that it should only accept a POST when the form
is submitted, but I am not sure what this has to do with the use or not of
some module.
/J\
--
Jonathan Stowe |
<http://www.gellyfish.com> | This space for rent
|
------------------------------
Date: Sun, 17 Aug 2003 23:30:11 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Hudson River
Message-Id: <bhos5k$1k5l9$1@ID-184292.news.uni-berlin.de>
Eric J. Roode wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in
> news:bhntk8$1ak8t$1@ID-184292.news.uni-berlin.de:
>
>> So, what's the difference between these two examples? Well, there
>> is absolutely no rational reason for claiming that there is a
>> difference.
>
> There certainly is. When you use a regular expression, you can be
> reasonably certain who will use it under what circumstances. But
> you have no idea who is going to link to your page and expect it to
> work with semicolons as well as ampersands.
Please see my reply to Randal where I explained that in the case I'm
thinking of, the script ignores GET submissions.
> Don't believe me?
Well, you are of course right in cases where submissions can be made
using GET (which they normally can). And you don't need to worry about
me in that respect. I wouldn't think of not splitting on both & and ;
in a 'normal' CGI script. And I do think.
Hey, even if I'm a relative beginner, I'm a *careful* programmer. And
I sometimes make use of modules, sometimes not. :)
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: Sun, 17 Aug 2003 23:30:14 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Hudson River
Message-Id: <bhos5m$1k5l9$2@ID-184292.news.uni-berlin.de>
Alan J. Flavell wrote:
> To nit-pick over terminology: you certainly _can_ have naked
> ampersands "in URLs" - indeed, actual forms submission via GET
> absolutely depends on it.
>
> What you can't have are naked ampersands in URL *references* in
> HTML, of the kind which occur in the attribute values for href= or
> src= etc. in HTML.
>
> I know this will seem pedantic, but failure to observe the
> distinction between the URL itself, and a reference to it in HTML,
> has led in the past to all kinds of confused URL-garbling.
Indeed an important distinction, Alan. To get it right, you do need to
be aware of that. In other words, you need to know what you are doing.
To return to my favorite topic: CGI.pm. Take a beginner who blindly
follows the firm advice from most regulars here, and starts doing his
or her CGI with the help of CGI.pm. One of the points of modules is that
you "shall not re-invent wheels", since experts have already taken
care of everything in a much better way than you ever could, right?
Okay, doesn't that mean a *greater* risk that s/he refrains from
learning the stuff you called our attention to above, compared to if
s/he had started to explore CGI by re-inventing a few wheels?
Just an (additional) thought.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 17 Aug 2003 14:32:55 -0700
From: Tim Hammerquist <tim@vegeta.ath.cx>
Subject: Re: Hudson River
Message-Id: <slrnbjvt6b.bkl.tim@vegeta.ath.cx>
Gunnar Hjalmarsson graced us by uttering:
> [...] I just don't understand why the regulars, who so strongly
> advocate the use of modules in general and CGI.pm in
> particular, provoke such reactions over and over again. It's a
> mystery to me why some people here make such a fuss about the
> fact that some of us sometimes choose to parse form data using
> a few lines of own code instead of using CGI.pm. You are lousy
> missionaries. ;-)
There are no laws, religious or otherwise, preventing you from
parsing your own form data.
However, if you choose to reinvent the wheel, especially on that
scale, you are likely to make several, possibly dangerous,
mistakes. Mistakes that have been fixed in CGI. Mistakes that,
if asked about in clpm, are not likely to be answered in any
manner except, "Why aren't you using CGI.pm? It fixed that
problem/bug/vulnerability months/years ago!"
But if you know better than the thousands of people who've
helped write and/or test CGI.pm or any of the dozens of other
recommended CGI-related modules on the CPAN...
Tim Hammerquist
--
Congratulations, you have just reinvented the wheel. *And* made it square.
-- Simon Cozens in comp.lang.perl.misc
------------------------------
Date: Sun, 17 Aug 2003 23:44:19 +0200
From: Gunnar Hjalmarsson <noreply@gunnar.cc>
Subject: Re: Hudson River
Message-Id: <bhot04$1m6d6$1@ID-184292.news.uni-berlin.de>
Jonathan Stowe wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>> The particular case I have in mind is the contact form program
>> that you access if you click the link in my sig, and that script
>> explicitly ignores submissions that are not POST.
>
> No it doesn't. Otherwise one wouldn't be able to follow the link.
Okay, my bad wording. I meant that it ignores *submitted data* that is
not submitted using POST. ;-)
> Of course it is entirely appropriate that it should only accept a
> POST when the form is submitted, but I am not sure what this has to
> do with the use or not of some module.
Not much. It was never my intention to discuss my contact form script
in detail, and as I mentioned, I think we have drifted off topic by
doing so.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
------------------------------
Date: 17 Aug 2003 21:46:58 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@rwth-aachen.de>
Subject: Re: Hudson River
Message-Id: <bhot4i$27o$1@nets3.rz.RWTH-Aachen.DE>
Also sprach Gunnar Hjalmarsson:
> Randal L. Schwartz wrote:
>>>>>>> "Gunnar" == Gunnar Hjalmarsson <noreply@gunnar.cc>
>>>>>>> writes:
>> Gunnar> Then, if I would explain that in *my* program, data is only
>> Gunnar> submitted from a form using the POST method, some people
>> Gunnar> here would *not* accept my explanation, but still claim
>> Gunnar> that the code is "broken", refer to "Joe newbie" etc.
The real question remains: Why is your own solution better than CGI?
CGI works just as well when used in a very restrictive compartment.
>> No. You don't have control over how data is submitted to your
>> form.
>
> The particular case I have in mind is the contact form program that
> you access if you click the link in my sig, and that script explicitly
> ignores submissions that are not POST.
>
>> And by limiting flexibility artificially, you are breaking
>> legitimate users who might want to create a bookmark for a
>> particular form submission, for example.
>
> Again, in this particular case, it would make absolutely no sense to
> permit bookmark submissions.
>
>> Why Hand Code a limited solution, when a flexible solution is
>> available to you with the simple phrase "use CGI qw(param);"?
>
> Again, in this particular case, I have absolutely no interest in
> CGI.pm's flexibility as regards POST vs. GET. On the contrary, I have
> an explicit interest *not* to allow GET submissions.
Easy:
    use CGI qw/:cgi/;
    if (request_method() eq 'GET') {
        die_gracefully();
    }
The fact that you only allow POST doesn't render CGI.pm useless.
There are myriad 'good' explanations of why a hand-rolled parser is
better than using CGI.pm. However, none has yet convinced me. This
module gives you all the flexibility you need...even if you want to
disallow flexibility explicitly.
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: 17 Aug 2003 21:59:51 GMT
From: Rafael Garcia-Suarez <rgarciasuarez@free.fr>
Subject: Re: Hudson River
Message-Id: <slrnbjvv2q.6jr.rgarciasuarez@dat.local>
Tassilo v. Parseval wrote in comp.lang.perl.misc :
>
>     use CGI qw/:cgi/;
>
>     if (request_method() eq 'GET') {
>         die_gracefully();
>     }
That kind of thing is better done in an httpd.conf, for performance
reasons. But not everyone can tweak one's server config...
--
Unlinking is not *NIX
------------------------------
Date: Sun, 17 Aug 2003 23:46:35 +0200
From: "sam" <rbaba99@caramail.com>
Subject: Re: mod_perl 2 Setup ?
Message-Id: <bhosud$1fjj$1@news.cybercity.dk>
> mmm can I clarify: Where should the 'Alias' and '<Directory' point to ?
It should point to the location where your scripts are.
Your index.pl must be in this directory.
Set the permission of this directory to 0777.
> chmod 0777 /var/www/perl
This is very, very insecure and you should not follow this approach on a
production machine. It is good enough when you just want to try things out
and want to have as few obstacles as possible.
Try first with an index.pl that contains this:
#!/usr/bin/perl
print "Content-type: text/plain\r\n\r\n";
print "mod_perl rules!\n";
>chmod 0755 /var/www/perl/index.pl
and point your browser to : http://your_server/perl/index.pl
Hope this helps.
------------------------------
Date: 17 Aug 2003 13:39:39 -0700
From: ddragos@iname.com (Dragos D)
Subject: Multithreading and sockets blocking I/O on Win32
Message-Id: <b7a02f43.0308171239.1855b1af@posting.google.com>
Hello!
I'm posting here as a last, desperate attempt to solve a problem that has made
me pull my hair out: I want to create (as a cradle for further development)
a simple, basic, transparent proxy server for any protocol, using threads
(the new model, "use threads" instead of "use Thread"). That is, only one
client will connect to the server, the server will connect in turn to a
remote server, then it will relay data from the remote server to the client,
and from the client to the remote server.
This has to run on Windows, so I'm using ActiveState Perl 5.8.0 build 805
running under Windows 98 SE.
After accepting a connection, the program launches one thread to read data
from the client and send it to the server, and another thread to read data
from the remote server and send it to the client. The idea is NOT to
interpret data in any way, just relay it until the client and the server
close the connection.
I chose as an example connecting to a POP3 server, pop.lycos.co.uk. OK, so I
run it, then fire up a telnet client and connect to localhost:8080. The
telnet client correctly sees the POP3 server's greeting message (+OK...),
then I type "hi" (an invalid command, which should produce a "-ERR" response)
but... the "hi" is NOT sent (packet sniffer reveals no traffic after the
+OK response). Accordingly, the last line I see on STDERR is:
TO remote: [hi] at ...
Suggesting that the "print {$remoteServer} $lineout, $EOL;" hangs the script.
Can someone please confirm this behaviour on Windows, or rule it out on *nix?
And, if someone could point out what I'm doing wrong, I'd be the most
grateful.
Thanks for your time,
Dragos
#! perl -w
use IO::Socket;
use threads;
use strict;
my $EOL = "\015\012";   # the standard Internet line terminator
my $port = 8080;
my $listener = IO::Socket::INET->new(LocalPort=>$port, Listen=>5, Reuse=>1)
    or die "Can't listen on port $port, $!";
my $client = $listener->accept or die "Can't accept, $!";
my $remoteServer = IO::Socket::INET->new('pop.lycos.co.uk:110')
    or die "Can't connect to remote server, $!";
my $threadTO = threads->create("ThreadTOremote");
my $threadFROM = threads->create("ThreadFROMremote");
$threadTO->join;
$threadFROM->join;
# Relay lines from the client to the remote server.
sub ThreadTOremote
{   my $lineout;
    while (defined($lineout=<$client>))
    {   $lineout=~s/(\x0D\x0A|\x0A\x0D|\x0D|\x0A|)$//;  # strip any line ending
        warn 'TO remote: [', $lineout, ']';
        print {$remoteServer} $lineout, $EOL;           # <-- BLOCKS HERE !!!
        warn 'After print TO remoteServer';             # this never gets displayed
    }
}
# Relay lines from the remote server to the client.
sub ThreadFROMremote
{   my $linein;
    while (defined($linein=<$remoteServer>))
    {   $linein=~s/(\x0D\x0A|\x0A\x0D|\x0D|\x0A|)$//;   # strip any line ending
        warn 'FROM remote TO client: [', $linein, ']';
        print {$client} $linein, $EOL;
        warn 'After print to client';
    }
}
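For comparison, a relay of this kind can also be written without threads. What follows is only a sketch under assumptions, not a diagnosis of the hang described above: it uses IO::Select with unbuffered sysread/syswrite in a single thread, the helper name relay is hypothetical, and it has not been tried on the Windows 98 setup in question.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: copy bytes in both directions between two
# connected sockets until either side reaches end-of-file or fails.
# The data is passed through unchanged; nothing is interpreted.
sub relay {
    my ($left, $right) = @_;
    my %peer = (fileno($left) => $right, fileno($right) => $left);
    my $select = IO::Select->new($left, $right);
    while (1) {
        for my $from ($select->can_read) {       # blocks until data or EOF
            my $got = sysread($from, my $buffer, 4096);
            return unless $got;                  # undef = error, 0 = EOF
            my $to     = $peer{ fileno $from };
            my $offset = 0;
            while ($offset < $got) {             # syswrite may be partial
                my $sent = syswrite($to, $buffer, $got - $offset, $offset);
                return unless defined $sent;
                $offset += $sent;
            }
        }
    }
}
```

With the handles from the script above, the call would be relay($client, $remoteServer) after both connections are established. The sketch stops as soon as one side closes, so a fuller version would want to drain the other direction first.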
------------------------------
Date: Sun, 17 Aug 2003 19:14:27 +0000 (UTC)
From: mjd@plover.com (Mark Jason Dominus)
Subject: Order of evaluation of expressions
Message-Id: <bhok6j$70f$1@plover.com>
Keywords: assimilate, collegial, fadeout, leapt
Is this:
@s = qw(a b);
$z = shift(@s) . shift(@s);
print $z;
guaranteed to print "ab"?
------------------------------
Date: 17 Aug 2003 19:52:28 GMT
From: Rafael Garcia-Suarez <rgarciasuarez@free.fr>
Subject: Re: Order of evaluation of expressions
Message-Id: <slrnbjvngc.3os.rgarciasuarez@dat.local>
Mark Jason Dominus wrote in comp.lang.perl.misc :
> Is this:
>
> @s = qw(a b);
> $z = shift(@s) . shift(@s);
> print $z;
>
> guaranteed to print "ab"?
I'm not sure what you're asking for. Guaranteed across all perl (5) versions?
Future, past or present? Guaranteed by the "language spec" (whatever
this may be)?
Currently, due to the way the optree is constructed and executed, and
due to the implementation of shift, I'd say that your snippet is
guaranteed to produce "ab". But don't rely on it. I don't think Perl 5
will ever show another behavior, but Ponie might, if the internal
optimizer finds it more convenient to evaluate the right side of concat
first.
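If the order matters, one way to avoid relying on operand evaluation order at all is to give each shift its own statement. This is only a sketch of that workaround, with hypothetical variable names; it says nothing about what any perl guarantees for the single-expression form.

```perl
use strict;
use warnings;

my @s = qw(a b);

# Each shift is a separate statement, so the order is fixed by
# statement sequence rather than by how the operands of '.' happen
# to be evaluated.
my $first  = shift @s;
my $second = shift @s;
my $z      = $first . $second;

print "$z\n";
```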
--
Uniformity is not *NIX
------------------------------
Date: Sun, 17 Aug 2003 18:32:52 +0200
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Problem with join and unicode
Message-Id: <Pine.LNX.4.53.0308170139110.6451@lxplus005.cern.ch>
On Fri, Aug 15, Alan J. Flavell inscribed on the eternal scroll:
> That's why it's important to do the job the right way, rather than
> trying to re-implement assembler-level code (which is what working
> utf-16 in bytes pretty-much amounts to) in Perl.
Well now, I've had a bit of a surprise here, and may need a bit of
advice from anyone who's more familiar with the portability aspects of
this issue.
Executive summary: :encoding(utf16LE) on Windows doesn't seem to
treat Windows newlines in the text-flavoured way that I had expected.
Most of what's reported here was first done with ActivePerl 5.8.0
build 805, but I've updated to the current download, build 806, and I
don't see any substantive change in behaviour.
Prompted by the original posting on this thread, I grabbed a random
file from an MS Train Sim package off the web - as it happens, it was
the file: 37_Sleeper_Nth_Act.txt
As a diagnostic, I used Cygwin "od -x" to dump it out. The file
started like this: (points of interest noted for those who use
monospace fonts...)
0000000 feff 004e 004f 0054 0045 0053 0020 0046
        ^^^^ = BOM
0000020 004f 0052 0020 0041 0044 0044 0049 0054
0000040 0049 004f 004e 0041 004c 0020 0041 0043
0000060 0054 0049 0056 0049 0054 0059 003a 0020
0000100 0043 004c 0041 0053 0053 0020 0033 0037
0000120 0020 0053 004c 0045 0045 0050 0045 0052
0000140 0020 004e 004f 0052 0054 0048 0042 004f
0000160 0055 004e 0044 002e 000d 000a 000d 000a
                            ^^^^^^^^^ ^^^^^^^^^ = pair of newlines
0000200 0054 0068 0069 0073 0020 0061 0063 0074
0000220 0069 0076 0069 0074 0079 0020 0069 0073
0000240 0020 0064 0065 0073 0069 0067 006e 0065
and so on, more or less as I'd expected, with the Byte Order Mark at
the start of the file, and otherwise (in this case) utf16-encoded
ASCII characters, including the CR and LF control characters per
newline.
So, I thought I could read it (in 5.8.0) somehow like this, and then
print out some diagnostic stuff so I could see what I was doing:
[... use utf8; ...]
my $infile = '37_Sleeper_Nth_Act.txt';
my $bom = "\x{feff}";
my $first = 1;
open IN, '<:encoding(utf16LE)', $infile or die "unable to open $infile: $!";
while (<IN>) {
    print length($_), "\n";
    s/^$bom// and print "snipped BOM\n" if $first;
    print length($_), "\n";
    chomp; print length($_), "\n";
    $first = 0;
    print $_, "\n";
    print unpack('H*', $_), "\n";
}
Well, here's where I got the surprise. I expected that - as I thought
I was reading this in a text-ish mode - Perl IO would turn the x000d
x000a sequences into a friendly newline, which would match "\n"
internally. But no, they turned into x0d x0a sequences instead.
Furthermore, "chomp()" removed only the x0a, leaving the x0d. As we
see here:
62
snipped BOM
61
60
NOTES FOR ADDITIONAL ACTIVITY: CLASS 37 SLEEPER NORTHBOUND.
4e4f54455320464f52204144444954494f4e414c2041435449564954593a20434c41535320333720
534c4545504552204e4f525448424f554e442e0d
As we see, the line that was read in was initially 62 Perl characters
long (that's characters, not bytes). After removal of the BOM, it's
one shorter, which is correct (no matter that this _character_ was two
bytes long on input, and will be 3 bytes in Perl's internal utf-8
representation).
And then, after chomp it's shorter by one more. But only _one_, and
the x0d control character can be seen on the end of the string when
it's printed in hex. This surprised me.
One thing that I tried was putting a :crlf layer after the
:encoding(utf16le) on the open statement. Well, this then resulted
in the newlines being handled as expected, but it somehow screwed-up
the recognition of the BOM. If the text contained any non-ASCII
characters I'm concerned that it would upset those too?
Yes, sure, I _could_ do what the original poster was aiming at,
reading the stuff in binary, decoding it explicitly, and fooling with
the details of newlines for myself. But if the wheel has already been
invented, I wanna use it, right?
At this point I decided that I didn't really understand what the
documentation was telling me to do, so I decided to ask. Help?
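One layer ordering that is often suggested for this situation is ':raw:encoding(UTF-16LE):crlf': :raw switches off any default byte-level CRLF translation, :encoding decodes the bytes, and :crlf then works on the decoded characters. The sketch below assumes a perl whose layers behave as the PerlIO documentation describes; it has not been checked against the ActivePerl builds mentioned above, and read_utf16le_lines is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a UTF-16LE text file with CRLF line endings
# and return its lines with the BOM and line terminators removed.
sub read_utf16le_lines {
    my ($path) = @_;
    open my $in, '<:raw:encoding(UTF-16LE):crlf', $path
        or die "unable to open $path: $!";
    my @lines;
    while (my $line = <$in>) {
        $line =~ s/^\x{FEFF}// if $. == 1;   # drop a leading BOM
        chomp $line;                          # CR LF has become "\n"
        push @lines, $line;
    }
    close $in;
    return @lines;
}

# Demo: write a small UTF-16LE sample with a BOM and CRLF line endings,
# then read it back and show each line's length in characters.
my ($out, $sample) = tempfile(UNLINK => 1);
binmode $out, ':raw:encoding(UTF-16LE)';
print {$out} "\x{FEFF}NOTES FOR ADDITIONAL ACTIVITY.\r\n\r\nThis activity\r\n";
close $out or die "close failed: $!";

my @got = read_utf16le_lines($sample);
printf "%d: [%s]\n", length($_), $_ for @got;
```

If the layers cooperate, chomp leaves no trailing x0d, and the BOM is removed as an ordinary first character.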
------------------------------
Date: Sun, 17 Aug 2003 19:40:52 GMT
From: Mike Flannigan <mikeflan@earthlink.net>
Subject: Regex Question
Message-Id: <3F3FDAD7.A16F62D2@earthlink.net>
Got a pretty simple question for you'all. This matches the
number data shown below:
foreach (<DATA>) {
    /^.+\s(\d+.\d+),\s-(\d+.\d+),.*$/;
    print "$1 -- $2 \n";
    push @array, $1, $2;
}
Now, just in case the single space is not before the numbers
like those shown below in DATA, I thought I'd put in a "?"
after the \s, and make it:
/^.+\s?(\d+.\d+),\s?-(\d+.\d+),.*$/;
it returns the $2 OK, but the $1 is only the last 3 digits of what
I expect. I expect
35.020041249 -- 94.3847918870
and get
249 -- 94.3847918870
What am I missing here?
__DATA__
TP,DMS, 35.020041249, -94.3847918870,12/31/1989,00:00:00,1
TP,DMS, 35.010973698, -94.3846837580,12/31/1989,00:00:00,0
TP,DMS, 35.002423715, -94.3837645520,12/31/1989,00:00:00,0
TP,DMS, 34.595735442, -94.3845292880,12/31/1989,00:00:00,0
TP,DMS, 34.594175007, -94.3845061190,12/31/1989,00:00:00,0
TP,DMS, 34.585702269, -94.3817021280,12/31/1989,00:00:00,0
TP,DMS, 34.575576402, -94.3814240620,12/31/1989,00:00:00,0
TP,DMS, 34.571204088, -94.3744192330,12/31/1989,00:00:00,0
TP,DMS, 34.561611241, -94.3742261350,12/31/1989,00:00:00,0
TP,DMS, 34.554861166, -94.3737394970,12/31/1989,00:00:00,0
TP,DMS, 34.552041565, -94.3726348980,12/31/1989,00:00:00,0
TP,DMS, 34.545121539, -94.3707114980,12/31/1989,00:00:00,0
TP,DMS, 34.544642592, -94.3654524020,12/31/1989,00:00:00,0
------------------------------
Date: Sun, 17 Aug 2003 19:53:56 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Regex Question
Message-Id: <x7y8xs83hn.fsf@mail.sysarch.com>
>>>>> "MF" == Mike Flannigan <mikeflan@earthlink.net> writes:
MF> /^.+\s(\d+.\d+),\s-(\d+.\d+),.*$/;
MF> print "$1 -- $2 \n";
MF> Now, just in case the single space is not before the numbers
MF> like those shown below in DATA, I thought I'd put in a "?"
MF> after the \s, and make it:
MF> /^.+\s?(\d+.\d+),\s?-(\d+.\d+),.*$/;
MF> it returns the $2 OK, but the $1 is only the last 3 digits of what
MF> I expect. I expect
MF> 35.020041249 -- 94.3847918870
MF> and get
MF> 249 -- 94.3847918870
MF> What am I missing here?
MF> __DATA__
MF> TP,DMS, 35.020041249, -94.3847918870,12/31/1989,00:00:00,1
. is matching any char, not just a literal . so you need to escape it.
the .+ at the beginning is greedily eating the whole beginning and the
first part of the number. the \d+.\d+ then matches 3 digits only.
when you had the \s, it forced the \d+ to work properly even though it
was buggy (. wasn't escaped). i think it was also very slow since it
probably had to backtrack a lot to keep the .+ short.
better to use \S+ at the beginning so it won't eat the space. or even
[^\s\d]+ so it stops at the first digit or space. then use \s* to eat
possible spaces.
also \s* is better than \s? as it can handle any length of spaces
including none.
also you don't need the .*$ at the end.
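a minimal sketch of the points above, run against the sample record from the question. the variable names are made up for illustration, and the [^\s\d]+ prefix assumes the leading fields never contain digits:

```perl
use strict;
use warnings;

my $line = 'TP,DMS, 35.020041249, -94.3847918870,12/31/1989,00:00:00,1';

# Pattern from the question with \s? and an unescaped dot. The leading .+
# is greedy, so it keeps everything except the shortest tail that still
# lets the rest of the pattern match.
my ($greedy_lat) = $line =~ /^.+\s?(\d+.\d+),\s?-(\d+.\d+),/;

# Tightened pattern: the prefix cannot cross a space or a digit, the dots
# are escaped, \s* allows any amount of space, and the trailing .*$ is gone.
my ($lat, $lon) = $line =~ /^[^\s\d]+\s*(\d+\.\d+),\s*-(\d+\.\d+),/;

print "greedy: $greedy_lat\n";    # greedy: 249
print "fixed:  $lat -- $lon\n";   # fixed:  35.020041249 -- 94.3847918870
```

as in the original pattern, the minus sign sits outside the second capture, so the longitude comes back without its sign.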
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Sun, 17 Aug 2003 20:04:56 +0000 (UTC)
From: "John J. Trammell" <trammell+usenet@hypersloth.invalid>
Subject: Re: Regex Question
Message-Id: <slrnbjvnv8.cdk.trammell+usenet@hypersloth.el-swifto.com.invalid>
On Sun, 17 Aug 2003 19:40:52 GMT, Mike Flannigan <mikeflan@earthlink.net> wrote:
>
> Got a pretty simple question for you'all. This matches the
> number data shown below:
>
> foreach (<DATA>) {
> /^.+\s(\d+.\d+),\s-(\d+.\d+),.*$/;
> print "$1 -- $2 \n";
> push @array, $1, $2;
> }
>
A couple of comments:
1. you're not checking that the regexp matched before using $1, $2
2. (\d+.\d+) doesn't match what I think you think it matches--maybe
you mean "(\d+\.\d+)"?
3. how about just split()ting this nice comma-delimited data?
my ($lat,$lon) = (split /,/, $_)[2,3]; # untested
4. The problem with adding the '?' to \s is that the space
was the only thing keeping the .+ from greedy-matching the
first part of your latitude.
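A slightly fuller sketch of point 3, with the check from point 1 folded in. The helper name lat_lon and the sample records are only illustrative, and it assumes latitude and longitude are always the third and fourth fields:

```perl
use strict;
use warnings;

# Hypothetical helper: return (lat, lon) from one comma-delimited record,
# or an empty list if the record has too few fields.
sub lat_lon {
    my ($record) = @_;
    my ($lat, $lon) = (split /,/, $record)[2, 3];
    return unless defined $lat && defined $lon;
    for ($lat, $lon) { s/^\s+//; s/\s+$//; }   # trim surrounding whitespace
    return ($lat, $lon);
}

my @records = (
    'TP,DMS, 35.020041249, -94.3847918870,12/31/1989,00:00:00,1',
    'TP,DMS, 34.595735442, -94.3845292880,12/31/1989,00:00:00,0',
    'not a record',
);

my @array;
for my $record (@records) {
    my @pair = lat_lon($record);
    next unless @pair;            # skip records that did not parse
    print "$pair[0] -- $pair[1]\n";
    push @array, @pair;
}
```

Unlike the regexp, which leaves the minus sign outside the capture, split keeps the sign on the longitude.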
------------------------------
Date: Sun, 17 Aug 2003 20:32:48 GMT
From: Mike Flannigan <mikeflan@earthlink.net>
Subject: Re: Regex Question
Message-Id: <3F3FE704.7C6E9A8E@earthlink.net>
Uri Guttman wrote:
>
> . is matching any char, not just a literal . so you need to escape it.
>
> the .+ at the beginning is greedily eating the whole beginning and the
> first part of the number. the \d+.\d+ then matches 3 digits only.
>
> when you had the \s, it forced the \d+ to work properly even though it
> was buggy (. wasn't escaped). i think it was also very slow since it
> probably had to backtrack a lot to keep the .+ short.
>
> better to use \S+ at the beginning so it won't eat the space. or even
> [^\s\d]+ so it stops at the first digit or space. then use \s* to eat
> possible spaces.
>
> also \s* is better than \s? as it can handle any length of spaces
> including none.
>
> also you don't need the .*$ at the end.
>
> uri
Thank you guys. Escaping the . is definitely a must to know.
You saved me hours on that one.
I changed it to
/^\S+\s*(\d+\.\d+),\s*-(\d+\.\d+),.*$/;
but that was not good either. The \S+ eats up the comma.
So I put the comma in there:
/^\S+,\s*(\d+\.\d+),\s*-(\d+\.\d+),.*$/;
That works, but now I'll get rid of the unneeded .*$.
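Here is a small sketch of why the comma matters, assuming the case being guarded against is a record with no space after the commas (the $tight record below is made up for illustration, it is not from the real data):

```perl
use strict;
use warnings;

# Assumed variant of the record: no space after the commas.
my $tight = 'TP,DMS,35.020041249,-94.3847918870,12/31/1989,00:00:00,1';

# Without the comma, nothing stops the greedy \S+ from running into the
# number; it gives back only as much as the rest of the pattern needs.
my ($without_comma) = $tight =~ /^\S+\s*(\d+\.\d+),\s*-(\d+\.\d+),/;

# With the comma, the number has to start right after a field separator.
my ($with_comma) = $tight =~ /^\S+,\s*(\d+\.\d+),\s*-(\d+\.\d+),/;

print "without comma: $without_comma\n";   # without comma: 5.020041249
print "with comma:    $with_comma\n";      # with comma:    35.020041249
```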
I think John's split idea is even better. I thought of that,
but did not know how to just get 3,4 until now. I'll
try that and probably use it instead, but the regex
was a good learning experience.
I've got a fairly cool project going right now, so I'll
probably have quite a few more questions.
Mike
------------------------------
Date: Sun, 17 Aug 2003 16:50:45 GMT
From: Brian Harnish <bharn_S_ish@te_P_chnologi_A_st._M_com>
Subject: Re: simple perl script
Message-Id: <pan.2003.08.17.16.51.25.146827@te_P_chnologi_A_st._M_com>
-----BEGIN xxx SIGNED MESSAGE-----
Hash: SHA1
On Sun, 17 Aug 2003 13:09:47 +0000, Jerry Maguire wrote:
> Hi,
> I would like to write a perl script to open and file and add a prefix to all
> the lines of that file.
> The file will be like:
>
> name1blah
> name2blah
> name3blah
>
> I have created the following script:
> #!/bin/sh
> cat test |
> while read name
> do
> printf 'new'$name>>newfile.txt;
> done
>
> Now the problem is it is adding all the lines in one line.
> The output is:
> newname1blahnewname2blahnewname3blah
>
> jerry
I hate to be the one to tell you, but that's not a perl script, it's a
shell script.
Not to be mean, but good luck on writing the perl script. If you have any
questions, just let us know. There weren't any questions in your
post, just statements.
- Brian
-----BEGIN xxx SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
iD8DBQE/P7JoiK/rA3tCpFYRAjLJAJ4ivkH3B1DHPUVgvkhFNKwjO99WowCeLL+d
RUDoY9UNVplI0QsQsihVkD8=
=G58P
-----END PGP SIGNATURE-----
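For reference, a rough Perl sketch of what the shell script above seems to be aiming for. The shell version joins the lines because read strips the newline and printf is never told to print one. The helper name add_prefix is made up, and in-memory filehandles stand in for the real files so the sketch runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line from $in to $out with $prefix in front.
sub add_prefix {
    my ($prefix, $in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $prefix, $line;   # $line still carries its own "\n"
    }
    return;
}

# In-memory filehandles stand in for 'test' and 'newfile.txt'.
my $input  = "name1blah\nname2blah\nname3blah\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot read input: $!";
open my $out, '>', \$output or die "cannot write output: $!";
add_prefix('new', $in, $out);
close $out;
close $in;

print $output;
```

With real files, the two open calls would name 'test' and 'newfile.txt' instead of the scalar references. The same job also fits in a one-liner: perl -pe 's/^/new/' test > newfile.txt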
------------------------------
Date: Sun, 17 Aug 2003 11:53:22 -0700
From: Arvin Portlock <apollock11@hotmail.com>
Subject: Testing whether a subroutine exists (symbolic ref)
Message-Id: <bhoiv5$5gh$1@agate.berkeley.edu>
I'm having a look at some old code which uses symbolic subroutine
references (I know, A Very Bad Thing). I simply want to test whether
the named subroutine actually exists in the code. This simple
program:
my $sub_name = "mySub";
my $sub_ref = sub { &{"$sub_name"} };
if ($sub_ref) {
print "It exists\n";
} else {
print "It doesn't exist\n";
}
always prints "It exists!" even though there's no subroutine
called mySub anywhere in the program. Do I have to mess around
with Scary Things, like the symbol table?
------------------------------
Date: Sun, 17 Aug 2003 21:07:22 +0200
From: Richard Voss <erutiurf@web.de>
Subject: Re: Testing whether a subroutine exists (symbolic ref)
Message-Id: <bhojp4$q7m$05$1@news.t-online.com>
Arvin Portlock wrote:
> I'm having a look at some old code which uses symbolic subroutine
> references (I know, A Very Bad Thing). I simply want to test whether
> the named subroutine actually exists in the code. This simple pro-
> gram:
>
> my $sub_name = "mySub";
> my $sub_ref = sub { &{"$sub_name"} };
Here you create an anonymous subroutine and assign it (a reference to it) to
$sub_ref. The subroutine itself calls, via symref, a subroutine, but that does
not have any influence on what $sub_ref contains.
> if ($sub_ref) {
> print "It exists\n";
> } else {
> print "It doesn't exist\n";
> }
>
> always prints "It exists!" even though there's no subroutine
> called mySub anywhere in the program.
of course. $sub_ref contains a subref and is absolutely unaware what code it
points to.
try
if( defined &{ $sub_name } )
that's ok with strict refs, too.
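a small self-contained sketch of that test. the wrapper sub_exists and the name noSuchSub are made up for illustration; an unqualified name is looked up in the current package, main here:

```perl
use strict;
use warnings;

sub mySub { return "hello from mySub" }

# Hypothetical wrapper: true if a subroutine with this name is defined.
# Only the call through a symbolic name is rejected by strict refs;
# asking whether it is defined is allowed.
sub sub_exists {
    my ($name) = @_;
    return defined(&{$name}) ? 1 : 0;
}

for my $sub_name (qw(mySub noSuchSub)) {
    if (sub_exists($sub_name)) {
        print "$sub_name exists\n";
    } else {
        print "$sub_name doesn't exist\n";
    }
}
```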
--
sub{use strict;local$@=sub{select($,,$,,$,,pop)};unshift@_,(45)x 24,split q=8==>
55.52.56.49.49.55.56.49.49.53;do{print map(chr,@_[0..(@_/2-1)]),"\r";$@->(1/6)=>
push@_=>shift}for@_,++$|}->(map{$_+=$_%2?-1:1}map ord,split//,'u!`onuids!Qdsm!'.
'i`bjds') #my email-address is reversed! <http://fruiture.de>
------------------------------
Date: Sun, 17 Aug 2003 19:11:45 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Testing whether a subroutine exists (symbolic ref)
Message-Id: <x73cg09k0e.fsf@mail.sysarch.com>
>>>>> "AP" == Arvin Portlock <apollock11@hotmail.com> writes:
AP> I'm having a look at some old code which uses symbolic subroutine
AP> references (I know, A Very Bad Thing). I simply want to test whether
AP> the named subroutine actually exists in the code. This simple pro-
AP> gram:
AP> my $sub_name = "mySub";
AP> my $sub_ref = sub { &{"$sub_name"} };
you just created a new anon sub.
AP> if ($sub_ref) {
AP> print "It exists\n";
AP> } else {
AP> print "It doesn't exist\n";
AP> }
since $sub_ref always has a fresh code ref in it, it is true.
AP> always prints "It exists!" even though there's no subroutine
AP> called mySub anywhere in the program. Do I have to mess around
AP> with Scary Things, like the symbol table?
you can use the {CODE} thingy to get the ref from the typeglob. that
will tell you if the sub exists or not.
perl -e 'sub r{1} ; $f = *{"r"}{CODE} ; print "$f\n"'
CODE(0x10442c)
perl -e ' $f = *{"r"}{CODE} ; print "$f\n"'
prints an empty line
but as you know symrefs are evil, just use a dispatch table. you can
predefine the supported subs with code refs and not worry about odd
things like {CODE}.
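a minimal sketch of such a dispatch table. the command names and run_command are made up for illustration, they are not from the original code:

```perl
use strict;
use warnings;

# The supported subs are listed up front as code refs, so checking whether
# a name is valid is just a hash lookup.
my %dispatch = (
    start => sub { return "starting" },
    stop  => sub { return "stopping" },
);

# Hypothetical driver: look the name up and call it, or report that it
# is not supported.
sub run_command {
    my ($name, @args) = @_;
    my $handler = $dispatch{$name}
        or return "unknown command: $name";
    return $handler->(@args);
}

print run_command('start'), "\n";    # starting
print run_command('reboot'), "\n";   # unknown command: reboot
```

no symbolic refs are involved, so this runs under strict without any special cases.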
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Sun, 17 Aug 2003 12:15:40 -0700
From: Arvin Portlock <apollock11@hotmail.com>
Subject: Re: Testing whether a subroutine exists (symbolic ref)
Message-Id: <bhok8u$653$1@agate.berkeley.edu>
Richard Voss wrote:
> Arvin Portlock wrote:
>
> > I'm having a look at some old code which uses symbolic subroutine
> > references (I know, A Very Bad Thing). I simply want to test whether
> > the named subroutine actually exists in the code. This simple pro-
> > gram:
> >
> > my $sub_name = "mySub";
> > my $sub_ref = sub { &{"$sub_name"} };
>
>
> Here you create an anonymous subroutine and assign it (a reference to
> it) to $sub_ref. The subroutine itself calls, via symref, a subroutine,
> but that does not have any influence on what $sub_ref contains.
>
> > if ($sub_ref) {
> > print "It exists\n";
> > } else {
> > print "It doesn't exist\n";
> > }
> >
> > always prints "It exists!" even though there's no subroutine
> > called mySub anywhere in the program.
>
>
> of course. $sub_ref contains a subref and is absolutely unaware what
> code it points to.
>
> try
>
> if( defined &{ $sub_name } )
>
> that's ok with strict refs, too.
That works! Thank you. In fact it's a FAQ. I kept trying
if (defined (\&$sub_name)), don't know why I wanted to put
the backslash in there. Thanks for taking the time to set
me straight.
------------------------------
Date: Sat, 19 Jul 2003 01:59:56 GMT
From: Bob Walton <bwalton@rochester.rr.com>
Subject: Re:
Message-Id: <3F18A600.3040306@rochester.rr.com>
Ron wrote:
> Tried this code get a server 500 error.
>
> Anyone know what's wrong with it?
>
> if $DayName eq "Select a Day" or $RouteName eq "Select A Route") {
(---^
> dienice("Please use the back button on your browser to fill out the Day
> & Route fields.");
> }
...
> Ron
...
--
Bob Walton
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 5380
***************************************