[32591] in Perl-Users-Digest
Perl-Users Digest, Issue: 3863 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jan 18 00:09:25 2013
Date: Thu, 17 Jan 2013 21:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Thu, 17 Jan 2013 Volume: 11 Number: 3863
Today's topics:
Re: close_on_exec in Perl : close socket opened in pare <rweikusat@mssgmbh.com>
Re: close_on_exec in Perl : close socket opened in pare <rweikusat@mssgmbh.com>
Re: close_on_exec in Perl : close socket opened in pare <ben@morrow.me.uk>
Re: close_on_exec in Perl : close socket opened in pare <rweikusat@mssgmbh.com>
Re: close_on_exec in Perl : close socket opened in pare <ben@morrow.me.uk>
Re: close_on_exec in Perl : close socket opened in pare <ben@morrow.me.uk>
Re: close_on_exec in Perl : close socket opened in pare <derykus@gmail.com>
Re: close_on_exec in Perl : close socket opened in pare <chen.yack@gmail.com>
Interpolating hash (Seymour J.)
Re: Interpolating hash <derykus@gmail.com>
Re: Interpolating hash <bjoern@hoehrmann.de>
Re: Interpolating hash <derykus@gmail.com>
Re: Interpolating hash <ben@morrow.me.uk>
Re: Regular expression for BOM required <bugbear@trim_papermule.co.uk_trim>
Re: Regular expression for BOM required <hjp-usenet2@hjp.at>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Thu, 17 Jan 2013 13:21:30 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <877gnceyhx.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth 陈云星 <chen.yack@gmail.com>:
[...]
>> #!/bin/env perl
>> use Linux::Inotify2;
>> use Modern::Perl;
>> use Mojo::IOLoop;
>>
>> $^F=0;
>
> Don't do that. If you should manage to end up with fds less than 2 with
> the close-on-exec bit set, you will probably confuse any process you
> exec.
That's sort of an understatement: The way UNIX(*) 'file descriptor
creating' calls work is that a newly created file descriptor will
always use the lowest available file descriptor number. This means
that if file descriptor 2 ('stderr') is closed, the next file
descriptor created by a program running in this environment (assuming
0 and 1 are still open) will have the number 2 and anything which
writes 'to the standard error output' in the program will henceforth
use this file descriptor. It is very likely that the code of the
program wasn't written with this possibility in mind and that the
'thing' which is now referred to by the former stderr descriptor is
actually supposed to be used for something completely different, eg, a
TCP connection to some remote server who doesn't expect random
diagnostic output but communication conforming to some kind of
application protocol.
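
The lowest-available-number rule can be observed from Perl itself. The following is only a sketch: the helper name fd_after_closing_stderr is made up for illustration, and the experiment runs in a forked child so that the caller's own stderr is left alone. Assuming descriptors 0 and 1 are open in the child, the reported number should be 2.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: in a forked child, close STDERR, create a new
# descriptor, and report its number to the parent through a pipe.
sub fd_after_closing_stderr {
    my $pid = open(my $from_child, '-|');
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: STDOUT is the pipe to the parent.
        close(STDERR);                                  # frees descriptor 2
        open(my $fh, '>', '/dev/null') or POSIX::_exit(1);
        syswrite(STDOUT, fileno($fh));                  # unbuffered write
        POSIX::_exit(0);                                # skip END blocks
    }

    my $fd = <$from_child>;
    close($from_child);
    return $fd;
}

print "new descriptor number: ", fd_after_closing_stderr(), "\n";
```

Anything the child later wrote to 'stderr' would end up in whatever that new descriptor refers to, which is the hazard described above.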
------------------------------
Date: Thu, 17 Jan 2013 13:43:16 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <8738y0exhn.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth 陈云星 <chen.yack@gmail.com>:
>> I open a socket in my main perl program,
>> than i execute a shell script :'/home/admin/t.sh', when i use CTRL-C to
>> interrupt the perl program, i saw that the port '4444' was already
>> openning by the shell script's program。
>>
>> so,how to close the socket fd when the `system` function was executed ?
>
> Well, normally the answer would be 'set the close-on-exec flag', but it
> is usually set by default on newly-opened filehandles since $^F is
> usually 2. That means Mojo::IOLoop is going out of its way to clear the
> flag, which suggests perhaps you should leave it that way.
Should this be the case, this 'suggests' that 'MoJo' is seriously
broken in this respect: A listening file descriptor which 'leaks'
through to an independent program in this way will prevent the actual
server from being restarted until this independent program has
terminated because the bind call in the server will fail with
EADDRINUSE for as long as it is still 'sitting' on this socket. That's
the kind of error which makes people reboot 24x7 'server computers' in
despair because 'some important program' can't be started and no one
understands why (even if someone understands why, 'go hunting for the
leaked descriptor to terminate the offending process' may be seriously
onerous in a sufficiently hostile environment[*]).
[*] A 'sufficiently hostile environment' I remember would be the
'NS-BSD' BSD-based NetASQ firewall OS where this problem would
occasionally prevent restarting the process which provided 'the
GUI'. I ended up with writing a Perl script which correlated the fstat
and netstat output together (IIRC), based on the kernel addresses of
'tcp socket control blocks', this being the only way to determine
which process had a particular fd open in that environment. And I
wasn't really one of the support people supposed to deal with these
beasts, the problem only reached me because this device couldn't be
rebooted and everybody else was at his wits end (which is completely
ok for a non-programmer confronted with *this*).
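
The EADDRINUSE failure described above is easy to reproduce on the loopback interface. This is a sketch under assumptions, not the server from the thread: try_listen is a hypothetical helper, port 0 asks the kernel for any free port, and the first socket merely stands in for a listening descriptor that leaked into some other long-lived process.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM SOMAXCONN INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);
use Errno qw(EADDRINUSE);

# Hypothetical helper: try to create a listening TCP socket on 127.0.0.1.
# Returns (socket, 0) on success or (undef, errno) if bind() fails.
sub try_listen {
    my ($port) = @_;
    socket(my $sock, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
    unless (bind($sock, pack_sockaddr_in($port, INADDR_LOOPBACK))) {
        my $errno = $! + 0;    # save it before close() can change $!
        close($sock);
        return (undef, $errno);
    }
    listen($sock, SOMAXCONN) or die "listen: $!";
    return ($sock, 0);
}

# Stand-in for the leaked descriptor: somebody still holds this socket.
my ($leaked) = try_listen(0);
my ($port)   = unpack_sockaddr_in(getsockname($leaked));

# The 'restarted server' now tries to bind the same port.
my (undef, $errno) = try_listen($port);
print "second bind on port $port failed with EADDRINUSE\n"
    if $errno == EADDRINUSE;
```

The second bind keeps failing for as long as any process holds the first socket open, whether or not that process ever uses it.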
------------------------------
Date: Thu, 17 Jan 2013 23:46:47 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <74fms9-4he1.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth 陈云星 <chen.yack@gmail.com>:
> >> I open a socket in my main perl program,
> >> than i execute a shell script :'/home/admin/t.sh', when i use CTRL-C to
> >> interrupt the perl program, i saw that the port '4444' was already
> >> openning by the shell script's program。
> >>
> >> so,how to close the socket fd when the `system` function was executed ?
> >
> > Well, normally the answer would be 'set the close-on-exec flag', but it
> > is usually set by default on newly-opened filehandles since $^F is
> > usually 2. That means Mojo::IOLoop is going out of its way to clear the
> > flag, which suggests perhaps you should leave it that way.
>
> Should this be the case, this 'suggests' that 'MoJo' is seriously
Mojo. I don't like the name much, but that's no reason to misspell it.
> broken in this respect: A listening file descriptor which 'leaks'
> through to an independent program in this way will prevent the actual
> server from being restarted until this independent program has
> terminated because the bind call in the server will fail with
> EADDRINUSE for as long as it is still 'sitting' on this socket.
I wouldn't call it 'broken'. It is no longer usual for Perl programs of
the sort which use frameworks like Mojo to exec external programs, so
the question of random programs inheriting your fds shouldn't normally
arise. It looks as though Mojo clears the CLOEXEC bit because under some
circumstance your program will run as a long-running daemon, and it
periodically re-execs itself to start from a clean slate.
> [*] A 'sufficiently hostile environment' I remember would be the
> 'NS-BSD' BSD-based NetASQ firewall OS where this problem would
> occasionally prevent restarting the process which provided 'the
> GUI'. I ended up with writing a Perl script which correlated the fstat
> and netstat output together (IIRC), based on the kernel addresses of
> 'tcp socket control blocks', this being the only way to determine
> which process had a particular fd open in that environment. And I
> wasn't really one of the support people supposed to deal with these
> beasts, the problem only reached me because this device couldn't be
> rebooted and everybody else was at his wits end (which is completely
> ok for a non-programmer confronted with *this*).
I don't know; seems like fairly ordinary sysadmin stuff to me... (I
assume this was not one of the BSDs with sockstat(8)?)
Ben
------------------------------
Date: Fri, 18 Jan 2013 00:57:12 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <874nifmhp3.fsf@sapphire.mobileactivedefense.com>
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
[...]
>> > Well, normally the answer would be 'set the close-on-exec flag', but it
>> > is usually set by default on newly-opened filehandles since $^F is
>> > usually 2. That means Mojo::IOLoop is going out of its way to clear the
>> > flag, which suggests perhaps you should leave it that way.
>>
>> Should this be the case, this 'suggests' that 'MoJo' is seriously
>> broken in this respect: A listening file descriptor which 'leaks'
>> through to an independent program in this way will prevent the actual
>> server from being restarted until this independent program has
>> terminated because the bind call in the server will fail with
>> EADDRINUSE for as long as it is still 'sitting' on this socket.
>
> I wouldn't call it 'broken'. It is no longer usual for Perl programs of
> the sort which use frameworks like Mojo to exec external programs, so
> the question of random programs inheriting your fds shouldn't normally
> arise.
Is this documented? If so, it's a misfeature. If not, it's another
case of "it works on my laptop !!1" (and - surely - no one would ever
think of doing anything I NEVER do !!2), IOW, it's broken and the
people who wrote the perl code knew better.
> It looks as though Mojo clears the CLOEXEC bit because under some
> circumstance your program will run as a long-running daemon, and it
> periodically re-execs itself to start from a clean slate.
If this is the case, it looks suspiciously like "the Mojo code
contains serious bugs the developers neither understand nor care about
and they thought periodically invoking exec would be a great
workaround even if this means that 'exec without precautions' will be
broken for everyone else despite the perl documentation claims that
perl deals with this without application programmers having to worry
about that". In particular, this suggests that 'Mojo' leaks memory.
------------------------------
Date: Fri, 18 Jan 2013 01:47:19 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <76mms9-9pf1.ln1@anubis.morrow.me.uk>
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> >> Ben Morrow <ben@morrow.me.uk> writes:
>
> >> > Well, normally the answer would be 'set the close-on-exec flag', but it
> >> > is usually set by default on newly-opened filehandles since $^F is
> >> > usually 2. That means Mojo::IOLoop is going out of its way to clear the
> >> > flag, which suggests perhaps you should leave it that way.
> >>
> >> Should this be the case, this 'suggests' that 'MoJo' is seriously
> >> broken in this respect: A listening file descriptor which 'leaks'
> >> through to an independent program in this way will prevent the actual
> >> server from being restarted until this independent program has
> >> terminated because the bind call in the server will fail with
> >> EADDRINUSE for as long as it is still 'sitting' on this socket.
> >
> > I wouldn't call it 'broken'. It is no longer usual for Perl programs of
> > the sort which use frameworks like Mojo to exec external programs, so
> > the question of random programs inheriting your fds shouldn't normally
> > arise.
>
> Is this documented? If so, it's a misfeature.
I don't disagree.
> > It looks as though Mojo clears the CLOEXEC bit because under some
> > circumstance your program will run as a long-running daemon, and it
> > periodically re-execs itself to start from a clean slate.
>
> If this is the case, it looks suspiciously like "the Mojo code
> contains serious bugs the developers neither understand nor care about
> and they thought periodically invoking exec would be a great
> workaround even if this means that 'exec without precautions' will be
> broken for everyone else despite the perl documentation claims that
> perl deals with this without application programmers having to worry
> about that". In particular, this suggests that 'Mojo' leaks memory.
I wouldn't express it like that, but, again, I don't disagree.
Ben
------------------------------
Date: Thu, 17 Jan 2013 11:14:32 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <o13ls9-l691.ln1@anubis.morrow.me.uk>
Quoth 陈云星 <chen.yack@gmail.com>:
> I open a socket in my main perl program,
> than i execute a shell script :'/home/admin/t.sh', when i use CTRL-C to
> interrupt the perl program, i saw that the port '4444' was already
> openning by the shell script's program。
>
> so,how to close the socket fd when the `system` function was executed ?
Well, normally the answer would be 'set the close-on-exec flag', but it
is usually set by default on newly-opened filehandles since $^F is
usually 2. That means Mojo::IOLoop is going out of its way to clear the
flag, which suggests perhaps you should leave it that way.
So, you have two options: set the flag (with fcntl) just before the
system and restore its previous value afterwards, or do the fork and
exec manually. In the latter case you can obviously close any
filehandles you like between fork and exec.
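
Both options can be sketched in a few lines. The helper names system_hiding_fd and spawn_closing are invented for illustration and are not part of Mojo or of any module; the handle passed in is assumed to be the listening socket (or any other open filehandle) that the external program must not inherit.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);
use POSIX ();

# Option 1 (hypothetical helper): set close-on-exec on $fh for the
# duration of one system() call, then put the previous flags back.
sub system_hiding_fd {
    my ($fh, @cmd) = @_;
    my $old = fcntl($fh, F_GETFD, 0);
    defined $old or die "F_GETFD failed: $!";
    $old += 0;                                   # "0 but true" becomes 0
    defined fcntl($fh, F_SETFD, $old | FD_CLOEXEC)
        or die "F_SETFD failed: $!";
    my $status = system(@cmd);
    defined fcntl($fh, F_SETFD, $old)
        or die "restoring the flags failed: $!";
    return $status;
}

# Option 2 (hypothetical helper): fork and exec by hand, closing the
# handle in the child only. Returns the child's pid; the caller is
# responsible for waitpid().
sub spawn_closing {
    my ($fh, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close($fh);
        exec(@cmd) or POSIX::_exit(127);         # exec failed
    }
    return $pid;
}
```

With the program from the original post, a call might look like system_hiding_fd($listen_socket, '/home/admin/t.sh'), where $listen_socket is an assumed name for however the listening handle is obtained from the framework.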
> #!/bin/env perl
> use Linux::Inotify2;
> use Modern::Perl;
> use Mojo::IOLoop;
>
> $^F=0;
Don't do that. If you should manage to end up with fds less than 2 with
the close-on-exec bit set, you will probably confuse any process you
exec. Unless you're doing something peculiar, fds 0, 1 and 2 are always
open, so there's no need to set $^F lower than 2.
In any case, you should localise any global if you change it, and you
should be rather careful about calling external code (like Mojo) with
important globals set to non-standard values.
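
As a sketch of what localising looks like in practice: the helper names below are hypothetical, and the value 10_000 is just an arbitrary number larger than any descriptor the process is likely to hold. Perl decides the close-on-exec status when the handle is opened, so $^F only needs to differ for the duration of that one open.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD FD_CLOEXEC);

# Returns 1 if the close-on-exec flag is set on the handle, else 0.
sub has_cloexec {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFD, 0);
    defined $flags or die "F_GETFD failed: $!";
    return ($flags & FD_CLOEXEC) ? 1 : 0;
}

# Hypothetical helper: open one file whose descriptor is meant to be
# inherited, without changing $^F for any code that runs afterwards.
sub open_inheritable {
    my ($path) = @_;
    local $^F = 10_000;    # arbitrary large value; restored on return
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    return $fh;
}

my $shared = open_inheritable('/dev/null');
open(my $private, '<', '/dev/null') or die "Cannot open /dev/null: $!";

printf "shared cloexec: %d, private cloexec: %d, \$^F afterwards: %d\n",
    has_cloexec($shared), has_cloexec($private), $^F;
```

Because the change is made with local, code called later (Mojo included) sees the usual value of $^F again.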
> Mojo::IOLoop->server({
>     port => 4444,
> },sub{
>     my ($stream,$chunk) = @_;
>     $stream->write('HTTP/1.1 200 OK');
>     print " I am server \n";
> });
>
> system('/home/admin/t.sh &>/dev/null &');
Careful with that syntax. &> is a bashism (csh spells it >&), and /bin/sh
doesn't necessarily understand it.
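
A spelling that every POSIX shell accepts uses two ordinary redirections. A strictly POSIX sh reads 'cmd &>/dev/null &' as 'cmd &' followed by a separate, empty command with '>/dev/null', so the script starts in the background with its output not redirected at all. The helper name below is made up for illustration, and 'echo hello' stands in for the real script.

```perl
use strict;
use warnings;

# Hypothetical helper: append POSIX-only redirections so that stdout and
# stderr are discarded and the command runs in the background.
sub quiet_background {
    my ($command) = @_;
    return "$command >/dev/null 2>&1 &";
}

# system() with a single string hands the line to /bin/sh -c.
my $status = system(quiet_background('echo hello'));
print "shell exit status: ", $status >> 8, "\n";
```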
>
> Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
Ben
------------------------------
Date: Thu, 17 Jan 2013 13:29:29 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <a39dffd2-918f-4af7-80f6-00d34869ca09@googlegroups.com>
On Thursday, January 17, 2013 2:03:17 AM UTC-8, 陈云星 wrote:
> I open a socket in my main perl program,
> than i execute a shell script :'/home/admin/t.sh', when i use CTRL-C to
> interrupt the perl program, i saw that the port '4444' was already
> openning by the shell script's program。
>
> so,how to close the socket fd when the `system` function was executed ?
>
> the $^F variable is unuseful.
>
> very thanks !
>
> #!/bin/env perl
> use Linux::Inotify2;
> use Modern::Perl;
> use Mojo::IOLoop;
>
> $^F=0;
> Mojo::IOLoop->server({
>     port => 4444,
> },sub{
>     my ($stream,$chunk) = @_;
>     $stream->write('HTTP/1.1 200 OK');
>     print " I am server \n";
> });
>
> system('/home/admin/t.sh &>/dev/null &');
>
> Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
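Other options would definitely be preferable, but if
you need to close only this particular descriptor,
you could close it in the shell, e.g.

my $fd = fileno($some_socket_stream);
system("exec $fd<&-; /home/admin/t.sh ....");

(There must be no space between the descriptor number and <&-,
otherwise the shell tries to exec a command named after the number.)
--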
Charles DeRykus
------------------------------
Date: Thu, 17 Jan 2013 18:42:36 -0800 (PST)
From: 陈云星 <chen.yack@gmail.com>
Subject: Re: close_on_exec in Perl : close socket opened in parent process when fork child , is not working
Message-Id: <c76e08cb-f8df-42f8-a76c-db13aeea4a7e@googlegroups.com>
You are right. Using the following code and watching /proc/2312/fd, I find
that the shell script does not hold any socket fd.
So this should be Mojolicious's problem.

$ cat t.pl
#!/bin/env perl
use Modern::Perl;
use IO::Socket; # or Socket;
$^F=2;
# server
my $server_port = 4444;
my $server = IO::Socket::INET->new(
    LocalPort => $server_port,
    Type      => SOCK_STREAM,
    Reuse     => 1,
    Listen    => 10)
    or die "Couldn't be a tcp server on port $server_port:$!\n";
my $client;
while($client = $server->accept()){
    say 'client connected ...';
    system('/home/admin/t.sh &>/dev/null &');
}
On Thursday, 17 January 2013 at 7:14:32 PM UTC+8, Ben Morrow wrote:
> Quoth 陈云星 <chen.yack@gmail.com>:
> > I open a socket in my main perl program,
> > than i execute a shell script :'/home/admin/t.sh', when i use CTRL-C to
> > interrupt the perl program, i saw that the port '4444' was already
> > openning by the shell script's program。
> >
> > so,how to close the socket fd when the `system` function was executed ?
>
> Well, normally the answer would be 'set the close-on-exec flag', but it
> is usually set by default on newly-opened filehandles since $^F is
> usually 2. That means Mojo::IOLoop is going out of its way to clear the
> flag, which suggests perhaps you should leave it that way.
>
> So, you have two options: set the flag (with fcntl) just before the
> system and restore its previous value afterwards, or do the fork and
> exec manually. In the latter case you can obviously close any
> filehandles you like between fork and exec.
>
> > #!/bin/env perl
> > use Linux::Inotify2;
> > use Modern::Perl;
> > use Mojo::IOLoop;
> >
> > $^F=0;
>
> Don't do that. If you should manage to end up with fds less than 2 with
> the close-on-exec bit set, you will probably confuse any process you
> exec. Unless you're doing something peculiar, fds 0, 1 and 2 are always
> open, so there's no need to set $^F lower than 2.
>
> In any case, you should localise any global if you change it, and you
> should be rather careful about calling external code (like Mojo) with
> important globals set to non-standard values.
>
> > Mojo::IOLoop->server({
> >     port => 4444,
> > },sub{
> >     my ($stream,$chunk) = @_;
> >     $stream->write('HTTP/1.1 200 OK');
> >     print " I am server \n";
> > });
> >
> > system('/home/admin/t.sh &>/dev/null &');
>
> Careful with that syntax. &> is a bashism (csh spells it >&), and /bin/sh
> doesn't necessarily understand it.
>
> >
> > Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
>
> Ben
------------------------------
Date: Thu, 17 Jan 2013 14:15:35 -0500
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Interpolating hash
Message-Id: <50f84dd7$12$fuzhry+tra$mr2ice@news.patriot.net>
I would like to produce a message that includes the value of a hash.
Perl does not recognize a % sigil inside of quoted strings. I recall
that there is syntax that causes interpolation of an expression, but
haven't been able to find it in the documentation. Is there a clean
way to do this, or should I just piece the text together with the
concatenation operator, or interpolate an intermediate variable? Of
the available techniques, which is considered to be the best style?
Thanks.
--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spamtrap@library.lspace.org
------------------------------
Date: Thu, 17 Jan 2013 11:58:18 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Interpolating hash
Message-Id: <da62318f-adff-4c91-b05b-1103347d9290@googlegroups.com>
On Thursday, January 17, 2013 11:15:35 AM UTC-8, Seymour J. Shmuel Metz wrote:
> I would like to produce a message that includes the value of a hash.
>
> Perl does not recognize a % sigil inside of quoted strings. I recall
>
> that there is syntax that causes interpolation of an expression, but
>
> haven't been able to find it in the documentation. Is there a clean
>
> way to do this, or should I just piece the text together with the
>
> concatenation operator, or interpolate an intermediate variable? Of
>
> the available techniques, which is considered to be the best style?
>
>
maybe you're thinking of @{[]} , eg.
%h=(a=1,b=>2);
say "hash: @{[%h]}";
--
Charles DeRykus
------------------------------
Date: Thu, 17 Jan 2013 21:01:00 +0100
From: Bjoern Hoehrmann <bjoern@hoehrmann.de>
Subject: Re: Interpolating hash
Message-Id: <21mgf89dn72s971h6s9ao7mccj08l5424h@hive.bjoern.hoehrmann.de>
* Shmuel wrote in comp.lang.perl.misc:
>I would like to produce a message that includes the value of a hash.
>Perl does not recognize a % sigil inside of quoted strings. I recall
>that there is syntax that causes interpolation of an expression, but
>haven't been able to find it in the documentation. Is there a clean
>way to do this, or should I just piece the text together with the
>concatenation operator, or interpolate an intermediate variable? Of
>the available techniques, which is considered to be the best style?
You do not need `%` to refer to a value in a hash in Perl5,
my %hash = ('key' => 'value');
print "$hash{key}";
would print `value`.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
------------------------------
Date: Thu, 17 Jan 2013 12:01:01 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Interpolating hash
Message-Id: <f4120e38-9af2-4b2a-a144-ca47216c11bb@googlegroups.com>
On Thursday, January 17, 2013 11:58:18 AM UTC-8, C.DeRykus wrote:
> On Thursday, January 17, 2013 11:15:35 AM UTC-8, Seymour J. Shmuel Metz wrote:
>
>> ..
>
> ...
>
> %h=(a=1,b=>2);
       ^
       =>
--
Charles DeRykus
------------------------------
Date: Thu, 17 Jan 2013 22:44:48 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Interpolating hash
Message-Id: <0gbms9-k2e1.ln1@anubis.morrow.me.uk>
Quoth Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>:
> I would like to produce a message that includes the value of a hash.
'Value' how? As a $"-separated list of undifferentiated keys and values,
or in some other form?
> Perl does not recognize a % sigil inside of quoted strings. I recall
> that there is syntax that causes interpolation of an expression, but
> haven't been able to find it in the documentation.
It isn't in the documentation because it isn't a syntax as such, more of
a hack. The usual form looks like "@{ [ %hash ] }", which builds an anon
array from a list and then immediately dereferences it and interprets
the result. (Don't be tempted to use the "${\(...)}" form, since \()
gives list context to its inside, which is confusing.)
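
A short sketch of both forms, using a throwaway hash and a hypothetical context() sub that reports how it was called. The keys are sorted only to make the output predictable, since hash order is not.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2);

# "@{[ LIST ]}" evaluates LIST, wraps it in an anonymous array and
# interpolates that array, joining the elements with $" (a space by default).
my $pairs = "hash: @{[ map { $_ . '=' . $hash{$_} } sort keys %hash ]}";

# Reports the context in which it is called.
sub context { return wantarray ? 'list' : 'scalar' }

my $via_array  = "@{[ context() ]}";          # list context, as one expects
my $via_scalar = "${\( context() )}";         # list context here as well
my $forced     = "${\ scalar(context()) }";   # scalar only when forced

print "$pairs\n$via_array $via_scalar $forced\n";
```

The middle case is the confusing one: the "${\(...)}" form looks as if it should produce a scalar, yet the sub is still called in list context.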
> Is there a clean
> way to do this, or should I just piece the text together with the
> concatenation operator, or interpolate an intermediate variable? Of
> the available techniques, which is considered to be the best style?
I would consider either of those alternatives cleaner than that hack. Of
course, "..." . %foo . "..." will give you the scalar value of the hash,
which is a not-particularly-useful string; you need something more like
"..." . join($", %foo) . "...", or to use a module like Data::Dump which
pretty-prints.
Increasingly I find myself preferring sprintf when interpolating
expressions into strings.
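
For comparison, a sketch of the two cleaner alternatives, with a made-up %foo. The keys are sorted because hash order is not predictable.

```perl
use strict;
use warnings;

my %foo = (apple => 3, pear => 5);

# Concatenation with an explicit join on $" (the list separator).
my $joined = "fruit: "
    . join($", map { $_ . ' ' . $foo{$_} } sort keys %foo)
    . ".";

# sprintf keeps the expressions out of the string altogether.
my $formatted = sprintf("%d kinds of fruit, first is %s (%d)",
    scalar(keys %foo), (sort keys %foo)[0], $foo{apple});

print "$joined\n$formatted\n";
```

Data::Dump, mentioned above, is a CPAN module; the core Data::Dumper module is a similar pretty-printer if installing modules is not an option.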
Ben
------------------------------
Date: Thu, 17 Jan 2013 12:16:38 +0000
From: bugbear <bugbear@trim_papermule.co.uk_trim>
Subject: Re: Regular expression for BOM required
Message-Id: <2KGdnZf7UYw6dmrNnZ2dnUVZ8vqdnZ2d@brightview.co.uk>
Peter J. Holzer wrote:
> On 2013-01-14 10:12, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
>> Peter Gordon wrote:
>>> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in news:slrnkf30s7.kis.hjp-
>>> usenet2@hrunkner.hjp.at:
>>>
>>>> You want to match the single character U+FEFF BOM here, not a sequence
>>>> of two characters U+00FE LATIN SMALL LETTER THORN U+00FF LATIN SMALL
>>>> LETTER Y WITH DIAERESIS.
>>>>
>>>> So you have to write
>>>>
>>>> say "Found regular expression" if /\x{FEFF}/;
>>>>
>>>> print;
>>>> }
>>>>
>>> Thanks Peter,
>>> It was the curly braces which I was missing.
>>>
>>
>> Presumably you also have to check for the "other order" ?
>
> No. After decoding there is no byte order any more, just characters, and
> the character you want to match is \x{FEFF}.
>
> If you try to open a big-endian file with :encoding(utf16le), the script
> will die trying to read the first line.
>
> (If you open it with :encoding(utf16), the BOM will be used to determine
> endianness and *not* passed through - this seems a little inconsistent
> to me)
I had (perhaps wrongly) assumed that the OP's true intent (or need)
was to read the BOM and use it to decide *which* byte order
was being used, and hence to use the correct decoder.
BugBear
------------------------------
Date: Thu, 17 Jan 2013 16:32:06 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: Regular expression for BOM required
Message-Id: <slrnkfg6bm.236.hjp-usenet2@hrunkner.hjp.at>
On 2013-01-17 12:16, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
> Peter J. Holzer wrote:
>> On 2013-01-14 10:12, bugbear <bugbear@trim_papermule.co.uk_trim> wrote:
>>> Peter Gordon wrote:
>>>> "Peter J. Holzer" <hjp-usenet2@hjp.at> wrote in news:slrnkf30s7.kis.hjp-
>>>> usenet2@hrunkner.hjp.at:
[$_ was read from a file opened with ":encoding(utf16le)"]
>>>>> say "Found regular expression" if /\x{FEFF}/;
[...]
>>> Presumably you also have to check for the "other order" ?
>>
>> No. After decoding there is no byte order any more, just characters, and
>> the character you want to match is \x{FEFF}.
>>
>> If you try to open a big-endian file with :encoding(utf16le), the script
>> will die trying to read the first line.
>>
>> (If you open it with :encoding(utf16), the BOM will be used to determine
>> endianness and *not* passed through - this seems a little inconsistent
>> to me)
>
> I had (perhaps wrongly) assumed that the OP's true intent (or need)
> was to read the BOM and use it to decide *which* byte order
> was being used, and hence to use the correct decoder.
If that was the intent of the OP, opening the file in one byte order and
checking for a reversed BOM wouldn't work: The diamond operator dies
when it encounters the wrong BOM (of course you could catch the
exception and then try the other endianness).
I think there are two good ways to open UTF-16 files with unknown byte
order:
1) The carefree method: Just use :encoding(utf16), and it will
automatically determine the endianness from the BOM, and you don't
have to care whether the file is little or big endian. Plus, the BOM
is automatically filtered out so you don't have to. On the flipside,
you lose the information about the endianness and the BOM, so if you
need that, this isn't for you.
2) Open the file in binary mode and read the first few bytes. Determine
the correct encoding from those, rewind and set the encoding layer.
This is more work, but a lot more flexible: You can detect any
encoding you want.
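
Method 2 might look roughly like the sketch below. The helper name open_by_bom and the fallback-encoding parameter are assumptions made for illustration. As a small variation on the description above, the sketch rewinds to just past the BOM rather than to the start of the file, so the BOM is not passed through to the reader. It recognises only the UTF-16 and UTF-8 BOMs; UTF-32LE, whose BOM also begins with FF FE, is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path for reading, choose the decoding layer
# from a leading BOM, and fall back to $default when there is none.
# Returns the filehandle and the name of the encoding that was chosen.
sub open_by_bom {
    my ($path, $default) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";

    # Look at the first bytes while the handle is still in binary mode.
    my $head = '';
    defined(read($fh, $head, 3)) or die "Cannot read $path: $!";

    my ($encoding, $bom_length) =
          $head =~ /\A\xFF\xFE/     ? ('UTF-16LE', 2)
        : $head =~ /\A\xFE\xFF/     ? ('UTF-16BE', 2)
        : $head =~ /\A\xEF\xBB\xBF/ ? ('UTF-8',    3)
        :                             ($default,   0);

    # Rewind to just past the BOM, then decode everything that follows.
    seek($fh, $bom_length, 0) or die "Cannot seek in $path: $!";
    binmode($fh, ":encoding($encoding)") or die "Cannot set layer: $!";
    return ($fh, $encoding);
}
```

A caller that needs to know the byte order keeps the returned encoding name, which is exactly the information the carefree method throws away.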
As always, there are probably more ways to do it.
hp
--
_ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
|_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
| | | hjp@hjp.at | die Satzbestandteile des Satzes nicht mehr
__/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3863
***************************************