[32777] in Perl-Users-Digest
Perl-Users Digest, Issue: 4041 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Sep 23 14:14:36 2013
Date: Mon, 23 Sep 2013 11:14:08 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Mon, 23 Sep 2013 Volume: 11 Number: 4041
Today's topics:
Re: utilities in perl <ben@morrow.me.uk>
Re: utilities in perl (Tim McDaniel)
Re: utilities in perl <ben@morrow.me.uk>
Re: utilities in perl <hjp-usenet3@hjp.at>
Re: utilities in perl <rweikusat@mobileactivedefense.com>
Re: utilities in perl <justin.1303@purestblue.com>
Re: utilities in perl (hymie!)
Re: utilities in perl (Tim McDaniel)
Re: utilities in perl <ben@morrow.me.uk>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 22 Sep 2013 21:16:38 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <6qv3ha-t94.ln1@anubis.morrow.me.uk>
Quoth Cal Dershowitz <Cal@example.invalid>:
>
> Do germans typically have directories for their own stuff that have
> german encodings like:
>
> /home/jue/Documents/Persoenlichkeiten/
>
> , where you have an actual o umlaut as opposed to the english
> transcription? Maybe even the u too.
The answer to that is rather complicated :).
As far as the kernel is concerned, paths are just sequences of bytes.
Byte 47 (/) is treated as a directory delimiter, and byte 0 is forbidden
(since the kernel interfaces use C strings it's actually impossible to
pass a 0 byte). Any further interpretation of those bytes as a sequence
of characters is left entirely up to the application.
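A minimal sketch of that rule, assuming nothing beyond what is said above: the only bytes that disqualify a name are 47 and 0. The helper name is_valid_component is made up for illustration, and length limits are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name (a byte string) could serve as a
# single path component. Only "/" (byte 47) and NUL (byte 0) are ruled
# out; no character set is assumed. Length limits are not checked.
sub is_valid_component {
    my ($name) = @_;
    return 0 if !defined $name || $name eq '';
    return $name !~ m{[/\0]} ? 1 : 0;
}

# The same German word as UTF-8 bytes and as Latin-1 bytes, plus two
# names containing the bytes the kernel treats specially.
for my $name ("Pers\xC3\xB6nlichkeiten", "Pers\xF6nlichkeiten", "a/b", "a\0b") {
    # Show non-printable bytes as \xNN so the output stays plain ASCII.
    (my $shown = $name) =~ s{([^\x20-\x7E])}{sprintf('\\x%02X', ord($1))}ge;
    printf "%-28s %s\n", $shown, is_valid_component($name) ? 'ok' : 'rejected';
}
```

Both spellings of the directory name pass, because at this level they are simply two different byte sequences.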
Traditionally, most apps would completely ignore the issue of filename
charsets[0], so a user typed text into a terminal, the terminal
converted it into bytes and sent it to the app, the app sent it to the
kernel unmodified, some other app got it back and sent it back to the
terminal, and the user saw the same text as before. The net effect of
this was that filenames were encoded using whatever charset the user's
terminal used; apps which needed to convert filenames into characters
(for instance, graphical file browsers, which have to do their own font
selection) would assume that the user had set their locale environment
variables (LC_ALL, LC_CTYPE or LANG) to a locale using the right character set.
This worked after a fashion, but only if users never changed their
locale settings and different users trying to share files always used
the same charset. As a result the convention has (partially) changed:
these days, many apps (particularly graphical apps) will assume
filenames are always in UTF-8, regardless of the locale settings. This
is not universal; in particular, command-line programs like ls often
still assume filenames are in the locale's charset.
Perl on Unix currently behaves the same as the kernel: the strings you
pass to 'open' and get back from 'readdir' are strings of bytes, and
deciding whether to interpret them as UTF-8 or something else is up to
you. If the user is using a UTF-8 locale it's probably safe to assume
filenames are also in UTF-8 (though don't assume they will be valid);
otherwise, they may be UTF-8, they may be the locale charset, or they
may be something else entirely.
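One way to act on this, as a rough sketch rather than a recommendation: try a strict UTF-8 decode first and fall back to a single-byte charset if the bytes are not valid UTF-8. The function name filename_to_text and the ISO-8859-1 fallback are assumptions made for the example; a real program would pick the fallback from the locale.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn the bytes returned by readdir into characters.
# Tries strict UTF-8 first; if the bytes are not valid UTF-8, decodes
# them with $fallback instead (ISO-8859-1 is just an assumed default).
sub filename_to_text {
    my ($bytes, $fallback) = @_;
    $fallback //= 'ISO-8859-1';
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # it from modifying $bytes while it works.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;
    return decode($fallback, $bytes);
}

# readdir hands back byte strings; length() shows bytes versus characters.
opendir(my $dh, '.') or die "opendir: $!";
for my $bytes (grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
    my $text = filename_to_text($bytes);
    printf "%d bytes, %d characters\n", length($bytes), length($text);
}
closedir $dh;
```

The fallback can still guess wrong: bytes that happen to be valid UTF-8 are always read as UTF-8, whatever the creator of the file intended.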
I use Unicode in filenames for my own stuff when I have a reason to, and
it doesn't cause me any problems; but I have set things up so I have a
UTF-8 terminal, a UTF-8 locale and a file manager that uses UTF-8. I
wouldn't want to rely on it in general.
Ben
[0] I am using 'charset' in the MIME sense of 'a mapping from human-
readable text to sequences of bytes, and back again'. The Unicode
people make a lot of careful distinctions between 'character sets'
and 'character encodings' and 'transformation formats' and so on,
none of which matter here.
------------------------------
Date: Mon, 23 Sep 2013 04:17:32 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: utilities in perl
Message-Id: <l1ofcs$i38$1@reader1.panix.com>
In article <acq3ha-rr2.ln1@anubis.morrow.me.uk>,
Ben Morrow <ben@morrow.me.uk> wrote:
>
>Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
>>
>> To summarize:
>>
>> On startup, a program is provided with three sets of information:
>>
>> 1) The argument vector: This is an array of strings containing the
>> command name and any "command line" arguments, i.e. the arguments
>> you type on the command line after the command (interactively), or
>> the arguments to exec (in a program). (Perl is a bit unusual in that
>> it shoves the first argument (the command name) into $0 and only the
>> rest of the arguments into @ARGV).
>
>No, the first argument (the command name) goes into $^X, the first
>non-option argument goes into $0, and the rest of the arguments go into
>@ARGV.
I found your answer confusing. When I type a command line, like just
now with
$ chmod u+x local/test/106.pl
$ local/test/106.pl hello world
$0 was 'local/test/106.pl', as I expected, which was what I was
thinking of as the "command name", and I was thinking of "hello" as
the "first non-option argument".
However, the first line of the script was
#! /usr/bin/perl
and $^X was output as '/usr/bin/perl'.
So I think the explanation should be expanded. In UNIXy systems, for
a script that starts with #! and run from the command line, the
program on the #! line is put into $^X, and in particular, if it's a
Perl script, $^X is the perl program being run. $0 is set using the
first word on the command line (identifying the script itself), and
the rest of the arguments are put into @ARGV.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Mon, 23 Sep 2013 06:38:55 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <fo05ha-mph1.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <acq3ha-rr2.ln1@anubis.morrow.me.uk>,
> Ben Morrow <ben@morrow.me.uk> wrote:
> >
> >No, the first argument (the command name) goes into $^X, the first
> >non-option argument goes into $0, and the rest of the arguments go into
> >@ARGV.
>
> I found your answer confusing. When I type a command line, like just
> now with
> $ chmod u+x local/test/106.pl
> $ local/test/106.pl hello world
> $0 was 'local/test/106.pl', as I expected, which was what I was
> thinking of as the "command name", and I was thinking of "hello" as
> the "first non-option argument".
>
> However, the first line of the script was
> #! /usr/bin/perl
> and $^X was output as '/usr/bin/perl'.
>
> So I think the explanation should be expanded. In UNIXy systems, for
> a script that starts with #! and run from the command line, the
> program on the #! line is put into $^X, and in particular, if it's a
> Perl script, $^X is the perl program being run. $0 is set using the
> first word on the command line (identifying the script itself), and
> the rest of the arguments are put into @ARGV.
You snipped the next paragraph, where I mentioned the kernel's #!
processing. This happens entirely inside the kernel (or in the shell on
*really* old systems), and the command line *as perl sees it* ends up as
/usr/bin/perl local/test/106.pl hello world
and the various bits are disposed of as I described. If you invoke perl
'normally' with the command-line above you get the same values in the
variables.
[I am of course omitting the step where the kernel invokes the PT_INTERP
entry in the perl binary, giving a final final command line starting
with /libexec/ld-elf.so.1 or /lib/ld-linux.so.2 or some such, but that
occurs before the start of perl's main() and so doesn't really count.]
Ben
------------------------------
Date: Mon, 23 Sep 2013 12:35:52 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: utilities in perl
Message-Id: <slrnl406c8.u7d.hjp-usenet3@hrunkner.hjp.at>
On 2013-09-22 18:43, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
>>
>> To summarize:
>>
>> On startup, a program is provided with three sets of information:
>>
>> 1) The argument vector: This is an array of strings containing the
>> command name and any "command line" arguments, i.e. the arguments
>> you type on the command line after the command (interactively), or
>> the arguments to exec (in a program). (Perl is a bit unusual in that
>> it shoves the first argument (the command name) into $0 and only the
>> rest of the arguments into @ARGV).
>
> No, the first argument (the command name) goes into $^X, the first
> non-option argument goes into $0, and the rest of the arguments go into
> @ARGV.
That's what I get from adding parenthetical remarks just before posting.
You are right of course, from the POV of the perl process. $0 and @ARGV
are handled as I described from the POV of the caller, but I didn't
write that.
> (Unless perl gets $^X from somewhere else, in which case the
> first argument is thrown away, or you pass an -e option, in which case
> $0 is "-e".)
>
> This is further confused by the kernel's (and perl's) #! processing, but
> by the time perl gets its final argument list to process the first
> argument is a path to perl itself.
Why "further confused"? The mechanism you describe is perl's attempt to
undo the effects of kernel's #! processing.
The caller invokes execl("/usr/local/bin/script", "script", "foo",
NULL),
the kernel finds "#!/usr/bin/perl" in "/usr/local/bin/script" and
invokes /usr/bin/perl with the argv ["/usr/bin/perl",
"/usr/local/bin/script", "foo"] instead (note that the original argv[0]
is lost in the process)
the perl interpreter then "hides" itself by putting what it thinks was
the original argv[0] into $0 and the original argv[1] .. argv[argc-1]
into @ARGV.
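The three steps can be modelled in a few lines. This is only a simplified model of the description above, not what the kernel or perl actually execute: the function names are made up, an optional argument on the #! line is ignored, and (as noted elsewhere in the thread) perl may obtain $^X by other means.

```perl
use strict;
use warnings;

# Step 2 (simplified model): the kernel replaces the caller's argv with
# interpreter + script path + remaining arguments. The caller's argv[0]
# is lost.
sub kernel_shebang_rewrite {
    my ($script_path, $interpreter, @caller_argv) = @_;
    shift @caller_argv;    # drop the original argv[0]
    return ($interpreter, $script_path, @caller_argv);
}

# Step 3 (simplified model): perl "hides" itself, presenting the script
# path as $0 and everything after it as @ARGV.
sub perl_view {
    my ($argv0, $script, @rest) = @_;
    return { interpreter => $argv0, script => $script, args => [@rest] };
}

# Step 1: the caller's execl("/usr/local/bin/script", "script", "foo", NULL)
my @argv = kernel_shebang_rewrite('/usr/local/bin/script', '/usr/bin/perl',
                                  'script', 'foo');
my $view = perl_view(@argv);
print "argv as perl receives it: @argv\n";
print "\$0 = $view->{script}, \@ARGV = (@{ $view->{args} })\n";
```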
> This is not really unusual: it's what all the shells do, and I'd wager
> also any other language which has some equivalent to $0.
>
>> 2) The environment: Another array of strings. By convention each program
>> passes this through to any programs it invokes and the strings are in
>> "key=value" format. This contains the PATH, locale information,
>> information about the terminal (if applicable) and other
>> configuration information.
>>
>> 3) A set of three file descriptors numbered 0, 1, and 2, and typically
>> called stdin, stdout, and stderr respectively in most programming
>> languages. These are *file descriptors*, not strings. You can read
>> from them (well, you should read only from stdin) with the read
>> system call (or higher level functions like getc() in C or <> in
>> Perl) and write to them (stdout and stderr, at least) with write (or
>> print or printf, etc.)
>
> In fact, a completely arbitrary set of file descriptors, which may or
> may not be contiguously numbered. It's entirely possible to invoke a
> program with one of the standard fds closed, though it's not a good idea
> since many programs misbehave.
Linux enforces that these three file descriptors are open, at least
for setuid programs, but I don't know offhand whether that's done
by the kernel or the startup code. And I am aware that this isn't true
for other unixes.
> It's also not uncommon to pass additional open file descriptors.
Yes, I should have written "at least three". You can always pass more,
and indeed some Unixes did pass a fd to the controlling terminal as file
descriptor 3 ("stdtty") by default.
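A program can check what it was actually given. The following is a minimal sketch using POSIX::dup as the probe; fd_is_open and open_fds are made-up names, and the upper bound of 15 is arbitrary.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: true if file descriptor $fd is currently open.
# dup() fails on a closed descriptor; on success the duplicate is
# closed again at once. (It could also fail if the process has run out
# of descriptors, which this sketch does not distinguish.)
sub fd_is_open {
    my ($fd) = @_;
    my $copy = POSIX::dup($fd);
    return 0 unless defined $copy;
    POSIX::close($copy);
    return 1;
}

# List the open descriptors in the range 0 .. $max.
sub open_fds {
    my ($max) = @_;
    return grep { fd_is_open($_) } 0 .. $max;
}

print "open descriptors: @{[ open_fds(15) ]}\n";
```

Started from a shell with something like `3</dev/null` appended to the command line, the list should include 3 in addition to whatever else was inherited.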
hp
--
_ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
|_|_) | | Man feilt solange an seinen Text um, bis
| | | hjp@hjp.at | die Satzbestandteile des Satzes nicht mehr
__/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
------------------------------
Date: Mon, 23 Sep 2013 11:45:55 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: utilities in perl
Message-Id: <87bo3j95p8.fsf@sable.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <acq3ha-rr2.ln1@anubis.morrow.me.uk>,
> Ben Morrow <ben@morrow.me.uk> wrote:
>>
>>Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
>>>
>>> To summarize:
>>>
>>> On startup, a program is provided with three sets of information:
>>>
>>> 1) The argument vector: This is an array of strings containing the
>>> command name and any "command line" arguments, i.e. the arguments
>>> you type on the command line after the command (interactively), or
>>> the arguments to exec (in a program). (Perl is a bit unusual in that
>>> it shoves the first argument (the command name) into $0 and only the
>>> rest of the arguments into @ARGV).
>>
>>No, the first argument (the command name) goes into $^X, the first
>>non-option argument goes into $0, and the rest of the arguments go into
>>@ARGV.
>
> I found your answer confusing. When I type a command line, like just
> now with
> $ chmod u+x local/test/106.pl
> $ local/test/106.pl hello world
> $0 was 'local/test/106.pl', as I expected, which was what I was
> thinking of as the "command name", and I was thinking of "hello" as
> the "first non-option argument".
That's probably how the shell invoked it but it need not be done in this
way. Assuming execl as an example, the general format of that is
execl("/path/to/file", "argument #0", ...);
the first argument to execl being the pathname of the file which is
supposed to be executed and the next being what ends up in argv[0]. By
convention, this should be 'the program name' and IIRC, POSIX even says
somewhere that it should really just be the name and not the
path. Assuming that /tmp/a.pl is the following perl script,
-----
#!/usr/bin/perl
print($^X, "\t", $0, "\t", $ARGV[0], "\n");
-----
this could be invoked via
-----
#include <unistd.h>

int main(void)
{
    /* argv[0] will be "Blafasel"; execl() returns only if it fails */
    execl("/tmp/a.pl", "Blafasel", "Are we having an argument?", (void *)0);
    return 0;
}
-----
and the output would be
-----
/usr/bin/perl /tmp/a.pl Are we having an argument?
-----
with the original 'program name' ("Blafasel") vanishing in the
process. It could also be called with
-----
#include <unistd.h>

int main(void)
{
    /* run perl directly, with "Now what?" as argv[0] */
    execl("/usr/bin/perl", "Now what?", "/tmp/a.pl", "Are we having an argument?", (void *)0);
    return 0;
}
-----
This will result in the same output on a system which supports
/proc/self/exe aka 'Linux' but in case perl has to resort to the real
'program name' argument, $^X should become "Now what?" (according to the
documentation).
------------------------------
Date: Mon, 23 Sep 2013 15:49:31 +0100
From: Justin C <justin.1303@purestblue.com>
Subject: Re: utilities in perl
Message-Id: <r016ha-uja.ln1@zem.masonsmusic.co.uk>
On 2013-09-22, Ben Morrow <ben@morrow.me.uk> wrote:
>
> Quoth Cal Dershowitz <Cal@example.invalid>:
>>
>> Do germans typically have directories for their own stuff that have
>> german encodings like:
>>
>> /home/jue/Documents/Persoenlichkeiten/
>>
>> , where you have an actual o umlaut as opposed to the english
>> transcription? Maybe even the u too.
>
> The answer to that is rather complicated :).
[snip]
Ben, you really need to get out more.
Justin.
--
Justin C, by the sea.
------------------------------
Date: 23 Sep 2013 15:27:36 GMT
From: hymie@lactose.homelinux.net (hymie!)
Subject: Re: utilities in perl
Message-Id: <52405de8$0$20630$862e30e2@ngroups.net>
In our last episode, the evil Dr. Lacto had captured our hero,
"Peter J. Holzer" <hjp-usenet3@hjp.at>, who said:
>On 2013-09-21 19:59, hymie! <hymie@lactose.homelinux.net> wrote:
>> In our last episode, the evil Dr. Lacto had captured our hero,
>> Cal Dershowitz <Cal@example.invalid>, who said:
>>>I don't want to spend too long talking about something where I clearly
>>>don't get it, but everyone else here does. I know this is a perl group,
>>>so C talk is OT.
>>>
>>>int main(int argc, char * argv)
>>>
>>>Do people still think these values don't come from STDIN in this context?
>>
>> STDIN means that a program that is already running has asked you a
>> question and is waiting for you to type in an answer.
>
>No, it doesn't mean that. Many programs reading from stdin never ask you
>any questions.
I was trying to simplify the situation for a user who, by his own
admission, doesn't get it.
>> It is possible, however, that one of the arguments you provide to
>> the program is - . That is a clue to the operating system that
>> "this argument should not read data from a pre-existing file, it should
>> read from STDIN."
>
>Also wrong. It's not a clue to the operating system, it is a clue to the
>program.
My mistake.
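For illustration, here is a minimal sketch of a program doing that check itself; open_input and count_lines are made-up names. The kernel never sees the "-": the program compares the argument and picks STDIN on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return a read handle for $name, treating the
# argument "-" as "read standard input". This decision is made here,
# in the program, not by the operating system.
sub open_input {
    my ($name) = @_;
    return \*STDIN if $name eq '-';
    open(my $fh, '<', $name) or die "Cannot open '$name': $!\n";
    return $fh;
}

# Count the lines available from $name (a file name or "-").
sub count_lines {
    my ($name) = @_;
    my $fh = open_input($name);
    my $n  = 0;
    while (my $line = <$fh>) { $n++ }
    return $n;
}

for my $arg (@ARGV) {
    printf "%s: %d lines\n", $arg, count_lines($arg);
}
```

Perl's own `<>` operator follows the same convention when it walks @ARGV.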
--hymie! http://lactose.homelinux.net/~hymie hymie@lactose.homelinux.net
-------------------------------------------------------------------------------
------------------------------
Date: Mon, 23 Sep 2013 16:08:55 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: utilities in perl
Message-Id: <l1pp2n$l9b$1@reader1.panix.com>
In article <87bo3j95p8.fsf@sable.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> I found your answer confusing. When I type a command line, ...
...
>That's probably how the shell invoked it but it need not be done in this
>way. Assuming execl as an example, ...
I was restricting myself to the shell, and in particular to my
*perception* of the command line, in particular the "program name" and
"first argument". Certainly exec.*() makes things clearer and allows
playing some games.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Mon, 23 Sep 2013 17:28:51 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: utilities in perl
Message-Id: <3r66ha-u2n1.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <87bo3j95p8.fsf@sable.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
> >tmcd@panix.com (Tim McDaniel) writes:
> >> I found your answer confusing. When I type a command line, ...
> ...
> >That's probably how the shell invoked it but it need not be done in this
> >way. Assuming execl as an example, ...
>
> I was restricting myself to the shell, and in particular to my
> *perception* of the command line, in particular the "program name" and
> "first argument". Certainly exec.*() makes things clearer and allows
> playing some games.
It's important to be clear, though, that whether you invoke a perl
script as
/path/to/script arg
with a #! line or
perl /path/to/script arg
the arguments perl sees are the same, so the variables end up set the
same.
This is quite separate from the possibility of mucking about with
argv[0]. In the first case that argument is (I think) thrown away by the
kernel; in the second perl will, as Rainer said, only use it for $^X if
it hasn't got some other way of finding its own path.
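A minimal sketch of the second case from Perl itself, assuming a Unix-like system with fork: the `exec { PROGRAM } LIST` form lets the caller pass an arbitrary argv[0]. The helper name run_with_fake_argv0 is made up; the child script only reports $0 and @ARGV, since what ends up in $^X depends on the platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run a tiny script under the current perl with a
# made-up argv[0], and return (script path, $0 as seen by the script,
# @ARGV as seen by the script).
sub run_with_fake_argv0 {
    my ($fake_name, @args) = @_;

    my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} 'print join("\t", $0, @ARGV), "\n";', "\n";
    close $fh or die "close: $!";

    # Implicit fork: the child's STDOUT is connected to $child here.
    my $pid = open(my $child, '-|');
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Program to run is $^X; argv[0] is whatever $fake_name says.
        exec { $^X } $fake_name, $script, @args;
        die "exec failed: $!";
    }

    my $line = <$child>;
    close $child;
    die "no output from child\n" unless defined $line;
    chomp $line;
    my ($name, @seen) = split /\t/, $line, -1;
    return ($script, $name, @seen);
}

my ($script, $name, @seen) = run_with_fake_argv0('Now what?', 'arg');
print "script: $script\n\$0:     $name\n\@ARGV:  @seen\n";
```

Whatever is passed as the fake name, the script still sees its own path in $0 and the remaining arguments in @ARGV.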
Ben
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4041
***************************************