[33095] in Perl-Users-Digest
Perl-Users Digest, Issue: 4371 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Feb 15 18:09:25 2015
Date: Sun, 15 Feb 2015 15:09:08 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 15 Feb 2015 Volume: 11 Number: 4371
Today's topics:
A couple of useless but fun scripts for your amusement. <see.my.sig@for.my.address>
CGI parse and a simple question <hslee911@yahoo.com>
Re: CGI parse and a simple question <rweikusat@mobileactivedefense.com>
Re: eval <gravitalsun@hotmail.foo>
Extract data with regular expressions <noreply2me@yahoo.com>
Re: Extract data with regular expressions <jurgenex@hotmail.com>
Re: Extract data with regular expressions <see.my.sig@for.my.address>
Re: Extract data with regular expressions <see.my.sig@for.my.address>
Re: Traversing through sub dirs and read file contents <m@rtij.nl.invlalid>
Re: Traversing through sub dirs and read file contents <m@rtij.nl.invlalid>
Upgrading Perl modules <news@todbe.com>
Re: What is the difference between using or non using s <ben.usenet@bsb.me.uk>
What is the difference between using or non using singl <pengyu.ut@gmail.com>
Re: Whitespace in code <see.my.sig@for.my.address>
Re: Whitespace in code <hjp-usenet3@hjp.at>
Re: Why can I get away with this? <kaz@kylheku.com>
Re: Why can I get away with this? <rweikusat@mobileactivedefense.com>
Re: Why can I get away with this? <see.my.sig@for.my.address>
Re: Why can I get away with this? <see.my.sig@for.my.address>
Re: Why can I get away with this? <rweikusat@mobileactivedefense.com>
Re: Why can I get away with this? <m@rtij.nl.invlalid>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Sun, 15 Feb 2015 05:36:46 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: A couple of useless but fun scripts for your amusement.
Message-Id: <4NidnaPr4up3PX3JnZ2dnUVZ572dnZ2d@giganews.com>
Firstly, a program for writing random essays consisting of x paragraphs
of y words each:
#! /usr/bin/perl
# "essay.perl"
use v5.14;
use strict;
use warnings;
our @Words = qw (
a
aah
aahed
...
insert over 109,000 words here, 1 word per line
...
zymurgy
zyzzyva
zyzzyvas
);
sub RandomNumber {
    return int(rand(109583));
}

for (1..$ARGV[0]) {
    for (1..$ARGV[1]) {
        my $Word = $Words[RandomNumber];
        print "$Word", ' ';
    }
    print "\n\n";
}
You'll have to acquire your own dictionary and insert it into the qw(),
because it's too big to put in this message without slowing everyone's
computers to a standstill. The dictionary I used can be found HERE:
http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt
Invocation:
% essay.perl 3 75
(Prints 3 paragraphs of 75 random English words each.)
As for what earthly use that is, it *can* serve as a shock tool for
breaking writer's block. Clauses such as "Graveless hunks mutagenically
reinterrogate commodious photoengraving" have a delightfully
unrehearsed feel to them. If nothing else, it does send one to consult
a dictionary to see what various rarely-used words actually mean.
Secondly, if one wants to write gibberish in a more oriental way....
#! /usr/bin/perl
# "chinese.perl"
use v5.14;
use strict;
use warnings;
use utf8;
binmode STDIN, ":encoding(utf8)";
binmode STDOUT, ":encoding(utf8)";
sub RandomNumber {
    return 0x4E00 + int(rand(0x51A6)); # 4E00 through 9FA5
}

for (1..$ARGV[0]) {
    for (1..$ARGV[1]) {
        print chr RandomNumber;
    }
    print "\n";
}
Invocation:
% chinese.perl 15 15
(prints a 15 x 15 grid of random Hanzi (Chinese) ideographs.)
牱覩俏涎装氠缮埄硥皋筊櫤玪癝飲
蒑槽弽埍毬僺馑慿龃试兘橛恰搢剁
瞃裓譐堚痬務拄鞌祳摑誆籪麓菕桌
欣洱循鱟篯謖麼幢屆飄拐俻笎顖轊
穤砇锕冚辻巷礼佹巊黶鎅蒰埕邗騍
輞眐按稾縪胻欃熲颔鳴鮰舻觯癙牆
躐甂恨燽槙漍害俳靕躡悰孟蕵缻煭
壐火硚鹘祇萔漷臧铳殈掷鈖擧魫吴
翸捉仐哅蒟馜誾肽摋誾筂歠邝浱褶
蘈洢愡峵啲缵怲傤衚巐杢諺盵豠鰼
郭笪囨铜頞絶颁謨潨鄧殀蛝憲寜攐
渚釦脺嘘鎦駟鋃酐麵烂焬鞻槤禢个
碤踐弟廝鞸繡鉅掹瘦覊骾杅峀猣钜
熺俆鱽塲鍿厹骇呻囤怈儇轍讣昱覂
嘯廡橌麧犺呻艌翟祂编弌滝琷芔鰩
Useless, but fascinating because of the vast variety of ideographs
in Chinese. This script selects from a block of 20,902 of them.
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Fri, 13 Feb 2015 12:00:30 -0800 (PST)
From: James <hslee911@yahoo.com>
Subject: CGI parse and a simple question
Message-Id: <a9277a8d-6b79-4b1f-a118-abfca3a8a2f9@googlegroups.com>
I just throw this out.
map {my($k,$v)=split/=/;$v=~tr/+/ /;
$v=~s/%(..)/pack("C",hex($1))/eg;${$k}="$v"} split/&/,$ENV{'QUERY_STRING'};
script.cgi?A=hello+perl&B=perl+world
Question is, is there a better way to do multiple substitution? For example,
instead of several lines like this,
$v =~ tr/+/ /;
$v =~ s/%(..)/pack("C",hex($1))/eg;
$v =~ s/some/other/;
etc.
one line, where $v appears only once?
Tks.
James
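(A side note on the snippet above: the ${$k}="$v" assignment creates package
variables via symbolic references, which "use strict" forbids. A hash-based
variant of the same parse is sketched below; the %param name and the literal
query string are only illustrative.)

```perl
# Same decoding as above, but storing pairs in a hash instead of
# creating global variables through symbolic references.
my %param;
for my $pair (split /&/, 'A=hello+perl&B=perl+world') {
    my ($k, $v) = split /=/, $pair, 2;
    $v =~ tr/+/ /;                        # '+' encodes a space
    $v =~ s/%(..)/pack('C', hex($1))/eg;  # %XX encodes a byte
    $param{$k} = $v;
}
# $param{A} is now 'hello perl', $param{B} is 'perl world'
```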
------------------------------
Date: Fri, 13 Feb 2015 20:18:02 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: CGI parse and a simple question
Message-Id: <87oaoxin0l.fsf@doppelsaurus.mobileactivedefense.com>
James <hslee911@yahoo.com> writes:
> I just throw this out.
>
> map {my($k,$v)=split/=/;$v=~tr/+/ /;
> $v=~s/%(..)/pack("C",hex($1))/eg;${$k}="$v"} split/&/,$ENV{'QUERY_STRING'};
>
> script.cgi?A=hello+perl&B=perl+world
>
> Question is, is there a better way to do multiple substitution? For example,
> instead of several lines like this,
>
> $v =~ tr/+/ /;
> $v =~ s/%(..)/pack("C",hex($1))/eg;
> $v =~ s/some/other/;
> etc.
>
> one line, where $v appears only once?
It's possible to combine both kinds of translation,
,----
| [rw@doppelsaurus]~#perl -de 0
|
| Loading DB routines from perl5db.pl version 1.33
| Editor support available.
|
| Enter h or `h h' for help, or `man perldebug' for more help.
|
| main::(-e:1): 0
| DB<1> $v = 'bla%20fasel+13'
|
| DB<2> $v =~ s/(\+|%(..))/$+ eq '+' ? ' ' : pack('C', hex($2))/ge
|
| DB<3> p $v
| bla fasel 13
`----
however, whether or not this can be regarded as an improvement is very
much debatable (I'd certainly prefer the simpler, two-statement
variant).
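For the record, the usual answer to "mention $v only once" is the for/foreach
topicalizer, which aliases $_ to $v for the duration of the block — a sketch,
using the same decoding as above:

```perl
# A one-element for loop aliases $_ to $v, so each substitution
# can be written without repeating the variable name.
my $v = 'hello%20perl+world';
for ($v) {
    tr/+/ /;                        # '+' becomes a space
    s/%(..)/pack('C', hex($1))/eg;  # %XX becomes the byte it encodes
}
# $v is now 'hello perl world'
```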
------------------------------
Date: Sat, 14 Feb 2015 01:16:38 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: eval
Message-Id: <mbm0ks$25tu$1@news.ntua.gr>
On 11/2/2015 12:04 AM, C.DeRykus wrote:
> IIUC, maybe hack the open with a scalar filehandle (available since 5.8.0) :
>
> for (...) {
> { local *STDERR;
> open(STDERR,'>',\my $err);
> my $result = eval "$code";
> if ($@ or $err) {
> print "code evaluation failed because \$@=$@ \$err=$err\n";
> }
> else {
> print "eval succeed, result is $result err=$err\n"
> }
> }
> }
>
Your idea to open STDERR as an in-memory file and catch the compilation
error there was great, thanks. Some aesthetic changes:
my $code = '$val >= 20 ? 1:0';

{
    local *STDERR;
    for my $val ('break it', 30, 10) {
        open STDERR, '>', \my $err;
        my $result = eval "$code";
        close STDERR;
        if ($@) {
            print "evaluated code died by user request *$@*\n";
        }
        elsif ($err) {
            print "evaluated code did not compile, correct your code *$err*\n";
        }
        else {
            print "evaluated code succeeded, result is $result\n";
        }
    }
}

print STDERR "123\n";
------------------------------
Date: Sun, 15 Feb 2015 13:01:31 -0700
From: "Robert Crandal" <noreply2me@yahoo.com>
Subject: Extract data with regular expressions
Message-Id: <1e6dneNT67qBZn3JnZ2dnUVZ5gGdnZ2d@giganews.com>
I have an eBook that is saved in a simple text file.
The file has about 1500-2000 lines of text data.
Each page is separated by a border, which is just
a line of 20 asterisks, like this:
********************
My goal is to scan the entire text file, and remove
all "pages" that contain profanity words or other
miscellaneous words of my choosing.
OR, I could find all the "good" pages that lack
profanity and other words, and simply extract those
pages and append them to a separate text file.
Does anybody know a good way to accomplish
this task with regular expressions?
Thanx
------------------------------
Date: Sun, 15 Feb 2015 13:43:19 -0800
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Extract data with regular expressions
Message-Id: <pe42eatio4p99e3fbhapnp2t9jojf68eb4@4ax.com>
"Robert Crandal" <noreply2me@yahoo.com> wrote:
>I have an eBook that is saved in a simple text file.
>The file has about 1500-2000 lines of text data.
>Each page is separated by a border, which is just
>a line of 20 asterisks, like this:
>
>********************
>
>My goal is to scan the entire text file, and remove
>all "pages" that contain profanity words or other
>miscellaneous words of my choosing.
>
>OR, I could find all the "good" pages that lack
>profanity and other words, and simply extract those
>pages and append them to a separate text file.
>
>Does anybody know a good way to accomplish
>this task with regular expressions?
Regular expressions match or they don't match. They do not filter, they
do not replace, and they do not remove.
Having said that, it may very well be possible to construct an RE which
will match "all good pages" or something like that. But why?
The most natural approach to me seems to use two steps:
- split the text into individual pages (at the "border")
- and then apply a filter to extract all "good" pages.
And yes, for each sub-task REs can be used as part of the more
encompassing commands.
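That two-step approach might look something like this (a sketch only; the
border string and the word list are placeholders):

```perl
my $border = "\n" . ('*' x 20) . "\n";
my @bad    = qw( asdf qwer yuio );
my $bad_re = join '|', map quotemeta, @bad;

# Step 1: split the text into pages at the border.
# Step 2: keep only pages containing none of the listed words,
#         then rejoin the survivors with the same border.
sub filter_pages {
    my ($text) = @_;
    my @good = grep { !/$bad_re/ } split /\Q$border\E/, $text;
    return join $border, @good;
}
```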
jue
------------------------------
Date: Sun, 15 Feb 2015 14:34:31 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Extract data with regular expressions
Message-Id: <7tOdnfsRlMpug3zJnZ2dnUVZ572dnZ2d@giganews.com>
On 2/15/2015 12:01 PM, Robert Crandal wrote:
> I have an eBook that is saved in a simple text file.
> The file has about 1500-2000 lines of text data.
> Each page is separated by a border, which is just
> a line of 20 asterisks, like this:
>
> ********************
>
> My goal is to scan the entire text file, and remove
> all "pages" that contain profanity words or other
> miscellaneous words of my choosing.
>
> OR, I could find all the "good" pages that lack
> profanity and other words, and simply extract those
> pages and append them to a separate text file.
>
> Does anybody know a good way to accomplish
> this task with regular expressions?
If all the "page" separators are exactly 20 asterisks, preceded and
succeeded by a newline, then I'd slurp in one "page" at a time by
setting record separator $/ to "\n********************\n", then
test each record for bad words, do "next RECORD" if a bad word is
found, else print the record. The following is not tested, but
I think it will get you going in the right direction:
#! /usr/bin/perl
use v5.14;
use strict;
use warnings;
$/ = qq(\n********************\n);
my @BadWords = qw ( asdf qwer yuio );
RECORD: while (<>) {
    foreach my $BadWord (@BadWords) {
        next RECORD if ($_ =~ m/$BadWord/);
    }
    say;
}
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Sun, 15 Feb 2015 14:51:35 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Extract data with regular expressions
Message-Id: <7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>
Ok, I just tested my program and it works fine:
#! /usr/bin/perl
use v5.14;
use strict;
use warnings;
$/ = qq(\n********************\n);
my @BadWords = qw ( asdf qwer yuio );
RECORD: while (<>) {
    foreach my $BadWord (@BadWords) {
        next RECORD if ($_ =~ m/$BadWord/);
    }
    say;
}
Given this file:
======= BEGIN INPUT =======
playboys regrows correality requisition droits offered
angeles surfy wile lacrimation aged seignories practicing
hereinto workmanship fuggy municipally asdf underpinnings
brocket unpremeditated pinochle crazier coaeval obviously
able supinated hostler burrows artichoke vivant crosstown
********************
baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh sent obscurantists eventualities mechanics
vanity tazze nonalignment dowering nephew nonconfidence
********************
chaotically sooners rocketing luckiest holeproof damnableness
soc infertilely supernumerary expertise sulphid frisson
surceases joyously kins drooled agrarianism paraphrases ribby
wittiness grabbiest junketer accumulable hemokonia matriculants
sieged yuio forgoes staking nonadjacent offprint mug pawpaw
======= END INPUT =======
The script rejects the first paragraph because it contains bad
word "asdf", prints the second paragraph, and rejects the third
because it contains bad word "yuio":
======= BEGIN OUTPUT =======
baneful celebrations angle growler landscape beside tzetzes
normal bootery bespoke henhouses tribuneship bouncer
displeasure crewman tenth curarization honestness sensitize
reminisces cometh sent obscurantists eventualities mechanics
vanity tazze nonalignment dowering nephew nonconfidence
********************
======= END OUTPUT =======
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Thu, 12 Feb 2015 22:35:22 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Traversing through sub dirs and read file contents
Message-Id: <qtivqb-s82.ln1@news.rtij.nl>
On Thu, 12 Feb 2015 13:34:49 +0100, G.B. wrote:
> And that's about one of the many lines in File::Find.
1) I would probably hand code it the same way but more important 2) those
problems exist whatever solution you choose.
So yes, it is an interesting discussion but it does not say anything
about File::Find per se.
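For reference, a minimal File::Find walk that reads file contents might look
like the sketch below (the temporary-directory setup exists only to make the
example self-contained):

```perl
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a tiny directory tree to walk.
my $root = tempdir(CLEANUP => 1);
make_path("$root/sub");
open my $out, '>', "$root/sub/a.txt" or die "can't create: $!";
print $out "hello\n";
close $out;

# Walk the tree. Inside the wanted sub, $_ is the basename and the
# current directory is the file's directory, so a plain open($_) works;
# $File::Find::name holds the full path.
my %contents;
find(sub {
    return unless -f;                 # skip directories and specials
    open my $fh, '<', $_ or do { warn "can't open $_: $!"; return };
    local $/;                         # slurp mode
    $contents{$File::Find::name} = <$fh>;
    close $fh;
}, $root);
```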
M4
------------------------------
Date: Sun, 15 Feb 2015 23:43:44 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Traversing through sub dirs and read file contents
Message-Id: <02k7rb-8vc.ln1@news.rtij.nl>
On Thu, 12 Feb 2015 20:25:43 +0100, Peter J. Holzer wrote:
>> [ I have used disk editors to 'erase' directories on ext2 file systems
>> followed by an fsck. That would at least finish in finite time. An rm *
>> in the offending directory (or anything else I could think of using
>> syscalls) would literally take days. And that was less than 1M files.
>
> It wasn't expanding "*" (the equivalent to readdir) which took days, it
> was looking up and removing each file.
Yup, exactly. Hence my statement that file systems may not handle so many
files gracefully. I know ext2 didn't, and I remember older versions of NTFS
also had problems.
M4
------------------------------
Date: Sat, 14 Feb 2015 10:40:59 -0800
From: "$Bill" <news@todbe.com>
Subject: Upgrading Perl modules
Message-Id: <mbo4qn$mro$1@dont-email.me>
I finally upgraded to the current AS Perl 5.20 and it seems to me (my short
term memory sucks) there was a way to list installed modules on the prior
install and install them in the newer version in some sort of automated
way. But I don't remember if it was a script from a poster or what.
I believe I used to use site/lib/PPM-CONF ... ?
Can someone enlighten me on the fastest method to upgrade my missing modules
so I don't have to reinvent the wheel again ?
TIA, Bill
------------------------------
Date: Sun, 15 Feb 2015 12:55:16 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: What is the difference between using or non using single quotes in a pair of backquotes?
Message-Id: <87bnkvmj0r.fsf@bsb.me.uk>
Peng Yu <pengyu.ut@gmail.com> writes:
> The two print statements print differently.
Hmm... not here they don't. The ''s can make a difference but I don't
think you posted the right code.
> Could anyone show me how to understand the difference?
That's a big topic so it's better to start with the case that is
actually bothering you.
> ~$ ./main.pl
> abc
> -n abc
>
From this output (including the extra blank line) my guess is that you
had...
> ~$ cat main.pl
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> my $x='abc';
> my $y=`echo -n $x`;
> print "$y\n";
> my $z=`echo -n '$x'`;
my $z=`echo '-n $x'`;
instead. The difference is now due to how the shell interprets quotes.
The echo command sees one argument -n<space>abc which it outputs with a
closing newline.
> print "$z\n";
--
Ben.
------------------------------
Date: Sat, 14 Feb 2015 19:56:24 -0800 (PST)
From: Peng Yu <pengyu.ut@gmail.com>
Subject: What is the difference between using or non using single quotes in a pair of backquotes?
Message-Id: <e125abff-c879-4588-9e55-a11fb792e67f@googlegroups.com>
Hi,
The two print statements print differently. Could anyone show me how to understand the difference? Thanks.
~$ ./main.pl
abc
-n abc
~$ cat main.pl
#!/usr/bin/env perl
use strict;
use warnings;
my $x='abc';
my $y=`echo -n $x`;
print "$y\n";
my $z=`echo -n '$x'`;
print "$z\n";
Regards,
Peng
------------------------------
Date: Sun, 15 Feb 2015 05:09:33 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Whitespace in code
Message-Id: <KMGdnePBIPkUB33JnZ2dnUVZ57ydnZ2d@giganews.com>
On 2/9/2015 9:50 PM, $Bill wrote:
> On 2/9/2015 09:57, Robbie Hatley wrote:
> >
> > Since then, I've switched to a different text editor on my Win8.1
> > notebook, "Notepad++" ...
>
> Haven't tried it, but I would suggest you try a Win32 native port of
> Vim or Emacs - I've been using vim (gvim) the entire time I've been
> using a PC and it's predecessor (vi) the entire time I was on UNIX.
I find the learning curve for vi to be more time-consuming than
I can afford. And it doesn't have many of the great features of
Notepad++, such as:
1. "Workspace & Projects" panels on left side of screen like an IDE.
2. Tabbed documents, like Firefox.
3. Syntax highlighting for a variety of programming languages
http://www.notepad-plus-plus.org/
> ...
> I would also recommend using a UNIX shell on Windows:
> ftp://ftp.astron.com/pub/tcsh/ or possibly another UNIX shell port
> instead of the dumb cmd.exe.
I don't use cmd.exe because it doesn't handle #!, and because it
doesn't come with any utilities and languages, etc. Instead, I use
Cygwin:
http://www.cygwin.com/
Cygwin gives a unix-like interface to Windows. Features include:
1. Uses unix-like file path nomenclature. "C:\argle" becomes
"/cygdrive/c/argle", your compilers, utilities, etc, are in
"/usr/bin", and your home directory is by default "/home/user_name".
2. Comes with lots of programming languages and utilities.
3. Comes with a package manager to keep them all up to date.
4. Its shell is Bash, so you can use all of the Bash commands
and Bash shell scripting.
5. It has both 32-bit and 64-bit versions. I'm currently using
the 64-bit version on my 64-bit Asus notebook.
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Sun, 15 Feb 2015 23:03:01 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Whitespace in code
Message-Id: <slrnme25sl.5fp.hjp-usenet3@hrunkner.hjp.at>
On 2015-02-15 13:09, Robbie Hatley <see.my.sig@for.my.address> wrote:
> On 2/9/2015 9:50 PM, $Bill wrote:
>> On 2/9/2015 09:57, Robbie Hatley wrote:
>> >
>> > Since then, I've switched to a different text editor on my Win8.1
>> > notebook, "Notepad++" ...
>>
>> Haven't tried it, but I would suggest you try a Win32 native port of
>> Vim or Emacs - I've been using vim (gvim) the entire time I've been
>> using a PC and it's predecessor (vi) the entire time I was on UNIX.
>
> I find the learning curve for vi to be more time-consuming than
> I can afford.
I think that one should invest some effort into learning the tools of
the trade. And for a programmer, the editor is one of the most important
tools (you are using it several hours a day, after all). That doesn't
mean that vim is the *right* tool for you. Maybe notepad++ (which is a
fine editor - we recommend it to our Windows users) is better for you.
Or maybe an IDE like Eclipse.
Just stay away from minimal "editors" like notepad or nano. They have a
place, but not on a programmer's workbench.
> And it doesn't have many of the great features of Notepad++, such as:
>
> 1. "Workspace & Projects" panels on left side of screen like an IDE.
True (AFAIK). (But with gvim you can tear off the buffers menu, which at
least allows you to rapidly switch between files in your project.)
> 2. Tabbed documents, like Firefox.
vim has tabs.
> 3. Syntax highlighting for a variety of programming languages
vim has syntax highlighting. For lots of languages (I'd be genuinely
surprised if notepad++ (or any other editor except emacs) has syntax
highlighting for more languages).
hp
--
_ | Peter J. Holzer | The curse of electronic word processing:
|_|_) | | you keep filing away at your text until
| | | hjp@hjp.at | the parts of the sentence no longer
__/ | http://www.hjp.at/ | fit together. -- Ralph Babel
------------------------------
Date: Sat, 14 Feb 2015 01:37:18 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Why can I get away with this?
Message-Id: <20150213172800.715@kylheku.com>
On 2015-02-10, Robbie Hatley <see.my.sig@for.my.address> wrote:
> I could have a file named.....
> our $FileName = "\x01犬草\n\x02\N{MALE SIGN}猫\x03\x04\a\0";
> print "\n\$FileName = $FileName\n\n";
>
>
> Oh, my.
So what's your point? Millions of people world round use kanji in Windows
filesystem names.
I have a bunch myself. For instance, in the titles of Japanese media files:
audio, video, as well as the directories they reside in.
> Wait, there's actually one other character which *MUST* be disallowed
> in file names in nearly every file system, and that's '\0', except
Only if the OS is written in some language in which it is customary to work
with null-terminated strings.
This representation for character strings is not completely written
in stone.
> perhaps as the very last character of a file name. The reason I say that
> is, if you put '\0' at the beginning or middle of a file name, when Perl
> or the OS tries to read back the file name, it stops reading characters
> when it hits the null terminator, so that THIS file name:
If an OS happens not to have the problem, why would its designers care that
it's a problem for some programming language?
------------------------------
Date: Sat, 14 Feb 2015 16:20:49 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Why can I get away with this?
Message-Id: <87a90gsbvi.fsf@doppelsaurus.mobileactivedefense.com>
Kaz Kylheku <kaz@kylheku.com> writes:
> On 2015-02-10, Robbie Hatley <see.my.sig@for.my.address> wrote:
[...]
>> perhaps as the very last character of a file name. The reason I say that
>> is, if you put '\0' at the beginning or middle of a file name, when Perl
>> or the OS tries to read back the file name, it stops reading characters
>> when it hits the null terminator, so that THIS file name:
>
> If an OS happens not to have the problem, why would its designers care that
> it's a problem for some programming language?
It's not a problem with Perl,
[rw@doppelsaurus]~#perl -MDevel::Peek -e '$x = pack("C*", (0) x 20);
Dump($x)'
SV = PV(0x604d00) at 0x623418
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x61c850 "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"\0
CUR = 20
LEN = 24
and - strictly speaking - not even a problem with C,
----------
#include <stdio.h>
char *s = "\0\0\0\0";
char *p = "a";
int main(void)
{
    char *r;
    int c;

    r = s;
    do c = *r++, printf("%d ", c); while (c != 'a');
    return 0;
}
---------
(This program has undefined behaviour. Any responsibility for the
effects caused by running it is hereby disclaimed)
0-termination is just a convention employed by some functions in the C
standard library. It is a problem with people's expectations: It makes
sense to exclude 0 from the allowed bytes in a filename because people
with a C background will expect it to be special.
------------------------------
Date: Sun, 15 Feb 2015 04:19:01 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Why can I get away with this?
Message-Id: <ptmdnWMbLNYsE33JnZ2dnUVZ57ydnZ2d@giganews.com>
On 2/10/2015 12:11 AM, Martijn Lievaart wrote:
> On Mon, 09 Feb 2015 23:23:41 -0800, Robbie Hatley wrote:
>
> > Wait, there's actually one other character which *MUST* be disallowed in
> > file names in nearly every file system, and that's '\0', except perhaps
> > as the very last character of a file name. The reason I say that is, if
> > you put '\0' at the beginning or middle of a file name, when Perl or the
> > OS tries to read back the file name, it stops reading characters when it
> > hits the null terminator, so that THIS file name:
> >
> > $FileName = "斊詥觬榹苵\0匞寨蹼粿砺";
> >
> > would be foreshortened on readback to:
> >
> > $FileName = "斊詥觬榹苵";
> >
> > and give "file not found" errors.
>
> I guess you have a C background, because the above is not logical at all.
>
> However, it is still true, see
> https://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29
I'm not seeing how it's "not logical". C uses null string terminators.
Every version of Microsoft Windows is written mostly in C (as far as I know).
Perl is written in C but bypasses C's "null terminators" and allows embedded
'\0' in strings. However, Windows system calls for accessing directories
do NOT bypass C's "null terminators", and foreshorten file names which contain
embedded '\0'.
As for Linux, I'm not sure how it handles that, and I can't check, because
the hard disk on my desktop PC crashed today, and it will be some weeks
before I can get it fixed. (I'm currently on my Asus notebook, which runs
Windows 8.1.)
As for Windows, when I run the Perl program below, it actually sort of works!
It successfully writes some words to a file and reads them back.
But the file name on disk is foreshortened. Apparently when open()
reads the file name from the directory, it invokes a Windows system call
which foreshortens both the name given by Perl, *AND* the name in the
directory, so they match, and hence the Perl script can read/write the file
even though the file isn't really named what the Perl script thinks it is!
While Perl thinks the file's name is "犬草\0♂猫", the actual name of the file
on the hard disk is "犬草".
#! /usr/bin/perl
use v5.14;
use strict;
use warnings;
use utf8;
use open qw( :encoding(utf8) :std );
my $FileName = "犬草\0♂猫";
say "\$FileName = $FileName";
open MYFILE, ">", $FileName;
print MYFILE "incising taiwanese incompatible balk procured\n";
print MYFILE "graveless quiescence hunk photoengraving scurvier\n";
print MYFILE "mesmerizers visit nutting allegretto ounces\n";
print MYFILE "incrimination idealizing huggermugger savoury commodious\n";
print MYFILE "mutagenically reinterrogates phiz gonophore dewaxed\n";
close MYFILE;
open MYFILE, "<", $FileName;
while (<MYFILE>) {
    print "$_";
}
close MYFILE;
exit 0;
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Sun, 15 Feb 2015 04:42:14 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Why can I get away with this?
Message-Id: <b7GdndCl9aa_CX3JnZ2dnUVZ572dnZ2d@giganews.com>
On 2/13/2015 5:37 PM, Kaz Kylheku wrote:
> On 2015-02-10, Robbie Hatley <see.my.sig@for.my.address> wrote:
> > I could have a file named.....
> > our $FileName = "\x01犬草\n\x02\N{MALE SIGN}猫\x03\x04\a\0";
> > print "\n\$FileName = $FileName\n\n";
> >
> > Oh, my.
>
> So what's your point? Millions of people world round use kanji in Windows
> filesystem names.
Actually, that string contains Kanji, Hanzi (the superset from which Kanji
is taken), ASCII control characters (such as "Begin Transmission" and
"End Transmission"), the "male gender" symbol, the "alarm bell" character
(which should make your computer go "ding" if rendered properly),
and the "null" character. In other words, as George Takei would say,
"Oh, my." :-) So no, I don't think you're going to find anyone but a
madman (such as myself) using all those characters in a file name.
But yes, by using Perl as interface to the Windows 8.1 file system,
you *can* successfully create, read, and write files with such
preposterous names. Not that I recommend that people do so!!!!!
> > Wait, there's actually one other character which *MUST* be disallowed
> > in file names in nearly every file system, and that's '\0', except
>
> Only if the OS is written in some language in which it is customary to work
> with null-terminated strings.
Many OSs (as well as Perl) are written in C. And while Perl does allow
'\0' embedded in strings, many OS APIs do not, and give unwanted results
if you try (such as, file on disk has wrong name, with some of the
characters chopped off).
--
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
------------------------------
Date: Sun, 15 Feb 2015 20:49:47 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Why can I get away with this?
Message-Id: <87twymvr10.fsf@doppelsaurus.mobileactivedefense.com>
Robbie Hatley <see.my.sig@for.my.address> writes:
> On 2/10/2015 12:11 AM, Martijn Lievaart wrote:
>> On Mon, 09 Feb 2015 23:23:41 -0800, Robbie Hatley wrote:
>>
>> > Wait, there's actually one other character which *MUST* be disallowed in
>> > file names in nearly every file system, and that's '\0', except perhaps
>> > as the very last character of a file name. The reason I say that is, if
>> > you put '\0' at the beginning or middle of a file name, when Perl or the
>> > OS tries to read back the file name, it stops reading characters when it
>> > hits the null terminator, so that THIS file name:
>> >
>> > $FileName = "斊詥觬榹苵\0匞寨蹼粿砺";
>> >
>> > would be foreshortened on readback to:
>> >
>> > $FileName = "斊詥觬榹苵";
>> >
>> > and give "file not found" errors.
>>
>> I guess you have a C background, because the above is not logical at all.
>>
>> However, it is still true, see
>> https://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29
>
> I'm not seeing how it's "not logical". C uses null string terminators.
> Every version of Microsoft Windows is written mostly in C (as far as I know).
> Perl is written in C but bypasses C's "null terminators" and allows embedded
> '\0' in strings. However, Windows system calls for accessing directories
> do NOT bypass C's "null terminators", and foreshorten file names which contain
> embedded '\0'.
A 'string' is defined as
A string is a contiguous sequence of characters terminated by
and including the first null character.
in section 7.1.1 of ISO/IEC 9899:1999 ("C99") and that's the start of
chapter 7 whose title is "Library", ie, this is a convention employed by
certain functions in the C standard library and nothing more than that:
No actual program written in C is required to use any of these function
and thus, honour this convention, perl itself being an example
here. That 0-bytes are not allowed in Windows filenames is a design
choice presumably intended to "be nice to C programmers", not the
consequence of some law of nature or so.
------------------------------
Date: Sun, 15 Feb 2015 23:49:48 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <cdk7rb-8vc.ln1@news.rtij.nl>
On Thu, 12 Feb 2015 20:18:52 +0100, Peter J. Holzer wrote:
> On 2015-02-11 23:32, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
>> On Wed, 11 Feb 2015 21:50:35 +0100, Peter J. Holzer wrote:
>>> On 2015-02-10 08:11, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
>>>> On Mon, 09 Feb 2015 23:23:41 -0800, Robbie Hatley wrote:
>>>>> Wait, there's actually one other character which *MUST* be
>>>>> disallowed in file names in nearly every file system, and that's
>>>>> '\0', except perhaps as the very last character of a file name. The
>>>>> reason I say that is, if you put '\0' at the beginning or middle of
>>>>> a file name, when Perl or the OS tries to read back the file name,
>>>>> it stops reading characters when it hits the null terminator, so
>>>>> that THIS file name:
> [example deleted]
>>>>
>>>> I guess you have a C background, because the above is not logical at
>>>> all.
>>>
>>> How is "I observed that \0 terminates a file name, therefore I
>>> conclude that \0 cannot be part of a file name" not logical? It may be
>>> an overgeneralisation (e.g. there might be an escape mechanism), but
>>> the conclusion sounds logical to me.
>>
>> File names cannot contain nulls. Therefore we can use C strings in the
>> API. C strings cannot contain nulls. Therefore file names cannot
>> contain nulls.
>>
>> See anything wrong with that reasoning? :-)
>
> I see two things wrong with it:
>
> 1) It's circuitous.
>
> 2) It has nothing to do with Robbie's reasoning. You just invented that
> out of whole cloth to make him look like an utter idiot[1]. That's a
> nasty tactic, however, not very effective on Usenet, where everybody
> can go back and read what he really wrote. And even less effective
> when you actually quote that. Gee, if you're going to put words in
> anybody's mouth, at least make a token effort to make it convincing.
I don't agree with that, see below.
>
>> Do note that the context here is Windows, so the posix heritage does
>> not apply.
>
> Irrelevant since Robbie didn't refer to any "POSIX heritage". He made an
> observation (An embedded NUL character terminates a file name)
It's this observation that is false...
> and drew
> a conclusion (NUL characters in file names are disallowed).
... so this conclusion is also not valid.
[ As shown by the fact that Windows actually allows NULs in filenames (in
some situations). Whether that is a good idea to try is obviously a whole
new can of worms; it's probably akin to newlines in filenames: allowed,
but obscene. ]
> [1] I'm generally a big fan if Hanlon's razor, but I can't believe your
> reading skills are that bad.
I still think the reasoning here is incorrect, not my reading skills.
M4
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 4371
***************************************