[33096] in Perl-Users-Digest


Perl-Users Digest, Issue: 4372 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Feb 16 11:09:21 2015

Date: Mon, 16 Feb 2015 08:09:06 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 16 Feb 2015     Volume: 11 Number: 4372

Today's topics:
    Re: Extract data with regular expressions <gravitalsun@hotmail.foo>
    Re: Extract data with regular expressions <from_usenet_2014@wasell.user32.com>
    Re: Extract data with regular expressions <see.my.sig@for.my.address>
    Re: Multi-field sorting (general case) <uri@stemsystems.com>
        use Storable: failed example from Intermediate Perl <senorsmile@gmail.com>
    Re: use Storable: failed example from Intermediate Perl <ben.usenet@bsb.me.uk>
    Re: use Storable: failed example from Intermediate Perl <justin.1410@purestblue.com>
    Re: Whitespace in code <kaz@kylheku.com>
    Re: Whitespace in code <kaz@kylheku.com>
    Re: Whitespace in code <lionslair@consolidated.net>
    Re: Whitespace in code <bauhaus@futureapps.invalid>
    Re: Why can I get away with this? <see.my.sig@for.my.address>
    Re: Why can I get away with this? <hjp-usenet3@hjp.at>
    Re: Why can I get away with this? <kaz@kylheku.com>
    Re: Why can I get away with this? <m@rtij.nl.invlalid>
    Re: Why can I get away with this? <m@rtij.nl.invlalid>
    Re: Why can I get away with this? <see.my.sig@for.my.address>
    Re: Why can I get away with this? <m@rtij.nl.invlalid>
    Re: Why can I get away with this? <m@rtij.nl.invlalid>
    Re: Why can I get away with this? <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 16 Feb 2015 13:36:01 +0200
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Extract data with regular expressions
Message-Id: <mbskm2$1mdq$1@news.ntua.gr>

On 15/2/2015 22:01, Robert Crandal wrote:
> I have an eBook that is saved in a simple text file.



#! /usr/bin/perl
use strict; use warnings; use feature qw/say/;

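# Lines containing any of these literal strings are skipped.
my @BadWords   = qw/******************** ======= asdf qwer yuio/;
my $regex_skip = '(?:'.join('|', map {quotemeta $_} reverse sort @BadWords).')';
   $regex_skip = qr/$regex_skip/;

# Reads the text from a __DATA__ section, which was not included in this post.
while (<DATA>) {
    next if $_ =~ $regex_skip;
    chomp;
    say;
}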






------------------------------

Date: Mon, 16 Feb 2015 13:27:31 +0100
From: Wasell <from_usenet_2014@wasell.user32.com>
Subject: Re: Extract data with regular expressions
Message-Id: <MPG.2f4c00903b878f2698969e@news.eternal-september.org>

On Sun, 15 Feb 2015 14:51:35 -0800, in article 
<7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>, Robbie Hatley wrote:
> 
> Ok, I just tested my program and it works fine:
[snip]

Beware of the Scunthorpe problem:
  <https://en.wikipedia.org/wiki/Scunthorpe_problem>

>        next RECORD if ($_ =~ m/$BadWord/)

Possibly better:
    next RECORD if ($_ =~ m/\b$BadWord\b/)
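
A minimal sketch of the difference, assuming the bad word is a literal
string rather than a pattern. The \Q...\E quoting and both helper names
are additions for illustration, not part of the program under discussion.
Note that \b only helps when the word starts and ends with a word
character; for an entry such as "=======" it would not behave as intended.

```perl
use strict;
use warnings;

# Hypothetical helper: unanchored match, as in the original test.
sub contains_substring {
    my ($line, $word) = @_;
    return $line =~ /\Q$word\E/ ? 1 : 0;
}

# Hypothetical helper: match only when $word stands alone as a word.
sub contains_word {
    my ($line, $word) = @_;
    return $line =~ /\b\Q$word\E\b/ ? 1 : 0;
}

my $line = "They amass a fortune";
print "substring: ", contains_substring($line, "ass"), "\n";
print "word:      ", contains_word($line, "ass"), "\n";
```

With the unanchored version the line above would be rejected; with the
word-boundary version it is kept.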



------------------------------

Date: Mon, 16 Feb 2015 06:19:44 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Extract data with regular expressions
Message-Id: <veGdnS2mBebmYXzJnZ2dnUVZ572dnZ2d@giganews.com>

On 2/16/2015 4:27 AM, Wasell wrote:
> On Sun, 15 Feb 2015 14:51:35 -0800, in article
> <7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>, Robbie Hatley wrote:
>>
>> Ok, I just tested my program and it works fine:
> [snip]
>
> Beware of the Scunthorp problem:
>    <https://en.wikipedia.org/wiki/Scunthorpe_problem>
>
>>         next RECORD if ($_ =~ m/$BadWord/)
>
> Possibly better:
>      next RECORD if ($_ =~ m/\b$BadWord\b/)
>


Ah, good point. Thanks for pointing that out. Otherwise, if a person
was trying to filter out, for example, the word "ass", the script
would also reject the word "amass".


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 16 Feb 2015 01:22:07 -0500
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Multi-field sorting (general case)
Message-Id: <87wq3i4bqo.fsf@stemsystems.com>


use Sort::Maker ;

uri


------------------------------

Date: Sun, 15 Feb 2015 19:34:23 -0800 (PST)
From: senorsmile <senorsmile@gmail.com>
Subject: use Storable: failed example from Intermediate Perl
Message-Id: <69b4e4e2-6bf5-4e99-8e9d-1482a53fd1d7@googlegroups.com>

I am going through Intermediate Perl a second time, trying to absorb all of the examples.  

I am attempting to run the following code: 

use Storable;
my @data1 = qw(one won);
my @data2 = qw(two too to);
push @data2, \@data1;
push @data1, \@data2;
my $frozen = freeze [\@data1, \@data2];

which gives me a syntax error.  I have simplified it to the point where I get a more useful error: 

use Storable;
my @data1 = qw(one won);
my $frozen = freeze \@data1;

Backslash found where operator expected at -e line 4, near "freeze \"
	(Do you need to predeclare freeze?)


If I use the "fully qualified" form of freeze 
  Storable::freeze
it works fine.  I shouldn't have to do this though, right? 


------------------------------

Date: Mon, 16 Feb 2015 11:33:10 +0000
From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Subject: Re: use Storable: failed example from Intermediate Perl
Message-Id: <87pp9ahz0p.fsf@bsb.me.uk>

senorsmile <senorsmile@gmail.com> writes:

> I am going through Intermediate Perl a second time, trying to absorb
> all of the examples.
>
> I am attempting to run the following code: 
>
> use Storable;
> my @data1 = qw(one won);
> my @data2 = qw(two too to);
> push @data2, \@data1;
> push @data1, \@data2;
> my $frozen = freeze [\@data1, \@data2];
>
> which gives me a syntax error.
<snip>
> If I use the "fully qualified" form of freeze 
>   Storable::freeze
> it works fine.  I shouldn't have to do this though, right? 

No, you do have to, because freeze is not imported by default.  It's
up to the module what names you see with a default "use Storable", and
freeze is not one of those names.

You can write

  use Storable qw(freeze);

to use that name unqualified.
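
For illustration, here is the book's example with the import spelled out.
This is only a sketch: thaw has to be requested explicitly as well, and
round_trip is a hypothetical helper name used just for this example.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);   # neither name is exported by default

# Hypothetical helper: serialize a structure and rebuild a copy of it.
sub round_trip {
    my ($data) = @_;
    return thaw(freeze($data));
}

my @data1 = qw(one won);
my @data2 = qw(two too to);
push @data2, \@data1;
push @data1, \@data2;

my $copy = round_trip([\@data1, \@data2]);

# The copy keeps the shape of the original, including the mutual references.
print "elements in first array: ", scalar @{ $copy->[0] }, "\n";
print "cycle preserved\n" if $copy->[0][2] == $copy->[1];
```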

-- 
Ben.


------------------------------

Date: Mon, 16 Feb 2015 12:02:31 +0000
From: Justin C <justin.1410@purestblue.com>
Subject: Re: use Storable: failed example from Intermediate Perl
Message-Id: <nr29rb-93g.ln1@zem.masonsmusic.co.uk>

On 2015-02-16, senorsmile <senorsmile@gmail.com> wrote:
> I am going through Intermediate Perl a second time, trying to absorb all of the examples.  
>
> I am attempting to run the following code: 
>
> use Storable;
> my @data1 = qw(one won);
> my @data2 = qw(two too to);
> push @data2, \@data1;
> push @data1, \@data2;
> my $frozen = freeze [\@data1, \@data2];
>
> which gives me a syntax error.  I have simplified it to the point where I get a more useful error: 
>
> use Storable;
> my @data1 = qw(one won);
> my $frozen = freeze \@data1;
>
> Backslash found where operator expected at -e line 4, near "freeze \"
> 	(Do you need to predeclare freeze?)
>
>
> If I use the "fully qualified" form of freeze 
>   Storable::freeze
> it works fine.  I shouldn't have to do this though, right? 

Storable doesn't export 'freeze' by default. Either do what you're 
doing now or change the 'use' line:

use Storable qw/freeze/;

and import freeze.


   Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Mon, 16 Feb 2015 02:22:28 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Whitespace in code
Message-Id: <20150215175704.37@kylheku.com>

On 2015-02-15, Robbie Hatley <see.my.sig@for.my.address> wrote:
>
> On 2/9/2015 9:50 PM, $Bill wrote:
>
>> On 2/9/2015 09:57, Robbie Hatley wrote:
>> >
>> > Since then, I've switched to a different text editor on my Win8.1
>> > notebook, "Notepad++" ...
>>
>> Haven't tried it, but I would suggest you try a Win32 native port of
>> Vim or Emacs - I've been using vim (gvim) the entire time I've been
>> using a PC and it's predecessor (vi) the entire time I was on UNIX.
>
> I find the learning curve for vi to be more time-consuming than
> I can afford.

For me, that learning curve was some long-forgotten week in 1995 when I
switched to Vim.  Actually, at that time, I downloaded and compiled every open
source clone of Vi I could get my hands on. Vim clearly beat all of them (and
nothing has caught up since).

A small investment in learning long ago: big, lasting payoff.

The learning-curve argument falls flat in multiple ways: a learning
curve doesn't last forever, but its benefits endure, and anything
that has no learning curve generally isn't worth it.
If something requires next to no skill, it only holds those who
could develop skill back at the level of the unskilled.

> And it doesn't have many of the great features of
> Notepad++, such as:
>
> 1. "Workspace & Projects" panels on left side of screen like an IDE.

Silly Clutter.

> 2. Tabbed documents, like Firefox.

Vim has them (gvim). Tabs, horizontal and vertical splits, etc.

> 3. Syntax highlighting for a variety of programming languages

Vim's syntax highlighting is second to none.

I even dispatch Vim out of my web server to do syntax highlighting on the fly.

The out-of-date installation of Vim I have on this machine has well over
500 syntax definitions:

$ ls /usr/share/vim/vim73/syntax/*.vim | wc
    543     543   20836

I have experience developing a complicated syntax highlighting file for Vim.  I
don't think anything else out there has the expressivity to do an
equally accurate job. Maybe Emacs.


------------------------------

Date: Mon, 16 Feb 2015 02:25:52 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Whitespace in code
Message-Id: <20150215182235.580@kylheku.com>

On 2015-02-10, $Bill <news@todbe.com> wrote:
> I would also recommend using a UNIX shell on Windows:
>    ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
> instead of the dumb cmd.exe.

Actually, cmd.exe might be preferable to tcsh. Tough call!


------------------------------

Date: Sun, 15 Feb 2015 21:49:03 -0600
From: Martin Eastburn <lionslair@consolidated.net>
Subject: Re: Whitespace in code
Message-Id: <2NdEw.616922$z32.149484@fx30.iad>

On 2/15/2015 7:09 AM, Robbie Hatley wrote:
>
> On 2/9/2015 9:50 PM, $Bill wrote:
>
>> On 2/9/2015 09:57, Robbie Hatley wrote:
>> >
>> > Since then, I've switched to a different text editor on my Win8.1
>> > notebook, "Notepad++" ...
>>
>> Haven't tried it, but I would suggest you try a Win32 native port of
>> Vim or Emacs - I've been using vim (gvim) the entire time I've been
>> using a PC and it's predecessor (vi) the entire time I was on UNIX.
>
> I find the learning curve for vi to be more time-consuming than
> I can afford. And it doesn't have many of the great features of
> Notepad++, such as:
>
> 1. "Workspace & Projects" panels on left side of screen like an IDE.
> 2. Tabbed documents, like Firefox.
> 3. Syntax highlighting for a variety of programming languages
>
> http://www.notepad-plus-plus.org/
>
>> ...
>> I would also recommend using a UNIX shell on Windows:
>>    ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
>> instead of the dumb cmd.exe.
>
> I don't use cmd.exe because it doesn't handle #!, and because it
> doesn't come with any utilities and languages, etc. Instead, I use
> Cygwin:
>
> http://www.cygwin.com/
>
> Cygwin gives a unix-like interface to Windows. Features include:
>
> 1. Uses unix-like file path nomenclature. "C:\argle" becomes
>     "/cygdrive/c/argle", your compilers, utilities, etc, are in
>     "/usr/bin", and your home directory is by default "/home/user_name".
> 2. Comes with lots of programming languages and utilities.
> 3. Comes with a package manager to keep them all up to date.
> 4. Its shell is Bash, so you can use all of the Bash commands
>     and Bash shell scripting.
> 5. It has both 32-bit and 64-bit versions. I'm currently using
>     the 64-bit version on my 64-bit Asus notebook.
>
>
>
How about column cut and paste?  Out of a Text page not a spread sheet.

I've used it a number of times.  Handy.

Martin



------------------------------

Date: Mon, 16 Feb 2015 11:43:38 +0100
From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: Whitespace in code
Message-Id: <mbshjj$f54$1@dont-email.me>

On 15.02.15 23:03, Peter J. Holzer wrote:
> Or maybe an IDE like Eclipse.

For me, a big problem with learning vi is caused by tutorial authors
who do NOT start with everything that really is intuitive (mnemonic)
in vi, once accepted:
d for delete, r for replace, w for word, e for end,
) for end of sentence, } for end of paragraph, / for RE,
% for matching bracket etc., BUT instead start from hysterical
raisins like h j k l!  This degree of unintuitive operation
can drive anyone away, I think.

OTOH, I wouldn't want an IDE without some form of integration
with "the rest", viz. navigation, overviews, folding, manual
pages, etc. Even that is available when using plain vi on Unix,
for editing some (non-Perl?) languages. Is it available with
non-IDE syntax highlighting editors?  I need to look at Epic
again!



------------------------------

Date: Sun, 15 Feb 2015 15:12:35 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Why can I get away with this?
Message-Id: <1uidnQWMbJRCunzJnZ2dnUVZ57ydnZ2d@giganews.com>

On 2/15/2015 12:49 PM, Rainer Weikusat wrote:
> Robbie Hatley <see.my.sig@for.my.address> writes:
>> On 2/10/2015 12:11 AM, Martijn Lievaart wrote:
>>> On Mon, 09 Feb 2015 23:23:41 -0800, Robbie Hatley wrote:
>>>
>>>> Wait, there's actually one other character which *MUST* be disallowed in
>>>> file names in nearly every file system, and that's '\0', except perhaps
>>>> as the vary last character of a file name. The reason I say that is, if
>>>> you put '\0' at the beginning or middle of a file name, when Perl or the
>>>> OS tries to read back the file name, it stops reading characters when it
>>>> hits the null terminator, so that THIS file name:
>>>>
>>>>       $FileName = "斊詥觬榹苵\0匞寨蹼粿砺";
>>>>
>>>> would be foreshortened on readback to:
>>>>
>>>>       $FileName = "斊詥觬榹苵";
>>>>
>>>> and give "file not found" errors.
>>>
>>> I guess you have a C background, because the above is not logical at all.
>>>
>>> However, it is still true, see
>>> https://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29
>>
>> I'm not seeing how it's "not logical". C uses null string terminators.
>> Every version of Microsoft Windows is written mostly in C (as far as I know).
>> Perl is written in C but bypasses C's "null terminators" and allows embedded
>> '\0' in strings. However, Windows system calls for accessing directories
>> do NOT bypass C's "null terminators", and foreshorten file names which contain
>> embedded '\0'.
>
> A 'string' is defined as
>
> 	A string is a contiguous sequence of characters terminated by
> 	and including the first null character.
>
> in section 7.1.1 of   ISO/IEC 9899:1999 ("C99") and that's the start of
> chapter 7 whose title is "Library", ie, this is a convention employed by
> certain functions in the C standard library and nothing more than that:
> No actual program written in C is required to use any of these function
> and thus, honour this convention, perl itself being an example
> here. That 0-bytes are not allowed in Windows filenames is a design
> choice presumably intended to "be nice to C programmers", not the
> consequence of some law of nature or so.


Which is to say that Microsoft's programmers *could have* written the
Windows APIs so that they don't use any calls to C's standard library.

But they didn't, because refusing to use your language's standard library
and insisting on "reinventing the wheel" over and over again wastes time,
increases bugs, decreases efficiency, and increases code bloat.

So it's moot. The fact remains, if you embed '\0' in a file name, you'll
have massive problems trying to use that file name in Windows, and probably
in most operating systems and file systems, because they're mostly based
on C, and make good use of C's standard library and its idioms (which
include using '\0' as a string terminator).

Therefore, if one wants to write Perl code that's portable and will work
well in conjunction with real-world OSs, hardware, and files systems,
embedding '\0' in strings is a poor idea.
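
To illustrate the truncation being described, a small sketch in Perl.
The helper names are hypothetical, and unpack's Z* template merely stands
in for any consumer that stops reading at the first NUL; it does not call
any OS API.

```perl
use strict;
use warnings;

# Hypothetical helper: what a reader that stops at the first NUL would see.
sub as_c_string {
    my ($s) = @_;
    return unpack 'Z*', $s;
}

# Hypothetical helper: true if the string contains a NUL anywhere.
sub has_embedded_nul {
    my ($s) = @_;
    return index($s, "\0") >= 0 ? 1 : 0;
}

my $name = "report\0.txt";
print "Perl sees ", length($name), " characters\n";
print "A NUL-terminated reader sees '", as_c_string($name), "'\n";
print "Refusing to use this as a file name\n" if has_embedded_nul($name);
```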



-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 16 Feb 2015 00:24:44 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Why can I get away with this?
Message-Id: <slrnme2als.847.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-15 22:49, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
> On Thu, 12 Feb 2015 20:18:52 +0100, Peter J. Holzer wrote:
>> On 2015-02-11 23:32, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
>>> On Wed, 11 Feb 2015 21:50:35 +0100, Peter J. Holzer wrote:
>>>> How is "I observed that \0 terminates a file name, therefore I
>>>> conclude that \0 cannot be part of a file name" not logical? It may be
>>>> an overgeneralisation (e.g. there might be an escape mechanism), but
>>>> the conclusion sounds logical to me.
>>>
>>> File names cannot contain nulls. Therefore we can use C strings in the
>>> API. C strings cannot contain nulls. Therefore file names cannot
>>> contain nulls.
>>>
>>> See anything wrong with that reasoning? :-)
>> 
>> I see two things wrong with it:
>> 
>> 1) It's circuitous.
>> 
>> 2) It has nothing to do with Robbie's reasoning. You just invented that
>>     out of whole cloth to make him look like an utter idiot[1]. That's a
>>     nasty tactic, however, not very effective on Usenet, where everybody
>>     can go back and read what he really wrote. And even less effective
>>     when you actually quote that. Gee, if you're going to put word's in
>>     anybody's mouth, at least make a token effort to make it convincing.
>
> I don't agree with that, see below.
>
>> 
>>> Do note that the context here is Windows, so the posix heritage does
>>> not apply.
>> 
>> Irrelevant since Robbie didn't refer to any "POSIX heritage". He made an
>> observation (An embedded NUL character terminates a file name)
>
> It's this observation that is false...
>
>> and drew a conclusion (NUL characters in file names are disallowed).
>
> ... so this conclusion is also not valid.

But it's still logical. 

And whether Windows allows 0 characters or not, you still found it
necessary to put words in Robbie's mouth instead of simply pointing out
that his observation was incomplete.

        hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Mon, 16 Feb 2015 01:51:16 +0000 (UTC)
From: Kaz Kylheku <kaz@kylheku.com>
Subject: Re: Why can I get away with this?
Message-Id: <20150215172034.275@kylheku.com>

On 2015-02-15, Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
> here. That 0-bytes are not allowed in Windows filenames is a design
> choice presumably intended to "be nice to C programmers", not the
> consequence of some law of nature or so.

The MS-DOS services like INT 21h, function 3Dh (open file), take
null-terminated strings (a.k.a. "ASCIIZ" to x86 assembly language programmers).

Did DOS use ASCIIZ to be nice to C programmers? Probably not.

Though MS-DOS has features clearly inspired by Unix, like pipe syntax and the
. and .. directories for self and parent, these are probably the results of IBM
change requests to Microsoft; I don't think that the 86-DOS from Seattle
Computer Products had those features.  So the hypothesis that DOS used
null-terminated strings because its designers were aware of Unix isn't
plausible.

If we dig deeper, we see that delimited strings were used by CP/M.
CP/M BDOS services (like Open File) require a pointer to a file control block
(FCB) argument. In the FCB, the 8 character name and 3 character type fields
are padded with blanks. So the name FOO would be "FOO     ".

However, CP/M has a function 152 for parsing a file name, to fill the
fields in a FCB. This is used by the command processor. This function
recognizes a null character as a delimiter (among other things).
So on the CP/M command line, you would not have been able to specify
a file name with a null.
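
The blank padding described above can be sketched with pack. This models
only the 8-character name and 3-character type fields, not a complete FCB,
and fcb_name is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: build the 11-byte name+type portion, blank-padded.
sub fcb_name {
    my ($name, $type) = @_;
    # 'A8' and 'A3' pad with spaces (and truncate longer input).
    return pack 'A8 A3', uc($name), uc($type);
}

printf "[%s]\n", fcb_name('foo', 'txt');   # prints [FOO     TXT]
```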


------------------------------

Date: Mon, 16 Feb 2015 10:29:57 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <ltp8rb-58b.ln1@news.rtij.nl>

On Mon, 16 Feb 2015 00:24:44 +0100, Peter J. Holzer wrote:

> 
> And whether Windows allows 0 characters or not, you still found it
> necessary to put words in Robbies mouth instead of simply pointing out
> that his observation was incomplete.

If that is the impression I left, then I apologize. That was certainly 
not the intent.

M4



------------------------------

Date: Mon, 16 Feb 2015 10:41:25 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <5jq8rb-58b.ln1@news.rtij.nl>

On Sun, 15 Feb 2015 15:12:35 -0800, Robbie Hatley wrote:

> 
> Which is to say that Microsoft's programmers *could have* written the
> Windows APIs so that they don't use any calls to C's standard library.

I'm completely not getting what you are saying here, as the Windows API 
never uses calls to C's standard library.

> So it's moot. The fact remains, if you embed '\0' in a file name, you'll
> have massive problems trying to use that file name in Windows, and
> probably in most operating systems and file systems, because they're
> mostly based on C, and make good use of C's standard library and its
> idioms (which include using '\0' as a string terminator).

Agree on that.

> Therefore, if one wants to write Perl code that's portable and will work
> well in conjunction with real-world OSs, hardware, and files systems,
> embedding '\0' in strings is a poor idea.

If you mean embedding NULs in filenames stored as Perl strings, you just 
said above that embedding NULs in filenames in general is a bad idea, so 
obviously, it's a bad idea to embed NULs in filenames stored as Perl 
strings.

But that does not mean ('Therefore') that it is a bad idea to use NULs in 
Perl strings. There are perfectly valid uses for that, which often 
interact with real-world OSses or hardware.

M4


------------------------------

Date: Mon, 16 Feb 2015 06:31:03 -0800
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Why can I get away with this?
Message-Id: <tqqdnSAOgbe_YnzJnZ2dnUVZ57ydnZ2d@giganews.com>


On 2/16/2015 1:41 AM, Martijn Lievaart wrote:

> On Sun, 15 Feb 2015 15:12:35 -0800, Robbie Hatley wrote:
>
>>
>> Which is to say that Microsoft's programmers *could have* written the
>> Windows APIs so that they don't use any calls to C's standard library.
>
> I'm completely not getting what you are saying here, as the Windows API
> never uses calls to C's standard library.

Seeing as how the code is closed-source, I don't see how anyone could
know for sure if the programmers were "rolling their own" or making
use of the C standard library, unless one is one of the Microsoft
programmers who wrote that code.

>
>> So it's moot. The fact remains, if you embed '\0' in a file name, you'll
>> have massive problems trying to use that file name in Windows, and
>> probably in most operating systems and file systems, because they're
>> mostly based on C, and make good use of C's standard library and its
>> idioms (which include using '\0' as a string terminator).
>
> Agree on that.
>
>> Therefore, if one wants to write Perl code that's portable and will work
>> well in conjunction with real-world OSs, hardware, and files systems,
>> embedding '\0' in strings is a poor idea.
>
> If you mean embedding NULs in filenames stored as Perl strings, you just
> said above that embedding NULs in filenames in general is a bad idea, so
> obviously, it's a bad idea to embed NULs in filenames stored as Perl
> strings.
>
> But that does not mean ('Therefore') that it is a bad idea to use NULs in
> Perl strings. There are perfectly valid uses for that, which often
> interact with real-world OSses or hardware.

I meant primarily in file names; but also in any other places where
code external to the Perl script one is writing is using the C idiom
of terminating strings at '\0'. Otherwise, the string which other
code sees may get chopped off in mid st



-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 16 Feb 2015 15:57:07 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <33d9rb-58b.ln1@news.rtij.nl>

On Mon, 16 Feb 2015 06:31:03 -0800, Robbie Hatley wrote:

> On 2/16/2015 1:41 AM, Martijn Lievaart wrote:
> 
>> On Sun, 15 Feb 2015 15:12:35 -0800, Robbie Hatley wrote:
>>
>>
>>> Which is to say that Microsoft's programmers *could have* written the
>>> Windows APIs so that they don't use any calls to C's standard library.
>>
>> I'm completely not getting what you are saying here, as the Windows API
>> never uses calls to C's standard library.
> 
> Seeing as how the code is closed-source, I don't see how anyone could
> know for sure if the programmers were "rolling their own" or making use
> of the C standard library, unless one is one of the Microsoft
> programmers who wrote that code.
> 

Ahh, now I get what you are saying; yes, I agree with that. In fact, it is
likely that some sort of "standard library" was used, as that is the only
sensible way to write an OS today, or back then. (It actually was not that
long before Windows that many OSes were written purely or mostly in
assembly, Unix being the most important exception.) That often is not
"the" standard C library but an extract of it. But that is splitting
hairs.

> 
>>> So it's moot. The fact remains, if you embed '\0' in a file name,
>>> you'll have massive problems trying to use that file name in Windows,
>>> and probably in most operating systems and file systems, because
>>> they're mostly based on C, and make good use of C's standard library
>>> and its idioms (which include using '\0' as a string terminator).
>>
>> Agree on that.
>>
>>> Therefore, if one wants to write Perl code that's portable and will
>>> work well in conjunction with real-world OSs, hardware, and files
>>> systems, embedding '\0' in strings is a poor idea.
>>
>> If you mean embedding NULs in filenames stored as Perl strings, you
>> just said above that embedding NULs in filenames in general is a bad
>> idea, so obviously, it's a bad idea to embed NULs in filenames stored
>> as Perl strings.
>>
>> But that does not mean ('Therefore') that it is a bad idea to use NULs
>> in Perl strings. There are perfectly valid uses for that, which often
>> interact with real-world OSses or hardware.
> 
> I meant primarily in file names; but also in any other places where code
> external to the Perl script one is writing is using the C idiom of
> terminating strings at '\0'. Otherwise, the string which other code sees
> may get chopped off in mid st

:-)

OK, I get what you are saying. I may be on a pedantic roll here, but 
please reread what you said, because it was not what you wanted to say.

If I write Perl code, I usually do not use embedded nulls, because 
whatever I'm interfacing with cannot handle them, or more regularly, 
because it is not needed or simply makes no sense at all.

When writing pure Perl, I never feel the need to use embedded nulls.

Using embedded nulls in Perl strings only makes sense when an external 
interface demands it. So embedded nulls are actually only likely when 
talking to real-world OSses, hardware and external libraries, about the 
opposite of what you said above. :-)
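
One example of an external interface demanding NULs is a fixed-width
binary record. A sketch, assuming a made-up layout (an 8-byte NUL-padded
name followed by a 16-bit big-endian number); the layout and the helper
names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical record layout: 8-byte NUL-padded name, 16-bit big-endian id.
sub encode_record {
    my ($name, $id) = @_;
    return pack 'a8 n', $name, $id;   # 'a8' pads with NULs
}

sub decode_record {
    my ($bytes) = @_;
    return unpack 'Z8 n', $bytes;     # 'Z8' drops the NUL padding again
}

my $rec = encode_record('FOO', 42);
my $nul_count = ($rec =~ tr/\x00//);  # count NUL bytes without changing $rec
printf "%d bytes, %d of them NUL\n", length($rec), $nul_count;

my ($name, $id) = decode_record($rec);
print "$name $id\n";
```

Here the NULs are part of the wire format, so the Perl string has to
carry them.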

M4




------------------------------

Date: Mon, 16 Feb 2015 15:59:18 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <67d9rb-58b.ln1@news.rtij.nl>

On Sun, 15 Feb 2015 04:42:14 -0800, Robbie Hatley wrote:

> On 2/13/2015 5:37 PM, Kaz Kylheku wrote:
>  > On 2015-02-10, Robbie Hatley <see.my.sig@for.my.address> wrote:
>  > > I could have a file named.....
>  > > our $FileName = "\x01犬草\n\x02\N{MALE SIGN}猫\x03\x04\a\0"; print
>  > > "\n\$FileName = $FileName\n\n";
>  > >
>  > > Oh, my.
>  >
>  > So what's your point? Millions of people world round use kanji in
>  > Windows filesystem names.
> 
> Actually, that string contains Kanji, Hanzi (the superset from which
> Kanji is taken), ASCII control characters (such as "Begin Transmission"
> and "End Transmission"), the "male gender" symbol, the "alarm bell"
> character (which should make your computer go "ding" if rendered
> properly),
> and the "null" character. In other words, as George Takei would say,
> "Oh, my." :-)  So no, I don't think you're going to find anyone but a
> madman (such as myself) using all those characters in a file name.
> 
> But yes, by using Perl as interface to the Windows 8.1 file system,
> you *can* successfully create, read, and write files with such
> preposterous names. Not that I recommend that people do so!!!!!
> 
>  > > Wait, there's actually one other character which *MUST* be
>  > > disallowed in file names in nearly every file system, and that's
>  > > '\0', except
>  >
>  > Only if the OS is written in some language in which it is customary
>  > to work with null-terminated strings.
> 
> Many OSs (as well as Perl) are written in C. And while Perl does allow
> '\0' embedded in strings, many OS APIs do not, and give unwanted results
> if you try (such as, file on disk has wrong name, with some of the
> characters chopped o

Thanks!

This post sums it up completely I think, this is exactly what I was 
trying to say in my clumsy way.

M4


------------------------------

Date: Mon, 16 Feb 2015 15:56:00 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Why can I get away with this?
Message-Id: <87egpp7svj.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> On 2/15/2015 12:49 PM, Rainer Weikusat wrote:
>> Robbie Hatley <see.my.sig@for.my.address> writes:

[...]

>>> I'm not seeing how it's "not logical". C uses null string terminators.
>>> Every version of Microsoft Windows is written mostly in C

[...]

>> A 'string' is defined as
>>
>> 	A string is a contiguous sequence of characters terminated by
>> 	and including the first null character.
>>
>> in section 7.1.1 of   ISO/IEC 9899:1999 ("C99") and that's the start of
>> chapter 7 whose title is "Library", ie, this is a convention employed by
>> certain functions in the C standard library and nothing more than that:
>> No actual program written in C is required to use any of these function
>> and thus, honour this convention,

[...]

> Which is to say that Microsoft's programmers *could have* written the
> Windows APIs so that they don't use any calls to C's standard library.

No. It is to say that the C language doesn't define a data type 'string'
(it has string literals but these are specifically not required to
adhere to the library convention) and that the statement 'C uses null
string terminators' is therefore wrong: There are situations when using
null-terminated strings in C programs is convenient (but mostly not
because of the pitiful support for 'strings' the C standard library
provides) and there are times when it isn't (for instance, when dealing
with dynamically allocated, growable strings).

[...]

> So it's moot. The fact remains, if you embed '\0' in a file name, you'll
> have massive problems trying to use that file name in Windows, and probably
> in most operating systems and file systems, because they're mostly based
> on C, and make good use of C's standard library and its idioms (which
> include using '\0' as a string terminator).

This is an unsubstantiated conjecture, especially considering that there
are two kinds of C implementations, 'hosted' and 'freestanding', and
that the str* functions are not part of the library required to be
supported by freestanding implementations.

> Therefore, if one wants to write Perl code that's portable and will work
> well in conjunction with real-world OSs, hardware, and files systems,
> embedding '\0' in strings is a poor idea.

I'm not aware of any 'real-world OS, hardware or filesystem' which cares
in the slightest for the contents of Perl strings as Perl strings
(AFAIK, neither a 'Perl OS' nor a 'Perl filesystem' nor any kind of
'Perl hardware implementation' exists, and if they did, I'd expect them to
support 'features of Perl'). As soon as the corresponding data moves out
of Perl, all kinds of bizarre effects can be expected for anything which
isn't a "US-sanctioned printable character", but that's a different
conversation.



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4372
***************************************

