[33097] in Perl-Users-Digest


Perl-Users Digest, Issue: 4373 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Feb 17 00:09:20 2015

Date: Mon, 16 Feb 2015 21:09:04 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 16 Feb 2015     Volume: 11 Number: 4373

Today's topics:
    Re: Extract data with regular expressions <rweikusat@mobileactivedefense.com>
    Re: Extract data with regular expressions (Tim McDaniel)
    Re: Extract data with regular expressions <rweikusat@mobileactivedefense.com>
    Re: use Storable: failed example from Intermediate Perl <senorsmile@gmail.com>
    Re: Whitespace in code <news@todbe.com>
    Re: Whitespace in code <rweikusat@mobileactivedefense.com>
    Re: Whitespace in code <hjp-usenet3@hjp.at>
    Re: Whitespace in code <hjp-usenet3@hjp.at>
    Re: Whitespace in code <hjp-usenet3@hjp.at>
    Re: Whitespace in code <news@todbe.com>
    Re: Why can I get away with this? <hjp-usenet3@hjp.at>
    Re: Why can I get away with this? <hjp-usenet3@hjp.at>
    Re: Why can I get away with this? <hjp-usenet3@hjp.at>
    Re: Why can I get away with this? <m@rtij.nl.invlalid>
    Re: Why can I get away with this? <lionslair@consolidated.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 16 Feb 2015 16:13:56 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Extract data with regular expressions
Message-Id: <87a90d7s1n.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> On 2/16/2015 4:27 AM, Wasell wrote:
>> On Sun, 15 Feb 2015 14:51:35 -0800, in article
>> <7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>, Robbie Hatley wrote:
>>>
>>> Ok, I just tested my program and it works fine:
>> [snip]
>>
>> Beware of the Scunthorpe problem:
>>    <https://en.wikipedia.org/wiki/Scunthorpe_problem>
>>
>>>         next RECORD if ($_ =~ m/$BadWord/)
>>
>> Possibly better:
>>      next RECORD if ($_ =~ m/\b$BadWord\b/)
>
> Ah, good point. Thanks for pointing that out. Otherwise, if a person
> was trying to filter out, for example, the word "ass", the script
> would also reject the word "amass".

While this probably doesn't matter much here, a better way to do this is
to build a regex matching all 'bad words' as this will scan each
'record' only once (perl also supports building tries from a set of
regex alternatives):

-------------
#! /usr/bin/perl

use v5.14;
use strict;
use warnings;

$/ = qq(\n********************\n);

my @BadWords = qw ( asdf qwer yuio );

my $bad_re = '\b(?:'.join('|', map { quotemeta($_) } @BadWords ).')\b';

RECORD: while (<>) {
    next if /$bad_re/;
    say;
}
------------

Yet a different (and very likely slower) approach would be to build a
hash of 'bad words' and check the input word-by-word with the help of a
regex-based lexical analyser:

------------
#! /usr/bin/perl

use v5.14;
use strict;
use warnings;

$/ = qq(\n********************\n);

my @BadWords = qw ( asdf qwer yuio );

my %bad_word = map { $_, 1 } @BadWords;
    

RECORD: while (<>) {
    for ($_) {
	/\G(\w+)/gc && do {
	    next RECORD if $bad_word{$1};
	    redo;
	};

	/\G\W+/gc and redo;
    }

    say;
}
------------


------------------------------

Date: Mon, 16 Feb 2015 18:52:05 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Extract data with regular expressions
Message-Id: <mbte8l$m5p$1@reader1.panix.com>

In article <MPG.2f4c00903b878f2698969e@news.eternal-september.org>,
Wasell  <from_usenet_2014@wasell.user32.com> wrote:
>On Sun, 15 Feb 2015 14:51:35 -0800, in article 
><7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>, Robbie Hatley wrote:
>> 
>> Ok, I just tested my program and it works fine:
>[snip]
>
>Beware of the Scunthorpe problem:
>  <https://en.wikipedia.org/wiki/Scunthorpe_problem>
>
>>        next RECORD if ($_ =~ m/$BadWord/)
>
>Possibly better:
>    next RECORD if ($_ =~ m/\b$BadWord\b/)

I think this is better:

    next RECORD if /\b$BadWord\b/;

Parentheses are not needed in trailing statement modifiers, and "$_ =~"
is the default.
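
As a minimal sketch of the point above (the helper name has_bad_word and
the sample sentences are made up for illustration, and \Q...\E is added
here so that the word is matched literally):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains $bad_word as a whole word.
sub has_bad_word {
    my ($text, $bad_word) = @_;
    for ($text) {    # alias $_ to $text, as the while (<>) loop does per record
        # No parentheses and no "$_ =~": a bare match tests $_ by default.
        return 1 if /\b\Q$bad_word\E\b/;
    }
    return 0;
}

print has_bad_word('an ass brayed',   'ass') ? "rejected\n" : "kept\n";
print has_bad_word('amass a fortune', 'ass') ? "rejected\n" : "kept\n";
```

Note that \b only anchors as intended when the bad word itself begins
and ends with a word character.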

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 16 Feb 2015 21:32:34 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Extract data with regular expressions
Message-Id: <871tlp7dal.fsf@doppelsaurus.mobileactivedefense.com>

Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
> Robbie Hatley <see.my.sig@for.my.address> writes:
>> On 2/16/2015 4:27 AM, Wasell wrote:
>>> On Sun, 15 Feb 2015 14:51:35 -0800, in article
>>> <7Mudna2LWdtpv3zJnZ2dnUVZ572dnZ2d@giganews.com>, Robbie Hatley wrote:
>>>>
>>>> Ok, I just tested my program and it works fine:
>>> [snip]
>>>
>>> Beware of the Scunthorpe problem:
>>>    <https://en.wikipedia.org/wiki/Scunthorpe_problem>
>>>
>>>>         next RECORD if ($_ =~ m/$BadWord/)
>>>
>>> Possibly better:
>>>      next RECORD if ($_ =~ m/\b$BadWord\b/)
>>
>> Ah, good point. Thanks for pointing that out. Otherwise, if a person
>> was trying to filter out, for example, the word "ass", the script
>> would also reject the word "amass".

[...]

> Yet a different (and very likely slower) approach would be to build a
> hash of 'bad words' and check the input word-by-word with the help of a
> regex-based lexical analyser:

[...]

> RECORD: while (<>) {
>     for ($_) {
> 	/\G(\w+)/gc && do {
> 	    next RECORD if $bad_word{$1};
> 	    redo;
> 	};
>
> 	/\G\W+/gc and redo;
>     }

[...]

Something I'd like to add: Not all of this is really needed. Specifically,
the redo in the do block can be omitted so that one case falls through to
the other, and so can the /c flag on the 2nd regex, as a text which has
neither a word nor a non-word character at the current position has
obviously been processed completely. This leads to the following inner loop:

    for ($_) {
	next RECORD if /\G(\w+)/gc and $bad_word{$1};
	/\G\W+/g and redo;
    }

which can also be expressed as

    /\G(\w+)/gc and $bad_word{$1} and next RECORD or /\G\W+/g and redo
        for $_;

Random idea (probably not very useful): for could default to $_ in
absence of something else.
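
For anyone who wants to try the shortened loop in isolation, here is a
self-contained sketch. The wrapper sub record_is_clean and the sample
records are assumptions made for illustration; the loop body is the one
shown above, with "next RECORD" turned into a return value:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %bad_word = map { $_ => 1 } qw(asdf qwer yuio);

# Hypothetical wrapper: returns 1 if no word of the record is a bad word.
sub record_is_clean {
    my ($record) = @_;
    for ($record) {
        # Consume one word at the current position. /c keeps pos() where
        # it is when no word starts here, so the next match can continue.
        return 0 if /\G(\w+)/gc and $bad_word{$1};

        # Consume a run of non-word characters, then look at the next
        # word. If this fails too, the whole record has been scanned.
        /\G\W+/g and redo;
    }
    return 1;
}

for my $record ("foo bar baz\n", "foo asdf baz\n", "asdfx qwerty\n") {
    print record_is_clean($record) ? 'kept:     ' : 'rejected: ', $record;
}
```

Because the hash lookup compares whole words, "asdfx" and "qwerty" are
kept even though they contain bad words as substrings.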



------------------------------

Date: Mon, 16 Feb 2015 18:11:07 -0800 (PST)
From: senorsmile <senorsmile@gmail.com>
Subject: Re: use Storable: failed example from Intermediate Perl
Message-Id: <f2a9c83a-c593-4eb4-b2eb-c77a0d90c574@googlegroups.com>

On Monday, February 16, 2015 at 4:12:06 AM UTC-8, Justin C wrote:
> On 2015-02-16, senorsmile wrote:
> > I am going through Intermediate Perl a second time, trying to absorb all of the examples.  
> >
> > I am attempting to run the following code: 
> >
> > use Storable;
> > my @data1 = qw(one won);
> > my @data2 = qw(two too to);
> > push @data2, \@data1;
> > push @data1, \@data2;
> > my $frozen = freeze [\@data1, \@data2];
> >
> > which gives me a syntax error.  I have simplified it to the point where I get a more useful error: 
> >
> > use Storable;
> > my @data1 = qw(one won);
> > my $frozen = freeze \@data1;
> >
> > Backslash found where operator expected at -e line 4, near "freeze \"
> > 	(Do you need to predeclare freeze?)
> >
> >
> > If I use the "fully qualified" form of freeze 
> >   Storable::freeze
> > it works fine.  I shouldn't have to do this though, right? 
> 
> Storable doesn't export 'freeze' by default. Either do what you're 
> doing now or change the 'use' line:
> 
> use Storable qw/freeze/;
> 
> and import freeze.
> 
> 
>    Justin.
> 
> -- 
> Justin C, by the sea.

Thanks guys.  I realize now after rereading the docs on metacpan for Storable that this is indeed the case.  So, I have two (hopefully not too annoying) followup questions: 

1. Is this how it's supposed to be? (And thus, the example in the book is incomplete?)

2. What is the best way, without examining the source of the .pm, to figure out what is imported from a module by default?  i.e. How would YOU go about figuring it out in the most time-efficient manner?
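
One possible approach to question 2, offered as a sketch rather than the
definitive answer: modules built on Exporter keep their default exports
in the package array @EXPORT and their on-request exports in @EXPORT_OK,
so both can be printed after loading the module. The two helper subs are
made up for illustration; the data is the example from the original
question, with freeze and thaw imported explicitly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);    # neither is exported by default

# Hypothetical helpers: look a name up in Storable's export lists.
sub exported_by_default {
    my ($name) = @_;
    return scalar grep { $_ eq $name } @Storable::EXPORT;
}

sub exported_on_request {
    my ($name) = @_;
    return scalar grep { $_ eq $name } @Storable::EXPORT_OK;
}

print "default:    @Storable::EXPORT\n";
print "on request: @Storable::EXPORT_OK\n";

# The example from the question, now that freeze is imported.
my @data1 = qw(one won);
my @data2 = qw(two too to);
push @data2, \@data1;
push @data1, \@data2;
my $frozen = freeze([\@data1, \@data2]);
my $thawed = thaw($frozen);
print "round trip ok\n" if $thawed->[0][0] eq 'one';
```

The SYNOPSIS section of "perldoc Storable" is usually the quickest place
to look, since it shows the import line a module expects.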


------------------------------

Date: Mon, 16 Feb 2015 11:50:34 -0800
From: "$Bill" <news@todbe.com>
Subject: Re: Whitespace in code
Message-Id: <mbthl3$uuj$1@dont-email.me>

On 2/15/2015 18:25, Kaz Kylheku wrote:
> On 2015-02-10, $Bill <news@todbe.com> wrote:
>> I would also recommend using a UNIX shell on Windows:
>>     ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
>> instead of the dumb cmd.exe.
>
> Actually, cmd.exe might be preferable to tcsh. Tough call!

You must be kidding.  :)



------------------------------

Date: Mon, 16 Feb 2015 19:53:09 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Whitespace in code
Message-Id: <8761b17hwa.fsf@doppelsaurus.mobileactivedefense.com>

"$Bill" <news@todbe.com> writes:
> On 2/15/2015 18:25, Kaz Kylheku wrote:
>> On 2015-02-10, $Bill <news@todbe.com> wrote:
>>> I would also recommend using a UNIX shell on Windows:
>>>     ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
>>> instead of the dumb cmd.exe.
>>
>> Actually, cmd.exe might be preferable to tcsh. Tough call!
>
> You must be kidding.  :)

http://www.perl.com/doc/FMTEYEWTK/versus/csh.whynot


------------------------------

Date: Mon, 16 Feb 2015 22:10:12 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Whitespace in code
Message-Id: <slrnme4n5k.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 02:22, Kaz Kylheku <kaz@kylheku.com> wrote:
> On 2015-02-15, Robbie Hatley <see.my.sig@for.my.address> wrote:
>> On 2/9/2015 9:50 PM, $Bill wrote:
>>> On 2/9/2015 09:57, Robbie Hatley wrote:
>>> >
>>> > Since then, I've switched to a different text editor on my Win8.1
>>> > notebook, "Notepad++" ...
>>>
>>> Haven't tried it, but I would suggest you try a Win32 native port of
>>> Vim or Emacs - I've been using vim (gvim) the entire time I've been
>>> using a PC and its predecessor (vi) the entire time I was on UNIX.
>>
>> I find the learning curve for vi to be more time-consuming than
>> I can afford.
>
> For me, that learning curve was some long-forgotten week in 1995 when I
> switched to Vim.  Actually, at that time, I downloaded and compiled every open
> source clone of Vi I could get my hands on.

So you obviously already knew vi. Was that your learning curve for
switching from vi to vim? 

For me the learning curve for vi was probably the better part of a
semester. The basics I had within 2 hours or so (but then I already knew
the UCSD p-System editor, which was also command based), but to get
really familiar took a long time (using the damn thing on a VT102
terminal with probably not ideal termcap settings had something to do
with it. Having used KEDIT (a PC clone of the extremely powerful IBM
host XEDIT, not the KDE editor) before that was also not exactly
motivating).


> Vim clearly beat all of them (and nothing has caught up since).
>
> A small investment in learning long ago: big, lasting payoff.

Almost 30 years ago in my case. Unfortunately I haven't really kept up:
Vim is much more powerful than vi, but I use only a tiny fraction of
the new features.

> The learning curve argument falls flat in multiple ways because learning
> curves don't last forever, but their benefits are enduring, and
> because anything which has no learning curve isn't worth it, generally.
> If something requires next to no skill, it only holds back those who
> can develop skill at the same level as the unskilled.

Unfortunately, while the learning curve argument would fall flat in a
rational world, it is extremely important in today's "I need it
yesterday" world. There is never time to learn anything properly, but
there is always time to waste by doing stuff ineffectively.


> I have experience developing a complicated syntax highlighting file for Vim.

That's one of the things I never did. I should try it some time.

        hp


-- 
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) |                    | You keep polishing your text until the
| |   | hjp@hjp.at         | parts of the sentence no longer fits
__/   | http://www.hjp.at/ | together. -- Ralph Babel


------------------------------

Date: Mon, 16 Feb 2015 22:22:11 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Whitespace in code
Message-Id: <slrnme4ns3.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 10:43, G.B. <bauhaus@futureapps.invalid> wrote:
> On 15.02.15 23:03, Peter J. Holzer wrote:
>> Or maybe an IDE like Eclipse.
>
> For me, a big problem with learning vi is caused by authors
> of tutorials using this approach: NOT starting it with
> everything that really is intuitive (mnemonic) in vi,
> once accepted:
> d for delete, r for replace, w for word, e for end,
> ) for end of sentence, } for end of paragraph, / for RE,
> % for matching bracket etc., BUT to start from hysterical
> raisins like h j k l!  This degree of unintuitive operation
> can drive anyone away, I think.

Sturgeon's law: 90% of everything is crud.

This is doubly true for tutorials on the web. And probably thrice true
for vi tutorials. I'm not sure if I have ever read a really good vi
tutorial, although pieces of several tutorials helped me tremendously.

I think the most important rules with vi are:

1) vi is not a typewriter emulator. It is an editor, i.e. a tool
   for editing (= changing) text. As a programmer (and probably also as
   an author) you spend most of your time changing existing text, not
   writing new text, so that's what it is optimized for.

2) vi knows two types of commands: Actions and Movements. They 
   are orthogonal and you can combine them.

3) vi doesn't have "modes". What is described as the "insert mode" 
   in most tutorials is really just the argument to one of the insert
   commands. (Unfortunately this is no longer really true in most vi
   clones, including vim.)

        hp




------------------------------

Date: Mon, 16 Feb 2015 22:29:04 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Whitespace in code
Message-Id: <slrnme4o90.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 19:53, Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
> "$Bill" <news@todbe.com> writes:
>> On 2/15/2015 18:25, Kaz Kylheku wrote:
>>> On 2015-02-10, $Bill <news@todbe.com> wrote:
>>>> I would also recommend using a UNIX shell on Windows:
>>>>     ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
>>>> instead of the dumb cmd.exe.
>>>
>>> Actually, cmd.exe might be preferable to tcsh. Tough call!
>>
>> You must be kidding.  :)
>
> http://www.perl.com/doc/FMTEYEWTK/versus/csh.whynot

That was relative to the Bourne shell. The comparison of csh vs. cmd.exe
might produce a different result (in fact, having used command.com,
cmd.exe, sh, csh, zsh, bash, ksh, I would prefer csh to cmd.exe unless
cmd.exe has evolved a lot more in the last 20 years than I noticed).

That should be moot though - is there a system which has tcsh available
but not bash? If both are available there is no question that I would
prefer bash (and if zsh is available I prefer that).

        hp




------------------------------

Date: Mon, 16 Feb 2015 13:36:13 -0800
From: "$Bill" <news@todbe.com>
Subject: Re: Whitespace in code
Message-Id: <mbtnr8$sbr$1@dont-email.me>

On 2/16/2015 11:53, Rainer Weikusat wrote:
> "$Bill" <news@todbe.com> writes:
>> On 2/15/2015 18:25, Kaz Kylheku wrote:
>>> On 2015-02-10, $Bill <news@todbe.com> wrote:
>>>> I would also recommend using a UNIX shell on Windows:
>>>>      ftp://ftp.astron.com/pub/tcsh/  or possibly another UNIX shell port
>>>> instead of the dumb cmd.exe.
>>>
>>> Actually, cmd.exe might be preferable to tcsh. Tough call!
>>
>> You must be kidding.  :)
>
> http://www.perl.com/doc/FMTEYEWTK/versus/csh.whynot

Apples and oranges.

1) That article is about 'programming in' csh not just using it as a shell.

2) There aren't many shells with native ports to Windoze.

3) tcsh is better than csh and way more powerful than sh.

4) Korn or bash native ports would probably be acceptable also.

5) I used Cygwin a while back, but am not fond of the 'emulation layer'
    approach - I prefer native ports of UNIX commands/utilities.



------------------------------

Date: Mon, 16 Feb 2015 21:46:13 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Why can I get away with this?
Message-Id: <slrnme4lol.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 01:51, Kaz Kylheku <kaz@kylheku.com> wrote:
> On 2015-02-15, Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
>> here. That 0-bytes are not allowed in Windows filenames is a design
>> choice presumably intended to "be nice to C programmers", not the
>> consequence of some law of nature or so.
>
> The MS-DOS services like INT 21h, code 31h, take null-terminated
> strings (a.k.a. "ASCIIZ" to x86 assembly language programmers).
>
> Did DOS use ASCIIZ to be nice to C programmers? Probably not.
>
> Though MS-DOS has features inspired clearly by Unix, like pipe syntax, and .
> and .. directories for self and parent, these are probably the results of IBM
> change requests to Microsoft; I don't think that the 86-DOS from Seattle
> Computer Products had those features.  So the hypothesis that DOS used
> null-terminated strings because its designers were aware of Unix isn't
> plausible.

There was a massive change in the API of MS-DOS between 1.x and 2.x.
MS-DOS 1.x was basically a CP/M clone. MS-DOS 2.x got all the Unix
system calls which could be implemented on a single-process OS (with
some minor changes, e.g. changing the path separator from '/' to a
configurable character ('\' by default)).

The designers of MS-DOS 2.x were obviously very much aware of Unix.

        hp



------------------------------

Date: Mon, 16 Feb 2015 23:13:10 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Why can I get away with this?
Message-Id: <slrnme4qrm.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 14:57, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
> On Mon, 16 Feb 2015 06:31:03 -0800, Robbie Hatley wrote:
>> On 2/16/2015 1:41 AM, Martijn Lievaart wrote:
>>> On Sun, 15 Feb 2015 15:12:35 -0800, Robbie Hatley wrote:
>>>> Which is to say that Microsoft's programmers *could have* written the
>>>> Windows APIs so that they don't use any calls to C's standard library.
>>>
>>> I'm completely not getting what you are saying here, as the Windows API
>>> never uses calls to C's standard library.
>> 
>> Seeing as how the code is closed-source, I don't see how anyone could
>> know for sure if the programmers were "rolling their own" or making use
>> of the C standard library, unless one is one of the Microsoft
>> programmers who wrote that code.
>> 
>
> Ahh, now I get what you are saying, yes I agree with that. In fact, it is 
> likely that some sort of "standard library" was used, as that is the only 
> sensible way to write an OS today,

Uh, no. An OS is basically the classical example of a use of a
"freestanding" C environment, i.e. one which cannot depend on the
presence of the standard library (because many standard library
functions invoke OS functions).

There are some functions and macros which need to be present even in a
freestanding implementation, but not even the string handling
functions <string.h> are among them.

So an OS wouldn't have to use C conventions because it is written in C
(because it is really written in a subset of C which doesn't have these
conventions).

But it may have to use C conventions because it is expected that many
applications will be written in C. This was not the case for MS-DOS 1.0,
and maybe not even for Windows 1.0, but I am sure this was a major
design criterion for the Win32 API: In the mid-90's a lot of Windows
applications were written in either C or C++. Allowing file names which
could not be handled with standard C functions (like fopen) would have
been a major no-no. (Yes, I've seen the function you posted. Did anybody
ever use that? Was that even intended for native file systems, or maybe
only for some network file systems? Did the underlying file systems
allow such file names?).


>>>> Therefore, if one wants to write Perl code that's portable and will
>>>> work well in conjunction with real-world OSs, hardware, and files
>>>> systems, embedding '\0' in strings is a poor idea.
>>>
>>> If you mean embedding NULs in filenames stored as Perl strings, you
>>> just said above that embedding NULs in filenames in general is a bad
>>> idea, so obviously, it's a bad idea to embed NULs in filenames stored
>>> as Perl strings.
>>>
>>> But that does not mean ('Therefore') that it is a bad idea to use NULs
>>> in Perl strings. There are perfectly valid uses for that, which often
>>> interact with real-world OSses or hardware.

Right. Perl strings can represent a lot of things (for example, I
frequently store bitmaps or pictures in Perl strings), and they certainly
can contain null bytes (or characters). But some things which can be
stored in Perl strings (like file names, host names, or people's names)
cannot contain null characters.

> If I write Perl code, I usually do not use embedded nulls, because 
> whatever I'm interfacing with cannot handle them, or more regularly, 
> because it is not needed or simply makes no sense at all.
>
> When writing pure Perl, I never feel the need to use embedded nulls.
>
> Using embedded nulls in Perl strings only makes sense when an external 
> interface demands it. So embedded nulls are actually only likely when 
> talking to real-world OSses, hardware and external libraries, about the 
> opposite of what you said above. :-)

As I wrote above, it depends on what your string represents: A
human-readable text will probably never include a null character[1] (but
it may include a null byte if you use an encoding where a non-null
character can contain null byte(s), e.g. UTF-16). A string (like a file
name or a host name)
which - in addition to being human-readable - has to conform to some
externally imposed restriction, is even less likely to contain null
characters. But a string as "a sequence of bytes/values" handled by a
perl script? Yes, of course that can contain zeros: Bitmaps, binary
files, audio data (or any sequence of physical measurements), etc.
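
A small sketch of the distinction drawn above, with made-up sample
values (the helper count_nuls is hypothetical): a Perl string is a
counted sequence, so zero bytes are ordinary content; the same text can
gain null bytes through its encoding; and only a C-style view of the
data stops at the first NUL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: count the NUL characters in a string.
sub count_nuls {
    my ($s) = @_;
    return scalar($s =~ tr/\x00//);
}

# "A sequence of values": three 16-bit big-endian integers.
# The bytes are 00 00 01 02 ff ff, so two of them are zero.
my $packed = pack('n3', 0, 258, 65535);
printf "packed: length=%d nuls=%d values=%s\n",
    length($packed), count_nuls($packed), join(',', unpack('n3', $packed));

# Human-readable text without null characters, but with null bytes
# once it is encoded as UTF-16BE (00 48 00 69).
my $utf16 = encode('UTF-16BE', 'Hi');
printf "utf16:  length=%d nuls=%d\n", length($utf16), count_nuls($utf16);

# The C convention: the 'Z*' template stops at the first NUL.
my $c_view = unpack('Z*', "abc\0def");
print "c view: $c_view\n";
```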

        hp

[1] I'm again showing my age by providing a counter-example: There were
    terminals where NUL-bytes had to be inserted after CRNL to ensure
    that it was ready to print the start of the new line. A text file
    intended to be printed on such a terminal might have included NUL
    characters (although at least on UNIX it was the terminal driver's
    duty to include such padding). Also, in the early days of MS-DOS,
    some text files included (specific) printer codes for bold-face,
    italic, font-sizes, etc. Some of those may have included NUL
    characters. But it's probably better to regard such files as
    "binaries".



------------------------------

Date: Mon, 16 Feb 2015 23:35:35 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Why can I get away with this?
Message-Id: <slrnme4s5n.fa6.hjp-usenet3@hrunkner.hjp.at>

On 2015-02-16 15:56, Rainer Weikusat <rweikusat@mobileactivedefense.com> wrote:
> Robbie Hatley <see.my.sig@for.my.address> writes:
>> On 2/15/2015 12:49 PM, Rainer Weikusat wrote:
>>> Robbie Hatley <see.my.sig@for.my.address> writes:
>>>> I'm not seeing how it's "not logical". C uses null string terminators.
>>>> Every version of Microsoft Windows is written mostly in C
>>> A 'string' is defined as
>>>
>>> 	A string is a contiguous sequence of characters terminated by
>>> 	and including the first null character.
>>>
>>> in section 7.1.1 of   ISO/IEC 9899:1999 ("C99") and that's the start of
>>> chapter 7 whose title is "Library", ie, this is a convention employed by
>>> certain functions in the C standard library and nothing more than that:
>>> No actual program written in C is required to use any of these functions
>>> and thus, honour this convention,
>
> [...]
>
>> Which is to say that Microsoft's programmers *could have* written the
>> Windows APIs so that they don't use any calls to C's standard library.
>
> No. It is to say that the C language doesn't define a data type 'string'
> (it has string literals but these are specifically not required to
> adhere to the library convention)

This is a rather bold statement. 

There are some library functions which don't expect or produce
null-terminated strings. There is also one[1] circumstance in which a
string literal will not result in a null-terminated sequence of bytes.
But in general the rule for string literals and library functions is the
same: A string literal is translated to a sequence of bytes terminated
with '\0', and library functions dealing with "strings" expect and
produce the same. Indeed this is the very definition of a "string" in
the standard: "A string is a contiguous sequence of characters
terminated by and including the first null character." (Oh, wait, that
was already posted by some "Rainer Weikusat". Are you sure there aren't
two people using that name?)
A sequence of chars may or may not be terminated by '\0', but if it
isn't, it's not a "string" as defined by the standard[2] (just like a
sequence of unsigned short is not a string, although it may represent a
sequence of characters encoded in UTF-16).

        hp

[1] AFAIK only one. But I'll probably think of a second one 5 minutes
    after sending this posting.

[2] Every standard gets (and needs) to define its terms. The C
    standard's use of "string" may be a bit idiosyncratic, but it's not
    worse than its definition of "byte" or "object".



------------------------------

Date: Tue, 17 Feb 2015 00:34:29 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Why can I get away with this?
Message-Id: <5dbarb-58b.ln1@news.rtij.nl>

On Mon, 16 Feb 2015 23:13:10 +0100, Peter J. Holzer wrote:

[big snip]

Absolutely.

> [1] I'm again showing my age by providing a counter-example: There were
>     terminals where NUL-bytes had to be inserted after CRNL to ensure
>     that it was ready to print the start of the new line. A text file

Funny, I just posted in another newsgroup about the ASR 33, which is 
basically a TELEX modified for ASCII. Talk about age showing.

:-)

M4


------------------------------

Date: Mon, 16 Feb 2015 20:50:42 -0600
From: Martin Eastburn <lionslair@consolidated.net>
Subject: Re: Why can I get away with this?
Message-Id: <n0yEw.75643$Oy4.50516@fx17.iad>

On 2/16/2015 5:34 PM, Martijn Lievaart wrote:
> On Mon, 16 Feb 2015 23:13:10 +0100, Peter J. Holzer wrote:
>
> [big snip]
>
> Absolutely.
>
>> [1] I'm again showing my age by providing a counter-example: There were
>>      terminals where NUL-bytes had to be inserted after CRNL to ensure
>>      that it was ready to print the start of the new line. A text file
>
> Funny, I just posted in another newsgroup about the ASR 33, which is
> basically a TELEX modified for ASCII. Talk about age showing.
>
> :-)
>
> M4
>
The ASR (A-sync send receive) 33 is the 8-level version of the ever
popular ASR 32, which is 5-level and had replaced the many older block
pin machines.

The 32 and 33 were floating-drum machines - think of the IBM floating
ball - in a barrel.  They were TTYs. I have a 33 today, with manuals.

My first 32 was on a net, but soon off.  The 33 was the first quality
printer for my PC - back in 75.  We used them in research and with PDP's 
all of the time.  The Floating ball from IBM was a beautiful PDP printer 
term in the lab.  The first one went to a friend in trade so he
could put it on a Micro Processor of the famed PDP-11.  That was about
10 years ago now.

The mainframe (IBM 1620) used drum printers with upper and lower case.
They were 600 pages an hour.  All caps were twice as fast.

Martin




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4373
***************************************

