[33136] in Perl-Users-Digest


Perl-Users Digest, Issue: 4414 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Tue Apr 14 16:09:19 2015

Date: Tue, 14 Apr 2015 13:09:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Tue, 14 Apr 2015     Volume: 11 Number: 4414

Today's topics:
    Re: "Deep Recursion" warning on factorial script. <bauhaus@futureapps.invalid>
    Re: "Deep Recursion" warning on factorial script. <rweikusat@mobileactivedefense.com>
    Re: "Deep Recursion" warning on factorial script. <lionslair@consolidated.net>
    Re: "Deep Recursion" warning on factorial script. <news@todbe.com>
    Re: "Deep Recursion" warning on factorial script. <bauhaus@futureapps.invalid>
    Re: "Deep Recursion" warning on factorial script. <rweikusat@mobileactivedefense.com>
        -d test recognizes Windows 8.1 SYMLINKD as being direct <see.my.sig@for.my.address>
    Re: -d test recognizes Windows 8.1 SYMLINKD as being di <rweikusat@mobileactivedefense.com>
    Re: -d test recognizes Windows 8.1 SYMLINKD as being di <gravitalsun@hotmail.foo>
    Re: Fun With Unicode <hjp-usenet3@hjp.at>
    Re: Fun With Unicode (Seymour J.)
    Re: Regex replace line breaks (correction) <see.my.sig@for.my.address>
    Re: Regex replace line breaks <whynot@pozharski.name>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 13 Apr 2015 17:14:17 +0200
From: Georg Bauhaus <bauhaus@futureapps.invalid>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <mggmef$r65$1@dont-email.me>

On 13.04.15 16:15, Rainer Weikusat wrote:
> Georg Bauhaus <bauhaus@futureapps.invalid> writes:
>> On 12.04.15 23:20, Rainer Weikusat wrote:
>>> The languages are different.
>>
>> Any language is different from any other than itself. But
>> not every difference is a false friend, let alone of necessity(*).
>
> This is a case where usage diverged in the past despite common roots and
> as of today, the historic divergence is a fact of life which has to be
> dealt with.

Sure, but there is no need to repeat mistakes, provided that there
demonstrably was a mistake. As I said, '=' needs a little introduction
and then some, because people usually learn at school that '=' is used
to express equality. So, for them, i = i + 1 just looks impossible.

> Unless some  context supplies an interpretation, (A . B) doesn't mean
> anything.

Precisely!

>> Without an elitist approach, though, there is no excuse for
>> interpreting everything differently once again, IMHO.
>>
>> 3 + 4 = -1
>>
>> "The languages are just different, you know. '+' stands for
>> subtraction, as we have already used '-' as the negation operator."
>
> 'Mathematically', + is an operator symbol and it represents some kind of
> operation defined on the members of some set.

Right, similarly '=' stands for something, mathematically.

> Consequently,
>
> 3 + 4 = 0.75
>
> for some definition of +.
>
> Or
>
> ("3" + "4").equals("34")
>
> but
>
> ("34" - "4")
>
> is an error and not "3".

Yes, a syntax error, given the above.

> [...]
>
>> http://www.i-programmer.info/history/people/144-dijkstra.html?start=1

> when people are given sets of abstract rules they
> must obey but are free to apply all of their ingenuity in order to
> continue "writing FORTRAN in any language"

IOW, these people would still not produce a good structure?
Does a "non-structured PL" using overloaded math symbols
demonstrably alleviate the pain? ((Sub)contraries lurking...)

>  IOW, I consider this a very bad article.

Badly written? It says that "structured programming" would have been
more acceptable if Dijkstra had been a different person. So, again,
human influence on the transfer of knowledge.



------------------------------

Date: Mon, 13 Apr 2015 17:58:13 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <87sic46k4a.fsf@doppelsaurus.mobileactivedefense.com>

Georg Bauhaus <bauhaus@futureapps.invalid> writes:
> On 13.04.15 16:15, Rainer Weikusat wrote:
>> Georg Bauhaus <bauhaus@futureapps.invalid> writes:
>>> On 12.04.15 23:20, Rainer Weikusat wrote:
>>>> The languages are different.
>>>
>>> Any language is different from any other than itself. But
>>> not every difference is a false friend, let alone of necessity(*).
>>
>> This is a case where usage diverged in the past despite common roots and
>> as of today, the historic divergence is a fact of life which has to be
>> dealt with.
>
> Sure, but there is no need to repeat mistakes, provided that there
> demonstrably was a mistake. As I said, '=' needs a little introduction
> and then some, because people usually learn at school that '=' is used
> to express equality. So, for them, i = i + 1 just looks impossible.

That some people get the opportunity to force their ideas of 'sensible
syntax' onto children at an early age, and that these ideas stem from a
tradition even older than the one behind the ideas about sensible
behaviour the Catholic church likes to propagate (coincidentally, also
by forcing them onto children at an early age, which has "always been
this way"), doesn't mean they're right and everybody else is
wrong[*].

IMHO, there is no "right" or "wrong" in this question, just people
who assert that their opinion must be right because it's theirs.

[*] Similarly to something Dijkstra said, one could state that no one
    who has been brought up internalizing that single letters picked from
    two or three different scripts make sensible variable names can ever
    become a decent programmer; at best, he will end up an expert
    machine trickster (NB: This is total nonsense in both cases).

[...]

>>> http://www.i-programmer.info/history/people/144-dijkstra.html?start=1
>
>> when people are given sets of abstract rules they
>> must obey but are free to apply all of their ingenuity in order to
>> continue "writing FORTRAN in any language"
>
> IOW, these people would still not produce a good structure?

As a practical example, consider the following pseudo-code (loosely
inspired by SEAM):

if (...) {
    .
    if (...) {
        .
    } else {
        .
    }
    .
} else {
    if (...) {
        .
    } else {
        .
    }
    .
    if (...) {
        .
    } else {
        .
    }
    .
}
.
if (...) {
    .
} else {
    .
}
.
if (...) {
    .
} else {
    .
    if (...) {
        .
    }
    .
}

[single dots are supposed to represent a (branchless) sequence of
statements]

Provided I didn't miscount them, there are 36 possible codepaths through
this thicket, IOW, it implements 36 different algorithms sharing code to
some degree. Now, please compare this with the following statement (from
the first page of the article you linked to):

	Imagine that you have in front of you the text of a well written
	program. Now take a pair of scissors and cut it up into random
	chunks. Shuffle them and stick back together with tape. The
	resulting program now doesn't work but to make it work all you
	have to do is put GOTO instructions at the start and end of each
	shuffled chunk to make the order of execution the same as it was
	before the shuffle.

Is there a non-superficial difference[*] between the two?

[*] Beyond the absence of goto in the first example and the second
    being a simplified example.
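The count of 36 codepaths can be checked mechanically. A minimal sketch, assuming a hypothetical encoding of the pseudo-code as nested Perl arrays (the names `paths` and `$program` are made up for illustration): a sequence multiplies the path counts of its parts, an if/else adds the counts of its two branches, and an if without an else adds one path for skipping the body.

```perl
use strict;
use warnings;

# Hypothetical representation of the pseudo-code:
#   '.'                     a branchless run of statements (1 path)
#   ['seq', @items]         items executed one after another
#   ['if', $then, $else]    if/else; $else is undef when there is no else
sub paths {
    my ($node) = @_;
    return 1 unless ref $node;                 # '.' has exactly one path
    my ($kind, @parts) = @$node;
    if ($kind eq 'seq') {
        my $n = 1;
        $n *= paths($_) for @parts;            # independent choices multiply
        return $n;
    }
    my ($then, $else) = @parts;                # an 'if' node
    # the branches add up; a missing else still contributes one path
    return paths($then) + (defined $else ? paths($else) : 1);
}

my $program = ['seq',
    ['if',
        ['seq', '.', ['if', '.', '.'], '.'],                   # 2 paths
        ['seq', ['if', '.', '.'], '.', ['if', '.', '.'], '.']], # 4 paths
    '.',
    ['if', '.', '.'],                                          # 2 paths
    '.',
    ['if', '.', ['seq', '.', ['if', '.', undef], '.']],        # 3 paths
];

print paths($program), "\n";    # 36  (6 * 2 * 3)
```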


------------------------------

Date: Mon, 13 Apr 2015 23:26:12 -0500
From: Martin Eastburn <lionslair@consolidated.net>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <HF0Xw.316123$7p1.137217@fx10.iad>

John, code that changes itself and uses the address space of the
various calls and jumps cannot be done in assembly. Only machine code can
do it, as it is hardwired to specific memory and uses specific op-codes.

ASM creates all sorts of buffers, and you can't execute an op-code
stored in an address space. It works on labels and relationships,
and an address space is locked out from storing data into it and then
branching off into it. When working with a finite amount of memory,
one writes tricky code. Try a biorhythm program that draws the graphs,
and fit that into 4k of memory. ASM won't fit a program of that size
into 4k.

I have code that does it in 8080 machine code and have tried to write it
in ASM just as tightly, but I had to find a different way to implement
the functions to get the code done.

Machine code can do anything. Assembly is bound to rigid rules and
specific methods. If you start with assembly or FORTRAN or COBOL or FORTH
or Ada... you get structured and write code as the language defines.

Martin

On 4/13/2015 9:41 AM, John Black wrote:
> In article <xMFWw.126830$WX4.26009@fx17.fr7>, lionslair@consolidated.net says...
>>
>> John they are not the same thing.
>>
>> In assembly one uses Text messages:
>>
>> NOP
>> JMP addr2
>>
>> where machine :
>> 000
>> 303 000 323
>>
>> The assembly is like a low level language.  Jumps are to labels...
>> In machine a jump is to binary 16 bit (in those days) addresses.
>>
>> Much different.  One compiles Assembly into machine but fixes the code
>> with a text editor.
>>
>> Machine code is keyed in by hand, paper tape or loaded off disk.....
>>
>> Assembly becomes machine after a compile.  It is fixed in text editor
>> and compiled again until a final binary run program is found to work.
>>
>> Machine is written in executable code.  But on paper first and then
>> keyed in via switches or pads.
>>
>> Martin
>
> Martin, I am aware of what you say above.  I was responding to comments like these:
>
>>> I wrote some code that was impossible to copy into assembly.  It was too
>>> tight and efficient.
>
> There is no machine code that is "impossible" to code in assembly because it is "too tight and
> efficient".  Any code that can be written in machine code can be written in assembly more
> easily.  The assembler changes JMP to 303 which makes things more readable but changes
> nothing.  The assembler also converts labels to addresses and offsets but that is also just a
> matter of convenience (so you don't have to do it by hand) - again, that changes nothing.
> The code will be just as tight and efficient unless the coder just doesn't know what he is
> doing.
>
> John Black
>
>> On 4/11/2015 11:21 PM, John Black wrote:
>>> In article <Qj0Ww.255068$bk5.103789@fx06.iad>, lionslair@consolidated.net says...
>>>> I wrote some code that was impossible to copy into assembly.  It was too
>>>> tight and efficient.   I was able to do assembly in larger memory due to
>>>> the restrictive rules of the compiler.
>>>>
>>>> My machine code was kept to under 4k instructions.  Above that I used
>>>> assembly as it was on a larger machine
>>>
>>> I probably shouldn't argue since we're getting off topic, but assembly code and machine code
>>> are the same thing.  Assembly can use op code names like "ld" instead of numbers and can use
>>> labels instead of numeric addresses or offsets but there is a 1:1 correspondence between
>>> assembly instructions and machine code instructions.
>>>
>>> John Black
>>>
>
>


------------------------------

Date: Mon, 13 Apr 2015 21:33:34 -0700
From: "$Bill" <news@todbe.com>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <mgi59e$254$1@dont-email.me>

On 4/13/2015 21:26, Martin Eastburn wrote:
> John the code that changes itself and uses the address space of the various call and jumps can not be done in assembly.   Only Machine can do it as it is hardwired to specific memory and does specific op-codes.

That depends entirely on the assembler.  A good, flexible assembler can duplicate
anything you can write without one and has the advantage of being readable.

My first language was IBM 360 BAL (ignoring Gotran and Fortran in college).

I worked with some Univac military computers whose assembly language could pretty
much replicate anything in machine code.
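To illustrate that an assembler only automates the bookkeeping, here is a minimal sketch of a two-pass assembler for just the two 8080 instructions from Martin's example: NOP (octal 000) and JMP (octal 303, followed by a 16-bit address stored low byte first). The input syntax, the origin address and the helper name `assemble` are assumptions for illustration; a real assembler handles the full instruction set.

```perl
use strict;
use warnings;

my %opcode = (NOP => 0000, JMP => 0303);   # 8080 opcodes, written in octal
my %size   = (NOP => 1,    JMP => 3);      # bytes per instruction

# Hypothetical two-pass assembler: each line is "LABEL:" or "MNEMONIC [operand]".
sub assemble {
    my ($origin, @lines) = @_;
    my (%label, @code);

    # Pass 1: assign an address to every label.
    my $pc = $origin;
    for (@lines) {
        if (/^(\w+):$/) { $label{$1} = $pc; next }
        my ($op) = split ' ';
        $pc += $size{$op};
    }

    # Pass 2: emit bytes, replacing each label by its address.
    for (@lines) {
        next if /:$/;
        my ($op, $arg) = split ' ';
        push @code, $opcode{$op};
        if ($op eq 'JMP') {
            my $addr = $label{$arg} // die "undefined label $arg\n";
            push @code, $addr & 0xFF, $addr >> 8;   # low byte, then high byte
        }
    }
    return @code;
}

my @bytes = assemble(0xD300, 'START:', 'NOP', 'JMP START');
print join(' ', map { sprintf '%03o', $_ } @bytes), "\n";   # 000 303 000 323
```

The output matches the hand-keyed bytes in Martin's example; the label simply saved computing the address 0xD300 by hand.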


------------------------------

Date: Tue, 14 Apr 2015 13:20:56 +0200
From: Georg Bauhaus <bauhaus@futureapps.invalid>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <mgit4u$b4v$1@dont-email.me>

On 13.04.15 18:58, Rainer Weikusat wrote:

(context: i = i + 1 is impossible maths.)

> That some people get the opportunity to force their ideas of 'sensible
> syntax' onto children at an early age

Well, you mentioned facts.

> IMHO, there is no "right" or "wrong" in this question, just people
> who assert that their opinion must be right because it's theirs.

Whatever creates a comparatively large number of mistakes per effort
and at the same time correlates with any proud elite's preferences
seems worth investigating, IMHO.


>> IOW, these people would still not produce a good structure?
>
> As a practical example, consider the following pseudo-code (loosely
> inspired by SEAM):
>
> if (...) {

[big conditional mess]

> 	Imagine that you have in front of you the text of a well written
> 	program. Now take a pair of scissors and cut it up into random
> 	chunks. Shuffle them and stick back together with tape. The
> 	resulting program now doesn't work but to make it work all you
> 	have to do is put GOTO instructions at the start and end of each
> 	shuffled chunk to make the order of execution the same as it was
> 	before the shuffle.

[largely the same]

So what? Structure is not effortlessly arrived at by misunderstanding
structure as a compound of arbitrary quantities of if/while/callable-name.
That is, it doesn't stop at underlying mathematical simplification.
Hence my question. It helps, I thought, when a language adds at least:
if, while, sub, because these allow us to express in syntax what the
proud assemblist will achieve for himself with only discipline.

Check this: Perl has next, last, continue, and break while all of
them could be handled by just goto and discipline, right?
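The question can be made concrete. The following sketch (both function names are hypothetical) filters a list twice: once with next and last, and once with nothing but goto, labels and discipline. Both return the same result; only the first states its intent directly.

```perl
use strict;
use warnings;

# Keep even numbers, stopping at the first one greater than 6.
sub evens_structured {
    my @out;
    for my $n (@_) {
        next if $n % 2;        # skip odd values
        last if $n > 6;        # stop the loop
        push @out, $n;
    }
    return @out;
}

# The same control flow spelled out with goto.
sub evens_goto {
    my @in = @_;
    my @out;
    my $i = 0;
    my $n;
  TOP:
    goto DONE if $i > $#in;    # loop condition
    $n = $in[$i++];
    goto TOP if $n % 2;        # plays the role of "next"
    goto DONE if $n > 6;       # plays the role of "last"
    push @out, $n;
    goto TOP;
  DONE:
    return @out;
}

print join(' ', evens_structured(1 .. 10)), "\n";   # 2 4 6
print join(' ', evens_goto(1 .. 10)), "\n";         # 2 4 6
```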




------------------------------

Date: Tue, 14 Apr 2015 15:32:49 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <87h9si6ar2.fsf@doppelsaurus.mobileactivedefense.com>

Georg Bauhaus <bauhaus@futureapps.invalid> writes:

[assignment vs predicate, 1:1, an endless draw]

I'm going to ignore this except mentioning that

3 = 4

is no less perverse than

i = i + 1

One could argue that '3 = 4' is actually more perverse: read as an
assertion, 'i = i + 1' will be true once the assignment has been
executed, while the obvious contradiction in terms will remain a
contradiction forever.
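In Perl the two readings are kept apart syntactically: `=` is assignment and `==` is the numeric equality test. A minimal sketch showing that `$i = $i + 1` is an ordinary update, while `3 = 4`, read as an assignment, is rejected at compile time (string eval is used here only so the script itself still compiles):

```perl
use strict;
use warnings;

my $i = 3;
$i = $i + 1;                      # assignment: compute 3 + 1, store it back in $i
print "i is now $i\n";            # i is now 4
print "4 == 4\n" if $i == 4;      # == is Perl's numeric equality test

# "3 = 4" is not a false statement here but a compile-time error.
my $ok = eval '3 = 4; 1';
print "3 = 4 rejected: $@" unless $ok;
```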

>>> IOW, these people would still not produce a good structure?
>>
>> As a practical example, consider the following pseudo-code (loosely
>> inspired by SEAM):
>>
>> if (...) {
>
> [big conditional mess]
>
>> 	Imagine that you have in front of you the text of a well written
>> 	program. Now take a pair of scissors and cut it up into random
>> 	chunks. Shuffle them and stick back together with tape. The
>> 	resulting program now doesn't work but to make it work all you
>> 	have to do is put GOTO instructions at the start and end of each
>> 	shuffled chunk to make the order of execution the same as it was
>> 	before the shuffle.
>
> [largely the same]
>
> So what? Structure is not effortlessly arrived at by misunderstanding
> structure as a compound of arbitrary quantities of if/while/callable-name.
> That is, it doesn't stop at underlying mathematical simplification.
> Hence my question. It helps, I thought, when a language adds at least:
> if, while, sub, because these allow us to express in syntax what the
> proud assemblist will achieve for himself with only discipline.

I think "Computer says yes" is not so important here. Higher-level
constructs enable expressing meaning instead of mechanics, which possibly
makes them easier to understand (and thus also more powerful); i.e.,
they're useful because of their actual positive qualities (what they
enable) instead of because of their fictional negative qualities (what
someone hoped they'd prevent).


------------------------------

Date: Tue, 14 Apr 2015 06:08:40 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: -d test recognizes Windows 8.1 SYMLINKD as being directory???
Message-Id: <qLGdndMncLTFjLDInZ2dnUVZ57ydnZ2d@giganews.com>


Yikes.

I was just testing my subroutine "RecurseDirs", which
recursively navigates a directory tree down from the
current directory and applies a given subroutine to each
subdirectory. My test program uses a "print current directory"
function as the subroutine to apply to each node:


#! /usr/bin/perl
#  /rhe/scripts/test/sub-test.perl
use v5.14;
use strict;
use warnings;
use Cwd;
use RH::Dir;
sub f {
    my $dir = getcwd();
    say $dir;
}
RecurseDirs(\&f);


The results were what you'd expect, except near the bottom
of the listing. Check this out:


/rhe/.idlerc
/rhe/archives/code-samples/graphics-code
/rhe/archives/code-samples/MenuCode/ftp.oreilly.com-examples-windows-outlook.annoy
/rhe/archives/code-samples/MenuCode
 ... etc ...
 ... etc ...
 ... etc ...
 ........... hundreds more lines ................
 ... etc ...
 ... etc ...
 ... etc ...
/rhe/src/math
/rhe/src/test
/rhe/src/util
/rhe/src
WARNING!!! Filename gargoyle re-uses an existing inode!
/rhe/test-zone/tarsier
/rhe/test-zone
/rhe


The item mentioned, "gargoyle", isn't really a directory at all,
but a Windows "SYMLINKD" object. But Perl's -d test recognizes
it as being a "directory" anyway. That's problematic, because
THIS:

    /rhe/test-zone/tarsier/gargoyle

is just a SYMLINKD leading back to THIS:

    /rhe/test-zone

Which would have caused infinite recursion in RecurseDirs.
Good thing I thought to include a check for inode reuse:


#  /lib/perl5/5.14/RH/Dir.pm
package RH::Dir;
use v5.14;
use strict;
use warnings;
use Cwd;
use open qw( :encoding(utf8) :std );
use Exporter 'import';            # lets "use RH::Dir;" export RecurseDirs
our @EXPORT    = qw( RecurseDirs );
our @EXPORT_OK = qw( GetSubdirs );

# Directories already visited, keyed on "device:inode"
# (an inode number is only unique within one device).
our %seen;

sub is_new_inode {
    my ($dev, $inode) = @_;
    return !exists $seen{"$dev:$inode"};
}

sub GetSubdirs {
    my $dirpath = shift;
    my @filerecords;
    opendir(my $dirhandle, $dirpath)
       or die "Can't open directory \"$dirpath\": $!\n";
    FILE: while (my $filename = readdir($dirhandle))
    {
       next FILE if "." eq $filename;
       next FILE if ".." eq $filename;
       my $path = "$dirpath/$filename";   # test the full path, not a name relative to cwd
       next FILE if not -d $path;         # note: -d uses stat, so it follows symlinks
       my ($dev, $inode, $mode, $nlink, $size, $atime, $mtime, $ctime)
          = (stat _)[0,1,2,3,7,8,9,10];   # reuse the stat buffer filled by -d
       if (!is_new_inode($dev, $inode)) {
          say("WARNING!!! Filename $filename re-uses an existing inode!");
          next FILE;
       }
       $seen{"$dev:$inode"} = 1;
       push @filerecords,
          {
             "Name"  => $filename,
             "Dev"   => $dev,
             "Inode" => $inode,
             "Mode"  => $mode,
             "Nlink" => $nlink,
             "Size"  => $size,
             "Atime" => $atime,
             "Mtime" => $mtime,
             "Ctime" => $ctime
          };
    }
    closedir($dirhandle);
    return \@filerecords;
}

sub RecurseDirs {
    my $f = shift;
    my $DirPath = getcwd();                         # Current Working Directory
    my $subdirs = GetSubdirs($DirPath);             # ref to array
    foreach (@{$subdirs}) {                         # iterate through subdirs
       if (!chdir $_->{Name}) {                     # chdir to each
          warn "Can't chdir to \"$_->{Name}\": $!\n";
          next;
       }
       RecurseDirs($f);                             # recurse
       # Return by absolute path: after entering a symlinked directory,
       # ".." would lead to the link target's parent, not back here.
       chdir $DirPath or die "Can't chdir back to \"$DirPath\": $!\n";
    }
    $f->();                                         # execute f
}

1;    # a module must return a true value


Probably this problem is due to Perl's -d test getting its
info from Cygwin's Bash, which apparently reports a SYMLINKD
object as being just a "directory". Which is bad, because
unlike actual directories, SYMLINKD objects can link back
to an ancestor, throwing directory-walking programs into
infinite recursion if the programmer doesn't take precautions.
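A minimal sketch of the same walk using the core File::Find module instead of hand-written recursion. File::Find does not follow symbolic links unless its `follow` option is set, and the explicit `-l` test below also keeps links out of the result. This assumes Cygwin's lstat reports a Windows SYMLINKD as a symbolic link, which is not verified here; the helper name `list_dirs` is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every real directory below $top, children
# before parents (like RecurseDirs), skipping symbolic links.
sub list_dirs {
    my ($top) = @_;
    my @dirs;
    finddepth(
        {
            wanted => sub {
                return if -l $_;                        # lstat: a symlink, skip it
                push @dirs, $File::Find::name if -d _;  # reuse the lstat buffer
            },
            no_chdir => 1,                              # $_ is the full path
        },
        $top,
    );
    return @dirs;
}
```

Usage would be `print "$_\n" for list_dirs('.');` from the directory where the walk should start.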


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Tue, 14 Apr 2015 17:34:42 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: -d test recognizes Windows 8.1 SYMLINKD as being directory???
Message-Id: <87618y653x.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> I was just testing my subroutine "RecurseDirs", which
> recursively navigates a directory tree down from the
> current directory and applies a given subroutine to each
> subdirectory. My test program uses a "print current directory"
> function as the subroutine to apply to each node:
>
>
> #! /usr/bin/perl
> #  /rhe/scripts/test/sub-test.perl
> use v5.14;
> use strict;
> use warnings;
> use Cwd;
> use RH::Dir;
> sub f {
>    my $dir = getcwd();
>    say $dir;
> }
> RecurseDirs(\&f);
>
>
> The results were what you'd expect, except near the bottom
> of the listing. Check this out:
>
>
> /rhe/.idlerc
> /rhe/archives/code-samples/graphics-code
> /rhe/archives/code-samples/MenuCode/ftp.oreilly.com-examples-windows-outlook.annoy
> /rhe/archives/code-samples/MenuCode
> ... etc ...
> ... etc ...
> ... etc ...
> ........... hundreds more lines ................
> ... etc ...
> ... etc ...
> ... etc ...
> /rhe/src/math
> /rhe/src/test
> /rhe/src/util
> /rhe/src
> WARNING!!! Filename gargoyle re-uses an existing inode!
> /rhe/test-zone/tarsier
> /rhe/test-zone
> /rhe

That's consistent with the behaviour on Linux (and very likely
elsewhere),

[rw@doppelsaurus]/tmp#mkdir dir
[rw@doppelsaurus]/tmp#ln -s dir alias
[rw@doppelsaurus]/tmp#perl -e 'print -d("alias"), "\n";'
1

and also what's usually wanted because the whole point of having a
symlink is that the 2nd name can be used instead of the first. Because
of this, Perl defaults to using stat to determine the information for
the filetest operators. It's possible to use lstat instead (see stat(2)
and lstat(2)) by calling it explicitly and then using the -X operators
with _ as the argument,

[rw@doppelsaurus]/tmp#perl -e 'lstat("alias"); print -d(_), "\n";'

^^^^
 As can be seen, nothing can be seen here

or use the lstat return values directly.
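A small sketch of the lstat approach described above. The helper name `entry_type` is hypothetical; it calls lstat once and then applies -l, -d and -f to the special _ filehandle, so all three tests look at the same lstat result and a symlink to a directory is reported as a symlink.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a path without following symbolic links.
sub entry_type {
    my ($path) = @_;
    lstat($path) or return 'missing';   # fills the _ stat buffer
    return 'symlink'   if -l _;         # -l _ requires a preceding lstat
    return 'directory' if -d _;
    return 'file'      if -f _;
    return 'other';
}
```

In the /tmp example above, `entry_type('alias')` would return 'symlink' while `-d 'alias'` is still true.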


------------------------------

Date: Tue, 14 Apr 2015 21:42:10 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: -d test recognizes Windows 8.1 SYMLINKD as being directory???
Message-Id: <mgjn23$t76$1@news.grnet.gr>

Here is directory recursion on Windows with symlink support.
You need to place the free Microsoft (Sysinternals) utility
junction.exe somewhere in your PATH.

#!/usr/bin/perl
use strict;
use warnings;
# 1 = follow junctions/symlinks, 0 = skip them.
# Following a link back to an ancestor recurses forever.
use constant FOLLOW_SYM_LINKS => 1;

recursive_dir('g:/LMDE 2 Betsy');

sub recursive_dir
{
    my $dir = shift;
    opendir my $dh, $dir or die "Could not read dir \"$dir\" because \"$!\"\n";

    foreach (grep ! /^\.{1,2}$/, readdir $dh)
    {
        my $fullpath = "$dir/$_";

        if (-d $fullpath)
        {
            # Assumes junction.exe prints "No reparse points found"
            # for an ordinary directory.
            my $is_reparse_point =
                qx[junction.exe -q "$fullpath" 2>&1] =~ /No reparse points found/i ? 0 : 1;
            next if $is_reparse_point && !FOLLOW_SYM_LINKS;

            print "DIR: $fullpath\n";
            recursive_dir($fullpath);
        }
        elsif (-f $fullpath)
        {
            print "FIL: $fullpath\n";
        }
    }

    closedir $dh;
}


------------------------------

Date: Mon, 13 Apr 2015 22:14:28 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Fun With Unicode
Message-Id: <slrnmio8t4.nlp.hjp-usenet3@hrunkner.hjp.at>

On 2015-04-13 10:45, Robbie Hatley <see.my.sig@for.my.address> wrote:
> Firstly, I discovered that the first line of the original file
> started with "Byte Order Mark" or "BOM",

Windows programs have the annoying habit of writing a BOM at the start
of UTF-8 encoded files even though there is no byte order for it to
mark.

> which was "\x{EFBBBF}".

No.

> My attempts to remove that were failing. But after doing some
> research I discovered that the representation of the BOM
> inside Perl is *NOT* the same as the bytes in the file. A Unicode
> BOM at the beginning of a file is EFBBBF, but the internal
> representation in Perl is "\x{FEFF}", or "\N{BOM}" for short.

Yes. BOM is U+FEFF. 

UTF-8 is a way to map Unicode code points (e.g. U+FEFF) to sequences of
bytes (e.g. EF BB BF). The mapping has a number of desirable properties
(e.g. characters U+0000 .. U+007F are encoded as a single byte and
compatible with US-ASCII, sequences are self-terminating, partial
sequences are detectable), but it is not a byte-by-byte mapping of a
binary representation of the Unicode code point. You can't just
concatenate the hex values of some UTF-8 bytes to get a Unicode code
point, you have to compute it properly. See
http://en.wikipedia.org/wiki/UTF-8#Description 
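The computation can be spelled out for U+FEFF. A code point in the range U+0800..U+FFFF takes three bytes, 1110xxxx 10xxxxxx 10xxxxxx, filled with its top 4, middle 6 and low 6 bits. The sketch below does this by hand (the helper name `utf8_3byte` is hypothetical and covers only that range), checks it against the core Encode module, and then shows the usual fix for the original problem: decode first, then strip the single character "\x{FEFF}".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: UTF-8 encoding for code points U+0800..U+FFFF only.
sub utf8_3byte {
    my ($cp) = @_;
    return pack 'C3',
        0xE0 | ($cp >> 12),           # 1110xxxx: top 4 bits
        0x80 | (($cp >> 6) & 0x3F),   # 10xxxxxx: middle 6 bits
        0x80 | ($cp & 0x3F);          # 10xxxxxx: low 6 bits
}

my $bytes = utf8_3byte(0xFEFF);
printf "%vX\n", $bytes;                                # EF.BB.BF
print "matches Encode\n" if $bytes eq encode('UTF-8', "\x{FEFF}");

# Decoding turns the three bytes back into ONE character, U+FEFF,
# which is what a substitution on decoded text has to remove.
my $text = decode('UTF-8', "\xEF\xBB\xBFHello");
$text =~ s/^\x{FEFF}//;
print "$text\n";                                       # Hello
```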

        hp


-- 
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) |                    | You keep filing away at your text until
| |   | hjp@hjp.at         | the parts of the sentence no longer fits
__/   | http://www.hjp.at/ | together. -- Ralph Babel


------------------------------

Date: Mon, 13 Apr 2015 17:44:42 -0400
From: Shmuel (Seymour J.) Metz <spamtrap@library.lspace.org.invalid>
Subject: Re: Fun With Unicode
Message-Id: <552c38ca$9$fuzhry+tra$mr2ice@news.patriot.net>

In <s9GdnXonTOPXA7bInZ2dnUVZ57ydnZ2d@giganews.com>, on 04/13/2015
   at 03:45 AM, Robbie Hatley <see.my.sig@for.my.address> said:

>the text in question is encoded in UTF-8.

But not appropriately.

>Firstly, I discovered that the first line of the original file
>started with "Byte Order Mark" or "BOM", which was "\x{EFBBBF}".

No, that's the UTF-8 transform, not the code point, of U+FEFF "ZERO
WIDTH NO-BREAK SPACE".

>A Unicode BOM at the beginning of a file is EFBBBF,

What gives you that idea? It has never been and never will be.

>chomp was removing the "\x0a" from "\x0d0a" and leaving the 
>"\x0d" behind.

That's because you're using a system for which \n is LF (0a), not CR
LF.

See section 6, "Byte order mark (BOM)", of RFC 3629.
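The chomp behaviour is easy to demonstrate: chomp removes only the current value of $/ (by default "\n", i.e. LF here), so a CR in front of it survives. A minimal sketch of one common fix, using \R (available since Perl 5.10), which matches a CR LF pair as a single line break:

```perl
use strict;
use warnings;

my $line = "text\x0d\x0a";      # a line from a CR LF file, read on a LF system

my $chomped = $line;
chomp $chomped;                 # removes only "\n" (LF)
printf "after chomp: %d chars, last is 0x%02X\n",
    length $chomped, ord substr($chomped, -1);       # 5 chars, last is 0x0D

my $fixed = $line;
$fixed =~ s/\R\z//;             # \R matches CR LF, LF or CR as one line break
printf "after s/\\R\\z//: %d chars\n", length $fixed; # 4 chars
```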


-- 
Shmuel (Seymour J.) Metz, SysProg and JOAT  <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action.  I reserve the
right to publicly post or ridicule any abusive E-mail.  Reply to
domain Patriot dot net user shmuel+news to contact me.  Do not
reply to spamtrap@library.lspace.org



------------------------------

Date: Mon, 13 Apr 2015 12:44:55 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Regex replace line breaks (correction)
Message-Id: <GPydnWiXG_8qgbHInZ2dnUVZ572dnZ2d@giganews.com>


Oops. Yesterday I wrote:

> ASCII:
> CR = \r = \x0d
> LF = \n = \x0a
>
> EBCDIC:
> CR = \r = \x0d
> LF = \n = \x25

But that's not quite correct. While EBCDIC does have CR and LF
characters, it doesn't use them for newline. Instead, unlike
ASCII, ISO-8859-1, or UTF-8 (which, in common practice, do *not*
use a dedicated "newline" character), EBCDIC uses a character
called NL=NewLine, encoded as \x15.

So instead of the above, I should have written:

ASCII (on Unix):
    Perl Entity:     ASCII entity:                         Encoding:
    \r ("Return" ) = CR   ("Carriage Return")            = \x0d
    \n ("Newline") = LF   ("Line Feed")                  = \x0a

1252 (on Windows):
    Perl Entity:     1252 entity:                          Encoding:
    \r ("Return" ) = CR   ("Carriage Return")            = \x0d
    \n ("Newline") = CRLF ("Carriage Return, Line Feed") = \x0d\x0a

EBCDIC (on certain IBM mainframes):
    Perl Entity:     EBCDIC entity:                        Encoding:
    \r ("Return" ) = CR   ("Carriage Return")            = \x0d
    \n ("Newline") = NL   ("New Line")                   = \x15
    None           = LF   ("Line Feed")                  = \x25

Which is why I'm saying that if one is writing a Perl script for
a particular target machine, one should not use \n as "newline",
because that assumes "development machine" == "target machine",
which might not be true, and makes the script non-portable.
Instead, better to end lines with the numerical code for "newline"
on the target computer: \x9B or \x15 or \x0d\x0a or whatever it
happens to be.
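For comparison, a sketch of the layer-based alternative. Inside Perl on Unix and Windows, "\n" is the single character LF; on native Windows Perl it is the default :crlf I/O layer that turns it into CR LF on output. The target's line ending can therefore be chosen per filehandle rather than hard-coded into every string. The helper name `write_with_layer` is hypothetical; File::Temp supplies a scratch file, and the file is read back with :raw to show the actual bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through $layer, return the bytes on disk.
sub write_with_layer {
    my ($layer, $text) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    binmode $fh, $layer;
    print {$fh} $text;
    close $fh or die "close: $!";
    open my $in, '<:raw', $name or die "open $name: $!";
    local $/;                                # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

printf "length of \"\\n\" inside Perl: %d\n", length "\n";     # 1
printf "%-6s %vd\n", $_, write_with_layer($_, "hi\n") for ':raw', ':crlf';
# :raw   104.105.10
# :crlf  104.105.13.10
```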


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Tue, 14 Apr 2015 10:08:13 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Regex replace line breaks
Message-Id: <slrnmipf6t.jf6.whynot@orphan.zombinet>

with <N-6dnVhqGcxBNbbInZ2dnUVZ572dnZ2d@giganews.com> Robbie Hatley
wrote:
> On 4/12/2015 12:14 AM, Eric Pozharski wrote:

*SKIP*  [ Your sarcasm is so sarcastic ]
> So why are you promoting s/\v+/\n/g as a solution to the OP's stated
> problem? It's not.

Because it's Perl.

*CUT*
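For readers following the thread: in a Perl regex, \v matches any vertical whitespace character (LF, VT, FF, CR, and the Unicode NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR). So s/\v+/\n/g normalizes CR LF, lone CR and LF alike, but it also collapses runs of blank lines into a single line break, which may or may not be what the OP wanted. A short sketch with made-up input:

```perl
use strict;
use warnings;

my $text = "one\r\ntwo\rthree\n\n\nfour";
(my $normalized = $text) =~ s/\v+/\n/g;    # every run of vertical whitespace => one "\n"
print $normalized, "\n";
# Prints one, two, three and four on consecutive lines:
# the two blank lines before "four" are gone.
```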

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4414
***************************************
