[33132] in Perl-Users-Digest

Perl-Users Digest, Issue: 4409 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Apr 8 06:09:19 2015

Date: Wed, 8 Apr 2015 03:09:05 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 8 Apr 2015     Volume: 11 Number: 4409

Today's topics:
        "Deep Recursion" warning on factorial script. <see.my.sig@for.my.address>
    Re: "Deep Recursion" warning on factorial script. <gamo@telecable.es>
    Re: One more reason I like Perl. <see.my.sig@for.my.address>
    Re: One more reason I like Perl. <see.my.sig@for.my.address>
    Re: One more reason I like Perl. <rweikusat@mobileactivedefense.com>
    Re: One more reason I like Perl. <bauhaus@futureapps.invalid>
    Re: One more reason I like Perl. <rweikusat@mobileactivedefense.com>
    Re: One more reason I like Perl. (Jens Thoms Toerring)
        Perl Architect (Modern Perl) <chrisvalentine1970@gmail.com>
    Re: Regex replace line breaks <see.my.sig@for.my.address>
    Re: Regex replace line breaks <see.my.sig@for.my.address>
    Re: Regex replace line breaks <news@todbe.com>
    Re: Regex replace line breaks <see.my.sig@for.my.address>
    Re: Regex replace line breaks <gamo@telecable.es>
    Re: Regex replace line breaks <rweikusat@mobileactivedefense.com>
    Re: Regex replace line breaks <see.my.sig@for.my.address>
    Re: Regex replace line breaks <whynot@pozharski.name>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 07 Apr 2015 18:08:37 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: "Deep Recursion" warning on factorial script.
Message-Id: <7qGdnSECMLoT4rnInZ2dnUVZ572dnZ2d@giganews.com>


I cooked-up and ran the following script today and it worked fine,
but gave a "Deep Recursion on line 19" warning (referring to the
line that calls the subroutine "Factorial"). I had to put in the
'no warnings "recursion"' line to get Perl to stop complaining:

#! /usr/bin/perl
#  factorial-table-100
use v5.14;
use strict;
use warnings;
use Math::BigInt;
no warnings "recursion";
sub Factorial ($);
for ( my $i = 1 ; $i <= 100 ; ++$i ) {
    my $x = Math::BigInt->new($i);
    printf("%3d! = %158s\n", $x, Factorial($x));
}
exit 0;
sub Factorial ($) {
    my $x = shift;
    return (1 == $x) ? 1 : $x * Factorial($x - 1);
}

Is there a way to warn Perl in advance that one is going to
put, say, 100 levels of recursive function calls on the stack,
so it should prepare for that? Or is saying 'no warnings "recursion"'
the only way to shut-up the warnings?
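
For reference, perldiag describes this warning as firing once a
subroutine has called itself 100 times more than it has returned, which
is exactly what Factorial(100) does. Apart from 'no warnings "recursion"',
one option is to avoid recursion altogether. A minimal sketch, assuming
a loop-based helper (the name factorial_iter is made up for this
illustration and is not part of the script above):

```perl
#! /usr/bin/perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: builds n! with a loop instead of recursion,
# so the "Deep recursion" warning has nothing to trigger on.
sub factorial_iter {
    my $n      = shift;
    my $result = Math::BigInt->new(1);
    $result *= $_ for 2 .. $n;    # empty range for n < 2, leaving 1
    return $result;
}

printf("%3d! = %s\n", $_, factorial_iter($_)) for 1 .. 100;
```

The loop keeps the call depth at one no matter how large the argument
gets, at the cost of no longer mirroring the textbook recursive
definition.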


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Wed, 08 Apr 2015 06:11:59 +0200
From: gamo <gamo@telecable.es>
Subject: Re: "Deep Recursion" warning on factorial script.
Message-Id: <mg29r3$db2$1@speranza.aioe.org>

On 08/04/15 at 03:08, Robbie Hatley wrote:
>
> I cooked-up and ran the following script today and it worked fine,
> but gave a "Deep Recursion on line 19" warning (referring to the
> line that calls the subroutine "Factorial"). I had to put in the
> 'no warnings "recursion"' line to get Perl to stop complaining:
>
> #! /usr/bin/perl
> #  factorial-table-100
> use v5.14;
> use strict;
> use warnings;
> use Math::BigInt;
> no warnings "recursion";
> sub Factorial ($);
> for ( my $i = 1 ; $i <= 100 ; ++$i ) {
>     my $x = Math::BigInt->new($i);
>     printf("%3d! = %158s\n", $x, Factorial($x));
> }
> exit 0;
> sub Factorial ($) {
>     my $x = shift;
>     return (1 == $x) ? 1 : $x * Factorial($x - 1);
> }
>
> Is there a way to warn Perl in advance that one is going to
> put, say, 100 levels of recursive function calls on the stack,
> so it should prepare for that? Or is saying 'no warnings "recursion"'
> the only way to shut-up the warnings?
>
>

It seems that the recursive calls counter is an accumulator.

Try this instead:

#!/usr/bin/perl -w

use bigint;
use Memoize;

memoize 'factorial';

for (1..200) {
     print "$_ ", factorial($_), "\n";
}

sub factorial {
     my $i = shift;
     return 1 if $i <2;
     return $i*factorial($i-1);
}
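
Why this helps can be checked with a small experiment. The sketch
below is an assumption-laden illustration, not part of the script
above: it adds two made-up counters ($depth and $max_depth) to record
how deeply the function nests. Because the table is built in ascending
order, each call should find its predecessor already cached, so the
nesting stays far below the 100 levels at which perldiag says the
warning fires.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Memoize;

# Illustrative counters: current and deepest nesting of factorial().
my ($depth, $max_depth) = (0, 0);

sub factorial {
    my $n = shift;
    $depth++;
    $max_depth = $depth if $depth > $max_depth;
    my $result = $n < 2 ? 1 : $n * factorial($n - 1);
    $depth--;
    return $result;
}

memoize('factorial');

# Plain floating-point numbers are enough here; only the depth matters.
factorial($_) for 1 .. 150;
print "deepest nesting seen: $max_depth\n";
```

Commenting out the memoize line and rerunning is a quick way to compare
the two behaviours.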





-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Mon, 06 Apr 2015 23:46:13 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: One more reason I like Perl.
Message-Id: <4YadnYNoypiz4L7InZ2dnUVZ572dnZ2d@giganews.com>


On 4/2/2015 2:30 AM, Jens Thoms Toerring wrote:

> I'd rather not be an unsuspecting user of one of your programs;-)
> A program that segfaults has a serious defect that must be cor-
> rected.

The C++ version of my "substitute" script I threw together as
quickly as I could (a couple of hours), with no debugging at all.
It seems to kinda "work", but when given invalid input, obviously
fails in a much more violent way than the Perl version does.

Which is one advantage Perl (and other interpreted languages) has
over languages which compile source code to frozen binaries:
"real-time input nanny" functionality. Basically, when putting
input into evals, Perl invokes its compiler in a "JIT" fashion,
so you get compiler errors in real-time, as you enter data.

Whereas, an exe program can't do that, because it has no compiler
available at run time, so it just *assumes* the input is valid,
and when it isn't, it does violent things such as tries to write
to memory it doesn't own.

> It's no substitute for checking the validity of the arguments

Isn't it? Seems to me that Perl did a reasonably good job of
argument validity checking even when the programmer (me) didn't.

> since it just tells the user that the programmer has
> made a mistake.

Oh, it says more than that. The Perl version gave some very
detailed statements about the nature of the errors in the
input. (Though in language only a Perl programmer could
understand, admittedly.)

And one can say that it's actually the user that makes the
"mistake", because the nature of the program is that it will
only work correctly when its "user" is a Perl programmer
well-versed in regular expressions. That's its target audience.

> And having a script spit out these messages from
> the Perl interpreter is perhaps nice during development,
> but it's not a "user interface".

Depends on which "users". This is a loaded gun, not a nerf ball.
It's not safe for kids, crazies, crooks, or newbs at Perl or REs.
But a sweet tool for someone who knows Perl and REs well and needs
to do massive substitutions in files in a hurry. Change "cat" to
"dog" and "Cat" to "Dog" globally in a 837MB text file? No problem.

(First, let's change the name of the script to "s" after s///,
to keep command lines shorter.)

s 'cat' 'dog' 'g' < input.txt | s 'Cat' 'Dog' 'g' > output.txt

> So, what you've demonstrated, as far as I can see, is that it's
> a bit easier to write small throw-away programs in Perl than in
> C++. But nobody ever had any doubts about that, I guess...

If I ever did, this exercise removed them.  Lordy, must have took
me about 2 hours to convert that simple 5min script from Perl to
C++, and it didn't even work as well. Not the language for that
kind of thing, obviously. Like using a battleship to go squirrel
hunting.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 06 Apr 2015 23:46:36 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: One more reason I like Perl.
Message-Id: <4YadnYJoypjZ4L7InZ2dnUVZ570AAAAA@giganews.com>


On 4/2/2015 5:26 AM, Rainer Weikusat wrote:

> ... a sensible user interface should check whether arguments from outside
> the program make any sense[*] prior to trying to use them and should print
> intelligible error messages enabling correction of the problem but
> technically, this remains an operator error.
>
> [*] ... because the 'heartbleed' publicity stunt could have been avoided
> in this way :->

Well, ya. Admittedly, my "substitute.perl" script is like a fully-loaded
Winchester 30-06 hunting rifle. I leave it up to the user to point it in
the right direction and not shoot something he shouldn't. :-) Definitely
not a security-conscious program. And to make it so would take vastly
longer than the 5min I spent writing it. It's intended for mature
Perl programmers with thorough knowledge of Perl and REs, and no ill
intent.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Tue, 07 Apr 2015 11:49:26 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: One more reason I like Perl.
Message-Id: <87384c9ps9.fsf@doppelsaurus.mobileactivedefense.com>

Robbie Hatley <see.my.sig@for.my.address> writes:
> On 4/2/2015 2:30 AM, Jens Thoms Toerring wrote:

[...]

> Depends on which "users". This is a loaded gun, not a nerf ball.
> It's not safe for kids, crazies, crooks, or newbs at Perl or REs.
> But a sweet tool for someone who knows Perl and REs well and needs
> to do massive substitutions in files in a hurry. Change "cat" to
> "dog" and "Cat" to "Dog" globally in a 837MB text file? No problem.
>
> (First, let's change the name of the script to "s" after s///,
> to keep command lines shorter.)
>
> s 'cat' 'dog' 'g' < input.txt | s 'Cat' 'Dog' 'g' > output.txt

[rw@doppelsaurus]/tmp#cat a
cat
Dog Cat
cat dog
Cat cat Cat
[rw@doppelsaurus]/tmp#perl -pe 's/cat/dog/g; s/Cat/Dog/g' a
dog
Dog Dog
dog dog
Dog dog Dog

or use a single match

[rw@doppelsaurus]/tmp#perl -pe 's/([Cc])at/chr(ord($1)+1)."og"/eg' a
dog
Dog Dog
dog dog
Dog dog Dog

or do it in place

[rw@doppelsaurus]/tmp#perl -i -pe 's/([Cc])at/chr(ord($1)+1)."og"/eg' a
[rw@doppelsaurus]/tmp#cat a
dog
Dog Dog
dog dog
Dog dog Dog

NB: This (reportedly) won't work on "largish IBM computers" someone
might find lying around somewhere.
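
A variant of the single-match idea that does not depend on character
arithmetic is a lookup table. This is only a sketch: the names %replace
and swap_words are made up for the illustration, and like the one-liners
above it also rewrites "cat" inside longer words unless \b anchors are
added.

```perl
use strict;
use warnings;

# Map each spelling to its replacement; keys are literal words, not regexes.
my %replace = ( cat => 'dog', Cat => 'Dog' );

# Build one alternation from the keys, quoting any regex metacharacters.
my $alt = join '|', map { quotemeta } sort keys %replace;

sub swap_words {
    my $text = shift;
    $text =~ s/($alt)/$replace{$1}/g;    # one pass, replacement looked up per match
    return $text;
}

print swap_words("Cat cat Cat\n");
print swap_words("Cornish pasty\n");
```

Adding another pair of words then only means adding another entry to
the hash.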


------------------------------

Date: Tue, 07 Apr 2015 13:15:16 +0200
From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: One more reason I like Perl.
Message-Id: <mg0e6d$a97$1@dont-email.me>

On 07.04.15 12:49, Rainer Weikusat wrote:
> [rw@doppelsaurus]/tmp#perl -pe 's/([Cc])at/chr(ord($1)+1)."og"/eg' a
> dog
> Dog Dog
> dog dog
> Dog dog Dog

:)

Or, if one wanted a quick transformation of cats into dogs:

$ perl -pe 'tr/Ccat/Ddog/' a




------------------------------

Date: Tue, 07 Apr 2015 14:57:31 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: One more reason I like Perl.
Message-Id: <87wq1o11o4.fsf@doppelsaurus.mobileactivedefense.com>

"G.B." <bauhaus@futureapps.invalid> writes:
> On 07.04.15 12:49, Rainer Weikusat wrote:
>> [rw@doppelsaurus]/tmp#perl -pe 's/([Cc])at/chr(ord($1)+1)."og"/eg' a
>> dog
>> Dog Dog
>> dog dog
>> Dog dog Dog
>
> :)
>
> Or, if one wanted a quick transformation of cats into dogs:
>
> $ perl -pe 'tr/Ccat/Ddog/' a

The idea is neat but it comes with certain side effects:

[rw@doppelsaurus]/tmp#cat a
Cornish pasty
[rw@doppelsaurus]/tmp#perl -pe 'tr/Ccat/Ddog/' a
Dornish posgy


------------------------------

Date: 8 Apr 2015 10:02:45 GMT
From: jt@toerring.de (Jens Thoms Toerring)
Subject: Re: One more reason I like Perl.
Message-Id: <cokcm3F8enqU1@mid.uni-berlin.de>

Robbie Hatley <see.my.sig@for.my.address> wrote:

> On 4/2/2015 2:30 AM, Jens Thoms Toerring wrote:

> > I'd rather not be an unsuspecting user of one of your programs;-)
> > A program that segfaults has a serious defect that must be cor-
> > rected.

> The C++ version of my "substitute" script I threw together as
> quickly as I could (a couple of hours), with no debugging at all.
> It seems to kinda "work", but when given invalid input, obviously
> fails in a much more violent way than the Perl version does.

If it took you several hours then you must have counted in
the time for writing and debugging your own regex handling
library. Writing that was your own decision and completely
unrelated to the task at hand since there are enough of them
out there you could have used and thus shortened development
time by 99%.

A C++ program comparable to your Perl script (but already
doing a bit more of argument checking, actually all that
is necessary) would look like this

#include <iostream>
#include <string>
#include "rhregex.h"   // your own regex library, as before

int main( int argc, char * argv[ ] ) {
    if ( argc < 3 ) {
        std::cerr << "Need at least 2 arguments: pattern and replacement\n";
        return 1;
    }

    // argv[argc] is guaranteed to be a null pointer, so this is safe for argc == 3
    char flag = ( argv[3] && *argv[3] == 'g' ) ? 'g' : 'n';
    std::string  line;

    while ( std::getline( std::cin, line ) )
        std::cout << rhregex::Substitute( argv[1], argv[2], line, flag )
                  << std::endl;
    return 0;
}

Of course, a help() function to print out information on what
the program is supposed to do and what input to supply to it
would be nice, but your Perl script doesn't do that either.

Don't tell me that writing and compiling this would take more than
a couple of minutes. Not much longer or harder to read than your
Perl script. And it won't segfault unless you made some mistakes
in your rhregex::Substitute() function (which you can't blame the
language for).

> Which is one advantage Perl (and other interpreted languages) has
> over languages which compile source code to frozen binaries:
> "real-time input nanny" functionality. Basically, when putting
> input into evals, Perl invokes its compiler in a "JIT" fashion,
> so you get compiler errors in real-time, as you enter data.

That's all nice for short throw-away programs no-one else is
ever going to have to use. But with e.g. a program that takes
hours to run I'd much prefer getting a warning that something
looks fishy before I start it than after it had run for 5 hours
and only then bails out... And, ok, you can't do 'eval' in C++,
but how often is that really needed? You get some advantages
when using an interpreted language and you trade them for some
other problems (like: what do you do if there's no Perl inter-
preter on the machine? How much slower is a Perl script compared
to a compiled program? etc.).

> Whereas, an exe program can't do that, because it has no compiler
> available at run time, so it just *assumes* the input is valid,
> and when it isn't, it does violent things such as tries to write
> to memory it doesn't own.

No, that's a clear sign that the programmer f**ked up. Don't
blame the tool for the incompetence of the user of the tool.
Some people simply shouldn't be left alone in a room with a
chain saw (or a kitchen knife, in some cases;-). No program
ever should assume that external input is what it wants, it
always has to do those checks, regardless of what language is
used. Not doing that is either incompetence or laziness (or
both) and has led to vast problems. There's a reason why Perl
has the "taint" mode, I guess;-)
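
In Perl such a check need not be long. The following is a minimal
sketch under stated assumptions: the script takes a pattern, a
replacement and optional flags as its three arguments, and the name
check_args as well as the accepted flag letters are invented for the
illustration. It covers the pattern and the flags only; a replacement
that is later handed to eval would need separate scrutiny.

```perl
use strict;
use warnings;

# Hypothetical checker: returns a list of complaints, empty if the
# arguments look usable.
sub check_args {
    my ($pattern, $replacement, $flags) = @_;
    my @problems;
    push @problems, 'pattern and replacement are both required'
        unless defined $pattern && defined $replacement;
    if (defined $pattern) {
        # Compile inside eval so that a syntax error is caught, not fatal.
        my $ok = eval { my $re = qr/$pattern/; 1 };
        push @problems, 'pattern does not compile as a regex' unless $ok;
    }
    push @problems, 'flags may only contain the letters g, i, m, s, x'
        if defined $flags && $flags !~ /\A[gimsx]*\z/;
    return @problems;
}

# Try one broken and one well-formed argument list.
for my $args ( [ 'c(at', 'dog', 'g' ], [ 'cat', 'dog', 'g' ] ) {
    my @problems = check_args(@$args);
    print "$args->[0]: ", ( @problems ? join( '; ', @problems ) : 'ok' ), "\n";
}
```

A real script would print the complaints together with a usage line and
exit with a non-zero status instead of carrying on.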

> > It's no substitute for checking the validity of the arguments

> Isn't it? Seems to me that Perl did a reasonably good job of
> argument validity checking even when the programmer (me) didn't.

No, all the Perl interpreter did was dump a report about
its internal state, telling you that you didn't do your job
of input validation and left it to you to figure out what had
gone wrong.

As I already said, this is nice during development, but no "user
interface". Your script is quite short, so it's relatively easy
to spot what may have gone wrong. But take something more com-
plicated and try to use it in 6 months time when you've forgotten
what you had done there and you'll probably have a hard time
understanding what these error messages from the Perl interpreter
are supposed to tell you.

> > since it just tells the user that the programmer has
> > made a mistake.

> Oh, it says more than that. The Perl version gave some very
> detailed statements about the nature of the errors in the
> input. (Though in language only a Perl programmer could
> understand, admittedly.)

All of which a user of your program has no need to know.
If your Perl script spits out such stuff YOU have messed up.

> And one can say that it's actually the user that makes the
> "mistake"

That's something I violently disagree with. A user should not
be able to make any mistakes that aren't caught by the program
- which then should give a hopefully clear explanation what the
user is supposed to supply as input instead. Blaming the user
for being too lazy to write those checks is unfortunately com-
mon, but is utterly wrong.

> , because the nature of the program is that it will
> only work correctly when its "user" is a Perl programmer
> well-versed in regular expressions. That's its target audience.

That reduces your user base considerably, doesn't it?
If Perl scripts were all written only for the use by
experienced Perl programmers there wouldn't be (m)any, I
guess...
                         Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de


------------------------------

Date: Mon, 6 Apr 2015 08:38:15 -0700 (PDT)
From: C Valentine <chrisvalentine1970@gmail.com>
Subject: Perl Architect (Modern Perl)
Message-Id: <a17963b4-d881-44e6-9519-d54fb9e44154@googlegroups.com>

Job Summary 
To build software applications in an agile environment and to provide the technical architecture of web-based software products with Modern Perl.

Perl Architect/Developer

Responsibilities
* Designing and/or building of prototypes for new features 
* Developing/updating online documentation
* Liaising with development partners on all aspects of the SDLC and integration

Skills and requirements
* Experience as a Perl Architect/Developer 
* Solid understanding of analysis, coding and documentation practices
* Analysing needs and product requirements to create a design
* Modern object orientated Perl (Moose, Catalyst)
* MySQL, ORMs (DBIx::Class, Class::DBI, Rose::DB, CPAN)

Nice to Haves:
* Full stack development experience
* PSGI Frameworks
* Continuous Integration / Test Driven development with Siesta or Selenium
* Puppet/Jenkins/DevOPS tools             
* Postgres/Sybase/Mysql/memcache/activemq/Git
* Load balanced high traffic transaction web applications


Thanks,

Chris 
VP of Consulting Services

Credent Technologies, LLC
chris@credenttech.com
Work: (972) 891-3053


------------------------------

Date: Mon, 06 Apr 2015 22:13:05 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Regex replace line breaks
Message-Id: <QvCdnf4LzarH-r7InZ2dnUVZ572dnZ2d@giganews.com>


On 4/4/2015 7:41 PM, Robert Crandal wrote:

> I am reading and storing an entire text file into a single
> string variable.  My goal is to replace all line feed (LF) or
> carriage return (CR) characters with a single CR
> character.

Are you sure? On most systems, that won't do what you think it does.

On Unix, Linux, and Cygwin, your lines will all overprint
each other, because they take you at your word, and do NOT
give you line feeds between lines, but just keep returning
the carriage (cursor, actually) to the left of the line
and over-writing.

On Microsoft OSs and software, CR alone does nothing. In Notepad,
for example, it just prints your text as one mammoth line,
with no line breaks at all.

But if you want to do it, one way is like so:

#! /usr/bin/perl
#  /rhe/scripts/test/line-end-adjust-test.perl
use v5.14;
use strict;
use warnings;
my $var  = "111111111111111\x0d";
    $var .= "222222222222222\x0a";
    $var .= "333333333333333\x0a\x0a";
    $var .= "444444444444444\x0d\x0d";
    $var .= "555555555555555\x0d\x0a";
    $var .= "666666666666666\x0a\x0d";
    $var .= "777777777777777\x0d\x0d\x0a\x0a";
    $var .= "888888888888888\x0d\x0a\x0a\x0d";
$var =~ s/(?:\x0d|\x0a)+/\x0d/g; # Convert all line ends to \x0d
print $var;


I redirected the output to a file and opened the file in a
hex editor, and the contents looked like THIS:

31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 0D
32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 0D
33 33 33 33 33 33 33 33 33 33 33 33 33 33 33 0D
34 34 34 34 34 34 34 34 34 34 34 34 34 34 34 0D
35 35 35 35 35 35 35 35 35 35 35 35 35 35 35 0D
36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 0D
37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 0D
38 38 38 38 38 38 38 38 38 38 38 38 38 38 38 0D

Which proves that my RE is converting all those different
clusterings of \x0a and \x0d into just one \x0d.

But again, that's probably NOT what you want.
If I cat the file in Cygwin, I just get:

888888888888888

(That's all 8 lines, but you only see the last, because they over-write
each other.)

And if I look at the file in Notepad, I see:

111111111111111222222222222222333333333333333444444444444444555... (etc)

(That is, no line breaks at all.)

Whereas, if I use \x0a instead of \x0d, at least in Cygwin it prints
correctly:

111111111111111
222222222222222
333333333333333
444444444444444
555555555555555
666666666666666
777777777777777
888888888888888

So perhaps what you really want is the following?

$var =~ s/(?:\x0d|\x0a)+/\x0a/g; # Convert all line ends to \x0a
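
On Perl 5.10 and later there is also \R, which perlrebackslash
documents as a generic line-break escape, so the same collapse can be
sketched without spelling out the bytes. One caveat: \R also matches
vertical tab, form feed and the Unicode line separators, so it is
somewhat broader than (?:\x0d|\x0a). The sample string and the variable
names below are made up for the illustration:

```perl
use v5.14;
use strict;
use warnings;

# Four "lines" ended by CR LF, LF CR, CR CR LF LF and a lone CR.
my $var = "aaa\x0d\x0abbb\x0a\x0dccc\x0d\x0d\x0a\x0addd\x0d";

# /r (Perl 5.14+) returns the changed copy and leaves $var untouched.
my $unix = $var =~ s/\R+/\x0a/gr;

# Show the result as hex bytes so the line ends are visible.
say join ' ', map { sprintf '%02X', ord } split //, $unix;
```

Each run of line breaks, whatever its mixture, should come out as a
single 0A in the dump.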


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 06 Apr 2015 22:26:45 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Regex replace line breaks
Message-Id: <vZqdnQ63tfQQ977InZ2dnUVZ57ydnZ2d@giganews.com>


On 4/4/2015 8:25 PM, gamo wrote:

> On 05/04/15 at 04:41, Robert Crandal wrote:
>> I am reading and storing an entire text file into a single
>> string variable.  My goal is to replace all line feed (LF) or
>> carriage return (CR) characters with a single CR
>> character.
>>
>> Basically, I need to ensure that all paragraphs in my
>> string are single-spaced with a single CR character.
>>
>> The problem is, I am reading text files from different sources,
>> so I am seeing different representations of "line breaks".
>> Sometimes it is just CR, or CR LF, or CR CR LF, etc...
>> In byte form, a CR equals hex character "0D" and
>> LF is hex character "0A".
>>
>> As an example, suppose my string is:
>>
>> "Hello world!<CRLF><CRLF>Bye world!"
>>
>> I want to change it to:
>>
>> "Hello world!<CR>Bye world!"  // single spaced.
>>
>> This seems to be a job for regular expressions, especially
>> since there are different ways to represent line breaks.
>> For my purposes, assume that a single line break may be
>> any of these:  CR,  CR LF, or CR CR LF.
>>
>> How can I single-space my string with regular expressions?
>>
>>
>
> It seems easy. Try to slurp in a variable all the text, and then
> substitute \r by nothing.
>
> (untested)
>
> local $/;   # undefined $/ slurps the whole file ("" would select paragraph mode)
>
> my $var = <>;
>
> $var =~ s/\r\r/\r/g;
> $var =~ s/\r\n/\n/g;

I believe that the ordinals of '\r' and '\n' are implementation dependent.
They *usually* equate to '\x0d' and '\x0a' respectively, but I seem to recall
reading that that's not true on all systems.

Also, your REs don't handle all possible clusterings.

I think the following would be more portable and more complete, if a person
really wanted to change each cluster of \x0a's and/or \x0d's into \x0d :

$var =~ s/(?:\x0d|\x0a)+/\x0d/g;

Though, on Unix, Linux, and Cygwin, \x0a would work better.

And with Microsoft software, \x0d\x0a works better.

(Personally, though, I use \x0a as line end on text files even in Windows,
because I use Notepad++, which understands Unix line endings, and also
has other nifty features such as syntax highlighting for Perl, C++, etc.
That way Perl scripts I write on my Windows notepad work fine on a Linux
desktop unaltered.)
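
The three choices above can be put behind one small function. This is
just a sketch: the table keys (unix, windows, cr_only) and the name
normalize_line_ends are invented for the illustration, and like the
substitution above it collapses every run of \x0a and \x0d into one
line end, so blank lines disappear.

```perl
use v5.10;
use strict;
use warnings;

# Line-end byte sequences by target; the key names are illustrative.
my %EOL = (
    unix    => "\x0a",
    windows => "\x0d\x0a",
    cr_only => "\x0d",
);

sub normalize_line_ends {
    my ($text, $target) = @_;
    my $eol = $EOL{$target} // die "unknown target '$target'\n";
    $text =~ s/[\x0a\x0d]+/$eol/g;    # each cluster becomes one line end
    return $text;
}

# Dump the result as hex bytes so the line ends are visible.
my $out = normalize_line_ends("one\x0d\x0d\x0atwo\x0a", 'windows');
say join ' ', map { sprintf '%02X', ord } split //, $out;
```

Using explicit byte values rather than \r and \n keeps the output the
same regardless of the platform Perl itself is running on.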


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Mon, 06 Apr 2015 22:59:53 -0700
From: "$Bill" <news@todbe.com>
Subject: Re: Regex replace line breaks
Message-Id: <mfvrn4$8dk$1@dont-email.me>

On 4/6/2015 22:13, Robbie Hatley wrote:
>
> But if you want to do it, one way is like so:
>
> #! /usr/bin/perl
> #  /rhe/scripts/test/line-end-adjust-test.perl
> use v5.14;
> use strict;
> use warnings;
> my $var  = "111111111111111\x0d";
>     $var .= "222222222222222\x0a";
>     $var .= "333333333333333\x0a\x0a";
>     $var .= "444444444444444\x0d\x0d";
>     $var .= "555555555555555\x0d\x0a";
>     $var .= "666666666666666\x0a\x0d";
>     $var .= "777777777777777\x0d\x0d\x0a\x0a";
>     $var .= "888888888888888\x0d\x0a\x0a\x0d";
> $var =~ s/(?:\x0d|\x0a)+/\x0d/g; # Convert all line ends to \x0d
> print $var;

 ...

> So perhaps what you really want is the following?
>
> $var =~ s/(?:\x0d|\x0a)+/\x0a/g; # Convert all line ends to \x0a

I would normally just do something like this:

$var =~ s/[\r\n]+/\n/g; # convert line endings to appropriate one for OS

Should give the same results.



------------------------------

Date: Tue, 07 Apr 2015 00:25:39 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Regex replace line breaks
Message-Id: <ybCdnceuc-HxG77InZ2dnUVZ572dnZ2d@giganews.com>


On 4/6/2015 10:59 PM, $Bill wrote:

> On 4/6/2015 22:13, Robbie Hatley wrote:
>>
>> But if you want to do it, one way is like so:
>>
>> #! /usr/bin/perl
>> #  /rhe/scripts/test/line-end-adjust-test.perl
>> use v5.14;
>> use strict;
>> use warnings;
>> my $var  = "111111111111111\x0d";
>>     $var .= "222222222222222\x0a";
>>     $var .= "333333333333333\x0a\x0a";
>>     $var .= "444444444444444\x0d\x0d";
>>     $var .= "555555555555555\x0d\x0a";
>>     $var .= "666666666666666\x0a\x0d";
>>     $var .= "777777777777777\x0d\x0d\x0a\x0a";
>>     $var .= "888888888888888\x0d\x0a\x0a\x0d";
>> $var =~ s/(?:\x0d|\x0a)+/\x0d/g; # Convert all line ends to \x0d
>> print $var;
>
> ....
>
>> So perhaps what you really want is the following?
>>
>> $var =~ s/(?:\x0d|\x0a)+/\x0a/g; # Convert all line ends to \x0a
>
> I would normally just do something like this:
>
> $var =~ s/[\r\n]+/\n/g; # convert line endings to appropriate one for OS
>
> Should give the same results.

Ah. Interesting. A more "pragmatic" approach, whereas my version is
more of a "fastidious" approach.

Won't give same results in *all* cases, though.

Firstly, I doubt that the assumptions \r == \x0d and \n == \x0a
are globally true.

Secondly, the results would be the same only if the person using the
"fastidious" approach actually *is* doing the appropriate thing for
the OS the files are going to be used on.

For the most part, I like your version better, if the intent is to
use the resulting files on the same OS that Perl is running on.

However, if that's not the case, then the "fastidious" approach would
be needed to force the issue. For example, someone running Perl on
Debian Linux is making files to be used on some OS called "Hullapalooza"
that uses \x04 as "newline". Then you'd have to do something like this:

$var =~ s/[\x04\x0a\x0d]+/\x04/g; # convert line endings for "Hullapalooza"


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Tue, 07 Apr 2015 10:20:02 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Regex replace line breaks
Message-Id: <mg0416$19e$1@speranza.aioe.org>

On 07/04/15 at 09:25, Robbie Hatley wrote:
>>> $var =~ s/(?:\x0d|\x0a)+/\x0a/g; # Convert all line ends to \x0a
>>
>> I would normally just do something like this:
>>
>> $var =~ s/[\r\n]+/\n/g; # convert line endings to appropriate one for OS
>>
>> Should give the same results.
>
> Ah. Interesting. A more "pragmatic" approach, whereas my version is
> more of a "fastidious" approach.
>
> Won't give same results in *all* cases, though.
>
> Firstly, I doubt that the assumptions \r == \x0d and \n == \x0a
> are globally true.

You would have to find a non-ASCII-compatible computer, or a perl
built outside standard C, to have grounds for doubt.

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Tue, 07 Apr 2015 12:04:39 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Regex replace line breaks
Message-Id: <87y4m48aig.fsf@doppelsaurus.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> On 07/04/15 at 09:25, Robbie Hatley wrote:
>>>> $var =~ s/(?:\x0d|\x0a)+/\x0a/g; # Convert all line ends to \x0a
>>>
>>> I would normally just do something like this:
>>>
>>> $var =~ s/[\r\n]+/\n/g; # convert line endings to appropriate one for OS
>>>
>>> Should give the same results.
>>
>> Ah. Interesting. A more "pragmatic" approach, whereas my version is
>> more of a "fastidious" approach.
>>
>> Won't give same results in *all* cases, though.
>>
>> Firstly, I doubt that the assumptions \r == \x0d and \n == \x0a
>> are globally true.
>
> You would have to find a non-ASCII-compatible computer, or a perl
> built outside standard C, to have grounds for doubt.

'standard C' doesn't define any codepoints, it just requires that
certain characters exist in the execution character set. And it's also
not that simple for Perl, as \n represents whatever the environment's idea
of a linefeed character or character sequence might be. E.g., printing the
string processed above to a text file on systems designed by people
obsessed with ancient, mechanical typewriters will (reportedly)
dutifully restore the "ping -- push it back" character (I actually
spent some time typing on such a device in the past, even though I'm
still a good deal short of 60 ...).

The first section in 'perldoc perlport' has all the gory details.


------------------------------

Date: Tue, 07 Apr 2015 18:19:34 -0700
From: Robbie Hatley <see.my.sig@for.my.address>
Subject: Re: Regex replace line breaks
Message-Id: <Mq6dnasWdMyCH7nInZ2dnUVZ57ydnZ2d@giganews.com>


On 4/7/2015 1:20 AM, gamo wrote:

> On 07/04/15 at 09:25, Robbie Hatley wrote:
> > ... Firstly, I doubt that the assumptions \r == \x0d and \n == \x0a
> > are globally true....
>
> You would have to find a non-ASCII-compatible computer, or a perl
> built outside standard C, to have grounds for doubt.

Such as, EBCDIC. I'm sure there are still a few computers around running
that encoding and other encodings that don't mirror ASCII.

(That's one of several things I like about Unicode and UTF-8: firstly,
the lowest 128 codepoints mirror ASCII, and UTF-8 encodes them as the
same single bytes; and secondly, the next 128 codepoints mirror
ISO-8859-1, although UTF-8 needs two bytes for each of those. So you
get a double dose of backward compatibility before you break into stuff
like Kanji/Hanzi, Arabic, Hebrew, Tamil, Hindi, etc.)
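
The difference between a codepoint and its UTF-8 bytes is easy to see
with the core Encode module. A small sketch, with the two sample
codepoints (capital A and e-acute) chosen just for the illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0041 is in the ASCII range; U+00E9 is in the ISO-8859-1 range.
for my $cp ( 0x41, 0xE9 ) {
    my $bytes = encode( 'UTF-8', chr $cp );
    printf "U+%04X -> %s\n", $cp,
        join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
```

The first codepoint comes out as the single byte 41, the second as the
two bytes C3 A9, so a Latin-1 file is not byte-for-byte valid UTF-8
even though the codepoint numbers agree.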


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley


------------------------------

Date: Wed, 08 Apr 2015 10:49:55 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Regex replace line breaks
Message-Id: <slrnmi9nd3.d6k.whynot@orphan.zombinet>

with <Mq6dnasWdMyCH7nInZ2dnUVZ57ydnZ2d@giganews.com> Robbie Hatley wrote:
> On 4/7/2015 1:20 AM, gamo wrote:
>> On 07/04/15 at 09:25, Robbie Hatley wrote:

*SKIP*
> Such as, EBCDIC. I'm sure there are still a few computers around
> running that encoding and other encodings that don't mirror ASCII.

What does EBCDIC have to do with anything in this particular case?  May I
remind you, and your herd, that clpm has no karma thingy?

p.s.  Personally, I like George's approach most.

*CUT*

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4409
***************************************

