[33131] in Perl-Users-Digest


Perl-Users Digest, Issue: 4408 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Apr 6 09:09:15 2015

Date: Mon, 6 Apr 2015 06:09:03 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 6 Apr 2015     Volume: 11 Number: 4408

Today's topics:
    Re: Backticks status in scalar context: bug or missunde <whynot@pozharski.name>
    Re: Backticks status in scalar context: bug or missunde <rweikusat@mobileactivedefense.com>
    Re: Does "exit" closes all filehandles? <rweikusat@mobileactivedefense.com>
    Re: One more reason I like Perl. <rweikusat@mobileactivedefense.com>
        Regex replace line breaks <noreply2me@yahoo.com>
    Re: Regex replace line breaks <gamo@telecable.es>
    Re: Regex replace line breaks <gamo@telecable.es>
    Re: Regex replace line breaks <JimSGibson@gmail.com>
    Re: Regex replace line breaks <justin.1504@purestblue.com>
    Re: Regex replace line breaks <gravitalsun@hotmail.foo>
    Re: Regex replace line breaks <hjp-usenet3@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sun, 05 Apr 2015 15:18:03 +0300
From: Eric Pozharski <whynot@pozharski.name>
Subject: Re: Backticks status in scalar context: bug or missunderstanding
Message-Id: <slrnmi29vr.8mg.whynot@orphan.zombinet>

with <mfmjn9$vvf$1@rumours.uwaterloo.ca> David Canzi wrote:
> Martin Strömberg  <ams@luminous.ludd.ltu.se> wrote:
>>Peter Makholm <peter@makholm.net> wrote:
>>> Martin Strömberg <ams@luminous.ludd.ltu.se> writes:

*SKIP*
>> Ok, that explains what is happening. If I try the command
>> "/bin/false_does_not_exists", I do get undef.
> Just to increase the confusion...

When you run along without warnings, that's the price you pay.

> $o = `no-such-command x`
> Result: $o is undefined.

This is straightforward fork-and-exec.  Then B<exec()> (through failed
exec(2)) finds out that there's no 'no-such-command' in $ENV{PATH}.  And
reports it (a) through STDERR (but you need warnings for this, and you
don't) and (b) through $? (but you don't check it either).

> $o = `no-such-command *`
> Result: $o is defined.

This is fork-and-exec of $ENV{SHELL} (or whatever is appropriate) with
arguments (the -c option of /bin/sh), which in turn expands '*'; then exec(2)
finds out there's no 'no-such-command' and /bin/sh dies, which is
reported through $? (and again you don't check it).

p.s.  'exec(2)' is meta-syntactic.

-- 
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom


------------------------------

Date: Sun, 05 Apr 2015 19:03:56 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Backticks status in scalar context: bug or missunderstanding
Message-Id: <87mw2m1mgj.fsf@doppelsaurus.mobileactivedefense.com>

Eric Pozharski <whynot@pozharski.name> writes:
> with <mfmjn9$vvf$1@rumours.uwaterloo.ca> David Canzi wrote:
>> Martin Strömberg  <ams@luminous.ludd.ltu.se> wrote:
>>>Peter Makholm <peter@makholm.net> wrote:
>>>> Martin Strömberg <ams@luminous.ludd.ltu.se> writes:
>
> *SKIP*
>>> Ok, that explains what is happening. If I try the command
>>> "/bin/false_does_not_exists", I do get undef.
>> Just to increase the confusion...
>
> When you run along without warnings, that's the price you pay.

"People can't keep from making remarks about that", even when totally
beside the point (as here) ...

>> $o = `no-such-command x`
>> Result: $o is undefined.
>
> This is straightforward fork-and-exec.  Then B<exec()> (through failed
> exec(2)) finds out that there's no 'no-such-command' in $ENV{PATH}.  And
> reports it (a) through STDERR (but you need warnings for this, and you
> don't) and (b) through $? (but you don't check it either).

$! is also set to the error encountered by the child. 

>
>> $o = `no-such-command *`
>> Result: $o is defined.
>
> This is fork-and-exec of $ENV{SHELL} (or whatever is appropriate) with
> arguments (the -c option of /bin/sh), which in turn expands '*'

This is 'execution via system command interpreter', which is /bin/sh on
UNIX(*). It doesn't take the environment variable SHELL into account (at
least not in the version I tested, 5.14.2).
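
A minimal sketch of checking both variables after backticks, assuming a
Unix-like system where /bin/sh exits with status 127 for an unknown
command. The helper name run_and_report is hypothetical, and
'no-such-command' is assumed not to exist in PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with backticks and describe how it ended.
sub run_and_report {
    my ($cmd) = @_;
    my $out = `$cmd`;
    my ($status, $errno) = ($?, "$!");   # copy at once, later calls may overwrite them

    # -1 means the command could not be started at all; $! says why.
    return "could not execute: $errno" if $status == -1;
    # The low seven bits hold the signal number if the child was killed.
    return sprintf('killed by signal %d', $status & 127) if $status & 127;
    # Otherwise the exit status is in the high byte.
    return sprintf('exit status %d, output %s',
        $status >> 8, defined $out ? 'defined' : 'undefined');
}

print run_and_report('no-such-command x'), "\n";   # no metacharacters: Perl execs directly
print run_and_report('no-such-command *'), "\n";   # '*' forces a trip through /bin/sh -c
```

The first call should take the -1 branch, because the failed exec happens
in Perl's own child. The second should report an exit status from the
shell, with defined (empty) output, which matches the two results quoted
above.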


------------------------------

Date: Sat, 04 Apr 2015 15:31:25 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Does "exit" closes all filehandles?
Message-Id: <87vbhcufr6.fsf@doppelsaurus.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> I wonder what could happen if not.

Yes (if you have it, you could use strace to observe that). Closing the
file descriptors themselves isn't really necessary as they'll cease to
be when the process dies but any output still sitting in this or that
buffer has to be flushed.
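
A minimal sketch of the buffer point, assuming a Unix-like system with
fork. It contrasts plain exit with POSIX::_exit, which skips Perl's
shutdown; that contrast is an addition for illustration, and the helper
name bytes_written_by_child is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: a child writes 13 bytes without flushing, then ends
# either via exit (Perl's normal shutdown) or POSIX::_exit (no cleanup).
# Returns the resulting file size in bytes.
sub bytes_written_by_child {
    my ($use_plain_exit) = @_;
    my ($tmp, $path) = tempfile();
    close $tmp;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        open my $out, '>', $path or POSIX::_exit(2);
        print {$out} 'buffered data';     # stays in the PerlIO buffer
        exit 0 if $use_plain_exit;        # flushes and closes open handles
        POSIX::_exit(0);                  # ends at once, the buffer is lost
    }
    waitpid $pid, 0;
    my $size = (stat $path)[7];
    unlink $path;
    return $size;
}

printf "exit:   %d bytes\n", bytes_written_by_child(1);
printf "_exit:  %d bytes\n", bytes_written_by_child(0);
```

The file descriptor goes away with the process in both cases; only the
plain exit gets the buffered bytes onto disk first.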


------------------------------

Date: Sun, 05 Apr 2015 21:47:04 +0100
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: One more reason I like Perl.
Message-Id: <87zj6m8fqv.fsf@doppelsaurus.mobileactivedefense.com>

Georg Bauhaus <bauhaus@futureapps.invalid> writes:
> On 02.04.15 20:14, Rainer Weikusat wrote:
>>> Not everyone agrees that "invalid input" is subsumed under
>>> "precondition not satisfied":[...]
>>
>> A 'precondition' is a condition which has to be true prior to starting
>> to execute an algorithm for it to work as intended.
>> Eg, this program
>>
>> ---------
>> #include <stdio.h>
>> #include <string.h>
>>
>> int main(int argc, char **argv)
>> {
>>      argv[1][strlen(argv[1]) - 4] = 'b';
>>      puts(*argv);
>>      puts(argv[1]);
>>
>>      return 0;
>> }
>> ---------
>>
>> will print argv[0] followed by argv[1] with the character 4 characters
>> in front of the end of argv[1] replaced by b provided an argv[1] whose
>> length is at least 4 was actually supplied (otherwise, its behaviour is
>> undefined).
>
> Interesting twist. Redefines input as not input but as whatever happens
> to have been put in^H^Hfor parameters to the program...

It redefines "input" as "information flowing into the code from the
outside", with "command line arguments" being the easiest to use
for a quick example ...

>
>>> I'd then find it odd to characterize 'heartbleed' as an operator
>>> error,
>>
>> I didn't. I just mentioned this because "don't implement input validation" is a
>> very renowned academic/ scientific practice and nobody was ever stripped
>> of his laurel wreath just because of that.
>
> This is worrisome. Do you have facts to back this up?

I think I just mentioned the "How to cause a global crypto disaster [for
fun and profit]" master thesis and the person who caused the disaster
certainly got less flak for that than I got for an (IMHO) minor
oversight in some code I (carelessly) contributed to Linux in 2011. The
other prominent example is, of course, the original BSD fingerd. But
nearly each and every "buffer overflow", "SQL injection vulnerability",
"cross something request forgery" etc was/ is ultimately caused by
code processing input from untrusted sources without prior
validation. And most people who work "in the industry" do have
university degrees ...


------------------------------

Date: Sat, 4 Apr 2015 19:41:11 -0700
From: "Robert Crandal" <noreply2me@yahoo.com>
Subject: Regex replace line breaks
Message-Id: <3NKdnfwD5t1XPb3InZ2dnUVZ5sqdnZ2d@giganews.com>

I am reading and storing an entire text file into a single
string variable.  My goal is to replace all line feed (LF) or
carriage return (CR) characters with a single CR
character.

Basically, I need to ensure that all paragraphs in my
string are single-spaced with a single CR character.

The problem is, I am reading text files from different sources,
so I am seeing different representations of "line breaks".
Sometimes it is just CR, or CR LF, or CR CR LF, etc...
In byte form, a CR equals hex character "0D" and
LF is hex character "0A".

As an example, suppose my string is:

"Hello world!<CRLF><CRLF>Bye world!"

I want to change it to:

"Hello world!<CR>Bye world!"  // single spaced.

This seems to be a job for regular expressions, especially
since there are different ways to represent line breaks.
For my purposes, assume that a single line break may be
any of these:  CR,  CR LF, or CR CR LF.

How can I single-space my string with regular expressions?






------------------------------

Date: Sun, 05 Apr 2015 05:25:00 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Regex replace line breaks
Message-Id: <mfq9ur$5um$1@speranza.aioe.org>

On 05/04/15 at 04:41, Robert Crandal wrote:
> I am reading and storing an entire text file into a single
> string variable.  My goal is to replace all line feed (LF) or
> carriage return (CR) characters with a single CR
> character.
>
> Basically, I need to ensure that all paragraphs in my
> string are single-spaced with a single CR character.
>
> The problem is, I am reading text files from different sources,
> so I am seeing different representations of "line breaks".
> Sometimes it is just CR, or CR LF, or CR CR LF, etc...
> In byte form, a CR equals hex character "0D" and
> LF is hex character "0A".
>
> As an example, suppose my string is:
>
> "Hello world!<CRLF><CRLF>Bye world!"
>
> I want to change it to:
>
> "Hello world!<CR>Bye world!"  // single spaced.
>
> This seems to be a job for regular expressions, especially
> since there are different ways to represent line breaks.
> For my purposes, assume that a single line break may be
> any of these:  CR,  CR LF, or CR CR LF.
>
> How can I single-space my string with regular expressions?
>
>

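It seems easy. Slurp all the text into one variable, and then
substitute away the extra \r characters.

(untested)

local $/ = undef;       # slurp mode: read the whole input at once
                        # ("" would select paragraph mode instead)

my $var = <>;

$var =~ s/\r\r/\r/g;    # CR CR  -> CR
$var =~ s/\r\n/\n/g;    # CR LF  -> LF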



-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sun, 05 Apr 2015 05:56:50 +0200
From: gamo <gamo@telecable.es>
Subject: Re: Regex replace line breaks
Message-Id: <mfqbqg$9cs$1@speranza.aioe.org>

On 05/04/15 at 05:25, gamo wrote:
> (untested)
>
> local $/ = undef;
>
> my $var = <>;
>
> $var =~ s/\r\r/\r/g;
> $var =~ s/\r\n/\n/g;

You could also add:

$var =~ s/\r/\n/g;      # any remaining CR becomes LF
$var =~ s/\n+/\n/g;     # a run of LFs becomes one (\n\n alone only halves a run)

That way you never lose a line break, but
every run of extras is replaced by just one.

-- 
http://www.telecable.es/personales/gamo/
The generation of random numbers is too important to be left to chance


------------------------------

Date: Sat, 04 Apr 2015 22:43:53 -0700
From: Jim Gibson <JimSGibson@gmail.com>
Subject: Re: Regex replace line breaks
Message-Id: <040420152243535136%JimSGibson@gmail.com>

In article <3NKdnfwD5t1XPb3InZ2dnUVZ5sqdnZ2d@giganews.com>, Robert
Crandal <noreply2me@yahoo.com> wrote:

> I am reading and storing an entire text file into a single
> string variable.  My goal is to replace all line feed (LF) or
> carriage return (CR) characters with a single CR
> character.
> 
> Basically, I need to ensure that all paragraphs in my
> string are single-spaced with a single CR character.
> 
> The problem is, I am reading text files from different sources,
> so I am seeing different representations of "line breaks".
> Sometimes it is just CR, or CR LF, or CR CR LF, etc...
> In byte form, a CR equals hex character "0D" and
> LF is hex character "0A".
> 
> As an example, suppose my string is:
> 
> "Hello world!<CRLF><CRLF>Bye world!"
> 
> I want to change it to:
> 
> "Hello world!<CR>Bye world!"  // single spaced.
> 
> This seems to be a job for regular expressions, especially
> since there are different ways to represent line breaks.
> For my purposes, assume that a single line break may be
> any of these:  CR,  CR LF, or CR CR LF.
> 
> How can I single-space my string with regular expressions?

s/[\r\n]+/\n/g;
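
A minimal sketch of that substitution applied to the example from the
question; the wrapper name single_space is hypothetical. Note that it
emits LF, so the replacement would need to be \r if a CR is really
wanted.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub single_space {
    my ($text) = @_;
    $text =~ s/[\r\n]+/\n/g;   # any run of CRs and LFs becomes one LF
    return $text;
}

my $result = single_space("Hello world!\r\n\r\nBye world!");
print $result eq "Hello world!\nBye world!" ? "single-spaced\n" : "unexpected\n";
```

Because the character class is greedy over the whole run, blank lines
between paragraphs disappear as well, which is what "single-spaced"
asks for.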

-- 
Jim Gibson


------------------------------

Date: Sun, 5 Apr 2015 12:44:03 +0100
From: Justin C <justin.1504@purestblue.com>
Subject: Re: Regex replace line breaks
Message-Id: <3pj7vb-ell.ln1@moonlight.purestblue.com>

On 2015-04-05, Robert Crandal <noreply2me@yahoo.com> wrote:
> I am reading and storing an entire text file into a single
> string variable.  My goal is to replace all line feed (LF) or
> carriage return (CR) characters with a single CR
> character.
>
> Basically, I need to ensure that all paragraphs in my
> string are single-spaced with a single CR character.
>
> The problem is, I am reading text files from different sources,
> so I am seeing different representations of "line breaks".
> Sometimes it is just CR, or CR LF, or CR CR LF, etc...
> In byte form, a CR equals hex character "0D" and
> LF is hex character "0A".

[snip]

There's always dos2unix.

   Justin.


------------------------------

Date: Sun, 05 Apr 2015 18:30:56 +0300
From: George Mpouras <gravitalsun@hotmail.foo>
Subject: Re: Regex replace line breaks
Message-Id: <mfrkfr$nj5$1@news.grnet.gr>

s/\v+/\n/g
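
A minimal sketch of how \v differs from [\r\n], assuming Perl 5.10 or
later (where \v in a regex means vertical whitespace). The two function
names are hypothetical. \v also covers form feed and vertical tab, so a
page break is collapsed along with the line endings.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions for comparison.
sub collapse_crlf     { my ($t) = @_; $t =~ s/[\r\n]+/\n/g; return $t }
sub collapse_vertical { my ($t) = @_; $t =~ s/\v+/\n/g;     return $t }

my $text = "one\r\n\ftwo";   # CR LF followed by a form feed

# The character class leaves the form feed in place; \v swallows it.
print collapse_crlf($text)     eq "one\n\ftwo" ? "crlf keeps FF\n"  : "unexpected\n";
print collapse_vertical($text) eq "one\ntwo"   ? "\\v removes FF\n" : "unexpected\n";
```

Whether that is wanted depends on whether the input files use form
feeds as meaningful page separators.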


------------------------------

Date: Sun, 5 Apr 2015 17:33:41 +0200
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Regex replace line breaks
Message-Id: <slrnmi2lel.k6p.hjp-usenet3@hrunkner.hjp.at>

On 2015-04-05 02:41, Robert Crandal <noreply2me@yahoo.com> wrote:
> I am reading and storing an entire text file into a single
> string variable.  My goal is to replace all line feed (LF) or
> carriage return (CR) characters with a single CR
> character.

Any reason why you want to replace it with a CR character (standard on
MacOS 9 and earlier and pretty much nothing else) and not an LF
character (standard in Perl, C, and pretty much any programming
language with roots in Unix) or a CRLF sequence (standard in most
internet protocols as well as MS-DOS and Windows)? 


> The problem is, I am reading text files from different sources,
> so I am seeing different representations of "line breaks".
> Sometimes it is just CR, or CR LF, or CR CR LF, etc...
> In byte form, a CR equals hex character "0D" and
> LF is hex character "0A".
>
> As an example, suppose my string is:
>
> "Hello world!<CRLF><CRLF>Bye world!"
>
> I want to change it to:
>
> "Hello world!<CR>Bye world!"  // single spaced.
>
> This seems to be a job for regular expressions, especially
> since there are different ways to represent line breaks.
> For my purposes, assume that a single line break may be
> any of these:  CR,  CR LF, or CR CR LF.

Not LF? 


> How can I single-space my string with regular expressions?

s/\x{0D}\x{0D}\x{0A}|\x{0D}\x{0A}|\x{0D}/\x{0D}/g

        hp
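
A minimal sketch of what that alternation does and does not do; the
wrapper name normalize_breaks is hypothetical. It maps each of the three
listed break forms to one CR, trying the longest form first, but it
treats consecutive breaks separately and leaves a lone LF alone.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub normalize_breaks {
    my ($text) = @_;
    # Longest alternative first, so CR CR LF is not split into CR + CR LF.
    $text =~ s/\x{0D}\x{0D}\x{0A}|\x{0D}\x{0A}|\x{0D}/\x{0D}/g;
    return $text;
}

my $mixed = normalize_breaks("a\r\r\nb\r\nc\rd");
print $mixed eq "a\rb\rc\rd" ? "each break is one CR\n" : "unexpected\n";

# Two CR LF pairs in a row are two breaks, so two CRs remain.
my $double = normalize_breaks("Hello world!\r\n\r\nBye world!");
print $double eq "Hello world!\r\rBye world!" ? "blank line kept\n" : "unexpected\n";
```

To also single-space the text, a follow-up such as s/\x{0D}+/\x{0D}/g
would collapse the remaining runs.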


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4408
***************************************

