[32820] in Perl-Users-Digest

Perl-Users Digest, Issue: 4085 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Nov 27 11:09:39 2013

Date: Wed, 27 Nov 2013 08:09:05 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 27 Nov 2013     Volume: 11 Number: 4085

Today's topics:
    Re: Several Topics - Nov. 19, 2013 <hjp-usenet3@hjp.at>
    Re: Several Topics - Nov. 19, 2013 <ben@morrow.me.uk>
    Re: Several Topics - Nov. 19, 2013 <hjp-usenet3@hjp.at>
    Re: Several Topics - Nov. 19, 2013 <ben@morrow.me.uk>
    Re: Several Topics - Nov. 19, 2013 <derykus@gmail.com>
    Re: Several Topics - Nov. 19, 2013 <derykus@gmail.com>
    Re: Several Topics - Nov. 19, 2013 <rweikusat@mobileactivedefense.com>
        STDOUT beginner problem mat.krawczyk@gmail.com
    Re: STDOUT beginner problem <rweikusat@mobileactivedefense.com>
    Re: STDOUT beginner problem <gamo@telecable.es>
    Re: STDOUT beginner problem <ben@morrow.me.uk>
    Re: STDOUT beginner problem <gamo@telecable.es>
    Re: STDOUT beginner problem <rweikusat@mobileactivedefense.com>
    Re: STDOUT beginner problem <gamo@telecable.es>
    Re: STDOUT beginner problem <ben@morrow.me.uk>
    Re: STDOUT beginner problem <gamo@telecable.es>
    Re: STDOUT beginner problem <rweikusat@mobileactivedefense.com>
    Re: STDOUT beginner problem <rweikusat@mobileactivedefense.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Tue, 26 Nov 2013 15:49:25 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <slrnl99d7l.dhk.hjp-usenet3@hrunkner.hjp.at>

On 2013-11-26 07:31, Eric Pozharski <whynot@pozharski.name> wrote:
> with <87d2lpw1bf.fsf@new.chromatico.net> Charlton Wilbur wrote:
>
> *SKIP*
>
>> The time hit I expect isn't in allocation; it's in copying.
>
> That allocation is such a SOB.
>
>> If I have @foo with 1000 strings, and I say @bar = reverse @foo, that
>> requires copying 1000 strings, no? 
>
> I'm not that sure:
[...]
> 	cmpthese timethese -5, {
> 	  code00 => sub { @ab = map -$_, reverse @aa  },
> 	  code01 => sub { @ab = reverse @aa; $_ = -$_ foreach @ab },
> 	};
[...]

I don't see how this benchmark is supposed to show whether reverse needs
to copy the elements.

A better demonstration is this:

% cat foo
#!/usr/bin/perl
use v5.10;
$| = 1;

say 1;
my @a = ( "a" x 130_000) x 1000;

say 2;
my @b = reverse @a;

say 3;

% ltrace -o foo.ltrace perl ./foo
1
2
3
% egrep '^write\(1, |malloc\(130' foo.ltrace | cut -d= -f1 | uniq -c
      1 write(1, "1\n", 2)
   1000 malloc(130004)
      1 write(1, "2\n", 2)
   1000 malloc(130004)
      1 write(1, "3\n", 2)

As you can see, both the assignment to @a and the assignment to @b
allocate 1000 objects of 130004 bytes. So «@b = reverse @a» creates a
copy of each element, but it still isn't clear whether it's reverse that
makes the copy or the assignment operator. In any case there is only one
copy, not two, so something is optimized here.

Also note that 

    for (reverse @a) {
	$_ = "x";
    }

does not make a (temporary) copy of @a. The assignments within the loop
modify the contents of @a.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Tue, 26 Nov 2013 15:50:06 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <eisema-vig.ln1@anubis.morrow.me.uk>


Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> 
> As you can see, both the assignment to @a and the assignment to @b
> allocate 1000 objects of 130004 bytes. So «@b = reverse @a» creates a
> copy of each element, but it still isn't clear whether it's reverse that
> makes the copy or the assignment operator. In any case there is only one
> copy, not two, so something is optimized here.

It's the assignment that copies the strings. I'm not quite sure why
you'd expect two copies: '@a' in list context evaluates to a list of the
actual elements of @a, 'reverse @a' then reverses that list, and then
the assignment copies the list into the new array.

> Also note that 
> 
>     for (reverse @a) {
> 	$_ = "x";
>     }
> 
> does not make a (temporary) copy of @a. The assignments within the loop
> modify the contents of @a.

That's expected, given what I explained above: this is not a special
optimisation of 'for', so (for instance) '@b = \(reverse @a)' will leave
@b containing refs to the actual elements of @a, in reverse order. 'for'
is optimised in this case, however, into a direct iteration over the
elements of @a in reverse order: that is, @a is not expanded onto the
stack as a list, and pp_reverse is not called. (This makes a difference
when you have many elements, rather than elements which are individually
large.)

Also, '@a = reverse @a' in void context does an in-place reverse, with
no copying, as does '@a = sort @a'.
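
The aliasing described above can be checked directly. This is only a minimal sketch, assuming an ordinary (non-magical) array of plain strings; the variable names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = qw(foo bar baz);

# Take references to the list that reverse returns. If reverse hands back
# the actual elements, the first reference points at the last element of @a.
my @refs = \(reverse @a);
my $same_scalar = ($refs[0] == \$a[2]) ? 1 : 0;
print "first ref is the last element of \@a: $same_scalar\n";

# Writing through the reference therefore changes @a itself.
${ $refs[0] } = 'BAZ';
print "@a\n";

# The loop variable of 'for' is an alias too, so this also edits @a.
$_ = "<$_>" for reverse @a;
print "@a\n";
```

Comparing the references numerically compares the addresses of the scalars they point to, so no string contents are involved in the check.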

Ben



------------------------------

Date: Tue, 26 Nov 2013 18:04:10 +0100
From: "Peter J. Holzer" <hjp-usenet3@hjp.at>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <slrnl99l4a.u5f.hjp-usenet3@hrunkner.hjp.at>

On 2013-11-26 15:50, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
>> As you can see, both the assignment to @a and the assignment to @b
>> allocate 1000 objects of 130004 bytes. So «@b = reverse @a» creates a
>> copy of each element, but it still isn't clear whether it's reverse that
>> makes the copy or the assignment operator. In any case there is only one
>> copy, not two, so something is optimized here.
>
> It's the assignment that copies the strings. I'm not quite sure why
> you'd expect two copies: '@a' in list context evaluates to a list of the
> actual elements of @a, 'reverse @a' then reverses that list, and then
> the assignment copies the list into the new array.

"Expect" is too strong a word, I simply didn't know. reverse gets a list
of elements and returns another list of elements. AFAICS it isn't
documented (and certainly not self-evident), that the elements of the
resulting list are aliases to and not copies of the elements in the
input list.  It is useful from a performance POV and sometimes (e.g. in
a for loop) even semantically, but it's one of Perl's idiosyncrasies
which may be perfectly obvious if you have the right mental model, but
aren't if you don't have it. And given that the mental model isn't
documented anywhere, perl usually does what I mean and sometimes it
surprises me.

As another example, compare

    @b = \(grep 1, @a);

and

    @b = \(map $_, @a);

You can probably tell what they do and explain why they do it. I have a
pretty good explanation for their behaviour after I tried them. But I
couldn't have predicted the result.

	hp


-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | hjp@hjp.at         | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel


------------------------------

Date: Wed, 27 Nov 2013 00:37:13 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <perfma-9il.ln1@anubis.morrow.me.uk>


Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> On 2013-11-26 15:50, Ben Morrow <ben@morrow.me.uk> wrote:
> > Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> >> As you can see, both the assignment to @a and the assignment to @b
> >> allocate 1000 objects of 130004 bytes. So «@b = reverse @a» creates a
> >> copy of each element, but it still isn't clear whether it's reverse that
> >> makes the copy or the assignment operator. In any case there is only one
> >> copy, not two, so something is optimized here.
> >
> > It's the assignment that copies the strings. I'm not quite sure why
> > you'd expect two copies: '@a' in list context evaluates to a list of the
> > actual elements of @a, 'reverse @a' then reverses that list, and then
> > the assignment copies the list into the new array.
> 
> "Expect" is too strong a word, I simply didn't know. reverse gets a list
> of elements and returns another list of elements. AFAICS it isn't
> documented (and certainly not self-evident), that the elements of the
> resulting list are aliases to and not copies of the elements in the
> input list.  It is useful from a performance POV and sometimes (e.g. in
> a for loop) even semantically, but it's one of Perl's idiosyncrasies
> which may be perfectly obvious if you have the right mental model, but
> aren't if you don't have it. And given that the mental model isn't
> documented anywhere, perl usually does what I mean and sometimes it
> surprises me.

That's certainly true. Unfortunately, the only way to understand Perl's
semantics in detail is to read perl's source.

> As another example, compare
> 
>     @b = \(grep 1, @a);
> 
> and
> 
>     @b = \(map $_, @a);
> 
> You can probably tell what they do and explain why they do it. I have a
> pretty good explanation for their behaviour after I tried them. But I
> couldn't have predicted the result.

Huh. That is not at all what I would have expected :). (I would have
expected both cases to alias.) It's not an accidental side-effect of the
implementation, either: pp_mapwhile explicitly copies its return values,
except in the case where they are already temporaries (which will be the
usual case, of course).

I suppose it makes *some* sense if you consider the EXPR form of map to
be short for the BLOCK form, and consider the BLOCK to be essentially an
anonymous sub; subs alias their parameters, so $_ inside the block is an
alias, but they copy their return values, so the returned list is a list
of copies. I'm not sure I entirely buy that, though...
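
A small sketch of the difference under discussion, assuming the behaviour reported in this thread (grep passes the original elements through, map returns copies); the variable names are only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = qw(foo bar baz);

# References to whatever grep and map return for the same input.
my @via_grep = \(grep 1, @a);
my @via_map  = \(map $_, @a);

# Change the original array afterwards.
$a[0] = 'barf';

# The grep reference follows the change, the map reference does not.
my $grep_sees = ${ $via_grep[0] };
my $map_sees  = ${ $via_map[0] };
print "via grep: $grep_sees\n";
print "via map:  $map_sees\n";
```

Dumping both arrays before the assignment to $a[0] would show identical contents, which is why the difference is easy to miss.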

Ben



------------------------------

Date: Wed, 27 Nov 2013 01:10:49 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <57a7581f-ec4b-4723-8584-58406abaca48@googlegroups.com>

On Tuesday, November 26, 2013 4:37:13 PM UTC-8, Ben Morrow wrote:
> Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> 
> ...
> 
> > As another example, compare
> >     @b = \(grep 1, @a);
> > and
> >     @b = \(map $_, @a);
> 
> > You can probably tell what they do and explain why they do it. I have a
> > pretty good explanation for their behaviour after I tried them. But I
> > couldn't have predicted the result.
>  
> Huh. That is not at all what I would have expected :). (I would have 
> expected both cases to alias.) It's not an accidental side-effect of the
> implementation, either: pp_mapwhile explicitly copies its return values,
> except in the case where they are already temporaries (which will be the
> usual case, of course).
> 
> I suppose it makes *some* sense if you consider the EXPR form of map to
> be short for the BLOCK form, and consider the BLOCK to be essentially an
> anonymous sub; subs alias their parameters, so $_ inside the block is an
> alias, but they copy their return values, so the returned list is a list
> of copies. I'm not sure I entirely buy that, though...
> 

It seems especially deceptive since the dump output:

perl -MData::Dumper -E '@a=qw/foo bar baz/; @b=\grep(1,@a);say Dumper $_ for \(@a,@b)'

perl -MData::Dumper -E '@a=qw/foo bar baz/; @b=\map($_,@a);say Dumper $_ for \(@a,@b)'

looks the same until there's a tweak:

 ...@a=qw/foo bar baz/;$a[0]="barf";...

-- 
Charles DeRykus


------------------------------

Date: Wed, 27 Nov 2013 02:28:40 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <ebcdda14-0bb1-427c-9fac-5970a69a99f9@googlegroups.com>

On Wednesday, November 27, 2013 1:10:49 AM UTC-8, C.DeRykus wrote:
> On Tuesday, November 26, 2013 4:37:13 PM UTC-8, Ben Morrow wrote:
> > Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
> >
> > ...
> >
> > > As another example, compare
> > >     @b = \(grep 1, @a);
> > > and
> > >     @b = \(map $_, @a);
> > >
> > > You can probably tell what they do and explain why they do it. I have a
> > > pretty good explanation for their behaviour after I tried them. But I
> > > couldn't have predicted the result.
> >
> > Huh. That is not at all what I would have expected :). (I would have
> > expected both cases to alias.) It's not an accidental side-effect of the
> > implementation, either: pp_mapwhile explicitly copies its return values,
> > except in the case where they are already temporaries (which will be the
> > usual case, of course).
> >
> > I suppose it makes *some* sense if you consider the EXPR form of map to
> > be short for the BLOCK form, and consider the BLOCK to be essentially an
> > anonymous sub; subs alias their parameters, so $_ inside the block is an
> > alias, but they copy their return values, so the returned list is a list
> > of copies. I'm not sure I entirely buy that, though...
>
> It seems especially deceptive since the dump output:
>
> perl -MData::Dumper -E '@a=qw/foo bar baz/; @b=\grep(1,@a);say Dumper $_ for \(@a,@b)'
>
> perl -MData::Dumper -E '@a=qw/foo bar baz/; @b=\map($_,@a);say Dumper $_ for \(@a,@b)'
>
> looks the same until there's a tweak:
>
> ...@a=qw/foo bar baz/;$a[0]="barf";...
>

mis-tweaked:

@a=qw/foo bar baz/;@b=\map($_,@a);$a[0]="barf"..

--  
Charles DeRykus


------------------------------

Date: Wed, 27 Nov 2013 14:41:04 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: Several Topics - Nov. 19, 2013
Message-Id: <87mwkp3o1r.fsf@sable.mobileactivedefense.com>

"Peter J. Holzer" <hjp-usenet3@hjp.at> writes:
> On 2013-11-26 15:50, Ben Morrow <ben@morrow.me.uk> wrote:
>> Quoth "Peter J. Holzer" <hjp-usenet3@hjp.at>:
>>> As you can see, both the assignment to @a and the assignment to @b
>>> allocate 1000 objects of 130004 bytes. So «@b = reverse @a» creates a
>>> copy of each element, but it still isn't clear whether it's reverse that
>>> makes the copy or the assignment operator. In any case there is only one
>>> copy, not two, so something is optimized here.
>>
>> It's the assignment that copies the strings. I'm not quite sure why
>> you'd expect two copies: '@a' in list context evaluates to a list of the
>> actual elements of @a, 'reverse @a' then reverses that list, and then
>> the assignment copies the list into the new array.
>
> "Expect" is too strong a word, I simply didn't know. reverse gets a list
> of elements and returns another list of elements. AFAICS it isn't
> documented (and certainly not self-evident), that the elements of the
> resulting list are aliases to and not copies of the elements in the
> input list.

That's nevertheless pretty obvious: reverse is called with a list of
scalars (SV *) on the stack and it reverses that list. Why would this
copy the scalars?


------------------------------

Date: Tue, 26 Nov 2013 09:38:10 -0800 (PST)
From: mat.krawczyk@gmail.com
Subject: STDOUT beginner problem
Message-Id: <36703e18-8c08-4ab2-bfe2-7e17a4708310@googlegroups.com>

Hello,

I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:

 open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
 $text = print HTML2TEXT $html;
 close HTML2TEXT;
 print $text;

but $text is empty and output is directed to STDOUT.

I will be grateful for any help..

Mateusz Krawczyk


------------------------------

Date: Tue, 26 Nov 2013 18:40:32 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: STDOUT beginner problem
Message-Id: <87zjorknvj.fsf@sable.mobileactivedefense.com>

mat.krawczyk@gmail.com writes:
> I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:
>
>  open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
>  $text = print HTML2TEXT $html;
>  close HTML2TEXT;
>  print $text;
>
> but $text is empty and output is directed to STDOUT.

Output is to stdout because you didn't redirect it somewhere
else. Generally, the built-in 'pipe open' can't do what you want (write
data to some process and read its output back). IPC::Open2 can do that,
although using that is not as straight-forward as it seems (there's a
chance that both processes deadlock because both wait for data written
by the other). One way to deal with that is to use select and switch
between reading and writing as required. Another reasonably easy way
would be to use three processes, one which reads the output from the
external command, a 2nd which runs it and a 3rd which feeds input to it.

Example
-------
my ($in, $proc, $line, $rc);

$rc = open($proc, '-|');
if ($rc == 0) {
    $rc = open($proc, '-|');
    if ($rc == 0) {
	#
	# 3rd process: reads from input file, stdout connected
	# to 2nd pipe
	#
	open($in, '<', '/var/log/syslog');
	print $line while $line = <$in>;
	exit(0);
    }

    #
    # 2nd process: stdin redirected from 2nd pipe, stdout
    # connected to 1st
    #
    open(STDIN, '<&', $proc);
    exec('tr', '6', '^');
}

#
# original process: reads processed data from 1st pipe
#
print $line while $line = <$proc>;
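
For comparison, the IPC::Open2 route mentioned above might look roughly like this. It is only a sketch: 'tr' stands in for the real filter, and it assumes the input is small enough to fit in the pipe buffers, because writing everything before reading anything is exactly the pattern that can deadlock with larger data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

my $input = "hello pipe\n";

# open2 returns the child's pid; the first handle reads the child's
# stdout, the second writes to its stdin. 'tr' is a stand-in filter.
my $pid = open2(my $from_child, my $to_child, 'tr', 'a-z', 'A-Z');

# Safe only for small inputs: with large data the child may block on a
# full output pipe while we are still writing.
print {$to_child} $input;
close $to_child;                       # EOF lets the child finish

my $output = do { local $/; <$from_child> };
close $from_child;
waitpid $pid, 0;                       # reap the child

print $output;
```

For data of arbitrary size, the select loop or the extra process described above would still be needed.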


------------------------------

Date: Tue, 26 Nov 2013 22:31:17 +0100
From: gamo <gamo@telecable.es>
Subject: Re: STDOUT beginner problem
Message-Id: <l733v2$7s4$2@speranza.aioe.org>

El 26/11/13 18:38, mat.krawczyk@gmail.com escribió:
> Hello,
>
> I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:
>
>   open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
>   $text = print HTML2TEXT $html;
>   close HTML2TEXT;
>   print $text;
>
> but $text is empty and output is directed to STDOUT.
>
> I will be grateful for any help..
>
> Mateusz Krawczyk
>

Maybe simple backticks are what you are searching for

~$ cat test.backticks
#!/usr/bin/perl -W

use strict;

my $html = '<p>hi</p>';
my $text = `echo "$html" | /usr/bin/html2text`;
print $text, "\n";

~$ perl test.backticks
hi

~$ man perlop

(pay attention to the different ticks)

Good luck



------------------------------

Date: Tue, 26 Nov 2013 22:21:18 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: STDOUT beginner problem
Message-Id: <ufjfma-l3k.ln1@anubis.morrow.me.uk>


Quoth gamo <gamo@telecable.es>:
> El 26/11/13 18:38, mat.krawczyk@gmail.com escribió:
> > Hello,
> >
> > I would like to write a simple script for decoding emails. My problem
> > concerns the input and output of an external program. I would like to
> > use the html2text converter:
> >
> >   open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
> >   $text = print HTML2TEXT $html;
> >   close HTML2TEXT;
> >   print $text;
> >
> > but $text is empty and output is directed to STDOUT.
> >
> > I will be grateful for any help..
> 
> Maybe simple backticks are what you are searching for
> 
> ~$ cat test.backticks
> #!/usr/bin/perl -W
> 
> use strict;
> 
> my $html = '<p>hi</p>';
> my $text = `echo "$html" | /usr/bin/html2text`;

Careful with your quoting. It would probably be better to write the HTML
to a file.

> print $text, "\n";
> 
> ~$ perl test.backticks
> hi
> 
> ~$ man perlop
> 
> (pay attention to the different ticks)

You can use qx// instead of backticks, and it's usually clearer.

Ben



------------------------------

Date: Tue, 26 Nov 2013 23:50:01 +0100
From: gamo <gamo@telecable.es>
Subject: Re: STDOUT beginner problem
Message-Id: <l738in$i87$1@speranza.aioe.org>

El 26/11/13 23:21, Ben Morrow escribió:
> Careful with your quoting. It would probably be better to write the HTML
> to a file.

 ...probably, but the OP seems to not have problems with the input

> You can use qx// instead of backticks, and it's usually clearer.
>
> Ben
>

Then, he must use qx!! or some other separators

Thanks




------------------------------

Date: Tue, 26 Nov 2013 22:55:06 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: STDOUT beginner problem
Message-Id: <87txeylqnp.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth gamo <gamo@telecable.es>:

[...]

>> my $html = '<p>hi</p>';
>> my $text = `echo "$html" | /usr/bin/html2text`;
>
> Careful with your quoting. It would probably be better to write the HTML
> to a file.

Not necessary. When starting to make "Gee that looks *complicated*,
can't I sell him something else instead?" assumptions, the simple way to
do this would be to create a small shell script,

------
#!/bin/sh
printf '%s' "$1" | html2text
------

and use that like this:

------
my $html = '<html><body><em>$(echo 3)</em></body></html>';
open($h2t, '-|', '/tmp/h2t', $html);
print(<$h2t>);
------

(the reason for using printf is that echo may interpret \-escapes in its
argument).


    



------------------------------

Date: Wed, 27 Nov 2013 00:27:59 +0100
From: gamo <gamo@telecable.es>
Subject: Re: STDOUT beginner problem
Message-Id: <l73aq1$mni$3@speranza.aioe.org>

El 26/11/13 23:55, Rainer Weikusat escribió:
> Ben Morrow <ben@morrow.me.uk> writes:
>> Quoth gamo <gamo@telecable.es>:
 ...
>> Careful with your quoting. It would probably be better to write the HTML
>> to a file.
>
> Not necessary. When starting to make "Gee that looks *complicated*,
> can't I sell him something else instead?" assumptions, the simple way to
> do this would be to create a small shell script,
>
> ------
> #!/bin/sh
> printf '%s' "$1" | html2text
> ------
>
> and use that like this:
>
> ------
 ...
> open($h2t, '-|', '/tmp/h2t', $html);
> print(<$h2t>);
> ------
>
> (the reason for using printf is that echo may interpret \-escapes in its
> argument).
>
>

The simplest way is to pass a file name as an argument to html2text, but
if he wants to use cat file | html2text or echo too, he could, because
the interpretation of escapes is disabled by default:

  DESCRIPTION
        Echo the STRING(s) to standard output.

        -n     do not output the trailing newline

        -e     enable interpretation of backslash escapes

        -E     disable interpretation of backslash escapes (default)






------------------------------

Date: Wed, 27 Nov 2013 00:50:27 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: STDOUT beginner problem
Message-Id: <j7sfma-9il.ln1@anubis.morrow.me.uk>


Quoth gamo <gamo@telecable.es>:
> El 26/11/13 23:55, Rainer Weikusat escribió:
> > Ben Morrow <ben@morrow.me.uk> writes:
> >> Quoth gamo <gamo@telecable.es>:
> ...
> >> Careful with your quoting. It would probably be better to write the HTML
> >> to a file.
> >
> > Not necessary. When starting to make "Gee that looks *complicated*,
> > can't I sell him something else instead?" assumptions, the simple way to
> > do this would be to create a small shell script,
> >
> > ------
> > #!/bin/sh
> > printf '%s' "$1" | html2text
> > ------
> >
> > and use that like this:
> >
> > ------
> ...
> > open($h2t, '-|', '/tmp/h2t', $html);
> > print(<$h2t>);
> > ------

'Oh, but there's no need to put that script in a file either...':

    open my $h2t, "-|", "/bin/sh", "-c", 
        q/printf %s "$1" | html2text/, $html;

 ...and we end up with the sort of mess shell always turns into.
Sometimes a temporary file is the cleanest and simplest solution.
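
A rough sketch of the temporary-file variant, under some assumptions: 'cat' is used as a stand-in command so the example runs anywhere, and @converter is a made-up name; in real use it would hold something like '/usr/bin/html2text', provided that program accepts a file name as its argument:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Text containing shell metacharacters on purpose.
my $html = '<p>$(echo 3) and `ls`</p>';

# Stand-in for the real converter (assumption: it takes a file name).
my @converter = ('cat');

# Write the data to a temporary file that is removed at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} $html;
close $tmp or die "cannot write $path: $!";

# List-form pipe open: no shell is involved, so nothing in $html or
# $path is ever interpreted as shell syntax.
open(my $out, '-|', @converter, $path)
    or die "cannot run $converter[0]: $!";
my $text = do { local $/; <$out> };
close $out;

print $text, "\n";
```

Since the HTML only ever travels through a file, its contents never appear on a command line at all.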

> > (the reason for using printf is that echo may interpret \-escapes in its
> > argument).
> 
> The simplest way is to pass a file name as an argument to html2text, but
> if he wants to use cat file | html2text or echo too, he could, because
> the interpretation of escapes is disabled by default:
> 
>   DESCRIPTION
>         Echo the STRING(s) to standard output.
> 
>         -n     do not output the trailing newline
> 
>         -e     enable interpretation of backslash escapes
> 
>         -E     disable interpretation of backslash escapes (default)

*My* echo(1), OTOH, recognises neither -e nor -E, and the manpage says:

| The newline may also be suppressed by appending '\c' to the end of the
| string, as is done by iBCS2 compatible systems.  Note that the -n option
| as well as the effect of '\c' are implementation-defined in IEEE Std
| 1003.1-2001 ("POSIX.1") as amended by Cor. 1-2002.  For portability, echo
| should only be used if the first argument does not start with a hyphen
| ('-') and does not contain any backslashes ('\').  If this is not suffi-
| cient, printf(1) should be used.

and also this:

| Most shells provide a builtin echo command which tends to differ from
| this utility in the treatment of options and backslashes.  Consult the
| builtin(1) manual page.

so using echo to pass arbitrary text is not reliable.

Ben



------------------------------

Date: Wed, 27 Nov 2013 10:13:23 +0100
From: gamo <gamo@telecable.es>
Subject: Re: STDOUT beginner problem
Message-Id: <l74d3j$lkv$1@speranza.aioe.org>

El 27/11/13 01:50, Ben Morrow escribió:
>>          -E     disable interpretation of backslash escapes (default)

> *My*  echo(1), OTOH, recognises neither -e nor -E, and the manpage says:
>
> | The newline may also be suppressed by appending '\c' to the end of the
> | string, as is done by iBCS2 compatible systems.  Note that the -n option
> | as well as the effect of '\c' are implementation-defined in IEEE Std
> | 1003.1-2001 ("POSIX.1") as amended by Cor. 1-2002.  For portability, echo
> | should only be used if the first argument does not start with a hyphen
> | ('-') and does not contain any backslashes ('\').  If this is not suffi-
> | cient, printf(1) should be used.
>
> and also this:
>
> | Most shells provide a builtin echo command which tends to differ from
> | this utility in the treatment of options and backslashes.  Consult the
> | builtin(1) manual page.
>
> so using echo to pass arbitrary text is not reliable.
>
> Ben
>


I'm afraid it is common to have two echo utilities available: one built
into bash, which does accept escapes, and one in /bin/echo, which does
not. Compare "help echo" with "man echo". My response to the OP would be
to substitute "/bin/echo" for "echo"; as I remember, that is always
recommended to enhance security when invoking commands.

Thanks




------------------------------

Date: Wed, 27 Nov 2013 15:54:15 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: STDOUT beginner problem
Message-Id: <87bo153kns.fsf@sable.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth gamo <gamo@telecable.es>:
>> El 26/11/13 23:55, Rainer Weikusat escribió:
>> > Ben Morrow <ben@morrow.me.uk> writes:
>> >> Quoth gamo <gamo@telecable.es>:
>> ...
>> >> Careful with your quoting. It would probably be better to write the HTML
>> >> to a file.
>> >
>> > Not necessary. When starting to make "Gee that looks *complicated*,
>> > can't I sell him something else instead?" assumptions, the simple way to
>> > do this would be to create a small shell script,
>> >
>> > ------
>> > #!/bin/sh
>> > printf '%s' "$1" | html2text
>> > ------
>> >
>> > and use that like this:
>> >
>> > ------
>> ...
>> > open($h2t, '-|', '/tmp/h2t', $html);
>> > print(<$h2t>);
>> > ------
>
> 'Oh, but there's no need to put that script in a file either...':
>
>     open my $h2t, "-|", "/bin/sh", "-c", 
>         q/printf %s "$1" | html2text/, $html;
>
> ...and we end up with the sort of mess shell always turns into.
> Sometimes a temporary file is the cleanest and simplest solution.

In this particular case, the main complication is that html2text doesn't
support passing the text-to-be-processed literally as command-line
argument. And the simplest way to remedy that while avoiding issues with
'inappropriate data interpretation/ execution' is to create a shell
script which takes such an argument and passes it to html2text in the
appropriate way. This yields a new and possibly generally useful command
with more reasonable semantics. Actually, the replacement command could
be written in any programming language including Perl but for these
kinds of task, the shell is IMO the most appropriate tool.

Inline use of such a different programming language instead of creating
a new command is both messy and short-sighted.



------------------------------

Date: Wed, 27 Nov 2013 16:03:54 +0000
From: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Subject: Re: STDOUT beginner problem
Message-Id: <8738mh3k7p.fsf@sable.mobileactivedefense.com>

gamo <gamo@telecable.es> writes:
> El 26/11/13 23:55, Rainer Weikusat escribió:
>> Ben Morrow <ben@morrow.me.uk> writes:
>>> Quoth gamo <gamo@telecable.es>:
> ...
>>> Careful with your quoting. It would probably be better to write the HTML
>>> to a file.
>>
>> Not necessary. When starting to make "Gee that looks *complicated*,
>> can't I sell him something else instead?" assumptions, the simple way to
>> do this would be to create a small shell script,
>>
>> ------
>> #!/bin/sh
>> printf '%s' "$1" | html2text
>> ------
>>
>> and use that like this:
>>
>> ------
> ...
>> open($h2t, '-|', '/tmp/h2t', $html);
>> print(<$h2t>);
>> ------
>>
>> (the reason for using printf is that echo may interpret \-escapes in its
>> argument).
>
> The simplest way is to pass a file name as an argument to html2text, but
> if he wants to use cat file | html2text or echo too, he could, because
> the interpretation of escapes is disabled by default:

That's not the problem with the backticks idea. For a
live-demonstration, create a file with the following content:

-----------
#!/usr/bin/perl
$output = `echo "$ARGV[0]" | html2text`;
print($output);
-----------

and execute that with '$(ls /)' as first argument. And the ls / could as
well have been cd; rm -rf *. And the OP wrote about processing e-mail
which is not exactly a trusted data source ...
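
The difference can be seen with a harmless payload. This is a sketch only, assuming an external printf utility on the PATH; the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Harmless stand-in for hostile input taken from an e-mail.
my $untrusted = '$(echo injected)';

# Backticks hand the interpolated string to the shell, which performs
# the command substitution hidden in the data.
my $via_shell = `printf '%s' "$untrusted"`;

# List-form pipe open executes printf directly, so the data arrives
# as a literal argument and is never parsed by a shell.
open(my $safe, '-|', 'printf', '%s', $untrusted)
    or die "cannot run printf: $!";
my $literal = do { local $/; <$safe> };
close $safe;

print "through the shell: $via_shell\n";
print "without a shell:   $literal\n";
```

With the shell in between, the output depends on what the embedded command prints; without it, the string comes back unchanged.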


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 4085
***************************************

