[32689] in Perl-Users-Digest
Perl-Users Digest, Issue: 3838 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Jun 3 14:55:56 2013
Date: Sun, 16 Dec 2012 02:17:13 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Sun, 16 Dec 2012 Volume: 11 Number: 3838
Today's topics:
help - how to find what is the code for "+/-" symbol co <juliani.moon@gmail.com>
Re: help - how to find what is the code for "+/-" symbo <rvtol+usenet@xs4all.nl>
Re: help - how to find what is the code for "+/-" symbo <ben@morrow.me.uk>
Re: help - how to find what is the code for "+/-" symbo <hjp-usenet2@hjp.at>
Re: help - how to find what is the code for "+/-" symbo <ben@morrow.me.uk>
Re: split() and @_: Perl changed between 5.8 and 5.14 <rweikusat@mssgmbh.com>
Re: split() and @_: Perl changed between 5.8 and 5.14 <derykus@gmail.com>
Re: split() and @_: Perl changed between 5.8 and 5.14 (Tim McDaniel)
Re: split() and @_: Perl changed between 5.8 and 5.14 <ben@morrow.me.uk>
Re: split() and @_: Perl changed between 5.8 and 5.14 <derykus@gmail.com>
Re: split() and @_: Perl changed between 5.8 and 5.14 <rweikusat@mssgmbh.com>
Re: split() and @_: Perl changed between 5.8 and 5.14 (Tim McDaniel)
Re: split() and @_: Perl changed between 5.8 and 5.14 <ben@morrow.me.uk>
Re: split() and @_: Perl changed between 5.8 and 5.14 <rweikusat@mssgmbh.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Fri, 14 Dec 2012 22:19:15 -0800 (PST)
From: Joe <juliani.moon@gmail.com>
Subject: help - how to find what is the code for "+/-" symbol copied from Windows app
Message-Id: <8aa41513-91bc-4776-9b31-97ed1f4c3f2b@googlegroups.com>
I received an Excel data file that contains a "+/-" symbol (html code
\&plusmn; ±), that can be copied and displayed in Word, Notepad,
"Kompozer" html editor, unix vi, pico editors, and load to/retrieve from
MySQL operated on linux. But when I need to manipulate the data in perl,
I am lost as how to recognize the symbol with RE. Could anyone help?
Thanks in advance!
joe
------------------------------
Date: Sat, 15 Dec 2012 10:01:10 +0100
From: "Dr.Ruud" <rvtol+usenet@xs4all.nl>
Subject: Re: help - how to find what is the code for "+/-" symbol copied from Windows app
Message-Id: <50cc3c56$0$6876$e4fe514c@news2.news.xs4all.nl>
On 2012-12-15 07:19, Joe wrote:
> I received an Excel data file that contains a "+/-" symbol (html code \&plusmn; ±), that can be copied and displayed in Word, Notepad, "Kompozer" html editor, unix vi, pico editors, and load to/retrieve from MySQL operated on linux. But when I need to manipulate the data in perl, I am lost as how to recognize the symbol with RE. Could anyone help?
First, look up its Unicode code point.
Google for: unicode plus minus, which will lead you to (for example)
http://www.fileformat.info/info/unicode/char/b1/index.htm
From there it is easy to deduce: \x{B1}.
The page also gives you the exact Unicode character name
'PLUS-MINUS SIGN', which you can use in regular expressions.
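A minimal sketch of both spellings, assuming the data has already been
decoded into Perl characters. The sample string is made up for
illustration, and the explicit "use charnames" line is only needed on
older perls:

```perl
use strict;
use warnings;
use charnames ':full';   # makes \N{...} names available on older perls

# Hypothetical sample value; chr(0xB1) is U+00B1 PLUS-MINUS SIGN.
my $text = "the result is 8" . chr(0xB1) . "2";

# Match by code point ...
my $by_code = $text =~ /\x{B1}/ ? 1 : 0;

# ... or by the Unicode character name.
my $by_name = $text =~ /\N{PLUS-MINUS SIGN}/ ? 1 : 0;

print "by code point: $by_code, by name: $by_name\n";
```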
--
Ruud
------------------------------
Date: Sat, 15 Dec 2012 12:34:34 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: help - how to find what is the code for "+/-" symbol copied from Windows app
Message-Id: <qb7up9-p7u.ln1@anubis.morrow.me.uk>
Quoth "Dr.Ruud" <rvtol+usenet@xs4all.nl>:
> On 2012-12-15 07:19, Joe wrote:
>
> > I received an Excel data file that contains a "+/-" symbol (html code
> > \&plusmn; ±), that can be copied and displayed in Word, Notepad,
> > "Kompozer" html editor, unix vi, pico editors, and load to/retrieve from
> > MySQL operated on linux. But when I need to manipulate the data in
> > perl, I am lost as how to recognize the symbol with RE. Could anyone
> > help?
>
> First, look up its Unicode code point.
>
> Google for: unicode plus minus, which will lead you to (for example)
> http://www.fileformat.info/info/unicode/char/b1/index.htm
>
> From there it is easy to deduce: \x{B1}.
Or just use what you already know:
use HTML::Entities "decode_entities";
my $plusmn = decode_entities "&plusmn;";
Ben
------------------------------
Date: Sat, 15 Dec 2012 14:32:31 +0100
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: help - how to find what is the code for "+/-" symbol copied from Windows app
Message-Id: <slrnkcouvf.vq7.hjp-usenet2@hrunkner.hjp.at>
On 2012-12-15 12:34, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth "Dr.Ruud" <rvtol+usenet@xs4all.nl>:
>> On 2012-12-15 07:19, Joe wrote:
>> > I received an Excel data file that contains a "+/-" symbol (html code
>> > \&plusmn; ±), that can be copied and displayed in Word, Notepad,
>> > "Kompozer" html editor, unix vi, pico editors, and load to/retrieve from
>> > MySQL operated on linux. But when I need to manipulate the data in
>> > perl, I am lost as how to recognize the symbol with RE. Could anyone
>> > help?
>>
>> First, look up its Unicode code point.
>>
>> Google for: unicode plus minus, which will lead you to (for example)
>> http://www.fileformat.info/info/unicode/char/b1/index.htm
>>
>> From there it is easy to deduce: \x{B1}.
>
> Or just use what you already know:
>
> use HTML::Entities "decode_entities";
> my $plusmn = decode_entities "&plusmn;";
Or just copy/paste the sign into your source code:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
my $text = "the result is 8±2";
if ($text =~ m/±/) {
    print "The text contains a plus/minus sign\n";
}
__END__
(Of course you need to make sure that the module you use to read the
Excel sheet really returns the ± as a single character U+00B1, but this is true
for all methods. If it doesn't, use Encode::decode to convert whatever
your Excel module returns into something sane.)
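
As a rough illustration of that last step, here is a sketch using the
core Encode module. The two byte strings are invented stand-ins for
what a reader module might hand back; which encoding you actually get
is an assumption you have to check against your own data:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw values: the same text once as UTF-8 bytes and once
# as Windows-1252 bytes.
my $utf8_bytes   = "8\xC2\xB1" . "2";
my $cp1252_bytes = "8\xB1" . "2";

# decode() turns bytes in a named encoding into Perl characters.
my $from_utf8   = decode('UTF-8',  $utf8_bytes);
my $from_cp1252 = decode('cp1252', $cp1252_bytes);

for my $s ($from_utf8, $from_cp1252) {
    printf "length %d, contains U+00B1: %s\n",
        length $s, ($s =~ /\x{B1}/ ? 'yes' : 'no');
}
```

After decoding, both values are three characters long and the plus/minus
sign is the single character U+00B1, so one regex works for either source.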
hp
--
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) | Sysadmin WSR       | you keep polishing at your text until
| |   | hjp@hjp.at         | the parts of the sentence no longer
__/   | http://www.hjp.at/ | fits together. -- Ralph Babel
------------------------------
Date: Sat, 15 Dec 2012 14:04:05 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: help - how to find what is the code for "+/-" symbol copied from Windows app
Message-Id: <ljcup9-c2v.ln1@anubis.morrow.me.uk>
Quoth "Peter J. Holzer" <hjp-usenet2@hjp.at>:
>
> Or just copy/paste the sign into your source code:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use utf8;
>
> my $text = "the result is 8±2";
Should I comment on the irony of your newsreader having converted that
to ISO8859-1? :)
(This is why I'm slightly suspicious of the whole idea of non-ASCII
source code. It's fine as long as it's just in a file, but tends to be
much less likely to survive diffs/mailing-list posts/&c. without being
mangled.)
Ben
------------------------------
Date: Thu, 13 Dec 2012 15:03:35 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <8738zaj8oo.fsf@sapphire.mobileactivedefense.com>
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
[...]
> There was a time when someone considered split splitting into @_ in
> scalar context a good idea.
As a historical remark targeted at people who are possibly not aware
of that (and in defense of the original choice :-):
The behaviour of shift and split regarding @_ is consistent with the
way a Bourne-style UNIX(*) shell treats the so-called 'positional
parameters' which are usually bound to the 'arguments' passed to the
current 'execution context', eg a shell function

    show()
    {
        echo $1
    }

will invoke the echo command with its first argument. In Perl, this
would look like this:

    sub show
    {
        print $_[0], "\n";
    }

Modifying this as follows:

    show()
    {
        shift
        echo $1
    }

or

    sub show
    {
        shift;
        print $_[0], "\n";
    }

would shift the (virtual) argument array one position to the left, ie
throw the first argument away and print the second. The shell
also supports a split operation which splits a string into 'words'
separated by the current value of the IFS (internal field separator)
variable. These words are then assigned to the positional parameters,
eg

    show()
    {
        IFS_="$IFS"
        IFS=' '
        set -- $1
        IFS="$IFS_"
        echo $2
    }

will split the first argument into 'words' using a single space as
separator and print the second of these. In Perl with the original
split, this would look like this:

    sub show
    {
        +split(/ /, $_[0]);
        print $_[1], "\n";
    }

I assume the reason the Bourne-shell behaves in this way is probably an
efficiency hack from the late 1970s and IMHO, imitating this behaviour
in Perl wasn't a good idea: Perl has arrays, the Bourne-shell language
doesn't and having to force split into a scalar context 'somehow'
doesn't really improve this.
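
For comparison, a sketch of the same function written so that it does
not depend on split filling @_. The names second_word and @words are
made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the second space-separated word of its argument.
sub second_word {
    my ($string) = @_;
    my @words = split / /, $string;   # list assignment: fields land in @words
    return $words[1];
}

print second_word('one two three'), "\n";
```

Because the result is assigned to a named array, this form does not rely
on the implicit split-into-@_ behaviour discussed in this thread.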
------------------------------
Date: Thu, 13 Dec 2012 12:35:34 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <8643e2d0-3e7f-4690-ad70-9f44db2d3bb3@googlegroups.com>
On Wednesday, December 12, 2012 5:01:52 AM UTC-8, Kenny McCormack wrote:
> I have an old program that I wrote back in 2001, which has worked fine ever
> since - right up until today, when I ran it for the first time in quite a
> while. The script depends on the fact that (when it was written) when you
> do split(), it puts the data into @_.
>
> From what I can tell, the following are all true. Please confirm or deny:
>
> 1) In 5.8, this worked.
> 2) Somewhere along the way, this usage became "deprecated". I found a web
> site that explicitly said that, while the usage is deprecated, it still
> works, since if it was removed, old code (heh heh - such as mine) would
> get broken.
> 3) In 5.14, it doesn't work. No error or warning message is generated, but
> @_ is left unchanged.
>
> P.S. I changed the program line from something like:
>
> $x = @_[split(...)-1];
>
> to:
>
> @tmp = split(...);
> $x = @tmp[@tmp-1];
Shorter:
$x = $tmp[-1];
Or, sidestep warnings, if it's gotta be a one-liner:
$x = @{[split(...)]}[-1];
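
Another one-liner, sketched here with a made-up input string, is a plain
list slice, which needs neither a temporary array nor an anonymous one:

```perl
use strict;
use warnings;

my $line = 'alpha beta gamma';          # hypothetical input

# Subscript the list returned by split directly.
my $last = (split / /, $line)[-1];

print "$last\n";
```

Note that split drops trailing empty fields by default, so for input
ending in the separator this yields the last non-empty field.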
--
Charles DeRykus
------------------------------
Date: Thu, 13 Dec 2012 23:23:18 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <kado15$g0g$1@reader1.panix.com>
In article <ka9v80$qa5$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>P.S. I changed the program line from something like:
>
> $x = @_[split(...)-1];
>
>to:
>
> @tmp = split(...);
> $x = @tmp[@tmp-1];
Oddly enough, in perl 5.8.8, there's no warning for that assignment to
$x, but
$x = @tmp[2];
(when @tmp has at least three elements, not undef, &c &c) produces
Scalar value @tmp[2] better written as $tmp[2] at local/test/080.pl line 16.
@tmp[@tmp-1] is an array slice. It returns a list of values, one for
each subscript. Since there is only one value of subscript, it
returns a list of one value.
$x = SOME_LIST assigns to $x the last value of SOME_LIST, just as with
$x = (this, that, the_other, irrelevant, ignored, the_value_assigned);
With a list of one element, it will assign that value to $x.
In sum, that works, but it's generally considered better style to write
$x = $tmp[@tmp-1];
assigning a scalar to a scalar. And, as I indicated, I'm surprised
that there was no warning for the original version.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Fri, 14 Dec 2012 00:09:56 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <kb7qp9-a9v2.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <ka9v80$qa5$1@news.xmission.com>,
> Kenny McCormack <gazelle@shell.xmission.com> wrote:
> >
> > @tmp = split(...);
> > $x = @tmp[@tmp-1];
>
> Oddly enough, in perl 5.8.8, there's no warning for that assignment to
> $x, but
> $x = @tmp[2];
> (when @tmp has at least three elements, not undef, &c &c) produces
> Scalar value @tmp[2] better written as $tmp[2] at
> local/test/080.pl line 16.
That warning is produced heuristically: in principle @tmp[2] is a
perfectly valid array slice that happens to only slice out one element,
but since it's likely you meant an element rather than a slice perl
produces a warning. It only does this when it is *sure* you meant a
single element, and the optimizer can only be sure of this for
relatively simple expressions. So, for instance, @tmp[$x+1+2] warns, but
@tmp[($x+1)+2] doesn't, because the expression just got too complex for
the optimizer to be certain it would only produce a single value in list
context.
> In sum, that works, but it's generally considered better style to write
> $x = $tmp[@tmp-1];
> assigning a scalar to a scalar. And, as I indicated, I'm surprised
> that there was no warning for the original version.
The case where the difference really matters is
@tmp[1] = ...
which is a list rather than a scalar assignment, and gives list context
to the RHS.
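
The difference in context can be made visible with wantarray. A small
sketch; the sub name context is invented for the example, and the
warning is switched off only so the slice form runs quietly:

```perl
use strict;
use warnings;

# Report the context the sub was called in.
sub context { return wantarray ? 'list' : 'scalar' }

my @tmp;
$tmp[0] = context();            # element assignment: scalar context
{
    no warnings 'syntax';       # silence the "better written as" warning
    @tmp[1] = context();        # slice assignment: list context
}

print "element: $tmp[0], slice: $tmp[1]\n";
```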
Ben
------------------------------
Date: Thu, 13 Dec 2012 22:07:38 -0800 (PST)
From: "C.DeRykus" <derykus@gmail.com>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <0bc6e72e-6d1c-46ae-9a88-b438d65c991b@googlegroups.com>
On Thursday, December 13, 2012 3:23:18 PM UTC-8, Tim McDaniel wrote:
> In article <ka9v80$qa5$1@news.xmission.com>,
> Kenny McCormack <gazelle@shell.xmission.com> wrote:
> >P.S. I changed the program line from something like:
> >
> > $x = @_[split(...)-1];
> >
> >to:
> >
> > @tmp = split(...);
> > $x = @tmp[@tmp-1];
>
> Oddly enough, in perl 5.8.8, there's no warning for that assignment to
> $x, but
> $x = @tmp[2];
> (when @tmp has at least three elements, not undef, &c &c) produces
> Scalar value @tmp[2] better written as $tmp[2] at local/test/080.pl line 16.
A bit far-fetched but here's an example of how
things could go wrong due to context:
perl -E '@tmp = 0..3; sub foo{@tmp};
$foo[0]= $tmp[&foo]; say "\@foo=@foo";
@foo[0]= @tmp[&foo]; say "\@foo=@foo"'
$foo[0]= # @tmp in scalar c. = 4 so $tmp[4]=undef
@foo[0]=0 # @tmp in list c. so 1st element only
> @tmp[@tmp-1] is an array slice. It returns a list of values, one for
> each subscript. Since there is only one value of subscript, it
> returns a list of one value.
>
> $x = SOME_LIST assigns to $x the last value of SOME_LIST, just as with
> $x = (this, that, the_other, irrelevant, ignored, the_value_assigned);
> With a list of one element, it will assign that value to $x.
>
> In sum, that works, but it's generally considered better style to write
> $x = $tmp[@tmp-1];
> assigning a scalar to a scalar.
True and, since the OP just wanted the last array
member, the clearest idiom is just: $x = $tmp[-1]
> ...
--
Charles DeRykus
------------------------------
Date: Fri, 14 Dec 2012 14:45:30 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <87a9tgvgj9.fsf@sapphire.mobileactivedefense.com>
"C.DeRykus" <derykus@gmail.com> writes:
[...]
>> In sum, that works, but it's generally considered better style to write
>>
>> $x = $tmp[@tmp-1];
>>
>> assigning a scalar to a scalar.
>
>
> True and, since the OP just wanted the last array
> member, the clearest idiom is just: $x = $tmp[-1]
Well, except that splitting the string into n components in order to
extract the last part which is separated by 'some regex' from any
preceding parts (if any) isn't a particularly good idea.
---------
use Benchmark;

my $a = 'a 'x16;
$a .= 'b';

timethese(-5,
          {
           split => sub {
               my @tmp;

               @tmp = split(/ /, $a);
               return $tmp[-1];
           },

           re => sub {
               $a =~ /^(?:.* )?(.*)/ and return $1;
           }});
------------------------------
Date: Sat, 15 Dec 2012 00:14:42 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <kagfdi$7sl$1@reader1.panix.com>
In article <87a9tgvgj9.fsf@sapphire.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>"C.DeRykus" <derykus@gmail.com> writes:
>
>[...]
>
>>> In sum, that works, but it's generally considered better style to write
>>>
>>> $x = $tmp[@tmp-1];
>>>
>>> assigning a scalar to a scalar.
>>
>>
>> True and, since the OP just wanted the last array
>> member, the clearest idiom is just: $x = $tmp[-1]
>
>Well, except that splitting the string into n components in order to
>extract the last part which is separated by 'some regex' from any
>preceding parts (if any) isn't a particularly good idea.
>
>---------
>use Benchmark;
>
>my $a = 'a 'x16;
>$a .= 'b';
>
>timethese(-5,
> {
> split => sub {
> my @tmp;
>
> @tmp = split(/ /, $a);
> return $tmp[-1];
> },
>
> re => sub {
> $a =~ /^(?:.* )?(.*)/ and return $1;
> }});
You presuppose for this that execution time is the main or only
foundation of goodness. I can tell what the split&[-1] above does
instantly; I'd have to look up ?: and think a moment about what the
regex does; they don't do the same things for trailing whitespace.
If it's not on a major execution path, I'd much prefer split&[-1] for
my own readability.
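
To make the trailing-whitespace difference concrete, here is a small
sketch that runs both approaches on the same inputs. The wrapper names
last_by_split and last_by_re and the sample strings are made up for
illustration:

```perl
use strict;
use warnings;

# Last field via split and an array index.
sub last_by_split {
    my ($s) = @_;
    my @tmp = split / /, $s;
    return $tmp[-1];
}

# Last field via the regex from the benchmark.
sub last_by_re {
    my ($s) = @_;
    return $s =~ /^(?:.* )?(.*)/ ? $1 : undef;
}

for my $s ('a b c', 'a b ') {
    printf "'%s': split gives '%s', regex gives '%s'\n",
        $s, last_by_split($s), last_by_re($s);
}
```

For 'a b c' both return 'c'. For 'a b ' split drops the trailing empty
field and returns 'b', while the regex captures the empty string after
the final space.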
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Sat, 15 Dec 2012 02:02:36 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <sa2tp9-nnf.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <87a9tgvgj9.fsf@sapphire.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> >
> >Well, except that splitting the string into n components in order to
> >extract the last part which is separated by 'some regex' from any
> >preceding parts (if any) isn't a particularly good idea.
> >
> >---------
> >use Benchmark;
> >
> >my $a = 'a 'x16;
> >$a .= 'b';
> >
> >timethese(-5,
> > {
> > split => sub {
> > my @tmp;
> >
> > @tmp = split(/ /, $a);
> > return $tmp[-1];
> > },
> >
> > re => sub {
> > $a =~ /^(?:.* )?(.*)/ and return $1;
You need /s here; once you have that the ^ is redundant, since /./s will
match anything and the pattern will always match as early as possible.
> > }});
>
> You presuppose for this that execution time is the main or only
> foundation of goodness. I can tell what the split&[-1] above does
> instantly; I'd have to look up ?: and think a moment about what the
> regex does; they don't do the same things for trailing whitespace.
> If it's not on a major execution path, I'd much prefer split&[-1] for
> my own readability.
$a =~ s/.* //s;
If you care about the trailing whitespace, then you need something like
$a =~ s/.* (?!\z)//s;
though that is a little less obvious. Alternatively,
my ($tmp) = $a =~ /([^ ]*)\z/;
my ($tmp) = $a =~ /([^ ]*[ ]*)\z/;
though that has the disadvantage of being harder to adapt to a
multi-character separator.
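
A sketch of how the two substitutions behave, wrapped in subs that work
on a copy so the caller's string is left alone. The names tail_simple
and tail_keep_trailing and the test strings are invented for the
example:

```perl
use strict;
use warnings;

# Keep only what follows the last space.
sub tail_simple {
    my ($s) = @_;
    $s =~ s/.* //s;
    return $s;
}

# Same, but a space at the very end of the string is not a separator.
sub tail_keep_trailing {
    my ($s) = @_;
    $s =~ s/.* (?!\z)//s;
    return $s;
}

for my $s ('a b c', 'a b ') {
    printf "'%s' -> '%s' / '%s'\n",
        $s, tail_simple($s), tail_keep_trailing($s);
}
```

With 'a b c' both give 'c'. With 'a b ' the first gives the empty string
and the second gives 'b ' (trailing space kept).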
Ben
------------------------------
Date: Sat, 15 Dec 2012 18:38:52 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: split() and @_: Perl changed between 5.8 and 5.14
Message-Id: <87ehirw477.fsf@sapphire.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
[...]
>> splitting the string into n components in order to
>>extract the last part which is separated by 'some regex' from any
>>preceding parts (if any) isn't a particularly good idea.
[...]
>> re => sub {
>> $a =~ /^(?:.* )?(.*)/ and return $1;
>> }});
>
> You presuppose for this that execution time is the main or only
> foundation of goodness. I can tell what the split&[-1] above does
> instantly; I'd have to look up ?: and think a moment about what the
> regex does;
You presuppose that 'ignorance is bliss' is a universal truth
:-). Assuming I didn't know about (?:) and I came across it
somewhere in existing code, I would look it up and insofar I'd judge
the use made of it as 'reasonably straight-forward exploitation of an
existing facility' (admittedly, subjective) I'd be happy that I learnt
something new which will likely be of use to me in future.
> they don't do the same things for trailing whitespace.
It also doesn't boil eggs or send greeting cards timely in case of
relatives' birthdays or any number of other things, even a lot of
computer-related ones: A formal definition of correctness I remember
was 'assuming the preconditions were true initially and the invariant
conditions stayed true during execution, the postconditions will be
true afterwards'. It follows that any correct piece of code can be
rendered incorrect by modifying any of these three sets in a suitable
way.
> If it's not on a major execution path, I'd much prefer split&[-1] for
> my own readability.
Oh, it will be on a major execution path eventually, maybe tomorrow or
next year or after some kind of 'junior programmer' copied&pasted the
'known to be working' code into a hard realtime system or just because
your assumption was wrong.
It is generally better to avoid problems which can be easily avoided
instead of waiting until the pile of 1st level support guys who died
of nervous exhaustion while hectically trying to convince customers
that what hit them isn't really a problem becomes so high that getting
through the office door becomes difficult (The obvious alternative is
to always change jobs quickly enough that the obnoxious burden of
'making the code actually work well enough to solve the problem' falls
onto someone else. NB: I've encountered quite a few people who acted
in this way).
NB^2: This text is supposed to be somewhat tongue-in-cheek and not
intended to be offensive or abusive.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3838
***************************************