[32345] in Perl-Users-Digest
Perl-Users Digest, Issue: 3612 Volume: 11
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 15 21:09:28 2012
Date: Wed, 15 Feb 2012 18:09:10 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 15 Feb 2012 Volume: 11 Number: 3612
Today's topics:
Elegant ways to convert '' or 'number' to a number (Tim McDaniel)
Re: Elegant ways to convert '' or 'number' to a number <glex_no-spam@qwest-spam-no.invalid>
Re: Elegant ways to convert '' or 'number' to a number <ben@morrow.me.uk>
Re: Elegant ways to convert '' or 'number' to a number (Tim McDaniel)
Re: Elegant ways to convert '' or 'number' to a number <rweikusat@mssgmbh.com>
Re: Elegant ways to convert '' or 'number' to a number (Tim McDaniel)
Re: Elegant ways to convert '' or 'number' to a number <rweikusat@mssgmbh.com>
Re: Elegant ways to convert '' or 'number' to a number (Tim McDaniel)
Re: Elegant ways to convert '' or 'number' to a number <cartercc@gmail.com>
Re: Elegant ways to convert '' or 'number' to a number <usenet05@drabble.me.uk>
Re: Elegant ways to convert '' or 'number' to a number (Tim McDaniel)
Re: Elegant ways to convert '' or 'number' to a number <ben@morrow.me.uk>
Re: Opening Unicode files? tchrist@perl.com
Re: Remove all HTML but keep <p> tags <nospam.gravitalsun.antispam@hotmail.com.nospam>
Re: Remove all HTML but keep <p> tags <*@eli.users.panix.com>
Re: sorting unicode file under windows command line <eric.pement@gmail.com>
Re: sorting unicode file under windows command line <nospam.gravitalsun.antispam@hotmail.com.nospam>
Re: sorting unicode file under windows command line tchrist@perl.com
Re: sorting unicode file under windows command line tchrist@perl.com
Re: Unicode-AGE of a character? tchrist@perl.com
Re: WWW::Mechanize submit_form does not return expected <justin.1201@purestblue.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Tue, 14 Feb 2012 21:37:11 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Elegant ways to convert '' or 'number' to a number
Message-Id: <jhek67$dpd$1@reader1.panix.com>
So a cow-orker has the result of a database query. It's being
returned as a string: it contains an integer or it's a null string.
(Yes, we're certain.) He wants to use it in a numeric context, but is
getting
Argument "" isn't numeric in numeric eq (==) at FILE line NUMBER.
warnings (as we have "use warnings" on). I had had the impression
that "+ 0" would do the conversion and avoid a warning, but that's not
the case.
Is there an elegant idiom for converting such a string to a number
without producing a warning if it happens to be a null string?
* $t = $t ? 0+$t : 0;
is not what I would call "elegant", especially in this case, where
it's not $t but $alonghashtablename{AVERYLONGHASHINDEXNAME}.
* $t = "0$t" + 0;
isn't so elegant either (though at least it does not convert to
octal, as I first wondered).
* $t ||= 0;
doesn't give you a number per se, but at least the string converts
to a number in numeric contexts without a warning.
* ($t ||= 0) += 0;
Its main virtue is that it converts it to a number and only uses $t
once.
I'm thinking that
{ no warnings 'numeric'; $t += 0; }
is the best choice. It has the great virtue of being abundantly clear
on what they want, as opposed to a person reading having to figure out
some hack that happens to do it without a warning.
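A minimal sketch of the scoped `no warnings 'numeric'` approach. `%alonghashtablename` and `AVERYLONGHASHINDEXNAME` stand in for the real names in the post, and `ANOTHERINDEX` is a hypothetical second key. The `$SIG{__WARN__}` handler exists only to show that the conversion stays silent. Nothing here needs a perl newer than 5.8.

```perl
use strict;
use warnings;

# Collect any warnings so we can confirm the conversion is silent.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical stand-ins for the long hash and key names in the post.
my %alonghashtablename = (
    AVERYLONGHASHINDEXNAME => '',    # the empty-string case
    ANOTHERINDEX           => '42',  # the integer-string case
);

for my $key (keys %alonghashtablename) {
    # Suppress only the 'numeric' category, and only inside this block.
    no warnings 'numeric';
    $alonghashtablename{$key} += 0;
}

print "$_ => $alonghashtablename{$_}\n" for sort keys %alonghashtablename;
```

Because `no warnings` is lexically scoped, every other numeric conversion in the program still warns as usual.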
I should note that we're using 5.8.8 and NO I have no way whatsoever
of changing that.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Tue, 14 Feb 2012 16:35:29 -0600
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <4f3ae1b1$0$75671$815e3792@news.qwest.net>
On 02/14/12 15:37, Tim McDaniel wrote:
> So a cow-orker has the result of a database query.
I didn't know cows used databases.. :-)
> It's being
> returned as a string: it contains an integer or it's a null string.
> (Yes, we're certain.) He wants to use it in a numeric context, but is
> getting
> Argument "" isn't numeric in numeric eq (==) at FILE line NUMBER.
> warnings (as we have "use warnings" on). I had had the impression
> that "+ 0" would do the conversion and avoid a warning, but that's not
> the case.
If you need to have warnings enabled, for some reason, then it might
be cleaner to fix the source of the problem, e.g. in the SQL with
ISNULL or NVL or IFNULL or ....
I generally don't 'use warnings' in production, because of these and the
uninitialized warnings. Just lazy, I guess.
Another solution: don't try to modify the value, just short-circuit
the test and only try the numeric comparison if it's defined:
if( defined $val && $val == 1234 )
That won't help when printing $val, though.
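The guard above covers undef. In this thread the value is '' rather than undef, so the sketch below also adds a length() check. That addition is an assumption and not part of the post. `matches_1234` is a hypothetical name.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Only compare once the value is usable: defined() covers undef,
# length() additionally covers the empty string.
sub matches_1234 {
    my ($val) = @_;
    return defined $val && length $val && $val == 1234;
}

my @results = map { matches_1234($_) ? 1 : 0 } (undef, '', '1234', '99');
print "@results\n";
```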
------------------------------
Date: Tue, 14 Feb 2012 23:25:46 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <q47r09-d1k2.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> So a cow-orker has the result of a database query. It's being
> returned as a string: it contains an integer or it's a null string.
> (Yes, we're certain.) He wants to use it in a numeric context, but is
> getting
> Argument "" isn't numeric in numeric eq (==) at FILE line NUMBER.
> warnings (as we have "use warnings" on). I had had the impression
> that "+ 0" would do the conversion and avoid a warning, but that's not
> the case.
>
> Is there an elegant idiom for converting such a string to a number
> without producing a warning if it happens to be a null string?
By 'null string' you mean an empty string, not undef, right?
> * $t = $t ? 0+$t : 0;
> is not what I would call "elegant", especially in this case, where
> it's not $t but $alonghashtablename{AVERYLONGHASHINDEXNAME}.
You can fix that with for:
$_ = $_ ? 0+$_ : 0
for $alonghashtablename{AVERYLONGHASHINDEXNAME};
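The `for` trick works because foreach aliases `$_` to the hash element itself, so assigning to `$_` writes back into the hash. Here is a small runnable sketch. Only the long names come from the post, and the hash contents are made up.

```perl
use strict;
use warnings;

# Hypothetical hash; only the long names come from the post.
my %alonghashtablename = (AVERYLONGHASHINDEXNAME => '');

# foreach aliases $_ to the element, so this assignment updates
# %alonghashtablename without repeating the long expression.
$_ = $_ ? 0+$_ : 0
    for $alonghashtablename{AVERYLONGHASHINDEXNAME};

print $alonghashtablename{AVERYLONGHASHINDEXNAME}, "\n";   # prints 0
```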
> I'm thinking that
> { no warnings 'numeric'; $t += 0; }
> is the best choice. It has the great virtue of being abundantly clear
> on what they want, as opposed to a person reading having to figure out
> some hack that happens to do it without a warning.
Yes, I think so. You shouldn't be afraid of turning warnings off where
you need to: the fact that you've done so serves as a good indication
that you are intentionally doing something a little out of the ordinary.
Ben
------------------------------
Date: Wed, 15 Feb 2012 00:43:54 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <jhev4a$5v7$1@reader1.panix.com>
In article <q47r09-d1k2.ln1@anubis.morrow.me.uk>,
Ben Morrow <ben@morrow.me.uk> wrote:
>
>Quoth tmcd@panix.com:
>> Is there an elegant idiom for converting such a string to a number
>> without producing a warning if it happens to be a null string?
>
>By 'null string' you mean an empty string, not undef, right?
Yes. I agree with Cerebron on dragons: the concepts ought to be kept
rigidly distinct.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Wed, 15 Feb 2012 15:25:49 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <87ty2s6ruq.fsf@sapphire.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <q47r09-d1k2.ln1@anubis.morrow.me.uk>,
> Ben Morrow <ben@morrow.me.uk> wrote:
>>
>>Quoth tmcd@panix.com:
>>> Is there an elegant idiom for converting such a string to a number
>>> without producing a warning if it happens to be a null string?
>>
>>By 'null string' you mean an empty string, not undef, right?
>
> Yes. I agree with Cerebron on dragons: the concepts ought to be kept
> rigidly distinct.
They are not different concepts and have never been different
concepts in Perl: Perl is one of those 'messy' languages designed
around the idea that automatic conversions are a good thing because
they reduce the amount of boilerplate code people need to write (and
other people need to read). Other languages have been designed in
different ways and might be more suitable for people who consider
those different ways essential.
------------------------------
Date: Wed, 15 Feb 2012 16:46:04 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <jhgngc$rhh$1@reader1.panix.com>
In article <87ty2s6ruq.fsf@sapphire.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> In article <q47r09-d1k2.ln1@anubis.morrow.me.uk>,
>> Ben Morrow <ben@morrow.me.uk> wrote:
>>>
>>>Quoth tmcd@panix.com:
>>>> Is there an elegant idiom for converting such a string to a
>>>> number without producing a warning if it happens to be a null
>>>> string?
>>>
>>>By 'null string' you mean an empty string, not undef, right?
>>
>> Yes. I agree with Cerebron on dragons: the concepts ought to be
>> kept rigidly distinct.
>
>They are not different concepts and have never been
>different concepts in Perl
They can be distinguished via defined(). They cause different
warnings (uninitialized value).
More fundamentally, I do consider it worthwhile to distinguish between
different values of missing data.
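Tim's two points, that defined() tells the values apart and that they raise different warnings, can be checked directly. In this sketch the warnings are captured rather than printed, so the two messages can be compared.

```perl
use strict;
use warnings;

# Record warning text instead of printing it.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my ($undef_val, $empty_val) = (undef, '');

# defined() is how code tells the two apart...
my @defined = (defined $undef_val ? 1 : 0, defined $empty_val ? 1 : 0);

# ...and numifying each one triggers a different warning.
my $n1 = $undef_val + 0;   # "Use of uninitialized value ..."
my $n2 = $empty_val + 0;   # 'Argument "" isn't numeric ...'

print "defined: @defined\n";
print for @warnings;
```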
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Wed, 15 Feb 2012 17:09:10 +0000
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <8762f86n2h.fsf@sapphire.mobileactivedefense.com>
tmcd@panix.com (Tim McDaniel) writes:
> In article <87ty2s6ruq.fsf@sapphire.mobileactivedefense.com>,
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>>tmcd@panix.com (Tim McDaniel) writes:
>>> In article <q47r09-d1k2.ln1@anubis.morrow.me.uk>,
>>> Ben Morrow <ben@morrow.me.uk> wrote:
>>>>
>>>>Quoth tmcd@panix.com:
>>>>> Is there an elegant idiom for converting such a string to a
>>>>> number without producing a warning if it happens to be a null
>>>>> string?
>>>>
>>>>By 'null string' you mean an empty string, not undef, right?
>>>
>>> Yes. I agree with Cerebron on dragons: the concepts ought to be
>>> kept rigidly distinct.
>>
>>They are not different concepts and have never been
>>different concepts in Perl
>
> They can be distinguished via defined(). They cause different
> warnings (uninitialized value).
They are not distinguishable by their 'string value' because automatic
conversions are done by Perl as required, as I already wrote. Provided
run-time warnings are enabled, some pretty random-looking subset of
the situations where such an automatic conversion takes place causes
some text to be printed. That's an optional feature some people
consider useful (primarily for others, as it seems ...).
> More fundamentally, I do consider it worthwhile to distinguish between
> different values of missing data.
http://en.wikipedia.org/wiki/Law_of_excluded_middle
------------------------------
Date: Wed, 15 Feb 2012 20:01:50 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <jhh2ve$8im$1@reader1.panix.com>
In article <8762f86n2h.fsf@sapphire.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>tmcd@panix.com (Tim McDaniel) writes:
>> In article <87ty2s6ruq.fsf@sapphire.mobileactivedefense.com>,
>> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>>>tmcd@panix.com (Tim McDaniel) writes:
>>>> In article <q47r09-d1k2.ln1@anubis.morrow.me.uk>,
>>>> Ben Morrow <ben@morrow.me.uk> wrote:
>>>>>
>>>>>Quoth tmcd@panix.com:
>>>>>> Is there an elegant idiom for converting such a string to a
>>>>>> number without producing a warning if it happens to be a null
>>>>>> string?
>>>>>
>>>>>By 'null string' you mean an empty string, not undef, right?
>>>>
>>>> Yes. I agree with Cerebron on dragons: the concepts ought to be
>>>> kept rigidly distinct.
>>>
>>>They are not different concepts and have never been
>>>different concepts in Perl
>>
>> They can be distinguished via defined(). They cause different
>> warnings (uninitialized value).
>
>They are not distinguishable by their 'string value' because
>automatic conversions are done by Perl as required
Nevertheless, they are different concepts, although there is only one
place (defined()) where code can tell the difference, so far as I
know.
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Wed, 15 Feb 2012 14:17:28 -0800 (PST)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <34e2d0d3-7b2e-42fb-bc7b-bfb1a3461278@y10g2000vbn.googlegroups.com>
On Feb 14, 4:37 pm, t...@panix.com (Tim McDaniel) wrote:
> So a cow-orker has the result of a database query. It's being
> returned as a string: it contains an integer or it's a null string.
> (Yes, we're certain.) He wants to use it in a numeric context,
> Is there an elegant idiom for converting such a string to a number
> without producing a warning if it happens to be a null string?
What's wrong with using int(), or sprintf()?
I had a similar problem, but the reverse. I used person ID numbers as
keys in a hash table, and manipulated the values in various ways,
which included Microsoft Excel. I kept getting aggravating errors over
a long time, and after one experience went through the results line by
line and discovered that sometimes ID numbers with leading zeros were
treated like real numbers, and a value like '4321' does not match a
hash key like '0004321'.
I started using sprintf() when in doubt, and it solved the problem. I
just convert whatever the value is into a string, and it preserves the
leading zeros.
If you want to reverse the process, int() would probably work and be
less verbose than sprintf().
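Here is a sketch of the leading-zero problem and the sprintf() fix. The hash contents and the fixed width of 7 digits are assumptions made for illustration. They were not given in the post.

```perl
use strict;
use warnings;

# Hypothetical lookup table keyed on zero-padded person IDs.
my %person = ('0004321' => 'example record');

# An ID that lost its leading zeros on a round trip through a spreadsheet.
my $from_spreadsheet = '4321';

# A plain lookup fails: '4321' and '0004321' are different hash keys.
my $plain_hit = exists $person{$from_spreadsheet} ? 1 : 0;

# Re-padding with sprintf restores the key (width 7 is an assumption).
my $key = sprintf '%07d', $from_spreadsheet;
my $padded_hit = exists $person{$key} ? 1 : 0;

# Going the other way, int() drops the zeros again (no octal surprise).
my $as_number = int $key;

print "plain: $plain_hit, padded: $padded_hit, key: $key, int: $as_number\n";
```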
CC.
------------------------------
Date: Wed, 15 Feb 2012 22:52:26 GMT
From: Graham Drabble <usenet05@drabble.me.uk>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <Xns9FFAE8AFF93E2grahamdrabblelineone@ID-77355.user.dfncis.de>
On 14 Feb 2012 tmcd@panix.com (Tim McDaniel) wrote in
news:jhek67$dpd$1@reader1.panix.com:
> So a cow-orker has the result of a database query. It's being
> returned as a string: it contains an integer or it's a null
> string.
Has he thought of changing the query? You don't say what DB he's using,
but the following should work in MSSQL, and I would expect something
similar to be possible in other DBMSs.
Instead of
"SELECT
column
from table"
use
"SELECT
case
when column is null then 0
else column
end
from table"
------------------------------
Date: Wed, 15 Feb 2012 23:10:13 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <jhhe0l$pn$1@reader1.panix.com>
In article <34e2d0d3-7b2e-42fb-bc7b-bfb1a3461278@y10g2000vbn.googlegroups.com>,
ccc31807 <cartercc@gmail.com> wrote:
>On Feb 14, 4:37 pm, tmcd@panix.com (Tim McDaniel) wrote:
>> So a cow-orker has the result of a database query. It's being
>> returned as a string: it contains an integer or it's a null string.
>> (Yes, we're certain.) He wants to use it in a numeric context,
>> Is there an elegant idiom for converting such a string to a number
>> without producing a warning if it happens to be a null string?
>
>What's wrong with using int(),
The part about "without producing a warning if it happens to be a null string".
$ perl -e 'use warnings; int("")'
Argument "" isn't numeric in int at -e line 1.
Since it does happen to be an integer or the null string, I think "0+"
is equivalent to int().
> or sprintf()?
Not sure what you mean in this case. Do you mean this?
$ perl -e 'use warnings; my $x = sprintf("%d", ""); print "[$x]\n"'
Argument "" isn't numeric in sprintf at -e line 1.
[0]
--
Tim McDaniel, tmcd@panix.com
------------------------------
Date: Wed, 15 Feb 2012 23:23:23 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Elegant ways to convert '' or 'number' to a number
Message-Id: <bcrt09-10b.ln1@anubis.morrow.me.uk>
Quoth tmcd@panix.com:
> In article <34e2d0d3-7b2e-42fb-bc7b-bfb1a3461278@y10g2000vbn.googlegroups.com>,
> ccc31807 <cartercc@gmail.com> wrote:
> >On Feb 14, 4:37 pm, tmcd@panix.com (Tim McDaniel) wrote:
> >> So a cow-orker has the result of a database query. It's being
> >> returned as a string: it contains an integer or it's a null string.
> >> (Yes, we're certain.) He wants to use it in a numeric context,
> >> Is there an elegant idiom for converting such a string to a number
> >> without producing a warning if it happens to be a null string?
> >
> >What's wrong with using int(),
>
> The part about "without producing a warning if it happens to be a null string".
>
> $ perl -e 'use warnings; int("")'
> Argument "" isn't numeric in int at -e line 1.
>
> Since it does happen to be an integer or the null string, I think "0+"
> is equivalent to int().
It is, yes, but actually
$x = int($x || 0);
is at least as clear as any of the other solutions we've come up with.
It has the advantage that it preserves 'non-numeric' warnings for
everything other than the empty string, just in case the input data ends
up wrong at some point in the future.
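Ben's point that `int($x || 0)` silences only the empty-string case can be shown by capturing warnings. The idiom is wrapped in a hypothetical `to_int` sub for reuse, and the sample inputs are made up.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Ben's idiom in a small (hypothetical) helper.
sub to_int { my ($x) = @_; return int($x || 0) }

my $from_empty  = to_int('');     # '' is false, so int(0): no warning
my $from_digits = to_int('42');   # 42, no warning
my $quiet       = @warnings;      # still 0 at this point

my $from_junk   = to_int('abc');  # still warns: "isn't numeric in int"

print "$from_empty $from_digits $from_junk\n";
```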
Ben
------------------------------
Date: Wed, 15 Feb 2012 13:02:57 -0800 (PST)
From: tchrist@perl.com
Subject: Re: Opening Unicode files?
Message-Id: <8003778.32.1329339777548.JavaMail.geo-discussion-forums@pbcr5>
On Tuesday, December 27, 2011 2:19:00 PM UTC-7, Ilya Zakharevich wrote:
> On 2011-12-27, Ben Morrow <ben@morrow.me.uk> wrote:
> > Encode::Guess, which can be invoked as
> >
> > open my $fh, '< :encoding(Guess)', $filename
> >
> > Somewhat annoyingly, you have to explicitly use Encode::Guess or it
> > won't recognise the encoding name, and you have to use
> > Encode::Guess->set_suspects to set the list of encodings to try.
>
> Same question as to the other answer: does it ship with Perl? And I
> do not want any guessing; I want a very deterministic procedure...
Ilya,
I understand completely. I find that Encode::Guess is too unreliable for
my purposes. I have a replacement version built on a statistical
model derived from very large English-language corpora, and it guesses
correctly 99.79% of the time, including among conflicting 8-bit encodings. For
example, it knows CP1252 from MacRoman from ISO-8859-1 from ISO-8859-15,
etc. I have a working alpha version of the code, so if you are interested in this
technique or wish to know more, please send me mail. You can fetch the
alpha version from
http://training.perl.com/scripts/Encode-Guess-Educated-0.03.tar.gz
I'm having trouble with my PAUSE id, so it isn't on CPAN yet.
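On Ilya's question: Encode::Guess is part of the core Encode distribution. If you leave the suspect list at its default, it only tries ascii, utf8, and BOM-marked UTF-16/32, which keeps the result fairly predictable. The sketch below calls `guess_encoding` directly on a made-up byte string rather than using the `:encoding(Guess)` layer.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding

# UTF-8 bytes for "café" (the last character is two bytes, C3 A9).
my $bytes = "caf\xC3\xA9";

# Default suspects only: ascii, utf8, and UTF-16/32 with a BOM.
my $enc = guess_encoding($bytes);
ref $enc or die "Can't guess: $enc";   # on failure it returns a message string

my $text = $enc->decode($bytes);
printf "%s, %d characters\n", $enc->name, length $text;
```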
Hope this helps, and do feel free to write. I never look here for anything,
so am likely to miss a reply.
--tom
------------------------------
Date: Tue, 14 Feb 2012 20:14:53 +0200
From: "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam>
Subject: Re: Remove all HTML but keep <p> tags
Message-Id: <jhe8au$2oqs$1@news.ntua.gr>
# try this
use strict;
use warnings;
my $htm = do { local $/; <DATA> };
while( $htm =~/<p[^>]*?>(.*?)<\/p>/gi ) {
print "*$^N*\n"
}
__DATA__
<p>Earth</p> blah1 <p style=...>Sun</p> blah1
<p style=...>Moon</p> blah2 <p>
Venus
</p><p style=...>Hermes</p>blah2<p>
Jupiter</p>
------------------------------
Date: Wed, 15 Feb 2012 00:19:20 +0000 (UTC)
From: Eli the Bearded <*@eli.users.panix.com>
Subject: Re: Remove all HTML but keep <p> tags
Message-Id: <eli$1202141919@qz.little-neck.ny.us>
In comp.lang.perl.misc,
George Mpouras <nospam.gravitalsun.antispam@hotmail.com.nospam> wrote:
> # try this
> while( $htm =~/<p[^>]*?>(.*?)<\/p>/gi ) {
> print "*$^N*\n"
> }
That won't be effective if there are <PRE> (or other <P\w+>) tags,
missing </P> tags, if there is markup inside the paragraphs, if
the close paragraph has optional whitespace like "</P >", or if
the paragraph contains newlines.
That's all the errors I can find in a first glance.
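A few of the problems Eli lists can be patched in the regex itself. `\b` keeps `<pre>` from matching, `/s` lets `.` cross newlines, and `\s*` tolerates `</P >`. The sketch below is still a regex hack. It does nothing about missing `</p>` tags, markup inside paragraphs, or HTML comments, and a real parser such as HTML::Parser is the safer route. The sample HTML is made up.

```perl
use strict;
use warnings;

my $html = <<'HTML';
<p>Earth</p> blah <pre>code</pre>
<P class="x">Sun</P > blah
<p>
Venus
</p>
HTML

my @paragraphs;
# \b rejects <pre>; /s lets . match newlines; /i handles upper case.
while ($html =~ m{<p\b[^>]*>(.*?)</p\s*>}gis) {
    (my $text = $1) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    push @paragraphs, $text;
}
print "*$_*\n" for @paragraphs;
```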
Elijah
------
thinks it is html comments that make the original problem tricky
------------------------------
Date: Tue, 14 Feb 2012 07:34:40 -0800 (PST)
From: Eric Pement <eric.pement@gmail.com>
Subject: Re: sorting unicode file under windows command line
Message-Id: <109afbdf-fbd1-4749-9c9b-07228448e39a@m2g2000vbc.googlegroups.com>
On Feb 14, 4:34 am, happytoday <ehabaziz2...@gmail.com> wrote:
> I am trying to sort a file according to unicode field
> (position,length) under Berkeley unix version (windows version). I
> tried msort3.exe utility but can not find msort3.exe working with me.
> Is there a command line utility or perl/sed/awk program that sorts a
> file according to unicode column UTF-8 with start_position,length_position.
You should try GNU sort, which does run under Windows. Note the need
to set some environment variables. From the info pages:
(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to. In that case, set the `LC_ALL' environment
variable to `C'. Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set. Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset)
is set to an incompatible value. For example, you get undefined
behavior if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
A comment on stackoverflow.com says this:
... keep in mind that GNU sort depends on a correct locale setting
(the LC_* environment variables, and specifically the LC_COLLATE one).
LC_COLLATE (or LC_ALL) should be set to a locale with UTF-8 support
(e.g. en_US.UTF-8 or el_GR.UTF-8), preferably in the language that you
are interested in.
To sort on a start position and an end position, do this:
sort -t x -k 1.M,1.N
where 'x' is a character known to exist nowhere in the file, M is the
start column number, and N is the end column number.
Eric
------------------------------
Date: Tue, 14 Feb 2012 19:40:24 +0200
From: "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam>
Subject: Re: sorting unicode file under windows command line
Message-Id: <jhe6ae$2g36$1@news.ntua.gr>
#!c:/Perl/bin/perl.exe
# Have a happy sorting
use encoding 'utf8';
my @File;
my %Positions_and_lengths = (
0 => 1 ,
2 => 1 ,
4 => 4 ,
9 => 6 ,
);
#For an external input file instead of __DATA__
#open FILE, '<:encoding(UTF-8)', 'c:/ome/utf8/file' or die "$^E\n";
binmode STDOUT, ':utf8';
while (my $line = <DATA>) {
my $row;
foreach my $POS (sort {$a<=>$b} keys %Positions_and_lengths) {
push @{$row}, substr $line, $POS, $Positions_and_lengths{$POS}
}
push @File, $row
}
#use Data::Dumper; print Dumper(\@File);exit;
foreach my $row (sort {$a->[0] cmp $b->[0] || $a->[2] cmp $b->[2]} @File) {
print "@{$row}\n"
}
__DATA__
1 a αα δδδ
1 b ββ γγγ
2 a αα βββ
2 b ββ ααα
------------------------------
Date: Wed, 15 Feb 2012 13:08:12 -0800 (PST)
From: tchrist@perl.com
Subject: Re: sorting unicode file under windows command line
Message-Id: <27879777.8.1329340092108.JavaMail.geo-discussion-forums@pbor1>
I think you should consider using the ucsort script. It is made
for sorting Unicode in the UTF-8 encoding, because it sorts
via the Unicode Collation Algorithm. By default it sorts the entire
line without regard to fields, but there are options to account
for those. They are not like the regular sort program's options.
For example, to sort using just characters 11-20 of each line, you
might do
$ ucsort --pre="s/^.{10}(.{10}).*/$1/" < inputfile > outputfile
You can get the beta version from
http://training.perl.com/scripts/ucsort
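The same idea can be written directly with the core Unicode::Collate module. The sort key is characters 11-20 (offset 10, length 10), which is the span the `--pre` substitution keeps. This is a sketch of the technique and not ucsort's own code. The sample lines are made up.

```perl
use strict;
use warnings;
use utf8;                 # the sample lines contain non-ASCII literals
use Unicode::Collate;     # Unicode Collation Algorithm

# Hypothetical fixed-width lines: 10 filler characters, then the key.
my @lines = (
    ('a' x 10) . "zebra     tail1",
    ('b' x 10) . "Éclair    tail2",
    ('c' x 10) . "apple     tail3",
);

my $collator = Unicode::Collate->new();

# Schwartzian transform: build each UCA sort key once, compare keys
# as binary strings, then return the original lines.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ $collator->getSortKey(substr($_, 10, 10)), $_ ] } @lines;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

Under the UCA, É sorts with E at the primary level, so the "Éclair" line falls between "apple" and "zebra".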
--tom
------------------------------
Date: Wed, 15 Feb 2012 13:13:25 -0800 (PST)
From: tchrist@perl.com
Subject: Re: sorting unicode file under windows command line
Message-Id: <2187186.21.1329340405174.JavaMail.geo-discussion-forums@pbgq3>
On Tuesday, February 14, 2012 8:34:40 AM UTC-7, Eric Pement wrote:
> A comment on stackoverflow.com says this:
>
> ... keep in mind that GNU sort depends on a correct locale setting
> (the LC_* environment variables, and specifically the LC_COLLATE one).
> LC_COLLATE (or LC_ALL) should be set to a locale with UTF-8 support
> (e.g. en_US.UTF-8 or el_GR.UTF-8), preferably in the language that you
> are interested in.
This is the problem with all the vendor-locale things: they are unreliable,
and they require particular locale settings. This is not reasonable in
a Unicode world. Much better to use Unicode::Collate and if necessary also
Unicode::Collate::Locale. For example, a pure-Perl solution for sorting
according to German phonebook conventions, and with uppercase
before lowercase, is:
$ ucsort --locale=de__phonebook --upper_before_lower
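Here is a pure-Perl sketch of the same collation using Unicode::Collate::Locale. It assumes your installed Unicode::Collate is recent enough to list the `de__phonebook` tailoring. The names are made up.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;

# German phonebook tailoring (ü sorts like "ue"), upper case first.
my $collator = Unicode::Collate::Locale->new(
    locale             => 'de__phonebook',
    upper_before_lower => 1,
);

my @names  = ('Muller', 'Müller', 'Mueller', 'apple', 'Apple');
my @sorted = $collator->sort(@names);

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

Under phonebook rules, both "Mueller" and "Müller" sort before "Muller", because ü is compared as "ue".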
------------------------------
Date: Wed, 15 Feb 2012 13:20:40 -0800 (PST)
From: tchrist@perl.com
Subject: Re: Unicode-AGE of a character?
Message-Id: <13067596.31.1329340840651.JavaMail.geo-discussion-forums@pbbpk4>
On Monday, January 9, 2012 11:47:56 PM UTC-7, Ilya Zakharevich wrote:
> I looked through the docs I could find, and can't find any way to
> determine the "Unicode AGE" of a particular codepoint except for:
>
> a) running /\p{Present_in: FOO}/ for all foreseeable values of FOO;
>
> b) manually parsing $out = do 'unicore/To/Age.pl';.
>
> Do I miss anything?
I don't think so. When preparing the 4th Edition of Programming Perl
for printing, we needed to run an analysis of code point use by age. I
ended up doing this:
$char_info->{Age} = do { given ( $char ) {
when( /\p{Age=1.1}/ ) { '1.1' }
when( /\p{Age=2.0}/ ) { '2.0' }
when( /\p{Age=2.1}/ ) { '2.1' }
when( /\p{Age=3.0}/ ) { '3.0' }
when( /\p{Age=3.1}/ ) { '3.1' }
when( /\p{Age=3.2}/ ) { '3.2' }
when( /\p{Age=4.0}/ ) { '4.0' }
when( /\p{Age=4.1}/ ) { '4.1' }
when( /\p{Age=5.0}/ ) { '5.0' }
when( /\p{Age=5.1}/ ) { '5.1' }
when( /\p{Age=5.2}/ ) { '5.2' }
when( /\p{Age=6.0}/ ) { '6.0' }
default { 'N/A' }
} };
Which of course is suboptimal to say the least. I can criticize
it in quite a few directions. But it's what we used anyway.
I believe that Karl has some new stuff in the current blead that
exposes some of the character maps so you don't have to parse
the .pl files yourself. You might check into that.
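The given/when chain above can also be written as a loop over a table of versions, which avoids the experimental given/when. This sketch builds each `\p{Age=...}` pattern at run time. The version list is the one from the post. Every version in it must be known to the running perl, or the property lookup dies. `char_age` is a hypothetical name.

```perl
use strict;
use warnings;

# Unicode versions to probe, oldest first (the list used in the post).
my @versions = qw(1.1 2.0 2.1 3.0 3.1 3.2 4.0 4.1 5.0 5.1 5.2 6.0);

# Return the version in which $char was assigned, or 'N/A' if it is
# unassigned or newer than the last version listed.
sub char_age {
    my ($char) = @_;
    for my $v (@versions) {
        my $pattern = '\p{Age=' . $v . '}';   # built at run time
        return $v if $char =~ /$pattern/;
    }
    return 'N/A';
}

print char_age('A'), "\n";          # 1.1
print char_age("\x{20AC}"), "\n";   # EURO SIGN: 2.1
```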
--tom
------------------------------
Date: Wed, 15 Feb 2012 13:17:48 +0000
From: Justin C <justin.1201@purestblue.com>
Subject: Re: WWW::Mechanize submit_form does not return expected
Message-Id: <ssns09-k8d.ln1@zem.masonsmusic.co.uk>
On 2012-02-14, Justin C <justin.1201@purestblue.com> wrote:
> On 2012-02-10, Tad McClellan <tadmc@seesig.invalid> wrote:
>>
>> I'd use the "web scraping proxy" from AT&T:
>>
>> http://www2.research.att.com/sw/tools/wsp/
>>
>> It logs HTTP requests/responses in the form of Perl code (UserAgent).
>>
>> Then we don't need to know what all the JS does, we just need to know
>> how to construct the request we want...
>
> Thank you, Tad, that's very useful. I think I've found a cookie. I'm
> testing new code now.
Update: After much *much* time, too much time, trying to debug this I
finally found the problem. I had a typo in the name of a field in the
$mech->submit_form. A small transpose of two letters. The site still
functioned as I expected, but my error caused the site not to give me a
confirmation number.
I hate how debugging takes 3 times (or more) the time it takes to code!
Anyway, thanks to Tad and J for their suggestions.
Justin.
--
Justin C, by the sea.
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V11 Issue 3612
***************************************