[22292] in Perl-Users-Digest
Perl-Users Digest, Issue: 4513 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Feb 5 03:08:12 2003
Date: Wed, 5 Feb 2003 00:06:38 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 5 Feb 2003 Volume: 10 Number: 4513
Today's topics:
Re: Crossposting (was: Fetchrow Question) <flavell@mail.cern.ch>
Re: Crossposting (was: Fetchrow Question) <mgarrish@rogers.com>
Re: Crossposting (was: Fetchrow Question) <uri@stemsystems.com>
FastCGI question <itodd@remove.itodd.org>
Re: FastCGI question <goldbb2@earthlink.net>
Re: gSTLFilt/perl/g++ problem: regexp too big <fma@doe.carleton.ca>
Re: How do you: 'make install PREFIX=somepath', and the <f_ker@yahoo.co.uk_NO_SPAM>
Re: How do you: 'make install PREFIX=somepath', and the <f_ker@yahoo.co.uk_NO_SPAM>
Re: How do you: 'make install PREFIX=somepath', and the <f_ker@yahoo.co.uk_NO_SPAM>
question about perl and and active directory find - use <jozefn@bolt.sonic.net>
select case <istink@real.bad.com>
Re: select case <tassilo.parseval@post.rwth-aachen.de>
Some confusion <smiley@uvgotemail.com>
Re: Some confusion <fnaffle@visation.com>
Re: sort, my concoction of <mgjv@tradingpost.com.au>
Re: sort, my concoction of <goldbb2@earthlink.net>
Re: sort, my concoction of <mgjv@tradingpost.com.au>
Re: sort, my concoction of <uri@stemsystems.com>
Re: sort, my concoction of <goldbb2@earthlink.net>
Re: sort, my concoction of <tassilo.parseval@post.rwth-aachen.de>
Re: Until loop not working with multiple conditions (Paul)
Re: Until loop not working with multiple conditions <uri@stemsystems.com>
Re: Until loop not working with multiple conditions <josef.moellers@fujitsu-siemens.com>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 5 Feb 2003 00:46:02 +0100
From: "Alan J. Flavell" <flavell@mail.cern.ch>
Subject: Re: Crossposting (was: Fetchrow Question)
Message-Id: <Pine.LNX.4.53.0302050043040.6462@lxplus082.cern.ch>
On Feb 4, Tad McClellan inscribed on the eternal scroll:
> mgarrish <mgarrish@rogers.com> wrote:
[nothing worthy of note]
> Pardon me, but your ignorance is showing.
Would this be the ...garrish@sympatico that I plonked last year?
Looks to be due for a fresh plonk.
------------------------------
Date: Wed, 05 Feb 2003 03:44:53 GMT
From: "mgarrish" <mgarrish@rogers.com>
Subject: Re: Crossposting (was: Fetchrow Question)
Message-Id: <VM%%9.559070$F2h1.36171@news01.bloor.is.net.cable.rogers.com>
"Uri Guttman" <uri@stemsystems.com> wrote in message
news:x7u1fjpumi.fsf@mail.sysarch.com...
>
> matt, i am not arguing with you. i am trolling you.
>
I'm well aware of that. But trolling is not a one way street, so until the
next time you make the mistake of stepping in my pond, remember that it is
now Matt 3 Uri 0...
------------------------------
Date: Wed, 05 Feb 2003 04:17:46 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Crossposting (was: Fetchrow Question)
Message-Id: <x7lm0vnyxj.fsf@mail.sysarch.com>
>>>>> "m" == mgarrish <mgarrish@rogers.com> writes:
m> "Uri Guttman" <uri@stemsystems.com> wrote in message
m> news:x7u1fjpumi.fsf@mail.sysarch.com...
>>
>> matt, i am not arguing with you. i am trolling you.
>>
m> I'm well aware of that. But trolling is not a one way street, so until the
m> next time you make the mistake of stepping in my pond, remember that it is
m> now Matt 3 Uri 0...
or more like matt 3 plonks, uri 0.
now go away already. you are not contributing to anything. you have
proven you are a jerk and we all know it. any more will be overkill.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class
------------------------------
Date: Wed, 05 Feb 2003 01:05:36 -0500
From: Todd <itodd@remove.itodd.org>
Subject: FastCGI question
Message-Id: <v41b4ak9g18622@corp.supernews.com>
Before I rebuild Apache for FastCGI, I'd like to know if my perl scripts
will have their own PIDs when they run under mod_FastCGI or will they
simply inherit the httpd's PID?
--
Todd
------------------------------
Date: Wed, 05 Feb 2003 00:47:20 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: FastCGI question
Message-Id: <3E40A568.7181E1B8@earthlink.net>
Todd wrote:
>
> Before I rebuild Apache for FastCGI, I'd like to know if my perl
> scripts will have their own PIDs when they run under mod_FastCGI or
> will they simply inherit the httpd's PID?
Yes, they'll have their own pids, just like CGI in this respect.
However, if you fetch a particular FastCGI script twice, then both
invocations will connect to the same process, and of course its pid
won't have changed.
--
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"
------------------------------
Date: 4 Feb 2003 23:38:30 GMT
From: Shing-Fat Fred Ma <fma@doe.carleton.ca>
To: leor@bdsoft.com
Subject: Re: gSTLFilt/perl/g++ problem: regexp too big
Message-Id: <3E404EF0.ECA8B86A@doe.carleton.ca>
Just a followup to the problem below, with additional
details. It seems that almost *any* errors at all
will cause the error message filter to die, with the
current installation of Perl 5.004_04. There are
three test cases that I base this on, using the
following code (much simpler than the code in the
original post):
    std::map<std::string,float> coll ;
    std::map<std::string,float>::iterator pos ;
    coll.insert(std::make_pair("otto",22.3));
    coll.insert(std::make_pair("yasha",32.3));
    coll.insert(std::make_pair("dongol",31.3));
    coll.insert(std::make_pair("old key",51.3));
    std::cout << std::endl ;
    for( pos=coll.begin() ; pos!=coll.end() ; ++pos )
        std::cout << "key: " << pos->first << "\t"
                  << "value: " << pos->second
                  << std::endl ;
    coll["new_key"] = coll["old key"] ;
    std::cout << std::endl ;
    for( pos=coll.begin() ; pos!=coll.end() ; ++pos )
        std::cout << "key: " << pos->first << "\t"
                  << "value: " << pos->second
                  << std::endl ;
This causes g++ errors, which break the error
message filter gstlfilt. The error
turns out to be that g++ 2.95.2 doesn't take
string literals as (const char *), whereas the STL
expects that. So the errors were eliminated by
casting all string literals to (const char *).
This leads to the 2nd test case, which worked fine
because there were no errors: with no error messages,
Perl has no overly long messages to die on.
However, the whole point of gstlfilt is to help
clarify the messages when there are STL errors.
As a test, I introduced an STL error by changing
both instances of std::map to std::multimap. This
is case 3, which causes lengthy errors because multimap
containers do not have operator[]. And the
perl filter broke again, with the same error message
(regexp too big).
So the problem seems like it's perl more than gcc.
Our solaris installation of Perl breaks on errors,
period, and the fact that gcc is older exacerbates
that. The same STL error doesn't break gstlfilt
when run with the newer perl and newer gcc on cygwin
(versions below).
I should qualify these crude tests. I couldn't get
the (const char *) error of case 1 to show itself on
the newer toolset. I tried using gcc's
-fno-const-strings option, but got no error.
Considering my numerous unsuccessful attempts to build
my own gcc, I thought it better to try and build
perl. I downloaded the 10MB tgz file and looked at
the solaris instructions. There are many reasons why
I would prefer a solution other than building perl,
which I mentioned in my original email. First and
foremost is time and inexperience. The solaris
instructions are quite lengthy, and I don't have many
of the tools that are mentioned, e.g.:
for tools (sccs, lex, yacc ): SUNWbtool,
SUNWsprot, SUNWtoo
for libraries & headers: SUNWhea, SUNWarc, SUNWlibm, SUNWlibms, SUNWdfbh,
SUNWcg6h, SUNWxwinc, SUNWolinc
I'm sure that with time, I can eventually find and/or
install them (and figure out what they mean), but
since I'm not the sysadmin, and I'm (in theory) getting
algorithm results for my thesis, I want to avoid
banging my head against the tools until my thesis is
on track (I have spent a considerable amount of
time doing that, but not necessarily on this particular
issue). My experience with trying to build gcc has
made me very wary of bucking the trend and building my
own tools. I've also spent loads of time just getting my
head into perl and STL (no small deal here, I found),
so I'm really trying to avoid going off track right now.
Second, it takes a lot of space for a personal build.
So, I guess I'm back to the original question: aside
from building my own Perl, is there a workaround for
the fact that g++ errors break the perl filter for STL
error messages?
Thanks in advance.
Fred
--
Fred Ma, fma@doe.carleton.ca
Carleton University, Dept. of Electronics
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6
Shing-Fat Fred Ma wrote:
> Hello,
>
> I'm using Leor Zolman's Perl filter, gstlfilt, to clean up g++
> error messages on my use of the STL. It is the most recent:
>
> BD Software STL Message Decryptor
> Release 2.20 for gcc 2/3 (01/18/2003)
>
> The code is attached below.
>
> It works fine on g++ 3.2 running on cygwin 1.3.19-1
> with perl 5.6.1.
>
> I tried on g++ 2.95.2 running on Solaris 8 with
> Perl 5.004_04 (built for sun4-solaris). It complains
> that the regexp is too big.
>
> The meaning of the message is clear. The solution,
> from the web, seems to be breaking up the expression.
>
> I've spent the past few days learning perl, but I
> seriously doubt I will be able to robustly change
> the stlfilt script. My newness to Perl is compounded
> by my newness to STL and its error messages.
> The only other solutions I can think of are to build my
> own Perl or g++ (I'm not root).
>
> I've tried building my own g++ in the past month,
> around 3 or 4 times. It's a very big program, and
> even building the software to test it is challenging.
> I haven't been able to build a version that passes
> the tests with reasonably few errors. Granted, I
> can only guess what is reasonable, but let's just
> say, I've had mountains of failed tests.
>
> Before attempting to build my own Perl, I wonder if
> I could get some advice on whether that is really
> the best way to be spending my time. Is there a
> better way to resolve the problem without blindly
> spending days (and weeks) randomly rebuilding
> things that are already on the system? These
> builds take a lot of disk space, as well as time to
> get the knowledge to build them. Unfortunately,
> my sys admin is quite busy (we have just one), so
> it's not realistic to go bug him continuously about
> rebuilding stuff, especially if I'm the only person
> who benefits (it is a hardware electronics department).
>
> If there really is no other way, how bad is it to build
> Perl?
>
> Fred
>
> P.S. I have not posted to comp.lang.c++ because
> it would not be considered on-topic, even though
> there is a lot of STL expertise there.
> --
> Fred Ma, fma@doe.carleton.ca
> Carleton University, Dept. of Electronics
> 1125 Colonel By Drive, Ottawa, Ontario
> Canada, K1S 5B6
> =========================================
> The program is the file testmap.cpp in the sample files
> that accompanies gstlfilt. It was compiled with
>
> gfilt testmap.cpp
>
> and generated the error
>
> /\b([io])stream_iterator<((?:(?:(?:(?:\b[a-zA-Z_]\w*::)*\b[a-zA-Z_]\w* )?(?:(?:\b[a-zA-Z_]\w*::)*\b[a-zA-Z_]\w* )?(?:(?:\b[a-zA-/: regexp too big at /home/fma/INSTROOT/bin/gSTLFilt.pl line 531, <STDIN> chunk 1.
> BD Software STL Message Decryptor Release 2.20 for gcc 2/3 (01/18/2003)
>
> The c++ code is:
>
> /////////////////////////////////////////////
> // testmap.cpp
> /////////////////////////////////////////////
> #include <iostream>
> #include <map>
> #include <algorithm>
> #include <cmath>
> using namespace std;
>
> const int values[] = { 1,2,3,4,5 };
> const int NVALS = sizeof values / sizeof (int);
>
> struct intComp: public binary_function<int, int, bool>
> {
> bool operator()(int a, int b) const
> {
> return a < b;
> }
> };
>
> int main()
> {
> using namespace std;
>
> typedef map<int, double> valmap;
> typedef map<int *, double *> pmap;
>
> valmap m2(1,2,3);
> pmap m3(1,2,3);
> map<int, double, intComp> valmap3;
>
> valmap m;
> pmap p;
>
> for (int i = 0; i < NVALS; i++)
> {
> m.insert(make_pair(values[i], pow(values[i], .5)));
> valmap3.insert(0);
> }
>
> valmap::iterator it = 100;
> valmap::const_iterator cit = 100;
>
> m.insert(1,2);
> m.insert(make_pair(36, 10.57)); // fine, more convenient
> m.insert(m.end(), make_pair(40, 2.29)); // also fine
> return 0;
>
> }
>
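The quoted message mentions that the usual fix for "regexp too big" on older perls is breaking up the expression. Below is a minimal sketch of that general idea, not a patch to gSTLFilt.pl (whose patterns are built from nested sub-patterns and would need more care). The two rules are hypothetical illustrations. Several small substitutions are applied in sequence instead of one huge alternation, and the patterns are plain strings because qr// (5.005) and the warnings pragma (5.6) postdate Perl 5.004. The sketch has not been tried on 5.004 itself.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical simplification rules (pattern, replacement). These are
# illustrations only, NOT the rules gSTLFilt actually uses. Patterns
# are plain strings so the script avoids qr//, which 5.004 lacks.
my @rules = (
    [ '\bstd::',                   ''  ],   # drop the std:: qualifier
    [ ',\s*allocator<[^<>]*>\s*>', '>' ],   # drop a default allocator argument
);

# Apply each small substitution in turn instead of compiling one
# giant alternation that could exceed the regex size limit.
sub simplify {
    my ($line) = @_;
    foreach my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $line =~ s/$pattern/$replacement/g;
    }
    return $line;
}

print simplify("std::vector<int, std::allocator<int> >"), "\n";   # prints vector<int>
```

Applying rules one after another is not always equivalent to a single alternation, because an earlier substitution can change what a later pattern sees. So the order of the rules matters.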
------------------------------
Date: Tue, 04 Feb 2003 23:50:27 +0000
From: Asfand yar Qazi <f_ker@yahoo.co.uk_NO_SPAM>
Subject: Re: How do you: 'make install PREFIX=somepath', and then platform-independantly find somepath?
Message-Id: <b1pjj6$t9v$1@newsg2.svr.pol.co.uk>
>
> You're building a module, the OP was building perl with PREFIX=$MODPATH.
>
I'm not building Perl. I'm embedding the 'libperl.a' in an external
program, and need to give it a different path at build time (according
to the user's taste.) 'libperl.a' is built independently of my program
by the user him/her/(it?)self.
>
> The situation is different when installing a module. Apparently, some
> modules don't play entirely by the rules (or part of the install process
> doesn't) when it comes to installing in a private directory through
> PREFIX=$MODPATH. It *should* suffice to add $MODPATH to @INC, but
> sometimes it doesn't. If I read correctly the remarks in perlmodinstall
> (starting with "Also note that..."), additionally specifying
> "$MODPATH/lib/site_perl" should cover all cases, though no explanation
> is given when to use which.
>
> [...]
>
> Anno
I think I've arrived at the solution:
use Config;
my $MODPATH = "...";
$_ = $Config{sitelib_stem};
s!(.+)(site_perl)!$MODPATH/$2!;   # $2, not \2, in the replacement part
print;
(or something)
------------------------------
Date: Tue, 04 Feb 2003 23:53:13 +0000
From: Asfand yar Qazi <f_ker@yahoo.co.uk_NO_SPAM>
Subject: Re: How do you: 'make install PREFIX=somepath', and then platform-independantly find somepath?
Message-Id: <b1pjob$eg1$1@news7.svr.pol.co.uk>
>
> You're building a module, the OP was building perl with PREFIX=$MODPATH.
>
I'm not building Perl. I'm embedding the 'libperl.a' in an external
program, and need to give it a different path at build time (according
to the user's taste.) 'libperl.a' is built independently of my program
by the user him/her/(it?)self.
>
> That's a somewhat different situation. If you build perl with a PREFIX
> perl will find modules that are installed by this perl in the standard
> way (without PREFIX). This is the OP's situation as I understand it.
>
> The situation is different when installing a module. Apparently, some
> modules don't play entirely by the rules (or part of the install process
> doesn't) when it comes to installing in a private directory through
> PREFIX=$MODPATH. It *should* suffice to add $MODPATH to @INC, but
> sometimes it doesn't. If I read correctly the remarks in perlmodinstall
> (starting with "Also note that..."), additionally specifying
> "$MODPATH/lib/site_perl" should cover all cases, though no explanation
> is given when to use which.
>
> [...]
>
> Anno
I think I've arrived at the solution:
use Config;
my $MODPATH = "...";
$_ = $Config{sitelib_stem};
s!(.+)(/site_perl)!$MODPATH$2!;   # $2, not \2, in the replacement part
print;
(or something)
p.s. Sorry if I have posted something similar to this message several
times... I cancelled a few messages and I hope they stayed cancelled!
------------------------------
Date: Wed, 05 Feb 2003 07:15:32 +0000
From: Asfand yar Qazi <f_ker@yahoo.co.uk_NO_SPAM>
Subject: Re: How do you: 'make install PREFIX=somepath', and then platform-independantly find somepath?
Message-Id: <b1qdlm$mkc$1@news8.svr.pol.co.uk>
I think I know what to do now. After MUCH MUCH thought I can say that
the following program, when executed, outputs a path to append to the
end of PREFIX. This gives a path that, when used with 'perl -I', allows
perl to find modules in PREFIX, regardless of where/why/when/how perl
was installed.
#!/usr/local/bin/perl -w
use strict;
use Config;
my $stem   = $Config{sitelib_stem};
my $prefix = $Config{prefix};
# \Q...\E makes the prefix match literally; print only on a real match
print $1 if $stem =~ m!^\Q$prefix\E(.+)!;
__END__
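The same prefix-stripping step can be wrapped in a small function that is easier to test. This is a sketch, not part of any module: the name prefix_relative is hypothetical, and it assumes both values come from %Config as in the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical helper: return the part of $path that follows $prefix,
# or undef if either value is missing or $path does not start with
# $prefix. \Q...\E makes regex metacharacters in the prefix (such as
# "." or "+") match literally.
sub prefix_relative {
    my ($path, $prefix) = @_;
    return undef unless defined $path && defined $prefix;
    return $path =~ m!^\Q$prefix\E(.*)\z!s ? $1 : undef;
}

my $suffix = prefix_relative($Config{sitelib_stem}, $Config{prefix});
print defined $suffix ? "$suffix\n" : "sitelib_stem is not under prefix\n";
```

With a private install root in $MODPATH (as in Anno's remarks), the result would then be used as "$MODPATH$suffix", for example with perl -I or use lib.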
------------------------------
Date: Wed, 05 Feb 2003 00:05:51 GMT
From: Joseph Norris <jozefn@bolt.sonic.net>
Subject: question about perl and and active directory find - users in groups
Message-Id: <Pine.LNX.4.40.0302041603290.1247-100000@bolt.sonic.net>
This is what I have so far that gives me the names of the groups:
use Win32;
use Win32::OLE qw(in);   # "in" is only exported on request

sub ADS_check_name {
    my ($user_name) = @_;
    my %Config;
    $Config{path} = "WinNT://" . Win32::DomainName();
    if ( my $AD = GetADSIObject( \%Config ) ) {
        my $Schema = Win32::OLE->GetObject( $AD->{Schema} );
        foreach my $Object ( in $AD ) {
            if ( $Object->{'Class'} eq 'Group' ) {
                print "$Object->{Name}\n";
            }
        }
    }
}

sub GetADSIObject {
    my ($Config) = @_;
    return Win32::OLE->GetObject( $Config->{path} );
}
I just don't know where to go from here in windoze land. I have not
found a good way to find out just what keys are in $Object ref and
I don't know how to find out which users belong to a group that the
print statement spews out.
Any help would be most appreciated.
#Joseph Norris (Perl - what else is there?/Linux/CGI/Mysql)
print @c=map chr $_+100,(6,17,15,16,-68,-3,10,11,16,4,1,14,-68,12,1,14,8,
-68,4,-3,-1,7,1,14,-68,-26,11,15,1,12,4,-68,-22,11,14,14,5,15,-90);
------------------------------
Date: Wed, 05 Feb 2003 01:17:52 -0500
From: istink <istink@real.bad.com>
Subject: select case
Message-Id: <3E40AC90.596D478C@real.bad.com>
I can't seem to find a select case in perl.
how could this be?
is it called something else?
------------------------------
Date: 5 Feb 2003 06:49:25 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@post.rwth-aachen.de>
Subject: Re: select case
Message-Id: <b1qc5l$a8i$1@nets3.rz.RWTH-Aachen.DE>
Also sprach istink:
> I can't seem to find a select case in perl.
> how could this be?
> is it called something else?
No, it doesn't exist in the Perl core. See 'perldoc -q switch', which
shows how to achieve a similar thing in Perl. You might be lucky enough to
get away with a dispatch table (as shown at the bottom of this FAQ),
which is the most elegant and efficient solution to a lot of
switch/case-ish problems.
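A dispatch table maps each "case" value to a code reference and looks the handler up in a hash. Below is a minimal sketch with hypothetical command names. The fallback sub plays the role of a default/"case else" branch.

```perl
use strict;
use warnings;

# Hypothetical commands: each key is a "case", each value a code ref.
my %dispatch = (
    start => sub { return "starting $_[0]" },
    stop  => sub { return "stopping $_[0]" },
);

sub run_command {
    my ($command, $arg) = @_;
    # Look up the case; fall back to a default handler if it is missing.
    my $handler = exists $dispatch{$command}
        ? $dispatch{$command}
        : sub { return "unknown command: $command" };
    return $handler->($arg);
}

print run_command('start', 'server'), "\n";   # prints "starting server"
```

Adding a case means adding one hash entry, and the lookup cost does not grow with the number of cases the way a chain of if/elsif tests does.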
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Wed, 5 Feb 2003 02:09:49 -0500
From: "Smiley" <smiley@uvgotemail.com>
Subject: Some confusion
Message-Id: <v41ebas2ods290@corp.supernews.com>
Can anybody tell me what the standard wages are for web programmers doing
contract work in Perl, PHP, ColdFusion, and the like? I've heard from some
sources that these programmers are a dime a dozen and are often found
working for minimum wage, yet I've been led to believe that programmers in
general make $30,000 - $40,000/yr on the lower end.
What's the truth here?
------------------------------
Date: Wed, 5 Feb 2003 07:39:53 +0000 (UTC)
From: "Fnaffle" <fnaffle@visation.com>
Subject: Re: Some confusion
Message-Id: <b1qf49$sth$1@venus.btinternet.com>
Check out the job sites (jobserve.com etc) for an accurate picture of what
the current market is. Contract rates for CF in the UK have dropped for yer
average guy from £45/50ph to £20/30ph over the past coupla years, although
it is quite possible to get higher rates if you have the experience and
ability and are able to take on complete project work. As regards PHP this
is a good tool in your toolbelt but as far as I can see the market out there
is very cheap.
No idea about the U.S. tho...
"Smiley" <smiley@uvgotemail.com> wrote in message
news:v41ebas2ods290@corp.supernews.com...
> Can anybody tell me what's standard wages for web programmers doing contract
> work in Perl, PHP, ColdFusion, and the like make? I've heard from some
> sources that these programmers are a dime a dozen and are often found
> working for minimum wage - yet I've been led to believe that programmers in
> general make $30,000 - $40,000/yr on the lower end.
>
> What's the truth here?
------------------------------
Date: Tue, 04 Feb 2003 23:06:38 GMT
From: Martien Verbruggen <mgjv@tradingpost.com.au>
Subject: Re: sort, my concoction of
Message-Id: <slrnb40hrt.se5.mgjv@verbruggen.comdyn.com.au>
On 4 Feb 2003 07:27:16 GMT,
Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote:
> Also sprach istink:
>> I want it to strip all non alpha/num
>> and sort it.
>> $aa=~ s/\W+//g;
>> $bb=~ s/\W+//g;
>> $aa cmp $bb;
> -1- @temp = map { $_->[0] }
> -2- sort { $a->[1] cmp $b->[1] }
> -3- map { my $last = (split /\|/)[-1];
> -4- $last =~ tr/a-z//cd;
> -5- [ $_, $last ] } @db;
Note that tr/a-z//cd is very different from s/\W+//g and also does not
match what the OP asked for in words (tr/a-zA-Z0-9//cd would come
closer). In general, using character ranges like that is unportable,
and should be avoided. A-Z or a-z never portably translates into all
(upper/lowercase) letters, even though 0-9 always translates to all
digits.
Using things like [:upper:], [:lower:], [:alpha:], [:alnum:] etc in a
regular expression character class is much more reliable. tr///
shouldn't be used for this sort of thing, where you're looking to
operate on character ranges that include all letters. I think the
OP's s///; was correct, except that they left the underscore in as
well. Maybe:
$last =~ s/[^[:alnum:]]+//g;
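One way to see the differences between the three forms is to run them on a single hypothetical string that contains an underscore, a hyphen, a digit and a space:

```perl
use strict;
use warnings;

my $s = 'Foo_Bar-9 baz';    # hypothetical sample key

(my $tr    = $s) =~ tr/a-z//cd;           # keep only a-z
(my $nonw  = $s) =~ s/\W+//g;             # keep \w, which includes "_"
(my $alnum = $s) =~ s/[^[:alnum:]]+//g;   # keep letters and digits only

print "$tr\n$nonw\n$alnum\n";
# ooarbaz
# Foo_Bar9baz
# FooBar9baz
```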
I agree with the recommendation of using a Schwartzian transform (ST),
or something equivalent.
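For reference, here is a sketch of the ST with the [:alnum:] key suggested above. The pipe-delimited records are hypothetical stand-ins for the OP's data, with the key in the last field.

```perl
use strict;
use warnings;

# Hypothetical records; the sort key is the last "|" field.
my @db = ('3|b-c', '1|A_a', '2|ab');

my @sorted =
    map  { $_->[0] }                       # 3. recover the original record
    sort { $a->[1] cmp $b->[1] }           # 2. compare the cached keys
    map  { my $key = (split /\|/)[-1];     # 1. compute each key only once
           $key =~ s/[^[:alnum:]]+//g;
           [ $_, $key ] }
    @db;

print join("\n", @sorted), "\n";   # 1|A_a, 2|ab, 3|b-c (one per line)
```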
You can also do it with a Guttman-Rosler transform (GRT):
@db =
map { (split /\0/)[1] }
sort
map { my $l = (split /\|/)[-1];
$l =~ s/[^[:alnum:]]+//g;
join "\00", $l, $_
} @db;
This has the advantage (the general GRT advantage over ST) that it
doesn't call an external compare subroutine for each comparison, and
in that comparison it doesn't have to dereference things.
I've used the null character as a join delimiter, because it is
unlikely to show up in the text, and it sorts before the lowest
character you're interested in. Because the key Perl is comparing now
is not entirely limited to the real key (last column), elements that
have an identical real key will still be sorted in a determinate order
based on the rest of the line. The ST will, for identical real keys,
output the records in the order they originally appeared in (as long as
perl's sort is stable, which is only guaranteed with "use sort 'stable'"
on perl 5.8 and later), while this GRT will output them based on the
(alphabetical) order of the numerical first column. Whether or not that is important is up to you
to decide. If it is important, it can be dealt with (by building a
better primary key, which maybe includes a sequence number, using pack
or sprintf), but the effort and loss of code readability are probably
a price too high to pay. Unless, of course, you have really, really
large arrays. But then you should probably not be coding in perl in
the first place :)
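Here is a sketch of the "better primary key" mentioned above. It uses sprintf to put a zero-padded sequence number between the key and the record, so records with equal keys keep their input order. The records are hypothetical. The limit of 3 on split keeps any NULs inside the record intact, and %08d assumes fewer than 100,000,000 records.

```perl
use strict;
use warnings;

# Hypothetical records; "2|x-y" and "1|xy" have the same key "xy".
my @db = ('2|x-y', '1|xy', '3|ab');

my $seq = 0;
my @sorted =
    map  { (split /\0/, $_, 3)[2] }              # 3. drop key and sequence number
    sort                                         # 2. plain string sort, no callback
    map  { my $key = (split /\|/)[-1];           # 1. build "key\0seq\0record"
           $key =~ s/[^[:alnum:]]+//g;
           join "\0", $key, sprintf('%08d', $seq++), $_ }
    @db;

print join("\n", @sorted), "\n";   # 3|ab, 2|x-y, 1|xy (input order kept for "xy")
```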
Martien
PS. A non-sophisticated benchmark shows, for the posted data, an
increase in speed of between 15 and 35% for the GRT over the ST,
depending a bit on the length of the array to be sorted. Different
data might get worse or better results, but I'm pretty certain GRT
will always be faster than ST here.
--
|
Martien Verbruggen | We are born naked, wet and hungry. Then
Trading Post Australia | things get worse.
|
------------------------------
Date: Tue, 04 Feb 2003 19:02:55 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: sort, my concoction of
Message-Id: <3E4054AF.D10B126C@earthlink.net>
Martien Verbruggen wrote:
[snip]
> @db =
> map { (split /\0/)[1] }
> sort
> map { my $l = (split /\|/)[-1];
> $l =~ s/[^[:alnum:]]+//g;
> join "\00", $l, $_
> } @db;
A few comments... First, although the OP's data doesn't contain any
embedded nuls, it could -- your split would discard all data after the
first nul. Thus, if you want to use split, it ought to be:
(split /\0/, $_, 2)[1]
But substr and index is usually faster.
Similarly when finding the last record, substr and rindex should be
faster than split.
Third, having a BLOCK, instead of an EXPR, for map, is slower. So, try
the following:
@db =
map substr($_, index($_, "\0")+1),
sort
map join("\0", join("", substr($_,rindex($_,"|")+1) =~ /\w+/g), $_),
@db;
[untested]
(The reason for doing join("", ... =~ /\w+/g), instead of s/\W+//g, is
that it's easier to do a single expression, and avoids the need for a
BLOCK. I've no idea whether it's faster or slower than s/\W+//g)
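The [untested] pipeline above can be checked on a few hypothetical records. Note that /\w+/g keeps underscores, which matches the OP's s/\W+//g rather than the [:alnum:] variant:

```perl
use strict;
use warnings;

my @db = ('1|b-c', '2|A_a', '3|ab');   # hypothetical records

# Key = \w characters of the last "|" field; "\0" separates key from record.
my @sorted =
    map substr($_, index($_, "\0") + 1),
    sort
    map join("\0", join("", substr($_, rindex($_, "|") + 1) =~ /\w+/g), $_),
    @db;

print join("\n", @sorted), "\n";   # 2|A_a, 3|ab, 1|b-c (one per line)
```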
------------------------------
Date: Wed, 05 Feb 2003 00:54:02 GMT
From: Martien Verbruggen <mgjv@tradingpost.com.au>
Subject: Re: sort, my concoction of
Message-Id: <slrnb40o59.se5.mgjv@verbruggen.comdyn.com.au>
On Tue, 04 Feb 2003 19:02:55 -0500,
Benjamin Goldberg <goldbb2@earthlink.net> wrote:
> Martien Verbruggen wrote:
> [snip]
>> @db =
>> map { (split /\0/)[1] }
>> sort
>> map { my $l = (split /\|/)[-1];
>> $l =~ s/[^[:alnum:]]+//g;
>> join "\00", $l, $_
>> } @db;
>
> A few comments... First, although the OP's data doesn't contain any
> embedded nuls, it could -- your split would discard all data after the
> first nul.
Which is why I said:
I've used the null character as a join delimiter, because it is
unlikely to show up in the text, and it sorts before the lowest
character you're interested in.
> Thus, if you want to use split, it ought to be:
> (split /\0/, $_, 2)[1]
But this is a better solution.
> But substr and index is usually faster.
>
> Similarly when finding the last record, substr and rindex should be
> faster than split.
Yep, you are absolutely right. But it tends to make the code harder to
read. :)
> Third, having a BLOCK, instead of an EXPR, for map, is slower. So, try
> the follwoing:
>
> @db =
> map substr($_, index($_, "\0")+1),
> sort
> map join("\0", join("", substr($_,rindex($_,"|")+1) =~ /\w+/g), $_),
> @db;
Hmmm.. That certainly is a lot harder to read than the join/split :) I
guess a bit of comment in the actual code would be prudent for this.
Benchmarking your changes with the same benchmark I used before gives
something like this,
ST = Schwartzian Transform
GRT = Guttman-Rosler Transform (split)
MGRT = Guttman-Rosler Transform (index)
ARRAY LENGTH: 27
Rate ST GRT MGRT
ST 2473/s -- -14% -44%
GRT 2869/s 16% -- -35%
MGRT 4392/s 78% 53% --
ARRAY LENGTH: 2700
Rate ST GRT MGRT
ST 18.4/s -- -25% -43%
GRT 24.4/s 33% -- -25%
MGRT 32.5/s 77% 33% --
> [untested]
I didn't test for correctness either, just assumed it was :)
> (The reason for doing join("", ... =~ /\w+/g), instead of s/\W+//g, is
> that it's easier to do a single expression, and avoids the need for a
> BLOCK. I've no idea whether it's faster or slower than s/\W+//g)
I doubt it would make much of a difference.
Uri, care to come up with a pack/unpack solution that's even faster?
:)
Martien
------------------------------
Date: Wed, 05 Feb 2003 01:02:51 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: sort, my concoction of
Message-Id: <x7d6m7pmit.fsf@mail.sysarch.com>
>>>>> "MV" == Martien Verbruggen <mgjv@tradingpost.com.au> writes:
MV> Uri, care to come up with a pack/unpack solution that's even faster?
MV> :)
i leave that as an exercise to the reader. :)
my co-author on that used to be here but he retired. he was the
specialist using N format of pack for integer sorting. the paper covers
all the details and it is very easy. also he used substr/index for speed
as well. the big win is losing the sort callback into perl. the gains in
using the pre/postprocessing tricks are good in the real world even if
they are not counted in the O() world.
http://sysarch.com/perl/sort_paper.html
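Uri mentions the N format of pack for integer sorting. Below is a minimal sketch of that idea. It assumes hypothetical records whose first field is a non-negative integer below 2**32. pack('N', ...) produces a fixed-width, big-endian 4-byte key, so a plain string sort orders the numbers correctly, and substr($_, 4) strips the key afterwards.

```perl
use strict;
use warnings;

# Hypothetical records with an integer key in the first field.
my @records = ('10|ten', '9|nine', '100|hundred');

my @sorted =
    map  substr($_, 4),                        # 3. strip the 4-byte key
    sort                                       # 2. plain string sort
    map  pack('N', (split /\|/)[0]) . $_,      # 1. prepend a 32-bit big-endian key
    @records;

print join("\n", @sorted), "\n";   # 9|nine, 10|ten, 100|hundred (one per line)
```

A plain string sort on the raw records would instead give 10, 100, 9. Negative or fractional keys would need a different encoding.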
uri
------------------------------
Date: Tue, 04 Feb 2003 20:19:27 -0500
From: Benjamin Goldberg <goldbb2@earthlink.net>
Subject: Re: sort, my concoction of
Message-Id: <3E40669F.DCA0D94@earthlink.net>
Martien Verbruggen wrote:
> Benjamin Goldberg wrote:
[snip]
> Benchmarking your changes with the same benchmark I used before gives
> something like this,
>
> ST = Schwartzian Transform
> GRT = Guttman-Rosler Transform (split)
> MGRT = Guttman-Rosler Transform (index)
>
> ARRAY LENGTH: 27
>
> Rate ST GRT MGRT
> ST 2473/s -- -14% -44%
> GRT 2869/s 16% -- -35%
> MGRT 4392/s 78% 53% --
[snip]
> Uri, care to come up with a pack/unpack solution that's even faster?
> :)
The big advantage of pack/unpack is that one (often) can create some
type of prefix-free format which still sorts properly. But this really
only works well if [part of] the data consists of numbers.
However, the key being sorted on is a string, whose only special
property is that it contains only \w characters. For the packed string
to be sortable, this data *must* be at the front. And for it to be
unpackable, we need some way of separating the main string from the
sorting data.
The only way I can think of for satisfying these criteria is to separate
the collating data from the main data with some character, such as "\0".
I suppose that one could use pack("Z*a*", ...) instead of
join("\0",...), but benchmarking shows join to be faster.
Also, unpack("x[Z*]a*", $_) is only supported on perl 5.8, not on
earlier perls. (And I've no idea if it is faster, since I don't happen
to have perl5.8 installed, and only know of this feature from being on
the perl5porters mailing list)
------------------------------
Date: 5 Feb 2003 06:44:51 GMT
From: "Tassilo v. Parseval" <tassilo.parseval@post.rwth-aachen.de>
Subject: Re: sort, my concoction of
Message-Id: <b1qbt3$a52$1@nets3.rz.RWTH-Aachen.DE>
Also sprach Martien Verbruggen:
> On 4 Feb 2003 07:27:16 GMT,
> Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote:
>> Thus spoke istink:
>
>>> I want it to strip all non alpha/num
>>> and sort it.
>
>
>>> $aa=~ s/\W+//g;
>>> $bb=~ s/\W+//g;
>>> $aa cmp $bb;
>
>
>> @temp = map { $_->[0] }
>>         sort { $a->[1] cmp $b->[1] }
>>         map { my $last = (split /\|/)[-1];
>>               $last =~ tr/a-z//cd;
>>               [ $_, $last ] } @db;
>
> Note that tr/a-z//cd is very different from s/\W+//g and also does not
> match what the OP asked for in words (tr/a-zA-Z0-9//cd would come
> closer). In general, using character ranges like that is unportable,
> and should be avoided. A-Z or a-z never portably translates into all
> (upper/lowercase) letters, even though 0-9 always translates to all
> digits.
Yes, you are right. tr/// isn't locale-aware either, whereas proper
character classes in s/// are (with the appropriate pragma), which could
matter to the OP. I think I was too focused on the OP's sample data.
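To make the difference concrete, a small sketch running the three deletions on one hypothetical sample string. It runs without `use locale`, so \W here simply means anything other than letters, digits and '_' in this ASCII input.

```perl
use strict;
use warnings;

my $s = "Hello, World_42!";    # hypothetical sample input

(my $via_s   = $s) =~ s/\W+//g;          # keeps letters, digits AND '_'
(my $via_tr  = $s) =~ tr/a-z//cd;        # keeps only lowercase a..z
(my $via_tr2 = $s) =~ tr/a-zA-Z0-9//cd;  # closer, but drops '_' unlike \W

print "$via_s\n$via_tr\n$via_tr2\n";
```

The three results differ: "HelloWorld_42", "elloorld" and "HelloWorld42".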
[ snipped excellent stuff about Guttman-Rosler Transform to which I have
nothing to add ]
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: 4 Feb 2003 15:54:31 -0800
From: paulhefford@hotmail.com (Paul)
Subject: Re: Until loop not working with multiple conditions
Message-Id: <40b39fd3.0302041554.9808d90@posting.google.com>
Thank you everyone for your responses.
I am new to perl and I thought, after looking at a few references, that
"==" and "eq" would be interchangeable. Obviously not!
------------------------------
Date: Wed, 05 Feb 2003 00:54:52 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: Until loop not working with multiple conditions
Message-Id: <x7n0lbpmw3.fsf@mail.sysarch.com>
>>>>> "P" == Paul <paulhefford@hotmail.com> writes:
P> I am new to perl and I thought after looking at few references that
P> "==" and "eq" would be interchangeable. Obviously not!
a good piece of advice for you in learning perl is never assume
stuff. always look things up in the docs before you use new operators or
functions. there are many little nooks and crannies of perl and the best
way to explore them is to read the docs as needed and follow the cross
references as they interest you. you will learn as you go and find out
how to look things up and also why the docs are the best perl reference.
the perlop document would have easily shown you that the symbolic and
named comparison operators work on numeric and string values
respectively. perl is very good about providing the right tool for the
job as opposed to one tool (say ==) and forcing it to be used in
contrived ways.
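A small illustration of the perlop point: == compares its operands as numbers, eq compares them as strings. The values are hypothetical and not taken from Paul's loop.

```perl
use strict;
use warnings;

# "10" and "10.0" are the same number but different strings.
my ($x, $y) = ("10", "10.0");
my $num_equal = ($x == $y) ? 1 : 0;   # numeric: 10 vs 10        -> 1
my $str_equal = ($x eq $y) ? 1 : 0;   # string: "10" vs "10.0"   -> 0

# Non-numeric strings both convert to 0 under ==, so they compare "equal".
# Normally 'use warnings' reports that the arguments aren't numeric;
# that warning is silenced here only to show the result.
my ($p, $q) = ("abc", "xyz");
my $words_num_equal = do { no warnings 'numeric'; ($p == $q) ? 1 : 0 };  # -> 1
my $words_str_equal = ($p eq $q) ? 1 : 0;                                  # -> 0

print "10 == 10.0: $num_equal, 10 eq 10.0: $str_equal\n";
print "abc == xyz: $words_num_equal, abc eq xyz: $words_str_equal\n";
```

The second pair is the dangerous one in a loop condition: == silently treats any two non-numeric strings as equal.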
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
Damian Conway Perl Classes - January 2003 -- http://www.stemsystems.com/class
------------------------------
Date: Wed, 05 Feb 2003 08:57:06 +0100
From: Josef Möllers <josef.moellers@fujitsu-siemens.com>
Subject: Re: Until loop not working with multiple conditions
Message-Id: <3E40C3D2.DA61167A@fujitsu-siemens.com>
Uri Guttman wrote:
>
> >>>>> "P" == Paul <paulhefford@hotmail.com> writes:
>
> P> I am new to perl and I thought after looking at few references that
> P> "==" and "eq" would be interchangeable. Obviously not!
>
> a good piece of advice for you in learning perl is never assume
> stuff.
s/perl //
"To make an assumption means to fool yourself." (Dunno who said that)
--
Josef Möllers (Penguin keeper at FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
The Perl-Users Digest is a retransmission of the USENET newsgroup
comp.lang.perl.misc. For subscription or unsubscription requests, send
the single line:
subscribe perl-users
or:
unsubscribe perl-users
to almanac@ruby.oce.orst.edu.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
To request back copies (available for a week or so), send your request
to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
where x is the volume number and y is the issue number.
For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 4513
***************************************