Perl-Users Digest, Issue: 6909 Volume: 10
daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 25 14:06:01 2004
Date: Wed, 25 Aug 2004 11:05:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)
Perl-Users Digest Wed, 25 Aug 2004 Volume: 10 Number: 6909
Today's topics:
Re: "Perlish Patterns" by Phil Crow <troc@pobox.com>
Re: -s vs du - different results <uri@stemsystems.com>
Re: -s vs du - different results <zebee@zip.com.au>
Re: -s vs du - different results <uri@stemsystems.com>
Re: -s vs du - different results <Paul.Gaborit@invalid.invalid>
Re: 5.8.5 make test fails on lib/ExtUtils/Constant with <clint@0lsen.net>
Re: 5.8.5 make test fails on lib/ExtUtils/Constant with <clint@0lsen.net>
Re: converting a directory path into a hash <tassilo.von.parseval@rwth-aachen.de>
creating anonymous subroutines at runtime <cpyfykf02@sneakemail.com>
Re: creating anonymous subroutines at runtime <cpyfykf02@sneakemail.com>
Re: creating anonymous subroutines at runtime <nobull@mail.com>
Re: creating anonymous subroutines at runtime <mritty@gmail.com>
Re: Oracle DBI/DBD and bind vars - so slooooowwwww <Juha.Laiho@iki.fi>
Re: Performance Improvement of complex data structure ( <sgilpin@gmail.com>
Re: performance surprise -- why? <haltingNOSPAM@comcast.net>
Re: Perl and DOS I/O <richard@zync.co.uk>
Re: recursive functions (David Combs)
Re: Simulating the open() command. (Chris Heller)
Re: Slide show: this should be fairly straightforward - <segraves_f13@mindspring.com>
Re: start some actions with Perl without Cron? <makbo@pacbell.net>
Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)
----------------------------------------------------------------------
Date: Wed, 25 Aug 2004 16:28:50 GMT
From: Rocco Caputo <troc@pobox.com>
Subject: Re: "Perlish Patterns" by Phil Crow
Message-Id: <slrncipfim.hdf.troc@eyrie.homenet>
On Tue, 24 Aug 2004 08:22:02 +0200, Peter Michael wrote:
> Hi,
>
> I ordered "Perlish Patterns" (Apress) by Phil Crow in advance via
> amazon. Now they told me that the book is "no longer available".
> Does anybody know what happened to it?
Maybe it sold out. :)
Have a look at http://perldesignpatterns.com/ while you're waiting.
--
Rocco Caputo - http://poe.perl.org/
------------------------------
Date: Wed, 25 Aug 2004 14:13:04 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: -s vs du - different results
Message-Id: <x7pt5f8fxy.fsf@mail.sysarch.com>
>>>>> "SH" == Sam Holden <sholden@flexal.cs.usyd.edu.au> writes:
SH> On Wed, 25 Aug 2004 05:51:09 GMT, Uri Guttman <uri@stemsystems.com> wrote:
SH> [snip du and perl's '-s' giving different results]
>> ls -l pad_bench.pl
>> -rw-r--r-- 1 uri staff 523 May 3 03:25 pad_bench.pl
>> perl -le 'print -s "pad_bench.pl"'
>> 523
>> du -sb pad_bench.pl
>> 1024 pad_bench.pl
SH> My du doesn't seem to do that:
SH> ; ls -l resolver.pl
SH> -rw-r--r-- 1 sholden pgrad 436 Jul 18 15:58 resolver.pl
SH> ; perl -le 'print -s "resolver.pl"'
SH> 436
SH> ; du -sb resolver.pl
SH> 436 resolver.pl
SH> ;
SH> But I see that it's a version thing...
SH> ; du --version
SH> du (coreutils) 5.2.1
SH> ; ./du --version
SH> du (fileutils) 4.1
SH> ; du -bs resolver.pl
SH> 436 resolver.pl
SH> ; ./du -bs resolver.pl
SH> 4096 resolver.pl
interesting. i have du (GNU fileutils) 4.0 on my sparc/solaris.
SH> So the --apparent-size was added sometime between those two versions
SH> and -b changed to be:
SH> -b, --bytes equivalent to `--apparent-size --block-size=1'
bah!
i have always used du with block counts as i wanted 'disk usage'. i
never cared about byte usage. in fact i always use the -k option for du
since i want to know storage that way.
SH> The joys of incompatible unix tools - someone should write a portable
SH> scripting language to avoid these issues...
hmmmm.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Wed, 25 Aug 2004 14:36:59 GMT
From: Zebee Johnstone <zebee@zip.com.au>
Subject: Re: -s vs du - different results
Message-Id: <slrncip8lu.elm.zebee@zeus.zipworld.com.au>
In comp.lang.perl.misc on Wed, 25 Aug 2004 14:13:04 GMT
Uri Guttman <uri@stemsystems.com> wrote:
>
> i have always used du with block counts as i wanted 'disk usage'. i
> never cared about byte usage. in fact i always use the -k option for du
> since i want to know storage that way.
du (fileutils) 4.1
Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
Eggert.
Copyright (C) 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
[zebee@clone zebee]$ ls -l netgear.cfg
-rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg
[zebee@clone zebee]$ du -b netgear.cfg
32768 netgear.cfg
[zebee@clone zebee]$ du netgear.cfg
32 netgear.cfg
[zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
30452
is there a perl way to get block usage?
Zebee
------------------------------
Date: Wed, 25 Aug 2004 15:15:36 GMT
From: Uri Guttman <uri@stemsystems.com>
Subject: Re: -s vs du - different results
Message-Id: <x78yc38d1q.fsf@mail.sysarch.com>
>>>>> "ZJ" == Zebee Johnstone <zebee@zip.com.au> writes:
ZJ> du (fileutils) 4.1
ZJ> Written by Torbjorn Granlund, David MacKenzie, Larry McVoy, and Paul
ZJ> Eggert.
ZJ> Copyright (C) 2001 Free Software Foundation, Inc.
ZJ> This is free software; see the source for copying conditions. There is
ZJ> NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
ZJ> PURPOSE.
ZJ> [zebee@clone zebee]$ ls -l netgear.cfg
ZJ> -rw-r--r-- 1 zebee www 30452 Aug 28 2003 netgear.cfg
ZJ> [zebee@clone zebee]$ du -b netgear.cfg
ZJ> 32768 netgear.cfg
ZJ> [zebee@clone zebee]$ du netgear.cfg
ZJ> 32 netgear.cfg
ZJ> [zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
ZJ> 30452
ZJ> is there a perl way to get block usage?
see the other posts by sam. he is using a more recent du which makes -b
act more like -s. but that still won't handle gaps correctly. sam
recommends rounding up the -s to the next block size (or you could just
count blocks with a mod (%) operation on the block size). you really
need block counts IMO as that is what the cdrom will need. fractional
trailing blocks still take up whole blocks on most file systems (reiser
is one that doesn't do that).
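e.g. something like this quick sketch (untested; 2048-byte blocks as for an
iso9660 cdrom -- adjust the block size for your target filesystem, and the
file name is just the example from above):
    my $file    = 'pad_bench.pl';
    my $blksize = 2048;
    defined( my $bytes = -s $file ) or die "can't stat $file: $!";
    my $blocks = int( ($bytes + $blksize - 1) / $blksize );  # round partial blocks up
    printf "%s: %d bytes -> %d blocks (%d bytes on disc)\n",
           $file, $bytes, $blocks, $blocks * $blksize;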
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
------------------------------
Date: Wed, 25 Aug 2004 17:44:23 +0200
From: Paul Gaborit <Paul.Gaborit@invalid.invalid>
Subject: Re: -s vs du - different results
Message-Id: <r7u0urw7c8.fsf@michelange.enstimac.fr>
At Wed, 25 Aug 2004 14:36:59 GMT,
Zebee Johnstone <zebee@zip.com.au> wrote:
> [zebee@clone zebee]$ perl -le 'print -s "netgear.cfg"'
> 30452
>
> is there a perl way to get block usage?
Yes:
$ perl -le 'print +(stat "netgear.cfg")[12]'
But is there a perl way to get block size? ;-)
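(For what it's worth, perldoc -f stat lists field 11 as the preferred I/O
block size -- and field 12, used above, counts blocks in system-specific
units, often but not always 512 bytes each -- so on systems that report it,
something like this should do:)
$ perl -le 'print +(stat "netgear.cfg")[11]'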
--
Paul Gaborit - <http://www.enstimac.fr/~gaborit/>
Perl en français - <http://www.enstimac.fr/Perl/>
------------------------------
Date: Wed, 25 Aug 2004 16:40:15 GMT
From: Clint Olsen <clint@0lsen.net>
Subject: Re: 5.8.5 make test fails on lib/ExtUtils/Constant with icc
Message-Id: <slrncipg7f.1gei.clint@belle.0lsen.net>
On 2004-08-25, Tassilo v. Parseval <tassilo.von.parseval@rwth-aachen.de> wrote:
>
> $ cd t
> $ ./TEST base/*.t
>
> Not sure how much this is going to help you. The 27th test appears to be
> in base/lex.t so it's apparently something fishy with the Perl lexer.
No, the test that fails is lib/ExtUtils/Constant. The base tests all pass.
In fact, this is the only test that fails.
-Clint
------------------------------
Date: Wed, 25 Aug 2004 18:00:11 GMT
From: Clint Olsen <clint@0lsen.net>
Subject: Re: 5.8.5 make test fails on lib/ExtUtils/Constant with icc
Message-Id: <slrncipktb.1gei.clint@belle.0lsen.net>
On 2004-08-25, Clint Olsen <clint@0lsen.net> wrote:
>
> No, the test that fails is lib/ExtUtils/Constant. The base tests all
> pass. In fact, this is the only test that fails.
And here's the verbose output (thanks to your help):
ok 20
ok 21
ok 22
ok 23 - maketest
ok 24 - regen
ok 25 - regen worked
# make = 'make clean'
ok 26
# Extra file 'ExtTest.il'
not ok 27
FAILED at test 27
Failed 1 test script out of 1, 0.00% okay.
I have no idea what this means...
-Clint
------------------------------
Date: Wed, 25 Aug 2004 18:20:15 +0200
From: "Tassilo v. Parseval" <tassilo.von.parseval@rwth-aachen.de>
Subject: Re: converting a directory path into a hash
Message-Id: <2p3si4Fgb0cuU1@uni-berlin.de>
Also sprach Anno Siegel:
> Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote in comp.lang.perl.misc:
>> It's a short-cut. More explicitly:
>>
>> $ref->{ $head } = { };
>> path2hash($tail, $ref->{ $head });
>>
>> In Perl, assignments have a return value and that was what I was using
>> here.
>>
>> Other than that, it should be pretty straight-forward. Note that this is
>> a so called primitive recursion because each instantiation of the
>> function cuts off a piece of its argument ($head) and calls path2hash
>> with the thusly diminished argument ($tail).
>
> That means that recursion can be replaced with a jump to the
> subroutine. Untested:
>
> sub path2hash {
> my ($p, $ref) = @_;
> return if not $p;
> my ($head, $tail) = $p =~ m!/?([^/]+)(.*)!;
> @_ = ( $tail, $ref->{ $head} = {});
> goto &path2hash;
> }
>
> "goto &..." is (among other things) Perl's method to cut off tail
> recursion.
If it had been proper tail recursion, yes. Note that by doing it this
way, you lose the ability to return the populated hash-reference and
you have to do a call-by-reference (sort of). In context, this will
become:
sub path2hash {
my ($p) = @_;
return if not $p;
my ($head, $tail) = $p =~ m!/?([^/]+)(.*)!;
@_ = ($tail, $_[1]->{ $head } = {});
goto &path2hash;
}
path2hash("/home/user/mail", my $ref);
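Running that and dumping the result should show the nested structure, e.g.
(a quick check, not part of the original code):
use Data::Dumper;
print Dumper($ref);
# $VAR1 = { 'home' => { 'user' => { 'mail' => {} } } };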
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
------------------------------
Date: Wed, 25 Aug 2004 19:06:45 +0200
From: Arne Gödeke <cpyfykf02@sneakemail.com>
Subject: creating anonymous subroutines at runtime
Message-Id: <20040825190645.5733bc97.cpyfykf02@sneakemail.com>
hey everyone,
is there a possibility to create anonymous subroutines at runtime with code-refs inside?
I am aware of 'eval' but I cannot get it working like this (example to make clear what I want):
$callback = sub { "some code" }
$cb = eval "sub { &{$cb}(); }"; # This certainly fails.. since CODE(0xwhatever) is not a code-ref
$cb should then be passed somewhere else. Basically, I just want to create nested anonymous subroutines..
thanks in advance!
arne gödeke
------------------------------
Date: Wed, 25 Aug 2004 19:09:20 +0200
From: Arne Gödeke <cpyfykf02@sneakemail.com>
Subject: Re: creating anonymous subroutines at runtime
Message-Id: <20040825190920.7a19d34b.cpyfykf02@sneakemail.com>
On Wed, 25 Aug 2004 19:06:45 +0200
Arne Gödeke <cpyfykf02@sneakemail.com> wrote:
> hey everyone,
>
> is there a possibility to create anonymous subroutines at runtime with code-refs inside?
> I am aware of 'eval' but I cannot get it working like this (example to make clear what I want):
>
> $callback = sub { "some code" }
> $cb = eval "sub { &{$cb}(); }"; # This certainly fails.. since CODE(0xwhatever) is not a code-ref
>
> $cb should then be passed somewhere else. Basically, I just want to create nested anonymous subroutines..
>
> thanks in advance!
> arne gödeke
>
erm.. sorry, I made a mistake... that line should read:
$cb = eval "sub { &{$callback}(); }"; # This certainly fails.. since CODE(0xwhatever) is not a code-ref
------------------------------
Date: Wed, 25 Aug 2004 18:23:01 +0100
From: Brian McCauley <nobull@mail.com>
Subject: Re: creating anonymous subroutines at runtime
Message-Id: <cgihmd$dmk$1@sun3.bham.ac.uk>
Arne Gödeke wrote:
> On Wed, 25 Aug 2004 19:06:45 +0200
> Arne Gödeke <cpyfykf02@sneakemail.com> wrote:
>
>
>>hey everyone,
>>
>>is there a possibility to create anonymous subroutines at runtime with code-refs inside?
>>I am aware of 'eval' but I cannot get it working like this (example to make clear what I want):
>>
>>$callback = sub { "some code" }
>>$cb = eval "sub { &{$cb}(); }"; # This certainly fails.. since CODE(0xwhatever) is not a code-ref
>>
>>$cb should then be passed somewhere else. Basically, I just want to create nested anonymous subroutines..
>>
>>thanks in advance!
>>arne gödeke
>>
>
> erm.. sorry, I made a mistake... that line should read:
>
> $cb = eval "sub { &{$callback}(); }"; # This certainly fails.. since CODE(0xwhatever) is not a code-ref
Well the problem here is that $callback is being interpolated. Simply escape
the $ (or use some other quoting) to prevent this.
But I suspect there's more (or maybe less) to what you are trying to do
than you are describing.
Be warned - this sort of game very easily leads to memory leaks.
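For the record, the escaping mentioned above would look something like this
(an untested sketch):
my $callback = sub { print "some code\n" };
my $cb = eval "sub { &{\$callback}(); }";  # \$ so eval sees the name, not CODE(0x...)
$cb->();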
------------------------------
Date: Wed, 25 Aug 2004 17:29:33 GMT
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: creating anonymous subroutines at runtime
Message-Id: <104Xc.7808$rT1.3079@trndny02>
"Arne Gödeke" <cpyfykf02@sneakemail.com> wrote in message
news:20040825190920.7a19d34b.cpyfykf02@sneakemail.com...
On Wed, 25 Aug 2004 19:06:45 +0200 Arne Gödeke <cpyfykf02@sneakemail.com>
wrote:
>> hey everyone,
>>
>> is there a possibility to create anonymous subroutines at runtime with
>> code-refs inside?
>> I am aware of 'eval' but I cannot get it working like this (example to
>> make clear what I want):
>>
>> $callback = sub { "some code" }
>> $cb = eval "sub { &{$cb}(); }"; # This certainly fails.. since
>> CODE(0xwhatever) is not a code-ref
>>
>> $cb should then be passed somewhere else. Basically, I just want to
>> create nested anonymous subroutines..
>>
>> thanks in advance!
>> arne gödeke
>>
> erm.. sorry, I made a mistake... that line should read:
> $cb = eval "sub { &{$callback}(); }"; # This certainly fails.. since
> CODE(0xwhatever) is not a code-ref
I don't really understand why you would want to do this, but the following
worked for me when I modified your code:
$callback = sub {
print "Hello World\n";
};
$cb = eval 'sub { &{$callback}; }';
$cb->();
The main problem is that your double quotes in the eval were interpolating
$callback, whereas you wanted the eval to see the variable name. Changing to
single quotes eliminates this problem.
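(As an aside: if the goal is really just a sub that calls another code-ref,
a plain closure -- my $cb = sub { $callback->() }; -- gives the same result
with no string eval at all.)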
Paul Lalli
------------------------------
Date: Wed, 25 Aug 2004 15:57:03 GMT
From: Juha Laiho <Juha.Laiho@iki.fi>
Subject: Re: Oracle DBI/DBD and bind vars - so slooooowwwww
Message-Id: <cgicq1$pab$1@ichaos.ichaos-int>
"Lord0" <lawrence.tierney@bipsolutions.com> said:
>We're having a bit of a problem using bind vars with Oracle via
>DBI/DBD::Oracle. Basically if we use bind vars the following code/query
>takes about 35 seconds to return results; if we don't use bind vars, i.e. hard
>code the search parameters into the query, then the results are returned in
>about 10 seconds! Any ideas? We are using the following environment:
Btw, which Oracle version?
I think you might wish to take this question to an Oracle-related group.
There are cases where Oracle produces better (much better) execution
plans when the parameters are written literally into the query than when
bind parameters are used. This behaviour is also highly dependent on
the Oracle server version and configuration (init.ora parameters). Also,
whether or not the tables and indices related to the query have been
analyzed affects the query performance. I suggest generating an execution
plan at least for the case with hardcoded parameters (and preferably
also for the bind-parameter case -- I'm just not certain how one would
generate one).
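For the hardcoded case, something along these lines should work from DBI
(a sketch only -- the connect string and date literal are placeholders, and
DBMS_XPLAN.DISPLAY needs a reasonably recent Oracle plus a PLAN_TABLE):
use DBI;
my $dbh = DBI->connect('dbi:Oracle:yourdb', 'user', 'password',
                       { RaiseError => 1 });
$dbh->do(q{EXPLAIN PLAN FOR
           SELECT id FROM boss_contract_admin
           WHERE auto_enter_date >= TO_DATE('01/01/2004', 'DD/MM/YYYY')});
print "$_\n"
    for @{ $dbh->selectcol_arrayref(
        q{SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())}) };
$dbh->disconnect;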
As a final note, if the 'CONTAINS(complete_entry, ?) > 0' does what I think
(i.e. selects records where the 'complete_entry' field contains the desired
text), then be careful with this -- it'll always bypass any indexes there
might be for the 'complete_entry' field (because text indexes in Oracle
index from the start of the text, and if you match something from within the
text, the indexes just cannot be used).
Hmm.. now that I think of this; if you have a fixed set of words you'd like
to match (so, 'network' and some number of others), and you don't expect
many updates on this field, you might experiment with building
function-based indices on
CONTAINS(complete_entry, 'word')
for each word you'd like to match -- and then using the corresponding
fixed-parameter expressions in your query. The timestamps could still
use the bind parameter.
Leaving the actual SQL query for reference:
>$sql="SELECT id, title, to_char(auto_enter_date, 'DD/MM/YYYY') as
>auto_enter_date, tracker_ref, SUBSTR(description, 1, 255) AS description
>FROM boss_contract_admin WHERE (1=1) AND auto_enter_date >= TO_DATE(?,
>'DD/MM/YYYY') AND auto_enter_date <= TO_DATE(?, 'DD/MM/YYYY') AND
>CONTAINS(complete_entry, ?) > 0";
--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V
PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
------------------------------
Date: 25 Aug 2004 09:49:29 -0700
From: "Scott Gilpin" <sgilpin@gmail.com>
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cgifup$ooj@odah37.prod.google.com>
Anno Siegel wrote:
> Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
> > Hi everyone -
> >
> > I'm trying to improve the performance (runtime) of a program that
> > processes large files. The output of the processing is some fixed
> > number of matrices (that can vary between invocations of the program),
> > each of which has a different number of rows, and the same number of
> > columns. However, the number of rows and columns may not be known
> > until the last row of the original file is read. The original file
> > contains approximately 100 millon rows. Each individual matrix has
> > between 5 and 200 rows, and between 50 and 10000 columns. The data
> > structure I'm using is a hash of hashes of hashes that stores this
> > info. N is the total number of columns, M1 is the total number of
> > rows in matrix #1, M2 is the total number of rows in matrix 2, etc,
> > etc. The total number of matrices is between 3 and 15.
>
> [...]
>
> > Here is the code that I'm using to build up this data structure. I'm
> > running perl version 5.8.3 on solaris 8 (sparc processor). The system
> > is not memory bound or cpu bound - this program is really the only
> > thing that runs. There are several gigabytes of memory, and this
> > program doesn't grow bigger than around 100 MB. Right now the run time
> > for the following while loop with 100 million rows of data is about 6
> > hours. Any small improvements would be great.
>
> It shouldn't take that long, unless the data structure blows up way
> beyond 100 MB.
>
> > ## loop to process each row of the original data
> > while(<INDATA>)
> > {
> > chomp($_);
> >
> >
> > ## Each row is delimited with |
> > my @original_row = split(/\|/o,$_);
> >
> > ## The cell value and the column name are always in the same
> > position
> > my $cell_value = $original_row[24];
> > my $col_name = $original_row[1];
> >
> > ## Add this column name to the list of ones we've seen
> > $columns_seen{$col_name}=1;
>
> Where is this used?
>
> > ## For each matrix, loop through and increment the
> > row/column value
> > foreach my $matrix (@matrixList)
>
> Where is @matrixList set?
>
> > {
> >
> > ## positionHash tells the position of the value for
> > ## this matrix in the original data row
> > my $row_name = $original_row[$positionHash{$matrix}];
>
> Where is %positionHash set?
>
> > $matrix_values{$matrix}{$row_name}{$col_name} +=
> > $cell_value;
> > }
> >
> > } ## end while
>
> This code isn't runnable. How are we to improve code we can't run?
Thanks for your reply. I apologize for not being more clear in my
original post. I've included the entire code to produce the desired
output. The first 10 lines of the input file are:
928219|7|6|MI|2
928219|9|5|MO|1
928219|11|5|CA|41
928219|8|6|MA|1
928219|5|5|WY|3
701396|10|7|QC|8
701396|17|1|MI|1
928219|0|3|CA|2
701396|13|1|CA|2
928219|1|1|CA|2
The header is:
col_name|matrix1_rows|matrix2_rows|matrix3_rows|cell_values
The source code is:
#!/usr/local/bin/perl5.8.3
use strict;
## The list of matrices actually varies between invocations
## of the program - anywhere from 3 - 15
my @matrixList = ("matrix1", "matrix2", "matrix3");
## Position hash has the row positions of the values for each matrix
my %positionHash = (matrix1 => 1, matrix2 => 2, matrix3 => 3);
## Keep track of the columns we've seen so far
my %columns_seen = ();
## hash of hashes of hashes - matrix => rows => columns
my %matrix_values = ();
open (INDATA, "data.txt") ||
die "can't open data.txt";
while(<INDATA>) {
chomp($_);
## Each row is variable width, delimited with |
my @original_row = split(/\|/,$_);
## The cell value and the column name are always in the same
## position
my $cell_value = $original_row[4];
my $col_name = $original_row[0];
## Add this column name to the list of ones we've seen
$columns_seen{$col_name}=1;
## For each matrix, loop through and increment the
## row/column value
foreach my $matrix (@matrixList) {
my $row_name = $original_row[$positionHash{$matrix}];
$matrix_values{$matrix}{$row_name}{$col_name} += $cell_value;
}
} ## end while
## The following code runs very quickly compared to the
## while loop above (2 mins vs. 6 hrs)
## I'm only including it to produce the desired output
## Create a header row with column names that is the same
## for all matrices
my $header = "";
foreach my $col_name (sort keys %columns_seen) {
$header = $header . "," . "$col_name";
}
## Create a file for each separate matrix
foreach my $matrix (@matrixList) {
## Open output file
my $OUT_FILE = $matrix . ".csv";
open (OUTFILE, ">$OUT_FILE") || die "can't open $OUT_FILE";
## Now we create the first line of the file.
## Starting with the matrix name and a comma.
## Then printing out the column names.
my $firstline = $matrix . "$header";
print OUTFILE "$firstline\n";
## Loop for each row in the matrix
foreach my $row_name (keys(%{$matrix_values{$matrix} } )) {
my $line = $row_name;
## Loop for each column in the matrix
foreach my $col_name (sort keys %columns_seen) {
my $cell_value;
if ($matrix_values{$matrix}{$row_name}{$col_name}) {
$cell_value =
$matrix_values{$matrix}{$row_name}{$col_name};
} else {
$cell_value = ".";
}
$line = $line . ",$cell_value";
}
print OUTFILE "$line\n";
}
close OUTFILE;
}
>
> To make it runnable, I had to realize that %positionHash is nowhere
> set and come up with a plausible one. Same for @matrixList. I had
> to find that %columns_seen is nowhere used, and discard it. Then I
> had to generate a set of input data of for the program to run with.
> It would have been your job to do that, and you are far better equipped
> to do it.
>
> That said, I don't see room for fundamental improvement. Apparently
> each "cell value" contributes to all matrices in the same column,
> but in lines that are determined indirectly (through %positionHash).
>
> Your program does that without doing any excessive extra work. There
> may be re-arrangements of the data structure with corresponding code
> adaptations that make it marginally faster, but I wouldn't expect
> anything better than 10%.
>
> > I tried using DProf & dprofpp, but that didn't reveal anything
> > interesting.
>
> It can't. DProf works on subroutine basis, but your code doesn't
> use any subroutines.
>
> > I also tried setting the initial size of each hash using
> > 'keys', but this didn't show any improvement. I could only initialize
> > the hash of hashes - and not the third level of hashes (since I don't
> > know the values in the second hash until they are read in from the
> > file). I know that memory allocation in C is expensive, as is
> > re-hashing - I suspect that's what's taking up a lot of the time.
>
> One thing to observe is whether program speed deteriorates over
> time. Just print something to stderr every so-many records and
> watch the rhythm. If it gets slower with time, the problem is most
> likely memory related. If it doesn't, you're none the wiser.
The runtime scales linearly with the number of rows, so
I don't believe it to be a memory issue.
>
> Anno
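One re-arrangement of the kind Anno mentions would be to hoist the
per-matrix lookups out of the inner loop, so each row only pays for the
split plus the minimum of hash work. A sketch (untested on the real data,
and probably worth only a marginal gain):
#!/usr/local/bin/perl5.8.3
use strict;
my @matrixList   = ("matrix1", "matrix2", "matrix3");
my %positionHash = (matrix1 => 1, matrix2 => 2, matrix3 => 3);
my (%columns_seen, %matrix_values);
## precompute each matrix's row position and its top-level hashref
my @positions = map { $positionHash{$_} } @matrixList;
my @buckets   = map { $matrix_values{$_} ||= {} } @matrixList;
open (INDATA, "data.txt") || die "can't open data.txt";
while (<INDATA>) {
    chomp;
    my @row = split /\|/;
    my ($col_name, $cell_value) = @row[0, 4];
    $columns_seen{$col_name} = 1;
    for my $i (0 .. $#buckets) {
        $buckets[$i]{ $row[ $positions[$i] ] }{$col_name} += $cell_value;
    }
}
close INDATA;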
------------------------------
Date: Wed, 25 Aug 2004 17:47:02 GMT
From: Joe Davison <haltingNOSPAM@comcast.net>
Subject: Re: performance surprise -- why?
Message-Id: <m27jrnxg8d.fsf@Jupiter.local>
On 25 Aug 2004, Anno Siegel wrote:
> Joe Davison <haltingNOSPAM@comcast.net> wrote in comp.lang.perl.misc:
> > I'm searching the genome with a perl script I wrote and encountered
> > a surprise when I tried to improve the performance -- it got worse,
> > much worse, and I'm wondering if there's a better way to do my
> > second effort.
> >
> > Here's the basic problem:
> >
> > Given a short sequence, say AGTACT, and a chromosome, say
> > CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT...(30MB
> > string).
> >
> > I want to find all the places in the chromosome where the sequence
> > occurs.
> >
> > Method 1: $genome =~ s/($sequence)/\n$1/g; I then wrote the chopped
> > string to a file and counted the lengths of the lines using a simple
> > awk program (don't worry about why).
> >
> > That runs in about 4 seconds on my G3 iBook. But I figured I didn't
> > really need a second copy of a 30MB file that differed only in the
> > placement and number of newlines, so why not just save the positions
> > where it starts? I looked at somebody else's code and tried:
> >
> > Method 2: $_=$genome;
> > while( m/$sequence/g) { push @indices,pos();}
> > and then write out the indices.
> >
> > I only waited half an hour on my iBook before I killed that one.
> >
> > I tried again with a couple of smaller files on my G4 desktop.
> >
> >
> > Method      5MB time       11MB time
> >    1         2.6 sec         5.1 sec
> >    2        48.1 sec         3:20.2 = 200.2 sec
>
> That is unexpected. The substitution method must move parts of the
> string for every match, so I'd expect it to be slower than global
> matching.
>
> I benchmarked both, and also a method based on the index() function.
> The results show indexing and global matching in the same ballpark,
> both almost twice as fast as substitution (code appended below):
>
> substitute 13.6/s -- -41% -45%
> indexing 23.1/s 69% -- -7%
> globmatch 24.8/s 82% 7% --
>
> This was on a slow matching with much smaller (200 K) samples, but
> the dependence on size should be largely linear. Where your
> quadratic (well, more-than-linear) behavior comes from is anybody's
> guess.
>
> Anno
>
Thanks.
I took your program and modified it so that instead of generating the
string you're matching, it read it from a file, and took the file name
from the command line -- so I could feed it the same files I'd used
before.
My files are k1, k10, k50, k100 and k200
(k1 being 1000 lines, k10 being 10,000 lines, etc)
the file sizes are:
k1 59K
k10 595K
k50 2M
k100 5M
k200 11M
The files have newlines every 60 char or so, but I strip them before
running the test. -- Oh yes, I moved the nl stripping out of substitute(),
and did it last.
Here's my results -- the time line is for the whole program, which
includes time to read the file and strip the newlines.
Interesting that on my system
867 MHz Apple G4 1.25GB ram, Mac OS X(10.3.5), perl 5.8.1
substitute is close to indexing, both significantly faster than
globmatch. Looks like that may be worth trying, at any rate.
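For reference, the index()-based search presumably boils down to a loop
like this (a sketch, not Anno's actual benchmark code, with $genome and
$sequence as in the original problem; note that index() reports where a
match starts, whereas pos() after m//g reports where it ends):
my @indices;
my $pos = 0;
while ( ($pos = index($genome, $sequence, $pos)) > -1 ) {
    push @indices, $pos;
    $pos++;    # step past this hit so overlapping matches are still found
}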
I don't understand the percentages being displayed here --
k1:
Rate globmatch substitute indexing
globmatch 355/s -- -56% -60%
substitute 815/s 129% -- -7%
indexing 878/s 147% 8% --
time: 3.08s user 1.68s system 90% cpu 5.261 total
k10:
Rate globmatch substitute indexing
globmatch 3.60/s -- -95% -96%
substitute 67.6/s 1777% -- -20%
indexing 84.6/s 2248% 25% --
time: 2.91s user 1.54s system 90% cpu 4.922 total
k50:
(warning: too few iterations for a reliable count)
Rate globmatch substitute indexing
globmatch 9.77e-02/s -- -99% -99%
substitute 11.3/s 11492% -- -20%
indexing 14.2/s 14399% 25% --
time: 10.38s user 13.20s system 89% cpu 26.479 total
k100:
(warning: too few iterations for a reliable count)
Rate globmatch substitute indexing
globmatch 2.42e-02/s -- -100% -100%
substitute 5.71/s 23477% -- -16%
indexing 6.80/s 27941% 19% --
time: 34.66s user 50.17s system 88% cpu 1:35.80 total
k200:
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
Rate globmatch substitute indexing
globmatch 5.65e-03/s -- -100% -100%
substitute 2.83/s 50017% -- -17%
indexing 3.42/s 60440% 21% --
time: 142.39s user 214.24s system 85% cpu 6:56.92 total
joe
------------------------------
Date: Wed, 25 Aug 2004 14:31:49 +0100
From: "Richard Gration" <richard@zync.co.uk>
Subject: Re: Perl and DOS I/O
Message-Id: <cgi4eh$4lb$1@news.freedom2surf.net>
In article <878e8c25.0408242344.289ccfb6@posting.google.com>, "Hemant
Kumar" <kumarh@gmail.com> wrote:
<snip>
> I want the perl script to read the results generated. Based on results
> read, give input for getchar. I have tried using system(), Open (this
> only allows me to do either input or output but not both). I don't have
> the source for the DOS program so need to do both input and output from
> Perl.
> Can you suggest a way to get around these limitations ? Thanks a lot,
> Hemant
I'd echo the comments of Graham Wood: use IPC::Open2. I used it to drive
a coin toss program which was in essence identical to your random num
generator. It's very easy; if you adapt the example from the docs of
IPC::Open2 you should have a working program in about 5 or 10 minutes!
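Something along these lines, say (a sketch only -- the program name and its
prompt are made up here, and watch the buffering caveats in the IPC::Open2
docs):
use IPC::Open2;
my $pid = open2(\*FROM_PROG, \*TO_PROG, 'randnum.exe');
my $result = <FROM_PROG>;    # read what the DOS program printed
print TO_PROG "y\n";         # answer its getchar() prompt
waitpid($pid, 0);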
Rich
------------------------------
Date: Wed, 25 Aug 2004 13:56:30 +0000 (UTC)
From: dkcombs@panix.com (David Combs)
Subject: Re: recursive functions
Message-Id: <cgi5qe$ltk$1@reader1.panix.com>
In article <57ydnRUXUOKhhozcRVn-pg@adelphia.com>,
Sherm Pendley <spamtrap@dot-app.org> wrote:
>steve_f wrote:
>
>> hmmmm....!! ok, yes this post was off topic, I really should of found a
>> programming group!
>
>To bring it back on-topic, then, you might want to have a look at
>O'Reilly's "Mastering Algorithms with Perl". A number of recursive
>algorithms are discussed in it.
I think he'd be a lot better off getting one or several
standard algorithms texts --- over the last 20 or so years there have
been so many good ones!
Go to amazon and search, I guess, for "algorithms" or the like,
and read the reader-reviews.
David
------------------------------
Date: 25 Aug 2004 10:29:02 -0700
From: tomaco@gmail.com (Chris Heller)
Subject: Re: Simulating the open() command.
Message-Id: <f9630e2c.0408250929.6f4d4b03@posting.google.com>
Tassilo,
I tried your approach and it seems to work to a point, but I am
noticing some strange effects. Perhaps this would be better in a new
message, but I'll post here since it is related to my original
question still.
here is the code as it stands now:
sub myopen(*$$$){
my ($fh, $fileop, $ip, $port) = @_;
my $fhr = qualify_to_ref($fh);
socket($fhr, PF_INET, SOCK_STREAM, (getprotobyname('tcp'))[2]);
...
print $fhr "$fileop\n";
print "ERROR: Socket closed in myopen()\n" if (! -s $fhr);
return 1;
}
sub myclose(*){
my $fh = shift;
my $fhr = qualify_to_ref($fh);
print $fhr "EOT\n"; # End of Transmission Delimiter
close($fhr);
return 1;
}
and my little test app does this:
myopen SKT, ">writeop", $ip, $port;
print SKT "LINE1\n";
print SKT "LINE2\n";
myclose(SKT);
The strangeness comes when I watch my server in the debugger, and
watch how the perl script operates and errors.
In the myopen() command, the line:
print "ERROR: Socket closed in myopen()\n" if (! -s $fhr);
executes, and the error message is printed, indicating that the socket
is now closed.
This seems to be the case because in the test script (running perl -w)
the two print commands both warn that they are printing on an unopened
filehandle SKT.
But when I get into myclose() the line:
print $fhr "EOT\n";
does not print such a warning, and watching the data come to my server
the string "EOT" does pass across the wire!
So, it seems that I am able to create an open filehandle in myopen(),
which is good, but when I return from myopen() that filehandle appears
closed to perl, not so good, but when I then enter myclose() that same
filehandle is once again open and things continue on accordingly!
How can this be? Shouldn't the filehandle be closed in myclose()?
-Chris
------------------------------
Date: Wed, 25 Aug 2004 17:59:29 GMT
From: "Bill Segraves" <segraves_f13@mindspring.com>
Subject: Re: Slide show: this should be fairly straightforward - a what language to use question
Message-Id: <5s4Xc.12334$2L3.4382@newsread3.news.atl.earthlink.net>
"Al Davis" <no-email@no-one.net> wrote in message
news:ia5ni0pgnn2h1gkrgqig0mlkhhb64qkq4v@4ax.com...
<Lengthy specs deleted>
Randal Schwartz' website has some articles that may help you get started,
e.g.,
http://www.stonehenge.com/merlyn/WebTechniques/
Look at articles 29 and 33.
--
Bill Segraves
------------------------------
Date: Wed, 25 Aug 2004 13:04:58 GMT
From: Mark Bole <makbo@pacbell.net>
Subject: Re: start some actions with Perl without Cron?
Message-Id: <_70Xc.12037$2l2.2648@newssvr29.news.prodigy.com>
Sara wrote:
> "PHP2" <gp@nospm.hr> wrote in message news:<cg5u56$d6$1@ls219.htnet.hr>...
>
>>is possible start some actions with Perl without Cron?
>>
>>for example send email to users from database after 3 days or delete
>>something from database automaticaly after 3 day with Perl but without Cron?
>
>
>
> You can "sleep" for 60 x 60 x 24 x 3 seconds. I'm not sure if that
> particular integer is in the range of "sleep"; I'll leave that as an
> exercise to the reader.
>
> If that's what you mean. But you won't get the builtin advantages of
> using cron like crash recovery, logging, etc.
>
> However, on some systems the admin locks down cron with /etc/cron.deny
> or allow, so occasionally you'll need to "roll your own" in those
> hostile environments. I've done it, but never with such an extended
> sleep time.
>
> Personally if I was going to sleep that long, I'd be more inclined to
> have a small shellscript that kicks off the Perl script every n
> seconds. That way the program isn't perpetually resident (yes I'm an
> old-timer!).
>
> G
Your examples all mention "database". Some databases, such as Oracle,
have a built-in job scheduler that provides capabilities similar to cron,
so you could just use that instead (portable across Windows and Unix).
There is also built-in SMTP support for sending e-mails (in Oracle).
There's also the "at" command which is still cron under the covers, but
doesn't require an entry in the crontab file. You can even have each
"at" job, as its last step, schedule another "at" job.
--Mark Bole
------------------------------
Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin)
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>
Administrivia:
#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc. For subscription or unsubscription requests, send
#the single line:
#
# subscribe perl-users
#or:
# unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.
NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice.
To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.
#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.
#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.
------------------------------
End of Perl-Users Digest V10 Issue 6909
***************************************