[24755] in Perl-Users-Digest


Perl-Users Digest, Issue: 6908 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 25 09:06:09 2004

Date: Wed, 25 Aug 2004 06:05:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 25 Aug 2004     Volume: 10 Number: 6908

Today's topics:
    Re: Can this code be better? <>
    Re: convertinga directory path into a hash (Anno Siegel)
    Re: Information exchange between unrelated processes (Anno Siegel)
        old perl FTP module (justme)
    Re: old perl FTP module $_@_.%_
        open of a pipe and waitpid() <andreas@andiboehm.de>
    Re: open of a pipe and waitpid() (Anno Siegel)
        Oracle DBI/DBD and bind vars - so slooooowwwww <lawrence.tierney@bipsolutions.com>
    Re: Oracle's RPAD problematic via Perl's DBI module <pm@katz.cc.univie.ac.at>
    Re: Oracle's RPAD problematic via Perl's DBI module <nospam@bigpond.com>
    Re: Oracle's RPAD problematic via Perl's DBI module <makbo@pacbell.net>
    Re: Oracle's RPAD problematic via Perl's DBI module (J)
    Re: Parsing FileName for upload <richard@zync.co.uk>
    Re: Parsing FileName for upload <tore@aursand.no>
    Re: Performance Improvement of complex data structure ( (Anno Siegel)
    Re: performance surprise -- why? (Anno Siegel)
        Perl and DOS I/O (Hemant Kumar)
    Re: Perl and DOS I/O <Graham.T.removethis.Wood@oracle.andthis.com>
    Re: Perl and DOS I/O <jurgenex@hotmail.com>
    Re: PHP in a Perl Script <richard@zync.co.uk>
    Re: split question <usenet@morrow.me.uk>
    Re: split question <rafalk@comcast.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 25 Aug 2004 08:34:36 -0400
From: Lou Moran <>
Subject: Re: Can this code be better?
Message-Id: <tdvoi0p2r5kut1ahhtjfeh74cos9pkij29@4ax.com>

On Tue, 24 Aug 2004 23:47:57 +0100, Brian McCauley <nobull@mail.com>
wrote:

>
>
>John W. Krahn wrote:
>
>> Brian McCauley wrote:
> >
>>>   chomp my $loc = lc <STDIN>;
>>  
>> Precedence Brian!  You need parentheses for chomp to do the right thing.
>> 
>>    chomp( my $loc = lc <STDIN> );
>
>Oops.

oddly I knew what you meant.
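
A short sketch of why the parentheses matter. chomp parses like a named
unary operator, which binds tighter than assignment, so the unparenthesized
form tries to assign to the result of chomp. The sample string below is an
assumption standing in for a line read from <STDIN>.

```perl
use strict;
use warnings;

# Stand-in for a line read from <STDIN> (an assumption for this sketch).
my $input = "HELLO\n";

# Parenthesized form: assign first, then chomp the new variable.
chomp( my $loc = lc $input );
print "[$loc]\n";

# The unparenthesized form parses as (chomp my $bad) = ..., which perl
# rejects at compile time; a string eval lets us observe that safely.
my $compiled = eval q{ chomp my $bad = lc "HELLO\n"; 1 };
print "unparenthesized form: ", ( $compiled ? "compiled" : "rejected" ), "\n";
```

The first print shows the lower-cased value with the newline removed; the
eval is only there so the failing form does not abort the script.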


------------------------------

Date: 25 Aug 2004 12:27:59 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: convertinga directory path into a hash
Message-Id: <cgi0kf$kk$1@mamenchi.zrz.TU-Berlin.DE>

Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote in comp.lang.perl.misc:
> Also sprach Zebee Johnstone:
> 
> > In comp.lang.perl.misc on Tue, 24 Aug 2004 09:57:17 +0200
> > Tassilo v. Parseval <tassilo.von.parseval@rwth-aachen.de> wrote:
> >> Also sprach Zebee Johnstone:
> >> 
> >>> I have a unix directory path, say /home/user/mail
> >>> 
> >>> Not knowing how long it will be, that is, how many elements, how can I
> >>> convert it into a sequence of hashes:
> >>> 	$ref->{'home'}->{'user'}->{'mail'}
> >> 
> >> This can be done with a simple recursive function:
> >> 
> >>     sub path2hash {
> >> 	my ($p, $ref) = @_;
> >> 	return if not $p;
> >> 	my ($head, $tail) = $p =~ m!/?([^/]+)(.*)!;
> >> 	path2hash($tail, $ref->{ $head } = {});
> > 
> > I don't understand what you are passing here, and how the subroutine
> > sees it.  
> 
> It's a short-cut. More explicitly:
> 
>     $ref->{ $head } = { };
>     path2hash($tail, $ref->{ $head });
> 
> In Perl, assignments have a return value and that was what I was using
> here.
> 
> Other than that, it should be pretty straight-forward. Note that this is
> a so-called primitive recursion because each instantiation of the
> function cuts off a piece of its argument ($head) and calls path2hash
> with the thusly diminished argument ($tail).

That means that recursion can be replaced with a jump to the
subroutine.  Untested:

    sub path2hash {
        my ($p, $ref) = @_;
        return if not $p;
        my ($head, $tail) = $p =~ m!/?([^/]+)(.*)!;
        @_ = ( $tail, $ref->{ $head } = {} );
        goto &path2hash;
    }

"goto &..." is (among other things) Perl's method to cut off tail
recursion.
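
Since each step only descends one level, the same walk can also be written
as a plain loop. The following is a minimal sketch, not the code from the
thread: the name path2hash_iter is hypothetical, and unlike the versions
above it keeps an existing sub-hash (||=) instead of overwriting it.

```perl
use strict;
use warnings;

# Hypothetical iterative variant: walk the path components and descend
# into (or create) one nested hash per component.
sub path2hash_iter {
    my ($path, $root) = @_;
    my $node = $root;
    for my $part ( grep { length } split m{/}, $path ) {
        # Create the level if it is missing, then step into it.
        $node = $node->{$part} ||= {};
    }
    return $root;
}

my $tree = path2hash_iter( '/home/user/mail', {} );
print join( ',', sort keys %{ $tree->{home} } ), "\n";
```

Calling it twice with different paths on the same $root merges them into
one tree, which the overwriting versions would not do.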

Anno


------------------------------

Date: 25 Aug 2004 10:46:00 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Information exchange between unrelated processes
Message-Id: <cghql8$pvm$2@mamenchi.zrz.TU-Berlin.DE>

pk <p.krupp@web.de> wrote in comp.lang.perl.misc:
> Hi!
> 
> The Task: There are different unrelated processes which have to
> communicate informations. They must be able to read, write and delete
> these informations from/to a central entity and it must be guaranteed,
> that, while one process is writing, no other one can do the same.
> 
> For the moment, this is realized with a file and a locked access to it.
> This works fine, but I guess it is rather slow and uses quite a lot of
> resources. So I thought of using shared memory, but in the System V
> IPC Section of the Camel-Book I find some arguments against it.
> 
> So what would be an appropriate technique?

That is discussed in perlipc.  Did you read that?
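
For reference, the file-plus-lock approach described in the question can be
sketched roughly as follows. This is only an illustration of flock-based
exclusive writing; the helper name locked_append and the one-record-per-line
layout are assumptions, not anything stated in the thread.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Hypothetical helper: append one record while holding an exclusive lock,
# so that only one process writes at a time.
sub locked_append {
    my ($file, $record) = @_;
    open( my $fh, '>>', $file ) or die "Cannot open $file: $!";
    flock( $fh, LOCK_EX )       or die "Cannot lock $file: $!";
    # Another writer may have appended while we waited for the lock.
    seek( $fh, 0, SEEK_END )    or die "Cannot seek in $file: $!";
    print {$fh} "$record\n";
    # Closing the handle flushes the data and releases the lock.
    close($fh) or die "Cannot close $file: $!";
    return 1;
}
```

Readers would take a shared lock (LOCK_SH) on the same file; flock is
advisory, so every cooperating process has to use it.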

Anno


------------------------------

Date: 25 Aug 2004 05:27:45 -0700
From: eight02645999@yahoo.com (justme)
Subject: old perl FTP module
Message-Id: <c0837966.0408250427.14fe67c6@posting.google.com>

hi

i have perl version 5.005_03 and i am looking for Net::FTP module
that is suitable for my perl version. Anyone knows where i can
find old perl modules like this...
thanks


------------------------------

Date: Wed, 25 Aug 2004 12:31:48 GMT
From: $_@_.%_
Subject: Re: old perl FTP module
Message-Id: <UE%Wc.7007$rT1.5468@trndny02>


eight02645999@yahoo.com (justme) wrote in message-id:
<c0837966.0408250427.14fe67c6@posting.google.com>
>
>hi
>
>i have perl version 5.005_03 and i am looking for Net::FTP module
>that is suitable for my perl version. Anyone knows where i can
>find old perl modules like this...
>thanks

You may be able to find what you are looking for here:

http://backpan.cpan.org/authors/id/G/GB/GBARR/




------------------------------

Date: Wed, 25 Aug 2004 11:51:04 +0200
From: Andreas Boehm <andreas@andiboehm.de>
Subject: open of a pipe and waitpid()
Message-Id: <2p35o8Fg7g4vU1@uni-berlin.de>

Hello,

does there exist an interference between close() and waitpid() in the 
following code fragment, that uses an open() to a piped command?

           $fhcmd=new FileHandle;
           $pidcmd=open($fhcmd,"$cmd |");
           if (defined($pidcmd)) {
             $selcmd=new IO::Select($fhcmd);
             while (!eof($fhcmd)) {
               my (@cmdfhready);
               @cmdfhready=$selcmd->can_read(1);
               if (scalar(@cmdfhready)>0) {
                 $a=<$fhcmd>;
                 print STDOUT $a;
               }
             }
             close $fhcmd;
             waitpid $pidcmd, 0;
           } else {
             $fhcmd=undef;
             $pidcmd=undef;
             $OK=1;
           }

Does there exist a need of a waitpid() after the close()?

regards,
Andreas
	



------------------------------

Date: 25 Aug 2004 10:51:23 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: open of a pipe and waitpid()
Message-Id: <cghqvb$pvm$3@mamenchi.zrz.TU-Berlin.DE>

Andreas Boehm  <andreas@andiboehm.de> wrote in comp.lang.perl.misc:
> Hello,
> 
> does there exist an interference between close() and waitpid() in the 
> following code fragment, that uses an open() to a piped command?
> 
>            $fhcmd=new FileHandle;
>            $pidcmd=open($fhcmd,"$cmd |");
>            if (defined($pidcmd)) {
>              $selcmd=new IO::Select($fhcmd);
>              while (!eof($fhcmd)) {
>                my (@cmdfhready);
>                @cmdfhready=$selcmd->can_read(1);
>                if (scalar(@cmdfhready)>0) {
>                  $a=<$fhcmd>;
>                  print STDOUT $a;
>                }
>              }
>              close $fhcmd;
>              waitpid $pidcmd, 0;
>            } else {
>              $fhcmd=undef;
>              $pidcmd=undef;
>              $OK=1;
>            }
> 
> Does there exist a need of a waitpid() after the close()?

No.

From "perldoc close":

               set to 0.)  Closing a pipe also waits for the
               process executing on the pipe to complete, ...

The only way to "wait (for a process) to complete" under Unix is
wait() or waitpid(), so close has already done that.
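
A small sketch that makes this visible, assuming a Unix-like system. The
child here is just the running perl printing one line and exiting with
status 3; that command is an assumption chosen so that no external program
is needed.

```perl
use strict;
use warnings;

# Child command: the running perl itself (list form avoids the shell).
my $pid = open( my $fh, '-|', $^X, '-e', 'print "hello\n"; exit 3' )
    or die "Cannot start child: $!";

my @lines = <$fh>;

# close() on a pipe handle waits for the child and stores its status in $?.
my $closed_ok   = close $fh;
my $exit_status = $? >> 8;

# The child has already been reaped, so a later waitpid() has nothing
# left to wait for and reports failure.
my $reaped = waitpid $pid, 0;

printf "exit status %d, waitpid returned %d\n", $exit_status, $reaped;
```

Note that close returns false when the child exits non-zero, so the exit
status has to be taken from $? right after close, before anything else
overwrites it.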

Anno


------------------------------

Date: Wed, 25 Aug 2004 08:40:05 GMT
From: "Lord0" <lawrence.tierney@bipsolutions.com>
Subject: Oracle DBI/DBD and bind vars - so slooooowwwww
Message-Id: <FfYWc.12$D83.5@newsfe3-gui.ntli.net>

Hi there,

We're having a bit of a problem using bind vars with Oracle via
DBI/DBD::Oracle. Basically, if we use bind vars the following code/query
takes about 35 seconds to return results. If we don't use bind vars, i.e. we
hard code the search parameters into the query, the results are returned in
about 10 seconds! Any ideas? We are using the following environment:

OS: White Hat Enterprise Linux (2.4.21-15.EL.smp)
Perl:  5.8.0.3
DBI: 1.43.4
DBD::Oracle: 1.15

###### CODE BEGINS #####

#!/usr/bin/perl

## Wee script to demonstrate the issue

use DBI;
use strict;
use CGI::Carp qw/fatalsToBrowser/;
use CGI;
my $q=new CGI;
print $q->header();
my  $sql = '';

#with bindvars
$sql="SELECT id, title, to_char(auto_enter_date, 'DD/MM/YYYY') as
auto_enter_date, tracker_ref, SUBSTR(description, 1, 255) AS description
FROM boss_contract_admin WHERE (1=1) AND auto_enter_date >= TO_DATE(?,
'DD/MM/YYYY') AND auto_enter_date <= TO_DATE(?, 'DD/MM/YYYY') AND
CONTAINS(complete_entry, ?) > 0";

#without bindvars
#$sql="SELECT id, title, to_char(auto_enter_date, 'DD/MM/YYYY') as
#auto_enter_date, tracker_ref, SUBSTR(description, 1, 255) AS description
#FROM boss_contract_admin WHERE (1=1) AND auto_enter_date >=
#TO_DATE('24/05/2004', 'DD/MM/YYYY') AND auto_enter_date <=
#TO_DATE('21/08/2004', 'DD/MM/YYYY') AND CONTAINS(complete_entry, 'network' )
#> 0";

my $dbh=DBI->connect("DBI:Oracle:BOSS", username, password) or
die("\n\nERROR: Can't connect to Database\n$DBI::errstr\n\n");

$dbh->{RaiseError}=1;
$dbh->{LongReadLen}=1048576; # Set the max size of text inserts/updates to 1Mb
$dbh->{LongTruncOk}=1; # If text string is over 1Mb then silently truncate, do not throw error
$dbh->{AutoCommit}=0;
$dbh->{FetchHashKeyName} = 'NAME_lc';

my $sth = $dbh->prepare($sql);

#comment the next 6 lines out if not using bind vars
my @bind_vars = ('24/05/2004', '21/08/2004', 'network');
my $i=1;
foreach my $bind_var(@bind_vars) {
 $sth->bind_param($i, $bind_var);#
 $i++;
}

$sth->execute();
my $data = $sth->fetchall_hashref('id');
use Data::Dumper; die Dumper($data);

#### CODE ENDS ####

I'd appreciate your thoughts as we are pretty stumped. We were sure that
using bind vars would be faster esp for subsequent searches as Oracle would
cache the search but we are just not seeing this.

Kind regards

Lord0




------------------------------

Date: 25 Aug 2004 09:44:24 GMT
From: Peter Marksteiner <pm@katz.cc.univie.ac.at>
Subject: Re: Oracle's RPAD problematic via Perl's DBI module
Message-Id: <412c5f78$0$11094$3b214f66@usenet.univie.ac.at>

In comp.lang.perl.misc dn_perl@hotmail.com <dn_perl@hotmail.com> wrote:
: Say the table is : students(name CHAR(8))
: Entries in Students are : '        ' (8 blanks),
: 'bob     '   and     'dave    ' .

: my $st_name = " " ;  # non-blank name
: my $dstmt = $dbh->prepare("select count(*) from students 
:       where name = RPAD(?,8) ") ;    # STMT AA
: $dstmt->execute($st_name) or die "sql call failed";   
: my $num_entries = $dstmt->fetchrow() ;
: $dstmt->finish ;

: $num_entries should be set to 1; instead it is set to 0.
: This problem occurs only for a blank string.

: But still I am surprised why STMT AA (above) fails to return
: the expected result.

What Oracle Version are you using? There are some subtle differences
in the handling of trailing blanks between Oracle 8 and 9. I have a 
table "foo" containing two blank entries. When running the following code

my $ss = "SELECT COUNT(*) FROM foo WHERE bar = RPAD(?,8)";
my $blank = ' ';
my $sth = $dbh->prepare($ss) or die;
$sth->execute($blank) or die;
print "Blank: ",  $sth->fetch->[0], "\n";

I get the following result:

Blank: 2        # Oracle 9 database using local Oracle 9 client 
Blank: 2        # Oracle 9 client connecting to a remote Oracle 8 database
Blank: 0        # Oracle 8 database using local Oracle 8 client
Blank: 0        # Oracle 8 client connecting to a remote Oracle 9 database

   Peter

-- 
Peter Marksteiner
Vienna University Computer Center


------------------------------

Date: Wed, 25 Aug 2004 19:53:00 +1000
From: Gregory Toomey <nospam@bigpond.com>
Subject: Re: Oracle's RPAD problematic via Perl's DBI module
Message-Id: <2p35s0Fcv605U1@uni-berlin.de>

dn_perl@hotmail.com wrote:

> The following code is returning an unexpected result.
> (Untested cut-n-paste; apologies)
> 
> 
> Say the table is : students(name CHAR(8))
> Entries in Students are : '        ' (8 blanks),
> 'bob     '   and     'dave    ' .
> 
> use strict ;
> use DBI ;
> 
> my $st_name = " " ;  # non-blank name
> my $dstmt = $dbh->prepare("select count(*) from students
>       where name = RPAD(?,8) ") ;    # STMT AA
> $dstmt->execute($st_name) or die "sql call failed";
> my $num_entries = $dstmt->fetchrow() ;
> $dstmt->finish ;
> 
> $num_entries should be set to 1; instead it is set to 0.
> This problem occurs only for a blank string.
> If $st_name = "dave" , then the statements work properly.
> 
> If I run the query via sqlplus :
> select count(*) from students where name = RPAD(' ',8) ,
>     the result is 1, as expected.
>    
> I am using the getaround :
> my $dstmt = $dbh->prepare("select count(*) from students
>       where trim(name) = ? or (name = ' ' and trim(?) is null)  ") ;
> $dstmt->execute($st_name, $st_name) or die "sql call failed";

This is doing a full table scan.
 
> But still I am surprised why STMT AA (above) fails to return
> the expected result.
> 
> -----------

For a start it's very unusual to rpad fields in a database.

The underlying problem probably has to do with Oracle/dbi treatment of ''
and NULL.

gtoomey


------------------------------

Date: Wed, 25 Aug 2004 11:29:27 GMT
From: Mark Bole <makbo@pacbell.net>
Subject: Re: Oracle's RPAD problematic via Perl's DBI module
Message-Id: <rK_Wc.12029$F32.1595@newssvr29.news.prodigy.com>

Peter Marksteiner wrote:

> In comp.lang.perl.misc dn_perl@hotmail.com <dn_perl@hotmail.com> wrote:
> : Say the table is : students(name CHAR(8))
> : Entries in Students are : '        ' (8 blanks),
> : 'bob     '   and     'dave    ' .
> 
> : my $st_name = " " ;  # non-blank name
> : my $dstmt = $dbh->prepare("select count(*) from students 
> :       where name = RPAD(?,8) ") ;    # STMT AA
> : $dstmt->execute($st_name) or die "sql call failed";   
> : my $num_entries = $dstmt->fetchrow() ;
> : $dstmt->finish ;
> 
> : $num_entries should be set to 1; instead it is set to 0.
> : This problem occurs only for a blank string.
> 
> : But still I am surprised why STMT AA (above) fails to return
> : the expected result.
> 
> What Oracle Version are you using? There are some subtle differences
> in the handling of trailing blanks between Oracle 8 and 9. I have a 
> table "foo" containing two blank entries. When running the following code
> 
> my $ss = "SELECT COUNT(*) FROM foo WHERE bar = RPAD(?,8)";
> my $blank = ' ';
> my $sth = $dbh->prepare($ss) or die;
> $sth->execute($blank) or die;
> print "Blank: ",  $sth->fetch->[0], "\n";
> 
> I get the following result:
> 
> Blank: 2        # Oracle 9 database using local Oracle 9 client 
> Blank: 2        # Oracle 9 client connecting to a remote Oracle 8 database
> Blank: 0        # Oracle 8 database using local Oracle 8 client
> Blank: 0        # Oracle 8 client connecting to a remote Oracle 9 database
> 
>    Peter
> 

I too received the correct result using the OP's test case under Oracle 
9i client and server.

Only Oracle 8.1.7.4 client (terminal release of 8i product) is certified 
for connecting to Oracle 9i server.  Earlier versions (such as 8.1.7.3) 
are not, and you may also face other issues, such as mysterious errors 
related to the DATE datatype.  Highly recommended to recompile your DBD 
module using the Oracle 9i libraries if you haven't done so.

You might also find help in the Perl documentation under Database Handle 
Attributes, "ora_ph_type".  Your use of the CHAR datatype in your table 
is unusual in my experience, one almost always uses VARCHAR2 instead. 
Try searching for "blank-padded comparison semantics" at 
http://tahiti.oracle.com

ORA_VARCHAR2 -  Strip trailing spaces and allow embedded \0 bytes.
              This is the normal default placeholder type.

ORA_STRING - Don't strip trailing spaces and end the string at
              the first \0.

ORA_CHAR - Don't strip trailing spaces and allow embedded \0.
              Force 'blank-padded comparison semantics'.

--Mark Bole



------------------------------

Date: 25 Aug 2004 04:50:35 -0700
From: jdiff@rediffmail.com (J)
Subject: Re: Oracle's RPAD problematic via Perl's DBI module
Message-Id: <838e6a73.0408250350.40cbe47d@posting.google.com>

dn_perl@hotmail.com (dn_perl@hotmail.com) wrote in message news:<97314b5b.0408242150.216b914c@posting.google.com>...
> The following code is returning an unexpected result.
> (Untested cut-n-paste; apologies)
> 
> 
> Say the table is : students(name CHAR(8))
> Entries in Students are : '        ' (8 blanks),
> 'bob     '   and     'dave    ' .
> 
> use strict ;
> use DBI ;
> 
> my $st_name = " " ;  # non-blank name
> my $dstmt = $dbh->prepare("select count(*) from students 
>       where name = RPAD(?,8) ") ;    # STMT AA
> $dstmt->execute($st_name) or die "sql call failed";   
> my $num_entries = $dstmt->fetchrow() ;
> $dstmt->finish ;
> 
> $num_entries should be set to 1; instead it is set to 0.
> This problem occurs only for a blank string.
> If $st_name = "dave" , then the statements work properly.
> 
> If I run the query via sqlplus :
> select count(*) from students where name = RPAD(' ',8) ,
>     the result is 1, as expected.
> 
> 
> I am using the getaround :
> my $dstmt = $dbh->prepare("select count(*) from students 
>       where trim(name) = ? or (name = ' ' and trim(?) is null)  ") ; 
> $dstmt->execute($st_name, $st_name) or die "sql call failed";  
> 
> 
> But still I am surprised why STMT AA (above) fails to return
> the expected result.
> 
> -----------

   Check this out
   RPAD(?,8)
    
  Does this work? Probably not!


------------------------------

Date: Wed, 25 Aug 2004 10:30:45 +0100
From: "Richard Gration" <richard@zync.co.uk>
Subject: Re: Parsing FileName for upload
Message-Id: <cghmae$eb$1@news.freedom2surf.net>

In article <f896a829.0408241349.46d52e77@posting.google.com>, "Tony
McGuire" <tony@paradoxcommunity.com> wrote:


> If a user selects a file on a Windows box, with IE at least, the FULL
> PATH of the file on the user's system is transmitted to the server.  If
> the user does the same thing using Opera on a Linux box, and apparently
> Firebird, then only the specific filename gets transmitted to the
> server.

Here are some snippets from code I wrote to deal with this exact problem:

    use CGI qw(&param &cookie &upload &url_param &uploadInfo);
    use File::Basename;

    my ($error,$flag);
    my @img_file_suffices = qw(jpg jpeg gif png bmp);
    my $isWin = ($ENV{HTTP_USER_AGENT} =~ /Windows/);

    my $fh = upload('img_file');
    my $fn = param('img_file');
    my $fi = uploadInfo($fn);
    my $user_dir = get_upload_dir_for_user($c,$i);
    if ($fi->{'Content-Type'} =~ /^image/) {
          fileparse_set_fstype("MSWin32") if ($isWin);
          my ($fn_name,$fn_path,$fn_suffix) = fileparse($fn,@img_file_suffices);
          fileparse_set_fstype("Unix") if ($isWin);
          $fn = $fn_name . $fn_suffix;
          my $index = 0;
          my $path = "$ENV{DOCUMENT_ROOT}/$user_dir";
          my $filename = "$i->{ui}->{uid}_${index}_$fn";
          while (-e "$path/$filename") {
               $index++;
               $filename = "$i->{ui}->{uid}_${index}_$fn";
          }

          open (IMG,">$path/$filename") or die MyApp::Error->new(403,qq(Error writing file "$path/$filename": $!));
          while (<$fh>) {print IMG}
          close (IMG);
          $i->{img_src_link} = "/$user_dir/$filename";
    } else {
          $error++;
          $flag = 'ERROR_NOT_AN_IMAGE_FILE';
    }

HTH
Rich


------------------------------

Date: Wed, 25 Aug 2004 14:33:25 +0200
From: Tore Aursand <tore@aursand.no>
Subject: Re: Parsing FileName for upload
Message-Id: <pan.2004.08.25.12.33.24.181366@aursand.no>

On Tue, 24 Aug 2004 14:49:59 -0700, Tony McGuire wrote:
> I've been going batty trying to figure out a routine that will detect
> when there is a full path sent and parse the file name from that path,
> and when there is only a file name sent.

Use the File::Basename module.  With that module, you can easily extract
the various parts of a filename (i.e. path and the filename itself), and
then compare it to the original string.
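
A minimal sketch of that suggestion, assuming the server itself runs on a
Unix-like system. The helper name client_basename and the sample file names
are hypothetical. Parsing with the MSWin32 rules handles both kinds of
client, because those rules accept "\" and "/" as separators.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Hypothetical helper: strip any client-side directory part, whether the
# browser sent a Windows path, a Unix path, or a bare file name.
sub client_basename {
    my ($raw) = @_;
    # Switch to Windows parsing rules and remember the previous setting.
    my $previous = fileparse_set_fstype('MSWin32');
    my ($name) = fileparse($raw);
    fileparse_set_fstype($previous);
    return $name;
}

for my $sent ( 'C:\\Documents\\photo.jpg', '/home/user/photo.jpg', 'photo.jpg' ) {
    printf "%-28s => %s\n", $sent, client_basename($sent);
}
```

Comparing the returned name with the original string tells you whether a
path was sent at all, although for storing the upload only the bare name
is normally needed.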

BTW: Why do you need to know _if_ there is a full path present?


-- 
Tore Aursand <tore@aursand.no>
"The purpose of all war is ultimately peace." (Saint Augustine)


------------------------------

Date: 25 Aug 2004 09:19:45 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Performance Improvement of complex data structure (hash of hashes of hashes)
Message-Id: <cghljh$mfv$1@mamenchi.zrz.TU-Berlin.DE>

Scott Gilpin <sgilpin@gmail.com> wrote in comp.lang.perl.misc:
> Hi everyone -
> 
> I'm trying to improve the performance (runtime) of a program that
> processes large files.  The output of the processing is some fixed
> number of matrices (that can vary between invocations of the program),
> each of which has a different number of rows, and the same number of
> columns.  However, the number of rows and columns may not be known
> until the last row of the original file is read.  The original file
> contains approximately 100 million rows.  Each individual matrix has
> between 5 and 200 rows, and between 50 and 10000 columns.  The data
> structure I'm using is a hash of hashes of hashes that stores this
> info.   N is the total number of columns, M1 is the total number of
> rows in matrix #1, M2 is the total number of rows in matrix 2, etc,
> etc.  The total number of matrices is between 3 and 15.

[...]

> Here is the code that I'm using to build up this data structure.  I'm
> running perl version 5.8.3 on solaris 8 (sparc processor).  The system
> is not memory bound or cpu bound - this program is really the only
> thing that runs.  There are several gigabytes of memory, and this
> program doesn't grow bigger than around 100 MB.  Right now the run time
> for the following while loop with 100 million rows of data is about 6
> hours.  Any small improvements would be great.

It shouldn't take that long, unless the data structure blows up way
beyond 100 MB.

> ## loop to process each row of the original data
> while(<INDATA>)
> {
> chomp($_);
> 
> 
> ## Each row is delimited with |
> my @original_row = split(/\|/o,$_);
> 
> ## The cell value and the column name are always in the same position
> my $cell_value = $original_row[24];
> my $col_name = $original_row[1];
> 
> ## Add this column name to the list of ones we've seen
> $columns_seen{$col_name}=1;

Where is this used?

> ##  For each matrix, loop through and increment the row/column value
> foreach my  $matrix   (@matrixList)

Where is @matrixList set?

> {
> 
> ## positionHash tells the position of the value for
> ## this matrix in the original data row
> my $row_name = $original_row[$positionHash{$matrix}];

Where is %positionHash set?

> $matrix_values{$matrix}{$row_name}{$col_name} += $cell_value;
> }
> 
> }   ## end while

This code isn't runnable.  How are we to improve code we can't run?

To make it runnable, I had to realize that %positionHash is nowhere
set and come up with a plausible one.  Same for @matrixList.  I had
to find that %columns_seen is nowhere used, and discard it.  Then I
had to generate a set of input data for the program to run with.
It would have been your job to do that, and you are far better equipped
to do it.

That said, I don't see room for fundamental improvement.  Apparently
each "cell value" contributes to all matrices in the same column,
but in lines that are determined indirectly (through %positionHash).

Your program does that without doing any excessive extra work.  There
may be re-arrangements of the data structure with corresponding code
adaptions that make it marginally faster, but I wouldn't expect
anything better than 10%.

> I tried using DProf & dprofpp,  but that didn't reveal anything
> interesting.

It can't.  DProf works on subroutine basis, but your code doesn't
use any subroutines.

>              I also tried setting the initial size of each hash using
> 'keys', but this didn't show any improvement.  I could only initialize
> the hash of hashes - and not the third level of hashes (since I don't
> know the values in the second hash until they are read in from the
> file).  I know that memory allocation in C is expensive, as is
> re-hashing - I suspect that's what's taking up a lot of the time.

One thing to observe is whether program speed deteriorates over
time.  Just print something to stderr every so-many records and
watch the rhythm.  If it gets slower with time, the problem is most
likely memory related.  If it doesn't, you're none the wiser.
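
One way to watch that rhythm, sketched under the assumption that a line on
STDERR per batch is acceptable. The name make_ticker and the batch size are
hypothetical; Time::HiRes is used only to get sub-second timestamps.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: returns a closure to call once per record. Every
# $every records it reports how long the last batch took and returns that
# duration; on all other calls it returns nothing.
sub make_ticker {
    my ($every) = @_;
    my ( $count, $last ) = ( 0, time );
    return sub {
        return unless ++$count % $every == 0;
        my $now     = time;
        my $elapsed = $now - $last;
        $last = $now;
        printf STDERR "%d records, last %d took %.3f s\n",
            $count, $every, $elapsed;
        return $elapsed;
    };
}
```

In the loop from the question this would be created once before
while(<INDATA>), for example with make_ticker(1_000_000), and called once
per row. Batch times that keep growing point at memory; flat ones do not.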

Anno


------------------------------

Date: 25 Aug 2004 10:40:56 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: performance surprise -- why?
Message-Id: <cghqbo$pvm$1@mamenchi.zrz.TU-Berlin.DE>

Joe Davison  <haltingNOSPAM@comcast.net> wrote in comp.lang.perl.misc:
> I'm searching the genome with a perl script I wrote and encountered a
> surprise when I tried to improve the performance -- it got worse, much
> worse, and I'm wondering if there's a better way to do my second effort.
> 
> Here's the basic problem:
> 
> Given a short sequence, say AGTACT, and a chromosome, say 
> CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT...(30MB string).
> 
> I want to find all the places in the chromosome where the sequence
> occurs.
> 
> Method 1:   $genome =~ s/($sequence)/\n$1/g;
> I then wrote the chopped string to a file and counted the lengths of the
> lines using a simple awk program (don't worry about why).
> 
> That runs in about 4 seconds on my G3 iBook. But I figured I didn't
> really need a second copy of a 30MB file that differed only in the
> placement and number of newlines, so why not just save the positions
> where it starts?    I looked at somebody else's code and tried:
> 
> Method 2:  $_=$genome;
>            while( m/$sequence/g) { push @indices,pos();}
> and then write out the indices.
> 
> I only waited half an hour on my iBook before I killed that one.
> 
> I tried again with a couple of smaller files on my G4 desktop.
> 
> 
> Method    5MB time   11MB time
>   1         2.6 sec    5.1 sec
>   2        48.1 sec   3:20.2 = 200.2 sec

That is unexpected.  The substitution method must move parts of the
string for every match, so I'd expect it to be slower than global
matching.

I benchmarked both, and also a method based on the index() function.
The results show indexing and global matching in the same ballpark,
both almost twice as fast as substitution (code appended below):

    substitute 13.6/s         --       -41%       -45%
    indexing   23.1/s        69%         --        -7%
    globmatch  24.8/s        82%         7%         --

This was on a slow machine with much smaller (200 K) samples, but
the dependence on size should be largely linear.  Where your
quadratic (well, more-than-linear) behavior comes from is anybody's
guess.

Anno

=============================================================

#!/usr/local/bin/perl
use strict; use warnings; $| = 1;
use Vi::QuickFix;

use Benchmark ':all';

use constant SIZE => 400_000;
my $sequence = 'AGTACT';
my $str;
$str .= ( qw( A C G T))[ rand 4] for 1 .. SIZE;

goto bench;

print "globmatch: ", globmatch(), "\n";
print "substitute: ", substitute(), "\n";
print "indexing: ", indexing(), "\n";
exit;

bench:
cmpthese( -1, {
    globmatch => 'globmatch',
    substitute => 'substitute',
    indexing => 'indexing',
});

######################################################################

sub globmatch {
    my @indices;
    $_ = $str;
    push @indices, pos while /$sequence/g;
    scalar @indices;
}

sub substitute {
    $_ = $str;
    s/$sequence/\n$sequence/g;
    tr/\n//;
}

sub indexing {
    $_ = $str;
    my @indices;
    my $pos = 0;
    while ( 1 ) {
        last if ( $pos = index( $_, $sequence, $pos)) < 0;
        push @indices, $pos;
        $pos += length $sequence;
    }
    scalar @indices;
}
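
A side note on the global-match method, as a small sketch: pos() reports the
offset just after a match, so recording where each occurrence starts needs
$-[0] instead. The short sample string below is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data; a real chromosome string would be far longer.
my $genome   = 'CCAGTACTTTAGTACT';
my $sequence = 'AGTACT';

my ( @starts, @ends );
while ( $genome =~ /\Q$sequence\E/g ) {
    push @starts, $-[0];           # offset where this match begins
    push @ends,   pos($genome);    # offset just past this match
}

print "starts: @starts\n";    # prints "starts: 2 10"
print "ends:   @ends\n";      # prints "ends:   8 16"
```

Matching against the variable directly also avoids copying the string into
$_ first, which matters little here but is one less copy of a 30MB string.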


------------------------------

Date: 25 Aug 2004 00:44:56 -0700
From: kumarh@gmail.com (Hemant Kumar)
Subject: Perl and DOS I/O
Message-Id: <878e8c25.0408242344.289ccfb6@posting.google.com>

I want to start a DOS program from perl, read its output and give it
further input when the program prompts (based on output of the
program).

Suppose the DOS program is as follows

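#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)	{
        int choice;
        /* seed the generator so each run prints different numbers */
        srand((unsigned) time(NULL));

        do {
                printf ("\n%d\n", rand());
                printf ("Do you want to do more ?\n");
                choice = getchar();
        } while ( choice != 'n' && choice != EOF);

        printf("\nGoodbye\n");
        return 0;
}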

I want the perl script to read the results generated. Based on results
read, give input for getchar. I have tried using system() and open() (this
only allows me to do either input or output but not both). I don't
have the source for the DOS program so need to do both input and
output from Perl.
Can you suggest a way to get around these limitations ?

Thanks a lot,
Hemant


------------------------------

Date: Wed, 25 Aug 2004 11:19:35 +0100
From: Graham Wood <Graham.T.removethis.Wood@oracle.andthis.com>
Subject: Re: Perl and DOS I/O
Message-Id: <pTZWc.23$133.66@news.oracle.com>

Hemant Kumar wrote:

> I want to start a DOS program from perl, read its output and give it
> further input when the program prompts (based on output of the
> program).
<snip>

The module IPC::Open2 gives you connections to the input and output of a 
process.  Alternatively you could try Expect.pm.  Expect is designed to 
interact with command line programs, to expect certain output from the 
programs and to provide appropriate responses to that output.  There is 
also an IPC::Open3 which gives you STDERR as well as STDIN and STDOUT.

You get IPC::Open2 in perl 5.6.1 but you have to install Expect from CPAN.

I haven't tried any of these.

perldoc IPC::Open2 for details.
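
A rough sketch of the IPC::Open2 route, written for a Unix-like system. The
child below is a Perl stand-in for the poster's program (an assumption: it
prints a number and a prompt, then reads a whole line rather than a single
character), and the fixed "answer y twice, then n" policy is made up.

```perl
use strict;
use warnings;
use IPC::Open2;

# Stand-in for the interactive program. It unbuffers its output; a real
# program that buffers when not on a terminal can make this approach hang.
my $child = q{
    $| = 1;
    while (1) {
        print int(rand 100), "\nDo you want to do more ?\n";
        my $c = <STDIN>;
        last if !defined $c or $c =~ /^n/;
    }
    print "Goodbye\n";
};

# $from_child reads the program's output, $to_child feeds its input.
my $pid = open2( my $from_child, my $to_child, $^X, '-e', $child );

my @numbers;
for my $round ( 1 .. 3 ) {
    chomp( my $number = <$from_child> );    # the generated value
    my $prompt = <$from_child>;             # the "do more?" question
    push @numbers, $number;
    print {$to_child} ( $round < 3 ? "y\n" : "n\n" );
}
chomp( my $farewell = <$from_child> );

close $to_child;
close $from_child;
waitpid $pid, 0;

print "got @numbers, then '$farewell'\n";
```

The reply could of course depend on $number instead of the round counter.
Whether this works with a given DOS program depends on how that program
buffers its output, which cannot be controlled from the Perl side.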

Graham



------------------------------

Date: Wed, 25 Aug 2004 11:52:56 GMT
From: "Jürgen Exner" <jurgenex@hotmail.com>
Subject: Re: Perl and DOS I/O
Message-Id: <s4%Wc.2455$2F.1197@trnddc05>

Hemant Kumar wrote:
> I want to start a DOS program from perl, read its output and give it
> further input when the program prompts (based on output of the
> program).

perldoc -q expect

jue




------------------------------

Date: Wed, 25 Aug 2004 10:53:57 +0100
From: "Richard Gration" <richard@zync.co.uk>
Subject: Re: PHP in a Perl Script
Message-Id: <cghnlu$pj$1@news.freedom2surf.net>

In article <MATWc.141004$Oi.85750@fed1read04>, "Gary"
<reachus@netlink.info> wrote:


> I am calling a perl script that writes a WEB page - Plain and simple
> except that the perl script also writes out some PHP processing. All
> works fine when I just create an html file of the program but when I use
> perl to write it the php tags are ignored ?
> I have the apache Xbithack on so php is parsed for all html files and
> the perl script says it is an html file as below.  This code is called
> from a WEB page
> #!/usr/local/bin/perl
> print <<END;
> Content-type: text/html
> <?php
> SCRIPT etc etc
> ?>
> END
> PHP tags are just printed on the screen.  Any pointers.
> Gary
> 

I assume this is on Apache ...

If it's Apache 1.x then the output of your script is parsed for the
presence of a couple of specific headers, which are added if they aren't
there (this is part of the CGI spec) and then sent on its way to the
browser. It is not possible in Apache 1.x to have this output handed off
to another module for processing (here the request has been served by the
cgi-script handler - you cannot then chain to the php-script handler, not
in Apache 1.x).

However, in Apache 2.x it *is* possible. Not sure how you go about it in
httpd.conf 'cos I don't use it (<aside>Is mod_perl 2.x finished
yet?</aside>), but this is one limitation which was addressed
specifically when designing Apache 2.x: you can now chain content
handlers so that you can have the output of scripts parsed for SSI, have
perl scripts generate php, have SSI generate php which generates perl
(maybe ;-) and other full-on wackiness.

HTH
Rich


------------------------------

Date: Tue, 24 Aug 2004 11:48:41 +0100
From: Ben Morrow <usenet@morrow.me.uk>
Subject: Re: split question
Message-Id: <91jrv1-vd5.ln1@osiris.mauzo.dyndns.org>


Quoth "Mike Lepore" <lepor5e@bestweb.net>:
> Not sure whether this is a split question or a regex question (newbie).
> Given a string variable which contains at least one space character,
> how can I extract everything to the left of the first space,
> and everything to the right of the first space?

IIRC it's Randal who says 'if you know what you want to keep, use a
regex; if you know what you want to throw away, use split' so that makes
this a split question.

> The number of characters and fields can't be hardcoded.
> Example:
> $Variable = "abc def ghi jkl";
> $Variable1 becomes "abc" and $Variable2 becomes "def ghi jkl"

my ($Variable1, $Variable2) = split / /, $Variable, 2;

Now, which part of perldoc -f split didn't you understand?

Ben

-- 
Every twenty-four hours about 34k children die from the effects of poverty.
Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
that image, that ghastly, grey-billowing, double-barrelled fall, repeated
twelve times every day. Full of children. [Iain Banks]         ben@morrow.me.uk
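
For reference, a small self-contained sketch of the split call above,
using the sample string from the question. The @all array is only added
here for comparison and is not part of the original answer.

```perl
use strict;
use warnings;

my $Variable = "abc def ghi jkl";

# The third argument (LIMIT) of 2 stops splitting after the first
# separator, so everything after the first space stays in one piece.
my ($Variable1, $Variable2) = split / /, $Variable, 2;

# Without a limit, the same pattern returns every field.
my @all = split / /, $Variable;

print "[$Variable1] [$Variable2]\n";
print scalar(@all), " fields without a limit\n";
```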


------------------------------

Date: Wed, 25 Aug 2004 07:47:58 -0400
From: "Rafal Konopka" <rafalk@comcast.net>
Subject: Re: split question
Message-Id: <nKydnVExwdB-4bHcRVn-pA@comcast.com>

> Yet another syntactic variant:
>
> ( $Variable1, $Variable2 ) = map {$`,$'} $Variable =~ /\s+/;
>
> Rafal

I personally use either the regex or split!

However, I did not think it was a bad solution because:

1. it plainly shows what's what
2. Having tried it once or twice, I never noticed any performance penalty
(which leads me to believe that on small files, it's probably negligible)

Could someone explain to me why the $`/$' incurs the penalty and
     ( $Variable1, $Variable2 ) = $Variable =~ /(\S+)\s+(.+)/;
doesn't. I really do not know.

BTW, I have read perlre and perlvar.  I never noticed any explanation as to
WHY there is the penalty.

Rafal
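
On the question of why: as far as the warning in perlre goes, the cost is
not in the one match that uses $` or $'. Once perl sees any of $&, $` or
$' anywhere in the program, it has to provide them for every pattern
match in the program, whereas capturing parentheses only cost something
for the regex that contains them. perlvar documents equivalents built
from @- and @+ that avoid the program-wide effect. A sketch using the
sample string from this thread:

```perl
use strict;
use warnings;

my $Variable = "abc def ghi jkl";
my ($Variable1, $Variable2);

if ($Variable =~ /\s+/) {
    # $-[0] is the offset where the match started, $+[0] where it ended.
    $Variable1 = substr($Variable, 0, $-[0]);   # same text as $`
    $Variable2 = substr($Variable, $+[0]);      # same text as $'
}

print "[$Variable1] [$Variable2]\n";
```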




------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 6908
***************************************
