
Perl-Users Digest, Issue: 2652 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon Oct 26 16:09:42 2009

Date: Mon, 26 Oct 2009 13:09:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 26 Oct 2009     Volume: 11 Number: 2652

Today's topics:
    Re: 'use' command being executed <justin.0908@purestblue.com>
    Re: 'use' command being executed <uri@StemSystems.com>
    Re: Alternatives to LWP::Parallel <glex_no-spam@qwest-spam-no.invalid>
    Re: Alternatives to LWP::Parallel <pete.sundstrom@gmail.com>
    Re: How to prevent hanging when writing lots of text to <jl_post@hotmail.com>
    Re: How to prevent hanging when writing lots of text to <ben@morrow.me.uk>
    Re: How to prevent hanging when writing lots of text to <jl_post@hotmail.com>
    Re: How to prevent hanging when writing lots of text to <jl_post@hotmail.com>
    Parse nodes in a XML file for comparison to CGI posted  <richardk.cj@gmail.com>
    Re: Parse nodes in a XML file for comparison to CGI pos <rkb@i.frys.com>
    Re: Parse nodes in a XML file for comparison to CGI pos <rkb@i.frys.com>
    Re: Perl bioinformatics <cartercc@gmail.com>
    Re: Perl bioinformatics <uri@StemSystems.com>
    Re: Perl bioinformatics <ben@morrow.me.uk>
    Re: Perl bioinformatics (Bradley K. Sherman)
    Re: Perl bioinformatics <OJZGSRPBZVCX@spammotel.com>
    Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Mon, 26 Oct 2009 16:00:49 -0000
From: Justin C <justin.0908@purestblue.com>
Subject: Re: 'use' command being executed
Message-Id: <97f.4ae5c7b1.2ff18@zem>

On 2009-10-22, jhavero <jhavero@gmail.com> wrote:
> The 'use' command below tries to run when this is executed from Unix
> so I get an error that it cannot find the  Win32::File module. If I
> comment out this 'use' line the script works in Unix and the 'print
> "Unix"' statement runs and the 'print "Windows"' statement never runs.

unless ($os eq "MSWin32") {
    use ...
}

Now you just need to figure out what your operating system is. I'll give
you a clue: it's a frequently asked question.

	Justin.

-- 
Justin C, by the sea.


------------------------------

Date: Mon, 26 Oct 2009 12:29:58 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: 'use' command being executed
Message-Id: <87eioqqfsp.fsf@quad.sysarch.com>

>>>>> "JC" == Justin C <justin.0908@purestblue.com> writes:

  JC> On 2009-10-22, jhavero <jhavero@gmail.com> wrote:
  >> The 'use' command below tries to run when this is executed from Unix
  >> so I get an error that it cannot find the  Win32::File module. If I
  >> comment out this 'use' line the script works in Unix and the 'print
  >> "Unix"' statement runs and the 'print "Windows"' statement never runs.

  JC> unless ($os eq "MSWin32") {
  JC>     use ...
  JC> }

  JC> Now you just need to figure out what your operating system is. I'll give
  JC> you a clue, it's a frequently asked question.

and did you test this? did you read the other posts in the thread about
why that is wrong? that use line will ALWAYS execute regardless of the
OS involved.
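Uri's point is that `use` happens at compile time, before any run-time
condition is tested. A minimal sketch of the usual workaround, deferring
the load to run time with require/import (or using the core `if` pragma):

```perl
use strict;
use warnings;

# 'use Win32::File' would execute at compile time, even inside unless().
# Deferring to run time with require/import loads the module only when
# the condition actually holds:
if ($^O eq 'MSWin32') {
    require Win32::File;
    Win32::File->import;
}

# Equivalent declarative form, using the core 'if' pragma:
# use if $^O eq 'MSWin32', 'Win32::File';

print "Running on $^O\n";
```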

uri

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Mon, 26 Oct 2009 12:02:31 -0500
From: "J. Gleixner" <glex_no-spam@qwest-spam-no.invalid>
Subject: Re: Alternatives to LWP::Parallel
Message-Id: <4ae5d628$0$48224$815e3792@news.qwest.net>

Peter wrote:
> I'm in the process of writing a script that will take a whole bunch of
> URL's and get their HTTP status.  I was going to use
> LWP::Parallel::UserAgent to handle parallel HTTP requests.  However, I
> found that this module does not work with later versions of libwww and
> it looks like this module is no longer maintained.
> 
> So I started looking around at other methods/modules I could use.
> There are certainly quite a few to choose from and some are quite
> complex for my needs.
> 
> Some of the modules I looked at were:

Why not simply Parallel::ForkManager and LWP::UserAgent?

[...]
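That combination might look something like this (a sketch only; it
assumes Parallel::ForkManager and LWP::UserAgent are installed from
CPAN, and the URL list is hypothetical):

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use LWP::UserAgent;

# Hypothetical URL list; any list of URLs would do.
my @urls = ('http://www.example.com/', 'http://www.example.org/');

my $pm = Parallel::ForkManager->new(10);   # at most 10 children at once

# Collect each child's result as it exits.
$pm->run_on_finish(sub {
    my ($pid, $exit, $url, $signal, $core, $status_ref) = @_;
    print "$url => $$status_ref\n" if $status_ref;
});

for my $url (@urls) {
    $pm->start($url) and next;             # parent: spawn and move on
    my $ua     = LWP::UserAgent->new(timeout => 10);
    my $status = $ua->head($url)->code;    # HEAD is enough for a status check
    $pm->finish(0, \$status);              # child exits, passing status back
}
$pm->wait_all_children;
```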


------------------------------

Date: Mon, 26 Oct 2009 11:15:53 -0700 (PDT)
From: Peter <pete.sundstrom@gmail.com>
Subject: Re: Alternatives to LWP::Parallel
Message-Id: <3a29d470-6a6d-4d43-b118-ea328d36e046@e4g2000prn.googlegroups.com>

On 27 Oct, 06:02, "J. Gleixner" <glex_no-s...@qwest-spam-no.invalid>
wrote:
> Peter wrote:
> > I'm in the process of writing a script that will take a whole bunch of
> > URL's and get their HTTP status.  I was going to use
> > LWP::Parallel::UserAgent to handle parallel HTTP requests.  However, I
> > found that this module does not work with later versions of libwww and
> > it looks like this module is no longer maintained.
>
> > So I started looking around at other methods/modules I could use.
> > There are certainly quite a few to choose from and some are quite
> > complex for my needs.
>
> > Some of the modules I looked at were:
>
> Why not simply Parallel::ForkManager and LWP::UserAgent?
>
> [...]


Ah, now that looks to be just what I need.  I don't know how I managed
to miss that one.  Thanks.


------------------------------

Date: Mon, 26 Oct 2009 08:46:01 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: How to prevent hanging when writing lots of text to a pipe?
Message-Id: <f28c2aa7-c40f-4709-8028-4620198804e4@u36g2000prn.googlegroups.com>

On Oct 23, 2:16 pm, Ben Morrow <b...@morrow.me.uk> wrote:
>
> A pipe doesn't report EOF until there are no handles on it opened for
> writing. The parent still has its write handle open, and for all the OS
> knows it might be wanting to write to the pipe too.

   Makes sense.
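Ben's point can be seen in miniature: the reader only gets EOF once the
parent closes its own copy of the write end (a sketch for Unix-like
systems, with a made-up three-line payload):

```perl
use strict;
use warnings;

# A pipe reports EOF only when *no* handle open for writing remains.
pipe(my $read, my $write) or die "pipe: $!";

my @lines;
my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid) {                      # parent: reads
    close $write;                # essential, or <$read> would block forever
    @lines = <$read>;            # returns once the child exits (EOF)
    waitpid $pid, 0;
    print scalar(@lines), " lines received\n";
} else {                         # child: writes
    close $read;
    print {$write} "$_\n" for 1 .. 3;
    exit 0;                      # the child's write end closes here
}
```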


> You can use IO::Scalar under 5.6. (Indeed, you could simply use
> IO::Scalar under all versions of perl: it's a little less efficient and
> a little less pretty than the PerlIO-based solution in 5.8, but it will
> work just fine.)

   That's a great suggestion!  To use IO::Scalar in my program, I had
to create two new IO::Scalars: one to write to, and one to read from.
I edited my sample program to be:


#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
   use IO::Scalar;

   my $output;
   my $writeHandle = new IO::Scalar(\$output);

   # Populate $output:
   print $writeHandle "$_\n"  foreach 1 .. $number;
   close($writeHandle);

   # Populate @lines with the lines in $output:
   my $readHandle = new IO::Scalar(\$output);
   <$readHandle>
};

print "Extracted output lines:\n @lines";

__END__


For some reason my program wouldn't work with just one IO::Scalar.
Regardless, it works perfectly now, and without the need to fork a new
process.

   Thanks again for your excellent response, Ben.  Your advice was
very helpful.

   -- Jean-Luc


------------------------------

Date: Mon, 26 Oct 2009 16:39:20 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to prevent hanging when writing lots of text to a pipe?
Message-Id: <ouvgr6-v502.ln1@osiris.mauzo.dyndns.org>


Quoth "jl_post@hotmail.com" <jl_post@hotmail.com>:
> On Oct 23, 2:16 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> 
> > You can use IO::Scalar under 5.6. (Indeed, you could simply use
> > IO::Scalar under all versions of perl: it's a little less efficient and
> > a little less pretty than the PerlIO-based solution in 5.8, but it will
> > work just fine.)
> 
>    That's a great suggestion!  To use IO::Scalar in my program, I had
> to create two new IO::Scalars: one to write to, and one to read from.
> I edited my sample program to be:
> 
> 
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> print "Enter a number: ";
> my $number = <STDIN>;
> chomp($number);
> 
> my @lines = do
> {
>    use IO::Scalar;
> 
>    my $output;
>    my $writeHandle = new IO::Scalar(\$output);
> 
>    # Populate $output:
>    print $writeHandle "$_\n"  foreach 1 .. $number;
>    close($writeHandle);
> 
>    # Populate @lines with the lines in $output:
>    my $readHandle = new IO::Scalar(\$output);
>    <$readHandle>

Um, there's no need for this. Just use

    split /\n/, $output;

> For some reason my program wouldn't work with just one IO::Scalar.

Probably you have forgotten that you need to rewind the filehandle after
writing and before reading.

Ben



------------------------------

Date: Mon, 26 Oct 2009 12:05:58 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: How to prevent hanging when writing lots of text to a pipe?
Message-Id: <65eee6c5-4a21-49bf-8308-5d0e0139817a@m33g2000pri.googlegroups.com>

> Quoth "jl_p...@hotmail.com" <jl_p...@hotmail.com>:
>
> >    # Populate @lines with the lines in $output:
> >    my $readHandle = new IO::Scalar(\$output);
> >    <$readHandle>

On Oct 26, 10:39 am, Ben Morrow <b...@morrow.me.uk> wrote:
>
> Um, there's no need for this. Just use
>
>     split /\n/, $output;

   That doesn't do the same thing.  Splitting on /\n/ removes the
newlines from the entries.

   I could have used this instead:

      split m/(?<=\n)(?!\z)/, $output;

That way the $output is split after each newline, but only if that
newline is not the last character of $output.  (All newlines would be
retained with their lines.)

   I'm not sure which is faster or more efficient, but I figured I'd
avoid the look-behind and negative look-ahead, and instead use the
(more familiar) diamond operator on a file handle to split out each
line.
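The difference between the two split variants can be checked directly
(a small sketch with made-up data):

```perl
use strict;
use warnings;

my $output = "alpha\nbeta\n";

# Plain split strips the newlines from each element:
my @plain = split /\n/, $output;                 # ("alpha", "beta")

# Splitting *after* each newline (except a trailing one) keeps them:
my @kept = split m/(?<=\n)(?!\z)/, $output;      # ("alpha\n", "beta\n")

print "plain: @plain\n";
```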


> > For some reason my program wouldn't work with just one IO::Scalar.

> Probably you have forgotten that you need to rewind the filehandle after
> writing and before reading.

   Ah, you're right again.  Now I can avoid the second IO::Scalar and
use a seek() call instead:


#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
   use IO::Scalar;
   my $output;
   my $handle = new IO::Scalar(\$output);

   # Print the lines into the $handle:
   print $handle "$_\n"  foreach 1 .. $number;

   # Now rewind the handle and put its lines into @lines:
   seek($handle, 0, 0);
   <$handle>
};

print "Extracted output lines:\n @lines";

__END__


   Thanks once again, Ben.

   -- Jean-Luc


------------------------------

Date: Mon, 26 Oct 2009 12:31:45 -0700 (PDT)
From: "jl_post@hotmail.com" <jl_post@hotmail.com>
Subject: Re: How to prevent hanging when writing lots of text to a pipe?
Message-Id: <dcc68292-3eaa-4067-b7ee-1af9f81c064f@w37g2000prg.googlegroups.com>

On Oct 23, 8:44 pm, "C.DeRykus" <dery...@gmail.com> wrote:
>
> Hm, if there's no IPC involved, can't you simply populate
> an array directly...eliminating filehandles, Perl version
> worries, and  the 'do' statement completely. Did I miss
> something else?

   I left out a few details, such as the fact that the routine I'm
calling writes to a filehandle and contains over a thousand lines of
code.  (The routine is much larger than the original "foreach" loop I
used as an example.)  I could go through all the code and change it so
that it pushes its lines onto an array, but then I'd have to change
all the code that calls that routine as well.

   Or I could make a copy of that routine and change only that copy,
but then any changes (major and minor) made to the original routine
would have to be made a second time in the new routine.  (I'd rather
not maintain two almost identical large routines, if it can be
avoided.)

   Of course, I could just hand the routine a write-filehandle to a
temporary file on disk, but since I'd just have to read the file
contents back in, I'd rather just skip that step and avoid the disk I/
O altogether.  (Plus, there's no guarantee that the user has
permission to write to a temporary file outside of /tmp.)

   Ideally, I would like to be able to write to a filehandle that
didn't require disk I/O.  Creating a pipe() accomplishes that, but as
I mentioned before, it requires a fork() process to properly avoid
hanging the program.

   The other solutions are to use open() to write to a scalar (which
works, but only on Perl 5.8 and later) or to use IO::Scalar (which
should work on Perl 5.6 and later).  So that's why I'm currently
sticking with IO::Scalar.
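For reference, the 5.8-and-later form mentioned above is just open()
onto a scalar reference, with no module at all (a minimal sketch):

```perl
use strict;
use warnings;

# Perl 5.8+: PerlIO can open a filehandle directly onto a scalar.
my $output;
open my $writer, '>', \$output or die "open: $!";
print {$writer} "$_\n" for 1 .. 3;
close $writer;

# Re-open (or seek) to read the lines back without touching disk:
open my $reader, '<', \$output or die "open: $!";
my @lines = <$reader>;
close $reader;

print "got ", scalar(@lines), " lines\n";
```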

   If you know of a better way, let me know.  (There may be an obvious
way I'm just not seeing.)

   -- Jean-Luc


------------------------------

Date: Mon, 26 Oct 2009 12:10:05 -0700 (PDT)
From: ricky <richardk.cj@gmail.com>
Subject: Parse nodes in a XML file for comparison to CGI posted variable.
Message-Id: <77bf0e0c-2693-4514-afab-7bc32c7854af@h40g2000prf.googlegroups.com>

I hope I can explain this well. I have a Perl script that sends out
pages using Mail::Sendmail. I'm now trying to create a subroutine to
record (log) these pages that are sent out to a text file.
So, I have the following XML file; here's a snippet of it:

File: group.xml

<?xml version="1.0" encoding="utf-8"?>
<GroupListing>
  <Group>
    <Name>SUPPORT TEAM</Name>
    <Pager>3123454389@archwireless.net   bsa-dss-support@abc.com</Pager>
  </Group>
  <Group>
    <Name>DI Connect</Name>
    <Pager>6084530243@archwireless.net</Pager>
  </Group>
  <Group>
    <Name>MO Messenger</Name>
    <Pager>6164561136@archwireless.net</Pager>
  </Group>
  <Group>
    <Name>DM Support</Name>
    <Pager>2151230924@archwireless.net</Pager>
  </Group>
  <Group>
    <Name>Domino Support</Name>
    <Pager>3123451018@archwireless.net  tomjones@abc.com tri0851@gmail.com</Pager>
  </Group>
  <Group>
    <Name>FILES APP SUPPORT</Name>
    <Pager>212670897@archwireless.net</Pager>
  </Group>
</GroupListing>

As you can see, within every <Group> element there are two nodes, <Name>
and <Pager>, and some of the <Pager> nodes can have more than one entry
within them.

Now, I have the following Perl subroutine that needs some modification
to do what I need:

############################
sub writeLog {
    my $xml = '/data/group.xml';
    $date = localtime(time);
    open(LOG, ">>PageLog.txt") || die "Cannot open PageLog.txt: $!";
    my $simple = XMLin($xml);
    for my $search (qw /3123451018@archwireless.net/) {
        my $groups = $simple->{'Group'};
        foreach my $group (@{$groups}) {
            print LOG "$date|$group->{'Name'}|$namnum|$body\n"
                if $group->{'Pager'} =~ m/\Q$search\E/gi;
        }
    }
    close (LOG);
}


What's passed from the HTML page to the Perl CGI code is only the
<Pager> node information, which is called "$to_group", not the <Name>.

In this line of Perl code:
for my $search (qw /3123451018@archwireless.net/)

I hardcoded some <Pager> data to match against the group.xml file.
Then I print out to a text file the <Name> which matches the <Pager>
data.

Now, what I need is to change this line:
for my $search (qw /3123451018@archwireless.net/)

to accept a variable from the CGI posting, find the match in the
group.xml, and pull out the <Name> to be printed to the text file.

thanks!!




------------------------------

Date: Mon, 26 Oct 2009 12:43:29 -0700 (PDT)
From: Ron Bergin <rkb@i.frys.com>
Subject: Re: Parse nodes in a XML file for comparison to CGI posted variable.
Message-Id: <8750bc60-0679-4d88-afa3-f369dfe3f382@f20g2000prn.googlegroups.com>

On Oct 26, 11:10 am, ricky <richardk...@gmail.com> wrote:
> I hope I can explain this well, I have a Perl script that sends out
> pages using:use Mail::Sendmail;
> I'm now trying to create a subroutine to record or log these pages
> that are sent out to a textfile.
> So, I have the following XML file and here's a snippet of the file:
>
> File: group.xml
>
> <?xml version="1.0" encoding="utf-8"?>
> <GroupListing>
>   <Group>
>     <Name>SUPPORT TEAM</Name>
>     <Pager>3123454...@archwireless.net   bsa-dss-supp...@abc.com</Pager>
>   </Group>
>   <Group>
>     <Name>DI Connect</Name>
>     <Pager>6084530...@archwireless.net</Pager>
>   </Group>
>   <Group>
>     <Name>MO Messenger</Name>
>     <Pager>6164561...@archwireless.net</Pager>
>   </Group>
>   <Group>
>     <Name>DM Support</Name>
>     <Pager>2151230...@archwireless.net</Pager>
>   </Group>
>   <Group>
>     <Name>Domino Support</Name>
>     <Pager>3123451...@archwireless.net  tomjo...@abc.com tri0...@gmail.com</Pager>
>   </Group>
>   <Group>
>     <Name>FILES APP SUPPORT</Name>
>     <Pager>212670...@archwireless.net</Pager>
>   </Group>
> </GroupListing>
>
> As you can see within every <Group> element there are two nodes <Name> &
> <Pager>. And as you can see some of the <Pager> nodes can have more
> than one entry within them.
>
> Now, I have the following Perl code subroutine that needs some
> modifications to what I need to solve.
>
> ############################
> sub writeLog {
> my $xml =3D '/data/group.xml';
> $date =3D localtime(time);
> open(LOG, ">>PageLog.txt") || die "Cannot open PageLog.txt: $!";
> my $simple =3D XMLin($xml);
> for my $search (qw /3123451...@archwireless.net/) {
> =A0 =A0 my $groups =3D $simple->{'Group'};
> =A0 =A0 foreach my $group (@{$groups}) {
> =A0 =A0 print LOG "$date|$group->{'Name'}|$namnum|$body\n" if $group->
> {'Pager'} =3D~ m/\Q$search\E/gi;
> =A0 =A0 =A0 }
> =A0 =A0 }
> close (LOG);
>
> }
>
> What's passed from the HTML page to the Perl-CGI code is only the
> <Pager> node information which is called: "$to_group", not the <Name>.
>
> In this line of Perl code:
> for my $search (qw /3123451...@archwireless.net/)
>
> I hardcoded some <Pager> data to match against the group.xml file.
> Then I print out to a text file the <Name> which matches the <Pager>
> data.
>
> Now, what I need is to change this line:
> for my $search (qw /3123451...@archwireless.net/)
>
> to accept a variable from the CGI posting and find the match in the
> group.xml and pull out the <Name> to be printed to the text file.
>
> thanks!!


#!/usr/bin/perl

use strict;
use warnings;
use CGI;
use Mail::Sendmail;
use XML::Simple;

my $xml      = 'group.xml';
my $simple   = XMLin($xml);
my $cgi      = CGI->new;
my $to_group = $cgi->param('to_group');
my $body     = $cgi->param('YourMessage');
my $namnum   = $cgi->param('YourName') . '@' . $cgi->param('YourNumber');
my $groups   = $simple->{Group};

foreach my $group ( @$groups ) {
    if ( $group->{'Pager'} =~ /^\Q$to_group\E/ ) {
        my %mail;
        my @to = split /\s+/, $group->{'Pager'};
        foreach my $recipient (@to)  {
            $mail{To}      = $recipient;
            $mail{Subject} = $namnum;
            $mail{Message} = $body;
            sendmail %mail;
        }
        writeLog($group->{'Name'}, $namnum, $body);
        last;
    }
}

sub writeLog {
    my ($name, $name_num, $body) = @_;
    my $date = localtime(time);

    open my $LOG, '>>', 'AlphaPageLog.txt'
      or die "Cannot open AlphaPageLog.txt: $!";

    print $LOG "$name|$date|$name_num|$body\n";

    close $LOG;
}


------------------------------

Date: Mon, 26 Oct 2009 12:56:21 -0700 (PDT)
From: Ron Bergin <rkb@i.frys.com>
Subject: Re: Parse nodes in a XML file for comparison to CGI posted variable.
Message-Id: <5a9831f5-15cb-458b-8c20-d9e2004cbd3f@z4g2000prh.googlegroups.com>

On Oct 26, 11:43 am, Ron Bergin <r...@i.frys.com> wrote:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use CGI;
> use Mail::Sendmail;
> use XML::Simple;
>
> my $xml      = 'group.xml';
> my $simple   = XMLin($xml);
> my $cgi      = CGI->new;
> my $to_group = $cgi->param('to_group');
> my $body     = $cgi->param('YourMessage');
> my $namnum   = $cgi->param('YourName') . '@' . $cgi->param('YourNumber');
> my $groups   = $simple->{Group};
>
> foreach my $group ( @$groups ) {
>     if ( $group->{'Pager'} =~ /^\Q$to_group\E/ ) {
>         my %mail;
>         my @to = split /\s+/, $group->{'Pager'};
>         foreach my $recipient (@to)  {
>             $mail{To}      = $recipient;
>             $mail{Subject} = $namnum;
>             $mail{Message} = $body;
>             sendmail %mail;
>         }
>         writeLog($group->{'Name'}, $namnum, $body);
>         last;
>     }
>
I should point out that this has the minimum level of error handling
and lacks file locking.  I'll leave those issues to the OP to figure
out.

ricky, this is just a minor adjustment to what I provided in your EE
question.
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24841328.html?cid=1131#a25659537
> }
>
> sub writeLog {
>     my ($name, $name_num, $body) = @_;
>     my $date = localtime(time);
>
>     open my $LOG, '>>', 'AlphaPageLog.txt'
>       or die "Cannot open AlphaPageLog.txt: $!";
>
>     print $LOG "$name|$date|$name_num|$body\n";
>
>     close $LOG;
>
> }



------------------------------

Date: Mon, 26 Oct 2009 09:00:49 -0700 (PDT)
From: ccc31807 <cartercc@gmail.com>
Subject: Re: Perl bioinformatics
Message-Id: <56bcf5e0-d0cf-4de0-bbef-6b6fd07236ed@p23g2000vbl.googlegroups.com>

On Oct 26, 10:45 am, b...@panix.com (Bradley K. Sherman) wrote:
> >The usual problem is the huge volume of data that needs processing.
> >Therefore typically the standard algorithms don't work any more and you
> >need a really strong background in data processing.
>
> This is not really fair.  Most of bioinformatics is data wrangling
> and Perl is exactly the right choice for that.

In my day job, I deal with data files on the order of several hundred
thousand records. The scripts I write to produce reports from these
data files sometimes take a second (or several seconds) to run. The
data file I have for the bioinformatics project is much larger, but is
a lot simpler (it's a dotplot file).

Sometimes, data files can be so huge that the script just breaks.
Sometimes, the script just runs longer than you might expect.
Obviously, the longer time really isn't a problem ... there's no
difference between a script that runs in microseconds and one that
runs in minutes (say, between 60 and 120) ... as long as the script
runs to completion.

I'm sympathetic to jue's observation about the scaling problem, but
after having looked at the data, the fact that it's genomic or
biological is totally irrelevant. It's really the amount of data
rather than the kind of data that seems to be significant.

You seem to have a handle on what's going on. Is using Perl for
bioinformatics totally off the wall, or a reasonable option for data
mangling?

CC


------------------------------

Date: Mon, 26 Oct 2009 12:25:21 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: Perl bioinformatics
Message-Id: <87ocnuqg0e.fsf@quad.sysarch.com>

>>>>> "JE" == Jürgen Exner <jurgenex@hotmail.com> writes:

  JE> ccc31807 <cartercc@gmail.com> wrote:
  >> I'm not changing jobs, but I've been contacted about some contract
  >> opportunities that (reportedly) are difficult but seem easy enough to
  >> me, manipulating genome files to produce various kinds of reports,
  >> graphs, etc. I have zero experience in this, so I'm just wondering ...

  JE> The usual problem is the huge volume of data that needs processing.
  JE> Therefore typically the standard algorithms don't work any more and you
  JE> need a really strong background in data processing. 
  JE> Perl is not necessarily the best choice here. Perl's powerful features
  JE> make it easy to write code that seems to do the job, but it won't scale
  JE> from the small test samples to the huge actual data set where you really
  JE> need special methods and optimizations.

  JE> A little while ago there was someone posting questions here regularly
  JE> about how to deal with genom sequences. If don't know if he is still
  JE> around, but maybe you can check the archives and contact him.

i will disagree on this. first off, perl is major in the biotech world
for several reasons. one, it is the best at text processing, and most
large genetic files are just plain text formats. secondly, there is a
large package called bioperl (with its own mailing list and community)
that does tons of standard things on those files and more. finally, if
you look back a bit, there is a great article called 'how perl saved the
human genome project'. when that project was initially running it was
distributed over many labs worldwide, and they created many new
incompatible file formats for the data. the author of cgi.pm (who is
really an MD and genetic researcher) designed perl modules to convert
those formats to a common set of core formats so they could easily
exchange data. so perl has a strong tie to the biotech industry that is
not likely to be broken for a long while.

as for jobs, i don't see many leads in that industry but they are
usually looking for direct experience in it (hard to get from the
outside) and/or higher degrees in related fields because you would be
working in such an environment where you need it.

so if the OP can learn enough from books and practice to get a job in
the field, i say go for it. there may be other hurdles to jump but i
can't predict what they will be.

uri
perlhunter.com (so i know something about the perl job market)

-- 
Uri Guttman  ------  uri@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------


------------------------------

Date: Mon, 26 Oct 2009 16:42:08 +0000
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Perl bioinformatics
Message-Id: <040hr6-v502.ln1@osiris.mauzo.dyndns.org>


Quoth ccc31807 <cartercc@gmail.com>:
>
> You seem to have a handle on what's going on. Is using Perl for
> bioinformatics totally off the wall, or a reasonable option for data
> mangling?

The people who maintain the BioPerl distributions on CPAN seem to think
it's a decent choice of language. See also
http://use.perl.org/~Alias/journal/39783 .

Ben



------------------------------

Date: Mon, 26 Oct 2009 16:53:19 +0000 (UTC)
From: bks@panix.com (Bradley K. Sherman)
Subject: Re: Perl bioinformatics
Message-Id: <hc4k5v$e5s$1@reader1.panix.com>

In article <56bcf5e0-d0cf-4de0-bbef-6b6fd07236ed@p23g2000vbl.googlegroups.com>,
ccc31807  <cartercc@gmail.com> wrote:
> ...
>You seem to have a handle on what's going on. Is using Perl for
>bioinformatics totally off the wall, or a reasonable option for data
>mangling?
>

I think that Perl is the primary language for bioinformatics.
I can't back that up with numbers but I have been working in
bioinformatics since 1992.  Some of the younger bioinformaticians
might want to make a case for Python, but I'm skeptical.

My philosophy is to use Perl until it becomes necessary to
write something in C.  It rarely becomes necessary.

Learning databases and statistics is also of great importance.

    --bks



------------------------------

Date: Mon, 26 Oct 2009 18:40:13 +0100
From: "Jochen Lehmeier" <OJZGSRPBZVCX@spammotel.com>
Subject: Re: Perl bioinformatics
Message-Id: <op.u2e4pbf6mk9oye@frodo>

On Mon, 26 Oct 2009 17:00:49 +0100, ccc31807 <cartercc@gmail.com> wrote:

> You seem to have a handle on what's going on. Is using Perl for
> bioinformatics totally off the wall, or a reasonable option for data
> mangling?

I have no idea about bioinformatics, but Perl is easy enough that you  
should be able to get a book, jot down a quick & dirty test script and  
just sic it on your biggest and meanest data set.

Then you get a quick handle on how long basic stuff takes. If it works  
fast enough, fine; if not, feel free to ask here. And if you find that  
it's just not the right tool, then you won't have lost much.

IMO, the deal breaker will be if you have to handle data in an O(n^2)  
fashion (or worse), i.e. where one would really use some very special  
index structure, especially if the whole data set does not fit into RAM.

Good luck!


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

For other requests pertaining to the digest, send mail to
perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
sending perl questions to the -request address, I don't have time to
answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 2652
***************************************

