[31879] in Perl-Users-Digest


Perl-Users Digest, Issue: 3142 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Sep 25 14:09:31 2010

Date: Sat, 25 Sep 2010 11:09:13 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 25 Sep 2010     Volume: 11 Number: 3142

Today's topics:
        complex problem <ela@yantai.org>
    Re: complex problem <willem@turtle.stack.nl>
    Re: complex problem <xhoster@gmail.com>
    Re: complex problem <ela@yantai.org>
    Re: FAQ 4.8 How do I perform an operation on a series o (David Canzi)
        How to initialize a referenced array? <feltra@gmail.com>
    Re: How to initialize a referenced array? <ben@morrow.me.uk>
    Re: How to initialize a referenced array? <feltra@gmail.com>
        how to solve when no root privilege? <ela@yantai.org>
    Re: how to solve when no root privilege? <sherm.pendley@gmail.com>
        Long script "just stops" sometime <jerrykrinock@gmail.com>
    Re: Long script "just stops" sometime <ben@morrow.me.uk>
    Re: Need Regex for phone number <m@rtij.nl.invlalid>
    Re: Need Regex for phone number <monkey@joemoney.net>
    Re: Need Regex for phone number <tadmc@seesig.invalid>
    Re: Removing tag + closing tag <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 24 Sep 2010 19:52:38 -0700
From: "ela" <ela@yantai.org>
Subject: complex problem
Message-Id: <i7i3dq$hna$1@ijustice.itsc.cuhk.edu.hk>

I have a hundred files that have predefined columns but an unknown number of 
rows:

for example, a row from file1 is like this (delimited by \t)

chr19   56837617    56841944    SIGLEC14(NM_001098612,expression:8.88665993183852)  0   -   56837617    56841944    255,12,12   1   4327    0

What I have to do is build a big table containing this information:

Name\tExpression_file1\tExpression_file2\t ... Expression_file99\tExpression_file100\n
NM_001098612\t8.88665993183852\t ...\n
 ...

While using a regular expression to extract the information is easy, and
building the table with an associative array is also simple, the main
problem is padding the previous columns that have no records. While the
current column can be safely padded, the previous columns have to be looked
up more and more. After trying "for-loops", recursion seems to be the
solution, but the recursive routine turns out to be more difficult than I
thought... 




------------------------------

Date: Fri, 24 Sep 2010 14:17:44 +0000 (UTC)
From: Willem <willem@turtle.stack.nl>
Subject: Re: complex problem
Message-Id: <slrni9pco8.1nnc.willem@turtle.stack.nl>

ela wrote:
) I have a hundred files that have predefined columns but an unknown number of 
) rows:
<snip>
) While using a regular expression to extract the information is easy, and
) building the table with an associative array is also simple, the main
) problem is padding the previous columns that have no records.

Easy.  Do two passes over the data.
First pass finds max number of columns, second pass does all the padding.
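A minimal sketch of the two-pass idea in Perl, assuming the rows are already in memory as an array of array references (the sample rows are made up): pass one finds the widest row, pass two pads every shorter row with empty cells.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical ragged rows; in practice these would come from the input files.
my @rows = (
    [ 'NM_001098612', '8.88665993183852' ],
    [ 'NM_000001' ],
    [ 'NM_000002', '1.5', '2.5' ],
);

# Pass 1: find the maximum number of columns in any row.
my $width = max map { scalar @$_ } @rows;

# Pass 2: pad every row with empty strings up to that width.
for my $row (@rows) {
    push @$row, ('') x ($width - @$row);
}

print join("\t", @$_), "\n" for @rows;
```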


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Fri, 24 Sep 2010 19:53:43 -0700
From: Xho Jingleheimerschmidt <xhoster@gmail.com>
Subject: Re: complex problem
Message-Id: <4c9d740d$0$3766$ed362ca5@nr5-q3a.newsreader.com>

ela wrote:
> I have a hundred files that have predefined columns but an unknown number of 
> rows:
> 
> for example, a row from file1 is like this (delimited by \t)
> 
> chr19   56837617    56841944 
> SIGLEC14(NM_001098612,expression:8.88665993183852)  0   -   56837617 
> 56841944    255,12,12   1   4327    0

Word wrap makes this rather difficult to read.

> 
> What I have to do is build a big table containing this information:
> 
> Name\tExpression_file1\tExpression_file2\t ... Expression_file99\tExpression_file100\n
> NM_001098612\t8.88665993183852\t ...\n
> ...
> 
> While using a regular expression to extract the information is easy, and 
> building the table with an associative array is also simple, the main 
> problem is padding the previous columns that have no records.

Why is that a problem?  If all the files are passed in on @ARGV, and 
each file of input is turned into a new column in the output, then you 
already know what all the columns in the output are going to be, right 
up front.


> While the
> current column can be safely padded, the previous columns have to be looked
> up more and more. After trying "for-loops", recursion seems to be the
> solution, but the recursive routine turns out to be more difficult than I
> thought... 

I don't understand how recursion could plausibly be useful here.

Anyway, what I often find myself doing is using two hashes.

my %exp;
my %sample;
while (<>) {
   my ($refseq,$expression,$sample)=parse_however($_);
   $exp{$refseq}{$sample}=$expression;
   $sample{$sample}=();
};

Now %sample contains an entry for every sample/tissue/file which has at 
least one second-level entry in %exp.

Of course you could have reversed the nesting order of the keys, 
$exp{$sample}{$refseq}, but I assume that would be inconvenient for 
other reasons, or you would have done it already.
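A runnable sketch of the two-hash approach, extended to print the padded table. Assumptions: parse_row is a hypothetical stand-in for parse_however that looks for a NAME(REFSEQ,expression:VALUE) field like the one in the original post, add_line and table_lines are made-up helper names, and the sample name is whatever the caller passes in (with real files, $ARGV would be a natural choice). Because %sample lists every column up front, each row can be padded with an empty cell wherever a refseq has no value for a sample, with no recursion needed.

```perl
use strict;
use warnings;

my (%exp, %sample);

# Hypothetical stand-in for parse_however(): pulls the RefSeq ID and the
# expression value out of a field like SIGLEC14(NM_001098612,expression:8.88).
sub parse_row {
    my ($line) = @_;
    return unless $line =~ /\(([^,()]+),expression:([^)]+)\)/;
    return ($1, $2);
}

# Record one input line for one sample (file).
sub add_line {
    my ($sample, $line) = @_;
    my ($refseq, $expression) = parse_row($line) or return;
    $exp{$refseq}{$sample} = $expression;
    $sample{$sample} = ();    # remember that this column exists
}

# Every key of %sample is a column; cells missing from %exp become ''.
sub table_lines {
    my @samples = sort keys %sample;
    my @lines = join("\t", 'Name', map { "Expression_$_" } @samples);
    for my $refseq (sort keys %exp) {
        push @lines, join("\t", $refseq,
            map { $exp{$refseq}{$_} // '' } @samples);
    }
    return @lines;
}

# With real files: while (<>) { add_line($ARGV, $_) }
# Here, two made-up lines stand in for two files.
add_line('file1', "chr19\t1\t2\tSIGLEC14(NM_001098612,expression:8.88)\t0");
add_line('file2', "chr1\t1\t2\tFOO(NM_000001,expression:1.5)\t0");
print "$_\n" for table_lines();
```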


Xho


------------------------------

Date: Sat, 25 Sep 2010 20:56:49 -0700
From: "ela" <ela@yantai.org>
Subject: Re: complex problem
Message-Id: <i7kri4$ie6$1@ijustice.itsc.cuhk.edu.hk>

> Anyway, what I often find myself doing is using two hashes.
>
> my %exp;
> my %sample;
> while (<>) {
>   my ($refseq,$expression,$sample)=parse_however($_);
>   $exp{$refseq}{$sample}=$expression;
>   $sample{$sample}=();
> };
>
> Now %sample contains an entry for every sample/tissue/file which has at
> least one second-level entry in %exp.

Sorry, but would you mind elaborating on how the second hash serves this
purpose? I can't quite follow it...





------------------------------

Date: Fri, 24 Sep 2010 18:50:05 +0000 (UTC)
From: dmcanzi@remulak.uwaterloo.ca (David Canzi)
Subject: Re: FAQ 4.8 How do I perform an operation on a series of integers?
Message-Id: <i7irst$n61$1@rumours.uwaterloo.ca>

In article <230920101838100894%brian.d.foy@gmail.com>,
brian d foy  <brian.d.foy@gmail.com> wrote:
>In article <i79cmm$d8k$1@rumours.uwaterloo.ca>, David Canzi
><dmcanzi@remulak.uwaterloo.ca> wrote:
>
>> In article <75Wlo.61884$y85.61853@newsfe13.iad>,
>> PerlFAQ Server  <brian@theperlreview.com> wrote:
>
>> >            for ($i=5; $i < 500_005; $i++) {
>
>> >            for my $i (5 .. 500_005) {
>> 
>> Should be:   for my $i (5 .. 500_004) {
>
>Fixed, although I made the fix up in the for() so the same numbers were
>in each example.

I was trying to get 3 lines to be consistent with each other:

    for ($i=5; $i <= 500_005; $i++) {
    for my $i (5 .. 500_005) {
    will not create a list of 500,000 integers.

The list 5 .. 500_005 contains 500,001 integers.
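The counts are easy to check directly. This small sketch counts the iterations of the C-style loop with <= and of the two ranges; it should report 500,001, 500,001 and 500,000 respectively.

```perl
use strict;
use warnings;

# Count iterations of the C-style loop with <=.
my $c_style = 0;
for (my $i = 5; $i <= 500_005; $i++) { $c_style++ }

# foreach over a range does not build the list, so counting is cheap.
my $range_inclusive = 0;
$range_inclusive++ for 5 .. 500_005;

my $range_fixed = 0;
$range_fixed++ for 5 .. 500_004;

print "$c_style $range_inclusive $range_fixed\n";
```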

Was that overly picky?  I can never tell.

-- 
David Canzi		| "If you can't learn to do something well, learn
			| to enjoy doing it poorly." -- Ashleigh Brilliant


------------------------------

Date: Sat, 25 Sep 2010 09:28:34 -0700 (PDT)
From: feltra <feltra@gmail.com>
Subject: How to initialize a referenced array?
Message-Id: <9a39ee71-40ab-413b-816d-1c941a1c61e0@28g2000yqm.googlegroups.com>

Hi,

I am using arrays only through references in a subroutine.  While I got
the hang of how to access an element of the array using the '->'
operator, I do not know how to initialize this array.  I.e. I want to
be able to do something like

@myarr=();   $#myarr = -1;

inside the subroutine, but myarr is only a reference to an array, not
the actual array...

Hope the above problem description is clear.

If anyone knows how to do this, kindly help by posting the answer or
tell me where to look...

Thanks & Best Regards,
-feltra


------------------------------

Date: Sat, 25 Sep 2010 17:42:07 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: How to initialize a referenced array?
Message-Id: <vbl1n7-dp41.ln1@osiris.mauzo.dyndns.org>


Quoth feltra <feltra@gmail.com>:
> I am using arrays only through references in a subroutine.  While I got
> the hang of how to access an element of the array using the '->'
> operator, I do not know how to initialize this array.  I.e. I want to
> be able to do something like
> 
> @myarr=();   $#myarr = -1;

Assuming $aref holds an array reference, that would be

    @$aref = ();    $#$aref = -1;

but I don't see why you think you need to do this. What are you actually
trying to do?
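A small sketch showing that emptying the array through the reference changes the caller's array; clear_array is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: empties whatever array the reference points at.
sub clear_array {
    my ($aref) = @_;
    @$aref = ();    # same effect as $#$aref = -1;
}

my @data = (1, 2, 3);
clear_array(\@data);
print scalar(@data), "\n";    # prints 0
```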

Ben



------------------------------

Date: Sat, 25 Sep 2010 10:11:36 -0700 (PDT)
From: feltra <feltra@gmail.com>
Subject: Re: How to initialize a referenced array?
Message-Id: <7e369db5-794f-47b3-b1c0-29db8864a070@h7g2000yqn.googlegroups.com>

Thanks for the quick reply, Ben..    It works like a charm!

Basically I am copying one array to another, and within the sub I
wouldn't know which one to initialize (i.e. which is the receiving array)
except through references...

It's possible that there is a better method than copying, but at least
this solution will solve my problem for now.
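If the underlying task is copying, a list assignment through the reference already discards the destination's old contents, so a separate clearing step is probably unnecessary. A sketch, with copy_array as a hypothetical name (this makes a shallow copy):

```perl
use strict;
use warnings;

# Hypothetical helper: copies the source array into the destination array.
# List assignment replaces the old contents, so no explicit clearing is needed.
sub copy_array {
    my ($dst, $src) = @_;
    @$dst = @$src;
}

my @from = qw(a b c);
my @to   = qw(old stuff here and more);
copy_array(\@to, \@from);
print "@to\n";    # prints "a b c"
```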

Thanks a lot & Best Regards,
-feltra


On Sep 25, 9:42 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth feltra <fel...@gmail.com>:
>
> > I am using arrays only through references in a subroutine.  While I got
> > the hang of how to access an element of the array using the '->'
> > operator, I do not know how to initialize this array.  I.e. I want to
> > be able to do something like
>
> > @myarr=();   $#myarr = -1;
>
> Assuming $aref holds an array reference, that would be
>
>     @$aref = ();    $#$aref = -1;
>
> but I don't see why you think you need to do this. What are you actually
> trying to do?
>
> Ben



------------------------------

Date: Sat, 25 Sep 2010 21:19:27 -0700
From: "ela" <ela@yantai.org>
Subject: how to solve when no root privilege?
Message-Id: <i7kssi$j14$1@ijustice.itsc.cuhk.edu.hk>

I'm using a Perl script that needs to access a module, but since the script
is run from different paths, the module is "invisible". Since I don't
have root privileges to put the module in the library path, what can I do
to solve the problem? Duplicating the script and module, or moving all
the data files to the script's directory, would solve the problem, but
both are bad ideas...




------------------------------

Date: Sat, 25 Sep 2010 10:23:41 -0400
From: Sherm Pendley <sherm.pendley@gmail.com>
Subject: Re: how to solve when no root privilege?
Message-Id: <m2vd5tsxzm.fsf@sherm.shermpendley.com>

"ela" <ela@yantai.org> writes:

> I'm using a Perl script that needs to access a module, but since the script
> is run from different paths, the module is "invisible".

"Invisible" in what way? Is your script running in a chroot jail in which
the path is literally invisible? Is it running as a user that does not
have permission to read the path? Or is it simply that the path to the
module isn't in the default @INC?

The last issue is easy, and something you can do in your own scripts -
have a look at 'perldoc lib'.
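For a module kept next to the script, one common pattern combines the core FindBin and lib modules so that the module directory is found no matter which directory the script is run from. The "lib" subdirectory name below is an assumption about the layout; setting the PERL5LIB environment variable is another option that needs no changes to the script.

```perl
use strict;
use warnings;

# FindBin sets $Bin to the directory containing the running script,
# regardless of the current working directory.
use FindBin qw($Bin);

# Assumed layout: modules live in a "lib" directory next to the script.
use lib "$Bin/lib";

# From here on, a (hypothetical) "use My::Module;" would also search "$Bin/lib".
print "$_\n" for @INC;
```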

For the first two, you need to either convince your admin to make the
module path visible (by configuring the jail and/or permissions), or
install the module to a path that is already visible.

sherm--

-- 
Sherm Pendley
                                   <http://camelbones.sourceforge.net>
Cocoa Developer


------------------------------

Date: Fri, 24 Sep 2010 15:11:17 -0700 (PDT)
From: Jerry Krinock <jerrykrinock@gmail.com>
Subject: Long script "just stops" sometime
Message-Id: <03d179ff-8f96-441d-b1e5-d37401ab2b6c@m35g2000prn.googlegroups.com>

I've written a 1500-line script which converts several dozen Markdown
source files to HTML.  It takes several minutes to
run, indicating progress by printf statements.  However, about 20% of
the time, in the middle of processing a Markdown file, it just stops
progressing, as though it is in an infinite loop.  If I kill the
process and restart, it always completes successfully.

My script is, of course, being a script, not particularly efficient.
I was thinking that maybe Perl was running out of memory or something,
although that's not supposed to happen nowadays (Perl 5.10.0, Mac OS X
10.6).  And when I check it in Apple's Activity Monitor during normal
operation, I find that its CPU and memory usage are hardly noticeable,
maybe 3% and a few tens of megabytes.

Are there any conditions under which Perl would "just stop"?

Any suggestions to troubleshoot this would be appreciated.

Thanks!

Jerry Krinock


------------------------------

Date: Fri, 24 Sep 2010 23:56:26 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Long script "just stops" sometime
Message-Id: <qtmvm7-i6d.ln1@osiris.mauzo.dyndns.org>


Quoth Jerry Krinock <jerrykrinock@gmail.com>:
> I've written a 1500-line script which converts several dozen Markdown
> source files to HTML.  It takes several minutes to
> run, indicating progress by printf statements.  However, about 20% of
> the time, in the middle of processing a Markdown file, it just stops
> progressing, as though it is in an infinite loop.  If I kill the
> process and restart, it always completes successfully.
> 
> My script is, of course, being a script, not particularly efficient.
> I was thinking that maybe Perl was running out of memory or something,
> although that's not supposed to happen nowadays (Perl 5.10.0, Mac OS X
> 10.6).

I don't know what you mean by 'not supposed to happen'; memory is still a
finite resource, and writing a script which deliberately uses all
available memory is trivial (and, sometimes, even useful). However,
running out of memory causes perl to exit with an 'Out of memory!' error,
not hang.

> And when I check it in Apple's Activity Monitor during normal
> operation, I find that its CPU and memory usage are hardly noticeable,
> maybe 3% and a few tens of megabytes.
> 
> Are there any conditions under which Perl would "just stop"?

I can't think of any, but that doesn't mean there aren't any. If CPU
usage is low I would expect it to be blocking on *something*; are you
using multiple threads/processes that might have deadlocked? Do you lock
any files someone else might have locked?

> Any suggestions to troubleshoot this would be appreciated.

Add the following somewhere early on:

    require Carp;
    $SIG{INFO} = sub { Carp::cluck("SIGINFO") };
    $SIG{QUIT} = sub { Carp::confess("SIGQUIT") };

Now you can press ^T to get a backtrace, and ^\ to get a backtrace and
kill the program. Once you know where it's failing, you can start
cutting the program down to something minimal which still exhibits the
failure.
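^T (SIGINFO) is a BSD/Mac OS X feature; Linux has no SIGINFO. On such systems a similar effect could be had with SIGUSR1, sent from another terminal with kill -USR1 <pid>. A sketch, where the demonstration at the end simply signals itself and the $seen counter is only there to show that the handler ran:

```perl
use strict;
use warnings;
use Carp ();

my $seen = 0;

# Print a stack backtrace (without dying) whenever SIGUSR1 arrives.
$SIG{USR1} = sub { $seen++; Carp::cluck("SIGUSR1 received") };

# Demonstration only: nested subs so the backtrace has something to show,
# with the innermost one sending SIGUSR1 to this very process.
sub inner { kill USR1 => $$ }
sub outer { inner() }
outer();
```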

Ben



------------------------------

Date: Thu, 23 Sep 2010 22:04:27 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: Need Regex for phone number
Message-Id: <bfosm7-hn9.ln1@news.rtij.nl>

On Thu, 23 Sep 2010 12:19:38 -0700, lotug wrote:

> I need a regex to identify the phone number 3108222400.

/3108222400/ should work. If that is not what you mean, please better 
identify what you want matched.

M4


------------------------------

Date: Sat, 25 Sep 2010 01:21:31 -0400
From: monkeys paw <monkey@joemoney.net>
Subject: Re: Need Regex for phone number
Message-Id: <aZGdnW2n7oFqGwDRnZ2dnUVZ_tWdnZ2d@insightbb.com>

On 9/23/2010 3:19 PM, lotug wrote:
> I need a regex to identify the phone number 3108222400.


If the obvious interpretation holds:

#!/usr/local/bin/perl
use strict;
use warnings;

my $num = '3108222400';

# Split into groups of 3, 3 and 4 digits (US numbers)
$num =~ /^(\d{3})(\d{3})(\d{4})$/;

print "$1-$2-$3\n";

#OUTPUT:
#310-822-2400


------------------------------

Date: Sat, 25 Sep 2010 07:04:00 -0500
From: Tad McClellan <tadmc@seesig.invalid>
Subject: Re: Need Regex for phone number
Message-Id: <slrni9rpa7.hjv.tadmc@tadbox.sbcglobal.net>

monkeys paw <monkey@joemoney.net> wrote:

> $num =~ /^(\d{3})(\d{3})(\d{4})$/;
>
> print "$1-$2-$3\n";


You should never use the "dollar digit" variables without first
testing to ensure that the match succeeded. Else they will contain
"stale" values from a previous match that did succeed:

    print "$1-$2-$3\n" if $num =~ /^(\d{3})(\d{3})(\d{4})$/;

or

    if ($num =~ /^(\d{3})(\d{3})(\d{4})$/) {
        print "$1-$2-$3\n";
    }
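A short sketch of the stale-value problem described above: after a failed match, $1, $2 and $3 still hold the values from the last successful match in the same scope. The input strings are made up.

```perl
use strict;
use warnings;

my $re   = qr/^(\d{3})(\d{3})(\d{4})$/;
my $good = '3108222400';
my $bad  = 'call me maybe';

$good =~ $re;                 # succeeds: $1, $2, $3 = 310, 822, 2400
$bad  =~ $re;                 # fails: $1, $2, $3 are NOT reset
my $stale = "$1-$2-$3";       # still "310-822-2400", from the first match

# Checking the match first avoids the problem.
my $checked = $bad =~ $re ? "$1-$2-$3" : 'no match';

print "$stale\n$checked\n";
```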


-- 
Rest In Peace: 
Jonah McClellan gave his life for his country in a
helicopter crash in Afghanistan on September 21, 2010.
Please pray for his wife and three children.


------------------------------

Date: Fri, 24 Sep 2010 12:14:24 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Removing tag + closing tag
Message-Id: <gpdum7-769.ln1@osiris.mauzo.dyndns.org>


Quoth Theo van den Heuvel <tcmvandenheuvel@gmail.com>:
> 
> The OP is strongly recommended to follow the advice that is posted
> here every week and use an existing HTML parser instead of doing
> something that can be mathematically proven to be impossible except
> for fairly trivial cases. Sln's approach only indicates how convoluted
> and vulnerable the regex attempts need to be. They can never scale
> when requirements are added.

Just for fun, here's a not-quite-complete but otherwise correct (modulo
bugs, obviously) implementation of an XML parser as a single regex (I
omitted PIs and DTDs, since they added about as many productions again).
It's not terribly useful as it stands (it just tells you if a given
string contains a valid XML document or not) but it could be made to
build a parse tree fairly easily using (?{}) (subject to the usual
caveats with that construction).

m(
    (?&document)

    (?(DEFINE)

        # Document

        (?<document>
            (?&prolog) (?&element) (?&Misc)*
        )

        # Character sets

        (?<Char> 
            [\x9-\xA\xD\x20-\x7E\x85\xA0-\x{D7FF}]      |
            [\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]
        )
        (?<S> [\x20\x9\xD\xA]+ )

        # Names

        (?<NameStartChar>
            [:A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}]     | 
            [\x{37F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}]        |
            [\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCD}]       |
            [\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]
        )
        (?<NameChar>
            (?&NameStartChar)                               | 
            [-.0-9\xB7\x{0300}-\x{036F}\x{203F}-\x{2040}]
        )
        (?<Name>        (?&NameStartChar) (?&NameChar)* )
        (?<Names>       (?&Name) (?: \x20 (?&Name) )* )

        # Comments

        (?<Comment>
            <!-- (?:
                (?: (?! - ) (?&Char) ) |
                (?: - (?! - ) (?&Char) )
            )* -->
        )

        # Prolog

        (?<prolog>      (?&XMLDecl) (?&Misc)* )
        (?<XMLDecl>     
            <\?xml (?&VersionInfo) (?&EncodingDecl)? (?&S)? \?> 
        )
        (?<Misc>        (?&Comment) | (?&S) )
        (?<Eq>          (?&S)? = (?&S)? )

        (?<VersionInfo> (?&S) version (?&Eq) (?: '1\.[10]' | "1\.[10]" ) )

        (?<EncodingDecl>    
            (?&S) encoding (?&Eq) (?: "(?&EncName)" | '(?&EncName)' )
        )
        (?<EncName>     [A-Za-z] (?: [A-Za-z0-9._-] )* )

        # CDATA sections

        (?<CDSect>  (?&CDStart) (?&CData) (?&CDEnd) )
        (?<CDStart> <!\[CDATA\[ )
        (?<CData>   (?: (?! \]\]> ) (?&Char) )* )
        (?<CDEnd>   \]\]> )

        # Element

        (?<element> (?&EmptyElemTag) | (?&STag) (?&content) (?&ETag) )

        (?<STag>        < (?&Name) (?: (?&S) (?&Attribute) )* (?&S)? > )
        (?<Attribute>   (?&Name) (?&Eq) (?&AttValue) )
        (?<ETag>        </ (?&Name) (?&S)? > )

        (?<AttValue>
            " (?: [^<&"] | (?&Reference) )* " |
            ' (?: [^<&'] | (?&Reference) )* '
        )

        # Content of elements

        (?<content>
            (?&CharData)? (?:
                (?: (?&element) | (?&Reference) | (?&CDSect) |
                    (?&Comment)
                )
                (?&CharData)?
            )*
        )
        (?<CharData> (?! [^<&]* \]\]> [^<&]* ) [^<&]* )

        # Empty elements

        (?<EmptyElemTag>
            < (?&Name) 
                (?: (?&S) (?&Attribute) )* (?&S)?
            />
        )

        # Character reference

        (?<Reference>   (?&EntityRef) | (?&CharRef) )
        (?<CharRef>     &\# [0-9]+ ; | &\#x [0-9a-fA-F]+ ; )
        (?<EntityRef>   & (?&Name) ; )
    )
)xs
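The (?(DEFINE) ...) plus (?&name) technique can be tried out on a much smaller grammar first. A toy sketch, assuming tags with no attributes, comments, entities or CDATA; like the regex above, it does not check that a closing tag's name matches its opening tag, and the \A ... \z anchors make it test the whole string rather than search within it.

```perl
use strict;
use warnings;

# Toy grammar: nested elements with text content, nothing else.
my $tree_re = qr{
    \A (?&element) \z
    (?(DEFINE)
        (?<name>    [A-Za-z_] [\w.-]* )
        (?<element> < (?&name) > (?&content) </ (?&name) >
                  | < (?&name) />
        )
        # Possessive ++ keeps text runs from being re-split on backtracking.
        (?<content> (?: [^<&]++ | (?&element) )* )
    )
}x;

for my $doc ('<a><b>hi</b><c/></a>', '<a><b>hi</a>', '<a>x</b>') {
    printf "%-22s %s\n", $doc, ($doc =~ $tree_re ? 'matches' : 'no match');
}
```

The last input matches only because tag names are not compared, which is the same limitation the full regex has.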

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3142
***************************************
