[30727] in Perl-Users-Digest


Perl-Users Digest, Issue: 1972 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Nov 8 21:09:47 2008

Date: Sat, 8 Nov 2008 18:09:10 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 8 Nov 2008     Volume: 11 Number: 1972

Today's topics:
        [OT]: maximum memory <nospam-abuse@ilyaz.org>
    Re: A couple of questions regarding runtime generation  sln@netherlands.com
    Re: determine when to change to or from daylight saving <m@rtij.nl.invlalid>
    Re: Help: How can I parse this properties file? sln@netherlands.com
    Re: Help: How can I parse this properties file? sln@netherlands.com
    Re: Sockets and threads... <Mark.Seger@hp.com>
        Split a multi-sequence file into individual files <ela@yantai.org>
    Re: Split a multi-sequence file into individual files <tadmc@seesig.invalid>
    Re: Split a multi-sequence file into individual files <wahab-mail@gmx.de>
    Re: Split a multi-sequence file into individual files <tim@burlyhost.com>
    Re: Split a multi-sequence file into individual files <tim@burlyhost.com>
    Re: Split a multi-sequence file into individual files <wahab-mail@gmx.de>
    Re: Split a multi-sequence file into individual files <tim@burlyhost.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Sat, 8 Nov 2008 23:54:47 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: [OT]: maximum memory
Message-Id: <gf58s7$2b63$1@agate.berkeley.edu>

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Peter J. Holzer
<hjp-usenet2@hjp.at>], who wrote in article <slrnghb1ut.chg.hjp-usenet2@hrunkner.hjp.at>:
> On 2008-11-05 21:13, Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote:
> ><hjp-usenet2@hjp.at>], who wrote in article <slrngh39kn.819.hjp-usenet2@hrunkner.hjp.at>:
> >> For 64-bit processes the limit is theoretically 16 Exabytes, but that's
> >> well beyond the capabilities of current hardware. 
> >
> > ???  Well *within* capabilities of current hardware.  One ethernet
> > card, and you have *a possibility* of unlimited (read: limited only by
> > software) expansion of available disk space; which may be
> > memory-mapped.
> 
> Last time I looked there was no processor which would actually use all
> 64 bits in the MMU. The usable number of bits is typically somewhere
> between 36 and 48, which limits the usable virtual memory (including
> memory-mapped files, etc.) to 2^36 to 2^48 bytes.

So, IIUC, I misinterpreted your remark.  I thought you were saying that
currently one can't get enough MEMORY to overflow 64 bits.  And now you
say that one can't get enough MEMORY ADDRESS SPACE to overflow 64 bits.

> > So it is a question of money only.

> If you have enough money to develop a new MMU for your CPU, you are
> right ;-).

About 10 years ago I looked through notes for a hardware design 101
class, and one of the first homework assignments was to design an MMU,
simple but good enough to bootstrap a processor via a hard disk (or
whatever) sitting on a bus.  They needed to catch memory accesses to a
segment in memory and translate them to bus access commands; and I
think the requirement was to design this in terms of discrete
components (transistors).  So, IIRC, even I have enough money for such
a design.  ;-)

Yours,
Ilya

P.S.  Thinking about it more: the price estimate I gave is in the ballpark
      of the price of a particle physics detector (LHC, Tevatron).
      Given that the current design is to throw away 99.99999% (or
      whatever) of the information as early as possible, any money spent
      on larger storage and memory throughput has some chance of
      improving the odds that the data from experiments may (later) be
      used for unrelated purposes...

P.P.S.  I tried to imagine other scenarios which may quickly produce
	much more than 2^64 bytes of info.  First I thought of LSST
	(https://www.llnl.gov/str/November05/Brase.html), but it is
	only 2^55 B/year.  The only other "realistic" scenario I found
	is a very anxious bigbrother: a "good" video camera (I'm
	thinking about IMAX-like quality, 4K x 3K x 3 x 50p; maybe not
	available this year, but RSN) can easily saturate a 10GBASE
	connection (in a RAW stream with minimal compression).

	So if London authorities decide to replace their spycams by
	such beasts, AND would like to preserve RAW streams, they
	would generate 10TB/sec.  This is 25e18 B/month, which is
	>2^64 B/month.  Viva the bigbrother!
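
The arithmetic in the P.P.S. can be checked with a few lines of Perl.
This is only a rough sketch under stated assumptions: "4K x 3K" is read
as 4096 x 3072 pixels at 3 bytes per pixel, TB means 10^12 bytes, and a
month is taken as 30 days. None of these choices is spelled out in the
post itself.

```perl
use strict;
use warnings;

# Assumed reading of "4K x 3K x 3 x 50p": 4096 x 3072 pixels,
# 3 bytes per pixel, 50 frames per second.
my $cam_rate  = 4096 * 3072 * 3 * 50;   # bytes/s from one camera
my $link_rate = 10e9 / 8;               # a 10 Gb/s link, in bytes/s
my $total     = 10e12;                  # the stated 10 TB/s aggregate
my $month     = 30 * 24 * 3600;         # seconds in an assumed 30-day month
my $per_month = $total * $month;        # bytes generated per month
my $limit     = 2 ** 64;                # size of a 64-bit address space

printf "one camera: %.2e B/s, one link: %.2e B/s\n", $cam_rate, $link_rate;
printf "cameras needed for 10 TB/s at link rate: %d\n", $total / $link_rate;
printf "per month: %.2e B, 2**64: %.2e B\n", $per_month, $limit;
printf "exceeds 2**64: %s\n", $per_month > $limit ? 'yes' : 'no';
```

Under these assumptions one raw stream is about 1.9e9 B/s against a
link capacity of 1.25e9 B/s, 10 TB/s corresponds to 8000 saturated
links, and a month of data is about 2.6e19 B against 2^64 of about
1.8e19 B.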


------------------------------

Date: Sat, 08 Nov 2008 19:14:53 GMT
From: sln@netherlands.com
Subject: Re: A couple of questions regarding runtime generation of REGEXP's
Message-Id: <rrobh4p9qdcqiufkpnk6nt8p1ta2simhtr@4ax.com>

On Fri, 07 Nov 2008 01:29:34 GMT, sln@netherlands.com wrote:

>On Wed, 05 Nov 2008 19:54:14 GMT, sln@netherlands.com wrote:
>
>>On Wed, 05 Nov 2008 00:31:44 GMT, sln@netherlands.com wrote:
>>
>>>On Tue, 04 Nov 2008 12:23:07 +0100, Michele Dondi <bik.mido@tiscalinet.it> wrote:
>>>
>>>>On Mon, 03 Nov 2008 23:01:35 GMT, sln@netherlands.com wrote:
>>>>
>>>>>In my opinion, s///g should be allowed by qr{} using the scoping block it was created
>>>>>in, and later correctly used (s///g) within the context of a block that invokes the engine.
>>>>>
>>>>>This may violate 'first-order object' of the language. But then why are code extensions allowed?
>>>>>qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
>>>>>if allowed would internally result in a dynamic code issue like eval.
>>>>>I don't think that this 'code' extension isn't treated as a literal anyway.
>>>>
>>>>Do not misunderstand me, I'm all with you: would you write a Perl
>>>>extension that allows treating substitutions as first order objects of
>>>>the language? I would cherish that... Unfortunately I *for one*
>>>>haven't the slightest idea of where one could begin!
>>>>
>>>>In the meanwhile we must be happy with a clumsier solution, like...
>>>>
>>>>>I don't know if invoking a 'sub' (/e) is going to be any better than having to
>>>>>parse through a passed in argument list for the proper form. In all cases, it looks
>>>>>like the replacement text cannot include special vars unless an eval is used
>>>>>at runtime.
>>>>>
>>>>>Can you give an example of your regex and a sub solution?
>>>>
>>>>... sure:
>>>>
>>>>	my %subst = ( regex => qr/.../, code => sub { ... } );
>>>>
>>>>And then you use that to perform the substitution. You may even make
>>>>that the core data of a class, thus allowing objects like $subst with
>>>>a suitable ->apply($string) method.
>>>>
>>>>
>>>>Michele
>>>
>>[snip]
>>>
>>>This is a relief for me though. Thanks a lot...
>>>
>>[snip]
>>>
>>
>>I settled on this lightweight class that handles the substitution with some
>>variable types. Still, it has minimal error checking to reduce overhead.
>>Added a few methods to generalize access, and it benchmarks pretty well.
>>
>>See any potential problems or performance issues ?
>>
>>sln
>>
>>----------------------
>>use strict;
>>use warnings;
>>

Ran into issues that were fixed. I just want to close this out with
the correct default 'code' sub, changed types, and added 'search_g()' method.
Thanks.

sln



sub NewRxP
{
	my ($regex,$code,$type) = @_;
	if (defined $code && ref($code) ne 'CODE') {
		my $temp = $type;
		$type = $code;
		$code = $temp;
	}
	return RxP->new('regex'=>$regex,'code'=>$code,'type'=>$type);
}


# =================

package RxP;
use vars qw(@ISA);
@ISA = qw();

sub new
{
	my ($class, @args) = @_;
	my $self  = {
	  'regex' => '',
	  'code'  => sub{''},
	  'type'  => 's',
	  'dflt_sub' => \&search
	};
	while (my ($name, $val) = splice (@args, 0, 2)) {
		next if (!defined $val);
		if ('regex' eq lc $name) {
			$self->{'regex'} = $val;
		}
		elsif ('code' eq lc $name && ref($val) eq 'CODE') {
			$self->{'code'} = $val;
		}
		elsif ('type' eq lc $name && $val =~ /(sg|gs|rg|gr|s|r)/i) {
			set_type ($self, lc $1);
		}
	}
	return bless ($self, $class);
}
sub get_type
{
	return $_[0]->{'type'};
}
sub set_type
{
	return 0 unless (defined $_[1]);
	if ($_[1] =~ /(sg|gs|rg|gr|s|r)/i) {
		$_[0]->{'dflt_sub'} = {
		   's'  => \&search,
		   'sg' => \&search_g,
		   'gs' => \&search_g,
		   'r'  => \&replace,
		   'rg' => \&replace_g,
		   'gr' => \&replace_g
		}->{lc $1};
		$_[0]->{'type'} = lc $1;
		return 1;
	}
	return 0;
}
sub apply
{
	return 0 unless (defined $_[1]);
	return &{$_[0]->{'dflt_sub'}};
}
sub search
{
	return 0 unless (defined $_[1]);
	return $_[1] =~ /$_[0]->{'regex'}/;
}
sub search_g
{
	return 0 unless (defined $_[1]);
	return $_[1] =~ /$_[0]->{'regex'}/g;
}
sub replace
{
	return 0 unless (defined $_[1]);
	return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
}
sub replace_g
{
	return 0 unless (defined $_[1]);
	return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
}
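
For comparison, here is a minimal sketch of the bare idea suggested
earlier in the thread: a compiled regex paired with a replacement
callback behind an apply() method. The package name Subst and the
'global' option are hypothetical names chosen for illustration. This is
not the RxP class above and it makes no claim about performance.

```perl
use strict;
use warnings;

package Subst;    # hypothetical name, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = {
        regex  => $args{regex},                 # a pattern compiled with qr//
        code   => $args{code} || sub { '' },    # replacement callback
        global => $args{global} ? 1 : 0,        # substitute all matches?
    };
    return bless $self, $class;
}

# Substitutes in the caller's string in place (via the @_ alias) and
# returns the number of substitutions made, or a false value if none.
sub apply {
    my $self = shift;
    return 0 unless defined $_[0];
    my ($re, $code) = @{$self}{qw(regex code)};
    return $self->{global}
        ? ($_[0] =~ s/$re/$code->()/ge)
        : ($_[0] =~ s/$re/$code->()/e);
}

package main;

my $swap = Subst->new(
    regex  => qr/(\w+)\@(\w+)/,
    code   => sub { "$2 at $1" },   # sees $1 and $2 of the current match
    global => 1,
);

my $text  = 'alice@example bob@test';
my $count = $swap->apply($text);
print "$count: $text\n";
```

The callback can read $1 and $2 because the capture variables are
dynamically scoped, so no string eval is needed for the replacement
text.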



------------------------------

Date: Sat, 8 Nov 2008 15:38:56 +0100
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: determine when to change to or from daylight savings time
Message-Id: <pan.2008.11.08.14.38.56@rtij.nl.invlalid>

On Sat, 08 Nov 2008 21:13:54 +1100, Martien Verbruggen wrote:

> This is the second time you've used the (presumably) acronym CRTL. What
> exactly do you mean by that? I don't know the term, and Google,
> Wikipedia, freedictionary and the hacker's dictionary also don't seem to
> help.

C RunTime Library, aka CRT.

M4


------------------------------

Date: Sat, 08 Nov 2008 18:38:16 GMT
From: sln@netherlands.com
Subject: Re: Help: How can I parse this properties file?
Message-Id: <c4lbh4pdqtfeuc9qvofq8fnb52flphtgva@4ax.com>

On Fri, 7 Nov 2008 19:55:48 -0800 (PST), "yuanyun.ken" <yuanyun.ken@gmail.com> wrote:

>Thanks for all the replies; this problem has been solved.
>Sorry for my poor understanding of regexes, and for troubling
>you again,
>but here I have another little problem:
>if the content ends with a real single backslash, I need to read in the
>next line.
>
>How can I use a regex to do this?
>for example:
>line ends with               match
>\                            yes
>\\                           no
>\\\                          yes
>\\\\                         no
>Thanks for any help again.

I assume this pertains to the rules set out on the properties
in the original problem statement.

Tad's solution to check the end for an 'odd' number of '\' works best
for a line continuation.
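
As a minimal sketch of that check (the helper name continues() is made
up for illustration): capture the run of backslashes at the end of the
line and test whether its length is odd.

```perl
use strict;
use warnings;

# True when the line ends in an odd number of backslashes, i.e. the
# final backslash is not itself escaped and so marks a continuation.
sub continues {
    my ($line) = @_;
    return $line =~ /(\\+)$/ && length($1) % 2 ? 1 : 0;
}

# One, two, three and four trailing backslashes.
for my $tail ('\\', '\\\\', '\\\\\\', '\\\\\\\\') {
    printf "%-6s %s\n", $tail, continues("key=value$tail") ? 'yes' : 'no';
}
```

This prints yes, no, yes, no for one to four trailing backslashes,
matching the table in the question. It assumes the line has already
been chomped.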

Be very cautious! If you are trying to find a way to repair arbitrary
line splits introduced when this file was generated, there is
NO solution available to you at all.
The reason is that you already have escaping rules in place.

The line split must be constructed intelligently, so that only
an odd number of '\' at the end of a line signals a continuation,
and so that those same backslashes still work under the general
escaping rules after the lines are joined.

You can't just add a '\' where you would like to split the line and then
remove it later without counting the existing escapes at the end.
Either way, it takes intelligence to construct the file given the
existing escaping rules you laid out for yourself.

Notice the places where the splits occur in DATA below.
Even if you had an intelligent generator that splits the
line on a '\', it could still split on an even boundary.
Or say it adds a complementary '\' to make the count odd: even
then, the original cannot be guaranteed to reassemble,
because this conflicts with the original escape logic.

There is no solution then!


sln

use strict;
use warnings;

# ** Original
# expression  means:    key     value
# a=b=c                 a        b=c
# a\=b=c                a=b      c
# a\\=b=c               a\       b=c
# a\\\=b=c              a\=b     c

# ** Output
# a=b=c                   a       b=c
# a\=b=c                  a=b     c
# a\\=b=c                 a\      b=c
# a\\\=b=c                a\=b    c
# a\\\\=b=c               a\\     b=c
# a\\\=b\\=c              a\=b\   c
# a\\=b\\=c               a\      b\=c
# a=b=c                   a       b=c
# a\=b=c                  a=b     c
# a\\=b=c                 a\      b=c
# a\\\=b=c                a\=b    c
# a\\\\=b=c               a\\     b=c
# a\\\=b\\=c              a\=b\   c
# a\\=b\\=c               a\      b\=c



my $buf = '';

print "\nexpression  means:\tkey\tvalue\n";

foreach ( <DATA> ) {
	chomp;
	$_ = $buf . $_;

	if ( /(\\+)$/ and length($1) % 2 ) {
		# wouldn't want to do this ->  s/\\$//;
		$buf = $_;	# $_ already includes the old buffer: assign, don't append
		next;		# read next line
	}
	if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
		# unescape built in sequences
		my ($key, $val) = ($1,$2);
		$key =~ s/\\(.)/$1/g;
		$val =~ s/\\(.)/$1/g;
		printf "%-20s\t%s\t%s\n", $_, $key, $val;
	}
	$buf = '';
}

__DATA__

# no line splits
a=b=c
a\=b=c
a\\=b=c
a\\\=b=c
a\\\\=b=c
a\\\=b\\=c
a\\=b\\=c

# ok line splits
a=b=c
a\
=b=c
a\
\=b=c
a\\\
=b=c
a\\\
\=b=c
a\\\=b\
\=c
a\\=b\
\=c

#some good/bad line splits
a=b=c
a\
=b=c
a\\
=b=c
a\\\
=b=c
a\\\\
=b=c
a\\\
=b\\=c
a\\=b\\
=c



------------------------------

Date: Sat, 08 Nov 2008 18:44:27 GMT
From: sln@netherlands.com
Subject: Re: Help: How can I parse this properties file?
Message-Id: <aenbh41rdhcfffimpn0b8j96j9bh3npp3o@4ax.com>

On Sat, 08 Nov 2008 18:38:16 GMT, sln@netherlands.com wrote:

>On Fri, 7 Nov 2008 19:55:48 -0800 (PST), "yuanyun.ken" <yuanyun.ken@gmail.com> wrote:
>
>>Thanks for all the replies; this problem has been solved.
>>Sorry for my poor understanding of regexes, and for troubling
>>you again,
>>but here I have another little problem:
>>if the content ends with a real single backslash, I need to read in the
>>next line.
>>
>>How can I use a regex to do this?
>>for example:
>>line ends with               match
>>\                            yes
>>\\                           no
>>\\\                          yes
>>\\\\                         no
>>Thanks for any help again.
>
>I assume this pertains to the rules set out on the properties
>in the original problem statement.
>
>Tad's solution to check the end for an 'odd' number of '\' works best
>for a line continuation.
>
>Be very cautious! If you are trying to find a way to repair arbitrary
>line splits introduced when this file was generated, there is
>NO solution available to you at all.
>The reason is that you already have escaping rules in place.
>
>The line split must be constructed intelligently, so that only
>an odd number of '\' at the end of a line signals a continuation,
>and so that those same backslashes still work under the general
>escaping rules after the lines are joined.
>
>You can't just add a '\' where you would like to split the line and then
>remove it later without counting the existing escapes at the end.
>Either way, it takes intelligence to construct the file given the
>existing escaping rules you laid out for yourself.
>
>Notice the places where the splits occur in DATA below.
>Even if you had an intelligent generator that splits the
>line on a '\', it could still split on an even boundary.
>Or say it adds a complementary '\' to make the count odd: even
>then, the original cannot be guaranteed to reassemble,
>because this conflicts with the original escape logic.
>
>There is no solution then!
>
>
>sln
>
>use strict;
>use warnings;
>
># ** Original
># expression  means:    key     value
># a=b=c                 a        b=c
># a\=b=c                a=b      c
># a\\=b=c               a\       b=c
># a\\\=b=c              a\=b     c
>
# ** Output
# a=b=c                   a       b=c
# a\=b=c                  a=b     c
# a\\=b=c                 a\      b=c
# a\\\=b=c                a\=b    c
# a\\\\=b=c               a\\     b=c
# a\\\=b\\=c              a\=b\   c
# a\\=b\\=c               a\      b\=c
# a=b=c                   a       b=c
# a\=b=c                  a=b     c
# a\\=b=c                 a\      b=c
# a\\\=b=c                a\=b    c
# a\\\\=b=c               a\\     b=c
# a\\\=b\\=c              a\=b\   c
# a\\=b\\=c               a\      b\=c
# a=b=c                   a       b=c
# a\=b=c                  a=b     c
# =b=c                            b=c
# a\\\=b=c                a\=b    c
# =b=c                            b=c
# a\\\=b\\=c              a\=b\   c
# a\\=b\\                 a\      b\
# =c
>
>my $buf = '';
>
>print "\nexpression  means:\tkey\tvalue\n";
>
>foreach ( <DATA> ) {
>	chomp;
>	$_ = $buf . $_;
>
>	if ( /(\\+)$/ and length($1) % 2 ) {
>		# wouldn't want to do this ->  s/\\$//;
>		$buf = $_;	# $_ already includes the old buffer: assign, don't append
>		next;		# read next line
>	}
>	if (/^((?:(?:\\.)*?|.*?)+)=(.*)$/) {
>		# unescape built in sequences
>		my ($key, $val) = ($1,$2);
>		$key =~ s/\\(.)/$1/g;
>		$val =~ s/\\(.)/$1/g;
>		printf "%-20s\t%s\t%s\n", $_, $key, $val;
>	}
>	$buf = '';
>}
>
>__DATA__
>
># no line splits
>a=b=c
>a\=b=c
>a\\=b=c
>a\\\=b=c
>a\\\\=b=c
>a\\\=b\\=c
>a\\=b\\=c
>
># ok line splits
>a=b=c
>a\
>=b=c
>a\
>\=b=c
>a\\\
>=b=c
>a\\\
>\=b=c
>a\\\=b\
>\=c
>a\\=b\
>\=c
>
>#some good/bad line splits
>a=b=c
>a\
>=b=c
>a\\
>=b=c
>a\\\
>=b=c
>a\\\\
>=b=c
>a\\\
>=b\\=c
>a\\=b\\
>=c



------------------------------

Date: Sat, 08 Nov 2008 15:26:18 -0500
From: Mark Seger <Mark.Seger@hp.com>
Subject: Re: Sockets and threads...
Message-Id: <gf4slb$2e0$1@usenet01.boi.hp.com>


> Sharing filehandles thru the fileno is the way to go with threads.
Yes, clearly the way to go!  I now have a thread that listens for
connections, accepts them and writes the fileno into a shared array.
-mark


------------------------------

Date: Sat, 8 Nov 2008 23:22:13 +0800
From: "ela" <ela@yantai.org>
Subject: Split a multi-sequence file into individual files
Message-Id: <gf4ar4$mag$1@ijustice.itsc.cuhk.edu.hk>

I found this on Google, so there is no need to reinvent the wheel, but this
one-liner is too difficult to understand...

perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write 
failed:$!\n";chomp;print F ">", $_ }' fastafile

Can anybody help?




------------------------------

Date: Sat, 8 Nov 2008 10:07:45 -0600
From: Tad J McClellan <tadmc@seesig.invalid>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <slrnghbeah.tt2.tadmc@tadmc30.sbcglobal.net>

ela <ela@yantai.org> wrote:
> I found this on Google, so there is no need to reinvent the wheel, but this
> one-liner is too difficult to understand...
>
> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write 
> failed:$!\n";chomp;print F ">", $_ }' fastafile
>
> Can anybody help?


BEGIN{ $/=">"; }              # set the Input Record Separator (perlvar.pod)
while ( <> ) {                # -n wraps in a while-diamond loop
    if( /^\s*(\S+)/ ){        # grab the first non-whitespace characters
        open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
        chomp;                # remove ">" from end of string
        print F ">", $_;      # print ">" at beginning of string
    }
}



-- 
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


------------------------------

Date: Sat, 08 Nov 2008 20:49:32 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <gf4qkn$obj$1@mlucom4.urz.uni-halle.de>

Tad J McClellan wrote:
> ela <ela@yantai.org> wrote:
>> I found this on Google, so there is no need to reinvent the wheel, but this
>> one-liner is too difficult to understand...
>>
>> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){ open(F,">$1.fsa")||warn"$1 write 
>> failed:$!\n";chomp;print F ">", $_ }' fastafile
>>
>> Can anybody help?
> 
> 
> BEGIN{ $/=">"; }              # set the Input Record Separator (perlvar.pod)
> while ( <> ) {                # -n wraps in a while-diamond loop
>     if( /^\s*(\S+)/ ){        # grab the first non-whitespace characters
>         open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
>         chomp;                # remove ">" from end of string
>         print F ">", $_;      # print ">" at beginning of string
>     }
> }

I don't understand the purpose of the chomp,
maybe it needs to be in front of the if():

  ...
  local $/ = '>';
  while (<>) {
     chomp;
     if( /\s*(\S+)/ ) {
        open my $fh, '>', "$1.fsa" or warn "$1 $!";
        print $fh '>'.$_
     }
  }
  ...

Regards

M.


------------------------------

Date: Sat, 08 Nov 2008 12:00:01 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <5dmRk.543$e5.23@newsfe01.iad>

Mirco Wahab wrote:

> Tad J McClellan wrote:
>> ela <ela@yantai.org> wrote:
>>> I found this on Google, so there is no need to reinvent the wheel,
>>> but this one-liner is too difficult to understand...
>>>
>>> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){
>>> open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_
>>> }' fastafile
>>>
>>> Can anybody help?
>> 
>> 
>> BEGIN{ $/=">"; }              # set the Input Record Separator (perlvar.pod)
>> while ( <> ) {                # -n wraps in a while-diamond loop
>>     if( /^\s*(\S+)/ ){        # grab the first non-whitespace characters
>>         open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
>>         chomp;                # remove ">" from end of string
>>         print F ">", $_;      # print ">" at beginning of string
>>     }
>> }
> 
> I don't understand the purpose of the chomp,
> maybe it needs to be in front of the if():
> 
>   ...
>   local $/ = '>';
>   while (<>) {
>      chomp;
>      if( /\s*(\S+)/ ) {
>         open my $fh, '>', "$1.fsa" or warn "$1 $!";
>         print $fh '>'.$_
>      }
>   }
>   ...
> 
> Regards
> 
> M.

perldoc -f chomp

Chomp removes any newline, if one exists (which it probably would on
<>).

It's the difference between trying to open:
$1.fsa

and

$1
 .fsa


-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 08 Nov 2008 12:01:05 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <6emRk.544$e5.519@newsfe01.iad>

Tim Greer wrote:

> Mirco Wahab wrote:
> 
>> Tad J McClellan wrote:
>>> ela <ela@yantai.org> wrote:
>>>> I found this on Google, so there is no need to reinvent the wheel,
>>>> but this one-liner is too difficult to understand...
>>>>
>>>> perl -ne 'BEGIN{ $/=">"; } if(/^\s*(\S+)/){
>>>> open(F,">$1.fsa")||warn"$1 write failed:$!\n";chomp;print F ">", $_
>>>> }' fastafile
>>>>
>>>> Can anybody help?
>>> 
>>> 
>>> BEGIN{ $/=">"; }              # set the Input Record Separator (perlvar.pod)
>>> while ( <> ) {                # -n wraps in a while-diamond loop
>>>     if( /^\s*(\S+)/ ){        # grab the first non-whitespace characters
>>>         open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
>>>         chomp;                # remove ">" from end of string
>>>         print F ">", $_;      # print ">" at beginning of string
>>>     }
>>> }
>> 
>> I don't understand the purpose of the chomp,
>> maybe it needs to be in front of the if():
>> 
>>   ...
>>   local $/ = '>';
>>   while (<>) {
>>      chomp;
>>      if( /\s*(\S+)/ ) {
>>         open my $fh, '>', "$1.fsa" or warn "$1 $!";
>>         print $fh '>'.$_
>>      }
>>   }
>>   ...
>> 
>> Regards
>> 
>> M.
> 
> perldoc -f chomp
> 
> Chomp removes any newline, if one exists 

Pardon... to be clear, it removes the new line at the end of the string
(not just any new line). 
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: Sat, 08 Nov 2008 21:06:55 +0100
From: Mirco Wahab <wahab-mail@gmx.de>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <gf4rla$omd$1@mlucom4.urz.uni-halle.de>

Tim Greer wrote:
> Mirco Wahab wrote:
>> Tad J McClellan wrote:
>>> BEGIN{ $/=">"; }              # set the Input Record Separator (perlvar.pod)
>>> while ( <> ) {                # -n wraps in a while-diamond loop
>>>     if( /^\s*(\S+)/ ){        # grab the first non-whitespace characters
>>>         open(F,">$1.fsa") || warn"$1 write failed:$!\n"; # open a file
>>>         chomp;                # remove ">" from end of string
>>>         print F ">", $_;      # print ">" at beginning of string
>>>     }
>>> }
>> I don't understand the purpose of the chomp,
>> maybe it needs to be in front of the if():
>>
>>   ...
>>   local $/ = '>';
>>   while (<>) {
>>      chomp;
>>      if( /\s*(\S+)/ ) {
>>         open my $fh, '>', "$1.fsa" or warn "$1 $!";
>>         print $fh '>'.$_
>>      }
>>   }
>>   ...
> 
> perldoc -f chomp
> 
> Chomp removes any newline, if one exists (which it probably would on
> <>).

No, it doesn't. It removes the $/, which is
here the '>'.

> It's the difference between trying to open:
> $1.fsa
> 
> and
> 
> $1
> .fsa

No way. In the above problem, the first record
would put the '>' in $1, which leads
to an open argument of ">>.fsa", which
creates a file '.fsa' that contains nothing but a lone '>'.
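
A minimal sketch of a variant that avoids the stray '.fsa' file, by
chomping first and skipping the empty leading record. The function name
split_fasta, the sample data and the temporary output directory are
assumptions made for this illustration only. A sequence ID containing
a path separator would still need extra handling.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write each ">"-delimited record from $in_fh to "$dir/<id>.fsa".
# Returns the list of file names written.
sub split_fasta {
    my ($in_fh, $dir) = @_;
    local $/ = '>';                       # one FASTA record per read
    my @written;
    while (my $rec = <$in_fh>) {
        chomp $rec;                       # chomp strips a trailing $/, here '>'
        next unless $rec =~ /^\s*(\S+)/;  # the first record is empty: skip it
        my $id   = $1;                    # first word of the header line
        my $path = File::Spec->catfile($dir, "$id.fsa");
        open my $out, '>', $path or do { warn "$path: $!\n"; next };
        print {$out} '>', $rec;           # put the leading '>' back
        close $out or warn "$path: $!\n";
        push @written, "$id.fsa";
    }
    return @written;
}

# Made-up sample input, read through an in-memory filehandle.
my $fasta = ">seq1 first\nACGT\n>seq2\nGGCC\n";
open my $in, '<', \$fasta or die "in-memory open failed: $!";
my $dir   = tempdir(CLEANUP => 1);
my @files = split_fasta($in, $dir);
print "@files\n";
```

With the sample input this writes seq1.fsa and seq2.fsa and nothing
else, because the empty first record no longer reaches open().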

Regards

M.


------------------------------

Date: Sat, 08 Nov 2008 12:32:31 -0800
From: Tim Greer <tim@burlyhost.com>
Subject: Re: Split a multi-sequence file into individual files
Message-Id: <zHmRk.111$Cn3.10@newsfe20.iad>

Mirco Wahab wrote:

>> perldoc -f chomp
>> 
>> Chomp removes any newline, if one exists (which it probably would on
>> <>).
> 
> No, it doesn't. It removes the $/, which is
> here the '>'.

My newsreader is interpreting / / and <> for some reason (and I'm not
seeing what I should be seeing), so I didn't see all of the code for
what it was, I guess.  I saw while (<>) { chomp; ... } and hence my
reply. Disregard if it wasn't relevant after all.
-- 
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 1972
***************************************

