[25585] in Perl-Users-Digest


Perl-Users Digest, Issue: 7829 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 25 06:05:53 2005

Date: Fri, 25 Feb 2005 03:05:23 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 25 Feb 2005     Volume: 10 Number: 7829

Today's topics:
        @data = @items{@fields} construct after a foreach - whe <strangeuser@strangeuser.com>
    Re: @data = @items{@fields} construct after a foreach - <nobull@mail.com>
    Re: [perl-python] generic equivalence partition <eppstein@ics.uci.edu>
    Re: [perl-python] generic equivalence partition <eppstein@ics.uci.edu>
        create thread from within another? <gargoyle@no.spam>
    Re: cubic root subroutine (John M. Gamble)
    Re: cubic root subroutine <do-not-use@invalid.net>
    Re: Division/math bug in perl? (Anno Siegel)
    Re: Division/math bug in perl? <nospam-abuse@ilyaz.org>
    Re: Division/math bug in perl? <do-not-use@invalid.net>
    Re: Having Trouble Recursing a Function <zen13097@zen.co.uk>
    Re: How to generate random emails? <infos@leocharre.com>
    Re: How to tell if a subroutine arg is a constant <vek@station02.ohout.pharmapartners.nl>
    Re: Intercepting data flow between 2 apps <hackeras@gmail.com>
    Re: OOP Tutorial <leslievNO@SPAMicoc.co.za>
    Re: OOP Tutorial <abigail@abigail.nl>
        Parsing a chemical formal <luotao@kammer.uni-hannover.de>
    Re: Parsing a chemical formal <abigail@abigail.nl>
    Re: Parsing a chemical formal <mark.clements@kcl.ac.uk>
    Re: Parsing a chemical formal <newspost@kohombanDELETE.net>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 25 Feb 2005 01:02:00 -0500
From: Henry Lenzi <strangeuser@strangeuser.com>
Subject: @data = @items{@fields} construct after a foreach - where can I find a literature reference to this?
Message-Id: <87bra9w52f.fsf@Knoppix.i-did-not-set--mail-host-address--so-shoot-me>


# Hi all --

# I'm unable to find an explanation, in the Perl literature, of the
# @data = @items{@fields} construct. In fact, I can't find it mentioned at all!

# In a previous thread (Message-Id: <slrnd1qnn6.7fp.tadmc@magna.augustmail.com>)
# the following script sprang from the discussion:

#!/usr/bin/perl
use warnings;
use strict;

my  @array;

_writeMiniCard();

sub _writeMiniCard
{
    print "Choice (1) (2) \n";
    my $choice = <STDIN>;

    if ($choice == 1) {
        _enterCard();
    }
    elsif ($choice == 2) {
        print "choice 2\n";    # stub
    }
}

# Thanks to Tad McClellan for this
sub _enterCard { 
      my %items;
      my @fields = ( 'AUTHOR(S)= ',  'TITLE= ',   'KEYWORDS= ',
                       'SOURCE/JOURNAL= ', 'VOL= ',   'YEAR= ',
                       'PAGES= ',   'EDITOR= ',   'ETC= ',   'X-REF= '
                     );

      foreach my $prompt ( @fields ) {
         print "$prompt ";
         $items{$prompt} = $prompt . <STDIN>; # Aah! I Got it!
      }

# What came next I found truly intriguing!

# Wow! It will iterate ?! It blew my mind!

# How come this happened ?!?! 

      
     my  @array = @items{ @fields };
      
      print "\n @array\n";
  }

# Here's my tentative explanation (*please* correct me if I'm wrong):

# 1) The foreach expression offers a list context. The control variable
# ('$prompt', in this case) receives each element of the @fields array,
# and through '$items{$prompt} = $prompt . <STDIN>' the lexically scoped
# hash '%items' is filled. That is, the result is in list context;

# 2) Then, 'my @array = @items{ @fields }'... What is going on is an
# assignment operation, and pairs obtained from '@items{ @fields }'
# fill up the '@array'.

# I *think* I might have justified the functioning of the code.
# However, I would be just *so* grateful if anyone could point
# out a source in the Perl "literature" that specifically mentions
# or justifies the iteration form '@array = @items{ @fields }'.

# Thanks very much,

# best regards,

#           Henry Lenzi





------------------------------

Date: Fri, 25 Feb 2005 09:05:38 +0000
From: Brian McCauley <nobull@mail.com>
Subject: Re: @data = @items{@fields} construct after a foreach - where can I find a literature reference to this?
Message-Id: <cvmpf0$k3u$1@sun3.bham.ac.uk>



Henry Lenzi wrote:

> # I'm unable to find a justification,  in the Perl literature, of the @data=@items{@fields}
> # construct. In fact, I can't find it! 

It is called a hash slice.

http://perldoc.perldrunks.org/perldata.html#Slices

> # In a previous message, Message-Id: <slrnd1qnn6.7fp.tadmc@magna.augustmail.com>
> # the following script sprung from the discussion:
> 
> #!/usr/bin/perl
> use warnings;
> use strict;
> 
> my  @array;
> 
> _writeMiniCard();
> 
> sub _writeMiniCard 
> {
>     print "Choice (1) (2) \n";
>     my $choice = <STDIN>;
> 
>     if ($choice == 1) {
> 	_enterCard(); }
>     elsif ($choice == 2) {
>     print "choice 2\n";#stub  
>     }}
> 
> # Thanks to Tad McClellan for this
> sub _enterCard { 
>       my %items;
>       my @fields = ( 'AUTHOR(S)= ',  'TITLE= ',   'KEYWORDS= ',
>                        'SOURCE/JOURNAL= ', 'VOL= ',   'YEAR= ',
>                        'PAGES= ',   'EDITOR= ',   'ETC= ',   'X-REF= '
>                      );
> 
>       foreach my $prompt ( @fields ) {
>          print "$prompt ";
>          $items{$prompt} = $prompt . <STDIN>; # Aah! I Got it!
>       }
> 
> # What came next I found truly intriguing!
> 
> # Wow! It will iterate ?! It blew my mind!
> 
> # How come this happened ?!?! 
> 
>       
>      my  @array = @items{ @fields };
>       
>       print "\n @array\n";
>   }
> 
> # Here's my tentative explanation (*please* correct me if I'm wrong):
> 
> # 1) The foreach expression offers a list context. The control variable
> # ('$prompt', in this case) receives each element of the @fields array,
> # and through '$items{$prompt} =  $prompt . <STDIN>' the lexically scoped
> # hash '%items' is filled. That is, the result is in list context;

Up to the last sentence that was right.  The last sentence, as far as I 
can see, makes no sense whatsoever.


> # 2) Then, 'my @array = @items{ @fields }'... What is going on is an
> # assignment operation, and pairs obtained from '@items{ @fields }'
> # fill up the '@array'.

There are no pairs involved.  What is going on is a mapping operation. 
An associative array (called 'hash' in Perl for historical reasons) is a 
mapping.  Here you apply the mapping defined by %items to the values in 
@fields and put the outcome in @array.
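
The mapping described above can be sketched in a few lines. This is only an illustration: the keys and values are made up, and the `map` form is shown purely as an equivalent spelling of the slice.

```perl
use strict;
use warnings;

# Illustrative data only; these keys and values are not from the original script.
my %items  = (AUTHOR => 'A. Author', TITLE => 'Some Title', YEAR => 2005);
my @fields = qw(TITLE AUTHOR YEAR);

# Hash slice: look up several keys at once. The leading @ says "a list
# comes back"; the braces say "these are hash keys".
my @array = @items{@fields};

# The same lookup written out one key at a time.
my @same = map { $items{$_} } @fields;

print "@array\n";    # values come back in the order of @fields
print "@same\n";

# A slice can also be assigned to, filling several keys in one statement.
my %copy;
@copy{@fields} = @array;
print "$copy{TITLE}\n";
```

The values come back in the order of the keys given in the slice, not in the hash's internal order, which is why the card fields print in the order they were prompted for.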



------------------------------

Date: Thu, 24 Feb 2005 20:59:48 -0800
From: David Eppstein <eppstein@ics.uci.edu>
Subject: Re: [perl-python] generic equivalence partition
Message-Id: <eppstein-CDFABC.20594724022005@news.service.uci.edu>

In article <1109245733.261643.219420@f14g2000cwb.googlegroups.com>,
 "Xah Lee" <xah@xahlee.org> wrote:

> parti(aList, equalFunc)
> 
> given a list aList of n elements, we want to return a list that is a
> range of numbers from 1 to n, partition by the predicate function of
> equivalence equalFunc. (a predicate function is a function that
> takes two arguments, and returns either True or False.)

In Python it is much more natural to use ranges from 0 to n-1.
In the worst case, this is going to have to take quadratic time 
(consider an equalFunc that always returns false) so we might as well do 
something really simple rather than trying to be clever.

def parti(aList,equalFunc):
    eqv = []
    for i in range(len(aList)):
        print i,eqv
        for L in eqv:
            if equalFunc(aList[i],aList[L[0]]):
                L.append(i)
                break
        else:
            # for/else: runs only when no existing class matched
            eqv.append([i])
    return eqv

If you really want the ranges to be 1 to n, add one to each number in 
the returned list-of-lists.
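
For readers on the Perl side of this cross-posted thread, the same quadratic approach might look roughly like the sketch below. It is a hypothetical port, not code from the thread: the name `parti` is reused for convenience, the sample data is made up, and a labelled `next` stands in for Python's for/else.

```perl
use strict;
use warnings;

# Hypothetical Perl port of the quadratic approach described above.
# Returns a reference to a list of array refs holding 0-based indices.
sub parti {
    my ($list, $equal) = @_;
    my @classes;
    INDEX: for my $i (0 .. $#$list) {
        for my $class (@classes) {
            # Compare against the first member of each existing class.
            if ($equal->($list->[$i], $list->[ $class->[0] ])) {
                push @$class, $i;
                next INDEX;    # plays the role of Python's break + for/else
            }
        }
        push @classes, [$i];   # no class matched: start a new one
    }
    return \@classes;
}

# Usage: group strings that are equal when case is ignored.
my $classes = parti([qw(a b A c B)], sub { lc $_[0] eq lc $_[1] });
print join(' ', map { '[' . join(',', @$_) . ']' } @$classes), "\n";
# prints: [0,2] [1,4] [3]
```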

-- 
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/


------------------------------

Date: Thu, 24 Feb 2005 21:29:12 -0800
From: David Eppstein <eppstein@ics.uci.edu>
Subject: Re: [perl-python] generic equivalence partition
Message-Id: <eppstein-FBAF5D.21291224022005@news.service.uci.edu>

In article <eppstein-CDFABC.20594724022005@news.service.uci.edu>,
 David Eppstein <eppstein@ics.uci.edu> wrote:

> def parti(aList,equalFunc):
>     eqv = []
>     for i in range(len(aList)):
>         print i,eqv
>         for L in eqv:
>             if equalFunc(aList[i],aList[L[0]]):
>                 L.append(i)
>                 break;
>         else:
>             eqv.append([i])

Um, take out the print, that was just there for me to debug my code.

-- 
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/


------------------------------

Date: Fri, 25 Feb 2005 02:46:29 GMT
From: gargoyle <gargoyle@no.spam>
Subject: create thread from within another?
Message-Id: <9kwTd.17720$hd6.8694@bignews1.bellsouth.net>

I'm using ActivePerl 5.8.6.811 and have a question regarding threads.

What are the implications of starting a thread from within another (as
opposed to starting all threads from the main thread, ie. tid 0)?  Will
any nasty bugs or strange issues come to bite me?  In other words, was
the ithreads model written to handle this sort of behavior?

The reason I'm asking is because I'm trying to find a way to keep memory
footprint to a minimum, and yet be able to spawn new threads at any
given time.  I figured a "dispatcher" thread could be started as early
as possible (before most modules are loaded, data structures filled,
etc.) and then when a new worker thread is needed, main could signal the
dispatcher (via a queue) and tell it to create a new thread with some
specific parameters (based on the job that needs to be accomplished at
that given moment).

Will this work, or is it just some crazy fantasy?


------------------------------

Date: Fri, 25 Feb 2005 08:27:15 +0000 (UTC)
From: jgamble@ripco.com (John M. Gamble)
Subject: Re: cubic root subroutine
Message-Id: <cvmnh3$6a0$2@e250.ripco.com>

In article <Xns96054F10B78DCasu1cornelledu@127.0.0.1>,
A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
>
>On the other hand, if you just wanted to report results in integers:
>
>D:\> perl -e "printf q{%.0f}, ((64)**(1/3))"
>4
>

I'm still at a loss as to why Math::Complex's cbrt() isn't
considered sufficient.

c:\users\jgamble>perl -MMath::Complex -le "print cbrt(64);"
4


-- 
	-john

February 28 1997: Last day libraries could order catalogue cards
from the Library of Congress.


------------------------------

Date: 25 Feb 2005 09:59:55 +0100
From: Arndt Jonasson <do-not-use@invalid.net>
Subject: Re: cubic root subroutine
Message-Id: <yzdk6oxxbec.fsf@invalid.net>


jgamble@ripco.com (John M. Gamble) writes:
> In article <Xns96054F10B78DCasu1cornelledu@127.0.0.1>,
> A. Sinan Unur <1usa@llenroc.ude.invalid> wrote:
> >
> >On the other hand, if you just wanted to report results in integers:
> >
> >D:\> perl -e "printf q{%.0f}, ((64)**(1/3))"
> >4
> >
> 
> I'm still at a loss as to why Math::Complex's cbrt() isn't
> considered sufficient.
> 
> c:\users\jgamble>perl -MMath::Complex -le "print cbrt(64);"
> 4

Somehow I missed your suggestion when I continued to criticize the
"**(1/3)" solution. Sorry about that. 'cbrt' seems to take care of
integerness (whether that is guaranteed I couldn't see from a quick
glance in the manual):

    arndt ~/perl 4359> perl -MMath::Complex -le "print (cbrt(64)==4);"
    1
    arndt ~/perl 4360> perl -le "print (((64)**(1/3))==4)"           

    arndt ~/perl 4361> 
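
The empty output from the second command is what one would expect from floating point: 1/3 has no exact binary representation, so the power can land a hair away from 4. If an exact integer answer is what matters, one possible approach (a sketch with a hypothetical helper name, `integer_cbrt`, not something from Math::Complex) is to round the estimate and then verify it with integer arithmetic:

```perl
use strict;
use warnings;

# Hypothetical helper: exact integer cube root of $n,
# or undef if $n is not a perfect cube.
sub integer_cbrt {
    my ($n) = @_;
    my $sign = $n < 0 ? -1 : 1;
    # Round the floating-point estimate to the nearest integer ...
    my $r = sprintf '%.0f', abs($n) ** (1/3);
    # ... then confirm it by cubing, which is exact for integers of this size.
    return $r * $r * $r == abs($n) ? $sign * $r : undef;
}

for my $n (64, 27, -8, 10) {
    my $root = integer_cbrt($n);
    print "$n: ", (defined $root ? $root : 'not a perfect cube'), "\n";
}
```

The comparison is done on the cubed integer rather than on the floating-point root, so it does not depend on how the platform rounds the power.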


------------------------------

Date: 25 Feb 2005 09:27:15 GMT
From: anno4000@lublin.zrz.tu-berlin.de (Anno Siegel)
Subject: Re: Division/math bug in perl?
Message-Id: <cvmr1j$mjl$1@mamenchi.zrz.TU-Berlin.DE>

Alfred Z. Newmane <a.newmane.remove@eastcoastcz.com> wrote in comp.lang.perl.misc:
> darkon wrote:

> > But the integer part of -2.6 is -2, not -3.
> 
> Not mathematically it isn't.

Mathematics is what mathematicians define it to be.

In particular, the int function is rarely used in mathematics (the
floor and ceiling functions are).  There is no binding convention
how it would have to be defined.

> It should be -3. Think of it like this. The 
> int part of 2.6 is 2, which is the /lowest/ number before the next 
                                     ^^^^^^^
                                     highest

> integer on the number line. Applying this to -2.6, the /lowest/ number 
> before the next integer is -3.

It is also the integer whose absolute value is maximal below or equal
to (the absolute value of) 2.6.  Apply that to -2.6, and the result is -2.
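
The competing definitions are easy to compare side by side. The sketch below uses Perl's built-in `int` together with `floor` and `ceil` from the standard POSIX module; the two sample values are the ones under discussion.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# int truncates toward zero; floor rounds down; ceil rounds up.
for my $x (2.6, -2.6) {
    printf "x = %4.1f  int = %2d  floor = %2d  ceil = %2d\n",
        $x, int($x), floor($x), ceil($x);
}
```

For 2.6, int and floor agree on 2. For -2.6 they differ: int gives -2 (truncation toward zero) while floor gives -3.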

Anno


------------------------------

Date: Fri, 25 Feb 2005 09:45:21 +0000 (UTC)
From:  Ilya Zakharevich <nospam-abuse@ilyaz.org>
Subject: Re: Division/math bug in perl?
Message-Id: <cvms3h$2nrg$1@agate.berkeley.edu>

[A complimentary Cc of this posting was sent to
Snail
<snail@localhost.com>], who wrote in article <cvllkr$s7v$1@news.astound.net>:

> I think I was a little misleading then, and I'm sorry. I knew that int 
> does what it does. What I was really getting at was why languages like 
> Perl, c, c++, etc, do this sort of division in the first place?

I have no idea why C chose this (IMO, completely broken) semantics
of convert-to-integer.  But since C did it, C++ had to do it too.

Now, why did Perl do it?  Before about v5.005, Perl was just a very
shallow wrapper around C w.r.t. numeric stuff.  And when I got bold
enough to change the semantics of numerics, backward compatibility
kicked in.  One of the arguments (IIRC, by tchrist) was that code like

  my $digit = int rand 10;

was legitimate Perl, so it would not be very nice to suddenly make
it produce 10 as a possible answer.

Hope this helps,
Ilya


------------------------------

Date: 25 Feb 2005 10:53:00 +0100
From: Arndt Jonasson <do-not-use@invalid.net>
Subject: Re: Division/math bug in perl?
Message-Id: <yzdd5upx8xv.fsf@invalid.net>


"Snail" <snail@localhost.com> writes:
> [...]
> I only wanted to start a discussion on 
> why langs like Perl, c, c++ (and java?) do this sort of division. I am 
> thinking there has to be some logical reason why languages do this.

I think comp.programming may be the right group for this, but I haven't
read it for a long time, so I don't know if it's still a useful group.

I'm sure the question has been asked many times in comp.lang.c, and I
would be surprised if Chris Torek hasn't given an excellent answer at
some time.

C++ inherited the C semantics, of course. Perl maybe took the C semantics
because C was the predominant language in the environment where Perl
was developed (but this is speculation on my part).

For some languages, all the relevant functions exist: truncating upward,
downward and towards zero. The first two are often called 'ceiling' and
'floor' when they exist.

It's also historically ill-defined what a 'mod' function does when its
second argument is negative. I think that C, at least originally, took
the view that the C 'mod' operator ('%') did whatever the processor
instruction did, which was the expected thing for positive
arguments, and something you should not rely upon for negative arguments.
I don't know what the most recent C standard says about the subject.
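
For comparison, Perl's own `%` operator is defined for negative operands: per perlop, the result takes the sign of the right-hand operand (unless `use integer` is in effect, in which case the C compiler's behaviour applies). A small sketch that prints all four sign combinations:

```perl
use strict;
use warnings;

# Perl's % (without "use integer"): the result has the sign of the
# right operand, so it is not plain C-style remainder.
for my $pair ([7, 3], [-7, 3], [7, -3], [-7, -3]) {
    my ($m, $n) = @$pair;
    printf "%2d %% %2d = %2d\n", $m, $n, $m % $n;
}
```

So `-7 % 3` is 2 and `7 % -3` is -2, which is the floor-style convention rather than truncation.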


------------------------------

Date: 25 Feb 2005 07:34:15 GMT
From: Dave Weaver <zen13097@zen.co.uk>
Subject: Re: Having Trouble Recursing a Function
Message-Id: <421ed4f7$0$32613$db0fefd9@news.zen.co.uk>

On Thu, 24 Feb 2005 22:40:44 GMT, Mark Healey <die@spammer.die> wrote:
>  Can anyone tell me why the following only goes one level deep in the
>  directory tree?

>  	opendir(DIRHANDLE, $dir);

Further to Jim's response, if you'd checked the return value of
opendir, along with $!, Perl would have given you a large clue.

*Always* check the return from open/opendir !
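
A minimal sketch of what that check can look like in a recursive walk follows. Only the single opendir line of Mark's code is quoted here, so everything else (the helper name `list_files`, the use of File::Spec, skipping symlinked directories) is an assumption made for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every file below $dir, recursing into
# subdirectories, and die with the system error if a directory cannot be opened.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my @files;
    for my $entry (sort @entries) {
        # readdir returns bare names, so rebuild the path before testing it.
        my $path = File::Spec->catfile($dir, $entry);
        if (-d $path && !-l $path) { push @files, list_files($path) }
        else                       { push @files, $path }
    }
    return @files;
}

# Usage: perl script.pl /some/directory
if (@ARGV) {
    print "$_\n" for list_files($ARGV[0]);
}
```

Using a lexical handle (`my $dh`) rather than a bareword like DIRHANDLE also keeps each level of the recursion from clobbering the handle of its caller.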



------------------------------

Date: Fri, 25 Feb 2005 03:29:35 GMT
From: Jean Paul Sartre <infos@leocharre.com>
Subject: Re: How to generate random emails?
Message-Id: <pan.2005.02.25.03.25.21.988439@leocharre.com>

On Thu, 24 Feb 2005 10:35:38 -0800, kongyew wrote:

> Hi,
> 
>         I would like to test my email server. How can i can generate
> random emails with random email address? Does anyone knows any perl
> modules that does it?
> 
> Thanks.
> kongyew@w-manager.com

Sounds like a pretty simple script- why would someone turn that into a
module?

Leo Charre
http://www.b3thm00n.com



------------------------------

Date: 25 Feb 2005 08:39:23 GMT
From: Villy Kruse <vek@station02.ohout.pharmapartners.nl>
Subject: Re: How to tell if a subroutine arg is a constant
Message-Id: <slrnd1tp1r.63m.vek@station02.ohout.pharmapartners.nl>

On Thu, 24 Feb 2005 17:22:03 +0100,
    Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:


> jonnytheclown wrote:
>> Is there an easy way to test if a subroutine argument is a constant.
>> 
>> I know it can be done by trying to assign to the specific element in @_
>> within an eval and testing for an exception but this is a tad ugly -
>> not to mention inefficient.
>
> Not sure what you mean by "constant" in this context. Can't you simply 
> use the ref() function?
>


For example, something like this, using a reference to a literal:

#!/usr/bin/perl
sub func {
	print "@_\n";
	${$_[0]} = 5;
}
my $x = \1;

func $x;

__END__

Modification of a read-only value attempted at - line 4.
SCALAR(0x80cd8d4)
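
If the goal is to test the argument without attempting the assignment, one option is `readonly` from the core Scalar::Util module, which reports whether a scalar is flagged read-only. The sketch below is only an illustration (the helper name `arg_is_readonly` is made up), and it relies on `$_[0]` being an alias to the caller's argument; a literal passed directly is normally reported as read-only, but that is worth confirming on the Perl version in use.

```perl
use strict;
use warnings;
use Scalar::Util qw(readonly);

# Hypothetical helper: true if the caller's first argument is flagged read-only.
sub arg_is_readonly {
    return readonly($_[0]);    # $_[0] aliases the caller's argument
}

my $x = 1;
print 'variable: ', (arg_is_readonly($x) ? 'read-only' : 'writable'), "\n";
print 'literal:  ', (arg_is_readonly(1)  ? 'read-only' : 'writable'), "\n";
```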


Villy


------------------------------

Date: Fri, 25 Feb 2005 09:51:12 +0000 (UTC)
From: Richard Anderson <hackeras@gmail.com>
Subject: Re: Intercepting data flow between 2 apps
Message-Id: <Xns960878E3D72A3hackerasgmailcom@194.177.210.210>

"A. Sinan Unur" <1usa@llenroc.ude.invalid> wrote in 
news:Xns9607A618AE680asu1cornelledu@127.0.0.1:

> Your question is off-topic until you have some Perl to post.

OK, I'll post some Perl as soon as I learn about Perl socket 
programming, but is my idea workable?

I mean having the sniffer log the traffic for a specific IP to a .txt file, 
then altering the info I want in it and resubmitting it to where it has to 
go.

Is this what my Perl program is supposed to do?


------------------------------

Date: Fri, 25 Feb 2005 10:25:40 +0200
From: Leslie Viljoen <leslievNO@SPAMicoc.co.za>
Subject: Re: OOP Tutorial
Message-Id: <6K6dnRS1zceefIPfRVn-vA@is.co.za>

Abigail wrote:
> Leslie Viljoen (leslievNO@SPAMicoc.co.za) wrote on MMMMCXCV September
> MCMXCIII in <URL:news:3Imdnb-IjMILooPfRVn-uw@is.co.za>:
> //  
> //  We must bow to the demands of the masses:
> //  http://www.icon.co.za/~mobeus/easyoop.pdf.zip
> 
> 
> wget failed to get it, repeatedly. Timeouts.
> 
> 
> Abigail

Try again? Works here. But then I am in South Africa.
Maybe you were downloading it as soon as I sent the mail
and it was still busy uploading.



------------------------------

Date: 25 Feb 2005 09:21:04 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: OOP Tutorial
Message-Id: <slrnd1trg0.td3.abigail@alexandra.abigail.nl>

Leslie Viljoen (leslievNO@SPAMicoc.co.za) wrote on MMMMCXCVI September
MCMXCIII in <URL:news:6K6dnRS1zceefIPfRVn-vA@is.co.za>:
@@  Abigail wrote:
@@ > Leslie Viljoen (leslievNO@SPAMicoc.co.za) wrote on MMMMCXCV September
@@ > MCMXCIII in <URL:news:3Imdnb-IjMILooPfRVn-uw@is.co.za>:
@@ > //  
@@ > //  We must bow to the demands of the masses:
@@ > //  http://www.icon.co.za/~mobeus/easyoop.pdf.zip
@@ > 
@@ > 
@@ > wget failed to get it, repeatedly. Timeouts.
@@ > 
@@ > 
@@ > Abigail
@@  
@@  Try again? Works here. But then I am in South Africa.
@@  Maybe you were downloading it as soon as I sent the mail
@@  and it was still busy uploading.


Got it using one IP address, still connection time outs from another.


Abigail
-- 
A perl rose:  perl -e '@}-`-,-`-%-'


------------------------------

Date: 25 Feb 2005 09:09:16 GMT
From: Luotao Fu <luotao@kammer.uni-hannover.de>
Subject: Parsing a chemical formal
Message-Id: <slrnd1tqpb.502.luotao@milliways.kammer.uni-hannover.de>

Hi All,
This is my first post to this group, so sorry for any possible stupidity :-)
I have been writing a Perl program for several days. It contains a small
routine that should parse a chemical formula and return the name and
count of each atom in the material as an array (or a hash). My code
looks like this:

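my @atoms;
# split returns a leading empty field here, so drop empty strings
my @literals = grep { length } split /([A-Z])/, $molecule;

for (my $i = 0; $i <= $#literals; $i++) {
        my @atom;
        print "Literal: ", $literals[$i], "\n";
        push(@atom, $literals[$i]);
        # append the lower-case/digit tail, if there is one
        if ($i < $#literals && $literals[$i+1] !~ /[A-Z]/) {
                push(@atom, $literals[$i+1]);
                $i++;
        }
        push(@atoms, join("", @atom));
}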

$molecule contains the formula (e.g. H2O, FeCl3 or CaCl). The first
letter of every element is written in upper case. As you can see, I first
split $molecule on upper-case letters, which means FeCl3 turns into
{F,e,C,l3}. Then I scan the split list, which is stored in the array
@literals, for capital letters; every capital letter is pushed onto a
temporary array. If the following item in the array is not in upper case,
which means that the name of the atom contains more than one letter, it
is pushed onto the same temporary array as well, which is later joined
and put into the output array. The final result for the formula H2O
should be {H2,O}, for FeCl3 {Fe,Cl3}, and so on.

This works so far, but I'm far from satisfied with this solution. There
must be better ways to solve it, with a more intelligent regexp and so
on. But I'm not very familiar with regexps in Perl, so I can't think
of any better solution.

Does anyone have an idea how I can write this routine more elegantly?

Thanx A lot
Cheers
Luotao Fu


------------------------------

Date: 25 Feb 2005 09:34:12 GMT
From: Abigail <abigail@abigail.nl>
Subject: Re: Parsing a chemical formal
Message-Id: <slrnd1ts8k.td3.abigail@alexandra.abigail.nl>

Luotao Fu (luotao@kammer.uni-hannover.de) wrote on MMMMCXCVI September
MCMXCIII in <URL:news:slrnd1tqpb.502.luotao@milliways.kammer.uni-hannover.de>:
::  Hi All,
::  This is my first post to this group, so sorry for any possible stupidity :-)
::  I have been writing a Perl program for several days. It contains a small
::  routine that should parse a chemical formula and return the name and
::  count of each atom in the material as an array (or a hash). My code
::  looks like this:
::  
::  my @atoms;
::  # split returns a leading empty field here, so drop empty strings
::  my @literals = grep { length } split /([A-Z])/, $molecule;
::  
::  for (my $i = 0; $i <= $#literals; $i++) {
::          my @atom;
::          print "Literal: ", $literals[$i], "\n";
::          push(@atom, $literals[$i]);
::          # append the lower-case/digit tail, if there is one
::          if ($i < $#literals && $literals[$i+1] !~ /[A-Z]/) {
::                  push(@atom, $literals[$i+1]);
::                  $i++;
::          }
::          push(@atoms, join("", @atom));
::  }
::  
::  $molecule contains the formula (e.g. H2O, FeCl3 or CaCl). The first
::  letter of every element is written in upper case. As you can see, I first
::  split $molecule on upper-case letters, which means FeCl3 turns into
::  {F,e,C,l3}. Then I scan the split list, which is stored in the array
::  @literals, for capital letters; every capital letter is pushed onto a
::  temporary array. If the following item in the array is not in upper case,
::  which means that the name of the atom contains more than one letter, it
::  is pushed onto the same temporary array as well, which is later joined
::  and put into the output array. The final result for the formula H2O
::  should be {H2,O}, for FeCl3 {Fe,Cl3}, and so on.
::  
::  This works so far, but I'm far from satisfied with this solution. There
::  must be better ways to solve it, with a more intelligent regexp and so
::  on. But I'm not very familiar with regexps in Perl, so I can't think
::  of any better solution.

I wouldn't use split, just parse what you want to keep. What you want is
very simple: exactly one capital letter, followed by zero or more lower
case letters, followed by zero or more digits. Written as a regex, this
is:

    /[A-Z][a-z]*[0-9]*/;

Now you want to keep each match. To do this, put a pair of parens around 
what you want to match (all in this case), and a g after the regex. Then
do the match in list context, and collect the results:

    my @atoms = $molecule =~ /([A-Z][a-z]*[0-9]*)/g;

In this case, that is, using //g in list context, and the only parens being
the ones around the entire regex, I may omit the parens:

    my @atoms = $molecule =~ /[A-Z][a-z]*[0-9]*/g;

If I really wanted to do it with split, I'd do it like this:

    my @atoms = split /(?=[A-Z][a-z]*[0-9]*)/ => $molecule;
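
To see the list-context match in action, here is a small self-contained sketch. The wrapper `atoms` and the second step that separates symbol from count are additions for illustration only; a missing count is assumed to mean 1.

```perl
use strict;
use warnings;

# Thin wrapper around the //g match in list context (name is hypothetical).
sub atoms {
    my ($molecule) = @_;
    return $molecule =~ /[A-Z][a-z]*[0-9]*/g;
}

for my $molecule (qw(H2O FeCl3 H2SO4)) {
    my @atoms = atoms($molecule);
    # Optional second step: split each atom into symbol and count.
    my @pairs = map {
        my ($symbol, $n) = /^([A-Z][a-z]*)([0-9]*)$/;
        [ $symbol, $n eq '' ? 1 : $n ];
    } @atoms;
    print "$molecule: @atoms  => ",
        join(', ', map { "$_->[0] x $_->[1]" } @pairs), "\n";
}
```

For FeCl3 the match yields Fe and Cl3, which the second step turns into Fe x 1 and Cl x 3.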



Abigail
-- 
$" = "/"; split // => eval join "+" => 1 .. 7;
*{"@_"} = sub {foreach (sort keys %_) {print "$_ $_{$_} "}};
%_ = (Just => another => Perl => Hacker); &{%_};


------------------------------

Date: Fri, 25 Feb 2005 10:43:57 +0100
From: Mark Clements <mark.clements@kcl.ac.uk>
Subject: Re: Parsing a chemical formal
Message-Id: <421ef359$1@news.kcl.ac.uk>

Luotao Fu wrote:
> Hi All,
> This is my first post to this group, so sorry for any possible stupidity :-)
> I have been writing a Perl program for several days. It contains a small
> routine that should parse a chemical formula and return the name and
> count of each atom
<snip>

> 
> This works so far, but I'm far from satisfied with this solution. There
> must be better ways to solve it, with a more intelligent regexp and so
> on. But I'm not very familiar with regexps in Perl, so I can't think
> of any better solution.

You need to check out

man perlre

But you may want to check out Chemistry::FormulaPattern 
(search.cpan.org) if this is a real-life problem rather than just a 
programming exercise.

Try this (is pretty rough and am sure I have missed edge cases) to get 
you started:

bob 881 $ cat testformula.pl
#!/usr/local/bin/perl

use strict;
use warnings;

use Data::Dumper;

my $formula = shift;

my @elements = ();

while($formula =~ s/([A-Z][a-z]?[0-9]*)//){
     push @elements, $1;
};
print Dumper \@elements;

bob 882 $ ./testformula.pl H2SO4
$VAR1 = [
           'H2',
           'S',
           'O4'
         ];


------------------------------

Date: Fri, 25 Feb 2005 18:52:53 +0800
From: GreenLeaf <newspost@kohombanDELETE.net>
Subject: Re: Parsing a chemical formal
Message-Id: <388eekF5jl6hiU1@individual.net>

Abigail wrote:

> I wouldn't use split, just parse what you want to keep. What you want is
> very simple: exactly one capital letter, followed by zero or more lower
> case letters, followed by zero or more numbers. Written as a regex, this
> is:

to OP:

If this is an exercise, then considering the real-world scenario, you might 
want to use the rule that an element symbol is always exactly one 
capital letter followed by _exactly zero or one lowercase letter_, with the 
exception of elements that start with Uu. I'm assuming here that yours 
is a program for learning, since you said you have been writing it 'since days' 
:). Taking these facts into account will make your regex more robust.

You might also want to consider radicals (such as hydroxyl, -OH), 
because they are sure to lead to incorrect results if you just ignore 
parentheses: for instance Fe(OH)3. You can handle this by first capturing 
each parenthesized group and the number that follows it, then running the 
same simple rules that you used for the no-parentheses case on the token 
within each set of parentheses. Something along the lines of

   my @atoms = /((?:\(.+\)|Uu.|[A-Z][a-z]?)\d*)/g;

would work here.
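
The two-pass idea can be sketched as below. This is an illustration under some assumptions rather than a finished parser: the function name `count_atoms` is made up, the group pattern uses `[^()]+` instead of `.+` so that two separate groups in one formula are not swallowed by a single greedy match, nested parentheses are not handled, and a missing number is taken to mean 1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping element symbol to atom count.
sub count_atoms {
    my ($formula) = @_;
    my %count;
    # First pass: each token is a parenthesized group or a single element,
    # optionally followed by a number.
    for my $token ($formula =~ /((?:\([^()]+\)|Uu.|[A-Z][a-z]?)\d*)/g) {
        if ($token =~ /^\(([^()]+)\)(\d*)$/) {
            my ($inner, $mult) = ($1, $2 eq '' ? 1 : $2);
            # Second pass: the simple element rule, applied inside the group.
            while ($inner =~ /(Uu.|[A-Z][a-z]?)(\d*)/g) {
                $count{$1} += ($2 eq '' ? 1 : $2) * $mult;
            }
        }
        else {
            my ($element, $n) = $token =~ /^(Uu.|[A-Z][a-z]?)(\d*)$/;
            $count{$element} += ($n eq '' ? 1 : $n);
        }
    }
    return \%count;
}

my $atoms = count_atoms('Fe(OH)3');
print join(', ', map { "$_=$atoms->{$_}" } sort keys %$atoms), "\n";
# prints: Fe=1, H=3, O=3
```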

Since Abigail's post clearly gave you almost everything you need to 
know, it would be quite straightforward to implement these simple 
changes. Good luck! :)

Hope this helps,
sat


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7829
***************************************

