[25589] in Perl-Users-Digest


Perl-Users Digest, Issue: 7833 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Feb 25 21:05:38 2005

Date: Fri, 25 Feb 2005 18:05:12 -0800 (PST)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 25 Feb 2005     Volume: 10 Number: 7833

Today's topics:
    Re: [perl-python] generic equivalence partition <pf_moore@yahoo.co.uk>
    Re: Comparing huge XML Files <junnuthula@yahoo.com>
    Re: CPAN problem <xrsr@rogerware.com>
    Re: How to decode this unicode-hex string <sun_tong@users.sourceforge.net>
    Re: How to decode this unicode-hex string <Red.Grittybrick@SpamWeary.Foo>
    Re: How to decode this unicode-hex string <sun_tong@users.sourceforge.net>
    Re: How to decode this unicode-hex string <flavell@ph.gla.ac.uk>
    Re: How to generate random emails? kongyew@w-manager.com
    Re: How to NOT use utf8. <pkaluski@piotrkaluski.com>
    Re: How to NOT use utf8. <flavell@ph.gla.ac.uk>
    Re: maximum size of a hash table <postmaster@castleamber.com>
    Re: OOP Tutorial <postmaster@castleamber.com>
    Re: Parsing a chemical formal <postmaster@castleamber.com>
    Re: Pure Perl OpenSSL Library <No_4@dsl.pipex.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 25 Feb 2005 20:19:40 +0000
From: Paul Moore <pf_moore@yahoo.co.uk>
Subject: Re: [perl-python] generic equivalence partition
Message-Id: <uwtsw8k9v.fsf@yahoo.co.uk>

David Eppstein <eppstein@ics.uci.edu> writes:

> In article <1109245733.261643.219420@f14g2000cwb.googlegroups.com>,
>  "Xah Lee" <xah@xahlee.org> wrote:
>
>> parti(aList, equalFunc)
>> 
>> given a list aList of n elements, we want to return a list that is a
>> range of numbers from 1 to n, partition by the predicate function of
>> equivalence equalFunc. (a predicate function is a function that
>> takes two arguments, and returns either True or False.)
>
> In Python it is much more natural to use ranges from 0 to n-1.
> In the worst case, this is going to have to take quadratic time 
> (consider an equalFunc that always returns false) so we might as well do 
> something really simple rather than trying to be clever.

As you say, with the spec as it stands, you can't do better than
quadratic time (although it's O(n*m) where m is the number of
partitions, rather than O(n^2)).
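
A minimal Python sketch of the partition as specified, assuming equalFunc
is a true equivalence relation, so that comparing each element against one
representative per partition is enough. The name parti follows the spec;
everything else is illustrative. Each element is compared with at most m
representatives, which is where the O(n*m) bound comes from:

```python
def parti(items, equal_func):
    """Return lists of 1-based indices, one list per equivalence class."""
    partitions = []
    for index, item in enumerate(items, start=1):
        for part in partitions:
            # Compare only against the first member of each partition.
            representative = items[part[0] - 1]
            if equal_func(representative, item):
                part.append(index)
                break
        else:
            # No existing partition matched, so start a new one.
            partitions.append([index])
    return partitions


def same_letter(a, b):
    """Example predicate: letters are equal ignoring case."""
    return a.lower() == b.lower()


print(parti(list("abAcB"), same_letter))
```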

You can do a lot better if you can use a "key" function, rather than
an "equivalence" function, much as list.sort has a "key" argument, and
itertools.groupby (which is pretty close in function to this
partitioning problem) uses a key argument.
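
To illustrate the difference, here is a sketch of the same partition driven
by a key function. The name parti_by_key is hypothetical, and the sketch
assumes the keys are hashable. One dictionary lookup per element replaces
the comparisons against every partition, so the work is roughly linear in n:

```python
def parti_by_key(items, key):
    """Group 1-based indices by key(item), in order of first appearance."""
    groups = {}
    for index, item in enumerate(items, start=1):
        groups.setdefault(key(item), []).append(index)
    # Dicts keep insertion order (Python 3.7+), so partitions come out
    # in the order their first member was seen.
    return list(groups.values())


print(parti_by_key(list("abAcB"), str.lower))
```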

In fact, I'd have difficulty thinking of an example where I'd want a
partition function as specified, in Python. In Perl, it makes a lot of
sense, as Perl's array indexing operations lend themselves to slinging
round lists of indices like this. But in Python, I'd be far more
likely to use list.sort followed by itertools.groupby - sort is stable
(so doesn't alter the relative order within equivalence classes), and
groupby then picks out the equivalence classes:

>>> elements = [['x', 'x', 'x', '1'],
 ...             ['x', 'x', 'x', '2'],
 ...             ['x', 'x', 'x', '2'],
 ...             ['x', 'x', 'x', '2'],
 ...             ['x', 'x', 'x', '3'],
 ...             ['x', 'x', 'x', '4'],
 ...             ['x', 'x', 'x', '5'],
 ...             ['x', 'x', 'x', '5']]

>>> # No need to sort here, as the elements are already sorted!

>>> from itertools import groupby
>>> from operator import itemgetter
>>> from pprint import pprint
>>> pprint([(k, list(v)) for k, v in groupby(elements, itemgetter(3))])
[('1', [['x', 'x', 'x', '1']]),
 ('2', [['x', 'x', 'x', '2'], ['x', 'x', 'x', '2'], ['x', 'x', 'x', '2']]),
 ('3', [['x', 'x', 'x', '3']]),
 ('4', [['x', 'x', 'x', '4']]),
 ('5', [['x', 'x', 'x', '5'], ['x', 'x', 'x', '5']])]

If you avoid the sort, the whole thing is highly memory efficient, as
well, because by using iterators, we don't ever take a copy of the
original list.

Having cleverly redefined the question so that it fits the answer I
wanted to give, I'll shut up now :-)

Paul.
-- 
To attain knowledge, add things every day; to attain wisdom, remove
things every day. -- Lao-Tse


------------------------------

Date: 25 Feb 2005 14:00:55 -0800
From: "junnuthala" <junnuthula@yahoo.com>
Subject: Re: Comparing huge XML Files
Message-Id: <1109368855.241274.163620@g14g2000cwa.googlegroups.com>


nospam@geniegate.com wrote:
> In: <1109294676.165910.32400@g14g2000cwa.googlegroups.com>,
> "junnuthala" <junnuthula@yahoo.com> wrote:
> >Thanks for all the replies.
> >
> >But for a 6MB XML file, having more than 300,000 elements, XML::Parser
> >module is taking almost 35 minutes to get the parsed result as a tree.
> >
> >Any suggestions why the XML::Parser is taking so much time to parse a
> >moderately big file?
>
> You might want it in event-driven mode, I don't know about 300,000 elements,
> but I've seen it saw through very large XML documents at blazing speed using
> the event driven model. (especially in cases where you're only interested in
> the attributes, but that's probably not the case here)
>
> Here's a hint for speed: Only use the callbacks you actually need.
>
> Listening to an event will cause it to jump out of its compiled code and into
> your perl code. Leaving callbacks undefined (unless you really need them) will
> avoid this step.
>
> Try taking a pass at it w/out any callbacks turned on, then introduce your
> callbacks to find the bottlenecks.
>
> If you really need in memory trees, could you maybe break the document down
> into several smaller ones? It *might* be faster to invent your own tree
> structures in this case, something optimized for read-only access. (I've done
> this before but it's kind of time consuming, really only useful in extreme
> cases, like if you need to compare over and over and over)
>
> Jamie
> --
> http://www.geniegate.com                    Custom web programming
> guhzo_42@lnubb.pbz (rot13)                User Management Solutions


The bottleneck is not in XML::Parser itself when I use the "Stream" style,
which simply echoes the tags back in XML format.

But when I use the "Tree" style, which processes each tag into a tree, I
get a long delay.

I guess I have to use the "Stream" style and write my own callback
functions for StartTag, EndTag, StartDocument and EndDocument.

Does anyone have any suggestions on what would be the fastest way?
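
The advice quoted above (register only the callbacks you need) can be
sketched in Python with xml.parsers.expat, which binds the same Expat
library that XML::Parser is built on. This is only an illustration of the
event-driven style, not a benchmark; count_elements and the sample
document are made up for the example. Only a start-tag handler is
installed, so no tree is built and no other events reach Python code:

```python
import xml.parsers.expat


def count_elements(xml_bytes):
    """Count start tags per element name using a single callback."""
    counts = {}

    def on_start(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    # Text, end-tag and other handlers are deliberately left unset.
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return counts


sample = b"<opt><tag1 attr1='value1'/><tag1 attr1='value2'/></opt>"
print(count_elements(sample))
```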



------------------------------

Date: Fri, 25 Feb 2005 21:02:06 GMT
From: roger <xrsr@rogerware.com>
Subject: Re: CPAN problem
Message-Id: <Xns960884A6669C8rsrrogerwarecom@207.225.159.8>


Well, yes I do.  But it's the same (NAT) router that sits there for
everything, and through which I can do from the shell things like

  ftp -n ftp://ftp.perl.org/pub/CPAN/authors/01mailrc.txt.gz

and it works fine.  And, as I say, the files are being downloaded
by the CPAN module, it just thinks it is failing...


I think there is something fundamentally wrong with the build of
perl that I'm using from the Services For Unix 3.5 package.

I've got some other strange things that are happening as well that
probably account for some unpredictable behavior like this.
I haven't diagnosed it sufficiently to explain it yet though.

I'll be back if I can boil it down to something sensible.

Thanks.



Robert Sedlacek <phaylon@dunkelheit.at> wrote in 
news:pan.2005.02.25.11.38.48.477674@dunkelheit.at:

> roger wrote:
> 
>> I don't understand why it thinks it is failing to download them when
>> clearly it is succeeding.
> 
> Hm, the only thing I could think of is that you have a router between you
> and the outside world with a firewall or something. If not, I'm afraid
> that I'm out of ideas..
> 



------------------------------

Date: Fri, 25 Feb 2005 15:20:00 -0500
From: * Tong * <sun_tong@users.sourceforge.net>
Subject: Re: How to decode this unicode-hex string
Message-Id: <1109362801.293f1503dc30e53bfb65231ac49912e1@teranews>

On Fri, 25 Feb 2005 11:30:37 -0500, * Tong * wrote:

> When I select from non-English web sites and paste into my emacs,
> sometimes I get a unicode-hex string like this: \u82f1\u6587, which was
> "English" in Big5 encoding. 
> 
> I'm wondering how I can decode such strings and return the 8-bit character. 
> 
> So far I've been looking into the following Perl modules man pages an
> tried each one of them: Unicode::UTF8simple, Unicode::String,
> Unicode::Lite. None of them seems to be able to do that. They handle
> unicode-hex strings like this: "U+00d6 U+00d0 U+00b9 U+00fa". The
> difference between the above representation is that, the \u82f1 represent
> one 8-bit character, while in Perl it is represented in two U+00xx values.
> 
> I had also played with tcl decodings, but wasn't successful. Please help. 

Hi, 

As per the suggestion from phaylon, I gave 'Encode' a try. Maybe I've
missed a very important part, but I still can't decode the unicode string
like \u82f1\u6587, using any of Encode, Unicode::UTF8simple,
Unicode::String, or Unicode::Lite. 

More reading revealed that the "\u82f1\u6587" format is the escape form
that Java uses for Unicode characters. Maybe I should use Java, but I don't
want to if this problem can be solved in Perl. 
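
For illustration, a minimal sketch (in Python, not Perl) of one way to turn
such escapes back into characters: match each \uXXXX group and replace it
with the character at that code point. The function name is hypothetical,
and the sketch assumes four hex digits per escape; it does not combine
surrogate pairs. Re-encoding the result as Big5 is a separate, second step:

```python
import re

# A backslash, the letter u, then exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace every \\uXXXX escape with the character U+XXXX."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


decoded = decode_u_escapes(r"\u82f1\u6587")
print(len(decoded))                   # two characters: U+82F1 and U+6587
big5_bytes = decoded.encode("big5")   # explicit re-encoding step
print(len(big5_bytes))
```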

Thanks for your help!

-- 
Tong (remove underscore(s) to reply)
  *niX Power Tools Project: http://xpt.sourceforge.net/
  - All free contribution & collection


------------------------------

Date: Fri, 25 Feb 2005 21:03:15 +0000 (UTC)
From: RedGrittyBrick <Red.Grittybrick@SpamWeary.Foo>
Subject: Re: How to decode this unicode-hex string
Message-Id: <cvo3qj$9k8$1@hercules.btinternet.com>

* Tong * wrote:
> Hi, 
> 
> When I select from non-English web sites and paste into my emacs,
> sometimes I get a unicode-hex string like this: \u82f1\u6587, which was
> "English" in Big5 encoding. 

I'm confused. Unicode and Big5 are completely different, aren't they? For 
one thing, Unicode is a character set, for which there are several 
encodings such as UTF-8.

u82f1 and u6587 are Chinese characters in Unicode. They are within the 
CJK Unified Ideographs 4E00-9FAF. 
http://www.unicode.org/charts/PDF/U4E00.pdf
Together they form the Chinese word whose English translation is the 
word "English".

> I'm wondering how I can decode such strings and return the 8-bit character. 

An 8-bit character set would surely not be large enough to contain a 
usable subset of the Chinese ideographs. Big 5 has 13,000 ideographs. An 
8-bit character set has room for 256 at most.

When you say "the 8 bit character" are you thinking of something like 
the ISO 8859-1 Latin-1 character set?

Without a Chinese-English dictionary, there's no way to "decode" the two 
Chinese ideograms u82f1 u6587 into the seven English letters u0045 u006e 
u0067 u006c u0069 u0073 u0068.

> So far I've been looking into the following Perl modules man pages an
> tried each one of them: Unicode::UTF8simple, Unicode::String,
> Unicode::Lite. None of them seems to be able to do that. They handle
> unicode-hex strings like this: "U+00d6 U+00d0 U+00b9 U+00fa". The
> difference between the above representation is that, 
> the \u82f1 represent one 8-bit character, 

No it doesn't!

> while in Perl it is represented in two U+00xx values.

Two U+00xx values represent *TWO* Latin-1 characters.


------------------------------

Date: Fri, 25 Feb 2005 16:31:15 -0500
From: * Tong * <sun_tong@users.sourceforge.net>
Subject: Re: How to decode this unicode-hex string
Message-Id: <1109367076.f22443b18a9b78dda44a6ef7e6cb7a1e@teranews>

Thanks for the reply. 

On Fri, 25 Feb 2005 21:03:15 +0000, RedGrittyBrick wrote:

>> the \u82f1 represent one 8-bit character, 
> 
> No it doesn't!
> 
> while in Perl it is represented in two U+00xx values.
> 
> Two U+00xx values represent *TWO* Latin-1 characters.

Yeah, I stated that wrongly. It should read:

the \u82f1 represents one Chinese character, which is in two 8-bit
characters

Anyway, I figured out a way to do it without any of the aforementioned
Unicode packages.

Thanks for clearing things up.


-- 
Tong (remove underscore(s) to reply)
  *niX Power Tools Project: http://xpt.sourceforge.net/
  - All free contribution & collection


------------------------------

Date: Fri, 25 Feb 2005 21:42:38 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: How to decode this unicode-hex string
Message-Id: <Pine.LNX.4.61.0502252138210.30865@ppepc56.ph.gla.ac.uk>

On Fri, 25 Feb 2005, * Tong * wrote:

> the \u82f1 represent one Chinese character,

Yes

> which is in two 8-bit characters

No way.  As written, it's six *characters*.  Encoded, it might be
two *bytes* (depends on the encoding).
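
The distinction can be checked with a short Python sketch (illustrative
only; the variable names are made up). The escaped form is six characters
of text, it denotes one character, and the byte count depends on which
encoding is applied:

```python
escaped = r"\u82f1"   # the text as written: backslash, u, four hex digits
char = "\u82f1"       # the single character that the escape denotes

print(len(escaped))                    # characters in the written form
print(len(char))                       # characters after decoding
print(len(char.encode("big5")))        # bytes when encoded as Big5
print(len(char.encode("utf-8")))       # bytes when encoded as UTF-8
print(len(char.encode("utf-16-be")))   # bytes when encoded as UTF-16
```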

> Any way, I figured out a way to do it, without any the 
> aforementioned unicode packages.

But you're not going to tell us what it is?


------------------------------

Date: 25 Feb 2005 15:18:47 -0800
From: kongyew@w-manager.com
Subject: Re: How to generate random emails?
Message-Id: <1109373527.693118.265330@g14g2000cwa.googlegroups.com>

Thanks. I will try using Postal, the SMTP benchmark.

kong
www.w-manager.com



------------------------------

Date: Fri, 25 Feb 2005 21:21:54 +0100
From: pkaluski <pkaluski@piotrkaluski.com>
Subject: Re: How to NOT use utf8.
Message-Id: <cvo1l8$bis$1@nemesis.news.tpi.pl>

Alan J. Flavell wrote:
 >
 > (...) I'd hoped for more detail so that the
 > real problem could be understood...
 >
 > (...) you asked what is really a very complicated question -
 > especially considering that it was almost entirely lacking any context
 > in terms of problem domain, circumstances, external modules called,
 > etcetera etcetera etcetera.
 >
 > If you're processing text, then you *need* to know what encoding has
 > been used.  If you're processing binary data, then you shouldn't be
 > treating it as text.  That's been my attitude since, well, around 1965
 > I suppose it was, when I first grasped the difference, although I'd
 > been doing it - in a sense - without realising the point, since I met
 > my first computer in 1958.
 >
 >
 > (...) I asked you several supplementary
 > questions, to help in understanding the problem in its context - but
 > which you have chosen - it seems - to ignore.
 >
 > good luck

OK. I can now provide you with some details.
I did not put details in my first post because my problem initially showed up
in a big script, which I couldn't post because it was too big and used too
many modules. I had some indications that my problems were due to Unicode. So
my thought was: "OK, the easiest way would be to make perl work as if there is
no such thing as Unicode". And that was my question: is it possible to make
perl totally Unicode-unaware? Since my script is supposed to run under
Windows, I added the Windows part to my question in case there is something
system-specific.

Now I can provide you with some details, since I managed to isolate the
problem and reproduce it in a smaller script.

The problem was that Carp::cluck was crashing my script. It crashed in a
nasty, uncontrolled way, so Windows was killing it. What was more interesting,
it happened only when running my script under the debugger (which is also
scary: if something fails under the debugger and works without it, that could
be an indication that something is terribly screwed).

When I tried to pin down the problem, I found that one of the regular
expressions in the Carp::format_arg function, called by cluck, jumps to
another chunk of code. See below (I've attached a call stack):

   DB<2>Carp::caller_info(C:/Perl/lib/Carp/Heavy.pm:62):
62:       $arg =~ s/([[:cntrl:]]|[[:^ascii:]])/sprintf("\\x{%x}",ord($1))/eg;
   DB<2> s
utf8::SWASHNEW(C:/Perl/lib/utf8_heavy.pl:21):
21:         my ($class, $type, $list, $minbits, $none) = @_;
   DB<3> T
$ = utf8::SWASHNEW('utf8', '', '# comment^J+utf8::IsCntrl^J', 1, 0) called from file `C:/Perl/lib/Carp/Heavy.pm' line 62
@ = Carp::format_arg('After value1') called from file `C:/Perl/lib/Carp/Heavy.pm' line 31
@ = Carp::caller_info(3) called from file `C:/Perl/lib/Carp/Heavy.pm' line 142
@ = Carp::ret_backtrace(2, 'After value1') called from file `C:/Perl/lib/Carp/Heavy.pm' line 125
@ = Carp::longmess_heavy('After value1') called from file `C:/Perl/lib/Carp.pm' line 235
@ = Carp::longmess('After value1') called from file `C:/Perl/lib/Carp.pm' line 272
 . = Carp::cluck('After value1') called from file `test2.pl' line 11
   DB<12>

See? Stepping into the substitution operator moves me to the utf8 module. And
when stepping further I was getting messages about malformed UTF-8.
BTW, the comment in the Carp::format_arg function says:

(Carp/Heavy.pm)
59  # The following handling of "control chars" is direct from
60  # the original code - I think it is broken on Unicode though.
61  # Suggestions?
62  $arg =~ s/([[:cntrl:]]|[[:^ascii:]])/sprintf("\\x{%x}",ord($1))/eg;

So the author suggests that there may be problems with Unicode, and he seems
to be right.
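
For readers who want to see what that substitution is meant to produce, here
is a rough Python sketch of its intended effect: every control character or
non-ASCII character is replaced by a \x{...} escape of its code point. It is
an approximation written for illustration (it assumes [[:cntrl:]] covers
0x00-0x1F and 0x7F within ASCII), not a port of Carp, and it says nothing
about why the Perl version crashes:

```python
import re

# ASCII control characters, or anything outside the ASCII range.
_NEEDS_ESCAPE = re.compile(r"[\x00-\x1f\x7f]|[^\x00-\x7f]")


def format_arg(arg):
    """Escape control and non-ASCII characters as \\x{hex code point}."""
    return _NEEDS_ESCAPE.sub(lambda m: "\\x{%x}" % ord(m.group(0)), arg)


print(format_arg("After value1"))   # plain ASCII text passes through
print(format_arg("tab\there"))
```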

The code snippet below makes perl crash (at least for me):

--- CODE STARTS ---
use strict;
use XML::Simple;
use Carp qw( cluck );

cluck "Before";

my $str = XMLin( "input.xml" );
my $msg = "After " . $str->{ 'tag1' }->{ 'attr1' };
cluck $msg;
--- CODE ENDS ---

The input.xml file is simple:

--- INPUT.XML STARTS ----
<opt>
     <tag1 attr1="value1"/>
</opt>
--- INPUT.XML ENDS ----

In order to see the crash, you have to run perl under the debugger. Like this:

##########################

M:\temp\unicode>perl -d test2.pl

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `perldoc perldebug' for more help.

main::(test2.pl:6):     cluck "Before";
   DB<1> c
Before at test2.pl line 6
  at test2.pl line 6

M:\temp\unicode>

###############################

It didn't make it to the end. It crashed.

If I clear the UTF-8 flag on $msg, it works:

--- CODE STARTS ---
use strict;
use XML::Simple;
use Carp qw( cluck );

cluck "Before";

my $str = XMLin( "input.xml" );
my $msg = "After " . $str->{ 'tag1' }->{ 'attr1' };
require Encode;
Encode::_utf8_off( $msg );
cluck $msg;
--- CODE ENDS ---

Of course I have tried all this stuff with PERLIO=:bytes.

After these experiments I think I can state my first question more clearly
(I hope): can you make perl totally unaware of such a thing as Unicode?

And I believe that the answer is: you can't. Perl has Unicode support in its
guts. The only things you can manipulate are:

* You can make perl treat Unicode as bytes during reading and writing (via
PERLIO and some pragmas).
* You can reset the UTF-8 flag on a string.

But if you are about to write something bigger, using many modules, then Alan
is right: it is more efficient to adapt your code to Unicode than to avoid it.

In order to avoid it you would have to check each string produced by any
module and downgrade it to bytes. This approach is infeasible even for
medium-size projects.

In the scripts above XML::Simple returns Unicode strings (even if Unicode is
not needed and PERLIO=:bytes is set).

Is my reasoning correct?
And what is wrong with the regular expression used indirectly by cluck that
makes perl crash?


-- 
Piotr Kaluski

"It is the commitment of the individuals to excellence,
their mastery of the tools of their crafts, and their
ability to work together that makes the product, not rules."
("Testing Computer Software" by Cem Kaner, Jack Falk, Hung Quoc Nguyen)



------------------------------

Date: Fri, 25 Feb 2005 21:37:09 +0000
From: "Alan J. Flavell" <flavell@ph.gla.ac.uk>
Subject: Re: How to NOT use utf8.
Message-Id: <Pine.LNX.4.61.0502252125050.30865@ppepc56.ph.gla.ac.uk>

On Fri, 25 Feb 2005, pkaluski wrote:

[..]

> (Carp/Heavy.pm)
> 59  # The following handling of "control chars" is direct from
> 60  # the original code - I think it is broken on Unicode though.
> 61  # Suggestions?
> 62  $arg =~ s/([[:cntrl:]]|[[:^ascii:]])/sprintf("\\x{%x}",ord($1))/eg;
> 
> So the author suggests that there may be a problems for unicode, 

Point noted; and I see this same code and comment in the version that 
I'm using on Windows; but...

> use strict;
> use XML::Simple;
> use Carp qw( cluck );
> 
> cluck "Before";
> 
> my $str = XMLin( "input.xml" );
> my $msg = "After " . $str->{ 'tag1' }->{ 'attr1' };
> cluck $msg;
> --- CODE ENDS ---
> 
> The input.xml file is simple:
> 
> --- INPUT.XML STARTS ----
> <opt>
>     <tag1 attr1="value1"/>
> </opt>
> --- INPUT.XML ENDS ----
> 
> In order to have the crash effect, you have to run perl under debbuger. Like
> this:
> 
> M:\temp\unicode>perl -d test2.pl


Sorry, I can't reproduce this behaviour in ActivePerl 5.8.1
on Win2K.

I tried saving the data in DOS format as well as in unix format, just 
in case this was a relevant issue; but neither of them caused a 
problem:


main::(notutf8.pl:7):   cluck "Before";
  DB<1> c
Before at notutf8.pl line 7
After value1 at notutf8.pl line 11
Debugged program terminated.  Use q to quit or R to restart,


I'm not saying that you haven't got a point; just that I can't
yet reproduce the problem that you're reporting.  Any thoughts on 
relevant differences?



------------------------------

Date: 25 Feb 2005 20:18:12 GMT
From: John Bokma <postmaster@castleamber.com>
Subject: Re: maximum size of a hash table
Message-Id: <Xns96089180237C2castleamber@130.133.1.4>

Sherm Pendley wrote:

> John Bokma wrote:
> 
>> I am more than aware of that, (they teach those things in Utrecht,
>> you know ;-) ). I didn't state it was a contradiction, but only that
>> O(n) hash look up is not what I want out of a hash table.
> 
> You could use your own hashing function, if Perl's built-in gives you
> such poor results.

The poor result I was talking about was when the 32 bit limit on the hash 
code results in O(n) hash look ups because there is a lot of data in the 
hash. :-D.
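
A toy model of that effect, sketched in Python. It is not how Perl's hashes
are implemented; average_chain_length and the bit widths are made up for the
illustration. If the hash code is limited to a fixed number of bits, the
number of distinct buckets is capped, so once the key count passes that cap
the average chain length, and with it the lookup cost, grows linearly with n:

```python
def average_chain_length(num_keys, hash_bits):
    """Average keys per occupied bucket when hash codes are truncated."""
    mask = (1 << hash_bits) - 1
    buckets = {}
    for key in range(num_keys):
        # Small non-negative ints hash to themselves in CPython, which
        # keeps this example deterministic.
        buckets.setdefault(hash(key) & mask, []).append(key)
    return num_keys / len(buckets)


print(average_chain_length(1024, 4))    # only 16 possible hash codes
print(average_chain_length(1024, 32))   # plenty of codes for 1024 keys
```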

> XS code can pass a hash value to hv_fetch_ent() and
> hv_store_ent() - normally you'd pass 0 to have Perl calculate it for
> you using its built-in, but that's just a convenience, not a
> requirement. 
> 
> You could write a couple of fetch/store routines in C, export them to
> Perl, and provide a nice %hash wrapper for them with tie(). The rest
> of your Perl code would never need to know the difference.

Thanks. I never had this problem, but it's good to know.

-- 
John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html
                        


------------------------------

Date: 25 Feb 2005 20:34:02 GMT
From: John Bokma <postmaster@castleamber.com>
Subject: Re: OOP Tutorial
Message-Id: <Xns9608942F45F07castleamber@130.133.1.4>

Brian McCauley wrote:

> Calling a constructor on an existing object is generally considered a
> BAD THING.

huh?

I remember having seen:

$a = $b->new;

here and there in books. I never use it, but I wouldn't call something a
bad thing in general. 

> Having a constructor that can be called on an existing object and 
> does something other than act as a (copy) constructor is an even worse
> thing. [...]

You mean:

$a = $b->new; and not copying all data from b to a? Depends of course on
what you do. I don't think something is a bad thing per se if it is used
in a way that is documented and not confusing. 
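
The two readings of $b->new can be told apart in a short sketch. It is in
Python rather than Perl, and the Animal and Dog classes and the new and clone
method names are hypothetical: one method returns a fresh instance of the
same class, the other returns a copy that carries the data along.

```python
import copy


class Animal:
    def __init__(self, name="unnamed"):
        self.name = name

    def new(self):
        """Fresh instance of the caller's class; no data is copied."""
        return type(self)()

    def clone(self):
        """Shallow copy: same class and same attribute values."""
        return copy.copy(self)


class Dog(Animal):
    pass


b = Dog("Rex")
a = b.new()     # a Dog with the default name
c = b.clone()   # a separate Dog that is also named Rex
print(type(a).__name__, a.name, c.name)
```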

> Using the name super_animal of a subclass of animal is confusing. 
> (Since SUPER is also used to mean parent class).

In a tutorial you mean, agreed. In general it's often hard to model the
real world in OO. For example, a menu as a specialisation of a window can
be confusing (it's a window with less, instead of more :-D). 

-- 
John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html
                        


------------------------------

Date: 25 Feb 2005 20:25:32 GMT
From: John Bokma <postmaster@castleamber.com>
Subject: Re: Parsing a chemical formal
Message-Id: <Xns960892BC8C55Ccastleamber@130.133.1.4>

Ted Zlatanov wrote:

> On 25 Feb 2005, luotao@kammer.uni-hannover.de wrote:
> 
>> I have been writing a Perl program for days. The program contains a small
>> routine, which shall parse a chemical formula and return the name and
>> portion of single atoms
>> in the material as an array (or a hash) 
> ...
>> $molecule contains the formula (e.g. H2O, FeCl3 or CaCl). Every
>> beginning letter of an element is written in upper case.  As you can
>> see, I first split $molecule on letters in upper case, which
>> means FeCl3 turns into {F,e,C,l3}, then I scan the split list,
>> which is stored in the array @Literal, for capital letters; every
>> capital letter will be pushed onto a temporary array. If the following
>> item in the array is not written in upper case, which means that the
>> name of the atom contains more than one letter, it will also be pushed
>> onto the same temporary array, which will later be joined and put in
>> the output array. The final result for the formula H2O should be
>> {H2,O}, for FeCl3 {Fe,Cl3} and so on....
> 
> I think you are not doing this correctly.
> 
> You are not parsing random letters, you are parsing chemical
> elements' names in sequence.  So don't just say "split on a letter."
> Build a dictionary of element names (it's a finite list, although you
> can anticipate new elements may need to be added at the end).
> Something like this:
> 
> my %elements = ( H  => { number => 1, extra => data => you => need },
>                  He => { number => 2, ...},
>                  ...
>                );
> 
> Then, build your regular expression to match elements from your
> %elements hash.

If you can assume that only valid formulae are given to the program, 
[A-Z][a-z]?\d* sounds sufficient to me.

If you really want to check validity you can capture the [A-Z][a-z]? 
part and look it up in a hash. Moreover, if some letters are not 
possible (for example x), you could remove them from the character class 
(making the program harder to read, I guess).
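
A minimal sketch of that approach, in Python for illustration. The pattern is
the one given above with capture groups added; KNOWN_ELEMENTS is a deliberately
tiny, hypothetical lookup table standing in for the hash, and parse_formula is
a made-up name. Parentheses and nested groups in formulae are not handled:

```python
import re

# Element symbol (upper-case letter, optional lower-case letter),
# followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

# Hypothetical subset; a real table would list every element.
KNOWN_ELEMENTS = {"H", "O", "C", "Fe", "Cl", "Ca", "Na"}


def parse_formula(formula):
    """Return (symbol, count) pairs; raise ValueError on bad input."""
    atoms = []
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None or match.group(1) not in KNOWN_ELEMENTS:
            raise ValueError("cannot parse %r at offset %d" % (formula, pos))
        symbol, count = match.groups()
        atoms.append((symbol, int(count) if count else 1))
        pos = match.end()
    return atoms


print(parse_formula("FeCl3"))
```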

> will generate a suitable parse tree for you, which will be a lot more
> functional than your {H2,O} format.

One lesson I learned the hard way: never make your program more 
functional than the requirements. I.e. if you need cat, don't write 
OpenOffice :-D.

-- 
John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html
                        


------------------------------

Date: Sat, 26 Feb 2005 00:35:51 +0000
From: Big and Blue <No_4@dsl.pipex.com>
Subject: Re: Pure Perl OpenSSL Library
Message-Id: <opSdnSO31Zf7WYLfRVnyig@pipex.net>

Marc wrote:
 >
> I'm developing software that needs to act as a Certificate
> Authority. I must use Perl for this.

    An odd pre-requisite if it stops you achieving your actual goal.

> I would like to avoid forking at each certificate request as there will
> be several requests within seconds. The problem is that every SSL
> module I can find for Perl uses the openssl command line.

    My suspicion is that if you are worried about the cost of forking then 
you're looking at the wrong thing.  I assume you are intending that this 
system be generating certificates?  If so, then the resources for that (in 
particular its random/prime number generating) will make any forking 
resource demands pale into insignificance.

> Can someone point me to/give me the name of a project that has (even if
> not complete) a pure Perl/C OpenSSL library?
> 
> I would be very surprised if no such project exist...but who knows? :)

    Why would you be surprised?  Perhaps others see that it would be a lot 
of work for almost no gain?  The openssl command already exists.  Perl has 
adequate ways to run external commands.


-- 
              Just because I've written it doesn't mean that
                   either you or I have to believe it.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 7833
***************************************
