[28187] in Perl-Users-Digest


Perl-Users Digest, Issue: 9551 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Wed Aug 2 18:10:19 2006

Date: Wed, 2 Aug 2006 15:10:09 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Wed, 2 Aug 2006     Volume: 10 Number: 9551

Today's topics:
    Re: Perl hash of hash efficiency. <benmorrow@tiscali.co.uk>
    Re: Perl hash of hash efficiency. xhoster@gmail.com
    Re: Perl hash of hash efficiency. <mritty@gmail.com>
    Re: Perl hash of hash efficiency. xhoster@gmail.com
    Re: Perl hash of hash efficiency. <mritty@gmail.com>
    Re: Perl hash of hash efficiency. <benmorrow@tiscali.co.uk>
    Re: Perl hash of hash efficiency. <yekasi@gmail.com>
    Re: Perl hash of hash efficiency. <yekasi@gmail.com>
    Re: Perl hash of hash efficiency. <benmorrow@tiscali.co.uk>
    Re: Perl hash of hash efficiency. <mritty@gmail.com>
    Re: Perl hash of hash efficiency. <benmorrow@tiscali.co.uk>
        Perl/PHP Web Software Architects Needed <robertcopple123@gmail.com>
    Re: Perl/PHP Web Software Architects Needed usenet@DavidFilmer.com
    Re: Q: ActivePerl - calling an ActiveX object bubbabubbs@yahoo.com
    Re: Q: ActivePerl - calling an ActiveX object <1usa@llenroc.ude.invalid>
    Re: Recursion <koko_loko_0@yahoo.co.uk>
    Re: Recursion <mritty@gmail.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 2 Aug 2006 19:06:30 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <6667q3-q7c.ln1@osiris.mauzo.dyndns.org>


Quoth "tak" <yekasi@gmail.com>:
> 
> xhoster@gmail.com wrote:
> > "tak" <yekasi@gmail.com> wrote:
> > > Hi,
> > >
> > > I have a script, that loads a txt file, with 240k lines in it to a hash
> > > currently. And when it loads the data to the hash - it becomes slower
> > > and slower when it reaches may be around 150k
> >
> > How much memory do you have?  How much are you using at this point?
> 
> From looking at the PF Usage - it is about 1.9 GB. on a 1gb machine -
> the available physical memory are down to about 10 MB when loading...
> But the CPU usage remains about 5% only...

So, you are thrashing. You've run out of memory: I would suggest using
one of the DBM modules, probably DB_File. This stores the contents of
the hash in a (structured, binary, fast-to-index) file on disk, which
will probably make things faster.

> > > (probably due to
> > > collision, since perl's hash uses linear chaininig...).
> >
> > That is a rather unlikely bit of speculation, especially on a modern Perl.
> > How many buckets does your hash have and use?  (print scalar %hash).
> 
> I have 1 main_hash, which stores 27 hashes in it. And out of each 27
> hashes, it averages about 9k unique strings.    print scalar %hash
> reports,  23/32. What does this number mean?

From perldoc perldata:

| If you evaluate a hash in scalar context, it returns false if the hash
| is empty. If there are any key/value pairs, it returns true; more
| precisely, the value returned is a string consisting of the number of
| used buckets and the number of allocated buckets, separated by a slash.
| This is pretty much useful only to find out whether Perl's internal
| hashing algorithm is performing poorly on your data set. For example,
| you stick 10,000 things in a hash, but evaluating %HASH in scalar
| context reveals "1/16", which means only one out of sixteen buckets has
| been touched, and presumably contains all 10,000 of your items. This
| isn't supposed to happen.

(Note that this is not meant as a rebuke: no one can be expected to have
all the arcana in Perl's std docs memorized. It is meant so that you may
remember where to find it next time :). )

So, your main hash is using 23 buckets to store your 27 subhashes... not
such a useful thing to know :). The real question is, how many buckets
does your original hash (with all the data in it) use? For instance, on
my perl

    my %h;

    for (1..240_000) {
        $h{$_} = 1;
    }

    print scalar %h;

prints '157199/262144', so the hash is using 157199 buckets, and each
bucket has on average 240000/157199 ~~ 1.5 entries in it, which should
not be a problem.
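
A caveat for anyone repeating this on a newer perl: from 5.26 onward,
scalar(%h) returns just the number of keys, and the used/allocated string
comes from Hash::Util::bucket_ratio instead. A minimal sketch of the same
measurement, assuming perl 5.26 or later (the variable names are only
illustrative, and the exact bucket counts will differ between builds):

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);

my %h;
$h{$_} = 1 for 1 .. 240_000;

# "used/allocated", the string that older perls returned from scalar(%h)
my $ratio = bucket_ratio(%h);
my ($used, $allocated) = split m{/}, $ratio;

# average number of entries perl must search within a used bucket
my $entries    = keys %h;
my $per_bucket = $entries / $used;

printf "%s buckets, %.2f entries per used bucket\n", $ratio, $per_bucket;
```

Anything close to 1 entry per used bucket means collisions are not where
the time is going.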

> > When the facts don't fit your theory, re-examing your theory.  You probably
> > have a swapping problem, not a hash collision problem.  And if you do have
> > a collision problem, the better way to fix it would be to start out with a
> > higher number of buckets, by assigning to the keys function.
> >
> 
> Can you elaborate on what you mean by a swapping problem?

Your system has started thrashing: the working set (the pages in current
use) has exceeded the size of physical memory, and the system is
spending all its time swapping things in and out.

> And I thought
> about assigning higher number of bucket to the hash itself , but i
> cannot find the related function to set that... I am a Java programmer,
> and this is my first perl script.. I tried looking into the constructor
> for the hash itself, but it doesnt seem like it accepts argument...?

The next para after my previous quote:

| You can preallocate space for a hash by assigning to the keys()
| function. This rounds up the allocated buckets to the next power of two:
|
|     keys(%users) = 1000;                # allocate 1024 buckets
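
Applied to a hash like the one in this thread, preallocation could be
sketched as below. This assumes perl 5.26 or later for
Hash::Util::bucket_ratio (on older perls, scalar(%main_hash) gives the same
string); the hash name and the 26 dummy keys are only for illustration:

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);

my %main_hash;
keys(%main_hash) = 300_000;     # ask for room for 300,000 keys up front

# add some keys so the ratio is reported for a non-empty hash
$main_hash{$_} = {} for 'A' .. 'Z';

my ($used, $allocated) = split m{/}, bucket_ratio(%main_hash);
print "used $used of $allocated buckets\n";
```

The allocated figure is rounded up to a power of two, so it comes out
larger than the number requested.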

> Last question,
> 
> How Do you delete an element within a hoh? Say i have a hash of hash,
> like the following.
> 
> my %hoh();

Did you even try this? Perl Is Not Java: this is a syntax error. You
don't need the parens.

> loop() {      # say this is the loop of each line of my txtFile

What is this 'loop()'? Have you been reading about Perl6? Or did you
mean

    sub loop {

?

> my $value = "TheRecordFromMyTxtFile";

(You really want to sort out your indentation. Makes life easier for
both you and us.)

> my $letter = substr $value, 0, 1;  # say, i am using the first letter
> as the key for subhash.
> my $myKey = substr $value, 5, 9;  # Say position 5 - 9 is the key for
> the element.
> $hoh{$letter}{$myKey} = $value
> }
> 
> 
> Now, I want to delete a particular value from one of the subhash...
> 
> I tried doing this,
> 
> delete $hoh{$letter}{$value};

That's correct (assuming $value corresponds to $myKey in the above, not
to $value there: that is, you delete an element by specifying its key).
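
A minimal sketch of deleting by key, with made-up records and a
hypothetical key field (characters 6 to 9 of each record), just to show
that delete takes the subhash key and not the stored value:

```perl
use strict;
use warnings;

my %hoh;
for my $value (qw(ALPHA-0001 ALPHA-0002 BRAVO-0001)) {   # made-up records
    my $letter = substr $value, 0, 1;   # first letter selects the subhash
    my $key    = substr $value, 6, 4;   # hypothetical key field
    $hoh{$letter}{$key} = $value;
}

my $before  = keys %{ $hoh{A} };        # number of keys in subhash A
my $removed = delete $hoh{A}{'0001'};   # delete by key; returns the old value
my $after   = keys %{ $hoh{A} };

print "removed $removed; A had $before entries, now has $after\n";
```

delete returns the value it removed, which is a handy check that the key
you passed was really there.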

> But it doesnt seem like it is deleting... B/c if I try to get the
> length of the $hoh{$letter}, it still reports the same number...

You really need to learn some basic Perl. I'd recommend a book:
'Learning Perl' published by O'Reilly is universally recommended as a
good place to start. An alternative would be to read through the
perldocs, but that's not an easy way to learn.

length (see perldoc -f length) treats its argument as a string and
returns the length of that string. $hoh{$letter} contains a hash
*reference*: see perldoc perldsc and perldoc perlreftut for how
multi-level data structures are implemented in Perl. Or, again, a decent
book will cover it. Now, when you stringify a hash ref, you get
something that looks like 'HASH(0x80142180)', which is basically
useless, and is always the same length.

To find the number of keys in a hash, you do as it says in perldoc -f
length: 'scalar keys %hash'. This is somewhat complicated by the fact
that what you have is not a hash but a hash ref, so we apply 'Use Rule
1' from perlreftut:

    # an ordinary hash
    print scalar keys %hash;

    # replace the var name with { }
    print scalar keys %{ }

    # put the hashref inside the braces
    print scalar keys %{ $hoh{$letter} };

Yes, I agree this is a little icky, but that's what you get when you
graft complex data structures onto a language (Perl4) that doesn't
really support them :).

A useful tool for examining data structures is the module Data::Dumper
(obviously, you want to run a test on a smaller dataset rather than
dumping a hash of 240k entries).
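
For instance, a small sketch with made-up data (the two Data::Dumper
settings are optional and only make the output tidier). It also shows what
a stringified hash ref looks like, which is why length() on it is no help:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order, easier to compare runs
$Data::Dumper::Indent   = 1;   # compact indentation

my %hoh = (
    A => { '0001' => 'ALPHA-0001', '0002' => 'ALPHA-0002' },
    B => { '0001' => 'BRAVO-0001' },
);

my $as_string = "$hoh{A}";     # a hash ref as a string: HASH(0x...)
my $dump      = Dumper(\%hoh); # pass a reference to get one nested structure

print "subhash A stringifies as $as_string\n";
print $dump;
```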

Ben

-- 
           All persons, living or dead, are entirely coincidental.
benmorrow@tiscali.co.uk                                           Kurt Vonnegut


------------------------------

Date: 02 Aug 2006 18:03:48 GMT
From: xhoster@gmail.com
Subject: Re: Perl hash of hash efficiency.
Message-Id: <20060802141150.642$2i@newsreader.com>

"tak" <yekasi@gmail.com> wrote:
> xhoster@gmail.com wrote:
> > "tak" <yekasi@gmail.com> wrote:
> > > Hi,
> > >
> > > I have a script, that loads a txt file, with 240k lines in it to a
> > > hash currently. And when it loads the data to the hash - it becomes
> > > slower and slower when it reaches may be around 150k
> >
> > How much memory do you have?  How much are you using at this point?
>
> From looking at the PF Usage - it is about 1.9 GB. on a 1gb machine -
> the available physical memory are down to about 10 MB when loading...
> But the CPU usage remains about 5% only...

If I read this right, you are using 1.9GB of virtual memory but you only
have 1.0GB of RAM, and so have swapped out 900MB?  And you have only 10MB
of *free* memory?  That means you are probably swapping like crazy, and
thus the CPU load is low as it spends most of the time waiting for disk.

> >
> > > (probably due to
> > > collision, since perl's hash uses linear chaininig...).
> >
> > That is a rather unlikely bit of speculation, especially on a modern
> > Perl. How many buckets does your hash have and use?  (print scalar
> > %hash).
> >
>
> I have 1 main_hash, which stores 27 hashes in it. And out of each 27
> hashes, it averages about 9k unique strings.    print scalar %hash
> reports,  23/32. What does this number mean?

There are 32 buckets, of which 23 have at least one value.  So there is
either 1 five-way collision, or 4 two-way collisions, or somewhere in
between.  But that doesn't really tell you much.  I meant for you to get
this number on the one-big-hash implementation, before you changed over to
the hash of hashes.

> > > So, i am thinking of implementing it in hash of hash... using the
> > > first letter of these records, and divide them into 26 sub hashes.
> > > (Since these records' first letter are pretty random from A-Z).
> > >
> > > So, I tried it, the performance is about the same... why??
> >
> > Solving the imagined problem rarely solves the real problem.  (And if
> > that were the problem it was that easy to fix, don't you think Perl
> > would already have made that fix itself in the core hashing code?)
> >
>
> So, what do you suggest?

Identifying the real problem first :)

Right now, I'd say that that is memory.  So you could try to find a more
memory efficient way to hold your data.  Or make a multi-pass approach.
Or maybe tie the hash to disk explicitly.  Or use a database to hold the
data.  (Or buy more memory)
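
The tie-to-disk option might look like the following minimal sketch. It
uses SDBM_File only because that module ships with every perl; DB_File and
the other DBM modules plug into the same tie interface. The file name and
the records are made up, and SDBM limits the size of each key/value pair
(roughly 1 KB), so treat it as a stand-in rather than a recommendation:

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records";      # hypothetical file name

# After the tie, reads and writes on %h go to the file, not to RAM
tie(my %h, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

for my $n (1 .. 1000) {
    $h{"key$n"} = "value $n";
}

my $count  = keys %h;           # walks the file to count the keys
my $sample = $h{key500};
print "$count records on disk; key500 => $sample\n";

untie %h;
```

The rest of the script keeps using ordinary hash syntax; only the tie and
untie calls are new.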


>
> > When the facts don't fit your theory, re-examing your theory.  You
> > probably have a swapping problem, not a hash collision problem.  And if
> > you do have a collision problem, the better way to fix it would be to
> > start out with a higher number of buckets, by assigning to the keys
> > function.
> >
>
> Can you elaborate on what you mean by a swapping problem?

Your OS has a virtual memory system that lets you use more memory than
you actually have, by swapping/paging out "unused" memory to disk.  But it
can get very, very slow if you are actively using more pages than fit in
real memory.

(Paul answered the rest of your questions)

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 2 Aug 2006 11:21:01 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154542861.705923.13940@m73g2000cwd.googlegroups.com>

tak wrote:
> Paul Lalli wrote:
> > tak wrote:
> > > xhoster@gmail.com wrote:
> > > > "tak" <yekasi@gmail.com> wrote:
> > > I have 1 main_hash, which stores 27 hashes in it. And out of each 27
> > > hashes, it averages about 9k unique strings.    print scalar %hash
> > > reports,  23/32. What does this number mean?
> >
> > It means that Perl has allocated 32 buckets for this hash, and that 23
> > of them are currently in use.  So, only 4 collissions in the "main"
> > hash.
>
> Why 4 collision? Do you mean 32 - 23 = 9?? or b/c you knew that i have
> 27 subhashes, so 27 - 23??

Yes.  You have 27 values in your hash.  Your hash is using 23 buckets.
Therefore, 4 buckets are being used twice.

> Without using the 27 hash of hashes, print
> scalar %mainHash reports 16k / 23k. (Of course - it reported the
> number, but I didnt take them down, just remember its 16k and 23k) That
> is a hugh amount of collision.

So it would seem.

> I tried to do this, keys %main_hash = 300000; but it is still running
> slow when it reaches 150000... perhaps it is not the collision problem,
> as xho mentioned?

That would be my guess.  Again, it seems to be your theory that is
faulty.  You assumed that the slowness was caused by collisions.  The
facts do not support that theory.

> > > Now, I want to delete a particular value from one of the subhash...
> > >
> > > I tried doing this,
> > >
> > > delete $hoh{$letter}{$value};
> >
> > That is precisely how you delete that particular value from that
> > particular "subhash".
>
> Say I have the key of subhash, as $letter, and the item in subhash as,
> $value.
>
> delete $hoh{$letter}{$value};

Your code does not match your description.  In the above $letter is a
key of the *main* hash, and $value is a key in the subhash
%{$hoh{$letter}}.

> This should delete that from the hash, right?

That would delete the key/value pair in the hash %{$hoh{$letter}} which
has the key $value.

> Say I want to look at the value of in this key --
> $hoh{$letter}{$value}, how do you print it?
> I tried, print $hoh{$letter}{$value}; - but it prints nothing....

Then that key does not exist in that hash, or that key's value in the
hash is the empty string (or the undefined value).  Are you using
warnings?  If so, printing a non-existing element of a hash should give
you a warning.  Please enable them if you are not.  Again, your
description above is not matching your code.
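
To tell those cases apart, something along these lines may help. It is
only a sketch: the data and the describe() helper are hypothetical, not
taken from the OP's script.

```perl
use strict;
use warnings;

my %hoh = ( A => { '0001' => 'ALPHA-0001', '0002' => '' } );

# Report why a lookup in the hash of hashes prints nothing
sub describe {
    my ($letter, $key) = @_;
    # test the outer key first so the lookup does not create $hoh{$letter}
    return 'no such subhash' unless exists $hoh{$letter};
    return 'no such key'     unless exists $hoh{$letter}{$key};
    my $v = $hoh{$letter}{$key};
    return 'undefined value' unless defined $v;
    return $v eq '' ? 'empty string' : "value '$v'";
}

print "$_->[0]/$_->[1]: ", describe(@$_), "\n"
    for [ A => '0001' ], [ A => '0002' ], [ A => '9999' ], [ Z => '0001' ];
```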

It's time for you to show some *actual* code, so we can better help
you.  Please post a short-but-complete script that demonstrates one or
more of the failures you are encountering.

Also, it would be appreciated if you trimmed your replies down to only
include the relevant bits of quoted material.  Thank you.

Paul Lalli



------------------------------

Date: 02 Aug 2006 18:16:57 GMT
From: xhoster@gmail.com
Subject: Re: Perl hash of hash efficiency.
Message-Id: <20060802142459.276$5N@newsreader.com>

"tak" <yekasi@gmail.com> wrote:
>
> Without using the 27 hash of hashes, print
> scalar %mainHash reports 16k / 23k. (Of course - it reported the
> number, but I didnt take them down, just remember its 16k and 23k) That
> is a hugh amount of collision.

How many things were in the hash at the time you did the
print scalar %mainHash?  (print scalar keys %mainHash).  If it had 150K
things in it at the time, then there are 150K/16K or about 10 entries per
bucket. Higher than I would expect but not awful.

> >
> > I don't know what you were trying to give arguments to.  If you mean
> > simply the `my` keyword, then you are correct - you cannot pre-allocate
> > buckets when you declare the hash.  Instead, declare it, and then
> > assign buckets, using the keys function as described above.
>
> I tried to do this, keys %main_hash = 300000; but it is still running
> slow when it reaches 150000... perhaps it is not the collision problem,
> as xho mentioned?

No, it probably isn't collisions that is the problem.  But to be sure (but
mostly out of curiosity), what did print scalar %main_hash give you after
you loaded 150000 into it with this pre-allocation?

> > > I tried doing this,
> > >
> > > delete $hoh{$letter}{$value};
> >
> > That is precisely how you delete that particular value from that
> > particular "subhash".
> >
>
> Say I have the key of subhash, as $letter, and the item in subhash as,
> $value.
>
> delete $hoh{$letter}{$value};
>
> This should delete that from the hash, right?

Yes.
 ...

> Say I want to look at the value of in this key --
> $hoh{$letter}{$value}, how do you print it?
> I tried, print $hoh{$letter}{$value}; - but it prints nothing....

Um, is this before or after you deleted it?

Are you using warnings?  If so, did you get an uninitialized value
warning?


Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB


------------------------------

Date: 2 Aug 2006 11:24:57 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154543097.321461.133330@75g2000cwc.googlegroups.com>

Ben Morrow wrote:
> Quoth "tak" <yekasi@gmail.com>:
> > But it doesnt seem like it is deleting... B/c if I try to get the
> > length of the $hoh{$letter}, it still reports the same number...
>
> You really need to learn some basic Perl. I'd recommend a book:
> 'Learning Perl' published by O'Reilly is universally recommended as a
> good place to start.

> $hoh{$letter} contains a hash
> *reference*: see perldoc perldsc and perldoc perlreftut for how
> multi-level data structures are implemented in Perl. Or, again, a decent
> book will cover it.

Yet, ironically, not the book you recommended. :-(  Once the OP has
finished with "Learning Perl", he should probably move on to
"Intermediate Perl", which does cover multi-level data structures.

Paul Lalli



------------------------------

Date: Wed, 2 Aug 2006 19:32:39 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <7n77q3-ofc.ln1@osiris.mauzo.dyndns.org>


Quoth "tak" <yekasi@gmail.com>:
> 
> Paul Lalli wrote:
> > tak wrote:
> > > xhoster@gmail.com wrote:
> > > > "tak" <yekasi@gmail.com> wrote:
> > > I have 1 main_hash, which stores 27 hashes in it. And out of each 27
> > > hashes, it averages about 9k unique strings.    print scalar %hash
> > > reports,  23/32. What does this number mean?
> >
> > It means that Perl has allocated 32 buckets for this hash, and that 23
> > of them are currently in use.  So, only 4 collissions in the "main"
> > hash.
> 
> Why 4 collision? Do you mean 32 - 23 = 9?? or b/c you knew that i have
> 27 subhashes, so 27 - 23??

The latter. Say you had a hash with 5 entries, and scalar %hash printed
2/4 (it wouldn't actually, of course, perl would use 5 buckets, but say
it did). Then you have four buckets, and two of them contain your five
entries, perhaps like

  | 5 |   |   |   |
  | 3 | 4 |   |   |
  | 1 | 2 |   |   |
  +---+---+---+---+

So on average perl has to search through 5/2 = 2.5 entries once it has
located the right bucket (assuming the hashing function is fair, which
in general perl's is). Perl will tell you this if you ask nicely:

    print( (scalar keys %hash) / (scalar %hash) );

This will give a warning about converting '23/32' to a number (you are
using warnings, right?), but you can ignore that if it's just for
debugging.

> Without using the 27 hash of hashes, print
> scalar %mainHash reports 16k / 23k. (Of course - it reported the
> number, but I didnt take them down, just remember its 16k and 23k) That
> is a hugh amount of collision.

This is your hash with 240k entries in it? That's still only an average
of 15 elements per bucket. Still, you could try preallocating and see
if it makes a difference.

> > I don't know what you were trying to give arguments to.  If you mean
> > simply the `my` keyword, then you are correct - you cannot pre-allocate
> > buckets when you declare the hash.  Instead, declare it, and then
> > assign buckets, using the keys function as described above.
> 
> I tried to do this, keys %main_hash = 300000;

There's little point in creating more buckets than you will have
elements. That's just a waste of space. Note that Perl allows you to
write '300000' as '300_000', which is much more readable.

> but it is still running
> slow when it reaches 150000... perhaps it is not the collision problem,
> as xho mentioned?

Indeed.

> > > Now, I want to delete a particular value from one of the subhash...
> > >
> > > I tried doing this,
> > >
> > > delete $hoh{$letter}{$value};
> >
> > That is precisely how you delete that particular value from that
> > particular "subhash".
> 
> Say I have the key of subhash, as $letter, and the item in subhash as,
> $value.
> 
> delete $hoh{$letter}{$value};
> 
> This should delete that from the hash, right?
<snippage>

> Say I want to look at the value of in this key --
> $hoh{$letter}{$value}, how do you print it?
> I tried, print $hoh{$letter}{$value}; - but it prints nothing....

Both those examples are correct, as far as I can tell. I think you have
some confusion somewhere else (and I'm still a little worried by your use
of '$value' to refer to a key: are you confusing keys and values?). Can
you show us a short complete script, with (a short set of) input data,
that doesn't do what you expect? ('Short' in this case probably means
<15 lines.)

Ben

-- 
"If a book is worth reading when you are six,         * benmorrow@tiscali.co.uk
it is worth reading when you are sixty."  [C.S.Lewis]


------------------------------

Date: 2 Aug 2006 11:46:46 -0700
From: "tak" <yekasi@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154544406.449694.167990@m73g2000cwd.googlegroups.com>

> > Without using the 27 hash of hashes, print
> > scalar %mainHash reports 16k / 23k. (Of course - it reported the
> > number, but I didnt take them down, just remember its 16k and 23k) That
> > is a hugh amount of collision.
>
> So it would seem.
>

Sorry - i had a typo - i meant, 160k, and 230k... not 16k and 23k..


> It's time for you to show some *actual* code, so we can better help
> you.  Please post a short-but-complete script that demonstrates one or
> more of the failures you are encountering.
>

I will put one up in 10 minutes.

> Also, it would be appreciated if you trimmed your replies down to only
> include the relevant bits of quoted material.  Thank you.
> 

Ok...Sorry.



------------------------------

Date: 2 Aug 2006 11:55:01 -0700
From: "tak" <yekasi@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154544901.488083.96570@p79g2000cwp.googlegroups.com>

> How many things where in the hash at the time you did the
> print scalar %mainHash?  (print scalar keys %mainHash).  If it had 150K
> things in it at the time, than there are 150K/16K or about 10 entries per
> bucket. Higher than I would expect but not aweful.
>

At the time I print out scalar keys %mainHash - i had 240k entries in
it. (This is the one-big-hash implementation). And print scalar keys
%mainHash reports 160k/230k - not 16 and 23k... i missed out a 0... so
that means its about 1 entry per slot -- not too much collision?

I printed out the scalar keys %mainHash, and each of the subhashes
AFTER all 240k entries were inserted. This is a cut and paste..

238348 lines in total..
Item: @   -   Size: 9   -  ScalarSize: 7/16
Item: A   -   Size: 12750   -  ScalarSize: 8859/16384
Item: B   -   Size: 9309   -  ScalarSize: 7084/16384
Item: C   -   Size: 14898   -  ScalarSize: 9854/16384
Item: D   -   Size: 8878   -  ScalarSize: 6865/16384
Item: E   -   Size: 7341   -  ScalarSize: 4904/8192
Item: F   -   Size: 7557   -  ScalarSize: 4954/8192
Item: G   -   Size: 7109   -  ScalarSize: 4733/8192
Item: H   -   Size: 7399   -  ScalarSize: 4854/8192
Item: I   -   Size: 12290   -  ScalarSize: 8639/16384
Item: J   -   Size: 3898   -  ScalarSize: 2516/4096
Item: K   -   Size: 4596   -  ScalarSize: 3494/8192
Item: L   -   Size: 8549   -  ScalarSize: 6649/16384
Item: M   -   Size: 10149   -  ScalarSize: 7598/16384
Item: N   -   Size: 7725   -  ScalarSize: 5044/8192
Item: O   -   Size: 8791   -  ScalarSize: 6818/16384
Item: P   -   Size: 11573   -  ScalarSize: 8262/16384
Item: Q   -   Size: 12397   -  ScalarSize: 8721/16384
Item: R   -   Size: 8247   -  ScalarSize: 6464/16384
Item: S   -   Size: 13921   -  ScalarSize: 9348/16384
Item: T   -   Size: 8995   -  ScalarSize: 6884/16384
Item: U   -   Size: 10667   -  ScalarSize: 7799/16384
Item: V   -   Size: 9166   -  ScalarSize: 7058/16384
Item: W   -   Size: 11293   -  ScalarSize: 8142/16384
Item: X   -   Size: 6617   -  ScalarSize: 4555/8192
Item: Y   -   Size: 8704   -  ScalarSize: 6717/16384
Item: Z   -   Size: 4167   -  ScalarSize: 3277/8192
scalar of master_hash is: 27/524288

This data is being printed by the following code,

	for my $item ( sort keys %mainHash ) {
	      my $size = keys(%{$mainHash{$item}});
	      my $scalarSize = scalar %{$mainHash{$item}};
	      print "Item: $item   -   Size: $size   -  ScalarSize: $scalarSize\n";
	}
	print "scalar of master_hash is: ";
	print scalar %Master_Hash;
	print "\n";

By looking at the scalarSize and my PF usage - It does seem like it is
the memory issue, by loading too much data that the system can handle?
Situations like that - Does it matter if i do keys(%mainHash) = 300000
or not?



> Are you using warnings?  If so, did you get an uninitialized value
> warning?
> 

Perhaps not - how do you turn on warnings?

Thanks,
Tak



------------------------------

Date: Wed, 2 Aug 2006 19:45:50 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <uf87q3-phc.ln1@osiris.mauzo.dyndns.org>


Quoth "Paul Lalli" <mritty@gmail.com>:
> Ben Morrow wrote:
> > Quoth "tak" <yekasi@gmail.com>:
> > > But it doesnt seem like it is deleting... B/c if I try to get the
> > > length of the $hoh{$letter}, it still reports the same number...
> >
> > You really need to learn some basic Perl. I'd recommend a book:
> > 'Learning Perl' published by O'Reilly is universally recommended as a
> > good place to start.
> 
> > $hoh{$letter} contains a hash
> > *reference*: see perldoc perldsc and perldoc perlreftut for how
> > multi-level data structures are implemented in Perl. Or, again, a decent
> > book will cover it.
> 
> Yet, ironically, not the book you recommended. :-(

Ah, my apologies. I've never actually read either the Llama or the
Alpaca; I learned Perl from the Camel, here, and perl.plover.com :) .

Ben

-- 
If I were a butterfly I'd live for a day, / I would be free, just blowing away.
This cruel country has driven me down / Teased me and lied, teased me and lied.
I've only sad stories to tell to this town: / My dreams have withered and died.
  benmorrow@tiscali.co.uk                                          (Kate Rusby)


------------------------------

Date: 2 Aug 2006 12:03:48 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <1154545428.557100.122240@m79g2000cwm.googlegroups.com>

Ben Morrow wrote:
> Ah, my apologies. I've never actually read either the Llama or the
> Alpaca; I learned Perl from the Camel, here, and perl.plover.com :) .

They're both great books, but the Llama is, unfortunately, *very*
beginner-oriented.  It does not cover references, multi-dimensional
structures, or even passing arrays or hashes to subroutines (because,
obviously, that involves references).

Paul Lalli



------------------------------

Date: Wed, 2 Aug 2006 20:03:48 +0100
From: Ben Morrow <benmorrow@tiscali.co.uk>
Subject: Re: Perl hash of hash efficiency.
Message-Id: <kh97q3-gpc.ln1@osiris.mauzo.dyndns.org>


Quoth "tak" <yekasi@gmail.com>:
> > How many things where in the hash at the time you did the
> > print scalar %mainHash?  (print scalar keys %mainHash).  If it had 150K
> > things in it at the time, than there are 150K/16K or about 10 entries per
> > bucket. Higher than I would expect but not aweful.
> 
> At the time I print out scalar keys %mainHash - i had 240k entries in
> it. (This is the one-big-hash implementation). And print scalar keys
> %mainHash reports 160k/230k - not 16 and 23k... i missed out a 0... so
> that means its about 1 entry per slot -- not too much collision?

That's more like what I would have expected (with no basis whatever
other than faith in perl's hashing algorithm :)). Exactly one entry per
slot is perfect hashing: no searching at all within the bucket. Not much
more than that means not much searching.

> By looking at the scalarSize and my PF usage - It does seem like it is
> the memory issue, by loading too much data that the system can handle?
> Situations like that - Does it matter if i do keys(%mainHash) = 300000
> or not?

Well... preallocating too much will only make things worse.
Preallocating at all is a dodgy business (you have to calculate how much
to preallocate, which makes the code more complicated and thus more
fragile; and less readable), so don't do it if bucket allocation isn't
your problem, which it seems not to be.

> > Are you using warnings?  If so, did you get an uninitialized value
> > warning?
> 
> Perhaps not - how do you turn on warnings?

Have you read the Posting Guidelines? They cover some basic stuff like
this. At the top of every script you write you should have

    use strict;
    use warnings;

which turn on a whole lot of diagnostics which are usually essential but
often annoying for one-liners.

Ben

-- 
  The cosmos, at best, is like a rubbish heap scattered at random.
                                                           Heraclitus
  benmorrow@tiscali.co.uk


------------------------------

Date: 2 Aug 2006 12:13:30 -0700
From: "rcopple" <robertcopple123@gmail.com>
Subject: Perl/PHP Web Software Architects Needed
Message-Id: <1154546010.079692.180290@m79g2000cwm.googlegroups.com>

PERL/PHP Web Software Architect

We are seeking enthusiastic candidates that are looking to gain
international exposure in India.  The IT industry is booming in India
and providing valuable opportunities for employees to gain
international exposure.  We are also looking at NRI's and PIO's
that are interested in repatriating themselves back to India.

Work India has been retained by one of the top IT/Services companies in
the US Medical Billing Industry.  Work India is assisting in locating
and hiring candidates for the position of PERL/PHP Web Software
Architects for their subsidiary in Chennai India.

Job Specifics
The Architect will be a senior member of our technology development
team.  (S)he will be responsible for executing large-scale technology
initiatives through the full life-cycle of application development.
First and foremost, (s)he will develop deep expertise in both domain
content and technology, and will expect to spend the majority of
his/her time in hands-on coding.  (S)he will be expected to develop
ownership of an area of the application.  The Architect will provide
mentorship and guidance to more junior team members working in his/her
application zone, but will not expect to have direct management
responsibility.  The Architect will be required to develop a deep
understanding of the business context and to become a source of ideas
as well as an executor.  The Architect will be expected to be
proficient in PERL, SQL, DHTML, javascript, CSS, and general Linux
technologies.

Specific Qualifications
· Minimum 6 years experience hands-on software development work
· Demonstrated experience in a technical leadership role preparing
  projects for distribution among multiple engineers
· Significant experience working on Unix/Linux-based web applications,
  preferably with mod_Perl, LAMP or other open-source architectures
· Significant experience working with interpreted scripting languages
  (eg PERL, PHP, Python, LISP, Sed/Awk)
· Specific, demonstrated command of PERL
· Experience with SQL (preferably Oracle SQL)
· Hands-on software developer: Minimum of 2,000 lines of code
  personally written in past 24 months
· Geek at heart: Codes in spare time, just because it's fun
· A keen and inquisitive intellect
· Excellent English communication skills
· US work experience a key advantage
· Ability to absorb business context
· Ability to provide mentorship and leadership within a flat
  organizational structure
· Ability to work effectively with operations staff and management to
  identify and design effective innovations
· Enthusiasm, idealism, creativity, dedication and a little bit of
  attitude
· Willingness and ability to travel to the US for up to two months
  training
These positions will require relocation to Chennai India.

Please send all correspondence to recruitment@work-india.com

Work India
recruitment@work-india.com
http://www.work-india.com/



------------------------------

Date: 2 Aug 2006 12:29:34 -0700
From: usenet@DavidFilmer.com
Subject: Re: Perl/PHP Web Software Architects Needed
Message-Id: <1154546974.023344.41420@i42g2000cwa.googlegroups.com>

rcopple wrote:
> PERL/PHP Web Software Architect

You are confusing this usenet group with http://jobs.perl.org.  Please
don't do that.

-- 
David Filmer (http://DavidFilmer.com)



------------------------------

Date: 2 Aug 2006 14:45:07 -0700
From: bubbabubbs@yahoo.com
Subject: Re: Q: ActivePerl - calling an ActiveX object
Message-Id: <1154555107.290341.146720@i42g2000cwa.googlegroups.com>

Ok, I figured it out. This is how you do it:

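use Win32::OLE;
use Win32::OLE::Variant;   # exports Variant() and the VT_* constants
my $arg4 = Variant( VT_R8 | VT_BYREF, 0.0 );   # a double, passed by reference
my $service = Win32::OLE->new( '3rdPartyComponent.Service' );
$service->foo( 31, 4000, 28000, $arg4 );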



------------------------------

Date: Wed, 02 Aug 2006 21:52:53 GMT
From: "A. Sinan Unur" <1usa@llenroc.ude.invalid>
Subject: Re: Q: ActivePerl - calling an ActiveX object
Message-Id: <Xns9813B5FF6925Dasu1cornelledu@127.0.0.1>

bubbabubbs@yahoo.com wrote in news:1154555107.290341.146720
@i42g2000cwa.googlegroups.com:

> Ok, I figured it out. This is how you do it:
> 
> my $arg4 = Variant( VT_R8 | VT_BYREF, 0.0);   # !!!!
> my $service = Win32::OLE->new( '3rdPartyComponent.Service' );
> $service->foo( 31, 4000, 28000, $arg4 );

First off, thank you very much for sharing the solution. I am sure knowing 
that will help me some time in the future.

I would like to point out, however, that it is useful to leave some 
context in your post so others can also understand what you figured out.

See the posting guidelines for this group.

Sinan
-- 
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html



------------------------------

Date: Wed, 2 Aug 2006 19:18:23 +0100
From: "kokolo" <koko_loko_0@yahoo.co.uk>
Subject: Re: Recursion
Message-Id: <eaqq9k$hqt$1@news.al.sw.ericsson.se>


<xhoster@gmail.com> wrote in message
news:20060802124247.030$Bg@newsreader.com...
>
> Well, you have hit one of those famous tradeoffs.  There is no doubt that
> your partitioning method is simpler, less error prone, less subtle, etc.
> than one of the traditional in-place pivot methods.  But it is also slower
> due to all the allocation and copying going on.
>
> Xho

I tried referencing like this:
    ........
my $l_ref=\@smaller_numbers;
 my $r_ref=\@larger_numbers;

 if ($#smaller_numbers > 0){@smaller_numbers = &qs(@$l_ref)}
 if ($#larger_numbers > 0) {@larger_numbers = &qs(@$r_ref)}

and it improved the performance but it's still sorting 100000 elements in
about 55 seconds.
Was that referencing ok?
I wonder how fast QuickSort can be in Perl, so that I can tell how good
or bad my algorithm is.

kokolo








------------------------------

Date: 2 Aug 2006 11:33:53 -0700
From: "Paul Lalli" <mritty@gmail.com>
Subject: Re: Recursion
Message-Id: <1154543633.212264.287960@p79g2000cwp.googlegroups.com>

kokolo wrote:
> <xhoster@gmail.com> wrote in message
> news:20060802124247.030$Bg@newsreader.com...
> >
> I tried referencing like this:
>     ........
> my $l_ref=\@smaller_numbers;
>  my $r_ref=\@larger_numbers;

This creates two references, one to each of your big arrays.

>  if ($#smaller_numbers > 0){@smaller_numbers = &qs(@$l_ref)}
>  if ($#larger_numbers > 0) {@larger_numbers = &qs(@$r_ref)}

Each of these now dereferences those references, thus passing in every
element of the arrays (which qs presumably copies out of @_, and the
returned list is copied again on assignment!).  This is not a
performance improvement.

> and it improved the performance but it's still sorting 100000 elements in
> about 55 seconds.
> Was that referencing ok?

No, I don't think so.  Unless I'm drastically misunderstanding your
code, you did not use the references at all.  Pass the references into
your subroutine, and then use those referenced arrays directly.  Do not
dereference them, thereby copying the data back out of them again.
That's exactly what you're trying to avoid....

(Also, stop prepending the & on your subroutine calls.  If you don't
know what that does, you don't want it)

sub qs {
  my $array_ref = shift;
  #do stuff to @$array_ref, or $array_ref->[0], etc.
  #do not copy the elements out of @$array_ref;
  #do not return @$array_ref.  Just modify it in this subroutine.
}

my $l_ref = \@smaller_numbers;
qs($l_ref);
# at this point, @smaller_numbers has whatever modifications
# you did to @$array_ref in the subroutine.

Paul Lalli



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9551
***************************************
