[32423] in Perl-Users-Digest


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
Perl-Users Digest, Issue: 3690 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon May 14 21:09:29 2012

Date: Mon, 14 May 2012 18:09:12 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Mon, 14 May 2012     Volume: 11 Number: 3690

Today's topics:
        btree <nospam.gravitalsun@hotmail.com.nospam>
    Re: btree <XeCycle@Gmail.com>
    Re: btree <m@rtij.nl.invlalid>
    Re: btree <tw+usenet@dionic.net>
    Re: btree <ben@morrow.me.uk>
    Re: btree <hjp-usenet2@hjp.at>
    Re: btree <m@rtij.nl.invlalid>
    Re: btree <rweikusat@mssgmbh.com>
    Re: btree <rweikusat@mssgmbh.com>
    Re: btree <XeCycle@Gmail.com>
    Re: btree <willem@toad.stack.nl>
    Re: hash <rweikusat@mssgmbh.com>
    Re: hash <m@rtij.nl.invlalid>
    Re: hash <hjp-usenet2@hjp.at>
    Re: hash <m@rtij.nl.invlalid>
    Re: hash <rweikusat@mssgmbh.com>
    Re: hash <rweikusat@mssgmbh.com>
    Re: hash <m@rtij.nl.invlalid>
    Re: hash <rweikusat@mssgmbh.com>
        parse question <bubunia2000ster@gmail.com>
    Re: parse question <JimSGibson@gmail.org>
        Sorting <artmerar@yahoo.com>
    Re: Sorting <nospam.gravitalsun@hotmail.com.nospam>
    Re: Sorting (Tim McDaniel)
    Re: Sorting <ben@morrow.me.uk>
    Re: Sorting <jurgenex@hotmail.com>
    Re: Sorting <ben@morrow.me.uk>
        What does "__PACKAGE__->_factory" actually do? <r.ted.byers@gmail.com>
    Re: What does "__PACKAGE__->_factory" actually do? <ben@morrow.me.uk>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 11 May 2012 13:45:45 +0300
From: "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam>
Subject: btree
Message-Id: <joq8vp$gor$1@news.ntua.gr>

I ve done some tests and found out
that hash is faster than   'something' ~~ @array
From theory the btree alogorythm is faster than the hash (even more if the
hash has conflicts)
Is there any way to store my values as btree instead of hash in Perl ?




------------------------------

Date: Mon, 14 May 2012 14:46:05 +0800
From: XeCycle <XeCycle@Gmail.com>
Subject: Re: btree
Message-Id: <87sjf3s1jm.fsf@xc.laptop>

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

"George Mpouras" <nospam.gravitalsun@hotmail.com.nospam> writes:

> I ve done some tests and found out
> that hash is faster than   'something' ~~ @array
> From theory the btree alogorythm is faster than the hash (even more if the
> hash has conflicts)
> Is there any way to store my values as btree instead of hash in Perl ?

Use Berkeley DB, dude.

Never think about such efficiency problems before your data grew
very large.

=2D-=20
Carl Lei (XeCycle)
Department of Physics, Shanghai Jiao Tong University
OpenPGP public key: 7795E591
Fingerprint: 1FB6 7F1F D45D F681 C845 27F7 8D71 8EC4 7795 E591

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN xxx SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQEcBAEBAgAGBQJPsKovAAoJEI1xjsR3leWRsCkIALLH/fHuQiHF01n7Tv7g9pvz
moz2ax+hMWdrMaGCerpBIhqzUmuUPSH4XmpzajSRrcEerFmfkzdiYPDcbsOgPcuk
Slw0gTLJdY+gWzxi5aHCZ2+gDoGSMJpA1saD7ac4ATE43q5VBtoywb+5FrofINkg
P8iwvQia40fDVM3deVGwvaOCu2XDWhF8DL8WBrOjQPvm8bMRPOUcSRtOp4LYFQ+H
Cwd1VK9/EEHRETU/2+ab15yKCc66wvgMaHF1bYdfw7ylJJtyM87RaNFoVNcEDOU8
wr77c5vmcjdZJ1LuwmHTPrhHzUkQniy2WMDtCyZhFggtWhv0BKA98aSwgQx/HZo=
=u/8x
-----END PGP SIGNATURE-----
--=-=-=--


------------------------------

Date: Mon, 14 May 2012 08:51:14 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: btree
Message-Id: <2km689-464.ln1@news.rtij.nl>

On Fri, 11 May 2012 13:45:45 +0300, George Mpouras wrote:

> I ve done some tests and found out that hash is faster than  
> 'something' ~~ @array From theory the btree alogorythm is faster than
> the hash (even more if the hash has conflicts)
> Is there any way to store my values as btree instead of hash in Perl ?

Native? No. But there probably is some module that helps on CPAN.

M4


------------------------------

Date: Mon, 14 May 2012 08:33:46 +0100
From: Tim Watts <tw+usenet@dionic.net>
Subject: Re: btree
Message-Id: <q3p689-lts.ln1@squidward.local.dionic.net>

XeCycle wrote:

> "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam> writes:
> 
>> I ve done some tests and found out
>> that hash is faster than   'something' ~~ @array
>> From theory the btree alogorythm is faster than the hash (even more if
>> the hash has conflicts)
>> Is there any way to store my values as btree instead of hash in Perl ?
> 
> Use Berkeley DB, dude.
> 
> Never think about such efficiency problems before your data grew
> very large.
> 

Yep - exactly how much data does the OP have? Typically, you can have tens 
to hundreds of thousands of keys in a perl hash without much of a problem, 
unless:

1) Speed is becoming a problem;

2) You are implementing on constrained hardware (eg embedded).

Also, there are various tree modules available on CPAN, but few are likely 
to compare to the internal perl hashing unless they are written in C and 
have been specifically designed for speed/efficiency.
-- 
Tim Watts


------------------------------

Date: Mon, 14 May 2012 09:12:46 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: btree
Message-Id: <ucr689-cqu1.ln1@anubis.morrow.me.uk>


Quoth Martijn Lievaart <m@rtij.nl.invlalid>:
> On Fri, 11 May 2012 13:45:45 +0300, George Mpouras wrote:
> 
> > I ve done some tests and found out that hash is faster than  
> > 'something' ~~ @array

That is not remotely surprising.

> > From theory the btree alogorythm is faster than
> > the hash (even more if the hash has conflicts)
> > Is there any way to store my values as btree instead of hash in Perl ?
> 
> Native? No. But there probably is some module that helps on CPAN.

DB_File is a core module. If you use the $DB_BTREE mode, and undef
instead of a filename, it will give you a Perl-level hash implemented by
an in-memory btree.

Ben



------------------------------

Date: Mon, 14 May 2012 10:40:24 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: btree
Message-Id: <slrnjr1h7p.uct.hjp-usenet2@hrunkner.hjp.at>

On 2012-05-14 07:33, Tim Watts <tw+usenet@dionic.net> wrote:
> XeCycle wrote:
>> "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam> writes:
>>> Is there any way to store my values as btree instead of hash in Perl ?
>> 
>> Use Berkeley DB, dude.
>> 
>> Never think about such efficiency problems before your data grew
>> very large.
>> 
>
> Yep - exactly how much data does the OP have? Typically, you can have
> tens to hundreds of thousands of keys in a perl hash without much of a
> problem, unless:
>
> 1) Speed is becoming a problem;

As long as the hash fits into memory, each operation (insert, lookup,
delete) is almost certainly faster on a hash than on a btree.

A btree becomes faster when we are dealing with *persistent* data and a
relatively small number of lookups and/or changes:

With a hash, you need to load all the data into memory at program start,
and if you make any changens you have to write it back to disk before
the program terminates.

With a btree, you only have to read/write the blocks you need.

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Mon, 14 May 2012 10:46:48 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: btree
Message-Id: <oct689-kn4.ln1@news.rtij.nl>

On Mon, 14 May 2012 09:12:46 +0100, Ben Morrow wrote:

> DB_File is a core module. If you use the $DB_BTREE mode, and undef
> instead of a filename, it will give you a Perl-level hash implemented by
> an in-memory btree.

Nice! Thanks. Added to the toolbox.

M4


------------------------------

Date: Mon, 14 May 2012 13:57:57 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: btree
Message-Id: <87zk9aly22.fsf@sapphire.mobileactivedefense.com>

"George Mpouras" <nospam.gravitalsun@hotmail.com.nospam> writes:

[...]

> From theory the btree alogorythm is faster than the hash (even more if the
> hash has conflicts)

As I already showed in another posting: This is completely
wrong. First, if there are now 'conflicts' in the hash table, hash
lookup is O(1) -- you calculate the hash value, select the proper
table slot and do at most a single key comparison. That's better than
anything which can be achieved with any tree structure. Second, if
there are conflicts, the maximum amount of key comparisons necessary
for a search (successful or unsuccessful) depends on the average
length of the list which needs to be searched. If these lists are
short compared to the height of some search tree used for the same set
of elements, hashed lookups will still be faster. And 'short-ness' of
the lists can be ensured by using a suitably large table.



------------------------------

Date: Mon, 14 May 2012 14:01:23 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: btree
Message-Id: <87vcjylxwc.fsf@sapphire.mobileactivedefense.com>

XeCycle <XeCycle@Gmail.com> writes:
> "George Mpouras" <nospam.gravitalsun@hotmail.com.nospam> writes:
>
>> I ve done some tests and found out
>> that hash is faster than   'something' ~~ @array
>> From theory the btree alogorythm is faster than the hash (even more if the
>> hash has conflicts)
>> Is there any way to store my values as btree instead of hash in Perl ?
>
> Use Berkeley DB, dude.
>
> Never think about such efficiency problems before your data grew
> very large.

The absolutely last thing you want in practice is working code which
needs a fundamental change until "yesterday" because the core data
structure used by it turns out to be unsuitable for a real-world
problem it is supposed to deal with.


------------------------------

Date: Mon, 14 May 2012 21:41:31 +0800
From: XeCycle <XeCycle@Gmail.com>
Subject: Re: btree
Message-Id: <87likuj2wk.fsf@xc.laptop>

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

>> Never think about such efficiency problems before your data grew
>> very large.
>
> The absolutely last thing you want in practice is working code which
> needs a fundamental change until "yesterday" because the core data
> structure used by it turns out to be unsuitable for a real-world
> problem it is supposed to deal with.

Another way is to use a compatible interface for the new data
backend.  Berkeley DB can be compatible with Perl arrays and
hashes.

Of course you may not be able to use the new features in this
way, but when things get really large, the original algorithm may
not suffice.  Therefore a refactor is needed, in case you're
scaling it too large.  A moderate scale up may be able to go
without those features provided by a more capable data backend.

=2D-=20
Carl Lei (XeCycle)
Department of Physics, Shanghai Jiao Tong University
OpenPGP public key: 7795E591
Fingerprint: 1FB6 7F1F D45D F681 C845 27F7 8D71 8EC4 7795 E591

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN xxx SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQEcBAEBAgAGBQJPsQuNAAoJEI1xjsR3leWReuMH/RJ4l7jPzUE4CC2UPUB2kPbn
Whhd3Uv/xow1yyG39aIqi+Eggj7uQ39mA1rFKVp7l/hxr292ioBf8BuoWBR/5Wza
VibR07rsEfWk0BSm/G2P6CmvDYzW6nD1M8tEUg0CkWoinDoxiRg1jnEb7rx6eca3
tcMhH+1B/lDt90jvRRrX/q14PIcH5uxFKFUjM6GJ1mPovX47RA6Sd/lWyps0SzcR
zN4uefH85La//QDDrT6dqGjUa5v14EmrVbiqwS4YIf1G48j+/IcgcpmCMkk7VR/M
S3d6lVCsoa1IMQx4DU8qGzLPp/n+0dnPvRK/ON+o//B/Mpbcvy0QDl3PFE9Xhp4=
=QOeI
-----END PGP SIGNATURE-----
--=-=-=--


------------------------------

Date: Mon, 14 May 2012 15:52:31 +0000 (UTC)
From: Willem <willem@toad.stack.nl>
Subject: Re: btree
Message-Id: <slrnjr2ahv.2l7m.willem@toad.stack.nl>

XeCycle wrote:
) Rainer Weikusat <rweikusat@mssgmbh.com> writes:
)> The absolutely last thing you want in practice is working code which
)> needs a fundamental change until "yesterday" because the core data
)> structure used by it turns out to be unsuitable for a real-world
)> problem it is supposed to deal with.
)
) Another way is to use a compatible interface for the new data
) backend.  Berkeley DB can be compatible with Perl arrays and
) hashes.
)
) Of course you may not be able to use the new features in this
) way, but when things get really large, the original algorithm may
) not suffice.  Therefore a refactor is needed, in case you're
) scaling it too large.  A moderate scale up may be able to go
) without those features provided by a more capable data backend.

And then it turns out that the API to the backend is what causes
it to scale badly and you're screwed anyway.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


------------------------------

Date: Sun, 13 May 2012 21:49:38 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: hash
Message-Id: <8762bz3ixp.fsf@sapphire.mobileactivedefense.com>

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Keith Thompson <kst-u@mib.org> writes:
>> > Jürgen Exner <jurgenex@hotmail.com> writes:
>> >> "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam> wrote:
>> >> [...]
>> >>>Also hashing algorythm is much slower than binary search, 
>> >>
>> >> hash access is typically O(1) while binary search is O(log n). Why do
>> >> you think hashing is slower than binary search?
>> >
>> > Hashing itself is not really an algorithm; it's a data structure.
>> 
>> Hashing is an algorithm and not a data structure: Usually, it refers
>> to 'calculate a "hash value"' (relatively small integer) from some
>> (significantly) larger 'input data value'. Usually, this hash value is
>> then used as index into a vector of pointers to locate 'a list' on
>> which some kind of data item associated with this 'input data value'
>> (key) should either exist or needs to be put on.
>
> I don't know about 'usually'. One of the more common uses of hash
> algorithms is in message digests and digital signatures and such.

The common use of hashing in perl is the implementation of so-called
hashes (if you think that hashes in perl are sufficiently uncommon
that referring to them as 'commmon' seems unwarranted, please feel
free to argue for another definition of 'commmonly used'). In this
case, it reduces a sequence of bytes to an unsigned 32-bit value
which is used as 'base value' for an index into a table of
lists.

So-called 'hash functions' have other uses than in lookup
tables (such as for generating message digests which can then be
'signed' by encrypting them with the private key of a public/private
key pair for public key cryptography to produce a so-called 'digital
signature or to calcluate hased message authentication codes [HMAC])
but in Perl, such uses are relatively rare, and in any case, this is
besides the point in a discussion about 'fast/slow lookups'.

> I think perhaps Keith meant to say 'A hash *table* is not an
> algorithm, it's a data structure', which is entirely true.

The term is commonly used but this is really just as sloppy as
referring to 'the usual case' as 'the usual case': A 'hash table' is
inherently in now way different from any other 'table' (for this
definition of table), just the way it is being used differs.


------------------------------

Date: Mon, 14 May 2012 08:47:29 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: hash
Message-Id: <1dm689-464.ln1@news.rtij.nl>

On Sun, 13 May 2012 19:11:39 +0100, Rainer Weikusat wrote:

> "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam>
> writes:
> 
> [...]
> 
>> hash is a data structure and there is a an internal logic of how to
>> retrieve values. If there is a big number of keys then hash may have
>> conflict keys; these conflicts are stored as lists where its retrieval
>> is slow (n). Retrieving btree values is much faster O(log n).
> 
> This doesn't exactly make sense: Let's assume I'm storing 1024 key-value
> pairs in a hash table and what I end up with are 128 lists of length 8.
> Ignoring the overhead for calculating the hash values, this means that I
> can locate each of these 1024 pairs with doing at most 8 key
> comparisons.

If you end up with so many lists of that length, something is seriously 
wrong with your hash table. It's either to small, or you have a very bad 
hashing function.

> If the same 1024 key-value pairs where organized as some
> kind of balanced binary search tree, that would be log2(1024) = 10 key
> comparisons. Since '128 pointers' isn't exactly a lot of data nowadays,
> it should be possible to double the size of the table and thus end up
> with 256 list of length 4 and so on.

George was talking about a btree, not a binary tree. Those are fairly 
different beasts.

HTH,
M4


------------------------------

Date: Mon, 14 May 2012 10:00:55 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: hash
Message-Id: <slrnjr1eto.j8l.hjp-usenet2@hrunkner.hjp.at>

On 2012-05-14 06:47, Martijn Lievaart <m@rtij.nl.invlalid> wrote:
> On Sun, 13 May 2012 19:11:39 +0100, Rainer Weikusat wrote:
>> "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam>
>> writes:
>>> hash is a data structure and there is a an internal logic of how to
>>> retrieve values. If there is a big number of keys then hash may have
>>> conflict keys; these conflicts are stored as lists where its retrieval
>>> is slow (n). Retrieving btree values is much faster O(log n).
>> 
>> This doesn't exactly make sense: Let's assume I'm storing 1024 key-value
>> pairs in a hash table and what I end up with are 128 lists of length 8.
>> Ignoring the overhead for calculating the hash values, this means that I
>> can locate each of these 1024 pairs with doing at most 8 key
>> comparisons.
>
> If you end up with so many lists of that length, something is seriously 
> wrong with your hash table. It's either to small, or you have a very bad 
> hashing function.

The perl hash implementation always keeps the load factor <= 1, so a
hash table with 1024 elements has at least 1024 slots. Of course it's
possible that 896 of these slots are empty and 128 slots have 8 elements
each, but as you say that would mean a very bad hash function or a
deliberate attack. To guard against the latter possibility, perl >= 5.8
(IIRC) monitors the chain length and changes the seed of the hash
function if a chain grows too long.

But I think Rainer was trying to come up with a worst-case scenario
where a hash is still faster (requires fewer comparisons) than a binary
tree, as the next paragraph shows:

>> If the same 1024 key-value pairs where organized as some
>> kind of balanced binary search tree, that would be log2(1024) = 10 key
>> comparisons. Since '128 pointers' isn't exactly a lot of data nowadays,
>> it should be possible to double the size of the table and thus end up
>> with 256 list of length 4 and so on.
>
> George was talking about a btree, not a binary tree. Those are fairly 
> different beasts.

Yes, I noted that too in an earlier posting, but btrees are normally
used for on-disk data structures and they don't implement a binary
search, so it seems likely that George really meant binary trees, not
btrees.

	hp


-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         | 
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org


------------------------------

Date: Mon, 14 May 2012 10:45:15 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: hash
Message-Id: <r9t689-kn4.ln1@news.rtij.nl>

On Mon, 14 May 2012 10:00:55 +0200, Peter J. Holzer wrote:

>> George was talking about a btree, not a binary tree. Those are fairly
>> different beasts.
> 
> Yes, I noted that too in an earlier posting, but btrees are normally
> used for on-disk data structures and they don't implement a binary

Btrees are used all over the place, not just on-disk. OTOH, in memory you 
are much more likely to actually use a red-black tree or similar, but the 
interface and performance characteristics are so similar to a btree that 
they are often called btrees as well.

> search, so it seems likely that George really meant binary trees, not
> btrees.

Possible, even probable.

M4



------------------------------

Date: Mon, 14 May 2012 13:45:54 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: hash
Message-Id: <878vgvlym5.fsf@sapphire.mobileactivedefense.com>

Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Sun, 13 May 2012 19:11:39 +0100, Rainer Weikusat wrote:
>> "George Mpouras" <nospam.gravitalsun.antispam@hotmail.com.nospam>
>> writes:
>> 
>> [...]
>> 
>>> hash is a data structure and there is a an internal logic of how to
>>> retrieve values. If there is a big number of keys then hash may have
>>> conflict keys; these conflicts are stored as lists where its retrieval
>>> is slow (n). Retrieving btree values is much faster O(log n).
>> 
>> This doesn't exactly make sense: Let's assume I'm storing 1024 key-value
>> pairs in a hash table and what I end up with are 128 lists of length 8.
>> Ignoring the overhead for calculating the hash values, this means that I
>> can locate each of these 1024 pairs with doing at most 8 key
>> comparisons.
>
> If you end up with so many lists of that length, something is seriously 
> wrong with your hash table. It's either to small, or you have a very bad 
> hashing function.

This was an example supposed to illustrate that 'searching in linked
list is slow while ...' doesn't make sense.

Apart from that: If I have a fixed size table (128 slots in this
example) and my key/value pairs are distributed evenly onto these 128
slots (yielding 128 lists of length 8), my hash function quite
obviously did the best job it is theoretically capable of doing. 

>> If the same 1024 key-value pairs where organized as some
>> kind of balanced binary search tree, that would be log2(1024) = 10 key
>> comparisons. Since '128 pointers' isn't exactly a lot of data nowadays,
>> it should be possible to double the size of the table and thus end up
>> with 256 list of length 4 and so on.
>
> George was talking about a btree, not a binary tree. Those are fairly 
> different beasts.

And I was writing about a data structure with the performance
characteristics he mentioned. Even if he was really referring to 'a
B-tree' and not just using btree as abbreviations for 'binary tree'
(which I very strongly suspect), that doesn't matter for this example.


------------------------------

Date: Mon, 14 May 2012 13:49:48 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: hash
Message-Id: <874nrjlyfn.fsf@sapphire.mobileactivedefense.com>

Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Mon, 14 May 2012 10:00:55 +0200, Peter J. Holzer wrote:
>>> George was talking about a btree, not a binary tree. Those are fairly
>>> different beasts.
>> 
>> Yes, I noted that too in an earlier posting, but btrees are normally
>> used for on-disk data structures and they don't implement a binary
>
> Btrees are used all over the place, not just on-disk. OTOH, in memory you 
> are much more likely to actually use a red-black tree or similar, but the 
> interface and performance characteristics are so similar to a btree that 
> they are often called btrees as well.

A 'red-black tree' is a balanced, binary search tree with the
'red-black' referring to a specific balancing algorithm.


------------------------------

Date: Mon, 14 May 2012 22:45:45 +0200
From: Martijn Lievaart <m@rtij.nl.invlalid>
Subject: Re: hash
Message-Id: <pg7889-uf8.ln1@news.rtij.nl>

On Mon, 14 May 2012 13:49:48 +0100, Rainer Weikusat wrote:

> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>> On Mon, 14 May 2012 10:00:55 +0200, Peter J. Holzer wrote:
>>>> George was talking about a btree, not a binary tree. Those are fairly
>>>> different beasts.
>>> 
>>> Yes, I noted that too in an earlier posting, but btrees are normally
>>> used for on-disk data structures and they don't implement a binary
>>
>> Btrees are used all over the place, not just on-disk. OTOH, in memory
>> you are much more likely to actually use a red-black tree or similar,
>> but the interface and performance characteristics are so similar to a
>> btree that they are often called btrees as well.
> 
> A 'red-black tree' is a balanced, binary search tree with the
> 'red-black' referring to a specific balancing algorithm.

I actually ment skip lists (brain fart), but the statement is still not 
incorrect.

M4



------------------------------

Date: Mon, 14 May 2012 22:49:11 +0100
From: Rainer Weikusat <rweikusat@mssgmbh.com>
Subject: Re: hash
Message-Id: <87pqa6mo14.fsf@sapphire.mobileactivedefense.com>

Martijn Lievaart <m@rtij.nl.invlalid> writes:
> On Mon, 14 May 2012 13:49:48 +0100, Rainer Weikusat wrote:
>> Martijn Lievaart <m@rtij.nl.invlalid> writes:
>>> On Mon, 14 May 2012 10:00:55 +0200, Peter J. Holzer wrote:
>>>>> George was talking about a btree, not a binary tree. Those are fairly
>>>>> different beasts.
>>>> 
>>>> Yes, I noted that too in an earlier posting, but btrees are normally
>>>> used for on-disk data structures and they don't implement a binary
>>>
>>> Btrees are used all over the place, not just on-disk. OTOH, in memory
>>> you are much more likely to actually use a red-black tree or similar,
>>> but the interface and performance characteristics are so similar to a
>>> btree that they are often called btrees as well.
>> 
>> A 'red-black tree' is a balanced, binary search tree with the
>> 'red-black' referring to a specific balancing algorithm.
>
> I actually ment skip lists (brain fart), but the statement is still not 
> incorrect.

Since a B-tree is a more general tree structure than a binary search
tree, every binary search tree is also a B-tree, just not vice
versa. While that's not consistent with a statement you made in other
subthread, it is surely 'not incorrect'. But this "data structures I
heard of!"-bingo is IMO pretty pointless.


------------------------------

Date: Mon, 14 May 2012 12:10:44 -0700 (PDT)
From: Pradeep <bubunia2000ster@gmail.com>
Subject: parse question
Message-Id: <5aa28f5e-d8f5-4989-b964-ef9f5ae9d007@em1g2000vbb.googlegroups.com>

Hi,
   I am a new bie in perl. I have a string as follows

xxxx:yyyy:zzzz:aaaa::bbbb.9.5

I need to discard the 9.5 and my resultant string should be
xxxx:yyyy:zzzz:aaaa::bbbb with .9.5 not be present.

I tried to use split as follows. But its not working as expected. I
will appreciate any help.

use Data::Dumper;
my @str="xxxx:yyyy:zzzz:aaaa::bbbb.9.5";
my @values;
foreach my $val (@str) {
   @values = split('.', $val);
}


------------------------------

Date: Mon, 14 May 2012 17:01:04 -0700
From: Jim Gibson <JimSGibson@gmail.org>
Subject: Re: parse question
Message-Id: <JimSGibson-281A09.17010414052012@news.individual.net>

In article 
<5aa28f5e-d8f5-4989-b964-ef9f5ae9d007@em1g2000vbb.googlegroups.com>,
 Pradeep <bubunia2000ster@gmail.com> wrote:

> Hi,
>    I am a new bie in perl. I have a string as follows
> 
> xxxx:yyyy:zzzz:aaaa::bbbb.9.5
> 
> I need to discard the 9.5 and my resultant string should be
> xxxx:yyyy:zzzz:aaaa::bbbb with .9.5 not be present.
> 
> I tried to use split as follows. But its not working as expected. I
> will appreciate any help.
> 
> use Data::Dumper;
> my @str="xxxx:yyyy:zzzz:aaaa::bbbb.9.5";

You should not be assigning a string to an array variable:

my $str = "xxxx:yyyy:zzzz:aaaa::bbbb.9.5";

> my @values;
> foreach my $val (@str) {
>    @values = split('.', $val);
> }

There is no need for a loop, and you need to escape the period to match 
only a period:

my @values = split(/\./,$str);

Your desired string is now in $values[0].

You could also use a regular expression:

$str =~ s/\..*//;


------------------------------

Date: Sat, 12 May 2012 15:54:22 -0700 (PDT)
From: ExecMan <artmerar@yahoo.com>
Subject: Sorting
Message-Id: <a53b9973-46f0-482a-a7e5-59019cf18e32@w13g2000vbc.googlegroups.com>

Hi,

I'm looking to read a directory to get a list of files in Descending
sorted order by timestamp.  I've gotten this so far, but it is in
ascending.  Can anyone help with the descending part?

my $dir = '/home/logfiles';

opendir my $DH, $dir or die "Cannot opendir '$dir' $!";

my @sorted_files =
map $_->[1],
sort { $a->[0] <=> $b->[0] }
map -f "$dir/$_" ? [ ( stat _ )[9], $_ ] : (),
readdir $DH;

for ($i=0; $i<20; $i++) {
  print "FILE: $sorted_files[$i]\n";
}


Thank a bunch!


------------------------------

Date: Tue, 15 May 2012 01:18:54 +0300
From: "ntua" <nospam.gravitalsun@hotmail.com.nospam>
Subject: Re: Sorting
Message-Id: <jos0cd$7uj$1@news.ntua.gr>

sort { $b->[0] <=> $a->[0] }


------------------------------

Date: Mon, 14 May 2012 22:41:32 +0000 (UTC)
From: tmcd@panix.com (Tim McDaniel)
Subject: Re: Sorting
Message-Id: <jos1ms$iao$1@reader1.panix.com>

In article
<a53b9973-46f0-482a-a7e5-59019cf18e32@w13g2000vbc.googlegroups.com>,
ExecMan  <artmerar@yahoo.com> wrote:
>I'm looking to read a directory to get a list of files in Descending
>sorted order by timestamp.

In Linux or other UNIX-like systems, you can use the command "ls -t".

-- 
Tim McDaniel, tmcd@panix.com


------------------------------

Date: Mon, 14 May 2012 23:34:33 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Sorting
Message-Id: <psd889-sg42.ln1@anubis.morrow.me.uk>


Quoth ExecMan <artmerar@yahoo.com>:
> 
> I'm looking to read a directory to get a list of files in Descending
> sorted order by timestamp.  I've gotten this so far, but it is in
> ascending.  Can anyone help with the descending part?
> 
> my $dir = '/home/logfiles';
> 
> opendir my $DH, $dir or die "Cannot opendir '$dir' $!";
> 
> my @sorted_files =
> map $_->[1],
> sort { $a->[0] <=> $b->[0] }

Do you understand what this line does? In particular, what does <=>
return when

    - $a->[0] is less than $b->[0]?
    - $a->[0] is greater than $b->[0]?
    - $a->[0] is equal to $b->[0]?

You just need to arrange for these to come out the other way around.

> map -f "$dir/$_" ? [ ( stat _ )[9], $_ ] : (),
> readdir $DH;

Ben



------------------------------

Date: Mon, 14 May 2012 16:15:25 -0700
From: Jürgen Exner <jurgenex@hotmail.com>
Subject: Re: Sorting
Message-Id: <ae43r79h5j8n37puos4mav6m58ksqct30d@4ax.com>

ExecMan <artmerar@yahoo.com> wrote:
>I'm looking to read a directory to get a list of files in Descending
>sorted order by timestamp.  I've gotten this so far, but it is in
>ascending.  Can anyone help with the descending part?

If nothing else this hack will work:
	perldoc -f reverse

:-)

jue


------------------------------

Date: Tue, 15 May 2012 00:54:27 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: Sorting
Message-Id: <jii889-l252.ln1@anubis.morrow.me.uk>


Quoth Jürgen Exner <jurgenex@hotmail.com>:
> ExecMan <artmerar@yahoo.com> wrote:
> >I'm looking to read a directory to get a list of files in Descending
> >sorted order by timestamp.  I've gotten this so far, but it is in
> >ascending.  Can anyone help with the descending part?
> 
> If nothing else this hack will work:
> 	perldoc -f reverse

Annoyingly, while C<sort { $b <=> $a }> is optimized into a 'DESC' flag
on the sort op, C<reverse sort> is not. If it were it wouldn't be a
hack, be a good way of clearly expressing what you mean. (This is
especially annoying given that, AFAICS, the logic is there in pp_sort to
support the DESC flag with an arbitrary sort function, it just isn't
used.)

Even as it is, 'reverse' is pretty cheap: it doesn't allocate any
memory, it just reverses the list by shifting pointers around on the
stack (one pass through half the list), so it's not a bad way to sort
backwards if reversing the sense of the condition is non-trivial.

Ben



------------------------------

Date: Sun, 13 May 2012 11:39:24 -0700 (PDT)
From: Ted Byers <r.ted.byers@gmail.com>
Subject: What does "__PACKAGE__->_factory" actually do?
Message-Id: <33071689.534.1336934364951.JavaMail.geo-discussion-forums@ynz24>

Here is a class created by wsdl2perl.pl, most of which I understand (I'll interject comments representing my understanding and end with the questions):

# obviously I haven't quoted the whol file, but rather the key components
# these I understand as private data members of the class, wth some
# syntactic sugar (ATTR) provided by the class package to automagically make
# the read accessor function (very nice)
my %id_of :ATTR(:get<id>);
my %type_of :ATTR(:get<type>);
my %name_of :ATTR(:get<name>);
my %size_of :ATTR(:get<size>);
my %extension_of :ATTR(:get<extension>);
my %file_of :ATTR(:get<file>);

# I can't find the documentation for this function
# I know wha a factory is in OOP, having written enough of them in C++, but
# I can't find the documentation for this function, and I am especially
# interested in knowing what the arguments are supposed to be and how they're
# used.
# obviously the first appears to be an array reference to an array with field
# names, and the second two are hashes mapping the field names to hashes and
# their types
__PACKAGE__->_factory(
    [ qw(        id
        type
        name
        size
        extension
        file

    ) ],
    {
        'id' => \%id_of,
        'type' => \%type_of,
        'name' => \%name_of,
        'size' => \%size_of,
        'extension' => \%extension_of,
        'file' => \%file_of,
    },
    {
        'id' => 'SOAP::WSDL::XSD::Typelib::Builtin::int',
        'type' => 'SOAP::WSDL::XSD::Typelib::Builtin::string',
        'name' => 'SOAP::WSDL::XSD::Typelib::Builtin::string',
        'size' => 'SOAP::WSDL::XSD::Typelib::Builtin::string',
        'extension' => 'SOAP::WSDL::XSD::Typelib::Builtin::string',
        'file' => 'SOAP::WSDL::XSD::Typelib::Builtin::string',
    },
    {

        'id' => 'id',
        'type' => 'type',
        'name' => 'name',
        'size' => 'size',
        'extension' => 'extension',
        'file' => 'file',
    }
);

But, this is confusing as it looks like the data members are supposed to be either an integer or a string (i.e. look at the SOAP types), but why, then, are the data members for this class defined as hashes?

Clarification would be helpful.  

Thanks

Ted


------------------------------

Date: Mon, 14 May 2012 23:48:45 +0100
From: Ben Morrow <ben@morrow.me.uk>
Subject: Re: What does "__PACKAGE__->_factory" actually do?
Message-Id: <dne889-sg42.ln1@anubis.morrow.me.uk>


Quoth Ted Byers <r.ted.byers@gmail.com>:
> Here is a class created by wsdl2perl.pl, most of which I understand
> (I'll interject comments representing my understanding and end with the
> questions):
> 
> # obviously I haven't quoted the whol file, but rather the key components

You haven't said which class(es) this class inherits from, which makes
it hard to find the definition of ->_factory to see what it does. 

> # these I understand as private data members of the class, wth some
> # syntactic sugar (ATTR) provided by the class package to automagically make
> # the read accessor function (very nice)
> my %id_of :ATTR(:get<id>);
> my %type_of :ATTR(:get<type>);
> my %name_of :ATTR(:get<name>);
> my %size_of :ATTR(:get<size>);
> my %extension_of :ATTR(:get<extension>);
> my %file_of :ATTR(:get<file>);
> 
<snip>
> 
> But, this is confusing as it looks like the data members are supposed to
> be either an integer or a string (i.e. look at the SOAP types), but why,
> then, are the data members for this class defined as hashes?

This looks to me like the 'inside-out object' technique. The way this
works is that, instead of making your object a hashref and storing the
object's fields inside the object, store (say) all the id_of fields for
all objects of this class in one file-scoped %id_of hash, with the hash
keyed by the refaddr of the object this value belongs to.

That probably wasn't very clear, so an example might help. Here is a
class Foo with two fields 'bar' and 'baz':

    package Foo;

    use Scalar::Util qw/refaddr/;

    my %bar;
    my %baz;

    sub new {
        my ($class, %args) = @_;

        # In this case the object is an empty arrayref, but this is not
        # important. It can be any type blessed ref, since the fields
        # are not stored inside the object.
        my $self = bless [], $class;

        $bar{refaddr $self} = $args{bar};
        $baz{refaddr $self} = $args{baz};
    }

    sub get_bar {
        my ($self) = @_;
        return $bar{refaddr $self};
    }

    sub set_bar {
        my ($self, $new) = @_;
        $bar{refaddr $self} = $new;
    }

    #...and equivalently for %baz

    # This is *really* important. Without this the entries in the hash
    # will never be cleared, and not only will they waste memory but it
    # is not unlikely another object will be allocated with the same
    # address, which would mean it would pick up fields which belonged
    # to a different object.
    sub DESTROY {
        my ($self) = @_;
        
        delete $bar{refaddr $self};
        delete $baz{refaddr $self};
    }

A serious implementation ought to use Hash::Util::Fieldhash, which
correctly handles several nasty corner cases involving reblessing,
overloading and threads, as well as handling the garbage-collection for
you.

Ben



------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3690
***************************************

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[32423] in Perl-Users-Digest

Perl-Users Digest, Issue: 3690 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)Mon May 14 21:09:29 2012

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Mon May 14 21:09:29 2012