[28006] in Perl-Users-Digest


Perl-Users Digest, Issue: 9370 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sat Jun 24 18:05:55 2006

Date: Sat, 24 Jun 2006 15:05:04 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sat, 24 Jun 2006     Volume: 10 Number: 9370

Today's topics:
    Re: Hash of Arrays <pradeep.bg@gmail.com>
        Need Search::Binary examples <nomail@sorry.com>
        Perl core support for inside-out classes (Anno Siegel)
    Re: Regex: Exact semantics of ^ and $ when using /m <DJStunks@gmail.com>
    Re: Regex: Exact semantics of ^ and $ when using /m <mumia.w.18.spam+nospam.usenet@earthlink.net>
    Re: Regex: Exact semantics of ^ and $ when using /m <rvtol+news@isolution.nl>
    Re: Saying "latently-typed language" is making a catego <gdr@integrable-solutions.net>
    Re: Saying "latently-typed language" is making a catego <marshall.spight@gmail.com>
    Re: Saying "latently-typed language" is making a catego <david.nospam.hopwood@blueyonder.co.uk>
        Termination and type systems <david.nospam.hopwood@blueyonder.co.uk>
    Re: unpack 'C' <hjp-usenet2@hjp.at>
    Re: unpack 'C' <hjp-usenet2@hjp.at>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 24 Jun 2006 14:07:27 -0700
From: "Deepu" <pradeep.bg@gmail.com>
Subject: Re: Hash of Arrays
Message-Id: <1151183247.531635.45320@r2g2000cwb.googlegroups.com>

Hi James,

I tried the correction you had specified but I am still getting that
'undef'.

I am getting the output like:

%hash = (
                FILE1 => [
                                 undef,
                                 [
                                  STATESUSPEND
                                  STATERESUME
                                  STATE1
                                  STATE2
                                 ],
                                [
                                 _
                                 _
                                ]
                              ],
                 FILE2 => [
                                 undef,
                                  [
                                    _
                                   and so on

Thanks a lot for helping me on this.

--Deep







Peter J. Holzer wrote:
> Tad McClellan wrote:
> > John W. Krahn <someone@example.com> wrote:
> >> Deepu wrote:
> >>> %hash = (
> >>>                  FILE1 => ["STATEMAIN", "STATESUSPEND", "STATE1",
> >>> "STATE2", "STATERESUME", "RESET"],
> >>>                  FILE1 => ["STATEMAIN", "STATE9", "STATE10"],
> >>
> >> In Perl a hash cannot have two different keys with the same name.
> >
> >
> > Can a hash have two different keys with the same name somewhere
> > besides in Perl?
>
> Depends on what you mean by "hash". A hash table can, since it needs to
> handle hash collisions, too. An associative array cannot, since it uses
> the key as a unique index.
>
> > How do you distinguish between the two?
>
> Several possibilities: E.g., address or key plus serial number.
>
> > How does the hashing differ?
>
> It doesn't. You have several entries with the same hash value, but you
> have that with unique keys, too.
>
> > Should I take this off-topic question elsewhere?  :-)
>
> Possibly :-)
>
>         hp
>
> --
>    _  | Peter J. Holzer    | Man könnte sich [die Diskussion] auch
> |_|_) | Sysadmin WSR/LUGA  | sparen, wenn man sie sich einfach sparen
> | |   | hjp@hjp.at         | würde.
> __/   | http://www.hjp.at/ |   -- Ralph Angenendt in dang 2006-04-15



------------------------------

Date: Sat, 24 Jun 2006 14:52:19 -0700
From: Arvin Portlock <nomail@sorry.com>
Subject: Need Search::Binary examples
Message-Id: <e7kc6o$18eb$1@agate.berkeley.edu>

I need to extract values from very large but numerically
sorted arrays. I can think of lots of ways to do it but
I'd like to try a binary search and prefer to use an existing
module rather than reinvent the wheel.

Unfortunately I can't make sense of the documentation
in Search::Binary (perhaps there's another more appropriate
module?). Are there any scripts out there that use
Search::Binary which one can find on the web?

I have a set of tens of thousands of books and need to
take a random page sample from the set, i.e., examine
the 102,411th page of the set which might occur in the
971st book. My array is an ordered list of cumulative
page counts for each book. So if each book has a hundred
pages the first element in the array would be 1, the
second would be 101, the third 201, etc. If the random
sample required page 114, then the search would discover
it was page 14 of the second book.

Here's an inefficient way to do it. (Yes, I could
narrow each successive search by incrementing array
index start positions or shifting elements off the
array I know will never match as the random numbers
increase. But I'm not doing that here for simplicity
since I want to illustrate the problem and really
want to learn how to use Search::Binary anyway.)

# Some processing of raw data here. The result is a hash
# and an array. The hash has as keys the place within the
# larger sequence where the first page of the book falls.

my %books = (
    1   => 'book 1',
    116 => 'book 2',
    433 => 'book 3',
    762 => 'book 4'
);

# The array consists of the sorted keys of %books above.
# Created when raw data is read so no need for expensive
# sort keys.

my @pages = (1, 116, 433, 762);

# Random numbers are already sorted numerically

my @random_nos = (13, 18, 344, 390, 601);

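RANDOM: foreach my $no (@random_nos) {
    # Scan from the highest start page down: the first start page
    # that is <= $no belongs to the book containing page $no.
    foreach my $page (reverse @pages) {
       if ($no >= $page) {
          print "$no is in $books{$page}\n";
          next RANDOM;
       }
    }
}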
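
The lookup described above (find the last start page that is less than
or equal to the sampled page number) can also be done with a hand-written
binary search. This is only a sketch and does not use Search::Binary or
its API; the sub name find_book_index is made up for illustration.

```perl
use strict;
use warnings;

my %books = (1 => 'book 1', 116 => 'book 2', 433 => 'book 3', 762 => 'book 4');
my @pages = (1, 116, 433, 762);   # sorted start pages, same as the keys above

# Hypothetical helper: return the index of the last element of @$starts
# that is <= $no, or -1 if $no lies before the first start page.
sub find_book_index {
    my ($starts, $no) = @_;
    my ($lo, $hi) = (0, $#$starts);
    my $found = -1;
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($starts->[$mid] <= $no) {
            $found = $mid;        # candidate; a later book may still qualify
            $lo    = $mid + 1;
        }
        else {
            $hi = $mid - 1;
        }
    }
    return $found;
}

foreach my $no (13, 18, 344, 390, 601) {
    my $i = find_book_index(\@pages, $no);
    next if $i < 0;
    my $start = $pages[$i];
    printf "%d is page %d of %s\n", $no, $no - $start + 1, $books{$start};
}
```

Each lookup takes a number of steps proportional to the logarithm of the
number of books, so tens of thousands of books need well under twenty
comparisons per sampled page.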



------------------------------

Date: 24 Jun 2006 21:42:15 GMT
From: anno4000@radom.UUCP (Anno Siegel)
Subject: Perl core support for inside-out classes
Message-Id: <4g5ptnF1k1aitU1@news.dfncis.de>

As of v5.10 Perl will provide support for inside-out classes in the form
of a new core module, Hash::Util::FieldHash.  It is already available in
the current bleadperl.

Inside-out classes store object data in hashes keyed by the object.  That
way each field of a class is realized as one lexical hash which may thus
be called a field hash.

Normal hashes have a number of disadvantages in this approach.  For one,
references are stringified when used as hash keys, but the stringification
depends on the bless status of the reference.  After a re-bless of an
object, its data would be lost (not to mention overloaded stringification).
This is usually corrected by calling the function Scalar::Util::refaddr
(or some equivalent) before a reference is used as a hash key.
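
The re-bless problem can be seen in a few lines. This is only an
illustration; the class names Before and After are invented for it.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $obj = bless {}, 'Before';

my $string_before = "$obj";          # contains the class name and the address
my $addr_before   = refaddr($obj);   # the address alone, as a plain number

bless $obj, 'After';                 # re-bless into another class

# The stringified form now names the new class, so it no longer finds
# an entry stored under the old string. The address is unaffected.
my $string_changed = ("$obj" ne $string_before)        ? 1 : 0;
my $addr_same      = (refaddr($obj) == $addr_before)   ? 1 : 0;

print "stringification changed: $string_changed\n";
print "refaddr unchanged:       $addr_same\n";
```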

Another problem is that hash entries don't go away when objects go out
of scope.  This leads to an obvious memory leak.  Even worse, objects
are re-used by the perl interpreter, and if the field hashes aren't
cleaned up, a new object may accidentally access data that had been
used by its earlier incarnation.  So an inside-out class based on
standard hashes *must* have a DESTROY method which needs some
infrastructure to work.

A third problem is threads.  When a new thread is created, the entire
perl interpreter is cloned.  That means that in the new thread all
variables will have new reference addresses.  So objects will lose
their data across threads, unless the class has a CLONE method to
correct that.  It turns out that CLONE needs much the same
infrastructure as garbage collection.

With Hash::Util::FieldHash you can declare a new type of hash that
takes care of these problems.  The stringification of keys as their
decimal reference address is built into these hashes.  Furthermore,
field hashes are garbage-collected and thread-safe so that a class
built with field hashes can safely run without DESTROY and
CLONE methods.  That simplifies the construction considerably.
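
A minimal sketch of a class built that way, assuming a perl that ships
Hash::Util::FieldHash.  The class Counter, its methods and the helper
live_entries are invented for this example.

```perl
package Counter;
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

# One field hash per attribute, keyed by the object itself.
fieldhash my %count;

sub new {
    my ($class, $start) = @_;
    my $self = bless \do { my $anon }, $class;   # the object carries no data
    $count{$self} = defined $start ? $start : 0;
    return $self;
}

sub increment { my $self = shift; return ++$count{$self} }
sub count     { my $self = shift; return $count{$self} }

# Hypothetical helper, for illustration only: number of objects that
# currently have an entry in the field hash.
sub live_entries { return scalar keys %count }

package main;

my $c = Counter->new(5);
$c->increment;
print "count: ", $c->count, "\n";
print "entries while alive: ", Counter::live_entries(), "\n";

undef $c;   # no DESTROY method is defined, the entry goes away anyway
print "entries after undef: ", Counter::live_entries(), "\n";
```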

The documentation of Hash::Util::FieldHash can be seen at
http://www.tu-berlin.de/zrz/mitarbeiter/anno4000/clpm/FieldHash.html

Inside-out classes have a fourth problem, which is serialization
and persistence.  Tools like Data::Dumper and Storable don't save,
restore or show the data of an inside-out object, they know nothing
about that.  Hash::Util::FieldHash doesn't address this problem.
I am planning a (non-core) module for CPAN, probably Class::Field
(based on FieldHash) that makes inside-out classes that are constructed
with it viewable, dumpable and restorable.

Anno


------------------------------

Date: 24 Jun 2006 11:39:44 -0700
From: "DJ Stunks" <DJStunks@gmail.com>
Subject: Re: Regex: Exact semantics of ^ and $ when using /m
Message-Id: <1151174384.626777.9910@c74g2000cwc.googlegroups.com>

Wolfgang Thomas wrote:
> Hi,
>
> I am afraid that this question has been asked before, but I could not
> find the answer in the FAQ nor in the "Programming Perl" book, nor by
> googling.

are you aware that Perl comes with documentation of its own for all the
functions and syntax that you might ever want to use?

I would suggest perlre.

> My question refers to the /m modifier for regular expressions.
> According to "Programming Perl" /m lets ^ and $ match next to new lines
> within the string instead of considering only the beginning and end of
> the string.

you have your answer: "next to".  they are called "zero width
assertions" which means they match, but they do not consume any
characters from the string.

From perlre:
  By default, the "^" character is guaranteed to match only the
  beginning of the string, the "$" character only the end (or
  before the newline at the end), and Perl does certain optimizations
  with the assumption that the string contains only one line. Embedded
  newlines will not be matched by "^" or "$". You may, however, wish
  to treat a string as a multi-line buffer, such that the "^" will
  match after any newline within the string, and "$" will match before
  any newline. At the cost of a little more overhead, you can do this
  by using the /m modifier on the pattern match operator.

> Therefore I wonder why the following example does not match:
>
> my $s = "123\n456";
> if ($s =~ /3$^4/m) {print "match (4)\n";}

this is because there's a character after that $ and before that ^: a
\n.

try: if ($s =~ m'3$.*^4'ms) {print "match (4)\n";}
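
A small sketch to check this.  It uses /x so that "$" and "^" can be
separated by spaces, which keeps perl from reading "$" together with the
next character as a variable name; the variable names are made up.

```perl
use strict;
use warnings;

my $s = "123\n456";

# "$" and "^" are zero-width: neither of them consumes the "\n" that
# sits between "3" and "4", so something else has to match it.
my $adjacent     = ($s =~ / 3 $ ^ 4 /mx)    ? 1 : 0;  # nothing matches the "\n"
my $with_newline = ($s =~ / 3 $ \n ^ 4 /mx) ? 1 : 0;  # "\n" matched explicitly
my $dot_star     = ($s =~ m'3$.*^4'ms)      ? 1 : 0;  # ".*" under /s takes the "\n"

print "adjacent: $adjacent, with newline: $with_newline, dot-star: $dot_star\n";
```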

> Even more confusing (for me) is that
> if ($s =~ /3$4/m) {print "match (2)\n";}
> matches,

did you have warnings enabled?  if so, did you notice the complaint
"Use of uninitialized value in concatenation (.) or string at..."?  The
compiler is not taking that '$' as a regex metacharacter - it is
grouping it with the 4 and assuming you are trying to interpolate $4.
$4 is not defined, the match is now for /3/ which matches.
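
To see the difference, compare the interpolating form with a pattern in
single-quote delimiters, where perl performs no variable interpolation.
A sketch only; the warning is silenced on purpose so the result is easy
to read.

```perl
use strict;
use warnings;

my $s = "123\n456";

my $interpolated;
{
    no warnings 'uninitialized';              # $4 is undef at this point
    $interpolated = ($s =~ /3$4/m) ? 1 : 0;   # effectively /3/m
}

# With m'...' the "$" stays an end-of-line anchor, and a literal "4"
# would have to follow it without the "\n" being consumed.
my $literal = ($s =~ m'3$4'm) ? 1 : 0;

print "interpolated: $interpolated, literal: $literal\n";
```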

> Could someone please point me to an explanation of that behavior?

HTH,
-jp



------------------------------

Date: Sat, 24 Jun 2006 20:19:29 GMT
From: "Mumia W." <mumia.w.18.spam+nospam.usenet@earthlink.net>
Subject: Re: Regex: Exact semantics of ^ and $ when using /m
Message-Id: <l7hng.1762$ii.1450@newsread3.news.pas.earthlink.net>

Wolfgang Thomas wrote:
> Hi,

Hi Wolfgang.

> 
> Therefore I wonder why the following example does not match:
> 
> my $s = "123\n456";
> if ($s =~ /3$^4/m) {print "match (4)\n";}
> [...]

^ only matches the beginning of a line when it appears at the beginning
of the RE, and $ only matches the end of a line when it appears at the
end of the RE.

Use \n to match newlines embedded inside an RE:
if ($s =~ /3\n4/) { ... }




------------------------------

Date: Sat, 24 Jun 2006 22:41:21 +0200
From: "Dr.Ruud" <rvtol+news@isolution.nl>
Subject: Re: Regex: Exact semantics of ^ and $ when using /m
Message-Id: <e7kfdf.1d8.1@news.isolution.nl>

Mumia W. wrote:

> ^ only matches the beginning of a line when it appears at the
> beginning of the RE, and $ only matches the end of a line when it
> appears at the end of the RE.


No.

  perl -wle '"a\nb" =~ /a$(?:\n)^b/m and print 1'

  perl -wle '"a\nb" =~ / a $ \n ^ b /mx and print 1'

  perl -wle '"a\nb" =~ / a $ \s ^ b /mx and print 1'

  perl -wle '"a\nb" =~ / a $ (?:[^^]) ^ b /mx and print 1'

etc.

-- 
Affijn, Ruud

"Gewoon is een tijger."





------------------------------

Date: 24 Jun 2006 21:37:45 +0200
From: Gabriel Dos Reis <gdr@integrable-solutions.net>
Subject: Re: Saying "latently-typed language" is making a category mistake
Message-Id: <m3zmg2tkdy.fsf@uniton.integrable-solutions.net>

"Marshall" <marshall.spight@gmail.com> writes:

| David Hopwood wrote:
| >
| > A type system that required an annotation on all subprograms that do not
| > provably terminate, OTOH, would not impact expressiveness at all, and would
| > be very useful.
| 
| Interesting. I have always imagined doing this by allowing an
| annotation on all subprograms that *do* provably terminate. If
| you go the other way, you have to annotate every function that
| uses general recursion (or iteration if you swing that way) and that
| seems like it might be burdensome. Further, it imposes the
| annotation requirement even where the programmer might not
| care about it, which the reverse does not do.

simple things should stay simple.  Recursions that provably terminate
are among the simplest ones.  Annotations in those cases could be
allowed, but not required.  Otherwise the system might become very
irritating to program with.

-- Gaby


------------------------------

Date: 24 Jun 2006 11:54:32 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: Saying "latently-typed language" is making a category mistake
Message-Id: <1151175272.674919.141210@y41g2000cwy.googlegroups.com>

Gabriel Dos Reis wrote:
> "Marshall" <marshall.spight@gmail.com> writes:
>
> | David Hopwood wrote:
> | >
> | > A type system that required an annotation on all subprograms that do not
> | > provably terminate, OTOH, would not impact expressiveness at all, and would
> | > be very useful.
> |
> | Interesting. I have always imagined doing this by allowing an
> | annotation on all subprograms that *do* provably terminate. If
> | you go the other way, you have to annotate every function that
> | uses general recursion (or iteration if you swing that way) and that
> | seems like it might be burdensome. Further, it imposes the
> | annotation requirement even where the programmer might not
> | care about it, which the reverse does not do.
>
> simple things should stay simple.  Recursions that provably terminate
> are among the simplest ones.  Annotations in those cases could be
> allowed, but not required.  Otherwise the system might become very
> irritating to program with.

Yes, exactly my point.


Marshall



------------------------------

Date: Sat, 24 Jun 2006 22:00:54 GMT
From: David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Subject: Re: Saying "latently-typed language" is making a category mistake
Message-Id: <qCing.477123$xt.142183@fe3.news.blueyonder.co.uk>

Pascal Costanza wrote:
> Vesa Karvonen wrote:
> 
>> I think that we're finally getting to the bottom of things.  While
>> reading your responses something became very clear to me: latent-typing and
>> latent-types are not a property of languages.  Latent-typing, also known as
>> informal reasoning, is something that all programmers do as a normal part
>> of programming.  To say that a language is latently-typed is to make a
>> category mistake, because latent-typing is not a property of languages.
> 
> I disagree with you and agree with Anton. Here, it is helpful to
> understand the history of Scheme a bit: parts of its design are a
> reaction to what Schemers perceived as having failed in Common Lisp (and
> other previous Lisp dialects).
> 
> One particularly illuminating example is the treatment of nil in Common
> Lisp. That value is a very strange beast in Common Lisp because it
> stands for several concepts at the same time: most importantly the empty
> list and the boolean false value. Its type is also "interesting": it is
> both a list and a symbol at the same time. It is also "interesting" that
> its quoted value is equivalent to the value nil itself. This means that
> the following two forms are equivalent:
> 
> (if nil 42 4711)
> (if 'nil 42 4711)
> 
> Both forms evaluate to 4711.
> 
> It's also the case that taking the car or cdr (first or rest) of nil
> doesn't give you an error, but simply returns nil as well.
> 
> The advantage of this design is that it allows you to express a lot of
> code in a very compact way. See
> http://www.apl.jhu.edu/~hall/lisp/Scheme-Ballad.text for a nice
> illustration.
> 
> The disadvantage is that it is mostly impossible to have a typed view of
> nil, at least one that clearly disambiguates all the cases. There are
> also other examples where Common Lisp conflates different types, and
> sometimes only for special cases. [1]
> 
> Now compare this with the Scheme specification, especially this section:
> http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-6.html#%25_sec_3.2
> 
> This clearly deviates strongly from Common Lisp (and other Lisp
> dialects). The emphasis here is on a clear separation of all the types
> specified in the Scheme standard, without any exception. This is exactly
> what makes it straightforward in Scheme to have a latently typed view of
> programs, in the sense that Anton describes. So latent typing is a
> property that can at least be enabled / supported by a programming
> language, so it is reasonable to talk about this as a property of some
> dynamically typed languages.

If anything, I think that this example supports my and Vesa's point.
The example demonstrates that languages
*that are not distinguished in whether they are called latently typed*
support informal reasoning about types to varying degrees.

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>


------------------------------

Date: Sat, 24 Jun 2006 21:53:04 GMT
From: David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Subject: Termination and type systems
Message-Id: <4ving.213803$8W1.62401@fe1.news.blueyonder.co.uk>

Marshall wrote:
> David Hopwood wrote:
> 
>>A type system that required an annotation on all subprograms that do not
>>provably terminate, OTOH, would not impact expressiveness at all, and would
>>be very useful.
> 
> Interesting. I have always imagined doing this by allowing an
> annotation on all subprograms that *do* provably terminate. If
> you go the other way, you have to annotate every function that
> uses general recursion (or iteration if you swing that way) and that
> seems like it might be burdensome.

Not at all. Almost all subprograms provably terminate (with a fairly
easy proof), even if they use general recursion or iteration.

If it were not the case that almost all functions provably terminate,
then the whole idea would be hopeless. If a subprogram F calls G, then
in order to prove that F terminates, we probably have to prove that G
terminates. Consider a program where only half of all subprograms are
annotated as provably terminating. In that case, we would be faced with
very many cases where the proof cannot be discharged, because an
annotated subprogram calls an unannotated one.

If, on the other hand, more than, say, 95% of subprograms provably
terminate, then it is much more likely that we (or the inference
mechanism) can easily discharge any particular proof involving more
than one subprogram. So provably terminating should be the default,
and other cases should be annotated.

In some languages, annotations may never or almost never be needed,
because they are implied by other characteristics. For example, the
concurrency model used in the language E (www.erights.org) is such
that there are implicit top-level event loops which do not terminate
as long as the associated "vat" exists, but handling a message is
always required to terminate.

> Further, it imposes the
> annotation requirement even where the programmer might not
> care about it, which the reverse does not do.

If the annotation marks not-provably-terminating subprograms, then it
calls attention to those subprograms. This is what we want, since it is
less safe/correct to use a nonterminating subprogram than a terminating
one (in some contexts).

There could perhaps be a case for distinguishing annotations for
"intended not to terminate", and "intended to terminate, but we
couldn't prove it".

I do not know how well such a type system would work in practice; it may
be that typical programs would involve too many non-trivial proofs. This
is something that would have to be tried out in a research language.

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>


------------------------------

Date: Sat, 24 Jun 2006 23:05:13 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: unpack 'C'
Message-Id: <ae9k7e.0im.ln@teal.hjp.at>

David Squire wrote:
> Peter J. Holzer wrote:
>> (In fact, I'm not sure if the behaviour of unpack 'C*' is correct - the
>> docs aren't clear and it does violate the principle of least
>> astonishment).
> 
> I don't think the docs are that unclear. In perlfunc#pack it says:
> 
> "C   An unsigned char value.  Only does bytes.  See U for Unicode."

Yup. But that's for pack, and there is no ambiguity for pack.
pack('C*', 0xFC) always returns "\x{FC}". But the reverse operation is
ambiguous: unpack('C*', "\x{FC}") may return (0xFC) or (0xC3, 0xBC),
depending on whether the string happens to have the UTF-8 flag set or
not. I find this surprising and I find no mention that this is the
intended behaviour (rather than a side-effect of the implementation).
"Only does bytes" in the description of pack IMHO means "pack takes only
values from 0 to 255 and returns a byte string". It doesn't explicitly
say anything about the behaviour of unpack when fed a UTF-8 string, and
I'd like to have this explicitly spelled out (even if it is only "the
behaviour is undefined").
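
One way to sidestep the ambiguity is to fix the representation before
calling unpack.  The sketch below assumes the built-in utf8::downgrade
and utf8::encode functions are acceptable for that; the variable names
are made up.

```perl
use strict;
use warnings;

my $char = "\x{FC}";                 # one character, U+00FC

# Byte view: make sure the string is stored as plain bytes.
my $bytes = $char;
utf8::downgrade($bytes);
my @latin1 = unpack('C*', $bytes);   # the single byte 0xFC

# UTF-8 view: encode explicitly, so the octets are what unpack sees.
my $octets = $char;
utf8::encode($octets);
my @utf8 = unpack('C*', $octets);    # the two octets 0xC3 0xBC

print join(' ', map { sprintf '0x%02X', $_ } @latin1), "\n";
print join(' ', map { sprintf '0x%02X', $_ } @utf8),   "\n";
```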

        hp

-- 
   _  | Peter J. Holzer    | Man könnte sich [die Diskussion] auch
|_|_) | Sysadmin WSR/LUGA  | sparen, wenn man sie sich einfach sparen
| |   | hjp@hjp.at         | würde.
__/   | http://www.hjp.at/ |   -- Ralph Angenendt in dang 2006-04-15


------------------------------

Date: Sat, 24 Jun 2006 23:41:08 +0200
From: "Peter J. Holzer" <hjp-usenet2@hjp.at>
Subject: Re: unpack 'C'
Message-Id: <khbk7e.8pm.ln@teal.hjp.at>

Peter J. Holzer wrote:
>> "C   An unsigned char value.  Only does bytes.  See U for Unicode."
> 
> Yup. But that's for pack, and there is no ambiguity for pack.
> pack('C*', 0xFC) always returns "\x{FC}". But the reverse operation is
> ambiguous: unpack('C*', "\x{FC}") may return (0xFC) or (0xC3, 0xBC),
> depending on whether the string happens to have the UTF-8 flag set or
> not. I find this surprising and I find no mention that this is the
> intended behaviour (rather than a side-effect of the implementation).
> "Only does bytes" in the description of pack IMHO means "pack takes only
> values from 0 to 255 and returns a byte string". It doesn't explicitly
> say anything about the behaviour of unpack when fed a UTF-8 string, and
> I'd like to have this explicitly spelled out (even if it is only "the
> behaviour is undefined").

Actually, it is specified, just not where I expected it.

perldoc perlunicode:

·   The "chr()" and "ord()" functions work on characters, similar to
   "pack("U")" and "unpack("U")", not "pack("C")" and "unpack("C")".
   "pack("C")" and "unpack("C")" are methods for emulating byte-oriented
   "chr()" and "ord()" on Unicode strings.  While these methods reveal the
   internal encoding of Unicode strings, that is not something one normally
   needs to care about at all.

BTW, there is an open bug about a similar matter:
http://rt.perl.org/rt3/Ticket/Display.html?id=33734 

Looks like the behaviour of unpack will change for Unicode strings in
5.8.9 and 5.10 (although probably not for "C").

        hp


-- 
   _  | Peter J. Holzer    | Man könnte sich [die Diskussion] auch
|_|_) | Sysadmin WSR/LUGA  | sparen, wenn man sie sich einfach sparen
| |   | hjp@hjp.at         | würde.
__/   | http://www.hjp.at/ |   -- Ralph Angenendt in dang 2006-04-15


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9370
***************************************
