[28107] in Perl-Users-Digest


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
Perl-Users Digest, Issue: 9471 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jul 14 14:10:16 2006

Date: Fri, 14 Jul 2006 11:10:07 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Fri, 14 Jul 2006     Volume: 10 Number: 9471

Today's topics:
    Re: What is a type error? <jo@durchholz.org>
    Re: What is a type error? <jo@durchholz.org>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? rossberg@ps.uni-sb.de
    Re: What is a type error? <dnew@san.rr.com>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Fri, 14 Jul 2006 12:10:21 +0200
From: Joachim Durchholz <jo@durchholz.org>
Subject: Re: What is a type error?
Message-Id: <e97qrh$cou$1@online.de>

Marshall schrieb:
> By your definition, "pointer" and "variable" are synonyms. That doesn't
> seem like a good idea to me. (What if i and j end up in registers?
> I have not heard it said that registers have addresses.)

There are no registers in the JVM ;-P

More specifically, where Joe said "pointer" he really meant "reference". 
i refers to a variable, and you really want to make sure that the first 
i refers to the same variable as the second i, so whatever is referred 
to by i is really an identity.

>>> Those two local primitive variables cannot be made to have the same
>>> identity. But you can update them, so this is an example of mutability
>>> without the possibility of identity.
>> The identity is temporal:  You use the same variable name at two
>> different times.  Do you intend for the second `i' to mean the same
>> variable as the first `i'?
> 
> Okay, so "i" and "i" have the same identity.

Vacuous but true.

> I suppose you could argue that the language's namespace is an
> address-space, and variable names are addresses.

Correct.

> At this point, though, you've made the concept of identity so broad
> that it is now necessarily a property of all languages that use named
> values, whether updatable or not. I think you would also have to call
> "3" a void pointer by this scheme.

It's not a void pointer, it's a pointer to an integer, but you're 
correct: "3", when written in a program text, is a reference to the 
successor of the successor of the successor of zero.

If you find that silly and would like to stick with equality, then 
you're entirely correct. Natural numbers are entirely immutable by 
definition, and I'm definitely not the first one to observe that 
identity and equality are indistinguishable for immutable objects.

> Clearly there is *some* difference between a language which allows
> explicit pointers and thus aliasing and one that doesn't.

You can have aliasing without pointers; e.g. arrays are fully sufficient.
If i = j, then a [i] and a [j] are aliases of the same object.

(After I observed that, I found it no longer a surprise that array 
optimizations are what Fortran compiler teams sink most time into. The 
aliasing problems are *exactly* the same as those with pointers in C - 
though in practice, the Fortranistas have the advantage that the 
compiler will usually know that a [i] and b [j] cannot be aliases of 
each other, so while they have the same problems, the solutions give the 
Fortranistas more leverage.)

 > What term would you use? First-class variables?

I think it's more a quasi-continuous spectrum.

On one side, we have alias-free toy languages (no arrays, no pointers, 
no call-by-reference, just to name the most common sources of aliasing).

SQL is slightly more dangerous: it does have aliases, but they are 
rarely a problem (mostly because views aren't a very common thing, and 
even if they are used, they aren't usually so abstract that they really 
hide the underlying dependencies).

Next are languages with arrays and call-by-reference. "Pascal without 
pointers" would be a candidate.
Here, aliasing occurs, and it can and does hide behind abstraction 
barriers, but it's not a serious problem unless the software becomes 
*really* large.

The end of the line are languages that use references to mutable data 
everywhere. All OO languages are a typical example of that.

Regards,
Jo


------------------------------

Date: Fri, 14 Jul 2006 13:19:05 +0200
From: Joachim Durchholz <jo@durchholz.org>
Subject: Re: What is a type error?
Message-Id: <e97usd$ipd$1@online.de>

Marshall schrieb:
> Joachim Durchholz wrote:
>> It's just that I know that it's viable to give up destructive updates.
>> Giving up pointers is a far more massive restriction.
> 
> Oddly, I feel the opposite. While it's true there are many domains
> for which purely functional programming is a fine choice, there
> are some domains for which it is insufficient. Any kind of data
> managament, for example, requires that you be able to update
> the information.

Sure, the data itself cannot be managed non-destructively.

However, the programs that operate on the data need not do destructive 
updates internally. That way, I can have pointers inside the programs 
without giving up aliasing.
(Aliasing is really, really, really important. It's the reason why 
mathematicians can handle infinite objects - they copy just the alias, 
i.e. the name or description, instead of the object itself. It's one of 
the things that make abstraction useful.)

>>> Right. To me the response to this clear: give up pointers. Imperative
>>> operations are too useful to give up; indeed they are a requirement
>>> for certain problems.
>> I don't know any.
>> In some cases, you need an additional level of conceptual indirection -
>> instead of *doing* the updates, you write a function that *describes* them.
> 
> But then what do you do with that function?

I pass it to an engine that's imperative.

However, that engine doesn't need to do aliasing anymore.

In other words, I'm separating mutability and aliasing, so that they 
can't combine and explode.

 > Let's say I have an
> employee database. John Smith just got hired on 1/1/2006 with
> a salary of $10,000. I need to record this fact somewhere. How
> do I do that without variables? Current-employees is a variable.
> Even if I have the space to keep all historical data, so I'm not
> deleting anything, I still have to have a variable for the latest
> version of the accumulated data. I can solve this without
> pointers, but I can't solve it without variables.

As I said elsewhere, the record has an identity even though it isn't 
explicit in SQL.
(SQL's data manipulation language is generally not considered half as 
"clean" as the query language; I think the core of the problemsis that 
DML combines mutability and identity.)

> I should like to learn more about these. I have some vague
> perception of the existence of linear logic, but not much
> else. However, I also already have an excellent solution
> to the pointer aliasing problem, so I'm less motivated.

Pointers are just the most obvious form of aliasing. As I said 
elsewhere, there are others: array indexing, by-reference parameters, or 
even WHERE clauses in SQL statements. In fact, anything that selects an 
updatable piece of data from a collection is an alias, and incurs the 
same problems as pointers, though they may come in different disguises.

>> Actually SQL has references - they are called "primary keys", but they
>> are references nevertheless.
> 
> I strongly object; this is quite incorrect. I grant you that from the
> 50,000 foot level they appear identical, but they are not. To
> qualify as a reference, there need to be reference and dereference
> operations on the reference datatype; there is no such operation
> is SQL.
> 
> Would you say the relational algebra has references?

Yes.

> Or, consider the classic prolog ancestor query. Let's say we're
> setting up as follows
> 
> father(bob, joe).
> father(joe, john).
> 
> Is "joe" a reference, here? After all, when we do the ancestor
> query for john, we'll see his father is joe and then use that to
> find joe's father. Keys in SQL are isomorphic to joe in the
> above prolog.

Agreed. They are both references ;-P

>> (Some SQL dialects also offer synthetic
>> "ID" fields that are guaranteed to remain stable over the lifetime of a
>> record.
> 
> Primary keys are updatable; there is nothing special about them.

Right - I was wrong with identifying keys and identities. In fact the 
identity of an SQL record cannot be directly obtained (unless via those 
ID fields).
The records still have identities. It's possible to have two WHERE 
clauses that refer to the same record, and if you update the record 
using one WHERE clause, the record returned by the other WHERE clause 
will have changed, too.

>> With a "repeatable read" isolation level, you actually return to a
>> declarative view of the database: whatever you do with it, you won't see
>> it until you commit the transaction. (As soon as you commit, the
>> declarative peace is over and you better watch out that your data
>> doesn't become inconsistent due to aliasing.)
> 
> Alas, transaction isolation levels are a performance hack.
> I cannot defend them on any logical basis. (Aside: did you mean
> serializable, rather than repeatable read?)

Possibly. There are so many isolation levels that I have to look them up 
whenever I want to get the terminology 100% correct.

>> The only thing that can really be done about it is not adding it
>> artificially into a program. In those cases where aliasing is part of
>> the modelled domain, you really have to carefully inspect all
>> interactions and never, never, never dream about abstracting it away.
> 
> Yes, aliasing introduces a lot of problems. This is one reason
> why closures make me nervous.

Me too.
On the other hand, they are so immensely useful.
Simply don't create closures with mutable data in them. Or, to the very 
least, make sure that no aliases outside the closure exist.

Regards,
Jo


------------------------------

Date: 14 Jul 2006 08:23:52 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1152890632.892109.61930@i42g2000cwa.googlegroups.com>

Joachim Durchholz wrote:
> Marshall schrieb:
> > What about my example of SQL? Mutation, no pointers, no aliasing.
> > Yet: useful.
>
> Sorry, but SQL does have aliasing.

Well. I suppose we do not have an agreed upon definition
of aliasing, so it is hard to evaluate either way. I would
propose using the same one you used for identity:

if there are two variables and modifying one also modifies
the other, then there is aliasing between them.

I avoided mentioning equality to include, for example,
having an array i that is an alias to a subset of array j.
Please feel free to critque this definition, and/or supply
an alternative.


> E.g. if you have records that have name="John", surname="Doe", the
> statements
>    SELECT * FROM persons WHERE name = "John"
> and
>    SELECT * FROM persons WHERE name = "Doe"
> are aliases of each other.
>
> The alias is actually in the WHERE clause.

Not by my definition, because there is only one variable here.


> And this *can* get you into
> trouble if you have something that does
>    UPDATE ... WHERE name = "John"
> and
>    UPDATE ... WHERE surname = "Doe"
> e.g. doing something with the Johns, then updating the names of all
> Does, and finally restoring the Johns (but not recognizing that changing
> the names of all Does might have changed your set of Johns).

The fact that some person might get confused about the
semantics of what they are doing does not indicate aliasing.
It is easy enough to do an analysis of your updates and
understand what will happen; this same analysis is impossible
with two arbitrary pointers, unless one has a whole-program
trace. That strikes me as a significant difference.


> Conceptually, this is just the same as having two different access path
> to the same memory cell. Or accessing the same global variable through a
> call-by-reference parameter and via its global name.

There are similarities, but they are not the same.


> BTW with views, you get not just aliasing but what makes aliasing really
> dangerous. Without views, you can simply survey all the queries that you
> are working with and lexically compare table and field names to see
> whether there's aliasing. With views, the names that you see in a
> lexical scope are not enough to determine aliasing.
> E.g. if you use a view that builds upon the set of Johns but aren't
> aware of that (possibly due to abstraction barriers), and you change the
> name field of all Does, then you're altering the view without a chance
> to locally catch the bug. That's just as bad as with any pointer
> aliasing problem.

It is certainly aliasing, but I would not call it "just as bad." There
are
elaborate declarative constraint mechanisms in place, for example.
And the definition of the view itsef is a declarative, semantic
entity. Whereas a pointer is an opaque, semantics-free address.


Marshall



------------------------------

Date: 14 Jul 2006 08:29:32 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1152890972.511862.231590@m79g2000cwm.googlegroups.com>

Andreas Rossberg wrote:
> Marshall wrote:
> >
> > After all, what are the alternatives? Purely-functional
> > languages remove themselves from a large class of
> > problems that I consider important: data management.
>
> Maybe, but I have yet to see how second-class variables are really more
> adequate in dealing with it.
>
> And note that even with second-class state you can still have aliasing
> issues - you just need mutable arrays and pass around indices. Keys in
> databases are a more general form of the same problem.

So for array a, you would claim that "a[5]" is an alias for
(a part of) "a"? That seems to stretch the idea of aliasing
to me. With these two expressions, it is obvious enough
what is going on; with two arbitrary pointers, it is not.

It seems to me the source of the problem is the opacity
of pointers, but perhaps I am missing something.


> > I have explored the OO path to its bitter end and am
> > convinced it is not the way. So what is left? Uniqueness
> > types and logic programming, I suppose. I enjoy logic
> > programming but it doesn't seem quite right. But notice:
> > no pointers there!  And it doesn't seem to suffer from the
> > lack.
>
> Uh, aliasing all over the place! Actually, I think that logic
> programming, usually based on deep unification, brings by far the worst
> incarnation of aliasing issues to the table.

Hmmm. Can you elaborate just a bit?


Marshall



------------------------------

Date: 14 Jul 2006 08:31:32 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1152891092.888466.79510@35g2000cwc.googlegroups.com>

Joachim Durchholz wrote:
> Marshall schrieb:
> > void foo() {
> >   int i = 0;
> >   int j = 0;
> >   j = 1;
> >   i = 2;
> >   // check value of j here. It is still 1, no matter what you filled
>  >   // in above.
> >   // The assignment to i cannot be made to affect the value of j.
> > }
> >
> > Those two local primitive variables cannot be made to have the same
> > identity.
>
> Sure. To state it more clearly, they cannot be aliases.

Yes.


>  > But you can update them, so this is an example of mutability
> > without the possibility of identity.
>
> You're being a bit sloppy with terminology here. "Identity" in the
> phrase above refers to two entities, in the sense of "i and j cannot be
> identical".
> Identity is actually a property of a single entity, namely that what
> remains constant regardless of what you do with the entity.

It is a fair critque. I'll try to be more precise.


Marshall



------------------------------

Date: 14 Jul 2006 08:45:20 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1152891920.228515.188180@m73g2000cwd.googlegroups.com>

Joachim Durchholz wrote:
>
> You can have aliasing without pointers; e.g. arrays are fully sufficient.
> If i = j, then a [i] and a [j] are aliases of the same object.

I am having a hard time with this very broad definition of aliasing.
Would we also say that a[1+1] and a[2] are aliases? It seems
to me, above, that we have only a, and with only one variable
there can be no aliasing.

A further question:

given a 32 bit integer variable x, and offsets i and j (equal as in
the above example) would you say that

   x &= (1 << i)
and
   x &= (1 << j)

are aliased expressions for setting a particular bit in x?

I am not being facetious; I am trying to understand the limits
of your definition for aliasing.


> (After I observed that, I found it no longer a surprise that array
> optimizations are what Fortran compiler teams sink most time into. The
> aliasing problems are *exactly* the same as those with pointers in C -
> though in practice, the Fortranistas have the advantage that the
> compiler will usually know that a [i] and b [j] cannot be aliases of
> each other, so while they have the same problems, the solutions give the
> Fortranistas more leverage.)

I don't understand this paragraph. On the one hand, it seems you
are saying that C and Fortran are identically burdened with the
troubles caused by aliasing, and then a moment later it seems
you are saying the situation is distinctly better with Fortran.


>  > What term would you use? First-class variables?
>
> I think it's more a quasi-continuous spectrum.
>
> On one side, we have alias-free toy languages (no arrays, no pointers,
> no call-by-reference, just to name the most common sources of aliasing).
>
> SQL is slightly more dangerous: it does have aliases, but they are
> rarely a problem (mostly because views aren't a very common thing, and
> even if they are used, they aren't usually so abstract that they really
> hide the underlying dependencies).
>
> Next are languages with arrays and call-by-reference. "Pascal without
> pointers" would be a candidate.
> Here, aliasing occurs, and it can and does hide behind abstraction
> barriers, but it's not a serious problem unless the software becomes
> *really* large.
>
> The end of the line are languages that use references to mutable data
> everywhere. All OO languages are a typical example of that.

Now with this, it appears you are agreeing that SQL has an advantage
vis-a-vis aliasing compared to OO languages. Yes? If so, we are
agreeing on the part I care about, and the specifics of just what
we call aliasing are not so important to me.


Marshall



------------------------------

Date: 14 Jul 2006 10:17:23 -0700
From: rossberg@ps.uni-sb.de
Subject: Re: What is a type error?
Message-Id: <1152897443.675121.58750@m73g2000cwd.googlegroups.com>

Marshall wrote:
> Andreas Rossberg wrote:
> >
> > And note that even with second-class state you can still have aliasing
> > issues - you just need mutable arrays and pass around indices. Keys in
> > databases are a more general form of the same problem.
>
> So for array a, you would claim that "a[5]" is an alias for
> (a part of) "a"? That seems to stretch the idea of aliasing
> to me.

Not at all, I'd say. In my book, aliasing occurs whenever you have two
different "lvalues" that denote the same mutable cell. I think it would
be rather artificial to restrict the definition to lvalues that happen
to be variables or dereferenced pointers.

So of course, a[i] aliases a[j] when i=j, which in fact is a well-known
problem in some array or matrix routines (e.g. copying array slices
must always consider overlapping slices as special cases).

> > Uh, aliasing all over the place! Actually, I think that logic
> > programming, usually based on deep unification, brings by far the worst
> > incarnation of aliasing issues to the table.
>
> Hmmm. Can you elaborate just a bit?

Logic variables immediately become aliases when you unify them.
Afterwards, after you bind one, you also bind the other - or fail,
because both got already bound the other way round. Unfortunately,
unification also is a deep operation, that is, aliasing can be induced
into variables deeply nested into some terms that happen to get
unified, possibly without you being aware of any of this (the
unification, nor the variable containment). To avoid accidental
aliasing you essentially must keep track of all potential unifications
performed by any given predicate (including transitively, by its
subgoals), which totally runs square to all concerns of modularity and
abstraction.

I've never been much of a logic programmer myself, but our language
group has a dedicated LP and CP background, and people here have
developed very strong opinions about the adequateness of unrestricted
logic variables and particularly unification in practice. I remember
somebody calling Prolog "the worst imperative language ever invented".

- Andreas



------------------------------

Date: Fri, 14 Jul 2006 17:21:43 GMT
From: Darren New <dnew@san.rr.com>
Subject: Re: What is a type error?
Message-Id: <HoQtg.28241$uy3.27425@tornado.socal.rr.com>

Andreas Rossberg wrote:
> OK, this is interesting. I don't know Hermes, is this sort of like a 
> dynamically checked equivalent of linear or uniqueness typing?

I'm not sure what linear or uniqueness typing is. It's typestate, and if 
I remember correctly the papers I read 10 years ago, the folks at 
TJWatson that invented Hermes also invented the concept of typestate. 
They at least claim to have coined the term.

It's essentially a dataflow analysis that allows you to do the same 
sorts of things that "don't read variables that may not yet have been 
assigned to", except that you could annotate that variables change to 
the state of "uninitialized" after they've already been initialized.

> Mh, but if I understand correctly, this seems to require performing a 
> deep copy - which is well-known to be problematic, and particularly 
> breaks all kinds of abstractions.

Um, no, because there are no aliases. There's only one name for any 
given value, so there's no "deep copy" problems. A deep copy and a 
shallow copy are the same thing. And there are types of values you can 
assign but not copy, such as callmessages (which are problematic to copy 
for the same reason a stack frame would be problematic to copy).

I believe, internally, that there were cases where the copy was "deep" 
and cases where it was "shallow", depending on the surrounding code. 
Making a copy of a table and passing it to another process had to be a 
"deep" copy (given that a column could contain another table, for 
example). Making a copy of a table and using it for read-only purposes 
in the same process would likely make a shallow copy of the table. 
Iterating over a table and making changes during the iteration made a 
copy-on-write subtable, then merged it back into the original table when 
it was done the loop, since the high-level semantic definition of 
looping over a table is that you iterate over a copy of the table.

The only thing close to aliases are references to some other process's 
input ports (i.e., multiple client-side sockets connected to a 
server-side socket). If you want to share data (such as a file system or 
program library), you put the data in a table in a process, and you hand 
out client-side connections to the process. Mostly, you'd define such 
connections as accepting a data value (for the file contents) with the 
parameter being undefined upon return from the call, and the file name 
as being read-only, for example. If you wanted to store the file, you 
could just pass a pointer to its data (in the implementation). If you 
wanted a copy of it, you would either copy it and pass the pointer, or 
you'd pass the pointer with a flag indicating it's copy-on-write, or you 
could pass the pointer and have the caller copy it at some point before 
returning, depending on what the caller did with it. The semantics were 
high-level with the intent to allow the compiler lots of leeway in 
implementation, not unlike SQL.

> Not to mention the issue with 
> uninitialized variables that I would expect occuring all over the place. 

The typestate tracks this, and prevents you from using uninitialized 
variables. If you do a read (say, from a socket) and it throws an "end 
of data" exception, the compiler prevents you from using the buffer you 
just tried but failed to read.

Indeed, that's the very point of it all. By doing this, you can run 
"untrusted" code in the same address space as trusted code, and be 
assured that the compiler will prevent the untrusted code from messing 
up the trusted code. The predecessor of Hermes (NIL) was designed to let 
IBM's customers write efficient networking code and emulations and such 
that ran in IBM's routers, without the need for expensive (in 
performance or money) hardware yet with the safety that they couldn't 
screw up IBM's code and hence cause customer service problems.

> So unless I'm misunderstanding something, this feels like trading one 
> evil for an even greater one.

In truth, it was pretty annoying. But more because you wound up having 
to write extensive declarations and compile the declarations before 
compiling the code that implements them and such. That you didn't get to 
use uninitialized variables was a relatively minor thing, especially 
given that many languages nowadays complain about uninitialized 
variables, dead code, etc. But for lots of types of programs, it let you 
do all kinds of things with a good assurance that they'd work safely and 
efficiently. It was really a language for writing operating systems in, 
when you get right down to it.

-- 
   Darren New / San Diego, CA, USA (PST)
     This octopus isn't tasty. Too many
     tentacles, not enough chops.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9471
***************************************

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[28107] in Perl-Users-Digest

Perl-Users Digest, Issue: 9471 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)Fri Jul 14 14:10:16 2006

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Fri Jul 14 14:10:16 2006