[28115] in Perl-Users-Digest


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
Perl-Users Digest, Issue: 9479 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jul 16 06:10:14 2006

Date: Sun, 16 Jul 2006 03:10:06 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Sun, 16 Jul 2006     Volume: 10 Number: 9479

Today's topics:
    Re: What is a type error? <marshall.spight@gmail.com>
    Re: What is a type error? <jo@durchholz.org>
    Re: Yet another flock question... <socyl@987jk.com.invalid>
        Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: 15 Jul 2006 19:36:16 -0700
From: "Marshall" <marshall.spight@gmail.com>
Subject: Re: What is a type error?
Message-Id: <1153017376.379831.306420@35g2000cwc.googlegroups.com>

Joachim Durchholz wrote:
> Marshall schrieb:
> > Joachim Durchholz wrote:
> >> As I said elsewhere, the record has an identity even though it isn't
> >> explicit in SQL.
> >
> > Hmmmm. What can this mean?
> >
> > In general, I feel that "records" are not the right conceptual
> > level to think about.
>
> They are, when it comes to aliasing of mutable data. I think it's
> justified by the fact that aliased mutable data has a galling tendency
> to break abstraction barriers. (More on this on request.)

I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.


> > In any event, I am not sure what you mean by non-explicit
> > identity.
>
> The identity isn't visible from inside SQL. (Unless there's an OID
> facility available, which *is* an explicit identity.)

Agreed about OIDs.


>  > I would say, records in SQL have value, and their
> > identity is exactly their value.
>
> Definitely not. You can have two equal records and update just one of
> them, yielding non-equal records; by my definition (and by intuition),
> this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]


>  > I do not see that they have
> > any identity outside of their value. We can uniquely identify
> > any particular record via a key, but a table may have more
> > than one key, and an update may change the values of one
> > key but not another. So it is not possible in general to
> > definitely and uniquely assign a mapping from each record
> > of a table after an update to each record of the table before
> > the update, and if you can't do that, then where
> > is the record identity?
>
> Such a mapping is indeed possible. Simply extend the table with a new
> column, number the columns consecutively, and identify the records via
> that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.


> But even if you don't do that, there's still identity. It is irrelevant
> whether the programs can directly read the value of the identity field;
> the adverse effects happen because updates are in-place. (If every
> update generated a new record, then we'd indeed have no identity.)
>
> > Okay. At this point, though, the term aliasing has become extremely
> > general. I believe "i+1+1" is an alias for "i+2" under this definition.
>
> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
> alias, it would have to be a reference to something.
>
> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
> important - 1+1 is replacable by 2 in every context, so this is
> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
> vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)


> There's another aspect here. If two expressions are always aliases to
> the same mutable, that's usually easy to determine; this kind of
> aliasing is usually not much of a problem.
> What's more of a problem are those cases where there's occasional
> aliasing. I.e. a[i] and a[j] may or may not be aliases of each other,
> depending on the current value of i and j, and *that* is a problem
> because the number of code paths to be tested doubles. It's even more of
> a problem because testing with random data will usually not uncover the
> case where the aliasing actually happens; you have to go around and
> construct test cases specifically for the code paths that have aliasing.
> Given that references may cross abstraction barriers (actually that's
> often the purpose of constructing a reference in the first place), this
> means you have to look for your test cases at multiple levels of
> software abstraction, and *that* is really, really bad.
>
> > That is so general that I am concerned it has lost its ability to
> > identify problems specific to pointers.
>
> If the reference to "pointers" above means "references", then I don't
> know about any pointer problems that cannot be reproduced, in one form
> or the other, in any of the other aliasing mechanisms.

It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.) I would assume the same is posible with, say, SML
refs.

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

For example, we might ask in C, if we update a[i], will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables. But even so in Fortran, if we ask
if we update a[i], will a[j] change, and we know we
can't tell unless we know the values of i and j.

In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.


> > Again, by generalizing the term this far, I am concerned with a
> > loss of precision. If "joe" in the prolog is a references, then
> > "reference" is just a term for "data" that is being used in a
> > certain way. The conection with a specfic address space
> > has been lost in favor of the full domain of the datatype.
>
> Aliasing is indeed a more general idea that goes beyond address spaces.
>
> However, identity and aliasing can be defined in fully abstract terms,
> so I welcome this opportunity to get rid of a too-concrete model.
>
> >> The records still have identities. It's possible to have two WHERE
> >> clauses that refer to the same record, and if you update the record
> >> using one WHERE clause, the record returned by the other WHERE clause
> >> will have changed, too.
> >
> > Is this any different from saying that an expression that includes
> > a variable will produce a different value if the variable changes?
>
> Yes.
> Note that the WHERE clause properly includes array indexing (just set up
> a table that has continuous numeric primary key, and a single other column).

I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.



> I.e. I'm not talking about how a[i] is an alias of a[i+1] after updating
> i, I'm talking about how a[i] may be an alias of a[j].
>
> > It seems odd to me to suggest that "i+1" has identity.
>
> It doesn't (unless it's passed around as a closure, but that's
> irrelevant to this discussion).
> "i" does have identity. "a[i]" does have identity. "a[i+1]" does have
> identity.
> Let me say that for purposes of this discussion, if it can be assigned
> to (or otherwise mutated), it has identity. (We *can* assign identity to
> immutable things, but it's equivalent to equality and not interesting
> for this discussion.)
>
>  > I can see
> > that i has identity, but I would say that "i+1" has only value.
>
> Agreed.

Cool.


> > But perhaps the ultimate upshoot of this thread is that my use
> > of terminology is nonstandard.
>
> It's somewhat restricted, but not really nonstandard.

Whew!


> >> Possibly. There are so many isolation levels that I have to look them up
> >> whenever I want to get the terminology 100% correct.
> >
> > Hmmm. Is it that there are so many, or that they are simply not
> > part of our daily concern?
>
> I guess it's the latter. IIRC there are four or five isolation levels.

Yes, four.


>  > It seems to me there are more different
> > styles of parameter passing than there are isolation levels, but
> > I don't usually see (competent) people (such as yourself) getting
> > call-by-value confused with call-by-reference.
>
> Indeed.
> Though the number of parameter passing mechanisms isn't that large
> anyway. Off the top of my head, I could recite just three (by value, by
> reference, by name aka lazy), the rest are just combinations with other
> concepts (in/out/through, most notably) or a mapping to implementation
> details (by reference vs. "pointer by value" in C++, for example).

Fair enough. I could only remember three, but I was sure someone
else could name more. :-)


Marshall



------------------------------

Date: Sun, 16 Jul 2006 11:00:58 +0200
From: Joachim Durchholz <jo@durchholz.org>
Subject: Re: What is a type error?
Message-Id: <e9cvhg$lrv$1@online.de>

Marshall schrieb:
> Joachim Durchholz wrote:
>> Marshall schrieb:
>>> I would say, records in SQL have value, and their
>>> identity is exactly their value.
 >>
>> Definitely not. You can have two equal records and update just one of
>> them, yielding non-equal records; by my definition (and by intuition),
>> this means that the records were equal but not identical.
> 
> This sort of thing comes up when one has made the mistake of
> not defining any keys on one's table and needs to rectify the
> situation. It is in fact considered quite a challenge to do.
> My preferred technique for such a situation is to create
> a new table with the same columns and SELECT DISTINCT ...
> INTO ... then recreate the original table. So that way doesn't
> fit your model.

I was mentioning multiple equal records just as an example where the 
records have an identity that's independent of the record values.

> How else might you do it? I suppose there are nonstandard
> extensions, such as UPDATE ... LIMIT. And in fact there
> might be some yucky thing that made it in to the standard
> that will work.

Right, it might be difficult to update multiple records that are exactly 
the same. It's not what SQL was intended for, and difficult to do anyway 
- I was thinking of LIMIT (I wasn't aware that it's nonstandard), and I 
agree that there may be other ways to do it.

However, I wouldn't overvalue that case. The Jane/John Doe example 
posted elsewhere in this thread is strictly within the confines of what 
SQL was built for, yet there is aliasing.

> But what you descrbe is certainly *not* possible in the
> relational algebra; alas that SQL doesn't hew closer
> to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.

 > Would you also agree that
> if a language *did* adhere to relation semantics,
> then relation elements would *not* have identity?
> (Under such a language, one could not have two
> equal records, for example.)

Any identity that one could define within relational semantics would be 
just equality, so no, it wouldn't be very helpful (unless as part of a 
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even 
observable) only when there's updates, and relational algebra doesn't 
have updates.

>>  > I do not see that they have
>>> any identity outside of their value. We can uniquely identify
>>> any particular record via a key, but a table may have more
>>> than one key, and an update may change the values of one
>>> key but not another. So it is not possible in general to
>>> definitely and uniquely assign a mapping from each record
>>> of a table after an update to each record of the table before
>>> the update, and if you can't do that, then where
>>> is the record identity?
>> Such a mapping is indeed possible. Simply extend the table with a new
>> column, number the columns consecutively, and identify the records via
>> that column.
> 
> That doesn't work for me. It is one thing to say that for all
> tables T, for all elements e in T, e has identity. It is a different
> thing to say that for all tables T, there exists a table T' such
> that for all elements e in T', e has identity.

Let me continue that argument:
Records in T' have identity.
 From an SQL point-of-view, there's no difference between the records in 
T' and records in other tables, so they must share all properties, in 
particular that of having an identity.

(I agree that's not as convincing as seeing aliasing in action. See below.)

>> But even if you don't do that, there's still identity. It is irrelevant
>> whether the programs can directly read the value of the identity field;
>> the adverse effects happen because updates are in-place. (If every
>> update generated a new record, then we'd indeed have no identity.)
>>
>>> Okay. At this point, though, the term aliasing has become extremely
>>> general. I believe "i+1+1" is an alias for "i+2" under this definition.
>> No, "i+1+1" isn't an alias in itself. It's an expression - to be an
>> alias, it would have to be a reference to something.
>>
>> However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
>> important - 1+1 is replacable by 2 in every context, so this is
>> essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
>> vacuously true.
> 
> To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the 
"selector" part of a reference into a table.

> A query is simply an expression that makes mention of a variable.
> However I would have to admit that if SQL table elements have
> identity, it would be more like the array example.

OK.

> (The model of SQL I use in my head is one with set semantics,
> and I am careful to avoid running in to bag semantics by, for
> example, always defining at least one key, and using SELECT
> DISTINCT where necessary. But it is true that the strict set
> semantics in my head are not the wobbly bag semantics in
> the standard.)

Set-vs.-bag doesn't affect SQL's aliasing in general, it was a specific 
property of the example that I chose. So you don't have to worry wether 
that might be the cause of the misunderstanding.

>>> That is so general that I am concerned it has lost its ability to
>>> identify problems specific to pointers.
 >>
>> If the reference to "pointers" above means "references", then I don't
>> know about any pointer problems that cannot be reproduced, in one form
>> or the other, in any of the other aliasing mechanisms.
> 
> It seems we agree that, for example, raw C pointers can easily
> cause aliasing problems. To me, Java references, which are much
> more tightly controlled, are still subject to aliasing problems,
> although
> in practice the situation is significantly less severe. (It is possible
> to have multiple references to a single Java mutable object and
> not know it.)

Exactly.

 > I would assume the same is posible with, say, SML refs.

So do I.

> However the situation in SQL strikes me as being different.
> If one is speaking of base tables, and one has two queries,
> one has everything one needs to know simply looking at
> the queries. If views are involved, one must further examine
> the definition of the view.

Sure. Usually, SQL itself is enough abstract that no additional 
abstraction happens; so even if you have aliasing, you usually know 
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.

The picture changes as soon as the SQL statements get hidden behind 
layers of abstraction, say result sets and prepared query handles.

 ... Um... let me also restate the preconditions for aliasing become a 
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it, 
and don't have a problem.

I think the roadblock you're hitting is that you keep looking for 
nonlocal trouble to identify spots where SQL might have aliasing; since 
SQL isn't used with abstraction often, you come up empty and conclude 
there's no aliasing.
My position is that not all aliasing is evil, and that there's a 
technical definition: two program entities (l-expressions in C parlance) 
are aliases if changing the referenced data through one also changes the 
other.

> For example, we might ask in C, if we update a[i], will
> the value of b[j] change? And we can't say, because
> we don't know if there is aliasing. But in Fortran, we
> can say that it won't, because they are definitely
> different variables.

Even that may be the case.
There's an EQUIVALENCE statement that can explicitly alias array sections.
Also, I suspect that newer versions of Fortran have reference parameters.

 > But even so in Fortran, if we ask
> if we update a[i], will a[j] change, and we know we
> can't tell unless we know the values of i and j.

In fact, that's the usual form of aliasing in Fortran.

> In SQL, if we update table T, will a SELECT on table
> S change? We know it won't; they are different variables.
> But if we update T, will a select on T change? It
> certainly might, but I wouldn't call it aliasing, because
> the two variables involved are the same variable, T.

I fail to see how this is different from the aliasing in a[i] and a[j].

It's just the same kind of aliasing that I have with the statements
   SELECT * FROM T WHERE key = $i
and
   SELECT * FROM T WHERE key = $j
(where $i and $j are placeholders for variables from program variables).

After running
   UPDATE T SET f = 5 WHERE key = $i
not only the first SELECT may give different results, the second may, 
too (in those cases where $i and $j happen to have the same value).


I can also construct pointer-like aliasing in SQL. If I have
   SELECT * FROM employees WHERE country = "US"
and
   SELECT * FROM employees WHERE name = "Doe"
then somebody with a grudge against all US employees running
   UPDATE employees SET salary = 0 WHERE country = "US"
will (on purpose or not) also modify the results of the second query.

Not a problem while all three queries are in plain view, but imagine 
them wrapped in some opaque object or behind a module interface.

> I am still not convinced. It is not sufficient to show how sometimes
> you can establish identity, or how you could establish identity based
> on some related table; it must be shown how the identity can be
> established in general, and I don't think it can. If it can't, then
> I claim we are back to expressions that include a variable, and
> not aliasing, even by the wider definition.

See above.

I admit that identity cannot be reliably established in SQL. The 
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Translated to real life, "my best friend" and "my employer" may be 
referring to the same identity, even if these descriptions may change in 
the future.

>>  > It seems to me there are more different
>>> styles of parameter passing than there are isolation levels, but
>>> I don't usually see (competent) people (such as yourself) getting
>>> call-by-value confused with call-by-reference.
>> Indeed.
>> Though the number of parameter passing mechanisms isn't that large
>> anyway. Off the top of my head, I could recite just three (by value, by
>> reference, by name aka lazy), the rest are just combinations with other
>> concepts (in/out/through, most notably) or a mapping to implementation
>> details (by reference vs. "pointer by value" in C++, for example).
> 
> Fair enough. I could only remember three, but I was sure someone
> else could name more. :-)

I got a private mail naming more parameter passing mechanisms. It's a 
bit difficult define what's a parameter passing mechanism and what's 
just an attribute for such a mechanism. If you count all combinations as 
an extra mechanism, you can easily get dozens.

Regards,
Jo


------------------------------

Date: Sun, 16 Jul 2006 01:13:40 +0000 (UTC)
From: kj <socyl@987jk.com.invalid>
Subject: Re: Yet another flock question...
Message-Id: <e9c3s4$23k$1@reader2.panix.com>


OK, I figured it out (in fact I've seen this before elsewhere, but
I'd forgotten): one locks a sentinel file (*not* the log file) that
one expects won't be renamed or deleted.  Once we have the lock,
we can test whether the handle to the log remains valid, and make
adjustments if not.

kj

-- 
NOTE: In my address everything before the first period is backwards;
and the last period, and everything after it, should be discarded.


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

#The Perl-Users Digest is a retransmission of the USENET newsgroup
#comp.lang.perl.misc.  For subscription or unsubscription requests, send
#the single line:
#
#	subscribe perl-users
#or:
#	unsubscribe perl-users
#
#to almanac@ruby.oce.orst.edu.  

NOTE: due to the current flood of worm email banging on ruby, the smtp
server on ruby has been shut off until further notice. 

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

#To request back copies (available for a week or so), send your request
#to almanac@ruby.oce.orst.edu with the command "send perl-users x.y",
#where x is the volume number and y is the issue number.

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V10 Issue 9479
***************************************

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[28115] in Perl-Users-Digest

Perl-Users Digest, Issue: 9479 Volume: 10

daemon@ATHENA.MIT.EDU (Perl-Users Digest)Sun Jul 16 06:10:14 2006

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Sun Jul 16 06:10:14 2006