A different definition of MINUS, Part 1
- From: paul c <toledobythesea@xxxxxxxx>
- Date: Mon, 15 Dec 2008 16:35:16 -0800
I'm still on the same view updating theme but I thought I'd make a new
subject so as not to sully D&D's stuff with some of my asides under the
other thread. I'm usually guilty, on purpose, of aside-overload because
some of the smarter people here often clarify my misunderstandings,
which is valuable to me. But that thread is now full of stuff that I
don't think is relevant to view updating. Secondly, when I said the
Tutorial D MINUS definition might not be "good enough", what I really
meant to say is that it doesn't go far enough for the usual dbms where a
table or relvar has a fixed header. Obviously D&D already knew this
because they mention "same heading" in their definition.
As interesting as theory is to me, I'm not much of a theorist (I want to
know just enough theory to understand a dbms's results, or maybe just enough more to see a better way to make a dbms) and rather than me write about this, it would be better if the big guns either solved it or proved it can't be solved, but I guess they have other things on their plates. But I'm impatient because I'm with McGoveran that this issue is crucial to the RM, otherwise we're effectively saying that some results are logical and some are arbitrary.
This is a pretty long post and I wish I were clever enough to make it
shorter. For ease of reference for anybody who cares to wade through
it, I'll divide it into parts one and two, even if that seems a little pompous.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
PART ONE
Here's the D&D TTM definition:
(quote - page 92, TTM 2nd edition)
Difference: The Tutorial D <minus>
R1 MINUS R2
(where R1 and R2 have the same heading) is semantically equivalent to
the A expression
R1 <AND> ( <NOT> R2)
(end quote)
In the original dbdebunk exchange, McGoveran said: "...Thus, we must
solve the general case of update semantics for which updating "base"
relations is the special case (and not the reverse) ...".
I'm not expert in formal predicates and I find seeing a way to an
initial implementation easier with an algebra to follow, so I tried to
think about McGoveran's advice in a purely algebraic way. Another thing about the A-algebra specifically: since I first saw it, it has always seemed remarkable to me that an implementation such as Tutorial D might have this view updating problem at the same time as there is no such problem within the algebra, eg., a dbms that implemented only the algebra might mean a db that grows like topsy storing all its results and it might not be especially convenient for users to interpret, but still how could the implementation have the problem and the algebra not have it?
Anyway, here's what I imagine I mean by 'purely algebraic':
1) The A-algebra says nothing about constraints other than the ones in
the formal definition that have to do with relation structure (such as
two attributes with the same name in the same relation not having the
same type), where other constrains are concerned, it is merely the
vehicle for applying the ones that users desire. So, for the purpose of
determining results before user constraints are applied (eg.,
'<AND>ed'), an algebraic approach can ignore the key and foreign
constraints mentioned in the exchange.
2) Date suggests the Assignment Principle is essential, but I would say
it is essential only if we use language that has an assignment
operator. The S relation that underlies the SSP relvar mentions
supplier tuples that either don't participate in the view or that
aren't suggested by the WHERE clause. I interpret this to mean that if
an S-tuple has triples that aren't all in an SSP tuple, that S-tuple
must remain in the result, ie., those tuples are outside of the 'scope'
that McGoveran mentions. Below, I try to find an algebraic way to
satisfy this. I think the Assignment Principle doesn't mean that an
algebra needs an assignment operator, rather it is really a definition
in terms of an algebra as to what an assignment operator, if a language
has one, means.
3) As far as the 'Principle of Interchangeability' (users without
definition privileges would not be able to tell base relations from
views) is concerned, I don't yet see how the A-algebra by itself can
enforce it, so it is up for grabs. Closure seems far more important,
just as it is in Boolean Algebra. From an algebraic point of view, this
also means that there can't be any exceptions except for ones that the
formal definition of the algebra mentions. Users might prefer to be
warned when certain kinds of results are implied by the algebra, but
that is a choice for the language or dbms designer, not something the
algebra must cater to. So, for starters, I'd remove that "same heading"
proviso in the TD MINUS definition.
4) A semantic definition seems important, akin to the one for TD MINUS,
ie., one that avoids "implementation rules" and expresses a possible
naive implementation directly in terms of the algebra.
5) The distributive and associative properties of the algebra are also
important for implementation as they allow a dbms to optimize in various
situations.
6) Projection must be involved (given the A-algebra) if the usual
short-hands for eg., WHERE clauses that people find familiar and if the
usual fixed headings of most db's are to be supported. Whereas, the
A-algebra doesn't care about shorthands nor about fixed headings.
First, let me examine a simpler version of TD's MINUS that removes the
"same heading" proviso:
R MINUS D is semantically equivalent to R <AND> <NOT> D.
Whatever the equivalent to an implemented language's WHERE clause this
definition might be, users will surely want the leverage, such as SQL's
WHERE clause has, to remove multiple tuples by specifying a proper
subset of all of a relation's attributes. This means that D needn't
specify all R attributes. In the exchange, Fabian Pascal implied that
the example WHERE clause is not "sufficiently expressive". I'm not
arguing that the clause is sufficient for all purposes, just that it is
enough to apply the A-algebra. My approach then is to take the clause
as given and decide what it can definitely mean from the information
available to an algebra. With respect to the MINUS definition, the
WHERE clause refers only to a tuple in D (and not a tuple in R, since R
may not have the same attributes as D, as is the case for the SSP
relvar). Date goes beyond SQL syntax because not only the attribute
names are specified, but also the types, as in S# = S#('S1'), so as far
as types are concerned, this example doesn't need to refer to a
'catalog' . As far as the algebraic term D is concerned, all we are
given are attribute names, types and values and the D&D 'conjoin',
'<AND>' and enough is specified to form triples, tuples and a partial
predicate. No relvar or relation name is given, but we can still
proceed because the A-algebra allows us to form relations from tuples.
Still, I think we can't deal with the definitional problem unless we
think about tuples and their triples. As far as the algebraic equation
is concerned, the significance of the D-tuple is that it is in D and not
specifically stated to be in R.
To avoid maddening levels of parentheses in what follows, let me give
highest precedence to <NOT> , next-highest to <AND> and <OR>, next
highest to '=' and also assume that associativity is left-to-right.
For my purposes, it seems to me easier and clearer to resort to an
algebraic equation. I'm trying to omit extraneous notions, perhaps this
is the same as what Codd called 'essentiality', I'm not sure if this is
what he had in mind but I am coming to suspect that the usual arguments
against such 'deletes' stem from mingling physical and logical notions,
eg., that a base must be physical. For one thing, an equation avoids
introducing any notion of assignment to the algebra and also avoids
relvars, so there's no notion of 'before' and 'after', making it easier,
at least for me, to talk about results, ie.,
R' = R MINUS D
= R{HR} <AND> <NOT> D{HD}
where R' is equivalent to the relation that the MINUS operation produces
and HR, HD are the headings of R and D. The projections R{HR} and D{HD}
are equal to R and D respectively. Though they may seem redundant they
are necessary if I am to see this algebraic equation in the context of
the typical dbms that assumes fixed relvar or relation headings.
In this post, I'm concerned only with 'delete through join', not yet
'insert through union' or 'insert through projection'.
If R{HR} is a join, say equal to 'A <AND> B', Heath's Theorem can be
used to re-write 'R{HR}', provided the re-writing involves overlapping
headers, where HSR is some subset of R's header, ie.:
R{HR} = R{HSR} <AND> R{HR}
(This is because HSR -> HSR is a trivial dependency. Maybe somebody
will point out that I don't need Heath's theorem to re-write, it is just
a way that comes to mind that I believe can be shown with only the
assumptions in the A-algebra definitions, not any notions beyond that. )
By definition of <AND>, we know that there is an HSR that equals HA,
where HA is A's heading, and there is no non-equal HSR we care about for
this example, so we can rewrite the R' equation as:
R' = R{HA} <AND> R{HR} <AND> <NOT> D
If we introduce HB as the heading of B, we can also write:
R' = R{HB} <AND> R{HR} <AND> <NOT> D
I think this points the way to a result that avoids mingling base and
virtual concepts, because the R' equation that mentions the HA heading
doesn't mention the HB heading and vice-versa. If we apply this
interpretation to the longer equation, we can ask "what value for R{HA}
always makes R' true?", similarly for R{HB}, ie., if A' and B' are
partial resulting base relations, we don't need to talk about both of A'
and B' in the same breath, we can evaluate them separately. Further, we
can ignore any question of base or virtual when evaluating them (again,
because the A-algebra doesn't admit assignment).
Let me apply Heath to R{HR} and D, such that R{HR} = R{HR} <AND> D{HD},
where HD is the heading of D, eg.,
R' = R{HA} <AND> R{HR} <AND> <NOT> D
= R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD}
Further, let me apply Heath to the right-most D{HD} term where HD = {S#,
P#}, eg.,
D{HD} = D{HD} & D{S#}, so
R' = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD} <AND> D{S#})
Since the heading of R' is HR, if HR includes S#, we could also write
R'{S#} <AND> R'{HR} = R{HA} <AND> R{HR} <AND> D{HD} <AND> <NOT> (D{HD}
<AND> D{S#})
and removing some of the redundant projections,
R'{S#} <AND> R'{HR} = R{HA} <AND> R <AND> D <AND> <NOT> (D <AND> D{S#})
Let "value of S# = 'Sx'" stand for the value 'Sx' in some triple that
corresponds with a heading of {S# S#}. (In this case, 'S#' is not only
a heading, but an attribute name.) Let me now ask whether whenever
there is an 'Sx' in D{S#} must there not be an 'Sx' in R'{S#}? If 'Sx'
is in D{S#}, it must be in D. If it is in R, it must also be in R{HA}.
But it will not be in the right most term (the complement), so it will
not be in the result and it will not be in R'{S#}.
So substituting 'S1' for 'Sx', this definition of MINUS doesn't have the
S# value 'S1' in any of its resulting triples and if followed, the
definition would require that we always remove the S# value 'S1' from
the result. Similarly for the P# value 'P1'. So if the TD MINUS
definition is followed, we must always remove tuples with such triples
in the result. (I think we could say similar if 'S#' were not an
attribute but any subset of HR.) With a language assignment, the
analogy would be 'delete from both sides'.
With the Principle of Interchangeability, R'{S#} could be base or
virtual but in either case, the analogy would remain, so the POI is not
violated.
Now, I want to ask another question. Given the values for S and SP in
Date's example, does this equation have two solutions? (ie., two
solutions as far as the value 'S1' is concerned.) No, it doesn't, if
'S1' is in R'{S#}, it is in R'{HR} and vice versa. The analogy for a
language assignment is that there is only one solution to the 'update'
of S. From an algebraic standpoint, we don't need to consider the
removal of S-tuples as having anything to do with the removal of
SP-tuples, so why should a language that implements assignment introduce
such a consideration? The language's dbms can handle 'updates' to S in
isolation from 'updates' to SP.
As far as the value 'S1' is concerned, I think the Assignment Principle
isn't violated because if 'v' doesn't include the 'S1' value, the
resulting V doesn't include that value. But the WHERE clause makes no
mention of a tuple (<S# S# 'S2'>,<P# P# 'P1'>) and if there were such a
tuple in R{HD}, the P# value 'P1' would not be in R'{HR}, which would
violate the Assignment Principle. This is why I think the TD MINUS
definition must be inaccurate. Plus, from a practical point of view, it results in an SSP result that seems very mangled, given the original 'delete' statement. In Part Two, I'll try to explain a different
definition.
-------------------------------------------------------------------------------------------------------------------------------------------------------
.
- Follow-Ups:
- Re: A different definition of MINUS, Part 1
- From: Brian Selzer
- Re: A different definition of MINUS, Part 1
- Prev by Date: Re: Date and McGoveran comments on view updating 'problem'
- Next by Date: Re: simple db - how am I doing so far?
- Previous by thread: simple db - how am I doing so far?
- Next by thread: Re: A different definition of MINUS, Part 1
- Index(es):
Relevant Pages
|