Re: a union is always a join!



philip wrote:

Reinier,

There are fundamentals of Date& Darwen's "algebra" A that you don't
understand.

I understood the definitions (thanks for repeating them though)
but I did make a major mistake regarding OR.

Definitions:
AND is JOIN.
NOT R has exactly the tuples typed by R that are not in R.

Note that 'the tuples typed by R' is a ambiguous.

I was going by Darwen's appendix "A New Relational Algebra",
which doesn't mention types. Therefore I was tempted to read
'the tuples typed by R' as the join of the single-attribute
projections of R. Let's call this operation RELDOM(R),
and define RELNOT(R) = RELDOM(R) MINUS R. RELDOM can
(obviously) be expressed in Codd's algebra, albeit not
with a fixed expression.

However, what you actually mean by 'the tuples typed by R' is
the join of the single-attribute relations with all possible
values for types of the attribute. This is also an operation,
let's call it DOM. Then we can define NOT(R) = DOM(R) MINUS R.
This 'absolute' NOT generally introduces domain values
that weren't in R, and when the domains are big or infinite,
produces very big / infinite relations. It's this NOT that
I have problems with.

R OR Q has exactly those tuples typed by R JOIN Q which
give a tuple from R when projected on R's attributes or give a tuple
from Q when projected on Q's attributes.

Yes; and we can again introduce a relative variant RELOR
by using RELDOM(R JOIN Q) instead of DOM(R JOIN Q).

Darwen's AND, OR and NOT are more general operations than
those of the relational algebra, but they are not any more precise.

Agree.

And the price he pays for allowing them is worse than computational:
the tuples of the relations we create with them can no longer all
be regarded as expressing positive facts.

Not so.
Queries are interpreted exactly the same as always.
Tuples that are present are true and those that are not present are
false.

Yes, but what I meant to say is that in general, the tuples
don't really express facts regarding those domain values,
they just help express information about other domain values.

But I wasn't thinking straight when I wrote that (as paul c
suggested): this is true for *any* operation with negation,
e.g. RELOR or MINUS. It is only more conspicuous in NOT and OR,
because they introduce domain values not in the original relation(s).

We *lose* preciseness.

Not so.

OK.

Darwen's algebra is[...]
*not* implementation oriented, because NOT and OR cannot actually
be implemented as they are (try NOT-ing the table {A: 1, B: 1, C: 1},
where A, B, C can take arbitrary floating point values).

Of course they can be implemented.
It's true that they denote large relations, but that's not necessarily
a problem.

It complicates the implementation: it is no longer possible to represent
all relations as finite collections of tuples.

First, you could allow only queries in NOT and OR that can be recast
using MINUS and UNION.
Second, you could allow non-query expressions but not queries in NOT
and OR.

Yes, that was my suggestion: rewrite to Codd's algebra
and reject the rest. Another idea is, of course, to use
RELNOT and RELOR instead.

Third, the cost of a query in NOT or OR is not necessarily
unacceptable.
It depends on the size of the attribute types rather than the number
of tuples in the relation,

Definitely. For Booleans or enums it's acceptable
to just complement literally.

but you could allow them and and still compute them (eg if the result
is small enough) (or in a lazy evaluation context).

Lazy evaluation doesn't help for NOT. I think you want to be smarter
anyway. But I notice there are several implementations already and I
haven't got a clue what they do.

The benefit is that NOT and OR can give clearer queries even if you
limit yourself to recastable queries
(they roughly mean "not" and "or" in the sense that
roughly JOIN means "and", MINUS means (limited) "and not" and UNION
means (limited) "or").

Perhaps. I always liked how Codd's algebra separates 'horizontal'
operations (PROJECT, JOIN) from 'vertical' ones (MINUS, UNION,
INTERSECTION, quotient, selection). When I look at OR it doesn't
look like an operation I would use. But this may just be a matter
of getting used to it.

I would
venture that the first thing any implementer will attempt to do
is rewrite all of Darwen's expressions back to relational algebra as
far as possible, and throw errors on the remainder.

You use "venture", "attempt" and "as far as possible" but the
semantics are clear.

True.

[...]

You can either think of PLUS as a named relation value or as a
variable that doesn't happen to change.
Otherwise everything is as usual.

Mathematically, yes. But the implementation has to treat PLUS,
and mathematical operations in general, differently from the
finite, extensionally defined, time-varying relations that
constitute the actual information stored in a relational database.

[...]

In your informal terms, PLUS is information about the world; that info
just doesn't happen to change.

No, it is not. The implementation can, and had better, take advantage
of the invariability of PLUS, and the typical computer hardware's
buil-in support for it. You really really don't want to implement it
as table lookup. Even for e.g. Boolean operators you probably want to
optimize differently than you they will be optimized when implemented
as table lookup.

Nice as an academic exercise (look, ma, I removed a concept)
but not as an academic result (no useful purpose).

It is practical and not an exercise to simplify and generalize.
The basic consequence is that the difference between operators and
relations becomes only syntactic.

To the mathematician; not to the computer scientist.

You can use a relation wherever you can use an operator and vice versa.
This can be extended to include user-defined operations for which, again,
the complexity for recastable queries is the same as the recast query,
and you need a policy for non-recastable queries (and further policies
for side-effects).
If the D&D presentation were more formal and complete this would be
more evident.

It is evident already, but I think this policy stuff is hard.

BTW, using NOT and OR is equivalent to using MINUS and UNION along
with a named single-attribute relation value
for each type each containing all that type's values.

Yes (DOM above).

Such a database has the same operation complexity as usual but some of
the leaves (the type-relations) are big.
So the cost of evaluating corresponding queries is the same.
Again, it is notationally more elegant and exactly as computationally
expensive
to allow the latter with the same recast-restrictions you would apply
to the former.

As I already said in my previous message, I see the elegance,
but at the same time I find elegance in Codd's use of two kinds
of operations. Obviously if you rewrite to Codd's algebra (or do
something equivalent to that) you're not going to lose efficiency
on the cases where this is possible. But just rejecting the rest
is only going to confuse the user. Why not be honest and admit
that we really expect, in our implementations, relational queries
to work on finite, typically small relations, that WORKS_IN is one
and PLUS is not, and that NOT doesn't do so?

philip

--
Reinier
.