Re: Principle of Orthogonal Design




mAsterdam schreef:
This is all way over my head. Jan, you may be used to
sailing these waters; I am not. I think it is interesting,
but very time-consuming.
Am I talking rubbish? I try not to, but I don't know - nobody
else chimes in. You keep replying, but hey, you are a nice guy :-)

I'll make some bold statements here anyway, but I can't keep this up.

Your non-bold statements are also be highly appreciated. :-)

Too keep things manageable I'll try to trim the discussion heavily and
keep what I think is really essential. So if I snip any of your
comments it is not because I don't think they are interesting, but
just that at the moment I think they are not at the core of the issue.

Jan Hidders wrote:
mAsterdam wrote:
Jan Hidders wrote:
... I wanted to start simple ...
Even though this suggests otherwise, I'll risk assuming
that you understood the coffee analogy, and snipped it
for focus.

I *think* I understood it, but I didn't want to get into a debate
about it's appropriateness. My position is that in this case we don't
really know precisely what this "coffee" thing is anyway. It is a
rather vague and informal notion, so we can take the liberty to
redefined in a way that seems to most practical for the time being...

Informal as far as most of its nature is concerned, sure.
Its existence, however, is neither vague nor informal.
Without it, the coffee-machine loses its purpose.
It is because of this, that pragmatical redefinitions
must indeed be temporary and tracked.

Sure, we agree on that. It's just that I think it is sufficient to say
at the beginning something like "for this discussion we are going to
define 'overlapping meaning' as .." and you want to be more careful
and really use another word. As I already said, I'm fine with that, so
I don't think more discussion is needed here.

[... very big and painful snip ...]

DEFINITION: Two relations R and S are said to have dependency-induced
overlap if there is a dependency that requires that sometimes some
tuples(1) of R are also in S.

(1) for some definition of tuples that allows restricted
reshuffling of its values. To do.

The only way to achieve (1) so that it also takes all normal inclusion
dependencies into account is to define tuples as something equivalent
to bags of values. Such an operation on its internal organs is going
to change the relational model beyond recognition. I'm going to
strongly insist that we stick to the classical definitions of the
named perspective and state in the definition that we are talking
about inclusion up to relabeling:

DEFINITION: Two relations R and S are said to have dependency-induced
overlap if there is a dependency that requires that sometimes some
tuples of R are also in S after renaming the attribute names.

[... another big and painful snip ...]

...additional terminology:
DEFINITION: A join dependency is said to be a proper if it does not
hold that any of its components is a subset of another component.
Proper is a nicer than non-trivial.
Note that it's semantically different. Not all proper join
dependencies are non-trivial, and not all non-trivial join
dependencies are proper.

Thanks, I have to read more carefully.

Trivial dependencies are the ones implied by the
candidate keys.

Not precisely. Trivial dependencies are those that hold in any schema.
For example the FK a,b -> a is always true as long as you have
attributes 'a' and 'b' in the header. Another example is the JD
[ {a,b}, {b,c}, {a,b,c} ] if the attributes in the header are {a,b,c}.
This is always a lossless decomposition, but also a very redundant
one. Another is the JD [ {a,b,c} ].

You include some of those in the problem space
while leaving some non-trivial dependencies out based on wether
any of its components is a subset of another component?

Let's see. This is because we are dealing with two relations
S and R, and we have to make sure that we get all
dependency-overlap, including the overlap of the trivial JD's.

Yes. The intuition is that if there is a JD the relation is in fact a
combination of one or more predicates namely the ones that correspond
to each component of the JD. It is for these predicates that we want
to check overlap. We want to explicitly include the case where it is
actually just one, so we also want to consider the trivial JD [ {a, b,
c} ] for the header {a, b, c}.

In that terminology you can also perhaps understand why we only care
about proper JDs. Suppose we have JD [ {a, b}, {b, c} ]. Then it
follows that the following JD also holds: JD [ {a}, {a, b}, {b, c} ].
I think you'll understand that if the first defines a lossless
decomposition, then so does the second for the simple reason that it
contains at least as much information as the first. But it is not true
that the component {a} also necessarily corresponds to some meaningful
predicate. In fact, it probably doesn't. So that is why we want to
exclude those.

Having said that, there are actually more JD's that need to be
excluded than I initially thought. For example, if the JD [ {a, b},
{b, c} ] holds then also the JD [ {a, c}, {a, b}, {b, c} ] holds.
Also here there is not necessarily a meaningful intuitive predicate
associated with the component {a, c}. So, at the risk of causing you a
serious migraine I'm going to have to tweak my definitions a little.

DEFINITION: A join dependency is said to be minimal if there is not
another join dependency with a set of components that is a proper
subset of the set of components of the first.

Examples: If JD [ {a, b}, {b, c} ] holds then JD [ {a, b}, {b, c}, {a,
c} ] is not minimal. The JD [ {a}, {a, b} ] is never minimal because
the trivial JD [ {a, b} ] always holds.

Note that a minimal join dependency is always also a proper join
dependency.

The new rule then becomes:

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that a component C of a minimal join dependency of R
overlaps in meaning with a component D of a minimal join dependency of
S, and either R and S are different, C and D are different, or both.

I just realized by rereading your comments that there might be some
confusion on the notion of "component". From now on I'll strictly use
the word component for a set of attribute names in a join dependency.
So I would like to make the wording in the rule a bit more explicit:
(changed parts are emphasized with "_")

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that _the projection of R_ on a component C of a minimal
join dependency of R overlaps in meaning with _the projection of S_ on
a component D of a minimal join dependency of S, and either R and S
are different, C and D are different, or both.

It looks to me that in order to have this problem,
we need to have two relations R and S, where 5NF is
relevant in both, with isomorphy between components
of R and components of S
(which is quite imaginable when integrating databases
in the same business domain, e.g. after a merger, or
with similar but distinct types of complex contracts).
Am I correct thusfar?

Roughly. The "5NF is relevant" is a bit misleading. All that is
required is that there are proper join dependencies, but these are
always present. For example, any relation R(a,b,c) will have always
the trivial join dependency JD [{a,b,c}], i.e., the join dependency
with one component that contains all attribute names in the header.
Although trivial, it is also proper.

Yep, I think I get that.

So you can violate the rule, even
if there is no non-trivial join dependency. A simple example would be
as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates
the rule because you also have for S the trivial and proper JD
[{c,d}]. So do you think that there is redundancy here that can be
removed?

Btw. the "with isomorphism between components" can be simplified to
"with components of equal size". After all, components are just sets
of attribute names, so they have equal size iff they are isomorphic.

Now I'm lost again.

My fault. I thought you were talking about components of JDs.

I don't have much experience with reasoning about 5NF.

I actually don't think that is necessary. Whether the relations are in
5NF or not is not really irrelevant. But you understand what lossless
decompositions are, no? And I assume you also understand how they are
implied by functional dependencies.

Up to BCNF.

Good. Let's for the moment restrict ourselves to just CKs. So say we
have a relation R(a,b,c,d) with a CK {a,b}. What are the minimal join
dependencies that hold?

And what if we have the CKs {a,b} and {b,c}?

So do you see for which projections we need to check overlap with
other projections of other relations?

-- Jan Hidders
.


Quantcast