Re: RM formalism supporting partial information



On 21 nov, 04:05, David BL <davi...@xxxxxxxxxxxx> wrote:
On Nov 19, 7:11 pm, Jan Hidders <hidd...@xxxxxxxxx> wrote:



On 19 nov, 05:20, David BL <davi...@xxxxxxxxxxxx> wrote:

On Nov 17, 7:54 pm, Jan Hidders <hidd...@xxxxxxxxx> wrote:

On 17 nov, 04:35, David BL <davi...@xxxxxxxxxxxx> wrote:
Also I think one could argue that the CWA is at odds with a model of
partial information, unless you keep in mind the idea of "negation as
failure to prove true".

Here we touch the core of why I think your approach is problematic.
The CWA does apply for the does-not-apply interpretation of null
values. It does however not apply in its classical form for the value-
unknown interpretation. Actually Raymond Reiter himself (he introduced
the CWA) explains very well how it then changes in a paper with the
title "A sound and sometimes complete query evaluation algorithm for
relational databases with null values".

What you seem to be doing is mixing the two interpretations, resulting
in something that IMO doesn't really seem to have any consistently
meaningful interpretation at all. If you really want to combine the
two interpretations you need to first define the meaning of a relation
with null values in terms of sets of "possible worlds" where the null
values are removed by either giving a concrete value for them or
declaring them not applicable. That might actually be quite
interesting, and I don't remember seeing a paper that did this
properly. Even Zaniolo doesn't really get this right, although he
claims that he combines the two approaches. So you are in very good
company. :-)

The two main interpretations of null are apparently

1. value exists but is unknown
2. value doesn't exist

Zaniolo combines these into a single interpretation

3. no information

I think you're saying 3 is a mixture of 1,2 and doesn't lead to a
consistent interpretation.

It can lead to a consistent interpretation, but I think you and
Zaniolo don't do it in a consistent way.

IMO the problem is actually the reverse.
It is easy to interpret "no information" (it simply means that the
predicate instantiation isn't available for logical deduction),
whereas interpretations 1,2 quickly take us outside the realm of the
RM/RA. I say that because it seems clear that the RM/RA is only
concerned with a strict subset of the FOPL involved with logical
deduction from sets of fully instantiated predicates.

Wow. There's so much I disagree with here that I'm not sure where to
begin. To begin with a detail, the RM/RA naming convention hints for
me at a deep misunderstanding. The algebra is not an integral part of
the data model. If you take another query language it is still the
relational model. If anything, FOL and the related calculi are more
fundamental for understanding the meaning of the data.
Next, you claim that your interpretation is simple, but your
description of it is clearly not complete. A complete description
tells you which FO sentences are true, false or neither. See Reiter in
his paper on null values on how this is done properly for the value-
unknown interpretation. Since you are combining it with another
interpretation the result will be at least as complex as his. To begin
with you will need to extend the language of FOL with extra atoms that
allow places to be not there. So we have next to the classical atoms
such as P(x, y, z) also P(x, y, _) (for simplicity let's take the
unlabeled perspective) which says that the third value may be
undefined. So tell me, given a languages of formulas with the usual
logical connectives and quantifiers, over such atoms, which formulas
are true, false and unknown given a certain relation with null values.
Again, see Reiter for how this is done. Oh, and try to keep it
simple. :-P

Finally, you say that it seems clear that the RM/RA is only concerned
with a strict subset of the FOPL involved with logical deduction from
sets of fully instantiated predicates. To the extent that this is
true, you have already left that safe and well-understood realm the
moment you allowed the value-unknown interpretation. Adding the not-
applicable interpretation makes the situation worse, not better.

Perhaps there is a philosophical difference in our approach to
mathematics!

Database theory is largely applied mathematics. You seem to be more
inclined to the ways of pure mathematics. That's fine, of course, but
I reserve the right to be a bit skeptical about the usefulness of your
approach. :-)

By consistency I only mean the inability to derive self contradictions
that would make it impossible for the definitions to be satisfied in
the first place. Of course it is normally impossible to prove
consistency. I think this is quite different to your usage of
"consistency" in the above. Is that right?

Yes. I meant that you are not consistently using or defining a single
interpretation of missing data but rather seem to taking sometimes one
and sometimes the other depending on what make the math looks cute.
That approach is likely to end in elegant but meaningless results.

I would say that a model of partial information will have trouble
formalising consistency or completeness in the following sense: If we
imagine an omniscient, complete database then we can compare the
result of a query against this hypothetical database. Any reasonable
definition of relation difference will be judged as neither
necessarily consistent nor complete with respect to the hypothetical
database. Despite this we still would like a formalism that in
practice is useful for dealing with partial information. Perhaps as
well, we will use the formalism (with its well understood
mathematical properties) as a basis for our intuition rather than the
reverse!

The thing is that you are doing it the wrong way around. You should
first define in some formalism like logic what the data means (again,
see Reiter for how this is done), and only *then* you should start
thinking about the operators, which ones you want, and what they
exactly mean.

Your reference to completeness above seems fairly easy to meet. The
algebra can be mapped to corresponding proof rules. In other words
given the algebra we can define the subset of FO formulae that can be
proven true (or false). I agree that this is the basis on which we
understand the meaning of the formal data model.

I don't think that's what I said. When formalizing the meaning the
present and missing data the algebra operators should not be in the
picture.

I have wondered whether the interpretation should state that the CWA
applies everywhere and nulls are always interpreted as non-
existence. Under that formalism we have not as you say "left that
safe and well-understood realm" by allowing the value-unknown
interpretation. Note for example that it is on this formal basis
that a restricted form of negation can be defined. As far as the
formalism is concerned we stay strictly within the confines of a 2vl.

Correct.

Practically speaking (ie in the presence of partial information) the
user should think of negation as failure to prove true.

No, no, no. That's a very sloppy and confusing way of formulating a
CWA. E.g., it might depend on the effectiveness of your theorem prover
whether something is considered true or not.

We must somehow deal with the inconsistency between a formal
interpretation of null as non-existence, and the user of the DB that
may choose to *weaken* it to non-information. However, I think the
only issue is that the user must be careful to weaken the
interpretation of query results accordingly, and the way to do this is
fairly straightforward.

The user is not the problem here, you are. :-) You first need to
figure out what the options are that you want to offer to the user and
understand their consequences. The best way to do that is by starting
to formalize them properly, preferably in terms of a logical theory.
Again, see Reiter for how this is done.

Of course this needs some justification. As an illustrative example
consider a conjunctive query compatible with select-project-join.
There is no negation, and the variables can be regarded as
existentially quantified. The result of the query will be a subset of
the result that would be returned from an omniscient database. This
could be regarded as consistency without completeness. What more
could one hope for in the presence of partial information?

Exactly, so in that sense it is actually complete, and you can make
that claim precise. The set of tupels in the answer will be exactly
the set of tuples that are certain to be in the result of the same
query over the omniscient database. By the nature of the problem every
query should actually return 2 sets of tuples: the set of certain
answers, and the set of possible answers. Your operators should
therefore not operator on relations but on pairs of relations.

-- Jan Hidders
.



Relevant Pages

  • Re: Is State Vector Reduction a Process?
    ... be observed and hence cannot produce outcomes" is already an interpretation of the measurement formalism. ... If the observer sits in a different system then, because the first system is closed, it cannot have any interactions. ... At least this is the traditional way of viewing the formalism. ...
    (sci.physics.research)
  • Re: Observer pattern limitations
    ... The OOPL has have a formalism that let's you map each ... It does not describe the semantics of the program, ... You're varying the interpretation on the *output* of the programs. ... the program of Ariane V was ...
    (comp.object)
  • Re: Guessing?
    ... By relation schemata I assume you are referring to relation types. ... of the encoded attributes as values in the RM formalism. ... What I'm saying is that absent interpretation, ... interpretation and that under any and every interpretation the axioms are ...
    (comp.databases.theory)
  • Re: Is State Vector Reduction a Process?
    ... >> We are at the heart of the problem between the formalism of the theory ... > it is given _some_ interpretation. ... > Real measurement requires real interactions. ...
    (sci.physics.research)
  • Re: RM formalism supporting partial information
    ... result of a query against this hypothetical database. ... we will use the formalism (with its well understood ... user should think of negation as failure to prove true. ...
    (comp.databases.theory)