Re: Towards a definition of atomic



On Feb 4, 11:29 pm, Jan Hidders <hidd...@xxxxxxxxx> wrote:
On 4 feb, 03:09, David BL <davi...@xxxxxxxxxxxx> wrote:





On Feb 3, 11:41 pm, Jan Hidders <hidd...@xxxxxxxxx> wrote:

On 2 feb, 02:45, David BL <davi...@xxxxxxxxxxxx> wrote:
On Feb 2, 3:42 am, Jan Hidders <hidd...@xxxxxxxxx> wrote:
On 1 feb, 14:30, David BL <davi...@xxxxxxxxxxxx> wrote:

AFAIK the conventional wisdom is that no absolute definition of atomic
exists for domain types.

On the contrary, we have too many of them. :-) But I don't think it is
wise to go looking for a definition if it is not clear to you want you
want to do with that definition. If you don't know where you are
going, any road will take you there.
Throwing caution to the wind, in this post I
wish to conjecture a definition of atomic that, perhaps with some more
effort at its formalisation, can provide some absolute meaning for a
given attribute within a given RDB schema.

In view of such blatant carelessness I can only respond by not heeding
my own warning. :-)

The usual intuition is that something is not atomic if it is
decomposable into smaller parts. In this case that would mean parts
with less information. For the case of decomposition into two
components this leads to something like:

DEFINITION: A domain D is said to be decomposable if there are domains
D1 and D2 and functions f1 : D -> D1 and f2 : D -> D2 such that (1) f1
and f2 are not injective and (2) <f1,f2> is injective where <f1,f2> :
D -> (D1 x D2) is defined such that <f1,f2>(x) = (f1(x), f2(x)).

Note that (1) says that each individual decomposition function loses
information and (2) says that together they don't. However, we can
then make the following observation:

THEOREM: A domain is decomposable iff it contains more than 2 values.

I like your approach to defining a *non-trivial* decomposition, by
requiring that f1,f2 not be injective.

However, I don't think your definition captures the right idea. For
instance, my example 2 decomposes a domain expressed as a set of
person names to a single name. In that case what is D1,D2?

The definition is aimed at only establishing that something is
decomposable, not exactly how you should decompose it.

Yes, that makes sense.

Thanks.





For a set of
names it clearly holds that it is decomposable and you can pick many
different D1s and D2s that show this. For example, you can split the
set into a set of strings before "Laura" and a set after "Laura".
Whether that makes sense or not, is another question because you are
looking for an absolute definition. It might, and that is enough. If
you want to say more about how you want to decompose, you could define
something like the following:

DEFINITION: A domain D is said to be set-decomposable if there is a
domain D' and an injective function f : D -> P(D'), where P is the
power-set operator, such that the number of values in D' is smaller
than the number of values of D minus 1.

It think you will understand why D' has to be smaller than D. The
"minus 1" may be a bit surprising, but otherwise the notion becomes
trivial in the sense that every domain will be set-decomposable: you
can map the domain {v_1, v_2, ... } to P({v_2, ...}) by mapping v_1 to
the empty set and for i > 1 mapping v_i to { v_i }.

Guess what?

THEOREM A domain is set-decomposable iff it contains more than 3
values.

So set-decomposability implies decomposability, but not the other way
around.

What about when D is an infinite domain? Talking about number of
values in D minus 1 doesn't seem applicable.

Correct, I was a bit sloppy there. Should be more like "the
cardinality of D should be smaller than the cardinality of D' minus
one of its elements".

In example 2, the values to be decomposed were sets of person names.
So I presume the domain D to be decomposed is the power set over the
set of strings, D' is the set of strings and f can simply be the
identity. Is that right? Now D and D' both have infinite
cardinality. However it seems clear that D' should be deemed smaller
in some sense. In fact Cantor proved there is no surjection from a
set to its power set, which establishes the result that D' has a
smaller cardinality than D. In fact only D' is countably infinite.

Exactly.

What do you think of the idea to use existence of a bijection as a
definition of information equivalence between databases (expressed
with alternative DB schema)?

It depends a bit on what intuitive concept of equivalence you want to
formalize. Under this definition, for example, all schemas that have
countably infinite many instances are equivalent. That is probably too
crude.

Way too crude! Yes, I can see the bijection needs to be constrained
somehow.

Presumably the bijection exists if and only if the set of instances
for each schema have the same cardinality. I gather that is in fact
the definition of equal cardinality.


This question has been addressed a number of times in the
literature, and is by no means easy or uncontroversial. Some
literature I can advise:

Hull, R. and Yap, C.K. The Format Model: A Theory of Database
Organization. Proceedings of the 1 st ACM Symposium on Principles of
Database Systems, ACM, 1982, pp. 205-211.

R. Hull. Relative Information Capacity of Simple Relational Database
Schemata. SIAM Journal of Computing, 15(3), 1986.

Also interesting:

R. J. Miller, Y. E. Ioannidis, and R. Ramakrishnan. The Use of
Information Capacity in Schema Integration and Translation. In Int.
Conference on Very Large Data Bases, pages 120--133, 1993.http://citeseer.ist.psu.edu/miller93use.html

Miller, R. J., Ioannidis, Y. E., and Ramakrishnan, R. 1994. Schema
equivalence in heterogeneous systems: bridging theory and practice.
Inf. Syst. 19, 1 (Jan. 1994), 3-31. DOI=http://dx.doi.org/10.1016/0306-4379(94)90024-8

@techreport{ hofstede92note,
author = "A. H. M. ter Hofstede and H. A. Proper and Th.P. van der
Weide",
title = "{A Note on Schema Equivalence}",
year = "1992",
url = "citeseer.ist.psu.edu/407910.html" }

@inproceedings{DBLP:conf/otm/ProperW05,
author = {Henderik Alex Proper and
Theo P. van der Weide},
title = {Schema Equivalence as a Counting Problem},
booktitle = {OTM Workshops},
year = {2005},
pages = {730-739},
ee = {http://dx.doi.org/10.1007/11575863_92},
crossref = {DBLP:conf/otm/2005-1},
bibsource = {DBLP,http://dblp.uni-trier.de}

No, I don't expect you to read all these, I just want to give an idea
of the amount of thought that has already been spent on this. :-) The
work by Richard Hull et al. is warmly recommended though. My own
position would be roughly what you find in that work, so the function
should in addition be computable and generic. His and Yap's work is
IMO the best in this area. The work by Arthur ter Hofstede, Erik
Proper and Theo van der Weide is nice because it shows that the
problem is essentially the same for ER-like data models such as ORM.
Erik also wrote earlier on a paper on this with Terry Halpin himself.

Thanks for the references


Do you think the following is a valid principle?: Identifiers that
stand for mathematical values increase the entropy of the database and
should be regarded as representing "noise", and indicate that
decomposition of domain types has been taken too far.

I don't think it's that simple. What definition of entropy did you
have in mind? The ones I know that were introduced for normalization
theory do not imply that identifiers increase entropy. What is a good
model or not depends on many factors. It might depend on the users and
what they want to do with the data. If it is intuitive for them, who
are we to say otherwise? It might also depend on the DBMS and how it
provides physical / data independence. In theory this should not
matter, but in practice it probably does.

.



Relevant Pages

  • Re: Towards a definition of atomic
    ... given attribute within a given RDB schema. ... It depends a bit on what intuitive concept of equivalence you want to ... Database Systems, ACM, 1982, pp. 205-211. ... decomposition of domain types has been taken too far. ...
    (comp.databases.theory)
  • Towards a definition of atomic
    ... A decomposition of an attribute involves translation from an original ... DB schema to a new DB schema with alternative relations. ... is where those node identifiers in the first example come to play, ... it seems that the node identifiers aren't ...
    (comp.databases.theory)
  • Re: Is Validity Just a Hypothetical or Conditional Characteristic?
    ... each conjunct of Q' is a conjunct of P'. ... Can you give such a decomposition. ... The method I suggested might include an axiom ... schema A |- A. But in a proof, ...
    (sci.logic)
  • Re: Towards a definition of atomic
    ... exists for domain types. ... given attribute within a given RDB schema. ... Note that says that each individual decomposition function loses ... of the components will always be an infinite domain again. ...
    (comp.databases.theory)
  • Re: XSD Schema Authoring Question
    ... validating parser to keep track of the cardinality of the children to ... Either specify a strict ordering of elements ... and/or use a more powerful schema definition language. ... Definition Language was not designed to define arbitrarily complex ...
    (comp.text.xml)