Re: A real world example
- From: "Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx>
- Date: Sat, 19 Aug 2006 09:08:42 GMT
"Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx> wrote in message
news:_IkFg.14326$gY6.13813@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"erk" <eric.kaun@xxxxxxxxx> wrote in message
news:1155908080.655605.22860@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:
"erk" <eric.kaun@xxxxxxxxx> wrote in message
news:1155827594.752249.65890@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Brian Selzer wrote:If a constraint is defined in terms of successive states of a database,
[snip]
then
facts cannot be thought of just in terms of instances of predicates.
Yes, that's all facts are, though constraints mean that your facts are
consistent with one another, based on how you've defined the predicates
of relevance to you (i.e. inter-relation constraints reflect
requirements as well as "reality"). I'm speaking out of some degree of
ignorance; no database I've ever worked with has had state-transition
constraints like this. The only constraints have been what constitutes
a valid relation (and database) value.
Any such instance has identity only within a single relation value.
Since the
value of a candidate key determines the values of all other attributes,
it
can be used to identify a single tuple within a single relation value;
therefore, it can be used in other relation values as a substitute for
enumerating all of the attribute values of the referenced tuple in every
referencing tuple within the same database state--but only within the
same
database state.
I don't think that's a valid interpretation. It's not a substitute for
referencing all of the attribute values, nor, given only the key value,
can I determine the values of those attributes. The key has a specific
meaning in all the predicates in which it's referenced, as distinct
from the other attributes - it's not a surrogate for them. Relations,
even those with foreign key constraints, are standalone facts. Joins
and constraints reflect references to the same types.
I think this difference is important, and am not just spitting hairs
for the sake of it.
But if a relation value includes a foreign key, a value for that key along
with the definition of the foreign key constraint determines the values
for all of the other attributes in the referenced relation value because
the key value identifies the referenced tuple. I thought that a foreign
key constraint defined a whole-part relationship between tuples in the
referencing relation value and tuples in the referenced relation value.
Extending the scope of a candidate key's ability to
identify instances of predicates from a single database state to
successive
database states would require that those instances be identical, not
just
the candidate key values.
So this ability to uniquely identify a fact across successive database
states is important purely for state-transition constraints? Is there
any other use for them?
For any temporal constraint (unless there's only one fact).
If the values determined by the candidate key
value in the proposed state were different than those in the current
state,
then the instances would not have the same properties, and therefore,
would
not be the same.
Sameness is irrelevant; a fact is a fact, and if database value N+1
differs from database value N, it's because facts have changed, and we
need the database to reflect reality more accurately.
Instances are values; values do not change. Therefore,
relaxing this restriction so that a candidate key value can identify
instances in successive database states that are not necessarily
identical,
but have identical candidate key values can only be possible if the
instances of a predicate represent things in the universe of discourse
that
can have their appearance altered without altering their identity, and
what
is identified by a candidate key value is not just a tuple, but
something in
the universe.
It's a fact, not a thing. I have to admit that I'm still quite leery of
the notion that facts have "identity." I have no current
counterproposal; I'm just trying to understand why this is necessary. A
fact is a statement about things. One attribute of such a statement
might be that a real-world thing has a real-world unique identifying
value, and I understand the need to sometimes introduce surrogates. But
given that a separate "identity attribute" for a fact can have no
possible correlation with anything in the real world, including natural
candidate keys, I smell potential problems.
Facts are things, and thus can have identity, but the distinction between
the identity of a fact and the identity of the thing that a fact is about is
important. When I say that something has identity, I'm saying that that
thing has properties that distinguish it from all other things in the same
frame of reference. The frame of reference for a candidate key is a
relation value in a database instance. The designation of a candidate key
specifies the attributes in the relation value in the database instance that
represent the identity of each tuple in the relation value. Therefore, a
candidate key value identifies a single fact in a single database instance.
On the other hand, in order to enforce a transition constraint, the frame of
reference for the thing that a fact is about must include not only one
database instance, but both the current and proposed database instances.
Therefore, the properties that distinguish the thing from all others must
not change throughout the update.
This difference between the frame of reference of a candidate key and the
frame of reference of an update is the root of the problem. If you can
correlate facts about something in successive database instances, you can
determine how the relevant properties of something changed, and thus you can
enforce transition constraints. If you can't correlate facts about
something, then you can't enforce transition constraints. A change defines
a temporal boundary that marks the end of one situation and the beginning of
another. The current database instance corresponds to the situation that's
ending and the proposed instance corresponds to the one that's beginning.
It's possible for a candidate key value in the current instance to
indirectly identify something in the situation that's ending, but not in the
situation that's beginning. It's also possible for a candidate key value
that indirectly identifies something in the situation that's ending to
identify something else in the situation that's beginning. How? It's
possible for something to have its appearance altered without altering its
essence, and it's also possible for something to be identified by a property
that can change. For example, consider a line of people at the bank. Both
Person and Position are identifying properties. Assume that you're third in
line, so Person is you, and Position is 3. When the guy at the head of the
line leaves, your Position changes to 2. Now let's put that in the context
of a database. You have a relation with candidate keys Person and Position.
So the current instance might look something like
{(Bob, 1), (Brian, 2), (You, 3)}.
The proposed instance would look something like
{(Brian, 1), (You, 2)}
Even though Position is a candidate key in each situation and indirectly
identifies an entry in the queue, the value 2 from the current instance
identifies the tuple containing You in the proposed instance, not the one
containing Brian. This illustrates the difference in the frame of reference
for a candidate key and that for an update, and Position is an example of an
identifying property that can change.
Maybe, but from a functional standpoint, that operator is just a
function (e.g. "subtract $500 from X), in which the balance is a
free
variable. Only in an imperative world does that involve "knowing"
(referencing) the "previous" balance. Function application means
there's no "query" of the value prior to the update.
Not necessarily. For example, consider a sales order that can have
several
states, proposed, open, firm, shipped, received, closed, cancelled.
Assume
that the order stated is the normal set of state changes for the
order.
Now
consider that an order that cannot become proposed once it is firm,
it
cannot become received unless it has been shipped. It cannot become
closed
unless it has been received. Unless you define special operators to
deal
with the states, you need to know what the old value was in order to
maintain the consistency of the database throughout the update.
Maybe the discrepancy hinges on the phrase "you need to know." I'd
argue that no query is needed, merely constraints.
What type of constraints? I don't understand how you could define a
constraint. Could you please show me?
Sorry, my comment didn't directly address the issue, so I'll rephrase:
given that an application can produce several different state
transitions in a row, I'm not sure what value these state-transition
constraints would have. For example, an application can issue two
updates in a row: set order status to open, then immediately set to
firm. Such state transition constraints seem to have no meaning at all
if they don't reference other relations (e.g. you can't set the status
to received if there's no corresponding fact regarding the receipt
date/time), aren't they just "static" relation/database constraints,
which can be enforced for every update?
I see updates as database updates, so in order for the proposed instance to
become current, all state constraints along with all transition constraints
must be satisfied. If there are other relations involved, then the update
must include any changes to them as well. Because dependencies between
relations can be mutual, for example, circular inclusion dependencies, I
think it's important to think in terms of "database assignment" instead of
"relational assignment" when defining constraints. (I think Date calls it
"multiple assignment," and for partitions of related relation variables, it
often makes sense to think in terms of what will affect the smallest number
of relations required, but I prefer to keep it simple.)
I have to give this much more thought, but is this a case where
syntactic sugar for state-transition constraints could simply
transparently introduce "surrogate relations" to implement the state
transition constraints as "static" database constraints?
I don't think so. I think that a transition constraint must always involve
two database instances, the current instance and the proposed instance. If
you only rely on the proposed instance, then you're relying on the user to
reassert the old values along with the new values. If they fail to do so,
then they can circumvent the constraint. From another perspective, the
number of all possible database instances that satisfy both the state
constraints and the transition constraints can be smaller than the number of
all possible database instances that satisfy the state constraints, even if
you include "surrogate relations." In this case, possible instances that
shouldn't be possible include those that do not include a tuple containing
the old value. The "surrogate relations" solution shifts responsibility for
enforcing the transition constraint onto the user.
More
importantly, the bank must be able to identify the account that is
about
to
change, and that identity must remain constant in both the
preceding
and
succeeding database instances.
Why? As long as it can be identified via some query, what
difference
does it make? For example, if I make a database schema change and
introduce a new key, with appropriate view changes to support old
application code, is there some logical distinction? If the
external
queries all still produce the same results, excepting the specific
values being updated, what does "identity" have to do with it?
Because changes are set-based, and if the identity of the account can
change, then it's possible to update the wrong row, or to allow a
charge
to
clear that shouldn't be allowed.
There is no "wrong row," only a set of propositions. The same
possibility for human error would seem to be present in any update:
that you might issue an update without knowing about a change made
between the time you last loaded the page, and the time you pressed
Save, and therefore could violate a constraint which you wouldn't
violate if only the database were in the state you think it is (based
on what's on the screen). This issue seems to be a particular variant.
There is no "thing." These are propositions, or assertions if you
like,
nothing more. The only meaning is in the correlation of queries to
external phenonema of interest.
What are the propositions or assertions about? If they're about
values
then
they're just hot air. A database contains knowledge. Knowledge
about
what?
Scalar values? I don't think so.
They're about what is in our heads - the application (business)
domain.
The database doesn't care about that; it's in crafting predicates and
constraints that we tell the database as much as it needs to (or can)
"know."
The relational model doesn't have a correct theoretical mechanism to
correlate tuples during updates. The scope of a key value's ability
to
identify a tuple is a single relation value from a single database
instance.
I think that the model is incomplete without such a mechanism,
because
there
are some constraints that cannot be enforced, and certain update
anomalies
can occur, as I've provided examples of in other posts.
Since we're not talking about a machine that "really knows" the real
world, I don't understand what sort of mechanism you have in mind -
what is an example of a "correct theoretical mechanism"? The
relational
model already allows surrogate keys.
But it does not require them. Nor does it define mutability constraints
in
conjunction with entity integrity. Nor does it define a tuple-level
assignment operator.
Aren't tuple-level assignment operators unnecessary if you have the
surrogate keys you seek?
In any event, I'm not sure that mandatory surrogate keys solve more
problems than they create. The ability to change the non-surrogate keys
arbitrarily seems to indicate that The anything, given that they allow
arbitrary combinations of the various keys. In other words, by
definition the surrogate key is unrelated to anything else, and as such
it seems the keys can be shuffled at will. In the case of a relation
where there is only a single surrogate key (e.g. the tuples represent
facts that are nearly indistinguishable, like events in a trace), it
doesn't matter.
I think that the definition of the model should be
strong enough so that I can't break it.
I think you really expect way too much from models, and I'm not sure
this is a fracture any worse than the cure would produce. I think these
are fundamental identity problems, not just ones particular to the
relational model.
At the risk of being accused of waving my hands, I'll quote from one of
Bill Kent's papers, The Unsolvable Identity Problem
(http://www.idealliance.org/papers/extreme/proceedings/html/2003/Kent01/EML2003Kent01.html):
"Why is the identity problem unsolvable? To begin with, as just shown,
we don't agree on what the identity problem is."
"A general unified theory of identity is elusive. It probably doesn't
exist. The main reasons:
* The problem is not well defined.
* There are theoretical and practical limitations to what can be
achieved.
* There are too many semantic issues.
* There are too many domains. We can't achieve a consistent
solution across all of them.
But this quest for the Holy Grail is educational."
"So what do we do? Cope, as we always do. If there is no ideal
solution, we develop solutions that are good enough. The trouble is
that what's good enough for you today isn't good enough for me
tomorrow. We are forever doomed to compromise, extend, patch and rework
to make our good enough solutions a little better. We'll never get it
right. That's life.
Human beings manage to cope somehow with imperfect identification
schemes. Our computer systems might do no better than that."
- erk
.
- Follow-Ups:
- Re: A real world example
- From: JOG
- Re: A real world example
- References:
- A real world example
- From: Brian Selzer
- Re: A real world example
- From: JOG
- Re: A real world example
- From: Bob Badour
- Re: A real world example
- From: Brian Selzer
- Re: A real world example
- From: anithsen
- Re: A real world example
- From: Brian Selzer
- Re: A real world example
- From: erk
- Re: A real world example
- From: Brian Selzer
- Re: A real world example
- From: erk
- Re: A real world example
- From: Brian Selzer
- Re: A real world example
- From: erk
- Re: A real world example
- From: Brian Selzer
- A real world example
- Prev by Date: Re: Trying to define Surrogates
- Next by Date: Re: Trying to define Surrogates
- Previous by thread: Re: A real world example
- Next by thread: Re: A real world example
- Index(es):
Relevant Pages
|